Sample records for gene set test

  1. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
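
    As a rough illustration of the filter-then-test idea described above, the sketch below drops gene sets whose filter p-value exceeds a cutoff and applies a multiple-testing correction only to the survivors. It is not the SGSF implementation: the filter p-values are taken as given inputs, the Benjamini-Hochberg correction and the cutoff values are assumptions chosen for the example, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(filter_p, test_p, filter_cutoff=0.1, fdr=0.05):
    """Keep gene sets passing the filter, then correct only those.

    filter_p and test_p map gene set names to p-values. The cutoff
    values are illustrative assumptions, not values from the paper.
    """
    kept = [name for name in test_p if filter_p[name] <= filter_cutoff]
    adjusted = benjamini_hochberg([test_p[name] for name in kept])
    return [name for name, q in zip(kept, adjusted) if q <= fdr]
```

    Because fewer hypotheses enter the correction, each surviving gene set faces a smaller multiplicity penalty, which is the source of the power gain the abstract describes.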

  2. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases result jointly from alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.

  3. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
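
    The combination step can be sketched as follows. This is a minimal illustration of the weighted Z-method (a weighted form of Stouffer's method) for one gene set, assuming one-sided PC-level p-values strictly between 0 and 1. The weights are passed in directly; how SGSE derives them from PC variances and Tracy-Widom p-values is not spelled out in the abstract, so the function name and any weight values used with it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    Each p-value is mapped to a z-score, the z-scores are averaged
    with the given non-negative weights, and the result is mapped
    back to a single combined p-value.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need equally long, non-empty inputs")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")

    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores))
    combined_z /= sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(combined_z)
```

    A PC given weight zero contributes nothing to the combined statistic, which mirrors the abstract's point that components whose eigenvalues look like noise can be ignored.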

  4. A support vector machine based test for incongruence between sets of trees in tree space

    PubMed Central

    2012-01-01

    Background The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. Results Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. Conclusions The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license. PMID:22909268

  5. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

    PubMed

    Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A

    2013-01-29

    Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate -corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 
    21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  6. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate -corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 
    21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878

  7. Computation and application of tissue-specific gene set weights.

    PubMed

    Frost, H Robert

    2018-04-06

    Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses is publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/~hrfrost/TissueSpecificGeneSets. Contact: rob.frost@dartmouth.edu.

  8. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established, i.e., the absolute value of the Moderated Welch Test statistic, Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio and the Baumgartner-Weiss-Schindler test statistic. For false positive rate estimation, all selected ranking metrics were robust with respect to sample size. For sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test statistic has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may lead to new biological discoveries.
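
    For reference, one of the four recommended metrics can be sketched in a few lines. The code below assumes the usual two-class definition of the signal-to-noise ratio, the difference in group means divided by the sum of the group standard deviations, and ranks genes by its absolute value. It omits any minimum-variance adjustment a particular GSEA implementation may apply, and the function names and data layout are hypothetical rather than taken from MrGSEA.

```python
from statistics import mean, stdev


def abs_signal_to_noise(group_a, group_b):
    """Absolute signal-to-noise ratio for one gene across two classes."""
    spread = stdev(group_a) + stdev(group_b)  # sample standard deviations
    if spread == 0:
        raise ValueError("both groups have zero variance")
    return abs(mean(group_a) - mean(group_b)) / spread


def rank_genes(expression, labels):
    """Order gene names from highest to lowest |signal-to-noise|.

    expression maps a gene name to a list of values, one per sample;
    labels holds 0 or 1 for each sample, in the same order.
    """
    scores = {}
    for gene, values in expression.items():
        class_0 = [v for v, lab in zip(values, labels) if lab == 0]
        class_1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs_signal_to_noise(class_0, class_1)
    return sorted(scores, key=scores.get, reverse=True)
```

    Taking the absolute value means strongly up- and down-regulated genes both land at the top of the ranked list, so the direction of change is not used by this metric.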

  9. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses.

  10. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    PubMed

    Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin

    2011-04-14

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
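
    The abstract does not say how the contingency table behind LPv is built, so the following is only a generic sketch of a one-sided Fisher's exact test on a 2x2 table. One hypothetical reading is that a counts gene pairs within the set whose LSI similarity exceeds a threshold, b the remaining pairs within the set, and c and d the same counts for background pairs; the function name is likewise an assumption.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X follows the hypergeometric
    distribution implied by fixing the row and column totals.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row_1 = a + b            # size of the first row
    col_1 = a + c            # size of the first column
    total = a + b + c + d
    upper = min(row_1, col_1)  # largest value the top-left cell can take

    # Sum the probabilities of all tables at least as extreme as observed.
    tail = sum(
        comb(col_1, k) * comb(total - col_1, row_1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, row_1)
```

    Under this reading, a small p-value would indicate that high-similarity pairs are over-represented inside the gene set relative to the background.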

  11. Accurately Assessing the Risk of Schizophrenia Conferred by Rare Copy-Number Variation Affecting Genes with Brain Function

    PubMed Central

    Raychaudhuri, Soumya; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David; Sklar, Pamela; Purcell, Shaun; Daly, Mark J.

    2010-01-01

    Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package. 
    PMID:20838587

  12. Inference of Evolutionary Forces Acting on Human Biological Pathways

    PubMed Central

    Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent

    2015-01-01

    Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280

  13. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    PubMed Central

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. 
    The presented methodology is broadly applicable to breast cancer risk assessment using any new identified gene set. PMID:21423775

  14. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  15. Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma

    PubMed Central

    2013-01-01

    Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent while top ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432

  16. Examination of the Involvement of Cholinergic-Associated Genes in Nicotine Behaviors in European and African Americans.

    PubMed

    Melroy-Greif, Whitney E; Simonson, Matthew A; Corley, Robin P; Lutz, Sharon M; Hokanson, John E; Ehringer, Marissa A

    2017-04-01

    Cigarette smoking is a physiologically harmful habit. Nicotinic acetylcholine receptors (nAChRs) are bound by nicotine and upregulated in response to chronic exposure to nicotine. It is known that upregulation of these receptors is not due to a change in mRNA of these genes; however, more precise details on the process are still uncertain, with several plausible hypotheses describing how nAChRs are upregulated. We have manually curated a set of genes believed to play a role in nicotine-induced nAChR upregulation. Here, we test the hypothesis that these genes are associated with and contribute risk for nicotine dependence (ND) and the number of cigarettes smoked per day (CPD). Studies with genotypic data on European and African Americans (EAs and AAs, respectively) were collected and a gene-based test was run to test for an association between each gene and ND and CPD. Although several novel genes were associated with CPD and ND at P < 0.05 in EAs and AAs, these associations did not survive correction for multiple testing. Previous associations between CHRNA3, CHRNA5, CHRNB4 and CPD in EAs were replicated. Our hypothesis-driven approach avoided many of the limitations inherent in pathway analyses and provided nominal evidence for association between cholinergic-related genes and nicotine behaviors. We evaluated the evidence for association between a manually curated set of genes and nicotine behaviors in European and African Americans. Although no genes were associated after multiple testing correction, this study has several strengths: by manually curating a set of genes we circumvented the limitations inherent in many pathway analyses and tested several genes that had not yet been examined in a human genetic study; gene-based tests are a useful way to test for association with a set of genes; and these genes were collected based on literature review and conversations with experts, highlighting the importance of scientific collaboration. © The Author 2016.
Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco.

  17. A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories

    PubMed Central

    2014-01-01

    Background: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. Results: We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. Conclusions: The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity.
The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. PMID:24731138
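
    As an illustration of the general two-step idea in the record above, the following Python sketch screens hypothesis sets first and then tests individual hypotheses only within the selected sets. It is a generic pattern under stated assumptions, not the authors' procedures: the Simes combination for the set-level p-value, the Benjamini-Hochberg step-up rule at both steps, the within-set level alpha * R / m (R selected sets out of m), and all function names are choices made here for illustration.

```python
def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up rule; returns one boolean per hypothesis."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def simes(pvals):
    """Simes combined p-value for one hypothesis set (assumed screening statistic)."""
    m = len(pvals)
    return min(p * m / rank for rank, p in enumerate(sorted(pvals), start=1))


def two_step(sets, alpha=0.05):
    """Step 1: select sets by BH on Simes p-values.
    Step 2: within each selected set, apply BH at level alpha * R / m."""
    names = list(sets)
    selected = bh_reject([simes(sets[n]) for n in names], alpha)
    r = sum(selected)
    return {
        name: bh_reject(sets[name], alpha * r / len(names))
        for name, keep in zip(names, selected)
        if keep
    }


# Hypothetical p-values: three genes, each tested at three dose levels.
example = {
    "g1": [0.001, 0.002, 0.8],
    "g2": [0.5, 0.6, 0.9],
    "g3": [0.0005, 0.4, 0.7],
}
decisions = two_step(example)
print(decisions)
```

    Genes that fail the screening step (here the hypothetical g2) do not appear in the result at all, which is what keeps the second source of multiplicity under control in this kind of scheme.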

  18. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    PubMed

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but ensures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.

  19. NEAT: an efficient network enrichment analysis test.

    PubMed

    Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C

    2016-09-05

    Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow, and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).
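
    To make the role of the hypergeometric null concrete, here is a minimal Python sketch of a generic upper-tail hypergeometric p-value. The abstract does not give NEAT's parameterization for network links, so the urn-model arguments below (population size, number of "successes", number of draws) and the function name are assumptions for illustration only; the neat R package should be consulted for the actual test.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when X counts successes in `draws` items taken without
    replacement from `population` items, `successes` of which are successes."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 10 possible links, 3 of them reach the target set,
# and 4 links leave the query set; all 3 observed hits land in the target set.
p_value = hypergeom_upper_tail(3, population=10, successes=3, draws=4)
print(p_value)
```

    Because the tail probability is a closed-form sum, no resampling is needed, which is consistent with the speed advantage the abstract reports over resampling-based tests.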

  20. Investigating the different mechanisms of genotoxic and non-genotoxic carcinogens by a gene set analysis.

    PubMed

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. 
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways.

  1. Investigating the Different Mechanisms of Genotoxic and Non-Genotoxic Carcinogens by a Gene Set Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumor in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that the DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. 
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways. PMID:24497971

  2. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    Risk stratification models for lung cancer based on gene expression profiles are of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression networks. Using multiple microarray datasets, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in the training set and the two test sets (p < 0.00001, p < 0.0001 and p = 0.02 for training and test sets 1 and 2, resp.). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  3. Beyond main effects of gene-sets: harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior.

    PubMed

    Windhorst, Dafna A; Mileva-Seitz, Viara R; Rippe, Ralph C A; Tiemeier, Henning; Jaddoe, Vincent W V; Verhulst, Frank C; van IJzendoorn, Marinus H; Bakermans-Kranenburg, Marian J

    2016-08-01

    In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome-wide environmental interaction (GWEI) studies in the search for genetic variation underlying individual differences in behavior. Genetic variants in 12 autosomal dopaminergic genes were available in an ethnically homogenous part of a population-based cohort. Harsh parenting was assessed with maternal (n = 1881) and paternal (n = 1710) reports at age 3. Externalizing behavior was assessed with the Child Behavior Checklist (CBCL) at age 5 (71 ± 3.7 months). We conducted gene-set analyses of the association between variation in dopaminergic genes and externalizing behavior, stratified for harsh parenting. The association was statistically significant or approached significance for children without harsh parenting experiences, but was absent in the group with harsh parenting. Similarly, significant associations between single genes and externalizing behavior were only found in the group without harsh parenting. Effect sizes in the groups with and without harsh parenting did not differ significantly. Gene-environment interaction tests were conducted for individual genetic variants, resulting in two significant interaction effects (rs1497023 and rs4922132) after correction for multiple testing. Our findings are suggestive of G × E interplay, with associations between dopamine genes and externalizing behavior present in children without harsh parenting, but not in children with harsh parenting experiences. Harsh parenting may overrule the role of genetic factors in externalizing behavior. 
Gene-based and gene-set analyses offer promising new alternatives to analyses focusing on single candidate polymorphisms when examining the interplay between genetic and environmental factors.

  4. Genotypic and Phenotypic Detection of AmpC β-lactamases in Enterobacter spp. Isolated from a Teaching Hospital in Malaysia.

    PubMed

    Mohd Khari, Fatin Izzati; Karunakaran, Rina; Rosli, Roshalina; Tee Tay, Sun

    2016-01-01

    The objective of this study was to determine the occurrence of chromosomal and plasmid-mediated β-lactamases (AmpC) genes in a collection of Malaysian isolates of Enterobacter species. Several phenotypic tests for detection of AmpC production of Enterobacter spp. were evaluated and the agreements between tests were determined. Antimicrobial susceptibility profiles for 117 Enterobacter clinical isolates obtained from the Medical Microbiology Diagnostic Laboratory, University Malaya Medical Centre, Malaysia, from November 2012 to February 2014 were determined in accordance with CLSI guidelines. AmpC genes were detected using a multiplex PCR assay targeting the MIR/ACT gene (closely related to chromosomal EBC family gene) and other plasmid-mediated genes, including DHA, MOX, CMY, ACC, and FOX. The AmpC β-lactamase production of the isolates was assessed using cefoxitin disk screening test, D69C AmpC detection set, cefoxitin-cloxacillin double disk synergy test (CC-DDS) and AmpC induction test. Among the Enterobacter isolates in this study, 39.3% were resistant to cefotaxime and ceftriaxone and 23.9% were resistant to ceftazidime. Ten (8.5%) of the isolates were resistant to cefepime, and one isolate was resistant to meropenem. Chromosomal EBC family gene was amplified from 36 (47.4%) E. cloacae and three (25%) E. asburiae. A novel blaDHA type plasmid-mediated AmpC gene was identified for the first time from an E. cloacae isolate. AmpC β-lactamase production was detected in 99 (89.2%) of 111 potential AmpC β-lactamase producers (positive in cefoxitin disk screening) using D69C AmpC detection set. The detection rates were lower with CC-DDS (80.2%) and AmpC induction tests (50.5%). There was low agreement between the D69C AmpC detection set and the other two phenotypic tests. Of the 40 isolates with AmpC genes detected in this study, 87.5%, 77.5% and 50.0% of these isolates were positive by the D69C AmpC detection set, CC-DDS and AmpC induction tests, respectively.
Besides MIR/ACT gene, a novel plasmid-mediated AmpC gene belonging to the DHA-type was identified in this study. Low agreement was noted between the D69C AmpC detection set and two other phenotypic tests for detection of AmpC production in Enterobacter spp. As plasmid-mediated genes may serve as the reservoir for the emergence of antibiotic resistance in a clinical setting, surveillance and infection control measures are necessary to limit the spread of these genes in the hospital.

  5. Genotypic and Phenotypic Detection of AmpC β-lactamases in Enterobacter spp. Isolated from a Teaching Hospital in Malaysia

    PubMed Central

    Mohd Khari, Fatin Izzati; Karunakaran, Rina; Rosli, Roshalina; Tee Tay, Sun

    2016-01-01

    Objectives: The objective of this study was to determine the occurrence of chromosomal and plasmid-mediated β-lactamases (AmpC) genes in a collection of Malaysian isolates of Enterobacter species. Several phenotypic tests for detection of AmpC production of Enterobacter spp. were evaluated and the agreements between tests were determined. Methods: Antimicrobial susceptibility profiles for 117 Enterobacter clinical isolates obtained from the Medical Microbiology Diagnostic Laboratory, University Malaya Medical Centre, Malaysia, from November 2012 to February 2014 were determined in accordance with CLSI guidelines. AmpC genes were detected using a multiplex PCR assay targeting the MIR/ACT gene (closely related to chromosomal EBC family gene) and other plasmid-mediated genes, including DHA, MOX, CMY, ACC, and FOX. The AmpC β-lactamase production of the isolates was assessed using cefoxitin disk screening test, D69C AmpC detection set, cefoxitin-cloxacillin double disk synergy test (CC-DDS) and AmpC induction test. Results: Among the Enterobacter isolates in this study, 39.3% were resistant to cefotaxime and ceftriaxone and 23.9% were resistant to ceftazidime. Ten (8.5%) of the isolates were resistant to cefepime, and one isolate was resistant to meropenem. Chromosomal EBC family gene was amplified from 36 (47.4%) E. cloacae and three (25%) E. asburiae. A novel blaDHA type plasmid-mediated AmpC gene was identified for the first time from an E. cloacae isolate. AmpC β-lactamase production was detected in 99 (89.2%) of 111 potential AmpC β-lactamase producers (positive in cefoxitin disk screening) using D69C AmpC detection set. The detection rates were lower with CC-DDS (80.2%) and AmpC induction tests (50.5%). There was low agreement between the D69C AmpC detection set and the other two phenotypic tests.
Of the 40 isolates with AmpC genes detected in this study, 87.5%, 77.5% and 50.0% of these isolates were positive by the D69C AmpC detection set, CC-DDS and AmpC induction tests, respectively. Conclusions: Besides MIR/ACT gene, a novel plasmid-mediated AmpC gene belonging to the DHA-type was identified in this study. Low agreement was noted between the D69C AmpC detection set and two other phenotypic tests for detection of AmpC production in Enterobacter spp. As plasmid-mediated genes may serve as the reservoir for the emergence of antibiotic resistance in a clinical setting, surveillance and infection control measures are necessary to limit the spread of these genes in the hospital. PMID:26963619

  6. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
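
    The gene-sampling scheme discussed in the record above can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: the set score (mean of gene statistics), the number of random draws, the toy data, and every name below are assumptions. The `absolute` flag switches between signed statistics and their absolute values.

```python
import random


def gene_sampling_pvalue(stats, gene_set, absolute=True, n_samples=2000, seed=0):
    """Compare a set's mean gene statistic with means of random gene sets
    of the same size drawn from all genes (gene sampling)."""
    rng = random.Random(seed)
    values = {g: (abs(s) if absolute else s) for g, s in stats.items()}
    members = [g for g in gene_set if g in values]
    observed = sum(values[g] for g in members) / len(members)
    genes = list(values)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(genes, len(members))
        if sum(values[g] for g in sample) / len(members) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_samples + 1)


# Toy data: 96 background genes with small statistics and one set whose
# four genes change strongly in opposite directions.
stats = {f"bg{i}": ((i % 7) - 3) * 0.2 for i in range(96)}
stats.update({"a": 3.0, "b": -3.0, "c": 3.0, "d": -3.0})
gene_set = ["a", "b", "c", "d"]

p_abs = gene_sampling_pvalue(stats, gene_set, absolute=True)
p_signed = gene_sampling_pvalue(stats, gene_set, absolute=False)
print(p_abs, p_signed)
```

    In this toy case the signed scores cancel within the set, so only the absolute statistic separates it from the background. The effect on false-positive rates that the abstract reports would have to be checked with a proper simulation rather than a single example like this one.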

  7. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.

    PubMed

    Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping

    2016-08-26

    Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates that the NBMM identifies biologically relevant genes, which are well justified by gene ontology analysis. The proposed method is powerful and efficient for detecting biologically relevant DE genes and gene sets in time course RNA-Seq data.

  8. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.

  9. A statistical approach to identify, monitor, and manage incomplete curated data sets.

    PubMed

    Howe, Douglas G

    2018-04-02

    Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval.
This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.

  10. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with those of Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
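
    One common way to quantify the variance inflation mentioned in the record above is the standard result for the mean of m equally variable statistics with average pairwise correlation rho: its variance is (sigma^2 / m) * (1 + (m - 1) * rho). The Python sketch below computes that factor. It is offered as background under this equal-variance assumption and is not taken from the paper; the function names and the example numbers are hypothetical.

```python
import math


def variance_inflation_factor(set_size, mean_correlation):
    """Factor by which the variance of a set-mean statistic grows relative
    to the independence case: 1 + (m - 1) * rho."""
    return 1.0 + (set_size - 1) * mean_correlation


def inflated_standard_error(sigma, set_size, mean_correlation):
    """Standard error of the mean of `set_size` gene statistics, each with
    standard deviation `sigma`, under the given average correlation."""
    vif = variance_inflation_factor(set_size, mean_correlation)
    return sigma * math.sqrt(vif / set_size)


# Hypothetical gene set: 50 genes with an average pairwise correlation of 0.1.
vif = variance_inflation_factor(50, 0.1)
se_independent = inflated_standard_error(1.0, 50, 0.0)
se_correlated = inflated_standard_error(1.0, 50, 0.1)
print(vif, se_independent, se_correlated)
```

    Even a modest average correlation multiplies the variance several-fold for a set of this size, so a test that assumes independent genes would use a standard error that is too small and would overstate significance.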

  11. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  12. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    PubMed

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable the learning of more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.

  13. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    PubMed

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  14. A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

    PubMed Central

    Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378

  15. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  16. Distributional fold change test – a statistical approach for detecting differential expression in microarray experiments

    PubMed Central

    2012-01-01

    Background: Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time the fold change by itself provides valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results: A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating curves and tested on 11 data sets from Gene Omnibus Database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions: The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055
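
    For contrast with the DFC test, the "simplest method" that the abstract above criticizes, a fold-change cut-off with no associated confidence value, can be sketched in a few lines. This is only the baseline filter, not the DFC test itself; the function name, the dictionary layout and the cut-off of 2 are assumptions made for illustration.

```python
from math import log2
from statistics import fmean


def fold_change_filter(cond_a, cond_b, cutoff=2.0):
    """Return {gene: log2 fold change} for genes whose ratio of mean
    intensities (condition A over condition B) is at least `cutoff`
    in either direction. No p-value or confidence level is produced."""
    selected = {}
    for gene, values_a in cond_a.items():
        ratio = fmean(values_a) / fmean(cond_b[gene])
        if ratio >= cutoff or ratio <= 1.0 / cutoff:
            selected[gene] = log2(ratio)
    return selected


# Hypothetical intensities for three probe sets in two conditions.
cond_a = {"g1": [200, 220], "g2": [100, 100], "g3": [50]}
cond_b = {"g1": [100, 110], "g2": [100, 120], "g3": [200]}
hits = fold_change_filter(cond_a, cond_b)
```

    The result keeps g1 (two-fold up) and g3 (four-fold down) and drops g2, but it says nothing about how reliable either call is, which is the gap the abstract's method is meant to fill.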

  17. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
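
    The kappa statistic used above to relate pairs of annotation gene sets can be sketched as Cohen's kappa on gene membership. This is a minimal sketch under assumptions: GARNET's exact definition, gene universe and thresholds are not given in the abstract, and the function name and toy sets are hypothetical.

```python
def cohen_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement between two gene sets, treating each
    gene in `universe` as a yes/no membership call in each set."""
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    n = len(universe)
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    p_a = len(a) / n
    p_b = len(b) / n
    # Agreement expected by chance if membership in the two sets were independent.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sets are empty or both are full")
    return (observed - expected) / (1 - expected)


universe = {f"g{i}" for i in range(1, 11)}
term_1 = {"g1", "g2", "g3", "g4"}
term_2 = {"g3", "g4", "g5", "g6"}
kappa = cohen_kappa(term_1, term_2, universe)  # 1/6 for these toy sets
```

    A kappa near 1 would mark two annotation terms as nearly redundant, while values near 0 indicate overlap no greater than chance.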

  18. GeneTools--application for functional annotation and statistical hypothesis testing.

    PubMed

    Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid

    2006-10-24

    Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster, testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with a rapid extraction of highly relevant gene annotation data for e.g. thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no

  19. A cis-regulatory logic simulator.

    PubMed

    Zeigler, Robert D; Gertz, Jason; Cohen, Barak A

    2007-07-27

    A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part, because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence. We developed a gene expression simulator which generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence. We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduce experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.

  20. Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans

    PubMed Central

    Moretti, S.; Davydov, I.I.; Excoffier, L.

    2017-01-01

    Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345

  1. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI.

    PubMed

    Vimaleswaran, Karani S; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N; Dudbridge, Frank; Loos, Ruth J F

    2012-10-15

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10⁻⁷. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.
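
    The hypergeometric enrichment test mentioned in the abstract above asks whether candidate genes are over-represented among the top-ranked genes (for example, those in the best 5% or 25% of P-values). A minimal sketch of the upper-tail probability follows; the function name and the toy counts are hypothetical and do not reproduce the study's figures.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap), where X is
    the number of gene-set members among `n_selected` genes drawn without
    replacement from `universe_size` genes, `set_size` of which are members."""
    total = comb(universe_size, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy case: 20 genes in total, 5 candidates, top 5 genes selected, all 5 are candidates.
p_all = hypergeom_enrichment_p(20, 5, 5, 5)   # 1 / C(20, 5) = 1 / 15504
p_none = hypergeom_enrichment_p(20, 5, 5, 0)  # 1.0, since any overlap >= 0 qualifies
```

    Exact integer arithmetic with math.comb avoids overflow and rounding problems for the counts; only the final ratio is converted to a float.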

  2. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification.

    PubMed

    Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino

    2014-01-30

    The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.

  3. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405

  4. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  5. OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines.

    PubMed

    Chen, Wei-Hua; Lu, Guanting; Chen, Xiao; Zhao, Xing-Ming; Bork, Peer

    2017-01-04

    OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.

    PubMed

    Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M

    2011-11-01

    Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.

  7. Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis.

    PubMed

    de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Verheijen, Mark H G; Posthuma, Danielle

    2015-11-01

    Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis.

  8. Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis

    PubMed Central

    de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Scharf, J M; Pauls, D L; Yu, D; Illmann, C; Osiecki, L; Neale, B M; Mathews, C A; Reus, V I; Lowe, T L; Freimer, N B; Cox, N J; Davis, L K; Rouleau, G A; Chouinard, S; Dion, Y; Girard, S; Cath, D C; Posthuma, D; Smit, J H; Heutink, P; King, R A; Fernandez, T; Leckman, J F; Sandor, P; Barr, C L; McMahon, W; Lyon, G; Leppert, M; Morgan, J; Weiss, R; Grados, M A; Singer, H; Jankovic, J; Tischfield, J A; Heiman, G A; Verheijen, Mark H G; Posthuma, Danielle

    2015-01-01

    Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis. PMID:25735483

  9. COMPADRE: an R and web resource for pathway activity analysis by component decompositions.

    PubMed

    Ramos-Rodriguez, Roberto-Rafael; Cuevas-Diaz-Duran, Raquel; Falciani, Francesco; Tamez-Peña, Jose-Gerardo; Trevino, Victor

    2012-10-15

    The analysis of biological networks has become essential to study functional genomic data. Compadre is a tool to estimate pathway/gene set activity indexes using sub-matrix decompositions for biological network analyses. The Compadre pipeline also includes one of the direct uses of activity indexes: detecting altered gene sets. For this, the gene expression sub-matrix of a gene set is decomposed into components, which are used to test differences between groups of samples. This procedure is performed with and without differentially expressed genes to decrease false calls. During this process, Compadre also performs an over-representation test. Compadre already implements four decomposition methods [principal component analysis (PCA), Isomaps, independent component analysis (ICA) and non-negative matrix factorization (NMF)], six statistical tests (t- and F-test, SAM, Kruskal-Wallis, Welch and Brown-Forsythe), several gene set collections (KEGG, BioCarta, Reactome, GO and MSigDB) and can be easily expanded. Our simulation results shown in Supplementary Information suggest that Compadre detects more pathways than over-representation tools like DAVID, Babelomics and WebGestalt and fewer false positives than PLAGE. The output is composed of results from decomposition and over-representation analyses, providing a more complete biological picture. Examples provided in Supplementary Information show the utility, versatility and simplicity of Compadre for analyses of biological networks. Compadre is freely available at http://bioinformatica.mty.itesm.mx:8080/compadre. The R package is also available at https://sourceforge.net/p/compadre.

  10. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression by an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independence analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that the PGRMC1 and HADH genes were significant across diabetes studies, while the IRS1 and MPST genes were significant across insulin response studies, and joint analysis showed that the HADH and MPST genes were significant across all combined data sets. The pathway analysis identified six significant gene sets across all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue-specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.

  11. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  12. ArrayVigil: a methodology for statistical comparison of gene signatures using segregated-one-tailed (SOT) Wilcoxon's signed-rank test.

    PubMed

    Khan, Haseeb Ahmad

    2005-01-28

    Due to versatile diagnostic and prognostic fidelity molecular signatures or fingerprints are anticipated as the most powerful tools for cancer management in the near future. Notwithstanding the experimental advancements in microarray technology, methods for analyzing either whole arrays or gene signatures have not been firmly established. Recently, an algorithm, ArraySolver has been reported by Khan for two-group comparison of microarray gene expression data using two-tailed Wilcoxon signed-rank test. Most of the molecular signatures are composed of two sets of genes (hybrid signatures) wherein up-regulation of one set and down-regulation of the other set collectively define the purpose of a gene signature. Since the direction of a selected gene's expression (positive or negative) with respect to a particular disease condition is known, application of one-tailed statistics could be a more relevant choice. A novel method, ArrayVigil, is described for comparing hybrid signatures using segregated-one-tailed (SOT) Wilcoxon signed-rank test and the results compared with integrated-two-tailed (ITT) procedures (SPSS and ArraySolver). ArrayVigil resulted in lower P values than those obtained from ITT statistics while comparing real data from four signatures.
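
    The one-tailed signed-rank idea described above can be illustrated with a minimal sketch. It is not the ArrayVigil implementation; it assumes paired expression values, computes an exact one-tailed p-value for small samples, and uses a hypothetical function name, signed_rank_one_tailed. Under the segregated scheme, the function would be called separately for the up-regulated set (disease vs. control) and for the down-regulated set (control vs. disease).

```python
def signed_rank_one_tailed(x, y):
    """Exact one-tailed Wilcoxon signed-rank p-value for H1: x tends to exceed y.

    x and y are paired measurements (hypothetical inputs for illustration).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    # Average ranks for tied |differences|, stored doubled so they stay integers.
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)  # 2 * mean of ranks i+1..j+1
        i = j + 1

    # Observed statistic: (doubled) sum of ranks of the positive differences.
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # Null distribution: each rank joins W+ independently with probability 1/2.
    counts = {0: 1}
    for r in ranks2:
        nxt = {}
        for s, c in counts.items():
            nxt[s] = nxt.get(s, 0) + c
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt

    tail = sum(c for s, c in counts.items() if s >= w_plus2)
    return tail / 2 ** n
```

    With five pairs that all increase, for example signed_rank_one_tailed([5, 6, 7, 8, 9], [1, 1, 1, 1, 1]), only one of the 32 sign assignments is as extreme as the observed one, so the p-value is 1/32. A two-tailed test on the same data would double it, which is consistent with the lower P values the abstract reports for the one-tailed procedure.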

  13. A long non-coding RNA expression profile can predict early recurrence in hepatocellular carcinoma after curative resection.

    PubMed

    Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai

    2018-06-20

    The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence for patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between the early recurrence (ER) group and the non-early recurrence (non-ER) group of HCC. Least absolute shrinkage and selection operator (LASSO) logistic regression models were used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validate the predictive value of this classifier. Furthermore, a co-expression network based on these lncRNAs and their highly related genes was constructed, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of genes in the network were performed. We identified 10 differentially expressed lncRNAs, including 3 that were upregulated and 7 that were downregulated in the ER group. The lncRNA-based classifier was constructed based on 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084), and its accuracy was 0.83 in the training set, 0.87 in the test set and 0.84 in the total set. ROC curve analysis showed that the AUROC was 0.741 in the training set, 0.824 in the test set and 0.765 in the total set. A functional enrichment analysis suggested that the genes highly related to 4 of the lncRNAs were involved in the immune system. This 7-lncRNA expression profile can effectively predict early recurrence after surgical resection for HCC. This article is protected by copyright. All rights reserved.

  14. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks.

    PubMed

    Maere, Steven; Heymans, Karel; Kuiper, Martin

    2005-08-15

    The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set on the GO hierarchy, and takes advantage of Cytoscape's versatile visualization environment to produce an intuitive and customizable visual representation of the results.

  15. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308

  16. QSAR Study for Carcinogenic Potency of Aromatic Amines Based on GEP and MLPs

    PubMed Central

    Song, Fucheng; Zhang, Anling; Liang, Hui; Cui, Lianhua; Li, Wenlian; Si, Hongzong; Duan, Yunbo; Zhai, Honglin

    2016-01-01

    A new analysis strategy was used to classify the carcinogenicity of aromatic amines. The physical-chemical parameters are closely related to the carcinogenicity of compounds. Quantitative structure activity relationship (QSAR) is a method of predicting the carcinogenicity of aromatic amine, which can reveal the relationship between carcinogenicity and physical-chemical parameters. This study accessed gene expression programming by APS software, the multilayer perceptrons by Weka software to predict the carcinogenicity of aromatic amines, respectively. All these methods relied on molecular descriptors calculated by CODESSA software and eight molecular descriptors were selected to build function equations. As a remarkable result, the accuracy of gene expression programming in training and test sets are 0.92 and 0.82, the accuracy of multilayer perceptrons in training and test sets are 0.84 and 0.74 respectively. The precision of the gene expression programming is obviously superior to multilayer perceptrons both in training set and test set. The QSAR application in the identification of carcinogenic compounds is a high efficiency method. PMID:27854309

  17. Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    PubMed Central

    Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco

    2008-01-01

    Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways. PMID:18764936

  18. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The usefulness of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified 4 gene sets that were linked with the influenza vaccine as well, although previous analyses had found them to be associated only with the pneumococcal vaccine. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package. PMID:26111374

  19. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
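
    For orientation, the conventional unweighted ORA test that this abstract contrasts with LEGO can be sketched as a one-sided Fisher's exact (hypergeometric) test. This is a minimal illustration and not LEGO itself: it treats every gene as equivalent, which is exactly the simplification the abstract criticizes. The function name ora_p_value and its parameters are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_hits, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_universe: number of genes in the background
    n_set:      number of background genes belonging to the gene set
    n_hits:     number of genes in the query list (e.g. differentially expressed)
    n_overlap:  number of query genes that fall in the gene set
    Returns P(overlap >= n_overlap) when the query list is drawn at random.
    """
    max_overlap = min(n_set, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

    For example, with a background of 20 genes, a gene set of 5 and a query list of 5 that coincides exactly with the set, ora_p_value(20, 5, 5, 5) is 1/15504. A network-weighted method would replace the equal per-gene counts used here with gene-specific weights.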

  20. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  1. Performance of amplicon-based next generation DNA sequencing for diagnostic gene mutation profiling in oncopathology.

    PubMed

    Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M

    2014-10-01

    Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings was defined to decide whether a FFPE-DNA sample is qualified for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm revealed 82% (37/45) compliance to the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
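
    The concordance figures above are reported as Cohen's kappa. As a minimal sketch of how that statistic is computed from a cross-tabulation of two methods, the function below takes a square table of counts. The function name cohens_kappa and the example counts are hypothetical; the abstract does not give the underlying table, so its kappa of 0.92 cannot be reproduced from it.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of samples assigned category i by method A
    and category j by method B.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: fraction of samples on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal frequencies.
    row_freq = [sum(table[i]) / n for i in range(k)]
    col_freq = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row_freq[i] * col_freq[i] for i in range(k))
    return (p_obs - p_exp) / (1 - p_exp)
```

    With the hypothetical table [[20, 5], [10, 15]], observed agreement is 0.7 and chance agreement is 0.5, giving a kappa of 0.4. Kappa is lower than raw percent agreement because it discounts the agreement expected by chance alone.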

  2. In silico experiment system for testing hypothesis on gene functions using three condition specific biological networks.

    PubMed

    Lee, Chai-Jin; Kang, Dongwon; Lee, Sangseon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2018-05-25

    Determining functions of a gene requires time consuming, expensive biological experiments. Scientists can speed up this experimental process if the literature information and biological networks can be adequately provided. In this paper, we present a web-based information system that can perform in silico experiments of computationally testing hypothesis on the function of a gene. A hypothesis that is specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data to template networks. Then, an in silico experiment is to test how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes path from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three data sets, such as E2f1, Lrrk2, and Dicer1 knockout data sets. We were able to re-produce gene functions reported in the original research papers. In addition, we comprehensively tested with all disease names in MalaCards as hypothesis to show the effectiveness of our system. Our in silico experiment system can be very useful in suggesting biological mechanisms which can be further tested in vivo or in vitro. http://biohealth.snu.ac.kr/software/insilico/. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. A multi-SNP association test for complex diseases incorporating an optimal P-value threshold algorithm in nuclear families.

    PubMed

    Wang, Yi-Ting; Sung, Pei-Yuan; Lin, Peng-Lin; Yu, Ya-Wen; Chung, Ren-Hua

    2015-05-15

    Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of individual gene or SNP is modest, a method considering the joint effects of multiple SNPs can be more powerful than testing individual SNPs. The multi-SNP analysis aims to test association based on a SNP set, usually defined based on biological knowledge such as gene or pathway, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for the multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set. We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value=2.5×10(-6)). Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.

  4. Correcting Systematic Inflation in Genetic Association Tests That Consider Interaction Effects

    PubMed Central

    Almli, Lynn M.; Duncan, Richard; Feng, Hao; Ghosh, Debashis; Binder, Elisabeth B.; Bradley, Bekh; Ressler, Kerry J.; Conneely, Karen N.; Epstein, Michael P.

    2015-01-01

    IMPORTANCE: Genetic association studies of psychiatric outcomes often consider interactions with environmental exposures and, in particular, apply tests that jointly consider gene and gene-environment interaction effects for analysis. Using a genome-wide association study (GWAS) of posttraumatic stress disorder (PTSD), we report that heteroscedasticity (defined as variability in outcome that differs by the value of the environmental exposure) can invalidate traditional joint tests of gene and gene-environment interaction. OBJECTIVES: To identify the cause of bias in traditional joint tests of gene and gene-environment interaction in a PTSD GWAS and determine whether proposed robust joint tests are insensitive to this problem. DESIGN, SETTING, AND PARTICIPANTS: The PTSD GWAS data set consisted of 3359 individuals (978 men and 2381 women) from the Grady Trauma Project (GTP), a cohort study from Atlanta, Georgia. The GTP performed genome-wide genotyping of participants and collected environmental exposures using the Childhood Trauma Questionnaire and Trauma Experiences Inventory. MAIN OUTCOMES AND MEASURES: We performed joint interaction testing of the Beck Depression Inventory and modified PTSD Symptom Scale in the GTP GWAS. We assessed systematic bias in our interaction analyses using quantile-quantile plots and genome-wide inflation factors. RESULTS: Application of the traditional joint interaction test to the GTP GWAS yielded systematic inflation across different outcomes and environmental exposures (inflation-factor estimates ranging from 1.07 to 1.21), whereas application of the robust joint test to the same data set yielded no such inflation (inflation-factor estimates ranging from 1.01 to 1.02). Simulated data further revealed that the robust joint test is valid in different heteroscedasticity models, whereas the traditional joint test is invalid. The robust joint test also has power similar to the traditional joint test when heteroscedasticity is not an issue. CONCLUSIONS AND RELEVANCE: We believe the robust joint test should be used in candidate-gene studies and GWASs of psychiatric outcomes that consider environmental interactions. To make the procedure useful for applied investigators, we created a software tool that can be called from the popular PLINK package for analysis. PMID:25354142
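
    The genome-wide inflation factor used above to detect systematic bias can be sketched as follows. This is one common convention and an assumption here, since the abstract does not state the exact computation: each p-value is mapped to the equivalent 1-degree-of-freedom chi-square quantile, and the median of those values is divided by the median expected under the null (about 0.455). The function name inflation_factor is hypothetical.

```python
from statistics import NormalDist, median


def inflation_factor(p_values):
    """Genomic inflation factor (lambda) from a list of p-values in (0, 1].

    Values near 1 indicate well-calibrated tests; values above 1 indicate
    systematic inflation of the test statistics.
    """
    z = NormalDist()
    # A 1-df chi-square quantile is the square of a standard normal quantile.
    chi2 = [z.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = z.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / expected_median


# Evenly spaced p-values mimic a perfectly calibrated genome-wide scan.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(uniform_p), 3))
```

    Under this convention, evenly spaced p-values give a factor very close to 1, while p-values that are systematically too small give a factor above 1, which is the pattern the abstract reports for the traditional joint test (1.07 to 1.21).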

  5. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    PubMed

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  6. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

    PubMed

    Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E

    2017-02-01

    Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The Ensemble of Gene Set Enrichment Analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/. Contact: monther.alhamdoosh@csl.com.au, mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. Virulotyping of Shigella spp. isolated from pediatric patients in Tehran, Iran.

    PubMed

    Ranjbar, Reza; Bolandian, Masomeh; Behzadi, Payam

    2017-03-01

    Shigellosis is an important infectious disease with high morbidity and mortality among children worldwide. In this survey, the prevalence of four important virulence genes (ial, ipaH, set1A, and set1B) was investigated among Shigella strains, and the related gene profiles were identified. Stool specimens were collected from children who were referred to two hospitals in Tehran, Iran. The samples were collected over 3 years (2008-2010) from children suspected of having shigellosis. Shigella spp. were identified through microbiological and serological tests and then subjected to PCR for virulotyping. Shigella sonnei ranked first (65.5%), followed by Shigella flexneri (25.9%), Shigella boydii (6.9%), and Shigella dysenteriae (1.7%). The ial gene was the most frequent virulence gene among the isolated bacterial strains, followed by ipaH, set1B, and set1A. S. flexneri possessed all of the studied virulence genes (ial 65.51%, ipaH 58.62%, set1A 12.07%, and set1B 22.41%). Moreover, the virulence gene profiles ial, ial-ipaH, ial-ipaH-set1B, and ial-ipaH-set1B-set1A were identified among the isolated Shigella spp. strains. The pattern of virulence genes in the Shigella strains isolated in this study has changed, with the ial gene ranking first and ipaH second.

  8. TEGS-CN: A Statistical Method for Pathway Analysis of Genome-wide Copy Number Profile.

    PubMed

    Huang, Yen-Tsung; Hsu, Thomas; Christiani, David C

    2014-01-01

    The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We proposed a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extend the method to analyze DNA copy number data, accounting for different sizes and thus various numbers of copy number probes in genes. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (<2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data, and causal mechanisms of the five pathways require further study.
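
    The Bonferroni cutoff quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state explicitly; the number of tests (1814) and the five P values are taken from the text, and the short pathway labels are shorthand introduced here.

```python
# Bonferroni check for the result above: 1814 gene sets were tested.
# ASSUMPTION: family-wise alpha of 0.05 (the abstract gives only the cutoff).
ALPHA = 0.05
N_TESTS = 1814

threshold = ALPHA / N_TESTS  # per-test significance cutoff

# P values quoted in the abstract for the five significant pathways.
reported = {
    "PTEN pathway": 7.8e-7,
    "heat shock up-regulated": 3.6e-6,
    "kidney transplant rejection": 9.2e-6,
    "leukocyte transcriptional control": 2.2e-5,
    "ganglioside biosynthesis": 2.7e-5,
}

# Keep only the pathways whose P value falls below the per-test cutoff.
significant = {name: p for name, p in reported.items() if p < threshold}

print(f"threshold = {threshold:.2e}")
for name, p in significant.items():
    print(f"{name}: p = {p:.1e}")
```

    Under that assumption the cutoff rounds to the 2.8 × 10⁻⁵ given in the abstract, and all five reported P values fall below it.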

  9. Identification of genes and gene pathways associated with major depressive disorder by integrative brain analysis of rat and human prefrontal cortex transcriptomes

    PubMed Central

    Malki, K; Pain, O; Tosto, M G; Du Rietz, E; Carboni, L; Schalkwyk, L C

    2015-01-01

    Despite moderate heritability estimates, progress in uncovering the molecular substrate underpinning major depressive disorder (MDD) has been slow. In this study, we used prefrontal cortex (PFC) gene expression from a genetic rat model of MDD to inform probe set prioritization in PFC in a human post-mortem study to uncover genes and gene pathways associated with MDD. Gene expression differences between Flinders sensitive (FSL) and Flinders resistant (FRL) rat lines were statistically evaluated using the RankProd, non-parametric algorithm. Top ranking probe sets in the rat study were subsequently used to prioritize orthologous selection in a human PFC in a case–control post-mortem study on MDD from the Stanley Brain Consortium. Candidate genes in the human post-mortem study were then tested against a matched control sample using the RankProd method. A total of 1767 probe sets were differentially expressed in the PFC between FSL and FRL rat lines at (q⩽0.001). A total of 898 orthologous probe sets was found on Affymetrix's HG-U95A chip used in the human study. Correcting for the number of multiple, non-independent tests, 20 probe sets were found to be significantly dysregulated between human cases and controls at q⩽0.05. These probe sets tagged the expression profile of 18 human genes (11 upregulated and seven downregulated). Using an integrative rat–human study, a number of convergent genes that may have a role in pathogenesis of MDD were uncovered. Eighty percent of these genes were functionally associated with a key stress response signalling cascade, involving NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells), AP-1 (activator protein 1) and ERK/MAPK, which has been systematically associated with MDD, neuroplasticity and neurogenesis. PMID:25734512

  10. Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.

    PubMed

    Gold, David L; Coombes, Kevin R; Wang, Jing; Mallick, Bani

    2007-03-01

    Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that if understood could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer if an interesting collection of genes is 'enriched' for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine if conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to the conventional approach which assumes independence. We argue that the independence assumption is not detrimental.
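
    For orientation, the conventional enrichment test that this abstract scrutinizes is often a one-sided hypergeometric test applied to each GO class separately. The sketch below shows that per-class test only; it is not the authors' exact null distribution, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_class, n_selected, n_overlap):
    """Upper-tail P(X >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_in_class of which carry the annotation."""
    max_overlap = min(n_in_class, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the hypergeometric probabilities for every overlap at least as large.
    tail = sum(
        comb(n_in_class, k) * comb(n_universe - n_in_class, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 3 annotated to the class,
# 3 genes selected as interesting, all 3 of them annotated.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
print(p_value)
```

    Each GO class gets its own P value from this test, which is where the implicit independence assumption discussed in the abstract enters.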

  11. ADGO: analysis of differentially expressed gene sets using composite GO annotation.

    PubMed

    Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun

    2006-09-15

    Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. Contact: chu@kribb.re.kr; http://array.kobic.re.kr/ADGO.
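
    The intersection step described in this abstract can be pictured with ordinary set operations. The following is a minimal sketch, not the ADGO implementation: the function name, the `min_size` cutoff, the GO term labels and the gene identifiers are all hypothetical, and the scoring of expression changes is omitted.

```python
from itertools import product


def composite_gene_sets(category_a, category_b, min_size=2):
    """Intersect every set in one GO category with every set in another and
    keep intersections with at least `min_size` genes (hypothetical cutoff)."""
    composites = {}
    for (name_a, genes_a), (name_b, genes_b) in product(
        category_a.items(), category_b.items()
    ):
        shared = set(genes_a) & set(genes_b)
        if len(shared) >= min_size:
            composites[(name_a, name_b)] = shared
    return composites


# Toy annotations (made up for illustration).
molecular_function = {
    "kinase activity": {"g1", "g2", "g3", "g4"},
    "DNA binding": {"g5", "g6"},
}
cellular_component = {
    "membrane": {"g1", "g2", "g7"},
    "nucleus": {"g3", "g5", "g6"},
}

composites = composite_gene_sets(molecular_function, cellular_component)
for pair, genes in composites.items():
    print(pair, sorted(genes))
```

    Each surviving composite set would then be scored alongside the single-category sets, as the abstract describes.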

  12. Reference Genes for Accurate Transcript Normalization in Citrus Genotypes under Different Experimental Conditions

    PubMed Central

    Mafra, Valéria; Kubo, Karen S.; Alves-Ferreira, Marcio; Ribeiro-Alves, Marcelo; Stuart, Rodrigo M.; Boava, Leonardo P.; Rodrigues, Carolina M.; Machado, Marcos A.

    2012-01-01

    Real-time reverse transcription PCR (RT-qPCR) has emerged as an accurate and widely used technique for expression profiling of selected genes. However, obtaining reliable measurements depends on the selection of appropriate reference genes for gene expression normalization. The aim of this work was to assess the expression stability of 15 candidate genes to determine which set of reference genes is best suited for transcript normalization in citrus in different tissues and organs and leaves challenged with five pathogens (Alternaria alternata, Phytophthora parasitica, Xylella fastidiosa and Candidatus Liberibacter asiaticus). We tested traditional genes used for transcript normalization in citrus and orthologs of Arabidopsis thaliana genes described as superior reference genes based on transcriptome data. geNorm and NormFinder algorithms were used to find the best reference genes to normalize all samples and conditions tested. Additionally, each biotic stress was individually analyzed by geNorm. In general, FBOX (encoding a member of the F-box family) and GAPC2 (GAPDH) was the most stable candidate gene set assessed under the different conditions and subsets tested, while CYP (cyclophilin), TUB (tubulin) and CtP (cathepsin) were the least stably expressed genes found. Validation of the best suitable reference genes for normalizing the expression level of the WRKY70 transcription factor in leaves infected with Candidatus Liberibacter asiaticus showed that arbitrary use of reference genes without previous testing could lead to misinterpretation of data. Our results revealed FBOX, SAND (a SAND family protein), GAPC2 and UPL7 (ubiquitin protein ligase 7) to be superior reference genes, and we recommend their use in studies of gene expression in citrus species and relatives. This work constitutes the first systematic analysis for the selection of superior reference genes for transcript normalization in different citrus organs and under biotic stress. PMID:22347455

  13. Gene panel testing for inherited cancer risk.

    PubMed

    Hall, Michael J; Forman, Andrea D; Pilarski, Robert; Wiesner, Georgia; Giri, Veda N

    2014-09-01

    Next-generation sequencing technologies have ushered in the capability to assess multiple genes in parallel for genetic alterations that may contribute to inherited risk for cancers in families. Thus, gene panel testing is now an option in the setting of genetic counseling and testing for cancer risk. This article describes the many gene panel testing options clinically available to assess inherited cancer susceptibility, the potential advantages and challenges associated with various types of panels, clinical scenarios in which gene panels may be particularly useful in cancer risk assessment, and testing and counseling considerations. Given the potential issues for patients and their families, gene panel testing for inherited cancer risk is recommended to be offered in conjunction or consultation with an experienced cancer genetic specialist, such as a certified genetic counselor or geneticist, as an integral part of the testing process. Copyright © 2014 by the National Comprehensive Cancer Network.

  14. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2016-07-15

    Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
    Contact: blatti@illinois.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
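
    A single random walk with restarts, the building block named in this abstract, can be sketched on a toy network. This is an illustrative sketch only: it does not reproduce DRaWR's two-stage procedure or its treatment of multiple edge types, and the function name, restart probability, node names and edge weights are assumptions. Every neighbor is assumed to appear as a key in the adjacency dictionary.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart`.

    adjacency: dict node -> dict neighbor -> non-negative edge weight
    seeds:     nodes in the query gene set (uniform restart distribution)
    """
    nodes = list(adjacency)
    seeds = list(seeds)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_vec)
    for _ in range(max_iter):
        # Restart mass first, then add mass propagated along edges.
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for u in nodes:
            out_weight = sum(adjacency[u].values())
            if out_weight == 0:
                # Dangling node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * restart_vec[n]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / out_weight
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Toy network mixing gene nodes and property nodes (all names hypothetical).
toy = {
    "geneA": {"geneB": 1.0, "pathwayX": 1.0},
    "geneB": {"geneA": 1.0, "pathwayX": 1.0},
    "geneC": {"pathwayX": 1.0},
    "geneD": {"pathwayY": 1.0},
    "pathwayX": {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0},
    "pathwayY": {"geneD": 1.0},
}
scores = random_walk_with_restart(toy, seeds=["geneA", "geneB"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

    Nodes reachable from the query set through shared properties receive non-zero scores, while nodes in a disconnected part of the network stay at zero.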

  15. Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity.

    PubMed

    Chang, Jinyuan; Zheng, Chao; Zhou, Wen-Xin; Zhou, Wen

    2017-12-01

    In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Unlike existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the power of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may assist in detecting disease-associated gene sets. The proposed methods have been implemented in the R package HDtest and are available on CRAN. © 2017, The International Biometric Society.

  16. Reading and Generalist Genes

    ERIC Educational Resources Information Center

    Haworth, Claire M. A.; Meaburn, Emma L.; Harlaar, Nicole; Plomin, Robert

    2007-01-01

    Twin-study research suggests that many (but not all) of the same genes contribute to genetic influence on diverse learning abilities and disabilities, a hypothesis called "generalist genes". This generalist genes hypothesis was tested using a set of 10 DNA markers (single nucleotide polymorphisms [SNPs]) found to be associated with early reading…

  17. An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features.

    PubMed

    Nandi, Sutanu; Subramanian, Abhishek; Sarkar, Ram Rup

    2017-07-25

    Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior as compared to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was always high as compared to the known techniques, proving that our method outperforms known methods. Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.

  18. A novel feature extraction approach for microarray data based on multi-algorithm fusion

    PubMed Central

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for dimension reduction in data mining, particularly with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based and set-based, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Like learning methods, feature extraction has a problem with its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277

  19. A novel feature extraction approach for microarray data based on multi-algorithm fusion.

    PubMed

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods for dimension reduction in data mining, particularly with the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based and set-based, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering inter-relationships between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Like learning methods, feature extraction has a problem with its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.

  20. GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network

    PubMed Central

    2012-01-01

    Background: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most of the method development for GSA has focused on the statistical tests to be used rather than on the generation of sets that will be tested. Existing methods of set generation are often overly simplistic. The creation of sets from individual pathways (in isolation) is a poor reflection of the complexity of the underlying metabolic network. We have developed a novel approach to set generation via the use of Principal Component Analysis of the Laplacian matrix of a metabolic network. We have analysed a relatively simple data set to show the difference in results between our method and the current state-of-the-art pathway-based sets. Results: The sets generated with this method are semi-exhaustive and capture much of the topological complexity of the metabolic network. The semi-exhaustive nature of this method has also allowed us to design a hypergeometric enrichment test to determine which genes are likely responsible for set significance. We show that our method finds significant aspects of biology that would be missed (i.e. false negatives) and addresses the false positive rates found with the use of simple pathway-based sets. Conclusions: The set generation step for GSA is often neglected but is a crucial part of the analysis as it defines the full context for the analysis. As such, set generation methods should be robust and yield as complete a representation of the extant biological knowledge as possible. The method reported here achieves this goal and is demonstrably superior to previous set analysis methods. PMID:22876834

  1. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
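
    The difference between random CV and clustering-based CV comes down to how samples are assigned to folds. The sketch below illustrates only that assignment step under stated assumptions: the cluster labels are taken as given (the abstract does not say how they are derived), the greedy balancing rule is a choice made here, and the function and label names are hypothetical.

```python
import random


def random_cv_folds(n_samples, n_folds, seed=0):
    """Random CV: shuffle sample indices, then deal them into folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def clustered_cv_folds(cluster_labels, n_folds):
    """Clustering-based CV: whole clusters are assigned to a fold, so a test
    fold never shares a cluster with the training data."""
    clusters = {}
    for i, label in enumerate(cluster_labels):
        clusters.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: the largest cluster goes to the currently smallest fold.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


# Hypothetical experimental conditions acting as cluster labels.
labels = ["heat", "heat", "heat", "cold", "cold",
          "drug", "drug", "drug", "ctrl", "ctrl"]

rcv = random_cv_folds(len(labels), 3)
ccv = clustered_cv_folds(labels, 3)

for name, folds in (("RCV", rcv), ("CCV", ccv)):
    print(name, [sorted({labels[i] for i in fold}) for fold in folds])
```

    With random folds a condition can appear in both training and test data, whereas the clustered folds keep each condition on one side of the split, which is the situation the abstract argues gives a less optimistic estimate of generalizability.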

  2. Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies.

    PubMed

    Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano

    2011-04-11

    In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.

  3. Multiple Testing of Gene Sets from Gene Ontology: Possibilities and Pitfalls.

    PubMed

    Meijer, Rosa J; Goeman, Jelle J

    2016-09-01

    The use of multiple testing procedures in the context of gene-set testing is an important but relatively underexposed topic. If a multiple testing method is used, this is usually a standard familywise error rate (FWER) or false discovery rate (FDR) controlling procedure in which the logical relationships that exist between the different (self-contained) hypotheses are not taken into account. Taking those relationships into account, however, can lead to more powerful variants of existing multiple testing procedures and can make summarizing and interpreting the final results easier. We will show that, from the perspective of interpretation as well as from the perspective of power improvement, FWER controlling methods are more suitable than FDR controlling methods. As an example of a possible power improvement, we suggest a modified version of the popular method by Holm, which we also implemented in the R package cherry. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
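
    For orientation, the standard Holm step-down procedure named in this abstract can be sketched as follows. This is only the textbook version, not the modified variant the authors implemented in the R package cherry, and it ignores the logical relationships between Gene Ontology hypotheses; the function names and example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def holm_reject(p_values, alpha=0.05):
    """Flag hypotheses rejected with FWER control at level alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]
```

    With the hypothetical p-values 0.01, 0.04, 0.03 and 0.005, only the first and last hypotheses are rejected at alpha = 0.05.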

  4. A feasibility study of returning clinically actionable somatic genomic alterations identified in a research laboratory.

    PubMed

    Arango, Natalia Paez; Brusco, Lauren; Mills Shaw, Kenna R; Chen, Ken; Eterovic, Agda Karina; Holla, Vijaykumar; Johnson, Amber; Litzenburger, Beate; Khotskaya, Yekaterina B; Sanchez, Nora; Bailey, Ann; Zheng, Xiaofeng; Horombe, Chacha; Kopetz, Scott; Farhangfar, Carol J; Routbort, Mark; Broaddus, Russell; Bernstam, Elmer V; Mendelsohn, John; Mills, Gordon B; Meric-Bernstam, Funda

    2017-06-27

    Molecular profiling performed in the research setting usually does not benefit the patients that donate their tissues. Through a prospective protocol, we sought to determine the feasibility and utility of performing broad genomic testing in the research laboratory for discovery, and the utility of giving treating physicians access to research data, with the option of validating actionable alterations in the CLIA environment. 1200 patients with advanced cancer underwent characterization of their tumors with high depth hybrid capture sequencing of 201 genes in the research setting. Tumors were also tested in the CLIA laboratory, with a standardized hotspot mutation analysis on an 11, 46 or 50 gene platform. 527 patients (44%) had at least one likely somatic mutation detected in an actionable gene using hotspot testing. With the 201 gene panel, 945 patients (79%) had at least one alteration in a potentially actionable gene that was undetected with the more limited CLIA panel testing. Sixty-four genomic alterations identified on the research panel were subsequently tested using an orthogonal CLIA assay. Of 16 mutations tested in the CLIA environment, 12 (75%) were confirmed. Twenty-five (52%) of 48 copy number alterations were confirmed. Nine (26.5%) of 34 patients with confirmed results received genotype-matched therapy. Seven of these patients were enrolled onto genotype-matched targeted therapy trials. Expanded cancer gene sequencing identifies more actionable genomic alterations. The option of CLIA validating research results can provide alternative targets for personalized cancer therapy.

  5. PathwaySplice: An R package for unbiased pathway analysis of alternative splicing in RNA-Seq data.

    PubMed

    Yan, Aimin; Ban, Yuguang; Gao, Zhen; Chen, Xi; Wang, Lily

    2018-04-24

    Pathway analysis of alternative splicing would be biased without accounting for the different number of exons or junctions associated with each gene, because genes with higher number of exons or junctions are more likely to be included in the "significant" gene list in alternative splicing. We present PathwaySplice, an R package that (1) Performs pathway analysis that explicitly adjusts for the number of exons or junctions associated with each gene; (2) Visualizes selection bias due to different number of exons or junctions for each gene and formally tests for presence of bias using logistic regression; (3) Supports gene sets based on the Gene Ontology terms, as well as more broadly defined gene sets (e.g. MSigDB) or user defined gene sets; (4) Identifies the significant genes driving pathway significance and (5) Organizes significant pathways with an enrichment map, where pathways with large number of overlapping genes are grouped together in a network graph. https://bioconductor.org/packages/release/bioc/html/PathwaySplice.html. lily.wangg@gmail.com, xi.steven.chen@gmail.com.

  6. Using parentage analysis to examine gene flow and spatial genetic structure.

    PubMed

    Kane, Nolan C; King, Matthew G

    2009-04-01

    Numerous approaches have been developed to examine recent and historical gene flow between populations, but few studies have used empirical data sets to compare different approaches. Some methods are expected to perform better under particular scenarios, such as high or low gene flow, but this, too, has rarely been tested. In this issue of Molecular Ecology, Saenz-Agudelo et al. (2009) apply assignment tests and parentage analysis to microsatellite data from five geographically proximal (2-6 km) and one much more distant (1500 km) panda clownfish populations, showing that parentage analysis performed better in situations of high gene flow, while their assignment tests did better with low gene flow. This unusually complete data set is comprised of multiple exhaustively sampled populations, including nearly all adults and large numbers of juveniles, enabling the authors to ask questions that in many systems would be impossible to answer. Their results emphasize the importance of selecting the right analysis to use, based on the underlying model and how well its assumptions are met by the populations to be analysed.

  7. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    PubMed

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
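
    As a rough illustration of the TFIDF weighting mentioned above, the sketch below treats the keywords collected for each gene as one "document". The abstract does not give the exact TFIDF variant, background set, stop list or stemmer, so the formula (relative term frequency times the natural log of inverse document frequency), the function name and the toy genes are all assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: dict mapping gene name -> list of keyword tokens.

    Returns a dict mapping gene -> {keyword: TF-IDF weight}.
    """
    n_docs = len(docs)
    # Document frequency: number of genes whose keyword list contains the term.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    weights = {}
    for gene, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical keyword lists for two genes.
example = tfidf_weights({
    "geneA": ["kinase", "kinase", "membrane"],
    "geneB": ["membrane", "receptor"],
})
```

    A keyword shared by every gene ("membrane" here) gets weight zero, which is why such weights can serve as discriminating feature vectors for clustering.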

  8. Genome-Wide Identification and Testing of Superior Reference Genes for Transcript Normalization in Arabidopsis

    PubMed Central

    Czechowski, Tomasz; Stitt, Mark; Altmann, Thomas; Udvardi, Michael K.; Scheible, Wolf-Rüdiger

    2005-01-01

    Gene transcripts with invariant abundance during development and in the face of environmental stimuli are essential reference points for accurate gene expression analyses, such as RNA gel-blot analysis or quantitative reverse transcription-polymerase chain reaction (PCR). An exceptionally large set of data from Affymetrix ATH1 whole-genome GeneChip studies provided the means to identify a new generation of reference genes with very stable expression levels in the model plant species Arabidopsis (Arabidopsis thaliana). Hundreds of Arabidopsis genes were found that outperform traditional reference genes in terms of expression stability throughout development and under a range of environmental conditions. Most of these were expressed at much lower levels than traditional reference genes, making them very suitable for normalization of gene expression over a wide range of transcript levels. Specific and efficient primers were developed for 22 genes and tested on a diverse set of 20 cDNA samples. Quantitative reverse transcription-PCR confirmed superior expression stability and lower absolute expression levels for many of these genes, including genes encoding a protein phosphatase 2A subunit, a coatomer subunit, and an ubiquitin-conjugating enzyme. The developed PCR primers or hybridization probes for the novel reference genes will enable better normalization and quantification of transcript levels in Arabidopsis in the future. PMID:16166256

  9. Identifying prognostic signature in ovarian cancer using DirGenerank

    PubMed Central

    Wang, Jian-Yong; Chen, Ling-Ling; Zhou, Xiong-Hui

    2017-01-01

    Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, it is still a big challenge to select prognostic genes that can distinguish the risk of cancer patients across various data sets because of tumor heterogeneity. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed the DirGenerank (Generank in direct network) algorithm, which considers both the gene dependency network and the genes’ correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as the training data set, the 40 genes with the highest importance were selected as the prognostic signature. Survival analysis of patients divided by the prognostic signature in the testing data set and four independent data sets showed that the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed that the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients. PMID:28615526

  10. An Adaptive Genetic Association Test Using Double Kernel Machines.

    PubMed

    Zhan, Xiang; Epstein, Michael P; Ghosh, Debashis

    2015-10-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study.

  11. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome.

    PubMed

    Morine, Melissa J; McMonagle, Jolene; Toomey, Sinead; Reynolds, Clare M; Moloney, Aidan P; Gormley, Isobel C; Gaora, Peadar O; Roche, Helen M

    2010-10-07

    Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p < 0.05), followed by muscle (601 genes) and adipose (16 genes). Results from modified GSEA showed that the high-CLA beef diet affected diverse biological processes across the three tissues, and that the majority of pathway changes reached significance only with the bi-directional test. Combining the liver tissue microarray results with plasma marker data revealed 110 CLA-sensitive genes showing strong canonical correlation with one or more plasma markers of metabolic health, and 9 significantly overrepresented pathways among this set; each of these pathways was also significantly changed by the high-CLA diet. 
Closer inspection of two of these pathways--selenoamino acid metabolism and steroid biosynthesis--illustrated clear diet-sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease.
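
    The over-representation step described above (Fisher's exact test on a selected gene subset) can be sketched as a one-sided hypergeometric tail probability. The function name, argument names and counts below are hypothetical; the abstract does not report the contingency tables behind its 9 pathways.

```python
from math import comb


def enrichment_p_value(n_total, n_in_set, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_total:    genes in the background
    n_in_set:   background genes annotated to the pathway
    n_selected: genes in the selected list
    n_overlap:  selected genes annotated to the pathway
    Returns P(overlap >= n_overlap) under random selection.
    """
    upper = min(n_in_set, n_selected)
    denom = comb(n_total, n_selected)
    # Sum hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_in_set, k) * comb(n_total - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10 background genes, 4 in the pathway,
# 3 selected, 2 of which are in the pathway.
p = enrichment_p_value(10, 4, 3, 2)
```

    Because the test is applied to many pathways, the resulting p-values would still need a multiple-testing correction.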

  12. Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

    PubMed Central

    2010-01-01

    Background Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets. Results Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p < 0.05), followed by muscle (601 genes) and adipose (16 genes). Results from modified GSEA showed that the high-CLA beef diet affected diverse biological processes across the three tissues, and that the majority of pathway changes reached significance only with the bi-directional test. Combining the liver tissue microarray results with plasma marker data revealed 110 CLA-sensitive genes showing strong canonical correlation with one or more plasma markers of metabolic health, and 9 significantly overrepresented pathways among this set; each of these pathways was also significantly changed by the high-CLA diet. 
Closer inspection of two of these pathways - selenoamino acid metabolism and steroid biosynthesis - illustrated clear diet-sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect. Conclusion Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease. PMID:20929581

  13. Establishing the role of rare coding variants in known Parkinson's disease risk loci.

    PubMed

    Jansen, Iris E; Gibbs, J Raphael; Nalls, Mike A; Price, T Ryan; Lubbe, Steven; van Rooij, Jeroen; Uitterlinden, André G; Kraaij, Robert; Williams, Nigel M; Brice, Alexis; Hardy, John; Wood, Nicholas W; Morris, Huw R; Gasser, Thomas; Singleton, Andrew B; Heutink, Peter; Sharma, Manu

    2017-11-01

    Many common genetic factors have been identified to contribute to Parkinson's disease (PD) susceptibility, improving our understanding of the related underlying biological mechanisms. The involvement of rarer variants in these loci has been poorly studied. Using International Parkinson's Disease Genomics Consortium data sets, we performed a comprehensive study to determine the impact of rare variants in 23 previously published genome-wide association studies (GWAS) loci in PD. We applied Prix fixe to select the putative causal genes underneath the GWAS peaks, which was based on underlying functional similarities. The Sequence Kernel Association Test was used to analyze the joint effect of rare, common, or both types of variants on PD susceptibility. All genes were tested simultaneously as a gene set and each gene individually. We observed a moderate association of common variants, confirming the involvement of the known PD risk loci within our genetic data sets. Focusing on rare variants, we identified additional association signals for LRRK2, STBD1, and SPATA19. Our study suggests an involvement of rare variants within several putatively causal genes underneath previously identified PD GWAS peaks. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Integrative set enrichment testing for multiple omics platforms

    PubMed Central

    2011-01-01

    Background Enrichment testing assesses the overall evidence of differential expression behavior of the elements within a defined set. When we have measured many molecular aspects, e.g. gene expression, metabolites, proteins, it is desirable to assess their differential tendencies jointly across platforms using an integrated set enrichment test. In this work we explore the properties of several methods for performing a combined enrichment test using gene expression and metabolomics as the motivating platforms. Results Using two simulation models we explored the properties of several enrichment methods including two novel methods: the logistic regression 2-degree of freedom Wald test and the 2-dimensional permutation p-value for the sum-of-squared statistics test. In relation to their univariate counterparts we find that the joint tests can improve our ability to detect results that are marginal univariately. We also find that joint tests improve the ranking of associated pathways compared to their univariate counterparts. However, there is a risk of Type I error inflation with some methods and self-contained methods lose specificity when the sets are not representative of underlying association. Conclusions In this work we show that consideration of data from multiple platforms, in conjunction with summarization via a priori pathway information, leads to increased power in detection of genomic associations with phenotypes. PMID:22118224

  15. Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

    PubMed

    Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

    2017-04-01

    Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value <0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10^-5; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10^-5; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on Alcoholism.

  16. Reference genes for real-time PCR quantification of messenger RNAs and microRNAs in mouse model of obesity.

    PubMed

    Matoušková, Petra; Bártíková, Hana; Boušová, Iva; Hanušová, Veronika; Szotáková, Barbora; Skálová, Lenka

    2014-01-01

    Obesity and metabolic syndrome are an increasing health problem worldwide. Among other approaches, nutritional intervention using phytochemicals is an important method for the treatment and prevention of this disease. Recent studies have shown that certain phytochemicals could alter the expression of specific genes and microRNAs (miRNAs) that play a fundamental role in the pathogenesis of obesity. For the study of obesity and its treatment, monosodium glutamate (MSG)-injected mice with developed central obesity, insulin resistance and liver lipid accumulation are a frequently used animal model. To understand the mechanism of phytochemical action in obese animals, the study of selected gene expression together with miRNA quantification is extremely important. For this purpose, real-time quantitative PCR is a sensitive and reproducible method, but it depends entirely on proper normalization. The aim of the present study was to identify appropriate reference genes for mRNA and miRNA quantification in MSG mice treated with green tea catechins, potential anti-obesity phytochemicals. Two sets of reference genes were tested: the first set contained seven genes commonly used for normalization of messenger RNA, and the second set of candidate reference genes included ten small RNAs for normalization of miRNA. The expression stability of these reference genes was tested upon treatment of mice with catechins using the geNorm, NormFinder and BestKeeper algorithms. Selected normalizers for mRNA quantification were tested and validated on the expression of quinone oxidoreductase, a biotransformation enzyme known to be modified by catechins. The effect of selected normalizers for miRNA quantification was tested on two obesity- and diabetes-related miRNAs, miR-221 and miR-29b, respectively. Finally, the combinations of B2M/18S/HPRT1 and miR-16/sno234 were validated as optimal reference genes for mRNA and miRNA quantification in liver, and 18S/RPlP0/HPRT1 and sno234/miR-186 in the small intestine, of MSG mice. These reference genes will be used for mRNA and miRNA normalization in further study of green tea catechin action in obese mice.
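
    To make "expression stability" concrete, here is a simplified sketch of a geNorm-style stability measure M. It assumes M for a candidate gene is the mean, over all other candidates, of the standard deviation of the log2 expression ratio across samples; lower M means more stable. It is not the geNorm software itself (which also eliminates the least stable gene stepwise), and the gene names and values are invented.

```python
import math
from statistics import stdev


def stability_m(expression):
    """expression: dict gene -> list of relative quantities (linear scale),
    one value per sample, same sample order for every gene.

    Returns a dict gene -> M (mean pairwise variation against all other genes).
    Needs at least two genes and at least two samples.
    """
    genes = list(expression)
    m_values = {}
    for g in genes:
        pairwise_sd = []
        for h in genes:
            if h == g:
                continue
            # Log2 ratio of the two genes in each sample.
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(expression[g], expression[h])
            ]
            pairwise_sd.append(stdev(log_ratios))
        m_values[g] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical data: refA and refB move together, "noisy" does not.
m = stability_m({
    "refA": [1.0, 2.0, 4.0],
    "refB": [2.0, 4.0, 8.0],
    "noisy": [1.0, 8.0, 1.0],
})
```

    In this toy data the two co-varying candidates share the lowest M, so they would be preferred as normalizers over the third gene.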

  17. Prediction of cancer class with majority voting genetic programming classifier using gene expression data.

    PubMed

    Paul, Topon Kumar; Iba, Hitoshi

    2009-01-01

    In order to get a better understanding of different types of cancers and to find the possible biomarkers for diseases, recently, many researchers are analyzing the gene expression data using various machine learning techniques. However, due to a very small number of training samples compared to the huge number of genes and class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels with majority voting technique. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancers being studied in this paper.
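
    The majority voting step described above can be sketched in a few lines. The rules here are hand-written stand-ins with made-up gene names and thresholds; in MVGPC they would be evolved by genetic programming, which this sketch does not attempt.

```python
from collections import Counter


def majority_vote(rules, sample):
    """Apply every rule to the sample and return the most common label.

    rules:  iterable of callables mapping a sample to a class label
    sample: dict gene -> expression value
    Ties are resolved in favour of the label that was voted first.
    """
    votes = Counter(rule(sample) for rule in rules)
    return votes.most_common(1)[0][0]


# Hypothetical rules over hypothetical genes g1..g3.
rules = [
    lambda s: "tumor" if s["g1"] > 2.0 else "normal",
    lambda s: "tumor" if s["g2"] > 0.5 else "normal",
    lambda s: "tumor" if s["g3"] > 1.0 else "normal",
]
label = majority_vote(rules, {"g1": 5.0, "g2": 1.0, "g3": 0.2})
```

    Two of the three rules vote "tumor" for this sample, so the ensemble label is "tumor" even though one rule disagrees.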

  18. An ensemble rank learning approach for gene prioritization.

    PubMed

    Lee, Po-Feng; Soo, Von-Wun

    2013-01-01

    Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.

  19. [Enterotoxin genes occurrence among S. aureus strains isolated from inpatients and carriers].

    PubMed

    Lawrynowicz-Paciorek, Maja; Kochman, Maria; Piekarska, Katarzyna; Wyrebiak, Agata; Potracka, Ewa; Leniak-Chmiel, Urszula; Magdziak, Agnieszka

    2006-01-01

    We examined 44 inpatient and 66 carrier Staphylococcus aureus strains, isolated in the years 2002-2005, for the presence of 18 enterotoxin genes (se/sel) (by PCR), the ability to produce enterotoxins A-D (by SET-RPLA) and the distribution of antibiotic resistance (by the disc diffusion method). se/sel genes were detected in 90.9% of all strains. The most frequently found se/sel genes were sea (70.5%) and selk and selq (52.3%) among inpatient strains, and egc (65.2%) among carrier strains. Positive results of SET-RPLA were consistent with PCR results. No correlation was observed between antibiotic resistance and the distribution of se/sel genes among the tested S. aureus strains.

  20. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. OPATs is programmed in R and features a user-friendly R graphical user interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.
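
    The P-value combinations mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the OPATs implementation (OPATs is written in R). The function names, the threshold `tau`, and the Monte Carlo null of independent uniform P-values are all assumptions made for illustration; the resampling procedures used in OPATs may differ.

```python
import math
import random


def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def truncated_log_product(pvalues, tau=0.05):
    """Sum of ln p over P-values at or below the threshold tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def truncated_product_pvalue(pvalues, tau=0.05, n_draws=10000, seed=0):
    """Monte Carlo P-value, ASSUMING independent uniform P-values under the null."""
    rng = random.Random(seed)
    observed = truncated_log_product(pvalues, tau)
    k = len(pvalues)
    hits = 0
    for _ in range(n_draws):
        # 1 - random() lies in (0, 1], so log never receives zero.
        null = [1.0 - rng.random() for _ in range(k)]
        if truncated_log_product(null, tau) <= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)
```

    With a single P-value, Fisher's method returns that P-value unchanged, which is a quick sanity check. Correlated markers (for example, SNPs in linkage disequilibrium) violate the independence assumption of both functions, which is one reason resampling on the real data is preferred in practice.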

  1. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic to obtain a robust gene ranking. The sign-sum statistic gives a more reproducible ranking than the t-statistic. Using simulated data sets, we show that the sign-sum statistic excludes hetero-type genes well. For the real data sets as well, the sign-sum statistic performs well in terms of ranking reproducibility.
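
    The idea of summing the signs of t-statistics over subsets can be sketched as follows. This is a loose illustration, not the authors' definition: the abstract does not specify how subsets are formed, so the sketch assumes equal-size sub-samples drawn from each group and a Welch t-statistic. The names `welch_t`, `sign_sum` and `subset_size` are hypothetical.

```python
import itertools
import math
import statistics


def welch_t(x, y):
    """Welch two-sample t-statistic for mean(x) - mean(y)."""
    # Assumes each sample has at least two values and nonzero variance.
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def sign_sum(cases, controls, subset_size):
    """Sum the signs of t-statistics over all equal-size sub-samples of both groups."""
    total = 0
    for sub_cases in itertools.combinations(cases, subset_size):
        for sub_controls in itertools.combinations(controls, subset_size):
            t = welch_t(sub_cases, sub_controls)
            total += (t > 0) - (t < 0)  # +1, 0 or -1
    return total
```

    A gene whose sign is stable across sub-samples reaches the maximum absolute value (the number of sub-sample pairs), whereas a gene expressed in only part of the case group yields sign reversals and a smaller absolute sum. Exhaustive enumeration grows combinatorially, so this form is only practical for small groups.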

  2. A powerful and efficient set test for genetic markers that handles confounders

    PubMed Central

    Listgarten, Jennifer; Lippert, Christoph; Kang, Eun Yong; Xiang, Jing; Kadie, Carl M.; Heckerman, David

    2013-01-01

    Motivation: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power. Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects—one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000 individual Crohn’s disease case–control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com. Contact: jennl@microsoft.com or lippert@microsoft.com or heckerma@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23599503

  3. An analysis of gene expression in PTSD implicates genes involved in the glucocorticoid receptor pathway and neural responses to stress

    PubMed Central

    Logue, Mark W.; Smith, Alicia K.; Baldwin, Clinton; Wolf, Erika J.; Guffanti, Guia; Ratanatharathorn, Andrew; Stone, Annjanette; Schichman, Steven A.; Humphries, Donald; Binder, Elisabeth B.; Arloth, Janine; Menke, Andreas; Uddin, Monica; Wildman, Derek; Galea, Sandro; Aiello, Allison E.; Koenen, Karestan C.; Miller, Mark W.

    2015-01-01

    We examined the association between posttraumatic stress disorder (PTSD) and gene expression using whole blood samples from a cohort of trauma-exposed white non-Hispanic male veterans (115 cases and 28 controls). 10,264 probes of genes and gene transcripts were analyzed. We found 41 that were differentially expressed in PTSD cases versus controls (multiple-testing corrected p<0.05). The most significant was DSCAM, a neurological gene expressed widely in the developing brain and in the amygdala and hippocampus of the adult brain. We then examined the 41 differentially expressed genes in a meta-analysis using two replication cohorts and found significant associations with PTSD for 7 of the 41 (p<0.05), one of which (ATP6AP1L) survived multiple-testing correction. There was also broad evidence of overlap across the discovery and replication samples for the entire set of genes implicated in the discovery data based on the direction of effect and an enrichment of p<0.05 significant probes beyond what would be expected under the null. Finally, we found that the set of differentially expressed genes from the discovery sample was enriched for genes responsive to glucocorticoid signaling with most showing reduced expression in PTSD cases compared to controls. PMID:25867994
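
    As a rough illustration of "enrichment of p<0.05 significant probes beyond what would be expected under the null", the following Python sketch computes a binomial upper-tail probability using the 7-of-41 replication counts reported above. It assumes independent tests, which correlated gene expression may violate, and it is not necessarily the test the authors applied; the function name is hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts from the abstract: 7 of 41 genes reached p < 0.05 in the meta-analysis.
tail = binomial_upper_tail(7, 41, 0.05)
print(f"P(at least 7 of 41 at alpha = 0.05 under the null) = {tail:.4g}")
```

    Under the null, about 2 of 41 tests (41 x 0.05) would be expected to reach p<0.05 by chance, so the tail probability quantifies how surprising 7 would be.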

  4. Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork.

    PubMed

    Druka, Arnis; Druka, Ilze; Centeno, Arthur G; Li, Hongqiang; Sun, Zhaohui; Thomas, William T B; Bonar, Nicola; Steffenson, Brian J; Ullrich, Steven E; Kleinhofs, Andris; Wise, Roger P; Close, Timothy J; Potokina, Elena; Luo, Zewei; Wagner, Carola; Schweizer, Günther F; Marshall, David F; Kearsey, Michael J; Williams, Robert W; Waugh, Robbie

    2008-11-18

    A typical genetical genomics experiment results in four separate data sets; genotype, gene expression, higher-order phenotypic data and metadata that describe the protocols, processing and the array platform. Used in concert, these data sets provide the opportunity to perform genetic analysis at a systems level. Their predictive power is largely determined by the gene expression dataset where tens of millions of data points can be generated using currently available mRNA profiling technologies. Such large, multidimensional data sets often have value beyond that extracted during their initial analysis and interpretation, particularly if conducted on widely distributed reference genetic materials. Besides quality and scale, access to the data is of primary importance as accessibility potentially allows the extraction of considerable added value from the same primary dataset by the wider research community. Although the number of genetical genomics experiments in different plant species is rapidly increasing, none to date has been presented in a form that allows quick and efficient on-line testing for possible associations between genes, loci and traits of interest by an entire research community. Using a reference population of 150 recombinant doubled haploid barley lines we generated novel phenotypic, mRNA abundance and SNP-based genotyping data sets, added them to a considerable volume of legacy trait data and entered them into the GeneNetwork http://www.genenetwork.org. GeneNetwork is a unified on-line analytical environment that enables the user to test genetic hypotheses about how component traits, such as mRNA abundance, may interact to condition more complex biological phenotypes (higher-order traits). Here we describe these barley data sets and demonstrate some of the functionalities GeneNetwork provides as an easily accessible and integrated analytical environment for exploring them. 
    By integrating barley genotypic, phenotypic and mRNA abundance data sets directly within GeneNetwork's analytical environment we provide simple web access to the data for the research community. In this environment, a combination of correlation analysis and linkage mapping provides the potential to identify and substantiate gene targets for saturation mapping and positional cloning. By integrating datasets from an unsequenced crop plant (barley) in a database that has been designed for an animal model species (mouse) with a well established genome sequence, we prove the importance of the concept and practice of modular development and interoperability of software engineering for biological data sets.

  5. The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction.

    PubMed

    Good, Benjamin M; Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2014-07-29

    Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. 
    In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.

  6. Genome-Wide Gene Set Analysis for Identification of Pathways Associated with Alcohol Dependence

    PubMed Central

    Biernacka, Joanna M.; Geske, Jennifer; Jenkins, Gregory D.; Colby, Colin; Rider, David N.; Karpyak, Victor M.; Choi, Doo-Sup; Fridley, Brooke L.

    2013-01-01

    It is believed that multiple genetic variants with small individual effects contribute to the risk of alcohol dependence. Such polygenic effects are difficult to detect in genome-wide association studies that test for association of the phenotype with each single nucleotide polymorphism (SNP) individually. To overcome this challenge, gene set analysis (GSA) methods that jointly test for the effects of pre-defined groups of genes have been proposed. Rather than testing for association between the phenotype and individual SNPs, these analyses evaluate the global evidence of association with a set of related genes enabling the identification of cellular or molecular pathways or biological processes that play a role in development of the disease. It is hoped that by aggregating the evidence of association for all available SNPs in a group of related genes, these approaches will have enhanced power to detect genetic associations with complex traits. We performed GSA using data from a genome-wide study of 1165 alcohol dependent cases and 1379 controls from the Study of Addiction: Genetics and Environment (SAGE), for all 200 pathways listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results demonstrated a potential role of the “Synthesis and Degradation of Ketone Bodies” pathway. Our results also support the potential involvement of the “Neuroactive Ligand Receptor Interaction” pathway, which has previously been implicated in addictive disorders. These findings demonstrate the utility of GSA in the study of complex disease, and suggest specific directions for further research into the genetic architecture of alcohol dependence. PMID:22717047

  7. Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder.

    PubMed

    Almli, Lynn M; Duncan, Richard; Feng, Hao; Ghosh, Debashis; Binder, Elisabeth B; Bradley, Bekh; Ressler, Kerry J; Conneely, Karen N; Epstein, Michael P

    2014-12-01

    Genetic association studies of psychiatric outcomes often consider interactions with environmental exposures and, in particular, apply tests that jointly consider gene and gene-environment interaction effects for analysis. Using a genome-wide association study (GWAS) of posttraumatic stress disorder (PTSD), we report that heteroscedasticity (defined as variability in outcome that differs by the value of the environmental exposure) can invalidate traditional joint tests of gene and gene-environment interaction. Our objective was to identify the cause of bias in traditional joint tests of gene and gene-environment interaction in a PTSD GWAS and to determine whether proposed robust joint tests are insensitive to this problem. The PTSD GWAS data set consisted of 3359 individuals (978 men and 2381 women) from the Grady Trauma Project (GTP), a cohort study from Atlanta, Georgia. The GTP performed genome-wide genotyping of participants and collected environmental exposures using the Childhood Trauma Questionnaire and Trauma Experiences Inventory. We performed joint interaction testing of the Beck Depression Inventory and modified PTSD Symptom Scale in the GTP GWAS. We assessed systematic bias in our interaction analyses using quantile-quantile plots and genome-wide inflation factors. Application of the traditional joint interaction test to the GTP GWAS yielded systematic inflation across different outcomes and environmental exposures (inflation-factor estimates ranging from 1.07 to 1.21), whereas application of the robust joint test to the same data set yielded no such inflation (inflation-factor estimates ranging from 1.01 to 1.02). Simulated data further revealed that the robust joint test is valid in different heteroscedasticity models, whereas the traditional joint test is invalid. The robust joint test also has power similar to the traditional joint test when heteroscedasticity is not an issue. We believe the robust joint test should be used in candidate-gene studies and GWASs of psychiatric outcomes that consider environmental interactions. To make the procedure useful for applied investigators, we created a software tool that can be called from the popular PLINK package for analysis.
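
    The abstract reports genome-wide inflation factors without defining them. A commonly used definition is the ratio of the median observed chi-square statistic to the median of the null chi-square distribution, and the Python sketch below assumes that definition. The function names are hypothetical, and only 1 and 2 degrees of freedom are handled (a joint test of a gene effect and one gene-environment interaction term has 2 degrees of freedom).

```python
import math
import statistics
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_from_pvalue(p, df):
    """Chi-square statistic whose upper-tail probability is p (df = 1 or 2 only)."""
    if df == 1:
        # By symmetry, the squared normal quantile at p / 2 is the 1-df statistic.
        return _STD_NORMAL.inv_cdf(p / 2.0) ** 2
    if df == 2:
        # The 2-df chi-square survival function is exp(-x / 2).
        return -2.0 * math.log(p)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def inflation_factor(pvalues, df=1):
    """Median observed chi-square divided by the null median for the same df."""
    null_median = chi2_from_pvalue(0.5, df)
    observed = statistics.median(chi2_from_pvalue(p, df) for p in pvalues)
    return observed / null_median
```

    A value near 1 indicates well-calibrated test statistics; values such as the 1.07 to 1.21 reported for the traditional joint test indicate systematic inflation. P-values must lie in (0, 1] for this sketch.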

  8. High prevalence of multidrug-resistant tuberculosis among patients with rifampicin resistance using GeneXpert Mycobacterium tuberculosis/rifampicin in Ghana.

    PubMed

    Boakye-Appiah, Justice K; Steinmetz, Alexis R; Pupulampu, Peter; Ofori-Yirenkyi, Stephen; Tetteh, Ishmael; Frimpong, Michael; Oppong, Patrick; Opare-Sem, Ohene; Norman, Betty R; Stienstra, Ymkje; van der Werf, Tjip S; Wansbrough-Jones, Mark; Bonsu, Frank; Obeng-Baah, Joseph; Phillips, Richard O

    2016-06-01

    Drug-resistant strains of tuberculosis (TB) represent a major threat to global TB control. In low- and middle-income countries, resource constraints make it difficult to identify and monitor cases of resistance using drug susceptibility testing and culture. Molecular assays such as the GeneXpert Mycobacterium tuberculosis/rifampicin may prove to be a cost-effective solution to this problem in these settings. The objective of this study is to evaluate the use of GeneXpert in the diagnosis of pulmonary TB since it was introduced into two tertiary hospitals in Ghana in 2013. We conducted a 2-year retrospective audit of clinical cases involving patients who presented with clinically suspected TB or documented TB not improving on standard therapy and who had samples sent for GeneXpert testing. GeneXpert identified 169 cases of TB, including 17 cases of rifampicin-resistant TB. Of the seven cases with final culture and drug susceptibility testing results, six demonstrated further drug resistance and five of these were multidrug-resistant TB. These findings call for a scale-up of TB control in Ghana and provide evidence that the expansion of GeneXpert may be an optimal means to improve case finding and guide treatment of drug-resistant TB in this setting. Copyright © 2016. Published by Elsevier Ltd.

  9. Broad-Enrich: functional interpretation of large sets of broad genomic regions.

    PubMed

    Cavalcante, Raymond G; Lee, Chee; Welch, Ryan P; Patil, Snehal; Weymouth, Terry; Scott, Laura J; Sartor, Maureen A

    2014-09-01

    Functional enrichment testing facilitates the interpretation of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. Broad-Enrich is available as a Web version and an R package at http://broad-enrich.med.umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
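
    The per-gene quantity described above, the proportion of a gene's locus covered by a peak, can be sketched in Python as follows. This is only the coverage calculation under assumed conventions (half-open integer intervals on a single chromosome, overlapping peaks counted once); it is not the Broad-Enrich code, and how the proportion enters the enrichment test is not covered here. The function name is hypothetical.

```python
def covered_proportion(locus, peaks):
    """Fraction of a half-open locus [start, end) overlapped by at least one peak."""
    locus_start, locus_end = locus
    # Clip peaks to the locus and drop those that fall entirely outside it.
    clipped = sorted(
        (max(start, locus_start), min(end, locus_end))
        for start, end in peaks
        if start < locus_end and end > locus_start
    )
    covered = 0
    current_start, current_end = None, None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent peak: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / (locus_end - locus_start)
```

    For a locus spanning positions 100 to 200 and peaks at (90, 120), (110, 150) and (180, 250), the merged coverage inside the locus is 50 + 20 = 70 bases, giving a proportion of 0.7. A point-based method would instead record three peaks, or a binary hit, for the same gene.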

  10. Microgravity and Immunity: Changes in Lymphocyte Gene Expression

    NASA Technical Reports Server (NTRS)

    Risin, D.; Pellis, N. R.; Ward, N. E.; Risin, S. A.

    2006-01-01

    Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, and affects signal transduction mechanisms as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with antiCD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1g gravity. In the second set, activated T-cells, after culturing for 24 hours in 1g and MMG, were exposed three hours before harvesting to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. In the first set of experiments MMG exposure resulted in altered expression of 89 genes: 10 were up-regulated and 79 were down-regulated. In the second set, changes in expression were revealed in 85 genes: 20 were up-regulated and 65 were down-regulated. The analysis revealed that significant numbers of MGS genes are associated with signal transduction and apoptotic pathways. Interestingly, the majority of genes that responded by up- or down-regulation in the alternative sets of experiments were not the same, possibly reflecting different functional states of the examined T-lymphocyte populations. The responder genes (MGSG) might play an essential role in adaptation to MG and/or be responsible for pathologic changes encountered in Space and thus represent potential targets for molecular-based countermeasures.

  11. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes for a comparatively small number of different experimental conditions called assays. Data mining techniques are used to extract specific information about genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to ranking the genes of microarray data sets that are expressed most differently between two biologically different groupings of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP), which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assay groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.

  12. Investigation of exomic variants associated with overall survival in ovarian cancer

    PubMed Central

    Ann Chen, Yian; Larson, Melissa C; Fogarty, Zachary C; Earp, Madalene A; Anton-Culver, Hoda; Bandera, Elisa V; Cramer, Daniel; Doherty, Jennifer A; Goodman, Marc T; Gronwald, Jacek; Karlan, Beth Y; Kjaer, Susanne K; Levine, Douglas A; Menon, Usha; Ness, Roberta B; Pearce, Celeste L; Pejovic, Tanja; Rossing, Mary Anne; Wentzensen, Nicolas; Bean, Yukie T; Bisogna, Maria; Brinton, Louise A; Carney, Michael E; Cunningham, Julie M; Cybulski, Cezary; deFazio, Anna; Dicks, Ed M; Edwards, Robert P; Gayther, Simon A; Gentry-Maharaj, Aleksandra; Gore, Martin; Iversen, Edwin S; Jensen, Allan; Johnatty, Sharon E; Lester, Jenny; Lin, Hui-Yi; Lissowska, Jolanta; Lubinski, Jan; Menkiszak, Janusz; Modugno, Francesmary; Moysich, Kirsten B; Orlow, Irene; Pike, Malcolm C; Ramus, Susan J; Song, Honglin; Terry, Kathryn L; Thompson, Pamela J; Tyrer, Jonathan P; van den Berg, David J; Vierkant, Robert A; Vitonis, Allison F; Walsh, Christine; Wilkens, Lynne R; Wu, Anna H; Yang, Hannah; Ziogas, Argyrios; Berchuck, Andrew; Chenevix-Trench, Georgia; Schildkraut, Joellen M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pharoah, Paul D P; Fridley, Brooke L

    2016-01-01

    Background: While numerous susceptibility loci for epithelial ovarian cancer (EOC) have been identified, few associations have been reported with overall survival. In the absence of common prognostic genetic markers, we hypothesize that rare coding variants may be associated with overall EOC survival and assessed their contribution in two exome-based genotyping projects of the Ovarian Cancer Association Consortium (OCAC). Methods: The primary patient set (Set 1) included 14 independent EOC studies (4293 patients) and 227,892 variants, and a secondary patient set (Set 2) included six additional EOC studies (1744 patients) and 114,620 variants. Because power to detect rare variants individually is reduced, gene-level tests were conducted. Sets were analyzed separately at individual variants and by gene, and then combined with meta-analyses (73,203 variants and 13,163 genes overlapped). Results: No individual variant reached genome-wide statistical significance. A SNP previously implicated to be associated with EOC risk and, to a lesser extent, survival, rs8170, showed the strongest evidence of association with survival and similar effect size estimates across sets (Pmeta=1.1E-6, HRSet1=1.17, HRSet2=1.14). Rare variants in ATG2B, an autophagy gene important for apoptosis, were significantly associated with survival after multiple testing correction (Pmeta=1.1E-6; Pcorrected=0.01). Conclusions: Common variant rs8170 and rare variants in ATG2B may be associated with EOC overall survival, although further study is needed. Impact: This study represents the first exome-wide association study of EOC survival to include rare variant analyses, and suggests that complementary single variant and gene-level analyses in large studies are needed to identify rare variants that warrant follow-up study. PMID:26747452

  13. An Adaptive Genetic Association Test Using Double Kernel Machines

    PubMed Central

    Zhan, Xiang; Epstein, Michael P.; Ghosh, Debashis

    2014-01-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study. PMID:26640602

  14. Prioritizing individual genetic variants after kernel machine testing using variable selection.

    PubMed

    He, Qianchuan; Cai, Tianxi; Liu, Yang; Zhao, Ni; Harmon, Quaker E; Almli, Lynn M; Binder, Elisabeth B; Engel, Stephanie M; Ressler, Kerry J; Conneely, Karen N; Lin, Xihong; Wu, Michael C

    2016-12-01

    Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level, and does not directly indicate which SNP(s) in the identified set are actually driving the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach. © 2016 WILEY PERIODICALS, INC.
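
    The two kernels named in this abstract can be written down in a few lines. The Python sketch below assumes genotypes coded as minor-allele counts (0, 1 or 2) and uses the unweighted forms of the linear and IBS kernels; published implementations often add per-SNP weights, which are omitted here. The function names are hypothetical, and the sketch covers only the kernels, not SKAT or KNIFE themselves.

```python
def linear_kernel(g1, g2):
    """Inner product of two genotype vectors coded as minor-allele counts (0, 1, 2)."""
    return sum(a * b for a, b in zip(g1, g2))


def ibs_kernel(g1, g2):
    """Average proportion of alleles shared identical by state across SNPs."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def kernel_matrix(genotypes, kernel):
    """Pairwise kernel values for a list of per-subject genotype vectors."""
    return [[kernel(gi, gj) for gj in genotypes] for gi in genotypes]
```

    For example, subjects with genotypes [0, 1, 2] and [1, 1, 0] share 1, 2 and 0 alleles at the three SNPs, so their IBS similarity is 3 / 6 = 0.5. The resulting n-by-n matrix is the input that a kernel machine test uses to compare genetic similarity with trait similarity.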

  15. Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods

    PubMed Central

    2012-01-01

    High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods. Reviewers: This article was reviewed by Arcady Mushegian, Byung-Soo Kim and Joel Bader. PMID:23227854

  16. Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

    PubMed

    Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

    2016-08-30

    Influenza is an acute respiratory illness and has become a serious public health problem worldwide. Studying the HA and NA genes of influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain the complete coding sequences of the Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on worldwide influenza A/H3N2 sequences from the Global Initiative on Sharing All Influenza Data (GISAID) and further tested using archived Indonesian influenza A/H3N2 samples from influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover the complete coding sequences of the HA and NA genes. Of 145 influenza A/H3N2 samples tested, 71 were successfully sequenced for the complete coding sequences of both the HA and NA genes. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.

  17. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298

  18. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

    This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

  19. Transcriptome meta-analysis reveals common differential and global gene expression profiles in cystic fibrosis and other respiratory disorders and identifies CFTR regulators.

    PubMed

    Clarke, Luka A; Botelho, Hugo M; Sousa, Lisete; Falcao, Andre O; Amaral, Margarida D

    2015-11-01

    A meta-analysis of 13 independent microarray data sets was performed and gene expression profiles from cystic fibrosis (CF), similar disorders (COPD: chronic obstructive pulmonary disease, IPF: idiopathic pulmonary fibrosis, asthma), environmental conditions (smoking, epithelial injury), related cellular processes (epithelial differentiation/regeneration), and non-respiratory "control" conditions (schizophrenia, dieting), were compared. Similarity among differentially expressed (DE) gene lists was assessed using a permutation test, and a clustergram was constructed, identifying common gene markers. Global gene expression values were standardized using a novel approach, revealing that similarities between independent data sets run deeper than shared DE genes. Correlation of gene expression values identified putative gene regulators of the CF transmembrane conductance regulator (CFTR) gene, of potential therapeutic significance. Our study provides a novel perspective on CF epithelial gene expression in the context of other lung disorders and conditions, and highlights the contribution of differentiation/EMT and injury to gene signatures of respiratory disease. Copyright © 2015 Elsevier Inc. All rights reserved.
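
    The abstract above assesses similarity among differentially expressed gene lists with a permutation test but gives no details. The following is only a minimal sketch of one common way to set up such a test, not the authors' procedure: the function name, the null model (the second list is a random draw of equal size from a shared gene universe) and the toy gene identifiers are all assumptions made for illustration.

```python
import random

def overlap_permutation_test(list_a, list_b, universe, n_perm=2000, seed=0):
    """Permutation p-value for the overlap between two gene lists.

    Assumed null model (not taken from the abstract): list_b is a random
    draw, without replacement, of the same size from the gene universe.
    """
    rng = random.Random(seed)
    set_a = set(list_a)
    observed = len(set_a & set(list_b))
    pool = sorted(universe)          # fixed order so the seed is reproducible
    k = len(set(list_b))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, k)
        if len(set_a.intersection(sample)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: 200 genes, two lists of 20 sharing 15 genes.
genes = [f"g{i}" for i in range(200)]
observed, p_value = overlap_permutation_test(genes[:20], genes[:15] + genes[100:105], genes)
```

    A small p-value here only says that the overlap is larger than expected under the assumed random-draw null; it does not account for correlation between genes or for differences in platform coverage.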

  20. Fast and robust group-wise eQTL mapping using sparse graphical models.

    PubMed

    Cheng, Wei; Shi, Yu; Zhang, Xiang; Wang, Wei

    2015-01-16

    Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways. We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. Our model is a linear-Gaussian model and uses two types of hidden variables. One captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure which makes this approach suitable for large scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that cannot be identified by the state-of-the-art eQTL mapping methods. Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the successful multi-layer regression model opens a new approach to understand how multiple SNPs interact with each other to jointly affect the expression level of a group of genes.

  1. Functional and evolutionary insights from the Ciona notochord transcriptome.

    PubMed

    Reeves, Wendy M; Wu, Yuye; Harder, Matthew J; Veeman, Michael T

    2017-09-15

    The notochord of the ascidian Ciona consists of only 40 cells, and is a longstanding model for studying organogenesis in a small, simple embryo. Here, we perform RNAseq on flow-sorted notochord cells from multiple stages to define a comprehensive Ciona notochord transcriptome. We identify 1364 genes with enriched expression and extensively validate the results by in situ hybridization. These genes are highly enriched for Gene Ontology terms related to the extracellular matrix, cell adhesion and cytoskeleton. Orthologs of 112 of the Ciona notochord genes have known notochord expression in vertebrates, more than twice as many as predicted by chance alone. This set of putative effector genes with notochord expression conserved from tunicates to vertebrates will be invaluable for testing hypotheses about notochord evolution. The full set of Ciona notochord genes provides a foundation for systems-level studies of notochord gene regulation and morphogenesis. We find only modest overlap between this set of notochord-enriched transcripts and the genes upregulated by ectopic expression of the key notochord transcription factor Brachyury, indicating that Brachyury is not a notochord master regulator gene as strictly defined. © 2017. Published by The Company of Biologists Ltd.

  2. Co-expression network analysis identified six hub genes in association with metastasis risk and prognosis in hepatocellular carcinoma

    PubMed Central

    Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu

    2017-01-01

    Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663

  3. Validation of the Lung Subtyping Panel in Multiple Fresh-Frozen and Formalin-Fixed, Paraffin-Embedded Lung Tumor Gene Expression Data Sets.

    PubMed

    Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla

    2016-06-01

    Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. The LSP panel can supplement morphologic assessment of lung cancers, particularly when classification by standard methods is challenging.
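
    The abstract above describes computing gene centroids per subtype and then classifying tumors against them. As a rough illustration of that general idea (not the LSP implementation), here is a minimal nearest-centroid sketch. The squared Euclidean distance, the function names and the three-gene toy profiles are assumptions; the abstract does not state which distance or similarity measure the panel uses.

```python
def fit_centroids(profiles, labels):
    """Average the expression profiles of each subtype, gene by gene."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(profile, centroids):
    """Return the subtype whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(profile, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def agreement(predicted, reference):
    """Fraction of samples where the predicted and reference labels match."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

# Hypothetical three-gene training profiles for the three subtypes.
train = [[5, 1, 1], [6, 0, 1], [1, 5, 1], [0, 6, 1], [1, 1, 5], [1, 0, 6]]
subtypes = ["AD", "AD", "SQ", "SQ", "NE", "NE"]
centroids = fit_centroids(train, subtypes)
calls = [classify(p, centroids) for p in ([4, 1, 0], [0, 4, 2])]
```

    The agreement percentages reported in the abstract are of the same kind as the value returned by `agreement`: matching calls divided by the number of samples compared.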

  4. Gene-gene-environment interactions between drugs, transporters, receptors, and metabolizing enzymes: Statins, SLCO1B1, and CYP3A4 as an example.

    PubMed

    Sadee, Wolfgang

    2013-09-01

    Pharmacogenetic biomarker tests include mostly specific single gene-drug pairs, capable of accounting for a portion of interindividual variability in drug response and toxicity. However, multiple genes are likely to contribute, either acting independently or epistatically, with the CYP2C9-VKORC1-warfarin test panel being an example of a clinically used gene-gene-drug interaction. I discuss here further instances of gene-gene-drug interactions, including a proposed dynamic effect on statin therapy by genetic variants in both a transporter (SLCO1B1) and a metabolizing enzyme (CYP3A4) in liver cells, the main target site where statins block cholesterol synthesis. These examples set a conceptual framework for developing diagnostic panels involving multiple gene-drug combinations. Copyright © 2013 Wiley Periodicals, Inc.

  5. Multidisease testing for HIV and TB using the GeneXpert platform: A feasibility study in rural Zimbabwe

    PubMed Central

    Fajardo, Emmanuel; Mbofana, Elton; Maparo, Tatenda; Garone, Daniela; Metcalf, Carol; Bygrave, Helen; Kao, Kekeletso; Zinyowera, Sekesai

    2018-01-01

    Background HIV Viral Load and Early Infant Diagnosis technologies in many high burden settings are restricted to centralized laboratory testing, leading to long result turnaround times and patient attrition. GeneXpert (Cepheid, CA, USA) is a polyvalent near point-of-care platform and is widely implemented for Xpert MTB/RIF diagnosis. This study sought to evaluate the operational feasibility of integrated HIV VL, EID and MTB/RIF testing in new GeneXpert platforms. Methods Whole blood samples were collected from consenting patients due for routine HIV VL testing and DBS samples from infants due for EID testing, at three rural health facilities in Zimbabwe. Sputum samples were collected from all individuals suspected of TB. GeneXpert testing was reserved for all EID, all TB suspects and priority HIV VL at each site. Blood samples were further sent to centralized laboratories for confirmatory testing. GeneXpert polyvalent testing results and patient outcomes, including infrastructural and logistical requirements are reported. The study was conducted over a 10-month period. Results The fully automated GeneXpert testing device, required minimal training and biosafety considerations. A total of 1,302 HIV VL, 277 EID and 1,581 MTB/RIF samples were tested on a four module GeneXpert platform in each study site. Xpert HIV-1 VL testing was prioritized for patients who presented with advanced HIV disease, pregnant women, adolescents and suspected ART failures patients. On average, the study sites had a GeneXpert utilization rate of 50.4% (Gutu Mission Hospital), 63.5% (Murambinda Mission Hospital) and 17.5% (Chimombe Rural Health Centre) per month. GeneXpert polyvalent testing error rates remained lower than 4% in all sites. Decentralized EID and VL testing on Xpert had shorter overall median TAT (1 day [IQR: 0–4] and 1 day [IQR: 0–1] respectively) compared to centralized testing (17 days [IQR: 13–21] and 26 days [IQR: 23–32] respectively). 
Among patients with VL >1000 copies/ml (73/640; 11.4%) at GMH health facility, median time to enhanced adherence counselling was 8 days and majority of those with documented outcomes had re-suppressed VL (20/32; 62.5%). Median time to ART initiation among Xpert EID positive infants at GMH was 1 day [IQR: 0–1]. Conclusion Implementation of near point-of-care GeneXpert platform for integrated multi-disease testing within district and sub-district healthcare settings is feasible and will increase access to VL, and EID testing to priority populations. Quality management systems including monitoring of performance indicators, together with regular on-site supervision are crucial, and near-POC test results must be promptly actioned-on by clinicians for patient management. PMID:29499042

  6. Multidisease testing for HIV and TB using the GeneXpert platform: A feasibility study in rural Zimbabwe.

    PubMed

    Ndlovu, Zibusiso; Fajardo, Emmanuel; Mbofana, Elton; Maparo, Tatenda; Garone, Daniela; Metcalf, Carol; Bygrave, Helen; Kao, Kekeletso; Zinyowera, Sekesai

    2018-01-01

    HIV Viral Load and Early Infant Diagnosis technologies in many high burden settings are restricted to centralized laboratory testing, leading to long result turnaround times and patient attrition. GeneXpert (Cepheid, CA, USA) is a polyvalent near point-of-care platform and is widely implemented for Xpert MTB/RIF diagnosis. This study sought to evaluate the operational feasibility of integrated HIV VL, EID and MTB/RIF testing in new GeneXpert platforms. Whole blood samples were collected from consenting patients due for routine HIV VL testing and DBS samples from infants due for EID testing, at three rural health facilities in Zimbabwe. Sputum samples were collected from all individuals suspected of TB. GeneXpert testing was reserved for all EID, all TB suspects and priority HIV VL at each site. Blood samples were further sent to centralized laboratories for confirmatory testing. GeneXpert polyvalent testing results and patient outcomes, including infrastructural and logistical requirements are reported. The study was conducted over a 10-month period. The fully automated GeneXpert testing device, required minimal training and biosafety considerations. A total of 1,302 HIV VL, 277 EID and 1,581 MTB/RIF samples were tested on a four module GeneXpert platform in each study site. Xpert HIV-1 VL testing was prioritized for patients who presented with advanced HIV disease, pregnant women, adolescents and suspected ART failures patients. On average, the study sites had a GeneXpert utilization rate of 50.4% (Gutu Mission Hospital), 63.5% (Murambinda Mission Hospital) and 17.5% (Chimombe Rural Health Centre) per month. GeneXpert polyvalent testing error rates remained lower than 4% in all sites. Decentralized EID and VL testing on Xpert had shorter overall median TAT (1 day [IQR: 0-4] and 1 day [IQR: 0-1] respectively) compared to centralized testing (17 days [IQR: 13-21] and 26 days [IQR: 23-32] respectively). 
Among patients with VL >1000 copies/ml (73/640; 11.4%) at GMH health facility, median time to enhanced adherence counselling was 8 days and majority of those with documented outcomes had re-suppressed VL (20/32; 62.5%). Median time to ART initiation among Xpert EID positive infants at GMH was 1 day [IQR: 0-1]. Implementation of near point-of-care GeneXpert platform for integrated multi-disease testing within district and sub-district healthcare settings is feasible and will increase access to VL, and EID testing to priority populations. Quality management systems including monitoring of performance indicators, together with regular on-site supervision are crucial, and near-POC test results must be promptly actioned-on by clinicians for patient management.

  7. Evaluation of reference genes for insect olfaction studies.

    PubMed

    Omondi, Bonaventure Aman; Latorre-Estivalis, Jose Manuel; Rocha Oliveira, Ivana Helena; Ignell, Rickard; Lorenzo, Marcelo Gustavo

    2015-04-22

    Quantitative reverse transcription PCR (qRT-PCR) is a robust and accessible method to assay gene expression and to infer gene regulation. Being a chain of procedures, this technique is subject to systematic error due to biological and technical limitations mainly set by the starting material and downstream procedures. Thus, rigorous data normalization is critical to ensure reliability and repeatability of gene expression quantification by qRT-PCR. A number of 'housekeeping genes', involved in basic cellular functions, have been commonly used as internal controls for this normalization process. However, these genes could themselves be regulated and must therefore be tested a priori. We evaluated eight potential reference genes for their stability as internal controls for qRT-PCR studies of olfactory gene expression in the antennae of Rhodnius prolixus, a Chagas disease vector. The genes included were: α-tubulin; β-actin; Glyceraldehyde-3-phosphate dehydrogenase; Eukaryotic initiation factor 1A; Glutathione-S-transferase; Serine protease; Succinate dehydrogenase; and Glucose-6-phosphate dehydrogenase. Five experimental conditions, including changes in age, developmental stage and feeding status, were tested in both sexes. We show that the evaluation of candidate reference genes is necessary for each combination of sex, tissue and physiological condition analyzed in order to avoid inconsistent results and conclusions. Although NormFinder and geNorm software yielded different results between males and females, five genes (SDH, Tub, GAPDH, Act and G6PDH) appeared in the first positions in all rankings obtained. By using gene expression data of a single olfactory coreceptor gene as an example, we demonstrated the extent of changes expected using different internal standards. This work underlines the need for a rigorous selection of internal standards to ensure the reliability of normalization processes in qRT-PCR studies. Furthermore, we show that particular physiological or developmental conditions require independent evaluation of a diverse set of potential reference genes.

  8. Metallo-Beta-Lactamase Producing Pseudomonas aeruginosa in a Healthcare Setting in Alexandria, Egypt.

    PubMed

    Abaza, Amani F; El Shazly, Soraya A; Selim, Heba S A; Aly, Gehan S A

    2017-09-27

    Pseudomonas aeruginosa has emerged as a major healthcare associated pathogen that creates a serious public health disaster in both developing and developed countries. In this work we aimed at studying the occurrence of metallo-beta-lactamase (MBL) producing P. aeruginosa in a healthcare setting in Alexandria, Egypt. This cross sectional study included 1583 clinical samples that were collected from patients admitted to Alexandria University Students' Hospital. P. aeruginosa isolates were identified using standard microbiological methods and were tested for their antimicrobial susceptibility patterns using single disc diffusion method according to the Clinical and Laboratory Standards Institute recommendations. Thirty P. aeruginosa isolates were randomly selected and tested for their MBL production by both phenotypic and genotypic methods. Diagnostic Epsilometer test was done to detect metallo-beta-lactamase enzyme producers and polymerase chain reaction test was done to detect imipenemase (IMP), Verona integron-encoded (VIM) and Sao Paulo metallo-beta-lactamase (SPM) encoding genes. Of the 1583 clinical samples, 175 (11.3%) P. aeruginosa isolates were identified. All the 30 (100%) selected P. aeruginosa isolates that were tested for MBL production by Epsilometer test were found to be positive; where 19 (63.3%) revealed blaSPM gene and 11 (36.7%) had blaIMP gene. blaVIM gene was not detected in any of the tested isolates. Isolates of MBL producing P. aeruginosa were highly susceptible to polymyxin B 26 (86.7%) and highly resistant to amikacin 26 (86.7%). MBL producers were detected phenotypically by Epsilometer test in both carbapenem susceptible and resistant P. aeruginosa isolates. blaSPM was the most commonly detected MBL gene in P. aeruginosa isolates.

  9. Improvement of experimental testing and network training conditions with genome-wide microarrays for more accurate predictions of drug gene targets

    PubMed Central

    2014-01-01

    Background Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with off-target transcriptional responses many times induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other, to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results S. cerevisiae cells were treated with the antifungal, fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included: (i) Testing conditions – exposure time and concentration and (ii) Network training conditions – training compendium modifications. Two analyses of SSEM-Lasso output – gene set and single gene – were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved. 
    PMID:24444313

  10. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?

    PubMed

    Veturi, Yogasudha; Ritchie, Marylyn D

    2018-01-01

    Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary level data suffices for TWAS and offers improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS - TWAS-MP or (b) using summary-based Mendelian randomization - TWAS-SMR. Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger's MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression /trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression or (b) performed eQTL analysis. 
We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP (b) TWAS-SMR (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.

  11. Combinatorial therapy discovery using mixed integer linear programming.

    PubMed

    Pang, Kaifang; Wan, Ying-Wooi; Choi, William T; Donehower, Lawrence A; Sun, Jingchun; Pant, Dhruv; Liu, Zhandong

    2014-05-15

    Combinatorial therapies play increasingly important roles in combating complex diseases. Owing to the huge cost associated with experimental methods in identifying optimal drug combinations, computational approaches can provide a guide to limit the search space and reduce cost. However, few computational approaches have been developed for this purpose, and thus there is a great need for new algorithms for drug combination prediction. Here we proposed to formulate the optimal combinatorial therapy problem into two complementary mathematical algorithms, Balanced Target Set Cover (BTSC) and Minimum Off-Target Set Cover (MOTSC). Given a disease gene set, BTSC seeks a balanced solution that maximizes the coverage on the disease genes and minimizes the off-target hits at the same time. MOTSC seeks a full coverage on the disease gene set while minimizing the off-target set. Through simulation, both BTSC and MOTSC demonstrated a much faster running time over exhaustive search with the same accuracy. When applied to real disease gene sets, our algorithms not only identified known drug combinations, but also predicted novel drug combinations that are worth further testing. In addition, we developed a web-based tool to allow users to iteratively search for optimal drug combinations given a user-defined gene set. Our tool is freely available for noncommercial use at http://www.drug.liuzlab.org/. Contact: zhandong.liu@bcm.edu. Supplementary data are available at Bioinformatics online.
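
    To make the MOTSC objective in the abstract above concrete, the sketch below implements the exhaustive-search baseline that the abstract compares against, not the authors' mixed integer linear program. The reading of the objective (cover every disease gene, then minimize the number of off-target genes, breaking ties by fewer drugs), the function name and the toy drug-target map are assumptions for illustration only.

```python
from itertools import combinations

def motsc_exhaustive(drug_targets, disease_genes):
    """Brute-force search for a drug combination that covers every disease
    gene with the fewest off-target genes (ties broken by fewer drugs).

    Returns (combination, off_target_set), or None if no combination covers
    the disease gene set. Runtime is exponential in the number of drugs.
    """
    disease = set(disease_genes)
    drugs = sorted(drug_targets)
    best = None
    for size in range(1, len(drugs) + 1):
        for combo in combinations(drugs, size):
            hit = set().union(*(drug_targets[d] for d in combo))
            if not disease <= hit:
                continue                      # some disease gene is left uncovered
            off_target = hit - disease
            key = (len(off_target), size)
            if best is None or key < best[0]:
                best = (key, combo, off_target)
    return None if best is None else (best[1], best[2])

# Hypothetical drug-target map; G* are disease genes, X* are off-targets.
toy_targets = {
    "drugA": {"G1", "G2", "X1", "X2"},
    "drugB": {"G1", "G2", "G3", "X1", "X2", "X3"},
    "drugC": {"G3"},
    "drugD": {"G1", "G2", "G3", "X6", "X7", "X8", "X9"},
}
solution = motsc_exhaustive(toy_targets, {"G1", "G2", "G3"})
```

    Because every subset of drugs is enumerated, this is only practical for a handful of drugs, which is consistent with the abstract's motivation for a faster formulation.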

  12. Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach

    PubMed Central

    Ruderfer, Douglas M; Charney, Alexander W; Readhead, Ben; Kidd, Brian A; Kähler, Anna K; Kenny, Paul J; Keiser, Michael J; Moran, Jennifer L; Hultman, Christina M; Scott, Stuart A; Sullivan, Patrick F; Purcell, Shaun M; Dudley, Joel T; Sklar, Pamela

    2016-01-01

    Summary Background Therapeutic treatments for schizophrenia do not alleviate symptoms for all patients and efficacy is limited by common, often severe, side-effects. Genetic studies of disease can identify novel drug targets, and drugs for which the mechanism has direct genetic support have increased likelihood of clinical success. Large-scale genetic studies of schizophrenia have increased the number of genes and gene sets associated with risk. We aimed to examine the overlap between schizophrenia risk loci and gene targets of a comprehensive set of medications to potentially inform and improve treatment of schizophrenia. Methods We defined schizophrenia risk loci as genomic regions reaching genome-wide significance in the latest Psychiatric Genomics Consortium schizophrenia genome-wide association study (GWAS) of 36 989 cases and 113 075 controls and loss of function variants observed only once among 5079 individuals in an exome-sequencing study of 2536 schizophrenia cases and 2543 controls (Swedish Schizophrenia Study). Using two large and orthogonally created databases, we collated drug targets into 167 gene sets targeted by pharmacologically similar drugs and examined enrichment of schizophrenia risk loci in these sets. We further linked the exome-sequenced data with a national drug registry (the Swedish Prescribed Drug Register) to assess the contribution of rare variants to treatment response, using clozapine prescription as a proxy for treatment resistance. Findings We combined results from testing rare and common variation and, after correction for multiple testing, two gene sets were associated with schizophrenia risk: agents against amoebiasis and other protozoal diseases (106 genes, p=0·00046, pcorrected=0·024) and antipsychotics (347 genes, p=0·00078, pcorrected=0·046). Further analysis pointed to antipsychotics as having independent enrichment after removing genes that overlapped these two target sets. We noted significant enrichment both in known targets of antipsychotics (70 genes, p=0·0078) and novel predicted targets (277 genes, p=0·019). Patients with treatment-resistant schizophrenia had an excess of rare disruptive variants in gene targets of antipsychotics (347 genes, p=0·0067) and in genes with evidence for a role in antipsychotic efficacy (91 genes, p=0·0029). Interpretation Our results support genetic overlap between schizophrenia pathogenesis and antipsychotic mechanism of action. This finding is consistent with treatment efficacy being polygenic and suggests that single-target therapeutics might be insufficient. We provide evidence of a role for rare functional variants in antipsychotic treatment response, pointing to a subset of patients where their genetic information could inform treatment. Finally, we present a novel framework for identifying treatments from genetic data and improving our understanding of therapeutic mechanism. PMID:26915512

  13. Learning a Markov Logic network for supervised gene regulatory network inference

    PubMed Central

    2013-01-01

    Background Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. Results We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate “regulates”, starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. Conclusions The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge. PMID:24028533

  14. Learning a Markov Logic network for supervised gene regulatory network inference.

    PubMed

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.
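
    The asymmetric bagging and prediction averaging described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: each bag is assumed to keep every known regulation and pair it with an equally sized random sample of non-regulations, and the toy learner train_toy merely stands in for an MLN. All gene names and function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Build balanced bags: all positives plus an equal-sized sample of negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    bags = []
    for _ in range(n_bags):
        sampled = rng.sample(negatives, k)
        bags.append([(p, 1) for p in positives] + [(n, 0) for n in sampled])
    return bags


def train_toy(bag):
    """Toy stand-in for an MLN (assumption): score a pair by how often its
    regulator appears in positive versus negative examples of the bag."""
    pos = Counter(reg for (reg, _), label in bag if label == 1)
    neg = Counter(reg for (reg, _), label in bag if label == 0)

    def score(pair):
        reg = pair[0]
        seen = pos[reg] + neg[reg]
        return pos[reg] / seen if seen else 0.5  # 0.5 for unseen regulators

    return score


def bagged_score(models, pair):
    """Average the scores of the individual models for one ordered gene pair."""
    return mean(model(pair) for model in models)


# Hypothetical, heavily unbalanced toy data: ordered (regulator, target) pairs.
positives = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")]
negatives = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"),
             ("G1", "TF1"), ("G2", "TF2"), ("G3", "TF2")]

bags = asymmetric_bags(positives, negatives, n_bags=5)
models = [train_toy(bag) for bag in bags]
```

    Averaging over bags lets every negative example contribute to some model while each individual model still trains on balanced classes.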

  15. Alteration of topoisomerase II-alpha gene in human breast cancer: association with responsiveness to anthracycline-based chemotherapy.

    PubMed

    Press, Michael F; Sauter, Guido; Buyse, Marc; Bernstein, Leslie; Guzman, Roberta; Santiago, Angela; Villalobos, Ivonne E; Eiermann, Wolfgang; Pienkowski, Tadeusz; Martin, Miguel; Robert, Nicholas; Crown, John; Bee, Valerie; Taupin, Henry; Flom, Kerry J; Tabah-Fisch, Isabelle; Pauletti, Giovanni; Lindsay, Mary-Ann; Riva, Alessandro; Slamon, Dennis J

    2011-03-01

    Approximately 35% of HER2-amplified breast cancers have coamplification of the topoisomerase II-alpha (TOP2A) gene encoding an enzyme that is a major target of anthracyclines. This study was designed to evaluate whether TOP2A gene alterations may predict incremental responsiveness to anthracyclines in some breast cancers. A total of 4,943 breast cancers were analyzed for alterations in TOP2A and HER2. Primary tumor tissues from patients with metastatic breast cancer treated in a trial of chemotherapy plus/minus trastuzumab were studied for amplification/deletion of TOP2A and HER2 as a test set followed by evaluation of malignancies from two separate, large trials for changes in these same genes as a validation set. Association between these alterations and clinical outcomes was determined. Test set cases containing HER2 amplification treated with doxorubicin and cyclophosphamide (AC) plus trastuzumab, demonstrated longer progression-free survival compared to those treated with AC alone (P = .0002). However, patients treated with AC alone whose tumors contain HER2/TOP2A coamplification experienced a similar improvement in survival (P = .004). Conversely, for patients treated with paclitaxel, HER2/TOP2A coamplification was not associated with improved outcomes. These observations were confirmed in a larger validation set, where HER2/TOP2A coamplification was again associated with longer survival when only anthracycline-containing chemotherapy was used for treatment compared with outcome in HER2-positive cancers lacking TOP2A coamplification. In a study involving nearly 5,000 breast malignancies, both test set and validation set demonstrate that TOP2A coamplification, not HER2 amplification, is the clinically useful predictive marker of an incremental response to anthracycline-based chemotherapy. Absence of HER2/TOP2A coamplification may indicate a more restricted efficacy advantage for breast cancers than previously thought.

  16. Alteration of Topoisomerase II–Alpha Gene in Human Breast Cancer: Association With Responsiveness to Anthracycline-Based Chemotherapy

    PubMed Central

    Press, Michael F.; Sauter, Guido; Buyse, Marc; Bernstein, Leslie; Guzman, Roberta; Santiago, Angela; Villalobos, Ivonne E.; Eiermann, Wolfgang; Pienkowski, Tadeusz; Martin, Miguel; Robert, Nicholas; Crown, John; Bee, Valerie; Taupin, Henry; Flom, Kerry J.; Tabah-Fisch, Isabelle; Pauletti, Giovanni; Lindsay, Mary-Ann; Riva, Alessandro; Slamon, Dennis J.

    2011-01-01

    Purpose Approximately 35% of HER2-amplified breast cancers have coamplification of the topoisomerase II-alpha (TOP2A) gene encoding an enzyme that is a major target of anthracyclines. This study was designed to evaluate whether TOP2A gene alterations may predict incremental responsiveness to anthracyclines in some breast cancers. Methods A total of 4,943 breast cancers were analyzed for alterations in TOP2A and HER2. Primary tumor tissues from patients with metastatic breast cancer treated in a trial of chemotherapy plus/minus trastuzumab were studied for amplification/deletion of TOP2A and HER2 as a test set followed by evaluation of malignancies from two separate, large trials for changes in these same genes as a validation set. Association between these alterations and clinical outcomes was determined. Results Test set cases containing HER2 amplification treated with doxorubicin and cyclophosphamide (AC) plus trastuzumab, demonstrated longer progression-free survival compared to those treated with AC alone (P = .0002). However, patients treated with AC alone whose tumors contain HER2/TOP2A coamplification experienced a similar improvement in survival (P = .004). Conversely, for patients treated with paclitaxel, HER2/TOP2A coamplification was not associated with improved outcomes. These observations were confirmed in a larger validation set, where HER2/TOP2A coamplification was again associated with longer survival when only anthracycline-containing chemotherapy was used for treatment compared with outcome in HER2-positive cancers lacking TOP2A coamplification. Conclusion In a study involving nearly 5,000 breast malignancies, both test set and validation set demonstrate that TOP2A coamplification, not HER2 amplification, is the clinically useful predictive marker of an incremental response to anthracycline-based chemotherapy. Absence of HER2/TOP2A coamplification may indicate a more restricted efficacy advantage for breast cancers than previously thought. 
    PMID:21189395

  17. Development of an objective gene expression panel as an alternative to self-reported symptom scores in human influenza challenge trials.

    PubMed

    Muller, Julius; Parizotto, Eneida; Antrobus, Richard; Francis, James; Bunce, Campbell; Stranks, Amanda; Nichols, Marshall; McClain, Micah; Hill, Adrian V S; Ramasamy, Adaikalavan; Gilbert, Sarah C

    2017-06-08

    Influenza challenge trials are important for vaccine efficacy testing. Currently, disease severity is determined by self-reported scores to a list of symptoms which can be highly subjective. A more objective measure would allow for improved data analysis. Twenty-one volunteers participated in an influenza challenge trial. We calculated the daily sum of scores (DSS) for a list of 16 influenza symptoms. Whole blood collected at baseline and 24, 48, 72 and 96 h post challenge was profiled on Illumina HT12v4 microarrays. Changes in gene expression most strongly correlated with DSS were selected to train a Random Forest model and tested on two independent test sets consisting of 41 individuals profiled on a different microarray platform and 33 volunteers assayed by qRT-PCR. 1456 probes are significantly associated with DSS at 1% false discovery rate. We selected 19 genes with the largest fold change to train a random forest model. We observed good concordance between predicted and actual scores in the first test set (r = 0.57; RMSE = -16.1%) with the greatest agreement achieved on samples collected approximately 72 h post challenge. Therefore, we assayed samples collected at baseline and 72 h post challenge in the second test set by qRT-PCR and observed good concordance (r = 0.81; RMSE = -36.1%). We developed a 19-gene qRT-PCR panel to predict DSS, validated on two independent datasets. A transcriptomics based panel could provide a more objective measure of symptom scoring in future influenza challenge studies. Trial registration Samples were obtained from a clinical trial with the ClinicalTrials.gov Identifier: NCT02014870, first registered on December 5, 2013.

  18. htsint: a Python library for sequencing pipelines that combines data through gene set generation.

    PubMed

    Richards, Adam J; Herrel, Anthony; Bonneaud, Camille

    2015-09-24

    Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.

  19. Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork

    PubMed Central

    Druka, Arnis; Druka, Ilze; Centeno, Arthur G; Li, Hongqiang; Sun, Zhaohui; Thomas, William TB; Bonar, Nicola; Steffenson, Brian J; Ullrich, Steven E; Kleinhofs, Andris; Wise, Roger P; Close, Timothy J; Potokina, Elena; Luo, Zewei; Wagner, Carola; Schweizer, Günther F; Marshall, David F; Kearsey, Michael J; Williams, Robert W; Waugh, Robbie

    2008-01-01

    Background A typical genetical genomics experiment results in four separate data sets; genotype, gene expression, higher-order phenotypic data and metadata that describe the protocols, processing and the array platform. Used in concert, these data sets provide the opportunity to perform genetic analysis at a systems level. Their predictive power is largely determined by the gene expression dataset where tens of millions of data points can be generated using currently available mRNA profiling technologies. Such large, multidimensional data sets often have value beyond that extracted during their initial analysis and interpretation, particularly if conducted on widely distributed reference genetic materials. Besides quality and scale, access to the data is of primary importance as accessibility potentially allows the extraction of considerable added value from the same primary dataset by the wider research community. Although the number of genetical genomics experiments in different plant species is rapidly increasing, none to date has been presented in a form that allows quick and efficient on-line testing for possible associations between genes, loci and traits of interest by an entire research community. Description Using a reference population of 150 recombinant doubled haploid barley lines we generated novel phenotypic, mRNA abundance and SNP-based genotyping data sets, added them to a considerable volume of legacy trait data and entered them into the GeneNetwork. GeneNetwork is a unified on-line analytical environment that enables the user to test genetic hypotheses about how component traits, such as mRNA abundance, may interact to condition more complex biological phenotypes (higher-order traits). Here we describe these barley data sets and demonstrate some of the functionalities GeneNetwork provides as an easily accessible and integrated analytical environment for exploring them. Conclusion By integrating barley genotypic, phenotypic and mRNA abundance data sets directly within GeneNetwork's analytical environment we provide simple web access to the data for the research community. In this environment, a combination of correlation analysis and linkage mapping provides the potential to identify and substantiate gene targets for saturation mapping and positional cloning. By integrating datasets from an unsequenced crop plant (barley) in a database that has been designed for an animal model species (mouse) with a well established genome sequence, we prove the importance of the concept and practice of modular development and interoperability of software engineering for biological data sets. PMID:19017390

  20. Distinct gene expression profiles determine molecular treatment response in childhood acute lymphoblastic leukemia.

    PubMed

    Cario, Gunnar; Stanulla, Martin; Fine, Bernard M; Teuffel, Oliver; Neuhoff, Nils V; Schrauder, André; Flohr, Thomas; Schäfer, Beat W; Bartram, Claus R; Welte, Karl; Schlegelberger, Brigitte; Schrappe, Martin

    2005-01-15

    Treatment resistance, as indicated by the presence of high levels of minimal residual disease (MRD) after induction therapy and induction consolidation, is associated with a poor prognosis in childhood acute lymphoblastic leukemia (ALL). We hypothesized that treatment resistance is an intrinsic feature of ALL cells reflected in the gene expression pattern and that resistance to chemotherapy can be predicted before treatment. To test these hypotheses, gene expression signatures of ALL samples with high MRD load were compared with those of samples without measurable MRD during treatment. We identified 54 genes that clearly distinguished resistant from sensitive ALL samples. Genes with low expression in resistant samples were predominantly associated with cell-cycle progression and apoptosis, suggesting that impaired cell proliferation and apoptosis are involved in treatment resistance. Prediction analysis using randomly selected samples as a training set and the remaining samples as a test set revealed an accuracy of 84%. We conclude that resistance to chemotherapy seems at least in part to be an intrinsic feature of ALL cells. Because treatment response could be predicted with high accuracy, gene expression profiling could become a clinically relevant tool for treatment stratification in the early course of childhood ALL.

  1. A 10-Gene Classifier for Indeterminate Thyroid Nodules: Development and Multicenter Accuracy Study

    PubMed Central

    González, Hernán E.; Martínez, José R.; Vargas-Salas, Sergio; Solar, Antonieta; Veliz, Loreto; Cruz, Francisco; Arias, Tatiana; Loyola, Soledad; Horvath, Eleonora; Tala, Hernán; Traipe, Eufrosina; Meneses, Manuel; Marín, Luis; Wohllk, Nelson; Diaz, René E.; Véliz, Jesús; Pineda, Pedro; Arroyo, Patricia; Mena, Natalia; Bracamonte, Milagros; Miranda, Giovanna; Bruce, Elsa

    2017-01-01

    Background: In most of the world, diagnostic surgery remains the most frequent approach for indeterminate thyroid cytology. Although several molecular tests are available for testing in centralized commercial laboratories in the United States, there are no available kits for local laboratory testing. The aim of this study was to develop a prototype in vitro diagnostic (IVD) gene classifier for the further characterization of nodules with an indeterminate thyroid cytology. Methods: In a first stage, the expression of 18 genes was determined by quantitative polymerase chain reaction (qPCR) in a broad histopathological spectrum of 114 fresh-tissue biopsies. Expression data were used to train several classifiers by supervised machine learning approaches. Classifiers were tested in an independent set of 139 samples. In a second stage, the best classifier was chosen as a model to develop a multiplexed-qPCR IVD prototype assay, which was tested in a prospective multicenter cohort of fine-needle aspiration biopsies. Results: In tissue biopsies, the best classifier, using only 10 genes, reached an optimal and consistent performance in the ninefold cross-validated testing set (sensitivity 93% and specificity 81%). In the multicenter cohort of fine-needle aspiration biopsy samples, the 10-gene signature, built into a multiplexed-qPCR IVD prototype, showed an area under the curve of 0.97, a positive predictive value of 78%, and a negative predictive value of 98%. By Bayes' theorem, the IVD prototype is expected to achieve a positive predictive value of 64–82% and a negative predictive value of 97–99% in patients with a cancer prevalence range of 20–40%. Conclusions: A new multiplexed-qPCR IVD prototype is reported that accurately classifies thyroid nodules and may provide a future solution suitable for local reference laboratory testing. PMID:28521616
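
    The Bayes' theorem step mentioned in this abstract, projecting predictive values across a range of prevalences, can be sketched as below. The sensitivity and specificity used here are illustrative assumptions only (the abstract does not report them for the fine-needle aspiration cohort), so the sketch is not expected to reproduce the published 64–82% and 97–99% ranges.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative (assumed) test characteristics, evaluated at the two ends
# of the 20-40% prevalence range quoted in the abstract.
for prevalence in (0.20, 0.40):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

    With fixed sensitivity and specificity, PPV rises and NPV falls as prevalence increases, which is why the abstract reports both values as ranges.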

  2. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms.

    PubMed

    Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott

    2010-04-01

    An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemesh/.
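
    The Z score normalization mentioned for the heat maps can be illustrated with a minimal sketch. It assumes normalization is applied per gene across samples using the sample standard deviation; the abstract does not say which convention GeneMesh uses, and the gene names and intensities below are made up.

```python
from statistics import mean, stdev


def z_scores(values):
    """Centre one gene's intensities on their mean and scale by the sample SD."""
    centre, spread = mean(values), stdev(values)
    if spread == 0:
        return [0.0 for _ in values]  # a flat profile carries no contrast
    return [(value - centre) / spread for value in values]


# Hypothetical expression intensities: one row per gene, one column per sample.
expression = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [100.0, 300.0, 200.0],
}
heatmap_rows = {gene: z_scores(row) for gene, row in expression.items()}
```

    After this step, genes with very different absolute intensities share one colour scale, so a heat map shows relative patterns rather than raw magnitude.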

  3. The Cure: Design and Evaluation of a Crowdsourcing Game for Gene Selection for Breast Cancer Survival Prediction

    PubMed Central

    Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2014-01-01

    Background Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. Objective The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player’s prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Methods We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Results Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. Conclusions The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge. PMID:25654473
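
    The voting step that turns game play into a gene ranking might look like the sketch below. The abstract does not specify the voting scheme, so this assumes the simplest one: each game casts one vote per gene the player selected, and genes are ranked by total votes. The gene picks shown are hypothetical.

```python
from collections import Counter


def rank_genes_by_votes(games):
    """Rank genes by the number of games in which they were selected.

    Each game counts a gene at most once; ties are broken alphabetically.
    """
    votes = Counter(gene for picks in games for gene in set(picks))
    return sorted(votes, key=lambda gene: (-votes[gene], gene))


# Hypothetical picks from three games.
games = [
    ["BRCA1", "TP53"],
    ["TP53", "ESR1"],
    ["TP53", "BRCA1"],
]
ranking = rank_genes_by_votes(games)
top_genes = ranking[:2]  # the head of the ranking would feed downstream evaluation
```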

  4. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
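
    The use of mutual information to find genes co-expressed with a transcription factor can be sketched as follows. This is a simplified illustration, not the study's pipeline: it assumes equal-width binning of expression values, a plug-in estimate of mutual information in bits, and an arbitrary threshold. The expression profiles and all names except HOXB2 are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of one expression profile into integer bin labels."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # guard against flat profiles
    return [min(int((value - low) / width), n_bins - 1) for value in values]


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in bits) between two label lists."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def coexpression_partners(tf_profile, profiles, threshold):
    """Return genes whose mutual information with the TF profile meets the threshold."""
    tf_bins = discretize(tf_profile)
    return sorted(
        gene
        for gene, profile in profiles.items()
        if mutual_information(tf_bins, discretize(profile)) >= threshold
    )


# Hypothetical expression across six tumours.
hoxb2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # tracks the TF
    "geneB": [5.0, 1.0, 5.0, 1.0, 5.0, 1.0],    # unrelated to the TF
}
network = coexpression_partners(hoxb2, profiles, threshold=1.0)
```

    Unlike a linear correlation coefficient, mutual information also captures non-linear dependence, but the estimate is sensitive to the number of bins and the sample size.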

  5. Incorporating networks in a probabilistic graphical model to find drivers for complex human diseases.

    PubMed

    Mezlini, Aziz M; Goldenberg, Anna

    2017-10-01

    Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.

  6. Developing a Synthetic Biology Toolkit for Comamonas testosteroni, an Emerging Cellular Chassis for Bioremediation.

    PubMed

    Tang, Qiang; Lu, Ting; Liu, Shuang-Jiang

    2018-06-12

    Synthetic biology is rapidly evolving into a new phase that emphasizes real-world applications such as environmental remediation. Recently, Comamonas testosteroni has become a promising chassis for bioremediation due to its natural pollutant-degrading capacity; however, its application is hindered by the lack of fundamental gene expression tools. Here, we present a synthetic biology toolkit that enables rapid creation of functional gene circuits in C. testosteroni. We first built a shuttle system that allows efficient circuit construction in E. coli and necessary phenotypic testing in C. testosteroni. Then, we tested a set of wildtype inducible promoters, and further used a hybrid strategy to create engineered promoters to expand expression strength and dynamics. Additionally, we tested the T7 RNA Polymerase-P T7 promoter system and reduced its leaky expression through promoter mutation for gene expression. By coupling random library construction with FACS screening, we further developed a synthetic T7 promoter library to confer a wider range of expression strength and dynamic characteristics. This study provides a set of valuable tools to engineer gene circuits in C. testosteroni, facilitating the establishment of the organism as a useful microbial chassis for bioremediation purposes.

  7. A formalin-fixed paraffin-embedded (FFPE)-based prognostic signature to predict metastasis in clinically low risk stage I/II microsatellite stable colorectal cancer.

    PubMed

    Low, Yee Syuen; Blöcker, Christopher; McPherson, John R; Tang, See Aik; Cheng, Ying Ying; Wong, Joyner Y S; Chua, Clarinda; Lim, Tony K H; Tang, Choong Leong; Chew, Min Hoe; Tan, Patrick; Tan, Iain B; Rozen, Steven G; Cheah, Peh Yean

    2017-09-10

    Approximately 20% of early-stage (I/II) colorectal cancer (CRC) patients develop metastases despite curative surgery. We aim to develop a formalin-fixed and paraffin-embedded (FFPE)-based predictor of metastases in early-stage, clinically-defined low risk, microsatellite-stable (MSS) CRC patients. We considered genome-wide mRNA and miRNA expression and mutation status of 20 genes assayed in 150 fresh-frozen tumours with known metastasis status. We selected 193 genes for further analysis using NanoString nCounter arrays on corresponding FFPE tumours. Neither mutation status nor miRNA expression improved the estimated prediction. The final predictor, ColoMet19, based on the top 19 genes' mRNA levels and trained with a Random Forest machine-learning strategy, had an estimated positive-predictive-value (PPV) of 0.66. We tested ColoMet19 on an independent test-set of 131 tumours and obtained a population-adjusted PPV of 0.67, indicating that early-stage CRC patients who tested positive have a 67% risk of developing metastases, substantially higher than the metastasis risk of 40% for node-positive (Stage III) patients who are generally treated with chemotherapy. Predicted-positive patients also had poorer metastasis-free survival (hazard ratios [HR] = 1.92, design-set; HR = 2.05, test-set). Thus, early-stage CRC patients who test positive may be considered for adjuvant therapy after surgery. Copyright © 2017 Elsevier B.V. All rights reserved.
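    As a rough illustration of what a population-adjusted PPV means, the sketch below applies Bayes' rule to a classifier's sensitivity and specificity at an assumed population prevalence. The abstract does not report the sensitivity, specificity, or prevalence used, so the numbers and the function name `adjusted_ppv` are hypothetical, and the authors' actual adjustment may differ.

```python
def adjusted_ppv(sensitivity, specificity, prevalence):
    """PPV at a given prevalence via Bayes' rule.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical inputs, NOT taken from the study.
    print(round(adjusted_ppv(sensitivity=0.5, specificity=0.9, prevalence=0.2), 3))
```

    The point of the adjustment is that a PPV measured in a case-enriched study sample overstates the PPV in a population where metastasis is rarer, so the value is recomputed at the population prevalence.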

  8. A New Primer Set to Amplify the Mitochondrial Cytochrome C Oxidase Subunit I (COI) Gene in the DHA-Rich Microalgae, the Genus Aurantiochytrium.

    PubMed

    Nishitani, Goh; Yoshida, Masaki

    2018-06-01

    This study was performed in order to develop a primer set for mitochondrial cytochrome c oxidase subunit I (COI) in the DHA-rich microalgae of the genus Aurantiochytrium. The performance of the primer set was tested using 12 Aurantiochytrium strains and other thraustochytrid species. There were no genetic polymorphisms in the mitochondrial sequences from the Aurantiochytrium strains, in contrast to the nuclear 18S rRNA gene sequence. This newly developed primer set amplified sequences from Aurantiochytrium and closely related genera, and may be useful for species identification and clarifying the genetic diversity of Aurantiochytrium in the field.

  9. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants.

    PubMed

    Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer

    2014-11-15

    Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test (a score test) with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  10. A Versatile Panel of Reference Gene Assays for the Measurement of Chicken mRNA by Quantitative PCR

    PubMed Central

    Maier, Helena J.; Van Borm, Steven; Young, John R.; Fife, Mark

    2016-01-01

    Quantitative real-time PCR assays are widely used for the quantification of mRNA within avian experimental samples. Multiple stably-expressed reference genes, selected for the lowest variation in representative samples, can be used to control random technical variation. Reference gene assays must be reliable, have high amplification specificity and efficiency, and not produce signals from contaminating DNA. Whilst recent research papers identify specific genes that are stable in particular tissues and experimental treatments, here we describe a panel of ten avian gene primer and probe sets that can be used to identify suitable reference genes in many experimental contexts. The panel was tested with TaqMan and SYBR Green systems in two experimental scenarios: a tissue collection and virus infection of cultured fibroblasts. GeNorm and NormFinder algorithms were able to select appropriate reference gene sets in each case. We show the effects of using the selected genes on the detection of statistically significant differences in expression. The results are compared with those obtained using 28s ribosomal RNA, the present most widely accepted reference gene in chicken work, identifying circumstances where its use might provide misleading results. Methods for eliminating DNA contamination of RNA reduced, but did not completely remove, detectable DNA. We therefore attached special importance to testing each qPCR assay for absence of signal using DNA template. The assays and analyses developed here provide a useful resource for selecting reference genes for investigations of avian biology. PMID:27537060
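    The reference-gene selection described above relies on stability measures. As a minimal sketch, the code below computes the gene-stability value M as commonly described for geNorm: for each gene, the mean over all other candidate genes of the standard deviation of the log2 expression ratios across samples. The function name `genorm_m` and the example quantities are hypothetical, and the sketch omits geNorm's stepwise exclusion of the least stable gene.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Stability value M per gene; lower M means more stable.

    `expr` maps gene name -> list of relative quantities, one per sample
    (all lists the same length, all values positive).
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of the two genes in each sample
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    # Hypothetical relative quantities for three candidate reference genes.
    quantities = {
        "geneA": [1.0, 2.0, 4.0],
        "geneB": [2.0, 4.0, 8.0],   # constant ratio to geneA
        "geneC": [1.0, 4.0, 1.0],   # varies independently
    }
    for gene, m in sorted(genorm_m(quantities).items(), key=lambda kv: kv[1]):
        print(gene, round(m, 3))
```

    Two genes whose ratio is constant across samples contribute zero pairwise variation to each other, which is why co-regulated candidates can look deceptively stable under this measure.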

  11. RNA polymerase I transcription in a Brassica interspecific hybrid and its progenitors: Tests of transcription factor involvement in nucleolar dominance.

    PubMed Central

    Frieman, M; Chen, Z J; Saez-Vasquez, J; Shen, L A; Pikaard, C S

    1999-01-01

    In interspecific hybrids or allopolyploids, often one parental set of ribosomal RNA genes is transcribed and the other is silent, an epigenetic phenomenon known as nucleolar dominance. Silencing is enforced by cytosine methylation and histone deacetylation, but the initial discrimination mechanism is unknown. One hypothesis is that a species-specific transcription factor is inactivated, thereby silencing one set of rRNA genes. Another is that dominant rRNA genes have higher binding affinities for limiting transcription factors. A third suggests that selective methylation of underdominant rRNA genes blocks transcription factor binding. We tested these hypotheses using Brassica napus (canola), an allotetraploid derived from B. rapa and B. oleracea in which only B. rapa rRNA genes are transcribed. B. oleracea and B. rapa rRNA genes were active when transfected into protoplasts of the other species, which argues against the species-specific transcription factor model. B. oleracea and B. rapa rRNA genes also competed equally for the pol I transcription machinery in vitro and in vivo. Cytosine methylation had no effect on rRNA gene transcription in vitro, which suggests that transcription factor binding was unimpaired. These data are inconsistent with the prevailing models and point to discrimination mechanisms that are likely to act at a chromosomal level. PMID:10224274

  12. Model-based gene set analysis for Bioconductor.

    PubMed

    Bauer, Sebastian; Robinson, Peter N; Gagneur, Julien

    2011-07-01

    Gene Ontology and other forms of gene-category analysis play a major role in the evaluation of high-throughput experiments in molecular biology. Single-category enrichment analysis procedures such as Fisher's exact test tend to flag large numbers of redundant categories as significant, which can complicate interpretation. We have recently developed an approach called model-based gene set analysis (MGSA), that substantially reduces the number of redundant categories returned by the gene-category analysis. In this work, we present the Bioconductor package mgsa, which makes the MGSA algorithm available to users of the R language. Our package provides a simple and flexible application programming interface for applying the approach. The mgsa package has been made available as part of Bioconductor 2.8. It is released under the conditions of the Artistic license 2.0. peter.robinson@charite.de; julien.gagneur@embl.de.
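    For context, the single-category enrichment test that the abstract contrasts with MGSA can be sketched as a one-sided Fisher's exact test, which is equivalent to a hypergeometric tail probability. This is a plain-Python illustration, not the mgsa package's API; the function name `enrichment_p_value` and its argument names are chosen here for illustration.

```python
from math import comb


def enrichment_p_value(population, category, study, overlap):
    """One-sided Fisher's exact test for over-representation.

    population: number of genes in the background
    category:   number of background genes annotated to the category
    study:      number of genes in the study (e.g. differentially expressed) set
    overlap:    number of study genes annotated to the category

    Returns P(X >= overlap) for X ~ Hypergeometric(population, category, study).
    """
    upper = min(category, study)
    total = comb(population, study)
    tail = sum(
        comb(category, k) * comb(population - category, study - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 background genes, 5 in the category,
    # 5 in the study set, all 5 study genes in the category.
    print(enrichment_p_value(10, 5, 5, 5))
```

    Because each category is tested in isolation, overlapping categories that share the same enriched genes all receive small p-values, which is the redundancy the abstract refers to.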

  13. A comprehensive custom panel design for routine hereditary cancer testing: preserving control, improving diagnostics and revealing a complex variation landscape.

    PubMed

    Castellanos, Elisabeth; Gel, Bernat; Rosas, Inma; Tornero, Eva; Santín, Sheila; Pluvinet, Raquel; Velasco, Juan; Sumoy, Lauro; Del Valle, Jesús; Perucho, Manuel; Blanco, Ignacio; Navarro, Matilde; Brunet, Joan; Pineda, Marta; Feliubadaló, Lidia; Capellá, Gabi; Lázaro, Conxi; Serra, Eduard

    2017-01-04

    We wanted to implement an NGS strategy to globally analyze hereditary cancer with diagnostic quality while retaining the same degree of understanding and control we had in pre-NGS strategies. To do this, we developed the I2HCP panel, a custom bait library covering 122 hereditary cancer genes. We improved bait design, tested different NGS platforms and created a clinically driven custom data analysis pipeline. The I2HCP panel was developed using a training set of hereditary colorectal cancer, hereditary breast and ovarian cancer and neurofibromatosis patients and reached an accuracy, analytical sensitivity and specificity greater than 99%, which was maintained in a validation set. I2HCP changed our diagnostic approach, involving clinicians and a genetic diagnostics team from panel design to reporting. The new strategy improved diagnostic sensitivity, solved uncertain clinical diagnoses and identified mutations in new genes. We assessed the genetic variation in the complete set of hereditary cancer genes, revealing a complex variation landscape that coexists with the disease-causing mutation. We developed, validated and implemented a custom NGS-based strategy for hereditary cancer diagnostics that improved our previous workflows. Additionally, the existence of a rich genetic variation in hereditary cancer genes favors the use of this panel to investigate their role in cancer risk.

  14. In vitro transcriptomic prediction of hepatotoxicity for early drug discovery

    PubMed Central

    Cheng, Feng; Theodorescu, Dan; Schulman, Ira G.; Lee, Jae K.

    2012-01-01

    Liver toxicity (hepatotoxicity) is a critical issue in drug discovery and development. Standard preclinical evaluation of drug hepatotoxicity is generally performed using in vivo animal systems. However, only a small number of preselected compounds can be examined in vivo due to high experimental costs. A more efficient yet accurate screening technique which can identify potentially hepatotoxic compounds in the early stages of drug development would thus be valuable. Here, we develop and apply a novel genomic prediction technique for screening hepatotoxic compounds based on in vitro human liver cell tests. Using a training set of in vivo rodent experiments for drug hepatotoxicity evaluation, we discovered common biomarkers of drug-induced liver toxicity among six heterogeneous compounds. This gene set was further triaged to a subset of 32 genes that can be used as a multi-gene expression signature to predict hepatotoxicity. This multi-gene predictor was independently validated and showed consistently high prediction performance on five test sets of in vitro human liver cell and in vivo animal toxicity experiments. The predictor also demonstrated utility in evaluating different degrees of toxicity in response to drug concentrations which may be useful not only for discerning a compound’s general hepatotoxicity but also for determining its toxic concentration. PMID:21884709

  15. The Classification of Sini Decoction Pattern in Traditional Chinese Medicine by Gene Expression Profiling

    PubMed Central

    Cheng, Hung-Tsu; Chen, Chaang-Ray; Li, Chia-Yang; Huang, Chao-Ying

    2016-01-01

    We investigated the syndromes of the Sini decoction pattern (SDP), a common ZHENG in traditional Chinese medicine (TCM). The syndromes of SDP were correlated with various severe Yang deficiency related symptoms. To obtain a common profile for SDP, we distributed questionnaires to 300 senior clinical TCM practitioners. According to the survey, we identified 2 sets of symptoms for SDP: (1) pulse feels deep or faint and (2) reversal cold of the extremities. Twenty-four individuals from Taipei City Hospital, Linsen Chinese Medicine Branch, Taiwan, were recruited. We extracted the total mRNA of peripheral blood mononuclear cells from the 24 individuals for microarray experiments. Twelve individuals (including 6 SDP patients and 6 non-SDP individuals) were used as the training set to identify biomarkers for distinguishing the SDP and non-SDP groups. The remaining 12 individuals were used as the test set. The test results indicated that the gene expression profiles of the identified biomarkers could effectively distinguish the 2 groups by adopting a hierarchical clustering algorithm. Our results suggest the feasibility of using the identified biomarkers in facilitating the diagnosis of TCM ZHENGs. Furthermore, the gene expression profiles of biomarker genes could provide a molecular explanation corresponding to the ZHENG of TCM. PMID:27200105

  16. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and makes no assumption about the distribution of the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarray (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set. © The Author(s) 2011.

  17. A 6-gene signature identifies four molecular subgroups of neuroblastoma

    PubMed Central

    2011-01-01

    Background There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB); Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. Results The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). Conclusions Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently being made to further investigate this group's specific characteristics. PMID:21492432

  18. Identification of a set of genes showing regionally enriched expression in the mouse brain

    PubMed Central

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM

    2008-01-01

    Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066

  19. Identification of a set of genes showing regionally enriched expression in the mouse brain.

    PubMed

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa L C; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven J M

    2008-07-14

    The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression.

  20. Microgravity and immunity: Changes in lymphocyte gene expression.

    NASA Astrophysics Data System (ADS)

    Risin, D.; Ward, N. E.; Risin, S. A.; Pellis, N. R.

    Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, and affects signal transduction mechanisms as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity. In the second set, activated T-cells after culturing for 24 hours in 1g and MMG were exposed three hours before harvesting to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18 400 human genes, were used for microarray analysis. The experiments were performed in triplicates with T-cells obtained from different blood donors to minimize the possible input of biological variation in gene expression and discriminate changes that are associated with the

  1. Detecting differentially expressed genes in heterogeneous diseases using half Student's t-test.

    PubMed

    Hsu, Chun-Lun; Lee, Wen-Chung

    2010-12-01

    Microarray technology provides information about hundreds and thousands of gene-expression data in a single experiment. To search for disease-related genes, researchers test for those genes that are differentially expressed between the case subjects and the control subjects. The authors propose a new test, the 'half Student's t-test', specifically for detecting differentially expressed genes in heterogeneous diseases. Monte-Carlo simulation shows that the test maintains the nominal α level quite well for both normal and non-normal distributions. Power of the half Student's t is higher than that of the conventional 'pooled' Student's t when there is heterogeneity in the disease under study. The power gain by using the half Student's t can reach ∼10% when the standard deviation of the case group is 50% larger than that of the control group. Application to a colon cancer data set reveals that when the false discovery rate (FDR) is controlled at 0.05, the half Student's t can detect 344 differentially expressed genes, whereas the pooled Student's t can detect only 65 genes. Alternatively, if only 50 genes are to be selected, the FDR for the pooled Student's t has to be set at 0.0320 (false positive rate of ∼3%), but for the half Student's t, it can be as low as 0.0001 (false positive rate of about one per ten thousand). The half Student's t-test is to be recommended for the detection of differentially expressed genes in heterogeneous diseases.
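    The abstract reports gene counts at a controlled false discovery rate but does not say which FDR procedure was used. As an assumption made here for illustration, the sketch below uses the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level `alpha`.

    Step-up rule: find the largest rank r (1-based, p-values sorted ascending)
    with p_(r) <= alpha * r / m, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical per-gene p-values from any differential-expression test.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

    A more powerful per-gene test produces smaller p-values for truly differentially expressed genes, so more of them clear the same FDR threshold, which is the kind of gain the abstract describes.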

  2. Quality controls in cellular immunotherapies: rapid assessment of clinical grade dendritic cells by gene expression profiling.

    PubMed

    Castiello, Luciano; Sabatino, Marianna; Zhao, Yingdong; Tumaini, Barbara; Ren, Jiaqiang; Ping, Jin; Wang, Ena; Wood, Lauren V; Marincola, Francesco M; Puri, Raj K; Stroncek, David F

    2013-02-01

    Cell-based immunotherapies are among the most promising approaches for developing effective and targeted immune response. However, their clinical usefulness and the evaluation of their efficacy rely heavily on complex quality control assessment. Therefore, rapid systematic methods are urgently needed for the in-depth characterization of relevant factors affecting newly developed cell product consistency and the identification of reliable markers for quality control. Using dendritic cells (DCs) as a model, we present a strategy to comprehensively characterize manufactured cellular products in order to define factors affecting their variability, quality and function. After generating clinical grade human monocyte-derived mature DCs (mDCs), we tested by gene expression profiling the degrees of product consistency related to the manufacturing process and variability due to intra- and interdonor factors, and how each factor affects single gene variation. Then, by calculating for each gene an index of variation, we selected candidate markers for identity testing, and defined a set of genes that may be useful as comparability and potency markers. Subsequently, we confirmed the observed gene index of variation in a larger clinical data set. In conclusion, using high-throughput technology we developed a method for the characterization of cellular therapies and the discovery of novel candidate quality assurance markers.

  3. A MAOA gene*cocaine severity interaction on impulsivity and neuropsychological measures of orbitofrontal dysfunction: preliminary results.

    PubMed

    Verdejo-García, Antonio; Albein-Urios, Natalia; Molina, Esther; Ching-López, Ana; Martínez-González, José M; Gutiérrez, Blanca

    2013-11-01

    Based on previous evidence of a MAOA gene*cocaine use interaction on orbitofrontal cortex volume attrition, we tested whether the MAOA low activity variant and cocaine use severity are interactively associated with impulsivity and behavioral indices of orbitofrontal dysfunction: emotion recognition and decision-making. 72 cocaine dependent individuals and 52 non-drug using controls (including healthy individuals and problem gamblers) were genotyped for the MAOA gene and tested using the UPPS-P Impulsive Behavior Scale, the Iowa Gambling Task and Ekman's Facial Emotions Recognition Test. To test the main hypothesis, we conducted hierarchical multiple regression analyses including three sets of predictors: (1) age, (2) MAOA genotype and severity of cocaine use, and (3) the interaction between MAOA genotype and severity of cocaine use. UPPS-P, Ekman Test and Iowa Gambling Task scores were the outcome measures. We computed the statistical significance of the prediction change yielded by each consecutive set, with 'a priori' interest in the MAOA*cocaine severity interaction. We found significant effects of the MAOA gene*cocaine use severity interaction on the emotion recognition scores and the UPPS-P dimensions of Positive Urgency and Sensation Seeking: Low activity carriers with higher cocaine exposure had poorer emotion recognition and higher Positive Urgency and Sensation Seeking. Cocaine users carrying the MAOA low activity variant show a greater impact of cocaine use on impulsivity and behavioral measures of orbitofrontal cortex dysfunction. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

    PubMed Central

    2009-01-01

    Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443

  5. Economic benefits of using adaptive predictive models of reproductive toxicity in the context of a tiered testing program

    EPA Science Inventory

    A predictive model of reproductive toxicity, as observed in rat multigeneration reproductive (MGR) studies, was previously developed using high throughput screening (HTS) data from 36 in vitro assays mapped to 8 genes or gene-sets from Phase I of USEPA ToxCast research program, t...

  6. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing.

    PubMed

    Mandelker, Diana; Schmidt, Ryan J; Ankala, Arunkanth; McDonald Gibson, Kristin; Bowser, Mark; Sharma, Himanshu; Duffy, Elizabeth; Hegde, Madhuri; Santani, Avni; Lebo, Matthew; Funke, Birgit

    2016-12-01

    Next-generation sequencing (NGS) is now routinely used to interrogate large sets of genes in a diagnostic setting. Regions of high sequence homology continue to be a major challenge for short-read technologies and can lead to false-positive and false-negative diagnostic errors. At the scale of whole-exome sequencing (WES), laboratories may be limited in their knowledge of genes and regions that pose technical hurdles due to high homology. We have created an exome-wide resource that catalogs highly homologous regions and is tailored toward diagnostic applications. This resource was developed using a mappability-based approach tailored to current Sanger and NGS protocols. Gene-level and exon-level lists delineate regions that are difficult or impossible to analyze via standard NGS. These regions are ranked by degree of affectedness, annotated for medical relevance, and classified by the type of homology (within-gene, different functional gene, known pseudogene, uncharacterized noncoding region). Additionally, we provide a list of exons that cannot be analyzed by short-amplicon Sanger sequencing. This resource can help guide clinical test design, supplemental assay implementation, and results interpretation in the context of high homology. Genet Med 18(12), 1282-1289.

  7. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.

  8. Reliable pre-eclampsia pathways based on multiple independent microarray data sets.

    PubMed

    Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo

    2015-02-01

    Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta has provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using data sets GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically-validated protocol was executed using five data sets including two other independent validation data sets, GSE30186 and GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways was found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Role of DISC1 interacting proteins in schizophrenia risk from genome-wide analysis of missense SNPs.

    PubMed

    Costas, Javier; Suárez-Rama, Jose Javier; Carrera, Noa; Paz, Eduardo; Páramo, Mario; Agra, Santiago; Brenlla, Julio; Ramos-Ríos, Ramón; Arrojo, Manuel

    2013-11-01

    A balanced translocation affecting DISC1 cosegregates with several psychiatric disorders, including schizophrenia, in a Scottish family. DISC1 is a hub protein of a network of protein-protein interactions involved in multiple developmental pathways within the brain. Gene set-based analysis has been proposed as an alternative to individual analysis of single nucleotide polymorphisms (SNPs) to get information from genome-wide association studies. In this work, we tested for an overrepresentation of the DISC1 interacting proteins within the top results of our ranked list of genes based on our previous genome-wide association study of missense SNPs in schizophrenia. Our data set consisted of 5100 common missense SNPs genotyped in 476 schizophrenic patients and 447 control subjects from Galicia, NW Spain. We used a modification of the Gene Set Enrichment Analysis adapted for SNPs, as implemented in the GenGen software. The analysis detected an overrepresentation of the DISC1 interacting proteins (permuted P-value=0.0158), indicative of the role of this gene set in schizophrenia risk. We identified seven leading-edge genes, MACF1, UTRN, DST, DISC1, KIF3A, SYNE1, and AKAP9, responsible for the overrepresentation. These genes are involved in neuronal cytoskeleton organization and intracellular transport through the microtubule cytoskeleton, suggesting that these processes may be impaired in schizophrenia. © 2013 John Wiley & Sons Ltd/University College London.
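    The overrepresentation test described above can be illustrated with a simplified enrichment score. This is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running sum, not the SNP-adapted procedure implemented in GenGen. As a further simplifying assumption, the null distribution is built by drawing random gene sets of equal size rather than by permuting phenotype labels. All function names are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum over a ranked list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = best = 0.0
    for is_hit in hits:
        # Step up at a set member, step down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """One-sided p-value for positive enrichment against random equal-size sets."""
    rng = random.Random(seed)
    genes = list(ranked_genes)
    observed = enrichment_score(genes, gene_set)
    size = sum(g in gene_set for g in genes)
    as_extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(genes, size))
        if enrichment_score(genes, random_set) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (as_extreme + 1) / (n_perm + 1)
```

    Genes that appear before the running sum reaches its maximum correspond to what the abstract calls leading-edge genes.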

  10. SYBR green-based real-time reverse transcription-PCR for typing and subtyping of all hemagglutinin and neuraminidase genes of avian influenza viruses and comparison to standard serological subtyping tests.

    PubMed

    Tsukamoto, Kenji; Panei, Carlos Javier; Shishido, Makiko; Noguchi, Daigo; Pearce, John; Kang, Hyun-Mi; Jeong, Ok Mi; Lee, Youn-Jeong; Nakanishi, Koji; Ashizawa, Takayoshi

    2012-01-01

    Continuing outbreaks of H5N1 highly pathogenic (HP) avian influenza virus (AIV) infections of wild birds and poultry worldwide emphasize the need for global surveillance of wild birds. To support the future surveillance activities, we developed a SYBR green-based, real-time reverse transcriptase PCR (rRT-PCR) for detecting nucleoprotein (NP) genes and subtyping 16 hemagglutinin (HA) and 9 neuraminidase (NA) genes simultaneously. Primers were improved by focusing on Eurasian or North American lineage genes; the number of mixed-base positions per primer was set to five or fewer, and the concentration of each primer set was optimized empirically. Also, 30 cycles of amplification of 1:10 dilutions of cDNAs from cultured viruses effectively reduced minor cross- or nonspecific reactions. Under these conditions, 346 HA and 345 NA genes of 349 AIVs were detected, with average sensitivities of NP, HA, and NA genes of 10^1.5, 10^2.3, and 10^3.1 50% egg infective doses, respectively. Utility of rRT-PCR for subtyping AIVs was compared with that of current standard serological tests by using 104 recent migratory duck virus isolates. As a result, all HA genes and 99% of the NA genes were genetically subtyped, while only 45% of HA genes and 74% of NA genes were serologically subtyped. Additionally, direct subtyping of AIVs in fecal samples was possible by 40 cycles of amplification: approximately 70% of HA and NA genes of NP gene-positive samples were successfully subtyped. This validation study indicates that rRT-PCR with optimized primers and reaction conditions is a powerful tool for subtyping varied AIVs in clinical and cultured samples.

  11. RAMONA: a Web application for gene set analysis on multilevel omics data.

    PubMed

    Sass, Steffen; Buettner, Florian; Mueller, Nikola S; Theis, Fabian J

    2015-01-01

    Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data, which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds. RAMONA is implemented as ASP.NET Web application and publicly available at http://icb.helmholtz-muenchen.de/ramona. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. The Contribution of Psychosocial Stress to the Obesity Epidemic

    PubMed Central

    Siervo, M.; Wells, J. C. K.; Cizza, G.

    2009-01-01

    The Thrifty Gene hypothesis theorizes that during evolution a set of genes has been selected to ensure survival in environments with limited food supply and marked seasonality. Contemporary environments have predictable and unlimited food availability, an attenuated seasonality due to artificial lighting, indoor heating during the winter and air conditioning during the summer, and promote sedentariness and overeating. In this setting the thrifty genes are constantly activated to enhance energy storage. Psychosocial stress and sleep deprivation are other features of modern societies. Stress-induced hypercortisolemia in the setting of unlimited food supply promotes adiposity. Modern man is becoming obese because these ancient mechanisms are efficiently promoting a positive energy balance. We propose that in today’s plentifully provisioned societies, where sedentariness and mental stress have become typical traits, chronic activation of the neuroendocrine systems may contribute to the increased prevalence of obesity. We suggest that some of the yet unidentified thrifty genes may be linked to highly conserved energy sensing mechanisms (AMP kinase, mTOR kinase). These hypotheses are testable. Rural societies that are becoming rapidly industrialized and are witnessing a dramatic increase in obesity may provide a historical opportunity to conduct epidemiological studies of the thrifty genotype. In experimental settings, the effects of various forms of psychosocial stress in increasing metabolic efficiency and gene expression can be further tested. PMID:19156597

  13. A Comprehensive Analysis of Nuclear-Encoded Mitochondrial Genes in Schizophrenia.

    PubMed

    Gonçalves, Vanessa F; Cappi, Carolina; Hagen, Christian M; Sequeira, Adolfo; Vawter, Marquis P; Derkach, Andriy; Zai, Clement C; Hedley, Paula L; Bybjerg-Grauholm, Jonas; Pouget, Jennie G; Cuperfain, Ari B; Sullivan, Patrick F; Christiansen, Michael; Kennedy, James L; Sun, Lei

    2018-05-01

    The genetic risk factors of schizophrenia (SCZ), a severe psychiatric disorder, are not yet fully understood. Multiple lines of evidence suggest that mitochondrial dysfunction may play a role in SCZ, but comprehensive association studies are lacking. We hypothesized that variants in nuclear-encoded mitochondrial genes influence susceptibility to SCZ. We conducted gene-based and gene-set analyses using summary association results from the Psychiatric Genomics Consortium Schizophrenia Phase 2 (PGC-SCZ2) genome-wide association study comprising 35,476 cases and 46,839 control subjects. We applied the MAGMA method to three sets of nuclear-encoded mitochondrial genes: oxidative phosphorylation genes, other nuclear-encoded mitochondrial genes, and genes involved in nucleus-mitochondria crosstalk. Furthermore, we conducted a replication study using the iPSYCH SCZ sample of 2290 cases and 21,621 control subjects. In the PGC-SCZ2 sample, 1186 mitochondrial genes were analyzed, among which 159 had p values < .05 and 19 remained significant after multiple testing correction. A meta-analysis of 818 genes combining the PGC-SCZ2 and iPSYCH samples resulted in 104 nominally significant and nine significant genes, suggesting a polygenic model for the nuclear-encoded mitochondrial genes. Gene-set analysis, however, did not show significant results. In an in silico protein-protein interaction network analysis, 14 mitochondrial genes interacted directly with 158 SCZ risk genes identified in PGC-SCZ2 (permutation p = .02), and aldosterone signaling in epithelial cells and mitochondrial dysfunction pathways appeared to be overrepresented in this network of mitochondrial and SCZ risk genes. This study provides evidence that specific aspects of mitochondrial function may play a role in SCZ, but we did not observe its broad involvement even using a large sample. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  14. Fully moderated T-statistic for small sample size gene expression arrays.

    PubMed

    Yu, Lianbo; Gulati, Parul; Fernandez, Soledad; Pennell, Michael; Kirschner, Lawrence; Jarjoura, David

    2011-09-15

    Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, identified a greater number of true positives in spike-in data, fit simulated data under varying assumptions very well, and in a real data set better identified higher expressing genes that were consistent with functional pathways associated with the experiments.
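    For context, the variance shrinkage behind the moderated t predecessors mentioned above can be sketched as follows. This is the classic form with a single prior variance and prior degrees of freedom shared by all genes, not the fully moderated statistic proposed in the abstract. The prior values are passed in as inputs here, which is an assumption made for brevity; in practice they are estimated from the whole set of genes.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Two-group t-statistic with the gene variance shrunk toward a prior.

    Returns the statistic and its degrees of freedom (prior plus residual).
    """
    n_a, n_b = len(group_a), len(group_b)
    resid_df = n_a + n_b - 2
    # Ordinary pooled variance of this gene across the two groups.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / resid_df
    # Weighted average of prior and observed variance, weighted by their df.
    shrunk_var = (prior_df * prior_var + resid_df * pooled_var) / (
        prior_df + resid_df
    )
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(
        shrunk_var * (1 / n_a + 1 / n_b)
    )
    return t_stat, prior_df + resid_df
```

    Setting `prior_df` to 0 recovers the ordinary pooled-variance t-statistic, which shows why shrinkage matters most when there are few replicates.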

  15. Multi-walled carbon nanotube-induced gene signatures in the mouse lung: potential predictive value for human lung cancer risk and prognosis

    PubMed Central

    Guo, Nancy L; Wan, Ying-Wooi; Denvir, James; Porter, Dale W; Pacurari, Maricica; Wolfarth, Michael G; Castranova, Vincent; Qian, Yong

    2012-01-01

    Concerns over the potential for multi-walled carbon nanotubes (MWCNT) to induce lung carcinogenesis have emerged. This study sought to (1) identify gene expression signatures in the mouse lungs following pharyngeal aspiration of well-dispersed MWCNT and (2) determine if these genes were associated with human lung cancer risk and progression. Genome-wide mRNA expression profiles were analyzed in mouse lungs (n=160) exposed to 0, 10, 20, 40, or 80 µg of MWCNT by pharyngeal aspiration at 1, 7, 28, and 56 days post-exposure. By using pairwise-Statistical Analysis of Microarray (SAM) and linear modeling, 24 genes were selected, which have significant changes in at least two time points, have a more than 1.5 fold change at all doses, and are significant in the linear model for the dose or the interaction of time and dose. Additionally, a 38-gene set was identified as related to cancer from 330 genes differentially expressed at day 56 post-exposure in functional pathway analysis. Using the expression profiles of the cancer-related gene set in 8 mice at day 56 post-exposure to 10 µg of MWCNT, a nearest centroid classification accurately predicts human lung cancer survival with a significant hazard ratio in training set (n=256) and test set (n=186). Furthermore, both gene signatures were associated with human lung cancer risk (n=164) with significant odds ratios. These results may lead to development of a surveillance approach for early detection of lung cancer and prognosis associated with MWCNT in the workplace. PMID:22891886
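    The nearest centroid classification mentioned above can be sketched in a few lines. The abstract does not state the distance measure or any preprocessing, so Euclidean distance on raw feature vectors is an assumption here, and the function names are hypothetical.

```python
from math import dist


def fit_centroids(samples, labels):
    """Return the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))
```

    In the setting described, each feature vector would hold the expression values of the signature genes for one sample, and the classes would be the outcome groups.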

  16. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association

    PubMed Central

    Grinde, Kelsey E.; Arbet, Jaron; Green, Alden; O'Connell, Michael; Valcarcel, Alessandra; Westra, Jason; Tintle, Nathan

    2017-01-01

    To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as “winner's curse.” We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10^-6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures. PMID:28959274

  17. Systems Genetics Analysis of GWAS reveals Novel Associations between Key Biological Processes and Coronary Artery Disease

    PubMed Central

    Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre FR; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth

    2016-01-01

    Objective Genome-wide association (GWA) studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Approaches and Results Employing pathways (gene sets) from Reactome, we carried out a two-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD GWAS data sets (9,889 cases/11,089 controls), nominally significant gene-sets were tested for replication in a meta-analysis of 9 additional studies (15,502 cases/55,730 controls) from the CARDIoGRAM Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication p<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix integrity, innate immunity, axon guidance, and signaling by PDGF, NOTCH, and the TGF-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (e.g. semaphorin regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared to random networks (p<0.001). Network centrality analysis (‘degree’ and ‘betweenness’) further identified genes (e.g. NCAM1, FYN, FURIN etc.) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. Conclusions These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. PMID:25977570

  18. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies

    PubMed Central

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-01-01

    For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare variants association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651

  19. Fuzzy measures on the Gene Ontology for gene product similarity.

    PubMed

    Popescu, Mihail; Keller, James M; Mitchell, Joyce A

    2006-01-01

    One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.

  20. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  1. Artificial neural network classifier predicts neuroblastoma patients' outcome.

    PubMed

    Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi

    2016-11-08

    More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the tumor aggressive phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized the Gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-Hypo signature in 182 neuroblastoma tumors to develop a MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent 82 tumors' set. The NB-hypo classifier predicted the patients' outcome with the remarkable accuracy of 87%. NB-hypo classifier prediction resulted in 2% classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100% accurate in assessing the death of five low/intermediate risk patients. GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.

  2. Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

    PubMed Central

    2009-01-01

    Background One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). 
Methods and assay sequences are reported in this paper. Conclusion This automated process allows laboratories to discover DNA variations in a short time and at low cost. PMID:19835634

  3. Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes.

    PubMed

    Bennett, Richard R; Schneider, Hal E; Estrella, Elicia; Burgess, Stephanie; Cheng, Andrew S; Barrett, Caitlin; Lip, Va; Lai, Poh San; Shen, Yiping; Wu, Bai-Lin; Darras, Basil T; Beggs, Alan H; Kunkel, Louis M

    2009-10-18

    One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay sequences are reported in this paper. This automated process allows laboratories to discover DNA variations in a short time and at low cost.

  4. "I Do Feel Like a Scientist at Times": A Qualitative Study of the Acceptability of Molecular Point-Of-Care Testing for Chlamydia and Gonorrhoea to Primary Care Professionals in a Remote High STI Burden Setting.

    PubMed

    Natoli, Lisa; Guy, Rebecca J; Shephard, Mark; Causer, Louise; Badman, Steven G; Hengel, Belinda; Tangey, Annie; Ward, James; Coburn, Tony; Anderson, David; Kaldor, John; Maher, Lisa

    2015-01-01

    Point-of-care tests for chlamydia (CT) and gonorrhoea (NG) could increase the uptake and timeliness of testing and treatment, contribute to improved disease control and reduce reproductive morbidity. The GeneXpert (Xpert CT/NG assay), suited to use at the point-of-care, is being used in the TTANGO randomised controlled trial (RCT) in 12 remote Australian health services with a high burden of sexually transmissible infections (STIs). This represents the first ever routine use of a molecular point-of-care diagnostic for STIs in primary care. The purpose of this study was to explore the acceptability of the GeneXpert to primary care staff in remote Australia. In-depth qualitative interviews were conducted with 16 staff (registered or enrolled nurses and Aboriginal Health Workers/Practitioners) trained and experienced with GeneXpert testing. Interviews were digitally-recorded and transcribed verbatim prior to content analysis. Most participants displayed positive attitudes, indicating the test was both easy to use and useful in their clinical context. Participants indicated that point-of-care testing had improved management of STIs, resulting in more timely and targeted treatment, earlier commencement of partner notification, and reduced follow up efforts associated with client recall. Staff expressed confidence in point-of-care test results and treating patients on this basis, and reported greater job satisfaction. While point-of-care testing did not negatively impact on client flow, several found the manual documentation processes time consuming, suggesting that improved electronic connectivity and test result transfer between the GeneXpert and patient management systems could overcome this. Managing positive test results in a shorter time frame was challenging for some but most found it satisfying to complete episodes of care more quickly. 
In the context of a RCT, health professionals working in remote primary care in Australia found the GeneXpert highly acceptable. These findings have implications for use in other primary care settings around the world.

  5. “I Do Feel Like a Scientist at Times”: A Qualitative Study of the Acceptability of Molecular Point-Of-Care Testing for Chlamydia and Gonorrhoea to Primary Care Professionals in a Remote High STI Burden Setting

    PubMed Central

    Natoli, Lisa; Guy, Rebecca J.; Shephard, Mark; Causer, Louise; Badman, Steven G.; Hengel, Belinda; Tangey, Annie; Ward, James; Coburn, Tony; Anderson, David; Kaldor, John; Maher, Lisa

    2015-01-01

    Background Point-of-care tests for chlamydia (CT) and gonorrhoea (NG) could increase the uptake and timeliness of testing and treatment, contribute to improved disease control and reduce reproductive morbidity. The GeneXpert (Xpert CT/NG assay), suited to use at the point-of-care, is being used in the TTANGO randomised controlled trial (RCT) in 12 remote Australian health services with a high burden of sexually transmissible infections (STIs). This represents the first ever routine use of a molecular point-of-care diagnostic for STIs in primary care. The purpose of this study was to explore the acceptability of the GeneXpert to primary care staff in remote Australia. Methods In-depth qualitative interviews were conducted with 16 staff (registered or enrolled nurses and Aboriginal Health Workers/Practitioners) trained and experienced with GeneXpert testing. Interviews were digitally-recorded and transcribed verbatim prior to content analysis. Results Most participants displayed positive attitudes, indicating the test was both easy to use and useful in their clinical context. Participants indicated that point-of-care testing had improved management of STIs, resulting in more timely and targeted treatment, earlier commencement of partner notification, and reduced follow up efforts associated with client recall. Staff expressed confidence in point-of-care test results and treating patients on this basis, and reported greater job satisfaction. While point-of-care testing did not negatively impact on client flow, several found the manual documentation processes time consuming, suggesting that improved electronic connectivity and test result transfer between the GeneXpert and patient management systems could overcome this. Managing positive test results in a shorter time frame was challenging for some but most found it satisfying to complete episodes of care more quickly. 
Conclusions In the context of a RCT, health professionals working in remote primary care in Australia found the GeneXpert highly acceptable. These findings have implications for use in other primary care settings around the world. PMID:26713441

  6. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

    PubMed

    Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

    2016-01-01

    The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. 
Novel variants of known resistance factors and incomplete genome assembly confounded the rules-based algorithm, resulting in predictions based on gene family, rather than on knowledge of the specific variant found. Low-frequency resistance caused errors in the machine-learning algorithm because those genes were not seen or seen infrequently in the test set. We also identified an example of variability in the phenotype-based results that led to disagreement with both genotype-based methods. Genotype-based antimicrobial susceptibility testing shows great promise as a diagnostic tool, and we outline specific research goals to further refine this methodology.
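
    The comparison described above, genotype-based predictions scored against phenotypic results, can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the rule table is a two-entry toy (not a curated database), and the isolate names, the observed phenotypes, and the function names `predict_rules` and `agreement` are all hypothetical.

```python
# Toy rule table (an assumption for illustration, not a curated resource):
# resistance gene -> antibiotics the gene is taken to confer resistance to.
RULES = {
    "blaCTX-M-15": {"ceftriaxone"},
    "blaKPC-2": {"ceftriaxone", "meropenem"},
}


def predict_rules(genes, antibiotics):
    """Call 'R' for every antibiotic covered by a detected gene, else 'S'."""
    resisted = set()
    for gene in genes:
        resisted |= RULES.get(gene, set())
    return {ab: ("R" if ab in resisted else "S") for ab in antibiotics}


def agreement(predicted, observed):
    """Fraction of isolate/antibiotic calls where prediction equals phenotype."""
    pairs = [(iso, ab) for iso in observed for ab in observed[iso]]
    if not pairs:
        raise ValueError("no calls to compare")
    hits = sum(predicted[iso][ab] == observed[iso][ab] for iso, ab in pairs)
    return hits / len(pairs)


# Hypothetical isolates with the resistance genes found in their genomes.
antibiotics = ["ceftriaxone", "meropenem"]
detected = {"iso1": ["blaCTX-M-15"], "iso2": ["blaKPC-2"], "iso3": []}

# Hypothetical phenotypic results; iso3 is resistant through a mechanism
# that the toy rule table does not know about.
observed = {
    "iso1": {"ceftriaxone": "R", "meropenem": "S"},
    "iso2": {"ceftriaxone": "R", "meropenem": "R"},
    "iso3": {"ceftriaxone": "R", "meropenem": "S"},
}

predicted = {iso: predict_rules(genes, antibiotics) for iso, genes in detected.items()}
score = agreement(predicted, observed)  # 5 of 6 calls match in this toy data
```

    The single mismatch (iso3, ceftriaxone) mirrors the kind of disagreement the abstract attributes to resistance factors missing from curated knowledge.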

  7. A methodology for the analysis of differential coexpression across the human lifespan.

    PubMed

    Gillis, Jesse; Pavlidis, Paul

    2009-09-22

    Differential coexpression is a change in coexpression between genes that may reflect 'rewiring' of transcriptional networks. It has previously been hypothesized that such changes might be occurring over time in the lifespan of an organism. While both coexpression and differential expression of genes have been previously studied in life stage change or aging, differential coexpression has not. Generalizing differential coexpression analysis to many time points presents a methodological challenge. Here we introduce a method for analyzing changes in coexpression across multiple ordered groups (e.g., over time) and extensively test its validity and usefulness. Our method is based on the use of the Haar basis set to efficiently represent changes in coexpression at multiple time scales, and thus represents a principled and generalizable extension of the idea of differential coexpression to life stage data. We used published microarray studies categorized by age to test the methodology. We validated the methodology by testing our ability to reconstruct Gene Ontology (GO) categories using our measure of differential coexpression and compared this result to using coexpression alone. Our method allows significant improvement in characterizing these groups of genes. Further, we examine the statistical properties of our measure of differential coexpression and establish that the results are significant both statistically and by an improvement in semantic similarity. In addition, we found that our method finds more significant changes in gene relationships compared to several other methods of expressing temporal relationships between genes, such as coexpression over time. Differential coexpression over age generates significant and biologically relevant information about the genes producing it. Our Haar basis methodology for determining age-related differential coexpression performs better than other tested methods. 
The Haar basis set also lends itself to ready interpretation in terms of both evolutionary and physiological mechanisms of aging and can be seen as a natural generalization of two-category differential coexpression.
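
    The idea of representing coexpression changes at multiple time scales with a Haar basis can be illustrated as follows. The abstract does not describe the exact computation, so this sketch makes assumptions: it takes one correlation value per ordered age group for a single gene pair (the values are hypothetical) and applies a plain averaging/differencing Haar transform. The function name `haar_coefficients` is illustrative.

```python
def haar_coefficients(values):
    """Unnormalized Haar transform of a sequence whose length is a power of two.

    Returns [overall mean, coarsest difference, ..., finest differences].
    Each difference is (left half mean - right half mean) / 2 at that scale.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        # Finer-scale differences go behind the coarser ones computed later.
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


# Hypothetical correlations between two genes in four ordered age groups.
correlation_by_age = [1.0, 0.5, 0.0, -0.5]
coeffs = haar_coefficients(correlation_by_age)
# coeffs[0] is the mean coexpression across all groups,
# coeffs[1] contrasts the two younger groups with the two older ones,
# coeffs[2:] contrast neighbouring groups within each half.
```

    With two groups the transform reduces to a mean and a single difference, which is the sense in which two-category differential coexpression is a special case.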

  8. Predicting degree of benefit from adjuvant trastuzumab in NSABP trial B-31.

    PubMed

    Pogue-Geile, Katherine L; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G; Fumagalli, Debora; Goldstein, Lynn C; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L; Remillard, Matthew Y; Blackmon, Nicole L; Kim, Seong-Rim; Horne, Zachary D; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H; Swain, Sandra M; Mamounas, Eleftherios P; Wickerham, D Lawrence; Geyer, Charles E; Costantino, Joseph P; Wolmark, Norman; Paik, Soonmyung

    2013-12-04

    National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P(interaction) between the model and trastuzumab < .001). We developed a gene expression-based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47).

  9. Predicting Degree of Benefit From Adjuvant Trastuzumab in NSABP Trial B-31

    PubMed Central

    Pogue-Geile, Katherine L.; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G.; Fumagalli, Debora; Goldstein, Lynn C.; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L.; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L.; Remillard, Matthew Y.; Blackmon, Nicole L.; Kim, Seong-Rim; Horne, Zachary D.; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H.; Swain, Sandra M.; Mamounas, Eleftherios P.; Wickerham, D. Lawrence; Geyer, Charles E.; Costantino, Joseph P.; Wolmark, Norman

    2013-01-01

    Background National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Methods Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Results Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P interaction between the model and trastuzumab < .001). Conclusions We developed a gene expression–based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47). PMID:24262440

  10. Down-weighting overlapping genes improves gene set analysis

    PubMed Central

    2012-01-01

    Background The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org. PMID:22713124
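
    The score described above, a mean of absolute weighted t-scores with weights favouring genes that appear in few gene sets, can be sketched as follows. The abstract does not give the weight formula, so the square-root rescaling to the range 1 to 2 used here is an assumption for illustration, as are the gene names, set names, and t-scores. This is not the PADOG R package, and it omits the moderated t-test itself and any significance assessment.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Weight genes by rarity: 2 for the rarest genes, 1 for the most shared."""
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:
        # Every gene appears equally often, so no down-weighting is possible.
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def set_score(genes, t_scores, weights):
    """Mean of |t| * weight over the genes of one set that have a t-score."""
    vals = [abs(t_scores[g]) * weights[g] for g in genes if g in t_scores]
    if not vals:
        raise ValueError("gene set has no measured genes")
    return sum(vals) / len(vals)


# Hypothetical collection in which g2 is shared by all three sets.
gene_sets = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g2", "g4"]}
# Hypothetical moderated t-scores from a differential expression analysis.
t_scores = {"g1": 2.0, "g2": -3.0, "g3": 0.5, "g4": 1.0}

weights = gene_weights(gene_sets)
scores = {name: set_score(genes, t_scores, weights) for name, genes in gene_sets.items()}
```

    In this toy collection the shared gene g2 has the largest t-score, yet set A still ranks first because its set-specific gene g1 carries twice the weight.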

  11. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452

  12. Genome-Wide Characterization of Transcriptional Patterns in High and Low Antibody Responders to Rubella Vaccination

    PubMed Central

    Haralambieva, Iana H.; Oberg, Ann L.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Grill, Diane E.; Middha, Sumit; Bot, Brian M.; Wang, Vivian W.; Smith, David I.; Jacobson, Robert M.; Poland, Gregory A.

    2013-01-01

    Immune responses to current rubella vaccines demonstrate significant inter-individual variability. We performed mRNA-Seq profiling on PBMCs from high and low antibody responders to rubella vaccination to delineate transcriptional differences upon viral stimulation. Generalized linear models were used to assess the per gene fold change (FC) for stimulated versus unstimulated samples or the interaction between outcome and stimulation. Model results were evaluated by both FC and p-value. Pathway analysis and self-contained gene set tests were performed for assessment of gene group effects. Of 17,566 detected genes, we identified 1,080 highly significant differentially expressed genes upon viral stimulation (p<1.00E−15, FDR<1.00E−14), including various immune function and inflammation-related genes, genes involved in cell signaling, cell regulation and transcription, and genes with unknown function. Analysis by immune outcome and stimulation status identified 27 genes (p≤0.0006 and FDR≤0.30) that responded differently to viral stimulation in high vs. low antibody responders, including major histocompatibility complex (MHC) class I genes (HLA-A, HLA-B and B2M with p = 0.0001, p = 0.0005 and p = 0.0002, respectively), and two genes related to innate immunity and inflammation (EMR3 and MEFV with p = 1.46E−08 and p = 0.0004, respectively). Pathway and gene set analysis also revealed transcriptional differences in antigen presentation and innate/inflammatory gene sets and pathways between high and low responders. Using mRNA-Seq genome-wide transcriptional profiling, we identified antigen presentation and innate/inflammatory genes that may assist in explaining rubella vaccine-induced immune response variations. Such information may provide new scientific insights into vaccine-induced immunity useful in rational vaccine development and immune response monitoring. PMID:23658707

  13. BubbleGUM: automatic extraction of phenotype molecular signatures and comprehensive visualization of multiple Gene Set Enrichment Analyses.

    PubMed

    Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong

    2015-10-19

    Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled up their focus from single-gene to gene set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistical or bioinformatic expertise. This is all the more the case when attempting to define a gene set specific to one condition compared to many other ones. Thus, there is a crucial need for easy-to-use software for generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that automatically extracts molecular signatures from transcriptomic data and performs exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is open-source software that automatically generates molecular signatures out of complex expression datasets and directly assesses their enrichment by GSEA on independent datasets. Enrichments are displayed in a graphical output that helps interpret the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarity between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html .

  14. Reliable measurement of E. coli single cell fluorescence distribution using a standard microscope set-up.

    PubMed

    Cortesi, Marilisa; Bandiera, Lucia; Pasini, Alice; Bevilacqua, Alessandro; Gherardi, Alessandro; Furini, Simone; Giordano, Emanuele

    2017-01-01

    Quantifying gene expression at single cell level is fundamental for the complete characterization of synthetic gene circuits, due to the significant impact of noise and inter-cellular variability on the system's functionality. Commercial set-ups that allow the acquisition of fluorescent signal at single cell level (flow cytometers or quantitative microscopes) are expensive apparatuses that are hardly affordable for small laboratories. A protocol that makes a standard optical microscope able to acquire quantitative, single cell, fluorescent data from a bacterial population transformed with synthetic gene circuitry is presented. Single cell fluorescence values, acquired with a microscope set-up and processed with custom-made software, are compared with results that were obtained with a flow cytometer in a bacterial population transformed with the same gene circuitry. The high correlation between data from the two experimental set-ups, with a correlation coefficient computed over the tested dynamic range > 0.99, proves that a standard optical microscope, when coupled with appropriate software for image processing, might be used for quantitative single-cell fluorescence measurements. The calibration of the set-up, together with its validation, is described. The experimental protocol described in this paper makes quantitative measurement of single cell fluorescence accessible to laboratories equipped with standard optical microscope set-ups. Our method allows for an affordable measurement/quantification of intercellular variability, a better understanding of which will improve our comprehension of cellular behaviors and the design of synthetic gene circuits. All the required software is freely available to the synthetic biology community (MUSIQ Microscope flUorescence SIngle cell Quantification).

  15. Knockdown of the schizophrenia susceptibility gene TCF4 alters gene expression and proliferation of progenitor cells from the developing human neocortex.

    PubMed

    Hill, Matthew J; Killick, Richard; Navarrete, Katherinne; Maruszak, Aleksandra; McLaughlin, Gemma M; Williams, Brenda P; Bray, Nicholas J

    2017-05-01

    Common variants in the TCF4 gene are among the most robustly supported genetic risk factors for schizophrenia. Rare TCF4 deletions and loss-of-function point mutations cause Pitt-Hopkins syndrome, a developmental disorder associated with severe intellectual disability. To explore molecular and cellular mechanisms by which TCF4 perturbation could interfere with human cortical development, we experimentally reduced the endogenous expression of TCF4 in a neural progenitor cell line derived from the developing human cerebral cortex using RNA interference. Effects on genome-wide gene expression were assessed by microarray, followed by Gene Ontology and pathway analysis of differentially expressed genes. We tested for genetic association between the set of differentially expressed genes and schizophrenia using genome-wide association study data from the Psychiatric Genomics Consortium and competitive gene set analysis (MAGMA). Effects on cell proliferation were assessed using high content imaging. Genes that were differentially expressed following TCF4 knockdown were highly enriched for involvement in the cell cycle. There was a nonsignificant trend for genetic association between the differentially expressed gene set and schizophrenia. Consistent with the gene expression data, TCF4 knockdown was associated with reduced proliferation of cortical progenitor cells in vitro. A detailed mechanistic explanation of how TCF4 knockdown alters human neural progenitor cell proliferation is not provided by this study. Our data indicate effects of TCF4 perturbation on human cortical progenitor cell proliferation, a process that could contribute to cognitive deficits in individuals with Pitt-Hopkins syndrome and risk for schizophrenia.

  16. Placental invasion, preeclampsia risk and adaptive molecular evolution at the origin of the great apes: evidence from genome-wide analyses.

    PubMed

    Crosley, E J; Elliot, M G; Christians, J K; Crespi, B J

    2013-02-01

    Recent evidence from chimpanzees and gorillas has raised doubts that preeclampsia is a uniquely human disease. The deep extravillous trophoblast (EVT) invasion and spiral artery remodeling that characterizes our placenta (and is abnormal in preeclampsia) is shared within great apes, setting Homininae apart from Hylobatidae and Old World Monkeys, which show much shallower trophoblast invasion and limited spiral artery remodeling. We hypothesize that the evolution of a more invasive placenta in the lineage ancestral to the great apes involved positive selection on genes crucial to EVT invasion and spiral artery remodeling. Furthermore, identification of placentally-expressed genes under selection in this lineage may identify novel genes involved in placental development. We tested for positive selection in approximately 18,000 genes using the ratio of non-synonymous to synonymous amino acid substitution for protein-coding DNA. DAVID Bioinformatics Resources identified biological processes enriched in positively selected genes, including processes related to EVT invasion and spiral artery remodeling. Analyses revealed 295 and 264 genes under significant positive selection on the branches ancestral to Hominidae (Human, Chimp, Gorilla, Orangutan) and Homininae (Human, Chimp, Gorilla), respectively. Gene ontology analysis of these gene sets demonstrated significant enrichments for several functional gene clusters relevant to preeclampsia risk, and sets of placentally-expressed genes that have been linked with preeclampsia and/or trophoblast invasion in other studies. Our study represents a novel approach to the identification of candidate genes and amino acid residues involved in placental pathologies by implicating them in the evolution of highly-invasive placenta.

  17. Systems Biology-Based Identification of Mycobacterium tuberculosis Persistence Genes in Mouse Lungs

    PubMed Central

    Dutta, Noton K.; Bandyopadhyay, Nirmalya; Veeramani, Balaji; Lamichhane, Gyanu; Karakousis, Petros C.; Bader, Joel S.

    2014-01-01

    Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. PMID:24549847

  18. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    A microarray is an orderly arrangement of thousands of gene sequences placed in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to analyze thousands of genes simultaneously, in a massively parallel manner, in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T² statistic is not positive definite; it is singular and cannot be inverted. In this research, Hotelling's T² statistic is combined with a shrinkage approach as an alternative estimator of the covariance matrix to detect significant gene sets. The shrinkage covariance matrix overcomes the singularity problem by replacing the unbiased estimator of the covariance matrix with an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
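
    The singularity problem described in this abstract can be illustrated with a minimal sketch. It computes a two-sample Hotelling's T-squared statistic after shrinking the pooled covariance toward its diagonal. The diagonal target, the fixed intensity `lam`, and all function names are assumptions made for illustration; the robust trimmed-mean estimator mentioned in the abstract is not implemented here.

```python
# Sketch only: two-sample Hotelling's T^2 with a covariance matrix shrunk
# toward its diagonal. Target and intensity `lam` are illustrative assumptions.

def mean_vector(rows):
    """Column means of a list of samples (rows = samples, columns = genes)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(x, y):
    """Unbiased pooled sample covariance of two groups."""
    p = len(x[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x, y):
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x) + len(y) - 2
    return [[v / dof for v in row] for row in s]

def shrink_to_diagonal(s, lam):
    """S* = (1 - lam) * S + lam * diag(S): off-diagonals are scaled by (1 - lam)."""
    p = len(s)
    return [[s[i][j] if i == j else (1.0 - lam) * s[i][j] for j in range(p)]
            for i in range(p)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def hotelling_t2_shrunk(x, y, lam=0.2):
    """T^2 = n1*n2/(n1+n2) * d' S*^{-1} d, with d the difference of group means."""
    n1, n2 = len(x), len(y)
    d = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    s_star = shrink_to_diagonal(pooled_covariance(x, y), lam)
    w = solve(s_star, d)
    return (n1 * n2) / (n1 + n2) * sum(di * wi for di, wi in zip(d, w))

# Toy gene set: 3 genes but only 2 samples per group, so the unshrunk
# pooled covariance has rank 2 and cannot be inverted.
group_a = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0]]
group_b = [[3.0, 2.0, 1.0], [5.0, 1.0, 2.0]]
print(hotelling_t2_shrunk(group_a, group_b, lam=0.2))
```

    With `lam` strictly between 0 and 1 and every gene having nonzero variance, the shrunk matrix is positive definite, so the linear solve is well defined even when there are more genes than samples. Setting `lam` to 0 recovers the singular case and the solve fails.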

  19. Genomic evidence of reactive oxygen species elevation in papillary thyroid carcinoma with Hashimoto thyroiditis.

    PubMed

    Yi, Jin Wook; Park, Ji Yeon; Sung, Ji-Youn; Kwak, Sang Hyuk; Yu, Jihan; Chang, Ji Hyun; Kim, Jo-Heon; Ha, Sang Yun; Paik, Eun Kyung; Lee, Woo Seung; Kim, Su-Jin; Lee, Kyu Eun; Kim, Ju Han

    2015-01-01

    Elevated levels of reactive oxygen species (ROS) have been proposed as a risk factor for the development of papillary thyroid carcinoma (PTC) in patients with Hashimoto thyroiditis (HT). However, it has yet to be proven that the total levels of ROS are sufficiently increased to contribute to carcinogenesis. We hypothesized that if the ROS levels were increased in HT, ROS-related genes would also be differentially expressed in PTC with HT. To find differentially expressed genes (DEGs), we analyzed RNA sequencing gene expression data from The Cancer Genome Atlas: 33 samples from normal thyroid tissue, 232 from PTC without HT, and 60 from PTC with HT. We prepared 402 ROS-related genes from three gene sets by genomic database searching. We also analyzed a public microarray data set to validate our results. Thirty-three ROS-related genes were up-regulated in PTC with HT, whereas only nine were up-regulated in PTC without HT (chi-square p-value < 0.001). The mean log2 fold change of up-regulated genes was 0.562 in the HT group and 0.252 in the PTC without HT group (t-test p-value = 0.001). In the microarray data analysis, 12 of 32 ROS-related genes showed the same differential expression pattern with statistical significance. In gene ontology analysis, up-regulated ROS-related genes were related to ROS metabolism and apoptosis. In Gene Set Enrichment Analysis, immune function-related and carcinogenesis-related gene sets were enriched only in the HT group. Our results suggest that ROS levels may be increased in PTC with HT. Increased levels of ROS may contribute to PTC development in patients with HT.

  20. Selection and testing of reference genes for accurate RT-qPCR in rice seedlings under iron toxicity.

    PubMed

    Santos, Fabiane Igansi de Castro Dos; Marini, Naciele; Santos, Railson Schreinert Dos; Hoffman, Bianca Silva Fernandes; Alves-Ferreira, Marcio; de Oliveira, Antonio Costa

    2018-01-01

    Reverse transcription quantitative PCR (RT-qPCR) is a technique for gene expression profiling with high sensitivity and reproducibility. However, accurate results depend on data normalization using endogenous reference genes whose expression is constitutive or invariable. Although the technique is widely used in plant stress analyses, the stability of reference genes under iron toxicity in rice (Oryza sativa L.) has not been thoroughly investigated. Here, we tested a set of candidate reference genes for use in rice under this stressful condition. The test was performed using four distinct methods: NormFinder, BestKeeper, geNorm and the comparative ΔCt. To achieve reproducible and reliable results, Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines were followed. Valid reference genes were found for shoot (P2, OsGAPDH and OsNABP), root (OsEF-1a, P8 and OsGAPDH) and root+shoot (OsNABP, OsGAPDH and P8), enabling us to perform further reliable studies of iron toxicity in both indica and japonica subspecies. The importance of studying genes other than the traditional endogenous ones for use as normalizers is also shown here.
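
    Of the four methods named in this abstract, the comparative ΔCt approach is simple enough to sketch. For every pair of candidate genes it takes the per-sample Ct difference and its standard deviation across samples; a gene whose average pairwise standard deviation is low is considered stable. The function name, gene labels and Ct values below are invented for illustration and are not data from the study.

```python
from itertools import combinations
from statistics import mean, stdev

def delta_ct_stability(ct):
    """Rank candidate reference genes by mean pairwise SD of delta-Ct.

    ct maps a gene name to its Ct values, one per sample, with the same
    sample order for every gene. Needs at least two genes and two samples.
    Returns (score, gene) tuples sorted from most to least stable.
    """
    pair_sd = {gene: [] for gene in ct}
    for g1, g2 in combinations(ct, 2):
        diffs = [a - b for a, b in zip(ct[g1], ct[g2])]
        sd = stdev(diffs)  # variability of this pair's Ct difference
        pair_sd[g1].append(sd)
        pair_sd[g2].append(sd)
    return sorted((mean(sds), gene) for gene, sds in pair_sd.items())

# Hypothetical Ct values for three made-up candidates across four samples.
toy_ct = {
    "ref_a": [20.0, 21.0, 22.0, 23.0],
    "ref_b": [18.0, 19.0, 20.0, 21.0],  # tracks ref_a at a constant offset
    "ref_c": [25.0, 24.0, 27.0, 22.0],  # fluctuates independently
}
for score, gene in delta_ct_stability(toy_ct):
    print(gene, round(score, 3))
```

    In this toy input, `ref_a` and `ref_b` differ by a constant, so their pairwise standard deviation is zero and both rank ahead of `ref_c`.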

  1. Selection and testing of reference genes for accurate RT-qPCR in rice seedlings under iron toxicity

    PubMed Central

    dos Santos, Fabiane Igansi de Castro; Marini, Naciele; dos Santos, Railson Schreinert; Hoffman, Bianca Silva Fernandes; Alves-Ferreira, Marcio

    2018-01-01

    Reverse transcription quantitative PCR (RT-qPCR) is a technique for gene expression profiling with high sensitivity and reproducibility. However, accurate results depend on data normalization using endogenous reference genes whose expression is constitutive or invariable. Although the technique is widely used in plant stress analyses, the stability of reference genes under iron toxicity in rice (Oryza sativa L.) has not been thoroughly investigated. Here, we tested a set of candidate reference genes for use in rice under this stressful condition. The test was performed using four distinct methods: NormFinder, BestKeeper, geNorm and the comparative ΔCt. To achieve reproducible and reliable results, Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines were followed. Valid reference genes were found for shoot (P2, OsGAPDH and OsNABP), root (OsEF-1a, P8 and OsGAPDH) and root+shoot (OsNABP, OsGAPDH and P8), enabling us to perform further reliable studies of iron toxicity in both indica and japonica subspecies. The importance of studying genes other than the traditional endogenous ones for use as normalizers is also shown here. PMID:29494624

  2. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting.

    PubMed

    Zhao, Wei; Ware, Erin B; He, Zihuai; Kardia, Sharon L R; Faul, Jessica D; Smith, Jennifer A

    2017-09-29

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) ( p = 0.07).

  3. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting

    PubMed Central

    Zhao, Wei; He, Zihuai; Kardia, Sharon L. R.; Faul, Jessica D.

    2017-01-01

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) (p = 0.07). PMID:28961216

  4. Exposure to Nickel, Chromium, or Cadmium Causes Distinct Changes in the Gene Expression Patterns of a Rat Liver Derived Cell Line

    DTIC Science & Technology

    2011-11-16

    nickel, cadmium, and chromium are toxic industrial chemicals with an exposure. While these substances are known to produce adverse health effects leading...in both occupational and environmental settings that may cause harmful outcomes. While these substances are known to produce adverse health effects...that particular bin. A chi-squared test was used to test bin enrichment (p ≤ 0.05). Probe sets that did not contain any biological process annotation were

  5. DNA methylation of phosphatase and actin regulator 3 detects colorectal cancer in stool and complements FIT.

    PubMed

    Bosch, Linda J W; Oort, Frank A; Neerincx, Maarten; Khalid-de Bakker, Carolina A J; Terhaar sive Droste, Jochim S; Melotte, Veerle; Jonkers, Daisy M A E; Masclee, Ad A M; Mongera, Sandra; Grooteclaes, Madeleine; Louwagie, Joost; van Criekinge, Wim; Coupé, Veerle M H; Mulder, Chris J; van Engeland, Manon; Carvalho, Beatriz; Meijer, Gerrit A

    2012-03-01

    Using a bioinformatics-based strategy, we set out to identify hypermethylated genes that could serve as biomarkers for early detection of colorectal cancer (CRC) in stool. In addition, the complementary value to a Fecal Immunochemical Test (FIT) was evaluated. Candidate genes were selected by applying cluster alignment and computational analysis of promoter regions to microarray-expression data of colorectal adenomas and carcinomas. DNA methylation was measured by quantitative methylation-specific PCR on 34 normal colon mucosa, 71 advanced adenoma, and 64 CRC tissues. Performance as a biomarker was tested in whole stool samples from a total of 193 subjects, including 19 with advanced adenoma and 66 with CRC. For a large proportion of these series, methylation data for GATA4 and OSMR were available for comparison. The complementary value to FIT was measured in stool subsamples from 92 subjects, including 44 with advanced adenoma or CRC. Phosphatase and Actin Regulator 3 (PHACTR3) was identified as a novel hypermethylated gene showing more than 70-fold increased DNA methylation levels in advanced neoplasia compared with normal colon mucosa. In a stool training set, PHACTR3 methylation showed a sensitivity of 55% (95% CI: 33-75) for CRC and a specificity of 95% (95% CI: 87-98). In a stool validation set, sensitivity reached 66% (95% CI: 50-79) for CRC and 32% (95% CI: 14-57) for advanced adenomas at a specificity of 100% (95% CI: 86-100). Adding PHACTR3 methylation to FIT increased sensitivity for CRC by up to 15%. PHACTR3 is a new hypermethylated gene in CRC with a good performance in stool DNA testing and has complementary value to FIT.

  6. Analysis of high-throughput biological data using their rank values.

    PubMed

    Dembélé, Doulaye

    2018-01-01

    High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra and the Perron-Frobenius theorem, and we extend a method presented earlier for finding differentially expressed genes to the detection of recurrent copy number aberrations. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros

  7. Integrating mean and variance heterogeneities to identify differentially expressed genes.

    PubMed

    Ouyang, Weiwei; An, Qiang; Zhao, Jinying; Qin, Huaizhen

    2016-12-06

    In functional genomics studies, tests on mean heterogeneity have been widely employed to identify differentially expressed genes with distinct mean expression levels under different experimental conditions. Variance heterogeneity (i.e., the difference between condition-specific variances) of gene expression levels is simply neglected or calibrated for as an impediment. The mean heterogeneity in the expression level of a gene reflects one aspect of its distribution alteration, and variance heterogeneity induced by condition change may reflect another aspect. Change in condition may alter both the mean and some higher-order characteristics of the distributions of expression levels of susceptible genes. In this report, we put forth the concept of mean-variance differentially expressed (MVDE) genes, whose expression means and variances are sensitive to the change in experimental condition. We mathematically proved the null independence of existing mean heterogeneity tests and variance heterogeneity tests. Based on this independence, we proposed an integrative mean-variance test (IMVT) to combine gene-wise mean heterogeneity and variance heterogeneity induced by condition change. The IMVT outperformed its competitors under comprehensive simulations of normality and Laplace settings. For moderate samples, the IMVT controlled type I error rates well, as did the existing mean heterogeneity tests (i.e., the Welch t test (WT) and the moderated Welch t test (MWT)) and the procedure of separate tests on mean and variance heterogeneities (SMVT), but the likelihood ratio test (LRT) severely inflated type I error rates. In the presence of variance heterogeneity, the IMVT appeared noticeably more powerful than all the valid mean heterogeneity tests. Application to the gene profiles of peripheral circulating B cells provided solid evidence of informative variance heterogeneity. After adjusting for background data structure, the IMVT replicated previous discoveries and identified novel experiment-wide significant MVDE genes. Our results indicate tremendous potential gain from integrating informative variance heterogeneity after adjusting for global confounders and background data structure. The proposed informative integration test better summarizes the impacts of condition change on expression distributions of susceptible genes than do the existing competitors. Therefore, particular attention should be paid to explicitly exploiting the variance heterogeneity induced by condition change in functional genomics analysis.
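
    The abstract does not state how the IMVT combines the two gene-wise tests, so the following is only a sketch of one standard option that relies on the null independence it reports: Fisher's method for two independent p-values. The function name and the example p-values are assumptions, and this should not be read as the authors' procedure.

```python
import math

def fisher_combine_two(p_mean, p_var):
    """Combine two independent p-values with Fisher's method.

    Under the null, X = -2 * (ln p1 + ln p2) follows a chi-square
    distribution with 4 degrees of freedom, whose survival function has
    the closed form exp(-x / 2) * (1 + x / 2).
    """
    if not (0.0 < p_mean <= 1.0 and 0.0 < p_var <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * (math.log(p_mean) + math.log(p_var))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Hypothetical gene: modest evidence from a mean test and a variance test.
print(fisher_combine_two(0.05, 0.05))
```

    The closed form is valid only when the two p-values are independent under the null hypothesis, which is why a proof of null independence matters for any combination rule of this kind.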

  8. Analyzing the most frequent disease loci in targeted patient categories optimizes disease gene identification and test accuracy worldwide.

    PubMed

    Lebo, Roger V; Tonk, Vijay S

    2015-01-21

    Our genomewide studies support targeted testing of the most frequent genetic diseases by patient category: (1) pregnant patients, (2) at-risk conceptuses, (3) affected children, and (4) abnormal adults. This approach not only identifies most reported disease-causing sequences accurately, but also minimizes incorrectly identified additional disease-causing loci. Diseases were grouped in descending order of occurrence from four data sets: (1) GeneTests 534 listed population prevalences, (2) 4129 high risk prenatal karyotypes, (3) 1265 affected patient microarrays, and (4) reanalysis of 25,452 asymptomatic patient results screened prenatally for 108 genetic diseases. These most frequent diseases are categorized by transmission: (A) autosomal recessive, (B) X-linked, (C) autosomal dominant, (D) microscopic chromosome rearrangements, (E) submicroscopic copy number changes, and (F) frequent ethnic diseases. Among affected and carrier patients worldwide, most reported mutant genes would be identified correctly according to one of four patient categories from at-risk couples with <64 tested genes to affected adults with 314 tested loci. Three clinically reported patient series confirmed this approach. First, only 54 targeted chromosomal sites would have detected all 938 microscopically visible unbalanced karyotypes among 4129 karyotyped POC, CVS, and amniocentesis samples. Second, 37 of 48 reported aneuploid regions were found among our 1265 clinical microarrays confirming the locations of 8 schizophrenia loci and 20 aneuploidies altering intellectual ability, while also identifying 9 of the most frequent deletion syndromes. Third, testing 15 frequent genes would have identified 124 couples with a 1 in 4 risk of a fetus with a recessive disease compared to the 127 couples identified by testing all 108 genes, while testing all mutations in 15 genes could have identified more couples. Testing the most frequent disease-causing abnormalities in 1 of 8 reported disease loci [~1 of 84 total genes] will identify ~7 of 8 reported abnormal Caucasian newborn genotypes. This would eliminate ~8 to 10 of ~10 Caucasian newborn gene sequences selected as abnormal that are actually normal variants identified when testing all ~2500 diseases looking for the remaining 1 of 8 disease-causing genes. This approach enables more accurate testing within available laboratory and reimbursement resources.

  9. FUN-L: gene prioritization for RNAi screens.

    PubMed

    Lees, Jonathan G; Hériché, Jean-Karim; Morilla, Ian; Fernández, José M; Adler, Priit; Krallinger, Martin; Vilo, Jaak; Valencia, Alfonso; Ellenberg, Jan; Ranea, Juan A; Orengo, Christine

    2015-06-15

    Most biological processes remain only partially characterized with many components still to be identified. Given that a whole genome can usually not be tested in a functional assay, identifying the genes most likely to be of interest is of critical importance to avoid wasting resources. Given a set of known functionally related genes and using a state-of-the-art approach to data integration and mining, our Functional Lists (FUN-L) method provides a ranked list of candidate genes for testing. Validation of predictions from FUN-L with independent RNAi screens confirms that FUN-L-produced lists are enriched in genes with the expected phenotypes. In this article, we describe a website front end to FUN-L. The website is freely available to use at http://funl.org

  10. Development of a versatile enrichment analysis tool reveals associations between the maternal brain and mental health disorders, including autism

    PubMed Central

    2013-01-01

    Background: A recent study of lateral septum (LS) suggested a large number of autism-related genes with altered expression in the postpartum state. However, formally testing the findings for enrichment of autism-associated genes proved to be problematic with existing software. Many gene-disease association databases have been curated which are not currently incorporated in popular, full-featured enrichment tools, and the use of custom gene lists in these programs can be difficult to perform and interpret. As a simple alternative, we have developed the Modular Single-set Enrichment Test (MSET), a minimal tool that enables one to easily evaluate expression data for enrichment of any conceivable gene list of interest. Results: The MSET approach was validated by testing several publicly available expression data sets for expected enrichment in areas of autism, attention deficit hyperactivity disorder (ADHD), and arthritis. Using nine independent, unique autism gene lists extracted from association databases and two recent publications, a striking consensus of enrichment was detected within gene expression changes in LS of postpartum mice. A network of 160 autism-related genes was identified, representing developmental processes such as synaptic plasticity, neuronal morphogenesis, and differentiation. Additionally, maternal LS displayed enrichment for genes associated with bipolar disorder, schizophrenia, ADHD, and depression. Conclusions: The transition to motherhood includes the most fundamental social bonding event in mammals and features naturally occurring changes in sociability. Some individuals with autism, schizophrenia, or other mental health disorders exhibit impaired social traits. Genes involved in these deficits may also contribute to elevated sociability in the maternal brain. To date, this is the first study to show a significant, quantitative link between the maternal brain and mental health disorders using large scale gene expression data. Thus, the postpartum brain may provide a novel and promising platform for understanding the complex genetics of improved sociability that may have direct relevance for multiple psychiatric illnesses. This study also provides an important new tool that fills a critical analysis gap and makes evaluation of enrichment using any database of interest possible with an emphasis on ease of use and methodological transparency. PMID:24245670

  11. Development of a versatile enrichment analysis tool reveals associations between the maternal brain and mental health disorders, including autism.

    PubMed

    Eisinger, Brian E; Saul, Michael C; Driessen, Terri M; Gammie, Stephen C

    2013-11-19

    A recent study of lateral septum (LS) suggested a large number of autism-related genes with altered expression in the postpartum state. However, formally testing the findings for enrichment of autism-associated genes proved to be problematic with existing software. Many gene-disease association databases have been curated which are not currently incorporated in popular, full-featured enrichment tools, and the use of custom gene lists in these programs can be difficult to perform and interpret. As a simple alternative, we have developed the Modular Single-set Enrichment Test (MSET), a minimal tool that enables one to easily evaluate expression data for enrichment of any conceivable gene list of interest. The MSET approach was validated by testing several publicly available expression data sets for expected enrichment in areas of autism, attention deficit hyperactivity disorder (ADHD), and arthritis. Using nine independent, unique autism gene lists extracted from association databases and two recent publications, a striking consensus of enrichment was detected within gene expression changes in LS of postpartum mice. A network of 160 autism-related genes was identified, representing developmental processes such as synaptic plasticity, neuronal morphogenesis, and differentiation. Additionally, maternal LS displayed enrichment for genes associated with bipolar disorder, schizophrenia, ADHD, and depression. The transition to motherhood includes the most fundamental social bonding event in mammals and features naturally occurring changes in sociability. Some individuals with autism, schizophrenia, or other mental health disorders exhibit impaired social traits. Genes involved in these deficits may also contribute to elevated sociability in the maternal brain. To date, this is the first study to show a significant, quantitative link between the maternal brain and mental health disorders using large scale gene expression data. Thus, the postpartum brain may provide a novel and promising platform for understanding the complex genetics of improved sociability that may have direct relevance for multiple psychiatric illnesses. This study also provides an important new tool that fills a critical analysis gap and makes evaluation of enrichment using any database of interest possible with an emphasis on ease of use and methodological transparency.
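
    Testing a single custom gene list for enrichment among the genes that changed expression can be sketched with a one-sided hypergeometric test. This is a common stand-in chosen for brevity and is not claimed to be the procedure MSET implements; the function name, parameter names and counts are hypothetical.

```python
from math import comb

def enrichment_p_value(universe_size, list_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes measured in the experiment
    list_size:     genes from the custom list present in that universe
    hits_size:     genes called differentially expressed
    overlap:       genes that are both on the list and called
    """
    upper = min(list_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(list_size, k) * comb(universe_size - list_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 2000 measured genes, 40 on the list, 100 called, 8 shared.
print(enrichment_p_value(2000, 40, 100, 8))
```

    The gene universe should be restricted to genes actually measured in the expression experiment; using the whole genome as background would overstate the apparent enrichment.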

  12. Testing genotyping strategies for ultra-deep sequencing of a co-amplifying gene family: MHC class I in a passerine bird.

    PubMed

    Biedrzycka, Aleksandra; Sebastian, Alvaro; Migalska, Magdalena; Westerdahl, Helena; Radwan, Jacek

    2017-07-01

    Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co-amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra-deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500-20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within-method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co-amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co-amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage. © 2016 John Wiley & Sons Ltd.

  13. Selection of suitable reference genes for normalization of genes of interest in canine soft tissue sarcomas using quantitative real-time polymerase chain reaction.

    PubMed

    Zornhagen, K W; Kristensen, A T; Hansen, A E; Oxboel, J; Kjaer, A

    2015-12-01

    Quantitative real-time reverse transcription polymerase chain reaction (RT-qPCR) is a sensitive technique for quantifying gene expression. Stably expressed reference genes are necessary for normalization of RT-qPCR data. Only a few articles have been published on reference genes in canine tumours. The objective of this study was to demonstrate how to identify suitable reference genes for normalization of genes of interest in canine soft tissue sarcomas using RT-qPCR. Primer pairs for 17 potential reference genes were designed and tested in archival tumour biopsies from six dogs. The geNorm algorithm was used to analyse the most suitable reference genes. Eight potential reference genes were excluded from this final analysis because of their dissociation curves. β-Glucuronidase (GUSB) and proteasome subunit, beta type, 6 (PSMB6) were most stably expressed with an M value of 0.154 and a CV of 0.053 describing their average stability. We suggest that choice of reference genes should be based on specific testing in every new experimental set-up. © 2014 John Wiley & Sons Ltd.
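
    The M value reported above is the geNorm stability measure. As a rough illustration, M for a candidate gene is usually defined as the mean, over all other candidates, of the standard deviation of the log2 expression ratio across samples; lower M means more stable expression. The sketch below assumes relative quantities on a linear scale and uses the sample standard deviation. The gene labels and numbers are made up and are not data from the study.

```python
import math
import statistics

def genorm_m(expr):
    """Return gene -> M value.

    expr maps each candidate reference gene to a list of relative
    expression quantities (linear scale), one entry per sample.
    """
    genes = list(expr)
    m = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: SD of log2 ratios across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m

# Hypothetical toy input: REF_A and REF_B keep a constant ratio, NOISY does not.
toy = {
    "REF_A": [1.0, 2.0, 4.0, 8.0],
    "REF_B": [2.0, 4.0, 8.0, 16.0],
    "NOISY": [1.0, 8.0, 2.0, 16.0],
}
m_values = genorm_m(toy)
```

    In the full geNorm procedure the gene with the highest M is removed and the values are recomputed until the most stable pair remains; that iteration is omitted here.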

  14. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature

    PubMed Central

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285

  15. ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain.

    PubMed

    Grote, Steffi; Prüfer, Kay; Kelso, Janet; Dannemann, Michael

    2016-10-15

    We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). Contact: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. Genome-wide association study of borderline personality disorder reveals genetic overlap with bipolar disorder, major depression and schizophrenia.

    PubMed

    Witt, S H; Streit, F; Jungkunz, M; Frank, J; Awasthi, S; Reinbold, C S; Treutlein, J; Degenhardt, F; Forstner, A J; Heilmann-Heimbach, S; Dietl, L; Schwarze, C E; Schendel, D; Strohmaier, J; Abdellaoui, A; Adolfsson, R; Air, T M; Akil, H; Alda, M; Alliey-Rodriguez, N; Andreassen, O A; Babadjanova, G; Bass, N J; Bauer, M; Baune, B T; Bellivier, F; Bergen, S; Bethell, A; Biernacka, J M; Blackwood, D H R; Boks, M P; Boomsma, D I; Børglum, A D; Borrmann-Hassenbach, M; Brennan, P; Budde, M; Buttenschøn, H N; Byrne, E M; Cervantes, P; Clarke, T-K; Craddock, N; Cruceanu, C; Curtis, D; Czerski, P M; Dannlowski, U; Davis, T; de Geus, E J C; Di Florio, A; Djurovic, S; Domenici, E; Edenberg, H J; Etain, B; Fischer, S B; Forty, L; Fraser, C; Frye, M A; Fullerton, J M; Gade, K; Gershon, E S; Giegling, I; Gordon, S D; Gordon-Smith, K; Grabe, H J; Green, E K; Greenwood, T A; Grigoroiu-Serbanescu, M; Guzman-Parra, J; Hall, L S; Hamshere, M; Hauser, J; Hautzinger, M; Heilbronner, U; Herms, S; Hitturlingappa, S; Hoffmann, P; Holmans, P; Hottenga, J-J; Jamain, S; Jones, I; Jones, L A; Juréus, A; Kahn, R S; Kammerer-Ciernioch, J; Kirov, G; Kittel-Schneider, S; Kloiber, S; Knott, S V; Kogevinas, M; Landén, M; Leber, M; Leboyer, M; Li, Q S; Lissowska, J; Lucae, S; Martin, N G; Mayoral-Cleries, F; McElroy, S L; McIntosh, A M; McKay, J D; McQuillin, A; Medland, S E; Middeldorp, C M; Milaneschi, Y; Mitchell, P B; Montgomery, G W; Morken, G; Mors, O; Mühleisen, T W; Müller-Myhsok, B; Myers, R M; Nievergelt, C M; Nurnberger, J I; O'Donovan, M C; Loohuis, L M O; Ophoff, R; Oruc, L; Owen, M J; Paciga, S A; Penninx, B W J H; Perry, A; Pfennig, A; Potash, J B; Preisig, M; Reif, A; Rivas, F; Rouleau, G A; Schofield, P R; Schulze, T G; Schwarz, M; Scott, L; Sinnamon, G C B; Stahl, E A; Strauss, J; Turecki, G; Van der Auwera, S; Vedder, H; Vincent, J B; Willemsen, G; Witt, C C; Wray, N R; Xi, H S; Tadic, A; Dahmen, N; Schott, B H; Cichon, S; Nöthen, M M; Ripke, S; Mobascher, A; Rujescu, D; 
Lieb, K; Roepke, S; Schmahl, C; Bohus, M; Rietschel, M

    2017-06-20

    Borderline personality disorder (BOR) is determined by environmental and genetic factors, and characterized by affective instability and impulsivity, diagnostic symptoms also observed in manic phases of bipolar disorder (BIP). Up to 20% of BIP patients show comorbidity with BOR. This report describes the first case-control genome-wide association study (GWAS) of BOR, performed in one of the largest BOR patient samples worldwide. The focus of our analysis was (i) to detect genes and gene sets involved in BOR and (ii) to investigate the genetic overlap with BIP. As there is considerable genetic overlap between BIP, major depression (MDD) and schizophrenia (SCZ) and a high comorbidity of BOR and MDD, we also analyzed the genetic overlap of BOR with SCZ and MDD. GWAS, gene-based tests and gene-set analyses were performed in 998 BOR patients and 1545 controls. Linkage disequilibrium score regression was used to detect the genetic overlap between BOR and these disorders. Single marker analysis revealed no significant association after correction for multiple testing. Gene-based analysis yielded two significant genes: DPYD (P=4.42 × 10⁻⁷) and PKP4 (P=8.67 × 10⁻⁷); and gene-set analysis yielded a significant finding for exocytosis (GO:0006887, P_FDR=0.019; FDR, false discovery rate). Prior studies have implicated DPYD, PKP4 and exocytosis in BIP and SCZ. The most notable finding of the present study was the genetic overlap of BOR with BIP (r_g=0.28 [P=2.99 × 10⁻³]), SCZ (r_g=0.34 [P=4.37 × 10⁻⁵]) and MDD (r_g=0.57 [P=1.04 × 10⁻³]). We believe our study is the first to demonstrate that BOR overlaps with BIP, MDD and SCZ on the genetic level. Whether this is confined to transdiagnostic clinical symptoms should be examined in future studies.

  17. Quantitative Detection of the nosZ Gene, Encoding Nitrous Oxide Reductase, and Comparison of the Abundances of 16S rRNA, narG, nirK, and nosZ Genes in Soils

    PubMed Central

    Henry, S.; Bru, D.; Stres, B.; Hallet, S.; Philippot, L.

    2006-01-01

    Nitrous oxide (N2O) is an important greenhouse gas in the troposphere controlling ozone concentration in the stratosphere through nitric oxide production. In order to quantify bacteria capable of N2O reduction, we developed a SYBR green quantitative real-time PCR assay targeting the nosZ gene encoding the catalytic subunit of the nitrous oxide reductase. Two independent sets of nosZ primers flanking the nosZ fragment previously used in diversity studies were designed and tested (K. Kloos, A. Mergel, C. Rösch, and H. Bothe, Aust. J. Plant Physiol. 28:991-998, 2001). The utility of these real-time PCR assays was demonstrated by quantifying the nosZ gene present in six different soils. Detection limits were between 10¹ and 10² target molecules per reaction for all assays. Sequence analysis of 128 cloned quantitative PCR products confirmed the specificity of the designed primers. The abundance of nosZ genes ranged from 10⁵ to 10⁷ target copies g⁻¹ of dry soil, whereas genes for 16S rRNA were found at 10⁸ to 10⁹ target copies g⁻¹ of dry soil. The abundance of narG and nirK genes was within the upper and lower limits of the 16S rRNA and nosZ gene copy numbers. The two sets of nosZ primers gave similar gene copy numbers for all tested soils. The maximum abundance of nosZ and nirK relative to 16S rRNA was 5 to 6%, confirming the low proportion of denitrifiers to total bacteria in soils. PMID:16885263

  18. Mutation in an alternative transcript of CDKL5 in a boy with early-onset seizures.

    PubMed

    Bodian, Dale L; Schreiber, John M; Vilboux, Thierry; Khromykh, Alina; Hauser, Natalie S

    2018-06-01

    Infantile-onset epilepsies are a set of severe, heterogeneous disorders for which clinical genetic testing yields causative mutations in ∼20%-50% of affected individuals. We report the case of a boy presenting with intractable seizures at 2 wk of age, for whom gene panel testing was unrevealing. Research-based whole-genome sequencing of the proband and four unaffected family members identified a de novo mutation, NM_001323289.1:c.2828_2829delGA in CDKL5, a gene associated with X-linked early infantile epileptic encephalopathy 2. CDKL5 has multiple alternative transcripts, and the mutation lies in an exon in the brain-expressed forms. The mutation was undetected by gene panel sequencing because of its intronic location in the CDKL5 transcript typically used to define the exons of this gene for clinical exon-based tests (NM_003159). This is the first report of a patient with a mutation in an alternative transcript of CDKL5. This finding suggests that incorporating alternative transcripts into the design and variant interpretation of exon-based tests, including gene panel and exome sequencing, could improve the diagnostic yield. © 2018 Bodian et al.; Published by Cold Spring Harbor Laboratory Press.

  19. Mutation in an alternative transcript of CDKL5 in a boy with early-onset seizures

    PubMed Central

    Bodian, Dale L.; Schreiber, John M.; Vilboux, Thierry; Khromykh, Alina; Hauser, Natalie S.

    2018-01-01

    Infantile-onset epilepsies are a set of severe, heterogeneous disorders for which clinical genetic testing yields causative mutations in ∼20%–50% of affected individuals. We report the case of a boy presenting with intractable seizures at 2 wk of age, for whom gene panel testing was unrevealing. Research-based whole-genome sequencing of the proband and four unaffected family members identified a de novo mutation, NM_001323289.1:c.2828_2829delGA in CDKL5, a gene associated with X-linked early infantile epileptic encephalopathy 2. CDKL5 has multiple alternative transcripts, and the mutation lies in an exon in the brain-expressed forms. The mutation was undetected by gene panel sequencing because of its intronic location in the CDKL5 transcript typically used to define the exons of this gene for clinical exon-based tests (NM_003159). This is the first report of a patient with a mutation in an alternative transcript of CDKL5. This finding suggests that incorporating alternative transcripts into the design and variant interpretation of exon-based tests, including gene panel and exome sequencing, could improve the diagnostic yield. PMID:29444904

  20. Sensitive and specific detection of early gastric cancer with DNA methylation analysis of gastric washes.

    PubMed

    Watanabe, Yoshiyuki; Kim, Hyun Soo; Castoro, Ryan J; Chung, Woonbok; Estecio, Marcos R H; Kondo, Kimie; Guo, Yi; Ahmed, Saira S; Toyota, Minoru; Itoh, Fumio; Suk, Ki Tae; Cho, Mee-Yon; Shen, Lanlan; Jelinek, Jaroslav; Issa, Jean-Pierre J

    2009-06-01

    Aberrant DNA methylation is an early and frequent process in gastric carcinogenesis and could be useful for detection of gastric neoplasia. We hypothesized that methylation analysis of DNA recovered from gastric washes could be used to detect gastric cancer. We studied 51 candidate genes in 7 gastric cancer cell lines and 24 samples (training set) and identified 6 for further studies. We examined the methylation status of these genes in a test set consisting of 131 gastric neoplasias at various stages. Finally, we validated the 6 candidate genes in a different population of 40 primary gastric cancer samples and 113 nonneoplastic gastric mucosa samples. Six genes (MINT25, RORA, GDNF, ADAM23, PRDM5, MLF1) showed frequent differential methylation between gastric cancer and normal mucosa in the training, test, and validation sets. GDNF and MINT25 were most sensitive molecular markers of early stage gastric cancer, whereas PRDM5 and MLF1 were markers of a field defect. There was a close correlation (r = 0.5-0.9, P = .03-.001) between methylation levels in tumor biopsy and gastric washes. MINT25 methylation had the best sensitivity (90%), specificity (96%), and area under the receiver operating characteristic curve (0.961) in terms of tumor detection in gastric washes. These findings suggest MINT25 is a sensitive and specific marker for screening in gastric cancer. Additionally, we have developed a new method for gastric cancer detection by DNA methylation in gastric washes.

  1. Discovery of error-tolerant biclusters from noisy gene expression data.

    PubMed

    Gupta, Rohit; Rao, Navneet; Kumar, Vipin

    2011-11-24

    An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining approaches also produce biclusters as their result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as these methods only work with binary or boolean attributes, their application to gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. 
For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover a larger set of genes than those obtained from the RAP approach but also have higher functional coherence as evaluated using GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters, as estimated by two randomization tests, reveals that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued breast cancer microarray gene-expression data sets and evaluated the biomarkers obtained using MSigDB gene sets. The results obtained for both problems, functional module discovery and biomarker discovery, clearly demonstrate the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.

  2. Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence.

    PubMed

    Liu, Gang; Mukherjee, Bhramar; Lee, Seunggeun; Lee, Alice W; Wu, Anna H; Bandera, Elisa V; Jensen, Allan; Rossing, Mary Anne; Moysich, Kirsten B; Chang-Claude, Jenny; Doherty, Jennifer A; Gentry-Maharaj, Aleksandra; Kiemeney, Lambertus; Gayther, Simon A; Modugno, Francesmary; Massuger, Leon; Goode, Ellen L; Fridley, Brooke L; Terry, Kathryn L; Cramer, Daniel W; Ramus, Susan J; Anton-Culver, Hoda; Ziogas, Argyrios; Tyrer, Jonathan P; Schildkraut, Joellen M; Kjaer, Susanne K; Webb, Penelope M; Ness, Roberta B; Menon, Usha; Berchuck, Andrew; Pharoah, Paul D; Risch, Harvey; Pearce, Celeste Leigh

    2018-02-01

    There have been recent proposals advocating the use of additive gene-environment interaction instead of the widely used multiplicative scale, as a more relevant public health measure. Using gene-environment independence enhances statistical power for testing multiplicative interaction in case-control studies. However, under departure from this assumption, substantial bias in the estimates and inflated type I error in the corresponding tests can occur. In this paper, we extend the empirical Bayes (EB) approach previously developed for multiplicative interaction, which trades off between bias and efficiency in a data-adaptive way, to the additive scale. An EB estimator of the relative excess risk due to interaction is derived, and the corresponding Wald test is proposed with a general regression setting under a retrospective likelihood framework. We study the impact of gene-environment association on the resultant test with case-control data. Our simulation studies suggest that the EB approach uses the gene-environment independence assumption in a data-adaptive way and provides a gain in power compared with the standard logistic regression analysis and better control of type I error when compared with the analysis assuming gene-environment independence. We illustrate the methods with data from the Ovarian Cancer Association Consortium. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
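
    For orientation, the relative excess risk due to interaction (RERI) mentioned above is conventionally defined as RR11 − RR10 − RR01 + 1, with odds ratios standing in for relative risks in case-control data. The sketch below computes it from the coefficients of a standard logistic model with a gene term, an exposure term, and their product. It illustrates only the plain logistic-regression version, not the empirical Bayes estimator proposed in the abstract, and the function and parameter names are hypothetical.

```python
import math

def reri_from_logistic(beta_g, beta_e, beta_ge):
    """RERI from log-odds coefficients of gene, exposure, and their product term."""
    or_g = math.exp(beta_g)                       # gene only
    or_e = math.exp(beta_e)                       # exposure only
    or_both = math.exp(beta_g + beta_e + beta_ge)  # gene and exposure
    return or_both - or_g - or_e + 1.0

# No multiplicative interaction (beta_ge = 0) can still give additive interaction:
# OR_g = 2, OR_e = 3, OR_both = 6, so RERI = 6 - 2 - 3 + 1 = 2.
example = reri_from_logistic(math.log(2), math.log(3), 0.0)
```

    The example shows why the two scales answer different questions: a model with no product-term effect still yields a non-zero RERI whenever both main effects are non-null.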

  3. Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

    PubMed

    Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J

    2006-01-01

    One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
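
    To make the TFIDF weighting concrete, here is a minimal sketch of one common variant: term frequency within a gene's keyword list multiplied by the log of the inverse document frequency across genes. The abstract does not give the exact formula the authors used, so this variant, the function name, and the toy keyword lists are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs maps gene -> non-empty list of keyword tokens; returns gene -> {keyword: weight}."""
    n_docs = len(docs)
    # Document frequency: in how many genes' keyword lists each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for gene, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights

# Hypothetical keyword lists extracted from abstracts for three genes.
docs = {
    "geneA": ["kinase", "synapse", "synapse", "protein"],
    "geneB": ["kinase", "apoptosis", "protein"],
    "geneC": ["transport", "protein"],
}
w = tfidf(docs)
```

    A keyword shared by every gene ("protein" here) receives weight zero, which is the property that makes the scheme useful for separating genes into functionally distinct clusters.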

  4. Identification of a 5‑lncRNA signature‑based risk scoring system for survival prediction in colorectal cancer.

    PubMed

    Gu, Liqiang; Yu, Jun; Wang, Qing; Xu, Bin; Ji, Liechen; Yu, Lin; Zhang, Xipeng; Cai, Hui

    2018-05-03

    The present study aimed to investigate potential prognostic long noncoding RNAs (lncRNAs) associated with colorectal cancer (CRC). An mRNA‑seq dataset obtained from The Cancer Genome Atlas was employed to identify the differentially expressed lncRNAs (DELs) between CRC patients with good and poor prognoses. Subsequently, univariate and multivariate Cox regression analyses were conducted to analyze the prognosis‑associated lncRNAs among all DELs. In addition, a risk scoring system was developed according to the expression levels of the prognostic lncRNAs, which was then applied to a training set and an independent testing set. Furthermore, the co‑expressed genes of prognostic lncRNAs were screened using a Multi‑Experiment Matrix online tool for construction of lncRNA‑gene networks. Finally, Kyoto Encyclopedia of Genes and Genomes pathway and Gene Ontology (GO) function enrichment analyses were performed on genes in the lncRNA‑gene networks using KOBAS, GOATOOLS and ClusterProfiler. The present study identified 82 DELs, of which long intergenic nonprotein coding RNA 2159, RP11‑452L6.6, RP11‑894P9.1 and RP11‑69M1.6, and whey acidic protein four‑disulfide core domain 21 (WFDC21P) were reported to be independently associated with the prognosis of patients with CRC. A 5‑lncRNA signature‑based risk scoring system was developed, which may be used to classify patients into low‑ and high‑risk groups with significantly different recurrence‑free survival times in the training and testing sets (P<0.05). Co‑expressed genes of WFDC21P or RP11‑69M1.6 were utilized to construct the lncRNA‑gene networks. Genes in the networks were significantly enriched in 'tight junction', 'focal adhesion' and 'regulation of actin cytoskeleton' pathways, and numerous GO terms associated with 'reactive oxygen species metabolism' and 'nitric oxide metabolism'. 
The present study proposed a 5‑lncRNA signature‑based risk scoring system for predicting the prognosis of patients with CRC, and revealed the associated signaling pathways and biological processes. The results of the present study may help improve prognostic evaluation in clinical practice.
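
    A signature-based risk score of this kind is typically a linear combination of expression levels weighted by Cox regression coefficients, with patients split into groups at a cutoff. The sketch below assumes that form and a median cutoff; neither detail is confirmed by the abstract. The coefficient values, gene labels, and patient data are invented for illustration.

```python
import statistics

def risk_scores(expression, coefficients):
    """Score each patient as the coefficient-weighted sum of expression values."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }

def split_by_median(scores):
    """Label patients 'high' if strictly above the median score, else 'low'."""
    cutoff = statistics.median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}

# Hypothetical two-gene signature and four patients.
coefficients = {"lnc1": 0.8, "lnc2": -0.5}
expression = {
    "p1": {"lnc1": 2.0, "lnc2": 1.0},
    "p2": {"lnc1": 0.5, "lnc2": 2.0},
    "p3": {"lnc1": 1.0, "lnc2": 1.0},
    "p4": {"lnc1": 3.0, "lnc2": 0.0},
}
scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```

    When validating on an independent testing set, the coefficients and the cutoff would normally be fixed from the training set rather than re-estimated.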

  5. Logical analysis of diffuse large B-cell lymphomas.

    PubMed

    Alexe, G; Alexe, S; Axelrod, D E; Hammer, P L; Weissmann, D

    2005-07-01

    The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al., which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics, optimisation, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify different informative genes than those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that if considered jointly are capable of accurately distinguishing different types of lymphoma. The central concept of LAD is a pattern or combinatorial biomarker, a concept that resembles a rule as used in decision tree methods. LAD is able to exhaustively generate the collection of all those patterns which satisfy certain quality constraints, through a systematic combinatorial process guided by clear optimization criteria. Then, based on a set covering approach, LAD aggregates the collection of patterns into classification models. In addition, LAD is able to use the information provided by large collections of patterns in order to extract subsets of variables, which collectively are able to distinguish between different types of disease. For the differential diagnosis of DLBCL versus FL, a model based on eight significant genes is constructed and shown to have a sensitivity of 94.7% and a specificity of 100% on the test set. For the prognosis of good versus poor outcome among the DLBCL patients, a model is constructed on another set consisting also of eight significant genes, and shown to have a sensitivity of 87.5% and a specificity of 90% on the test set. 
The genes selected by LAD also work well as a basis for other kinds of statistical analysis, indicating their robustness. These two models exhibit accuracies that compare favorably to those in the original study. In addition, the current study also provides a ranking by importance of the genes in the selected significant subsets as well as a library of dozens of combinatorial biomarkers (i.e. pairs or triplets of genes) that can serve as a source of mathematically generated, statistically significant research hypotheses in need of biological explanation.

  6. Predicting Gene Structures from Multiple RT-PCR Tests

    NASA Astrophysics Data System (ADS)

    Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

    It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.

  7. When is hub gene selection better than standard meta-analysis?

    PubMed

    Langfelder, Peter; Mischel, Paul S; Horvath, Steve

    2013-01-01

    Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) Finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). 
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network based screening, and meta analysis.

  8. Genome-Wide Association Study for Identification and Validation of Novel SNP Markers for Sr6 Stem Rust Resistance Gene in Bread Wheat.

    PubMed

    Mourad, Amira M I; Sallam, Ahmed; Belamkar, Vikas; Wegulo, Stephen; Bowden, Robert; Jin, Yue; Mahdy, Ezzat; Bakheit, Bahy; El-Wafaa, Atif A; Poland, Jesse; Baenziger, Peter S

    2018-01-01

    Stem rust (caused by Puccinia graminis f. sp. tritici Erikss. & E. Henn.) is a major disease in wheat (Triticum aestivum L.). However, in recent years it occurs rarely in Nebraska due to weather and the effective selection and gene pyramiding of resistance genes. To understand the genetic basis of stem rust resistance in Nebraska winter wheat, we applied a genome-wide association study (GWAS) to a set of 270 winter wheat genotypes (A-set). Genotyping was carried out using genotyping-by-sequencing and ∼35,000 high-quality SNPs were identified. The tested genotypes were evaluated for their resistance to the common stem rust race in Nebraska (QFCSC) in two replications. Marker-trait association identified 32 SNP markers, which were significantly (Bonferroni corrected P < 0.05) associated with the resistance on chromosome 2D. The chromosomal location of the significant SNPs (chromosome 2D) matched the location of the Sr6 gene, which was expected in these genotypes based on pedigree information. Highly significant linkage disequilibrium (LD, r²) was found between the significant SNPs and the specific SSR marker for the Sr6 gene (Xcfd43). This suggests the significant SNP markers are tagging the Sr6 gene. Out of the 32 significant SNPs, eight SNPs were in six genes that are annotated as being linked to disease resistance in the IWGSC RefSeq v1.0. The 32 significant SNP markers were located in nine haplotype blocks. All the 32 significant SNPs were validated in a set of 60 different genotypes (V-set) using single marker analysis. SNP markers identified in this study can be used in marker-assisted selection, genomic selection, and to develop a KASP (Kompetitive Allele Specific PCR) marker for the Sr6 gene. Novel SNPs for the Sr6 gene, an important stem rust resistance gene, were identified and validated in this study. These SNPs can be used to improve stem rust resistance in wheat.

  9. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.

    PubMed

    Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew

    2012-08-08

    Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

  10. “Soldier's Heart”: A Genetic Basis for Elevated Cardiovascular Disease Risk Associated with Post-traumatic Stress Disorder

    PubMed Central

    Pollard, Harvey B.; Shivakumar, Chittari; Starr, Joshua; Eidelman, Ofer; Jacobowitz, David M.; Dalgard, Clifton L.; Srivastava, Meera; Wilkerson, Matthew D.; Stein, Murray B.; Ursano, Robert J.

    2016-01-01

    “Soldier's Heart” is an American Civil War term linking post-traumatic stress disorder (PTSD) with increased propensity for cardiovascular disease (CVD). We have hypothesized that there might be a quantifiable genetic basis for this linkage. To test this hypothesis we identified a comprehensive set of candidate risk genes for PTSD, and tested whether any were also independent risk genes for CVD. A functional analysis algorithm was used to identify associated signaling networks. We identified 106 PTSD studies that report one or more polymorphic variants in 87 candidate genes in 83,463 subjects and controls. The top upstream drivers for these PTSD risk genes are predicted to be the glucocorticoid receptor (NR3C1) and Tumor Necrosis Factor alpha (TNFA). We find that 37 of the PTSD candidate risk genes are also candidate independent risk genes for CVD. The association between PTSD and CVD is significant by Fisher's Exact Test (P = 3 × 10^-54). We also find 15 PTSD risk genes that are independently associated with Type 2 Diabetes Mellitus (T2DM; also significant by Fisher's Exact Test, P = 1.8 × 10^-16). Our findings offer quantitative evidence for a genetic link between post-traumatic stress and cardiovascular disease. Computationally, the common mechanism for this linkage between PTSD and CVD is innate immunity and NFκB-mediated inflammation. PMID:27721742

  11. "Soldier's Heart": A Genetic Basis for Elevated Cardiovascular Disease Risk Associated with Post-traumatic Stress Disorder.

    PubMed

    Pollard, Harvey B; Shivakumar, Chittari; Starr, Joshua; Eidelman, Ofer; Jacobowitz, David M; Dalgard, Clifton L; Srivastava, Meera; Wilkerson, Matthew D; Stein, Murray B; Ursano, Robert J

    2016-01-01

    "Soldier's Heart" is an American Civil War term linking post-traumatic stress disorder (PTSD) with increased propensity for cardiovascular disease (CVD). We have hypothesized that there might be a quantifiable genetic basis for this linkage. To test this hypothesis we identified a comprehensive set of candidate risk genes for PTSD, and tested whether any were also independent risk genes for CVD. A functional analysis algorithm was used to identify associated signaling networks. We identified 106 PTSD studies that report one or more polymorphic variants in 87 candidate genes in 83,463 subjects and controls. The top upstream drivers for these PTSD risk genes are predicted to be the glucocorticoid receptor (NR3C1) and Tumor Necrosis Factor alpha (TNFA). We find that 37 of the PTSD candidate risk genes are also candidate independent risk genes for CVD. The association between PTSD and CVD is significant by Fisher's Exact Test (P = 3 × 10^-54). We also find 15 PTSD risk genes that are independently associated with Type 2 Diabetes Mellitus (T2DM; also significant by Fisher's Exact Test, P = 1.8 × 10^-16). Our findings offer quantitative evidence for a genetic link between post-traumatic stress and cardiovascular disease. Computationally, the common mechanism for this linkage between PTSD and CVD is innate immunity and NFκB-mediated inflammation.

  12. A powerful nonparametric method for detecting differentially co-expressed genes: distance correlation screening and edge-count test.

    PubMed

    Zhang, Qingyang

    2018-05-16

    Differential co-expression analysis, as a complement of differential expression analysis, offers significant insights into the changes in molecular mechanism of different phenotypes. A prevailing approach to detecting differentially co-expressed genes is to compare Pearson's correlation coefficients in two phenotypes. However, due to the limitations of Pearson's correlation measure, this approach lacks the power to detect nonlinear changes in gene co-expression, which are common in gene regulatory networks. In this work, a new nonparametric procedure is proposed to search for differentially co-expressed gene pairs in different phenotypes from large-scale data. Our computational pipeline consists of two main steps, a screening step and a testing step. The screening step reduces the search space by filtering out all the independent gene pairs using the distance correlation measure. In the testing step, we compare the gene co-expression patterns in different phenotypes by a recently developed edge-count test. Both steps are distribution-free and target nonlinear relations. We illustrate the promise of the new approach by analyzing the Cancer Genome Atlas data and the METABRIC data for breast cancer subtypes. Compared with some existing methods, the new method is more powerful in detecting nonlinear types of differential co-expression. The distance correlation screening can greatly improve computational efficiency, facilitating its application to large data sets.
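
    The contrast between Pearson's correlation and distance correlation described in this abstract can be sketched as follows. This is a plain O(n^2) implementation of the sample distance correlation for two one-dimensional variables, written for illustration only; it is not the authors' pipeline, it omits the edge-count testing step, and the function names and toy data are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand mean."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # at least one variable is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson(x, y):
    """Pearson's correlation coefficient (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Toy example: y depends on x through a symmetric, purely nonlinear relation.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
r_linear = pearson(x, y)             # zero: no linear trend
r_dist = distance_correlation(x, y)  # roughly 0.52: dependence is detected
```

    In this toy case the linear coefficient vanishes while the distance correlation does not, which is the property the screening step relies on.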

  13. Adaptive Set-Based Methods for Association Testing

    PubMed Central

    Su, Yu-Chen; Gauderman, W. James; Kiros, Berhane; Lewinger, Juan Pablo

    2017-01-01

    With a typical sample size of a few thousand subjects, a single genomewide association study (GWAS) using traditional one-SNP-at-a-time methods can only detect genetic variants conferring a sizable effect on disease risk. Set-based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. While self-contained set-based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set-based approaches (those that rely on comparison of variants in the set of interest with variants not in the set), there is no consensus as to which self-contained methods are best. In particular, several self-contained set tests have been proposed to directly or indirectly ‘adapt’ to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set-based test is the adaptive rank truncated product (ARTP), which seeks the set of SNPs that yields the best-combined evidence of association. We compared the standard ARTP, several ARTP variations we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all the methods and thus provide a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations followed closely by the global model of random effects (GMRE) and a LASSO based test. PMID:26707371

  14. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  15. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  16. SYBR green-based real-time reverse transcription-PCR for typing and subtyping of all hemagglutinin and neuraminidase genes of avian influenza viruses and comparison to standard serological subtyping tests

    USGS Publications Warehouse

    Tsukamoto, K.; Javier, P.C.; Shishido, M.; Noguchi, D.; Pearce, J.; Kang, H.-M.; Jeong, O.M.; Lee, Y.-J.; Nakanishi, K.; Ashizawa, T.

    2012-01-01

    Continuing outbreaks of H5N1 highly pathogenic (HP) avian influenza virus (AIV) infections of wild birds and poultry worldwide emphasize the need for global surveillance of wild birds. To support the future surveillance activities, we developed a SYBR green-based, real-time reverse transcriptase PCR (rRT-PCR) for detecting nucleoprotein (NP) genes and subtyping 16 hemagglutinin (HA) and 9 neuraminidase (NA) genes simultaneously. Primers were improved by focusing on Eurasian or North American lineage genes; the number of mixed-base positions per primer was set to five or fewer, and the concentration of each primer set was optimized empirically. Also, 30 cycles of amplification of 1:10 dilutions of cDNAs from cultured viruses effectively reduced minor cross- or nonspecific reactions. Under these conditions, 346 HA and 345 NA genes of 349 AIVs were detected, with average sensitivities of NP, HA, and NA genes of 10^1.5, 10^2.3, and 10^3.1 50% egg infective doses, respectively. Utility of rRT-PCR for subtyping AIVs was compared with that of current standard serological tests by using 104 recent migratory duck virus isolates. As a result, all HA genes and 99% of the NA genes were genetically subtyped, while only 45% of HA genes and 74% of NA genes were serologically subtyped. Additionally, direct subtyping of AIVs in fecal samples was possible by 40 cycles of amplification: approximately 70% of HA and NA genes of NP gene-positive samples were successfully subtyped. This validation study indicates that rRT-PCR with optimized primers and reaction conditions is a powerful tool for subtyping varied AIVs in clinical and cultured samples. Copyright © 2012, American Society for Microbiology. All Rights Reserved.

  17. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  18. PCR and restriction fragment length polymorphism of a pel gene as a tool to identify Erwinia carotovora in relation to potato diseases.

    PubMed Central

    Darrasse, A; Priou, S; Kotoujansky, A; Bertheau, Y

    1994-01-01

    Using a sequenced pectate lyase-encoding gene (pel gene), we developed a PCR test for Erwinia carotovora. A set of primers allowed the amplification of a 434-bp fragment in E. carotovora strains. Among the 89 E. carotovora strains tested, only the Erwinia carotovora subsp. betavasculorum strains were not detected. A restriction fragment length polymorphism (RFLP) study was undertaken on the amplified fragment with seven endonucleases. The Sau3AI digestion pattern specifically identified the Erwinia carotovora subsp. atroseptica strains, and the whole set of data identified the Erwinia carotovora subsp. wasabiae strains. However, Erwinia carotovora subsp. carotovora and Erwinia carotovora subsp. odorifera could not be separated. Phenetic and phylogenic analyses of RFLP results showed E. carotovora subsp. atroseptica as a homogeneous group while E. carotovora subsp. carotovora and E. carotovora subsp. odorifera strains exhibited a genetic diversity that may result from a nonmonophyletic origin. The use of RFLP on amplified fragments in epidemiology and for diagnosis is discussed. PMID:7912502

  19. An Efficient Approach for the Development of Locus Specific Primers in Bread Wheat (Triticum aestivum L.) and Its Application to Re-Sequencing of Genes Involved in Frost Tolerance

    PubMed Central

    Babben, Steve; Perovic, Dragan; Koch, Michael; Ordon, Frank

    2015-01-01

    Recent declines in costs accelerated sequencing of many species with large genomes, including hexaploid wheat (Triticum aestivum L.). Although the draft sequence of bread wheat is known, it is still one of the major challenges to develop locus specific primers suitable to be used in marker assisted selection procedures, due to the high homology of the three genomes. In this study we describe an efficient approach for the development of locus specific primers comprising four steps, i.e. (i) identification of genomic and coding sequences (CDS) of candidate genes, (ii) intron- and exon-structure reconstruction, (iii) identification of wheat A, B and D sub-genome sequences and primer development based on sequence differences between the three sub-genomes, and (iv) testing of primers for functionality, correct size and localisation. This approach was applied to single, low and high copy genes involved in frost tolerance in wheat. In summary, for 27 of these genes for which sequences were derived from Triticum aestivum, Triticum monococcum and Hordeum vulgare, a set of 119 primer pairs was developed and after testing on Nulli-tetrasomic (NT) lines, a set of 65 primer pairs (54.6%), corresponding to 19 candidate genes, turned out to be specific. Out of these a set of 35 fragments was selected for validation via Sanger's amplicon re-sequencing. All fragments, with the exception of one, could be assigned to the original reference sequence. The approach presented here showed a much higher specificity in primer development in comparison to techniques used so far in bread wheat and can be applied to other polyploid species with a known draft sequence. PMID:26565976

  20. Latent variable models for gene-environment interactions in longitudinal studies with multiple correlated exposures.

    PubMed

    Tao, Yebin; Sánchez, Brisa N; Mukherjee, Bhramar

    2015-03-30

    Many existing cohort studies designed to investigate health effects of environmental exposures also collect data on genetic markers. The Early Life Exposures in Mexico to Environmental Toxicants project, for instance, has been genotyping single nucleotide polymorphisms on candidate genes involved in mental and nutrient metabolism and also in potentially shared metabolic pathways with the environmental exposures. Given the longitudinal nature of these cohort studies, rich exposure and outcome data are available to address novel questions regarding gene-environment interaction (G × E). Latent variable (LV) models have been effectively used for dimension reduction, helping with multiple testing and multicollinearity issues in the presence of correlated multivariate exposures and outcomes. In this paper, we first propose a modeling strategy, based on LV models, to examine the association between repeated outcome measures (e.g., child weight) and a set of correlated exposure biomarkers (e.g., prenatal lead exposure). We then construct novel tests for G × E effects within the LV framework to examine effect modification of outcome-exposure association by genetic factors (e.g., the hemochromatosis gene). We consider two scenarios: one allowing dependence of the LV models on genes and the other assuming independence between the LV models and genes. We combine the two sets of estimates by shrinkage estimation to trade off bias and efficiency in a data-adaptive way. Using simulations, we evaluate the properties of the shrinkage estimates, and in particular, we demonstrate the need for this data-adaptive shrinkage given repeated outcome measures, exposure measures possibly repeated and time-varying gene-environment association. Copyright © 2014 John Wiley & Sons, Ltd.

  1. Validation of the β-amy1 transcription profiling assay and selection of reference genes suited for a RT-qPCR assay in developing barley caryopsis.

    PubMed

    Ovesná, Jaroslava; Kučera, Ladislav; Vaculová, Kateřina; Štrymplová, Kamila; Svobodová, Ilona; Milella, Luigi

    2012-01-01

    Reverse transcription coupled with real-time quantitative PCR (RT-qPCR) is a frequently used method for gene expression profiling. Reference genes (RGs) are commonly employed to normalize gene expression data. Limited information exists on gene expression and profiling in the developing barley caryopsis. Expression stability was assessed by measuring the cycle threshold (Ct) range and applying both the GeNorm (pair-wise comparison of geometric means) and Normfinder (model-based approach) principles for the calculation. Here, we have identified a set of four RGs suitable for studying gene expression in the developing barley caryopsis. These encode the proteins GAPDH, HSP90, HSP70 and ubiquitin. We found a correlation between the frequency of occurrence of a transcript in silico and its suitability as an RG. This set of RGs was tested by comparing the normalized level of β-amylase (β-amy1) transcript with directly measured quantities of the BMY1 gene product in the developing barley caryopsis. This panel of genes could be used for other gene expression studies, as well as to optimize β-amy1 analysis for study of the impact of β-amy1 expression upon barley end-use quality.
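
    Normalizing a target transcript against several reference genes is commonly done with the geometric mean of the reference genes' relative quantities. The sketch below shows that calculation only; it does not compute the GeNorm or Normfinder stability measures, it assumes a fixed amplification efficiency of 2 (perfect doubling per cycle), and the function names and Ct values are hypothetical rather than taken from the study.

```python
from statistics import geometric_mean


def relative_quantity(ct_sample, ct_calibrator, efficiency=2.0):
    """Fold difference of a sample against a calibrator from their Ct values."""
    return efficiency ** (ct_calibrator - ct_sample)


def normalized_expression(target_ct, target_ct_cal, ref_cts, ref_cts_cal, efficiency=2.0):
    """Target relative quantity divided by the geometric mean of reference quantities."""
    ref_quantities = [
        relative_quantity(sample, calibrator, efficiency)
        for sample, calibrator in zip(ref_cts, ref_cts_cal)
    ]
    normalization_factor = geometric_mean(ref_quantities)
    return relative_quantity(target_ct, target_ct_cal, efficiency) / normalization_factor


# Made-up Ct values for four reference genes in a sample and in a calibrator.
ref_sample = [18.0, 20.0, 22.0, 19.0]
ref_calibrator = [18.0, 21.0, 21.0, 19.0]

# Made-up target Ct: two cycles earlier in the sample than in the calibrator.
fold_change = normalized_expression(24.0, 26.0, ref_sample, ref_calibrator)
```

    Here the reference quantities (1, 2, 0.5, 1) average out to a normalization factor of 1, so the target's four-fold change is left unchanged; a geometric rather than arithmetic mean keeps one noisy reference gene from dominating the factor.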

  2. A Fast Multiple-Kernel Method With Applications to Detect Gene-Environment Interaction.

    PubMed

    Marceau, Rachel; Lu, Wenbin; Holloway, Shannon; Sale, Michèle M; Worrall, Bradford B; Williams, Stephen R; Hsu, Fang-Chi; Tzeng, Jung-Ying

    2015-09-01

    Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene-gene or gene-environment interactions, incorporating variance-component based methods for population substructure into rare-variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the "expectation-maximization (EM)" algorithm, have a large computational cost and are not scalable to large sample sizes needed for rare variant analysis. Therefore, under the context of gene-environment interaction, we propose a computationally efficient and statistically rigorous "fastKM" algorithm for multikernel analysis that is based on a low-rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single-kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM-based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene-by-vitamin effects on recurrent stroke risk and gene-by-age effects on change in homocysteine level. © 2015 WILEY PERIODICALS, INC.

  3. Challenges in projecting clustering results across gene expression-profiling datasets.

    PubMed

    Lusa, Lara; McShane, Lisa M; Reid, James F; De Cecco, Loris; Ambrogi, Federico; Biganzoli, Elia; Gariboldi, Manuela; Pierotti, Marco A

    2007-11-21

    Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes. Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor-positive (ER+) samples, to five breast cancer subtypes based on microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression is equal to 0) on subtype assignment by method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method of centroids classifications of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and for which the proportion of ER+ and ER- samples were systematically varied. When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. 
When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered. Simple corrections such as mean centering of genes aimed at microarray platform or batch effect correction can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.
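
    The dependence of mean-centered centroid classification on class prevalence can be reproduced with a one-gene toy example. The sketch below is a minimal illustration, assuming Euclidean nearest-centroid assignment; the centroid values, expression values and function names are hypothetical and are not taken from the study.

```python
import math


def mean_center(samples):
    """Subtract each gene's mean across the given samples (rows are samples)."""
    n_samples = len(samples)
    n_genes = len(samples[0])
    means = [sum(s[g] for s in samples) / n_samples for g in range(n_genes)]
    return [[s[g] - means[g] for g in range(n_genes)] for s in samples]


def nearest_centroid(sample, centroids):
    """Return the label of the centroid closest to the sample (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Hypothetical centroids for a single ER-related gene.
raw_centroids = {"ER+": [10.0], "ER-": [2.0]}
# The same centroids after centering a balanced training set (gene mean 6.0).
centered_centroids = {"ER+": [4.0], "ER-": [-4.0]}

# A test set containing only ER+ samples (made-up expression values).
test_set = [[8.0], [9.0], [11.0], [12.0]]

raw_calls = [nearest_centroid(s, raw_centroids) for s in test_set]
centered_calls = [nearest_centroid(s, centered_centroids) for s in mean_center(test_set)]
```

    Without centering, all four samples are called ER+. After centering within the all-ER+ test set, the gene mean shifts to 10 and half of the samples fall on the ER- side, which mirrors the misclassification pattern the abstract describes.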

  4. GO-Bayes: Gene Ontology-based overrepresentation analysis using a Bayesian approach.

    PubMed

    Zhang, Song; Cao, Jing; Kong, Y Megan; Scheuermann, Richard H

    2010-04-01

    A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy. We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.
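
    The standard ORA baseline that this abstract contrasts with GO-Bayes can be sketched as a one-sided hypergeometric test. The code below covers only that baseline, analyzing one term in isolation; it is not the GO-Bayes method, and the function name and toy counts are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for one annotation term.

    n_universe: genes measured in the experiment
    n_term:     genes in the universe annotated to the term
    n_selected: genes in the identified group (e.g. differentially expressed)
    n_overlap:  selected genes that are annotated to the term
    """
    if not 0 <= n_term <= n_universe or not 0 <= n_selected <= n_universe:
        raise ValueError("term and selection sizes must lie within the universe")
    upper = min(n_term, n_selected)
    if not 0 <= n_overlap <= upper:
        raise ValueError("overlap cannot exceed the term or selection size")
    total = comb(n_universe, n_selected)
    # Sum the probabilities of observing n_overlap or more annotated genes.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 20 genes measured, 5 annotated to the term, 5 selected, 3 shared.
p_toy = ora_p_value(20, 5, 5, 3)
```

    For these toy counts the tail sums to 1126 of 15504 equally likely selections, a p-value of about 0.073. Because each term is tested separately, related parent and child terms are scored independently, which is the limitation the abstract sets out to address.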

  5. Development of PCR primers specific for the amplification and direct sequencing of gyrB genes from microbacteria, order Actinomycetales.

    PubMed

    Richert, Kathrin; Brambilla, Evelyne; Stackebrandt, Erko

    2005-01-01

    PCR primer sets were developed for the specific amplification and sequence analyses encoding the gyrase subunit B (gyrB) of members of the family Microbacteriaceae, class Actinobacteria. The family contains species highly related by 16S rRNA gene sequence analyses. In order to test if the gene sequence analysis of gyrB is appropriate to discriminate between closely related species, we evaluate the 16S rRNA gene phylogeny of its members. As the published universal primer set for gyrB failed to amplify the responding gene of the majority of the 80 type strains of the family, three new primer sets were identified that generated fragments with a composite sequence length of about 900 nt. However, the amplification of all three fragments was successful only in 25% of the 80 type strains. In this study, the substitution frequencies in genes encoding gyrase and 16S rDNA were compared for 10 strains of nine genera. The frequency of gyrB nucleotide substitution is significantly higher than that of the 16S rDNA, and no linear correlation exists between the similarities of both molecules among members of the Microbacteriaceae. The phylogenetic analyses using the gyrB sequences provide higher resolution than using 16S rDNA sequences and seem able to discriminate between closely related species.

  6. Systems Genetics Analysis of Genome-Wide Association Study Reveals Novel Associations Between Key Biological Processes and Coronary Artery Disease.

    PubMed

    Ghosh, Sujoy; Vivar, Juan; Nelson, Christopher P; Willenborg, Christina; Segrè, Ayellet V; Mäkinen, Ville-Petteri; Nikpay, Majid; Erdmann, Jeannette; Blankenberg, Stefan; O'Donnell, Christopher; März, Winfried; Laaksonen, Reijo; Stewart, Alexandre F R; Epstein, Stephen E; Shah, Svati H; Granger, Christopher B; Hazen, Stanley L; Kathiresan, Sekar; Reilly, Muredach P; Yang, Xia; Quertermous, Thomas; Samani, Nilesh J; Schunkert, Heribert; Assimes, Themistocles L; McPherson, Ruth

    2015-07-01

    Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks. Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001).
Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways. These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD. © 2015 American Heart Association, Inc.

  7. Estimation of gene induction enables a relevance-based ranking of gene sets.

    PubMed

    Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens

    2009-07-01

    In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.

  8. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

    PubMed

    Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

    2015-01-01

    Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.

  9. Integrative analysis of micro-RNA, gene expression, and survival of glioblastoma multiforme.

    PubMed

    Huang, Yen-Tsung; Hsu, Thomas; Kelsey, Karl T; Lin, Chien-Ling

    2015-02-01

    Glioblastoma multiforme (GBM), the most common type of malignant brain tumor, is highly fatal. Limited understanding of its rapid progression necessitates additional approaches that integrate what is known about the genomics of this cancer. Using a discovery set (n = 348) and a validation set (n = 174) of GBM patients, we performed genome-wide analyses that integrated mRNA and micro-RNA expression data from GBM as well as associated survival information, assessing coordinated variability in each as this reflects their known mechanistic functions. Cox proportional hazards models were used for the survival analyses, and nonparametric permutation tests were performed for the micro-RNAs to investigate the association between the number of associated genes and its prognostication. We also utilized mediation analyses for micro-RNA-gene pairs to identify their mediation effects. Genome-wide analyses revealed a novel pattern: micro-RNAs related to more gene expressions are more likely to be associated with GBM survival (P = 4.8 × 10⁻⁵). Genome-wide mediation analyses for the 32,660 micro-RNA-gene pairs with strong association (false discovery rate [FDR] < 0.01%) identified 51 validated pairs with significant mediation effect. Of the 51 pairs, miR-223 had 16 mediation genes. These 16 mediation genes of miR-223 were also highly associated with various other micro-RNAs and mediated their prognostic effects as well. We further constructed a gene signature using the 16 genes, which was highly associated with GBM survival in both the discovery and validation sets (P = 9.8 × 10⁻⁶). This comprehensive study discovered mediation effects of micro-RNA to gene expression and GBM survival and provided a new analytic framework for integrative genomics. © 2014 WILEY PERIODICALS, INC.

  10. Mining gene link information for survival pathway hunting.

    PubMed

    Jing, Gao-Jian; Zhang, Zirui; Wang, Hong-Qiang; Zheng, Hong-Mei

    2015-08-01

    This study proposes a gene link-based method for survival time-related pathway hunting. In this method, the authors incorporate gene link information to estimate how a pathway is associated with cancer patients' survival time. Specifically, a gene link-based Cox proportional hazard model (Link-Cox) is established, in which two linked genes are considered together to represent a link variable and the association of the link with survival time is assessed using the Cox proportional hazard model. On the basis of the Link-Cox model, the authors formulate a new statistic for measuring the association of a pathway with survival time of cancer patients, referred to as pathway survival score (PSS), by summarising survival significance over all the gene links in the pathway, and devise a permutation test to test the significance of an observed PSS. To evaluate the proposed method, the authors applied it to simulation data and two publicly available real-world gene expression data sets. Extensive comparisons with previous methods show the effectiveness and efficiency of the proposed method for survival pathway hunting.
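
    The permutation step described above can be sketched generically. This is a minimal illustration under stated assumptions, not the authors' Link-Cox implementation: `toy_score` (an absolute Pearson correlation) is a hypothetical stand-in for the pathway survival score, censoring is ignored, and all function names and data are invented for the example.

```python
import random
from statistics import mean


def toy_score(values, times):
    """Hypothetical stand-in for a pathway score: |Pearson correlation|."""
    mv, mt = mean(values), mean(times)
    cov = sum((v - mv) * (t - mt) for v, t in zip(values, times))
    var_v = sum((v - mv) ** 2 for v in values)
    var_t = sum((t - mt) ** 2 for t in times)
    return abs(cov) / (var_v * var_t) ** 0.5


def permutation_p_value(values, times, score=toy_score, n_perm=999, seed=0):
    """Shuffle the outcome values to build a null distribution of the score."""
    rng = random.Random(seed)
    observed = score(values, times)
    shuffled = list(times)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score(values, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Invented toy data: one expression-like variable and survival times.
expression = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 9.0, 9.9]
survival = [30.0, 27.5, 25.1, 22.0, 20.3, 18.2, 14.9, 12.1, 10.4, 8.0]
p_value = permutation_p_value(expression, survival)
```

    Any other statistic can be passed as `score`; the permutation loop itself does not depend on how the statistic is computed.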

  11. Association of High Myopia with Crystallin Beta A4 (CRYBA4) Gene Polymorphisms in the Linkage-Identified MYP6 Locus

    PubMed Central

    Ho, Daniel W. H.; Yap, Maurice K. H.; Ng, Po Wah; Fung, Wai Yan; Yip, Shea Ping

    2012-01-01

    Background Myopia is the most common ocular disorder worldwide and imposes a tremendous burden on society. It is a complex disease. The MYP6 locus at 22q12 is of particular interest because many studies have detected linkage signals at this interval. The MYP6 locus is likely to contain susceptibility gene(s) for myopia, but none has yet been identified. Methodology/Principal Findings Two independent subject groups of southern Chinese in Hong Kong participated in the study: an initial study using a discovery sample set of 342 cases and 342 controls, and a follow-up study using a replication sample set of 316 cases and 313 controls. Cases with high myopia were defined by spherical equivalent ≤ -8 dioptres and emmetropic controls by spherical equivalent within ±1.00 dioptre for both eyes. Manual candidate gene selection from the MYP6 locus was supported by objective in silico prioritization. DNA samples of the discovery sample set were genotyped for 178 tagging single nucleotide polymorphisms (SNPs) from 26 genes. For replication, 25 SNPs (tagging or located at predicted transcription factor or microRNA binding sites) from 4 genes were subsequently examined using the replication sample set. Fisher P value was calculated for all SNPs and overall association results were summarized by meta-analysis. Based on initial and replication studies, rs2009066 located in the crystallin beta A4 (CRYBA4) gene was identified to be the most significantly associated with high myopia (initial study: P = 0.02; replication study: P = 1.88e-4; meta-analysis: P = 1.54e-5) among all the SNPs tested. The association result survived correction for multiple comparisons. Under the allelic genetic model for the combined sample set, the odds ratio of the minor allele G was 1.41 (95% confidence intervals, 1.21-1.64). Conclusions/Significance A novel susceptibility gene (CRYBA4) was discovered for high myopia. Our study also signified the potential importance of appropriate gene prioritization in candidate selection. PMID:22792142

  12. Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes.

    PubMed

    Taylor, Louis J; Strebel, Klaus

    2017-01-07

    Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts. Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes. Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface ( http://louiejtaylor.github.io/pyViKO/ ), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
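
    The kind of search described above can be illustrated with a short sketch. It is not Pyviko's API or algorithm: the function names are hypothetical, the target gene is assumed to start at index 0 of an uppercase DNA string, and the requirement that the overlapping reading frame keep its amino acid sequence is an assumption made for this example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate the complete codons of seq, reading from its first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def find_knockouts(seq, overlap_offset, max_codons=20):
    """List (position, new_base, codon_index) point mutations that create a
    stop codon in frame 0 within the first max_codons codons while leaving
    the translation of the frame starting at overlap_offset unchanged."""
    original_overlap = translate(seq[overlap_offset:])
    limit = min(max_codons * 3, len(seq) - len(seq) % 3)
    hits = []
    for pos in range(3, limit):  # positions 0-2 (start codon) are skipped
        for base in "ACGT":
            if base == seq[pos]:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            codon_start = pos - pos % 3
            if CODON_TABLE[mutant[codon_start:codon_start + 3]] != "*":
                continue
            if translate(mutant[overlap_offset:]) == original_overlap:
                hits.append((pos, base, codon_start // 3))
    return hits


# Toy sequence: frame 0 reads ATG CAA GGC; the +1 frame reads TGC AAG.
hits = find_knockouts("ATGCAAGGC", overlap_offset=1)  # -> [(3, 'T', 1)]
```

    In the toy sequence, changing C to T at index 3 turns CAA into the stop codon TAA in the target frame, while the overlapping frame changes from TGC to TGT, both of which encode cysteine.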

  13. The Gene Set Builder: collation, curation, and distribution of sets of genes

    PubMed Central

    Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W

    2005-01-01

    Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. 
Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of genes in a user-friendly environment. The application can be accessed via . PMID:16371163

  14. Novel PCR Assays Complement Laser Biosensor-Based Method and Facilitate Listeria Species Detection from Food.

    PubMed

    Kim, Kwang-Pyo; Singh, Atul K; Bai, Xingjian; Leprun, Lena; Bhunia, Arun K

    2015-09-08

    The goal of this study was to develop the Listeria species-specific PCR assays based on a house-keeping gene (lmo1634) encoding alcohol acetaldehyde dehydrogenase (Aad), previously designated as Listeria adhesion protein (LAP), and compare results with a label-free light scattering sensor, BARDOT (bacterial rapid detection using optical scattering technology). PCR primer sets targeting the lap genes from the species of Listeria sensu stricto were designed and tested with 47 Listeria and 8 non-Listeria strains. The resulting PCR primer sets detected either all species of Listeria sensu stricto or individual L. innocua, L. ivanovii and L. seeligeri, L. welshimeri, and L. marthii without producing any amplified products from other bacteria tested. The PCR assays with Listeria sensu stricto-specific primers also successfully detected all species of Listeria sensu stricto and/or Listeria innocua from mixed culture-inoculated food samples, and each bacterium in food was verified by using the light scattering sensor that generated unique scatter signature for each species of Listeria tested. The PCR assays based on the house-keeping gene aad (lap) can be used for detection of either all species of Listeria sensu stricto or certain individual Listeria species in a mixture from food with a detection limit of about 10⁴ CFU/mL.

  15. Novel PCR Assays Complement Laser Biosensor-Based Method and Facilitate Listeria Species Detection from Food

    PubMed Central

    Kim, Kwang-Pyo; Singh, Atul K.; Bai, Xingjian; Leprun, Lena; Bhunia, Arun K.

    2015-01-01

    The goal of this study was to develop the Listeria species-specific PCR assays based on a house-keeping gene (lmo1634) encoding alcohol acetaldehyde dehydrogenase (Aad), previously designated as Listeria adhesion protein (LAP), and compare results with a label-free light scattering sensor, BARDOT (bacterial rapid detection using optical scattering technology). PCR primer sets targeting the lap genes from the species of Listeria sensu stricto were designed and tested with 47 Listeria and 8 non-Listeria strains. The resulting PCR primer sets detected either all species of Listeria sensu stricto or individual L. innocua, L. ivanovii and L. seeligeri, L. welshimeri, and L. marthii without producing any amplified products from other bacteria tested. The PCR assays with Listeria sensu stricto-specific primers also successfully detected all species of Listeria sensu stricto and/or Listeria innocua from mixed culture-inoculated food samples, and each bacterium in food was verified by using the light scattering sensor that generated unique scatter signature for each species of Listeria tested. The PCR assays based on the house-keeping gene aad (lap) can be used for detection of either all species of Listeria sensu stricto or certain individual Listeria species in a mixture from food with a detection limit of about 10⁴ CFU/mL. PMID:26371000

  16. Pathway-based analyses.

    PubMed

    Kent, Jack W

    2016-02-03

    New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing. The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge. Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data. The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

  17. Validation of reference genes for RT-qPCR studies of gene expression in banana fruit under different experimental conditions.

    PubMed

    Chen, Lei; Zhong, Hai-ying; Kuang, Jian-fei; Li, Jian-guo; Lu, Wang-jin; Chen, Jian-ye

    2011-08-01

    Reverse transcription quantitative real-time PCR (RT-qPCR) is a sensitive technique for quantifying gene expression, but its success depends on the stability of the reference gene(s) used for data normalization. Only a few studies on validation of reference genes have been conducted in fruit trees and none in banana yet. In the present work, 20 candidate reference genes were selected, and their expression stability in 144 banana samples was evaluated and analyzed using two algorithms, geNorm and NormFinder. The samples consisted of eight sample sets collected under different experimental conditions, including various tissues, developmental stages, postharvest ripening, stresses (chilling, high temperature, and pathogen), and hormone treatments. Our results showed that different suitable reference gene(s) or combination of reference genes for normalization should be selected depending on the experimental conditions. The RPS2 and UBQ2 genes were validated as the most suitable reference genes across all tested samples. More importantly, our data further showed that the widely used reference genes, ACT and GAPDH, were not the most suitable reference genes in many banana sample sets. In addition, the expression of MaEBF1, a gene of interest that plays an important role in regulating fruit ripening, under different experimental conditions was used to further confirm the validated reference genes. Taken together, our results provide guidelines for reference gene(s) selection under different experimental conditions and a foundation for more accurate and widespread use of RT-qPCR in banana.
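
    As a rough illustration of how a geNorm-style stability ranking works, the sketch below computes a stability measure M per gene as it is commonly described: the mean, over all other candidate genes, of the standard deviation of the log2 expression ratios across samples. The gene names and values are invented, and this is not the geNorm software itself or the study's data.

```python
from math import log2
from statistics import mean, stdev


def stability_m(expression):
    """Map gene -> M, the mean pairwise variation with every other gene.

    expression: dict of gene name -> list of relative quantities,
    one value per sample (all lists the same length).
    """
    m_values = {}
    for gene, levels in expression.items():
        variations = []
        for other, other_levels in expression.items():
            if other == gene:
                continue
            log_ratios = [log2(a / b) for a, b in zip(levels, other_levels)]
            variations.append(stdev(log_ratios))
        m_values[gene] = mean(variations)
    return m_values


# Hypothetical relative quantities for three candidate genes, four samples.
toy = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],
    "geneC": [8.0, 1.0, 4.0, 2.0],
}
m = stability_m(toy)
```

    A lower M indicates more stable expression relative to the other candidates. Here geneA and geneB keep a constant ratio to each other, so they score lower than geneC.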

  18. Identification and validation of biomarkers of IgV(H) mutation status in chronic lymphocytic leukemia using microfluidics quantitative real-time polymerase chain reaction technology.

    PubMed

    Abruzzo, Lynne V; Barron, Lynn L; Anderson, Keith; Newman, Rachel J; Wierda, William G; O'brien, Susan; Ferrajoli, Alessandra; Luthra, Madan; Talwalkar, Sameer; Luthra, Rajyalakshmi; Jones, Dan; Keating, Michael J; Coombes, Kevin R

    2007-09-01

    To develop a model incorporating relevant prognostic biomarkers for untreated chronic lymphocytic leukemia patients, we re-analyzed the raw data from four published gene expression profiling studies. We selected 88 candidate biomarkers linked to immunoglobulin heavy-chain variable region gene (IgV(H)) mutation status and produced a reliable and reproducible microfluidics quantitative real-time polymerase chain reaction array. We applied this array to a training set of 29 purified samples from previously untreated patients. In an unsupervised analysis, the samples clustered into two groups. Using a cutoff point of 2% homology to the germline IgV(H) sequence, one group contained all 14 IgV(H)-unmutated samples; the other contained all 15 mutated samples. We confirmed the differential expression of 37 of the candidate biomarkers using two-sample t-tests. Next, we constructed 16 different models to predict IgV(H) mutation status and evaluated their performance on an independent test set of 20 new samples. Nine models correctly classified 11 of 11 IgV(H)-mutated cases and eight of nine IgV(H)-unmutated cases, with some models using three to seven genes. Thus, we can classify cases with 95% accuracy based on the expression of as few as three genes.

  19. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.

  20. Dynamic association rules for gene expression data analysis.

    PubMed

    Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung

    2015-10-14

    The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm) which helps one to efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis.
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
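
    The support and confidence quantities referred to above have standard definitions in association rule mining, sketched below. This shows only those textbook definitions, not the DAR algorithm or its confidence-interval thresholds; the item labels and samples are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Hypothetical samples: each set lists discretised observations.
samples = [
    {"geneX_up", "disease"},
    {"geneX_up", "disease"},
    {"geneX_up"},
    {"geneY_up", "disease"},
    {"geneY_up"},
]
rule_support = support(samples, {"geneX_up", "disease"})
rule_confidence = confidence(samples, {"geneX_up"}, {"disease"})
```

    For the rule geneX_up -> disease, two of the five samples contain both items (support 2/5), and two of the three samples with geneX_up also carry disease (confidence 2/3).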

  1. Putative synaptic genes defined from a Drosophila whole body developmental transcriptome by a machine learning approach.

    PubMed

    Pazos Obregón, Flavio; Papalardo, Cecilia; Castro, Sebastián; Guerberoff, Gustavo; Cantera, Rafael

    2015-09-15

    Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, just a few hundreds of them are known so far. In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others. Using statistical and biological criteria to analyze and combine the predictions, we obtained a gene catalogue that is highly enriched in genes of relevance for Drosophila synapse assembly and function but still not recognized as such. The utility of our approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.

  2. Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

    PubMed

    Lukashin, A V; Fuchs, R

    2001-05-01

    Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% of genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
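
    The general idea of clustering by simulated annealing can be sketched as follows. This is a generic illustration, not the published algorithm: the number of clusters is fixed in advance (the abstract's scheme for choosing it is not reproduced), and the within-cluster squared-distance cost, geometric cooling schedule, parameter values, and toy profiles are all assumptions.

```python
import math
import random


def cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, centroid))
    return total


def anneal(profiles, k, steps=2000, t_start=1.0, cooling=0.995, seed=0):
    """Return (best_labels, best_cost, initial_cost) after annealing."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    current = cost(profiles, labels, k)
    initial = current
    best_labels, best = labels[:], current
    temperature = t_start
    for _ in range(steps):
        i = rng.randrange(len(profiles))
        old = labels[i]
        labels[i] = rng.randrange(k)  # propose moving one profile
        candidate = cost(profiles, labels, k)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if current < best:
                best_labels, best = labels[:], current
        else:
            labels[i] = old  # reject the move
        temperature *= cooling
    return best_labels, best, initial


# Invented temporal profiles: three low and three high expression series.
profiles = [
    [0.0, 0.1, 0.2], [0.1, 0.0, 0.2], [0.0, 0.2, 0.1],
    [5.0, 5.1, 5.2], [5.1, 5.0, 5.2], [5.2, 5.1, 5.0],
]
labels, best_cost, initial_cost = anneal(profiles, k=2)
```

    Accepting some cost-increasing moves while the temperature is high is what lets the search leave poor local arrangements. The sketch returns the best assignment seen rather than the final state.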

  3. Exploiting the full power of temporal gene expression profiling through a new statistical test: application to the analysis of muscular dystrophy data.

    PubMed

    Vinciotti, Veronica; Liu, Xiaohui; Turk, Rolf; de Meijer, Emile J; 't Hoen, Peter A C

    2006-04-03

    The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. 
The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523.

  4. Exploiting the full power of temporal gene expression profiling through a new statistical test: Application to the analysis of muscular dystrophy data

    PubMed Central

    Vinciotti, Veronica; Liu, Xiaohui; Turk, Rolf; de Meijer, Emile J; 't Hoen, Peter AC

    2006-01-01

    Background The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling T2-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. Results We validate the temporal Hotelling T2-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan and gamma-sarcoglycan deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. Conclusion The temporal Hotelling T2-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. 
The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523. PMID:16584545

  5. Coregulation of srGAP1 by Wnt and Androgen Receptor Signaling: A New Target for Treatment of CRPC

    DTIC Science & Technology

    2015-10-01

    Specific Aim 1: Test the... Specific Aim2: Test the hypothesis that down regulating srGAP1 in CRPC cells change phenotypic...direct interaction between AR and β-catenin seemed to elicit a specific expression of a set of target genes in low androgen conditions in CRPC.

  6. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  7. Evolutionary changes in the notochord genetic toolkit: a comparative analysis of notochord genes in the ascidian Ciona and the larvacean Oikopleura.

    PubMed

    Kugler, Jamie E; Kerner, Pierre; Bouquet, Jean-Marie; Jiang, Di; Di Gregorio, Anna

    2011-01-20

    The notochord is a defining feature of the chordate clade, and invertebrate chordates, such as tunicates, are uniquely suited for studies of this structure. Here we used a well-characterized set of 50 notochord genes known to be targets of the notochord-specific Brachyury transcription factor in one tunicate, Ciona intestinalis (Class Ascidiacea), to begin determining whether the same genetic toolkit is employed to build the notochord in another tunicate, Oikopleura dioica (Class Larvacea). We identified Oikopleura orthologs of the Ciona notochord genes, as well as lineage-specific duplicates for which we determined the phylogenetic relationships with related genes from other chordates, and we analyzed their expression patterns in Oikopleura embryos. Of the 50 Ciona notochord genes that were used as a reference, only 26 had clearly identifiable orthologs in Oikopleura. Two of these conserved genes appeared to have undergone Oikopleura- and/or tunicate-specific duplications, and one was present in three copies in Oikopleura, thus bringing the number of genes to test to 30. We were able to clone and test 28 of these genes. Thirteen of the 28 Oikopleura orthologs of Ciona notochord genes showed clear expression in all or in part of the Oikopleura notochord, seven were diffusely expressed throughout the tail, six were expressed in tissues other than the notochord, while two probes did not provide a detectable signal at any of the stages analyzed. One of the notochord genes identified, Oikopleura netrin, was found to be unevenly expressed in notochord cells, in a pattern reminiscent of that previously observed for one of the Oikopleura Hox genes. A surprisingly high number of Ciona notochord genes do not have apparent counterparts in Oikopleura, and only a fraction of the evolutionarily conserved genes show clear notochord expression. 
This suggests that Ciona and Oikopleura, despite the morphological similarities of their notochords, have developed rather divergent sets of notochord genes after their split from a common tunicate ancestor. This study demonstrates that comparisons between divergent tunicates can lead to insights into the basic complement of genes sufficient for notochord development, and elucidate the constraints that control its composition.

  8. Evolutionary changes in the notochord genetic toolkit: a comparative analysis of notochord genes in the ascidian Ciona and the larvacean Oikopleura

    PubMed Central

    2011-01-01

    Background The notochord is a defining feature of the chordate clade, and invertebrate chordates, such as tunicates, are uniquely suited for studies of this structure. Here we used a well-characterized set of 50 notochord genes known to be targets of the notochord-specific Brachyury transcription factor in one tunicate, Ciona intestinalis (Class Ascidiacea), to begin determining whether the same genetic toolkit is employed to build the notochord in another tunicate, Oikopleura dioica (Class Larvacea). We identified Oikopleura orthologs of the Ciona notochord genes, as well as lineage-specific duplicates for which we determined the phylogenetic relationships with related genes from other chordates, and we analyzed their expression patterns in Oikopleura embryos. Results Of the 50 Ciona notochord genes that were used as a reference, only 26 had clearly identifiable orthologs in Oikopleura. Two of these conserved genes appeared to have undergone Oikopleura- and/or tunicate-specific duplications, and one was present in three copies in Oikopleura, thus bringing the number of genes to test to 30. We were able to clone and test 28 of these genes. Thirteen of the 28 Oikopleura orthologs of Ciona notochord genes showed clear expression in all or in part of the Oikopleura notochord, seven were diffusely expressed throughout the tail, six were expressed in tissues other than the notochord, while two probes did not provide a detectable signal at any of the stages analyzed. One of the notochord genes identified, Oikopleura netrin, was found to be unevenly expressed in notochord cells, in a pattern reminiscent of that previously observed for one of the Oikopleura Hox genes. Conclusions A surprisingly high number of Ciona notochord genes do not have apparent counterparts in Oikopleura, and only a fraction of the evolutionarily conserved genes show clear notochord expression. 
This suggests that Ciona and Oikopleura, despite the morphological similarities of their notochords, have developed rather divergent sets of notochord genes after their split from a common tunicate ancestor. This study demonstrates that comparisons between divergent tunicates can lead to insights into the basic complement of genes sufficient for notochord development, and elucidate the constraints that control its composition. PMID:21251251

  9. Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raymond, Amy; Lovell, Scott; Lorimer, Don

    2009-12-01

    With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. In this report, we compare heterologous protein expression levels from native sequences to that of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) were expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.

  10. Adaptive Set-Based Methods for Association Testing.

    PubMed

    Su, Yu-Chen; Gauderman, William James; Berhane, Kiros; Lewinger, Juan Pablo

    2016-02-01

    With a typical sample size of a few thousand subjects, a single genome-wide association study (GWAS) using traditional one single nucleotide polymorphism (SNP)-at-a-time methods can only detect genetic variants conferring a sizable effect on disease risk. Set-based methods, which analyze sets of SNPs jointly, can detect variants with smaller effects acting within a gene, a pathway, or other biologically relevant sets. Although self-contained set-based methods (those that test sets of variants without regard to variants not in the set) are generally more powerful than competitive set-based approaches (those that rely on comparison of variants in the set of interest with variants not in the set), there is no consensus as to which self-contained methods are best. In particular, several self-contained set tests have been proposed to directly or indirectly "adapt" to the a priori unknown proportion and distribution of effects of the truly associated SNPs in the set, which is a major determinant of their power. A popular adaptive set-based test is the adaptive rank truncated product (ARTP), which seeks the set of SNPs that yields the best-combined evidence of association. We compared the standard ARTP, several ARTP variations we introduced, and other adaptive methods in a comprehensive simulation study to evaluate their performance. We used permutations to assess significance for all the methods and thus provide a level playing field for comparison. We found the standard ARTP test to have the highest power across our simulations followed closely by the global model of random effects (GMRE) and a least absolute shrinkage and selection operator (LASSO)-based test. © 2015 WILEY PERIODICALS, INC.
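    The adaptive rank truncated product described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `artp`, its arguments, and the choice to count the observed data set among the permutations are all assumptions made for the example. It expects per-SNP p-values that have already been computed for the observed data and for each phenotype permutation.

```python
import math


def artp(p_obs, p_perm, truncation_points):
    """Adaptive rank truncated product p-value for one SNP set (sketch).

    p_obs: per-SNP p-values from the observed data.
    p_perm: list of per-SNP p-value lists, one per phenotype permutation.
    truncation_points: candidate numbers K of top-ranked SNPs to combine.
    """
    rows = [list(p_obs)] + [list(r) for r in p_perm]  # row 0 = observed
    n = len(rows)

    def log_products(pvals):
        # Log of the product of the K smallest p-values, for each K.
        logs = sorted(math.log(p) for p in pvals)
        return [sum(logs[:k]) for k in truncation_points]

    w = [log_products(r) for r in rows]

    # Empirical p-value of every row at every K: the share of rows
    # (itself included) whose statistic is at least as extreme.
    min_p = [1.0] * n
    for j in range(len(truncation_points)):
        col = [w[b][j] for b in range(n)]
        for b in range(n):
            p_hat = sum(other <= col[b] for other in col) / n
            min_p[b] = min(min_p[b], p_hat)

    # Adjust the observed minimum p-value for the search over K by
    # comparing it with the minimum p-value of every permutation.
    return sum(m <= min_p[0] for m in min_p) / n
```

    Reusing the same permutations for both layers avoids a nested permutation loop; the price is that the smallest attainable p-value is 1/(B + 1) for B permutations.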

  11. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering.

    PubMed

    Chang, Jinyuan; Zhou, Wen; Zhou, Wen-Xin; Wang, Lan

    2017-03-01

    Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2016, The International Biometric Society.

  12. Representing virus-host interactions and other multi-organism processes in the Gene Ontology.

    PubMed

    Foulger, R E; Osumi-Sutherland, D; McIntosh, B K; Hulo, C; Masson, P; Poux, S; Le Mercier, P; Lomax, J

    2015-07-28

    The Gene Ontology project is a collaborative effort to provide descriptions of gene products in a consistent and computable language, and in a species-independent manner. The Gene Ontology is designed to be applicable to all organisms but up to now has been largely under-utilized for prokaryotes and viruses, in part because of a lack of appropriate ontology terms. To address this issue, we have developed a set of Gene Ontology classes that are applicable to microbes and their hosts, improving both coverage and quality in this area of the Gene Ontology. Describing microbial and viral gene products brings with it the additional challenge of capturing both the host and the microbe. Recognising this, we have worked closely with annotation groups to test and optimize the GO classes, and we describe here a set of annotation guidelines that allow the controlled description of two interacting organisms. Building on the microbial resources already in existence such as ViralZone, UniProtKB keywords and MeGO, this project provides an integrated ontology to describe interactions between microbial species and their hosts, with mappings to the external resources above. Housing this information within the freely-accessible Gene Ontology project allows the classes and annotation structure to be utilized by a large community of biologists and users.

  13. Substitutions in the Glycogenin-1 Gene Are Associated with the Evolution of Endothermy in Sharks and Tunas.

    PubMed

    Ciezarek, Adam G; Dunning, Luke T; Jones, Catherine S; Noble, Leslie R; Humble, Emily; Stefanni, Sergio S; Savolainen, Vincent

    2016-10-05

    Despite 400-450 million years of independent evolution, a strong phenotypic convergence has occurred between two groups of fish: tunas and lamnid sharks. This convergence is characterized by centralization of red muscle, a distinctive swimming style (stiffened body powered through tail movements) and elevated body temperature (endothermy). Furthermore, both groups demonstrate elevated white muscle metabolic capacities. All these traits are unusual in fish and more likely evolved to support their fast-swimming, pelagic, predatory behavior. Here, we tested the hypothesis that their convergent evolution was driven by selection on a set of metabolic genes. We sequenced white muscle transcriptomes of six tuna, one mackerel, and three shark species, and supplemented this data set with previously published RNA-seq data. Using 26 species in total (including 7,032 tuna genes plus 1,719 shark genes), we constructed phylogenetic trees and carried out maximum-likelihood analyses of gene selection. We inferred several genes relating to metabolism to be under selection. We also found that the same one gene, glycogenin-1, evolved under positive selection independently in tunas and lamnid sharks, providing evidence of convergent selective pressures at gene level possibly underlying shared physiology. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Enterotoxigenic coagulase positive Staphylococcus in milk and milk products, lben and jben, in northern Morocco.

    PubMed

    Bendahou, Abdrezzak; Abid, Mohammed; Bouteldoun, Nadine; Catelejine, Dierick; Lebbadi, Mariam

    2009-04-30

    The aim of this research was to determine the prevalence of enterotoxin genes (sea-seo) in Coagulase Positive Staphylococcus (CPS) isolated from unpasteurized milk and milk products. These results were compared with the results obtained by using the detection kit SET-RPLA for the specific detection of staphylococcal enterotoxins (SEA-SED). Eighty-one samples of milk and milk products were analyzed for the presence of Staphylococcus strains. Forty-six coagulase positive Staphylococcus isolates were tested for the production of staphylococcal enterotoxins (SEA-SED) by using the reversed passive latex agglutination method. The strains were also tested for the presence of se genes (sea-seo) by polymerase chain reaction. One or more classical enterotoxin products (SEA-SED) were observed in 39% of the strains tested, while se genes were detected in 56.5%. SEA and sea were most commonly detected. For newly discovered se genes among CPS isolates tested in this study, except the seh gene which was revealed in four isolates (8.7 %), none of the strains harbored any of the other se genes (see, seg, sei, sej, sek, sel, sem, seo and sen). The finding of a pathogen such as staphylococci-producing SEs and containing se genes in milk and milk products in northern Morocco may indicate a problem for public health in this region. The presence of enterotoxigenic strains in food does not always necessarily mean that the toxin will be produced. For that reason, the combination of both methods (RPLA and PCR) is a guarantee for success in diagnostic analysis tests.

  15. Use of genomic data in risk assessment case study: II. Evaluation of the dibutyl phthalate toxicogenomic data set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Euling, Susan Y., E-mail: euling.susan@epa.gov; White, Lori D.; Kim, Andrea S.

    An evaluation of the toxicogenomic data set for dibutyl phthalate (DBP) and male reproductive developmental effects was performed as part of a larger case study to test an approach for incorporating genomic data in risk assessment. The DBP toxicogenomic data set is composed of nine in vivo studies from the published literature that exposed rats to DBP during gestation and evaluated gene expression changes in testes or Wolffian ducts of male fetuses. The exercise focused on qualitative evaluation, based on a lack of available dose–response data, of the DBP toxicogenomic data set to postulate modes and mechanisms of action for the male reproductive developmental outcomes, which occur in the lower dose range. A weight-of-evidence evaluation was performed on the eight DBP toxicogenomic studies of the rat testis at the gene and pathway levels. The results showed relatively strong evidence of DBP-induced downregulation of genes in the steroidogenesis pathway and lipid/sterol/cholesterol transport pathway as well as effects on immediate early gene/growth/differentiation, transcription, peroxisome proliferator-activated receptor signaling and apoptosis pathways in the testis. Since two established modes of action (MOAs), reduced fetal testicular testosterone production and Insl3 gene expression, explain some but not all of the testis effects observed in rats after in utero DBP exposure, other MOAs are likely to be operative. A reanalysis of one DBP microarray study identified additional pathways within cell signaling, metabolism, hormone, disease, and cell adhesion biological processes. These putative new pathways may be associated with DBP effects on the testes that are currently unexplained. This case study on DBP identified data gaps and research needs for the use of toxicogenomic data in risk assessment. Furthermore, this study demonstrated an approach for evaluating toxicogenomic data in human health risk assessment that could be applied to future chemicals. 
- Highlights: ► We evaluate the dibutyl phthalate toxicogenomic data for use in risk assessment. ► We focus on information about the mechanism of action for the developing testis. ► Multiple studies report effects on testosterone and insl3-related pathways. ► We identify additional affected pathways that may explain some testis effects. ► The case study is a template for evaluating toxicogenomic data in risk assessment.

  16. Analyzing the Role of MicroRNAs in Schizophrenia in the Context of Common Genetic Risk Variants.

    PubMed

    Hauberg, Mads Engel; Roussos, Panos; Grove, Jakob; Børglum, Anders Dupont; Mattheisen, Manuel

    2016-04-01

    The recent implication of 108 genomic loci in schizophrenia marked a great advancement in our understanding of the disease. Against the background of its polygenic nature there is a necessity to identify how schizophrenia risk genes interplay. As regulators of gene expression, microRNAs (miRNAs) have repeatedly been implicated in schizophrenia etiology. It is therefore of interest to establish their role in the regulation of schizophrenia risk genes in disease-relevant biological processes. To examine the role of miRNAs in schizophrenia in the context of disease-associated genetic variation. The basis of this study was summary statistics from the largest schizophrenia genome-wide association study meta-analysis to date (83 550 individuals in a meta-analysis of 52 genome-wide association studies) completed in 2014 along with publicly available data for predicted miRNA targets. We examined whether schizophrenia risk genes were more likely to be regulated by miRNA. Further, we used gene set analyses to identify miRNAs that are regulators of schizophrenia risk genes. Results from association tests for miRNA targetomes and related analyses. In line with previous studies, we found that similar to other complex traits, schizophrenia risk genes were more likely to be regulated by miRNAs (P < 2 × 10^-16). Further, the gene set analyses revealed several miRNAs regulating schizophrenia risk genes, with the strongest enrichment for targets of miR-9-5p (P = .0056 for enrichment among the top 1% most-associated single-nucleotide polymorphisms, corrected for multiple testing). It is further of note that MIR9-2 is located in a genomic region showing strong evidence for association with schizophrenia (P = 7.1 × 10^-8). The second and third strongest gene set signals were seen for the targets of miR-485-5p and miR-137, respectively. This study provides evidence for a role of miR-9-5p in the etiology of schizophrenia. 
Its implication is of particular interest as the functions of this neurodevelopmental miRNA tie in with established disease biology: it has a regulatory loop with the fragile X mental retardation homologue FXR1 and regulates dopamine D2 receptor density.

  17. Endeavour update: a web resource for gene prioritization in multiple species

    PubMed Central

    Tranchevent, Léon-Charles; Barriot, Roland; Yu, Shi; Van Vooren, Steven; Van Loo, Peter; Coessens, Bert; De Moor, Bart; Aerts, Stein; Moreau, Yves

    2008-01-01

    Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and is now including numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis. PMID:18508807
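    Step (iii) above, merging several per-source rankings into one global ranking, can be illustrated with a small sketch. The abstract does not say how Endeavour fuses the rankings, so the mean rank ratio used here is an assumed stand-in chosen for simplicity, and the function name `merge_rankings` and the source labels in the usage example are hypothetical.

```python
def merge_rankings(rankings):
    """Combine per-source rankings into one global ranking (sketch).

    rankings: dict mapping a data-source name to a list of candidate
    genes ordered best-first. A gene absent from a source is simply
    not scored by that source.
    """
    ratios = {}
    for ordered in rankings.values():
        n = len(ordered)
        for position, gene in enumerate(ordered, start=1):
            # Rank ratio: position divided by list length, so lists of
            # different lengths are put on a common 0-1 scale.
            ratios.setdefault(gene, []).append(position / n)

    mean_ratio = {g: sum(r) / len(r) for g, r in ratios.items()}
    # Lower mean rank ratio = better; ties are broken by gene name.
    return sorted(mean_ratio, key=lambda g: (mean_ratio[g], g))


example = {
    "expression": ["A", "B", "C"],
    "interactions": ["B", "A", "C"],
    "text_mining": ["A", "C", "B"],
}
global_ranking = merge_rankings(example)
```

    Averaging rank ratios treats every source as equally informative; a fusion rule that models how surprising a set of ranks is would weigh consistent evidence differently.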

  18. Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest.

    PubMed

    Saka, Ernur; Harrison, Benjamin J; West, Kirk; Petruska, Jeffrey C; Rouchka, Eric C

    2017-12-06

    Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature. We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences. Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. 
The comparison between region-based and gene-based results indicated that the expressed genes detected by gene-based and region-based CDFs show high consistency, and that region-based results allow detection of changes in transcript formation.

  19. Sensitive and Specific Detection of Early Gastric Cancer Using DNA Methylation Analysis of Gastric Washes

    PubMed Central

    Watanabe, Yoshiyuki; Kim, Hyun Soo; Castoro, Ryan J.; Chung, Woonbok; Estecio, Marcos R. H.; Kondo, Kimie; Guo, Yi; Ahmed, Saira S.; Toyota, Minoru; Itoh, Fumio; Suk, Ki Tae; Cho, Mee-Yon; Shen, Lanlan; Jelinek, Jaroslav; Issa, Jean-Pierre J.

    2009-01-01

    Background & Aims Aberrant DNA methylation is an early and frequent process in gastric carcinogenesis and could be useful for detection of gastric neoplasia. We hypothesized that methylation analysis of DNA recovered from gastric washes could be used to detect gastric cancer. Methods We studied 51 candidate genes in 7 gastric cancer cell lines and 24 samples (training set) and identified 6 for further studies. We examined the methylation status of these genes in a test set consisting of 131 gastric neoplasias at various stages. Finally, we validated the 6 candidate genes in a different population of 40 primary gastric cancer samples and 113 non-neoplastic gastric mucosa samples. Results 6 genes (MINT25, RORA, GDNF, ADAM23, PRDM5, MLF1) showed frequent differential methylation between gastric cancer and normal mucosa in the training, test and validation sets. GDNF and MINT25 were most sensitive molecular markers of early stage gastric cancer while PRDM5 and MLF1 were markers of a field defect. There was a close correlation (r=0.5 to 0.9, p=0.03 to 0.001) between methylation levels in tumor biopsy and gastric washes. MINT25 methylation had the best sensitivity (90%), specificity (96%), and area under the ROC curve (0.961) in terms of tumor detection in gastric washes. Conclusions These findings suggest MINT25 is a sensitive and specific marker for screening in gastric cancer. Additionally we have developed a new methodology for gastric cancer detection by DNA methylation in gastric washes. PMID:19375421

  20. Genome wide predictions of miRNA regulation by transcription factors.

    PubMed

    Ruffalo, Matthew; Bar-Joseph, Ziv

    2016-09-01

    Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes making it hard to determine the specific way in which they are regulated. To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance on both a labeled test set, and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/. Contact: zivbj@cs.cmu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Identification of Modulators of the Nuclear Receptor ...

    EPA Pesticide Factsheets

    The nuclear receptor family member peroxisome proliferator-activated receptor α (PPARα) is activated by therapeutic hypolipidemic drugs and environmentally-relevant chemicals to regulate genes involved in lipid transport and catabolism. Chronic activation of PPARα in rodents increases liver cancer incidence, whereas suppression of PPARα activity can lead to hepatocellular steatosis. Analytical approaches were developed to identify biosets (i.e., gene expression differences between two conditions) in a genomic database in which PPARα activity was altered. A gene expression signature of 131 PPARα-dependent genes was built using profiles from the livers of wild-type and PPARα-null mice after exposure to three structurally diverse PPARα activators (WY-14,643, fenofibrate and perfluorohexane sulfonate). A rank-based test (Running Fisher’s test (p-value ≤ 10^-4)) was used to evaluate the similarity between the PPARα signature and a test set of 48 and 31 biosets positive or negative, respectively for PPARα activation; the test resulted in a balanced accuracy of 98%. The signature was used to identify factors that activate or suppress PPARα in an annotated mouse liver/primary hepatocyte gene expression database of ~1850 biosets. In addition to the expected activation of PPARα by fibrate drugs, di(2-ethylhexyl) phthalate, and perfluorinated compounds, PPARα was activated by benzofuran, galactosamine and TCDD and suppressed by hepatotoxins acetami

  2. Biotechnology Conference: Diagnostics '87 Held in Cambridge, England on 10 and 11 December 1987.

    DTIC Science & Technology

    1988-05-25

    settings. 1-hour culture confirmation test for herpes (ColorGene DNA hybridization test for HSV confirmation). This test NEW AMPEROMETRIC BIOSENSORS...I Thin Layer Technology: Monolayers to Multi Thin Films...Single-Step Immunoassay Systems...if this thin-layer process is probe technology, and biosensors. The aim of the con- demonstrated in Figure 1, which shows the disposition of ference

  3. MalaCards: an integrated compendium for diseases and their annotation

    PubMed Central

    Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron

    2013-01-01

    Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/ PMID:23584832

  4. Evaluation of Reference Genes for RT qPCR Analyses of Structure-Specific and Hormone Regulated Gene Expression in Physcomitrella patens Gametophytes

    PubMed Central

    Le Bail, Aude; Scholz, Sebastian; Kost, Benedikt

    2013-01-01

    The use of the moss Physcomitrella patens as a model system to study plant development and physiology is rapidly expanding. The strategic position of P. patens within the green lineage between algae and vascular plants, the high efficiency with which transgenes are incorporated by homologous recombination, advantages associated with the haploid gametophyte representing the dominant phase of the P. patens life cycle, the simple structure of protonemata, leafy shoots and rhizoids that constitute the haploid gametophyte, as well as a readily accessible high-quality genome sequence make this moss a very attractive experimental system. The investigation of the genetic and hormonal control of P. patens development heavily depends on the analysis of gene expression patterns by real time quantitative PCR (RT qPCR). This technique requires well characterized sets of reference genes, which display minimal expression level variations under all analyzed conditions, for data normalization. Sets of suitable reference genes have been described for most widely used model systems including e.g. Arabidopsis thaliana, but not for P. patens. Here, we present a RT qPCR based comparison of transcript levels of 12 selected candidate reference genes in a range of gametophytic P. patens structures at different developmental stages, and in P. patens protonemata treated with hormones or hormone transport inhibitors. Analysis of these RT qPCR data using GeNorm and NormFinder software resulted in the identification of sets of P. patens reference genes suitable for gene expression analysis under all tested conditions, and suggested that the two best reference genes are sufficient for effective data normalization under each of these conditions. PMID:23951063
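
    The abstract above ranks candidate reference genes with GeNorm. As a rough illustration only, the sketch below computes a geNorm-style stability value M for each gene: the mean, over all other candidates, of the standard deviation of the pairwise log2 expression ratios across samples. The gene names and expression values are hypothetical, and the full geNorm procedure also excludes the least stable gene stepwise, which this one-pass sketch does not do.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Return a geNorm-style stability value M per gene (lower = more stable).

    expr maps a gene name to a list of positive relative expression
    quantities, one per sample, in the same sample order for every gene.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


# Hypothetical toy data: refA and refB keep a constant ratio, refC does not.
toy = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],
    "refC": [1.0, 8.0, 2.0, 4.0],
}
scores = genorm_m(toy)
ranked = sorted(scores, key=scores.get)  # most stable first
```

    In this toy input the two genes whose ratio never changes receive the lowest M, which mirrors the idea that good reference genes vary together across all conditions.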

  5. Dexamethasone Stimulated Gene Expression in Peripheral Blood is a Sensitive Marker for Glucocorticoid Receptor Resistance in Depressed Patients

    PubMed Central

    Menke, Andreas; Arloth, Janine; Pütz, Benno; Weber, Peter; Klengel, Torsten; Mehta, Divya; Gonik, Mariya; Rex-Haffner, Monika; Rubel, Jennifer; Uhr, Manfred; Lucae, Susanne; Deussing, Jan M; Müller-Myhsok, Bertram; Holsboer, Florian; Binder, Elisabeth B

    2012-01-01

    Although gene expression profiles in peripheral blood in major depression are not likely to identify genes directly involved in the pathomechanism of affective disorders, they may serve as biomarkers for this disorder. As previous studies using baseline gene expression profiles have provided mixed results, our approach was to use an in vivo dexamethasone challenge test and to compare glucocorticoid receptor (GR)-mediated changes in gene expression between depressed patients and healthy controls. Whole genome gene expression data (baseline and following GR-stimulation with 1.5 mg dexamethasone p.o.) from two independent cohorts were analyzed to identify gene expression patterns that would predict case and control status using a training (N=18 cases/18 controls) and a test cohort (N=11/13). Dexamethasone led to reproducible regulation of 2670 genes in controls and 1151 transcripts in cases. Several genes, including FKBP5 and DUSP1, previously associated with the pathophysiology of major depression, were found to be reliable markers of GR-activation. Using random forest analyses for classification, GR-stimulated gene expression outperformed baseline gene expression as a classifier for case and control status with a correct classification of 79.1 vs 41.6% in the test cohort. GR-stimulated gene expression performed best in dexamethasone non-suppressor patients (88.7% correctly classified with 100% sensitivity), but also correctly classified 77.3% of the suppressor patients (76.7% sensitivity), when using a refined set of 19 genes. Our study suggests that in vivo stimulated gene expression in peripheral blood cells could be a promising molecular marker of altered GR-functioning, an important component of the underlying pathology, in patients suffering from depressive episodes. PMID:22237309

  6. The Molecular Signatures Database (MSigDB) hallmark gene set collection.

    PubMed

    Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo

    2015-12-23

    The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.

  7. Identification and evaluation of new reference genes in Gossypium hirsutum for accurate normalization of real-time quantitative RT-PCR data.

    PubMed

    Artico, Sinara; Nardeli, Sarah M; Brilhante, Osmundo; Grossi-de-Sa, Maria Fátima; Alves-Ferreira, Marcio

    2010-03-21

    Normalizing through reference genes, or housekeeping genes, can yield more accurate and reliable results from reverse transcription real-time quantitative polymerase chain reaction (qPCR). Recent studies have shown that no single housekeeping gene is universal for all experiments. Thus, selecting suitable reference genes should be the first step of any qPCR analysis. Only a few studies on the identification of housekeeping genes have been carried out on plants. Therefore qPCR studies on important crops such as cotton have been hampered by the lack of suitable reference genes. By the use of two distinct algorithms, implemented by geNorm and NormFinder, we have assessed the gene expression of nine candidate reference genes in cotton: GhACT4, GhEF1alpha5, GhFBX6, GhPP2A1, GhMZA, GhPTB, GhGAPC2, GhbetaTUB3 and GhUBQ14. The candidate reference genes were evaluated in 23 experimental samples consisting of six distinct plant organs, eight stages of flower development, four stages of fruit development and in flower verticils. The expression of GhPP2A1 and GhUBQ14 genes was the most stable across all samples and also when distinct plant organs are examined. GhACT4 and GhUBQ14 present more stable expression during flower development, GhACT4 and GhFBX6 in the floral verticils and GhMZA and GhPTB during fruit development. Our analysis provided the most suitable combination of reference genes for each experimental set tested as internal control for reliable qPCR data normalization. In addition, to illustrate the use of cotton reference genes we checked the expression of two cotton MADS-box genes in distinct plant and floral organs and also during flower development. We have tested the expression stabilities of nine candidate genes in a set of 23 tissue samples from cotton plants divided into five different experimental sets. 
As a result of this evaluation, we recommend the use of GhUBQ14 and GhPP2A1 housekeeping genes as superior references for normalization of gene expression measures in different cotton plant organs; GhACT4 and GhUBQ14 for flower development, GhACT4 and GhFBX6 for the floral organs and GhMZA and GhPTB for fruit development. We also provide the primer sequences whose performance in qPCR experiments is demonstrated. These genes will enable more accurate and reliable normalization of qPCR results for gene expression studies in this important crop, the major source of natural fiber and also an important source of edible oil. The use of bona fide reference genes allowed a detailed and accurate characterization of the temporal and spatial expression pattern of two MADS-box genes in cotton.

  8. Identification and evaluation of new reference genes in Gossypium hirsutum for accurate normalization of real-time quantitative RT-PCR data

    PubMed Central

    2010-01-01

    Background: Normalizing through reference genes, or housekeeping genes, can yield more accurate and reliable results from reverse transcription real-time quantitative polymerase chain reaction (qPCR). Recent studies have shown that no single housekeeping gene is universal for all experiments. Thus, selecting suitable reference genes should be the first step of any qPCR analysis. Only a few studies on the identification of housekeeping genes have been carried out on plants. Therefore qPCR studies on important crops such as cotton have been hampered by the lack of suitable reference genes. Results: By the use of two distinct algorithms, implemented by geNorm and NormFinder, we have assessed the gene expression of nine candidate reference genes in cotton: GhACT4, GhEF1α5, GhFBX6, GhPP2A1, GhMZA, GhPTB, GhGAPC2, GhβTUB3 and GhUBQ14. The candidate reference genes were evaluated in 23 experimental samples consisting of six distinct plant organs, eight stages of flower development, four stages of fruit development and in flower verticils. The expression of GhPP2A1 and GhUBQ14 genes was the most stable across all samples and also when distinct plant organs are examined. GhACT4 and GhUBQ14 present more stable expression during flower development, GhACT4 and GhFBX6 in the floral verticils and GhMZA and GhPTB during fruit development. Our analysis provided the most suitable combination of reference genes for each experimental set tested as internal control for reliable qPCR data normalization. In addition, to illustrate the use of cotton reference genes we checked the expression of two cotton MADS-box genes in distinct plant and floral organs and also during flower development. Conclusion: We have tested the expression stabilities of nine candidate genes in a set of 23 tissue samples from cotton plants divided into five different experimental sets. 
As a result of this evaluation, we recommend the use of GhUBQ14 and GhPP2A1 housekeeping genes as superior references for normalization of gene expression measures in different cotton plant organs; GhACT4 and GhUBQ14 for flower development, GhACT4 and GhFBX6 for the floral organs and GhMZA and GhPTB for fruit development. We also provide the primer sequences whose performance in qPCR experiments is demonstrated. These genes will enable more accurate and reliable normalization of qPCR results for gene expression studies in this important crop, the major source of natural fiber and also an important source of edible oil. The use of bona fide reference genes allowed a detailed and accurate characterization of the temporal and spatial expression pattern of two MADS-box genes in cotton. PMID:20302670

  9. A comparative analysis of biclustering algorithms for gene expression data

    PubMed Central

    Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.

    2013-01-01

    The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837

  10. Landscape genomic analysis of candidate genes for climate adaptation in a California endemic oak, Quercus lobata.

    PubMed

    Sork, Victoria L; Squire, Kevin; Gugger, Paul F; Steele, Stephanie E; Levy, Eric D; Eckert, Andrew J

    2016-01-01

    The ability of California tree populations to survive anthropogenic climate change will be shaped by the geographic structure of adaptive genetic variation. Our goal is to test whether climate-associated candidate genes show evidence of spatially divergent selection in natural populations of valley oak, Quercus lobata, as preliminary indication of local adaptation. Using DNA from 45 individuals from 13 localities across the species' range, we sequenced portions of 40 candidate genes related to budburst/flowering, growth, osmotic stress, and temperature stress. Using 195 single nucleotide polymorphisms (SNPs), we estimated genetic differentiation across populations and correlated allele frequencies with climate gradients using single-locus and multivariate models. The top 5% of FST estimates ranged from 0.25 to 0.68, yielding loci potentially under spatially divergent selection. Environmental analyses of SNP frequencies with climate gradients revealed three significantly correlated SNPs within budburst/flowering genes and two SNPs within temperature stress genes with mean annual precipitation, after controlling for multiple testing. A redundancy model showed a significant association between SNPs and climate variables and revealed a similar set of SNPs with high loadings on the first axis. In the RDA, climate accounted for 67% of the explained variation, when holding climate constant, in contrast to a putatively neutral SSR data set where climate accounted for only 33%. Population differentiation and geographic gradients of allele frequencies in climate-associated functional genes in Q. lobata provide initial evidence of adaptive genetic variation and background for predicting population response to climate change. © 2016 Botanical Society of America.
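
    The abstract above reports per-SNP FST estimates but does not say which estimator was used. As a minimal sketch under that caveat, the code below applies Wright's heterozygosity-based definition, FST = (HT - HS) / HT, to a single biallelic SNP with equally weighted populations. The function name and the allele frequencies are hypothetical and are not taken from the study.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic locus.

    freqs holds the frequency of one allele in each population;
    populations are given equal weight (an assumption of this sketch).
    """
    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / n_pops
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere; no differentiation
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
identical = fst_biallelic([0.5, 0.5])   # no differentiation
diverged = fst_biallelic([0.2, 0.8])    # moderate differentiation
fixed = fst_biallelic([0.0, 1.0])       # alternative alleles fixed
```

    Values near 0 indicate similar allele frequencies across populations, and values near 1 indicate populations fixed for different alleles, which is why the upper tail of the FST distribution is used to flag candidate loci.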

  11. A Novel Predictive Equation for Potential Diagnosis of Cholangiocarcinoma

    PubMed Central

    Kraiklang, Ratthaphol; Pairojkul, Chawalit; Khuntikeo, Narong; Imtawil, Kanokwan; Wongkham, Sopit; Wongkham, Chaisiri

    2014-01-01

    Cholangiocarcinoma (CCA) is the second most common primary liver cancer. The difficulties in diagnosis limit successful treatment of CCA. At present, histological investigation is the standard diagnosis for CCA. However, there are some poorly defined tumor tissues which cannot be definitively diagnosed by general histopathology. As molecular signatures can define molecular phenotypes related to diagnosis, prognosis, or treatment outcome, and CCA is the second most common cancer found after hepatocellular carcinoma (HCC), the aim of this study was to develop a predictive model which differentiates CCA from HCC and normal liver tissues. An in-house PCR array containing 176 putative CCA marker genes was tested with the training set tissues of 20 CCA and 10 HCC cases. The molecular signature of CCA revealed the prominent expression of genes involved in cell adhesion and cell movement, whereas HCC showed elevated expression of genes related to cell proliferation/differentiation and metabolism. A total of 69 genes differentially expressed in CCA and HCC were optimized statistically to formulate a diagnostic equation which distinguished CCA cases from HCC cases. Finally, a four-gene diagnostic equation (CLDN4, HOXB7, TMSB4 and TTR) was formulated and then successfully validated using real-time PCR in an independent testing set of 68 CCA samples and 77 non-CCA controls. Discrimination analysis showed that a combination of these genes could be used as a diagnostic marker for CCA, with higher sensitivity and specificity than a single gene marker or the usual serum markers (CA19-9 and CEA). This new combination marker may help physicians to identify CCA in liver tissues when the histopathology is uncertain. PMID:24586698

  12. Massive-scale gene co-expression network construction and robustness testing using random matrix theory.

    PubMed

    Gibson, Scott M; Ficklin, Stephen P; Isaacson, Sven; Luo, Feng; Feltus, Frank A; Smith, Melissa C

    2013-01-01

    The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.
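
    To make the network-construction step in the abstract above concrete, the sketch below builds a co-expression edge list by keeping gene pairs whose absolute Pearson correlation meets a cutoff. It is an illustration only: the cutoff here is a fixed number supplied by the caller, whereas the study derives its threshold with Random Matrix Theory, which this sketch does not implement. The gene names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coexpression_edges(expr, threshold):
    """Return gene pairs whose |Pearson r| is at least the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(expr), 2)
        if abs(pearson(expr[a], expr[b])) >= threshold
    ]


# Hypothetical expression profiles over four samples.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # falls as g1 rises
    "g4": [1.0, 3.0, 2.0, 2.0],   # weakly related to the others
}
edges = coexpression_edges(toy, threshold=0.9)
```

    Robustness testing of the kind described would repeat this construction on perturbed sample sets and compare the resulting networks, which is why construction speed matters.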

  13. Selecting Question-Specific Genes to Reduce Incongruence in Phylogenomics: A Case Study of Jawed Vertebrate Backbone Phylogeny.

    PubMed

    Chen, Meng-Yun; Liang, Dan; Zhang, Peng

    2015-11-01

    Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Clinical interpretation of pathogenic ATM and CHEK2 variants on multigene panel tests: navigating moderate risk.

    PubMed

    West, Allison H; Blazer, Kathleen R; Stoll, Jessica; Jones, Matthew; Weipert, Caroline M; Nielsen, Sarah M; Kupfer, Sonia S; Weitzel, Jeffrey N; Olopade, Olufunmilayo I

    2018-02-14

    Comprehensive genomic cancer risk assessment (GCRA) helps patients, family members, and providers make informed choices about cancer screening, surgical and chemotherapeutic risk reduction, and genetically targeted cancer therapies. The increasing availability of multigene panel tests for clinical applications allows testing of well-defined high-risk genes, as well as moderate-risk genes, for which the penetrance and spectrum of cancer risk are less well characterized. Moderate-risk genes are defined as genes that, when altered by a pathogenic variant, confer a two- to fivefold relative risk of cancer. Two such genes included on many comprehensive cancer panels are the DNA repair genes ATM and CHEK2, best known for moderately increased risk of breast cancer development. However, the impact of screening and preventative interventions and spectrum of cancer risk beyond breast cancer associated with ATM and/or CHEK2 variants remain less well characterized. We convened a large, multidisciplinary, cross-sectional panel of GCRA clinicians to review challenging, peer-submitted cases of patients identified with ATM or CHEK2 variants. This paper summarizes the inter-professional case discussion and recommendations generated during the session, the level of concordance with respect to recommendations between the academic and community clinician participants for each case, and potential barriers to implementing recommended care in various practice settings.

  15. The Drosophila Translational Control Element (TCE) Is Required for High-Level Transcription of Many Genes That Are Specifically Expressed in Testes

    PubMed Central

    Anderson, Ashley K.; Ohler, Uwe; Wassarman, David A.

    2012-01-01

    To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5′ untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation dramatically drops in more distant species. In transgenic flies, short (300–400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID. 
These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation. PMID:22984601

  16. The Drosophila Translational Control Element (TCE) is required for high-level transcription of many genes that are specifically expressed in testes.

    PubMed

    Katzenberger, Rebeccah J; Rach, Elizabeth A; Anderson, Ashley K; Ohler, Uwe; Wassarman, David A

    2012-01-01

    To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5' untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation dramatically drops in more distant species. In transgenic flies, short (300-400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID. 
These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation.

  17. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
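
    The abstract above mentions gene set enrichment analyses without naming the statistic. One common form is the hypergeometric over-representation test, sketched below in Python rather than R. Whether snpGeneSets uses exactly this test is not stated in the abstract, so this is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for gene set over-representation.

    n_universe: number of genes considered in total
    n_set:      number of those genes that belong to the gene set
    n_selected: number of genes flagged by the study (e.g. significant)
    n_overlap:  number of flagged genes that are in the gene set
    Returns P(overlap >= n_overlap) under random selection.
    """
    upper = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 5 in the set, 5 flagged.
p_all_in_set = hypergeom_enrichment_p(10, 5, 5, 5)
p_no_overlap = hypergeom_enrichment_p(10, 5, 5, 0)
```

    When many gene sets are tested this way, the resulting p-values would normally be adjusted for multiple testing before pathways are reported.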

  18. The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks

    PubMed Central

    Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex

    2010-01-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062

  19. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks.

    PubMed

    Ficklin, Stephen P; Luo, Feng; Feltus, F Alex

    2010-09-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.

  20. Chronic Antibody-Mediated Rejection in Nonhuman Primate Renal Allografts: Validation of Human Histological and Molecular Phenotypes.

    PubMed

    Adam, B A; Smith, R N; Rosales, I A; Matsunami, M; Afzali, B; Oura, T; Cosimi, A B; Kawai, T; Colvin, R B; Mengel, M

    2017-11-01

    Molecular testing represents a promising adjunct for the diagnosis of antibody-mediated rejection (AMR). Here, we apply a novel gene expression platform in sequential formalin-fixed paraffin-embedded samples from nonhuman primate (NHP) renal transplants. We analyzed 34 previously described gene transcripts related to AMR in humans in 197 archival NHP samples, including 102 from recipients that developed chronic AMR, 80 from recipients without AMR, and 15 normal native nephrectomies. Three endothelial genes (VWF, DARC, and CAV1), derived from 10-fold cross-validation receiver operating characteristic curve analysis, demonstrated excellent discrimination between AMR and non-AMR samples (area under the curve = 0.92). This three-gene set correlated with classic features of AMR, including glomerulitis, capillaritis, glomerulopathy, C4d deposition, and DSAs (r = 0.39-0.63, p < 0.001). Principal component analysis confirmed the association between three-gene set expression and AMR and highlighted the ambiguity of v lesions and ptc lesions between AMR and T cell-mediated rejection (TCMR). Elevated three-gene set expression corresponded with the development of immunopathological evidence of rejection and often preceded it. Many recipients demonstrated mixed AMR and TCMR, suggesting that this represents the natural pattern of rejection. These data provide NHP animal model validation of recent updates to the Banff classification including the assessment of molecular markers for diagnosing AMR. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.

  1. In silico prediction of novel therapeutic targets using gene-disease association data.

    PubMed

    Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe

    2017-08-29

    Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. 
Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.

  2. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Reference genes for gene expression studies in wheat flag leaves grown under different farming conditions

    PubMed Central

    2011-01-01

    Background: Internal control genes with highly uniform expression throughout the experimental conditions are required for accurate gene expression analysis, as no universal reference gene exists. In this study, the expression stability of 24 candidate genes from Triticum aestivum cv. Cubus flag leaves grown under organic and conventional farming systems was evaluated in two locations in order to select suitable genes that can be used for normalization of real-time quantitative reverse-transcription PCR (RT-qPCR) reactions. The genes were selected from among the most commonly used reference genes as well as genes encoding proteins involved in several metabolic pathways. Findings: Individual genes displayed different expression rates across all samples assayed. Applying geNorm, a set of three potential reference genes was found suitable for normalization of RT-qPCR reactions in winter wheat flag leaves cv. Cubus: TaFNRII (ferredoxin-NADP(H) oxidoreductase; AJ457980.1), ACT2 (actin 2; TC234027), and rrn26 (a putative homologue to RNA 26S gene; AL827977.1). In addition to these three genes, which were also top-ranked by NormFinder, two extra genes, CYP18-2 (Cyclophilin A, AY456122.1) and TaWIN1 (14-3-3 like protein, AB042193), were most consistently stably expressed. Furthermore, we showed that TaFNRII, ACT2, and CYP18-2 are suitable for gene expression normalization in two other winter wheat varieties (Tommi and Centenaire) grown under three treatments (organic, conventional and no nitrogen) and in a different environment from the one tested with cv. Cubus. Conclusions: This study provides a new set of reference genes which should improve the accuracy of gene expression analyses when using wheat flag leaves, such as those related to the improvement of nitrogen use efficiency for cereal production. PMID:21951810
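
The geNorm ranking mentioned in the abstract above rests on a gene-stability measure M: for each candidate gene, the average over all other candidates of the standard deviation of their pairwise log2 expression ratios across samples. The following is a minimal sketch of that measure only, not the geNorm software itself; the gene names and expression values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Return {gene: M}, where M is the mean pairwise variation of a gene.

    expr maps each gene name to a list of relative expression values,
    one per sample, in the same sample order for every gene.
    A lower M indicates more stable expression.
    """
    genes = list(expr)
    m_values = {}
    for g in genes:
        pairwise_sd = []
        for h in genes:
            if h == g:
                continue
            # log2 ratio of the two genes in every sample
            log_ratios = [math.log2(a / b) for a, b in zip(expr[g], expr[h])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[g] = mean(pairwise_sd)
    return m_values


# Hypothetical relative expression values for three candidate genes.
example_expr = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],  # constant ratio to geneA
    "geneC": [1.0, 8.0, 2.0, 4.0],   # ratio to the others varies
}
example_m = genorm_m(example_expr)
least_stable = max(example_m, key=example_m.get)
```

In this toy input, geneA and geneB keep a constant ratio, so they receive the same, lower M, while geneC is flagged as the least stable candidate.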

  4. Gene Editing and Gene-Based Therapeutics for Cardiomyopathies.

    PubMed

    Ohiri, Joyce C; McNally, Elizabeth M

    2018-04-01

    With an increasing understanding of genetic defects leading to cardiomyopathy, focus is shifting to correcting these underlying genetic defects. One approach involves treating mutant RNA through antisense oligonucleotides; the first drug has received regulatory approval to treat specific mutations associated with Duchenne muscular dystrophy. Gene editing is being evaluated in the preclinical setting. For inherited cardiomyopathies, genetic correction strategies require tight specificity for the mutant allele. Gene-editing methods are being tested to create deletions that may be useful to restore protein production by bypassing mutations. Site-specific gene editing, which is required to correct many point mutations, is a less efficient process than inducing deletions. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Statistical assessment of crosstalk enrichment between gene groups in biological networks.

    PubMed

    McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L

    2013-01-01

    Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
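
To illustrate the kind of statistic described above, the sketch below counts the links between two gene groups in a network and converts that count to a z-score against a null distribution. It is an assumption-laden toy, not the CrossTalkZ implementation: the null here comes from simple node-label permutation, which is cruder than the topology-preserving network randomizations the abstract recommends, and the network and group names are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_crosstalk(edges, group_a, group_b):
    """Number of edges with one endpoint in group_a and the other in group_b."""
    return sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


def crosstalk_z(edges, group_a, group_b, n_perm=1000, seed=0):
    """Return (observed link count, z-score) under a label-permutation null."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    observed = count_crosstalk(edges, group_a, group_b)
    null_counts = []
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))  # same wiring, shuffled names
        permuted = [(relabel[u], relabel[v]) for u, v in edges]
        null_counts.append(count_crosstalk(permuted, group_a, group_b))
    sd = pstdev(null_counts)
    z = (observed - mean(null_counts)) / sd if sd > 0 else float("nan")
    return observed, z


# Hypothetical network: every a-gene is linked to every b-gene,
# plus an unrelated chain of c-genes.
group_a = {"a1", "a2", "a3"}
group_b = {"b1", "b2", "b3"}
toy_edges = [(a, b) for a in sorted(group_a) for b in sorted(group_b)]
toy_edges += [(f"c{i}", f"c{i + 1}") for i in range(1, 6)]

observed_links, z_score = crosstalk_z(toy_edges, group_a, group_b)
```

A strongly positive z-score indicates more links between the two groups than the chosen null model produces; a p-value or false discovery rate would be derived from it in a further step.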

  6. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

    PubMed Central

    Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

    2008-01-01

    Background: Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results: We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion: EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects. PMID:18402700

  7. Differential network analysis reveals the genome-wide landscape of estrogen receptor modulation in hormonal cancers

    PubMed Central

    Hsiao, Tzu-Hung; Chiu, Yu-Chiao; Hsu, Pei-Yin; Lu, Tzu-Pin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Huang, Tim H.-M.; Chuang, Eric Y.; Chen, Yidong

    2016-01-01

    Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Due to intensive computation, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaption to variations of clinical cohorts. In simulated datasets, MAGIC achieved greatly improved computational efficiency and overall performance superior to that of the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. MATLAB implementation of MAGIC is available for academic use at https://github.com/chiuyc/MAGIC. PMID:26972162
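
The idea of testing whether a gene-gene correlation differs between two modulator states can be illustrated with the textbook Fisher z-test for two independent correlation coefficients. The sketch below shows that standard test only; it is not MAGIC's statistic, whose exact form is not given in the abstract, and the correlation values and sample sizes are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test that two independent Pearson correlations differ.

    r1, r2: correlations of the same gene pair in two sample groups
            (for example, modulator-high and modulator-low samples).
    n1, n2: number of samples in each group (each must exceed 3).
    Returns (z statistic, two-sided p-value from the standard normal).
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)  # Fisher transformation of r2
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical example: strong correlation in one group, weak in the other.
z_mod, p_mod = fisher_z_difference(0.8, 50, 0.1, 50)

# Identical correlations give no evidence of modulation.
z_null, p_null = fisher_z_difference(0.5, 40, 0.5, 40)
```

Because the test needs only two correlations and two sample sizes per gene pair, it is cheap to evaluate across very many pairs, which is in the spirit of the fast computation the abstract emphasizes.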

  8. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

    PubMed

    Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

    2008-04-10

    Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.

  9. Breast cancer prognosis by combinatorial analysis of gene expression data.

    PubMed

    Alexe, Gabriela; Alexe, Sorin; Axelrod, David E; Bonates, Tibérius O; Lozina, Irina I; Reiss, Michael; Hammer, Peter L

    2006-01-01

    The potential of applying data analysis tools to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with the double objective of discovering patterns characteristic for cases with good or poor outcome, using them for accurate and justifiable predictions; and deriving novel information about the role of genes, the existence of special classes of cases, and other factors. Data were analyzed using the combinatorics and optimization-based method of LAD, recently shown to provide highly accurate diagnostic and prognostic systems in cardiology, cancer proteomics, hematology, pulmonology, and other disciplines. LAD identified a subset of 17 of the 25,000 genes, capable of fully distinguishing between patients with poor, respectively good prognoses. An extensive list of 'patterns' or 'combinatorial biomarkers' (that is, combinations of genes and limitations on their expression levels) was generated, and 40 patterns were used to create a prognostic system, shown to have 100% and 92.9% weighted accuracy on the training and test sets, respectively. The prognostic system uses fewer genes than other methods, and has similar or better accuracy than those reported in other studies. Out of the 17 genes identified by LAD, three (respectively, five) were shown to play a significant role in determining poor (respectively, good) prognosis. Two new classes of patients (described by similar sets of covering patterns, gene expression ranges, and clinical features) were discovered. As a by-product of the study, it is shown that the training and the test sets of van 't Veer have differing characteristics. 
The study shows that LAD provides an accurate and fully explanatory prognostic system for breast cancer using genomic data (that is, a system that, in addition to predicting good or poor prognosis, provides an individualized explanation of the reasons for that prognosis for each patient). Moreover, the LAD model provides valuable insights into the roles of individual and combinatorial biomarkers, allows the discovery of new classes of patients, and generates a vast library of biomedical research hypotheses.

  10. A chain reaction approach to modelling gene pathways.

    PubMed

    Cheng, Gary C; Chen, Dung-Tsa; Chen, James J; Soong, Seng-Jaw; Lamartiniere, Coral; Barnes, Stephen

    2012-08-01

    BACKGROUND: Of great interest in cancer prevention is how nutrient components affect gene pathways associated with the physiological events of puberty. Nutrient-gene interactions may cause changes in breast or prostate cells and, therefore, may result in cancer risk later in life. Analysis of gene pathways can lead to insights about nutrient-gene interactions and the development of more effective prevention approaches to reduce cancer risk. To date, researchers have relied heavily upon experimental assays (such as microarray analysis) to identify genes and their associated pathways that are affected by nutrients and diets. However, the vast number of genes and combinations of gene pathways, coupled with the expense of the experimental analyses, has delayed the progress of gene-pathway research. The development of an analytical approach based on available test data could greatly benefit the evaluation of gene pathways, and thus advance the study of nutrient-gene interactions in cancer prevention. In the present study, we have proposed a chain reaction model to simulate gene pathways, in which the gene expression changes through the pathway are represented by the species undergoing a set of chemical reactions. We have also developed a numerical tool to solve for the species changes due to the chain reactions over time. Through this approach we can examine the impact of nutrient-containing diets on the gene pathway; moreover, transformation of genes over time with a nutrient treatment can be observed numerically, which is very difficult to achieve experimentally. We apply this approach to microarray analysis data from an experiment which involved the effects of three polyphenols (nutrient treatments), epigallo-catechin-3-O-gallate (EGCG), genistein, and resveratrol, in a study of nutrient-gene interaction in the estrogen synthesis pathway during puberty. 
RESULTS: In this preliminary study, the estrogen synthesis pathway was simulated by a chain reaction model. By applying it to microarray data, the chain reaction model computed a set of reaction rates to examine the effects of three polyphenols (EGCG, genistein, and resveratrol) on gene expression in this pathway during puberty. We first performed statistical analysis to test the time factor on the estrogen synthesis pathway. Global tests were used to evaluate an overall gene expression change during puberty for each experimental group. Then, a chain reaction model was employed to simulate the estrogen synthesis pathway. Specifically, the model computed the reaction rates in a set of ordinary differential equations to describe interactions between genes in the pathway (A reaction rate K of A to B represents gene A will induce gene B per unit at a rate of K; we give details in the "method" section). Since disparate changes of gene expression may cause numerical error problems in solving these differential equations, we used an implicit scheme to address this issue. We first applied the chain reaction model to obtain the reaction rates for the control group. A sensitivity study was conducted to evaluate how well the model fits to the control group data at Day 50. Results showed a small bias and mean square error. These observations indicated the model is robust to low random noises and has a good fit for the control group. Then the chain reaction model derived from the control group data was used to predict gene expression at Day 50 for the three polyphenol groups. If these nutrients affect the estrogen synthesis pathways during puberty, we expect discrepancy between observed and expected expressions. Results indicated some genes had large differences in the EGCG (e.g., Hsd3b and Sts) and the resveratrol (e.g., Hsd3b and Hrmt12) groups. 
CONCLUSIONS: In the present study, we have presented (I) experimental studies of the effect of nutrient diets on the gene expression changes in a selected estrogen synthesis pathway. This experiment is valuable because it allows us to examine how the nutrient-containing diets regulate gene expression in the estrogen synthesis pathway during puberty; (II) global tests to assess an overall association of this particular pathway with time factor by utilizing generalized linear models to analyze microarray data; and (III) a chain reaction model to simulate the pathway. This is a novel application because we are able to translate the gene pathway into the chemical reactions in which each reaction channel describes gene-gene relationship in the pathway. In the chain reaction model, the implicit scheme is employed to efficiently solve the differential equations. Data analysis results show the proposed model is capable of predicting gene expression changes and demonstrating the effect of nutrient-containing diets on gene expression changes in the pathway. One of the objectives of this study is to explore and develop a numerical approach for simulating the gene expression change so that it can be applied and calibrated when the data of more time slices are available, and thus can be used to interpolate the expression change at a desired time point without conducting expensive experiments for a large number of time points. Hence, we are not claiming this is either essential or the most efficient way for simulating this problem, but rather a mathematical/numerical approach that can model the expression change of a large set of genes of a complex pathway. In addition, we understand the limitation of this experiment and realize that it is still far from being a complete model of predicting nutrient-gene interactions. 
The reason is that in the present model, the reaction rates were estimated based on available data at two time points; hence, the gene expression change is dependent upon the reaction rates and a linear function of the gene expressions. More data sets containing gene expression at various time slices are needed in order to improve the present model so that a non-linear variation of gene expression changes at different times can be predicted.
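
The abstract above refers to solving reaction-rate differential equations with an implicit scheme because disparate rates cause numerical trouble. The sketch below illustrates that general idea with backward (implicit) Euler on an assumed first-order conversion chain x0 -> x1 -> ... -> x(n-1). The chain structure, rate constants and step size are illustrative assumptions, not the authors' equations or fitted values.

```python
def backward_euler_chain(x0, rates, dt, n_steps):
    """Integrate a linear reaction chain with the backward Euler method.

    x0:    initial amounts of the species, in chain order.
    rates: rates[i] is the first-order rate of species i -> species i + 1,
           so len(rates) == len(x0) - 1.
    Each step solves (I - dt * K) x_new = x_old. For a chain, K is lower
    bidiagonal, so the system is solved by forward substitution.
    """
    x = list(x0)
    n = len(x)
    if len(rates) != n - 1:
        raise ValueError("need exactly one rate per link in the chain")
    for _ in range(n_steps):
        new = [0.0] * n
        for i in range(n):
            inflow = rates[i - 1] * new[i - 1] if i > 0 else 0.0
            out_rate = rates[i] if i < n - 1 else 0.0
            new[i] = (x[i] + dt * inflow) / (1.0 + dt * out_rate)
        x = new
    return x


# Hypothetical stiff chain: one very fast step followed by a very slow one.
stiff_rates = [1.0e4, 1.0e-2]
final_state = backward_euler_chain([1.0, 0.0, 0.0], stiff_rates, dt=1.0, n_steps=50)
total_amount = sum(final_state)
```

With rates this disparate, an explicit Euler step of the same size would diverge, whereas the implicit update stays non-negative and conserves the total amount, which is the practical reason for preferring an implicit scheme in such models.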

  11. Quantitative assessment of Hox complex expression in the indirect development of the polychaete annelid Chaetopterus sp

    NASA Technical Reports Server (NTRS)

    Peterson, K. J.; Irvine, S. Q.; Cameron, R. A.; Davidson, E. H.

    2000-01-01

    A prediction from the set-aside theory of bilaterian origins is that pattern formation processes such as those controlled by the Hox cluster genes are required specifically for adult body plan formation. This prediction can be tested in animals that use maximal indirect development, in which the embryonic formation of the larva and the postembryonic formation of the adult body plan are temporally and spatially distinct. To this end, we quantitatively measured the amount of transcripts for five Hox genes in embryos of a lophotrochozoan, the polychaete annelid Chaetopterus sp. The polychaete Hox complex is shown not to be expressed during embryogenesis, but transcripts of all measured Hox complex genes are detected at significant levels during the initial stages of adult body plan formation. Temporal colinearity in the sequence of their activation is observed, so that activation follows the 3'-5' arrangement of the genes. Moreover, Hox gene expression is spatially localized to the region of teloblastic set-aside cells of the later-stage embryos. This study shows that an indirectly developing lophotrochozoan shares with an indirectly developing deuterostome, the sea urchin, a common mode of Hox complex utilization: construction of the larva, whether a trochophore or dipleurula, does not involve Hox cluster expression, but in both forms the complex is expressed in the set-aside cells from which the adult body plan derives.

  12. Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes

    PubMed Central

    Premzl, Marko

    2015-01-01

    Using eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and d-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks of future experiments of two eutherian gene data sets were proposed. PMID:25941635

  13. Development of a cross-platform biomarker signature to detect renal transplant tolerance in humans

    PubMed Central

    Sagoo, Pervinder; Perucha, Esperanza; Sawitzki, Birgit; Tomiuk, Stefan; Stephens, David A.; Miqueu, Patrick; Chapman, Stephanie; Craciun, Ligia; Sergeant, Ruhena; Brouard, Sophie; Rovis, Flavia; Jimenez, Elvira; Ballow, Amany; Giral, Magali; Rebollo-Mesa, Irene; Le Moine, Alain; Braudeau, Cecile; Hilton, Rachel; Gerstmayer, Bernhard; Bourcier, Katarzyna; Sharif, Adnan; Krajewska, Magdalena; Lord, Graham M.; Roberts, Ian; Goldman, Michel; Wood, Kathryn J.; Newell, Kenneth; Seyfert-Margolis, Vicki; Warrens, Anthony N.; Janssen, Uwe; Volk, Hans-Dieter; Soulillou, Jean-Paul; Hernandez-Fuentes, Maria P.; Lechler, Robert I.

    2010-01-01

    Identifying transplant recipients in whom immunological tolerance is established or is developing would allow an individually tailored approach to their posttransplantation management. In this study, we aimed to develop reliable and reproducible in vitro assays capable of detecting tolerance in renal transplant recipients. Several biomarkers and bioassays were screened on a training set that included 11 operationally tolerant renal transplant recipients, recipient groups following different immunosuppressive regimes, recipients undergoing chronic rejection, and healthy controls. Highly predictive assays were repeated on an independent test set that included 24 tolerant renal transplant recipients. Tolerant patients displayed an expansion of peripheral blood B and NK lymphocytes, fewer activated CD4+ T cells, a lack of donor-specific antibodies, donor-specific hyporesponsiveness of CD4+ T cells, and a high ratio of forkhead box P3 to α-1,2-mannosidase gene expression. Microarray analysis further revealed in tolerant recipients a bias toward differential expression of B cell–related genes and their associated molecular pathways. By combining these indices of tolerance as a cross-platform biomarker signature, we were able to identify tolerant recipients in both the training set and the test set. This study provides an immunological profile of the tolerant state that, with further validation, should inform and shape drug-weaning protocols in renal transplant recipients. PMID:20501943

  14. Microarray and network-based identification of functional modules and pathways of active tuberculosis.

    PubMed

    Bian, Zhong-Rui; Yin, Juan; Sun, Wen; Lin, Dian-Jie

    2017-04-01

    Diagnosis of active tuberculosis (TB) is challenging, and treatment response is also difficult to monitor efficiently. The aim of this study was to apply an integrated analysis of microarray and network-based methods to samples from publicly available datasets to obtain a diagnostic module set and pathways in active TB. Towards this goal, a background protein-protein interaction (PPI) network was generated based on global PPI information and gene expression data, followed by identification of a differential expression network (DEN) from the background PPI network. Then, ego genes were extracted according to the degree features in the DEN. Next, module collection was conducted by ego gene expansion based on the EgoNet algorithm. After that, differential expression of modules between active TB and controls was evaluated using a random permutation test. Finally, the biological significance of differential modules was assessed by pathway enrichment analysis based on the Reactome database, and Fisher's exact test was implemented to extract differential pathways for active TB. In total, 47 ego genes and 47 candidate modules were identified from the DEN. By setting the cutoff criteria of gene size >5 and classification accuracy ≥0.9, 7 ego modules (Module 4, Module 7, Module 9, Module 19, Module 25, Module 38 and Module 43) were extracted, and all of them differed significantly between active TB and controls. Then, Fisher's exact test was conducted to capture differential pathways for active TB. Interestingly, genes in Module 4, Module 25, Module 38, and Module 43 were enriched in the same pathway, formation of a pool of free 40S subunits. The significant pathway for Module 7 and Module 9 was eukaryotic translation termination, and for Module 19 it was nonsense mediated decay enhanced by the exon junction complex (EJC). 
Accordingly, differential modules and pathways might be potential biomarkers for treating active TB, and provide valuable clues for better understanding of molecular mechanism of active TB. Copyright © 2017 Elsevier Ltd. All rights reserved.
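
    The pathway enrichment step in the record above relies on Fisher's exact test. As a minimal sketch (not the authors' code), the one-sided version of that test for a single module and a single pathway reduces to a hypergeometric upper tail. The function name, its parameters, and the counts in the usage note are hypothetical and chosen only for illustration.

```python
from math import comb

def fisher_enrichment_p(universe, pathway, module, overlap):
    """One-sided Fisher's exact test for over-representation.

    universe: number of genes in the background set
    pathway:  number of background genes annotated to the pathway
    module:   number of genes in the module being tested
    overlap:  number of module genes that are in the pathway
    Returns P(X >= overlap) under the hypergeometric null.
    """
    if max(pathway, module) > universe or not 0 <= overlap <= min(pathway, module):
        raise ValueError("inconsistent counts")
    total = comb(universe, module)
    upper = min(pathway, module)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway, i) * comb(universe - pathway, module - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

    For example, with a hypothetical background of 20 genes, a 5-gene pathway and a 5-gene module sharing 3 genes, `fisher_enrichment_p(20, 5, 5, 3)` evaluates the tail 1126/15504. In practice the resulting p-values would still need multiple-testing correction across pathways.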

  15. Common Variants within Oxidative Phosphorylation Genes Influence Risk of Ischemic Stroke and Intracerebral Hemorrhage

    PubMed Central

    Anderson, Christopher D.; Biffi, Alessandro; Nalls, Michael A.; Devan, William J.; Schwab, Kristin; Ayres, Alison M.; Valant, Valerie; Ross, Owen A.; Rost, Natalia S.; Saxena, Richa; Viswanathan, Anand; Worrall, Bradford B.; Brott, Thomas G.; Goldstein, Joshua N.; Brown, Devin; Broderick, Joseph P.; Norrving, Bo; Greenberg, Steven M.; Silliman, Scott L.; Hansen, Björn M.; Tirschwell, David L.; Lindgren, Arne; Slowik, Agnieszka; Schmidt, Reinhold; Selim, Magdy; Roquer, Jaume; Montaner, Joan; Singleton, Andrew B.; Kidwell, Chelsea S.; Woo, Daniel; Furie, Karen L.; Meschia, James F.; Rosand, Jonathan

    2013-01-01

    Background and Purpose Prior studies demonstrated association between mitochondrial DNA variants and ischemic stroke (IS). We investigated whether variants within a larger set of oxidative phosphorylation (OXPHOS) genes encoded by both autosomal and mitochondrial DNA were associated with risk of IS and, based on our results, extended our investigation to intracerebral hemorrhage (ICH). Methods This association study employed a discovery cohort of 1643 individuals, a validation cohort of 2432 individuals for IS, and an extension cohort of 1476 individuals for ICH. Gene-set enrichment analysis (GSEA) was performed on all structural OXPHOS genes, as well as genes contributing to individual respiratory complexes. Gene-sets passing GSEA were tested by constructing genetic scores using common variants residing within each gene. Associations between each variant and IS that emerged in the discovery cohort were examined in validation and extension cohorts. Results IS was associated with genetic risk scores in OXPHOS as a whole (odds ratio (OR)=1.17, p=0.008) and Complex I (OR=1.06, p=0.050). Among IS subtypes, small vessel (SV) stroke showed association with OXPHOS (OR=1.16, p=0.007), Complex I (OR=1.13, p=0.027) and Complex IV (OR 1.14, p=0.018). To further explore this SV association, we extended our analysis to ICH, revealing association between deep hemispheric ICH and Complex IV (OR=1.08, p=0.008). Conclusions This pathway analysis demonstrates association between common genetic variants within OXPHOS genes and stroke. The associations for SV stroke and deep ICH suggest that genetic variation in OXPHOS influences small vessel pathobiology. Further studies are needed to identify culprit genetic variants and assess their functional consequences. PMID:23362085
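
    The record above tests gene sets by building genetic scores from common variants. The abstract does not say how the scores were weighted, so the following is only a generic sketch of an additive genetic risk score: each variant contributes its risk-allele count (0, 1 or 2), optionally multiplied by a per-variant weight such as a log odds ratio. The function name and the weights are assumptions, not values from the study.

```python
def genetic_risk_score(dosages, weights=None):
    """Additive genetic risk score for one individual.

    dosages: risk-allele counts (0, 1 or 2), one per variant
    weights: optional per-variant weights (e.g. log odds ratios);
             defaults to 1.0 each, giving a simple allele count
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("dosages and weights must have the same length")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("each dosage must be 0, 1 or 2")
    return sum(w * d for w, d in zip(weights, dosages))
```

    Scores computed this way for cases and controls would then typically be entered into a logistic regression to obtain an odds ratio per unit of score.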

  16. Inter- and intra-species variation in genome-wide gene expression of Drosophila in response to parasitoid wasp attack.

    PubMed

    Salazar-Jaramillo, Laura; Jalvingh, Kirsten M; de Haan, Ammerins; Kraaijeveld, Ken; Buermans, Henk; Wertheim, Bregje

    2017-04-27

    Parasitoid resistance in Drosophila varies considerably, among and within species. An immune response, lamellocyte-mediated encapsulation, evolved in a subclade of Drosophila and was subsequently lost in at least one species within this subclade. While the mechanisms of resistance are fairly well documented in D. melanogaster, much less is known for closely related species. Here, we studied the inter- and intra-species variation in gene expression after parasitoid attack in Drosophila. We used RNA-seq after parasitization of four closely related Drosophila species of the melanogaster subgroup and replicated lines of D. melanogaster experimentally selected for increased resistance to gain insights into short- and long-term evolutionary changes. We found a core set of genes that are consistently up-regulated after parasitoid attack in the species and lines tested, regardless of their level of resistance. Another set of genes showed no up-regulation or expression in D. sechellia, the species unable to raise an immune response against parasitoids. This set consists largely of genes that are lineage-restricted to the melanogaster subgroup. Artificially selected lines did not show significant differences in gene expression with respect to non-selected lines in their responses to parasitoid attack, but several genes showed differential exon usage. We showed substantial similarities, but also notable differences, in the transcriptional responses to parasitoid attack among four closely related Drosophila species. In contrast, within D. melanogaster, the responses were remarkably similar. We confirmed that in the short-term, selection does not act on a pre-activation of the immune response. Instead it may target alternative mechanisms such as differential exon usage. In the long-term, we found support for the hypothesis that the ability to immunologically resist parasitoid attack is contingent on new genes that are restricted to the melanogaster subgroup.

  17. Training set selection for the prediction of essential genes.

    PubMed

    Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng

    2014-01-01

    Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.

  18. Evaluating Reported Candidate Gene Associations with Polycystic Ovary Syndrome

    PubMed Central

    Pau, Cindy; Saxena, Richa; Welt, Corrine Kolka

    2013-01-01

    Objective: To replicate variants in candidate genes associated with PCOS in a population of European PCOS and control subjects. Design: Case-control association analysis and meta-analysis. Setting: Major academic hospital. Patients: Women of European ancestry with PCOS (n=525) and controls (n=472), aged 18 to 45 years. Intervention: Variants previously associated with PCOS in candidate gene studies were genotyped (n=39). Metabolic, reproductive and anthropomorphic parameters were examined as a function of the candidate variants. All genetic association analyses were adjusted for age, BMI and ancestry and were reported after correction for multiple testing. Main Outcome Measure: Association of candidate gene variants with PCOS. Results: Three variants, rs3797179 (SRD5A1), rs12473543 (POMC), and rs1501299 (ADIPOQ), were nominally associated with PCOS. However, they did not remain significant after correction for multiple testing and none of the variants replicated in a sufficiently powered meta-analysis. Variants in the FBN3 gene (rs17202517 and rs73503752) were associated with smaller waist circumferences and variant rs727428 in the SHBG gene was associated with lower SHBG levels. Conclusion: Previously identified variants in candidate genes do not appear to be associated with PCOS risk. PMID:23375202

  19. Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

    PubMed Central

    Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

    2005-01-01

    Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
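
    The idea in the record above, identifying each sequence by the presence or absence of a few shared sub-sequences, can be illustrated with a greedy heuristic. This is a sketch under stated assumptions, not the authors' method: it repeatedly picks the sub-sequence of length k that separates the most still-indistinguishable pairs, which does not guarantee a minimal set. All function names are hypothetical.

```python
from itertools import combinations

def kmers(seq, k):
    """Set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def greedy_distinguishing_kmers(seqs, k):
    """Greedily choose k-mers whose presence/absence pattern is unique per sequence."""
    present = [kmers(s, k) for s in seqs]
    candidates = sorted(set().union(*present))
    # Pairs of sequences that no chosen k-mer separates yet.
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        best, best_split = None, set()
        for km in candidates:
            split = {(i, j) for i, j in unresolved
                     if (km in present[i]) != (km in present[j])}
            if len(split) > len(best_split):
                best, best_split = km, split
        if best is None:
            raise ValueError("sequences cannot be distinguished with k-mers of this length")
        chosen.append(best)
        unresolved -= best_split
    return chosen

def signature(seq, probes):
    """Presence/absence pattern of the chosen probes in one sequence."""
    return tuple(p in seq for p in probes)
```

    A k-mer that splits the remaining targets roughly in half resolves the most pairs per probe, which mirrors the abstract's point that bisectionally distributed characters are the most efficient.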

  20. Adaptation to climate through flowering phenology: a case study in Medicago truncatula.

    PubMed

    Burgarella, Concetta; Chantret, Nathalie; Gay, Laurène; Prosperi, Jean-Marie; Bonhomme, Maxime; Tiffin, Peter; Young, Nevin D; Ronfort, Joelle

    2016-07-01

    Local climatic conditions likely constitute an important selective pressure on genes underlying important fitness-related traits such as flowering time, and in many species, flowering phenology and climatic gradients strongly covary. To test whether climate shapes the genetic variation on flowering time genes and to identify candidate flowering genes involved in the adaptation to environmental heterogeneity, we used a large Medicago truncatula core collection to examine the association between nucleotide polymorphisms at 224 candidate genes and both climate variables and flowering phenotypes. Unlike genome-wide studies, candidate gene approaches are expected to enrich for the number of meaningful trait associations because they specifically target genes that are known to affect the trait of interest. We found that flowering time mediates adaptation to climatic conditions mainly by variation at genes located upstream in the flowering pathways, close to the environmental stimuli. Variables related to the annual precipitation regime reflected selective constraints on flowering time genes better than the other variables tested (temperature, altitude, latitude or longitude). By comparing phenotype and climate associations, we identified 12 flowering genes as the most promising candidates responsible for phenological adaptation to climate. Four of these genes were located in the known flowering time QTL region on chromosome 7. However, climate and flowering associations also highlighted largely distinct gene sets, suggesting different genetic architectures for adaptation to climate and flowering onset. © 2016 John Wiley & Sons Ltd.

  1. The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies

    PubMed Central

    Barnett, Ian; Mukherjee, Rajarshi; Lin, Xihong

    2017-01-01

    It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. These genetic constructs each contain multiple SNPs, which are often correlated and function jointly, and might be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, due to linkage disequilibrium among SNPs and the finite number of SNPs in an SNP set in each genetic construct. The proposed GHC overcomes the limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP-set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP-set. We obtain the detection boundary of the GHC test. We compared empirically using simulations the power of the GHC method with existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online. PMID:28736464
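
    For orientation, the classical higher criticism statistic that the GHC generalizes can be computed directly from a set of p-values. The sketch below implements only that classical form, HC = max over i of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))) on the sorted p-values, which assumes independent tests. It does not implement the GHC correction for correlated SNPs described in the abstract, and the function name is hypothetical.

```python
import math

def higher_criticism(pvalues):
    """Classical higher criticism statistic for independent p-values."""
    n = len(pvalues)
    if n == 0 or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must be non-empty and strictly between 0 and 1")
    ordered = sorted(pvalues)
    # Compare the empirical fraction i/n with each ordered p-value,
    # standardized by the binomial standard deviation at that p-value.
    return max(
        math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        for i, p in enumerate(ordered, start=1)
    )
```

    Large values indicate that more small p-values are present than expected under the global null, which is why the statistic suits sparse signals within an SNP set.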

  2. Inversion of exons 1-7 of the MSH2 gene is a frequent cause of unexplained Lynch syndrome in one local population.

    PubMed

    Rhees, Jennifer; Arnold, Mildred; Boland, C Richard

    2014-06-01

    Germline mutations in DNA mismatch repair (MMR) genes, such as MSH2, cause Lynch syndrome, an autosomal dominant predisposition to colorectal as well as other cancers. Our research clinic focuses on hereditary colorectal cancer, and over the past 9 years we have identified germline mutations in DNA MMR genes in 101 patients using commercial genetic reference laboratories. We also collected samples from twelve patients with absent MSH2 protein expression and microsatellite instability in tumor tissue, with a family history suggestive of Lynch syndrome, but negative germline test results. The most likely explanation for this set of results is that the germline testing did not detect true germline mutations in these patients. Two of our patients with failed commercial testing were later found to have deletions in the 3' region of EPCAM, the gene just upstream of MSH2, but no explanation could be found for inactivation of MSH2 in the other ten patients. We used allelic dropout in long PCR to look for potential regions of rearrangement in the MSH2 gene. This method detected a potential rearrangement breakpoint in the same region of MSH2 where one breakpoint of a 10 Mb inversion was reported previously. We tested these ten patients for this inversion. Six of 10 patients had the inversion, indicating the importance of including testing for this inversion in patients suspected of having MSH2-type Lynch syndrome in our population. Additionally, this method could be further developed to look for inversions in other genes where current methods of testing fail to find a causative mutation.

  3. Positive-unlabeled learning for disease gene identification

    PubMed Central

    Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong

    2012-01-01

    Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. The weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:22923290

  4. Gene array analysis reveals a common Runx transcriptional program controlling cell adhesion and survival

    PubMed Central

    Wotton, Sandy; Terry, Anne; Kilbey, Anna; Jenkins, Alma; Herzyk, Pawel; Cameron, Ewan; Neil, James C.

    2008-01-01

    The Runx genes play divergent roles in development and cancer, where they can act either as oncogenes or tumour suppressors. We compared the effects of ectopic Runx expression in established fibroblasts, where all three genes produce an indistinguishable phenotype entailing epithelioid morphology and increased cell survival under stress conditions. Gene array analysis revealed a strongly overlapping transcriptional signature, with no examples of opposing regulation of the same target gene. A common set of 50 highly regulated genes was identified after further filtering on regulation by inducible RUNX1-ER. This set revealed a strong bias towards genes with annotated roles in cancer and development, and a preponderance of targets encoding extracellular or surface proteins, reflecting the marked effects of Runx on cell adhesion. Furthermore, in silico prediction of resistance to glucocorticoid growth inhibition was confirmed in fibroblasts and lymphoid cells expressing ectopic Runx. The effects of fibroblast expression of common RUNX1 fusion oncoproteins (RUNX1-ETO, TEL-RUNX1, CBFB-MYH11) were also tested. While two direct Runx activation target genes were repressed (Ncam1, Rgc32), the fusion proteins appeared to disrupt regulation of down-regulated targets (Cebpd, Id2, Rgs2) rather than impose constitutive repression. These results elucidate the oncogenic potential of the Runx family and reveal novel targets for therapeutic inhibition. PMID:18560354

  5. A 16-Gene Signature Distinguishes Anaplastic Astrocytoma from Glioblastoma

    PubMed Central

    Rao, Soumya Alige Mahabala; Srinivasan, Sujaya; Patric, Irene Rosita Pia; Hegde, Alangar Sathyaranjandas; Chandramouli, Bangalore Ashwathnarayanara; Arimappamagan, Arivazhagan; Santosh, Vani; Kondaiah, Paturu; Rao, Manchanahalli R. Sathyanarayana; Somasundaram, Kumaravel

    2014-01-01

    Anaplastic astrocytoma (AA; Grade III) and glioblastoma (GBM; Grade IV) are diffusely infiltrating tumors and are called malignant astrocytomas. The treatment regimen and prognosis are distinctly different between anaplastic astrocytoma and glioblastoma patients. Although histopathology based current grading system is well accepted and largely reproducible, intratumoral histologic variations often lead to difficulties in classification of malignant astrocytoma samples. In order to obtain a more robust molecular classifier, we analysed RT-qPCR expression data of 175 differentially regulated genes across astrocytoma using Prediction Analysis of Microarrays (PAM) and found the most discriminatory 16-gene expression signature for the classification of anaplastic astrocytoma and glioblastoma. The 16-gene signature obtained in the training set was validated in the test set with diagnostic accuracy of 89%. Additionally, validation of the 16-gene signature in multiple independent cohorts revealed that the signature predicted anaplastic astrocytoma and glioblastoma samples with accuracy rates of 99%, 88%, and 92% in TCGA, GSE1993 and GSE4422 datasets, respectively. The protein-protein interaction network and pathway analysis suggested that the 16-genes of the signature identified epithelial-mesenchymal transition (EMT) pathway as the most differentially regulated pathway in glioblastoma compared to anaplastic astrocytoma. In addition to identifying 16 gene classification signature, we also demonstrated that genes involved in epithelial-mesenchymal transition may play an important role in distinguishing glioblastoma from anaplastic astrocytoma. PMID:24475040

  6. In vitro perturbations of targets in cancer hallmark processes predict rodent chemical carcinogenesis.

    PubMed

    Kleinstreuer, Nicole C; Dix, David J; Houck, Keith A; Kavlock, Robert J; Knudsen, Thomas B; Martin, Matthew T; Paul, Katie B; Reif, David M; Crofton, Kevin M; Hamilton, Kerry; Hunter, Ronald; Shah, Imran; Judson, Richard S

    2013-01-01

    Thousands of untested chemicals in the environment require efficient characterization of carcinogenic potential in humans. A proposed solution is rapid testing of chemicals using in vitro high-throughput screening (HTS) assays for targets in pathways linked to disease processes to build models for priority setting and further testing. We describe a model for predicting rodent carcinogenicity based on HTS data from 292 chemicals tested in 672 assays mapping to 455 genes. All data come from the EPA ToxCast project. The model was trained on a subset of 232 chemicals with in vivo rodent carcinogenicity data in the Toxicity Reference Database (ToxRefDB). Individual HTS assays strongly associated with rodent cancers in ToxRefDB were linked to genes, pathways, and hallmark processes documented to be involved in tumor biology and cancer progression. Rodent liver cancer endpoints were linked to well-documented pathways such as peroxisome proliferator-activated receptor signaling and TP53 and novel targets such as PDE5A and PLAUR. Cancer hallmark genes associated with rodent thyroid tumors were found to be linked to human thyroid tumors and autoimmune thyroid disease. A model was developed in which these genes/pathways function as hypothetical enhancers or promoters of rat thyroid tumors, acting secondary to the key initiating event of thyroid hormone disruption. A simple scoring function was generated to identify chemicals with significant in vitro evidence that was predictive of in vivo carcinogenicity in different rat tissues and organs. This scoring function was applied to an external test set of 33 compounds with carcinogenicity classifications from the EPA's Office of Pesticide Programs and successfully (p = 0.024) differentiated between chemicals classified as "possible"/"probable"/"likely" carcinogens and those designated as "not likely" or with "evidence of noncarcinogenicity." This model represents a chemical carcinogenicity prioritization tool supporting targeted testing and functional validation of cancer pathways.

  7. Inter-species pathway perturbation prediction via data-driven detection of functional homology.

    PubMed

    Hafemeister, Christoph; Romero, Roberto; Bilal, Erhan; Meyer, Pablo; Norel, Raquel; Rhrissorrakrai, Kahn; Bonneau, Richard; Tarca, Adi L

    2015-02-15

    Experiments in animal models are often conducted to infer how humans will respond to stimuli by assuming that the same biological pathways will be affected in both organisms. The limitations of this assumption were tested in the IMPROVER Species Translation Challenge, where 52 stimuli were applied to both human and rat cells and perturbed pathways were identified. In the Inter-species Pathway Perturbation Prediction sub-challenge, multiple teams proposed methods to use rat transcription data from 26 stimuli to predict human gene set and pathway activity under the same perturbations. Submissions were evaluated using three performance metrics on data from the remaining 26 stimuli. We present two approaches, ranked second in this challenge, that do not rely on sequence-based orthology between rat and human genes to translate pathway perturbation state but instead identify transcriptional response orthologs across a set of training conditions. The translation from rat to human accomplished by these so-called direct methods is not dependent on the particular analysis method used to identify perturbed gene sets. In contrast, machine learning-based methods require performing a pathway analysis initially and then mapping the pathway activity between organisms. Unlike most machine learning approaches, direct methods can be used to predict the activation of a human pathway for a new (test) stimulus, even when that pathway was never activated by a training stimulus. Gene expression data are available from ArrayExpress (accession E-MTAB-2091), while software implementations are available from http://bioinformaticsprb.med.wayne.edu?p=50 and http://goo.gl/hJny3h. Contact: christoph.hafemeister@nyu.edu or atarca@med.wayne.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  8. A Host-Based RT-PCR Gene Expression Signature to Identify Acute Respiratory Viral Infection

    PubMed Central

    Zaas, Aimee K.; Burke, Thomas; Chen, Minhua; McClain, Micah; Nicholson, Bradly; Veldman, Timothy; Tsalik, Ephraim L.; Fowler, Vance; Rivers, Emanuel P.; Otero, Ronny; Kingsmore, Stephen F.; Voora, Deepak; Lucas, Joseph; Hero, Alfred O.; Carin, Lawrence; Woods, Christopher W.; Ginsburg, Geoffrey S.

    2014-01-01

    Improved ways to diagnose acute respiratory viral infections could decrease inappropriate antibacterial use and serve as a vital triage mechanism in the event of a potential viral pandemic. Measurement of the host response to infection is an alternative to pathogen-based diagnostic testing and may improve diagnostic accuracy. We have developed a host-based assay with a reverse transcription polymerase chain reaction (RT-PCR) TaqMan low-density array (TLDA) platform for classifying respiratory viral infection. We developed the assay using two cohorts experimentally infected with influenza A H3N2/Wisconsin or influenza A H1N1/Brisbane, and validated the assay in a sample of adults presenting to the emergency department with fever (n = 102) and in healthy volunteers (n = 41). Peripheral blood RNA samples were obtained from individuals who underwent experimental viral challenge or who presented to the emergency department and had microbiologically proven viral respiratory infection or systemic bacterial infection. The selected gene set on the RT-PCR TLDA assay classified participants with experimentally induced influenza H3N2 and H1N1 infection with 100 and 87% accuracy, respectively. We validated this host gene expression signature in a cohort of 102 individuals arriving at the emergency department. The sensitivity of the RT-PCR test was 89% [95% confidence interval (CI), 72 to 98%], and the specificity was 94% (95% CI, 86 to 99%). These results show that RT-PCR–based detection of a host gene expression signature can classify individuals with respiratory viral infection and sets the stage for prospective evaluation of this diagnostic approach in a clinical setting. PMID:24048524
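
    The sensitivity and specificity figures in the record above are proportions with confidence intervals. The abstract does not state which interval method was used or give the underlying counts, so the sketch below uses the Wilson score interval as an assumption and takes hypothetical counts as input. Function names are illustrative only.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

def sensitivity_specificity(tp, fn, tn, fp):
    """Point estimates and Wilson intervals from a 2x2 diagnostic table."""
    sens = tp / (tp + fn)   # true positives among all infected
    spec = tn / (tn + fp)   # true negatives among all uninfected
    return (sens, wilson_interval(tp, tp + fn)), (spec, wilson_interval(tn, tn + fp))
```

    Calling `sensitivity_specificity(tp, fn, tn, fp)` with the counts from a validation cohort returns each estimate together with its interval. Small groups produce wide intervals, which is why the interval matters as much as the point estimate.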

  9. Comparison of a PfHRP2-based rapid diagnostic test and PCR for malaria in a low prevalence setting in rural southern Zambia: implications for elimination.

    PubMed

    Laban, Natasha M; Kobayashi, Tamaki; Hamapumbu, Harry; Sullivan, David; Mharakurwa, Sungano; Thuma, Philip E; Shiff, Clive J; Moss, William J

    2015-01-28

    Rapid diagnostic tests (RDTs) detecting histidine-rich protein 2 (PfHRP2) antigen are used to identify individuals with Plasmodium falciparum infection even in low transmission settings seeking to achieve elimination. However, these RDTs lack sensitivity to detect low-density infections, produce false negatives for P. falciparum strains lacking the pfhrp2 gene and do not detect species other than P. falciparum. Results of a PfHRP2-based RDT and Plasmodium nested PCR were compared in a region of declining malaria transmission in southern Zambia using samples from community-based, cross-sectional surveys from 2008 to 2012. Participants were tested with a PfHRP2-based RDT and a finger prick blood sample was spotted onto filter paper for PCR analysis and used to prepare blood smears for microscopy. Species-specific, real-time, quantitative PCR (q-PCR) was performed on samples that tested positive either by microscopy, RDT or nested PCR. Of 3,292 total participants enrolled, 12 (0.4%) tested positive by microscopy and 42 (1.3%) by RDT. Of 3,213 (98%) samples tested by nested PCR, 57 (1.8%) were positive, resulting in 87 participants positive by at least one of the three tests. Of these, 61 tested positive for P. falciparum by q-PCR with copy numbers ≤ 2 × 10^3 copies/μL, 5 were positive for both P. falciparum and Plasmodium malariae and 2 were positive for P. malariae alone. RDT detected 32 (53%) of P. falciparum positives, failing to detect three of the dual infections with P. malariae. Among 2,975 participants enrolled during a low transmission period between 2009 and 2012, sensitivity of the PfHRP2-based RDT compared to nested PCR was only 17%, with specificity of >99%. The pfhrp gene was detected in 80% of P. falciparum positives; however, comparison of copy number between RDT negative and RDT positive samples suggested that RDT negatives resulted from low parasitaemia and not pfhrp2 gene deletion. Low-density P. falciparum infections not identified by currently used PfHRP2-based RDTs and the inability to detect non-falciparum malaria will hinder progress to further reduce malaria in low transmission settings of Zambia. More sensitive and specific diagnostic tests will likely be necessary to identify parasite reservoirs and achieve malaria elimination.

  10. GO-based functional dissimilarity of gene sets.

    PubMed

    Díaz-Díaz, Norberto; Aguilar-Ruiz, Jesús S

    2011-09-01

    The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, identifying the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  11. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for the Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root-predominant promoters, a set of three promoters was tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from the Arabidopsis root-specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. 
A set of genes was targeted for post-transcriptional silencing in the model Arabidopsis system; these include the floral meristem identity gene (APETALA1 or AP1), auxin response factor gene (ETTIN), the gene encoding transcriptional factor of WD40 family (TRANSPARENTTESTAGLABRA1 or TTG1), and the auxin efflux carrier (PIN-FORMED2 or PIN2) gene. More than 220 transgenic lines of the 1st, 2nd and 3rd generations were analyzed for RNAi suppression phenotypes (Filichkin et al., manuscript submitted). A total of 108 constructs were supplied by ORNL, UF and OSU and used to generate over 1,881 PCR verified transgenic Populus and over 300 PCR verified transgenic Arabidopsis events. The Populus transgenics alone required Agrobacterium co-cultivations of 124,406 explants.

  12. Perceptions and understanding of genetics and genetic eye disease and attitudes to genetic testing and gene therapy in a primary eye care setting.

    PubMed

    Ganne, Pratyusha; Garrioch, Robert; Votruba, Marcela

    2015-03-01

    Genetic eye pathology represents a significant percentage of the causes of blindness in industrialized countries. This study explores the level of understanding and perceptions of genetics and inherited eye diseases and the attitudes to genetic testing and gene therapy. The study was conducted in two parts. Participant groups included were: undergraduate students of optometry, primary eye care professionals and members of the general public. A preliminary study aimed to understand perceptions and to explore the level of knowledge about genetics in general, eye genetics and gene therapy. A second survey was designed to explore attitudes to genetic testing and gene therapy. The majority of participants (82%) perceived genetics as an important science. However, none of them showed a high level of understanding of genetics and inherited eye diseases. Undergraduate students and primary eye care professionals were better informed about inherited eye diseases than the general public (p = 0.001). The majority (80%) across all three groups had a positive attitude to genetic testing and gene therapy. There was a lack of knowledge about the genetic services available among all groups of participants. This calls for serious thinking about the level of dissemination of information about genetics and inherited eye diseases. It shows a broadly supportive attitude to genomic medicine among the public. Improving public awareness and education in inherited eye diseases can improve the utility of genetic testing and therapy.

  13. Independent test assessment using the extreme value distribution theory.

    PubMed

    Almeida, Marcio; Blondell, Lucy; Peralta, Juan M; Kent, Jack W; Jun, Goo; Teslovich, Tanya M; Fuchsberger, Christian; Wood, Andrew R; Manning, Alisa K; Frayling, Timothy M; Cingolani, Pablo E; Sladek, Robert; Dyer, Thomas D; Abecasis, Goncalo; Duggirala, Ravindranath; Blangero, John

    2016-01-01

    The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by the overreduction of statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of independent tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis tested by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. 
The first gene, PHF14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of the development of alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
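
    The effect of the 15% effective-test estimate on a Bonferroni-style significance threshold can be sketched as follows. Only the 15% fraction comes from the abstract; the family-wise error rate of 0.05 and the count of one million tested variants are illustrative assumptions, and the abstract's extreme-value procedure for estimating the effective number is not reproduced here.

```python
def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05               # assumed family-wise error rate (not stated in the abstract)
N_VARIANTS = 1_000_000     # hypothetical number of tested variants
EFFECTIVE_FRACTION = 0.15  # from the abstract: effective tests are about 15% of the total

naive = bonferroni_threshold(ALPHA, N_VARIANTS)
corrected = bonferroni_threshold(ALPHA, EFFECTIVE_FRACTION * N_VARIANTS)

print(f"naive threshold:     {naive:.2e}")
print(f"corrected threshold: {corrected:.2e}")
print(f"relaxation factor:   {corrected / naive:.2f}")
```

    Whatever the total number of variants, the relaxation factor depends only on the effective fraction (1 / 0.15), so the corrected threshold is a little under seven times less stringent than the naive one.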

  14. Cost-effectiveness of molecular testing for thyroid nodules with atypia of undetermined significance cytology.

    PubMed

    Lee, Lawrence; How, Jacques; Tabah, Roger J; Mitmaker, Elliot J

    2014-08-01

    Novel molecular diagnostics, such as the gene expression classifier (GEC) and gene mutation panel (GMP) testing, may improve the management of thyroid nodules with atypia of undetermined significance (AUS) cytology. The cost-effectiveness of an approach combining both tests in different practice settings in North America is unknown. The aim of the study was to determine the cost-effectiveness of two diagnostic molecular tests, singly or in combination, for AUS thyroid nodules. We constructed a microsimulation model to investigate cost-effectiveness from US (Medicare) and Canadian healthcare system perspectives. Low-risk patients with AUS thyroid nodules were simulated. We examined five management strategies: 1) routine GEC; 2) routine GEC + selective GMP; 3) routine GMP; 4) routine GMP + selective GEC; and 5) standard management. Lifetime costs and quality-adjusted life-years were measured. From the US perspective, the routine GEC + selective GMP strategy was the dominant strategy. From the Canadian perspective, routine GEC + selective GMP cost an additional CAN$24 030 per quality-adjusted life-year gained over standard management, and was dominant over the other strategies. Sensitivity analyses reported that the decisions from both perspectives were sensitive to variations in the probability of malignancy in the nodule and the costs of the GEC and GMP. The probability of cost-effectiveness for routine GEC + selective GMP was low. In the US setting, the most cost-effective strategy was routine GEC + selective GMP. In the Canadian setting, standard management was most likely to be cost effective. The cost of these molecular diagnostics will need to be reduced to increase their cost-effectiveness for practice settings outside the United States.

  15. Functional Analysis of the Polyketide Synthase Genes in the Filamentous Fungus Gibberella zeae (Anamorph Fusarium graminearum)

    PubMed Central

    Gaffoor, Iffa; Brown, Daren W.; Plattner, Ron; Proctor, Robert H.; Qi, Weihong; Trail, Frances

    2005-01-01

    Polyketides are a class of secondary metabolites that exhibit a vast diversity of form and function. In fungi, these compounds are produced by large, multidomain enzymes classified as type I polyketide synthases (PKSs). In this study we identified and functionally disrupted 15 PKS genes from the genome of the filamentous fungus Gibberella zeae. Five of these genes are responsible for producing the mycotoxins zearalenone, aurofusarin, and fusarin C and the black perithecial pigment. A comprehensive expression analysis of the 15 genes revealed diverse expression patterns during grain colonization, plant colonization, sexual development, and mycelial growth. Expression of one of the PKS genes was not detected under any of 18 conditions tested. This is the first study to genetically characterize a complete set of PKS genes from a single organism. PMID:16278459

  16. Inference of sigma factor controlled networks by using numerical modeling applied to microarray time series data of the germinating prokaryote.

    PubMed

    Strakova, Eva; Zikova, Alice; Vohradsky, Jiri

    2014-01-01

    A computational model of gene expression was applied to a novel test set of microarray time series measurements to reveal regulatory interactions between transcriptional regulators represented by 45 sigma factors and the genes expressed during germination of a prokaryote Streptomyces coelicolor. Using microarrays, the first 5.5 h of the process was recorded in 13 time points, which provided a database of gene expression time series on genome-wide scale. The computational modeling of the kinetic relations between the sigma factors, individual genes and genes clustered according to the similarity of their expression kinetics identified kinetically plausible sigma factor-controlled networks. Using genome sequence annotations, functional groups of genes that were predominantly controlled by specific sigma factors were identified. Using external binding data complementing the modeling approach, specific genes involved in the control of the studied process were identified and their function suggested.

  17. Testing Domestication Scenarios of Lima Bean (Phaseolus lunatus L.) in Mesoamerica: Insights from Genome-Wide Genetic Markers

    PubMed Central

    Chacón-Sánchez, María I.; Martínez-Castillo, Jaime

    2017-01-01

    Plant domestication can be seen as a long-term process that involves a complex interplay among demographic processes and evolutionary forces. Previous studies have suggested two domestication scenarios for Lima bean in Mesoamerica: two separate domestication events, one from gene pool MI in central-western Mexico and another one from gene pool MII in the area Guatemala-Costa Rica, or a single domestication from gene pool MI in central-western Mexico followed by post-domestication gene flow with wild populations. In this study we evaluated the genetic structure of the wild gene pool and tested these two competing domestication scenarios of Lima bean in Mesoamerica by applying an ABC approach to a set of genome-wide SNP markers. The results confirm the existence of three gene pools in wild Lima bean, two Mesoamerican gene pools (MI and MII) and the Andean gene pool (AI), and suggest the existence of another gene pool in central Colombia. The results indicate that although both domestication scenarios may be supported by genetic data, higher statistical support was given to the single domestication scenario in central-western Mexico followed by admixture with wild populations. Domestication would have involved strong founder effects reflected in loss of genetic diversity and increased LD levels in landraces. Genomic regions affected by selection were detected and these may harbor candidate genes related to domestication. PMID:28955351

  18. Current BPC3 Research Plan

    Cancer.gov

    This will expand the BPC3 to serve as a rapid verification test set for SNPs identified in the scans other than the CGEMS scan, and to examine gene-environment interactions in the SNPs identified in CGEMS and other studies as being associated with breast and prostate cancer.

  19. Prevalence and Characterization of Carbapenem-Resistant Enterobacteriaceae Isolated from Mulago National Referral Hospital, Uganda

    PubMed Central

    Okoche, Deogratius; Asiimwe, Benon B.; Katabazi, Fred Ashaba; Kato, Laban; Najjuka, Christine F.

    2015-01-01

    Introduction Carbapenemases have increasingly been reported in Enterobacteriaceae worldwide. Most carbapenemases are plasmid encoded; hence, resistance can easily spread. Carbapenem-resistant Enterobacteriaceae are reported to cause mortality in up to 50% of patients who acquire bloodstream infections. We set out to determine the burden of carbapenem resistance as well as establish genes encoding for carbapenemases in Enterobacteriaceae clinical isolates obtained from Mulago National Referral Hospital, Uganda. Methods This was a cross-sectional study with a total of 196 clinical isolates previously collected from pus swabs, urine, blood, sputum, tracheal aspirates, cervical swabs, endometrial aspirates, rectal swabs, vaginal swabs, ear swabs, products of conception, wound biopsy and amniotic fluid. All isolates were subjected to phenotypic carbapenemase screening using Boronic acid-based inhibition, Modified Hodge and EDTA double combined disk test. In addition, all the isolates were subjected to PCR assay to confirm presence of carbapenemase encoding genes. Results The study found carbapenemase prevalence of 22.4% (44/196) in the isolates using phenotypic tests, with the genotypic prevalence slightly higher at 28.6% (56/196). Overall, the most prevalent gene was blaVIM (21, 10.7%), followed by blaOXA-48 (19, 9.7%), blaIMP (12, 6.1%), blaKPC (10, 5.1%) and blaNDM-1 (5, 2.6%). Among 56 isolates positive for 67 carbapenemase encoding genes, Klebsiella pneumoniae was the species with the highest number (52.2%). Most (32/67, 47.7%) of these resistance genes were in bacteria isolated from pus swabs. Conclusion There is a high prevalence of carbapenemases and carbapenem-resistance encoding genes among third-generation cephalosporin-resistant Enterobacteriaceae in Uganda, indicating a danger of limited treatment options in this setting in the near future. PMID:26284519
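
    The prevalence figures in this abstract can be recomputed from the reported counts. The short sketch below uses only numbers stated in the abstract; the variable and function names are illustrative.

```python
TOTAL_ISOLATES = 196

# Counts reported in the abstract.
phenotypic_positive = 44
genotypic_positive = 56
gene_counts = {
    "blaVIM": 21,
    "blaOXA-48": 19,
    "blaIMP": 12,
    "blaKPC": 10,
    "blaNDM-1": 5,
}


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


total_genes = sum(gene_counts.values())

print(f"phenotypic prevalence: {percent(phenotypic_positive, TOTAL_ISOLATES)}%")
print(f"genotypic prevalence:  {percent(genotypic_positive, TOTAL_ISOLATES)}%")
for gene, count in gene_counts.items():
    print(f"{gene}: {count} ({percent(count, TOTAL_ISOLATES)}% of isolates)")
print(f"total carbapenemase genes detected: {total_genes}")
```

    The per-gene percentages are fractions of the 196 isolates, and the five gene counts sum to the 67 genes found in the 56 positive isolates, which implies that some isolates carried more than one gene.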

  20. Multiplex real-time PCR assay for Legionella species.

    PubMed

    Kim, Seung Min; Jeong, Yoojung; Sohn, Jang Wook; Kim, Min Ja

    2015-12-01

    Legionella pneumophila serogroup 1 (sg1) accounts for the majority of infections in humans, but other Legionella species are also associated with human disease. In this study, a new SYBR Green I-based multiplex real-time PCR assay in a single reaction was developed to allow the rapid detection and differentiation of Legionella species by targeting specific gene sequences. Candidate target genes were selected, and primer sets were designed by referring to comparative genomic hybridization data of Legionella species. The Legionella species-specific groES primer set successfully detected all 30 Legionella strains tested. The xcpX and rfbA primers specifically detected L. pneumophila sg1-15 and L. pneumophila sg1, respectively. In addition, this assay was validated by testing clinical samples and isolates. In conclusion, this novel multiplex real-time PCR assay might be a useful diagnostic tool for the rapid detection and differentiation of Legionella species in both clinical and epidemiological studies. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

    PubMed Central

    Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC

    2008-01-01

    Background The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion The program CORE_TF is accessible in a user-friendly web interface at . It provides a table of over-represented transcription factor binding sites in the user's input genes' promoters and a graphical view of evolutionarily conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135
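
    The idea of over-representation in an experimental promoter set relative to a random set can be illustrated with a one-sided hypergeometric test. This is a minimal sketch, not CORE_TF's own procedure: the abstract does not say which statistic the tool uses, and the promoter counts below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n promoters are drawn without replacement from N,
    of which K contain a predicted binding site for the matrix."""
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return favourable / total


# Hypothetical counts for one position weight matrix.
experimental_total, experimental_hits = 50, 20  # co-regulated promoters
background_total, background_hits = 950, 95     # random (e.g. GC-matched) promoters

N = experimental_total + background_total  # all promoters considered
K = experimental_hits + background_hits    # promoters with a predicted site
p_value = hypergeom_upper_tail(experimental_hits, experimental_total, K, N)

print(f"hit rate, experimental: {experimental_hits / experimental_total:.2f}")
print(f"hit rate, background:   {background_hits / background_total:.2f}")
print(f"one-sided p-value:      {p_value:.3g}")
```

    In practice one such test would be run per position weight matrix, so the resulting p-values would also need a multiple-testing correction.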

  2. Non-invasive prenatal diagnosis (NIPD) for single gene disorders: cost analysis of NIPD and invasive testing pathways.

    PubMed

    Verhoef, Talitha I; Hill, Melissa; Drury, Suzanne; Mason, Sarah; Jenkins, Lucy; Morris, Stephen; Chitty, Lyn S

    2016-07-01

    Evaluate the costs of offering non-invasive prenatal diagnosis (NIPD) for single gene disorders compared to traditional invasive testing to inform NIPD implementation into clinical practice. Total costs of diagnosis using NIPD or invasive testing pathways were compared for a representative set of single gene disorders. For autosomal dominant conditions, where NIPD molecular techniques are straightforward, NIPD cost £314 less than invasive testing. NIPD for autosomal recessive and X-linked conditions requires more complicated technical approaches and total costs were more than invasive testing, e.g. NIPD for spinal muscular atrophy was £1090 more than invasive testing. Impact of test uptake on costs was assessed using sickle cell disorder as an example. Anticipated high uptake of NIPD resulted in an incremental cost of NIPD over invasive testing of £48 635 per 100 pregnancies at risk of sickle cell disorder. Total costs of NIPD are dependent upon the complexity of the testing technique required. Anticipated increased demand for testing may have economic implications for prenatal diagnostic services. Ethical issues requiring further consideration are highlighted including directing resources to NIPD when used for information only and restricting access to safe tests if it is not cost-effective to develop NIPD for rare conditions. © 2016 The Authors. Prenatal Diagnosis published by John Wiley & Sons, Ltd.

  3. Database resources of the National Center for Biotechnology Information

    PubMed Central

    2015-01-01

    The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906

  4. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875

  5. Allelic variation in dopamine D2 receptor gene is associated with attentional impulsiveness on the Barratt Impulsiveness Scale (BIS-11).

    PubMed

    Taylor, Jasmine B; Cummins, Tarrant D R; Fox, Allison M; Johnson, Beth P; Tong, Janette H; Visser, Troy A W; Hawi, Ziarih; Bellgrove, Mark A

    2017-01-20

    Previous studies have postulated that noradrenergic and/or dopaminergic gene variations are likely to underlie individual differences in impulsiveness; however, few have shown this. The current study examined the relationship between catecholamine gene variants and self-reported impulsivity, as measured by the Barratt Impulsiveness Scale (Version 11; BIS-11). Methods: Six hundred and seventy-seven non-clinical adults completed the Barratt Impulsiveness Scale (BIS-11). DNA was analysed for a set of 142 single-nucleotide polymorphisms (SNPs) across 20 autosomal catecholamine genes. Association was tested using an additive regression model with permutation testing used to control for the influence of multiple comparisons. Analysis revealed an influence of rs4245146 of the dopamine D2 receptor (DRD2) gene on the BIS-11 attention first-order factor, such that self-reported attentional impulsiveness increased in an additive fashion with each copy of the T allele. These findings provide preliminary evidence that allelic variation in DRD2 may influence impulsiveness by increasing the propensity for attentional lapses.
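
    An additive model with permutation-based control of multiple comparisons can be sketched as below. This is an illustration under stated assumptions rather than the study's analysis: genotypes are coded as allele counts (0, 1, 2), the statistic is the absolute correlation between allele count and score (which ranks SNPs the same way as the t-statistic of a simple linear regression), adjustment uses the maximum statistic across SNPs, and all data are synthetic.

```python
import random
from math import sqrt


def abs_correlation(x, y):
    """Absolute Pearson correlation between allele counts x and scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / sqrt(sxx * syy)


def max_stat_adjusted_p(genotypes, phenotype, n_perm=500, seed=0):
    """Family-wise adjusted p-values from the permutation distribution
    of the largest statistic across all SNPs."""
    rng = random.Random(seed)
    observed = [abs_correlation(g, phenotype) for g in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        max_null = max(abs_correlation(g, shuffled) for g in genotypes)
        for i, obs in enumerate(observed):
            if max_null >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Synthetic example: 200 people, 5 SNPs, only SNP 0 affects the score.
rng = random.Random(42)
n_people, n_snps = 200, 5
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_people)] for _ in range(n_snps)]
phenotype = [2.0 * g + rng.gauss(0, 1) for g in genotypes[0]]

adjusted = max_stat_adjusted_p(genotypes, phenotype)
for i, p in enumerate(adjusted):
    print(f"SNP {i}: adjusted p = {p:.3f}")
```

    Shuffling the scores while leaving the genotype matrix intact preserves the correlation between SNPs, which is why the maximum-statistic approach tends to be less conservative than a Bonferroni correction when markers are in linkage disequilibrium.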

  6. Filtering genetic variants and placing informative priors based on putative biological function.

    PubMed

    Friedrichs, Stefanie; Malzahn, Dörthe; Pugh, Elizabeth W; Almeida, Marcio; Liu, Xiao Qing; Bailey, Julia N

    2016-02-03

    High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.

  7. Comprehensive analysis of MGMT promoter methylation: correlation with MGMT expression and clinical response in GBM.

    PubMed

    Shah, Nameeta; Lin, Biaoyang; Sibenaller, Zita; Ryken, Timothy; Lee, Hwahyung; Yoon, Jae-Geun; Rostad, Steven; Foltz, Greg

    2011-01-07

    O⁶-methylguanine DNA-methyltransferase (MGMT) promoter methylation has been identified as a potential prognostic marker for glioblastoma patients. The relationship between the exact site of promoter methylation and its effect on gene silencing, and the patient's subsequent response to therapy, is still being defined. The aim of this study was to comprehensively characterize cytosine-guanine (CpG) dinucleotide methylation across the entire MGMT promoter and to correlate individual CpG site methylation patterns to mRNA expression, protein expression, and progression-free survival. To best identify the specific MGMT promoter region most predictive of gene silencing and response to therapy, we determined the methylation status of all 97 CpG sites in the MGMT promoter in tumor samples from 70 GBM patients using quantitative bisulfite sequencing. We next identified the CpG site-specific and regional methylation patterns most predictive of gene silencing and improved progression-free survival. Using these data, we propose a new classification scheme utilizing methylation data from across the entire promoter and show that an analysis based on this approach, which we call 3R classification, is predictive of progression-free survival (HR = 5.23, 95% CI [2.089-13.097], p<0.0001). To adapt this approach to the clinical setting, we used a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) test based on the 3R classification and show that this test is both feasible in the clinical setting and predictive of progression-free survival (HR = 3.076, 95% CI [1.301-7.27], p = 0.007). We discuss the potential advantages of a test based on this promoter-wide analysis and compare it to the commonly used methylation-specific PCR test. Further prospective validation of these two methods in a large independent patient cohort will be needed to confirm the added value of promoter-wide analysis of MGMT methylation in the clinical setting.

  8. Comprehensive Analysis of MGMT Promoter Methylation: Correlation with MGMT Expression and Clinical Response in GBM

    PubMed Central

    Shah, Nameeta; Lin, Biaoyang; Sibenaller, Zita; Ryken, Timothy; Lee, Hwahyung; Yoon, Jae-Geun; Rostad, Steven; Foltz, Greg

    2011-01-01

    O⁶-methylguanine DNA-methyltransferase (MGMT) promoter methylation has been identified as a potential prognostic marker for glioblastoma patients. The relationship between the exact site of promoter methylation and its effect on gene silencing, and the patient's subsequent response to therapy, is still being defined. The aim of this study was to comprehensively characterize cytosine-guanine (CpG) dinucleotide methylation across the entire MGMT promoter and to correlate individual CpG site methylation patterns to mRNA expression, protein expression, and progression-free survival. To best identify the specific MGMT promoter region most predictive of gene silencing and response to therapy, we determined the methylation status of all 97 CpG sites in the MGMT promoter in tumor samples from 70 GBM patients using quantitative bisulfite sequencing. We next identified the CpG site-specific and regional methylation patterns most predictive of gene silencing and improved progression-free survival. Using these data, we propose a new classification scheme utilizing methylation data from across the entire promoter and show that an analysis based on this approach, which we call 3R classification, is predictive of progression-free survival (HR = 5.23, 95% CI [2.089–13.097], p<0.0001). To adapt this approach to the clinical setting, we used a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) test based on the 3R classification and show that this test is both feasible in the clinical setting and predictive of progression-free survival (HR = 3.076, 95% CI [1.301–7.27], p = 0.007). We discuss the potential advantages of a test based on this promoter-wide analysis and compare it to the commonly used methylation-specific PCR test. Further prospective validation of these two methods in a large independent patient cohort will be needed to confirm the added value of promoter-wide analysis of MGMT methylation in the clinical setting. PMID:21249131

  9. Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses.

    PubMed

    Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J

    2010-12-01

    The recent completion of the horse genome and commercial availability of an equine SNP genotyping array have facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-Mb region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df) = 4.54 × 10^-5, P(rec) = 7.74 × 10^-6), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.
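
    The 2-df genotype test mentioned in this abstract can be illustrated with a minimal sketch. The genotype counts below are hypothetical and are not taken from the study, and the function name is an assumption made for illustration. The sketch runs a Pearson chi-square test on a 2 x 3 table (cases and controls by the three genotypes) and uses the fact that the chi-square survival function with 2 degrees of freedom is exp(-x/2).

```python
import math


def chi_square_genotype_test(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 genotype table.

    Each argument holds the counts of the three genotypes (AA, AB, BB).
    Assumes every genotype is observed at least once overall, so that the
    test has 2 degrees of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 2 degrees of freedom, P(X >= x) = exp(-x / 2) exactly.
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical counts for 10 affected animals and 10 controls.
chi2, p = chi_square_genotype_test((0, 1, 9), (6, 4, 0))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

    With only 10 animals per group the expected cell counts are small, so the chi-square approximation is rough; an exact test may be a safer choice for samples of this size.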

  10. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

    PubMed

    Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

    2017-12-03

    A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.

  11. Gene Selection and Cancer Classification: A Rough Sets Based Approach

    NASA Astrophysics Data System (ADS)

    Sun, Lijun; Miao, Duoqian; Zhang, Hongyun

    Identification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough set theory, corresponding to minimal sets of essential genes for discerning samples, are an efficient tool for gene selection. Due to the computational complexity of the existing reduct algorithms, feature ranking is usually used to narrow down the gene space as the first step, and the top-ranked genes are selected. In this paper, we define a novel criterion for scoring genes, based on the expression level difference between classes and the gene's contribution to classification, and present an algorithm for generating all possible reducts from informative genes. The algorithm takes the whole attribute set into account and finds short reducts with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that it is successful in selecting highly discriminative genes and that the classification accuracy is impressive.

  12. A Gene Signature to Determine Metastatic Behavior in Thymomas

    PubMed Central

    Gökmen-Polar, Yesim; Wilkinson, Jeff; Maetzold, Derek; Stone, John F.; Oelschlager, Kristen M.; Vladislav, Ioan Tudor; Shirar, Kristen L.; Kesler, Kenneth A.; Loehrer, Patrick J.; Badve, Sunil

    2013-01-01

    Purpose: Thymoma represents one of the rarest of all malignancies. Stage and completeness of resection have been used to ascertain postoperative therapeutic strategies albeit with limited prognostic accuracy. A molecular classifier would be useful to improve the assessment of metastatic behaviour and optimize patient management. Methods: qRT-PCR assay for 23 genes (19 test and four reference genes) was performed on multi-institutional archival primary thymomas (n = 36). Gene expression levels were used to compute a signature, classifying tumors into classes 1 and 2, corresponding to low or high likelihood for metastases. The signature was validated in an independent multi-institutional cohort of patients (n = 75). Results: A nine-gene signature that can predict metastatic behavior of thymomas was developed and validated. Using radial basis machine modeling in the training set, 5-year and 10-year metastasis-free survival rates were 77% and 26% for predicted low (class 1) and high (class 2) risk of metastasis (P = 0.0047, log-rank), respectively. For the validation set, 5-year metastasis-free survival rates were 97% and 30% for predicted low- and high-risk patients (P = 0.0004, log-rank), respectively. The 5-year metastasis-free survival rates for the validation set were 49% and 41% for Masaoka stages I/II and III/IV (P = 0.0537, log-rank), respectively. In univariate and multivariate Cox models evaluating common prognostic factors for thymoma metastasis, the nine-gene signature was the only independent indicator of metastases (P = 0.036). Conclusion: A nine-gene signature was established and validated which predicts the likelihood of metastasis more accurately than traditional staging. This further underscores the biologic determinants of the clinical course of thymoma and may improve patient management. PMID:23894276

  13. TCDD and a putative endogenous AhR ligand, ITE, elicit the same immediate changes in gene expression in mouse lung fibroblasts.

    PubMed

    Henry, Ellen C; Welle, Stephen L; Gasiewicz, Thomas A

    2010-03-01

    The aryl hydrocarbon receptor (AhR), a ligand-dependent transcription factor, mediates toxicity of several classes of xenobiotics and also has important physiological roles in differentiation, reproduction, and immunity, although the endogenous ligand(s) mediating these functions is/are as yet unidentified. One candidate endogenous ligand, 2-(1'H-indolo-3'-carbonyl)-thiazole-4-carboxylic acid methyl ester (ITE), is a potent AhR agonist in vitro, activates the murine AhR in vivo, but does not induce toxicity. We hypothesized that ITE and the toxic ligand, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), may modify transcription of different sets of genes to account for their different toxicity. To test this hypothesis, primary mouse lung fibroblasts were exposed to 0.5 μM ITE, 0.2 nM TCDD, or vehicle for 4 h, and total gene expression was evaluated using microarrays. After this short-term and low-dose treatment, several hundred genes were changed significantly, and the response to ITE and TCDD was remarkably similar, both qualitatively and quantitatively. Induced gene sets included the expected battery of AhR-dependent xenobiotic-metabolizing enzymes, as well as several sets that reflect the inflammatory role of lung fibroblasts. Real-time quantitative RT-qPCR assay of several selected genes confirmed these microarray data and further suggested that there may be kinetic differences in expression between ligands. These data suggest that ITE and TCDD elicit an analogous change in AhR conformation such that the initial transcription response is the same. Furthermore, if the difference in toxicity between TCDD and ITE is mediated by differences in gene expression, then it is likely that secondary changes enabled by the persistent TCDD, but not by the shorter lived ITE, are responsible.

  14. Bioinformatic analysis of primary endothelial cell gene array data illustrated by the analysis of transcriptome changes in endothelial cells exposed to VEGF-A and PlGF.

    PubMed

    Schoenfeld, Jonathan; Lessan, Khashayar; Johnson, Nicola A; Charnock-Jones, D Stephen; Evans, Amanda; Vourvouhaki, Ekaterini; Scott, Laurie; Stephens, Richard; Freeman, Tom C; Saidi, Samir A; Tom, Brian; Weston, Gareth C; Rogers, Peter; Smith, Stephen K; Print, Cristin G

    2004-01-01

    We recently published a review in this journal describing the design, hybridisation and basic data processing required to use gene arrays to investigate vascular biology (Evans et al. Angiogenesis 2003; 6: 93-104). Here, we build on this review by describing a set of powerful and robust methods for the analysis and interpretation of gene array data derived from primary vascular cell cultures. First, we describe the evaluation of transcriptome heterogeneity between primary cultures derived from different individuals, and estimation of the false discovery rate introduced by this heterogeneity and by experimental noise. Then, we discuss the appropriate use of Bayesian t-tests, clustering and independent component analysis to mine the data. We illustrate these principles by analysis of a previously unpublished set of gene array data in which human umbilical vein endothelial cells (HUVEC) cultured in either rich or low-serum media were exposed to vascular endothelial growth factor (VEGF)-A165 or placental growth factor (PlGF)-1(131). We have used Affymetrix U95A gene arrays to map the effects of these factors on the HUVEC transcriptome. These experiments followed a paired design and were biologically replicated three times. In addition, one experiment was repeated using serial analysis of gene expression (SAGE). In contrast to some previous studies, we found that VEGF-A and PlGF consistently regulated only small, non-overlapping and culture media-dependent sets of HUVEC transcripts, despite causing significant cell biological changes.

  15. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

    PubMed Central

    Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

    2016-01-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets and upload BED files, an improved application programming interface, and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
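
    As background for the term "enrichment analysis", here is a minimal sketch of a generic over-representation test. The abstract does not describe how Enrichr scores gene sets, so this should not be read as Enrichr's implementation; the function name and the numbers in the usage line are hypothetical. The sketch computes the one-sided hypergeometric tail probability (equivalent to a one-sided Fisher exact test) that a query list overlaps an annotated gene set at least as much as observed.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """Return P(X >= overlap) under the hypergeometric null.

    universe_size: number of genes that could have been selected
    set_size:      number of universe genes annotated to the gene set
    query_size:    number of genes in the submitted list
    overlap:       number of submitted genes that fall in the gene set
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes, a 150-gene set, a 300-gene query, 12 shared.
print(hypergeom_enrichment_p(20000, 150, 300, 12))
```

    When many gene sets are tested at once, as with the libraries described above, the resulting p-values would normally be corrected for multiple testing.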

  16. Should genes with missing data be excluded from phylogenetic analyses?

    PubMed

    Jiang, Wei; Chen, Si-Yun; Wang, Hong; Li, De-Zhu; Wiens, John J

    2014-11-01

    Phylogeneticists often design their studies to maximize the number of genes included but minimize the overall amount of missing data. However, few studies have addressed the costs and benefits of adding characters with missing data, especially for likelihood analyses of multiple loci. In this paper, we address this topic using two empirical data sets (in yeast and plants) with well-resolved phylogenies. We introduce varying amounts of missing data into varying numbers of genes and test whether the benefits of excluding genes with missing data outweigh the costs of excluding the non-missing data that are associated with them. We also test if there is a proportion of missing data in the incomplete genes at which they cease to be beneficial or harmful, and whether missing data consistently bias branch length estimates. Our results indicate that adding incomplete genes generally increases the accuracy of phylogenetic analyses relative to excluding them, especially when there is a high proportion of incomplete genes in the overall dataset (and thus few complete genes). Detailed analyses suggest that adding incomplete genes is especially helpful for resolving poorly supported nodes. Given that we find that excluding genes with missing data often decreases accuracy relative to including these genes (and that decreases are generally of greater magnitude than increases), there is little basis for assuming that excluding these genes is necessarily the safer or more conservative approach. We also find no evidence that missing data consistently bias branch length estimates. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. SET oncoprotein accumulation regulates transcription through DNA demethylation and histone hypoacetylation.

    PubMed

    Almeida, Luciana O; Neto, Marinaldo P C; Sousa, Lucas O; Tannous, Maryna A; Curti, Carlos; Leopoldino, Andreia M

    2017-04-18

    Epigenetic modifications are essential in the control of normal cellular processes and cancer development. DNA methylation and histone acetylation are major epigenetic modifications involved in gene transcription and abnormal events driving the oncogenic process. SET protein accumulates in many cancer types, including head and neck squamous cell carcinoma (HNSCC); SET is a member of the INHAT complex that inhibits gene transcription associating with histones and preventing their acetylation. We explored how SET protein accumulation impacts on the regulation of gene expression, focusing on DNA methylation and histone acetylation. DNA methylation profile of 24 tumour suppressors evidenced that SET accumulation decreased DNA methylation in association with loss of 5-methylcytidine, formation of 5-hydroxymethylcytosine and increased TET1 levels, indicating an active DNA demethylation mechanism. However, the expression of some suppressor genes was lowered in cells with high SET levels, suggesting that loss of methylation is not the main mechanism modulating gene expression. SET accumulation also downregulated the expression of 32 genes of a panel of 84 transcription factors, and SET directly interacted with chromatin at the promoter of the downregulated genes, decreasing histone acetylation. Gene expression analysis after cell treatment with 5-aza-2'-deoxycytidine (5-AZA) and Trichostatin A (TSA) revealed that histone acetylation reversed transcription repression promoted by SET. These results suggest a new function for SET in the regulation of chromatin dynamics. In addition, TSA diminished both SET protein levels and SET capability to bind to gene promoter, suggesting that administration of epigenetic modifier agents could be efficient to reverse SET phenotype in cancer.

  18. Massive-Scale Gene Co-Expression Network Construction and Robustness Testing Using Random Matrix Theory

    PubMed Central

    Isaacson, Sven; Luo, Feng; Feltus, Frank A.; Smith, Melissa C.

    2013-01-01

    The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. For such a test, hundreds of networks are needed and hence a tool to rapidly construct these networks. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust. PMID:23409071

  19. Heterogeneity of heat-resistant proteases from milk Pseudomonas species.

    PubMed

    Marchand, Sophie; Vandriesche, Gonzalez; Coorevits, An; Coudijzer, Katleen; De Jonghe, Valerie; Dewettinck, Koen; De Vos, Paul; Devreese, Bart; Heyndrickx, Marc; De Block, Jan

    2009-07-31

    Pseudomonas fragi, Pseudomonas lundensis and members of the Pseudomonas fluorescens group may spoil Ultra High Temperature (UHT) treated milk and dairy products, due to the production of heat-stable proteases in the cold chain of raw milk. Since the aprX gene codes for a heat-resistant protease in P. fluorescens, the presence of this gene has also been investigated in other members of the genus. For this purpose an aprX-screening PCR test has been developed. Twenty-nine representatives of important milk Pseudomonas species and thirty-five reference strains were screened. In 42 out of 55 investigated Pseudomonas strains, the aprX gene was detected, which proves the potential of the aprX-PCR test as a screening tool for potentially proteolytic Pseudomonas strains in milk samples. An extensive study of the obtained aprX-sequences on the DNA and the amino acid level, however, revealed a large heterogeneity within the investigated milk isolates. Although this heterogeneity sets limitations to a general detection method for all proteolytic Pseudomonas strains in milk, it offers a great potential for the development of a multiplex PCR screening test targeting individual aprX-genes. Furthermore, our data illustrated the potential use of the aprX gene as a taxonomic marker, which may help in resolving the current taxonomic deadlock in the P. fluorescens group.

  20. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data.

    PubMed

    Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza

    2017-09-27

    Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
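
    The BLUP and ridge equivalence stated in this abstract can be illustrated in the simplest setting. The sketch below assumes a Gaussian linear mixed model with independent random SNP effects and no family relatedness term, which is a simplification of the GLMM used in the paper. The symbols X (fixed-effect covariates), Z (SNP genotypes), b (random SNP effects) and lambda (ridge penalty) are notation introduced here, not the authors' own.

```latex
% Simplified Gaussian case; notation is illustrative, not the paper's.
y = X\beta + Zb + e, \qquad
b \sim N(0, \sigma_b^2 I), \qquad
e \sim N(0, \sigma_e^2 I)

% BLUP of the random effects, rewritten with the identity
% Z^\top (Z Z^\top + \lambda I)^{-1} = (Z^\top Z + \lambda I)^{-1} Z^\top
\hat{b}_{\mathrm{BLUP}}
  = \sigma_b^2 Z^\top \left( \sigma_b^2 Z Z^\top + \sigma_e^2 I \right)^{-1} (y - X\hat{\beta})
  = \left( Z^\top Z + \lambda I \right)^{-1} Z^\top (y - X\hat{\beta}),
\qquad \lambda = \frac{\sigma_e^2}{\sigma_b^2}
```

    In this simplified case the ridge penalty is the ratio of the residual variance to the random-effect variance, so estimating the variance components gives the penalty directly, without a cross-validation search.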

  1. Comprehensive RNA-Seq transcriptomic profiling across 11 organs, 4 ages, and 2 sexes of Fischer 344 rats.

    PubMed

    Yu, Ying; Zhao, Chen; Su, Zhenqiang; Wang, Charles; Fuscoe, James C; Tong, Weida; Shi, Leming

    2014-01-01

    The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.

  2. Preemptive Pharmacogenomic Testing for Precision Medicine

    PubMed Central

    Ji, Yuan; Skierka, Jennifer M.; Blommel, Joseph H.; Moore, Brenda E.; VanCuyk, Douglas L.; Bruflat, Jamie K.; Peterson, Lisa M.; Veldhuizen, Tamra L.; Fadra, Numrah; Peterson, Sandra E.; Lagerstedt, Susan A.; Train, Laura J.; Baudhuin, Linnea M.; Klee, Eric W.; Ferber, Matthew J.; Bielinski, Suzette J.; Caraballo, Pedro J.; Weinshilboum, Richard M.; Black, John L.

    2017-01-01

    Significant barriers, such as lack of professional guidelines, specialized training for interpretation of pharmacogenomics (PGx) data, and insufficient evidence to support clinical utility, prevent preemptive PGx testing from being widely clinically implemented. The current study, as a pilot project for the Right Drug, Right Dose, Right Time–Using Genomic Data to Individualize Treatment Protocol, was designed to evaluate the impact of preemptive PGx and to optimize the workflow in the clinic setting. We used an 84-gene next-generation sequencing panel that included SLCO1B1, CYP2C19, CYP2C9, and VKORC1 together with a custom-designed CYP2D6 testing cascade to genotype the 1013 subjects in laboratories approved by the Clinical Laboratory Improvement Act. Actionable PGx variants were placed in patients' electronic medical records, where integrated clinical decision support rules alert providers when a relevant medication is ordered. The fraction of this cohort carrying actionable PGx variant(s) in individual genes ranged from 30% (SLCO1B1) to 79% (CYP2D6). When considering all five genes together, 99% of the subjects carried an actionable PGx variant(s) in at least one gene. Our study provides evidence in favor of preemptive PGx testing by identifying the risk of a variant being present in the population we studied. PMID:26947514

  3. Immunohistochemistry as a surrogate for molecular testing: a review.

    PubMed

    Swanson, Paul E

    2015-02-01

    Despite the myriad of genetic and epigenetic alterations in human neoplasms that seem to demand specific molecular probes for their identification and practical application to diagnostic pathology, immunohistochemistry (IHC) remains a vital component of laboratory testing in the emerging molecular era. The development and proper application of sensitive and specific antibodies raised against cryptic proteins only expressed in quantity after gene translocation, translocation-specific chimeric fusion peptides, and gene products overexpressed because of gene amplification demonstrate that IHC is a legitimate surrogate for traditional cytogenetic and in situ hybridization-based identification of chromosomal abnormalities, if not a viable molecular technique in its own right. Similarly, the detection of mutational events, through the reliable demonstration of protein loss, the identification of proteins overexpressed because of activating mutations, the specific visualization of mutant gene products, and the localization of splice variant gene products emphasizes the potential value of IHC as a surrogate for mutational analyses of genes important to both diagnosis and prediction of therapeutic response. In the latter setting IHC also provides a means of approximating gene expression profiles in the molecular classification and risk stratification of human neoplasms. For the time being, the application of appropriately targeted sensitive and specific antibodies provides a cost-effective screening modality, if not replacement, for selected molecular techniques, but IHC will lose its value if the development of companion tests for emerging novel biomarkers does not keep pace with molecular techniques, particularly as the costs and time constraints of genomic sequencing diminish over time.

  4. Aspirin exposure reveals novel genes associated with platelet function and cardiovascular events.

    PubMed

    Voora, Deepak; Cyr, Derek; Lucas, Joseph; Chi, Jen-Tsan; Dungan, Jennifer; McCaffrey, Timothy A; Katz, Richard; Newby, L Kristin; Kraus, William E; Becker, Richard C; Ortel, Thomas L; Ginsburg, Geoffrey S

    2013-10-01

    The aim of this study was to develop ribonucleic acid (RNA) profiles that could serve as novel biomarkers for the response to aspirin. Aspirin reduces death and myocardial infarction (MI), suggesting that aspirin interacts with biological pathways that may underlie these events. Aspirin was administered, followed by whole-blood RNA microarray profiling, in a discovery cohort of healthy volunteers (HV1) (n = 50) and 2 validation cohorts of healthy volunteers (HV2) (n = 53) and outpatient cardiology patients (OPC) (n = 25). Platelet function was assessed using the platelet function score (PFS) in HV1 and HV2 and the VerifyNow Aspirin Test (Accumetrics, Inc., San Diego, California) in OPC. Bayesian sparse factor analysis identified sets of coexpressed transcripts, which were examined for associations with PFS in HV1 and validated in HV2 and OPC. Proteomic analysis confirmed the association of validated transcripts in platelet proteins. Validated gene sets were tested for association with death or MI in 2 patient cohorts (n = 587 total) from RNA samples collected at cardiac catheterization. A set of 60 coexpressed genes named the "aspirin response signature" (ARS) was associated with PFS in HV1 (r = -0.31, p = 0.03), HV2 (r = -0.34, Bonferroni p = 0.03), and OPC (p = 0.046). Corresponding proteins for the 17 ARS genes were identified in the platelet proteome, of which 6 were associated with PFS. The ARS was associated with death or MI in both patient cohorts (odds ratio: 1.2 [p = 0.01]; hazard ratio: 1.5 [p = 0.001]), independent of cardiovascular risk factors. Compared with traditional risk factors, reclassification (net reclassification index = 31% to 37%, p ≤ 0.0002) was improved by including the ARS or 1 of its genes, ITGA2B. RNA profiles of platelet-specific genes are novel biomarkers for identifying patients who do not respond adequately to aspirin and who are at risk for death or MI. Copyright © 2013 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  5. Association of Single-Nucleotide Polymorphisms of the Tau Gene With Late-Onset Parkinson Disease

    PubMed Central

    Martin, Eden R.; Scott, William K.; Nance, Martha A.; Watts, Ray L.; Hubble, Jean P.; Koller, William C.; Lyons, Kelly; Pahwa, Rajesh; Stern, Matthew B.; Colcher, Amy; Hiner, Bradley C.; Jankovic, Joseph; Ondo, William G.; Allen, Fred H.; Goetz, Christopher G.; Small, Gary W.; Masterman, Donna; Mastaglia, Frank; Laing, Nigel G.; Stajich, Jeffrey M.; Ribble, Robert C.; Booze, Michael W.; Rogala, Allison; Hauser, Michael A.; Zhang, Fengyu; Gibson, Rachel A.; Middleton, Lefkos T.; Roses, Allen D.; Haines, Jonathan L.; Scott, Burton L.; Pericak-Vance, Margaret A.; Vance, Jeffery M.

    2013-01-01

    Context: The human tau gene, which promotes assembly of neuronal microtubules, has been associated with several rare neurologic diseases that clinically include parkinsonian features. We recently observed linkage in idiopathic Parkinson disease (PD) to a region on chromosome 17q21 that contains the tau gene. These factors make tau a good candidate for investigation as a susceptibility gene for idiopathic PD, the most common form of the disease. Objective: To investigate whether the tau gene is involved in idiopathic PD. Design, Setting, and Participants: Among a sample of 1056 individuals from 235 families selected from 13 clinical centers in the United States and Australia and from a family ascertainment core center, we tested 5 single-nucleotide polymorphisms (SNPs) within the tau gene for association with PD, using family-based tests of association. Both affected (n = 426) and unaffected (n = 579) family members were included; 51 individuals had unclear PD status. Analyses were conducted to test individual SNPs and SNP haplotypes within the tau gene. Main Outcome Measure: Family-based tests of association, calculated using asymptotic distributions. Results: Analysis of association between the SNPs and PD yielded significant evidence of association for 3 of the 5 SNPs tested: SNP 3, P = .03; SNP 9i, P = .04; and SNP 11, P = .04. The 2 other SNPs did not show evidence of significant association (SNP 9ii, P = .11, and SNP 9iii, P = .87). Strong evidence of association was found with haplotype analysis, with a positive association with one haplotype (P = .009) and a negative association with another haplotype (P = .007). Substantial linkage disequilibrium (P < .001) was detected between 4 of the 5 SNPs (SNPs 3, 9i, 9ii, and 11). Conclusions: This integrated approach of genetic linkage and positional association analyses implicates tau as a susceptibility gene for idiopathic PD. PMID:11710889

  6. Regulation of behaviorally associated gene networks in worker honey bee ovaries

    PubMed Central

    Wang, Ying; Kocher, Sarah D.; Linksvayer, Timothy A.; Grozinger, Christina M.; Page, Robert E.; Amdam, Gro V.

    2012-01-01

    SUMMARY Several lines of evidence support genetic links between ovary size and division of labor in worker honey bees. However, it is largely unknown how ovaries influence behavior. To address this question, we first performed transcriptional profiling on worker ovaries from two genotypes that differ in social behavior and ovary size. Then, we contrasted the differentially expressed ovarian genes with six sets of available brain transcriptomes. Finally, we probed behavior-related candidate gene networks in wild-type ovaries of different sizes. We found differential expression in 2151 ovarian transcripts in these artificially selected honey bee strains, corresponding to approximately 20.3% of the predicted gene set of honey bees. Differences in gene expression overlapped significantly with changes in the brain transcriptomes. Differentially expressed genes were associated with neural signal transmission (tyramine receptor, TYR) and ecdysteroid signaling; two independently tested nuclear hormone receptors (HR46 and ftz-f1) were also significantly correlated with ovary size in wild-type bees. We suggest that the correspondence between ovary and brain transcriptomes identified here indicates systemic regulatory networks among hormones (juvenile hormone and ecdysteroids), pheromones (queen mandibular pheromone), reproductive organs and nervous tissues in worker honey bees. Furthermore, robust correlations between ovary size and neural and endocrine response genes are consistent with the hypothesized roles of the ovaries in honey bee behavioral regulation. PMID:22162860

  7. Examining non-syndromic autosomal recessive intellectual disability (NS-ARID) genes for an enriched association with intelligence differences

    PubMed Central

    Hill, W.D.; Davies, G.; Liewald, D.C.; Payton, A.; McNeil, C.J.; Whalley, L.J.; Horan, M.; Ollier, W.; Starr, J.M.; Pendleton, N.; Hansel, N.K.; Montgomery, G.W.; Medland, S.E.; Martin, N.G.; Wright, M.J.; Bates, T.C.; Deary, I.J.

    2016-01-01

    Two themes are emerging regarding the molecular genetic aetiology of intelligence. The first is that intelligence is influenced by many variants and those that are tagged by common single nucleotide polymorphisms account for around 30% of the phenotypic variation. The second, in line with other polygenic traits such as height and schizophrenia, is that these variants are not randomly distributed across the genome but cluster in genes that work together. Less clear is whether the very low range of cognitive ability (intellectual disability) is simply one end of the normal distribution describing individual differences in cognitive ability across a population. Here, we examined 40 genes with a known association with non-syndromic autosomal recessive intellectual disability (NS-ARID) to determine if they are enriched for common variants associated with the normal range of intelligence differences. The current study used the 3511 individuals of the Cognitive Ageing Genetics in England and Scotland (CAGES) consortium. In addition, a text mining analysis was used to identify gene sets biologically related to the NS-ARID set. Gene-based tests indicated that genes implicated in NS-ARID were not significantly enriched for quantitative trait loci (QTL) associated with intelligence. These findings suggest that genes in which mutations can have a large and deleterious effect on intelligence are not associated with variation across the range of intelligence differences. PMID:26912939

  8. paraGSEA: a scalable approach for large-scale gene expression profiling

    PubMed Central

    Peng, Shaoliang; Yang, Shunyun

    2017-01-01

    Abstract More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the significance-level estimation step and the multiple hypothesis testing step, its computational scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale transcriptome data analysis. By optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which contributes a more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters in an efficient manner with high scalability and performance on large-scale datasets. The analysis time of the whole LINCS phase I dataset (GSE92742) was reduced to nearly half an hour on a 1000 node cluster on Tianhe-2, or within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. PMID:28973463
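
    To make the O(mn) versus O(m+n) distinction in this abstract concrete, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. It is not paraGSEA's code and does not reproduce its optimizations; the function name `enrichment_score` and its parameters are hypothetical. The only point illustrated is that building a hash set of the m gene-set members once and then scanning the n ranked genes costs O(m+n), whereas testing each ranked gene against a plain list of members would cost O(mn).

```python
from typing import Iterable, Sequence


def enrichment_score(ranked_genes: Sequence[str], gene_set: Iterable[str]) -> float:
    """Unweighted running-sum enrichment score (illustrative sketch only).

    ranked_genes: genes ordered from most to least associated with the phenotype.
    gene_set: members of the gene set being tested.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)  # O(m) to build, O(1) expected lookup
    n = len(ranked_genes)
    hits = sum(1 for gene in ranked_genes if gene in members)
    if hits == 0 or hits == n:
        return 0.0  # no contrast between members and non-members

    step_hit = 1.0 / hits          # increment when a member is encountered
    step_miss = 1.0 / (n - hits)   # decrement otherwise
    running = 0.0
    best = 0.0
    for gene in ranked_genes:      # single O(n) pass over the ranking
        running += step_hit if gene in members else -step_miss
        if abs(running) > abs(best):
            best = running
    return best
```

    A gene set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom yields a score near -1. Significance would still need permutation testing, which is the expensive step the abstract refers to.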

  9. Examining non-syndromic autosomal recessive intellectual disability (NS-ARID) genes for an enriched association with intelligence differences.

    PubMed

    Hill, W D; Davies, G; Liewald, D C; Payton, A; McNeil, C J; Whalley, L J; Horan, M; Ollier, W; Starr, J M; Pendleton, N; Hansel, N K; Montgomery, G W; Medland, S E; Martin, N G; Wright, M J; Bates, T C; Deary, I J

    2016-01-01

    Two themes are emerging regarding the molecular genetic aetiology of intelligence. The first is that intelligence is influenced by many variants and those that are tagged by common single nucleotide polymorphisms account for around 30% of the phenotypic variation. The second, in line with other polygenic traits such as height and schizophrenia, is that these variants are not randomly distributed across the genome but cluster in genes that work together. Less clear is whether the very low range of cognitive ability (intellectual disability) is simply one end of the normal distribution describing individual differences in cognitive ability across a population. Here, we examined 40 genes with a known association with non-syndromic autosomal recessive intellectual disability (NS-ARID) to determine if they are enriched for common variants associated with the normal range of intelligence differences. The current study used the 3511 individuals of the Cognitive Ageing Genetics in England and Scotland (CAGES) consortium. In addition, a text mining analysis was used to identify gene sets biologically related to the NS-ARID set. Gene-based tests indicated that genes implicated in NS-ARID were not significantly enriched for quantitative trait loci (QTL) associated with intelligence. These findings suggest that genes in which mutations can have a large and deleterious effect on intelligence are not associated with variation across the range of intelligence differences.

  10. An omnibus permutation test on ensembles of two-locus analyses can detect pure epistasis and genetic heterogeneity in genome-wide association studies.

    PubMed

    Setsirichok, Damrongrit; Tienboon, Phuwadej; Jaroonruang, Nattapong; Kittichaijaroen, Somkit; Wongseree, Waranyu; Piroonratana, Theera; Usavanarong, Touchpong; Limwongse, Chanin; Aporntewan, Chatchawit; Phadoongsidhi, Marong; Chaiyaratana, Nachol

    2013-01-01

    This article presents the ability of an omnibus permutation test on ensembles of two-locus analyses (2LOmb) to detect pure epistasis in the presence of genetic heterogeneity. The performance of 2LOmb is evaluated in various simulation scenarios covering two independent causes of complex disease where each cause is governed by a purely epistatic interaction. Different scenarios are set up by varying the number of available single nucleotide polymorphisms (SNPs) in data, number of causative SNPs and ratio of case samples from two affected groups. The simulation results indicate that 2LOmb outperforms multifactor dimensionality reduction (MDR) and random forest (RF) techniques in terms of a low number of output SNPs and a high number of correctly-identified causative SNPs. Moreover, 2LOmb is capable of identifying the number of independent interactions in tractable computational time and can be used in genome-wide association studies. 2LOmb is subsequently applied to a type 1 diabetes mellitus (T1D) data set, which is collected from a UK population by the Wellcome Trust Case Control Consortium (WTCCC). After screening for SNPs that locate within or near genes and exhibit no marginal single-locus effects, the T1D data set is reduced to 95,991 SNPs from 12,146 genes. The 2LOmb search in the reduced T1D data set reveals that 12 SNPs, which can be divided into two independent sets, are associated with the disease. The first SNP set consists of three SNPs from MUC21 (mucin 21, cell surface associated), three SNPs from MUC22 (mucin 22), two SNPs from PSORS1C1 (psoriasis susceptibility 1 candidate 1) and one SNP from TCF19 (transcription factor 19). A four-locus interaction between these four genes is also detected. The second SNP set consists of three SNPs from ATAD1 (ATPase family, AAA domain containing 1). 
    Overall, the findings indicate the detection of pure epistasis in the presence of genetic heterogeneity and provide an alternative explanation for the aetiology of T1D in the UK population.

  11. Gene expression analysis using a highly sensitive DNA microarray for colorectal cancer screening.

    PubMed

    Koga, Yoshikatsu; Yamazaki, Nobuyoshi; Takizawa, Satoko; Kawauchi, Junpei; Nomura, Osamu; Yamamoto, Seiichiro; Saito, Norio; Kakugawa, Yasuo; Otake, Yosuke; Matsumoto, Minori; Matsumura, Yasuhiro

    2014-01-01

    Half of all patients with small, right-sided, non-metastatic colorectal cancer (CRC) have negative results for the fecal occult blood test (FOBT). In the present study, the usefulness of CRC screening with a highly sensitive DNA microarray was evaluated in comparison with that by FOBT using fecal samples. A total of 53 patients with CRC and 61 healthy controls were divided into "training" and "validation" sets. For the gene profiling, total RNA extracted from 0.5 g of feces was hybridized to a highly sensitive DNA chip. The expressions of 43 genes were significantly higher in the patients with CRC than in healthy controls (p<0.05). In the training set, the sensitivity and specificity of the DNA chip assay using six genes were 85.4% and 85.2%, respectively. On the other hand, in the validation set, the sensitivity and specificity of the DNA chip assay were 85.2% and 85.7%, respectively. The sensitivities of the DNA chip assay were higher than those of FOBT in cases of the small, right-sided, early-CRC, tumor invading up to the muscularis propria (i.e. surface tumor) subgroups. In particular, the sensitivities of the DNA chip assay in the surface tumor and early-CRC subgroups were significantly higher than those of FOBT (p=0.023 and 0.019, respectively). Gene profiling assay using a highly sensitive DNA chip was more effective than FOBT at detecting patients with small, right-sided, surface tumor, and early-stage CRC.

  12. GSCALite: A Web Server for Gene Set Cancer Analysis.

    PubMed

    Liu, Chun-Jie; Hu, Fei-Fei; Xia, Mengxuan; Han, Leng; Zhang, Qiong; Guo, An-Yuan

    2018-05-22

    The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer is usually the result of a set of genes and the signal of a single gene could be covered by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules: (i) Differential expression in tumor vs normal, and the survival analysis; (ii) Genomic variations and their survival analysis; (iii) Gene expression associated cancer pathway activity; (iv) miRNA regulatory network for genes; (v) Drug sensitivity for genes; (vi) Normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene sets in cancer and drug sensitivity correlation, which will be of broad utility to cancer researchers. GSCALite is available at http://bioinfo.life.hust.edu.cn/web/GSCALite/. Contact: guoay@hust.edu.cn or zhangqiong@hust.edu.cn. Supplementary data are available at Bioinformatics online.

  13. ABC transporters and the proteasome complex are implicated in susceptibility to Stevens-Johnson syndrome and toxic epidermal necrolysis across multiple drugs.

    PubMed

    Nicoletti, Paola; Bansal, Mukesh; Lefebvre, Celine; Guarnieri, Paolo; Shen, Yufeng; Pe'er, Itsik; Califano, Andrea; Floratos, Aris

    2015-01-01

    Stevens-Johnson syndrome (SJS) and Toxic Epidermal Necrolysis (TEN) represent rare but serious adverse drug reactions (ADRs). Both are characterized by distinctive blistering lesions and significant mortality rates. While there is evidence for strong drug-specific genetic predisposition related to HLA alleles, recent genome wide association studies (GWAS) on European and Asian populations have failed to identify genetic susceptibility alleles that are common across multiple drugs. We hypothesize that this is a consequence of the low to moderate effect size of individual genetic risk factors. To test this hypothesis we developed Pointer, a new algorithm that assesses the aggregate effect of multiple low risk variants on a pathway using a gene set enrichment approach. A key advantage of our method is the capability to associate SNPs with genes by exploiting physical proximity as well as by using expression quantitative trait loci (eQTLs) that capture information about both cis- and trans-acting regulatory effects. We control for known bias-inducing aspects of enrichment based analyses, such as: 1) gene length, 2) gene set size, 3) presence of biologically related genes within the same linkage disequilibrium (LD) region, and, 4) genes shared among multiple gene sets. We applied this approach to publicly available SJS/TEN genome-wide genotype data and identified the ABC transporter and Proteasome pathways as potentially implicated in the genetic susceptibility of non-drug-specific SJS/TEN. We demonstrated that the innovative SNP-to-gene mapping phase of the method was essential in detecting the significant enrichment for those pathways. Analysis of an independent gene expression dataset provides supportive functional evidence for the involvement of Proteasome pathways in SJS/TEN cutaneous lesions. 
    These results suggest that Pointer provides a useful framework for the integrative analysis of pharmacogenetic GWAS data, by increasing the power to detect aggregate effects of multiple low risk variants. The software is available for download at https://sourceforge.net/projects/pointergsa/.

  14. Additional support for Afrotheria and Paenungulata, the performance of mitochondrial versus nuclear genes, and the impact of data partitions with heterogeneous base composition.

    PubMed

    Springer, M S; Amrine, H M; Burk, A; Stanhope, M J

    1999-03-01

    We concatenated sequences for four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes [aquaporin, alpha 2B adrenergic receptor (A2AB), interphotoreceptor retinoid-binding protein (IRBP), von Willebrand factor (vWF)] into a multigene data set representing 11 eutherian orders (Artiodactyla, Hyracoidea, Insectivora, Lagomorpha, Macroscelidea, Perissodactyla, Primates, Proboscidea, Rodentia, Sirenia, Tubulidentata). Within this data set, we recognized nine mitochondrial partitions (both stems and loops, for each of 12S rRNA, tRNA valine, and 16S rRNA; and first, second, and third codon positions of cytochrome b) and 12 nuclear partitions (first, second, and third codon positions, respectively, of each of the four nuclear genes). Four of the 21 partitions (third positions of cytochrome b, A2AB, IRBP, and vWF) showed significant heterogeneity in base composition across taxa. Phylogenetic analyses (parsimony, minimum evolution, maximum likelihood) based on sequences for all 21 partitions provide 99-100% bootstrap support for Afrotheria and Paenungulata. With the elimination of the four partitions exhibiting heterogeneity in base composition, there is also high bootstrap support (89-100%) for cow + horse. Statistical tests reject Altungulata, Anagalida, and Ungulata. Data set heterogeneity between mitochondrial and nuclear genes is most evident when all partitions are included in the phylogenetic analyses. Mitochondrial-gene trees associate cow with horse, whereas nuclear-gene trees associate cow with hedgehog and these two with horse. However, after eliminating third positions of A2AB, IRBP, and vWF, nuclear data agree with mitochondrial data in supporting cow + horse. Nuclear genes provide stronger support for both Afrotheria and Paenungulata. Removal of third positions of cytochrome b results in improved performance for the mitochondrial genes in recovering these clades.

  15. Biomarker discovery for colon cancer using a 761 gene RT-PCR assay.

    PubMed

    Clark-Langone, Kim M; Wu, Jenny Y; Sangli, Chithra; Chen, Angela; Snable, James L; Nguyen, Anhthu; Hackett, James R; Baker, Joffre; Yothers, Greg; Kim, Chungyeul; Cronin, Maureen T

    2007-08-15

    Reverse transcription PCR (RT-PCR) is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE) clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Oncotype DX assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence in colon cancer (prognosis) and the likelihood of tumor response to standard chemotherapy regimens (prediction). We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. RNA was extracted from formalin fixed paraffin embedded (FPE) tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan reactions for each candidate gene. Hierarchical clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery. We have developed a high throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of genes analyzed while maintaining the high specificity, sensitivity and reproducibility that are characteristics of RT-PCR. Biomarkers discovered using this approach can be transferred to a clinical reference laboratory setting without having to re-validate the assay on a second technology platform.

  16. Variants in the ATP-Binding Cassette Transporter (ABCA7), Apolipoprotein E ε4, and the Risk of Late-Onset Alzheimer Disease in African Americans

    PubMed Central

    Reitz, Christiane; Jun, Gyungah; Naj, Adam; Rajbhandary, Ruchita; Vardarajan, Badri Narayan; Wang, Li-San; Valladares, Otto; Lin, Chiao-Feng; Larson, Eric B.; Graff-Radford, Neill R.; Evans, Denis; De Jager, Philip L.; Crane, Paul K.; Buxbaum, Joseph D.; Murrell, Jill R.; Raj, Towfique; Ertekin-Taner, Nilufer; Logue, Mark; Baldwin, Clinton T.; Green, Robert C.; Barnes, Lisa L.; Cantwell, Laura B.; Fallin, M. Daniele; Go, Rodney C. P.; Griffith, Patrick; Obisesan, Thomas O.; Manly, Jennifer J.; Lunetta, Kathryn L.; Kamboh, M. Ilyas; Lopez, Oscar L.; Bennett, David A.; Hendrie, Hugh; Hall, Kathleen S.; Goate, Alison M.; Byrd, Goldie S.; Kukull, Walter A.; Foroud, Tatiana M.; Haines, Jonathan L.; Farrer, Lindsay A.; Pericak-Vance, Margaret A.; Schellenberg, Gerard D.; Mayeux, Richard

    2013-01-01

    Importance Genetic variants associated with susceptibility to late-onset Alzheimer disease are known for individuals of European ancestry, but whether the same or different variants account for the genetic risk of Alzheimer disease in African American individuals is unknown. Identification of disease-associated variants helps identify targets for genetic testing, prevention, and treatment. Objective To identify genetic loci associated with late-onset Alzheimer disease in African Americans. Design, Setting, and Participants The Alzheimer Disease Genetics Consortium (ADGC) assembled multiple data sets representing a total of 5896 African Americans (1968 case participants, 3928 control participants) 60 years or older that were collected between 1989 and 2011 at multiple sites. The association of Alzheimer disease with genotyped and imputed single-nucleotide polymorphisms (SNPs) was assessed in case-control and in family-based data sets. Results from individual data sets were combined to perform an inverse variance–weighted meta-analysis, first with genome-wide analyses and subsequently with gene-based tests for previously reported loci. Main Outcomes and Measures Presence of Alzheimer disease according to standardized criteria. Results Genome-wide significance in fully adjusted models (sex, age, APOE genotype, population stratification) was observed for a SNP in ABCA7 (rs115550680, allele = G; frequency, 0.09 cases and 0.06 controls; odds ratio [OR], 1.79 [95% CI, 1.47-2.12]; P = 2.2 × 10(-9)), which is in linkage disequilibrium with SNPs previously associated with Alzheimer disease in Europeans (0.8

  17. Evidence of Recessive Alzheimer Disease Loci in a Caribbean Hispanic Data Set

    PubMed Central

    Ghani, Mahdi; Sato, Christine; Lee, Joseph H.; Reitz, Christiane; Moreno, Danielle; Mayeux, Richard; St George-Hyslop, Peter; Rogaeva, Ekaterina

    2014-01-01

    IMPORTANCE The search for novel Alzheimer disease (AD) genes or pathologic mutations within known AD loci is ongoing. The development of array technologies has helped to identify rare recessive mutations among long runs of homozygosity (ROHs), in which both parental alleles are identical. Caribbean Hispanics are known to have an elevated risk for AD and tend to have large families with evidence of inbreeding. OBJECTIVE To test the hypothesis that the late-onset AD in a Caribbean Hispanic population might be explained in part by the homozygosity of unknown loci that could harbor recessive AD risk haplotypes or pathologic mutations. DESIGN We used genome-wide array data to identify ROHs (>1 megabase) and conducted global burden and locus-specific ROH analyses. SETTING A whole-genome case-control ROH study. PARTICIPANTS A Caribbean Hispanic data set of 547 unrelated cases (48.8% with familial AD) and 542 controls collected from a population known to have a 3-fold higher risk of AD vs non-Hispanics in the same community. Based on a Structure program analysis, our data set consisted of African Hispanic (207 cases and 192 controls) and European Hispanic (329 cases and 326 controls) participants. EXPOSURE Alzheimer disease risk genes. MAIN OUTCOMES AND MEASURES We calculated the total and mean lengths of the ROHs per sample. Global burden measurements among autosomal chromosomes were investigated in cases vs controls. Pools of overlapping ROH segments (consensus regions) were identified, and the case to control ratio was calculated for each consensus region. We formulated the tested hypothesis before data collection. RESULTS In total, we identified 17 137 autosomal regions with ROHs. The mean length of the ROH per person was significantly greater in cases vs controls (P = .0039), and this association was stronger with familial AD (P = .0005). 
    Among the European Hispanics, a consensus region at the EXOC4 locus was significantly associated with AD even after correction for multiple testing (empirical P value 1 [EMP1], .0001; EMP2, .002; 21 AD cases vs 2 controls). Among the African Hispanic subset, the most significant but nominal association was observed for CTNNA3, a well-known AD gene candidate (EMP1, .002; 10 AD cases vs 0 controls). CONCLUSIONS AND RELEVANCE Our results show that ROHs could significantly contribute to the etiology of AD. Future studies would require the analysis of larger, relatively inbred data sets that might reveal novel recessive AD genes. The next step is to conduct sequencing of top significant loci in a subset of samples with overlapping ROHs. PMID:23978990

  18. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-08

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Economic analysis of ALK testing and crizotinib therapy for advanced non-small-cell lung cancer.

    PubMed

    Lu, Shun; Zhang, Jie; Ye, Ming; Wang, Baoai; Wu, Bin

    2016-06-01

    The economic outcome of crizotinib in advanced non-small-cell lung cancer harboring anaplastic lymphoma kinase (ALK) rearrangement was investigated. Based on a mathematical model, the economic outcome of three techniques for testing ALK gene rearrangement combined with crizotinib was evaluated and compared with the traditional regimen. The impact of the crizotinib patient assistance program (PAP) was assessed. Ventana immunohistochemistry (IHC), quantitative real-time reverse transcription-polymerase chain reaction, and IHC testing plus fluorescent in situ hybridization confirmation for ALK testing, each followed by crizotinib treatment, led to incremental cost-effectiveness ratios of US$16,820 and US$223,242, US$24,424 and US$223,271, and US$16,850 and US$254,668 per quality-adjusted life-year gained with and without PAP, respectively. Gene-guided crizotinib therapy might be a cost-effective alternative to the traditional regimen in the PAP setting.

  20. Modified signal-to-noise: a new simple and practical gene filtering approach based on the concept of projective adaptive resonance theory (PART) filtering method.

    PubMed

    Takahashi, Hiro; Honda, Hiroyuki

    2006-07-01

    Considering the recent advances in and the benefits of DNA microarray technologies, many gene filtering approaches have been employed for the diagnosis and prognosis of diseases. In our previous study, we developed a new filtering method, namely, the projective adaptive resonance theory (PART) filtering method. This method was effective in subclass discrimination. In the PART algorithm, the genes with a low variance in gene expression in either class, not both classes, were selected as important genes for modeling. Based on this concept, we developed novel simple filtering methods such as modified signal-to-noise (S2N') in the present study. The discrimination model constructed using these methods showed higher accuracy with higher reproducibility as compared with many conventional filtering methods, including the t-test, S2N, NSC and SAM. The reproducibility of prediction was evaluated based on the correlation between the sets of U-test p-values on randomly divided datasets. With respect to leukemia, lymphoma and breast cancer, the correlation was high; a difference of >0.13 was obtained by the constructed model by using <50 genes selected by S2N'. Improvement was higher in the smaller genes and such higher correlation was observed when t-test, NSC and SAM were used. These results suggest that these modified methods, such as S2N', have high potential to function as new methods for marker gene selection in cancer diagnosis using DNA microarray data. Software is available upon request.
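
    For orientation, the conventional signal-to-noise (S2N) statistic that this abstract uses as a baseline can be sketched as follows. The abstract does not define the modified S2N' statistic, so it is not reproduced here; the sketch shows only the widely used form, the difference of class means divided by the sum of class standard deviations. The function name `signal_to_noise` is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def signal_to_noise(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Conventional S2N for one gene: (mean_a - mean_b) / (sd_a + sd_b).

    class_a, class_b: expression values of the gene in each class
    (at least two values per class, since the sample SD is used).
    """
    spread = stdev(class_a) + stdev(class_b)  # sample SDs (assumption)
    if spread == 0:
        raise ValueError("both classes have zero variance; S2N is undefined")
    return (mean(class_a) - mean(class_b)) / spread
```

    Genes would typically be ranked by the absolute value of this score, and the top-ranked genes kept for model building.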

  1. Variants in the ATP-binding cassette transporter (ABCA7), apolipoprotein E ϵ4, and the risk of late-onset Alzheimer disease in African Americans.

    PubMed

    Reitz, Christiane; Jun, Gyungah; Naj, Adam; Rajbhandary, Ruchita; Vardarajan, Badri Narayan; Wang, Li-San; Valladares, Otto; Lin, Chiao-Feng; Larson, Eric B; Graff-Radford, Neill R; Evans, Denis; De Jager, Philip L; Crane, Paul K; Buxbaum, Joseph D; Murrell, Jill R; Raj, Towfique; Ertekin-Taner, Nilufer; Logue, Mark; Baldwin, Clinton T; Green, Robert C; Barnes, Lisa L; Cantwell, Laura B; Fallin, M Daniele; Go, Rodney C P; Griffith, Patrick; Obisesan, Thomas O; Manly, Jennifer J; Lunetta, Kathryn L; Kamboh, M Ilyas; Lopez, Oscar L; Bennett, David A; Hendrie, Hugh; Hall, Kathleen S; Goate, Alison M; Byrd, Goldie S; Kukull, Walter A; Foroud, Tatiana M; Haines, Jonathan L; Farrer, Lindsay A; Pericak-Vance, Margaret A; Schellenberg, Gerard D; Mayeux, Richard

    2013-04-10

    Genetic variants associated with susceptibility to late-onset Alzheimer disease are known for individuals of European ancestry, but whether the same or different variants account for the genetic risk of Alzheimer disease in African American individuals is unknown. Identification of disease-associated variants helps identify targets for genetic testing, prevention, and treatment. To identify genetic loci associated with late-onset Alzheimer disease in African Americans. The Alzheimer Disease Genetics Consortium (ADGC) assembled multiple data sets representing a total of 5896 African Americans (1968 case participants, 3928 control participants) 60 years or older that were collected between 1989 and 2011 at multiple sites. The association of Alzheimer disease with genotyped and imputed single-nucleotide polymorphisms (SNPs) was assessed in case-control and in family-based data sets. Results from individual data sets were combined to perform an inverse variance-weighted meta-analysis, first with genome-wide analyses and subsequently with gene-based tests for previously reported loci. Presence of Alzheimer disease according to standardized criteria. Genome-wide significance in fully adjusted models (sex, age, APOE genotype, population stratification) was observed for a SNP in ABCA7 (rs115550680, allele = G; frequency, 0.09 cases and 0.06 controls; odds ratio [OR], 1.79 [95% CI, 1.47-2.12]; P = 2.2 × 10(-9)), which is in linkage disequilibrium with SNPs previously associated with Alzheimer disease in Europeans (0.8 < D' < 0.9). The effect size for the SNP in ABCA7 was comparable with that of the APOE ϵ4-determining SNP rs429358 (allele = C; frequency, 0.30 cases and 0.18 controls; OR, 2.31 [95% CI, 2.19-2.42]; P = 5.5 × 10(-47)). 
    Several loci previously associated with Alzheimer disease but not reaching significance in genome-wide analyses were replicated in gene-based analyses accounting for linkage disequilibrium between markers and correcting for number of tests performed per gene (CR1, BIN1, EPHA1, CD33; 0.0005 < empirical P < .001). In this meta-analysis of data from African American participants, Alzheimer disease was significantly associated with variants in ABCA7 and with other genes that have been associated with Alzheimer disease in individuals of European ancestry. Replication and functional validation of this finding is needed before this information is used in clinical settings.
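
    The inverse variance-weighted meta-analysis mentioned in this abstract can be illustrated with a minimal fixed-effect sketch. The abstract does not give the study's exact model or per-data-set estimates, so the function name `inverse_variance_meta` and its inputs are hypothetical. Each data set contributes an effect estimate (for example a log odds ratio) weighted by the reciprocal of its squared standard error.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float]:
    """Fixed-effect inverse variance-weighted pooling (illustrative sketch).

    estimates: per-data-set effect estimates, e.g. log odds ratios.
    std_errors: matching standard errors, all strictly positive.
    Returns (pooled_estimate, pooled_standard_error).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equally long, non-empty inputs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se**2 for se in std_errors]  # precision of each data set
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se
```

    A pooled log odds ratio can be converted back with `math.exp(pooled)`, and an approximate 95% confidence interval is `math.exp(pooled ± 1.96 * pooled_se)`. More precise data sets dominate the pooled value, which is the intended behavior of this weighting.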

  2. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    PubMed

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

  3. Comparison of various primer sets for detection of Toxoplasma gondii by polymerase chain reaction in fetal tissues from naturally aborted foxes.

    PubMed

    Smielewska-Loś, E

    2003-01-01

    Tissues from 4 aborted polar foxes (3 samples of brain and 4 samples of liver) were selected for Toxoplasma gondii PCR assay. Positive results of serological tests of mothers and immunofluorescence test (IFT) of fetal organ smears were the criteria of sample selection. Five sets of primers designed from B1 gene and ITS1 sequences of T. gondii were used for detection of the parasite in fetal fox tissues. All primer sets used successfully amplified T. gondii DNA in PCR from organs which were positive by IFT. Single tube nested PCR also showed a positive result from a sample negative by IFT, but this product was not confirmed. The studies showed the usefulness of PCR for routine diagnosis of toxoplasmosis in carnivores.

  4. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain

    PubMed Central

    Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.

    2016-01-01

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559

  5. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain.

    PubMed

    Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C

    2016-01-26

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.

  6. Genome-wide differences in hepatitis C- vs alcoholism-associated hepatocellular carcinoma

    PubMed Central

    Derambure, Céline; Coulouarn, Cédric; Caillot, Frédérique; Daveau, Romain; Hiron, Martine; Scotte, Michel; François, Arnaud; Duclos, Celia; Goria, Odile; Gueudin, Marie; Cavard, Catherine; Terris, Benoit; Daveau, Maryvonne; Salier, Jean-Philippe

    2008-01-01

    AIM: To look at a comprehensive picture of etiology-dependent gene abnormalities in hepatocellular carcinoma in Western Europe. METHODS: With a liver-oriented microarray, transcript levels were compared in nodules and cirrhosis from a training set of patients with hepatocellular carcinoma (alcoholism, 12; hepatitis C, 10) and 5 controls. Loose or tight selection of informative transcripts with an abnormal abundance was statistically valid and the tightly selected transcripts were next quantified by qRTPCR in the nodules from our training set (12 + 10) and a test set (6 + 7). RESULTS: A selection of 475 transcripts pointed to significant gene over-representation on chromosome 8 (alcoholism) or -2 (hepatitis C) and ontology indicated a predominant inflammatory response (alcoholism) or changes in cell cycle regulation, transcription factors and interferon responsiveness (hepatitis C). A stringent selection of 23 transcripts whose differences between etiologies were significant in nodules but not in cirrhotic tissue indicated that the above dysregulations take place in tumor but not in the surrounding cirrhosis. These 23 transcripts separated our test set according to etiologies. The inflammation-associated transcripts pointed to limited alterations of free iron metabolism in alcoholic vs hepatitis C tumors. CONCLUSION: Etiology-specific abnormalities (chromosome preference; differences in transcriptomes and related functions) have been identified in hepatocellular carcinoma driven by alcoholism or hepatitis C. This may open novel avenues for differential therapies in this disease. PMID:18350606

  7. Phylogenetics and evolution of Su(var)3-9 SET genes in land plants: rapid diversification in structure and function.

    PubMed

    Zhu, Xinyu; Ma, Hong; Chen, Zhiduan

    2011-03-09

    Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improve genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were early integrated into primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.

  8. Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.

    PubMed

    Park, Inho; Lee, Kwang H; Lee, Doheon

    2010-06-15

    Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.
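
    The scoring idea in this abstract (judging a Boolean rule by the class entropy of the samples it selects) can be illustrated with a minimal sketch. The toy samples, the field names `state` and `label`, and the helper names are hypothetical; the authors' actual significance procedure is not described here and is not reproduced.

```python
import math


def class_entropy(labels):
    """Shannon entropy (in bits) of class labels; zero means a pure group."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_by_rule(samples, rule):
    """Return the class labels of samples whose gene set states satisfy the rule."""
    return [s["label"] for s in samples if rule(s["state"])]


# Hypothetical toy data: inferred gene set activity per sample plus phenotype.
samples = [
    {"state": {"glutathione": "down", "prostaglandin": "down"}, "label": "cancer"},
    {"state": {"glutathione": "down", "prostaglandin": "down"}, "label": "cancer"},
    {"state": {"glutathione": "down", "prostaglandin": "up"}, "label": "normal"},
    {"state": {"glutathione": "up", "prostaglandin": "up"}, "label": "normal"},
]


def both_down(state):
    """Illustrative conjunctive rule: both gene sets are down-regulated."""
    return state["glutathione"] == "down" and state["prostaglandin"] == "down"


selected = select_by_rule(samples, both_down)
rule_entropy = class_entropy(selected)
baseline_entropy = class_entropy([s["label"] for s in samples])
```

    A rule whose selected samples have much lower class entropy than the full cohort separates the phenotypes well on this toy data; assessing significance would need an additional step such as permutation of labels.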

  9. CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts.

    PubMed

    Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P

    2015-03-11

    The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.

  10. Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.

    PubMed

    Zhu, Xinyu; Chen, Caoyi; Wang, Baohua

    2012-04-01

    Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we made a comprehensive evolutionary analysis on this gene family from 13 major representatives of green plants. A novel clustering (here named as cpTrx clade), which included the III-1, III-2, and III-4 orthologous groups, previously resolved was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were early integrated into primordial SET-PostSET domain organization of cpTrx clade. We suggested that the PostSET domain was lost in some members of III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from the Physcomitrella patens (moss) were identified, and supposedly, their parental genes have been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.

  11. A polygenic burden of rare disruptive mutations in schizophrenia

    PubMed Central

    Purcell, Shaun M.; Moran, Jennifer L.; Fromer, Menachem; Ruderfer, Douglas; Solovieff, Nadia; Roussos, Panos; O’Dushlaine, Colm; Chambert, Kimberly; Bergen, Sarah E.; Kähler, Anna; Duncan, Laramie; Stahl, Eli; Genovese, Giulio; Fernández, Esperanza; Collins, Mark O; Komiyama, Noboru H.; Choudhary, Jyoti S.; Magnusson, Patrik K. E.; Banks, Eric; Shakir, Khalid; Garimella, Kiran; Fennell, Tim; de Pristo, Mark; Grant, Seth G.N.; Haggarty, Stephen; Gabriel, Stacey; Scolnick, Edward M.; Lander, Eric S.; Hultman, Christina; Sullivan, Patrick F.; McCarroll, Steven A.; Sklar, Pamela

    2014-01-01

    By analyzing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we have demonstrated a polygenic burden primarily arising from rare (<1/10,000), disruptive mutations distributed across many genes. Especially enriched genesets included the voltage-gated calcium ion channel and the signaling complex formed by the activity-regulated cytoskeleton-associated (ARC) scaffold protein of the postsynaptic density (PSD), sets previously implicated by genome-wide association studies (GWAS) and copy-number variation (CNV) studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) were enriched for case mutations. No individual gene-based test achieved significance after correction for multiple testing and we did not detect any alleles of moderately low frequency (~0.5-1%) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene mapping paradigms in neuropsychiatric disease. PMID:24463508

  12. Segregation of genes from donor strain during the production of recombinant congenic strains.

    PubMed

    van Zutphen, L F; Den Bieman, M; Lankhorst, A; Demant, P

    1991-07-01

    Recombinant congenic strains (RCS) constitute a set of inbred strains which are designed to dissect the genetic control of multigenic traits, such as tumour susceptibility or disease resistance. Each RCS contains a small fraction of the genome of a common donor strain, while the majority of genes stem from a common background strain. We tested at two stages of the inbreeding process in 20 RCS, derived from BALB/cHeA and STS/A, to see whether alleles from the STS/A donor strain are distributed over the RCS in a ratio as would theoretically be expected. Four marker genes (Pep-3; Pgm-1; Gpi-1 and Es-3) located at 4 different chromosomes were selected and the allelic distribution was tested after 3-4 and after 12 generations of inbreeding. The data obtained do not significantly deviate from the expected pattern, thus supporting the validity of the concept of RCS.

  13. A multistage gene normalization system integrating multiple effective methods.

    PubMed

    Li, Lishuang; Liu, Shanshan; Li, Lihua; Fan, Wenting; Huang, Degen; Zhou, Huiwei

    2013-01-01

    Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system which consists of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM testing set. In the stage of dictionary matching, the exact matching and approximate matching between gene names and the EntrezGene lexicon have been combined. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. At the last step, a filter based on Wikipedia has been built to remove the false positives. Experimental results show that the presented system can achieve an F-score of 90.1%, outperforming most of the state-of-the-art systems.

  14. Changes in Gene Expression Predicting Local Control in Cervical Cancer: Results from Radiation Therapy Oncology Group 0128

    PubMed Central

    Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David

    2009-01-01

    Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. Two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate are 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178

  15. Development of a multiplex PCR assay for detection and discrimination of Theileria annulata and Theileria sergenti in cattle.

    PubMed

    Junlong, Liu; Li, Youquan; Liu, Aihong; Guan, Guiquan; Xie, Junren; Yin, Hong; Luo, Jianxun

    2015-07-01

    To construct a simple and efficient diagnostic assay for Theileria annulata and Theileria sergenti, a multiplex polymerase chain reaction (PCR) method was developed in this study. Following alignment of the related sequences, two primer sets were designed to specifically target the T. annulata cytochrome b (COB) gene and T. sergenti internal transcribed spacer (ITS) sequences. The designed primers could react in one PCR system, generating amplification products of 818 and 393 base pairs for T. sergenti and T. annulata, respectively. Standard genomic DNA of both Theileria species was serially diluted tenfold to test sensitivity, while a specificity test confirmed that neither primer set cross-reacts with other Theileria and Babesia species. In addition, 378 field samples were used to evaluate the utility of the multiplex PCR assay for detecting infection with these pathogens. The detection results were compared with two other published PCR methods, which target the T. annulata COB gene and the T. sergenti major piroplasm surface protein (MPSP) gene, respectively. The developed multiplex PCR assay has detection efficiency similar to that of the COB and MPSP PCRs, which indicates that this multiplex PCR may be a valuable assay for epidemiological studies of T. annulata and T. sergenti.

  16. Pathway-based analysis of GWAs data identifies association of sex determination genes with susceptibility to testicular germ cell tumors.

    PubMed

    Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A

    2014-11-15

    Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10^-5, FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10^-6, FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10^-5, FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT. © The Author 2014. Published by Oxford University Press. All rights reserved.

  17. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
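
    For orientation, the majority voting baseline that this abstract compares against can be sketched in a few lines. This is only the baseline, not the authors' deep learning ensemble; the classifier outputs and the function name `majority_vote` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine the labels predicted for one sample by taking the most common one."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of five classifiers (rows) for three samples (columns).
classifier_outputs = [
    ["tumor", "normal", "tumor"],
    ["tumor", "normal", "normal"],
    ["normal", "normal", "tumor"],
    ["tumor", "tumor", "normal"],
    ["tumor", "normal", "normal"],
]

# Transpose so that each entry holds the five votes cast for one sample.
votes_per_sample = list(zip(*classifier_outputs))
ensemble = [majority_vote(votes) for votes in votes_per_sample]
```

    A stacking approach such as the one the abstract describes would replace `majority_vote` with a trained model that takes the five outputs as input features.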

  18. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset.

    PubMed

    Mallik, Saurav; Maulik, Ujjwal

    2015-10-01

    Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription-factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data using (statistical) eigenvector centrality based approach. At first, genes that are both differentially expressed and methylated, are identified using Limma statistical test. A network, comprising these genes, corresponding TFs from TRANSFAC and ITFP databases, and targeter miRNAs from miRWalk database, is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classifications than other methods. Furthermore, pre-ranked Gene set enrichment analysis is applied on the pathway database as well as GO-term databases of Molecular Signatures Database with providing a pre-ranked gene-list based on different centrality values for comparing among the ranking methods. Finally, top novel potential gene-markers for the uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.
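
    Ranking network nodes by eigenvector centrality, as described above, can be sketched with plain power iteration. The four-node network, the node names, and the function name are hypothetical, and the iteration uses A + I rather than A (same eigenvectors, but it also converges on bipartite graphs); the authors' own implementation may differ.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-12):
    """Power iteration on a symmetric 0/1 adjacency matrix given as nested lists."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): keep the old score and add the neighbours' scores.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in updated) ** 0.5
        updated = [x / norm for x in updated]
        converged = max(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical toy network: one miRNA, one TF and two genes.
names = ["miRNA_1", "TF_1", "gene_1", "gene_2"]
adjacency = [
    [0, 1, 0, 0],  # miRNA_1 - TF_1
    [1, 0, 1, 1],  # TF_1 - miRNA_1, gene_1, gene_2
    [0, 1, 0, 1],  # gene_1 - TF_1, gene_2
    [0, 1, 1, 0],  # gene_2 - TF_1, gene_1
]

scores = dict(zip(names, eigenvector_centrality(adjacency)))
ranked = sorted(names, key=lambda name: scores[name], reverse=True)
```

    In this toy network the transcription factor is connected to every other node and therefore ranks first, while the miRNA, with a single neighbour, ranks last.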

  19. cDREM: inferring dynamic combinatorial gene regulation.

    PubMed

    Wise, Aaron; Bar-Joseph, Ziv

    2015-04-01

    Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinational regulation, these have often ignored the combinational logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

  20. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough

    PubMed Central

    McMahon, George; Ring, Susan M.; Davey-Smith, George; Timpson, Nicholas J.

    2015-01-01

    Whooping cough is currently seeing resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case–control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. PMID:26231221

  1. Anti-oxidative assays as markers for anti-inflammatory activity of flavonoids.

    PubMed

    Chanput, Wasaporn; Krueyos, Narumol; Ritthiruangdej, Pitiporn

    2016-11-01

    The complexity of in vitro anti-inflammatory assays, the cost and time consumed, and the necessary skills can be a hurdle to apply to promising compounds in a high throughput setting. In this study, several antioxidative assays i.e. DPPH, ABTS, ORAC and xanthine oxidase (XO) were used to examine the antioxidative activity of three sub groups of flavonoids: (i) flavonol: quercetin, myricetin, (ii) flavanone: eriodictyol, naringenin (iii) flavone: luteolin, apigenin. A range of flavonoid concentrations was tested for their antioxidative activities and were found to be dose-dependent. However, the flavonoid concentrations over 50 ppm were found to be toxic to the THP-1 monocytes. Therefore, 10, 20 and 50 ppm of flavonoid concentrations were tested for their anti-inflammatory activity in lipopolysaccharide (LPS)-stimulated THP-1 monocytes. Expression of inflammatory genes, IL-1β, IL-6, IL-8, IL-10 and TNF-α was found to be sequentially decreased when flavonoid concentration increased. Principal component analysis (PCA) was used to investigate the relationship between the data sets of antioxidative assays and the expression of inflammatory genes. The results showed that DPPH, ABTS and ORAC assays have an opposite correlation with the reduction of inflammatory genes. Pearson correlation exhibited a relationship between the ABTS assay and the expression of three out of five analyzed genes; IL-1β, IL-6 and IL-8. Our findings indicate that ABTS assay can potentially be an assay marker for anti-inflammatory activity of flavonoids. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
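
    The "simple gene list" view of a pathway described in this abstract can be illustrated with an over-representation test. The sketch below is only a generic illustration: the abstract does not say which three gene set methods were compared, and the function name, parameter names and example counts are hypothetical. It computes the upper-tail hypergeometric probability of seeing at least the observed overlap between a pathway and a list of differentially expressed genes.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_de, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes measured (hypothetical background set)
    n_pathway:  number of those genes annotated to the pathway
    n_de:       number of differentially expressed genes
    n_overlap:  differentially expressed genes that are in the pathway
    """
    max_overlap = min(n_pathway, n_de)
    total = comb(n_universe, n_de)  # all ways to draw the DE gene list
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative counts only; they are not taken from the study.
p_value = hypergeom_enrichment_p(n_universe=10000, n_pathway=100, n_de=500, n_overlap=15)
print(f"enrichment p-value: {p_value:.3g}")
```

    Because the test treats genes as exchangeable members of a list, it uses no interaction or topology information, which is the contrast the abstract draws with topology-based methods.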

  3. A common variant of staphylococcal cassette chromosome mec type IVa in isolates from Copenhagen, Denmark, is not detected by the BD GeneOhm methicillin-resistant Staphylococcus aureus assay.

    PubMed

    Bartels, Mette Damkjaer; Boye, Kit; Rohde, Susanne Mie; Larsen, Anders Rhod; Torfs, Herbert; Bouchy, Peggy; Skov, Robert; Westh, Henrik

    2009-05-01

    Rapid tests for detection of methicillin-resistant Staphylococcus aureus (MRSA) carriage are important to limit the transmission of MRSA in the health care setting. We evaluated the performance of the BD GeneOhm MRSA real-time PCR assay using a diverse collection of MRSA isolates, mainly from Copenhagen, Denmark, but also including international isolates, e.g., USA100-1100. Pure cultures of 349 MRSA isolates representing variants of staphylococcal cassette chromosome mec (SCCmec) types I to V and 103 different staphylococcal protein A (spa) types were tested. In addition, 53 methicillin-susceptible Staphylococcus aureus isolates were included as negative controls. Forty-four MRSA isolates were undetectable; of these, 95% harbored SCCmec type IVa, and these included the most-common clone in Copenhagen, spa t024-sequence type 8-IVa. The false-negative MRSA isolates were tested with new primers (analyte-specific reagent [ASR] BD GeneOhm MRSA assay) supplied by Becton Dickinson (BD). The ASR BD GeneOhm MRSA assay detected 42 of the 44 isolates that were false negative in the BD GeneOhm MRSA assay. Combining the BD GeneOhm MRSA assay with the ASR BD GeneOhm MRSA assay greatly improved the results, with only two MRSA isolates being false negative. The BD GeneOhm MRSA assay alone is not adequate for MRSA detection in Copenhagen, Denmark, as more than one-third of our MRSA isolates would not be detected. We recommend that the BD GeneOhm MRSA assay be evaluated against the local MRSA diversity before being established as a standard assay, and due to the constant evolution of SCCmec cassettes, a continuous global surveillance is advisable in order to update the assay as necessary.

  4. Analytical performance of the ThyroSeq v3 genomic classifier for cancer diagnosis in thyroid nodules.

    PubMed

    Nikiforova, Marina N; Mercurio, Stephanie; Wald, Abigail I; Barbi de Moura, Michelle; Callenberg, Keith; Santana-Santos, Lucas; Gooding, William E; Yip, Linwah; Ferris, Robert L; Nikiforov, Yuri E

    2018-04-15

    Molecular tests have clinical utility for thyroid nodules with indeterminate fine-needle aspiration (FNA) cytology, although their performance requires further improvement. This study evaluated the analytical performance of the newly created ThyroSeq v3 test. ThyroSeq v3 is a DNA- and RNA-based next-generation sequencing assay that analyzes 112 genes for a variety of genetic alterations, including point mutations, insertions/deletions, gene fusions, copy number alterations, and abnormal gene expression, and it uses a genomic classifier (GC) to separate malignant lesions from benign lesions. It was validated in 238 tissue samples and 175 FNA samples with known surgical follow-up. Analytical performance studies were conducted. In the training tissue set of samples, ThyroSeq GC detected more than 100 genetic alterations, including BRAF, RAS, TERT, and DICER1 mutations, NTRK1/3, BRAF, and RET fusions, 22q loss, and gene expression alterations. GC cutoffs were established to distinguish cancer from benign nodules with 93.9% sensitivity, 89.4% specificity, and 92.1% accuracy. This correctly classified most papillary, follicular, and Hurthle cell lesions, medullary thyroid carcinomas, and parathyroid lesions. In the FNA validation set, the GC sensitivity was 98.0%, the specificity was 81.8%, and the accuracy was 90.9%. Analytical accuracy studies demonstrated a minimal required nucleic acid input of 2.5 ng, a 12% minimal acceptable tumor content, and reproducible test results under variable stress conditions. The ThyroSeq v3 GC analyzes 5 different classes of molecular alterations and provides high accuracy for detecting all common types of thyroid cancer and parathyroid lesions. The analytical sensitivity, specificity, and robustness of the test have been successfully validated and indicate its suitability for clinical use. Cancer 2018;124:1682-90. © 2018 American Cancer Society.
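
    The sensitivity, specificity and accuracy figures quoted in this abstract are standard confusion-matrix summaries. As a minimal sketch of how such figures are derived, the helper below computes them from true/false positive and negative counts. The function name and the counts in the usage lines are hypothetical; the abstract does not report the underlying counts, so the example does not reproduce the study's numbers.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of malignant samples called positive
    specificity = tn / (tn + fp)  # fraction of benign samples called negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all samples called correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

    Accuracy depends on the ratio of malignant to benign samples, so it cannot be recovered from sensitivity and specificity alone.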

  5. Simple, rapid and sensitive detection of Orientia tsutsugamushi by loop-isothermal DNA amplification.

    PubMed

    Paris, Daniel H; Blacksell, Stuart D; Newton, Paul N; Day, Nicholas P J

    2008-12-01

    We present a loop-mediated isothermal PCR assay (LAMP) targeting the groEL gene, which encodes the 60kDa heat shock protein of Orientia tsutsugamushi. Evaluation included testing of 63 samples of contemporary in vitro isolates, buffy coats and whole blood samples from patients with fever. Detection limits for LAMP were assessed by serial dilutions and quantitation by real-time PCR assay based on the same target gene: three copies/microl for linearized plasmids, 26 copies/microl for VERO cell culture isolates, 14 copies/microl for full blood samples and 41 copies/microl for clinical buffy coats. Based on a limited sample number, the LAMP assay is comparable in sensitivity with conventional nested PCR (56kDa gene), with limits of detection well below the range of known admission bacterial loads of patients with scrub typhus. This inexpensive method requires no sophisticated equipment or sample preparation, and may prove useful as a diagnostic assay in financially poor settings; however, it requires further prospective validation in the field setting.

  6. Identification of a B cell signature associated with renal transplant tolerance in humans

    PubMed Central

    Newell, Kenneth A.; Asare, Adam; Kirk, Allan D.; Gisler, Trang D.; Bourcier, Kasia; Suthanthiran, Manikkam; Burlingham, William J.; Marks, William H.; Sanz, Ignacio; Lechler, Robert I.; Hernandez-Fuentes, Maria P.; Turka, Laurence A.; Seyfert-Margolis, Vicki L.

    2010-01-01

    Establishing long-term allograft acceptance without the requirement for continuous immunosuppression, a condition known as allograft tolerance, is a highly desirable therapeutic goal in solid organ transplantation. Determining which recipients would benefit from withdrawal or minimization of immunosuppression would be greatly facilitated by biomarkers predictive of tolerance. In this study, we identified the largest reported cohort to our knowledge of tolerant renal transplant recipients, as defined by stable graft function and receiving no immunosuppression for more than 1 year, and compared their gene expression profiles and peripheral blood lymphocyte subsets with those of subjects with stable graft function who are receiving immunosuppressive drugs as well as healthy controls. In addition to being associated with clinical and phenotypic parameters, renal allograft tolerance was strongly associated with a B cell signature using several assays. Tolerant subjects showed increased expression of multiple B cell differentiation genes, and a set of just 3 of these genes distinguished tolerant from nontolerant recipients in a unique test set of samples. This B cell signature was associated with upregulation of CD20 mRNA in urine sediment cells and elevated numbers of peripheral blood naive and transitional B cells in tolerant participants compared with those receiving immunosuppression. These results point to a critical role for B cells in regulating alloimmunity and provide a candidate set of genes for wider-scale screening of renal transplant recipients. PMID:20501946

  7. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    PubMed

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

  8. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network.

    PubMed

    Hur, Junguk; Xiang, Zuoshuang; Feldman, Eva L; He, Yongqun

    2011-08-26

    Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes was identified. The asserted and inferred VO hierarchies provide semantic support for inferring novel knowledge of association of vaccines and genes from the retrieved data. New hypotheses were generated based on this analysis approach. VO-SciMiner can be used to improve the efficiency for PubMed searching in the vaccine domain.

  9. Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network

    PubMed Central

    2011-01-01

    Background: Vaccine literature indexing is poorly performed in PubMed due to limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that application of VO in SciMiner will aid vaccine literature indexing and mining of vaccine-gene interaction networks. As a test case, we have examined vaccines for Brucella, the causative agent of brucellosis in humans and animals. Results: The VO-based SciMiner (VO-SciMiner) was developed to incorporate a total of 67 Brucella vaccine terms. A set of rules for term expansion of VO terms was learned from training data, consisting of 90 biomedical articles related to Brucella vaccine terms. VO-SciMiner demonstrated high recall (91%) and precision (99%) from testing a separate set of 100 manually selected biomedical articles. VO-SciMiner indexing exhibited superior performance in retrieving Brucella vaccine-related papers over that obtained with MeSH-based PubMed literature search. For example, a VO-SciMiner search of "live attenuated Brucella vaccine" returned 922 hits as of April 20, 2011, while a PubMed search of the same query resulted in only 74 hits. Using the abstracts of 14,947 Brucella-related papers, VO-SciMiner identified 140 Brucella genes associated with Brucella vaccines. These genes included known protective antigens, virulence factors, and genes closely related to Brucella vaccines. These VO-interacting Brucella genes were significantly over-represented in biological functional categories, including metabolite transport and metabolism, replication and repair, cell wall biogenesis, intracellular trafficking and secretion, posttranslational modification, and chaperones. Furthermore, a comprehensive interaction network of Brucella vaccines and genes was identified. The asserted and inferred VO hierarchies provide semantic support for inferring novel knowledge of association of vaccines and genes from the retrieved data. New hypotheses were generated based on this analysis approach. Conclusion: VO-SciMiner can be used to improve the efficiency for PubMed searching in the vaccine domain. PMID:21871085

  10. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem of the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.

  11. Transcriptional alterations in the left ventricle of three hypertensive rat models.

    PubMed

    Cerutti, Catherine; Kurdi, Mazen; Bricca, Giampiero; Hodroj, Wassim; Paultre, Christian; Randon, Jacques; Gustin, Marie-Paule

    2006-11-27

    Left ventricular hypertrophy (LVH) is commonly associated with hypertension and represents an independent cardiovascular risk factor. The aim of this study was to test the hypothesis that the cardiac overload related to hypertension is associated to a specific gene expression pattern independently of genetic background. Gene expression levels were obtained with microarrays for 15,866 transcripts from RNA of left ventricles from 12-wk-old rats of three hypertensive models [spontaneously hypertensive rat (SHR), Lyon hypertensive rat (LH), and heterozygous TGR(mRen2)27 rat] and their respective controls. More than 60% of the detected transcripts displayed significant changes between the three groups of normotensive rats, showing large interstrain variability. Expression data were analyzed with respect to hypertension, LVH, and chromosomal distribution. Only four genes had significantly modified expression in the three hypertensive models among which a single gene, coding for sialyltransferase 7A, was consistently overexpressed. Correlation analysis between expression data and left ventricular mass index (LVMI) over all rats identified a larger set of genes whose expression was continuously related with LVMI, including known genes associated with cardiac remodeling. Positioning the detected transcripts along the chromosomes pointed out high-density regions mostly located within blood pressure and cardiac mass quantitative trait loci. Although our study could not detect a unique reprogramming of cardiac cells involving specific genes at early stage of LVH, it allowed the identification of some genes associated with LVH regardless of genetic background. This study thus provides a set of potentially important genes contained within restricted chromosomal regions involved in cardiovascular diseases.

  12. Antioxidant Defense Enzyme Genes and Asthma Susceptibility: Gender-Specific Effects and Heterogeneity in Gene-Gene Interactions between Pathogenetic Variants of the Disease

    PubMed Central

    Polonikov, Alexey V.; Ivanov, Vladimir P.; Bogomazov, Alexey D.; Freidin, Maxim B.; Illig, Thomas; Solodilova, Maria A.

    2014-01-01

    Oxidative stress resulting from an increased amount of reactive oxygen species and an imbalance between oxidants and antioxidants plays an important role in the pathogenesis of asthma. The present study tested the hypothesis that genetic susceptibility to allergic and nonallergic variants of asthma is determined by complex interactions between genes encoding antioxidant defense enzymes (ADE). We carried out a comprehensive analysis of the associations between adult asthma and 46 single nucleotide polymorphisms of 34 ADE genes and 12 other candidate genes of asthma in Russian population using set association analysis and multifactor dimensionality reduction approaches. We found for the first time epistatic interactions between ADE genes underlying asthma susceptibility and the genetic heterogeneity between allergic and nonallergic variants of the disease. We identified GSR (glutathione reductase) and PON2 (paraoxonase 2) as novel candidate genes for asthma susceptibility. We observed gender-specific effects of ADE genes on the risk of asthma. The results of the study demonstrate complexity and diversity of interactions between genes involved in oxidative stress underlying susceptibility to allergic and nonallergic asthma. PMID:24895604

  13. Predicting fatty acid profiles in blood based on food intake and the FADS1 rs174546 SNP.

    PubMed

    Hallmann, Jacqueline; Kolossa, Silvia; Gedrich, Kurt; Celis-Morales, Carlos; Forster, Hannah; O'Donovan, Clare B; Woolhead, Clara; Macready, Anna L; Fallaize, Rosalind; Marsaux, Cyril F M; Lambrinou, Christina-Paulina; Mavrogianni, Christina; Moschonis, George; Navas-Carretero, Santiago; San-Cristobal, Rodrigo; Godlewska, Magdalena; Surwiłło, Agnieszka; Mathers, John C; Gibney, Eileen R; Brennan, Lorraine; Walsh, Marianne C; Lovegrove, Julie A; Saris, Wim H M; Manios, Yannis; Martinez, Jose Alfredo; Traczyk, Iwona; Gibney, Michael J; Daniel, Hannelore

    2015-12-01

    A high intake of n-3 PUFA provides health benefits via changes in the n-6/n-3 ratio in blood. In addition to such dietary PUFAs, variants in the fatty acid desaturase 1 (FADS1) gene are also associated with altered PUFA profiles. We used mathematical modeling to predict levels of PUFA in whole blood, based on multiple hypothesis testing and bootstrapped LASSO selected food items, anthropometric and lifestyle factors, and the rs174546 genotypes in FADS1 from 1607 participants (Food4Me Study). The models were developed using data from the first reported time point (training set) and their predictive power was evaluated using data from the last reported time point (test set). Among other food items, fish, pizza, chicken, and cereals were identified as being associated with the PUFA profiles. Using these food items and the rs174546 genotypes as predictors, models explained 26-43% of the variability in PUFA concentrations in the training set and 22-33% in the test set. Selecting food items using multiple hypothesis testing is a valuable contribution to determine predictors, as our models' predictive power is higher compared to analogue studies. As unique feature, we additionally confirmed our models' power based on a test set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Generalized Functional Linear Models for Gene-based Case-Control Association Studies

    PubMed Central

    Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao

    2014-01-01

    By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683

  15. Genome-wide gene–environment interaction analysis for asbestos exposure in lung cancer susceptibility

    PubMed Central

    Wei, Qingyi Wei

    2012-01-01

    Asbestos exposure is a known risk factor for lung cancer. Although recent genome-wide association studies (GWASs) have identified some novel loci for lung cancer risk, few addressed genome-wide gene–environment interactions. To determine gene–asbestos interactions in lung cancer risk, we conducted genome-wide gene–environment interaction analyses at levels of single nucleotide polymorphisms (SNPs), genes and pathways, using our published Texas lung cancer GWAS dataset. This dataset included 317 498 SNPs from 1154 lung cancer cases and 1137 cancer-free controls. The initial SNP-level P-values for interactions between genetic variants and self-reported asbestos exposure were estimated by unconditional logistic regression models with adjustment for age, sex, smoking status and pack-years. The P-value for the most significant SNP rs13383928 was 2.17×10^-6, which did not reach the genome-wide statistical significance. Using a versatile gene-based test approach, we found that the top significant gene was C7orf54, located on 7q32.1 (P = 8.90×10^-5). Interestingly, most of the other significant genes were located on 11q13. When we used an improved gene-set-enrichment analysis approach, we found that the Fas signaling pathway and the antigen processing and presentation pathway were most significant (nominal P < 0.001; false discovery rate < 0.05) among 250 pathways containing 17 572 genes. We believe that our analysis is a pilot study that first describes the gene–asbestos interaction in lung cancer risk at levels of SNPs, genes and pathways. Our findings suggest that immune function regulation-related pathways may be mechanistically involved in asbestos-associated lung cancer risk. Abbreviations: CI, confidence interval; E, environment; FDR, false discovery rate; G, gene; GSEA, gene-set-enrichment analysis; GWAS, genome-wide association studies; i-GSEA, improved gene-set-enrichment analysis approach; OR, odds ratio; SNP, single nucleotide polymorphism. PMID:22637743
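
    The pathway results in this abstract are reported with a false discovery rate threshold across 250 pathways. The abstract does not describe how that false discovery rate was computed, so the sketch below substitutes the widely used Benjamini-Hochberg adjustment purely as an illustration of multiple-testing control over a list of p-values. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathway p-values, not taken from the study.
example_pvals = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_pvals)
significant = [i for i, q in enumerate(example_adjusted) if q < 0.05]
print(example_adjusted, significant)
```

    A pathway would be reported at a 5% false discovery rate when its adjusted value falls below 0.05, which is the kind of threshold the abstract quotes.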

  16. Whole genome sequencing data and de novo draft assemblies for 66 teleost species

    PubMed Central

    Malmstrøm, Martin; Matschiner, Michael; Tørresen, Ole K.; Jakobsen, Kjetill S.; Jentoft, Sissel

    2017-01-01

    Teleost fishes comprise more than half of all vertebrate species, yet genomic data are only available for 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new species of teleosts, vastly expanding the availability of genomic data for this important vertebrate group. We report on de novo assemblies based on low-coverage (9–39×) sequencing and present detailed methodology for all analyses. To facilitate further utilization of this data set, we present statistical analyses of the gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree in relation to the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and the genomic data in general. This data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts. PMID:28094797

  17. Comprehensive analysis of orthologous protein domains using the HOPS database.

    PubMed

    Storm, Christian E V; Sonnhammer, Erik L L

    2003-10-01

    One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.

  18. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker combines prediction (it is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and a mechanistic relationship to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  19. Proposed methods for testing and selecting the ERCC external RNA controls

    PubMed Central

    2005-01-01

    The External RNA Control Consortium (ERCC) is an ad-hoc group with approximately 70 members from private, public, and academic organizations. The group is developing a set of external RNA control transcripts that can be used to assess technical performance in gene expression assays. The ERCC is now initiating the Testing Phase of the project, during which candidate external RNA controls will be evaluated in both microarray and QRT-PCR gene expression platforms. This document describes the proposed experiments and informatics process that will be followed to test and qualify individual controls. The ERCC is distributing this description of the proposed testing process in an effort to gain consensus and to encourage feedback from the scientific community. On October 4–5, 2005, the ERCC met to further review the document, clarify ambiguities, and plan next steps. A summary of this meeting and changes to the test plan are provided as an appendix to this manuscript. PMID:16266432

  20. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq.

    PubMed

    Nakayama, Manabu; Oda, Hirotsugu; Nakagawa, Kenji; Yasumi, Takahiro; Kawai, Tomoki; Izawa, Kazushi; Nishikomori, Ryuta; Heike, Toshio; Ohara, Osamu

    2017-03-01

    Autoinflammatory diseases form one group of primary immunodeficiency diseases that are generally thought to be caused by mutation of genes responsible for innate immunity, rather than by acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, the MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program, which detects and avoids common SNPs in gene-specific PCR primers, we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze several clinical DNA samples simultaneously at low cost and can obtain sufficient read numbers to detect a low level of somatic mosaic mutations.

  1. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer.

    PubMed

    Wolf, Yuri I; Makarova, Kira S; Yutin, Natalya; Koonin, Eugene V

    2012-12-14

    Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea. The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer. 
    The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time. This article was reviewed by (for complete reviews see the Reviewers' Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).

  2. MiR-137-derived polygenic risk: effects on cognitive performance in patients with schizophrenia and controls.

    PubMed

    Cosgrove, D; Harold, D; Mothersill, O; Anney, R; Hill, M J; Bray, N J; Blokland, G; Petryshen, T; Richards, A; Mantripragada, K; Owen, M; O'Donovan, M C; Gill, M; Corvin, A; Morris, D W; Donohoe, G

    2017-01-24

    Variants at microRNA-137 (MIR137), one of the most strongly associated schizophrenia risk loci identified to date, have been associated with poorer cognitive performance. As microRNA-137 is known to regulate the expression of ~1900 other genes, including several that are independently associated with schizophrenia, we tested whether this gene set was also associated with variation in cognitive performance. Our analysis was based on an empirically derived list of genes whose expression was altered by manipulation of MIR137 expression. This list was cross-referenced with genome-wide schizophrenia association data to construct individual polygenic scores. We then tested, in a sample of 808 patients and 192 controls, whether these risk scores were associated with altered performance on cognitive functions known to be affected in schizophrenia. A subgroup of healthy participants also underwent functional imaging during memory (n=108) and face processing tasks (n=83). Increased polygenic risk within the empirically derived miR-137 regulated gene score was associated with significantly lower performance on intelligence quotient, working memory and episodic memory. These effects were observed most clearly at a polygenic threshold of P=0.05, although significant results were observed at all three thresholds analyzed. This association was found independently for the gene set as a whole, excluding the schizophrenia-associated MIR137 SNP itself. Analysis of the spatial working memory fMRI task further suggested that increased risk score (thresholded at P=10^-5) was significantly associated with increased activation of the right inferior occipital gyrus. In conclusion, these data are consistent with emerging evidence that MIR137 associated risk for schizophrenia may relate to its broader downstream genetic effects.

  3. An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

    PubMed

    Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

    2003-11-01

    Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf

  4. GeneOnEarth: fitting genetic PC plots on the globe.

    PubMed

    Torres-Sánchez, Sergio; Medina-Medina, Nuria; Gignoux, Chris; Abad-Grau, María M; González-Burchard, Esteban

    2013-01-01

    Principal component (PC) plots have become widely used to summarize genetic variation of individuals in a sample. The similarity between genetic distance in PC plots and geographical distance has been shown to be quite impressive. However, in most situations, individual ancestral origins are not precisely known or they are heterogeneously distributed; hence, they are hardly linked to a geographical area. We have developed GeneOnEarth, a user-friendly web-based tool to help geneticists understand whether a linear isolation-by-distance model may apply to a genetic data set, that is, whether genetic distances among a set of individuals resemble geographical distances among their origins. Its main goal is to allow users to first apply a by-view Procrustes method to visually learn whether this model holds. To do that, the user can choose the exact geographical area from an online 2D or 3D world map by using, respectively, Google Maps or Google Earth, and rotate, flip, and resize the images. GeneOnEarth can also compute the optimal rotation angle using Procrustes analysis and assess statistical evidence of similarity when a different rotation angle has been chosen by the user. An online version of GeneOnEarth is available for testing and use at http://bios.ugr.es/GeneOnEarth.
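
    The "optimal rotation angle" mentioned in this abstract can be illustrated with a minimal 2D sketch. It assumes paired points (one PC-plot coordinate and one map coordinate per individual) and solves only the rotation part of a Procrustes fit in closed form; scaling, reflection and the significance assessment are left out. The function names are hypothetical and are not taken from GeneOnEarth.

```python
from math import atan2


def center(points):
    """Translate a list of (x, y) points so that their centroid is the origin."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]


def optimal_rotation_angle(source, target):
    """Angle (radians, counter-clockwise) that rotates the centered `source`
    points onto the centered `target` points with minimal squared error."""
    s, t = center(source), center(target)
    # Minimising sum |R(theta) s_i - t_i|^2 is the same as maximising
    # cos(theta) * dot + sin(theta) * cross, which peaks at atan2(cross, dot).
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(s, t))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(s, t))
    return atan2(cross, dot)
```

    Because both sums scale by the same positive factor, the recovered angle does not depend on the relative size of the two point clouds.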

  5. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

    PubMed

    Zheng, Qi; Wang, Xiu-Jie

    2008-07-01

    Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software packages already offer GO-related analysis functions, new tools are still needed to meet the requirements of data generated by newly developed technologies or of advanced analysis purposes. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), and therefore provides a better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis of data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) a unique feature of GOEAST is that it allows cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
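
    As an illustration of what "statistically overrepresented" can mean for a single GO term, here is a minimal sketch of a one-sided hypergeometric test, a common choice for this kind of enrichment analysis. Whether GOEAST uses exactly this test or these defaults is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap), where X counts annotated genes among `n_selected`
    genes drawn without replacement from `n_background` genes, of which
    `n_annotated` carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

    In practice such a p-value would be computed for every GO term and then corrected for multiple testing; that step is not shown here.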

  6. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
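
    The rth ordered p-value lends itself to a short worked sketch. Assuming K independent studies whose p-values are uniform on [0, 1] under the null, the rth smallest p-value is at most x exactly when at least r of the K p-values fall below x, which is a binomial tail probability. The code below computes only that reference p-value; the abstract's procedures for estimating r and the one-sided correction are not reproduced, and the function name is hypothetical.

```python
from math import comb


def rop_pvalue(pvalues, r):
    """Null p-value of the r-th smallest p-value among K independent studies."""
    k = len(pvalues)
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and the number of studies")
    x = sorted(pvalues)[r - 1]  # the rOP statistic
    # P(at least r of K independent Uniform(0, 1) draws are <= x)
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))
```

    With r = 1 this reduces to the minimum p-value method, 1 - (1 - x)^K, and with r = K to the maximum p-value method, x^K, the two extremes the abstract compares against.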

  7. Gene and pathway level analyses of germline DNA-repair gene variants and prostate cancer susceptibility using the iCOGS-genotyping array.

    PubMed

    Saunders, Edward J; Dadaev, Tokhir; Leongamornlert, Daniel A; Al Olama, Ali Amin; Benlloch, Sara; Giles, Graham G; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schleutker, Johanna; Nordestgaard, Borge G; Travis, Ruth C; Neal, David; Pasayan, Nora; Khaw, Kay-Tee; Stanford, Janet L; Blot, William J; Thibodeau, Stephen N; Maier, Christiane; Kibel, Adam S; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Park, Jong Y; Kaneva, Radka; Batra, Jyotsna; Teixeira, Manuel R; Pandha, Hardev; Govindasami, Koveela; Muir, Ken; Easton, Douglas F; Eeles, Rosalind A; Kote-Jarai, Zsofia

    2016-04-12

    Germline mutations within DNA-repair genes are implicated in susceptibility to multiple forms of cancer. For prostate cancer (PrCa), rare mutations in BRCA2 and BRCA1 give rise to moderately elevated risk, whereas two of ~100 common, low-penetrance PrCa susceptibility variants identified so far by genome-wide association studies implicate RAD51B and RAD23B. Genotype data from the iCOGS array were imputed to the 1000 genomes phase 3 reference panel for 21 780 PrCa cases and 21 727 controls from the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium. We subsequently performed single variant, gene and pathway-level analyses using 81 303 SNPs within 20 Kb of a panel of 179 DNA-repair genes. Single SNP analyses identified only the previously reported association with RAD51B. Gene-level analyses using the SKAT-C test from the SNP-set (Sequence) Kernel Association Test (SKAT) identified a significant association with PrCa for MSH5. Pathway-level analyses suggested a possible role for the translesion synthesis pathway in PrCa risk and Homologous recombination/Fanconi Anaemia pathway for PrCa aggressiveness, even though after adjustment for multiple testing these did not remain significant. MSH5 is a novel candidate gene warranting additional follow-up as a prospective PrCa-risk locus. MSH5 has previously been reported as a pleiotropic susceptibility locus for lung, colorectal and serous ovarian cancers.

  8. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
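
    The mutual-information idea described in this abstract can be sketched in a few lines. The sketch assumes a synchronous Boolean network and a list of observed (state, next state) pairs, and it searches input subsets of increasing size until one carries all of the information in the output element. It is an illustration of the principle, not the authors' C implementation; all names and the toy network are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())


def infer_inputs(states_in, states_out, target, max_k=3, tol=1e-9):
    """Smallest set of input elements whose mutual information with element
    `target` at the next time step equals that element's entropy."""
    out = [s[target] for s in states_out]
    h_out = entropy(out)
    if h_out < tol:
        return ()  # constant element: no inputs are needed to explain it
    n = len(states_in[0])
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            x = [tuple(s[i] for i in subset) for s in states_in]
            xy = [xi + (o,) for xi, o in zip(x, out)]
            mutual_info = h_out + entropy(x) - entropy(xy)
            if abs(mutual_info - h_out) < tol:
                return subset  # these inputs fully determine the output
    return None


# Toy 3-element network (an assumption for illustration):
# A' = B AND C, B' = A, C' = NOT A, observed over the complete transition table.
states_in = list(product((0, 1), repeat=3))
states_out = [(b & c, a, 1 - a) for a, b, c in states_in]
wiring = [infer_inputs(states_in, states_out, t) for t in range(3)]
```

    Once the wiring of an element is known, its rule can be read directly from the transition table restricted to those inputs. With incomplete tables, as in the n = 50 experiment above, a subset can appear sufficient by chance, so the result is only as reliable as the sampled transitions.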

  9. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    PubMed

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value=2.27×10^-10), MGC57346 (p value=6.92×10^-7), BLK (p value=1.01×10^-6), XKR6 (p value=1.11×10^-6), C17ORF69 (p value=1.12×10^-6) and KIAA1267 (p value=4.00×10^-6). Gene set enrichment analysis observed a significant association for the Chr8p23 gene set (false discovery rate=0.033). Our results provide novel clues for the genetic mechanism studies of neuroticism.

  10. ExAtlas: An interactive online tool for meta-analysis of gene expression data.

    PubMed

    Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

    2015-12-01

    We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.

  11. Genetic Evidence of Human Adaptation to a Cooked Diet

    PubMed Central

    Carmody, Rachel N.; Dannemann, Michael; Briggs, Adrian W.; Nickel, Birgit; Groopman, Emily E.; Wrangham, Richard W.; Kelso, Janet

    2016-01-01

    Humans have been argued to be biologically adapted to a cooked diet, but this hypothesis has not been tested at the molecular level. Here, we combine controlled feeding experiments in mice with comparative primate genomics to show that consumption of a cooked diet influences gene expression and that affected genes bear signals of positive selection in the human lineage. Liver gene expression profiles in mice fed standardized diets of meat or tuber were affected by food type and cooking, but not by caloric intake or consumer energy balance. Genes affected by cooking were highly correlated with genes known to be differentially expressed in liver between humans and other primates, and more genes in this overlap set show signals of positive selection in humans than would be expected by chance. Sequence changes in the genes under selection appear before the split between modern humans and two archaic human groups, Neandertals and Denisovans, supporting the idea that human adaptation to a cooked diet had begun by at least 275,000 years ago. PMID:26979798

  12. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells and comparative methods examining adherent and sphere cells are widely used to investigate mechanism underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. Additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. 
    Distinct from other meta-analysis, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956

  13. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. It progressively degrades, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis is discussed. Conclusions: The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171

  14. Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer.

    PubMed

    Li, Bailiang; Cui, Yi; Diehn, Maximilian; Li, Ruijiang

    2017-11-01

    The prevalence of early-stage non-small cell lung cancer (NSCLC) is expected to increase with recent implementation of annual screening programs. Reliable prognostic biomarkers are needed to identify patients at a high risk for recurrence to guide adjuvant therapy. To develop a robust, individualized immune signature that can estimate prognosis in patients with early-stage nonsquamous NSCLC. This retrospective study analyzed the gene expression profiles of frozen tumor tissue samples from 19 public NSCLC cohorts, including 18 microarray data sets and 1 RNA-Seq data set for The Cancer Genome Atlas (TCGA) lung adenocarcinoma cohort. Only patients with nonsquamous NSCLC with clinical annotation were included. Samples were from 2414 patients with nonsquamous NSCLC, divided into a meta-training cohort (729 patients), meta-testing cohort (716 patients), and 3 independent validation cohorts (439, 323, and 207 patients). All patients underwent surgery with a negative surgical margin, received no adjuvant or neoadjuvant therapy, and had publicly available gene expression data and survival information. Data were collected from July 22 through September 8, 2016. Overall survival. Of 2414 patients (1205 men [50%], 1111 women [46%], and 98 of unknown sex [4%]; median age [range], 64 [15-90] years), a prognostic immune signature of 25 gene pairs consisting of 40 unique genes was constructed using the meta-training data set. In the meta-testing and validation cohorts, the immune signature significantly stratified patients into high- vs low-risk groups in terms of overall survival across and within subpopulations with stage I, IA, IB, or II disease and remained as an independent prognostic factor in multivariate analyses (hazard ratio range, 1.72 [95% CI, 1.26-2.33; P < .001] to 2.36 [95% CI, 1.47-3.79; P < .001]) after adjusting for clinical and pathologic factors. Several biological processes, including chemotaxis, were enriched among genes in the immune signature. 
The percentage of neutrophil infiltration (5.6% vs 1.8%) and necrosis (4.6% vs 1.5%) was significantly higher in the high-risk immune group compared with the low-risk groups in TCGA data set (P < .003). The immune signature achieved a higher accuracy (mean concordance index [C-index], 0.64) than 2 commercialized multigene signatures (mean C-index, 0.53 and 0.61) for estimation of survival in comparable validation cohorts. When integrated with clinical characteristics such as age and stage, the composite clinical and immune signature showed improved prognostic accuracy in all validation data sets relative to molecular signatures alone (mean C-index, 0.70 vs 0.63) and another commercialized clinical-molecular signature (mean C-index, 0.68 vs 0.65). The proposed clinical-immune signature is a promising biomarker for estimating overall survival in nonsquamous NSCLC, including early-stage disease. Prospective studies are needed to test the clinical utility of the biomarker in individualized management of nonsquamous NSCLC.
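
    The concordance index (C-index) reported above can be illustrated with a minimal sketch. This is Harrell's pairwise definition under simplifying assumptions (pairs with tied survival times are skipped); the function name and the toy inputs are hypothetical, and the abstract does not state how the authors computed their values.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the higher risk score has the shorter survival time."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Skip tied survival times (a simplification of this sketch).
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed
        # event (not a censored observation).
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.64 and 0.70 figures above are read.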

  15. Development and validation of broad-range qualitative and clade-specific quantitative molecular probes for assessing mercury methylation in the environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christensen, Geoff A.; Wymore, Ann M.; King, Andrew J.

    Two genes, hgcA and hgcB, are essential for microbial mercury (Hg)-methylation. Detection and estimation of their abundance, in conjunction with Hg concentration, bioavailability and biogeochemistry is critical in determining potential hot spots of methylmercury (MeHg) generation in at-risk environments. We developed broad-range degenerate PCR primers spanning known hgcAB genes to determine the presence of both genes in diverse environments. These primers were tested against an extensive set of pure cultures with published genomes, including 13 Deltaproteobacteria, nine Firmicutes, and nine methanogenic Archaea. A distinct PCR product at the expected size was confirmed for all hgcAB+ strains tested via Sanger sequencing. Additionally, we developed clade-specific degenerate quantitative primers (qPCR) that targeted hgcA for each of the three dominant Hg-methylating clades. The clade-specific qPCR primers amplified hgcA from 64%, 88% and 86% of tested pure cultures of Deltaproteobacteria, Firmicutes and Archaea, respectively, and were highly specific for each clade. Amplification efficiencies and detection limits were quantified for each organism. Primer sensitivity varied among species based on sequence conservation. Finally, to begin to evaluate the utility of our primer sets in nature, we tested hgcA and hgcAB recovery from pure cultures spiked into sand and soil. These novel quantitative molecular tools designed in this study will allow for more accurate identification and quantification of the individual Hg-methylating groups of microorganisms in the environment. Here, the resulting data will be essential in developing accurate and robust predictive models of Hg-methylation potential, ideally integrating the geochemistry of Hg methylation to the microbiology and genetics of hgcAB.

  16. Development and validation of broad-range qualitative and clade-specific quantitative molecular probes for assessing mercury methylation in the environment

    DOE PAGES

    Christensen, Geoff A.; Wymore, Ann M.; King, Andrew J.; ...

    2016-07-15

    Two genes, hgcA and hgcB, are essential for microbial mercury (Hg)-methylation. Detection and estimation of their abundance, in conjunction with Hg concentration, bioavailability and biogeochemistry is critical in determining potential hot spots of methylmercury (MeHg) generation in at-risk environments. We developed broad-range degenerate PCR primers spanning known hgcAB genes to determine the presence of both genes in diverse environments. These primers were tested against an extensive set of pure cultures with published genomes, including 13 Deltaproteobacteria, nine Firmicutes, and nine methanogenic Archaea. A distinct PCR product at the expected size was confirmed for all hgcAB+ strains tested via Sanger sequencing. Additionally, we developed clade-specific degenerate quantitative primers (qPCR) that targeted hgcA for each of the three dominant Hg-methylating clades. The clade-specific qPCR primers amplified hgcA from 64%, 88% and 86% of tested pure cultures of Deltaproteobacteria, Firmicutes and Archaea, respectively, and were highly specific for each clade. Amplification efficiencies and detection limits were quantified for each organism. Primer sensitivity varied among species based on sequence conservation. Finally, to begin to evaluate the utility of our primer sets in nature, we tested hgcA and hgcAB recovery from pure cultures spiked into sand and soil. These novel quantitative molecular tools designed in this study will allow for more accurate identification and quantification of the individual Hg-methylating groups of microorganisms in the environment. Here, the resulting data will be essential in developing accurate and robust predictive models of Hg-methylation potential, ideally integrating the geochemistry of Hg methylation to the microbiology and genetics of hgcAB.

  17. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

  18. Axonal guidance signaling pathway interacting with smoking in modifying the risk of pancreatic cancer: a gene- and pathway-based interaction analysis of GWAS data.

    PubMed

    Tang, Hongwei; Wei, Peng; Duell, Eric J; Risch, Harvey A; Olson, Sara H; Bueno-de-Mesquita, H Bas; Gallinger, Steven; Holly, Elizabeth A; Petersen, Gloria; Bracci, Paige M; McWilliams, Robert R; Jenab, Mazda; Riboli, Elio; Tjønneland, Anne; Boutron-Ruault, Marie Christine; Kaaks, Rudolph; Trichopoulos, Dimitrios; Panico, Salvatore; Sund, Malin; Peeters, Petra H M; Khaw, Kay-Tee; Amos, Christopher I; Li, Donghui

    2014-05-01

    Cigarette smoking is the best established modifiable risk factor for pancreatic cancer. Genetic factors that underlie smoking-related pancreatic cancer have previously not been examined at the genome-wide level. Taking advantage of the existing genome-wide association study (GWAS) genotype and risk factor data from the Pancreatic Cancer Case Control Consortium, we conducted a discovery study in 2028 cases and 2109 controls to examine gene-smoking interactions at the pathway/gene/single nucleotide polymorphism (SNP) level. Using the likelihood ratio test nested in logistic regression models and ingenuity pathway analysis (IPA), we examined 172 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, 3 manually curated gene sets, 3 nicotine dependency gene ontology pathways, 17 912 genes and 468 114 SNPs. None of the individual pathways, genes or SNPs showed significant interaction with smoking after adjusting for multiple comparisons. Six KEGG pathways showed nominal interactions (P < 0.05) with smoking, and the top two are the pancreatic secretion and salivary secretion pathways (major contributing genes: RAB8A, PLCB and CTRB1). Nine genes, i.e. ZBED2, EXO1, PSG2, SLC36A1, CLSTN1, MTHFSD, FAT2, IL10RB and ATXN2, had P interaction < 0.0005. Five intergenic region SNPs and two SNPs of the EVC and KCNIP4 genes had P interaction < 0.00003. In IPA analysis of genes with nominal interactions with smoking, axonal guidance signaling (P = 2.12 × 10^-7) and α-adrenergic signaling (P = 2.52 × 10^-5) genes were significantly overrepresented canonical pathways. Genes contributing to the axon guidance signaling pathway included the SLIT/ROBO signaling genes that were frequently altered in pancreatic cancer. These observations need to be confirmed in additional data sets. Once confirmed, they will open a new avenue to unveiling the etiology of smoking-associated pancreatic cancer.

  19. Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes

    PubMed Central

    Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.

    2012-01-01

    Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
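
    The idea of a naïve Bayesian sequence classifier can be sketched as follows. This is an illustrative toy, not the published tool: the word length `k`, the 0.5 pseudocount, the uniform prior over taxa, and the function names are all assumptions made for the example.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(labelled_seqs, k):
    """labelled_seqs: iterable of (taxon, sequence) pairs.
    Returns per-taxon sequence counts and per-taxon word counts."""
    n_seqs = defaultdict(int)
    word_counts = defaultdict(lambda: defaultdict(int))
    for taxon, seq in labelled_seqs:
        n_seqs[taxon] += 1
        for word in kmers(seq, k):
            word_counts[taxon][word] += 1
    return n_seqs, word_counts


def classify(seq, model, k):
    """Return the taxon with the highest summed log word probability,
    assuming equal prior probability for every taxon."""
    n_seqs, word_counts = model
    best_taxon, best_score = None, -math.inf
    for taxon in sorted(n_seqs):
        counts = word_counts[taxon]
        # Smoothed estimate of P(word | taxon); the pseudocount keeps
        # unseen words from producing log(0).
        score = sum(
            math.log((counts.get(word, 0) + 0.5) / (n_seqs[taxon] + 1))
            for word in kmers(seq, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon
```

    Because classification only needs word lookups and additions, it avoids the pairwise alignments that a BLASTN-style search performs, which is consistent with the speed difference described above.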

  20. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  1. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  2. Evidence for gene-gene epistatic interactions among susceptibility loci for systemic lupus erythematosus.

    PubMed

    Hughes, Travis; Adler, Adam; Kelly, Jennifer A; Kaufman, Kenneth M; Williams, Adrienne H; Langefeld, Carl D; Brown, Elizabeth E; Alarcón, Graciela S; Kimberly, Robert P; Edberg, Jeffrey C; Ramsey-Goldman, Rosalind; Petri, Michelle; Boackle, Susan A; Stevens, Anne M; Reveille, John D; Sanchez, Elena; Martín, Javier; Niewold, Timothy B; Vilá, Luis M; Scofield, R Hal; Gilkeson, Gary S; Gaffney, Patrick M; Criswell, Lindsey A; Moser, Kathy L; Merrill, Joan T; Jacob, Chaim O; Tsao, Betty P; James, Judith A; Vyse, Timothy J; Alarcón-Riquelme, Marta E; Harley, John B; Richardson, Bruce C; Sawalha, Amr H

    2012-02-01

    Several confirmed genetic susceptibility loci for lupus have been described. To date, no clear evidence for genetic epistasis in lupus has been established. The aim of this study was to test for gene-gene interactions in a number of known lupus susceptibility loci. Eighteen single-nucleotide polymorphisms tagging independent and confirmed lupus susceptibility loci were genotyped in a set of 4,248 patients with lupus and 3,818 normal healthy control subjects of European descent. Epistasis was tested by a 2-step approach using both parametric and nonparametric methods. The false discovery rate (FDR) method was used to correct for multiple testing. We detected and confirmed gene-gene interactions between the HLA region and CTLA4, IRF5, and ITGAM and between PDCD1 and IL21 in patients with lupus. The most significant interaction detected by parametric analysis was between rs3131379 in the HLA region and rs231775 in CTLA4 (interaction odds ratio 1.19, Z = 3.95, P = 7.8 × 10^-5 [FDR ≤0.05], P for multifactor dimensionality reduction = 5.9 × 10^-45). Importantly, our data suggest that in patients with lupus, the presence of the HLA lupus risk alleles in rs1270942 and rs3131379 increases the odds of also carrying the lupus risk allele in IRF5 (rs2070197) by 17% and 16%, respectively (P = 0.0028 and P = 0.0047, respectively). We provide evidence for gene-gene epistasis in systemic lupus erythematosus. These findings support a role for genetic interaction contributing to the complexity of lupus heritability. Copyright © 2012 by the American College of Rheumatology.
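
    The FDR correction mentioned above can be sketched as follows. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure is assumed here, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the
    same order as the input list."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is
    # never larger than the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    A test is then declared significant at FDR ≤ 0.05 when its adjusted value is at most 0.05.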

  3. Gene expression markers of age-related inflammation in two human cohorts.

    PubMed

    Pilling, Luke C; Joehanes, Roby; Melzer, David; Harries, Lorna W; Henley, William; Dupuis, Josée; Lin, Honghuang; Mitchell, Marcus; Hernandez, Dena; Ying, Sai-Xia; Lunetta, Kathryn L; Benjamin, Emelia J; Singleton, Andrew; Levy, Daniel; Munson, Peter; Murabito, Joanne M; Ferrucci, Luigi

    2015-10-01

    Chronically elevated circulating inflammatory markers are common in older persons but mechanisms are unclear. Many blood transcripts (>800 genes) are associated with interleukin-6 protein levels (IL6) independent of age. We aimed to identify gene transcripts statistically mediating, as drivers or responders, the increasing levels of IL6 protein in blood at older ages. Blood derived in-vivo RNA from the Framingham Heart Study (FHS, n=2422, ages 40-92 yrs) and InCHIANTI study (n=694, ages 30-104 yrs), with Affymetrix and Illumina expression arrays respectively (>17,000 genes tested), were tested for statistical mediation of the age-IL6 association using resampling techniques, adjusted for confounders and multiple testing. In FHS, IL6 expression was not associated with IL6 protein levels in blood. 102 genes (0.6% of 17,324 expressed) statistically mediated the age-IL6 association of which 25 replicated in InCHIANTI (including 5 of the 10 largest effect genes). The largest effect gene (SLC4A10, coding for NCBE, a sodium bicarbonate transporter) mediated 19% (adjusted CI 8.9 to 34.1%) and replicated by PCR in InCHIANTI (n=194, 35.6% mediated, p=0.01). Other replicated mediators included PRF1 (perforin, a cytolytic protein in cytotoxic T lymphocytes and NK cells) and IL1B (Interleukin 1 beta): few other cytokines were significant mediators. This transcriptome-wide study on human blood identified a small distinct set of genes that statistically mediate the age-IL6 association. Findings are robust across two cohorts and different expression technologies. Raised IL6 levels may not derive from circulating white cells in age related inflammation. Published by Elsevier Inc.

  4. Rapid and visual detection of Leptospira in urine by LigB-LAMP assay with pre-addition of dye.

    PubMed

    Ali, Syed Atif; Kaur, Gurpreet; Boby, Nongthombam; Sabarinath, T; Solanki, Khushal; Pal, Dheeraj; Chaudhuri, Pallab

    2017-12-01

    Leptospirosis, caused by pathogenic species of Leptospira, is considered to be the most widespread zoonotic disease. The present study reports a novel set of primers targeting the LigB gene for visual detection of pathogenic Leptospira in urine samples through loop-mediated isothermal amplification (LAMP). The results were recorded by using hydroxynaphthol blue (HNB), SYBR Green I and calcein. The analytical sensitivity of LAMP was as few as 10 leptospiral organisms in spiked urine samples from cattle and dogs. LigB gene based LAMP, termed LigB-LAMP, was found to be 10 times more sensitive than conventional PCR. The diagnostic specificity of LAMP was 100% when compared to SYBR Green qPCR for detection of Leptospira in urine samples. Though qPCR was found to be more sensitive, the rapidity and simplicity of setting up the LAMP test, followed by visual detection of Leptospira infection in clinical samples, make LigB-LAMP an alternative and favourable diagnostic tool in resource-poor settings. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Functional gene groups are concentrated within chromosomes, among chromosomes and in the nuclear space of the human genome.

    PubMed

    Thévenin, Annelyse; Ein-Dor, Liat; Ozery-Flato, Michal; Shamir, Ron

    2014-09-01

    Genomes undergo changes in organization as a result of gene duplications, chromosomal rearrangements and local mutations, among other mechanisms. In contrast to prokaryotes, in which genes of a common function are often organized in operons and reside contiguously along the genome, most eukaryotes show much weaker clustering of genes by function, except for few concrete functional groups. We set out to check systematically if there is a relation between gene function and gene organization in the human genome. We test this question for three types of functional groups: pairs of interacting proteins, complexes and pathways. We find a significant concentration of functional groups both in terms of their distance within the same chromosome and in terms of their dispersal over several chromosomes. Moreover, using Hi-C contact map of the tendency of chromosomal segments to appear close in the 3D space of the nucleus, we show that members of the same functional group that reside on distinct chromosomes tend to co-localize in space. The result holds for all three types of functional groups that we tested. Hence, the human genome shows substantial concentration of functional groups within chromosomes and across chromosomes in space. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. An independent validation of a gene expression signature to differentiate malignant melanoma from benign melanocytic nevi.

    PubMed

    Clarke, Loren E; Flake, Darl D; Busam, Klaus; Cockerell, Clay; Helm, Klaus; McNiff, Jennifer; Reed, Jon; Tschen, Jaime; Kim, Jinah; Barnhill, Raymond; Elenitsas, Rosalie; Prieto, Victor G; Nelson, Jonathan; Kimbrell, Hillary; Kolquist, Kathryn A; Brown, Krystal L; Warf, M Bryan; Roa, Benjamin B; Wenstrup, Richard J

    2017-02-15

    Recently, a 23-gene signature was developed to produce a melanoma diagnostic score capable of differentiating malignant and benign melanocytic lesions. The primary objective of this study was to independently assess the ability of the gene signature to differentiate melanoma from benign nevi in clinically relevant lesions. A set of 1400 melanocytic lesions was selected from samples prospectively submitted for gene expression testing at a clinical laboratory. Each sample was tested and subjected to an independent histopathologic evaluation by 3 experienced dermatopathologists. A primary diagnosis (benign or malignant) was assigned to each sample, and diagnostic concordance among the 3 dermatopathologists was required for inclusion in analyses. The sensitivity and specificity of the score in differentiating benign and malignant melanocytic lesions were calculated to assess the association between the score and the pathologic diagnosis. The gene expression signature differentiated benign nevi from malignant melanoma with a sensitivity of 91.5% and a specificity of 92.5%. These results reflect the performance of the gene signature in a diverse array of samples encountered in routine clinical practice. Cancer 2017;123:617-628. © 2016 American Cancer Society. © 2016 Myriad Genetics, Inc. Cancer published by Wiley Periodicals, Inc. on behalf of American Cancer Society.
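
    Sensitivity and specificity, as reported above, are simple ratios over a two-by-two table of test calls against the reference diagnosis. A minimal sketch follows; the function name, label strings, and toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(truth, predicted, positive="malignant"):
    """Return (sensitivity, specificity) for paired reference diagnoses
    and test calls. Any label other than `positive` counts as negative."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1  # malignant lesion called malignant
            else:
                fn += 1  # malignant lesion missed
        else:
            if p == positive:
                fp += 1  # benign lesion called malignant
            else:
                tn += 1  # benign lesion called benign
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

    Sensitivity is the fraction of reference-malignant lesions the score calls malignant, and specificity is the fraction of reference-benign lesions it calls benign.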

  7. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    PubMed

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.

  8. Human growth is associated with distinct patterns of gene expression in evolutionarily conserved networks

    PubMed Central

    2013-01-01

    Background: A co-ordinated tissue-independent gene expression profile associated with growth is present in rodent models and this is hypothesised to extend to all mammals. Growth in humans has similarities to other mammals but the return to active long bone growth in the pubertal growth spurt is a distinctly human growth event. The aim of this study was to describe gene expression and biological pathways associated with stages of growth in children and to assess tissue-independent expression patterns in relation to human growth. Results: We conducted gene expression analysis on a library of datasets from normal children with age annotation, collated from the NCBI Gene Expression Omnibus (GEO) and EBI Arrayexpress databases. A primary data set was generated using cells of lymphoid origin from normal children; the expression of 688 genes (ANOVA false discovery rate modified p-value, q < 0.1) was associated with age, and subsets of these genes formed clusters that correlated with the phases of growth – infancy, childhood, puberty and final height. Network analysis on these clusters identified evolutionarily conserved growth pathways (NOTCH, VEGF, TGFB, WNT and glucocorticoid receptor – Hyper-geometric test, q < 0.05). The greatest degree of network ‘connectivity’ and hence functional significance was present in infancy (Wilcoxon test, p < 0.05), which then decreased through to adulthood. These observations were confirmed in a separate validation data set from lymphoid tissue. Similar biological pathways were observed to be associated with development-related gene expression in other tissues (conjunctival epithelia, temporal lobe brain tissue and bone marrow) suggesting the existence of a tissue-independent genetic program for human growth and maturation. Conclusions: Similar evolutionarily conserved pathways have been associated with gene expression and child growth in multiple tissues.
These expression profiles associate with the developmental phases of growth including the return to active long bone growth in puberty, a distinctly human event. These observations also have direct medical relevance to pathological changes that induce disease in children. Taking into account development-dependent gene expression profiles for normal children will be key to the appropriate selection of genes and pathways as potential biomarkers of disease or as drug targets. PMID:23941278
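
    The hypergeometric test named above asks how likely it is to see at least the observed overlap between a gene cluster and a pathway if the cluster were drawn at random. A minimal sketch using only the standard library follows; the function and parameter names are hypothetical, and the study's actual software and gene universe are not stated in the abstract.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided p-value P(X >= n_overlap), where X is the number of
    pathway genes among n_selected genes drawn without replacement from
    a universe of n_universe genes containing n_pathway pathway genes."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the hypergeometric probabilities for every overlap at least as
    # large as the one observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

    The resulting p-values, one per pathway, would then be adjusted for multiple testing to give the q-values quoted in the abstract.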

  9. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    PubMed

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.

  10. Meta-Analysis of Transcriptome Data Related to Hippocampus Biopsies and iPSC-Derived Neuronal Cells from Alzheimer's Disease Patients Reveals an Association with FOXA1 and FOXA2 Gene Regulatory Networks.

    PubMed

    Wruck, Wasco; Schröter, Friederike; Adjaye, James

    2016-01-01

    Although the incidence of Alzheimer's disease (AD) is continuously increasing in the aging population worldwide, effective therapies are not available. The interplay between causative genetic and environmental factors is partially understood. Meta-analyses have been performed on aspects such as polymorphisms, cytokines, and cognitive training. Here, we propose a meta-analysis approach based on hierarchical clustering analysis of a reliable training set of hippocampus biopsies, which is condensed to a gene expression signature. This gene expression signature was applied to various test sets of brain biopsies and iPSC-derived neuronal cell models to demonstrate its ability to distinguish AD samples from control. Thus, our identified AD-gene signature may form the basis for determination of biomarkers that are urgently needed to overcome current diagnostic shortfalls. Intriguingly, the well-described AD-related genes APP and APOE are not within the signature because their gene expression profiles show a lower correlation to the disease phenotype than genes from the signature. This is in line with the differing characteristics of the disease as early-/late-onset or with/without genetic predisposition. To investigate the gene signature's systemic role(s), signaling pathways, gene ontologies, and transcription factors were analyzed which revealed over-representation of response to stress, regulation of cellular metabolic processes, and reactive oxygen species. Additionally, our results clearly point to an important role of FOXA1 and FOXA2 gene regulatory networks in the etiology of AD. This finding is in corroboration with the recently reported major role of the dopaminergic system in the development of AD and its regulation by FOXA1 and FOXA2.

  11. GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.

    PubMed

    Subhash, Santhilal; Kanduri, Chandrasekhar

    2016-09-13

    High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from source databases such as KEGG, Reactome or Gene Ontology. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations) that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is easy to integrate with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms, thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.

  12. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    PubMed

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and Cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
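
    As a point of reference for the similarity measures named above, the following is a minimal sketch of the plain (unweighted) Jaccard and cosine similarities. The abstract does not define the weighted rank-based variants, so they are not reproduced here; the gene symbols and vectors are hypothetical illustrations only.

```python
import math

def jaccard(a, b):
    """Plain Jaccard similarity: shared items divided by all distinct items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty sets score 0
    return len(a & b) / len(a | b)

def cosine(u, v):
    """Plain cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; 0 is an assumption of this sketch
    return dot / (norm_u * norm_v)

# Hypothetical gene sets taken from two association rules.
rule_a = {"TP53", "EGFR", "KRAS"}
rule_b = {"EGFR", "KRAS", "MYC"}
print(jaccard(rule_a, rule_b))        # 2 shared genes out of 4 distinct genes
print(cosine([1, 0, 1], [1, 1, 0]))   # vectors sharing one non-zero position
```

    A pairwise score of this kind, computed for every rule pair, yields the similarity matrix that a clustering step would then consume.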

  13. Gene selection for tumor classification using neighborhood rough sets and entropy measures.

    PubMed

    Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu

    2017-03-01

    With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to the gene selection field, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy measurements usually undergo a discretization preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which is able to deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper addresses an entropy measure under the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The utilization of this measure can bring about a discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification.

  14. Ion channel gene expression predicts survival in glioma patients

    PubMed Central

    Wang, Rong; Gurguis, Christopher I.; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong

    2015-01-01

    Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from human genome in all the validation cohorts. More importantly, this signature performs better than the random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients. PMID:26235283

  15. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.

  16. Transcriptional responses in thyroid tissues from rats treated with a tumorigenic and a non-tumorigenic triazole conazole fungicide.

    PubMed

    Hester, Susan D; Nesnow, Stephen

    2008-03-15

    Conazoles are azole-containing fungicides that are used in agriculture and medicine. Conazoles can induce follicular cell adenomas of the thyroid in rats after chronic bioassay. The goal of this study was to identify pathways and networks of genes that were associated with thyroid tumorigenesis through transcriptional analyses. To this end, we compared transcriptional profiles from tissues of rats treated with a tumorigenic and a non-tumorigenic conazole. Triadimefon, a rat thyroid tumorigen, and myclobutanil, which was not tumorigenic in rats after a 2-year bioassay, were administered in the feed to male Wistar/Han rats for 30 or 90 days similar to the treatment conditions previously used in their chronic bioassays. Thyroid gene expression was determined using high density Affymetrix GeneChips (Rat 230_2). Gene expression was analyzed by the Gene Set Expression Analyses method which clearly separated the tumorigenic treatments (tumorigenic response group (TRG)) from the non-tumorigenic treatments (non-tumorigenic response group (NRG)). Core genes from these gene sets were mapped to canonical, metabolic, and GeneGo processes and these processes compared across group and treatment time. Extensive analyses were performed on the 30-day gene sets as they represented the major perturbations. Gene sets in the 30-day TRG group had over representation of fatty acid metabolism, oxidation, and degradation processes (including PPARgamma and CYP involvement), and of cell proliferation responses. Core genes from these gene sets were combined into networks and found to possess signaling interactions. In addition, the core genes in each gene set were compared with genes known to be associated with human thyroid cancer. Among the genes that appeared in both rat and human data sets were: Acaca, Asns, Cebpg, Crem, Ddit3, Gja1, Grn, Jun, Junb, and Vegf. These genes were major contributors in the previously developed network from triadimefon-treated rat thyroids. It is postulated that triadimefon induces oxidative response genes and activates the nuclear receptor, Ppargamma, initiating transcription of gene products and signaling to a series of genes involved in cell proliferation.

  17. Impact of Gene Patents and Licensing Practices on Access to Genetic Testing for Inherited Susceptibility to Cancer: Comparing Breast and Ovarian Cancers to Colon Cancers

    PubMed Central

    Cook-Deegan, Robert; DeRienzo, Christopher; Carbone, Julia; Chandrasekharan, Subhashini; Heaney, Christopher; Conover, Christopher

    2011-01-01

    Genetic testing for inherited susceptibility to breast and ovarian cancer can be compared to similar testing for colorectal cancer as a “natural experiment.” Inherited susceptibility accounts for a similar fraction of both cancers and genetic testing results guide decisions about options for prophylactic surgery in both sets of conditions. One major difference is that in the United States, Myriad Genetics is the sole provider of genetic testing, because it has sole control of relevant patents for BRCA1 and BRCA2 genes whereas genetic testing for familial colorectal cancer is available from multiple laboratories. Colorectal cancer-associated genes are also patented, but they have been nonexclusively licensed. Prices for BRCA1 and 2 testing do not reflect an obvious price premium attributable to exclusive patent rights compared to colorectal cancer testing, and indeed Myriad’s per unit costs are somewhat lower for BRCA1/2 testing than testing for colorectal cancer susceptibility. Myriad has not enforced patents against basic research, and negotiated a Memorandum of Understanding with the National Cancer Institute in 1999 for institutional BRCA testing in clinical research. The main impact of patenting and licensing in BRCA compared to colorectal cancer is the business model of genetic testing, with a sole provider for BRCA and multiple laboratories for colorectal cancer genetic testing. Myriad’s sole provider model has not worked in jurisdictions outside the United States, largely because of differences in breadth of patent protection, responses of government health services, and difficulty in patent enforcement. PMID:20393305

  18. Identification of modulators of the nuclear receptor peroxisome proliferator-activated receptor α (PPARα) in a mouse liver gene expression compendium.

    PubMed

    Oshida, Keiyu; Vasani, Naresh; Thomas, Russell S; Applegate, Dawn; Rosen, Mitch; Abbott, Barbara; Lau, Christopher; Guo, Grace; Aleksunes, Lauren M; Klaassen, Curtis; Corton, J Christopher

    2015-01-01

    The nuclear receptor family member peroxisome proliferator-activated receptor α (PPARα) is activated by therapeutic hypolipidemic drugs and environmentally-relevant chemicals to regulate genes involved in lipid transport and catabolism. Chronic activation of PPARα in rodents increases liver cancer incidence, whereas suppression of PPARα activity leads to hepatocellular steatosis. Analytical approaches were developed to identify biosets (i.e., gene expression differences between two conditions) in a genomic database in which PPARα activity was altered. A gene expression signature of 131 PPARα-dependent genes was built using microarray profiles from the livers of wild-type and PPARα-null mice after exposure to three structurally diverse PPARα activators (WY-14,643, fenofibrate and perfluorohexane sulfonate). A fold-change rank-based test (Running Fisher's test (p-value ≤ 10^-4)) was used to evaluate the similarity between the PPARα signature and a test set of 48 and 31 biosets positive or negative, respectively, for PPARα activation; the test resulted in a balanced accuracy of 98%. The signature was then used to identify factors that activate or suppress PPARα in an annotated mouse liver/primary hepatocyte gene expression compendium of ~1850 biosets. In addition to the expected activation of PPARα by fibrate drugs, di(2-ethylhexyl) phthalate, and perfluorinated compounds, PPARα was activated by benzofuran, galactosamine, and TCDD and suppressed by hepatotoxins acetaminophen, lipopolysaccharide, silicon dioxide nanoparticles, and trovafloxacin. Additional factors that activate (fasting, caloric restriction) or suppress (infections) PPARα were also identified. This study 1) developed methods useful for future screening of environmental chemicals, 2) identified chemicals that activate or suppress PPARα, and 3) identified factors including diets and infections that modulate PPARα activity and would be hypothesized to affect chemical-induced PPARα activity.
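
    For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, which keeps an unbalanced test set (here 48 positive versus 31 negative biosets) from favoring the larger class. The sketch below shows the calculation only; the confusion-matrix counts are hypothetical and are not the study's results.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical counts for a test set of 48 positive and 31 negative biosets.
# These numbers are illustrative assumptions, not values reported in the abstract.
score = balanced_accuracy(tp=47, fn=1, tn=30, fp=1)
print(f"balanced accuracy: {score:.3f}")
```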

  19. Identification of Modulators of the Nuclear Receptor Peroxisome Proliferator-Activated Receptor α (PPARα) in a Mouse Liver Gene Expression Compendium

    PubMed Central

    Oshida, Keiyu; Vasani, Naresh; Thomas, Russell S.; Applegate, Dawn; Rosen, Mitch; Abbott, Barbara; Lau, Christopher; Guo, Grace; Aleksunes, Lauren M.; Klaassen, Curtis; Corton, J. Christopher

    2015-01-01

    The nuclear receptor family member peroxisome proliferator-activated receptor α (PPARα) is activated by therapeutic hypolipidemic drugs and environmentally-relevant chemicals to regulate genes involved in lipid transport and catabolism. Chronic activation of PPARα in rodents increases liver cancer incidence, whereas suppression of PPARα activity leads to hepatocellular steatosis. Analytical approaches were developed to identify biosets (i.e., gene expression differences between two conditions) in a genomic database in which PPARα activity was altered. A gene expression signature of 131 PPARα-dependent genes was built using microarray profiles from the livers of wild-type and PPARα-null mice after exposure to three structurally diverse PPARα activators (WY-14,643, fenofibrate and perfluorohexane sulfonate). A fold-change rank-based test (Running Fisher’s test (p-value ≤ 10^-4)) was used to evaluate the similarity between the PPARα signature and a test set of 48 and 31 biosets positive or negative, respectively, for PPARα activation; the test resulted in a balanced accuracy of 98%. The signature was then used to identify factors that activate or suppress PPARα in an annotated mouse liver/primary hepatocyte gene expression compendium of ~1850 biosets. In addition to the expected activation of PPARα by fibrate drugs, di(2-ethylhexyl) phthalate, and perfluorinated compounds, PPARα was activated by benzofuran, galactosamine, and TCDD and suppressed by hepatotoxins acetaminophen, lipopolysaccharide, silicon dioxide nanoparticles, and trovafloxacin. Additional factors that activate (fasting, caloric restriction) or suppress (infections) PPARα were also identified. This study 1) developed methods useful for future screening of environmental chemicals, 2) identified chemicals that activate or suppress PPARα, and 3) identified factors including diets and infections that modulate PPARα activity and would be hypothesized to affect chemical-induced PPARα activity. PMID:25689681

  20. A genetic study of the ghrelin and growth hormone secretagogue receptor (GHSR) genes and stature.

    PubMed

    Gueorguiev, M; Lecoeur, C; Benzinou, M; Mein, C A; Meyre, D; Vatin, V; Weill, J; Heude, B; Grossman, A B; Froguel, P; Korbonits, M

    2009-01-01

    Growth and nutrition are interrelated and influenced by multiple genetic and environmental factors. We studied whether common variants in ghrelin and ghrelin receptor (GHSR) genes could play a role in stature variation in the general population and in families ascertained for obesity. Selected tagging SNPs in the ghrelin and GHSR genes were genotyped in 263 Caucasian families recruited for childhood obesity (1,275 subjects), and in 287 families from a general population (1,072 subjects). We performed familial testing for associations in the entire population and in a sub-set of the samples selected for a case-control study. In the case-control study for height (cases were selected from the obese cohort with mean ZH = 3.17 +/- 0.15 confidence interval (CI) versus controls with mean ZH 0.14 +/- 0.09), we found an association with a 2 base-pair intronic deletion in the GHSR gene (rs10618418) (p = 0.006, odds ratio (OR) 1.86, 95% CI [1.26;2.74] under additive model), although when adjusting for BMI, the association disappeared (p = 0.06). Individuals carrying no deletion or who were heterozygous were significantly more frequent among the tall obese population (52% vs. 36% in controls, p = 0.007, OR 1.97, 95%CI [1.22;3.18]). However, the association was not maintained after correcting for multiple testing. Familial association testing of the ghrelin and GHSR genes and their interaction testing failed to show that any combination of SNPs had any significant effect. Thus, our results suggest that common variants of the ghrelin and GHSR genes are not major contributors to height variation in a French population.

  1. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.

  2. Global Expression Profiling in Atopic Eczema Reveals Reciprocal Expression of Inflammatory and Lipid Genes

    PubMed Central

    Sääf, Annika M.; Tengvall-Linder, Maria; Chang, Howard Y.; Adler, Adam S.; Wahlgren, Carl-Fredrik; Scheynius, Annika; Nordenskjöld, Magnus; Bradley, Maria

    2008-01-01

    Background: Atopic eczema (AE) is a common chronic inflammatory skin disorder. In order to dissect the genetic background several linkage and genetic association studies have been performed. Yet very little is known about specific genes involved in this complex skin disease, and the underlying molecular mechanisms are not fully understood. Methodology/Findings: We used human DNA microarrays to identify a molecular picture of the programmed responses of the human genome to AE. The transcriptional program was analyzed in skin biopsy samples from lesional and patch-tested skin from AE patients sensitized to Malassezia sympodialis (M. sympodialis), and corresponding biopsies from healthy individuals. The most notable feature of the global gene-expression pattern observed in AE skin was a reciprocal expression of induced inflammatory genes and repressed lipid metabolism genes. The overall transcriptional response in M. sympodialis patch-tested AE skin was similar to the gene-expression signature identified in lesional AE skin. In the constellation of genes differentially expressed in AE skin compared to healthy control skin, we have identified several potential susceptibility genes that may play a critical role in the pathological condition of AE. Many of these genes, including genes with a role in immune responses, lipid homeostasis, and epidermal differentiation, are localized on chromosomal regions previously linked to AE. Conclusions/Significance: Through genome-wide expression profiling, we were able to discover a distinct reciprocal expression pattern of induced inflammatory genes and repressed lipid metabolism genes in skin from AE patients. We found a significant enrichment of differentially expressed genes in AE with cytobands associated to the disease, and furthermore new chromosomal regions were found that could potentially guide future region-specific linkage mapping in AE. The full data set is available at http://microarray-pubs.stanford.edu/eczema. PMID:19107207

  3. Case-based retrieval framework for gene expression data.

    PubMed

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J

    2015-01-01

    The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
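
    The retrieval step described above can be sketched as a k-nearest-neighbor lookup with per-feature weights. This is a minimal illustration under stated assumptions: the abstract does not specify the similarity function, so a weighted Euclidean distance is used here, and the patient identifiers, expression profiles, labels and weights are all hypothetical.

```python
import math
from collections import Counter

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance; larger weights make a gene count for more."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def retrieve(query, cases, weights, k=3):
    """Return the k stored cases whose profiles are closest to the query."""
    ranked = sorted(cases, key=lambda c: weighted_distance(query, c["profile"], weights))
    return ranked[:k]

def classify(query, cases, weights, k=3):
    """Majority vote over the labels of the k retrieved cases."""
    votes = Counter(c["label"] for c in retrieve(query, cases, weights, k))
    return votes.most_common(1)[0][0]

# Hypothetical case base: three expression values per previously treated patient.
cases = [
    {"id": "P1", "profile": [2.0, 0.1, 5.0], "label": "A"},
    {"id": "P2", "profile": [2.1, 0.3, 4.8], "label": "A"},
    {"id": "P3", "profile": [0.2, 3.0, 1.0], "label": "B"},
    {"id": "P4", "profile": [0.1, 2.8, 1.2], "label": "B"},
    {"id": "P5", "profile": [1.9, 0.2, 5.1], "label": "A"},
]
weights = [0.5, 0.3, 0.2]   # assumed feature weights, e.g. from a feature-selection step
new_patient = [2.0, 0.2, 5.0]

print([c["id"] for c in retrieve(new_patient, cases, weights)])
print(classify(new_patient, cases, weights))
```

    In a real pipeline the weights would come from the feature-selection or dimensionality-reduction stage rather than being set by hand.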

  4. Effect of feed supplementation with live yeast on the intestinal transcriptome profile of weaning pigs orally challenged with Escherichia coli F4.

    PubMed

    Trevisi, P; Latorre, R; Priori, D; Luise, D; Archetti, I; Mazzoni, M; D'Inca, R; Bosi, P

    2017-01-01

    The ability of live yeasts to modulate pig intestinal cell signals in response to infection with Escherichia coli F4ac (ETEC) has not been studied in-depth. The aim of this trial was to evaluate the effect of Saccharomyces cerevisiae CNCM I-4407 (Sc), supplied at different times, on the transcriptome profile of the jejunal mucosa of pigs 24 h after infection with ETEC. In total, 20 piglets selected to be ETEC-susceptible were weaned at 24 days of age (day 0) and allotted by litter to one of the following groups: control (CO), CO+colistin (AB), CO+5×10^10 colony-forming unit (CFU) Sc/kg feed from day 0 (PR) and CO+5×10^10 CFU Sc/kg feed from day 7 (CM). On day 7, the pigs were orally challenged with ETEC and were slaughtered 24 h later after blood sampling for haptoglobin (Hp) and C-reactive protein (CRP) determination. The jejunal mucosa was sampled (1) for morphometry; (2) for quantification of proliferation, apoptosis and zonula occludens (ZO-1); (3) to carry out the microarray analysis. A functional analysis was carried out using Gene Set Enrichment Analysis. The normalized enrichment score (NES) was calculated for each gene set, and statistical significance was defined when the False Discovery Rate % was <25 and P-values of NES were <0.05. The blood concentration of CRP and Hp, and the score for ZO-1 integrity on the jejunal villi did not differ between groups. The intestinal crypts were deeper in the AB (P=0.05) and the yeast groups (P<0.05) than in the CO group. Antibiotic treatment increased the number of mitotic cells in intestinal villi as compared with the control group (P<0.05). The PR group tended to increase the mitotic cells in villi and crypts and tended to reduce the cells in apoptosis as compared with the CM group. The transcriptome profiles of the AB and PR groups were similar. In both groups, the gene sets involved in mitosis and in mitochondria development ranked the highest, whereas in the CO group, the gene sets related to cell junction and anion channels were affected. In the CM group, the gene sets linked to the metabolic process and transcription ranked the highest; a gene set linked with a negative effect on growth was also affected. In conclusion, the constant supplementation in the feed with the strain of yeast tested was effective in counteracting the detrimental effect of ETEC infection in susceptible pigs and limited the early activation of the gene sets related to the impairment of the jejunal mucosa.
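
    The significance rule stated above (False Discovery Rate % below 25 and NES P-value below 0.05) can be applied to a table of enrichment results as in the sketch below. The cutoffs come from the abstract; the gene-set names, scores and record layout are hypothetical and do not correspond to any particular software's output format.

```python
def is_significant(fdr_percent, nes_p_value, fdr_cutoff=25.0, p_cutoff=0.05):
    """Both criteria must hold: FDR % below the cutoff and NES P-value below the cutoff."""
    return fdr_percent < fdr_cutoff and nes_p_value < p_cutoff

# Hypothetical enrichment results; field names are assumptions of this sketch.
results = [
    {"gene_set": "SET_MITOSIS", "nes": 2.10, "fdr_percent": 4.0, "p": 0.001},
    {"gene_set": "SET_JUNCTION", "nes": 1.40, "fdr_percent": 30.0, "p": 0.020},
    {"gene_set": "SET_TRANSPORT", "nes": 1.75, "fdr_percent": 12.0, "p": 0.080},
    {"gene_set": "SET_MITOCHONDRIA", "nes": 1.95, "fdr_percent": 9.0, "p": 0.004},
]

# Keep the gene sets that pass both cutoffs, ranked by descending NES.
significant = sorted(
    (r for r in results if is_significant(r["fdr_percent"], r["p"])),
    key=lambda r: r["nes"],
    reverse=True,
)
for r in significant:
    print(r["gene_set"], r["nes"])
```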

  5. Consistency of gene starts among Burkholderia genomes

    PubMed Central

    2011-01-01

    Background: Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results: Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred: 53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions: Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528

  6. Genome-wide association study for rotator cuff tears identifies two significant single-nucleotide polymorphisms.

    PubMed

    Tashjian, Robert Z; Granger, Erin K; Farnham, James M; Cannon-Albright, Lisa A; Teerlink, Craig C

    2016-02-01

    The precise etiology of rotator cuff disease is unknown, but prior evidence suggests a role for genetic factors. Limited data exist identifying specific genes associated with rotator cuff tearing. The purpose of this study was to identify specific genes or genetic variants associated with rotator cuff tearing by a genome-wide association study with an independent set of rotator cuff tear cases. A set of 311 full-thickness rotator cuff tear cases genotyped on the Illumina 5M single-nucleotide polymorphism (SNP) platform were used in a genome-wide association study with 2641 genetically matched white population controls available from the Illumina iControls database. Tests of association were performed with GEMMA software at 257,558 SNPs that compose the intersection of Illumina SNP platforms and that passed general quality control metrics. SNPs were considered significant if P < 1.94 × 10^-7 (Bonferroni correction: 0.05/257,558). Tests of association revealed 2 significantly associated SNPs, one occurring in SAP30BP (rs820218; P = 3.8E-9) on chromosome 17q25 and another occurring in SASH1 (rs12527089; P = 1.9E-7) on chromosome 6q24. This study represents the first attempt to identify genetic factors influencing rotator cuff tearing by a genome-wide association study using a dense/complete set of SNPs. Two SNPs were significantly associated with rotator cuff tearing, residing in SAP30BP on chromosome 17 and SASH1 on chromosome 6. Both genes are associated with the cellular process of apoptosis. Identification of potential genes or genetic variants associated with rotator cuff tearing may help in identifying individuals at risk for the development of rotator cuff tearing.
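
    The Bonferroni threshold quoted above follows directly from dividing the family-wise error rate by the number of tests. The short sketch below reproduces that arithmetic and checks the two reported P-values against it; the function name is a hypothetical helper, while the numbers are those given in the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests

# Values taken from the abstract: alpha = 0.05 over 257,558 tested SNPs.
threshold = bonferroni_threshold(0.05, 257_558)
print(f"{threshold:.2e}")  # 1.94e-07

# Reported P-values for the two associated SNPs.
p_values = {"rs820218": 3.8e-9, "rs12527089": 1.9e-7}
for snp, p in p_values.items():
    print(snp, "significant" if p < threshold else "not significant")
```

    Note that rs12527089 (P = 1.9E-7) clears the threshold only narrowly, which is visible once the unrounded threshold is computed.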

  7. Gene finding in metatranscriptomic sequences.

    PubMed

    Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

    2014-01-01

    Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.

  8. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  9. CYP1A1, GCLC, AGT, AGTR1 gene-gene interactions in community-acquired pneumonia pulmonary complications.

    PubMed

    Salnikova, Lyubov E; Smelaya, Tamara V; Golubev, Arkadiy M; Rubanovich, Alexander V; Moroz, Viktor V

    2013-11-01

    This study was conducted to establish the possible contribution of functional gene polymorphisms in detoxification/oxidative stress and vascular remodeling pathways to community-acquired pneumonia (CAP) susceptibility in the case-control study (350 CAP patients, 432 control subjects) and to predisposition to the development of CAP complications in the prospective study. All subjects were genotyped for 16 polymorphic variants in the 14 genes of xenobiotics detoxification CYP1A1, AhR, GSTM1, GSTT1, ABCB1, redox-status SOD2, CAT, GCLC, and vascular homeostasis ACE, AGT, AGTR1, NOS3, MTHFR, VEGFα. Risk of pulmonary complications (PC) in the single locus analysis was associated with CYP1A1, GCLC and AGTR1 genes. Extra PC (toxic shock syndrome and myocarditis) were not associated with these genes. We evaluated gene-gene interactions using multi-factor dimensionality reduction, and cumulative gene risk score approaches. The final model which included >5 risk alleles in the CYP1A1 (rs2606345, rs4646903, rs1048943), GCLC, AGT, and AGTR1 genes was associated with pleuritis, empyema, acute respiratory distress syndrome, all PC and acute respiratory failure (ARF). We considered CYP1A1, GCLC, AGT, AGTR1 gene set using Set Distiller mode implemented in GeneDecks for discovering gene-set relations via the degree of sharing descriptors within a given gene set. N-acetylcysteine and oxygen were defined by Set Distiller as the best descriptors for the gene set associated in the present study with PC and ARF. Results of the study are in line with literature data and suggest that genetically determined oxidative stress exacerbation may contribute to the progression of lung inflammation.

  10. A Model Program for Translational Medicine in Epilepsy Genetics

    PubMed Central

    Smith, Lacey A.; Ullmann, Jeremy F. P.; Olson, Heather E.; El Achkar, Christelle M.; Truglio, Gessica; Kelly, McKenna; Rosen-Sheidley, Beth; Poduri, Annapurna

    2017-01-01

    Recent technological advances in gene sequencing have led to a rapid increase in gene discovery in epilepsy. However, the ability to assess pathogenicity of variants, provide functional analysis, and develop targeted therapies has not kept pace with rapid advances in sequencing technology. Thus, although clinical genetic testing may lead to a specific molecular diagnosis for some patients, test results often lead to more questions than answers. As the field begins to focus on therapeutic applications of genetic diagnoses using precision medicine, developing processes that offer more than equivocal test results is essential. The success of precision medicine in epilepsy relies on establishing a correct genetic diagnosis, analyzing functional consequences of genetic variants, screening potential therapeutics in the preclinical laboratory setting, and initiating targeted therapy trials for patients. We describe the structure of a comprehensive, pediatric Epilepsy Genetics Program that can serve as a model for translational medicine in epilepsy. PMID:28056630

  11. Repression of Middle Sporulation Genes in Saccharomyces cerevisiae by the Sum1-Rfm1-Hst1 Complex Is Maintained by Set1 and H3K4 Methylation

    PubMed Central

    Jaiswal, Deepika; Jezek, Meagan; Quijote, Jeremiah; Lum, Joanna; Choi, Grace; Kulkarni, Rushmie; Park, DoHwan; Green, Erin M.

    2017-01-01

    The conserved yeast histone methyltransferase Set1 targets H3 lysine 4 (H3K4) for mono, di, and trimethylation and is linked to active transcription due to the euchromatic distribution of these methyl marks and the recruitment of Set1 during transcription. However, loss of Set1 results in increased expression of multiple classes of genes, including genes adjacent to telomeres and middle sporulation genes, which are repressed under normal growth conditions because they function in meiotic progression and spore formation. The mechanisms underlying Set1-mediated gene repression are varied, and still unclear in some cases, although repression has been linked to both direct and indirect action of Set1, associated with noncoding transcription, and is often dependent on the H3K4me2 mark. We show that Set1, and particularly the H3K4me2 mark, are implicated in repression of a subset of middle sporulation genes during vegetative growth. In the absence of Set1, there is loss of the DNA-binding transcriptional regulator Sum1 and the associated histone deacetylase Hst1 from chromatin in a locus-specific manner. This is linked to increased H4K5ac at these loci and aberrant middle gene expression. These data indicate that, in addition to DNA sequence, histone modification status also contributes to proper localization of Sum1. Our results also show that the role for Set1 in middle gene expression control diverges as cells receive signals to undergo meiosis. Overall, this work dissects an unexplored role for Set1 in gene-specific repression, and provides important insights into a new mechanism associated with the control of gene expression linked to meiotic differentiation. PMID:29066473

  12. Prediction of atmospheric degradation data for POPs by gene expression programming.

    PubMed

    Luan, F; Si, H Z; Liu, H T; Wen, Y Y; Zhang, X Y

    2008-01-01

    Quantitative structure-activity relationship models for the prediction of the mean and the maximum atmospheric degradation half-life values of persistent organic pollutants were developed based on the linear heuristic method (HM) and non-linear gene expression programming (GEP). Molecular descriptors, calculated from the structures alone, were used to represent the characteristics of the compounds. HM was used both to pre-select the whole descriptor sets and to build the linear model. GEP yielded satisfactory prediction results: the square of the correlation coefficient r^2 was 0.80 and 0.81 for the mean and maximum half-life values of the test set, and the root mean square errors were 0.448 and 0.426, respectively. The results of this work indicate that the GEP is a very promising tool for non-linear approximations.

  13. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants

    PubMed Central

    Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer

    2014-01-01

    Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test—a score test—with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought, state-of-the art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test—up to 23 more associations—whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene–gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25075117

  14. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. 
    Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
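
    The collation step described in this abstract (standardize the data sets, pool them, and keep genes that recur across several of them) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the `min_hits` parameter, the study labels and the placeholder gene names are all hypothetical, and standardization is reduced to trimming and upper-casing symbols.

```python
from collections import Counter


def rank_candidates(datasets, min_hits=2):
    """Count in how many data sets each gene appears and keep recurring genes.

    `datasets` maps a (hypothetical) study label to a list of gene symbols.
    """
    counts = Counter()
    for genes in datasets.values():
        # Standardize symbols and count each gene once per data set.
        counts.update({g.strip().upper() for g in genes})
    recurring = [(gene, n) for gene, n in counts.items() if n >= min_hits]
    # Most frequently detected genes first; ties broken alphabetically.
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


# Toy input with placeholder gene names (not data from the study).
toy_datasets = {
    "study_1": ["GENE_A", "gene_b", "GENE_C"],
    "study_2": ["GENE_A", "GENE_B"],
    "study_3": [" gene_a", "GENE_D"],
}

candidates = rank_candidates(toy_datasets)
print(candidates)
```

    Counting each gene once per data set, rather than once per occurrence, matches the stated assumption that a gene matters more the more studies detect it.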

  15. An in vitro skin sensitization assay termed EpiSensA for broad sets of chemicals including lipophilic chemicals and pre/pro-haptens.

    PubMed

    Saito, Kazutoshi; Takenouchi, Osamu; Nukada, Yuko; Miyazawa, Masaaki; Sakaguchi, Hitoshi

    2017-04-01

    To evaluate chemicals (e.g. lipophilic chemicals, pre/pro-haptens) that are difficult to correctly evaluate using in vitro skin sensitization tests (e.g. DPRA, KeratinoSens or h-CLAT), we developed a novel in vitro test termed "Epidermal Sensitization Assay: EpiSensA" that uses reconstructed human epidermis. This assay is based on the induction of multiple marker genes (ATF3, IL-8, DNAJB4 and GCLM) related to two keratinocyte responses (inflammatory or cytoprotective) in the induction of skin sensitization. Here, we first confirmed the mechanistic relevance of these marker genes by focusing on key molecules that regulate keratinocyte responses in vivo (P2X7 for inflammatory and Nrf2 for cytoprotective responses). The up-regulation of ATF3 and IL-8, or DNAJB4 and GCLM induced by the representative sensitizer 2,4-dinitrochlorobenzene in human keratinocytes was significantly suppressed by a P2X7 specific antagonist KN-62, or by Nrf2 siRNA, respectively, which supported mechanistic relevance of marker genes. Moreover, the EpiSensA had sensitivity, specificity and accuracy of 93%, 100% and 93% for 29 lipophilic chemicals (logKow≥3.5), and of 96%, 75% and 88% for 43 hydrophilic chemicals including 11 pre/pro-haptens, compared with the LLNA. These results suggested that the EpiSensA could be a mechanism-based test applicable to broad sets of chemicals including lipophilic chemicals and pre/pro-haptens. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Potential Impact of Rapid Blood Culture Testing for Gram-Positive Bacteremia in Japan with the Verigene Gram-Positive Blood Culture Test

    PubMed Central

    Matsuda, Mari; Iguchi, Shigekazu; Mizutani, Tomonori; Hiramatsu, Keiichi; Tega-Ishii, Michiru; Sansaka, Kaori; Negishi, Kenta; Shimada, Kimie; Umemura, Jun; Notake, Shigeyuki; Yanagisawa, Hideji; Yabusaki, Reiko; Araoka, Hideki; Yoneyama, Akiko

    2017-01-01

    Background. Early detection of Gram-positive bacteremia and timely appropriate antimicrobial therapy are required for decreasing patient mortality. The purpose of our study was to evaluate the performance of the Verigene Gram-positive blood culture assay (BC-GP) in two special healthcare settings and determine the potential impact of rapid blood culture testing for Gram-positive bacteremia within the Japanese healthcare delivery system. Furthermore, the study included simulated blood cultures, which included a library of well-characterized methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant enterococci (VRE) isolates reflecting different geographical regions in Japan. Methods. A total of 347 BC-GP assays were performed on clinical and simulated blood cultures. BC-GP results were compared to results obtained by reference methods for genus/species identification and detection of resistance genes using molecular and MALDI-TOF MS methodologies. Results. For identification and detection of resistance genes at two clinical sites and simulated blood cultures, overall concordance of BC-GP with reference methods was 327/347 (94%). The time for identification and antimicrobial resistance detection by BC-GP was significantly shorter compared to routine testing, especially at the cardiology hospital, which does not offer clinical microbiology services on weekends and holidays. Conclusion. BC-GP generated accurate identification and detection of resistance markers compared with routine laboratory methods for Gram-positive organisms in specialized clinical settings, providing more rapid results than current routine testing. PMID:28316631

  17. Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.).

    PubMed

    Xiong, Wangdan; Xu, Xueqin; Zhang, Lin; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

    2013-07-25

    The WRKY proteins, which contain highly conserved WRKYGQK amino acid sequences and zinc-finger-like motifs, constitute a large family of transcription factors in plants. They participate in diverse physiological and developmental processes. WRKY genes have been identified and characterized in a number of plant species. We identified a total of 58 WRKY genes (JcWRKY) in the genome of the physic nut (Jatropha curcas L.). On the basis of their conserved WRKY domain sequences, all of the JcWRKY proteins could be assigned to one of the previously defined groups, I-III. Phylogenetic analysis of JcWRKY genes with Arabidopsis and rice WRKY genes, and separately with castor bean WRKY genes, revealed no evidence of recent gene duplication in the JcWRKY gene family. Transcript abundance of JcWRKY gene products was analyzed in different tissues under normal growth conditions. In addition, 47 WRKY genes responded to at least one abiotic stress (drought, salinity, phosphate starvation and nitrogen starvation) in individual tissues (leaf, root and/or shoot cortex). Our study provides a useful reference data set as the basis for cloning and functional analysis of physic nut WRKY genes. Copyright © 2013 Elsevier B.V. All rights reserved.

  18. Feasibility of a new model for early detection of patients with multidrug-resistant tuberculosis in a developed setting of eastern China.

    PubMed

    Liu, Zhengwei; Pan, Aizhen; Wu, BeiBei; Zhou, Lin; He, Haibo; Meng, Qiong; Chen, Songhua; Pang, Yu; Wang, Xiaomeng

    2017-10-01

    The poor detection rate of multidrug-resistant tuberculosis (MDR-TB) highlights the urgent need to explore a new case-finding model to improve the detection of MDR-TB in China. The aim of this study was to evaluate the feasibility of a new model that combines molecular diagnostics and sputum transportation for early detection of patients with MDR-TB in Zhejiang. From May 2014 to January 2015, TB suspects were continuously enrolled at six county-level designated TB hospitals in Zhejiang. Each patient gave three sputum samples, which were submitted to the laboratory for smear microscopy, solid culture and GeneXpert. The specimens from rifampin (RIF)-resistant cases detected by GeneXpert, and positive cultures, were transported from county-level to prefecture-level laboratories for line probe analysis (LPA) and drug susceptibility testing (DST). The performance and interval of MDR-TB detection of the new model were compared with those of the conventional model. A total of 3151 sputum specimens were collected from TB suspects. The sensitivity of GeneXpert for detecting culture-positive cases was 92.7% (405/437), and its specificity was 91.3% (2428/2659). Of 16 RIF-resistant cases detected by DST, GeneXpert could correctly identify 15 cases, yielding a sensitivity of 93.8% (15/16). The specificity of GeneXpert for detecting RIF susceptibility was 100.0% (383/383). The average interval to diagnosis of the conventional DST model was 56.5 days, ranging from 43 to 71 days, which was significantly longer than that of GeneXpert plus LPA (22.2 days, P < 0.01). Our data demonstrate that the combination of improved molecular TB tests and sputum transportation could significantly shorten the time required for detection of MDR-TB, which will bring benefits for preventing an epidemic of MDR-TB in this high-prevalence setting. © 2017 John Wiley & Sons Ltd.
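
    The sensitivity and specificity figures in this abstract follow directly from the counts given in parentheses. The sketch below recomputes them as a check; the helper name and dictionary keys are illustrative, and the counts are taken verbatim from the abstract.

```python
def percentage(correct: int, total: int) -> float:
    """Proportion of correctly classified cases, expressed as a percentage."""
    return 100.0 * correct / total


# (correct, total) pairs as reported in the abstract; key names are illustrative.
reported_counts = {
    "sensitivity_culture_positive": (405, 437),    # reported as 92.7%
    "specificity_culture_negative": (2428, 2659),  # reported as 91.3%
    "sensitivity_rif_resistance": (15, 16),        # reported as 93.8%
    "specificity_rif_susceptible": (383, 383),     # reported as 100.0%
}

results = {name: percentage(*counts) for name, counts in reported_counts.items()}

for name, value in results.items():
    print(f"{name}: {value:.1f}%")
```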

  19. Type 2 Deiodinase Disruption in Astrocytes Results in Anxiety-Depressive-Like Behavior in Male Mice.

    PubMed

    Bocco, Barbara M L C; Werneck-de-Castro, João Pedro; Oliveira, Kelen C; Fernandes, Gustavo W; Fonseca, Tatiana L; Nascimento, Bruna P P; McAninch, Elizabeth A; Ricci, Esther; Kvárta-Papp, Zsuzsanna; Fekete, Csaba; Bernardi, Maria Martha; Gereben, Balázs; Bianco, Antonio C; Ribeiro, Miriam O

    2016-09-01

    Millions of levothyroxine-treated hypothyroid patients complain of impaired cognition despite normal TSH serum levels. This could reflect abnormalities in the type 2 deiodinase (D2)-mediated T4-to-T3 conversion, given their much greater dependence on the D2 pathway for T3 production. T3 normally reaches the brain directly from the circulation or is produced locally by D2 in astrocytes. Here we report that mice with astrocyte-specific Dio2 inactivation (Astro-D2KO) have normal serum T3 but exhibit anxiety-depression-like behavior as found in open field and elevated plus maze studies and when tested for depression using the tail-suspension and the forced-swimming tests. Remarkably, 4 weeks of daily treadmill exercise sessions eliminated this phenotype. Microarray gene expression profiling of the Astro-D2KO hippocampi identified an enrichment of three gene sets related to inflammation and impoverishment of three gene sets related to mitochondrial function and response to oxidative stress. Despite normal neurogenesis, the Astro-D2KO hippocampi exhibited decreased expression of four of six genes known to be positively regulated by T3, ie, Mbp (∼43%), Mag (∼34%), Hr (∼49%), and Aldh1a1 (∼61%) and increased expression of 3 of 12 genes negatively regulated by T3, ie, Dgkg (∼17%), Syce2 (∼26%), and Col6a1 (∼3-fold) by quantitative real-time PCR. Notably, in Astro-D2KO animals, there was also a reduction in mRNA levels of genes known to be affected in classical animal models of depression, ie, Bdnf (∼18%), Ntf3 (∼43%), Nmdar (∼26%), and GR (∼20%), which were also normalized by daily exercise sessions. These findings suggest that defects in Dio2 expression in the brain could result in mood and behavioral disorders.

  20. Alcohol-related Genes Show an Enrichment of Associations with a Persistent Externalizing Factor

    PubMed Central

    Ashenhurst, James R.; Harden, K. Paige; Corbin, William R.; Fromme, Kim

    2016-01-01

    Research using twins has found that much of the variability in externalizing phenotypes – including alcohol and drug use, impulsive personality traits, risky sex and property crime – is explained by genetic factors. Nevertheless, identification of specific genes and variants associated with these traits has proven to be difficult, likely because individual differences in externalizing are explained by many genes of small individual effect. Moreover, twin research indicates that heritable variance in externalizing behaviors is mostly shared across the externalizing spectrum rather than specific to any behavior. We use a longitudinal, “deep phenotyping” approach to model a general externalizing factor reflecting persistent engagement in a variety of socially problematic behaviors measured at eleven assessment occasions spanning early adulthood (ages 18 to 28). In an ancestrally homogenous sample of non-Hispanic Whites (N = 337), we then tested for enrichment of associations between the persistent externalizing factor and a set of 3,281 polymorphisms within 104 genes that were previously identified as associated with alcohol-use behaviors. Next we tested for enrichment among domain-specific factors (e.g., property crime) composed of residual variance not accounted for by the common factor. Significance was determined relative to bootstrapped empirical thresholds derived from permutations of phenotypic data. Results indicated significant enrichment of genetic associations for persistent externalizing, but not for domain-specific factors. Consistent with twin research findings, these results suggest that genetic variants are broadly associated with externalizing behaviors rather than unique to specific behaviors. General Scientific Summary This study shows that variation in 104 genes is associated with socially problematic “externalizing” behavior, including substance misuse, property crime, risky sex, and aspects of impulsive personality. 
    Importantly, this association was with the common variation across these behaviors rather than with the variation unique to any given behavior. The manuscript demonstrates a potentially advantageous technique for relating sets of hypothesized genes to complex traits or behaviors. PMID:27505405

  1. Reduction in expression of the benign AR transcriptome is a hallmark of localised prostate cancer progression.

    PubMed

    Stuchbery, Ryan; Macintyre, Geoff; Cmero, Marek; Harewood, Laurence M; Peters, Justin S; Costello, Anthony J; Hovens, Christopher M; Corcoran, Niall M

    2016-05-24

    Despite the importance of androgen receptor (AR) signalling to prostate cancer development, little is known about how this signalling pathway changes with increasing grade and stage of the disease. To explore changes in the normal AR transcriptome in localised prostate cancer, and its relation to adverse pathological features and disease recurrence. Publicly accessible human prostate cancer expression arrays as well as RNA sequencing data from the prostate TCGA. Tumour associated PSA and PSAD were calculated for a large cohort of men (n=1108) undergoing prostatectomy. We performed a meta-analysis of the expression of an androgen-regulated gene set across datasets using Oncomine. Differential expression of selected genes in the prostate TCGA database was probed using the edgeR Bioconductor package. Changes in tumour PSA density with stage and grade were assessed by Student's t-test, and its association with biochemical recurrence explored by Kaplan-Meier curves and Cox regression. Meta-analysis revealed a systematic decline in the expression of a previously identified benign prostate androgen-regulated gene set with increasing tumour grade, reaching significance in nine of 25 genes tested despite increasing AR expression. These results were confirmed in a large independent dataset from the TCGA. At the protein level, when serum PSA was corrected for tumour volume, significantly lower levels were observed with increasing tumour grade and stage, and predicted disease recurrence. Lower PSA secretion-per-tumour-volume is associated with increasing grade and stage of prostate cancer, has prognostic relevance, and reflects a systematic perturbation of androgen signalling.

  2. Quantification of Endospore-Forming Firmicutes by Quantitative PCR with the Functional Gene spo0A

    PubMed Central

    Bueche, Matthieu; Wunderlin, Tina; Roussel-Delif, Ludovic; Junier, Thomas; Sauvain, Loic; Jeanneret, Nicole

    2013-01-01

    Bacterial endospores are highly specialized cellular forms that allow endospore-forming Firmicutes (EFF) to tolerate harsh environmental conditions. EFF are considered ubiquitous in natural environments, in particular, those subjected to stress conditions. In addition to natural habitats, EFF are often the cause of contamination problems in anthropogenic environments, such as industrial production plants or hospitals. It is therefore desirable to assess their prevalence in environmental and industrial fields. To this end, a high-sensitivity detection method is still needed. The aim of this study was to develop and evaluate an approach based on quantitative PCR (qPCR). For this, the suitability of functional genes specific for and common to all EFF was evaluated. Seven genes were considered, but only spo0A was retained to identify conserved regions for qPCR primer design. An approach based on multivariate analysis was developed for primer design. Two primer sets were obtained and evaluated with 16 pure cultures, including representatives of the genera Bacillus, Paenibacillus, Brevibacillus, Geobacillus, Alicyclobacillus, Sulfobacillus, Clostridium, and Desulfotomaculum, as well as with environmental samples. The primer sets developed gave a reliable quantification when tested on laboratory strains, with the exception of Sulfobacillus and Desulfotomaculum. A test using sediment samples with a diverse EFF community also gave a reliable quantification compared to 16S rRNA gene pyrosequencing. A detection limit of about 10^4 cells (or spores) per gram of initial material was calculated, indicating this method has a promising potential for the detection of EFF over a wide range of applications. PMID:23811505

  3. Genome-wide association study identifies SNPs in the MHC class II loci that are associated with self-reported history of whooping cough.

    PubMed

    McMahon, George; Ring, Susan M; Davey-Smith, George; Timpson, Nicholas J

    2015-10-15

    Whooping cough is currently seeing a resurgence in countries despite high vaccine coverage. There is considerable variation in subject-specific response to infection and vaccine efficacy, but little is known about the role of human genetics. We carried out a case-control genome-wide association study of adult or parent-reported history of whooping cough in two cohorts from the UK: the ALSPAC cohort and the 1958 British Birth Cohort (815/758 cases and 6341/4308 controls, respectively). We also imputed HLA alleles using dense SNP data in the MHC region and carried out gene-based and gene-set tests of association and estimated the amount of additive genetic variation explained by common SNPs. We observed a novel association at SNPs in the MHC class II region in both cohorts [lead SNP rs9271768 after meta-analysis, odds ratio [95% confidence intervals (CIs)] 1.47 (1.35, 1.6), P-value 1.21E-18]. Multiple strong associations were also observed at alleles at the HLA class II loci. The majority of these associations were explained by the lead SNP rs9271768. Gene-based and gene-set tests and estimates of explainable common genetic variation could not establish the presence of additional associations in our sample. Genetic variation at the MHC class II region plays a role in susceptibility to whooping cough. These findings provide additional perspective on mechanisms of whooping cough infection and vaccine efficacy. © The Author 2015. Published by Oxford University Press.

  4. Selection of novel reference genes for use in the human central nervous system: a BrainNet Europe Study.

    PubMed

    Durrenberger, Pascal F; Fernando, Francisca S; Magliozzi, Roberta; Kashefi, Samira N; Bonnert, Timothy P; Ferrer, Isidro; Seilhean, Danielle; Nait-Oumesmar, Brahim; Schmitt, Andrea; Gebicke-Haerter, Peter J; Falkai, Peter; Grünblatt, Edna; Palkovits, Miklos; Parchi, Piero; Capellari, Sabina; Arzberger, Thomas; Kretzschmar, Hans; Roncaroli, Federico; Dexter, David T; Reynolds, Richard

    2012-12-01

    The use of an appropriate reference gene to ensure accurate normalisation is crucial for the correct quantification of gene expression using qPCR assays and RNA arrays. The main criterion for a gene to qualify as a reference gene is a stable expression across various cell types and experimental settings. Several reference genes are commonly in use but more and more evidence reveals variations in their expression due to the presence of on-going neuropathological disease processes, raising doubts concerning their use. We conducted an analysis of genome-wide changes of gene expression in the human central nervous system (CNS) covering several neurological disorders and regions, including the spinal cord, and were able to identify a number of novel stable reference genes. We tested the stability of expression of eight novel (ATP5E, AARS, GAPVD1, CSNK2B, XPNPEP1, OSBP, NAT5 and DCTN2) and four more commonly used (BECN1, GAPDH, QARS and TUBB) reference genes in a smaller cohort using RT-qPCR. The most stable genes out of the 12 reference genes were tested as normalisers to validate increased levels of a target gene in CNS disease. We found that in human post-mortem tissue the novel reference genes, XPNPEP1 and AARS, were efficient in replicating microarray target gene expression levels and that XPNPEP1 was more efficient as a normaliser than BECN1, which has been shown to change in expression as a consequence of neuronal cell loss. We provide herein one more suitable novel reference gene, XPNPEP1, with no current neuroinflammatory or neurodegenerative associations that can be used for quantitative gene expression studies with human CNS post-mortem tissue and also suggest a list of potential other candidates. These data also emphasise the importance of organ/tissue-specific stably expressed genes as reference genes for RNA studies.

  5. Leveraging blood serotonin as an endophenotype to identify de novo and rare variants involved in autism.

    PubMed

    Chen, Rui; Davis, Lea K; Guter, Stephen; Wei, Qiang; Jacob, Suma; Potter, Melissa H; Cox, Nancy J; Cook, Edwin H; Sutcliffe, James S; Li, Bingshan

    2017-01-01

    Autism spectrum disorder (ASD) is one of the most highly heritable neuropsychiatric disorders, but underlying molecular mechanisms are still unresolved due to extreme locus heterogeneity. Leveraging meaningful endophenotypes or biomarkers may be an effective strategy to reduce heterogeneity to identify novel ASD genes. Numerous lines of evidence suggest a link between hyperserotonemia, i.e., elevated serotonin (5-hydroxytryptamine or 5-HT) in whole blood, and ASD. However, the genetic determinants of blood 5-HT level and their relationship to ASD are largely unknown. In this study, pursuing the hypothesis that de novo variants (DNVs) and rare risk alleles acting in a recessive mode may play an important role in predisposition of hyperserotonemia in people with ASD, we carried out whole exome sequencing (WES) in 116 ASD parent-proband trios with most (107) probands having 5-HT measurements. Combined with published ASD DNVs, we identified USP15 as having recurrent de novo loss of function mutations and discovered evidence supporting two other known genes with recurrent DNVs ( FOXP1 and KDM5B ). Genes harboring functional DNVs significantly overlap with functional/disease gene sets known to be involved in ASD etiology, including FMRP targets and synaptic formation and transcriptional regulation genes. We grouped the probands into High-5HT and Normal-5HT groups based on normalized serotonin levels, and used network-based gene set enrichment analysis (NGSEA) to identify novel hyperserotonemia-related ASD genes based on LoF and missense DNVs. We found enrichment in the High-5HT group for a gene network module (DAWN-1) previously implicated in ASD, and this points to the TGF-β pathway and cell junction processes. Through analysis of rare recessively acting variants (RAVs), we also found that rare compound heterozygotes (CHs) in the High-5HT group were enriched for loci in an ASD-associated gene set. 
    Finally, we carried out rare variant group-wise transmission disequilibrium tests (gTDT) and observed significant association of rare variants in genes encoding a subset of the serotonin pathway with ASD. Our study identified USP15 as a novel gene implicated in ASD based on recurrent DNVs. It also demonstrates the potential value of 5-HT as an effective endophenotype for gene discovery in ASD, and the effectiveness of this strategy needs to be further explored in studies of larger sample sizes.

  6. TCDD and a Putative Endogenous AhR Ligand, ITE, Elicit the Same Immediate Changes in Gene Expression in Mouse Lung Fibroblasts

    PubMed Central

    Henry, Ellen C.; Welle, Stephen L.; Gasiewicz, Thomas A.

    2010-01-01

    The aryl hydrocarbon receptor (AhR), a ligand-dependent transcription factor, mediates toxicity of several classes of xenobiotics and also has important physiological roles in differentiation, reproduction, and immunity, although the endogenous ligand(s) mediating these functions is/are as yet unidentified. One candidate endogenous ligand, 2-(1′H-indolo-3′-carbonyl)-thiazole-4-carboxylic acid methyl ester (ITE), is a potent AhR agonist in vitro, activates the murine AhR in vivo, but does not induce toxicity. We hypothesized that ITE and the toxic ligand, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), may modify transcription of different sets of genes to account for their different toxicity. To test this hypothesis, primary mouse lung fibroblasts were exposed to 0.5μM ITE, 0.2nM TCDD, or vehicle for 4 h, and total gene expression was evaluated using microarrays. After this short-term and low-dose treatment, several hundred genes were changed significantly, and the response to ITE and TCDD was remarkably similar, both qualitatively and quantitatively. Induced gene sets included the expected battery of AhR-dependent xenobiotic-metabolizing enzymes, as well as several sets that reflect the inflammatory role of lung fibroblasts. Real time quantitative RT-qPCR assay of several selected genes confirmed these microarray data and further suggested that there may be kinetic differences in expression between ligands. These data suggest that ITE and TCDD elicit an analogous change in AhR conformation such that the initial transcription response is the same. Furthermore, if the difference in toxicity between TCDD and ITE is mediated by differences in gene expression, then it is likely that secondary changes enabled by the persistent TCDD, but not by the shorter lived ITE, are responsible. PMID:19933214

  7. AB033. Preimplantation genetic diagnosis of spinal muscular atrophy in Vietnam

    PubMed Central

    Khoa, Tran Van; Nga, Nguyen Thi Thanh; Tao, Nguyen Dinh; Sang, Trieu Tien; Giang, Ngo Truong; Dung, Vu Chi

    2015-01-01

    Objective: Spinal muscular atrophy (SMA) is a severe neurodegenerative autosomal recessive disorder. Most cases are caused by the homozygous absence of exon 7 of the telomeric copy of the SMN gene (SMNt) on chromosome 5. Setting up a molecular diagnostic protocol for detecting the homozygous deletion of exon 7 of the SMNt gene in a single cell is the basis for preimplantation genetic diagnosis of SMA. Methods: This study was carried out on 17 patients and their parents. First, lymphocytes of patients and their parents were isolated from fresh blood using Ficoll. A single lymphocyte was picked under a stereoscopic microscope and lysed, the whole genome was amplified, and exon 7 of the SMNt gene was then amplified by polymerase chain reaction (PCR). The PCR product was digested with the restriction enzyme HinfI, which distinguishes the clinically important SMNt gene from the centromeric SMN gene (SMNc), which has no clinical phenotype, so that the deletion can be detected. The digested PCR products were then analysed by electrophoresis. In addition, minisequencing was used to detect the absence of exon 7 of the SMNt gene based on a single-nucleotide difference at position 214 of exon 7 (C in SMNt, T in SMNc). Second, the protocol established on single lymphocytes was applied to preimplantation genetic diagnosis of SMA on biopsied blastomeres. Results: Two protocols, PCR-RFLP and minisequencing, were set up on 200 lymphocytes from 17 patients and their parents to screen for the homozygous deletion of exon 7 of the SMNt gene, with a PCR efficiency of 96%. The results were consistent with the diagnoses obtained from fresh blood. The methods were also efficient, providing an interpretable result in 96.55% (28/29) of the blastomeres tested. Three couples were treated using this method. Three normal embryos were transferred, which resulted in one clinical pregnancy. Conclusions: We have successfully applied PCR-RFLP and minisequencing to the preimplantation genetic diagnosis of spinal muscular atrophy.

  8. Molecular Classifiers for Acute Kidney Transplant Rejection in Peripheral Blood by Whole Genome Gene Expression Profiling

    PubMed Central

    Kurian, S. M.; Williams, A. N.; Gelbart, T.; Campbell, D.; Mondala, T. S.; Head, S. R.; Horvath, S.; Gaber, L.; Thompson, R.; Whisenant, T.; Lin, W.; Langfelder, P.; Robison, E. H.; Schaffer, R. L.; Fisher, J. S.; Friedewald, J.; Flechner, S. M.; Chan, L. K.; Wiseman, A. C.; Shidban, H.; Mendez, R.; Heilman, R.; Abecassis, M. M.; Marsh, C. L.; Salomon, D. R.

    2015-01-01

    There are no minimally invasive diagnostic metrics for acute kidney transplant rejection (AR), especially in the setting of the common confounding diagnosis, acute dysfunction with no rejection (ADNR). Thus, though kidney transplant biopsies remain the gold standard, they are invasive, have substantial risks, sampling error issues and significant costs and are not suitable for serial monitoring. Global gene expression profiles of 148 peripheral blood samples from transplant patients with excellent function and normal histology (TX; n = 46), AR (n = 63) and ADNR (n = 39), from two independent cohorts were analyzed with DNA microarrays. We applied a new normalization tool, frozen robust multi-array analysis, particularly suitable for clinical diagnostics, multiple prediction tools to discover, refine and validate robust molecular classifiers and we tested a novel one-by-one analysis strategy to model the real clinical application of this test. Multiple three-way classifier tools identified 200 highest value probesets with sensitivity, specificity, positive predictive value, negative predictive value and area under the curve for the validation cohort ranging from 82% to 100%, 76% to 95%, 76% to 95%, 79% to 100%, 84% to 100% and 0.817 to 0.968, respectively. We conclude that peripheral blood gene expression profiling can be used as a minimally invasive tool to accurately reveal TX, AR and ADNR in the setting of acute kidney transplant dysfunction. PMID:24725967

  9. Clinical implementation of integrated whole-genome copy number and mutation profiling for glioblastoma

    PubMed Central

    Ramkissoon, Shakti H.; Bi, Wenya Linda; Schumacher, Steven E.; Ramkissoon, Lori A.; Haidar, Sam; Knoff, David; Dubuc, Adrian; Brown, Loreal; Burns, Margot; Cryan, Jane B.; Abedalthagafi, Malak; Kang, Yun Jee; Schultz, Nikolaus; Reardon, David A.; Lee, Eudocia Q.; Rinne, Mikael L.; Norden, Andrew D.; Nayak, Lakshmi; Ruland, Sandra; Doherty, Lisa M.; LaFrankie, Debra C.; Horvath, Margaret; Aizer, Ayal A.; Russo, Andrea; Arvold, Nils D.; Claus, Elizabeth B.; Al-Mefty, Ossama; Johnson, Mark D.; Golby, Alexandra J.; Dunn, Ian F.; Chiocca, E. Antonio; Trippa, Lorenzo; Santagata, Sandro; Folkerth, Rebecca D.; Kantoff, Philip; Rollins, Barrett J.; Lindeman, Neal I.; Wen, Patrick Y.; Ligon, Azra H.; Beroukhim, Rameen; Alexander, Brian M.; Ligon, Keith L.

    2015-01-01

    Background Multidimensional genotyping of formalin-fixed paraffin-embedded (FFPE) samples has the potential to improve diagnostics and clinical trials for brain tumors, but prospective use in the clinical setting is not yet routine. We report our experience with implementing a multiplexed copy number and mutation-testing program in a diagnostic laboratory certified by the Clinical Laboratory Improvement Amendments. Methods We collected and analyzed clinical testing results from whole-genome array comparative genomic hybridization (OncoCopy) of 420 brain tumors, including 148 glioblastomas. Mass spectrometry–based mutation genotyping (OncoMap, 471 mutations) was performed on 86 glioblastomas. Results OncoCopy was successful in 99% of samples for which sufficient DNA was obtained (n = 415). All clinically relevant loci for glioblastomas were detected, including amplifications (EGFR, PDGFRA, MET) and deletions (EGFRvIII, PTEN, 1p/19q). Glioblastoma patients ≤40 years old had distinct profiles compared with patients >40 years. OncoMap testing reliably identified mutations in IDH1, TP53, and PTEN. Seventy-seven glioblastoma patients enrolled on trials, of whom 51% participated in targeted therapeutic trials where multiplex data informed eligibility or outcomes. Data integration identified patients with complete tumor suppressor inactivation, albeit rarely (5% of patients) due to lack of whole-gene coverage in OncoMap. Conclusions Combined use of multiplexed copy number and mutation detection from FFPE samples in the clinical setting can efficiently replace singleton tests for clinical diagnosis and prognosis in most settings. Our results support incorporation of these assays into clinical trials as integral biomarkers and their potential to impact interpretation of results. Limited tumor suppressor variant capture by targeted genotyping highlights the need for whole-gene sequencing in glioblastoma. PMID:25754088

  10. Investigating a multigene prognostic assay based on significant pathways for Luminal A breast cancer through gene expression profile analysis.

    PubMed

    Gao, Haiyan; Yang, Mei; Zhang, Xiaolan

    2018-04-01

    The present study aimed to investigate potential recurrence-risk biomarkers based on significant pathways for Luminal A breast cancer through gene expression profile analysis. Initially, the gene expression profiles of Luminal A breast cancer patients were downloaded from The Cancer Genome Atlas database. The differentially expressed genes (DEGs) were identified using a Limma package and the hierarchical clustering analysis was conducted for the DEGs. In addition, the functional pathways were screened using Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses and rank ratio calculation. The multigene prognostic assay was exploited based on the statistically significant pathways and its prognostic function was tested using train set and verified using the gene expression data and survival data of Luminal A breast cancer patients downloaded from the Gene Expression Omnibus. A total of 300 DEGs were identified between good and poor outcome groups, including 176 upregulated genes and 124 downregulated genes. The DEGs may be used to effectively distinguish Luminal A samples with different prognoses verified by hierarchical clustering analysis. There were 9 pathways screened as significant pathways and a total of 18 DEGs involved in these 9 pathways were identified as prognostic biomarkers. According to the survival analysis and receiver operating characteristic curve, the obtained 18-gene prognostic assay exhibited good prognostic function with high sensitivity and specificity to both the train and test samples. In conclusion the 18-gene prognostic assay including the key genes, transcription factor 7-like 2, anterior parietal cortex and lymphocyte enhancer factor-1 may provide a new method for predicting outcomes and may be conducive to the promotion of precision medicine for Luminal A breast cancer.

  11. Sites of disruption within E1 and E2 genes of HPV16 and association with cervical dysplasia.

    PubMed

    Tsakogiannis, D; Gortsilas, P; Kyriakopoulou, Z; Ruether, I G A; Dimitriou, T G; Orfanoudakis, G; Markoulatos, P

    2015-11-01

    Integration of HPV16 DNA into the host chromosome usually disrupts the E1 and/or E2 genes. The present study investigated the disruption of the E1 and E2 genes in a total of eighty-four HPV16-positive precancerous and cervical cancer specimens derived from Greek women (seventeen paraffin-embedded cervical biopsies and sixty-seven Thin Prep samples). Complete E2 and E1 genes were amplified using three and nine overlapping primer sets respectively, in order to define the sites of disruption. Extensive mapping analysis revealed that disruption/deletion events within the E2 gene occurred in high grade and cervical cancer samples (χ² test, P < 0.01), while no evidence of E2 gene disruption was documented among low grade cervical intraepithelial neoplasias. In addition, disruptions within the E1 gene occur both in high and low grade cervical intraepithelial neoplasia. This leads to the assumption that in low grade cervical intraepithelial neoplasias only E1 gene disruption was involved (Fisher's exact test, P < 0.05), while in high grade malignancies and cervical cancer cases deletions in both E1 and E2 genes occurred. Furthermore, the most prevalent site of disruption of the E1 gene was located between nucleotides 1059 and 1323, while the most prevalent deleted region of the E2 gene was located between nucleotides 3172 and 3649 (E2 hinge region). Therefore, it is proposed that each population has its own profile of frequencies and sites of disruptions and extensive mapping analysis of E1 and E2 genes is mandatory in order to determine suitable markers for HPV16 DNA integration analysis in distinct populations. © 2015 Wiley Periodicals, Inc.

  12. Blood-Bourne MicroRNA Biomarker Evaluation in Attention-Deficit/Hyperactivity Disorder of Han Chinese Individuals: An Exploratory Study.

    PubMed

    Wang, Liang-Jen; Li, Sung-Chou; Lee, Min-Jing; Chou, Miao-Chun; Chou, Wen-Jiun; Lee, Sheng-Yu; Hsu, Chih-Wei; Huang, Lien-Hung; Kuo, Ho-Chang

    2018-01-01

    Background: Attention-deficit/hyperactivity disorder (ADHD) is a highly genetic neurodevelopmental disorder, and its dysregulation of gene expression involves microRNAs (miRNAs). The purpose of this study was to identify potential miRNAs biomarkers and then use these biomarkers to establish a diagnostic panel for ADHD. Design and methods: RNA samples from white blood cells (WBCs) of five ADHD patients and five healthy controls were combined to create one pooled patient library and one control library. We identified 20 candidate miRNAs with the next-generation sequencing (NGS) technique (Illumina). Blood samples were then collected from a Training Set (68 patients and 54 controls) and a Testing Set (20 patients and 20 controls) to identify the expression profiles of these miRNAs with real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR). We used receiver operating characteristic (ROC) curves and the area under the curve (AUC) to evaluate both the specificity and sensitivity of the probability score yielded by the support vector machine (SVM) model. Results: We identified 13 miRNAs as potential ADHD biomarkers. The ΔCt values of these miRNAs in the Training Set were integrated to create a biomarker model using the SVM algorithm, which demonstrated good validity in differentiating ADHD patients from control subjects (sensitivity: 86.8%, specificity: 88.9%, AUC: 0.94, p < 0.001). The results of the blind testing showed that 85% of the subjects in the Testing Set were correctly classified using the SVM model alignment (AUC: 0.91, p < 0.001). The discriminative validity is not influenced by patients' age or gender, indicating both the robustness and the reliability of the SVM classification model. Conclusion: As measured in peripheral blood, miRNA-based biomarkers can aid in the differentiation of ADHD in clinical settings. 
    Additional studies are needed in the future to clarify the ADHD-associated gene functions and biological mechanisms modulated by miRNAs.
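
    The performance figures in this abstract can be related back to simple counts. The sketch below is illustrative only: the abstract reports percentages, not confusion-matrix counts, so the true-positive and true-negative counts used here are an assumption, chosen as the integers that reproduce the reported 86.8% and 88.9% given the stated Training Set sizes (68 patients, 54 controls). The helper names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients correctly classified as patients."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of controls correctly classified as controls."""
    return true_neg / (true_neg + false_pos)


# Training Set sizes are from the abstract; the split into correct and
# incorrect calls is inferred from the reported percentages (assumption).
train_sens = sensitivity(true_pos=59, false_neg=9)   # 59 of 68 patients
train_spec = specificity(true_neg=48, false_pos=6)   # 48 of 54 controls

# Testing Set: 85% of 20 patients + 20 controls correctly classified.
test_total = 20 + 20
test_correct = round(0.85 * test_total)

print(f"training sensitivity: {train_sens:.1%}")
print(f"training specificity: {train_spec:.1%}")
print(f"testing set correctly classified: {test_correct} of {test_total}")
```

    Sensitivity and specificity are computed within the patient and control groups separately, so neither depends on the patient-to-control ratio of the cohort, whereas overall accuracy (as reported for the Testing Set) does.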

  13. Transcriptome analysis identifies genes involved in sex determination and development of Xenopus laevis gonads.

    PubMed

    Piprek, Rafal P; Damulewicz, Milena; Kloc, Malgorzata; Kubiak, Jacek Z

    Development of the gonads is a complex process, which starts with a period of undifferentiated, bipotential gonads. During this period the expression of sex-determining genes is initiated. Sex determination is a process triggering differentiation of the gonads into the testis or ovary. Sex determination period is followed by sexual differentiation, i.e. appearance of the first testis- and ovary-specific features. In Xenopus laevis W-linked DM-domain gene (DM-W) had been described as a master determinant of the gonadal female sex. However, the data on the expression and function of other genes participating in gonad development in X. laevis, and in anurans, in general, are very limited. We applied microarray technique to analyze the expression pattern of a subset of X. laevis genes previously identified to be involved in gonad development in several vertebrate species. We also analyzed the localization and the expression level of proteins encoded by these genes in developing X. laevis gonads. These analyses pointed to the set of genes differentially expressed in developing testes and ovaries. Gata4, Sox9, Dmrt1, Amh, Fgf9, Ptgds, Pdgf, Fshr, and Cyp17a1 expression was upregulated in developing testes, while DM-W, Fst, Foxl2, and Cyp19a1 were upregulated in developing ovaries. We discuss the possible roles of these genes in development of X. laevis gonads. Copyright © 2018 International Society of Differentiation. Published by Elsevier B.V. All rights reserved.

  14. Extra Large G-Protein Interactome Reveals Multiple Stress Response Function and Partner-Dependent XLG Subcellular Localization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liang, Ying; Gao, Yajun; Jones, Alan M.

    The three-member family of Arabidopsis extra-large G proteins (XLG1-3) defines the prototype of an atypical Gα subunit in the heterotrimeric G protein complex. Recent evidence indicates that XLG subunits operate along with the Gβγ dimer in root morphology, stress responsiveness, and cytokinin-induced development; however, downstream targets of activated XLG proteins in the stress pathways are largely unknown. In order to assemble a set of candidate XLG-targeted proteins, a yeast two-hybrid complementation-based screen was performed using XLG protein baits to query interactions between XLG and partner proteins found in glucose-treated seedlings, roots, and Arabidopsis cells in culture. Seventy-two interactors were identified and >60% of a test set displayed in vivo interaction with XLG proteins. Gene co-expression analysis shows that >70% of the interactors are positively correlated with the corresponding XLG partners. Gene Ontology enrichment for all the candidates indicates stress responses and posits a molecular mechanism involving a specific set of transcription factor partners to XLG. Genes encoding two of these transcription factors, SZF1 and 2, require XLG proteins for full NaCl-induced expression. Furthermore, the subcellular localization of the XLG proteins in the nucleus, endosome, and plasma membrane is dependent on the specific interacting partner.

  15. Extra Large G-Protein Interactome Reveals Multiple Stress Response Function and Partner-Dependent XLG Subcellular Localization

    DOE PAGES

    Liang, Ying; Gao, Yajun; Jones, Alan M.

    2017-06-13

    The three-member family of Arabidopsis extra-large G proteins (XLG1-3) defines the prototype of an atypical Gα subunit in the heterotrimeric G protein complex. Recent evidence indicates that XLG subunits operate along with the Gβγ dimer in root morphology, stress responsiveness, and cytokinin-induced development; however, downstream targets of activated XLG proteins in the stress pathways are largely unknown. In order to assemble a set of candidate XLG-targeted proteins, a yeast two-hybrid complementation-based screen was performed using XLG protein baits to query interactions between XLG and partner proteins found in glucose-treated seedlings, roots, and Arabidopsis cells in culture. Seventy-two interactors were identified and >60% of a test set displayed in vivo interaction with XLG proteins. Gene co-expression analysis shows that >70% of the interactors are positively correlated with the corresponding XLG partners. Gene Ontology enrichment for all the candidates indicates stress responses and posits a molecular mechanism involving a specific set of transcription factor partners to XLG. Genes encoding two of these transcription factors, SZF1 and 2, require XLG proteins for full NaCl-induced expression. Furthermore, the subcellular localization of the XLG proteins in the nucleus, endosome, and plasma membrane is dependent on the specific interacting partner.

  16. Assessing the readiness of precision medicine interoperability: An exploratory study of the National Institutes of Health genetic testing registry.

    PubMed

    Ronquillo, Jay G; Weng, Chunhua; Lester, William T

    2017-11-17

    Precision medicine involves three major innovations currently taking place in healthcare: electronic health records, genomics, and big data. A major challenge for healthcare providers, however, is understanding the readiness for practical application of initiatives like precision medicine. To better understand the current state and challenges of precision medicine interoperability using a national genetic testing registry as a starting point, placed in the context of established interoperability formats. We performed an exploratory analysis of the National Institutes of Health Genetic Testing Registry. Relevant standards included Health Level Seven International Version 3 Implementation Guide for Family History, the Human Genome Organization Gene Nomenclature Committee (HGNC) database, and Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT). We analyzed the distribution of genetic testing laboratories, genetic test characteristics, and standardized genome/clinical code mappings, stratified by laboratory setting. There were a total of 25472 genetic tests from 240 laboratories testing for approximately 3632 distinct genes. Most tests focused on diagnosis, mutation confirmation, and/or risk assessment of germline mutations that could be passed to offspring. Genes were successfully mapped to all HGNC identifiers, but less than half of tests mapped to SNOMED CT codes, highlighting significant gaps when linking genetic tests to standardized clinical codes that explain the medical motivations behind test ordering. Conclusion: While precision medicine could potentially transform healthcare, successful practical and clinical application will first require the comprehensive and responsible adoption of interoperable standards, terminologies, and formats across all aspects of the precision medicine pipeline.

  17. Genomic tests for ovarian cancer detection and management.

    PubMed

    Myers, Evan R; Havrilesky, Laura J; Kulasingam, Shalini L; Sanders, Gillian D; Cline, Kathryn E; Gray, Rebecca N; Berchuck, Andrew; McCrory, Douglas C

    2006-10-01

    To assess the evidence that the use of genomic tests for ovarian cancer screening, diagnosis, and treatment leads to improved outcomes. PubMed and reference lists of recent reviews. We evaluated tests for: (a) single gene products; (b) genetic variations affecting risk of ovarian cancer; (c) gene expression; and (d) proteomics. For tests covered in recent evidence reports (cancer antigen 125 [CA-125] and breast cancer genes 1 and 2 [BRCA1/2]), we added studies published subsequent to the reports. We sought evidence on: (a) the analytic performance of tests in clinical laboratories; (b) the sensitivity and specificity of tests in different patient populations; (c) the clinical impact of testing in asymptomatic women, women with suspected ovarian cancer, and women with diagnosed ovarian cancer; (d) the harms of genomic testing; and (e) the impact of direct-to-consumer and direct-to-physician advertising on appropriate use of tests. We also constructed a computer simulation model to test the impact of different assumptions about ovarian cancer natural history on the relative effectiveness of different strategies. There are reasonable data on the clinical laboratory performance of most radioimmunoassays, but the majority of the data on other genomic tests comes from research laboratories. Genomic test sensitivity/specificity estimates are limited by small sample sizes, spectrum bias, and unrealistically large prevalences of ovarian cancer; in particular, estimates of positive predictive values derived from most of the studies are substantially higher than would be expected in most screening or diagnostic settings. We found no evidence relevant to the question of the impact of genomic tests on health outcomes in asymptomatic women. Although there is a relatively large literature on the association of test results and various clinical outcomes, the clinical utility of changing management based on these results has not been evaluated. 
    We found no evidence that genomic tests for ovarian cancer have unique harms beyond those common to other tests for genetic susceptibility or other tests used in screening, diagnosis, and management of ovarian cancer. Studies of a direct-to-consumer campaign for BRCA1/2 testing suggest increased utilization, but the effect on "appropriateness" was unclear. Model simulations suggest that annual screening, even with a highly sensitive test, will not reduce ovarian cancer mortality by more than 50 percent; frequent screening has a very low positive predictive value, even with a highly specific test. Although research remains promising, adaptation of genomic tests into clinical practice must await appropriately designed and powered studies in relevant clinical settings.
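
    The point about positive predictive value can be illustrated with Bayes' rule. This is a sketch under stated assumptions, not the authors' simulation model: the sensitivity, specificity, and prevalence values below are hypothetical and are not taken from the report, and the function name `positive_predictive_value` is illustrative.

```python
def positive_predictive_value(sens, spec, prevalence):
    """P(disease | positive test) from sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence                    # diseased and test-positive
    false_pos = (1.0 - spec) * (1.0 - prevalence)   # healthy but test-positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics (assumptions, not from the report).
SENS, SPEC = 0.95, 0.99

# A case-control style study (half the samples are cases) versus two
# progressively rarer screening settings; prevalences are illustrative.
for prevalence in (0.5, 0.01, 0.0005):
    ppv = positive_predictive_value(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:>7.2%}: PPV = {ppv:.1%}")
```

    The same test that looks nearly perfect in a study where half the samples are cases yields mostly false positives when the disease is rare, which is consistent with the abstract's remark that estimates from studies with unrealistically large prevalences overstate the positive predictive value expected in screening.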

  18. A Norway spruce FLOWERING LOCUS T homolog is implicated in control of growth rhythm in conifers.

    PubMed

    Gyllenstrand, Niclas; Clapham, David; Källman, Thomas; Lagercrantz, Ulf

    2007-05-01

    Growth in perennial plants possesses an annual cycle of active growth and dormancy that is controlled by environmental factors, mainly photoperiod and temperature. In conifers and other nonangiosperm species, the molecular mechanisms behind these responses are currently unknown. In Norway spruce (Picea abies L. Karst.) seedlings, growth cessation and bud set are induced by short days and plants from southern latitudes require at least 7 to 10 h of darkness, whereas plants from northern latitudes need only 2 to 3 h of darkness. Bud burst, on the other hand, is almost exclusively controlled by temperature. To test the possible role of Norway spruce FLOWERING LOCUS T (FT)-like genes in growth rhythm, we have studied expression patterns of four Norway spruce FT family genes in two populations with a divergent bud set response under various photoperiodic conditions. Our data show a significant and tight correlation between growth rhythm (both bud set and bud burst), and expression pattern of one of the four Norway spruce phosphatidylethanolamine-binding protein gene family members (PaFT4) over a variety of experimental conditions. This study strongly suggests that one Norway spruce homolog to the FT gene, which controls flowering in angiosperms, is also a key integrator of photoperiodic and thermal signals in the control of growth rhythms in gymnosperms. The data also indicate that the divergent adaptive bud set responses of northern and southern Norway spruce populations, both to photoperiod and light quality, are mediated through PaFT4. These results provide a major advance in our understanding of the molecular control of a major adaptive trait in conifers and a tool for further molecular studies of adaptive variation in plants.

  19. Prediction model of potential hepatocarcinogenicity of rat hepatocarcinogens using a large-scale toxicogenomics database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uehara, Takeki; Minowa, Yohsuke

    2011-09-15

    The present study was performed to develop a robust gene-based prediction model for early assessment of potential hepatocarcinogenicity of chemicals in rats by using our toxicogenomics database, TG-GATEs (Genomics-Assisted Toxicity Evaluation System developed by the Toxicogenomics Project in Japan). The positive training set consisted of high- or middle-dose groups that received 6 different non-genotoxic hepatocarcinogens during a 28-day period. The negative training set consisted of high- or middle-dose groups of 54 non-carcinogens. Support vector machine combined with wrapper-type gene selection algorithms was used for modeling. Consequently, our best classifier yielded prediction accuracies for hepatocarcinogenicity of 99% sensitivity and 97% specificity in the training data set, and false positive prediction was almost completely eliminated. Pathway analysis of feature genes revealed that the mitogen-activated protein kinase p38- and phosphatidylinositol-3-kinase-centered interactome and the v-myc myelocytomatosis viral oncogene homolog-centered interactome were the 2 most significant networks. The usefulness and robustness of our predictor were further confirmed in an independent validation data set obtained from the public database. Interestingly, similar positive predictions were obtained in several genotoxic hepatocarcinogens as well as non-genotoxic hepatocarcinogens. These results indicate that the expression profiles of our newly selected candidate biomarker genes might be common characteristics in the early stage of carcinogenesis for both genotoxic and non-genotoxic carcinogens in the rat liver. Our toxicogenomic model might be useful for the prospective screening of hepatocarcinogenicity of compounds and prioritization of compounds for carcinogenicity testing. Highlights: We developed a toxicogenomic model to predict hepatocarcinogenicity of chemicals. The optimized model consisting of 9 probes had 99% sensitivity and 97% specificity. This model enables us to detect genotoxic as well as non-genotoxic hepatocarcinogens.

  20. A Norway Spruce FLOWERING LOCUS T Homolog Is Implicated in Control of Growth Rhythm in Conifers

    PubMed Central

    Gyllenstrand, Niclas; Clapham, David; Källman, Thomas; Lagercrantz, Ulf

    2007-01-01

    Growth in perennial plants possesses an annual cycle of active growth and dormancy that is controlled by environmental factors, mainly photoperiod and temperature. In conifers and other nonangiosperm species, the molecular mechanisms behind these responses are currently unknown. In Norway spruce (Picea abies L. Karst.) seedlings, growth cessation and bud set are induced by short days and plants from southern latitudes require at least 7 to 10 h of darkness, whereas plants from northern latitudes need only 2 to 3 h of darkness. Bud burst, on the other hand, is almost exclusively controlled by temperature. To test the possible role of Norway spruce FLOWERING LOCUS T (FT)-like genes in growth rhythm, we have studied expression patterns of four Norway spruce FT family genes in two populations with a divergent bud set response under various photoperiodic conditions. Our data show a significant and tight correlation between growth rhythm (both bud set and bud burst), and expression pattern of one of the four Norway spruce phosphatidylethanolamine-binding protein gene family members (PaFT4) over a variety of experimental conditions. This study strongly suggests that one Norway spruce homolog to the FT gene, which controls flowering in angiosperms, is also a key integrator of photoperiodic and thermal signals in the control of growth rhythms in gymnosperms. The data also indicate that the divergent adaptive bud set responses of northern and southern Norway spruce populations, both to photoperiod and light quality, are mediated through PaFT4. These results provide a major advance in our understanding of the molecular control of a major adaptive trait in conifers and a tool for further molecular studies of adaptive variation in plants. PMID:17369429
