Sample records for gene model validation

  1. Validation of reference genes for quantitative gene expression analysis in experimental epilepsy.

    PubMed

    Sadangi, Chinmaya; Rosenow, Felix; Norwood, Braxton A

    2017-12-01

    To grasp the molecular mechanisms and pathophysiology underlying epilepsy development (epileptogenesis) and epilepsy itself, it is important to understand the gene expression changes that occur during these phases. Quantitative real-time polymerase chain reaction (qPCR) is a technique that rapidly and accurately determines gene expression changes. It is crucial, however, that stable reference genes are selected for each experimental condition to ensure that accurate values are obtained for genes of interest. If reference genes are unstably expressed, this can lead to inaccurate data and erroneous conclusions. To date, epilepsy studies have used mostly single, nonvalidated reference genes. This is the first study to systematically evaluate reference genes in male Sprague-Dawley rat models of epilepsy. We assessed 15 potential reference genes in hippocampal tissue obtained from 2 different models during epileptogenesis, 1 model during chronic epilepsy, and a model of noninjurious seizures. Reference gene ranking varied between models and also differed between epileptogenesis and chronic epilepsy time points. There was also some variance between the four mathematical models used to rank reference genes. Notably, we found novel reference genes to be more stably expressed than those most often used in experimental epilepsy studies. The consequence of these findings is that reference genes suitable for one epilepsy model may not be appropriate for others and that reference genes can change over time. It is, therefore, critically important to validate potential reference genes before using them as normalizing factors in expression analysis in order to ensure accurate, valid results. © 2017 Wiley Periodicals, Inc.

  2. A whole blood gene expression-based signature for smoking status

    PubMed Central

    2012-01-01

    Background: Smoking is the leading cause of preventable death worldwide and has been shown to increase the risk of multiple diseases including coronary artery disease (CAD). We sought to identify genes whose levels of expression in whole blood correlate with self-reported smoking status. Methods: Microarrays were used to identify gene expression changes in whole blood which correlated with self-reported smoking status; a set of significant genes from the microarray analysis were validated by qRT-PCR in an independent set of subjects. Stepwise forward logistic regression was performed using the qRT-PCR data to create a predictive model whose performance was validated in an independent set of subjects and compared to cotinine, a nicotine metabolite. Results: Microarray analysis of whole blood RNA from 209 PREDICT subjects (41 current smokers, 4 quit ≤ 2 months, 64 quit > 2 months, 100 never smoked; NCT00500617) identified 4214 genes significantly correlated with self-reported smoking status. qRT-PCR was performed on 1,071 PREDICT subjects across 256 microarray genes significantly correlated with smoking or CAD. A five gene (CLDND1, LRRN3, MUC1, GOPC, LEF1) predictive model, derived from the qRT-PCR data using stepwise forward logistic regression, had a cross-validated mean AUC of 0.93 (sensitivity=0.78; specificity=0.95), and was validated using 180 independent PREDICT subjects (AUC=0.82, CI 0.69-0.94; sensitivity=0.63; specificity=0.94). Plasma from the 180 validation subjects was used to assess levels of cotinine; a model using a threshold of 10 ng/ml cotinine resulted in an AUC of 0.89 (CI 0.81-0.97; sensitivity=0.81; specificity=0.97; kappa with expression model = 0.53). Conclusion: We have constructed and validated a whole blood gene expression score for the evaluation of smoking status, demonstrating that clinical and environmental factors contributing to cardiovascular disease risk can be assessed by gene expression. PMID:23210427
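
    The performance figures quoted in this record (AUC, sensitivity, specificity at a cotinine threshold of 10 ng/ml) can be illustrated with a minimal sketch. The cotinine values, labels, and function names below are hypothetical and are not taken from the study; the AUC is computed with the generic rank-based definition, which is not necessarily how the authors computed theirs.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a subject positive (smoker) when score >= threshold; labels are 1/0."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive subject has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical plasma cotinine levels (ng/ml); 1 = self-reported current smoker.
cotinine = [250.0, 120.0, 8.0, 45.0, 2.0, 0.5, 12.0, 1.0]
smoker = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(cotinine, smoker, threshold=10.0)
area = auc(cotinine, smoker)
print(sens, spec, area)
```

    The same two functions apply unchanged to a gene expression score in place of cotinine, which is how the two classifiers in the abstract can be compared on equal terms.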

  3. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.

  4. In silico selection of expression reference genes with demonstrated stability in barley among a diverse set of tissues and cultivars

    USDA-ARS's Scientific Manuscript database

    Premise of the study: Reference genes are selected based on the assumption of temporal and spatial expression stability and on their widespread use in model species. They are often used in new target species without validation, presumed as stable. For barley, reference gene validation is lacking, bu...

  5. Use of Bayesian Networks to Probabilistically Model and Improve the Likelihood of Validation of Microarray Findings by RT-PCR

    PubMed Central

    English, Sangeeta B.; Shih, Shou-Ching; Ramoni, Marco F.; Smith, Lois E.; Butte, Atul J.

    2014-01-01

    Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation. PMID:18790084

  6. Genotet: An Interactive Web-based Visual Exploration Framework to Support Validation of Gene Regulatory Networks.

    PubMed

    Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T

    2014-12-01

    Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and one of the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks) each with a large corpus of supporting data and gene-annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacteria (a small genome with high data-completeness).

  7. Computational discovery and in vivo validation of hnf4 as a regulatory gene in planarian regeneration.

    PubMed

    Lobo, Daniel; Morokuma, Junji; Levin, Michael

    2016-09-01

    Automated computational methods can infer dynamic regulatory network models directly from temporal and spatial experimental data, such as genetic perturbations and their resultant morphologies. Recently, a computational method was able to reverse-engineer the first mechanistic model of planarian regeneration that can recapitulate the main anterior-posterior patterning experiments published in the literature. Validating this comprehensive regulatory model via novel experiments that had not yet been performed would add to our understanding of the remarkable regeneration capacity of planarian worms and demonstrate the power of this automated methodology. Using the Michigan Molecular Interactions and STRING databases and the MoCha software tool, we characterized as hnf4 an unknown regulatory gene predicted to exist by the reverse-engineered dynamic model of planarian regeneration. Then, we used the dynamic model to predict the morphological outcomes under different single and multiple knock-downs (RNA interference) of hnf4 and its predicted gene pathway interactors β-catenin and hh. Interestingly, the model predicted that RNAi of hnf4 would rescue the abnormal regenerated phenotype (tailless) of RNAi of hh in amputated trunk fragments. Finally, we validated these predictions in vivo by performing the same surgical and genetic experiments with planarian worms, obtaining the same phenotypic outcomes predicted by the reverse-engineered model. These results suggest that hnf4 is a regulatory gene in planarian regeneration, validate the computational predictions of the reverse-engineered dynamic model, and demonstrate the automated methodology for the discovery of novel genes, pathways and experimental phenotypes. © The Author 2016. Published by Oxford University Press. All rights reserved.

  8. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

    PubMed Central

    O'Connor, Timothy R.; Bailey, Timothy L.

    2014-01-01

    Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
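
    The linking step described in this record (correlating a histone mark at a CRM with expression of genes within 1 Mbp, across tissues) can be sketched as follows. This is only an illustration under stated assumptions: the rule of picking the single best-correlated gene, the function names, and all numbers are hypothetical and are not the authors' implementation.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def link_crm_to_gene(crm_pos, crm_signal, genes, max_dist=1_000_000):
    """Return (gene, r) for the gene within max_dist bp of the CRM whose
    cross-tissue expression correlates best with the CRM signal, or None."""
    best = None
    for name, (tss, expression) in genes.items():
        if abs(tss - crm_pos) > max_dist:
            continue  # outside the 1 Mbp window
        r = pearson(crm_signal, expression)
        if best is None or r > best[1]:
            best = (name, r)
    return best


# Hypothetical H3K27ac signal at one CRM in four tissues, and
# hypothetical (TSS position, expression per tissue) for three genes.
crm_signal = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": (1_200_000, [2.0, 4.1, 5.9, 8.0]),  # 200 kb away, tracks the CRM
    "geneB": (1_050_000, [5.0, 5.0, 4.0, 1.0]),  # nearest gene, anti-correlated
    "geneC": (3_500_000, [1.0, 2.0, 3.0, 4.0]),  # correlated but beyond 1 Mbp
}
best = link_crm_to_gene(1_000_000, crm_signal, genes)
print(best)
```

    In this toy input the selected gene is not the nearest one, which is the kind of long-range link a 'nearest neighbor' heuristic would miss.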

  9. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

    PubMed

    Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

    2015-03-01

    A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Reverse transcription quantitative real-time polymerase chain reaction reference genes in the spared nerve injury model of neuropathic pain: validation and literature search.

    PubMed

    Piller, Nicolas; Decosterd, Isabelle; Suter, Marc R

    2013-07-10

    The reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a widely used, highly sensitive laboratory technique to rapidly and easily detect, identify and quantify gene expression. Reliable RT-qPCR data necessitates accurate normalization with validated control genes (reference genes) whose expression is constant in all studied conditions. This stability has to be demonstrated. We performed a literature search for studies using quantitative or semi-quantitative PCR in the rat spared nerve injury (SNI) model of neuropathic pain to verify whether any reference genes had previously been validated. We then analyzed the stability over time of 7 commonly used reference genes in the nervous system - specifically in the spinal cord dorsal horn and the dorsal root ganglion (DRG). These were: Actin beta (Actb), Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ribosomal proteins 18S (18S), L13a (RPL13a) and L29 (RPL29), hypoxanthine phosphoribosyltransferase 1 (HPRT1) and hydroxymethylbilane synthase (HMBS). We compared the candidate genes and established a stability ranking using the geNorm algorithm. Finally, we assessed the number of reference genes necessary for accurate normalization in this neuropathic pain model. We found GAPDH, HMBS, Actb, HPRT1 and 18S cited as reference genes in literature on studies using the SNI model. Only HPRT1 and 18S had been once previously demonstrated as stable in RT-qPCR arrays. All the genes tested in this study, using the geNorm algorithm, presented gene stability values (M-value) acceptable enough for them to qualify as potential reference genes in both DRG and spinal cord. Using the coefficient of variation, 18S failed the 50% cut-off with a value of 61% in the DRG. The two most stable genes in the dorsal horn were RPL29 and RPL13a; in the DRG they were HPRT1 and Actb. Using a 0.15 cut-off for pairwise variations we found that any pair of stable reference genes was sufficient for the normalization process. In the rat SNI model, we validated and ranked Actb, RPL29, RPL13a, HMBS, GAPDH, HPRT1 and 18S as good reference genes in the spinal cord. In the DRG, 18S did not fulfill stability criteria. The combination of any two stable reference genes was sufficient to provide an accurate normalization.
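
    The two stability measures named in this record can be sketched in a few lines. The geNorm M-value of a gene is taken here as the mean, over all other candidate genes, of the standard deviation across samples of the log2 expression ratio; the coefficient of variation is the sample standard deviation divided by the mean. The expression values and function names are hypothetical, and the study's exact CV calculation may differ from this sketch.

```python
from math import log2
from statistics import mean, stdev


def genorm_m(expr):
    """expr maps gene -> relative quantities, one per sample.
    M for gene j = mean over other genes k of SD(log2(expr_j / expr_k))."""
    m = {}
    for j, qj in expr.items():
        pairwise_sd = []
        for k, qk in expr.items():
            if k == j:
                continue
            ratios = [log2(a / b) for a, b in zip(qj, qk)]
            pairwise_sd.append(stdev(ratios))
        m[j] = mean(pairwise_sd)
    return m


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical relative quantities in four samples.
expr = {
    "RPL29": [1.0, 1.1, 0.9, 1.0],
    "RPL13a": [2.0, 2.2, 1.8, 2.0],
    "18S": [1.0, 3.0, 0.4, 2.0],
}
m_values = genorm_m(expr)
ranking = sorted(m_values, key=m_values.get)  # most stable (lowest M) first
print(ranking, {g: round(cv_percent(q), 1) for g, q in expr.items()})
```

    A lower M means the gene's expression ratio to the other candidates is more constant across samples, so the gene at the end of the ranking is the least stable.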

  11. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

    PubMed

    Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

    2007-02-01

    Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.
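
    The likelihood ratio test mentioned in this record compares the log-likelihoods of two nested models. The sketch below assumes the models differ by two free parameters, in which case the chi-square tail probability has the closed form exp(-x/2); the abstract does not state the degrees of freedom, and the log-likelihood values and function name are hypothetical.

```python
from math import exp


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two parameters.
    Returns (statistic, p-value); for 2 degrees of freedom the chi-square
    survival function is exactly exp(-statistic / 2)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, exp(-stat / 2.0)


# Hypothetical log-likelihoods: null model without a positively selected
# site class, alternative model with one.
stat, p = lrt_df2(lnl_null=-1234.56, lnl_alt=-1229.10)
print(round(stat, 2), round(p, 5))
```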

  12. Genetic risk prediction and neurobiological understanding of alcoholism

    PubMed Central

    Levey, D F; Le-Niculescu, H; Frank, J; Ayalew, M; Jain, N; Kirlin, B; Learman, R; Winiger, E; Rodd, Z; Shekhar, A; Schork, N; Kiefer, F; Wodarz, N; Müller-Myhsok, B; Dahmen, N; Nöthen, M; Sherva, R; Farrer, L; Smith, A H; Kranzler, H R; Rietschel, M; Gelernter, J; Niculescu, A B

    2014-01-01

    We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene-level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A panel of all the nominally significant P-value SNPs in the top candidate genes discovered by CFG (n=135 genes, 713 SNPs) was used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P=0.053) in separating alcohol dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. In order to validate and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n=11 genes, 66 SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model showed better predictive ability in the independent German test cohort (P=0.041). The top CFG scoring gene for alcoholism from the initial discovery step, synuclein alpha (SNCA) remained the top gene after the stress-reactive animal model cross-validation. We also tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence (P=0.00012) and one with alcohol abuse (a less severe form of alcoholism; P=0.0094). SNCA by itself was able to separate alcoholics from controls in the alcohol-dependent cohort (P=0.000013) and the alcohol abuse cohort (P=0.023). So did eight other genes from the panel of 11 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRM3 and MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse (including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, anxiety from previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed the significant genetic overlap with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the broader context of modulating the mental landscape. PMID:24844177
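
    The Bonferroni criterion referred to in this record can be illustrated with the two SNCA P-values quoted above. The number of comparisons is an assumption: the abstract does not say how many tests were corrected for, so the sketch uses the 11 genes of the small panel purely for illustration, and the function name is hypothetical.

```python
def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Keep the entries whose P-value is below alpha / n_tests."""
    threshold = alpha / n_tests
    return {name: p for name, p in p_values.items() if p < threshold}


# SNCA P-values as quoted in the abstract; n_tests=11 is an assumption.
snca = {
    "alcohol dependence cohort": 0.000013,
    "alcohol abuse cohort": 0.023,
}
survivors = bonferroni_survivors(snca, n_tests=11)
print(survivors)
```

    Under this assumed correction the threshold is 0.05 / 11, roughly 0.0045, so only the alcohol dependence result would pass.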

  13. A Gene Signature to Determine Metastatic Behavior in Thymomas

    PubMed Central

    Gökmen-Polar, Yesim; Wilkinson, Jeff; Maetzold, Derek; Stone, John F.; Oelschlager, Kristen M.; Vladislav, Ioan Tudor; Shirar, Kristen L.; Kesler, Kenneth A.; Loehrer, Patrick J.; Badve, Sunil

    2013-01-01

    Purpose: Thymoma represents one of the rarest of all malignancies. Stage and completeness of resection have been used to ascertain postoperative therapeutic strategies albeit with limited prognostic accuracy. A molecular classifier would be useful to improve the assessment of metastatic behaviour and optimize patient management. Methods: qRT-PCR assay for 23 genes (19 test and four reference genes) was performed on multi-institutional archival primary thymomas (n = 36). Gene expression levels were used to compute a signature, classifying tumors into classes 1 and 2, corresponding to low or high likelihood for metastases. The signature was validated in an independent multi-institutional cohort of patients (n = 75). Results: A nine-gene signature that can predict metastatic behavior of thymomas was developed and validated. Using radial basis machine modeling in the training set, 5-year and 10-year metastasis-free survival rates were 77% and 26% for predicted low (class 1) and high (class 2) risk of metastasis (P = 0.0047, log-rank), respectively. For the validation set, 5-year metastasis-free survival rates were 97% and 30% for predicted low- and high-risk patients (P = 0.0004, log-rank), respectively. The 5-year metastasis-free survival rates for the validation set were 49% and 41% for Masaoka stages I/II and III/IV (P = 0.0537, log-rank), respectively. In univariate and multivariate Cox models evaluating common prognostic factors for thymoma metastasis, the nine-gene signature was the only independent indicator of metastases (P = 0.036). Conclusion: A nine-gene signature was established and validated which predicts the likelihood of metastasis more accurately than traditional staging. This further underscores the biologic determinants of the clinical course of thymoma and may improve patient management. PMID:23894276

  14. Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study

    PubMed Central

    Shedden, Kerby; Taylor, Jeremy M.G.; Enkemann, Steve A.; Tsao, Ming S.; Yeatman, Timothy J.; Gerald, William L.; Eschrich, Steve; Jurisica, Igor; Venkatraman, Seshan E.; Meyerson, Matthew; Kuick, Rork; Dobbin, Kevin K.; Lively, Tracy; Jacobson, James W.; Beer, David G.; Giordano, Thomas J.; Misek, David E.; Chang, Andrew C.; Zhu, Chang Qi; Strumpf, Dan; Hanash, Samir; Shepherd, Francis A.; Ding, Kuyue; Seymour, Lesley; Naoki, Katsuhiko; Pennell, Nathan; Weir, Barbara; Verhaak, Roel; Ladd-Acosta, Christine; Golub, Todd; Gruidl, Mike; Szoke, Janos; Zakowski, Maureen; Rusch, Valerie; Kris, Mark; Viale, Agnes; Motoi, Noriko; Travis, William; Sharma, Anupama

    2009-01-01

    Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) can be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas. PMID:18641660

  15. Prediction of chemo-response in serous ovarian cancer.

    PubMed

    Gonzalez Bosquet, Jesus; Newtson, Andreea M; Chung, Rebecca K; Thiel, Kristina W; Ginader, Timothy; Goodheart, Michael J; Leslie, Kimberly K; Smith, Brian J

    2016-10-19

    Nearly one-third of serous ovarian cancer (OVCA) patients will not respond to initial treatment with surgery and chemotherapy and die within one year of diagnosis. If patients who are unlikely to respond to current standard therapy can be identified up front, enhanced tumor analyses and treatment regimens could potentially be offered. Using the Cancer Genome Atlas (TCGA) serous OVCA database, we previously identified a robust molecular signature of 422 genes associated with chemo-response. Our objective was to test whether this signature is an accurate and sensitive predictor of chemo-response in serous OVCA. We first constructed prediction models to predict chemo-response using our previously described 422-gene signature that was associated with response to treatment in serous OVCA. Performance of all prediction models was measured with the area under the curve (AUC, a measure of the model's accuracy) and its confidence interval (CI). To optimize the prediction process, we determined which elements of the signature most contributed to chemo-response prediction. All prediction models were replicated and validated using six publicly available independent gene expression datasets. The 422-gene signature prediction models predicted chemo-response with AUCs of ~70 %. Optimization of prediction models identified the 34 most important genes in chemo-response prediction. These 34-gene models had improved performance, with AUCs approaching 80 %. Both 422-gene and 34-gene prediction models were replicated and validated in six independent datasets. These prediction models serve as the foundation for the future development and implementation of a diagnostic tool to predict response to chemotherapy for serous OVCA patients.

  16. Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

    PubMed Central

    Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

    2013-01-01

    Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129

  17. Selection of appropriate reference genes for RT-qPCR analysis in a streptozotocin-induced Alzheimer's disease model of cynomolgus monkeys (Macaca fascicularis).

    PubMed

    Park, Sang-Je; Kim, Young-Hyun; Lee, Youngjeon; Kim, Kyoung-Min; Kim, Heui-Soo; Lee, Sang-Rae; Kim, Sun-Uk; Kim, Sang-Hyun; Kim, Ji-Su; Jeong, Kang-Jin; Lee, Kyoung-Min; Huh, Jae-Won; Chang, Kyu-Tae

    2013-01-01

    Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) has been widely used to quantify relative gene expression because of the specificity, sensitivity, and accuracy of this technique. In order to obtain reliable gene expression data from RT-qPCR experiments, it is important to utilize optimal reference genes for the normalization of target gene expression under varied experimental conditions. Previously, we developed and validated a novel icv-STZ cynomolgus monkey model for Alzheimer's disease (AD) research. However, in order to enhance the reliability of this disease model, appropriate reference genes must be selected to allow meaningful analysis of the gene expression levels in the icv-STZ cynomolgus monkey brain. In this study, we assessed the expression stability of 9 candidate reference genes in 2 matched-pair brain samples (5 regions) of control cynomolgus monkeys and those who had received intracerebroventricular injection of streptozotocin (icv-STZ). Three well-known analytical programs geNorm, NormFinder, and BestKeeper were used to choose the suitable reference genes from the total sample group, control group, and icv-STZ group. Combination analysis of the 3 different programs clearly indicated that the ideal reference genes are RPS19 and YWHAZ in the total sample group, GAPDH and RPS19 in the control group, and ACTB and GAPDH in the icv-STZ group. Additionally, we validated the normalization accuracy of the most appropriate reference genes (RPS19 and YWHAZ) by comparison with the least stable gene (TBP) using quantification of the APP and MAPT genes in the total sample group. To the best of our knowledge, this research is the first study to identify and validate the appropriate reference genes in cynomolgus monkey brains. These findings provide useful information for future studies involving the expression of target genes in the cynomolgus monkey.
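
    Normalizing a target gene against two reference genes, as this record does with RPS19 and YWHAZ, is commonly done with the 2^-ΔΔCt calculation. The sketch below assumes 100% amplification efficiency and uses hypothetical Ct values and a hypothetical function name; it is a generic illustration, not the authors' pipeline.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_ctrl, ct_refs_ctrl):
    """Fold change by 2^-ddCt, assuming 100% amplification efficiency.
    Averaging the reference Ct values is equivalent to normalizing to the
    geometric mean of the reference gene quantities."""
    d_ct_sample = ct_target - mean(ct_refs)
    d_ct_control = ct_target_ctrl - mean(ct_refs_ctrl)
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical Ct values: APP as target, RPS19 and YWHAZ as references.
fold = relative_expression(
    ct_target=24.0, ct_refs=[18.0, 20.0],            # treated sample
    ct_target_ctrl=25.0, ct_refs_ctrl=[18.0, 20.0],  # control sample
)
print(fold)
```

    If the reference genes themselves shifted between conditions, the same target Ct values would yield a different fold change, which is why an unstable reference such as the TBP comparison in the abstract can distort the result.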

  18. Genetic risk prediction and neurobiological understanding of alcoholism.

    PubMed

    Levey, D F; Le-Niculescu, H; Frank, J; Ayalew, M; Jain, N; Kirlin, B; Learman, R; Winiger, E; Rodd, Z; Shekhar, A; Schork, N; Kiefer, F; Kiefe, F; Wodarz, N; Müller-Myhsok, B; Dahmen, N; Nöthen, M; Sherva, R; Farrer, L; Smith, A H; Kranzler, H R; Rietschel, M; Gelernter, J; Niculescu, A B

    2014-05-20

    We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene-level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A panel of all the nominally significant P-value single-nucleotide polymorphisms (SNPs) in the top candidate genes discovered by CFG (n=135 genes, 713 SNPs) was used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P=0.053) in separating alcohol-dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. In order to validate and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n=11 genes, 66 SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model, showed better predictive ability in the independent German test cohort (P=0.041). The top CFG scoring gene for alcoholism from the initial discovery step, synuclein alpha (SNCA), remained the top gene after the stress-reactive animal model cross-validation. We also tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence (P=0.00012) and one with alcohol abuse (a less severe form of alcoholism; P=0.0094). SNCA by itself was able to separate alcoholics from controls in the alcohol-dependent cohort (P=0.000013) and the alcohol abuse cohort (P=0.023). So did eight other genes from the panel of 11 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRM3 and MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse (including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, and anxiety from previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed a significant genetic overlap with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the broader context of modulating the mental landscape.

  19. Gene expression complex networks: synthesis, identification, and analysis.

    PubMed

    Lopes, Fabrício M; Cesar, Roberto M; Costa, Luciano Da F

    2011-10-01

    Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree variation, with its network recovery rate decreasing as the average degree increased. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive to distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
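
    Step (3) of the framework above, comparing an identified network with the original one, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes undirected edges stored as sorted index pairs, uses a simple Erdős-Rényi generator as the artificial network, and scores recovery with precision and recall. The function names and the toy edge sets are hypothetical.

```python
import itertools
import random


def erdos_renyi_edges(n_genes, p, rng):
    """Generate an undirected ER network: each gene pair is linked with probability p."""
    return {
        pair
        for pair in itertools.combinations(range(n_genes), 2)
        if rng.random() < p
    }


def recovery_scores(true_edges, inferred_edges):
    """Return (precision, recall) of an inferred edge set against the original network."""
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall


# Artificial network to serve as ground truth (seeded for repeatability)
rng = random.Random(0)
artificial_network = erdos_renyi_edges(n_genes=20, p=0.1, rng=rng)

# Small hand-made case: two of the three inferred edges are correct
true_edges = {(0, 1), (1, 2), (2, 3)}
inferred_edges = {(0, 1), (1, 2), (0, 3)}
precision, recall = recovery_scores(true_edges, inferred_edges)
print(precision, recall)
```

    In a full validation run, the inferred edge set would come from applying the inference method to expression data simulated from the artificial network, and the scores could be tracked as the average degree changes.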

  20. Gene network biological validity based on gene-gene interaction relevance.

    PubMed

    Gómez-Vela, Francisco; Díaz-Díaz, Norberto

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.

  1. Selection and validation of reliable housekeeping genes to evaluate Piscirickettsia salmonis gene expression.

    PubMed

    Flores-Herrera, Patricio; Arredondo-Zelada, Oscar; Marshall, Sergio H; Gómez, Fernando A

    2018-06-01

    Piscirickettsia salmonis is a highly aggressive facultative intracellular bacterium that challenges the sustainability of Chilean salmon production. Due to the limited knowledge of its biology, there is a need to identify key molecular markers that could help define the pathogenic potential of this bacterium. We think a model system should be implemented that efficiently evaluates the expression of putative bacterial markers by using validated, stable, and highly specific housekeeping genes to properly select target genes, which could lead to identifying those responsible for infection and disease induction in naturally infected fish. Here, we selected a set of validated reference or housekeeping genes for RT-qPCR expression analyses of P. salmonis under different growth and stress conditions, including an in vitro infection kinetic. After a thorough screening, we selected sdhA as the most reliable housekeeping gene able to represent stable and highly specific host reference genes for RT-qPCR-driven P. salmonis analysis. Copyright © 2018. Published by Elsevier B.V.

  2. Gene Environment Interactions and Predictors of Colorectal Cancer in Family-Based, Multi-Ethnic Groups.

    PubMed

    Shiao, S Pamela K; Grayson, James; Yu, Chong Ho; Wasek, Brandi; Bottiglieri, Teodoro

    2018-02-16

    For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05) on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp, except MTHFR A1298C. Four racial groups presented different polymorphism rates for four genes (all p < 0.05) except MTHFR A1298C. Following the ensemble method, the most influential factors were identified, and the best predictive models were generated by using the generalized regression models, with Akaike's information criterion and leave-one-out cross validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and the exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene-environment interactions in the prevention of CRC.

  3. Identifying differentially expressed genes in cancer patients using a non-parameter Ising model.

    PubMed

    Li, Xumeng; Feltus, Frank A; Sun, Xiaoqian; Wang, James Z; Luo, Feng

    2011-10-01

    Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than those by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by Ising model can improve classification performance compared with DE genes identified by Markov random field model. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Gene expression profile of mouse prostate tumors reveals dysregulations in major biological processes and identifies potential murine targets for preclinical development of human prostate cancer therapy.

    PubMed

    Haram, Kerstyn M; Peltier, Heidi J; Lu, Bin; Bhasin, Manoj; Otu, Hasan H; Choy, Bob; Regan, Meredith; Libermann, Towia A; Latham, Gary J; Sanda, Martin G; Arredouani, Mohamed S

    2008-10-01

    Translation of preclinical studies into effective human cancer therapy is hampered by the lack of defined molecular expression patterns in mouse models that correspond to the human counterpart. We sought to generate an open source TRAMP mouse microarray dataset and to use this array to identify differentially expressed genes from human prostate cancer (PCa) that have concordant expression in TRAMP tumors, and thereby represent lead targets for preclinical therapy development. We performed microarrays on total RNA extracted and amplified from eight TRAMP tumors and nine normal prostates. A subset of differentially expressed genes was validated by QRT-PCR. Differentially expressed TRAMP genes were analyzed for concordant expression in publicly available human prostate array datasets and a subset of resulting genes was analyzed by QRT-PCR. Cross-referencing differentially expressed TRAMP genes to public human prostate array datasets revealed 66 genes with concordant expression in mouse and human PCa; 56 between metastases and normal and 10 between primary tumor and normal tissues. Of these 10 genes, two, Sox4 and Tubb2a, were validated by QRT-PCR. Our analysis also revealed various dysregulations in major biologic pathways in the TRAMP prostates. We report a TRAMP microarray dataset of which a gene subset was validated by QRT-PCR with expression patterns consistent with previous gene-specific TRAMP studies. Concordance analysis between TRAMP and human PCa associated genes supports the utility of the model and suggests several novel molecular targets for preclinical therapy.

  5. Validation of RNAi Silencing Efficiency Using Gene Array Data shows 18.5% Failure Rate across 429 Independent Experiments.

    PubMed

    Munkácsy, Gyöngyi; Sztupinszki, Zsófia; Herman, Péter; Bán, Bence; Pénzváltó, Zsófia; Szarvas, Nóra; Győrffy, Balázs

    2016-09-27

    No independent cross-validation of success rate for studies utilizing small interfering RNA (siRNA) for gene silencing has been completed before. To assess the influence of experimental parameters like cell line, transfection technique, validation method, and type of control, we have to validate these in a large set of studies. We utilized gene chip data published for siRNA experiments to assess success rate and to compare methods used in these experiments. We searched NCBI GEO for samples with whole transcriptome analysis before and after gene silencing and evaluated the efficiency for the target and off-target genes using the array-based expression data. Wilcoxon signed-rank test was used to assess silencing efficacy and Kruskal-Wallis tests and Spearman rank correlation were used to evaluate study parameters. All together 1,643 samples representing 429 experiments published in 207 studies were evaluated. The fold change (FC) of down-regulation of the target gene was above 0.7 in 18.5% and was above 0.5 in 38.7% of experiments. Silencing efficiency was lowest in MCF7 and highest in SW480 cells (FC = 0.59 and FC = 0.30, respectively, P = 9.3E-06). Studies utilizing Western blot for validation performed better than those with quantitative polymerase chain reaction (qPCR) or microarray (FC = 0.43, FC = 0.47, and FC = 0.55, respectively, P = 2.8E-04). There was no correlation between type of control, transfection method, publication year, and silencing efficiency. Although gene silencing is a robust feature successfully cross-validated in the majority of experiments, efficiency remained insufficient in a significant proportion of studies. Selection of cell line model and validation method had the highest influence on silencing proficiency.
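
    The thresholds quoted above (fold change above 0.7 or above 0.5) can be made concrete with a short sketch. It assumes, as the abstract does not state explicitly, that the fold change is the target gene's expression after silencing divided by its expression before silencing on a linear scale, so that smaller values mean stronger knockdown. The expression values are hypothetical and chosen only to illustrate the calculation.

```python
def fold_change(after, before):
    """Fold change of the target gene: expression after silencing / before silencing."""
    return after / before


def fraction_above(fold_changes, threshold):
    """Fraction of experiments whose fold change exceeds the threshold (weak knockdown)."""
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


# Hypothetical (before, after) expression pairs for four experiments
experiments = [(100.0, 20.0), (100.0, 45.0), (100.0, 60.0), (100.0, 80.0)]
fold_changes = [fold_change(after, before) for before, after in experiments]

failed_at_07 = fraction_above(fold_changes, 0.7)  # share with FC above 0.7
failed_at_05 = fraction_above(fold_changes, 0.5)  # share with FC above 0.5
print(fold_changes, failed_at_07, failed_at_05)
```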

  6. Systems biology approach to late-onset Alzheimer's disease genome-wide association study identifies novel candidate genes validated using brain expression data and Caenorhabditis elegans experiments.

    PubMed

    Mukherjee, Shubhabrata; Russell, Joshua C; Carr, Daniel T; Burgess, Jeremy D; Allen, Mariet; Serie, Daniel J; Boehme, Kevin L; Kauwe, John S K; Naj, Adam C; Fardo, David W; Dickson, Dennis W; Montine, Thomas J; Ertekin-Taner, Nilufer; Kaeberlein, Matt R; Crane, Paul K

    2017-10-01

    We sought to determine whether a systems biology approach may identify novel late-onset Alzheimer's disease (LOAD) loci. We performed gene-wide association analyses and integrated results with human protein-protein interaction data using network analyses. We performed functional validation on novel genes using a transgenic Caenorhabditis elegans Aβ proteotoxicity model and evaluated novel genes using brain expression data from people with LOAD and other neurodegenerative conditions. We identified 13 novel candidate LOAD genes outside chromosome 19. Of those, RNA interference knockdowns of the C. elegans orthologs of UBC, NDUFS3, EGR1, and ATP5H were associated with Aβ toxicity, and NDUFS3, SLC25A11, ATP5H, and APP were differentially expressed in the temporal cortex. Network analyses identified novel LOAD candidate genes. We demonstrated a functional role for four of these in a C. elegans model and found enrichment of differentially expressed genes in the temporal cortex. Copyright © 2017 the Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  7. The druggable genome and support for target identification and validation in drug development.

    PubMed

    Finan, Chris; Gaulton, Anna; Kruger, Felix A; Lumbers, R Thomas; Shah, Tina; Engmann, Jorgen; Galver, Luana; Kelley, Ryan; Karlsson, Anneli; Santos, Rita; Overington, John P; Hingorani, Aroon D; Casas, Juan P

    2017-03-29

    Target identification (determining the correct drug targets for a disease) and target validation (demonstrating an effect of target perturbation on disease biomarkers and disease end points) are important steps in drug development. Clinically relevant associations of variants in genes encoding drug targets model the effect of modifying the same targets pharmacologically. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, which will enable association studies of druggable genes for drug target selection and validation in human disease. Copyright © 2017, American Association for the Advancement of Science.

  8. Predicting selective drug targets in cancer through metabolic networks

    PubMed Central

    Folger, Ori; Jerby, Livnat; Frezza, Christian; Gottlieb, Eyal; Ruppin, Eytan; Shlomi, Tomer

    2011-01-01

    The interest in studying metabolic alterations in cancer and their potential role as novel targets for therapy has been rejuvenated in recent years. Here, we report the development of the first genome-scale network model of cancer metabolism, validated by correctly identifying genes essential for cellular proliferation in cancer cell lines. The model predicts 52 cytostatic drug targets, of which 40% are targeted by known, approved or experimental anticancer drugs, and the rest are new. It further predicts combinations of synthetic lethal drug targets, whose synergy is validated using available drug efficacy and gene expression measurements across the NCI-60 cancer cell line collection. Finally, potential selective treatments for specific cancers that depend on cancer type-specific downregulation of gene expression and somatic mutations are compiled. PMID:21694718

  9. Simulating pattern-process relationships to validate landscape genetic models

    Treesearch

    A. J. Shirk; S. A. Cushman; E. L. Landguth

    2012-01-01

    Landscapes may resist gene flow and thereby give rise to a pattern of genetic isolation within a population. The mechanism by which a landscape resists gene flow can be inferred by evaluating the relationship between landscape models and an observed pattern of genetic isolation. This approach risks false inferences because researchers can never feasibly test all...

  10. Predictive model for inflammation grades of chronic hepatitis B: Large-scale analysis of clinical parameters and gene expressions.

    PubMed

    Zhou, Weichen; Ma, Yanyun; Zhang, Jun; Hu, Jingyi; Zhang, Menghan; Wang, Yi; Li, Yi; Wu, Lijun; Pan, Yida; Zhang, Yitong; Zhang, Xiaonan; Zhang, Xinxin; Zhang, Zhanqing; Zhang, Jiming; Li, Hai; Lu, Lungen; Jin, Li; Wang, Jiucun; Yuan, Zhenghong; Liu, Jie

    2017-11-01

    Liver biopsy is the gold standard to assess pathological features (e.g., inflammation grades) for hepatitis B virus-infected patients although it is invasive and traumatic; meanwhile, several gene profiles of chronic hepatitis B (CHB) have been separately described in relatively small hepatitis B virus (HBV)-infected samples. We aimed to analyse correlations among inflammation grades, gene expressions and clinical parameters (serum alanine amino transaminase, aspartate amino transaminase and HBV-DNA) in large-scale CHB samples and to predict inflammation grades by using clinical parameters and/or gene expressions. We analysed gene expressions with three clinical parameters in 122 CHB samples by an improved regression model. Principal component analysis and machine-learning methods including Random Forest, K-nearest neighbour and support vector machine were used for analysis and further diagnosis models. Six normal samples were used to validate the predictive model. Significant genes related to clinical parameters were found to be enriched in the immune system, interferon-stimulated, regulation of cytokine production, and anti-apoptosis categories, among others. A panel of these genes with clinical parameters can effectively predict binary classifications of inflammation grade (area under the ROC curve [AUC]: 0.88, 95% confidence interval [CI]: 0.77-0.93), validated by normal samples. A panel with only clinical parameters was also valuable (AUC: 0.78, 95% CI: 0.65-0.86), indicating that liquid biopsy method for detecting the pathology of CHB is possible. This is the first study to systematically elucidate the relationships among gene expressions, clinical parameters and pathological inflammation grades in CHB, and to build models predicting inflammation grades by gene expressions and/or clinical parameters as well. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  11. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records.

    PubMed

    Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter

    2014-09-24

    Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.

  12. Gene Expression Differences in Peripheral Blood of Parkinson’s Disease Patients with Distinct Progression Profiles

    PubMed Central

    Soreq, Lilach; Lobo, Patrícia P.; Mestre, Tiago; Coelho, Miguel; Rosa, Mário M.; Gonçalves, Nilza; Wales, Pauline; Mendes, Tiago; Gerhardt, Ellen; Fahlbusch, Christiane; Bonifati, Vincenzo; Bonin, Michael; Miltenberger-Miltényi, Gabriel; Borovecki, Fran; Soreq, Hermona; Ferreira, Joaquim J.; F. Outeiro, Tiago

    2016-01-01

    The prognosis of neurodegenerative disorders is clinically challenging due to the inexistence of established biomarkers for predicting disease progression. Here, we performed an exploratory cross-sectional, case-control study aimed at determining whether gene expression differences in peripheral blood may be used as a signature of Parkinson’s disease (PD) progression, thereby shedding light into potential molecular mechanisms underlying disease development. We compared transcriptional profiles in the blood from 34 PD patients who developed postural instability within ten years with those of 33 patients who did not develop postural instability within this time frame. Our study identified >200 differentially expressed genes between the two groups. The expression of several of the genes identified was previously found deregulated in animal models of PD and in PD patients. Relevant genes were selected for validation by real-time PCR in a subset of patients. The genes validated were linked to nucleic acid metabolism, mitochondria, immune response and intracellular-transport. Interestingly, we also found deregulation of these genes in a dopaminergic cell model of PD, a simple paradigm that can now be used to further dissect the role of these molecular players on dopaminergic cell loss. Altogether, our study provides preliminary evidence that expression changes in specific groups of genes and pathways, detected in peripheral blood samples, may be correlated with differential PD progression. Our exploratory study suggests that peripheral gene expression profiling may prove valuable for assisting in prediction of PD prognosis, and identifies novel culprits possibly involved in dopaminergic cell death. 
Given the exploratory nature of our study, further investigations using independent, well-characterized cohorts will be essential in order to validate our candidates as predictors of PD prognosis and to definitively confirm the value of gene expression analysis in aiding patient stratification and therapeutic intervention. PMID:27322389

  13. Genetic mouse models relevant to schizophrenia: taking stock and looking forward.

    PubMed

    Harrison, Paul J; Pritchett, David; Stumpenhorst, Katharina; Betts, Jill F; Nissen, Wiebke; Schweimer, Judith; Lane, Tracy; Burnet, Philip W J; Lamsa, Karri P; Sharp, Trevor; Bannerman, David M; Tunbridge, Elizabeth M

    2012-03-01

    Genetic mouse models relevant to schizophrenia complement, and have to a large extent supplanted, pharmacological and lesion-based rat models. The main attraction is that they potentially have greater construct validity; however, they share the fundamental limitations of all animal models of psychiatric disorder, and must also be viewed in the context of the uncertain and complex genetic architecture of psychosis. Some of the key issues, including the choice of gene to target, the manner of its manipulation, gene-gene and gene-environment interactions, and phenotypic characterization, are briefly considered in this commentary, illustrated by the relevant papers reported in this special issue. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. Building and validating a prediction model for paediatric type 1 diabetes risk using next generation targeted sequencing of class II HLA genes.

    PubMed

    Zhao, Lue Ping; Carlsson, Annelie; Larsson, Helena Elding; Forsander, Gun; Ivarsson, Sten A; Kockum, Ingrid; Ludvigsson, Johnny; Marcus, Claude; Persson, Martina; Samuelsson, Ulf; Örtqvist, Eva; Pyo, Chul-Woo; Bolouri, Hamid; Zhao, Michael; Nelson, Wyatt C; Geraghty, Daniel E; Lernmark, Åke

    2017-11-01

    It is of interest to predict possible lifetime risk of type 1 diabetes (T1D) in young children for recruiting high-risk subjects into longitudinal studies of effective prevention strategies. Utilizing a case-control study in Sweden, we applied a recently developed next generation targeted sequencing technology to genotype class II genes and applied an object-oriented regression to build and validate a prediction model for T1D. In the training set, estimated risk scores were significantly different between patients and controls (P = 8.12 × 10^-92), and the area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was 0.917. Using the validation data set, we validated the result with AUC of 0.886. Combining both training and validation data resulted in a predictive model with AUC of 0.903. Further, we performed a "biological validation" by correlating risk scores with 6 islet autoantibodies, and found that the risk score was significantly correlated with IA-2A (Z-score = 3.628, P < 0.001). When applying this prediction model to the Swedish population, where the lifetime T1D risk ranges from 0.5% to 2%, we anticipate identifying approximately 20 000 high-risk subjects after testing all newborns, and this calculation would identify approximately 80% of all patients expected to develop T1D in their lifetime. Through both empirical and biological validation, we have established a prediction model for estimating lifetime T1D risk, using class II HLA. This prediction model should prove useful for future investigations to identify high-risk subjects for prevention research in high-risk populations. Copyright © 2017 John Wiley & Sons, Ltd.
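
    The AUC values reported above summarize how well a risk score separates patients from controls. As a small illustration (not the authors' pipeline), the sketch below computes the AUC directly from its rank interpretation: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs in which the case scores higher.

    Ties contribute 0.5, which matches the area under the empirical ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores (illustrative only)
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]

auc = roc_auc(cases, controls)
print(round(auc, 3))
```

    Here eight of the nine case/control pairs are ordered correctly, so the AUC is 8/9. A model is usually judged on the AUC obtained from a held-out validation set, as in the abstract, because the training-set value tends to be optimistic.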

  15. Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm

    PubMed Central

    Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenging computational problem in systems biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of previous algorithms are not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on a differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, the singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In this generated family of candidate solutions, the gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of the differential equation model, and search for the best network structure. The proposed algorithm is validated on both a simulated scale-free network and a real benchmark gene regulatory network in a networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were compared with those of the proposed algorithm. Genetic algorithm and simulated annealing were also used to evaluate the gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which significantly outperforms previous algorithms. PMID:23226565

  16. Sustained phenotypic reversion of junctional epidermolysis bullosa dog keratinocytes: Establishment of an immunocompetent animal model for cutaneous gene therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spirito, Flavia; Capt, Annabelle; Rio, Marcela Del

    2006-01-20

    Gene transfer represents the unique therapeutic issue for a number of inherited skin disorders including junctional epidermolysis bullosa (JEB), an untreatable genodermatosis caused by mutations in the adhesion ligand laminin 5 (α3β3γ2) that is secreted in the extracellular matrix by the epidermal basal keratinocytes. Because gene therapy protocols require validation in animal models, we have phenotypically reverted, by oncoretroviral transfer of the curative gene, keratinocytes isolated from dogs with a spontaneous form of JEB associated with a genetic mutation in the α3 chain of laminin 5. We show that the transduced dog JEB keratinocytes: (1) display a sustained secretion of laminin 5 in the extracellular matrix; (2) recover the adhesion, proliferation, and clonogenic capacity of wild-type keratinocytes; (3) generate fully differentiated stratified epithelia that after grafting on immunocompromised mice produce phenotypically normal skin and sustain permanent expression of the transgene. We validate an animal model that appears particularly suitable to demonstrate feasibility, efficacy, and safety of genetic therapeutic strategies for cutaneous disorders before undertaking human clinical trials.

  17. Gene expression analysis uncovers novel Hedgehog interacting protein (HHIP) effects in human bronchial epithelial cells

    PubMed Central

    Zhou, Xiaobo; Qiu, Weiliang; Sathirapongsasuti, J. Fah.; Cho, Michael H.; Mancini, John D.; Lao, Taotao; Thibault, Derek M.; Litonjua, Gus; Bakke, Per S.; Gulsvik, Amund; Lomas, David A.; Beaty, Terri H.; Hersh, Craig P.; Anderson, Christopher; Geigenmuller, Ute; Raby, Benjamin A.; Rennard, Stephen I.; Perrella, Mark A.; Choi, Augustine M.K.; Quackenbush, John; Silverman, Edwin K.

    2013-01-01

    Hedgehog Interacting Protein (HHIP) was implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS). However, it remains unclear how HHIP contributes to COPD pathogenesis. To identify genes regulated by HHIP, we performed gene expression microarray analysis in a human bronchial epithelial cell line (Beas-2B) stably infected with HHIP shRNAs. HHIP silencing led to differential expression of 296 genes; enrichment for variants nominally associated with COPD was found. Eighteen of the differentially expressed genes were validated by real-time PCR in Beas-2B cells. Seven of 11 validated genes tested in human COPD and control lung tissues demonstrated significant gene expression differences. Functional annotation indicated enrichment for extracellular matrix and cell growth genes. Network modeling demonstrated that the extracellular matrix and cell proliferation genes influenced by HHIP tended to be interconnected. Thus, we identified potential HHIP targets in human bronchial epithelial cells that may contribute to COPD pathogenesis. PMID:23459001

  18. Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource.

    PubMed

    Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk

    2016-11-16

    Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Gene-environment interactions and construct validity in preclinical models of psychiatric disorders.

    PubMed

    Burrows, Emma L; McOmish, Caitlin E; Hannan, Anthony J

    2011-08-01

    The contributions of genetic risk factors to susceptibility for brain disorders are often so closely intertwined with environmental factors that studying genes in isolation cannot provide the full picture of pathogenesis. With recent advances in our understanding of psychiatric genetics and environmental modifiers we are now in a position to develop more accurate animal models of psychiatric disorders which exemplify the complex interaction of genes and environment. Here, we consider some of the insights that have emerged from studying the relationship between defined genetic alterations and environmental factors in rodent models. A key issue in such animal models is the optimization of construct validity, at both genetic and environmental levels. Standard housing of laboratory mice and rats generally includes ad libitum food access and limited opportunity for physical exercise, leading to metabolic dysfunction under control conditions, and thus reducing validity of animal models with respect to clinical populations. A related issue, of specific relevance to neuroscientists, is that most standard-housed rodents have limited opportunity for sensory and cognitive stimulation, which in turn provides reduced incentive for complex motor activity. Decades of research using environmental enrichment has demonstrated beneficial effects on brain and behavior in both wild-type and genetically modified rodent models, relative to standard-housed littermate controls. One interpretation of such studies is that environmentally enriched animals more closely approximate average human levels of cognitive and sensorimotor stimulation, whereas the standard housing currently used in most laboratories models a more sedentary state of reduced mental and physical activity and abnormal stress levels. The use of such standard housing as a single environmental variable may limit the capacity for preclinical models to translate into successful clinical trials. 
    Therefore, there is a need to optimize 'environmental construct validity' in animal models, while maintaining comparability between laboratories, so as to ensure optimal scientific and medical outcomes. Utilizing more sophisticated models to elucidate the relative contributions of genetic and environmental factors will allow for improved construct, face and predictive validity, thus facilitating the identification of novel therapeutic targets. Copyright © 2010 Elsevier Inc. All rights reserved.

  20. Gene expression profiles reveal key genes for early diagnosis and treatment of adamantinomatous craniopharyngioma.

    PubMed

    Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing

    2018-04-23

    Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapies cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built by the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. The PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracies of the SVM model for training, testing, and validation data were 94, 85, and 74%, respectively. The expression of CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 was significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which was consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP. The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and benefit the improvement of therapy.

  1. Zebrafish models for the functional genomics of neurogenetic disorders.

    PubMed

    Kabashi, Edor; Brustein, Edna; Champagne, Nathalie; Drapeau, Pierre

    2011-03-01

    In this review, we consider recent work using zebrafish to validate and study the functional consequences of mutations of human genes implicated in a broad range of degenerative and developmental disorders of the brain and spinal cord. Also we present technical considerations for those wishing to study their own genes of interest by taking advantage of this easily manipulated and clinically relevant model organism. Zebrafish permit mutational analyses of genetic function (gain or loss of function) and the rapid validation of human variants as pathological mutations. In particular, neural degeneration can be characterized at genetic, cellular, functional, and behavioral levels. Zebrafish have been used to knock down or express mutations in zebrafish homologs of human genes and to directly express human genes bearing mutations related to neurodegenerative disorders such as spinal muscular atrophy, ataxia, hereditary spastic paraplegia, amyotrophic lateral sclerosis (ALS), epilepsy, Huntington's disease, Parkinson's disease, fronto-temporal dementia, and Alzheimer's disease. More recently, we have been using zebrafish to validate mutations of synaptic genes discovered by large-scale genomic approaches in developmental disorders such as autism, schizophrenia, and non-syndromic mental retardation. Advances in zebrafish genetics such as multigenic analyses and chemical genetics now offer a unique potential for disease research. Thus, zebrafish hold much promise for advancing the functional genomics of human diseases, the understanding of the genetics and cell biology of degenerative and developmental disorders, and the discovery of therapeutics. This article is part of a Special Issue entitled Zebrafish Models of Neurological Diseases. Copyright © 2010 Elsevier B.V. All rights reserved.

  2. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    A risk stratification model for lung cancer based on gene expression profiles is of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression networks. Using multiple microarray datasets, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in two test sets and training set (p < 0.00001, p < 0.0001 and p = 0.02 for training and test sets 1 and 2, resp.). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  3. Sequence-based model of gap gene regulatory network.

    PubMed

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. 
    We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3) functionally important sites are not exclusively located in cis-regulatory elements, but are rather dispersed throughout the regulatory region. It is of importance that some of the sites with high functional impact in hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in DNase I footprint assays.
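The site regulatory weight described in this record compares the model's residual sum of squares (RSS) with and without one binding site. The abstract does not state the normalization or the sign convention, so the sketch below assumes division by the full-model RSS and a positive weight when removing the site worsens the fit. The function names and toy numbers are hypothetical.

```python
def rss(observed, predicted):
    """Residual sum of squares between observed and predicted expression."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def regulatory_weight(observed, pred_all_sites, pred_without_site):
    """Relative increase in RSS when one site is excluded (assumed normalization)."""
    rss_all = rss(observed, pred_all_sites)
    rss_without = rss(observed, pred_without_site)
    return (rss_without - rss_all) / rss_all  # assumes rss_all > 0


# Toy expression values, for illustration only.
observed = [1.0, 2.0, 3.0]
with_all_sites = [1.0, 2.0, 2.0]   # model output using every annotated site
without_site = [0.0, 2.0, 2.0]     # model output with one site removed
print(regulatory_weight(observed, with_all_sites, without_site))
```

Under this assumed convention, a weight near zero means the model output barely changes when the site is dropped.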

  4. Prediction of plant lncRNA by ensemble machine learning classifiers.

    PubMed

    Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

    2018-05-02

    In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
    Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.

  5. Identification of HMX1 target genes: A predictive promoter model approach

    PubMed Central

    Boulling, Arnaud; Wicht, Linda

    2013-01-01

    Purpose A homozygous mutation in the H6 family homeobox 1 (HMX1) gene is responsible for a new oculoauricular defect leading to eye and auricular developmental abnormalities as well as early retinal degeneration (MIM 612109). However, the HMX1 pathway remains poorly understood, and in the first approach to better understand the pathway’s function, we sought to identify the target genes. Methods We developed a predictive promoter model (PPM) approach using a comparative transcriptomic analysis in the retina at P15 of a mouse model lacking functional Hmx1 (dmbo mouse) and its respective wild-type. This PPM was based on the hypothesis that HMX1 binding site (HMX1-BS) clusters should be more represented in promoters of HMX1 target genes. The most differentially expressed genes in the microarray experiment that contained HMX1-BS clusters were used to generate the PPM, which was then statistically validated. Finally, we developed two genome-wide target prediction methods: one that focused on conserving PPM features in human and mouse and one that was based on the co-occurrence of HMX1-BS pairs fitting the PPM, in human or in mouse, independently. Results The PPM construction revealed that sarcoglycan, gamma (35kDa dystrophin-associated glycoprotein) (Sgcg), teashirt zinc finger homeobox 2 (Tshz2), and solute carrier family 6 (neurotransmitter transporter, glycine) (Slc6a9) genes represented Hmx1 targets in the mouse retina at P15. Moreover, the genome-wide target prediction revealed that mouse genes belonging to the retinal axon guidance pathway were targeted by Hmx1. Expression of these three genes was experimentally validated using a quantitative reverse transcription PCR approach. The inhibitory activity of Hmx1 on Sgcg, as well as protein tyrosine phosphatase, receptor type, O (Ptpro) and Sema3f, two targets identified by the PPM, were validated with luciferase assay. 
    Conclusions Gene expression analysis between wild-type and dmbo mice allowed us to develop a PPM that identified the first target genes of Hmx1. PMID:23946633

  6. Animal models of Duchenne muscular dystrophy: from basic mechanisms to gene therapy

    PubMed Central

    McGreevy, Joe W.; Hakim, Chady H.; McIntosh, Mark A.; Duan, Dongsheng

    2015-01-01

    Duchenne muscular dystrophy (DMD) is a progressive muscle-wasting disorder. It is caused by loss-of-function mutations in the dystrophin gene. Currently, there is no cure. A highly promising therapeutic strategy is to replace or repair the defective dystrophin gene by gene therapy. Numerous animal models of DMD have been developed over the last 30 years, ranging from invertebrate to large mammalian models. mdx mice are the most commonly employed models in DMD research and have been used to lay the groundwork for DMD gene therapy. After ~30 years of development, the field has reached the stage at which the results in mdx mice can be validated and scaled-up in symptomatic large animals. The canine DMD (cDMD) model will be excellent for these studies. In this article, we review the animal models for DMD, the pros and cons of each model system, and the history and progress of preclinical DMD gene therapy research in the animal models. We also discuss the current and emerging challenges in this field and ways to address these challenges using animal models, in particular cDMD dogs. PMID:25740330

  7. Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.

    PubMed

    Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun

    2018-04-01

    We aimed to develop a molecular classifier that can predict lymphatic invasion and its clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) in data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we found differentially expressed genes. The performance of the classifier was validated by receiver operating characteristics analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples of Samsung Medical Center and the curatedOvarianData database. We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by logistic regression, LDA, and SVM algorithm (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using the RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p = 0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk groups, which resulted in a survival difference in mRNA profiles (log-rank p-value = 0.011). In external validation, the gene signature was well correlated with prediction of lymphatic invasion and patients' survival. The molecular signature model predicting lymphatic invasion performed well and was also associated with survival of EOC patients.

  8. Gene Polymorphism Association with Type 2 Diabetes and Related Gene-Gene and Gene-Environment Interactions in a Uyghur Population

    PubMed Central

    Xiao, Shan; Zeng, Xiaoyun; Fan, Yong; Su, Yinxia; Ma, Qi; Zhu, Jun; Yao, Hua

    2016-01-01

    Background We investigated the association between 8 single-nucleotide polymorphisms (SNPs) at 3 genetic loci (CDKAL1, CDKN2A/2B and FTO) with type 2 diabetes (T2D) in a Uyghur population. Material/Methods A case-control study of 879 Uyghur patients with T2D and 895 non-diabetic Uyghur controls was conducted at the Hospital of Xinjiang Medical University between 2010 and 2013. Eight SNPs in CDKAL1, CDKN2A/2B and FTO were analyzed using Sequenom MassARRAY® SNP genotyping. Factors associated with T2D were assessed by logistic regression analyses. Gene-gene and gene-environment interactions were analyzed by generalized multifactor dimensionality reduction. Results Genotype distributions of rs10811661 (CDKN2A/2B), rs7195539, rs8050136, and rs9939609 (FTO) and allele frequencies of rs8050136 and rs9939609 differed significantly between diabetes and control groups (all P<0.05). While rs10811661, rs8050136, and rs9939609 were eliminated after adjusting for covariates (P>0.05), rs7195539 distribution differed significantly in co-dominant and dominant models (P<0.05). In gene-gene interaction analysis, after adjusting for covariates the two-locus rs10811661-rs7195539 interaction model had a cross-validation consistency of 10/10 and the highest balanced accuracy of 0.5483 (P=0.014). In gene-environment interaction analysis, the 3-locus interaction model TG-HDL-family history of diabetes had a cross-validation consistency of 10/10 and the highest balanced accuracy of 0.7072 (P<0.001). The 4-locus interaction model, rs7195539-TG-HDL-family history of diabetes had a cross-validation consistency of 8/10 (P<0.001). Conclusions Polymorphisms in CDKN2A/2B and FTO, but not CDKAL1, may be associated with T2D, and alleles rs8050136 and rs9939609 are likely risk alleles for T2D in this population. There were potential interactions among CDKN2A/2B (rs10811661) – FTO (rs7195539) or FTO (rs7195539)-TG-HDL-family history of diabetes in the pathogenesis of T2D in a Uyghur population. 
    PMID:26873362
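The balanced accuracy reported for the interaction models in this record is, in its usual definition, the mean of sensitivity and specificity, so a value of 0.5 corresponds to chance. A minimal sketch under that definition follows; the confusion-matrix counts are invented for illustration and are not taken from the study.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 100 cases and 100 controls.
print(balanced_accuracy(tp=60, fn=40, tn=50, fp=50))
```

Averaging the two rates keeps the measure from being inflated when cases and controls are unequal in number.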

  9. In Silico Enhancing M. tuberculosis Protein Interaction Networks in STRING To Predict Drug-Resistance Pathways and Pharmacological Risks.

    PubMed

    Mei, Suyu

    2018-05-04

    Bacterial protein-protein interaction (PPI) networks are important for revealing the machinery of signal transduction and drug resistance within bacterial cells. The database STRING has collected a large number of bacterial pathogen PPI networks, but most of the data are of low quality without being experimentally or computationally validated, thus restricting its further biomedical applications. We exploit the experimental data via four solutions to enhance the quality of M. tuberculosis H37Rv (MTB) PPI networks in STRING. Computational results show that the experimental data derived jointly by two-hybrid and copurification approaches are the most reliable to train an L2-regularized logistic regression model for MTB PPI network validation. On the basis of the validated MTB PPI networks, we further study three problems via a breadth-first graph search algorithm: (1) discovery of MTB drug-resistance pathways through searching for the paths between known drug-target genes and drug-resistance genes, (2) choosing potential cotarget genes via searching for the critical genes located on multiple pathways, and (3) choosing essential drug-target genes via analysis of network degree distribution. In addition, we further combine the validated MTB PPI networks with human PPI networks to analyze the potential pharmacological risks of known and candidate drug-target genes from the point of view of system pharmacology. The evidence from protein structure alignment demonstrates that the drugs that act on MTB target genes could also adversely act on human signaling pathways.
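Problem (1) in this record, finding paths between a drug-target gene and a drug-resistance gene, maps onto a textbook breadth-first search. The sketch below is a generic illustration rather than the author's implementation; the network and the node names (`targetA`, `resistB`, and so on) are hypothetical placeholders, not real MTB genes.

```python
from collections import deque


def shortest_path(network, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in network.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical undirected toy network stored as adjacency lists.
ppi = {
    "targetA": ["geneX", "geneY"],
    "geneX": ["targetA", "resistB"],
    "geneY": ["targetA", "geneZ"],
    "geneZ": ["geneY", "resistB"],
    "resistB": ["geneX", "geneZ"],
}
print(shortest_path(ppi, "targetA", "resistB"))
```

Because breadth-first search explores nodes in order of distance from the source, the first path it finds to the target uses the fewest interactions.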

  10. Systems-level modeling of mycobacterial metabolism for the identification of new (multi-)drug targets.

    PubMed

    Rienksma, Rienk A; Suarez-Diez, Maria; Spina, Lucie; Schaap, Peter J; Martins dos Santos, Vitor A P

    2014-12-01

    Systems-level metabolic network reconstructions and the derived constraint-based (CB) mathematical models are efficient tools to explore bacterial metabolism. Approximately one-fourth of the Mycobacterium tuberculosis (Mtb) genome contains genes that encode proteins directly involved in its metabolism. These represent potential drug targets that can be systematically probed with CB models through the prediction of genes essential (or the combination thereof) for the pathogen to grow. However, gene essentiality depends on the growth conditions and, so far, no in vitro model precisely mimics the host at the different stages of mycobacterial infection, limiting model predictions. These limitations can be circumvented by combining expression data from in vivo samples with a validated CB model, creating an accurate description of pathogen metabolism in the host. To this end, we present here a thoroughly curated and extended genome-scale CB metabolic model of Mtb quantitatively validated using 13C measurements. We describe some of the efforts made in integrating CB models and high-throughput data to generate condition-specific models, and we discuss the challenges ahead. This knowledge and the framework presented herein will enable the identification of potential new drug targets and will foster the development of optimal therapeutic strategies. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  11. Lessons Learned from a Cross-Model Validation between a Discrete Event Simulation Model and a Cohort State-Transition Model for Personalized Breast Cancer Treatment.

    PubMed

    Jahn, Beate; Rochau, Ursula; Kurzthaler, Christina; Paulden, Mike; Kluibenschädl, Martina; Arvandi, Marjan; Kühne, Felicitas; Goehler, Alexander; Krahn, Murray D; Siebert, Uwe

    2016-04-01

    Breast cancer is the most common malignancy among women in developed countries. We developed a model (the Oncotyrol breast cancer outcomes model) to evaluate the cost-effectiveness of a 21-gene assay when used in combination with Adjuvant! Online to support personalized decisions about the use of adjuvant chemotherapy. The goal of this study was to perform a cross-model validation. The Oncotyrol model evaluates the 21-gene assay by simulating a hypothetical cohort of 50-year-old women over a lifetime horizon using discrete event simulation. Primary model outcomes were life-years, quality-adjusted life-years (QALYs), costs, and incremental cost-effectiveness ratios (ICERs). We followed the International Society for Pharmacoeconomics and Outcomes Research-Society for Medical Decision Making (ISPOR-SMDM) best practice recommendations for validation and compared modeling results of the Oncotyrol model with the state-transition model developed by the Toronto Health Economics and Technology Assessment (THETA) Collaborative. Both models were populated with Canadian THETA model parameters, and outputs were compared. The differences between the models varied among the different validation end points. The smallest relative differences were in costs, and the greatest were in QALYs. All relative differences were less than 1.2%. The cost-effectiveness plane showed that small differences in the model structure can lead to different sets of nondominated test-treatment strategies with different efficiency frontiers. We faced several challenges: distinguishing between differences in outcomes due to different modeling techniques and initial coding errors, defining meaningful differences, and selecting measures and statistics for comparison (means, distributions, multivariate outcomes). Cross-model validation was crucial to identify and correct coding errors and to explain differences in model outcomes. 
In our comparison, small differences in either QALYs or costs led to changes in ICERs because of changes in the set of dominated and nondominated strategies. © The Author(s) 2015.
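
    The sensitivity of ICERs to small outcome differences can be illustrated with a minimal sketch of an efficiency frontier calculation. The strategy names and all cost and QALY values below are invented for illustration and are not taken from the Oncotyrol or THETA models; the function `icers_on_frontier` is likewise hypothetical.

```python
from typing import List, Optional, Tuple

Strategy = Tuple[str, float, float]  # (name, cost, QALYs); values are illustrative only


def icers_on_frontier(strategies: List[Strategy]) -> List[Tuple[str, Optional[float]]]:
    """Return the nondominated strategies with their ICER versus the
    previous strategy on the frontier (None for the cheapest one)."""
    # Sort by increasing cost; for equal cost, the more effective strategy comes first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Strong dominance: drop anything that is not more effective than a cheaper strategy.
    frontier: List[Strategy] = []
    for s in ordered:
        if not frontier or s[2] > frontier[-1][2]:
            frontier.append(s)

    def icer(a: Strategy, b: Strategy) -> float:
        return (b[1] - a[1]) / (b[2] - a[2])

    # Extended dominance: ICERs must increase along the frontier.
    i = 1
    while i < len(frontier) - 1:
        if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = max(i - 1, 1)  # re-check the neighbour that now has a new successor
        else:
            i += 1

    result: List[Tuple[str, Optional[float]]] = [(frontier[0][0], None)]
    for a, b in zip(frontier, frontier[1:]):
        result.append((b[0], icer(a, b)))
    return result


base = [("no test", 10000, 10.0), ("test-A", 12000, 10.2),
        ("test-B", 13000, 10.25), ("test-C", 12500, 10.1)]
# Same strategies, but test-B gains slightly more QALYs.
shifted = [s if s[0] != "test-B" else ("test-B", 13000, 10.4) for s in base]

frontier_base = icers_on_frontier(base)
frontier_shifted = icers_on_frontier(shifted)
```

    In this toy example a modest change in the QALYs of one strategy removes another strategy from the frontier by extended dominance, so the ICER of the remaining strategy is computed against a different comparator.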

  12. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
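
    The difference between the two partitioning schemes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each sample already carries a cluster label (the clustering step itself is omitted), and the function names and condition labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def random_folds(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def clustered_folds(cluster_of: Sequence[Hashable], k: int) -> List[List[int]]:
    """Clustering-based CV: whole clusters are assigned to folds, so samples
    from one cluster never appear in both the training and the test set."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for i, c in enumerate(cluster_of):
        members[c].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    # Greedy balancing: place the largest clusters first into the smallest fold.
    for c in sorted(members, key=lambda c: -len(members[c])):
        min(folds, key=len).extend(members[c])
    return folds


# Hypothetical cluster labels, e.g. groups of similar experimental conditions.
labels = ["heat", "heat", "heat", "cold", "cold",
          "salt", "salt", "salt", "dark", "dark"]
rcv = random_folds(len(labels), k=2)
ccv = clustered_folds(labels, k=2)
```

    With random folds, samples from the same cluster usually land on both sides of the split, which is the situation in which the abstract reports over-optimistic estimates.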

  13. Validation of endogenous reference genes for qRT-PCR analysis of human visceral adipose samples

    PubMed Central

    2010-01-01

    Background Given the epidemic proportions of obesity worldwide and the concurrent prevalence of metabolic syndrome, there is an urgent need to better understand the underlying mechanisms of metabolic syndrome, in particular, the gene expression differences which may participate in obesity, insulin resistance and the associated series of chronic liver conditions. Real-time PCR (qRT-PCR) is the standard method for studying changes in relative gene expression in different tissues and experimental conditions. However, variations in amount of starting material, enzymatic efficiency and presence of inhibitors can lead to quantification errors. Accurate data normalization is therefore vital. Among several known strategies for data normalization, the use of reference genes as an internal control is the most common approach. Recent studies have shown that both obesity and presence of insulin resistance influence the expression of commonly used reference genes in omental fat. In this study we validated candidate reference genes suitable for qRT-PCR profiling experiments using visceral adipose samples from obese and lean individuals. Results Cross-validation of expression stability of eight selected reference genes using three popular algorithms (GeNorm, NormFinder and BestKeeper) identified ACTB and RPII as the most stable reference genes. Conclusions We recommend ACTB and RPII as stable reference genes most suitable for gene expression studies of human visceral adipose tissue. The use of these genes as a reference pair may further enhance the robustness of qRT-PCR in this model system. PMID:20492695

  14. Validation of endogenous reference genes for qRT-PCR analysis of human visceral adipose samples.

    PubMed

    Mehta, Rohini; Birerdinc, Aybike; Hossain, Noreen; Afendy, Arian; Chandhoke, Vikas; Younossi, Zobair; Baranova, Ancha

    2010-05-21

    Given the epidemic proportions of obesity worldwide and the concurrent prevalence of metabolic syndrome, there is an urgent need to better understand the underlying mechanisms of metabolic syndrome, in particular, the gene expression differences which may participate in obesity, insulin resistance and the associated series of chronic liver conditions. Real-time PCR (qRT-PCR) is the standard method for studying changes in relative gene expression in different tissues and experimental conditions. However, variations in amount of starting material, enzymatic efficiency and presence of inhibitors can lead to quantification errors. Accurate data normalization is therefore vital. Among several known strategies for data normalization, the use of reference genes as an internal control is the most common approach. Recent studies have shown that both obesity and presence of insulin resistance influence the expression of commonly used reference genes in omental fat. In this study we validated candidate reference genes suitable for qRT-PCR profiling experiments using visceral adipose samples from obese and lean individuals. Cross-validation of expression stability of eight selected reference genes using three popular algorithms (GeNorm, NormFinder and BestKeeper) identified ACTB and RPII as the most stable reference genes. We recommend ACTB and RPII as stable reference genes most suitable for gene expression studies of human visceral adipose tissue. The use of these genes as a reference pair may further enhance the robustness of qRT-PCR in this model system.
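
    To make the notion of expression stability concrete, the following minimal sketch computes a geNorm-style stability measure M, taken here as the mean, over all other candidate genes, of the standard deviation of the log2 expression ratios across samples (lower M means more stable). It is an illustration under that assumed definition only; it does not reproduce the GeNorm, NormFinder or BestKeeper software, and the gene names and relative quantities are invented.

```python
import math
from statistics import stdev
from typing import Dict, List


def genorm_m(expr: Dict[str, List[float]]) -> Dict[str, float]:
    """expr maps each candidate gene to its relative quantities
    (linear scale, > 0), one value per sample, in the same sample order."""
    genes = list(expr)
    m: Dict[str, float] = {}
    for j in genes:
        pairwise = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: spread of the log2 ratio gene_j / gene_k across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise.append(stdev(log_ratios))
        m[j] = sum(pairwise) / len(pairwise)
    return m


# Invented relative quantities for three hypothetical candidates in four samples.
quantities = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],   # constant ratio to geneA
    "geneC": [1.0, 4.0, 2.0, 16.0],   # ratio to the others fluctuates
}
stability = genorm_m(quantities)
least_stable = max(stability, key=stability.get)
```

    In this toy data set geneA and geneB track each other exactly and therefore share the lowest M, while geneC is ranked least stable.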

  15. A quantitative validated model reveals two phases of transcriptional regulation for the gap gene giant in Drosophila.

    PubMed

    Hoermann, Astrid; Cicin-Sain, Damjan; Jaeger, Johannes

    2016-03-15

    Understanding eukaryotic transcriptional regulation and its role in development and pattern formation is one of the big challenges in biology today. Most attempts at tackling this problem either focus on the molecular details of transcription factor binding, or aim at genome-wide prediction of expression patterns from sequence through bioinformatics and mathematical modelling. Here we bridge the gap between these two complementary approaches by providing an integrative model of cis-regulatory elements governing the expression of the gap gene giant (gt) in the blastoderm embryo of Drosophila melanogaster. We use a reverse-engineering method, where mathematical models are fit to quantitative spatio-temporal reporter gene expression data to infer the regulatory mechanisms underlying gt expression in its anterior and posterior domains. These models are validated through prediction of gene expression in mutant backgrounds. A detailed analysis of our data and models reveals that gt is regulated by domain-specific CREs at early stages, while a late element drives expression in both the anterior and the posterior domains. Initial gt expression depends exclusively on inputs from maternal factors. Later, gap gene cross-repression and gt auto-activation become increasingly important. We show that auto-regulation creates a positive feedback, which mediates the transition from early to late stages of regulation. We confirm the existence and role of gt auto-activation through targeted mutagenesis of Gt transcription factor binding sites. In summary, our analysis provides a comprehensive picture of spatio-temporal gene regulation by different interacting enhancer elements for an important developmental regulator. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Identifying novel interventional strategies for psychiatric disorders: integrating genomics, 'enviromics' and gene-environment interactions in valid preclinical models.

    PubMed

    McOmish, Caitlin E; Burrows, Emma L; Hannan, Anthony J

    2014-10-01

    Psychiatric disorders affect a substantial proportion of the population worldwide. This high prevalence, combined with the chronicity of the disorders and the major social and economic impacts, creates a significant burden. As a result, an important priority is the development of novel and effective interventional strategies for reducing incidence rates and improving outcomes. This review explores the progress that has been made to date in establishing valid animal models of psychiatric disorders, while beginning to unravel the complex factors that may be contributing to the limitations of current methodological approaches. We propose some approaches for optimizing the validity of animal models and developing effective interventions. We use schizophrenia and autism spectrum disorders as examples of disorders for which development of valid preclinical models, and fully effective therapeutics, have proven particularly challenging. However, the conclusions have relevance to various other psychiatric conditions, including depression, anxiety and bipolar disorders. We address the key aspects of construct, face and predictive validity in animal models, incorporating genetic and environmental factors. Our understanding of psychiatric disorders is accelerating exponentially, revealing extraordinary levels of genetic complexity, heterogeneity and pleiotropy. The environmental factors contributing to individual, and multiple, disorders also exhibit breathtaking complexity, requiring systematic analysis to experimentally explore the environmental mediators and modulators which constitute the 'envirome' of each psychiatric disorder. Ultimately, genetic and environmental factors need to be integrated via animal models incorporating the spatiotemporal complexity of gene-environment interactions and experience-dependent plasticity, thus better recapitulating the dynamic nature of brain development, function and dysfunction. © 2014 The British Pharmacological Society.

  17. Gene × Environment Interactions in Schizophrenia: Evidence from Genetic Mouse Models

    PubMed Central

    Marr, Julia; Bock, Gavin; Desbonnet, Lieve; Waddington, John

    2016-01-01

    The study of gene × environment, as well as epistatic interactions in schizophrenia, has provided important insight into the complex etiopathologic basis of schizophrenia. It has also increased our understanding of the role of susceptibility genes in the disorder and is an important consideration as we seek to translate genetic advances into novel antipsychotic treatment targets. This review summarises data arising from research involving the modelling of gene × environment interactions in schizophrenia using preclinical genetic models. Evidence for synergistic effects on the expression of schizophrenia-relevant endophenotypes will be discussed. It is proposed that valid and multifactorial preclinical models are important tools for identifying critical areas, as well as underlying mechanisms, of convergence of genetic and environmental risk factors, and their interaction in schizophrenia. PMID:27725886

  18. A Dynamical Model Reveals Gene Co-Localizations in Nucleus

    PubMed Central

    Yao, Ye; Lin, Wei; Hennessy, Conor; Fraser, Peter; Feng, Jianfeng

    2011-01-01

    Co-localization of networks of genes in the nucleus is thought to play an important role in determining gene expression patterns. Based upon experimental data, we built a dynamical model to test whether pure diffusion could account for the observed co-localization of genes within a defined subnuclear region. A simple standard Brownian motion model in two and three dimensions shows that preferential co-localization is possible for co-regulated genes without any direct interaction, and suggests the occurrence may be due to a limitation in the number of available transcription factors. Experimental data of chromatin movements demonstrates that fractional rather than standard Brownian motion is more appropriate to model gene mobilizations, and we tested our dynamical model against recent static experimental data, using a sub-diffusion process by which the genes tend to colocalize more easily. Moreover, in order to compare our model with recently obtained experimental data, we studied the association level between genes and factors, and presented data supporting the validation of this dynamic model. As further applications of our model, we applied it to test against more biological observations. We found that increasing transcription factor number, rather than factory number and nucleus size, might be the reason for decreasing gene co-localization. In the scenario of frequency- or amplitude-modulation of transcription factors, our model predicted that frequency-modulation may increase the co-localization between its targeted genes. PMID:21760760
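
    One common way to tell standard diffusion from sub-diffusion is the exponent alpha of the mean squared displacement, MSD(t) proportional to t^alpha, where alpha = 1 for standard Brownian motion and alpha < 1 for sub-diffusive motion. The sketch below is a minimal illustration of estimating alpha from simulated trajectories; it simulates standard two-dimensional Brownian motion only and is not the authors' dynamical model. All function names and parameter values are assumptions.

```python
import math
import random
from typing import List


def simulate_msd(n_walkers: int, n_steps: int, seed: int = 0) -> List[float]:
    """Mean squared displacement of 2D walkers with unit-variance Gaussian steps."""
    rng = random.Random(seed)
    sums = [0.0] * (n_steps + 1)
    for _ in range(n_walkers):
        x = y = 0.0
        for t in range(1, n_steps + 1):
            x += rng.gauss(0.0, 1.0)
            y += rng.gauss(0.0, 1.0)
            sums[t] += x * x + y * y
    return [s / n_walkers for s in sums]


def anomalous_exponent(msd: List[float]) -> float:
    """Least-squares slope of log MSD against log t (t >= 1)."""
    xs = [math.log(t) for t in range(1, len(msd))]
    ys = [math.log(msd[t]) for t in range(1, len(msd))]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


msd = simulate_msd(n_walkers=500, n_steps=200)
alpha = anomalous_exponent(msd)  # expected to be close to 1 for standard diffusion
```

    Applied to measured chromatin trajectories, an estimate clearly below 1 would point towards the sub-diffusive behaviour described in the abstract.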

  19. Gene-Auto: Automatic Software Code Generation for Real-Time Embedded Systems

    NASA Astrophysics Data System (ADS)

    Rugina, A.-E.; Thomas, D.; Olive, X.; Veran, G.

    2008-08-01

    This paper gives an overview of the Gene-Auto ITEA European project, which aims at building a qualified C code generator from mathematical models under Matlab-Simulink and Scilab-Scicos. The project is driven by major European industry partners, active in the real-time embedded systems domains. The Gene-Auto code generator will significantly improve the current development processes in such domains by shortening the time to market and by guaranteeing the quality of the generated code through the use of formal methods. The first version of the Gene-Auto code generator has already been released and has gone through a validation phase on real-life case studies defined by each project partner. The validation results are taken into account in the implementation of the second version of the code generator. The partners aim at introducing the Gene-Auto results into industrial development by 2010.

  20. Selenium and Vitamin E: Cell Type– and Intervention-Specific Tissue Effects in Prostate Cancer

    PubMed Central

    Tsavachidou, Dimitra; McDonnell, Timothy J.; Wen, Sijin; Wang, Xuemei; Vakar-Lopez, Funda; Pisters, Louis L.; Pettaway, Curtis A.; Wood, Christopher G.; Do, Kim-Anh; Thall, Peter F.; Stephens, Clifton; Efstathiou, Eleni; Taylor, Robert; Menter, David G.; Troncoso, Patricia; Lippman, Scott M.; Logothetis, Christopher J.

    2009-01-01

    Background Secondary analyses of two randomized, controlled phase III trials demonstrated that selenium and vitamin E could reduce prostate cancer incidence. To characterize pharmacodynamic and gene expression effects associated with use of selenium and vitamin E, we undertook a randomized, placebo-controlled phase IIA study of prostate cancer patients before prostatectomy and created a preoperative model for prostatectomy tissue interrogation. Methods Thirty-nine men with prostate cancer were randomly assigned to treatment with 200 μg of selenium, 400 IU of vitamin E, both, or placebo. Laser capture microdissection of prostatectomy biopsy specimens was used to isolate normal, stromal, and tumor cells. Gene expression in each cell type was studied with microarray analysis and validated with a real-time polymerase chain reaction (PCR) and immunohistochemistry. An analysis of variance model was fit to identify genes differentially expressed between treatments and cell types. A beta-uniform mixture model was used to analyze differential expression of genes and to assess the false discovery rate. All statistical tests were two-sided. Results The highest numbers of differentially expressed genes by treatment were 1329 (63%) of 2109 genes in normal epithelial cells after selenium treatment, 1354 (66%) of 2051 genes in stromal cells after vitamin E treatment, and 329 (56%) of 587 genes in tumor cells after combination treatment (false discovery rate = 2%). Validation of 21 representative genes across all treatments and all cell types yielded Spearman correlation coefficients between the microarray analysis and the PCR validation ranging from 0.64 (95% confidence interval [CI] = 0.31 to 0.79) for the vitamin E group to 0.87 (95% CI = 0.53 to 0.99) for the selenium group. 
The increase in the mean percentage of p53-positive tumor cells in the selenium-treated group (26.3%), compared with that in the placebo-treated group (5%), showed borderline statistical significance (difference = 21.3%; 95% CI = 0.7 to 41.8; P = .051). Conclusions We have demonstrated the feasibility and efficiency of the preoperative model and its power as a hypothesis-generating engine. We have also identified cell type– and zone-specific tissue effects of interventions with selenium and vitamin E that may have clinical implications. PMID:19244175
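
    The agreement measure used for the PCR validation can be sketched as follows: a Spearman coefficient is the Pearson correlation of the ranks of the two measurement series. The code is a minimal illustration with invented fold-change values for five hypothetical genes; it is not the study's analysis and does not compute confidence intervals.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Invented log fold changes for five hypothetical genes.
microarray = [0.5, 1.2, -0.3, 2.0, 0.1]
pcr = [1.1, 0.9, -0.1, 2.5, 0.4]
rho = spearman(microarray, pcr)
```

    Because only ranks enter the calculation, the coefficient is unaffected by the different measurement scales of the two platforms.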

  1. Development and Validation of the PREMM5 Model for Comprehensive Risk Assessment of Lynch Syndrome.

    PubMed

    Kastrinos, Fay; Uno, Hajime; Ukaegbu, Chinedu; Alvero, Carmelita; McFarland, Ashley; Yurgelun, Matthew B; Kulke, Matthew H; Schrag, Deborah; Meyerhardt, Jeffrey A; Fuchs, Charles S; Mayer, Robert J; Ng, Kimmie; Steyerberg, Ewout W; Syngal, Sapna

    2017-07-01

    Purpose Current Lynch syndrome (LS) prediction models quantify the risk to an individual of carrying a pathogenic germline mutation in three mismatch repair (MMR) genes: MLH1, MSH2, and MSH6. We developed a new prediction model, PREMM5, that incorporates the genes PMS2 and EPCAM to provide comprehensive LS risk assessment. Patients and Methods PREMM5 was developed to predict the likelihood of a mutation in any of the LS genes by using polytomous logistic regression analysis of clinical and germline data from 18,734 individuals who were tested for all five genes. Predictors of mutation status included sex, age at genetic testing, and proband and family cancer histories. Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), and clinical impact was determined by decision curve analysis; comparisons were made to the existing PREMM1,2,6 model. External validation of PREMM5 was performed in a clinic-based cohort of 1,058 patients with colorectal cancer. Results Pathogenic mutations were detected in 1,000 (5%) of 18,734 patients in the development cohort; mutations included MLH1 (n = 306), MSH2 (n = 354), MSH6 (n = 177), PMS2 (n = 141), and EPCAM (n = 22). PREMM5 distinguished carriers from noncarriers with an AUC of 0.81 (95% CI, 0.79 to 0.82), and performance was similar in the validation cohort (AUC, 0.83; 95% CI, 0.75 to 0.92). Prediction was more difficult for PMS2 mutations (AUC, 0.64; 95% CI, 0.60 to 0.68) than for other genes. Performance characteristics of PREMM5 exceeded those of PREMM1,2,6. Decision curve analysis supported germline LS testing for PREMM5 scores ≥ 2.5%. Conclusion PREMM5 provides comprehensive risk estimation of all five LS genes and supports LS genetic testing for individuals with scores ≥ 2.5%.
At this threshold, PREMM5 provides performance that is superior to the existing PREMM1,2,6 model in the identification of carriers of LS, including those with weaker phenotypes and individuals unaffected by cancer.

  2. Development and Validation of the PREMM5 Model for Comprehensive Risk Assessment of Lynch Syndrome

    PubMed Central

    Uno, Hajime; Ukaegbu, Chinedu; Alvero, Carmelita; McFarland, Ashley; Yurgelun, Matthew B.; Kulke, Matthew H.; Schrag, Deborah; Meyerhardt, Jeffrey A.; Fuchs, Charles S.; Mayer, Robert J.; Ng, Kimmie; Steyerberg, Ewout W.; Syngal, Sapna

    2017-01-01

    Purpose Current Lynch syndrome (LS) prediction models quantify the risk to an individual of carrying a pathogenic germline mutation in three mismatch repair (MMR) genes: MLH1, MSH2, and MSH6. We developed a new prediction model, PREMM5, that incorporates the genes PMS2 and EPCAM to provide comprehensive LS risk assessment. Patients and Methods PREMM5 was developed to predict the likelihood of a mutation in any of the LS genes by using polytomous logistic regression analysis of clinical and germline data from 18,734 individuals who were tested for all five genes. Predictors of mutation status included sex, age at genetic testing, and proband and family cancer histories. Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), and clinical impact was determined by decision curve analysis; comparisons were made to the existing PREMM1,2,6 model. External validation of PREMM5 was performed in a clinic-based cohort of 1,058 patients with colorectal cancer. Results Pathogenic mutations were detected in 1,000 (5%) of 18,734 patients in the development cohort; mutations included MLH1 (n = 306), MSH2 (n = 354), MSH6 (n = 177), PMS2 (n = 141), and EPCAM (n = 22). PREMM5 distinguished carriers from noncarriers with an AUC of 0.81 (95% CI, 0.79 to 0.82), and performance was similar in the validation cohort (AUC, 0.83; 95% CI, 0.75 to 0.92). Prediction was more difficult for PMS2 mutations (AUC, 0.64; 95% CI, 0.60 to 0.68) than for other genes. Performance characteristics of PREMM5 exceeded those of PREMM1,2,6. Decision curve analysis supported germline LS testing for PREMM5 scores ≥ 2.5%. Conclusion PREMM5 provides comprehensive risk estimation of all five LS genes and supports LS genetic testing for individuals with scores ≥ 2.5%. 
At this threshold, PREMM5 provides performance that is superior to the existing PREMM1,2,6 model in the identification of carriers of LS, including those with weaker phenotypes and individuals unaffected by cancer. PMID:28489507
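
    For readers unfamiliar with the discrimination measure, the AUC equals the probability that a randomly chosen carrier receives a higher predicted risk than a randomly chosen noncarrier, with ties counted as one half. The following minimal sketch computes it by direct pairwise comparison; the predicted risks are invented and have no connection to the PREMM5 cohorts.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented predicted mutation probabilities for illustration.
carriers = [0.30, 0.08, 0.12, 0.02]
noncarriers = [0.01, 0.03, 0.02, 0.05, 0.10]
discrimination = auc(carriers, noncarriers)  # 15.5 correct pairs out of 20
```

    The pairwise form is quadratic in the number of individuals; for cohorts of the size reported above, a rank-based formulation would be the more practical choice.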

  3. Epsin Family Member 3 and Ribosome-Related Genes Are Associated with Late Metastasis in Estrogen Receptor-Positive Breast Cancer and Long-Term Survival in Non-Small Cell Lung Cancer Using a Genome-Wide Identification and Validation Strategy.

    PubMed

    Hellwig, Birte; Madjar, Katrin; Edlund, Karolina; Marchan, Rosemarie; Cadenas, Cristina; Heimes, Anne-Sophie; Almstedt, Katrin; Lebrecht, Antje; Sicking, Isabel; Battista, Marco J; Micke, Patrick; Schmidt, Marcus; Hengstler, Jan G; Rahnenführer, Jörg

    2016-01-01

    In breast cancer, gene signatures that predict the risk of metastasis after surgical tumor resection are mainly indicative of early events. The purpose of this study was to identify genes linked to metastatic recurrence more than three years after surgery. Affymetrix HG U133A and Plus 2.0 array datasets with information on metastasis-free, disease-free or overall survival were accessed via public repositories. Time restricted Cox regression models were used to identify genes associated with metastasis during or after the first three years post-surgery (early- and late-type genes). A sequential validation study design, with two non-adjuvantly treated discovery cohorts (n = 409) and one validation cohort (n = 169) was applied and identified genes were further evaluated in tamoxifen-treated breast cancer patients (n = 923), as well as in patients with non-small cell lung (n = 1779), colon (n = 893) and ovarian (n = 922) cancer. Ten late- and 243 early-type genes were identified in adjuvantly untreated breast cancer. Adjustment to clinicopathological factors and an established proliferation-related signature markedly reduced the number of early-type genes to 16, whereas nine late-type genes still remained significant. These nine genes were associated with metastasis-free survival (MFS) also in a non-time restricted model, but not in the early period alone, stressing that their prognostic impact was primarily based on MFS more than three years after surgery. Four of the ten late-type genes, the ribosome-related factors EIF4B, RPL5, RPL3, and the tumor angiogenesis modifier EPN3 were significantly associated with MFS in the late period also in a meta-analysis of tamoxifen-treated breast cancer cohorts. In contrast, only one late-type gene (EPN3) showed consistent survival associations in more than one cohort in the other cancer types, being associated with worse outcome in two non-small cell lung cancer cohorts. 
No late-type gene was validated in ovarian and colon cancer. Ribosome-related genes were associated with decreased risk of late metastasis in both adjuvantly untreated and tamoxifen-treated breast cancer patients. In contrast, high expression of epsin (EPN3) was associated with increased risk of late metastasis. This is of clinical relevance considering the well-understood role of epsins in tumor angiogenesis and the ongoing development of epsin antagonizing therapies.

  4. Selection of Valid Reference Genes for Reverse Transcription Quantitative PCR Analysis in Heliconius numata (Lepidoptera: Nymphalidae)

    PubMed Central

    Chouteau, Mathieu; Whibley, Annabel; Joron, Mathieu; Llaurens, Violaine

    2016-01-01

    Identifying the genetic basis of adaptive variation is challenging in non-model organisms, and quantitative real-time PCR is a useful tool for validating predictions regarding the expression of candidate genes. However, comparing expression levels in different conditions requires rigorous experimental design and statistical analyses. Here, we focused on the neotropical passion-vine butterflies Heliconius, non-model species studied in evolutionary biology for their adaptive variation in wing color patterns involved in mimicry and in the signaling of their toxicity to predators. We aimed at selecting stable reference genes to be used for normalization of gene expression data in RT-qPCR analyses from developing wing discs according to the minimal guidelines described in Minimum Information for publication of Quantitative Real-Time PCR Experiments (MIQE). To design internal RT-qPCR controls, we studied the stability of expression of nine candidate reference genes (actin, annexin, eF1α, FK506BP, PolyABP, PolyUBQ, RpL3, RPS3A, and tubulin) at two developmental stages (prepupal and pupal) using three widely used programs (GeNorm, NormFinder and BestKeeper). Results showed that, despite differences in statistical methods, genes RpL3, eF1α, polyABP, and annexin were stably expressed in wing discs in late larval and pupal stages of Heliconius numata. This combination of genes may be used as a reference for a reliable study of differential expression in wings, for instance for genes involved in important phenotypic variation, such as wing color pattern variation. Through this example, we provide general useful technical recommendations as well as relevant statistical strategies for evolutionary biologists aiming to identify candidate genes involved in adaptive variation in non-model organisms. PMID:27271971

  5. Network Modeling of MDM2 Inhibitor-Oxaliplatin Combination Reveals Biological Synergy in wt-p53 solid tumors

    PubMed Central

    Azmi, Asfar S.; Banerjee, Sanjeev; Ali, Shadan; Wang, Zhiwei; Bao, Bin; Beck, Frances W.J.; Maitah, Main; Choi, Minsig; Shields, Tony F.; Philip, Philip A.; Sarkar, Fazlul H.; Mohammad, Ramzi M.

    2011-01-01

    Earlier we had shown that the MDM2 inhibitor (MI-219) belonging to the spiro-oxindole family can synergistically enhance the efficacy of platinum chemotherapeutics leading to 50% tumor free survival in a genetically complex pancreatic ductal adenocarcinoma (PDAC) xenograft model. In this report, we have taken a systems and network modeling approach in order to understand central mechanisms behind MI219-oxaliplatin synergy with validation in PDAC, colon and breast cancer cell lines. Microarray profiling of drug treatments (MI-219, oxaliplatin or their combination) in Capan-2 cells revealed a similar unique set of gene alterations that is duplicated in other solid tumor cells. As single agents, MI-219 and oxaliplatin induced alterations in 48 and 761 genes, respectively. The combination treatment resulted in 767 gene alterations with emergence of 286 synergy unique genes. Ingenuity network modeling of combination and synergy unique genes showed the crucial role of five key local networks: CREB, CARF, EGR1, NF-kB and E-cadherin. The network signatures were validated at the protein level in all three cell lines. Individually silencing central nodes in these five hubs resulted in abrogation of MI-219-oxaliplatin activity, confirming their critical role in aiding p53 mediated apoptotic response. We anticipate that our MI219-oxaliplatin network blueprints can be clinically translated in the rational design and application of this unique therapeutic combination in a genetically pre-defined subset of patients. PMID:21623005

  6. An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability

    PubMed Central

    Schütte, Judith; Wang, Huange; Antoniou, Stella; Jarratt, Andrew; Wilson, Nicola K; Riepsaame, Joey; Calero-Nieto, Fernando J; Moignard, Victoria; Basilico, Silvia; Kinston, Sarah J; Hannah, Rebecca L; Chan, Mun Chiang; Nürnberg, Sylvia T; Ouwehand, Willem H; Bonzanni, Nicola; de Bruijn, Marella FTR; Göttgens, Berthold

    2016-01-01

    Transcription factor (TF) networks determine cell-type identity by establishing and maintaining lineage-specific expression profiles, yet reconstruction of mammalian regulatory network models has been hampered by a lack of comprehensive functional validation of regulatory interactions. Here, we report comprehensive ChIP-Seq, transgenic and reporter gene experimental data that have allowed us to construct an experimentally validated regulatory network model for haematopoietic stem/progenitor cells (HSPCs). Model simulation coupled with subsequent experimental validation using single cell expression profiling revealed potential mechanisms for cell state stabilisation, and also how a leukaemogenic TF fusion protein perturbs key HSPC regulators. The approach presented here should help to improve our understanding of both normal physiological and disease processes. DOI: http://dx.doi.org/10.7554/eLife.11469.001 PMID:26901438

  7. Functional genomics unique to week 20 post wounding in the deep cone/fat dome of the Duroc/Yorkshire porcine model of fibroproliferative scarring.

    PubMed

    Engrav, Loren H; Tuggle, Christopher K; Kerr, Kathleen F; Zhu, Kathy Q; Numhom, Surawej; Couture, Oliver P; Beyer, Richard P; Hocking, Anne M; Carrougher, Gretchen J; Ramos, Maria Luiza C; Klein, Matthew B; Gibran, Nicole S

    2011-04-20

    Hypertrophic scar was first described over 100 years ago; PubMed has more than 1,000 references on the topic. Nevertheless prevention and treatment remains poor, because 1) there has been no validated animal model; 2) human scar tissue, which is impossible to obtain in a controlled manner, has been the only source for study; 3) tissues typically have been homogenized, mixing cell populations; and 4) gene-by-gene studies are incomplete. We have assembled a system that overcomes these barriers and permits the study of genome-wide gene expression in microanatomical locations, in shallow and deep partial-thickness wounds, and pigmented and non-pigmented skin, using the Duroc(pigmented fibroproliferative)/Yorkshire(non-pigmented non-fibroproliferative) porcine model. We used this system to obtain the differential transcriptome at 1, 2, 3, 12 and 20 weeks post wounding. It is not clear when fibroproliferation begins, but it is fully developed in humans and the Duroc breed at 20 weeks. Therefore we obtained the derivative functional genomics unique to 20 weeks post wounding. We also obtained long-term, forty-six week follow-up with the model. 1) The scars are still thick at forty-six weeks post wounding further validating the model. 2) The differential transcriptome provides new insights into the fibroproliferative process as several genes thought fundamental to fibroproliferation are absent and others differentially expressed are newly implicated. 3) The findings in the derivative functional genomics support old concepts, which further validates the model, and suggests new avenues for reductionist exploration. In the future, these findings will be searched for directed networks likely involved in cutaneous fibroproliferation. These clues may lead to a better understanding of the systems biology of cutaneous fibroproliferation, and ultimately prevention and treatment of hypertrophic scarring.

  8. Development and Validation of a qRT-PCR Classifier for Lung Cancer Prognosis

    PubMed Central

    Chen, Guoan; Kim, Sinae; Taylor, Jeremy MG; Wang, Zhuwen; Lee, Oliver; Ramnath, Nithya; Reddy, Rishindra M; Lin, Jules; Chang, Andrew C; Orringer, Mark B; Beer, David G

    2011-01-01

    Purpose: This prospective study aimed to develop a robust and clinically-applicable method to identify high-risk early stage lung cancer patients and then to validate this method for use in future translational studies. Patients and Methods: Three published Affymetrix microarray data sets representing 680 primary tumors were used in the survival-related gene selection procedure using clustering, Cox model and random survival forest (RSF) analysis. A final set of 91 genes was selected and tested as a predictor of survival using a qRT-PCR-based assay utilizing an independent cohort of 101 lung adenocarcinomas. Results: The RSF model built from 91 genes in the training set predicted patient survival in an independent cohort of 101 lung adenocarcinomas, with a prediction error rate of 26.6%. The mortality risk index (MRI) was significantly related to survival (Cox model p < 0.00001) and separated all patients into low, medium, and high-risk groups (HR = 1.00, 2.82, 4.42). The MRI was also related to survival in stage 1 patients (Cox model p = 0.001), separating patients into low, medium, and high-risk groups (HR = 1.00, 3.29, 3.77). Conclusions: The development and validation of this robust qRT-PCR platform allows prediction of patient survival with early stage lung cancer. Utilization will now allow investigators to evaluate it prospectively by incorporation into new clinical trials with the goal of personalized treatment of lung cancer patients and improving patient survival. PMID:21792073

  9. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection.

    PubMed

    Dong, Zuoli; Zhang, Naiqian; Li, Chun; Wang, Haiyun; Fang, Yun; Wang, Jun; Zheng, Xiaoqi

    2015-06-30

    An enduring challenge in personalized medicine is to select the right drug for individual patients. Testing drugs on patients in large clinical trials is one way to assess their efficacy and toxicity, but it is impractical to test hundreds of drugs currently under development. Preclinical prediction models are therefore highly desirable, as they enable prediction of drug response for hundreds of cell lines in parallel. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanism of anticancer drugs. To this end, we used gene expression features and drug sensitivity data in the Cancer Cell Line Encyclopedia (CCLE) to build a predictor based on a Support Vector Machine (SVM) and a recursive feature selection tool. Robustness of our model was validated by cross-validation and an independent dataset, the Cancer Genome Project (CGP). Our model achieved good cross-validation performance for most drugs in the Cancer Cell Line Encyclopedia (≥80% accuracy for 10 drugs, ≥75% accuracy for 19 drugs). Independent tests on eleven drugs common to CCLE and CGP achieved satisfactory performance for three of them, i.e., AZD6244, Erlotinib and PD-0325901, using expression levels of only twelve, six and seven genes, respectively. These results suggest that drug response could be effectively predicted from genomic features. Our model could be applied to predict drug response for certain drugs and potentially play a complementary role in personalized medicine.

  10. Reverse engineering the gap gene network of Drosophila melanogaster.

    PubMed

    Perkins, Theodore J; Jaeger, Johannes; Reinitz, John; Glass, Leon

    2006-05-01

    A fundamental problem in functional genomics is to determine the structure and dynamics of genetic networks based on expression data. We describe a new strategy for solving this problem and apply it to recently published data on early Drosophila melanogaster development. Our method is orders of magnitude faster than current fitting methods and allows us to fit different types of rules for expressing regulatory relationships. Specifically, we use our approach to fit models using a smooth nonlinear formalism for modeling gene regulation (gene circuits) as well as models using logical rules based on activation and repression thresholds for transcription factors. Our technique also allows us to infer regulatory relationships de novo or to test network structures suggested by the literature. We fit a series of models to test several outstanding questions about gap gene regulation, including regulation of and by hunchback and the role of autoactivation. Based on our modeling results and validation against the experimental literature, we propose a revised network structure for the gap gene system. Interestingly, some relationships in standard textbook models of gap gene regulation appear to be unnecessary for or even inconsistent with the details of gap gene expression during wild-type development.

  11. Identification and Validation of Reference Genes for RT-qPCR Analysis in Non-Heading Chinese Cabbage Flowers

    PubMed Central

    Wang, Cheng; Cui, Hong-Mi; Huang, Tian-Hong; Liu, Tong-Kun; Hou, Xi-Lin; Li, Ying

    2016-01-01

    Non-heading Chinese cabbage (Brassica rapa ssp. chinensis Makino) is an important vegetable member of Brassica rapa crops. It exhibits a typical sporophytic self-incompatibility (SI) system and is an ideal model plant to explore the mechanism of SI. Gene expression studies are frequently used to unravel the complex genetic mechanism, and in such studies appropriate reference selection is vital. Validation of reference genes has been conducted neither in Brassica rapa flowers nor for the SI trait. In this study, 13 candidate reference genes were selected and examined systematically in 96 non-heading Chinese cabbage flower samples that represent four strategic groups in compatible and self-incompatible lines of non-heading Chinese cabbage. Two RT-qPCR analysis programs, geNorm and NormFinder, were used to evaluate the expression stability of these genes systematically. Results revealed that best-ranked reference genes should be selected according to specific sample subsets. DNAJ, UKN1, and PP2A were identified as the most stable reference genes among all samples. Moreover, our research further revealed that the widely used reference genes, CYP and ACP, were the least suitable reference genes in most non-heading Chinese cabbage flower sample sets. To further validate the suitability of the reference genes identified in this study, the expression levels of the SRK and Exo70A1 genes, which play important roles in regulating the interaction between pollen and stigma, were studied. Our study presented the first systematic study of reference gene selection for SI studies and provided guidelines to obtain more accurate RT-qPCR results in non-heading Chinese cabbage. PMID:27375663
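
    The abstract names geNorm but does not describe its calculation. As a rough illustration, the sketch below computes the geNorm-style stability measure M as it is commonly defined: for each candidate gene, the mean over all other candidates of the standard deviation of the log2 expression ratios across samples (lower M means more stable). The gene names and quantities are made up for illustration and are not taken from the study.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Return a geNorm-style stability value M for each gene.

    expr maps a gene name to a list of relative expression quantities,
    with every list in the same sample order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # log2 ratio of gene j to gene k in every sample
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        # M is the average pairwise variation against all other candidates
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical quantities: refA and refB move together, refC does not.
toy = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],
    "refC": [1.0, 4.0, 2.0, 16.0],
}
m_values = genorm_m(toy)
ranking = sorted(m_values, key=m_values.get)  # most stable first
```

    In this toy input the ratio refA/refB is constant, so those two genes share the lowest M and refC ranks last. The full geNorm procedure also removes the least stable gene stepwise and recomputes M, which this sketch omits.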

  12. The expression dynamics of mechanosensitive genes in extra-embryonic vasculature after heart starts to beat in chick embryo.

    PubMed

    Rajendran, Saranya; Sundaresan, Lakshmikirupa; Rajendran, Krithika; Selvaraj, Monica; Gupta, Ravi; Chatterjee, Suvro

    2016-02-11

    Fluid flow plays an important role in vascular development. However, the detailed mechanisms, particularly the link between flow and modulation of gene expression during vascular development, remain unexplored. In chick embryo, the key events of vascular development from initiation of heart beat to establishment of effective blood flow occur between the stages HH10 and HH13. Therefore, we propose a novel in vivo model to study the flow experienced by developing endothelium. Using this model, we aimed to capture the transcriptome dynamics of the pre- and post-flow conditions. RNA was isolated from extra embryonic area vasculosa (EE-AV) pooled from three chick embryos between HH10-HH13 and RNA sequencing was performed. The whole transcriptome sequencing of chick identified up-regulation of some of the previously well-known mechanosensitive genes including NFR2, HAND1, CTGF and KDR. GO analyses of the up-regulated genes revealed enrichment of several biological processes including heart development, extracellular matrix organization, cell-matrix adhesion, cell migration, blood vessel development, patterning of blood vessels, and collagen fibril organization. Genes encoding gap junction proteins, which are involved in vascular remodeling and arterial-venous differentiation, and genes involved in cell-cell adhesion and ECM interactions were significantly up-regulated. Validation of selected genes through semi quantitative PCR was performed. The study indicates that shear stress plays a major role in development. Through appropriate validation, this platform can serve as an in vivo model to study conditions of disturbed flow in pathology as well as normal flow during development.

  13. RankProd Combined with Genetic Algorithm Optimized Artificial Neural Network Establishes a Diagnostic and Prognostic Prediction Model that Revealed C1QTNF3 as a Biomarker for Prostate Cancer.

    PubMed

    Hou, Qi; Bing, Zhi-Tong; Hu, Cheng; Li, Mao-Yin; Yang, Ke-Hu; Mo, Zu; Xie, Xiang-Wei; Liao, Ji-Lin; Lu, Yan; Horie, Shigeo; Lou, Ming-Wu

    2018-06-01

    Prostate cancer (PCa) is the most commonly diagnosed cancer in males in the Western world. Although prostate-specific antigen (PSA) has been widely used as a biomarker for PCa diagnosis, its results can be controversial. Therefore, new biomarkers are needed to enhance the clinical management of PCa. From publicly available microarray data, differentially expressed genes (DEGs) were identified by meta-analysis with RankProd. Genetic algorithm optimized artificial neural network (GA-ANN) was introduced to establish a diagnostic prediction model and to filter candidate genes. The diagnostic and prognostic capability of the prediction model and candidate genes were investigated in both GEO and TCGA datasets. Candidate genes were further validated by qPCR, Western Blot and Tissue microarray. By RankProd meta-analyses, 2306 significantly up- and 1311 down-regulated probes were found in microarray data from 133 cases and 30 controls. The overall accuracy rate of the PCa diagnostic prediction model, consisting of a 15-gene signature, reached up to 100% in both the training and test datasets. The prediction model also showed good results for the diagnosis (AUC = 0.953) and prognosis (AUC of 5 years overall survival time = 0.808) of PCa in the TCGA database. The expression levels of three genes, FABP5, C1QTNF3 and LPHN3, were validated by qPCR. C1QTNF3 high expression was further validated in PCa tissue by Western Blot and Tissue microarray. In the GEO datasets, C1QTNF3 was a good predictor for the diagnosis of PCa (GSE6956: AUC = 0.791; GSE8218: AUC = 0.868; GSE26910: AUC = 0.972). In the TCGA database, C1QTNF3 was significantly associated with PCa patient recurrence free survival (P < .001, AUC = 0.57). In this study, we have developed a diagnostic and prognostic prediction model for PCa. C1QTNF3 was revealed as a promising biomarker for PCa. This approach can be applied to other high-throughput data from different platforms for the discovery of oncogenes or biomarkers in different kinds of diseases. Copyright © 2018. Published by Elsevier B.V.

  14. Panels of tumor-derived RNA markers in peripheral blood of patients with non-small cell lung cancer: their dependence on age, gender and clinical stages.

    PubMed

    Chian, Chih-Feng; Hwang, Yi-Ting; Terng, Harn-Jing; Lee, Shih-Chun; Chao, Tsui-Yi; Chang, Hung; Ho, Ching-Liang; Wu, Yi-Ying; Perng, Wann-Cherng

    2016-08-02

    Peripheral blood mononuclear cell (PBMC)-derived gene signatures were investigated for their potential use in the early detection of non-small cell lung cancer (NSCLC). Our study included 187 patients with NSCLC, 310 age- and gender-matched controls, and an independent set of 29 patients for validation. Eight significant NSCLC-associated genes were identified: DUSP6, EIF2S3, GRB2, MDM2, NF1, POLDIP2, RNF4, and WEE1. The logistic model containing these significant markers was able to distinguish subjects with NSCLC from controls with excellent performance: 80.7% sensitivity, 90.6% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.924. Random sub-sampling repeated 100 times was used to validate the performance of the classification training models, giving an average AUC of 0.92. Additional validation using the independent set resulted in a sensitivity of 75.86%. Furthermore, six age/gender-dependent genes (CPEB4, EIF2S3, GRB2, MCM4, RNF4, and STAT2) were identified using an age and gender stratification approach. STAT2 and WEE1 were found to be stage-dependent using stage-stratified subpopulations. We conclude that these logistic models using different signatures for total and stratified samples are potential complementary tools for assessing the risk of NSCLC.
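
    For readers less familiar with the metrics quoted above, the following sketch shows how sensitivity, specificity, and AUC can be computed from a classifier's scores. The labels, scores, and the 0.5 cutoff are invented for illustration; they are not data from the study, and the abstract does not state which threshold the authors used.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'case'."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities for 4 cases (1) and 4 controls (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
```

    Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes ranking quality over all cutoffs, which is why the abstract reports both.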

  15. A random variance model for detection of differential gene expression in small microarray experiments.

    PubMed

    Wright, George W; Simon, Richard M

    2003-12-12

    Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since variance estimates made on a gene by gene basis will have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. We propose a model by which the within gene variances are drawn from an inverse gamma distribution, whose parameters are estimated across all genes. This results in a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, while not increasing the rate of false positives. This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf
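
    The abstract does not spell out the test statistic. The sketch below illustrates the general shrinkage idea under a stated assumption: the precision 1/σ² follows a gamma distribution with shape a and scale b, so the prior "typical" variance is 1/(a·b) and the prior contributes 2a extra degrees of freedom. The parameter names, the exact formula, and the numbers are assumptions made for illustration and should be checked against the paper before use.

```python
import math


def shrunken_variance(sample_var, df, a, b):
    """Weighted average of a gene's own variance and the prior variance.

    Assumes 1/sigma^2 ~ Gamma(shape=a, scale=b); the prior acts like
    2*a additional degrees of freedom centred on 1/(a*b).
    """
    prior_var = 1.0 / (a * b)
    return (df * sample_var + 2.0 * a * prior_var) / (df + 2.0 * a)


def moderated_t(mean1, mean2, n1, n2, pooled_var, a, b):
    """Two-sample t-like statistic using the shrunken variance."""
    df = n1 + n2 - 2
    var = shrunken_variance(pooled_var, df, a, b)
    t_stat = (mean1 - mean2) / math.sqrt(var * (1.0 / n1 + 1.0 / n2))
    return t_stat, df + 2.0 * a  # statistic and its degrees of freedom


# Hypothetical gene: 3 vs 3 arrays, a small and probably under-estimated variance.
var = shrunken_variance(sample_var=0.04, df=4, a=2.0, b=5.0)
t_stat, t_df = moderated_t(1.0, 0.0, 3, 3, pooled_var=0.04, a=2.0, b=5.0)
```

    With so few arrays, the gene's own variance estimate (0.04) is pulled toward the prior value (0.1), and the gain in degrees of freedom is where the extra power described in the abstract would come from.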

  16. UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets.

    PubMed

    Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K

    2015-06-04

    Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.

  17. With Reference to Reference Genes: A Systematic Review of Endogenous Controls in Gene Expression Studies.

    PubMed

    Chapman, Joanne R; Waldenström, Jonas

    2015-01-01

    The choice of reference genes that are stably expressed amongst treatment groups is a crucial step in real-time quantitative PCR gene expression studies. Recent guidelines have specified that a minimum of two validated reference genes should be used for normalisation. However, a quantitative review of the literature showed that the average number of reference genes used across all studies was 1.2. Thus, the vast majority of studies continue to use a single gene, with β-actin (ACTB) and/or glyceraldehyde 3-phosphate dehydrogenase (GAPDH) being commonly selected in studies of vertebrate gene expression. Few studies (15%) tested a panel of potential reference genes for stability of expression before using them to normalise data. Amongst studies specifically testing reference gene stability, few found ACTB or GAPDH to be optimal, whereby these genes were significantly less likely to be chosen when larger panels of potential reference genes were screened. Fewer reference genes were tested for stability in non-model organisms, presumably owing to a dearth of available primers in less well characterised species. Furthermore, the experimental conditions under which real-time quantitative PCR analyses were conducted had a large influence on the choice of reference genes, whereby different studies of rat brain tissue showed different reference genes to be the most stable. These results highlight the importance of validating the choice of normalising reference genes before conducting gene expression studies.

  18. Generation and Validation of the iKp1289 Metabolic Model for Klebsiella pneumoniae KPPR1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Henry, Christopher S.; Rotman, Ella; Lathem, Wyndham W.

    Klebsiella pneumoniae has a reputation for causing a wide range of infectious conditions, with numerous highly virulent and antibiotic-resistant strains. Metabolic models have the potential to provide insights into the growth behavior, nutrient requirements, essential genes, and candidate drug targets in these strains. Here we develop a metabolic model for KPPR1, a highly virulent strain of K. pneumoniae. We apply a combination of Biolog phenotype data and fitness data to validate and refine our KPPR1 model. The final model displays a predictive accuracy of 75% in identifying potential carbon and nitrogen sources for K. pneumoniae and of 99% in predicting nonessential genes in rich media. We demonstrate how this model is useful in studying the differences in the metabolic capabilities of the low-virulence MGH 78578 strain and the highly virulent KPPR1 strain. For example, we demonstrate that these strains differ in carbohydrate metabolism, including the ability to metabolize dulcitol as a primary carbon source. Our model makes numerous other predictions for follow-up verification and analysis.

  19. Multi-omics approach identifies molecular mechanisms of plant-fungus mycorrhizal interaction

    DOE PAGES

    Larsen, Peter E.; Sreedasyam, Avinash; Trivedi, Geetika; ...

    2016-01-19

    In mycorrhizal symbiosis, plant roots form close, mutually beneficial interactions with soil fungi. Before this mycorrhizal interaction can be established, however, plant roots must be capable of detecting potential beneficial fungal partners and initiating the gene expression patterns necessary to begin symbiosis. To predict plant root-mycorrhizal fungus sensor systems, we analyzed in vitro experiments of Populus tremuloides (aspen tree) and Laccaria bicolor (mycorrhizal fungus) interaction and leveraged over 200 previously published transcriptomic experimental data sets, 159 experimentally validated plant transcription factor binding motifs, and more than 120-thousand experimentally validated protein-protein interactions to generate models of pre-mycorrhizal sensor systems in aspen root. These sensor mechanisms link extracellular signaling molecules with gene regulation through a network comprised of membrane receptors, signal cascade proteins, transcription factors, and transcription factor binding DNA motifs. Modeling predicted four pre-mycorrhizal sensor complexes in aspen that interact with fifteen transcription factors to regulate the expression of 1184 genes in response to extracellular signals synthesized by Laccaria. Predicted extracellular signaling molecules include common signaling molecules such as phenylpropanoids, salicylate, and jasmonic acid. Lastly, this multi-omic computational modeling approach for predicting the complex sensory networks yielded specific, testable biological hypotheses for mycorrhizal interaction signaling compounds, sensor complexes, and mechanisms of gene regulation.

  20. Multi-omics approach identifies molecular mechanisms of plant-fungus mycorrhizal interaction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, Peter E.; Sreedasyam, Avinash; Trivedi, Geetika

    In mycorrhizal symbiosis, plant roots form close, mutually beneficial interactions with soil fungi. Before this mycorrhizal interaction can be established, however, plant roots must be capable of detecting potential beneficial fungal partners and initiating the gene expression patterns necessary to begin symbiosis. To predict plant root-mycorrhizal fungus sensor systems, we analyzed in vitro experiments of Populus tremuloides (aspen tree) and Laccaria bicolor (mycorrhizal fungus) interaction and leveraged over 200 previously published transcriptomic experimental data sets, 159 experimentally validated plant transcription factor binding motifs, and more than 120-thousand experimentally validated protein-protein interactions to generate models of pre-mycorrhizal sensor systems in aspen root. These sensor mechanisms link extracellular signaling molecules with gene regulation through a network comprised of membrane receptors, signal cascade proteins, transcription factors, and transcription factor binding DNA motifs. Modeling predicted four pre-mycorrhizal sensor complexes in aspen that interact with fifteen transcription factors to regulate the expression of 1184 genes in response to extracellular signals synthesized by Laccaria. Predicted extracellular signaling molecules include common signaling molecules such as phenylpropanoids, salicylate, and jasmonic acid. Lastly, this multi-omic computational modeling approach for predicting the complex sensory networks yielded specific, testable biological hypotheses for mycorrhizal interaction signaling compounds, sensor complexes, and mechanisms of gene regulation.

  1. Short cell-penetrating peptides: a model of interactions with gene promoter sites.

    PubMed

    Khavinson, V Kh; Tarnovskaya, S I; Linkova, N S; Pronyaeva, V E; Shataeva, L K; Yakutseni, P P

    2013-01-01

    Analysis of the main parameters of molecular mechanics (number of hydrogen bonds, hydrophobic and electrostatic interactions, DNA-peptide complex minimization energy) provided the data to validate the previously proposed qualitative models of peptide-DNA interactions and to evaluate their quantitative characteristics. Based on these estimations, a three-dimensional model was built of Lys-Glu and Ala-Glu-Asp-Gly peptide interactions with DNA sites (GCAG and ATTTC) located in the promoter zones of genes encoding CD5, IL-2, MMP2, and Tram1 signal molecules.

  2. Adipose Genes Down-Regulated During Experimental Endotoxemia Are Also Suppressed in Obesity

    PubMed Central

    Hinkle, Christine C.; Haris, Lalarukh; Shah, Rhia; Mehta, Nehal N.; Putt, Mary E.; Reilly, Muredach P.

    2012-01-01

    Context: Adipose inflammation is a crucial link between obesity and its metabolic complications. Human experimental endotoxemia is a controlled model for the study of inflammatory cardiometabolic responses in vivo. Objective: We hypothesized that adipose genes down-regulated during endotoxemia would approximate changes observed with obesity-related inflammation and reveal novel candidates in cardiometabolic disease. Design, Subjects, and Intervention: Healthy volunteers (n = 14) underwent a 3 ng/kg endotoxin challenge; adipose biopsies were taken at 0, 4, 12, and 24 h for mRNA microarray. A priority list of highly down-regulated and biologically relevant genes was validated by RT-PCR in an independent sample of adipose from healthy subjects (n = 7) undergoing a subclinical 0.6 ng/kg endotoxemia protocol. Expression of validated genes was screened in adipose of lean and severely obese individuals (n = 11 per group), and cellular source was probed in cultured adipocytes and macrophages. Results: Endotoxemia (3 ng/kg) suppressed expression of 353 genes (to <67% of baseline; P < 1 × 10⁻⁵) of which 68 candidates were prioritized for validation. In low-dose (0.6 ng/kg) endotoxin validation, 22 (32%) of these 68 genes were confirmed. Functional classification revealed that many of these genes are involved in cell development and differentiation. Of validated genes, 59% (13 of 22) were down-regulated more than 1.5-fold in primary human adipocytes after treatment with endotoxin. In human macrophages, 59% (13 of 22) were up-regulated during differentiation to inflammatory M1 macrophages whereas 64% (14 of 22) were down-regulated during transition to homeostatic M2 macrophages. Finally, in obese vs. lean adipose, 91% (20 of 22) tended to have reduced expression (χ² = 10.72, P < 0.01) with 50% (11 of 22) reaching P < 0.05 (χ² = 9.28, P < 0.01). Conclusions: Exploration of down-regulated mRNA in adipose during human endotoxemia revealed suppression of genes involved in cell development and differentiation. A majority of candidates were also suppressed in endogenous human obesity, suggesting a potential pathophysiological role in human obesity-related adipose inflammation. PMID:22893715

  3. Adipose genes down-regulated during experimental endotoxemia are also suppressed in obesity.

    PubMed

    Shah, Rachana; Hinkle, Christine C; Haris, Lalarukh; Shah, Rhia; Mehta, Nehal N; Putt, Mary E; Reilly, Muredach P

    2012-11-01

    Adipose inflammation is a crucial link between obesity and its metabolic complications. Human experimental endotoxemia is a controlled model for the study of inflammatory cardiometabolic responses in vivo. We hypothesized that adipose genes down-regulated during endotoxemia would approximate changes observed with obesity-related inflammation and reveal novel candidates in cardiometabolic disease. Healthy volunteers (n = 14) underwent a 3 ng/kg endotoxin challenge; adipose biopsies were taken at 0, 4, 12, and 24 h for mRNA microarray. A priority list of highly down-regulated and biologically relevant genes was validated by RT-PCR in an independent sample of adipose from healthy subjects (n = 7) undergoing a subclinical 0.6 ng/kg endotoxemia protocol. Expression of validated genes was screened in adipose of lean and severely obese individuals (n = 11 per group), and cellular source was probed in cultured adipocytes and macrophages. Endotoxemia (3 ng/kg) suppressed expression of 353 genes (to <67% of baseline; P < 1 × 10⁻⁵) of which 68 candidates were prioritized for validation. In low-dose (0.6 ng/kg) endotoxin validation, 22 (32%) of these 68 genes were confirmed. Functional classification revealed that many of these genes are involved in cell development and differentiation. Of validated genes, 59% (13 of 22) were down-regulated more than 1.5-fold in primary human adipocytes after treatment with endotoxin. In human macrophages, 59% (13 of 22) were up-regulated during differentiation to inflammatory M1 macrophages whereas 64% (14 of 22) were down-regulated during transition to homeostatic M2 macrophages. Finally, in obese vs. lean adipose, 91% (20 of 22) tended to have reduced expression (χ² = 10.72, P < 0.01) with 50% (11 of 22) reaching P < 0.05 (χ² = 9.28, P < 0.01). Exploration of down-regulated mRNA in adipose during human endotoxemia revealed suppression of genes involved in cell development and differentiation. A majority of candidates were also suppressed in endogenous human obesity, suggesting a potential pathophysiological role in human obesity-related adipose inflammation.

  4. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    PubMed

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes according to their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of the multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by a first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing real data from an experiment aimed at linking the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
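
    To make the Expectation-Maximization step concrete, here is a deliberately simplified sketch: a univariate Poisson mixture with a fixed number of clusters. It is not the authors' model, since it ignores the multivariate structure, the autoregressive temporal dependence, and the model selection step described above. The counts and starting rates are invented for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing count k at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_mixture(counts, init_rates, n_iter=50):
    """Fit mixture weights and Poisson rates by EM; returns (weights, rates)."""
    rates = list(init_rates)
    n_comp = len(rates)
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each count,
        # computed in log space for numerical stability.
        resp = []
        for x in counts:
            logs = [math.log(w) + poisson_logpmf(x, lam)
                    for w, lam in zip(weights, rates)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update weights and rates from the soft assignments.
        for j in range(n_comp):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty cluster
            weights[j] = nj / len(counts)
            weighted_sum = sum(r[j] * x for r, x in zip(resp, counts))
            rates[j] = max(weighted_sum / nj, 1e-12)
    return weights, rates


# Hypothetical read counts: seven low-expression and five high-expression genes.
counts = [1, 2, 0, 1, 3, 2, 1, 20, 22, 19, 25, 21]
weights, rates = em_poisson_mixture(counts, init_rates=[1.0, 20.0])
```

    On this toy input the two estimated rates settle near the means of the low and high groups. Choosing the number of clusters, which the abstract handles by model selection, would require fitting several values and comparing them with a criterion of the reader's choice.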

  5. Biological Networks for Predicting Chemical Hepatocarcinogenicity Using Gene Expression Data from Treated Mice and Relevance across Human and Rat Species

    PubMed Central

    Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.

    2013-01-01

    Background: Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives: To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods: Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results: Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions: Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and for extrapolating results across multiple species. PMID:23737943

  6. Biological networks for predicting chemical hepatocarcinogenicity using gene expression data from treated mice and relevance across human and rat species.

    PubMed

    Thomas, Reuben; Thomas, Russell S; Auerbach, Scott S; Portier, Christopher J

    2013-01-01

    Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. The objective was to develop a molecular pathway-based prediction model of long-term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to intra-species, dose-dependent and cross-species predictions. Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under the receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and for extrapolating results across multiple species.

  7. Reconstruction of the metabolic network of Pseudomonas aeruginosa to interrogate virulence factor synthesis

    NASA Astrophysics Data System (ADS)

    Bartell, Jennifer A.; Blazier, Anna S.; Yen, Phillip; Thøgersen, Juliane C.; Jelsbak, Lars; Goldberg, Joanna B.; Papin, Jason A.

    2017-03-01

    Virulence-linked pathways in opportunistic pathogens are putative therapeutic targets that may be associated with less potential for resistance than targets in growth-essential pathways. However, efficacy of virulence-linked targets may be affected by the contribution of virulence-related genes to metabolism. We evaluate the complex interrelationships between growth and virulence-linked pathways using a genome-scale metabolic network reconstruction of Pseudomonas aeruginosa strain PA14 and an updated, expanded reconstruction of P. aeruginosa strain PAO1. The PA14 reconstruction accounts for the activity of 112 virulence-linked genes and virulence factor synthesis pathways that produce 17 unique compounds. We integrate eight published genome-scale mutant screens to validate gene essentiality predictions in rich media, contextualize intra-screen discrepancies and evaluate virulence-linked gene distribution across essentiality datasets. Computational screening further elucidates interconnectivity between inhibition of virulence factor synthesis and growth. Successful validation of selected gene perturbations using PA14 transposon mutants demonstrates the utility of model-driven screening of therapeutic targets.

  8. A robust prognostic signature for hormone-positive node-negative breast cancer.

    PubMed

    Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W

    2013-01-01

    Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    RFRS allows flexibility in both the number and identity of genes used, from thousands down to as few as 17 or 8 genes, each with multiple alternates. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.

  9. A robust prognostic signature for hormone-positive node-negative breast cancer

    PubMed Central

    2013-01-01

    Background Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    Conclusions RFRS allows flexibility in both the number and identity of genes used, from thousands down to as few as 17 or 8 genes, each with multiple alternates. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773

  10. A model of gene expression based on random dynamical systems reveals modularity properties of gene regulatory networks.

    PubMed

    Antoneli, Fernando; Ferreira, Renata C; Briones, Marcelo R S

    2016-06-01

    Here we propose a new approach to modeling gene expression based on the theory of random dynamical systems (RDS) that provides a general coupling prescription between the nodes of any given regulatory network, given that the dynamics of each node is modeled by an RDS. The main virtues of this approach are the following: (i) it provides a natural way to obtain arbitrarily large networks by coupling together simple basic pieces, thus revealing the modularity of regulatory networks; (ii) the assumptions about the stochastic processes used in the modeling are fairly general, in the sense that the only requirement is stationarity; (iii) there is a well-developed mathematical theory, a blend of smooth dynamical systems theory, ergodic theory and stochastic analysis, that allows one to extract relevant dynamical and statistical information without solving the system; (iv) one may obtain the classical rate equations from the corresponding stochastic version by averaging the dynamic random variables (small noise limit). It is important to emphasize that, unlike the deterministic case, where coupling two equations is a trivial matter, coupling two RDS is non-trivial, especially in our case, where the coupling is performed between a state variable of one gene and the switching stochastic process of another gene; hence, it is not a priori true that the resulting coupled system will satisfy the definition of a random dynamical system. We shall provide the necessary arguments that ensure that our coupling prescription does indeed furnish a coupled regulatory network of random dynamical systems. Finally, the fact that classical rate equations are the small noise limit of our stochastic model ensures that any validation or prediction made on the basis of the classical theory is also a validation or prediction of our model. We illustrate our framework with some simple examples of single-gene systems and network motifs. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. An 8-gene qRT-PCR-based gene expression score that has prognostic value in early breast cancer

    PubMed Central

    2010-01-01

    Background Gene expression profiling may improve prognostic accuracy in patients with early breast cancer. Our objective was to demonstrate that it is possible to develop a simple molecular signature to predict distant relapse. Methods We included 153 patients with stage I-II hormonal receptor-positive breast cancer. RNA was isolated from formalin-fixed paraffin-embedded samples and qRT-PCR amplification of 83 genes was performed with gene expression assays. The genes we analyzed were those included in the 70-Gene Signature, the Recurrence Score and the Two-Gene Index. The association among gene expression, clinical variables and distant metastasis-free survival was analyzed using Cox regression models. Results An 8-gene prognostic score was defined. Distant metastasis-free survival at 5 years was 97% for patients defined as low-risk by the prognostic score versus 60% for patients defined as high-risk. The 8-gene score remained a significant factor in multivariate analysis and its performance was similar to that of two validated gene profiles: the 70-Gene Signature and the Recurrence Score. The validity of the signature was verified in independent cohorts obtained from the GEO database. Conclusions This study identifies a simple gene expression score that complements histopathological prognostic factors in breast cancer, and can be determined in paraffin-embedded samples. PMID:20584321

  12. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    PubMed

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison with the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  13. Multigene signature for predicting prognosis of patients with 1p19q co-deletion diffuse glioma.

    PubMed

    Hu, Xin; Martinez-Ledesma, Emmanuel; Zheng, Siyuan; Kim, Hoon; Barthel, Floris; Jiang, Tao; Hess, Kenneth R; Verhaak, Roel G W

    2017-06-01

    Co-deletion of 1p and 19q marks a diffuse glioma subtype associated with relatively favorable overall survival; however, heterogeneous clinical outcomes are observed within this category. We assembled gene expression profiles and sample annotation of 374 glioma patients carrying the 1p/19q co-deletion. We predicted 1p/19q status using gene expression when annotation was missing. A first cohort was randomly split into a training dataset (n = 170) and a validation dataset (n = 163). A second validation set consisted of 41 expression profiles. An elastic-net penalized Cox proportional hazards model was applied to build a classifier model through cross-validation within the training dataset. The selected 35-gene signature was used to identify high-risk and low-risk groups in the validation set, which showed significantly different overall survival (P = .00058, log-rank test). For time-to-death events, the high-risk group predicted by the gene signature yielded a hazard ratio of 1.78 (95% confidence interval, 1.02-3.11). The signature was also significantly associated with clinical outcome in The Cancer Genome Atlas (TCGA) IDH-mutant 1p/19q wild-type and IDH-wild-type glioma cohorts. Pathway analysis suggested that high risk was associated with increased acetylation activity and inflammatory response. Tumor purity was found to be significantly decreased in high-risk IDH-mutant with 1p/19q co-deletion gliomas and IDH-wild-type glioblastomas but not in IDH-wild-type lower grade or IDH-mutant, non-co-deleted gliomas. We identified a 35-gene signature that identifies high-risk and low-risk categories of 1p/19q positive glioma patients. We have demonstrated heterogeneity amongst a relatively new glioma subtype and provided a stepping stone towards risk stratification. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved.

  14. Radiogenomics analysis identifies correlations of digital mammography with clinical molecular signatures in breast cancer.

    PubMed

    Tamez-Peña, Jose-Gerardo; Rodriguez-Rojas, Juan-Andrés; Gomez-Rueda, Hugo; Celaya-Padilla, Jose-Maria; Rivera-Prieto, Roxana-Alicia; Palacios-Corona, Rebeca; Garza-Montemayor, Margarita; Cardona-Huerta, Servando; Treviño, Victor

    2018-01-01

    In breast cancer, well-known gene expression subtypes have been related to a specific clinical outcome. However, their impact on the breast tissue phenotype has been poorly studied. Here, we investigate the association of imaging data of tumors to gene expression signatures from 71 patients with breast cancer that underwent pre-treatment digital mammograms and tumor biopsies. From digital mammograms, a semi-automated radiogenomics analysis generated 1,078 features describing the shape, signal distribution, and texture of tumors along their contralateral image used as control. From tumor biopsy, we estimated the OncotypeDX and PAM50 recurrence scores using gene expression microarrays. Then, we used multivariate analysis under stringent cross-validation to train models predicting recurrence scores. Few univariate features reached Spearman correlation coefficients above 0.4. Nevertheless, multivariate analysis yielded significantly correlated models for both signatures (correlation of OncotypeDX = 0.49 ± 0.07 and PAM50 = 0.32 ± 0.10 in stringent cross-validation and OncotypeDX = 0.83 and PAM50 = 0.78 for a unique model). Equivalent models trained from the unaffected contralateral breast were not correlated suggesting that the image signatures were tumor-specific and that overfitting was not a considerable issue. We also noted that models were improved by combining clinical information (triple negative status and progesterone receptor). The models used mostly wavelets and fractal features suggesting their importance to capture tumor information. Our results suggest that molecular-based recurrence risk and breast cancer subtypes have observable radiographic phenotypes. To our knowledge, this is the first study associating mammographic information to gene expression recurrence signatures.
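
    The Spearman coefficients reported above measure how well the ranking of an image feature agrees with the ranking of a recurrence score. The sketch below shows one way to compute such a coefficient (Pearson correlation of average ranks); it is illustrative only, the feature and score values are invented, and the helper names `average_ranks`, `pearson` and `spearman` are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical texture feature and recurrence score for five tumors.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(feature, score)
print(f"Spearman rho = {rho:.2f}")
```

    Because only ranks are used, any monotonic relationship between a feature and a score yields a coefficient of 1, regardless of its shape.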

  15. Radiogenomics analysis identifies correlations of digital mammography with clinical molecular signatures in breast cancer

    PubMed Central

    Tamez-Peña, Jose-Gerardo; Rodriguez-Rojas, Juan-Andrés; Gomez-Rueda, Hugo; Celaya-Padilla, Jose-Maria; Rivera-Prieto, Roxana-Alicia; Palacios-Corona, Rebeca; Garza-Montemayor, Margarita; Cardona-Huerta, Servando

    2018-01-01

    In breast cancer, well-known gene expression subtypes have been related to a specific clinical outcome. However, their impact on the breast tissue phenotype has been poorly studied. Here, we investigate the association of imaging data of tumors to gene expression signatures from 71 patients with breast cancer that underwent pre-treatment digital mammograms and tumor biopsies. From digital mammograms, a semi-automated radiogenomics analysis generated 1,078 features describing the shape, signal distribution, and texture of tumors along their contralateral image used as control. From tumor biopsy, we estimated the OncotypeDX and PAM50 recurrence scores using gene expression microarrays. Then, we used multivariate analysis under stringent cross-validation to train models predicting recurrence scores. Few univariate features reached Spearman correlation coefficients above 0.4. Nevertheless, multivariate analysis yielded significantly correlated models for both signatures (correlation of OncotypeDX = 0.49 ± 0.07 and PAM50 = 0.32 ± 0.10 in stringent cross-validation and OncotypeDX = 0.83 and PAM50 = 0.78 for a unique model). Equivalent models trained from the unaffected contralateral breast were not correlated suggesting that the image signatures were tumor-specific and that overfitting was not a considerable issue. We also noted that models were improved by combining clinical information (triple negative status and progesterone receptor). The models used mostly wavelets and fractal features suggesting their importance to capture tumor information. Our results suggest that molecular-based recurrence risk and breast cancer subtypes have observable radiographic phenotypes. To our knowledge, this is the first study associating mammographic information to gene expression recurrence signatures. PMID:29596496

  16. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer

    PubMed Central

    2015-01-01

    Background microRNA (miRNA) expression plays an influential role in cancer classification and malignancy, and miRNAs are feasible as alternative diagnostic markers for pancreatic cancer, a highly aggressive neoplasm with silent early symptoms, high metastatic potential, and resistance to conventional therapies. Methods In this study, we evaluated the benefits of multi-omics data analysis by integrating miRNA and mRNA expression data in pancreatic cancer. Using support vector machine (SVM) modelling and leave-one-out cross validation (LOOCV), we evaluated the diagnostic performance of single- or multi-markers based on miRNA and mRNA expression profiles from 104 PDAC tissues and 17 benign pancreatic tissues. To select even more reliable and robust markers, we performed validation using independent datasets from the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA) data depositories. For validation, miRNA activity was estimated from miRNA-target gene interaction and mRNA expression datasets in pancreatic cancer. Results Using a comprehensive identification approach, we successfully identified 705 multi-markers having powerful diagnostic performance for PDAC. In addition, these marker candidates were annotated with cancer pathways using gene ontology analysis. Conclusions Our prediction models have strong potential for the diagnosis of pancreatic cancer. PMID:26328610
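
    Leave-one-out cross validation, as mentioned above, holds out each sample in turn, trains on the rest, and scores the held-out prediction. The sketch below illustrates only that loop. As an explicit substitution, it uses a simple nearest-centroid classifier in place of the SVM used in the study, and the two-feature expression values and all function names are invented for the example.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`.

    A stand-in classifier for illustration; the study itself used an SVM.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def loocv_accuracy(xs, ys):
    """Hold out each sample once, train on the others, return the hit rate."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical (miRNA, mRNA) expression pairs for six tissues.
xs = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
ys = ["benign", "benign", "benign", "tumor", "tumor", "tumor"]
accuracy = loocv_accuracy(xs, ys)
print(f"LOOCV accuracy = {accuracy:.2f}")
```

    With strongly imbalanced classes, such as 104 tumor versus 17 benign tissues, plain accuracy can be misleading, so a per-class measure would usually be reported alongside it.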

  17. How to perform RT-qPCR accurately in plant species? A case study on flower colour gene expression in an azalea (Rhododendron simsii hybrids) mapping population.

    PubMed

    De Keyser, Ellen; Desmet, Laurence; Van Bockstaele, Erik; De Riek, Jan

    2013-06-24

    Flower colour variation is one of the most crucial selection criteria in the breeding of a flowering pot plant, as is also the case for azalea (Rhododendron simsii hybrids). Flavonoid biosynthesis was studied intensively in several species. In azalea, flower colour can be described by means of a 3-gene model. However, this model does not clarify pink-coloration. The last decade gene expression studies have been implemented widely for studying flower colour. However, the methods used were often only semi-quantitative or quantification was not done according to the MIQE-guidelines. We aimed to develop an accurate protocol for RT-qPCR and to validate the protocol to study flower colour in an azalea mapping population. An accurate RT-qPCR protocol had to be established. RNA quality was evaluated in a combined approach by means of different techniques e.g. SPUD-assay and Experion-analysis. We demonstrated the importance of testing noRT-samples for all genes under study to detect contaminating DNA. In spite of the limited sequence information available, we prepared a set of 11 reference genes which was validated in flower petals; a combination of three reference genes was most optimal. Finally we also used plasmids for the construction of standard curves. This allowed us to calculate gene-specific PCR efficiencies for every gene to assure an accurate quantification. The validity of the protocol was demonstrated by means of the study of six genes of the flavonoid biosynthesis pathway. No correlations were found between flower colour and the individual expression profiles. However, the combination of early pathway genes (CHS, F3H, F3'H and FLS) is clearly related to co-pigmentation with flavonols. The late pathway genes DFR and ANS are to a minor extent involved in differentiating between coloured and white flowers. Concerning pink coloration, we could demonstrate that the lower intensity in this type of flowers is correlated to the expression of F3'H. 
    Currently in plant research, validated and qualitative RT-qPCR protocols are still rare. The protocol in this study can be implemented on all plant species to assure accurate quantification of gene expression. We have been able to correlate flower colour to the combined regulation of structural genes, both in the early and late branch of the pathway. This allowed us to differentiate between flower colours in a broader genetic background than has been done so far in flower colour studies. These data will now be used for eQTL mapping to comprehend even more the regulation of this pathway.
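
    The gene-specific PCR efficiencies mentioned above are commonly derived from the slope of a standard curve (Cq against log10 of template quantity) using E = 10^(-1/slope) - 1. The sketch below assumes that standard relation; it is not the authors' script, and the plasmid copy numbers, Cq values and function names are invented.

```python
import math


def standard_curve_slope(quantities, cq_values):
    """Least-squares slope of Cq versus log10(template quantity)."""
    logs = [math.log10(q) for q in quantities]
    n = len(logs)
    mean_x = sum(logs) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(logs, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in logs)
    return sxy / sxx


def pcr_efficiency(slope):
    """Amplification efficiency from the slope; 1.0 means perfect doubling."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold plasmid dilution series and measured Cq values.
copies = [1e6, 1e5, 1e4, 1e3, 1e2]
cq = [15.0, 18.4, 21.8, 25.2, 28.6]
slope = standard_curve_slope(copies, cq)
efficiency = pcr_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

    A slope near -3.32 corresponds to 100% efficiency; the invented series here has a slope of -3.4, giving an efficiency a little below 100%.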

  18. How to perform RT-qPCR accurately in plant species? A case study on flower colour gene expression in an azalea (Rhododendron simsii hybrids) mapping population

    PubMed Central

    2013-01-01

    Background Flower colour variation is one of the most crucial selection criteria in the breeding of a flowering pot plant, as is also the case for azalea (Rhododendron simsii hybrids). Flavonoid biosynthesis was studied intensively in several species. In azalea, flower colour can be described by means of a 3-gene model. However, this model does not clarify pink-coloration. The last decade gene expression studies have been implemented widely for studying flower colour. However, the methods used were often only semi-quantitative or quantification was not done according to the MIQE-guidelines. We aimed to develop an accurate protocol for RT-qPCR and to validate the protocol to study flower colour in an azalea mapping population. Results An accurate RT-qPCR protocol had to be established. RNA quality was evaluated in a combined approach by means of different techniques e.g. SPUD-assay and Experion-analysis. We demonstrated the importance of testing noRT-samples for all genes under study to detect contaminating DNA. In spite of the limited sequence information available, we prepared a set of 11 reference genes which was validated in flower petals; a combination of three reference genes was most optimal. Finally we also used plasmids for the construction of standard curves. This allowed us to calculate gene-specific PCR efficiencies for every gene to assure an accurate quantification. The validity of the protocol was demonstrated by means of the study of six genes of the flavonoid biosynthesis pathway. No correlations were found between flower colour and the individual expression profiles. However, the combination of early pathway genes (CHS, F3H, F3'H and FLS) is clearly related to co-pigmentation with flavonols. The late pathway genes DFR and ANS are to a minor extent involved in differentiating between coloured and white flowers. 
    Concerning pink coloration, we could demonstrate that the lower intensity in this type of flowers is correlated to the expression of F3'H. Conclusions Currently in plant research, validated and qualitative RT-qPCR protocols are still rare. The protocol in this study can be implemented on all plant species to assure accurate quantification of gene expression. We have been able to correlate flower colour to the combined regulation of structural genes, both in the early and late branch of the pathway. This allowed us to differentiate between flower colours in a broader genetic background than has been done so far in flower colour studies. These data will now be used for eQTL mapping to comprehend even more the regulation of this pathway. PMID:23800303

  19. Segregation analysis of cryptogenic epilepsy and an empirical test of the validity of the results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ottman, R.; Hauser, W.A.; Barker-Cummings, C.

    1997-03-01

    We used POINTER to perform segregation analysis of cryptogenic epilepsy in 1,557 three-generation families (probands and their parents, siblings, and offspring) ascertained from voluntary organizations. Analysis of the full data set indicated that the data were most consistent with an autosomal dominant (AD) model with 61% penetrance of the susceptibility gene. However, subsequent analyses revealed that the patterns of familial aggregation differed markedly between siblings and offspring of the probands. Risks in siblings were consistent with an autosomal recessive (AR) model and inconsistent with an AD model, whereas risks in offspring were inconsistent with an AR model and more consistent with an AD model. As a further test of the validity of the AD model, we used sequential ascertainment to extend the family history information in the subset of families judged likely to carry the putative susceptibility gene because they contained at least three affected individuals. Prevalence of idiopathic/cryptogenic epilepsy was only 3.7% in newly identified relatives expected to have a 50% probability of carrying the susceptibility gene under an AD model. Approximately 30% (i.e., 50% × 61%) were expected to be affected under the AD model resulting from the segregation analysis. These results suggest that the familial distribution of cryptogenic epilepsy is inconsistent with any conventional genetic model. The differences between siblings and offspring in the patterns of familial risk are intriguing and should be investigated further. 28 refs., 6 tabs.
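
    The empirical test above rests on simple arithmetic: under the AD model, the expected proportion of affected relatives is the carrier probability multiplied by the penetrance. A short worked version follows, using only the figures quoted in the abstract; the variable names are chosen for illustration.

```python
# Figures quoted in the abstract.
carrier_probability = 0.50   # chance a newly identified relative carries the gene
penetrance = 0.61            # probability that a carrier is affected
observed_prevalence = 0.037  # prevalence actually found in those relatives

# Expected proportion affected under the autosomal dominant model.
expected_prevalence = carrier_probability * penetrance

# How many times larger the model's expectation is than the observation.
shortfall_ratio = expected_prevalence / observed_prevalence

print(f"expected {expected_prevalence:.1%}, observed {observed_prevalence:.1%}")
print(f"expectation exceeds observation about {shortfall_ratio:.1f}-fold")
```

    The expectation of roughly 30% is about eight times the observed 3.7%, which is the discrepancy the authors cite against the AD model.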

  20. Synchronous versus asynchronous modeling of gene regulatory networks.

    PubMed

    Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni

    2008-09-01

    In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
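
    The distinction between synchronous and asynchronous transition models can be seen on a toy case. The sketch below is not the ROBDD-based algorithm described above and not the T-helper model; it enumerates, by brute force, a hypothetical two-gene network in which each gene represses the other. All names are invented for the example.

```python
from itertools import product


def rules(state):
    """Target values for a toy mutual-inhibition pair: A = NOT B, B = NOT A."""
    a, b = state
    return (int(not b), int(not a))


def sync_step(state):
    """Synchronous update: every gene takes its target value at once."""
    return rules(state)


def async_successors(state):
    """Asynchronous update: exactly one gene that differs from its target flips.

    A state with no such gene is a steady state and is its own successor.
    """
    target = rules(state)
    successors = set()
    for i in range(len(state)):
        if target[i] != state[i]:
            nxt = list(state)
            nxt[i] = target[i]
            successors.add(tuple(nxt))
    return successors or {state}


def sync_attractors():
    """Follow every start state until a state repeats; collect the cycles."""
    attractors = set()
    for start in product((0, 1), repeat=2):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = sync_step(state)
        attractors.add(frozenset(seen[seen.index(state):]))
    return attractors


attractors = sync_attractors()
for attractor in sorted(attractors, key=lambda a: sorted(a)):
    print(sorted(attractor))
print(async_successors((0, 0)))
```

    In this toy network the synchronous scheme has a two-state cycle between (0, 0) and (1, 1) in addition to the two steady states, whereas under asynchronous updating (0, 0) moves to one of the steady states (1, 0) or (0, 1). This is one reason the choice of update scheme affects which attractors are reported.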

  1. Identification and Validation of Selected Universal Stress Protein Domain Containing Drought-Responsive Genes in Pigeonpea (Cajanus cajan L.)

    PubMed Central

    Sinha, Pallavi; Pazhamala, Lekha T.; Singh, Vikas K.; Saxena, Rachit K.; Krishnamurthy, L.; Azam, Sarwar; Khan, Aamir W.; Varshney, Rajeev K.

    2016-01-01

    Pigeonpea is a resilient crop, which is relatively more drought tolerant than many other legume crops. To understand the molecular mechanisms of this unique feature of pigeonpea, 51 genes that code for proteins with close similarity to the universal stress protein domain were selected using Hidden Markov Models (HMMs). Validation of these genes was conducted on three pigeonpea genotypes (ICPL 151, ICPL 8755, and ICPL 227) having different levels of drought tolerance. Gene expression analysis using qRT-PCR revealed 6, 8, and 18 genes to be ≥2-fold differentially expressed in ICPL 151, ICPL 8755, and ICPL 227, respectively. A total of 10 differentially expressed genes showed ≥2-fold up-regulation in the more drought tolerant genotype, which encoded four different classes of proteins. These include plant U-box protein (four genes), universal stress protein A-like protein (four genes), cation/H(+) antiporter protein (one gene) and an uncharacterized protein (one gene). Genes C.cajan_29830 and C.cajan_33874, belonging to uspA, were found to be significantly expressed in all three genotypes with ≥2-fold expression variations. Expression profiling of these two genes in four other legume crops revealed their specific role in pigeonpea. Therefore, these genes seem to be promising candidates for conferring drought tolerance specifically to pigeonpea. PMID:26779199

  2. Development of the first oligonucleotide microarray for global gene expression profiling in guinea pigs: defining the transcription signature of infectious diseases.

    PubMed

    Jain, Ruchi; Dey, Bappaditya; Tyagi, Anil K

    2012-10-02

    The guinea pig (Cavia porcellus) is one of the most extensively used animal models to study infectious diseases. However, despite its tremendous contribution towards understanding the establishment, progression and control of a number of diseases in general and tuberculosis in particular, the lack of a fully annotated guinea pig genome sequence as well as appropriate molecular reagents has severely hampered detailed genetic and immunological analysis in this animal model. By employing the cross-species hybridization technique, we have developed an oligonucleotide microarray with 44,000 features assembled from different mammalian species, which to the best of our knowledge is the first attempt to employ a microarray to study the global gene expression profile in guinea pigs. To validate and demonstrate the merit of this microarray, we have studied, as an example, the expression profile of guinea pig lungs during the advanced phase of M. tuberculosis infection. A significant upregulation of 1344 genes and a marked downregulation of 1856 genes in the lungs identified a disease signature of pulmonary tuberculosis infection. We report the development of the first comprehensive microarray for studying the global gene expression profile in guinea pigs and validation of its usefulness with tuberculosis as a case study. An important gap in the area of infectious diseases has been addressed and a valuable molecular tool is provided to optimally harness the potential of the guinea pig model to develop better vaccines and therapies against human diseases.

  3. Integrative analysis of micro-RNA, gene expression, and survival of glioblastoma multiforme.

    PubMed

    Huang, Yen-Tsung; Hsu, Thomas; Kelsey, Karl T; Lin, Chien-Ling

    2015-02-01

    Glioblastoma multiforme (GBM), the most common type of malignant brain tumor, is highly fatal. Limited understanding of its rapid progression necessitates additional approaches that integrate what is known about the genomics of this cancer. Using a discovery set (n = 348) and a validation set (n = 174) of GBM patients, we performed genome-wide analyses that integrated mRNA and micro-RNA expression data from GBM as well as associated survival information, assessing coordinated variability in each as this reflects their known mechanistic functions. Cox proportional hazards models were used for the survival analyses, and nonparametric permutation tests were performed for the micro-RNAs to investigate the association between the number of associated genes and its prognostication. We also utilized mediation analyses for micro-RNA-gene pairs to identify their mediation effects. Genome-wide analyses revealed a novel pattern: micro-RNAs related to more gene expressions are more likely to be associated with GBM survival (P = 4.8 × 10(-5)). Genome-wide mediation analyses for the 32,660 micro-RNA-gene pairs with strong association (false discovery rate [FDR] < 0.01%) identified 51 validated pairs with significant mediation effect. Of the 51 pairs, miR-223 had 16 mediation genes. These 16 mediation genes of miR-223 were also highly associated with various other micro-RNAs and mediated their prognostic effects as well. We further constructed a gene signature using the 16 genes, which was highly associated with GBM survival in both the discovery and validation sets (P = 9.8 × 10(-6)). This comprehensive study discovered mediation effects of micro-RNA to gene expression and GBM survival and provided a new analytic framework for integrative genomics. © 2014 WILEY PERIODICALS, INC.

  4. Endogenous Reference Genes and Their Quantitative Real-Time PCR Assays for Genetically Modified Bread Wheat (Triticum aestivum L.) Detection.

    PubMed

    Yang, Litao; Quan, Sheng; Zhang, Dabing

    2017-01-01

    Endogenous reference genes (ERGs) and their derived analytical methods are standard requirements for analysis of genetically modified organisms (GMOs). Development and validation of suitable ERGs is the primary step in establishing assays that monitor the genetically modified (GM) content in food/feed samples. Herein, we review the ERGs currently used for GM wheat analysis, such as ACC1, PKABA1, ALMT1, and Waxy-D1, as well as their performance in GM wheat analysis. We also discuss a model for developing and validating an ideal reference gene for a plant species, based on our previous research work.

  5. Degrees of separation as a statistical tool for evaluating candidate genes.

    PubMed

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available. Copyright © 2014 Elsevier Ltd. All rights reserved.
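
    The core quantity in this abstract, the average degree of separation (DoS), is the mean shortest-path length between gene pairs in a network. The sketch below shows one way to compute it with breadth-first search on an unweighted network; it is not the CandidateBacon package's interface, and the function names and the four-gene toy network are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_dos(graph):
    """Mean degree of separation over all connected, unordered gene pairs."""
    total = 0
    count = 0
    for a in graph:
        for b, d in bfs_distances(graph, a).items():
            if a < b:  # count each unordered pair once
                total += d
                count += 1
    return total / count if count else float("nan")

# Hypothetical toy network: a chain g1 - g2 - g3 - g4.
NETWORK = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
}

network_average = average_dos(NETWORK)
candidate_dos = bfs_distances(NETWORK, "g1")["g4"]
print(network_average, candidate_dos)
```

    A candidate pair whose DoS is well below the network average would count as more closely connected than expected by chance in that network; how the package itself tests this is not specified in the abstract.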

  6. Discovery and validation of a glioblastoma co-expressed gene module

    PubMed Central

    Dunwoodie, Leland J.; Poehlman, William L.; Ficklin, Stephen P.; Feltus, Frank Alexander

    2018-01-01

    Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network. PMID:29541392

  7. Discovery and validation of a glioblastoma co-expressed gene module.

    PubMed

    Dunwoodie, Leland J; Poehlman, William L; Ficklin, Stephen P; Feltus, Frank Alexander

    2018-02-16

    Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.

  8. Inference of Gene Regulatory Networks Using Time-Series Data: A Survey

    PubMed Central

    Sima, Chao; Hua, Jianping; Jung, Sungwon

    2009-01-01

    The advent of high-throughput technology like microarrays has provided the platform for studying how different cellular components work together, thus creating enormous interest in mathematically modeling biological networks, particularly gene regulatory networks (GRNs). Of particular interest is the modeling and inference on time-series data, which capture a more thorough picture of the system than non-temporal data do. We have given an extensive review of methodologies that have been used on time-series data. Recognizing that validation is an integral part of the inference paradigm, we have also presented a discussion on the principles and challenges in performance evaluation of different methods. This survey gives a panoramic view of these topics, in anticipation that readers will be inspired to improve and/or expand the repository of GRN inference and validation tools. PMID:20190956

  9. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    PubMed

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  10. Reverse-engineering the genetic circuitry of a cancer cell with predicted intervention in chronic lymphocytic leukemia.

    PubMed

    Vallat, Laurent; Kemper, Corey A; Jung, Nicolas; Maumy-Bertrand, Myriam; Bertrand, Frédéric; Meyer, Nicolas; Pocheville, Arnaud; Fisher, John W; Gribben, John G; Bahram, Seiamak

    2013-01-08

    Cellular behavior is sustained by genetic programs that are progressively disrupted in pathological conditions--notably, cancer. High-throughput gene expression profiling has been used to infer statistical models describing these cellular programs, and development is now needed to guide orientated modulation of these systems. Here we develop a regression-based model to reverse-engineer a temporal genetic program, based on relevant patterns of gene expression after cell stimulation. This method integrates the temporal dimension of biological rewiring of genetic programs and enables the prediction of the effect of targeted gene disruption at the system level. We tested the performance accuracy of this model on synthetic data before reverse-engineering the response of primary cancer cells to a proliferative (protumorigenic) stimulation in a multistate leukemia biological model (i.e., chronic lymphocytic leukemia). To validate the ability of our method to predict the effects of gene modulation on the global program, we performed an intervention experiment on a targeted gene. Comparison of the predicted and observed gene expression changes demonstrates the possibility of predicting the effects of a perturbation in a gene regulatory network, a first step toward an orientated intervention in a cancer cell genetic program.

  11. Classifying lower grade glioma cases according to whole genome gene expression.

    PubMed

    Chen, Baoshi; Liang, Tingyu; Yang, Pei; Wang, Haoyuan; Liu, Yanwei; Yang, Fan; You, Gan

    2016-11-08

    To identify a gene-based signature as a novel prognostic model in lower grade gliomas. A gene signature developed from HOXA7, SLC2A4RG and MN1 could segregate patients into low and high risk score groups with different overall survival (OS), and was validated in TCGA RNA-seq and GSE16011 mRNA array datasets. Receiver operating characteristic (ROC) was performed to show that the three-gene signature was more sensitive and specific than histology, grade, age, IDH1 mutation and 1p/19q co-deletion. Gene Set Enrichment Analysis (GSEA) and GO analysis showed high-risk samples were associated with tumor associated macrophages (TAMs) and highly invasive phenotypes. Moreover, HOXA7-siRNA inhibited migration and invasion in vitro, and downregulated MMP9 at the protein level in U251 glioma cells. A cohort of 164 glioma specimens from the Chinese Glioma Genome Atlas (CGGA) array database were assessed as the training group. TCGA RNA-seq and GSE16011 mRNA array datasets were used for validation. Regression analyses and linear risk score assessment were performed for the identification of the three-gene signature comprising HOXA7, SLC2A4RG and MN1. We established a three-gene signature for lower grade gliomas, which could independently predict overall survival (OS) of lower grade glioma patients with higher sensitivity and specificity compared with other clinical characteristics. These findings indicate that the three-gene signature is a new prognostic model that could provide improved OS prediction and accurate therapies for lower grade glioma patients.

  12. Modeling mania in preclinical settings: a comprehensive review

    PubMed Central

    Sharma, Ajaykumar N.; Fries, Gabriel R.; Galvez, Juan F.; Valvassori, Samira S.; Soares, Jair C.; Carvalho, André F.; Quevedo, Joao

    2015-01-01

    The current pathophysiological understanding of mechanisms leading to onset and progression of bipolar manic episodes remains limited. At the same time, available animal models for mania have limited face, construct, and predictive validities. Additionally, these models fail to encompass recent pathophysiological frameworks of bipolar disorder (BD), e.g. neuroprogression. Therefore, there is a need to search for novel preclinical models for mania that could comprehensively address these limitations. Herein we review the history, validity, and caveats of currently available animal models for mania. We also review new genetic models for mania, namely knockout mice for genes involved in neurotransmission, synapse formation, and intracellular signaling pathways. Furthermore, we review recent trends in preclinical models for mania that may aid in the comprehension of mechanisms underlying the neuroprogressive and recurring nature of BD. In conclusion, the validity of animal models for mania remains limited. Nevertheless, novel (e.g. genetic) animal models as well as adaptation of existing paradigms hold promise. PMID:26545487

  13. Recursive random forest algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways.

    PubMed

    Deng, Wenping; Zhang, Kui; Busov, Victor; Wei, Hairong

    2017-01-01

    Present knowledge indicates a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, the BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively, with a portion (e.g. 1/10) of the least important TFs being excluded in each round of modeling, during which the importance values of all TFs to the pathway gene were updated and ranked until only one TF remained in the list. This procedure is termed BWERF. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine the TF retention for the regulatory layer immediately above the pathway layer. The acquired TFs at the secondary layer were then set to be the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected layers was obtained. BWERF improved the accuracy for constructing ML-hGRNs because it used backward elimination to exclude the noise genes, and aggregated the individual importance values for determining TF retention. We validated the BWERF by using it for constructing ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, the BWERF can construct ML-hGRNs with significantly reduced edges that enable biologists to choose the implicit edges for experimental validation.
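
    The backward elimination loop described above (score all TFs, drop the least important fraction, re-score, repeat until one TF remains) can be sketched as follows. This is an illustration of the loop only, not the authors' implementation: absolute Pearson correlation is swapped in as a stand-in for random forest importance, so unlike in BWERF the scores do not actually change between rounds. All names and the toy expression values are hypothetical.

```python
def abs_correlation(x, y):
    """Stand-in importance score: |Pearson correlation| (NOT random forest importance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def backward_elimination(tf_expression, target, drop_fraction=0.1):
    """Rank TFs for one pathway gene, most important first.

    Each round re-scores the remaining TFs and removes the lowest-scoring
    fraction (at least one TF), until a single TF is left."""
    remaining = dict(tf_expression)
    eliminated = []
    while len(remaining) > 1:
        scores = {tf: abs_correlation(values, target)
                  for tf, values in remaining.items()}
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - 1)  # always keep one TF
        for tf in sorted(scores, key=scores.get)[:n_drop]:
            eliminated.append(tf)
            del remaining[tf]
    # The survivor ranks first; TFs eliminated later rank above earlier ones.
    return list(remaining) + eliminated[::-1]

# Hypothetical toy data: expression of one pathway gene and three TFs.
pathway_gene = [1, 2, 3, 4, 5]
tfs = {
    "TF_a": [1, 2, 3, 4, 5],
    "TF_b": [2, 1, 4, 3, 5],
    "TF_c": [5, 1, 3, 2, 4],
}
ranking = backward_elimination(tfs, pathway_gene)
print(ranking)
```

    In the method as described, this ranking would be produced per pathway gene and the per-TF importance values then aggregated and fitted to a Gaussian mixture model; that aggregation step is not shown here.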

  14. Using the epigenetic field defect to detect prostate cancer in biopsy negative patients.

    PubMed

    Truong, Matthew; Yang, Bing; Livermore, Andrew; Wagner, Jennifer; Weeratunga, Puspha; Huang, Wei; Dhir, Rajiv; Nelson, Joel; Lin, Daniel W; Jarrard, David F

    2013-06-01

    We determined whether a novel combination of field defect DNA methylation markers could predict the presence of prostate cancer using histologically normal transrectal ultrasound guided biopsy cores. Methylation was assessed using quantitative Pyrosequencing® in a training set consisting of 65 nontumor and tumor associated prostate tissues from University of Wisconsin. A multiplex model was generated using multivariate logistic regression and externally validated in blinded fashion in a set of 47 nontumor and tumor associated biopsy specimens from University of Washington. We observed robust methylation differences in all genes at all CpGs assayed (p <0.0001). Regression models incorporating individual genes (EVX1, CAV1 and FGF1) and a gene combination (EVX1 and FGF1) discriminated nontumor from tumor associated tissues in the original training set (AUC 0.796-0.898, p <0.001). On external validation uniplex models incorporating EVX1, CAV1 or FGF1 discriminated tumor from nontumor associated biopsy negative specimens (AUC 0.702, 0.696 and 0.658, respectively, p <0.05). A multiplex model (EVX1 and FGF1) identified patients with prostate cancer (AUC 0.774, p = 0.001) and had a negative predictive value of 0.909. Comparison between 2 separate cores in patients in this validation set revealed similar methylation defects, indicating detection of a widespread field defect. A widespread epigenetic field defect can be used to detect prostate cancer in patients with histologically negative biopsies. To our knowledge this assay is unique, in that it detects alterations in nontumor cells. With further validation this marker combination (EVX1 and FGF1) has the potential to decrease the need for repeat prostate biopsies, a procedure associated with cost and complications. Copyright © 2013 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  15. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

    PubMed Central

    2014-01-01

    Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
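
    The sensitivity and specificity figures quoted above can be read as simple set ratios. The sketch below assumes the convention common in gene-prediction benchmarking, where "specificity" is the fraction of predicted models that are correct (elsewhere called precision); the abstract does not state which definition SnowyOwl used. It also assumes a prediction counts as correct only on an exact coordinate match. The function name and coordinates are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for exact-match gene models.

    sensitivity = correct predictions / reference genes
    specificity = correct predictions / predicted genes (i.e. precision)
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else float("nan")
    specificity = true_positives / len(predicted) if predicted else float("nan")
    return sensitivity, specificity

# Hypothetical gene models as (start, end) coordinates on one contig.
reference_models = {(100, 500), (800, 1200), (1500, 2000), (2500, 3000)}
predicted_models = {(100, 500), (800, 1200), (1400, 2000)}

sens, spec = gene_level_accuracy(predicted_models, reference_models)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    Here two of four reference genes are recovered and two of three predictions are correct; the third prediction misses only because its start coordinate differs, which shows how strict an exact-match criterion is.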

  16. Twenty-four signature genes predict the prognosis of oral squamous cell carcinoma with high accuracy and repeatability

    PubMed Central

    Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang

    2018-01-01

    Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of the classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in the ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303

  17. Sex-Specific Associations between Particulate Matter Exposure and Gene Expression in Independent Discovery and Validation Cohorts of Middle-Aged Men and Women.

    PubMed

    Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S

    2017-04-01

    Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. The objective was to identify transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2 years were estimated for each participant's residential address using spatiotemporal interpolation in combination with a dispersion model. Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m3 in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects.

  18. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    PubMed Central

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-01-01

    Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). Results 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method. PMID:19193216
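
    The idea of letting identified peptides discriminate between competing gene models at one locus can be sketched with plain substring matching. This is a deliberate simplification: the study identified peptides by MS/MS database searching with Average Peptide Scoring and FDR control, none of which is reproduced here. The function name, protein sequences, and peptides below are hypothetical.

```python
def rank_models_by_peptides(models, peptides):
    """Rank candidate gene models at one locus by peptide support.

    models:   dict mapping model name -> predicted protein sequence
    peptides: iterable of identified peptide sequences
    Returns (name, supporting_peptide_count) pairs, best supported first;
    ties are broken alphabetically by model name.
    """
    peptides = list(peptides)
    support = {name: sum(1 for pep in peptides if pep in sequence)
               for name, sequence in models.items()}
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical locus with two alternative gene models that differ
# in one internal region (e.g. a different predicted exon boundary).
candidate_models = {
    "model_1": "MKTAYIAKQRQISFVK",
    "model_2": "MKTAYIAKQRGGSFVK",
}
identified_peptides = ["TAYIAK", "QISFVK"]

ranking = rank_models_by_peptides(candidate_models, identified_peptides)
print(ranking)
```

    Both peptides match model_1 but only one matches model_2, so the peptide evidence favours model_1 at this locus. A peptide that spans the differing region is what separates the two models; peptides in shared regions cannot.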

  19. Using expression genetics to study the neurobiology of ethanol and alcoholism.

    PubMed

    Farris, Sean P; Wolen, Aaron R; Miles, Michael F

    2010-01-01

    Recent simultaneous progress in human and animal model genetics and the advent of microarray whole genome expression profiling have produced prodigious data sets on genetic loci, potential candidate genes, and differential gene expression related to alcoholism and ethanol behaviors. Validated target genes or gene networks functioning in alcoholism are still of meager proportions. Genetical genomics, which combines genetic analysis of both traditional phenotypes and whole genome expression data, offers a potential methodology for characterizing brain gene networks functioning in alcoholism. This chapter will describe concepts, approaches, and recent findings in the field of genetical genomics as it applies to alcohol research. Copyright 2010 Elsevier Inc. All rights reserved.

  20. Discovering disease-associated genes in weighted protein-protein interaction networks

    NASA Astrophysics Data System (ADS)

    Cui, Ying; Cai, Meng; Stanley, H. Eugene

    2018-04-01

    Although there have been many network-based attempts to discover disease-associated genes, most of them have not taken edge weight - which quantifies their relative strength - into consideration. We use connection weights in a protein-protein interaction (PPI) network to locate disease-related genes. We analyze the topological properties of both weighted and unweighted PPI networks and design an improved random forest classifier to distinguish disease genes from non-disease genes. We use a cross-validation test to confirm that weighted networks are better able to discover disease-associated genes than unweighted networks, which indicates that including link weight in the analysis of network properties provides a better model of complex genotype-phenotype associations.
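
    As a hedged illustration of why edge weights can matter, the sketch below contrasts an unweighted topological property (degree) with its weighted counterpart (strength, the sum of incident edge weights). The protein names and weights are invented, and this is only one of many possible network features; it is not the authors' feature set or classifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, interaction_weight)


def degree_and_strength(edges: Iterable[Edge]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Return per-node degree (edge count) and strength (sum of edge weights)
    for an undirected weighted network."""
    degree: Dict[str, int] = defaultdict(int)
    strength: Dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1
            strength[node] += w
    return dict(degree), dict(strength)


# Hypothetical three-protein network.
toy_edges = [("A", "B", 0.9), ("A", "C", 0.2), ("B", "C", 0.5)]
deg, stren = degree_and_strength(toy_edges)
```

    In this toy network every node has the same degree, so an unweighted analysis cannot tell them apart, while strength ranks B above A above C.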

  1. A Predictive Approach to Network Reverse-Engineering

    NASA Astrophysics Data System (ADS)

    Wiggins, Chris

    2005-03-01

    A central challenge of systems biology is the "reverse engineering" of transcriptional networks: inferring which genes exert regulatory control over which other genes. Attempting such inference at the genomic scale has only recently become feasible, via data-intensive biological innovations such as DNA microarrays ("DNA chips") and the sequencing of whole genomes. In this talk we present a predictive approach to network reverse-engineering, in which we integrate DNA chip data and sequence data to build a model of the transcriptional network of the yeast S. cerevisiae capable of predicting the response of genes in unseen experiments. The technique can also be used to extract "motifs," sequence elements which act as binding sites for regulatory proteins. We validate by a number of approaches and present comparison of theoretical prediction vs. experimental data, along with biological interpretations of the resulting model. En route, we will illustrate some basic notions in statistical learning theory (fitting vs. over-fitting; cross-validation; assessing statistical significance), highlighting ways in which physicists can make a unique contribution in data-driven approaches to reverse engineering.

  2. Construction of a novel multi-gene assay (42-gene classifier) for prediction of late recurrence in ER-positive breast cancer patients.

    PubMed

    Tsunashima, Ryo; Naoi, Yasuto; Shimazu, Kenzo; Kagara, Naofumi; Shimoda, Masashi; Tanei, Tomonori; Miyake, Tomohiro; Kim, Seung Jin; Noguchi, Shinzaburo

    2018-05-04

    Prediction models for late (> 5 years) recurrence in ER-positive breast cancer need to be developed for the accurate selection of patients for extended hormonal therapy. We attempted to develop such a prediction model focusing on the differences in gene expression between breast cancers with early and late recurrence. For the training set, 779 ER-positive breast cancers treated with tamoxifen alone for 5 years were selected from the databases (GSE6532, GSE12093, GSE17705, and GSE26971). For the validation set, 221 ER-positive breast cancers treated with adjuvant hormonal therapy for 5 years with or without chemotherapy at our hospital were included. Gene expression was assayed by DNA microarray analysis (Affymetrix U133 plus 2.0). With the 42 genes differentially expressed in early and late recurrence breast cancers in the training set, a prediction model (42GC) for late recurrence was constructed. The patients classified by 42GC into the late recurrence-like group showed a significantly (P = 0.006) higher late recurrence rate as expected but a significantly (P = 1.62 × 10^-13) lower rate for early recurrence than non-late recurrence-like group. These observations were confirmed for the validation set, i.e., P = 0.020 for late recurrence and P = 5.70 × 10^-5 for early recurrence. We developed a unique prediction model (42GC) for late recurrence by focusing on the biological differences between breast cancers with early and late recurrence. Interestingly, patients in the late recurrence-like group by 42GC were at low risk for early recurrence.

  3. Efficient CRISPR/Cas9-based genome editing in carrot cells.

    PubMed

    Klimek-Chodacka, Magdalena; Oleszkiewicz, Tomasz; Lowder, Levi G; Qi, Yiping; Baranski, Rafal

    2018-04-01

    The first report presenting successful and efficient carrot genome editing using CRISPR/Cas9 system. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas9) is a powerful genome editing tool that has been widely adopted in model organisms recently, but has not been used in carrot, a model species for in vitro culture studies and an important health-promoting crop grown worldwide. In this study, for the first time, we report application of the CRISPR/Cas9 system for efficient targeted mutagenesis of the carrot genome. Multiplexing CRISPR/Cas9 vectors expressing two single-guide RNAs (gRNAs) targeting the carrot flavanone-3-hydroxylase (F3H) gene were tested for blockage of the anthocyanin biosynthesis in a model purple-colored callus using Agrobacterium-mediated genetic transformation. This approach allowed fast and visual comparison of three codon-optimized Cas9 genes and revealed that the most efficient one in generating F3H mutants was the Arabidopsis codon-optimized AteCas9 gene with up to 90% efficiency. Knockout of F3H gene resulted in the discoloration of calli, validating the functional role of this gene in the anthocyanin biosynthesis in carrot as well as providing a visual marker for screening successfully edited events. Most resulting mutations were small indels, but long chromosome fragment deletions of 116-119 nt were also generated with simultaneous cleavage mediated by two gRNAs. The results demonstrate successful site-directed mutagenesis in carrot with CRISPR/Cas9 and the usefulness of a model callus culture to validate genome editing systems. Given that the carrot genome has been sequenced recently, our timely study sheds light on the promising application of genome editing tools for boosting basic and translational research in this important vegetable crop.

  4. Chronic Antibody-Mediated Rejection in Nonhuman Primate Renal Allografts: Validation of Human Histological and Molecular Phenotypes.

    PubMed

    Adam, B A; Smith, R N; Rosales, I A; Matsunami, M; Afzali, B; Oura, T; Cosimi, A B; Kawai, T; Colvin, R B; Mengel, M

    2017-11-01

    Molecular testing represents a promising adjunct for the diagnosis of antibody-mediated rejection (AMR). Here, we apply a novel gene expression platform in sequential formalin-fixed paraffin-embedded samples from nonhuman primate (NHP) renal transplants. We analyzed 34 previously described gene transcripts related to AMR in humans in 197 archival NHP samples, including 102 from recipients that developed chronic AMR, 80 from recipients without AMR, and 15 normal native nephrectomies. Three endothelial genes (VWF, DARC, and CAV1), derived from 10-fold cross-validation receiver operating characteristic curve analysis, demonstrated excellent discrimination between AMR and non-AMR samples (area under the curve = 0.92). This three-gene set correlated with classic features of AMR, including glomerulitis, capillaritis, glomerulopathy, C4d deposition, and DSAs (r = 0.39-0.63, p < 0.001). Principal component analysis confirmed the association between three-gene set expression and AMR and highlighted the ambiguity of v lesions and ptc lesions between AMR and T cell-mediated rejection (TCMR). Elevated three-gene set expression corresponded with the development of immunopathological evidence of rejection and often preceded it. Many recipients demonstrated mixed AMR and TCMR, suggesting that this represents the natural pattern of rejection. These data provide NHP animal model validation of recent updates to the Banff classification including the assessment of molecular markers for diagnosing AMR. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.
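
    The area under the ROC curve reported above can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it that way from two lists of scores. The scores are invented, the function name is hypothetical, and the 10-fold cross-validation step of the study is not reproduced.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair (Mann-Whitney formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical gene-set expression scores for AMR and non-AMR samples.
amr = [0.9, 0.8, 0.4]
non_amr = [0.7, 0.3, 0.2]
auc = roc_auc(amr, non_amr)
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.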

  5. Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes.

    PubMed

    Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark

    2018-05-17

    In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts. © 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Validity of using ad hoc methods to analyze secondary traits in case-control association studies.

    PubMed

    Yung, Godwin; Lin, Xihong

    2016-12-01

    Case-control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost-effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case-control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or the primary disease model following a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, ad hoc methods collectively identified as highly significant (P < 10^-5) single nucleotide polymorphisms from over 10 genes, genes that were identified in previous studies of smoking cessation. © 2016 WILEY PERIODICALS, INC.

  7. Sleeping Beauty transposon system for genetic etiological research and gene therapy of cancers.

    PubMed

    Hou, Xiaomei; Du, Yan; Deng, Yang; Wu, Jianfeng; Cao, Guangwen

    2015-01-01

    Carcinogenesis is etiologically associated with somatic mutations of critical genes. Recently, a number of somatic mutations and key molecules have been found to be involved in functional networks affecting cancer progression. Suitable animal models are required to validate cancer-promoting or -inhibiting capacities of these mutants and molecules. Sleeping Beauty transposon system consists of a transposon that carries gene(s) of interest and a transposase that recognizes, excises, and reinserts genes in given location of the genome. It can create both gain-of-function and loss-of-function mutations, thus being frequently chosen to investigate the etiological mechanisms and gene therapy for cancers in animal models. In this review, we summarized current advances of Sleeping Beauty transposon system in revealing molecular mechanism of cancers and improving gene therapy. Understanding molecular mechanisms by which driver mutations contribute to carcinogenesis and metastasis may pave the way for the development of innovative prophylactic and therapeutic strategies against malignant diseases.

  8. A combined analysis of genome-wide expression profiling of bipolar disorder in human prefrontal cortex.

    PubMed

    Wang, Jinglu; Qu, Susu; Wang, Weixiao; Guo, Liyuan; Zhang, Kunlin; Chang, Suhua; Wang, Jing

    2016-11-01

    Numbers of gene expression profiling studies of bipolar disorder have been published. Besides different array chips and tissues, variety of the data processes in different cohorts aggravated the inconsistency of results of these genome-wide gene expression profiling studies. By searching the gene expression databases, we obtained six data sets for prefrontal cortex (PFC) of bipolar disorder with raw data and combinable platforms. We used standardized pre-processing and quality control procedures to analyze each data set separately and then combined them into a large gene expression matrix with 101 bipolar disorder subjects and 106 controls. A standard linear mixed-effects model was used to calculate the differentially expressed genes (DEGs). Multiple levels of sensitivity analyses and cross validation with genetic data were conducted. Functional and network analyses were carried out on basis of the DEGs. In the result, we identified 198 unique differentially expressed genes in the PFC of bipolar disorder and control. Among them, 115 DEGs were robust to at least three leave-one-out tests or different pre-processing methods; 51 DEGs were validated with genetic association signals. Pathway enrichment analysis showed these DEGs were related with regulation of neurological system, cell death and apoptosis, and several basic binding processes. Protein-protein interaction network further identified one key hub gene. We have contributed the most comprehensive integrated analysis of bipolar disorder expression profiling studies in PFC to date. The DEGs, especially those with multiple validations, may denote a common signature of bipolar disorder and contribute to the pathogenesis of disease. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Gene-Expression Signature Predicts Postoperative Recurrence in Stage I Non-Small Cell Lung Cancer Patients

    PubMed Central

    Lu, Yan; Wang, Liang; Liu, Pengyuan; Yang, Ping; You, Ming

    2012-01-01

    About 30% of stage I non-small cell lung cancer (NSCLC) patients undergoing resection will recur. Robust prognostic markers are required to better manage therapy options. The purpose of this study is to develop and validate a novel gene-expression signature that can predict tumor recurrence of stage I NSCLC patients. Cox proportional hazards regression analysis was performed to identify recurrence-related genes and a partial Cox regression model was used to generate a gene signature of recurrence in the training dataset: 142 stage I lung adenocarcinomas without adjunctive therapy from the Director's Challenge Consortium. Four independent validation datasets, including GSE5843, GSE8894, and two other datasets provided by Mayo Clinic and Washington University, were used to assess the prediction accuracy by calculating the correlation between risk score estimated from gene expression and real recurrence-free survival time and AUC of time-dependent ROC analysis. Pathway-based survival analyses were also performed. 104 probesets correlated with recurrence in the training dataset. They are enriched in cell adhesion, apoptosis and regulation of cell proliferation. A 51-gene expression signature was identified to distinguish patients likely to develop tumor recurrence (Dxy = −0.83, P<1e-16) and this signature was validated in four independent datasets with AUC >85%. Multiple pathways including leukocyte transendothelial migration and cell adhesion were highly correlated with recurrence-free survival. The gene signature is highly predictive of recurrence in stage I NSCLC patients, which has important prognostic and therapeutic implications for the future management of these patients. PMID:22292069

  10. Predicting neuroblastoma using developmental signals and a logic-based model.

    PubMed

    Kasemeier-Kulesa, Jennifer C; Schnell, Santiago; Woolley, Thomas; Spengler, Jennifer A; Morrison, Jason A; McKinney, Mary C; Pushel, Irina; Wolfe, Lauren A; Kulesa, Paul M

    2018-07-01

    Genomic information from human patient samples of pediatric neuroblastoma cancers and known outcomes have led to specific gene lists put forward as high risk for disease progression. However, the reliance on gene expression correlations rather than mechanistic insight has shown limited potential and suggests a critical need for molecular network models that better predict neuroblastoma progression. In this study, we construct and simulate a molecular network of developmental genes and downstream signals in a 6-gene input logic model that predicts a favorable/unfavorable outcome based on the outcome of the four cell states including cell differentiation, proliferation, apoptosis, and angiogenesis. We simulate the mis-expression of the tyrosine receptor kinases, trkA and trkB, two prognostic indicators of neuroblastoma, and find differences in the number and probability distribution of steady state outcomes. We validate the mechanistic model assumptions using RNAseq of the SHSY5Y human neuroblastoma cell line to define the input states and confirm the predicted outcome with antibody staining. Lastly, we apply input gene signatures from 77 published human patient samples and show that our model makes more accurate disease outcome predictions for early stage disease than any current neuroblastoma gene list. These findings highlight the predictive strength of a logic-based model based on developmental genes and offer a better understanding of the molecular network interactions during neuroblastoma disease progression. Copyright © 2018. Published by Elsevier B.V.

  11. Expression analysis in response to drought stress in soybean: Shedding light on the regulation of metabolic pathway genes.

    PubMed

    Guimarães-Dias, Fábia; Neves-Borges, Anna Cristina; Viana, Antonio Americo Barbosa; Mesquita, Rosilene Oliveira; Romano, Eduardo; de Fátima Grossi-de-Sá, Maria; Nepomuceno, Alexandre Lima; Loureiro, Marcelo Ehlers; Alves-Ferreira, Márcio

    2012-06-01

    Metabolomics analysis of wild type Arabidopsis thaliana plants, under control and drought stress conditions revealed several metabolic pathways that are induced under water deficit. The metabolic response to drought stress is also associated with ABA dependent and independent pathways, allowing a better understanding of the molecular mechanisms in this model plant. Through combining an in silico approach and gene expression analysis by quantitative real-time PCR, the present work aims at identifying genes of soybean metabolic pathways potentially associated with water deficit. Digital expression patterns of Arabidopsis genes, which were selected on the basis of literature reports, were evaluated under drought stress condition by Genevestigator. Genes that showed strong induction under drought stress were selected and used as bait to identify orthologs in the soybean genome. This allowed us to select 354 genes of putative soybean orthologs of 79 Arabidopsis genes belonging to 38 distinct metabolic pathways. The expression pattern of the selected genes was verified in the subtractive libraries available in the GENOSOJA project. Subsequently, 13 genes from different metabolic pathways were selected for validation by qPCR experiments. The expression of six genes was validated in plants undergoing drought stress in both pot-based and hydroponic cultivation systems. The results suggest that the metabolic response to drought stress is conserved in Arabidopsis and soybean plants.

  12. A recellularized human colon model identifies cancer driver genes

    PubMed Central

    Chen, Huanhuan Joyce; Wei, Zhubo; Sun, Jian; Bhattacharya, Asmita; Savage, David J; Serda, Rita; Mackeyev, Yuri; Curley, Steven A.; Bu, Pengcheng; Wang, Lihua; Chen, Shuibing; Cohen-Gould, Leona; Huang, Emina; Shen, Xiling; Lipkin, Steven M.; Copeland, Neal G.; Jenkins, Nancy A.; Shuler, Michael L.

    2016-01-01

    Refined cancer models are needed to bridge the gap between cell-line, animal and clinical research. Here we describe the engineering of an organotypic colon cancer model by recellularization of a native human matrix that contains cell-populated mucosa and an intact muscularis mucosa layer. This ex vivo system recapitulates the pathophysiological progression from APC-mutant neoplasia to submucosal invasive tumor. We used it to perform a Sleeping Beauty transposon mutagenesis screen to identify genes that cooperate with mutant APC in driving invasive neoplasia. 38 candidate invasion driver genes were identified, 17 of which have been previously implicated in colorectal cancer progression, including TCF7L2, TWIST2, MSH2, DCC and EPHB1/2. Six invasion driver genes that to our knowledge have not been previously described were validated in vitro using cell proliferation, migration and invasion assays, and ex vivo using recellularized human colon. These results demonstrate the utility of our organoid model for studying cancer biology. PMID:27398792

  13. Major psychological factors affecting acceptance of gene-recombination technology.

    PubMed

    Tanaka, Yutaka

    2004-12-01

    The purpose of this study was to verify the validity of a causal model that was made to predict the acceptance of gene-recombination technology. A structural equation model was used as a causal model. First of all, based on preceding studies, the factors of perceived risk, perceived benefit, and trust were set up as important psychological factors determining acceptance of gene-recombination technology in the structural equation model. An additional factor, "sense of bioethics," which I consider to be important for acceptance of biotechnology, was added to the model. Based on previous studies, trust was set up to have an indirect influence on the acceptance of gene-recombination technology through perceived risk and perceived benefit in the model. Participants were 231 undergraduate students in Japan who answered a questionnaire with a 5-point bipolar scale. The results indicated that the proposed model fits the data well, and showed that acceptance of gene-recombination technology is explained largely by four factors, that is, perceived risk, perceived benefit, trust, and sense of bioethics, whether the technology is applied to plants, animals, or human beings. However, the relative importance of the four factors was found to vary depending on whether the gene-recombination technology was applied to plants, animals, or human beings. Specifically, the factor of sense of bioethics is the most important factor in acceptance of plant gene-recombination technology and animal gene-recombination technology, and the factors of trust and perceived risk are the most important factors in acceptance of human being gene-recombination technology.

  14. Genome-wide transposon mutagenesis of Proteus mirabilis: Essential genes, fitness factors for catheter-associated urinary tract infection, and the impact of polymicrobial infection on fitness requirements.

    PubMed

    Armbruster, Chelsie E; Forsyth-DeOrnellas, Valerie; Johnson, Alexandra O; Smith, Sara N; Zhao, Lili; Wu, Weisheng; Mobley, Harry L T

    2017-06-01

    The Gram-negative bacterium Proteus mirabilis is a leading cause of catheter-associated urinary tract infections (CAUTIs), which are often polymicrobial. Numerous prior studies have uncovered virulence factors for P. mirabilis pathogenicity in a murine model of ascending UTI, but little is known concerning pathogenesis during CAUTI or polymicrobial infection. In this study, we utilized five pools of 10,000 transposon mutants each and transposon insertion-site sequencing (Tn-Seq) to identify the full arsenal of P. mirabilis HI4320 fitness factors for single-species versus polymicrobial CAUTI with Providencia stuartii BE2467. 436 genes in the input pools lacked transposon insertions and were therefore concluded to be essential for P. mirabilis growth in rich medium. 629 genes were identified as P. mirabilis fitness factors during single-species CAUTI. Tn-Seq from coinfection with P. stuartii revealed 217/629 (35%) of the same genes as identified by single-species Tn-Seq, and 1353 additional factors that specifically contribute to colonization during coinfection. Mutants were constructed in eight genes of interest to validate the initial screen: 7/8 (88%) mutants exhibited the expected phenotypes for single-species CAUTI, and 3/3 (100%) validated the expected phenotypes for polymicrobial CAUTI. This approach provided validation of numerous previously described P. mirabilis fitness determinants from an ascending model of UTI, the discovery of novel fitness determinants specifically for CAUTI, and a stringent assessment of how polymicrobial infection influences fitness requirements. For instance, we describe a requirement for branched-chain amino acid biosynthesis by P. mirabilis during coinfection due to high-affinity import of leucine by P. stuartii. Further investigation of genes and pathways that provide a competitive advantage during both single-species and polymicrobial CAUTI will likely provide robust targets for therapeutic intervention to reduce P. mirabilis CAUTI incidence and severity.

  15. Genome-wide transposon mutagenesis of Proteus mirabilis: Essential genes, fitness factors for catheter-associated urinary tract infection, and the impact of polymicrobial infection on fitness requirements

    PubMed Central

    Smith, Sara N.; Zhao, Lili; Wu, Weisheng

    2017-01-01

    The Gram-negative bacterium Proteus mirabilis is a leading cause of catheter-associated urinary tract infections (CAUTIs), which are often polymicrobial. Numerous prior studies have uncovered virulence factors for P. mirabilis pathogenicity in a murine model of ascending UTI, but little is known concerning pathogenesis during CAUTI or polymicrobial infection. In this study, we utilized five pools of 10,000 transposon mutants each and transposon insertion-site sequencing (Tn-Seq) to identify the full arsenal of P. mirabilis HI4320 fitness factors for single-species versus polymicrobial CAUTI with Providencia stuartii BE2467. 436 genes in the input pools lacked transposon insertions and were therefore concluded to be essential for P. mirabilis growth in rich medium. 629 genes were identified as P. mirabilis fitness factors during single-species CAUTI. Tn-Seq from coinfection with P. stuartii revealed 217/629 (35%) of the same genes as identified by single-species Tn-Seq, and 1353 additional factors that specifically contribute to colonization during coinfection. Mutants were constructed in eight genes of interest to validate the initial screen: 7/8 (88%) mutants exhibited the expected phenotypes for single-species CAUTI, and 3/3 (100%) validated the expected phenotypes for polymicrobial CAUTI. This approach provided validation of numerous previously described P. mirabilis fitness determinants from an ascending model of UTI, the discovery of novel fitness determinants specifically for CAUTI, and a stringent assessment of how polymicrobial infection influences fitness requirements. For instance, we describe a requirement for branched-chain amino acid biosynthesis by P. mirabilis during coinfection due to high-affinity import of leucine by P. stuartii. Further investigation of genes and pathways that provide a competitive advantage during both single-species and polymicrobial CAUTI will likely provide robust targets for therapeutic intervention to reduce P. mirabilis CAUTI incidence and severity. PMID:28614382

  16. Boolean Dynamic Modeling Approaches to Study Plant Gene Regulatory Networks: Integration, Validation, and Prediction.

    PubMed

    Velderraín, José Dávila; Martínez-García, Juan Carlos; Álvarez-Buylla, Elena R

    2017-01-01

    Mathematical models based on dynamical systems theory are well-suited tools for the integration of available molecular experimental data into coherent frameworks in order to propose hypotheses about the cooperative regulatory mechanisms driving developmental processes. Computational analysis of the proposed models using well-established methods enables testing the hypotheses by contrasting predictions with observations. Within such framework, Boolean gene regulatory network dynamical models have been extensively used in modeling plant development. Boolean models are simple and intuitively appealing, ideal tools for collaborative efforts between theorists and experimentalists. In this chapter we present protocols used in our group for the study of diverse plant developmental processes. We focus on conceptual clarity and practical implementation, providing directions to the corresponding technical literature.
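
    To make the idea of a Boolean gene regulatory network concrete, here is a minimal sketch of synchronous updating and exhaustive attractor search. The two-gene mutual-repression network is a textbook toy and not a model from the chapter, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]


def step(state: State, rules: List[Rule]) -> State:
    """Synchronous update: every gene is recomputed from the current state."""
    return tuple(rule(state) for rule in rules)


def find_attractors(rules: List[Rule]) -> Set[Tuple[State, ...]]:
    """Follow every initial state until a state repeats, then record the cycle
    (a fixed point is a cycle of length 1) in a rotation-independent form."""
    attractors: Set[Tuple[State, ...]] = set()
    for start in product((0, 1), repeat=len(rules)):
        seen = set()
        state = start
        while state not in seen:
            seen.add(state)
            state = step(state, rules)
        # 'state' is now on the attractor; walk once around the cycle.
        cycle = [state]
        nxt = step(state, rules)
        while nxt != state:
            cycle.append(nxt)
            nxt = step(nxt, rules)
        i = cycle.index(min(cycle))  # rotate so the smallest state comes first
        attractors.add(tuple(cycle[i:] + cycle[:i]))
    return attractors


# Toy network: gene 0 is on iff gene 1 is off, and vice versa.
toy_rules: List[Rule] = [lambda s: int(not s[1]), lambda s: int(not s[0])]
toy_attractors = find_attractors(toy_rules)
```

    Under synchronous updating this toy network has two fixed points, each with one gene on, and one two-state cycle. Exhaustive enumeration is only practical for small networks, since the state space grows as 2^n.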

  17. miRTar2GO: a novel rule-based model learning method for cell line specific microRNA target prediction that integrates Ago2 CLIP-Seq and validated microRNA-target interaction data.

    PubMed

    Ahadi, Alireza; Sablok, Gaurav; Hutvagner, Gyorgy

    2017-04-07

    MicroRNAs (miRNAs) are ∼19-22 nucleotides (nt) long regulatory RNAs that regulate gene expression by recognizing and binding to complementary sequences on mRNAs. The key step in revealing the function of a miRNA, is the identification of miRNA target genes. Recent biochemical advances including PAR-CLIP and HITS-CLIP allow for improved miRNA target predictions and are widely used to validate miRNA targets. Here, we present miRTar2GO, which is a model, trained on the common rules of miRNA-target interactions, Argonaute (Ago) CLIP-Seq data and experimentally validated miRNA target interactions. miRTar2GO is designed to predict miRNA target sites using more relaxed miRNA-target binding characteristics. More importantly, miRTar2GO allows for the prediction of cell-type specific miRNA targets. We have evaluated miRTar2GO against other widely used miRNA target prediction algorithms and demonstrated that miRTar2GO produced significantly higher F1 and G scores. Target predictions, binding specifications, results of the pathway analysis and gene ontology enrichment of miRNA targets are freely available at http://www.mirtar2go.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Biomarkers for Early Detection of Clinically Relevant Prostate Cancer: A Multi-Institutional Validation Trial - Genomic Health, Inc. — EDRN Public Portal

    Cancer.gov

    Validate a panel of tissue-based biomarkers to determine the presence of or progression to clinically relevant prostate cancer at the time of diagnosis. Utilize a novel, biopsy based multi-gene quantitative RT-PCR assay developed by Genomic Health, Oncotype DX Prostate Cancer Assay, which discriminates aggressive from indolent cancer on multivariate modeling of PCa patients.

  19. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. 
Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
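
The collation step described in this abstract (standardize the data sets, then count how often each gene recurs across them) can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the function name, the `min_datasets` threshold, and the toy gene lists are assumptions introduced here.

```python
from collections import Counter

def rank_genes_by_recurrence(datasets, min_datasets=2):
    """Count in how many data sets each gene appears and rank by that count."""
    counts = Counter()
    for genes in datasets.values():
        # Standardize symbols and count each gene at most once per data set.
        counts.update({g.strip().upper() for g in genes})
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n) for gene, n in ranked if n >= min_datasets]

# Hypothetical toy input: placeholder gene names, invented for illustration.
toy = {
    "study_a": ["GENE_A", "GENE_B", "GENE_C"],
    "study_b": ["gene_a", "GENE_B ", "GENE_D"],
    "study_c": ["GENE_A", "GENE_E"],
}
candidates = rank_genes_by_recurrence(toy)
print(candidates)
```

Counting each gene once per data set keeps a single large study from dominating the ranking, which matches the stated assumption that recurrence across studies signals importance.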

  20. Sex-Specific Associations between Particulate Matter Exposure and Gene Expression in Independent Discovery and Validation Cohorts of Middle-Aged Men and Women

    PubMed Central

    Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M.; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S.

    2016-01-01

    Background: Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. Objectives: Identification of transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Methods: Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2 years were estimated for each participant’s residential address using spatiotemporal interpolation in combination with a dispersion model. Results: Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m³ in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Conclusions: Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area.
Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects. Citation: Vrijens K, Winckelmans E, Tsamou M, Baeyens W, De Boever P, Jennen D, de Kok TM, Den Hond E, Lefebvre W, Plusquin M, Reynders H, Schoeters G, Van Larebeke N, Vanpoucke C, Kleinjans J, Nawrot TS. 2017. Sex-specific associations between particulate matter exposure and gene expression in independent discovery and validation cohorts of middle-aged men and women. Environ Health Perspect 125:660–669; http://dx.doi.org/10.1289/EHP370 PMID:27740511

  1. Identification and validation of single nucleotide polymorphisms in growth- and maturation-related candidate genes in sole (Solea solea L.).

    PubMed

    Diopere, Eveline; Hellemans, Bart; Volckaert, Filip A M; Maes, Gregory E

    2013-03-01

    Genomic methodologies applied in evolutionary and fisheries research have been of great benefit to understand the marine ecosystem and the management of natural resources. Although single nucleotide polymorphisms (SNPs) are attractive for the study of local adaptation, spatial stock management and traceability, and investigating the effects of fisheries-induced selection, they have rarely been exploited in non-model organisms. This is partly due to difficulties in finding and validating SNPs in species with limited or no genomic resources. Complementary to random genome-scan approaches, a targeted candidate gene approach has the potential to unveil pre-selected functional diversity and provides more in-depth information on the action of selection at specific genes. For example, genes can be under selective pressure due to climate change and sustained periods of heavy fishing pressure. In this study, we applied a candidate gene approach in sole (Solea solea L.), an important member of the demersal ecosystem. As a consumption flatfish, it is heavily exploited and has experienced associated life-history changes over the last 60 years. To discover novel genetic polymorphisms in or around genes linked to important life history traits in sole, we screened a total of 76 candidate genes related to growth and maturation using a targeted resequencing approach. We identified in total 86 putative SNPs in 22 genes and validated 29 SNPs using a multiplex single-base extension genotyping assay. We found 22 informative SNPs, of which two represent non-synonymous mutations, potentially of functional relevance. These novel markers should be rapidly and broadly applicable in analyses of natural sole populations, as a measure of the evolutionary signature of overfishing and for initiatives on marker assisted selection. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Expression analysis in a rat psychosis model identifies novel candidate genes validated in a large case–control sample of schizophrenia

    PubMed Central

    Ingason, A; Giegling, I; Hartmann, A M; Genius, J; Konte, B; Friedl, M; Ripke, S; Sullivan, P F; St. Clair, D; Collier, D A; O'Donovan, M C; Mirnics, K; Rujescu, D

    2015-01-01

    Antagonists of the N-methyl-D-aspartate (NMDA)-type glutamate receptor induce psychosis in healthy individuals and exacerbate schizophrenia symptoms in patients. In this study we have produced an animal model of NMDA receptor hypofunction by chronically treating rats with low doses of the NMDA receptor antagonist MK-801. Subsequently, we performed an expression study and identified 20 genes showing altered expression in the brain of these rats compared with untreated animals. We then explored whether the human orthologs of these genes are associated with schizophrenia in the largest schizophrenia genome-wide association study published to date, and found evidence for association for 4 out of the 20 genes: SF3B1, FOXP1, DLG2 and VGLL4. Interestingly, three of these genes, FOXP1, SF3B1 and DLG2, have previously been implicated in neurodevelopmental disorders. PMID:26460480

  3. Expression analysis in a rat psychosis model identifies novel candidate genes validated in a large case-control sample of schizophrenia.

    PubMed

    Ingason, A; Giegling, I; Hartmann, A M; Genius, J; Konte, B; Friedl, M; Ripke, S; Sullivan, P F; St Clair, D; Collier, D A; O'Donovan, M C; Mirnics, K; Rujescu, D

    2015-10-13

    Antagonists of the N-methyl-D-aspartate (NMDA)-type glutamate receptor induce psychosis in healthy individuals and exacerbate schizophrenia symptoms in patients. In this study we have produced an animal model of NMDA receptor hypofunction by chronically treating rats with low doses of the NMDA receptor antagonist MK-801. Subsequently, we performed an expression study and identified 20 genes showing altered expression in the brain of these rats compared with untreated animals. We then explored whether the human orthologs of these genes are associated with schizophrenia in the largest schizophrenia genome-wide association study published to date, and found evidence for association for 4 out of the 20 genes: SF3B1, FOXP1, DLG2 and VGLL4. Interestingly, three of these genes, FOXP1, SF3B1 and DLG2, have previously been implicated in neurodevelopmental disorders.

  4. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    Background Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model’s derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. 
Conclusion Significant advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database. PMID:24974895

  5. Prediction of gene expression with cis-SNPs using mixed models and regularization methods.

    PubMed

    Zeng, Ping; Zhou, Xiang; Huang, Shuiping

    2017-05-11

    It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data. The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to the R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, and also did not see any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures since Lasso, ENET and BSLMM outperformed LMM for these genes; and this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.

  6. Establishment and Validation of RNA-Based Predictive Models for Understanding Survival of Vibrio parahaemolyticus in Oysters Stored at Low Temperatures

    PubMed Central

    Liao, Chao; Zhao, Yong

    2017-01-01

    ABSTRACT This study developed RNA-based predictive models describing the survival of Vibrio parahaemolyticus in Eastern oysters (Crassostrea virginica) during storage at 0, 4, and 10°C. Postharvested oysters were inoculated with a cocktail of five V. parahaemolyticus strains and were then stored at 0, 4, and 10°C for 21 or 11 days. A real-time reverse transcription-PCR (RT-PCR) assay targeting expression of the tlh gene was used to evaluate the number of surviving V. parahaemolyticus cells, which was then used to establish primary molecular models (MMs). Before construction of the MMs, consistent expression levels of the tlh gene at 0, 4, and 10°C were confirmed, and this gene was used to monitor the survival of the total V. parahaemolyticus cells. In addition, the tdh and trh genes were used for monitoring the survival of virulent V. parahaemolyticus. Traditional models (TMs) were built based on data collected using a plate counting method. From the MMs, V. parahaemolyticus populations had decreased 0.493, 0.362, and 0.238 log10 CFU/g by the end of storage at 0, 4, and 10°C, respectively. Rates of reduction of V. parahaemolyticus shown in the TMs were 2.109, 1.579, and 0.894 log10 CFU/g for storage at 0, 4, and 10°C, respectively. Bacterial inactivation rates (IRs) estimated with the TMs (−0.245, −0.152, and −0.121 log10 CFU/day, respectively) were higher than those estimated with the MMs (−0.134, −0.0887, and −0.0732 log10 CFU/day, respectively) for storage at 0, 4, and 10°C. Higher viable V. parahaemolyticus numbers were predicted using the MMs than using the TMs. On the basis of this study, RNA-based predictive MMs are the more accurate and reliable models and can prevent false-negative results compared to TMs. IMPORTANCE One important method for validating postharvest techniques and for monitoring the behavior of V. parahaemolyticus is to establish predictive models. 
Unfortunately, previous predictive models established based on plate counting methods or on DNA-based PCR can underestimate or overestimate the number of surviving cells. This study developed and validated RNA-based molecular predictive models to describe the survival of V. parahaemolyticus in oysters during low-temperature storage (0, 4, and 10°C). The RNA-based predictive models show the advantage of being able to count all of the culturable, nonculturable, and stressed cells. By using primers targeting the tlh gene and pathogenesis-associated genes (tdh and trh), real-time RT-PCR can evaluate the total surviving V. parahaemolyticus population as well as differentiate the pathogenic ones from the total population. Reliable and accurate predictive models are very important for conducting risk assessment and management of pathogens in food. PMID:28087532
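
The inactivation rates quoted in this abstract are slopes in log10 CFU per day. As a rough illustration of how such a rate can be estimated, the sketch below fits a straight line to log10 counts over storage time by ordinary least squares. It assumes a log-linear primary model, which the abstract does not specify; the function name and the data points are hypothetical and are not taken from the study.

```python
def fit_inactivation_rate(days, log10_counts):
    """Least-squares slope and intercept of log10 count versus storage time."""
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(log10_counts) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, log10_counts))
    slope = sxy / sxx                      # log10 CFU per day (negative = die-off)
    intercept = mean_y - slope * mean_t    # estimated log10 count at day 0
    return slope, intercept

# Hypothetical measurements, constructed to lie on a line for clarity.
days = [0, 7, 14, 21]
log10_counts = [5.0, 4.3, 3.6, 2.9]
rate, start = fit_inactivation_rate(days, log10_counts)
print(round(rate, 3), round(start, 3))
```

A more negative slope means faster inactivation, which is the sense in which the plate-count rates above are described as higher than the RNA-based ones.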

  7. Establishment and Validation of RNA-Based Predictive Models for Understanding Survival of Vibrio parahaemolyticus in Oysters Stored at Low Temperatures.

    PubMed

    Liao, Chao; Zhao, Yong; Wang, Luxin

    2017-03-15

    This study developed RNA-based predictive models describing the survival of Vibrio parahaemolyticus in Eastern oysters (Crassostrea virginica) during storage at 0, 4, and 10°C. Postharvested oysters were inoculated with a cocktail of five V. parahaemolyticus strains and were then stored at 0, 4, and 10°C for 21 or 11 days. A real-time reverse transcription-PCR (RT-PCR) assay targeting expression of the tlh gene was used to evaluate the number of surviving V. parahaemolyticus cells, which was then used to establish primary molecular models (MMs). Before construction of the MMs, consistent expression levels of the tlh gene at 0, 4, and 10°C were confirmed, and this gene was used to monitor the survival of the total V. parahaemolyticus cells. In addition, the tdh and trh genes were used for monitoring the survival of virulent V. parahaemolyticus. Traditional models (TMs) were built based on data collected using a plate counting method. From the MMs, V. parahaemolyticus populations had decreased 0.493, 0.362, and 0.238 log10 CFU/g by the end of storage at 0, 4, and 10°C, respectively. Rates of reduction of V. parahaemolyticus shown in the TMs were 2.109, 1.579, and 0.894 log10 CFU/g for storage at 0, 4, and 10°C, respectively. Bacterial inactivation rates (IRs) estimated with the TMs (-0.245, -0.152, and -0.121 log10 CFU/day, respectively) were higher than those estimated with the MMs (-0.134, -0.0887, and -0.0732 log10 CFU/day, respectively) for storage at 0, 4, and 10°C. Higher viable V. parahaemolyticus numbers were predicted using the MMs than using the TMs. On the basis of this study, RNA-based predictive MMs are the more accurate and reliable models and can prevent false-negative results compared to TMs. IMPORTANCE One important method for validating postharvest techniques and for monitoring the behavior of V. parahaemolyticus is to establish predictive models.
Unfortunately, previous predictive models established based on plate counting methods or on DNA-based PCR can underestimate or overestimate the number of surviving cells. This study developed and validated RNA-based molecular predictive models to describe the survival of V. parahaemolyticus in oysters during low-temperature storage (0, 4, and 10°C). The RNA-based predictive models show the advantage of being able to count all of the culturable, nonculturable, and stressed cells. By using primers targeting the tlh gene and pathogenesis-associated genes (tdh and trh), real-time RT-PCR can evaluate the total surviving V. parahaemolyticus population as well as differentiate the pathogenic ones from the total population. Reliable and accurate predictive models are very important for conducting risk assessment and management of pathogens in food. Copyright © 2017 American Society for Microbiology.

  8. Prediction of complicated disease course for children newly diagnosed with Crohn's disease: a multicentre inception cohort study.

    PubMed

    Kugathasan, Subra; Denson, Lee A; Walters, Thomas D; Kim, Mi-Ok; Marigorta, Urko M; Schirmer, Melanie; Mondal, Kajari; Liu, Chunyan; Griffiths, Anne; Noe, Joshua D; Crandall, Wallace V; Snapper, Scott; Rabizadeh, Shervin; Rosh, Joel R; Shapiro, Jason M; Guthery, Stephen; Mack, David R; Kellermayer, Richard; Kappelman, Michael D; Steiner, Steven; Moulton, Dedrick E; Keljo, David; Cohen, Stanley; Oliva-Hemker, Maria; Heyman, Melvin B; Otley, Anthony R; Baker, Susan S; Evans, Jonathan S; Kirschner, Barbara S; Patel, Ashish S; Ziring, David; Trapnell, Bruce C; Sylvester, Francisco A; Stephens, Michael C; Baldassano, Robert N; Markowitz, James F; Cho, Judy; Xavier, Ramnik J; Huttenhower, Curtis; Aronow, Bruce J; Gibson, Greg; Hyams, Jeffrey S; Dubinsky, Marla C

    2017-04-29

    Stricturing and penetrating complications account for substantial morbidity and health-care costs in paediatric and adult onset Crohn's disease. Validated models to predict risk for complications are not available, and the effect of treatment on risk is unknown. We did a prospective inception cohort study of paediatric patients with newly diagnosed Crohn's disease at 28 sites in the USA and Canada. Genotypes, antimicrobial serologies, ileal gene expression, and ileal, rectal, and faecal microbiota were assessed. A competing-risk model for disease complications was derived and validated in independent groups. Propensity-score matching tested the effect of anti-tumour necrosis factor α (TNFα) therapy exposure within 90 days of diagnosis on complication risk. Between Nov 1, 2008, and June 30, 2012, we enrolled 913 patients, 78 (9%) of whom experienced Crohn's disease complications. The validated competing-risk model included age, race, disease location, and antimicrobial serologies and provided a sensitivity of 66% (95% CI 51-82) and specificity of 63% (55-71), with a negative predictive value of 95% (94-97). Patients who received early anti-TNFα therapy were less likely to have penetrating complications (hazard ratio [HR] 0·30, 95% CI 0·10-0·89; p=0·0296) but not stricturing complication (1·13, 0·51-2·51; 0·76) than were those who did not receive early anti-TNFα therapy. Ruminococcus was implicated in stricturing complications and Veillonella in penetrating complications. Ileal genes controlling extracellular matrix production were upregulated at diagnosis, and this gene signature was associated with stricturing in the risk model (HR 1·70, 95% CI 1·12-2·57; p=0·0120). When this gene signature was included, the model's specificity improved to 71%. Our findings support the usefulness of risk stratification of paediatric patients with Crohn's disease at diagnosis, and selection of anti-TNFα therapy. 
Crohn's and Colitis Foundation of America, Cincinnati Children's Hospital Research Foundation Digestive Health Center. Copyright © 2017 Elsevier Ltd. All rights reserved.
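
Sensitivity, specificity, and negative predictive value, as reported for the risk model above, all derive from the same 2x2 table of predictions against outcomes. The sketch below shows the arithmetic in Python; the counts are hypothetical and chosen only to illustrate the formulas, not to reproduce the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard 2x2 diagnostic summaries from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # complications correctly flagged
        "specificity": tn / (tn + fp),  # uncomplicated cases correctly cleared
        "npv": tn / (tn + fn),          # share of negative calls that are right
    }

# Hypothetical counts for a rare outcome (60 events among 660 patients).
m = diagnostic_metrics(tp=40, fp=200, tn=400, fn=20)
print({name: round(value, 3) for name, value in m.items()})
```

When the outcome is uncommon, the negative predictive value can be high even though sensitivity and specificity are only moderate, because most negative calls fall on patients who were never going to have the outcome.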

  9. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis.

    PubMed

    Kim, SungHwan; Lin, Chien-Wei; Tseng, George C

    2016-07-01

    Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of single expression profile, the performance usually greatly reduces in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical values of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies. We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients. An R package MetaKTSP is available online (http://tsenglab.biostat.pitt.edu/software.htm). Contact: ctseng@pitt.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
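
The rank-based idea behind top scoring pairs, and the "averaging TSP scores" variant mentioned above, can be sketched in a few lines. The code below uses the commonly cited TSP score, the absolute difference between classes in the frequency with which gene i is expressed below gene j, and averages it over studies. It is a simplified Python illustration, not the MetaKTSP R package: the function names and the toy data are assumptions, and only a single best pair is selected rather than the top K.

```python
from itertools import combinations

def tsp_score(samples, labels, i, j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for one gene pair."""
    freq = []
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        freq.append(sum(r[i] < r[j] for r in rows) / len(rows))
    return abs(freq[0] - freq[1])

def top_pair_by_mean_score(studies, n_genes):
    """Pick the gene pair whose TSP score, averaged over studies, is highest."""
    def mean_score(pair):
        return sum(tsp_score(x, y, *pair) for x, y in studies) / len(studies)
    return max(combinations(range(n_genes), 2), key=mean_score)

# Hypothetical toy data: two studies on different scales, three genes each.
study1 = ([[1, 5, 4], [2, 6, 9], [7, 2, 8], [8, 1, 4]], [0, 0, 1, 1])
study2 = ([[10, 50, 30], [60, 20, 30]], [0, 1])
best = top_pair_by_mean_score([study1, study2], n_genes=3)
print(best)
```

Because the score depends only on within-sample orderings, the two toy studies can sit on different measurement scales without any cross-study normalization.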

  10. Empirical validation of landscape resistance models: insights from the Greater Sage-Grouse (Centrocercus urophasianus)

    Treesearch

    Andrew J. Shirk; Michael A. Schroeder; Leslie A. Robb; Samuel A. Cushman

    2015-01-01

    The ability of landscapes to impede species’ movement or gene flow may be quantified by resistance models. Few studies have assessed the performance of resistance models parameterized by expert opinion. In addition, resistance models differ in terms of spatial and thematic resolution as well as their focus on the ecology of a particular species or more generally on the...

  11. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling.

    PubMed

    Wu, Hulin; Lu, Tao; Xue, Hongqi; Liang, Hua

    2014-04-02

    The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.
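
The contrast this abstract draws between linear and sparse additive ODE models can be written out as below. The notation is an assumption made here for illustration (p genes, expression level X_k(t) for gene k), and the abstract itself does not give the equations. In the linear model each regulator contributes a constant-coefficient term; in the additive model each contribution is an unknown smooth function, and sparsity means most of those functions are identically zero.

```latex
% Linear ODE model: regulation effects are linear in the regulators.
\frac{dX_k(t)}{dt} = \beta_{k0} + \sum_{j=1}^{p} \beta_{kj}\, X_j(t),
\qquad k = 1, \dots, p

% Sparse additive ODE model: each effect f_{kj} is an unknown smooth
% function, and only a few f_{kj} are nonzero for any given gene k.
\frac{dX_k(t)}{dt} = \mu_k + \sum_{j=1}^{p} f_{kj}\bigl(X_j(t)\bigr),
\qquad k = 1, \dots, p
```

Under this reading, a group LASSO penalty is a natural fit because each function f_{kj} is typically expanded in a set of basis coefficients that should be kept or dropped together.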

  12. Ezrin Inhibition Up-regulates Stress Response Gene Expression*

    PubMed Central

    Çelik, Haydar; Bulut, Gülay; Han, Jenny; Graham, Garrett T.; Minas, Tsion Z.; Conn, Erin J.; Hong, Sung-Hyeok; Pauly, Gary T.; Hayran, Mutlu; Li, Xin; Özdemirli, Metin; Ayhan, Ayşe; Rudek, Michelle A.; Toretsky, Jeffrey A.; Üren, Aykut

    2016-01-01

    Ezrin is a member of the ERM (ezrin/radixin/moesin) family of proteins that links cortical cytoskeleton to the plasma membrane. High expression of ezrin correlates with poor prognosis and metastasis in osteosarcoma. In this study, to uncover specific cellular responses evoked by ezrin inhibition that can be used as a specific pharmacodynamic marker(s), we profiled global gene expression in osteosarcoma cells after treatment with small molecule ezrin inhibitors, NSC305787 and NSC668394. We identified and validated several up-regulated integrated stress response genes including PTGS2, ATF3, DDIT3, DDIT4, TRIB3, and ATF4 as novel ezrin-regulated transcripts. Analysis of transcriptional response in skin and peripheral blood mononuclear cells from NSC305787-treated mice compared with a control group revealed that, among those genes, the stress gene DDIT4/REDD1 may be used as a surrogate pharmacodynamic marker of ezrin inhibitor compound activity. In addition, we validated the anti-metastatic effects of NSC305787 in reducing the incidence of lung metastasis in a genetically engineered mouse model of osteosarcoma and evaluated the pharmacokinetics of NSC305787 and NSC668394 in mice. In conclusion, our findings suggest that cytoplasmic ezrin, previously considered a dormant and inactive protein, has important functions in regulating gene expression that may result in down-regulation of stress response genes. PMID:27137931

  13. Tumour gene expression predicts response to cetuximab in patients with KRAS wild-type metastatic colorectal cancer.

    PubMed

    Baker, J B; Dutta, D; Watson, D; Maddala, T; Munneke, B M; Shak, S; Rowinsky, E K; Xu, L-A; Harbison, C T; Clark, E A; Mauro, D J; Khambata-Ford, S

    2011-02-01

    Although it is accepted that metastatic colorectal cancers (mCRCs) that carry activating mutations in KRAS are unresponsive to anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, a significant fraction of KRAS wild-type (wt) mCRCs are also unresponsive to anti-EGFR therapy. Genes encoding EGFR ligands amphiregulin (AREG) and epiregulin (EREG) are promising gene expression-based markers but have not been incorporated into a test to dichotomise KRAS wt mCRC patients with respect to sensitivity to anti-EGFR treatment. We used RT-PCR to test 110 candidate gene expression markers in primary tumours from 144 KRAS wt mCRC patients who received monotherapy with the anti-EGFR antibody cetuximab. Results were correlated with multiple clinical endpoints: disease control, objective response, and progression-free survival (PFS). Expression of many of the tested candidate genes, including EREG and AREG, strongly associate with all clinical endpoints. Using multivariate analysis with two-layer five-fold cross-validation, we constructed a four-gene predictive classifier. Strikingly, patients below the classifier cutpoint had PFS and disease control rates similar to those of patients with KRAS mutant mCRC. Gene expression appears to identify KRAS wt mCRC patients who receive little benefit from cetuximab. It will be important to test this model in an independent validation study.
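
The abstract does not spell out what "two-layer five-fold cross-validation" involved. One common reading is nested cross-validation, in which an outer loop holds out test folds and an inner loop, run only on the remaining samples, is used for model selection. The sketch below generates index splits under that assumption; the function names and the sample count are hypothetical.

```python
def k_folds(indices, k):
    """Split indices into k roughly equal folds (no shuffling, for clarity)."""
    return [indices[i::k] for i in range(k)]

def nested_cv_splits(n_samples, k_outer=5, k_inner=5):
    """Yield (outer_test, inner_train, inner_validation) index lists."""
    all_idx = list(range(n_samples))
    for outer_test in k_folds(all_idx, k_outer):
        held_out = set(outer_test)
        outer_train = [i for i in all_idx if i not in held_out]
        # Inner folds are drawn only from the outer training samples.
        for inner_val in k_folds(outer_train, k_inner):
            val = set(inner_val)
            inner_train = [i for i in outer_train if i not in val]
            yield outer_test, inner_train, inner_val

splits = list(nested_cv_splits(n_samples=25))
print(len(splits))
```

Keeping the outer test fold out of every inner split is what prevents feature selection and tuning from leaking information into the final performance estimate.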

  14. Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

    PubMed

    Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

    2005-01-01

    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.

  15. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  16. Identification of additive, dominant, and epistatic variation conferred by key genes in cellulose biosynthesis pathway in Populus tomentosa

    PubMed Central

    Du, Qingzhang; Tian, Jiaxing; Yang, Xiaohui; Pan, Wei; Xu, Baohua; Li, Bailian; Ingvarsson, Pär K.; Zhang, Deqiang

    2015-01-01

    Economically important traits in many species generally show polygenic, quantitative inheritance. The components of genetic variation (additive, dominant and epistatic effects) of these traits conferred by multiple genes in shared biological pathways remain to be defined. Here, we investigated 11 full-length genes in cellulose biosynthesis, on 10 growth and wood-property traits, within a population of 460 unrelated Populus tomentosa individuals, via multi-gene association. To validate positive associations, we conducted single-marker analysis in a linkage population of 1,200 individuals. We identified 118, 121, and 43 associations (P < 0.01) corresponding to additive, dominant, and epistatic effects, respectively, with low to moderate proportions of phenotypic variance (R2). Epistatic interaction models uncovered a combination of three non-synonymous sites from three unique genes, representing a significant epistasis for diameter at breast height and stem volume. Single-marker analysis validated 61 associations (false discovery rate, Q ≤ 0.10), representing 38 SNPs from nine genes, with an average effect (R2 = 3.8%) nearly 2-fold higher than that identified with multi-gene association, suggesting that multi-gene association can capture smaller individual variants. Moreover, a structural gene–gene network based on tissue-specific transcript abundances provides a better understanding of the multi-gene pathway affecting tree growth and lignocellulose biosynthesis. Our study highlights the importance of pathway-based multiple gene associations to uncover the nature of genetic variance for quantitative traits and may drive novel progress in molecular breeding. PMID:25428896

  17. Evaluation of Reference Genes for Quantitative Real-Time PCR in Songbirds

    PubMed Central

    Zinzow-Kramer, Wendy M.; Horton, Brent M.; Maney, Donna L.

    2014-01-01

    Quantitative real-time PCR (qPCR) is becoming a popular tool for the quantification of gene expression in the brain and endocrine tissues of songbirds. Accurate analysis of qPCR data relies on the selection of appropriate reference genes for normalization, yet few papers on songbirds contain evidence of reference gene validation. Here, we evaluated the expression of ten potential reference genes (18S, ACTB, GAPDH, HMBS, HPRT, PPIA, RPL4, RPL32, TFRC, and UBC) in brain, pituitary, ovary, and testis in two species of songbird: zebra finch and white-throated sparrow. We used two algorithms, geNorm and NormFinder, to assess the stability of these reference genes in our samples. We found that the suitability of some of the most popular reference genes for target gene normalization in mammals, such as 18S, depended highly on tissue type. Thus, they are not the best choices for brain and gonad in these songbirds. In contrast, we identified alternative genes, such as HPRT, RPL4 and PPIA, that were highly stable in brain, pituitary, and gonad in these species. Our results suggest that the validation of reference genes in mammals does not necessarily extrapolate to other taxonomic groups. For researchers wishing to identify and evaluate suitable reference genes for qPCR in songbirds, our results should serve as a starting point and should help increase the power and utility of songbird models in behavioral neuroendocrinology. PMID:24780145
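
    To make concrete what a stability ranking of the geNorm kind computes, the sketch below implements the pairwise-variation measure M as it is usually described: for each candidate gene, the average (over all other candidates) of the standard deviation across samples of the log2 expression ratio. This is an illustrative reimplementation under that assumption, not the geNorm software itself; the gene labels and values in the usage lines are invented toy data.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Return a dict gene -> stability value M (lower means more stable).

    expr maps each gene name to a list of relative expression quantities
    on a linear scale, with samples in the same order for every gene.
    """
    genes = list(expr)
    m_values = {}
    for gene_j in genes:
        pairwise_sd = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expr[gene_j], expr[gene_k])]
            # Variation of that ratio across samples.
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sd)
    return m_values


if __name__ == "__main__":
    # Toy data: A and B keep a constant ratio, C fluctuates independently.
    toy = {"A": [1, 2, 4, 8], "B": [2, 4, 8, 16], "C": [1, 8, 1, 8]}
    for gene, m in sorted(genorm_m(toy).items(), key=lambda kv: kv[1]):
        print(gene, round(m, 3))
```

    In the toy data, genes A and B track each other and therefore receive lower M values than gene C, which is the sense in which a candidate is called "stable" relative to the other candidates.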

  18. Pharmacological Validation of Candidate Causal Sleep Genes Identified in an N2 Cross

    PubMed Central

    Brunner, Joseph I.; Gotter, Anthony L.; Millstein, Joshua; Garson, Susan; Binns, Jacquelyn; Fox, Steven V.; Savitz, Alan T.; Yang, He S.; Fitzpatrick, Karrie; Zhou, Lili; Owens, Joseph R.; Webber, Andrea L.; Vitaterna, Martha H.; Kasarskis, Andrew; Uebele, Victor N.; Turek, Fred; Renger, John J.; Winrow, Christopher J.

    2013-01-01

    Despite the substantial impact of sleep disturbances on human health and the many years of study dedicated to understanding sleep pathologies, the underlying genetic mechanisms that govern sleep and wake largely remain unknown. Recently, we completed large scale genetic and gene expression analyses in a segregating inbred mouse cross and identified candidate causal genes that regulate the mammalian sleep-wake cycle, across multiple traits including total sleep time, amounts of REM, non-REM, sleep bout duration and sleep fragmentation. Here we describe a novel approach toward validating candidate causal genes, while also identifying potential targets for sleep-related indications. Select small molecule antagonists and agonists were used to interrogate candidate causal gene function in rodent sleep polysomnography assays to determine impact on overall sleep architecture and to evaluate alignment with associated sleep-wake traits. Significant effects on sleep architecture were observed in validation studies using compounds targeting the muscarinic acetylcholine receptor M3 subunit (Chrm3)(wake promotion), nicotinic acetylcholine receptor alpha4 subunit (Chrna4)(wake promotion), dopamine receptor D5 subunit (Drd5)(sleep induction), serotonin 1D receptor (Htr1d)(altered REM fragmentation), glucagon-like peptide-1 receptor (Glp1r)(light sleep promotion and reduction of deep sleep), and Calcium channel, voltage-dependent, T type, alpha 1I subunit (Cacna1i)(increased bout duration slow wave sleep). Taken together, these results show the complexity of genetic components that regulate sleep-wake traits and highlight the importance of evaluating this complex behavior at a systems level. Pharmacological validation of genetically identified putative targets provides a rapid alternative to generating knock out or transgenic animal models, and may ultimately lead towards new therapeutic opportunities. PMID:22091728

  19. Microarray Meta-Analysis Identifies Acute Lung Injury Biomarkers in Donor Lungs That Predict Development of Primary Graft Failure in Recipients

    PubMed Central

    Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia

    2012-01-01

    Objectives: To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods: We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results: Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single study-based differential analysis. Conclusion: Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of "injury" gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
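
    The abstract states that a random effects model integrated the per-experiment effect sizes but does not name the estimator. As a hedged illustration, the sketch below uses the DerSimonian-Laird method, one widely used choice; treating it as the method of this study is an assumption. Inputs are one gene's effect size and sampling variance from each experiment, and the numbers in the usage lines are invented.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random effects model.

    Returns (pooled_effect, standard_error, tau_squared), where tau_squared
    is the estimated between-study variance.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    k = len(effects)
    w = [1.0 / v for v in variances]              # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    return pooled, math.sqrt(1.0 / sum_w_star), tau2


if __name__ == "__main__":
    # Invented example: two experiments that disagree strongly.
    print(dersimonian_laird([0.0, 1.0], [0.01, 0.01]))
```

    When the experiments agree, the between-study variance is estimated as zero and the result reduces to the ordinary inverse-variance average; when they disagree, the pooled standard error widens accordingly.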

  20. APG: an Active Protein-Gene network model to quantify regulatory signals in complex biological systems.

    PubMed

    Wang, Jiguang; Sun, Yidan; Zheng, Si; Zhang, Xiang-Sun; Zhou, Huarong; Chen, Luonan

    2013-01-01

    Synergistic interactions among transcription factors (TFs) and their cofactors collectively determine gene expression in complex biological systems. In this work, we develop a novel graphical model, called Active Protein-Gene (APG) network model, to quantify regulatory signals of transcription in complex biomolecular networks through integrating both TF upstream-regulation and downstream-regulation high-throughput data. Firstly, we theoretically and computationally demonstrate the effectiveness of APG by comparing with the traditional strategy based only on TF downstream-regulation information. We then apply this model to study spontaneous type 2 diabetic Goto-Kakizaki (GK) and Wistar control rats. Our biological experiments validate the theoretical results. In particular, SP1 is found to be a hidden TF with changed regulatory activity, and the loss of SP1 activity contributes to the increased glucose production during diabetes development. The APG model provides a theoretical basis to quantitatively elucidate transcriptional regulation by modelling TF combinatorial interactions and exploiting multilevel high-throughput information.

  1. APG: an Active Protein-Gene Network Model to Quantify Regulatory Signals in Complex Biological Systems

    PubMed Central

    Wang, Jiguang; Sun, Yidan; Zheng, Si; Zhang, Xiang-Sun; Zhou, Huarong; Chen, Luonan

    2013-01-01

    Synergistic interactions among transcription factors (TFs) and their cofactors collectively determine gene expression in complex biological systems. In this work, we develop a novel graphical model, called Active Protein-Gene (APG) network model, to quantify regulatory signals of transcription in complex biomolecular networks through integrating both TF upstream-regulation and downstream-regulation high-throughput data. Firstly, we theoretically and computationally demonstrate the effectiveness of APG by comparing with the traditional strategy based only on TF downstream-regulation information. We then apply this model to study spontaneous type 2 diabetic Goto-Kakizaki (GK) and Wistar control rats. Our biological experiments validate the theoretical results. In particular, SP1 is found to be a hidden TF with changed regulatory activity, and the loss of SP1 activity contributes to the increased glucose production during diabetes development. The APG model provides a theoretical basis to quantitatively elucidate transcriptional regulation by modelling TF combinatorial interactions and exploiting multilevel high-throughput information. PMID:23346354

  2. Quantification of the effects of VRN1 and Ppd-D1 to predict spring wheat (Triticum aestivum) heading time across diverse environments

    PubMed Central

    Zheng, Bangyou; Biddulph, Ben; Li, Dora; Kuchel, Haydn; Chapman, Scott

    2013-01-01

    Heading time is a major determinant of the adaptation of wheat to different environments, and is critical in minimizing risks of frost, heat, and drought on reproductive development. Given that major developmental genes are known in wheat, a process-based model, APSIM, was modified to incorporate gene effects into estimation of heading time, while minimizing degradation in the predictive capability of the model. Model parameters describing environment responses were replaced with functions of the number of winter and photoperiod (PPD)-sensitive alleles at the three VRN1 loci and the Ppd-D1 locus, respectively. Two years of vernalization and PPD trials of 210 lines (spring wheats) at a single location were used to estimate the effects of the VRN1 and Ppd-D1 alleles, with validation against 190 trials (~4400 observations) across the Australian wheatbelt. Compared with spring genotypes, winter genotypes for Vrn-A1 (i.e. with two winter alleles) had a delay of 76.8 degree days (°Cd) in time to heading, which was double the effect of the Vrn-B1 or Vrn-D1 winter genotypes. Of the three VRN1 loci, winter alleles at Vrn-B1 had the strongest interaction with PPD, delaying heading time by 99.0 °Cd under long days. The gene-based model had root mean square error of 3.2 and 4.3 d for calibration and validation datasets, respectively. Virtual genotypes were created to examine heading time in comparison with frost and heat events and showed that new longer-season varieties could be heading later (with potential increased yield) when sown early in season. This gene-based model allows breeders to consider how to target gene combinations to current and future production environments using parameters determined from a small set of phenotyping treatments. PMID:23873997

  3. Quantification of the effects of VRN1 and Ppd-D1 to predict spring wheat (Triticum aestivum) heading time across diverse environments.

    PubMed

    Zheng, Bangyou; Biddulph, Ben; Li, Dora; Kuchel, Haydn; Chapman, Scott

    2013-09-01

    Heading time is a major determinant of the adaptation of wheat to different environments, and is critical in minimizing risks of frost, heat, and drought on reproductive development. Given that major developmental genes are known in wheat, a process-based model, APSIM, was modified to incorporate gene effects into estimation of heading time, while minimizing degradation in the predictive capability of the model. Model parameters describing environment responses were replaced with functions of the number of winter and photoperiod (PPD)-sensitive alleles at the three VRN1 loci and the Ppd-D1 locus, respectively. Two years of vernalization and PPD trials of 210 lines (spring wheats) at a single location were used to estimate the effects of the VRN1 and Ppd-D1 alleles, with validation against 190 trials (~4400 observations) across the Australian wheatbelt. Compared with spring genotypes, winter genotypes for Vrn-A1 (i.e. with two winter alleles) had a delay of 76.8 degree days (°Cd) in time to heading, which was double the effect of the Vrn-B1 or Vrn-D1 winter genotypes. Of the three VRN1 loci, winter alleles at Vrn-B1 had the strongest interaction with PPD, delaying heading time by 99.0 °Cd under long days. The gene-based model had root mean square error of 3.2 and 4.3 d for calibration and validation datasets, respectively. Virtual genotypes were created to examine heading time in comparison with frost and heat events and showed that new longer-season varieties could be heading later (with potential increased yield) when sown early in season. This gene-based model allows breeders to consider how to target gene combinations to current and future production environments using parameters determined from a small set of phenotyping treatments.

  4. Pharmacogenetics-based area-under-curve model can predict efficacy and adverse events from axitinib in individual patients with advanced renal cell carcinoma.

    PubMed

    Yamamoto, Yoshiaki; Tsunedomi, Ryouichi; Fujita, Yusuke; Otori, Toru; Ohba, Mitsuyoshi; Kawai, Yoshihisa; Hirata, Hiroshi; Matsumoto, Hiroaki; Haginaka, Jun; Suzuki, Shigeo; Dahiya, Rajvir; Hamamoto, Yoshihiko; Matsuyama, Kenji; Hazama, Shoichi; Nagano, Hiroaki; Matsuyama, Hideyasu

    2018-03-30

    We investigated the relationship between axitinib pharmacogenetics and clinical efficacy/adverse events in advanced renal cell carcinoma (RCC) and established a model to predict clinical efficacy and adverse events using pharmacokinetic and gene polymorphisms related to drug metabolism and efflux in a phase II trial. We prospectively evaluated the area under the plasma concentration-time curve (AUC) of axitinib, objective response rate, and adverse events in 44 consecutive advanced RCC patients treated with axitinib. To establish a model for predicting clinical efficacy and adverse events, polymorphisms in genes including ABC transporters (ABCB1 and ABCG2), UGT1A, and OR2B11 were analyzed by whole-exome sequencing, Sanger sequencing, and DNA microarray. To validate this prediction model, calculated AUC by 6 gene polymorphisms was compared with actual AUC in 16 additional consecutive patients prospectively. Actual AUC significantly correlated with the objective response rate (P = 0.0002) and adverse events (hand-foot syndrome, P = 0.0055; and hypothyroidism, P = 0.0381). Calculated AUC significantly correlated with actual AUC (P < 0.0001), and correctly predicted objective response rate (P = 0.0044) as well as adverse events (P = 0.0191 and 0.0082, respectively). In the validation study, calculated AUC prior to axitinib treatment precisely predicted actual AUC after axitinib treatment (P = 0.0066). Our pharmacogenetics-based AUC prediction model may determine the optimal initial dose of axitinib, and thus facilitate better treatment of patients with advanced RCC.
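
    For readers unfamiliar with the measured ("actual") AUC referred to above, the sketch below shows the linear trapezoidal rule, a standard way to compute the area under a plasma concentration-time curve from sampled time points. The abstract does not state how AUC was computed in this study, so the method, the function name, and the sample values are assumptions for illustration only; the genotype-based "calculated" AUC is a separate prediction model that is not reproduced here.

```python
def auc_trapezoid(times_h, concentrations):
    """Area under a concentration-time curve by the linear trapezoidal rule.

    times_h must be strictly increasing; the result has units of
    concentration multiplied by time (for example ng*h/mL).
    """
    if len(times_h) != len(concentrations) or len(times_h) < 2:
        raise ValueError("need at least two matching time/concentration points")
    total = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += dt * (concentrations[i] + concentrations[i - 1]) / 2.0
    return total


if __name__ == "__main__":
    # Invented sampling schedule (hours) and concentrations.
    print(auc_trapezoid([0, 1, 2, 4, 8], [0.0, 20.0, 15.0, 8.0, 2.0]))
```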

  5. Pharmacogenetics-based area-under-curve model can predict efficacy and adverse events from axitinib in individual patients with advanced renal cell carcinoma

    PubMed Central

    Yamamoto, Yoshiaki; Tsunedomi, Ryouichi; Fujita, Yusuke; Otori, Toru; Ohba, Mitsuyoshi; Kawai, Yoshihisa; Hirata, Hiroshi; Matsumoto, Hiroaki; Haginaka, Jun; Suzuki, Shigeo; Dahiya, Rajvir; Hamamoto, Yoshihiko; Matsuyama, Kenji; Hazama, Shoichi; Nagano, Hiroaki; Matsuyama, Hideyasu

    2018-01-01

    We investigated the relationship between axitinib pharmacogenetics and clinical efficacy/adverse events in advanced renal cell carcinoma (RCC) and established a model to predict clinical efficacy and adverse events using pharmacokinetic and gene polymorphisms related to drug metabolism and efflux in a phase II trial. We prospectively evaluated the area under the plasma concentration–time curve (AUC) of axitinib, objective response rate, and adverse events in 44 consecutive advanced RCC patients treated with axitinib. To establish a model for predicting clinical efficacy and adverse events, polymorphisms in genes including ABC transporters (ABCB1 and ABCG2), UGT1A, and OR2B11 were analyzed by whole-exome sequencing, Sanger sequencing, and DNA microarray. To validate this prediction model, calculated AUC by 6 gene polymorphisms was compared with actual AUC in 16 additional consecutive patients prospectively. Actual AUC significantly correlated with the objective response rate (P = 0.0002) and adverse events (hand-foot syndrome, P = 0.0055; and hypothyroidism, P = 0.0381). Calculated AUC significantly correlated with actual AUC (P < 0.0001), and correctly predicted objective response rate (P = 0.0044) as well as adverse events (P = 0.0191 and 0.0082, respectively). In the validation study, calculated AUC prior to axitinib treatment precisely predicted actual AUC after axitinib treatment (P = 0.0066). Our pharmacogenetics-based AUC prediction model may determine the optimal initial dose of axitinib, and thus facilitate better treatment of patients with advanced RCC. PMID:29682213

  6. Identification of line-specific strategies for improving carotenoid production in synthetic maize through data-driven mathematical modeling.

    PubMed

    Comas, Jorge; Benfeitas, Rui; Vilaprinyo, Ester; Sorribas, Albert; Solsona, Francesc; Farré, Gemma; Berman, Judit; Zorrilla, Uxue; Capell, Teresa; Sandmann, Gerhard; Zhu, Changfu; Christou, Paul; Alves, Rui

    2016-09-01

    Plant synthetic biology is still in its infancy. However, synthetic biology approaches have been used to manipulate and improve the nutritional and health value of staple food crops such as rice, potato and maize. With current technologies, production yields of the synthetic nutrients are a result of trial and error, and systematic rational strategies to optimize those yields are still lacking. Here, we present a workflow that combines gene expression and quantitative metabolomics with mathematical modeling to identify strategies for increasing production yields of nutritionally important carotenoids in the seed endosperm synthesized through alternative biosynthetic pathways in synthetic lines of white maize, which is normally devoid of carotenoids. Quantitative metabolomics and gene expression data are used to create and fit parameters of mathematical models that are specific to four independent maize lines. Sensitivity analysis and simulation of each model is used to predict which gene activities should be further engineered in order to increase production yields for carotenoid accumulation in each line. Some of these predictions (e.g. increasing Zmlycb/Gllycb will increase accumulated β-carotenes) are valid across the four maize lines and consistent with experimental observations in other systems. Other predictions are line specific. The workflow is adaptable to any other biological system for which appropriate quantitative information is available. Furthermore, we validate some of the predictions using experimental data from additional synthetic maize lines for which no models were developed. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  7. A Tightly Regulated Genetic Selection System with Signaling-Active Alleles of Phytochrome B.

    PubMed

    Hu, Wei; Lagarias, J Clark

    2017-01-01

    Selectable markers derived from plant genes circumvent the potential risk of antibiotic/herbicide-resistance gene transfer into neighboring plant species, endophytic bacteria, and mycorrhizal fungi. Toward this goal, we have engineered and validated signaling-active alleles of phytochrome B (eYHB) as plant-derived selection marker genes in the model plant Arabidopsis (Arabidopsis thaliana). By probing the relationship of construct size and induction conditions to optimal phenotypic selection, we show that eYHB-based alleles are robust substitutes for antibiotic/herbicide-dependent marker genes as well as surprisingly sensitive reporters of off-target transgene expression. © 2017 American Society of Plant Biologists. All Rights Reserved.

  8. A Tightly Regulated Genetic Selection System with Signaling-Active Alleles of Phytochrome B

    PubMed Central

    2017-01-01

    Selectable markers derived from plant genes circumvent the potential risk of antibiotic/herbicide-resistance gene transfer into neighboring plant species, endophytic bacteria, and mycorrhizal fungi. Toward this goal, we have engineered and validated signaling-active alleles of phytochrome B (eYHB) as plant-derived selection marker genes in the model plant Arabidopsis (Arabidopsis thaliana). By probing the relationship of construct size and induction conditions to optimal phenotypic selection, we show that eYHB-based alleles are robust substitutes for antibiotic/herbicide-dependent marker genes as well as surprisingly sensitive reporters of off-target transgene expression. PMID:27881727

  9. Differential in vivo gene expression of major Leptospira proteins in resistant or susceptible animal models.

    PubMed

    Matsui, Mariko; Soupé, Marie-Estelle; Becam, Jérôme; Goarant, Cyrille

    2012-09-01

    Transcripts of Leptospira 16S rRNA, FlaB, LigB, LipL21, LipL32, LipL36, LipL41, and OmpL37 were quantified in the blood of susceptible (hamsters) and resistant (mice) animal models of leptospirosis. We first validated adequate reference genes and then evaluated expression patterns in vivo compared to in vitro cultures. LipL32 expression was downregulated in vivo and differentially regulated in resistant and susceptible animals. FlaB expression was also repressed in mice but not in hamsters. In contrast, LigB and OmpL37 were upregulated in vivo. Thus, we demonstrated that a virulent strain of Leptospira differentially adapts its gene expression in the blood of infected animals.

  10. Identification and validation of reference genes for quantitative real-time PCR normalization and its applications in lycium.

    PubMed

    Zeng, Shaohua; Liu, Yongliang; Wu, Min; Liu, Xiaomin; Shen, Xiaofei; Liu, Chunzhao; Wang, Ying

    2014-01-01

    Lycium barbarum and L. ruthenicum are extensively used as traditional Chinese medicinal plants. Next generation sequencing technology provides a powerful tool for analyzing transcriptomic profiles of gene expression in non-model species. Such gene expression can then be confirmed with quantitative real-time polymerase chain reaction (qRT-PCR). Therefore, use of systematically identified suitable reference genes is a prerequisite for obtaining reliable gene expression data. Here, we calculated the expression stability of 18 candidate reference genes across samples from different tissues and grown under salt stress using geNorm and NormFinder procedures. The geNorm-determined rank of reference genes was similar to that defined by NormFinder, with some differences. Both procedures confirmed that the single most stable reference gene was ACTIN1 for L. barbarum fruits, H2B1 for L. barbarum roots, and EF1α for L. ruthenicum fruits. PGK3, H2B2, and PGK3 were identified as the most stable reference genes for salt-treated L. ruthenicum leaves, roots, and stems, respectively. H2B1 and GAPDH1+PGK1 for L. ruthenicum and SAMDC2+H2B1 for L. barbarum were the best single and/or combined reference genes across all samples. Finally, expression of salt-responsive gene NAC, fruit ripening candidate gene LrPG, and anthocyanin genes was investigated to confirm the validity of the selected reference genes. Suitable reference genes identified in this study provide a foundation for accurately assessing gene expression and for better understanding novel gene function, helping to elucidate the molecular mechanisms behind particular biological/physiological processes in Lycium.

  11. Comparison of the theoretical and real-world evolutionary potential of a genetic circuit

    NASA Astrophysics Data System (ADS)

    Razo-Mejia, M.; Boedicker, J. Q.; Jones, D.; DeLuna, A.; Kinney, J. B.; Phillips, R.

    2014-04-01

    With the development of next-generation sequencing technologies, many large scale experimental efforts aim to map genotypic variability among individuals. This natural variability in populations fuels many fundamental biological processes, ranging from evolutionary adaptation and speciation to the spread of genetic diseases and drug resistance. An interesting and important component of this variability is present within the regulatory regions of genes. As these regions evolve, accumulated mutations lead to modulation of gene expression, which may have consequences for the phenotype. A simple model system where the link between genetic variability, gene regulation and function can be studied in detail is missing. In this article we develop a model to explore how the sequence of the wild-type lac promoter dictates the fold-change in gene expression. The model combines single-base pair resolution maps of transcription factor and RNA polymerase binding energies with a comprehensive thermodynamic model of gene regulation. The model was validated by predicting and then measuring the variability of lac operon regulation in a collection of natural isolates. We then implement the model to analyze the sensitivity of the promoter sequence to the regulatory output, and predict the potential for regulation to evolve due to point mutations in the promoter region.
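
    As a rough illustration of how a thermodynamic model turns binding energies into a fold-change, the sketch below evaluates the simplest repression-only expression in the weak-promoter limit, fold-change = 1 / (1 + (R / N_NS) * exp(-Δε / kT)). This is a textbook special case and not the full lac promoter model of the article, which according to the abstract also draws on RNA polymerase and other transcription factor binding energies. The value of N_NS, the parameter names, and the example numbers are assumptions.

```python
import math

# Assumed number of nonspecific binding sites, taken here as the
# approximate length of the E. coli genome in base pairs.
N_NS = 4.6e6


def fold_change(repressors, delta_eps_kt, n_ns=N_NS):
    """Fold-change in expression under simple repression (weak-promoter limit).

    repressors   : repressor copy number per cell
    delta_eps_kt : repressor-operator binding energy in units of kT
                   (more negative means tighter binding)
    """
    return 1.0 / (1.0 + (repressors / n_ns) * math.exp(-delta_eps_kt))


if __name__ == "__main__":
    # A point mutation that weakens binding by 2 kT raises the fold-change.
    print(fold_change(10, -15.0), fold_change(10, -13.0))
```

    In a model of this form, the effect of a promoter mutation enters only through the change in binding energy, which is what makes sequence-to-energy maps useful for predicting regulatory output.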

  12. Validation study of a quantitative multigene reverse transcriptase-polymerase chain reaction assay for assessment of recurrence risk in patients with stage II colon cancer.

    PubMed

    Gray, Richard G; Quirke, Philip; Handley, Kelly; Lopatin, Margarita; Magill, Laura; Baehner, Frederick L; Beaumont, Claire; Clark-Langone, Kim M; Yoshizawa, Carl N; Lee, Mark; Watson, Drew; Shak, Steven; Kerr, David J

    2011-12-10

    We developed quantitative gene expression assays to assess recurrence risk and benefits from chemotherapy in patients with stage II colon cancer. We sought validation by using RNA extracted from fixed paraffin-embedded primary colon tumor blocks from 1,436 patients with stage II colon cancer in the QUASAR (Quick and Simple and Reliable) study of adjuvant fluoropyrimidine chemotherapy versus surgery alone. A recurrence score (RS) and a treatment score (TS) were calculated from gene expression levels of 13 cancer-related genes (n = 7 recurrence genes and n = 6 treatment benefit genes) and from five reference genes with prespecified algorithms. Cox proportional hazards regression models and log-rank methods were used to analyze the relationship between the RS and risk of recurrence in patients treated with surgery alone and between TS and benefits of chemotherapy. Risk of recurrence was significantly associated with RS (hazard ratio [HR] per interquartile range, 1.38; 95% CI, 1.11 to 1.74; P = .004). Recurrence risks at 3 years were 12%, 18%, and 22% for predefined low, intermediate, and high recurrence risk groups, respectively. T stage (HR, 1.94; P < .001) and mismatch repair (MMR) status (HR, 0.31; P < .001) were the strongest histopathologic prognostic factors. The continuous RS was associated with risk of recurrence (P = .006) beyond these and other covariates. There was no trend for increased benefit from chemotherapy at higher TS (P = .95). The continuous 12-gene RS has been validated in a prospective study for assessment of recurrence risk in patients with stage II colon cancer after surgery and provides prognostic value that complements T stage and MMR. The TS was not predictive of chemotherapy benefit.

  13. Bacterial reference genes for gene expression studies by RT-qPCR: survey and analysis.

    PubMed

    Rocha, Danilo J P; Santos, Carolina S; Pacheco, Luis G C

    2015-09-01

    The appropriate choice of reference genes is essential for accurate normalization of gene expression data obtained by the method of reverse transcription quantitative real-time PCR (RT-qPCR). In 2009, a guideline called the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) highlighted the importance of the selection and validation of more than one suitable reference gene for obtaining reliable RT-qPCR results. Herein, we searched the recent literature in order to identify the bacterial reference genes that have been most commonly validated in gene expression studies by RT-qPCR (in the first 5 years following publication of the MIQE guidelines). Through a combination of different search parameters with the text mining tool MedlineRanker, we identified 145 unique bacterial genes that were recently tested as candidate reference genes. Of these, 45 genes were experimentally validated and, in most of the cases, their expression stabilities were verified using the software tools geNorm and NormFinder. It is noteworthy that only 10 of these reference genes had been validated in two or more of the studies evaluated. An enrichment analysis using Gene Ontology classifications demonstrated that genes belonging to the functional categories of DNA Replication (GO: 0006260) and Transcription (GO: 0006351) rendered a proportionally higher number of validated reference genes. Three genes in the former functional class were also among the top five most stable genes identified through an analysis of gene expression data obtained from the Pathosystems Resource Integration Center. These results may provide a guideline for the initial selection of candidate reference genes for RT-qPCR studies in several different bacterial species.

  14. Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

    PubMed

    van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

    2014-03-01

    For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (< 3 km), we calculated several measures of landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.

  15. In silico prediction of novel therapeutic targets using gene-disease association data.

    PubMed

    Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe

    2017-08-29

    Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. 
    Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.

  16. Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.

    PubMed

    Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L

    2016-02-19

    Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73 % of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those that didn't (p = 0.0006, Log Rank), and this predictor significantly associated with longer disease free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.

  17. Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease

    PubMed Central

    Carbonetto, Peter; Stephens, Matthew

    2013-01-01

    Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
    Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study. PMID:24098138

  18. Gene expression-based molecular diagnostic system for malignant gliomas is superior to histological diagnosis.

    PubMed

    Shirahata, Mitsuaki; Iwao-Koizumi, Kyoko; Saito, Sakae; Ueno, Noriko; Oda, Masashi; Hashimoto, Nobuo; Takahashi, Jun A; Kato, Kikuya

    2007-12-15

    Current morphology-based glioma classification methods do not adequately reflect the complex biology of gliomas, thus limiting their prognostic ability. In this study, we focused on anaplastic oligodendroglioma and glioblastoma, which typically follow distinct clinical courses. Our goal was to construct a clinically useful molecular diagnostic system based on gene expression profiling. The expression of 3,456 genes in 32 patients, 12 and 20 of whom had prognostically distinct anaplastic oligodendroglioma and glioblastoma, respectively, was measured by PCR array. Next to unsupervised methods, we did supervised analysis using a weighted voting algorithm to construct a diagnostic system discriminating anaplastic oligodendroglioma from glioblastoma. The diagnostic accuracy of this system was evaluated by leave-one-out cross-validation. The clinical utility was tested on a microarray-based data set of 50 malignant gliomas from a previous study. Unsupervised analysis showed divergent global gene expression patterns between the two tumor classes. A supervised binary classification model showed 100% (95% confidence interval, 89.4-100%) diagnostic accuracy by leave-one-out cross-validation using 168 diagnostic genes. Applied to a gene expression data set from a previous study, our model correlated better with outcome than histologic diagnosis, and also displayed 96.6% (28 of 29) consistency with the molecular classification scheme used for these histologically controversial gliomas in the original article. Furthermore, we observed that histologically diagnosed glioblastoma samples that shared anaplastic oligodendroglioma molecular characteristics tended to be associated with longer survival. Our molecular diagnostic system showed reproducible clinical utility and prognostic ability superior to traditional histopathologic diagnosis for malignant glioma.
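
    The abstract does not spell out its weighted voting algorithm, so the following is only a sketch of one common formulation, assumed here for illustration: each gene votes with a signal-to-noise weight, and accuracy is estimated by leave-one-out cross-validation. The expression values, labels and function names are invented toy inputs, not data from the study.

```python
from statistics import mean, stdev

def train_weighted_voting(samples, labels):
    """Return one (weight, boundary) pair per gene for classes 0 and 1.
    Assumed formulation: weight = (mean1 - mean0) / (sd0 + sd1)."""
    model = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        spread = stdev(a) + stdev(b)
        weight = (mean(b) - mean(a)) / spread if spread > 0 else 0.0
        boundary = (mean(a) + mean(b)) / 2
        model.append((weight, boundary))
    return model

def predict(model, sample):
    """Sum the per-gene votes; a positive total means class 1."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return 1 if total > 0 else 0

def loocv_accuracy(samples, labels):
    """Hold out each sample once, retrain on the rest, and score the held-out one."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_weighted_voting(train_x, train_y)
        hits += int(predict(model, samples[i]) == labels[i])
    return hits / len(samples)

# Toy data: 6 samples x 3 genes; the third gene carries no class signal.
samples = [
    [1.0, 5.0, 3.0], [1.2, 4.8, 3.1], [0.9, 5.2, 2.9],  # class 0
    [3.0, 2.0, 3.0], [3.2, 1.9, 3.2], [2.9, 2.2, 2.8],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(samples, labels))
```

    The sketch uses a fixed gene list. If genes were selected from the data, that selection would also need to be repeated inside every fold, otherwise the held-out sample leaks into the model and the accuracy estimate becomes optimistic.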

  19. Genome-Wide Analysis Identifies IL-18 and FUCA2 as Novel Genes Associated with Diastolic Function in African Americans with Sickle Cell Disease

    PubMed Central

    Sysol, Justin R.; Abbasi, Taimur; Patel, Amit R.; Lang, Roberto M.; Gupta, Akash; Garcia, Joe G. N.; Gordeuk, Victor R.; Machado, Roberto F.

    2016-01-01

    Background: Diastolic dysfunction is common in sickle cell disease (SCD), and is associated with an increased risk of mortality. However, the molecular pathogenesis underlying this development is poorly understood. The aim of this study was to identify a gene expression profile that is associated with diastolic function in SCD, potentially elucidating molecular mechanisms behind diastolic dysfunction development. Methods: Diastolic function was measured via echocardiography in 65 patients with SCD from two independent study populations. Gene expression microarray data was compared with diastolic function in both study cohorts. Candidate genes that associated in both analyses were tested for validation in a murine SCD model. Lastly, genotyping array data from the replication cohort was used to derive cis-expression quantitative trait loci (cis-eQTLs) and genetic associations within the candidate gene regions. Results: Transcriptome data from both patient cohorts implicated 7 genes associated with diastolic function, and mouse SCD myocardial expression validated 3 of these genes. Genetic associations and eQTLs were detected in 2 of the 3 genes, FUCA2 and IL18. Conclusions: FUCA2 and IL18 are associated with diastolic function in SCD patients, and may be involved in the pathogenesis of the disease. Genetic polymorphisms within the FUCA2 and IL18 gene regions are also associated with diastolic function in SCD, likely by affecting expression levels of the genes. PMID:27636371

  20. ICG: a wiki-driven knowledgebase of internal control genes for RT-qPCR normalization.

    PubMed

    Sang, Jian; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Xia, Lin; Zou, Dong; Wang, Fan; Xu, Xingjian; Han, Xiaojiao; Fan, Jinqi; Yang, Ye; Zuo, Wanzhu; Zhang, Yang; Zhao, Wenming; Bao, Yiming; Xiao, Jingfa; Hu, Songnian; Hao, Lili; Zhang, Zhang

    2018-01-04

    Real-time quantitative PCR (RT-qPCR) has become a widely used method for accurate expression profiling of targeted mRNA and ncRNA. Selection of appropriate internal control genes for RT-qPCR normalization is an elementary prerequisite for reliable expression measurement. Here, we present ICG (http://icg.big.ac.cn), a wiki-driven knowledgebase for community curation of experimentally validated internal control genes as well as their associated experimental conditions. Unlike extant related databases that focus on qPCR primers in model organisms (mainly human and mouse), ICG features harnessing collective intelligence in community integration of internal control genes for a variety of species. Specifically, it integrates a comprehensive collection of more than 750 internal control genes for 73 animals, 115 plants, 12 fungi and 9 bacteria, and incorporates detailed information on recommended application scenarios corresponding to specific experimental conditions, which, collectively, are of great help for researchers to adopt appropriate internal control genes for their own experiments. Taken together, ICG serves as a publicly editable and open-content encyclopaedia of internal control genes and accordingly bears broad utility for reliable RT-qPCR normalization and gene expression characterization in both model and non-model organisms. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

    PubMed

    Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

    2003-11-06

    We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

  2. Discovery of cancer common and specific driver gene sets

    PubMed Central

    2017-01-01

    Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  3. The Drosophila pigmentation gene pink (p) encodes a homologue of human Hermansky-Pudlak syndrome 5 (HPS5).

    PubMed

    Falcón-Pérez, Juan M; Romero-Calderón, Rafael; Brooks, Elizabeth S; Krantz, David E; Dell'Angelica, Esteban C

    2007-02-01

    Lysosome-related organelles comprise a group of specialized intracellular compartments that include melanosomes and platelet dense granules (in mammals) and eye pigment granules (in insects). In humans, the biogenesis of these organelles is defective in genetic disorders collectively known as Hermansky-Pudlak syndrome (HPS). Patients with HPS-2, and two murine HPS models, carry mutations in genes encoding subunits of adaptor protein (AP)-3. Other genes mutated in rodent models include those encoding VPS33A and Rab38. Orthologs of all of these genes in Drosophila melanogaster belong to the 'granule group' of eye pigmentation genes. Other genes associated with HPS encode subunits of three complexes of unknown function, named biogenesis of lysosome-related organelles complex (BLOC)-1, -2 and -3, for which the Drosophila counterparts had not been characterized. Here, we report that the gene encoding the Drosophila ortholog of the HPS5 subunit of BLOC-2 is identical to the granule group gene pink (p), which was first studied in 1910 but had not been identified at the molecular level. The phenotype of pink mutants was exacerbated by mutations in AP-3 subunits or in the orthologs of VPS33A and Rab38. These results validate D. melanogaster as a genetic model to study the function of the BLOCs.

  4. Identification of causal genes for complex traits

    PubMed Central

    Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun; Pasaniuc, Bogdan; Eskin, Eleazar

    2015-01-01

    Motivation: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider ‘causal variants’ as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. Results: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Availability and implementation: Software is freely available for download at genetics.cs.ucla.edu/caviar. Contact: eeskin@cs.ucla.edu PMID:26072484

  5. Identification of causal genes for complex traits.

    PubMed

    Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun; Pasaniuc, Bogdan; Eskin, Eleazar

    2015-06-15

    Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. Software is freely available for download at genetics.cs.ucla.edu/caviar. © The Author 2015. Published by Oxford University Press.

  6. Deep Sequencing of Urinary RNAs for Bladder Cancer Molecular Diagnostics.

    PubMed

    Sin, Mandy L Y; Mach, Kathleen E; Sinha, Rahul; Wu, Fan; Trivedi, Dharati R; Altobelli, Emanuela; Jensen, Kristin C; Sahoo, Debashis; Lu, Ying; Liao, Joseph C

    2017-07-15

    Purpose: The majority of bladder cancer patients present with localized disease and are managed by transurethral resection. However, the high rate of recurrence necessitates lifetime cystoscopic surveillance. Developing a sensitive and specific urine-based test would significantly improve bladder cancer screening, detection, and surveillance. Experimental Design: RNA-seq was used for biomarker discovery to directly assess the gene expression profile of exfoliated urothelial cells in urine derived from bladder cancer patients (n = 13) and controls (n = 10). Eight bladder cancer specific and 3 reference genes identified by RNA-seq were quantitated by qPCR in a training cohort of 102 urine samples. A diagnostic model based on the training cohort was constructed using multiple logistic regression. The model was further validated in an independent cohort of 101 urines. Results: A total of 418 genes were found to be differentially expressed between bladder cancer and controls. Validation of a subset of these genes was used to construct an equation for computing a probability of bladder cancer score (P_BC) based on expression of three markers (ROBO1, WNT5A, and CDC42BPB). Setting P_BC = 0.45 as the cutoff for a positive test, urine testing using the three-marker panel had overall 88% sensitivity and 92% specificity in the training cohort. The accuracy of the three-marker panel in the independent validation cohort yielded an AUC of 0.87 and overall 83% sensitivity and 89% specificity. Conclusions: Urine-based molecular diagnostics using this three-marker signature could provide a valuable adjunct to cystoscopy and may lead to a reduction of unnecessary procedures for bladder cancer diagnosis. Clin Cancer Res; 23(14); 3700-10. ©2017 AACR.
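
    To illustrate how a logistic-regression probability score and a fixed cutoff turn into sensitivity and specificity figures, here is a minimal sketch. The coefficients, scores and labels are hypothetical placeholders, since the abstract does not report the fitted equation, and treating a score equal to the cutoff as positive is also an assumption.

```python
import math

def probability_score(intercept, coefs, expression):
    """Logistic model: map a weighted sum of marker expression to a 0..1 score."""
    z = intercept + sum(c * x for c, x in zip(coefs, expression))
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity_specificity(scores, has_cancer, cutoff=0.45):
    """Call a test positive when score >= cutoff, then compare with the truth."""
    tp = sum(1 for s, y in zip(scores, has_cancer) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, has_cancer) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, has_cancer) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, has_cancer) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores for three cancer and three control urine samples.
scores = [0.90, 0.60, 0.30, 0.20, 0.50, 0.10]
has_cancer = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(scores, has_cancer)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

    Lowering the cutoff trades specificity for sensitivity, which is why a cutoff fixed on a training cohort is then checked on an independent cohort.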

  7. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine.

    PubMed

    Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J

    2017-04-25

    We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues. Copyright © 2017, American Association for the Advancement of Science.

  8. Gene doping detection: evaluation of approach for direct detection of gene transfer using erythropoietin as a model system.

    PubMed

    Baoutina, A; Coldham, T; Bains, G S; Emslie, K R

    2010-08-01

    As clinical gene therapy has progressed toward realizing its potential, concern over misuse of the technology to enhance performance in athletes is growing. Although 'gene doping' is banned by the World Anti-Doping Agency, its detection remains a major challenge. In this study, we developed a methodology for direct detection of the transferred genetic material and evaluated its feasibility for gene doping detection in blood samples from athletes. Using erythropoietin (EPO) as a model gene and a simple in vitro system, we developed real-time PCR assays that target sequences within the transgene complementary DNA corresponding to exon/exon junctions. As these junctions are absent in the endogenous gene due to their interruption by introns, the approach allows detection of trace amounts of a transgene in a large background of the endogenous gene. Two developed assays and one commercial gene expression assay for EPO were validated. On the basis of ability of these assays to selectively amplify transgenic DNA and analysis of literature on testing of gene transfer in preclinical and clinical gene therapy, it is concluded that the developed approach would potentially be suitable to detect gene doping through gene transfer by analysis of small volumes of blood using regular out-of-competition testing.
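
    The exon/exon junction idea can be shown with plain string matching: a sequence that spans the junction exists in intron-free transgene cDNA but is interrupted by the intron in the endogenous gene. The sketch below uses invented toy sequences, not EPO sequence, and substring search stands in for what a real assay does with primers and probes.

```python
def junction_target(exon_a, exon_b, flank=6):
    """Build a target that spans the junction between two adjacent exons."""
    return exon_a[-flank:] + exon_b[:flank]

# Toy sequences (assumptions for illustration only).
exon1 = "ACGTTGCAAGGCTTAC"
intron = "GTAAGTCCCTTTTCAG"
exon2 = "GATTCCGATAGGCTAA"

endogenous_gene = exon1 + intron + exon2  # genomic copy, intron present
transgene_cdna = exon1 + exon2            # transferred copy, intron absent

target = junction_target(exon1, exon2)
print(target in transgene_cdna)   # True: the junction is contiguous in cDNA
print(target in endogenous_gene)  # False: the intron splits the junction
```

    Because the target is absent from the genomic copy, a small amount of transgene can in principle be picked out against a large endogenous background.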

  9. Radiogenomics to characterize regional genetic heterogeneity in glioblastoma

    PubMed Central

    Hu, Leland S.; Ning, Shuluo; Eschbacher, Jennifer M.; Baxter, Leslie C.; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C.; Peng, Sen; Smith, Kris A.; Nakaji, Peter; Karis, John P.; Quarles, C. Chad; Wu, Teresa; Loftus, Joseph C.; Jenkins, Robert B.; Sicotte, Hugues; Kollmeyer, Thomas M.; O'Neill, Brian P.; Elmquist, William; Hoxworth, Joseph M.; Frakes, David; Sarkaria, Jann; Swanson, Kristin R.; Tran, Nhan L.; Li, Jing; Mitchell, J. Ross

    2017-01-01

    Background: Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. Methods: We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so called brain-around-tumor, [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out-cross-validation (LOOCV). Results: We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. Highest accuracies were observed for PDGFRA (77.1%), EGFR (75%), CDKN2A (87.5%), and RB1 (87.5%), while lowest accuracy was observed in TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). Conclusion: MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. PMID:27502248

  10. Protein classification using probabilistic chain graphs and the Gene Ontology structure.

    PubMed

    Carroll, Steven; Pavlovic, Vladimir

    2006-08-01

    Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. C/C++/Perl implementation is available from authors upon request.

  11. Identification and validation of reference genes for qRT-PCR studies of the obligate aphid pathogenic fungus Pandora neoaphidis during different developmental stages.

    PubMed

    Zhang, Shutao; Chen, Chun; Xie, Tingna; Ye, Sudan

    2017-01-01

    The selection of stable reference genes is a critical step for the accurate quantification of gene expression. To identify and validate reference genes in Pandora neoaphidis, an obligate aphid pathogenic fungus, the expression of 13 classical candidate reference genes was evaluated by quantitative real-time reverse transcriptase polymerase chain reaction (qPCR) at four developmental stages (conidia, conidia with germ tubes, short hyphae and elongated hyphae). Four statistical algorithms (geNorm, NormFinder, BestKeeper and the Delta Ct method) were used to rank putative reference genes according to their expression stability and to indicate the best reference gene or combination of reference genes for accurate normalization. The comprehensive ranking revealed that ACT1 and 18S were the most stably expressed genes throughout the developmental stages. To further validate the suitability of the reference genes identified in this study, the expression of the cell division control protein 25 (CDC25) and chitinase 1 (CHI1) genes was used to confirm the validated candidate reference genes. Our study presents the first systematic study of reference gene selection for P. neoaphidis and provides guidelines for obtaining more accurate qPCR results in future developmental efforts.
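
    Of the four ranking algorithms named above, the comparative Delta Ct method is the simplest to sketch: for each candidate gene, compute the standard deviation across samples of its Ct difference with every other candidate, then average those standard deviations; a lower mean is read as more stable expression. The Ct values below are invented, and GENE_X is a hypothetical unstable candidate, not a gene from the study.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Mean SD of pairwise Ct differences per gene (lower = more stable)."""
    scores = {}
    for gene, values in ct.items():
        sds = []
        for other, other_values in ct.items():
            if other == gene:
                continue
            # Ct difference between the two genes within each sample.
            diffs = [a - b for a, b in zip(values, other_values)]
            sds.append(stdev(diffs))
        scores[gene] = mean(sds)
    return scores


# Invented Ct values at four developmental stages; not data from the study.
ct_values = {
    "ACT1": [20.0, 20.1, 20.0, 20.1],
    "18S": [12.0, 12.1, 12.1, 12.0],
    "GENE_X": [25.0, 27.0, 24.0, 28.0],
}
scores = delta_ct_stability(ct_values)
ranking = sorted(scores, key=scores.get)  # most stable first
print(ranking)
```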

  12. Supervised group Lasso with applications to microarray data analysis

    PubMed Central

    Ma, Shuangge; Song, Xiao; Huang, Jian

    2007-01-01

    Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436

  13. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
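
    The "line of constant dispersion" can be made concrete with a short worked example. It assumes the common negative binomial parameterization, variance = mean + dispersion × mean², which the abstract does not spell out; the function name and the mean counts are illustrative.

```python
def nb_variance(mean_count, dispersion=0.01):
    """Negative binomial variance under var = mean + dispersion * mean**2."""
    return mean_count + dispersion * mean_count ** 2


for mu in (10, 100, 1000):
    var = nb_variance(mu)
    cv = var ** 0.5 / mu  # coefficient of variation
    print(f"mean={mu:5d}  variance={var:9.1f}  CV={cv:.3f}")
```

    Under this assumption the Poisson term dominates at low counts, while at high counts the coefficient of variation approaches the square root of the dispersion, about 0.1 for a dispersion of 0.01.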

  14. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  15. Radiation induced pulmonary fibrosis as a model of progressive fibrosis: Contributions of DNA damage, inflammatory response and cellular senescence genes.

    PubMed

    Beach, Tyler A; Johnston, Carl J; Groves, Angela M; Williams, Jacqueline P; Finkelstein, Jacob N

    2017-04-01

    Purpose/Aim of Study: Studies of pulmonary fibrosis (PF) have resulted in DNA damage, inflammatory response, and cellular senescence being widely hypothesized to play a role in the progression of the disease. Utilizing these aforementioned terms, genomics databases were interrogated along with the term, "pulmonary fibrosis," to identify genes common among all 4 search terms. Findings were compared to data derived from a model of radiation-induced progressive pulmonary fibrosis (RIPF) to verify that these genes are similarly expressed, supporting the use of radiation as a model for diseases involving PF, such as human idiopathic pulmonary fibrosis (IPF). In an established model of RIPF, C57BL/6J mice were exposed to 12.5 Gy thorax irradiation and sacrificed at 24 hours, 1, 4, 12, and 32 weeks following exposure, and lung tissue was compared to age-matched controls by RNA sequencing. Of 176 PF associated gene transcripts identified by database interrogation, 146 (>82%) were present in our experimental model, throughout the progression of RIPF. Analysis revealed that nearly 85% of PF gene transcripts were associated with at least 1 other search term. Furthermore, of 22 genes common to all four terms, 16 were present experimentally in RIPF. This illustrates the validity of RIPF as a model of progressive PF/IPF based on the numbers of transcripts reported in both literature and observed experimentally. Well characterized genes and proteins are implicated in this model, supporting the hypotheses that DNA damage, inflammatory response and cellular senescence are associated with the pathogenesis of PF.

  16. Prediction of complicated disease course for children newly diagnosed with Crohn’s disease: a multicentre inception cohort study

    PubMed Central

    Kugathasan, Subra; Denson, Lee A; Walters, Thomas D; Kim, Mi-Ok; Marigorta, Urko M; Schirmer, Melanie; Mondal, Kajari; Liu, Chunyan; Griffiths, Anne; Noe, Joshua D; Crandall, Wallace V; Snapper, Scott; Rabizadeh, Shervin; Rosh, Joel R; Shapiro, Jason M; Guthery, Stephen; Mack, David R; Kellermayer, Richard; Kappelman, Michael D; Steiner, Steven; Moulton, Dedrick E; Keljo, David; Cohen, Stanley; Oliva-Hemker, Maria; Heyman, Melvin B; Otley, Anthony R; Baker, Susan S; Evans, Jonathan S; Kirschner, Barbara S; Patel, Ashish S; Ziring, David; Trapnell, Bruce C; Sylvester, Francisco A; Stephens, Michael C; Baldassano, Robert N; Markowitz, James F; Cho, Judy; Xavier, Ramnik J; Huttenhower, Curtis; Aronow, Bruce J; Gibson, Greg; Hyams, Jeffrey S; Dubinsky, Marla C

    2017-01-01

    Summary Background Stricturing and penetrating complications account for substantial morbidity and health-care costs in paediatric and adult onset Crohn’s disease. Validated models to predict risk for complications are not available, and the effect of treatment on risk is unknown. Methods We did a prospective inception cohort study of paediatric patients with newly diagnosed Crohn’s disease at 28 sites in the USA and Canada. Genotypes, antimicrobial serologies, ileal gene expression, and ileal, rectal, and faecal microbiota were assessed. A competing-risk model for disease complications was derived and validated in independent groups. Propensity-score matching tested the effect of anti-tumour necrosis factor α (TNFα) therapy exposure within 90 days of diagnosis on complication risk. Findings Between Nov 1, 2008, and June 30, 2012, we enrolled 913 patients, 78 (9%) of whom experienced Crohn’s disease complications. The validated competing-risk model included age, race, disease location, and antimicrobial serologies and provided a sensitivity of 66% (95% CI 51–82) and specificity of 63% (55–71), with a negative predictive value of 95% (94–97). Patients who received early anti-TNFα therapy were less likely to have penetrating complications (hazard ratio [HR] 0·30, 95% CI 0·10–0·89; p=0·0296) but not stricturing complication (1·13, 0·51–2·51; 0·76) than were those who did not receive early anti-TNFα therapy. Ruminococcus was implicated in stricturing complications and Veillonella in penetrating complications. Ileal genes controlling extracellular matrix production were upregulated at diagnosis, and this gene signature was associated with stricturing in the risk model (HR 1·70, 95% CI 1·12–2·57; p=0·0120). When this gene signature was included, the model’s specificity improved to 71%. Interpretation Our findings support the usefulness of risk stratification of paediatric patients with Crohn’s disease at diagnosis, and selection of anti-TNFα therapy. 
    Funding Crohn’s and Colitis Foundation of America, Cincinnati Children’s Hospital Research Foundation Digestive Health Center. PMID:28259484
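
    The sensitivity, specificity and negative predictive value quoted in the Findings are all ratios from a two-by-two table of predicted versus observed complications. The sketch below shows the arithmetic; the counts are invented to land in a similar range and are not the study's data, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and negative predictive value from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # complications correctly flagged
        "specificity": tn / (tn + fp),  # non-complications correctly cleared
        "npv": tn / (tn + fn),          # cleared patients who stay complication-free
    }


# Invented counts; not taken from the cohort.
metrics = diagnostic_metrics(tp=20, fp=100, tn=170, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    The example also illustrates why a high negative predictive value can coexist with moderate sensitivity and specificity when complications are uncommon.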

  17. Risk of type 1 diabetes progression in islet autoantibody-positive children can be further stratified using expression patterns of multiple genes implicated in peripheral blood lymphocyte activation and function.

    PubMed

    Jin, Yulan; Sharma, Ashok; Bai, Shan; Davis, Colleen; Liu, Haitao; Hopkins, Diane; Barriga, Kathy; Rewers, Marian; She, Jin-Xiong

    2014-07-01

    There is tremendous scientific and clinical value to further improving the predictive power of autoantibodies because autoantibody-positive (AbP) children have heterogeneous rates of progression to clinical diabetes. This study explored the potential of gene expression profiles as biomarkers for risk stratification among 104 AbP subjects from the Diabetes Autoimmunity Study in the Young (DAISY) using a discovery data set based on microarray and a validation data set based on real-time RT-PCR. The microarray data identified 454 candidate genes with expression levels associated with various type 1 diabetes (T1D) progression rates. RT-PCR analyses of the top-27 candidate genes confirmed 5 genes (BACH2, IGLL3, EIF3A, CDC20, and TXNDC5) associated with differential progression and implicated in lymphocyte activation and function. Multivariate analyses of these five genes in the discovery and validation data sets identified and confirmed four multigene models (BI, ICE, BICE, and BITE, with each letter representing a gene) that consistently stratify high- and low-risk subsets of AbP subjects with hazard ratios >6 (P < 0.01). The results suggest that these genes may be involved in T1D pathogenesis and potentially serve as excellent gene expression biomarkers to predict the risk of progression to clinical diabetes for AbP subjects. © 2014 by the American Diabetes Association.

  18. Expression profiles of loneliness-associated genes for survival prediction in cancer patients.

    PubMed

    You, Liang-Fu; Yeh, Jia-Rong; Su, Mu-Chun

    2014-01-01

    Influence of loneliness on human survival has been established epidemiologically, but genomic research remains undeveloped. We identified 34 loneliness-associated genes which were statistically significant for high-lonely and low-lonely individuals. With the univariate Cox proportional hazards regression model, we obtained corresponding regression coefficients for loneliness-associated genes for individual cancer patients. Furthermore, risk scores could be generated with the combination of gene expression level multiplied by corresponding regression coefficients of loneliness-associated genes. We verified that high-risk score cancer patients had shorter mean survival time than their low-risk score counterparts. Then we validated the loneliness-associated gene signature in three independent brain cancer cohorts with Kaplan-Meier survival curves (n=77, 85 and 191), significantly separable by log-rank test with hazard ratios (HR) >1 and p-values <0.0001 (HR=2.94, 3.82, and 1.78). Moreover, we validated the loneliness-associated gene signature in bone cancer (HR=5.10, p-value=4.69e-3), lung cancer (HR=2.86, p-value=4.71e-5), ovarian cancer (HR=1.97, p-value=3.11e-5), and leukemia (HR=2.06, p-value=1.79e-4) cohorts. The last lymphoma cohort proved to have an HR=3.50, p-value=1.15e-7. Loneliness-associated genes had good survival prediction for cancer patients, especially bone cancer patients. Our study provided the first indication that expression of loneliness-associated genes is related to survival time of cancer patients.
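
    The risk score described above is a weighted sum: each gene's expression level multiplied by its Cox regression coefficient, summed over the signature. A minimal sketch follows; the gene names, coefficients and expression values are hypothetical, and the median split into high- and low-risk groups is an assumption, since the abstract does not state the cutoff used.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of (Cox coefficient * expression level) over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Hypothetical univariate Cox coefficients for two signature genes.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5}

# Hypothetical expression levels per patient.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 0.5, "GENE_B": 2.0},
    "P3": {"GENE_A": 1.5, "GENE_B": 0.5},
    "P4": {"GENE_A": 0.2, "GENE_B": 1.5},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
cutoff = median(scores.values())  # assumed cutoff: cohort median
groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
print(groups)
```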

  19. Behavioral phenotypes of genetic mouse models of autism

    PubMed Central

    Kazdoba, T. M.; Leach, P. T.; Crawley, J. N.

    2016-01-01

    More than a hundred de novo single gene mutations and copy-number variants have been implicated in autism, each occurring in a small subset of cases. Mutant mouse models with syntenic mutations offer research tools to gain an understanding of the role of each gene in modulating biological and behavioral phenotypes relevant to autism. Knockout, knockin and transgenic mice incorporating risk gene mutations detected in autism spectrum disorder and comorbid neurodevelopmental disorders are now widely available. At present, autism spectrum disorder is diagnosed solely by behavioral criteria. We developed a constellation of mouse behavioral assays designed to maximize face validity to the types of social deficits and repetitive behaviors that are central to an autism diagnosis. Mouse behavioral assays for associated symptoms of autism, which include cognitive inflexibility, anxiety, hyperactivity, and unusual reactivity to sensory stimuli, are frequently included in the phenotypic analyses. Over the past 10 years, we and many other laboratories around the world have employed these and additional behavioral tests to phenotype a large number of mutant mouse models of autism. In this review, we highlight mouse models with mutations in genes that have been identified as risk genes for autism, which work through synaptic mechanisms and through the mTOR signaling pathway. Robust, replicated autism-relevant behavioral outcomes in a genetic mouse model lend credence to a causal role for specific gene contributions and downstream biological mechanisms in the etiology of autism. PMID:26403076

  20. SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees

    PubMed Central

    Mallo, Diego; De Oliveira Martins, Leonardo; Posada, David

    2016-01-01

    We present a fast and flexible software package—SimPhy—for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases. PMID:26526427

  1. Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways.

    PubMed

    Eleftherohorinou, Hariklia; Hoggart, Clive J; Wright, Victoria J; Levin, Michael; Coin, Lachlan J M

    2011-09-01

    Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder affecting ∼1% of the world population. It has a strong genetic component and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS, by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ∼5000 and ∼2000 subjects, respectively. We examined 700 pathways comprising ∼8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ∼6% lay within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap) < 10^-21), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P < 10^-29) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.
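
    An overlap of 21 genes between lists of 58 and 61 is far larger than chance would give, which a hypergeometric tail probability can illustrate. The sketch below assumes a universe of exactly 8000 genes (the abstract says roughly 8000) and independent random selection; the abstract does not state which test the authors used, so this is not a reproduction of their calculation.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, shared):
    """P(overlap >= shared) when list B is drawn at random from the universe."""
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, min(size_a, size_b) + 1)
    )
    return tail / total


# Assumed universe of 8000 genes; list sizes and overlap from the abstract.
p = overlap_p_value(universe=8000, size_a=58, size_b=61, shared=21)
expected_overlap = 58 * 61 / 8000
print(f"expected overlap by chance: {expected_overlap:.2f}, P = {p:.2e}")
```

    Under these assumptions fewer than one shared gene would be expected by chance, and the tail probability falls well below the 10^-21 bound quoted in the abstract.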

  2. Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning.

    PubMed

    Zhao, Jonathan Z L; Mucaki, Eliseos J; Rogan, Peter K

    2018-01-01

    Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.

  3. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
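
    One common way to turn posterior null probabilities into a rejection list with a Bayesian FDR bound is to reject the hypotheses with the smallest posterior null probability for as long as their running average stays at or below the target level. The sketch below shows that generic rule; it is an assumption about the thresholding step, which the abstract does not specify, and the probabilities are invented.

```python
def bayesian_fdr_reject(null_probs, alpha=0.05):
    """Indices rejected such that the mean posterior null probability
    among the rejected hypotheses stays <= alpha."""
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    rejected, running_sum = [], 0.0
    for i in order:
        # Stop as soon as adding the next hypothesis would exceed the bound.
        if (running_sum + null_probs[i]) / (len(rejected) + 1) > alpha:
            break
        running_sum += null_probs[i]
        rejected.append(i)
    return rejected


# Invented posterior probabilities that each null hypothesis is true.
posterior_null = [0.01, 0.90, 0.03, 0.20, 0.02, 0.60]
rejected = bayesian_fdr_reject(posterior_null, alpha=0.05)
print(sorted(rejected))
```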

  4. Modeling river total bed material load discharge using artificial intelligence approaches (based on conceptual inputs)

    NASA Astrophysics Data System (ADS)

    Roushangar, Kiyoumars; Mehrabani, Fatemeh Vojoudi; Shiri, Jalal

    2014-06-01

    This study presents Artificial Intelligence (AI)-based modeling of total bed material load, aimed at improving on the prediction accuracy of traditional models. Gene expression programming (GEP) and adaptive neuro-fuzzy inference system (ANFIS)-based models were developed and validated for estimations. Sediment data from the Qotur River (Northwestern Iran) were used for development and validation of the applied techniques. To assess the applied techniques against traditional models, stream power-based and shear stress-based physical models were also applied to the studied case. The results reveal that the developed AI-based models, using a minimum number of dominant factors, give more accurate results than the other applied models. Nonetheless, the k-fold test proved to be a practical but high-cost technique for completely scanning the applied data and avoiding over-fitting.

  5. Inherited variation in immune response genes in follicular lymphoma and diffuse large B-cell lymphoma.

    PubMed

    Nielsen, Kaspar Rene; Steffensen, Rudi; Haunstrup, Thure Mors; Bødker, Julie Støve; Dybkær, Karen; Baech, John; Bøgsted, Martin; Johnsen, Hans Erik

    2015-01-01

    Diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) both depend on immune-mediated survival and proliferation signals from the tumor microenvironment. Inherited genetic variation influences this complex interaction. A total of 89 studies investigating immune-response genes in DLBCL and FL were critically reviewed. Relatively consistent association exists for variation in the tumor necrosis factor alpha (TNFA) and interleukin-10 loci and DLBCL risk; for DLBCL outcome association with the TNFA locus exists. Variations at chromosome 6p31-32 were associated with FL risk. Importantly, individual risk alleles have been shown to interact with each other. We suggest that the pathogenetic impact of polymorphic genes should include gene-gene interaction analysis and should be validated in preclinical model systems of normal B lymphopoiesis and B-cell malignancies. In the future, large cohort studies of interactions and genome-wide association studies are needed to extend the present findings and explore new risk alleles to be studied in preclinical models.

  6. Disease Model Discovery from 3,328 Gene Knockouts by The International Mouse Phenotyping Consortium

    PubMed Central

    Meehan, Terrence F.; Conte, Nathalie; West, David B.; Jacobsen, Julius O.; Mason, Jeremy; Warren, Jonathan; Chen, Chao-Kung; Tudose, Ilinca; Relac, Mike; Matthews, Peter; Karp, Natasha; Santos, Luis; Fiegel, Tanja; Ring, Natalie; Westerberg, Henrik; Greenaway, Simon; Sneddon, Duncan; Morgan, Hugh; Codner, Gemma F; Stewart, Michelle E; Brown, James; Horner, Neil; Haendel, Melissa; Washington, Nicole; Mungall, Christopher J.; Reynolds, Corey L; Gallegos, Juan; Gailus-Durner, Valerie; Sorg, Tania; Pavlovic, Guillaume; Bower, Lynette R; Moore, Mark; Morse, Iva; Gao, Xiang; Tocchini-Valentini, Glauco P; Obata, Yuichi; Cho, Soo Young; Seong, Je Kyung; Seavitt, John; Beaudet, Arthur L.; Dickinson, Mary E.; Herault, Yann; Wurst, Wolfgang; de Angelis, Martin Hrabe; Lloyd, K.C. Kent; Flenniken, Ann M; Nutter, Lauryl MJ; Newbigging, Susan; McKerlie, Colin; Justice, Monica J.; Murray, Stephen A.; Svenson, Karen L.; Braun, Robert E.; White, Jacqueline K.; Bradley, Allan; Flicek, Paul; Wells, Sara; Skarnes, William C.; Adams, David J.; Parkinson, Helen; Mallon, Ann-Marie; Brown, Steve D.M.; Smedley, Damian

    2017-01-01

    Although next generation sequencing has revolutionised the ability to associate variants with human diseases, diagnostic rates and development of new therapies are still limited by our lack of knowledge of function and pathobiological mechanism for most genes. To address this challenge, the International Mouse Phenotyping Consortium (IMPC) is creating a genome- and phenome-wide catalogue of gene function by characterizing new knockout mouse strains across diverse biological systems through a broad set of standardised phenotyping tests, with all mice made readily available to the biomedical community. Analysing the first 3328 genes reveals models for 360 diseases including the first for type C Bernard-Soulier, Bardet-Biedl-5 and Gordon Holmes syndromes. 90% of our phenotype annotations are novel, providing the first functional evidence for 1092 genes and candidates in unsolved diseases such as Arrhythmogenic Right Ventricular Dysplasia 3. Finally, we describe our role in variant functional validation with the 100,000 Genomes and other projects. PMID:28650483

  7. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce bias interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  8. Ezrin Inhibition Up-regulates Stress Response Gene Expression.

    PubMed

    Çelik, Haydar; Bulut, Gülay; Han, Jenny; Graham, Garrett T; Minas, Tsion Z; Conn, Erin J; Hong, Sung-Hyeok; Pauly, Gary T; Hayran, Mutlu; Li, Xin; Özdemirli, Metin; Ayhan, Ayşe; Rudek, Michelle A; Toretsky, Jeffrey A; Üren, Aykut

    2016-06-17

    Ezrin is a member of the ERM (ezrin/radixin/moesin) family of proteins that links cortical cytoskeleton to the plasma membrane. High expression of ezrin correlates with poor prognosis and metastasis in osteosarcoma. In this study, to uncover specific cellular responses evoked by ezrin inhibition that can be used as a specific pharmacodynamic marker(s), we profiled global gene expression in osteosarcoma cells after treatment with small molecule ezrin inhibitors, NSC305787 and NSC668394. We identified and validated several up-regulated integrated stress response genes including PTGS2, ATF3, DDIT3, DDIT4, TRIB3, and ATF4 as novel ezrin-regulated transcripts. Analysis of transcriptional response in skin and peripheral blood mononuclear cells from NSC305787-treated mice compared with a control group revealed that, among those genes, the stress gene DDIT4/REDD1 may be used as a surrogate pharmacodynamic marker of ezrin inhibitor compound activity. In addition, we validated the anti-metastatic effects of NSC305787 in reducing the incidence of lung metastasis in a genetically engineered mouse model of osteosarcoma and evaluated the pharmacokinetics of NSC305787 and NSC668394 in mice. In conclusion, our findings suggest that cytoplasmic ezrin, previously considered a dormant and inactive protein, has important functions in regulating gene expression that may result in down-regulation of stress response genes. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  9. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

  10. Modeling and validation of autoinducer-mediated bacterial gene expression in microfluidic environments

    PubMed Central

    Austin, Caitlin M.; Stoy, William; Su, Peter; Harber, Marie C.; Bardill, J. Patrick; Hammer, Brian K.; Forest, Craig R.

    2014-01-01

    Biosensors exploiting communication within genetically engineered bacteria are becoming increasingly important for monitoring environmental changes. Currently, there are a variety of mathematical models for understanding and predicting how genetically engineered bacteria respond to molecular stimuli in these environments, but as sensors have miniaturized towards microfluidics and are subjected to complex time-varying inputs, the shortcomings of these models have become apparent. The effects of microfluidic environments such as low oxygen concentration, increased biofilm encapsulation, diffusion limited molecular distribution, and higher population densities strongly affect rate constants for gene expression not accounted for in previous models. We report a mathematical model that accurately predicts the biological response of the autoinducer N-acyl homoserine lactone-mediated green fluorescent protein expression in reporter bacteria in microfluidic environments by accommodating these rate constants. This generalized mass action model considers a chain of biomolecular events from input autoinducer chemical to fluorescent protein expression through a series of six chemical species. We have validated this model against experimental data from our own apparatus as well as prior published experimental results. Results indicate accurate prediction of dynamics (e.g., 14% peak time error from a pulse input) and with reduced mean-squared error with pulse or step inputs for a range of concentrations (10 μM–30 μM). This model can help advance the design of genetically engineered bacteria sensors and molecular communication devices. PMID:25379076

  11. Fruitful research: drug target discovery for neurodegenerative diseases in Drosophila.

    PubMed

    Konsolaki, Mary

    2013-12-01

    Although vertebrate model systems have obvious advantages in the study of human disease, invertebrate organisms have contributed enormously to this field as well. The conservation of genome structure and physiology among organisms poses unexpected peculiarities, and the redundancy in certain gene families or the presence of polymorphisms that can slightly alter gene expression can, in certain instances, bring invertebrate systems, such as Drosophila, closer to humans than mice and vice versa. This necessitates the analysis of disease pathways in multiple model organisms. The author highlights findings from Drosophila models of neurodegenerative diseases that have occurred in the past few years. She also highlights and discusses various molecular, genetic and genomic tools used in flies, as well as methods for generating disease models. Finally, the author describes Drosophila models of Alzheimer's, Parkinson's, tri-nucleotide repeat diseases, and Fragile X syndrome and summarizes insights in disease mechanisms that have been discovered directly in fly models. Full genome genetic screens in Drosophila can lead to the rapid identification of drug target candidates that can be subsequently validated in a vertebrate system. In addition, the Drosophila models of neurodegeneration may often show disease phenotypes that are absent in equivalent mouse models. The author believes that the extensive contribution of Drosophila to both new disease drug target discovery and target validation makes them indispensable to drug discovery and development.

  12. Sequencing and comparing whole mitochondrial genomes of animals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  13. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.
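
    The idea of an external leave-one-out loop can be sketched in a few lines. The example below is not the MFMW-outlier method: it swaps in a deliberately simple filter (absolute difference of class means) and a nearest-centroid classifier, and all function names and the toy data are hypothetical. What it illustrates is that gene selection is repeated inside every fold, so the held-out sample never influences which genes are chosen, and that samples misclassified under this scheme are candidates for wrong labels.

```python
def select_features(X, y, k):
    """Rank features by absolute difference of class means; return top-k indices."""
    n_features = len(X[0])
    scores = []
    for f in range(n_features):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:k]


def centroid(rows, feats):
    """Mean of the selected features over the given rows."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction (squared Euclidean distance)."""
    c1 = centroid([r for r, l in zip(X_train, y_train) if l == 1], feats)
    c0 = centroid([r for r, l in zip(X_train, y_train) if l == 0], feats)
    d1 = sum((x[f] - c) ** 2 for f, c in zip(feats, c1))
    d0 = sum((x[f] - c) ** 2 for f, c in zip(feats, c0))
    return 1 if d1 < d0 else 0


def external_loocv(X, y, k):
    """Return indices of samples misclassified when held out.

    Feature selection is redone on each training fold (the "external" part).
    Assumes both classes remain present in every training fold.
    """
    misclassified = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        feats = select_features(X_train, y_train, k)
        if predict(X_train, y_train, X[i], feats) != y[i]:
            misclassified.append(i)
    return misclassified


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
# The last sample looks like class 1 but carries label 0 (a flipped label).
X = [[5.0, 0.1], [5.2, 0.3], [4.9, 0.2],
     [1.0, 0.2], [1.1, 0.1], [0.9, 0.3],
     [5.1, 0.2]]
y = [1, 1, 1, 0, 0, 0, 0]
suspects = external_loocv(X, y, k=1)
```

    On this toy input only the deliberately flipped sample is flagged; real microarray data would need a stronger selector and classifier than the ones assumed here.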

  14. From SNP co-association to RNA co-expression: novel insights into gene networks for intramuscular fatty acid composition in porcine.

    PubMed

    Ramayo-Caldas, Yuliaxis; Ballester, Maria; Fortes, Marina R S; Esteve-Codina, Anna; Castelló, Anna; Noguera, Jose L; Fernández, Ana I; Pérez-Enciso, Miguel; Reverter, Antonio; Folch, Josep M

    2014-03-26

    Fatty acids (FA) play a critical role in energy homeostasis and metabolic diseases; in the context of livestock species, their profile also impacts on meat quality for healthy human consumption. Molecular pathways controlling lipid metabolism are highly interconnected and are not fully understood. Elucidating these molecular processes will aid technological development towards improvement of pork meat quality and increased knowledge of FA metabolism, underpinning metabolic diseases in humans. The results from genome-wide association studies (GWAS) across 15 phenotypes were subjected to an Association Weight Matrix (AWM) approach to predict a network of 1,096 genes related to intramuscular FA composition in pigs. To identify the key regulators of FA metabolism, we focused on the minimal set of transcription factors (TF) that explored the majority of the network topology. Pathway and network analyses pointed towards a trio of TF as key regulators of FA metabolism: NCOA2, FHL2 and EP300. Promoter sequence analyses confirmed that these TF have binding sites for some well-known regulators of lipid and carbohydrate metabolism. For the first time in a non-model species, some of the co-associations observed at the genetic level were validated through co-expression at the transcriptomic level based on real-time PCR of 40 genes in adipose tissue, and a further 55 genes in liver. In particular, liver expression of NCOA2 and EP300 differed between pig breeds (Iberian and Landrace) extreme in terms of fat deposition. Highly clustered co-expression networks in both liver and adipose tissues were observed. EP300 and NCOA2 showed centrality parameters above average in both networks. Over all genes, co-expression analyses confirmed 28.9% of the AWM predicted gene-gene interactions in liver and 33.0% in adipose tissue. The magnitude of this validation varied across genes, with up to 60.8% of the connections of NCOA2 in adipose tissue being validated via co-expression. Our results recapitulate the known transcriptional regulation of FA metabolism, predict gene interactions that can be experimentally validated, and suggest that genetic variants mapped to EP300, FHL2, and NCOA2 modulate lipid metabolism and control energy homeostasis in pigs.

  15. RNA Interference Using c-Myc-Conjugated Nanoparticles Suppresses Breast and Colorectal Cancer Models.

    PubMed

    Tangudu, Naveen K; Verma, Vinod K; Clemons, Tristan D; Beevi, Syed S; Hay, Trevor; Mahidhara, Ganesh; Raja, Meera; Nair, Rekha A; Alexander, Liza E; Patel, Anant B; Jose, Jedy; Smith, Nicole M; Zdyrko, Bogdan; Bourdoncle, Anne; Luzinov, Igor; Iyer, K Swaminathan; Clarke, Alan R; Dinesh Kumar, Lekha

    2015-05-01

    In this article, we report the development and preclinical validation of combinatorial therapy for treatment of cancers using RNA interference (RNAi). RNAi technology is an attractive approach to silence genes responsible for disease onset and progression. Currently, the critical challenge facing the clinical success of RNAi technology is in the difficulty of delivery of RNAi inducers, due to low transfection efficiency, difficulties of integration into host DNA and unstable expression. Using the macromolecule polyglycidal methacrylate (PGMA) as a platform to graft multiple polyethyleneimine (PEI) chains, we demonstrate effective delivery of small oligos (anti-miRs and mimics) and larger DNAs (encoding shRNAs) in a wide variety of cancer cell lines by successful silencing/activation of their respective target genes. Furthermore, the effectiveness of this therapy was validated for in vivo tumor suppression using two transgenic mouse models; first, tumor growth arrest and increased animal survival was seen in mice bearing Brca2/p53-mutant mammary tumors following daily intratumoral treatment with nanoparticles conjugated to c-Myc shRNA. Second, oral delivery of the conjugate to an Apc-deficient crypt progenitor colon cancer model increased animal survival and returned intestinal tissue to a non-wnt-deregulated state. This study demonstrates, through careful design of nonviral nanoparticles and appropriate selection of therapeutic gene targets, that RNAi technology can be made an affordable and amenable therapy for cancer. ©2015 American Association for Cancer Research.

  16. Differential In Vivo Gene Expression of Major Leptospira Proteins in Resistant or Susceptible Animal Models

    PubMed Central

    Matsui, Mariko; Soupé, Marie-Estelle; Becam, Jérôme

    2012-01-01

    Transcripts of Leptospira 16S rRNA, FlaB, LigB, LipL21, LipL32, LipL36, LipL41, and OmpL37 were quantified in the blood of susceptible (hamsters) and resistant (mice) animal models of leptospirosis. We first validated adequate reference genes and then evaluated expression patterns in vivo compared to in vitro cultures. LipL32 expression was downregulated in vivo and differentially regulated in resistant and susceptible animals. FlaB expression was also repressed in mice but not in hamsters. In contrast, LigB and OmpL37 were upregulated in vivo. Thus, we demonstrated that a virulent strain of Leptospira differentially adapts its gene expression in the blood of infected animals. PMID:22729538

  17. Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.

    PubMed

    Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui

    2017-10-01

    The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially-expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3) were evaluated using receiver operating characteristic (ROC) and precision and recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers to use in the diagnosis of glaucoma and aid in identifying the molecular mechanism of this disease.
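
    Two of the centrality measures named above, degree and closeness, can be computed with a short breadth-first search. The sketch below is an illustration only: it assumes an undirected, unweighted network with at least two nodes given as a symmetric adjacency dictionary, uses the common normalisations (degree divided by n - 1; closeness as reachable nodes divided by total distance), and runs on a hypothetical four-node star graph rather than on any network from the study. Betweenness is omitted for brevity.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical star network: g1 is the hub, g2..g4 are leaves.
network = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1"},
    "g3": {"g1"},
    "g4": {"g1"},
}
degree = degree_centrality(network)
closeness = closeness_centrality(network)
hub = max(degree, key=degree.get)
```

    In this toy graph the hub scores highest on both measures, which is the kind of ranking used to nominate key genes in a regulatory network.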

  18. Testing the predictive value of peripheral gene expression for nonremission following citalopram treatment for major depression.

    PubMed

    Guilloux, Jean-Philippe; Bassi, Sabrina; Ding, Ying; Walsh, Chris; Turecki, Gustavo; Tseng, George; Cyranowski, Jill M; Sibille, Etienne

    2015-02-01

    Major depressive disorder (MDD) in general, and anxious-depression in particular, are characterized by poor rates of remission with first-line treatments, contributing to the chronic illness burden suffered by many patients. Prospective research is needed to identify the biomarkers predicting nonremission prior to treatment initiation. We collected blood samples from a discovery cohort of 34 adult MDD patients with co-occurring anxiety and 33 matched, nondepressed controls at baseline and after 12 weeks (of citalopram plus psychotherapy treatment for the depressed cohort). Samples were processed on gene arrays and group differences in gene expression were investigated. Exploratory analyses suggest that at pretreatment baseline, nonremitting patients differ from controls with gene function and transcription factor analyses potentially related to elevated inflammation and immune activation. In a second phase, we applied an unbiased machine learning prediction model and corrected for model-selection bias. Results show that baseline gene expression predicted nonremission with 79.4% corrected accuracy with a 13-gene model. The same gene-only model predicted nonremission after 8 weeks of citalopram treatment with 76% corrected accuracy in an independent validation cohort of 63 MDD patients treated with citalopram at another institution. Together, these results demonstrate the potential, but also the limitations, of baseline peripheral blood-based gene expression to predict nonremission after citalopram treatment. These results not only support their use in future prediction tools but also suggest that increased accuracy may be obtained with the inclusion of additional predictors (eg, genetics and clinical scales).

  19. Validation of reference genes for quantifying changes in gene expression in virus-infected tobacco.

    PubMed

    Baek, Eseul; Yoon, Ju-Yeon; Palukaitis, Peter

    2017-10-01

    To facilitate quantification of gene expression changes in virus-infected tobacco plants, eight housekeeping genes were evaluated for their stability of expression during infection by one of three systemically-infecting viruses (cucumber mosaic virus, potato virus X, potato virus Y) or a hypersensitive-response-inducing virus (tobacco mosaic virus; TMV) limited to the inoculated leaf. Five reference-gene validation programs were used to establish the order of the most stable genes for the systemically-infecting viruses as ribosomal protein L25 > β-Tubulin > Actin, and the least stable genes Ubiquitin-conjugating enzyme (UCE) < PP2A < GAPDH. For local infection by TMV, the most stable genes were EF1α > Cysteine protease > Actin, and the least stable genes were GAPDH < PP2A < UCE. Using two of the most stable and the two least stable validated reference genes, three defense responsive genes were examined to compare their relative changes in gene expression caused by each virus. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Biophysical Constraints Arising from Compositional Context in Synthetic Gene Networks.

    PubMed

    Yeung, Enoch; Dy, Aaron J; Martin, Kyle B; Ng, Andrew H; Del Vecchio, Domitilla; Beck, James L; Collins, James J; Murray, Richard M

    2017-07-26

    Synthetic gene expression is highly sensitive to intragenic compositional context (promoter structure, spacing regions between promoter and coding sequences, and ribosome binding sites). However, much less is known about the effects of intergenic compositional context (spatial arrangement and orientation of entire genes on DNA) on expression levels in synthetic gene networks. We compare expression of induced genes arranged in convergent, divergent, or tandem orientations. Induction of convergent genes yielded up to 400% higher expression, greater ultrasensitivity, and dynamic range than divergent- or tandem-oriented genes. Orientation affects gene expression whether one or both genes are induced. We postulate that transcriptional interference in divergent and tandem genes, mediated by supercoiling, can explain differences in expression and validate this hypothesis through modeling and in vitro supercoiling relaxation experiments. Treatment with gyrase abrogated intergenic context effects, bringing expression levels within 30% of each other. We rebuilt the toggle switch with convergent genes, taking advantage of supercoiling effects to improve threshold detection and switch stability. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Social stress in tree shrews as an animal model of depression: an example of a behavioral model of a CNS disorder.

    PubMed

    Fuchs, Eberhard

    2005-03-01

    Animal models are invaluable in preclinical research on human psychopathology. Valid animal models to study the pathophysiology of depression and specific biological and behavioral responses to antidepressant drug treatments are of prime interest. In order to improve our knowledge of the causal mechanisms of stress-related disorders such as depression, we need animal models that mirror the situation seen in patients. One promising model is the chronic psychosocial stress paradigm in male tree shrews. Coexistence of two males in visual and olfactory contact leads to a stable dominant/subordinate relationship, with the subordinates showing obvious changes in behavioral, neuroendocrine, and central nervous activity that are similar to the signs and symptoms observed during episodes of depression in patients. To discover whether this model, besides its "face validity" for depression, also has "predictive validity," we treated subordinate animals with the tricyclic antidepressant clomipramine and found a time-dependent recovery of both endocrine function and normal behavior. In contrast, the anxiolytic diazepam was ineffective. Chronic psychosocial stress in male tree shrews significantly decreased hippocampal volume and the proliferation rate of the granule precursor cells in the dentate gyrus. These stress-induced changes can be prevented by treating the animals with clomipramine, tianeptine, or the selective neurokinin receptor antagonist L-760,735. In addition to its apparent face and predictive validity, the tree shrew model also has "molecular validity," because the degradation routes of psychotropic compounds and the gene sequences of receptors are very similar to those in humans. Although further research is required to validate this model fully, it provides an adequate and interesting non-rodent experimental paradigm for preclinical research on depression.

  2. Directed evolution induces tributyrin hydrolysis in a virulence factor of Xylella fastidiosa using a duplicated gene as a template.

    PubMed

    Gouran, Hossein; Chakraborty, Sandeep; Rao, Basuthkar J; Asgeirsson, Bjarni; Dandekar, Abhaya

    2014-01-01

    Duplication of genes is one of the preferred ways for natural selection to add advantageous functionality to the genome without having to reinvent the wheel with respect to catalytic efficiency and protein stability. The duplicated secretory virulence factors of Xylella fastidiosa (LesA, LesB and LesC), implicated in Pierce's disease of grape and citrus variegated chlorosis of citrus species, epitomize the positive selection pressures exerted on advantageous genes in such pathogens. A deeper insight into the evolution of these lipases/esterases is essential to develop resistance mechanisms in transgenic plants. Directed evolution, an attempt to accelerate the evolutionary steps in the laboratory, is inherently simple when targeted for loss of function. A bigger challenge is to specify mutations that endow a new function, such as a lost functionality in a duplicated gene. Previously, we have proposed a method for enumerating candidates for mutations intended to transfer the functionality of one protein into another related protein based on the spatial and electrostatic properties of the active site residues (DECAAF). In the current work, we present in vivo validation of DECAAF by inducing tributyrin hydrolysis in LesB based on the active site similarity to LesA. The structures of these proteins have been modeled using RaptorX based on the closely related LipA protein from Xanthomonas oryzae. These mutations replicate the spatial and electrostatic conformation of LesA in the modeled structure of the mutant LesB as well, providing in silico validation before proceeding to the laborious in vivo work. Such focused mutations allow one to dissect the relevance of the duplicated genes in finer detail as compared to gene knockouts, since they do not interfere with other moonlighting functions, protein expression levels or protein-protein interaction.

  3. Directed evolution induces tributyrin hydrolysis in a virulence factor of Xylella fastidiosa using a duplicated gene as a template

    PubMed Central

    Rao, Basuthkar J.; Asgeirsson, Bjarni; Dandekar, Abhaya

    2014-01-01

    Duplication of genes is one of the preferred ways for natural selection to add advantageous functionality to the genome without having to reinvent the wheel with respect to catalytic efficiency and protein stability. The duplicated secretory virulence factors of Xylella fastidiosa (LesA, LesB and LesC), implicated in Pierce's disease of grape and citrus variegated chlorosis of citrus species, epitomize the positive selection pressures exerted on advantageous genes in such pathogens. A deeper insight into the evolution of these lipases/esterases is essential to develop resistance mechanisms in transgenic plants. Directed evolution, an attempt to accelerate the evolutionary steps in the laboratory, is inherently simple when targeted for loss of function. A bigger challenge is to specify mutations that endow a new function, such as a lost functionality in a duplicated gene. Previously, we have proposed a method for enumerating candidates for mutations intended to transfer the functionality of one protein into another related protein based on the spatial and electrostatic properties of the active site residues (DECAAF). In the current work, we present in vivo validation of DECAAF by inducing tributyrin hydrolysis in LesB based on the active site similarity to LesA. The structures of these proteins have been modeled using RaptorX based on the closely related LipA protein from Xanthomonas oryzae. These mutations replicate the spatial and electrostatic conformation of LesA in the modeled structure of the mutant LesB as well, providing in silico validation before proceeding to the laborious in vivo work. Such focused mutations allow one to dissect the relevance of the duplicated genes in finer detail as compared to gene knockouts, since they do not interfere with other moonlighting functions, protein expression levels or protein-protein interaction. PMID:25717364

  4. A microarray whole-genome gene expression dataset in a rat model of inflammatory corneal angiogenesis.

    PubMed

    Mukwaya, Anthony; Lindvall, Jessica M; Xeroudaki, Maria; Peebo, Beatrice; Ali, Zaheer; Lennikov, Anton; Jensen, Lasse Dahl Ejby; Lagali, Neil

    2016-11-22

    In angiogenesis with concurrent inflammation, many pathways are activated, some linked to VEGF and others largely VEGF-independent. Pathways involving inflammatory mediators, chemokines, and micro-RNAs may play important roles in maintaining a pro-angiogenic environment or mediating angiogenic regression. Here, we describe a gene expression dataset to facilitate exploration of pro-angiogenic, pro-inflammatory, and remodelling/normalization-associated genes during both an active capillary sprouting phase, and in the restoration of an avascular phenotype. The dataset was generated by microarray analysis of the whole transcriptome in a rat model of suture-induced inflammatory corneal neovascularisation. Regions of active capillary sprout growth or regression in the cornea were harvested and total RNA extracted from four biological replicates per group. High quality RNA was obtained for gene expression analysis using microarrays. Fold change of selected genes was validated by qPCR, and protein expression was evaluated by immunohistochemistry. We provide a gene expression dataset that may be re-used to investigate corneal neovascularisation, and may also have implications in other contexts of inflammation-mediated angiogenesis.

  5. Rapid Generation of Human Genetic Loss-of-Function iPSC Lines by Simultaneous Reprogramming and Gene Editing.

    PubMed

    Tidball, Andrew M; Dang, Louis T; Glenn, Trevor W; Kilbane, Emma G; Klarr, Daniel J; Margolis, Joshua L; Uhler, Michael D; Parent, Jack M

    2017-09-12

    Specifically ablating genes in human induced pluripotent stem cells (iPSCs) allows for studies of gene function as well as disease mechanisms in disorders caused by loss-of-function (LOF) mutations. While techniques exist for engineering such lines, we have developed and rigorously validated a method of simultaneous iPSC reprogramming while generating CRISPR/Cas9-dependent insertions/deletions (indels). This approach allows for the efficient and rapid formation of genetic LOF human disease cell models with isogenic controls. The rate of mutagenized lines was strikingly consistent across experiments targeting four different human epileptic encephalopathy genes and a metabolic enzyme-encoding gene, and was more efficient and consistent than using CRISPR gene editing of established iPSC lines. The ability of our streamlined method to reproducibly generate heterozygous and homozygous LOF iPSC lines with passage-matched isogenic controls in a single step provides for the rapid development of LOF disease models with ideal control lines, even in the absence of patient tissue. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  6. Backward-stochastic-differential-equation approach to modeling of gene expression

    NASA Astrophysics Data System (ADS)

    Shamarova, Evelina; Chertovskih, Roman; Ramos, Alexandre F.; Aguiar, Paulo

    2017-03-01

    In this article, we introduce a backward method to model stochastic gene expression and protein-level dynamics. The protein amount is regarded as a diffusion process and is described by a backward stochastic differential equation (BSDE). Unlike many other SDE techniques proposed in the literature, the BSDE method is backward in time; that is, instead of initial conditions it requires the specification of end-point ("final") conditions, in addition to the model parametrization. To validate our approach we employ Gillespie's stochastic simulation algorithm (SSA) to generate (forward) benchmark data, according to predefined gene network models. Numerical simulations show that the BSDE method is able to correctly infer the protein-level distributions that preceded a known final condition, obtained originally from the forward SSA. This makes the BSDE method a powerful systems biology tool for time-reversed simulations, allowing, for example, the assessment of the biological conditions (e.g., protein concentrations) that preceded an experimentally measured event of interest (e.g., mitosis, apoptosis, etc.).
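
    The forward benchmark mentioned above relies on Gillespie's stochastic simulation algorithm. As a minimal sketch of that algorithm (not of the BSDE method, and not of the gene network models used by the authors), the example below simulates a single protein species with constant production and first-order degradation. The function name, the rate constants and the seed are assumptions chosen for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Simulate protein copy number under production (rate k_prod)
    and first-order degradation (rate k_deg * n) until time t_end."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * n        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        # Time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical parameters; the stationary mean of this model is k_prod / k_deg.
rng = random.Random(0)
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=50.0, rng=rng)
```

    Repeating such runs many times yields the empirical protein-level distributions against which a time-reversed method could be compared.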

  7. Backward-stochastic-differential-equation approach to modeling of gene expression.

    PubMed

    Shamarova, Evelina; Chertovskih, Roman; Ramos, Alexandre F; Aguiar, Paulo

    2017-03-01

    In this article, we introduce a backward method to model stochastic gene expression and protein-level dynamics. The protein amount is regarded as a diffusion process and is described by a backward stochastic differential equation (BSDE). Unlike many other SDE techniques proposed in the literature, the BSDE method is backward in time; that is, instead of initial conditions it requires the specification of end-point ("final") conditions, in addition to the model parametrization. To validate our approach we employ Gillespie's stochastic simulation algorithm (SSA) to generate (forward) benchmark data, according to predefined gene network models. Numerical simulations show that the BSDE method is able to correctly infer the protein-level distributions that preceded a known final condition, obtained originally from the forward SSA. This makes the BSDE method a powerful systems biology tool for time-reversed simulations, allowing, for example, the assessment of the biological conditions (e.g., protein concentrations) that preceded an experimentally measured event of interest (e.g., mitosis, apoptosis, etc.).

  8. Selection and validation of reference genes for qRT-PCR analysis during biological invasions: The thermal adaptability of Bemisia tabaci MED.

    PubMed

    Dai, Tian-Mei; Lü, Zhi-Chuang; Liu, Wan-Xue; Wan, Fang-Hao

    2017-01-01

    The Bemisia tabaci Mediterranean (MED) cryptic species has been rapidly invading most parts of the world owing to its strong ecological adaptability, and it is considered a model insect for stress tolerance studies under rapidly changing environments. Selection of a suitable reference gene for quantitative stress-responsive gene expression analysis based on qRT-PCR is critical for elaborating the molecular mechanisms of thermotolerance. To obtain accurate and reliable normalization data in MED, eight candidate reference genes (β-act, GAPDH, β-tub, EF1-α, GST, 18S, RPL13A and α-tub) were examined under various thermal stresses for varied time periods by using the geNorm, NormFinder and BestKeeper algorithms. Our results revealed that β-tub and EF1-α were the best reference genes across all sample sets. On the other hand, 18S and GAPDH showed the least stability for all the samples studied. β-act proved to be highly stable only in the case of short-term thermal stresses. To our knowledge, this is the first comprehensive report on validation of reference genes under varying temperature stresses in MED. The study could expedite the discovery of thermotolerance genes in MED. Further, the present results can form the basis of further research on suitable reference genes in this invasive insect and will facilitate transcript profiling in other invasive insects.

  9. KMgene: a unified R package for gene-based association analysis for complex traits.

    PubMed

    Yan, Qi; Fang, Zhou; Chen, Wei; Stegle, Oliver

    2018-02-09

    In this report, we introduce an R package, KMgene, for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. Availability: http://cran.r-project.org/web/packages/KMgene. Contact: qi.yan@chp.edu or wei.chen@chp.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.

  10. Validation of reference genes for RT-qPCR studies of gene expression in banana fruit under different experimental conditions.

    PubMed

    Chen, Lei; Zhong, Hai-ying; Kuang, Jian-fei; Li, Jian-guo; Lu, Wang-jin; Chen, Jian-ye

    2011-08-01

    Reverse transcription quantitative real-time PCR (RT-qPCR) is a sensitive technique for quantifying gene expression, but its success depends on the stability of the reference gene(s) used for data normalization. Only a few studies on validation of reference genes have been conducted in fruit trees and none in banana yet. In the present work, 20 candidate reference genes were selected, and their expression stability in 144 banana samples was evaluated and analyzed using two algorithms, geNorm and NormFinder. The samples consisted of eight sample sets collected under different experimental conditions, including various tissues, developmental stages, postharvest ripening, stresses (chilling, high temperature, and pathogen), and hormone treatments. Our results showed that different suitable reference gene(s) or combination of reference genes for normalization should be selected depending on the experimental conditions. The RPS2 and UBQ2 genes were validated as the most suitable reference genes across all tested samples. More importantly, our data further showed that the widely used reference genes, ACT and GAPDH, were not the most suitable reference genes in many banana sample sets. In addition, the expression of MaEBF1, a gene of interest that plays an important role in regulating fruit ripening, under different experimental conditions was used to further confirm the validated reference genes. Taken together, our results provide guidelines for reference gene(s) selection under different experimental conditions and a foundation for more accurate and widespread use of RT-qPCR in banana.
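
    For readers unfamiliar with how geNorm ranks candidates, its stability measure M can be sketched as follows: for each gene, take the standard deviation across samples of the log2 expression ratio against every other candidate, then average those standard deviations (lower M means more stable). This is a minimal sketch of that published definition, not the geNorm software; the gene names and expression values are hypothetical and are not data from the study.

```python
import math
import statistics


def genorm_m_values(expr):
    """Return the geNorm-style stability measure M for each gene.

    expr maps a gene name to a list of relative expression quantities
    (linear scale, not Ct values), with samples in the same order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sds = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            # Pairwise variation: sample standard deviation of those ratios.
            pairwise_sds.append(statistics.stdev(log_ratios))
        # M is the mean pairwise variation against all other candidates.
        m_values[j] = statistics.mean(pairwise_sds)
    return m_values


if __name__ == "__main__":
    # Hypothetical quantities: ref1 and ref2 keep a constant ratio.
    toy = {
        "ref1": [1.0, 2.0, 4.0],
        "ref2": [2.0, 4.0, 8.0],
        "unstable": [1.0, 4.0, 1.0],
    }
    for gene, m in sorted(genorm_m_values(toy).items(), key=lambda kv: kv[1]):
        print(f"{gene}: M = {m:.3f}")
```

    In the full procedure the gene with the highest M is removed and the calculation repeated until the most stable pair remains; that elimination loop is omitted here.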

  11. Discovering Genes Essential to the Hypothalamic Regulation of Human Reproduction Using a Human Disease Model: Adjusting to Life in the “-Omics” Era

    PubMed Central

    Stamou, M. I.; Cox, K. H.

    2015-01-01

    The neuroendocrine regulation of reproduction is an intricate process requiring the exquisite coordination of an assortment of cellular networks, all converging on the GnRH neurons. These neurons have a complex life history, migrating mainly from the olfactory placode into the hypothalamus, where GnRH is secreted and acts as the master regulator of the hypothalamic-pituitary-gonadal axis. Much of what we know about the biology of the GnRH neurons has been aided by discoveries made using the human disease model of isolated GnRH deficiency (IGD), a family of rare Mendelian disorders that share a common failure of secretion and/or action of GnRH causing hypogonadotropic hypogonadism. Over the last 30 years, research groups around the world have been investigating the genetic basis of IGD using different strategies based on complex cases that harbor structural abnormalities or single pleiotropic genes, endogamous pedigrees, candidate gene approaches as well as pathway gene analyses. Although such traditional approaches, based on well-validated tools, have been critical to establish the field, new strategies, such as next-generation sequencing, are now providing speed and robustness, but also revealing a surprising number of variants in known IGD genes in both patients and healthy controls. Thus, before the field moves forward with new genetic tools and continues discovery efforts, we must reassess what we know about IGD genetics and prepare to hold our work to a different standard. The purpose of this review is to: 1) look back at the strategies used to discover the “known” genes implicated in the rare forms of IGD; 2) examine the strengths and weaknesses of the methodologies used to validate genetic variation; 3) substantiate the role of known genes in the pathophysiology of the disease; and 4) project forward as we embark upon a widening use of these new and powerful technologies for gene discovery. PMID:26394276

  12. Discovering Genes Essential to the Hypothalamic Regulation of Human Reproduction Using a Human Disease Model: Adjusting to Life in the "-Omics" Era.

    PubMed

    Stamou, M I; Cox, K H; Crowley, William F

    2015-12-01

    The neuroendocrine regulation of reproduction is an intricate process requiring the exquisite coordination of an assortment of cellular networks, all converging on the GnRH neurons. These neurons have a complex life history, migrating mainly from the olfactory placode into the hypothalamus, where GnRH is secreted and acts as the master regulator of the hypothalamic-pituitary-gonadal axis. Much of what we know about the biology of the GnRH neurons has been aided by discoveries made using the human disease model of isolated GnRH deficiency (IGD), a family of rare Mendelian disorders that share a common failure of secretion and/or action of GnRH causing hypogonadotropic hypogonadism. Over the last 30 years, research groups around the world have been investigating the genetic basis of IGD using different strategies based on complex cases that harbor structural abnormalities or single pleiotropic genes, endogamous pedigrees, candidate gene approaches as well as pathway gene analyses. Although such traditional approaches, based on well-validated tools, have been critical to establish the field, new strategies, such as next-generation sequencing, are now providing speed and robustness, but also revealing a surprising number of variants in known IGD genes in both patients and healthy controls. Thus, before the field moves forward with new genetic tools and continues discovery efforts, we must reassess what we know about IGD genetics and prepare to hold our work to a different standard. The purpose of this review is to: 1) look back at the strategies used to discover the "known" genes implicated in the rare forms of IGD; 2) examine the strengths and weaknesses of the methodologies used to validate genetic variation; 3) substantiate the role of known genes in the pathophysiology of the disease; and 4) project forward as we embark upon a widening use of these new and powerful technologies for gene discovery.

  13. A Drosophila model for toxicogenomics: Genetic variation in susceptibility to heavy metal exposure

    PubMed Central

    Luoma, Sarah E.; St. Armour, Genevieve E.; Thakkar, Esha

    2017-01-01

    The genetic factors that give rise to variation in susceptibility to environmental toxins remain largely unexplored. Studies on genetic variation in susceptibility to environmental toxins are challenging in human populations, due to the variety of clinical symptoms and difficulty in determining which symptoms causally result from toxic exposure; uncontrolled environments, often with exposure to multiple toxicants; and difficulty in relating phenotypic effect size to toxic dose, especially when symptoms become manifest with a substantial time lag. Drosophila melanogaster is a powerful model that enables genome-wide studies for the identification of allelic variants that contribute to variation in susceptibility to environmental toxins, since the genetic background, environmental rearing conditions and toxic exposure can be precisely controlled. Here, we used extreme QTL mapping in an outbred population derived from the D. melanogaster Genetic Reference Panel to identify alleles associated with resistance to lead and/or cadmium, two ubiquitous environmental toxins that present serious health risks. We identified single nucleotide polymorphisms (SNPs) associated with variation in resistance to both heavy metals as well as SNPs associated with resistance specific to each of them. The effects of these SNPs were largely sex-specific. We applied mutational and RNAi analyses to 33 candidate genes and functionally validated 28 of them. We constructed networks of candidate genes as blueprints for orthologous networks of human genes. The latter not only provided functional contexts for known human targets of heavy metal toxicity, but also implicated novel candidate susceptibility genes. These studies validate Drosophila as a translational toxicogenomics gene discovery system. PMID:28732062

  14. Multiplex polymerase chain reaction-based prognostic models in diffuse large B-cell lymphoma patients treated with R-CHOP.

    PubMed

    Green, Tina M; Jensen, Andreas K; Holst, René; Falgreen, Steffen; Bøgsted, Martin; de Stricker, Karin; Plesner, Torben; Mourits-Andersen, Torben; Frederiksen, Mikael; Johnsen, Hans E; Pedersen, Lars M; Møller, Michael B

    2016-09-01

    We present a multiplex analysis for genes known to have prognostic value in an attempt to design a clinically useful classification model in patients with diffuse large B-cell lymphoma (DLBCL). Real-time polymerase chain reaction was used to measure transcript levels of 28 relevant genes in 194 de novo DLBCL patients treated with R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, prednisone). Including International Prognostic Index (IPI) as a variable in a penalized Cox regression, we investigated the association with disease progression for single genes or gene combinations in four models. The best model was validated in data from an online available R-CHOP treated cohort. With progression-free survival (PFS) as primary endpoint, the best performing IPI independent model incorporated the LMO2 and HLADQA1 as well as gene interactions for GCSAMxMIB1, GCSAMxCTGF and FOXP1xPDE4B. This model assigned 33% of patients (n = 60) to poor outcome with an estimated 3-year PFS of 40% vs. 87% for low risk (n = 61) and intermediate (n = 60) risk groups (P < 0·001). However, a simpler, IPI independent model incorporated LMO2 and BCL2 and assigned 33% of the patients with a 3-year PFS of 35% vs. 82% for low risk group (P < 0·001). We have documented the impact of a few single genes added to IPI for assignment in new drug trials. © 2016 John Wiley & Sons Ltd.

  15. Functional Enzyme-Based Approach for Linking Microbial Community Functions with Biogeochemical Process Kinetics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Minjing; Qian, Wei-jun; Gao, Yuqian

    The kinetics of biogeochemical processes in natural and engineered environmental systems are typically described using Monod-type or modified Monod-type models. These models rely on biomass as surrogates for functional enzymes in microbial community that catalyze biogeochemical reactions. A major challenge to apply such models is the difficulty to quantitatively measure functional biomass for constraining and validating the models. On the other hand, omics-based approaches have been increasingly used to characterize microbial community structure, functions, and metabolites. Here we proposed an enzyme-based model that can incorporate omics-data to link microbial community functions with biogeochemical process kinetics. The model treats enzymes as time-variable catalysts for biogeochemical reactions and applies biogeochemical reaction network to incorporate intermediate metabolites. The sequences of genes and proteins from metagenomes, as well as those from the UniProt database, were used for targeted enzyme quantification and to provide insights into the dynamic linkage among functional genes, enzymes, and metabolites that are necessary to be incorporated in the model. The application of the model was demonstrated using denitrification as an example by comparing model-simulated with measured functional enzymes, genes, denitrification substrates and intermediates.
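
    As background for the Monod-type models referred to above, the classical Monod rate law is mu = mu_max * S / (K_s + S), with biomass X scaling the overall substrate uptake. The sketch below shows only that textbook form; the function names, parameter names, and values are illustrative assumptions, and the enzyme-based formulation proposed in the record is not reproduced here.

```python
def monod_rate(substrate, mu_max, k_s):
    """Specific growth rate from the Monod equation: mu_max * S / (K_s + S)."""
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    return mu_max * substrate / (k_s + substrate)


def substrate_uptake(substrate, biomass, mu_max, k_s, yield_coeff):
    """Substrate consumption rate, -dS/dt = (mu / Y) * X.

    biomass (X) is the quantity that classical models use as a
    stand-in for the amount of functional enzyme.
    """
    return monod_rate(substrate, mu_max, k_s) / yield_coeff * biomass


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for s in (0.0, 2.0, 20.0):
        print(s, monod_rate(s, mu_max=1.0, k_s=2.0))
```

    At S equal to K_s the rate is half of mu_max, which is the usual way the half-saturation constant is read off.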

  16. Validation of reference genes for RT-qPCR analysis in Herbaspirillum seropedicae.

    PubMed

    Pessoa, Daniella Duarte Villarinho; Vidal, Marcia Soares; Baldani, José Ivo; Simoes-Araujo, Jean Luiz

    2016-08-01

    The RT-qPCR technique needs a validated set of reference genes for ensuring the consistency of the results from the gene expression. Expression stabilities for 9 genes from Herbaspirillum seropedicae, strain HRC54, grown with different carbon sources were calculated using geNorm and NormFinder, and the gene rpoA showed the best stability values. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    PubMed

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism.

  18. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing

    PubMed Central

    2011-01-01

    Background: The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Results: Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Conclusions: Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219

  19. Application of hidden Markov models to biological data mining: a case study

    NASA Astrophysics Data System (ADS)

    Yin, Michael M.; Wang, Jason T.

    2000-04-01

    In this paper we present an example of biological data mining: the detection of splicing junction acceptors in eukaryotic genes. Identification or prediction of transcribed sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. Here we develop a hidden Markov model (HMM) to represent the degeneracy features of splicing junction acceptor sites in eukaryotic genes. The HMM system is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using the 10-way cross-validation method. Experimental results show that our HMM system can correctly classify more than 94% of the candidate sequences (including true and false acceptor sites) into the right categories. About 90% of the true acceptor sites and 96% of the false acceptor sites in the test data are classified correctly. These results are very promising considering that only the local information in DNA is used. The proposed model will be a very important component of an effective and accurate gene structure detection system currently being developed in our lab.
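
    The 10-way cross-validation used for evaluation above can be illustrated with a small index-splitting helper. This is a generic sketch of k-fold partitioning, assuming items are identified by position; the function name and the item count are hypothetical, and the HMM training and scoring steps of the paper are not shown.

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Items are dealt round-robin into k folds; each fold serves as the
    held-out test set exactly once.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical data set of 25 candidate sequences, split 10 ways.
    for train, test in k_fold_indices(25, 10):
        print(len(train), "training items; held out:", test)
```

    A model would be trained on each training list and scored on the matching held-out list, and the ten scores averaged to give the reported accuracy.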

  20. CRISPR Gene Editing in the Kidney.

    PubMed

    Cruz, Nelly M; Freedman, Benjamin S

    2018-06-01

    CRISPR is a nuclease guidance system that enables rapid and efficient gene editing of specific DNA sequences within genomes. We review applications of CRISPR for the study and treatment of kidney disease. CRISPR enables functional experiments in cell lines and model organisms to validate candidate genes arising from genetic studies. CRISPR has furthermore been used to establish the first models of genetic disease in human kidney organoids derived from pluripotent stem cells. These gene-edited organoids are providing new insight into the cellular mechanisms of polycystic kidney disease and nephrotic syndrome. CRISPR-engineered cell therapies are currently in clinical trials for cancers and immunologic syndromes, an approach that may be applicable to inflammatory conditions such as lupus nephritis. Use of CRISPR in large domestic species such as pigs raises the possibility of farming kidneys for transplantation to alleviate the shortage of donor organs. However, significant challenges remain, including how to effectively deliver CRISPR to kidneys and how to control gene editing events within the genome. Thorough testing of CRISPR in preclinical models will be critical to the safe and efficacious translation of this powerful young technology into therapies. Copyright © 2018 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.

  1. An enhanced genome-scale metabolic reconstruction of Streptomyces clavuligerus identifies novel strain improvement strategies.

    PubMed

    Toro, León; Pinilla, Laura; Avignone-Rossa, Claudio; Ríos-Estepa, Rigoberto

    2018-05-01

    In this work, we expanded and updated a genome-scale metabolic model of Streptomyces clavuligerus. The model includes 1021 genes and 1494 biochemical reactions; genome-reaction information was curated and new features related to clavam metabolism and to the biomass synthesis equation were incorporated. The model was validated using experimental data from the literature and simulations were performed to predict cellular growth and clavulanic acid biosynthesis. Flux balance analysis (FBA) showed that limiting concentrations of phosphate and an excess of ammonia accumulation are unfavorable for growth and clavulanic acid biosynthesis. The evaluation of different objective functions for FBA showed that maximization of ATP yields the best predictions for cellular behavior in continuous cultures, while the maximization of growth rate provides better predictions for batch cultures. Through gene essentiality analysis, 130 essential genes were found using a limited in silico media, while 100 essential genes were identified in amino acid-supplemented media. Finally, a strain design was carried out to identify candidate genes to be overexpressed or knocked out so as to maximize antibiotic biosynthesis. Interestingly, potential metabolic engineering targets, identified in this study, have not been tested experimentally.

  2. Tumor-adjacent tissue co-expression profile analysis reveals pro-oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma.

    PubMed

    Grinchuk, Oleg V; Yenamandra, Surya P; Iyer, Ramakrishnan; Singh, Malay; Lee, Hwee Kuan; Lim, Kiat Hon; Chow, Pierce Kah-Hoe; Kuznetsov, Vladamir A

    2018-01-01

    Currently, molecular markers are not used when determining the prognosis and treatment strategy for patients with hepatocellular carcinoma (HCC). In the present study, we proposed that the identification of common pro-oncogenic pathways in primary tumors (PT) and adjacent non-malignant tissues (AT) typically used to predict HCC patient risks may result in HCC biomarker discovery. We examined the genome-wide mRNA expression profiles of paired PT and AT samples from 321 HCC patients. The workflow integrated differentially expressed gene selection, gene ontology enrichment, computational classification, survival predictions, image analysis and experimental validation methods. We developed a 24-ribosomal gene-based HCC classifier (RGC), which is prognostically significant in both PT and AT. The RGC gene overexpression in PT was associated with a poor prognosis in the training (hazard ratio = 8.2, P = 9.4 × 10⁻⁶) and cross-cohort validation (hazard ratio = 2.63, P = 0.004) datasets. The multivariate survival analysis demonstrated the significant and independent prognostic value of the RGC. The RGC displayed a significant prognostic value in AT of the training (hazard ratio = 5.0, P = 0.03) and cross-validation (hazard ratio = 1.9, P = 0.03) HCC groups, confirming the accuracy and robustness of the RGC. Our experimental and bioinformatics analyses suggested a key role for c-MYC in the pro-oncogenic pattern of ribosomal biogenesis co-regulation in PT and AT. Microarray, quantitative RT-PCR and quantitative immunohistochemical studies of the PT showed that DKK1 in PT is a prospective biomarker for poor HCC outcomes. The common co-transcriptional pattern of ribosome biogenesis genes in PT and AT from HCC patients suggests a new scalable prognostic system, as supported by the model of tumor-like metabolic redirection/assimilation in non-malignant AT. The RGC, comprising 24 ribosomal genes, is introduced as a robust and reproducible prognostic model for stratifying HCC patient risks. The adjacent non-malignant liver tissue alone, or in combination with HCC tissue biopsy, could be an important target for developing predictive and monitoring strategies, as well as evidence-based therapeutic interventions, that aim to reduce the risk of post-surgery relapse in HCC patients. © 2017 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.

  3. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data

    PubMed Central

    Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia

    2015-01-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach. PMID:26107944

  4. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.

    PubMed

    Vallejos, Catalina A; Marioni, John C; Richardson, Sylvia

    2015-06-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.

  5. Augmenting Microarray Data with Literature-Based Knowledge to Enhance Gene Regulatory Network Inference

    PubMed Central

    Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.

    2014-01-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649

  6. Augmenting microarray data with literature-based knowledge to enhance gene regulatory network inference.

    PubMed

    Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C

    2014-06-01

    Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

  7. gene2drug: a computational tool for pathway-based rational drug repositioning.

    PubMed

    Napolitano, Francesco; Carrella, Diego; Mandriani, Barbara; Pisonero-Vaquero, Sandra; Sirci, Francesco; Medina, Diego L; Brunetti-Pierri, Nicola; di Bernardo, Diego

    2018-05-01

    Drug repositioning has been proposed as an effective shortcut to drug discovery. The availability of large collections of transcriptional responses to drugs enables computational approaches to drug repositioning directly based on measured molecular effects. We introduce a novel computational methodology for rational drug repositioning, which exploits the transcriptional responses following treatment with small molecules. Specifically, given a therapeutic target gene, a prioritization of potential effective drugs is obtained by assessing their impact on the transcription of genes in the pathway(s) including the target. We performed in silico validation and comparison with a state-of-the-art technique based on similar principles. We next performed experimental validation in two different real-case drug repositioning scenarios: (i) upregulation of the glutamate-pyruvate transaminase (GPT), which has been shown to induce reduction of oxalate levels in a mouse model of primary hyperoxaluria, and (ii) activation of the transcription factor TFEB, a master regulator of lysosomal biogenesis and autophagy, whose modulation may be beneficial in neurodegenerative disorders. A web tool for Gene2drug is freely available at http://gene2drug.tigem.it. An R package is under development and can be obtained from https://github.com/franapoli/gep2pep. Contact: dibernardo@tigem.it. Supplementary data are available at Bioinformatics online.

  8. Identification of Direct Target Genes Using Joint Sequence and Expression Likelihood with Application to DAF-16

    PubMed Central

    Yu, Ron X.; Liu, Jie; True, Nick; Wang, Wei

    2008-01-01

    A major challenge in the post-genome era is to reconstruct regulatory networks from the biological knowledge accumulated to date. The development of tools for identifying direct target genes of transcription factors (TFs) is critical to this endeavor. Given a set of microarray experiments, a probabilistic model called TRANSMODIS has been developed which can infer the direct targets of a TF by integrating sequence motif, gene expression and ChIP-chip data. The performance of TRANSMODIS was first validated on a set of transcription factor perturbation experiments (TFPEs) involving Pho4p, a well studied TF in Saccharomyces cerevisiae. TRANSMODIS removed elements of arbitrariness in the manual target gene selection process and produced results that concur with one's intuition. TRANSMODIS was further validated on a genome-wide scale by comparing it with two other methods in Saccharomyces cerevisiae. The usefulness of TRANSMODIS was then demonstrated by applying it to the identification of direct targets of DAF-16, a critical TF regulating ageing in Caenorhabditis elegans. We found that 189 genes were tightly regulated by DAF-16. In addition, DAF-16 has differential preference for motifs when acting as an activator or repressor, which awaits experimental verification. TRANSMODIS is computationally efficient and robust, making it a useful probabilistic framework for finding immediate targets. PMID:18350157

  9. The Role of the 21-Gene Recurrence Score in Breast Cancer Treatment.

    PubMed

    Ethier, Josee-Lyne; Amir, Eitan

    2016-08-01

    Several multi-gene assays have been developed to predict the risk of recurrence in patients with estrogen receptor-positive early breast cancer and in whom endocrine therapy is planned. The 21-gene assay is widely used and its prognostic value has been retrospectively validated, showing significant differences in the risk of distant recurrence for patients at high versus low risk. Its role in predicting chemotherapy benefit has also been established, showing a clear benefit for high-risk patients and minimal benefit in those at low risk. These findings have been prospectively investigated in TAILORx (Trial Assigning Individualized Options for Treatment), where available data from the low-risk cohort confirms the prognostic value of this diagnostic test. The prognostic utility of the 21-gene assay increases when combined with clinicopathologic variables, and data from integrated models suggest that its use should be limited to patients with tumor characteristics suggestive of potential chemotherapy benefit. Furthermore, the 21-gene assay has been shown to impact clinical decision making in a cost-effective manner, although direct evidence of benefit from modified treatment recommendations is yet to be proven. The prognostic value of this test has also been shown in populations with node-positive or locally advanced disease treated with neoadjuvant chemotherapy, and ongoing trials aim to prospectively validate these findings.

  10. Regulatory elements driving the expression of skeletal lineage reporters differ during bone development and adulthood.

    PubMed

    Stiers, Pieter-Jan; van Gastel, Nick; Moermans, Karen; Stockmans, Ingrid; Carmeliet, Geert

    2017-12-01

    To improve bone healing or regeneration, more insight into the fate and role of the different skeletal cell types is required. Mouse models for fate mapping and lineage tracing of skeletal cells, using stage-specific promoters, have advanced our understanding of bone development, a process that is largely recapitulated during bone repair. However, validation of these models is often only performed during development, whereas proof of the activity and specificity of the used promoters during the bone regenerative process is limited. Here, we show that the regulatory elements of the 6kb collagen type II promoter are not adequate to drive gene expression during bone repair. Similarly, the 2.3kb promoter of collagen type I lacks activity in adult mice, but the 3.2kb promoter is suitable. Furthermore, Cre-mediated fate mapping allows the visualization of progeny, but this label retention may make it difficult to distinguish these cells from ones with active expression of the marker at later time points. Together, our results show that the lineage-specific regulatory elements driving gene expression during bone development differ from those required later in life and during bone repair, and justify validation of lineage-specific cell tracing and gene silencing strategies during fracture healing and bone regenerative applications. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Behavioral phenotypes of genetic mouse models of autism.

    PubMed

    Kazdoba, T M; Leach, P T; Crawley, J N

    2016-01-01

    More than a hundred de novo single gene mutations and copy-number variants have been implicated in autism, each occurring in a small subset of cases. Mutant mouse models with syntenic mutations offer research tools to gain an understanding of the role of each gene in modulating biological and behavioral phenotypes relevant to autism. Knockout, knockin and transgenic mice incorporating risk gene mutations detected in autism spectrum disorder and comorbid neurodevelopmental disorders are now widely available. At present, autism spectrum disorder is diagnosed solely by behavioral criteria. We developed a constellation of mouse behavioral assays designed to maximize face validity to the types of social deficits and repetitive behaviors that are central to an autism diagnosis. Mouse behavioral assays for associated symptoms of autism, which include cognitive inflexibility, anxiety, hyperactivity, and unusual reactivity to sensory stimuli, are frequently included in the phenotypic analyses. Over the past 10 years, we and many other laboratories around the world have employed these and additional behavioral tests to phenotype a large number of mutant mouse models of autism. In this review, we highlight mouse models with mutations in genes that have been identified as risk genes for autism, which work through synaptic mechanisms and through the mTOR signaling pathway. Robust, replicated autism-relevant behavioral outcomes in a genetic mouse model lend credence to a causal role for specific gene contributions and downstream biological mechanisms in the etiology of autism. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  12. Biomining active cellulases from a mining bioremediation system.

    PubMed

    Mewis, Keith; Armstrong, Zachary; Song, Young C; Baldwin, Susan A; Withers, Stephen G; Hallam, Steven J

    2013-09-20

    Functional metagenomics has emerged as a powerful method for gene model validation and enzyme discovery from natural and human engineered ecosystems. Here we report development of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses features. A fosmid library containing 6144 clones sourced from a mining bioremediation system was screened for cellulase activity using 2,4-dinitrophenyl β-cellobioside, a previously proven cellulose model substrate. Fifteen active clones were recovered and fully sequenced revealing 9 unique clones with the ability to hydrolyse 1,4-β-D-glucosidic linkages. Transposon mutagenesis identified genes belonging to glycoside hydrolase (GH) 1, 3, or 5 as necessary for mediating this activity. Reference trees for GH 1, 3, and 5 families were generated from sequences in the CAZy database for automated phylogenetic analysis of fosmid end and active clone sequences revealing known and novel cellulase encoding genes. Active cellulase genes recovered in functional screens were subcloned into inducible high copy plasmids, expressed and purified to determine enzymatic properties including thermostability, pH optima, and substrate specificity. The workflow described here provides a general paradigm for recovery and characterization of microbially derived genes and gene products based on genetic logic and contemporary screening technologies developed for model organismal systems. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  13. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

    PubMed

    Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel

    2016-03-01

    Nowadays, methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
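
The loop described in the CAMUR abstract (learn a model, take the power set of the genes in its rules, remove those combinations, learn again) can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation: the names `extract_models`, `non_empty_subsets` and `widest_range_gene` are hypothetical, a "model" is reduced to the set of genes its rules use, the stopping criterion is assumed to be a cap on the number of models, and the toy learner is a stand-in for a real rule-based classifier.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Set

Dataset = Dict[str, List[float]]                   # gene -> expression per sample
Learner = Callable[[Dataset], Optional[Set[str]]]  # returns genes used by the rules


def non_empty_subsets(genes: Set[str]) -> List[FrozenSet[str]]:
    """Power set of the genes, without the empty set."""
    ordered = sorted(genes)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def extract_models(data: Dataset, learn: Learner, max_models: int = 10) -> List[FrozenSet[str]]:
    """Collect alternative models by re-learning on reduced data sets."""
    models: List[FrozenSet[str]] = []
    pending: List[FrozenSet[str]] = [frozenset()]  # gene sets to remove next
    seen = {frozenset()}
    # Assumed stopping criterion: nothing left to try, or enough models found.
    while pending and len(models) < max_models:
        removed = pending.pop(0)
        reduced = {g: v for g, v in data.items() if g not in removed}
        genes = learn(reduced)
        if not genes:
            continue  # no model could be built on this reduced data set
        model = frozenset(genes)
        if model not in models:
            models.append(model)
        # Schedule removal of every combination of the genes just used.
        for subset in non_empty_subsets(set(genes)):
            candidate = removed | subset
            if candidate not in seen:
                seen.add(candidate)
                pending.append(candidate)
    return models


def widest_range_gene(data: Dataset) -> Optional[Set[str]]:
    """Toy learner (assumption): 'uses' the single gene with the widest range."""
    if not data:
        return None
    best = max(sorted(data), key=lambda g: max(data[g]) - min(data[g]))
    return {best}


toy = {"A": [0.0, 10.0], "B": [0.0, 5.0], "C": [0.0, 1.0]}
models = extract_models(toy, widest_range_gene)
```

With the toy learner, each round removes the gene that was just used, so the next round is forced to find a different model; a real classifier would typically use several genes per model, which is where the power set matters.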

  14. GETPrime: a gene- or transcript-specific primer database for quantitative real-time PCR.

    PubMed

    Gubelmann, Carine; Gattiker, Alexandre; Massouras, Andreas; Hens, Korneel; David, Fabrice; Decouttere, Frederik; Rougemont, Jacques; Deplancke, Bart

    2011-01-01

    The vast majority of genes in humans and other organisms undergo alternative splicing, yet the biological function of splice variants is still very poorly understood in large part because of the lack of simple tools that can map the expression profiles and patterns of these variants with high sensitivity. High-throughput quantitative real-time polymerase chain reaction (qPCR) is an ideal technique to accurately quantify nucleic acid sequences including splice variants. However, currently available primer design programs do not distinguish between splice variants and also differ substantially in overall quality, functionality or throughput mode. Here, we present GETPrime, a primer database supported by a novel platform that uniquely combines and automates several features critical for optimal qPCR primer design. These include the consideration of all gene splice variants to enable either gene-specific (covering the majority of splice variants) or transcript-specific (covering one splice variant) expression profiling, primer specificity validation, automated best primer pair selection according to strict criteria and graphical visualization of the latter primer pairs within their genomic context. GETPrime primers have been extensively validated experimentally, demonstrating high transcript specificity in complex samples. Thus, the free-access, user-friendly GETPrime database allows fast primer retrieval and visualization for genes or groups of genes of most common model organisms, and is available at http://updepla1srv1.epfl.ch/getprime/. Database URL: http://deplanckelab.epfl.ch.

  15. GETPrime: a gene- or transcript-specific primer database for quantitative real-time PCR

    PubMed Central

    Gubelmann, Carine; Gattiker, Alexandre; Massouras, Andreas; Hens, Korneel; David, Fabrice; Decouttere, Frederik; Rougemont, Jacques; Deplancke, Bart

    2011-01-01

    The vast majority of genes in humans and other organisms undergo alternative splicing, yet the biological function of splice variants is still very poorly understood in large part because of the lack of simple tools that can map the expression profiles and patterns of these variants with high sensitivity. High-throughput quantitative real-time polymerase chain reaction (qPCR) is an ideal technique to accurately quantify nucleic acid sequences including splice variants. However, currently available primer design programs do not distinguish between splice variants and also differ substantially in overall quality, functionality or throughput mode. Here, we present GETPrime, a primer database supported by a novel platform that uniquely combines and automates several features critical for optimal qPCR primer design. These include the consideration of all gene splice variants to enable either gene-specific (covering the majority of splice variants) or transcript-specific (covering one splice variant) expression profiling, primer specificity validation, automated best primer pair selection according to strict criteria and graphical visualization of the latter primer pairs within their genomic context. GETPrime primers have been extensively validated experimentally, demonstrating high transcript specificity in complex samples. Thus, the free-access, user-friendly GETPrime database allows fast primer retrieval and visualization for genes or groups of genes of most common model organisms, and is available at http://updepla1srv1.epfl.ch/getprime/. Database URL: http://deplanckelab.epfl.ch. PMID:21917859

  16. Modeling the functional genomics of autism using human neurons.

    PubMed

    Konopka, G; Wexler, E; Rosen, E; Mukamel, Z; Osborn, G E; Chen, L; Lu, D; Gao, F; Gao, K; Lowe, J K; Geschwind, D H

    2012-02-01

    Human neural progenitors from a variety of sources present new opportunities to model aspects of human neuropsychiatric disease in vitro. Such in vitro models provide the advantages of a human genetic background combined with rapid and easy manipulation, making them highly useful adjuncts to animal models. Here, we examined whether a human neuronal culture system could be utilized to assess the transcriptional program involved in human neural differentiation and to model some of the molecular features of a neurodevelopmental disorder, such as autism. Primary normal human neuronal progenitors (NHNPs) were differentiated into a post-mitotic neuronal state through addition of specific growth factors and whole-genome gene expression was examined throughout a time course of neuronal differentiation. After 4 weeks of differentiation, a significant number of genes associated with autism spectrum disorders (ASDs) are either induced or repressed. This includes the ASD susceptibility gene neurexin 1, which showed a distinct pattern from neurexin 3 in vitro, and which we validated in vivo in fetal human brain. Using weighted gene co-expression network analysis, we visualized the network structure of transcriptional regulation, demonstrating via this unbiased analysis that a significant number of ASD candidate genes are coordinately regulated during the differentiation process. As NHNPs are genetically tractable and manipulable, they can be used to study both the effects of mutations in multiple ASD candidate genes on neuronal differentiation and gene expression in combination with the effects of potential therapeutic molecules. These data also provide a step towards better understanding of the signaling pathways disrupted in ASD.

  17. Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station.

    PubMed

    Rettig, Trisha A; Ward, Claire; Pecaut, Michael J; Chapes, Stephen K

    2017-07-01

    Spaceflight is known to affect immune cell populations. In particular, splenic B cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after space flight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. This included assessments of our bioinformatic workflow on Illumina HiSeq and MiSeq datasets and is specifically designed to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options. We validated our workflow by comparing our normal mouse MiSeq data to existing murine antibody repertoire studies, validating it for future antibody repertoire studies.

  18. A new biologic prognostic model based on immunohistochemistry predicts survival in patients with diffuse large B-cell lymphoma.

    PubMed

    Perry, Anamarija M; Cardesa-Salzmann, Teresa M; Meyer, Paul N; Colomo, Luis; Smith, Lynette M; Fu, Kai; Greiner, Timothy C; Delabie, Jan; Gascoyne, Randy D; Rimsza, Lisa; Jaffe, Elaine S; Ott, German; Rosenwald, Andreas; Braziel, Rita M; Tubbs, Raymond; Cook, James R; Staudt, Louis M; Connors, Joseph M; Sehn, Laurie H; Vose, Julie M; López-Guillermo, Armando; Campo, Elias; Chan, Wing C; Weisenburger, Dennis D

    2012-09-13

    Biologic factors that predict the survival of patients with a diffuse large B-cell lymphoma, such as cell of origin and stromal signatures, have been discovered by gene expression profiling. We attempted to simulate these gene expression profiling findings and create a new biologic prognostic model based on immunohistochemistry. We studied 199 patients (125 in the training set, 74 in the validation set) with de novo diffuse large B-cell lymphoma treated with rituximab and CHOP (cyclophosphamide, doxorubicin, vincristine, and prednisone) or CHOP-like therapies, and immunohistochemical stains were performed on paraffin-embedded tissue microarrays. In the model, 1 point was awarded for each adverse prognostic factor: nongerminal center B cell-like subtype, SPARC (secreted protein, acidic, and rich in cysteine) < 5%, and microvascular density quartile 4. The model using these 3 biologic markers was highly predictive of overall survival and event-free survival in multivariate analysis after adjusting for the International Prognostic Index in both the training and validation sets. This new model delineates 2 groups of patients, 1 with a low biologic score (0-1) and good survival and the other with a high score (2-3) and poor survival. This new biologic prognostic model could be used with the International Prognostic Index to stratify patients for novel or risk-adapted therapies.
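
The scoring rule in this abstract (one point per adverse factor, scores 0-1 versus 2-3) is simple enough to write down directly. The sketch below is illustrative only: the class and function names are hypothetical, and the inputs are assumed to be already-determined marker values (how the subtype, SPARC percentage and microvascular density quartile are measured is not covered here).

```python
from dataclasses import dataclass


@dataclass
class Markers:
    non_gcb_subtype: bool   # nongerminal center B cell-like subtype
    sparc_percent: float    # SPARC-positive percentage
    mvd_quartile: int       # microvascular density quartile, 1 to 4


def biologic_score(m: Markers) -> int:
    """One point for each adverse prognostic factor named in the abstract."""
    return (
        int(m.non_gcb_subtype)
        + int(m.sparc_percent < 5.0)
        + int(m.mvd_quartile == 4)
    )


def risk_group(score: int) -> str:
    """Scores 0-1 form the low group, scores 2-3 the high group."""
    if not 0 <= score <= 3:
        raise ValueError("score must be between 0 and 3")
    return "low" if score <= 1 else "high"


example = Markers(non_gcb_subtype=True, sparc_percent=3.0, mvd_quartile=2)
group = risk_group(biologic_score(example))
```

The example patient has two adverse factors (non-GCB subtype and SPARC below 5%) and therefore falls in the high-score group.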

  19. A systems level predictive model for global gene regulation of methanogenesis in a hydrogenotrophic methanogen

    PubMed Central

    Yoon, Sung Ho; Turkarslan, Serdar; Reiss, David J.; Pan, Min; Burn, June A.; Costa, Kyle C.; Lie, Thomas J.; Slagel, Joseph; Moritz, Robert L.; Hackett, Murray; Leigh, John A.; Baliga, Nitin S.

    2013-01-01

    Methanogens catalyze the critical methane-producing step (called methanogenesis) in the anaerobic decomposition of organic matter. Here, we present the first predictive model of global gene regulation of methanogenesis in a hydrogenotrophic methanogen, Methanococcus maripaludis. We generated a comprehensive list of genes (protein-coding and noncoding) for M. maripaludis through integrated analysis of the transcriptome structure and a newly constructed Peptide Atlas. The environment and gene-regulatory influence network (EGRIN) model of the strain was constructed from a compendium of transcriptome data that was collected over 58 different steady-state and time-course experiments that were performed in chemostats or batch cultures under a spectrum of environmental perturbations that modulated methanogenesis. Analyses of the EGRIN model have revealed novel components of methanogenesis that included at least three additional protein-coding genes of previously unknown function as well as one noncoding RNA. We discovered that at least five regulatory mechanisms act in a combinatorial scheme to intercoordinate key steps of methanogenesis with different processes such as motility, ATP biosynthesis, and carbon assimilation. Through a combination of genetic and environmental perturbation experiments we have validated the EGRIN-predicted role of two novel transcription factors in the regulation of phosphate-dependent repression of formate dehydrogenase—a key enzyme in the methanogenesis pathway. The EGRIN model demonstrates regulatory affiliations within methanogenesis as well as between methanogenesis and other cellular functions. PMID:24089473

  20. Validation of miRNA genes suitable as reference genes in qPCR analyses of miRNA gene expression in Atlantic salmon (Salmo salar).

    PubMed

    Johansen, Ilona; Andreassen, Rune

    2014-12-23

    MicroRNAs (miRNAs) are an abundant class of endogenous small RNA molecules that downregulate gene expression at the post-transcriptional level. They play important roles by regulating genes that control multiple biological processes, and in recent years there has been an increased interest in studying miRNA genes and miRNA gene expression. The most common method applied to study gene expression of single genes is quantitative PCR (qPCR). However, before expression of mature miRNAs can be studied, robust qPCR methods (miRNA-qPCR) must be developed. This includes identification and validation of suitable reference genes. We are particularly interested in Atlantic salmon (Salmo salar). This is an economically important aquaculture species, but no reference genes dedicated for use in miRNA-qPCR methods have been validated for this species. Our aim was, therefore, to identify suitable reference genes for miRNA-qPCR methods in Salmo salar. We used a systematic approach where we utilized similar studies in other species, some biological criteria, results from deep sequencing of small RNAs and, finally, experimental validation of candidate reference genes by qPCR to identify the most suitable reference genes. Ssa-miR-25-3p was identified as the most suitable single reference gene. The best combination of two reference genes was ssa-miR-25-3p and ssa-miR-455-5p. These two genes were constitutively and stably expressed across many different tissues. Furthermore, infectious salmon anaemia did not seem to affect their expression levels. These genes were amplified with high specificity and good efficiency, and the qPCR assays showed good linearity when applying a simple SYBR Green miRNA-PCR method using miRNA gene specific forward primers. We have identified suitable reference genes for miRNA-qPCR in Atlantic salmon. These results will greatly facilitate further studies on miRNA genes in this species. The reference genes identified are conserved genes that are identical in their mature sequence in many aquaculture species. Therefore, they may also be suitable as reference genes in other teleosts. Finally, the systematic approach used in our study successfully identified suitable reference genes, suggesting that this may be a useful strategy to apply in similar validation studies in other aquaculture species.

  1. Subgroups at high risk for ischaemic heart disease: identification and validation in 67 000 individuals from the general population

    PubMed Central

    Frikke-Schmidt, Ruth; Tybjærg-Hansen, Anne; Dyson, Greg; Haase, Christiane L; Benn, Marianne; Nordestgaard, Børge G; Sing, Charles F

    2015-01-01

    Background. The aetiology of ischaemic heart disease (IHD) is complex and is influenced by a spectrum of environmental factors and susceptibility genes. Traditional statistical modelling considers such factors to act independently in an additive manner. The Patient Rule-Induction Method (PRIM) is a multi-model building strategy for evaluating risk attributable to context-dependent gene and environmental effects. Methods. PRIM was applied to 9073 participants from the prospective Copenhagen City Heart Study (CCHS). Gender-specific cumulative incidences were estimated for subgroups defined by categories of age, smoking, hypertension, diabetes, body mass index, total cholesterol, high-density lipoprotein cholesterol and triglycerides and by 94 single nucleotide variants (SNVs). Cumulative incidences for subgroups were validated using an independently ascertained sample of 58 240 participants from the Copenhagen General Population Study (CGPS). Results. In the CCHS the overall cumulative incidences were 0.17 in women and 0.21 in men. PRIM identified six and four mutually exclusive subgroups in women and men, respectively, with cumulative incidences of IHD ranging from 0.02 to 0.34. Cumulative incidences of IHD generated by PRIM in the CCHS were validated in four of the six subgroups of women and two of the four subgroups of men in the CGPS. Conclusions. PRIM identified high-risk subgroups characterized by specific contexts of selected values of traditional risk factors and genetic variants. These subgroups were validated in an independently ascertained cohort study. Thus, a multi-model strategy may identify groups of individuals with substantially higher risk of IHD than the overall risk for the general population. PMID:25361584

  2. Common Polymorphisms in the PKP3-SIGIRR-TMEM16J Gene Region Are Associated With Susceptibility to Tuberculosis

    PubMed Central

    Randhawa, April K.; Chau, Tran T. H.; Bang, Nguyen D.; Yen, Nguyen T. B.; Farrar, Jeremy J.; Dunstan, Sarah J.; Hawn, Thomas R.

    2012-01-01

    (See the editorial commentary by Wilkinson, on pages 525–7.) Background. Tuberculosis has been associated with genetic variation in host immunity. We hypothesized that single-nucleotide polymorphisms (SNPs) in SIGIRR, a negative regulator of Toll-like receptor/IL-1R signaling, are associated with susceptibility to tuberculosis. Methods. We used a case-population study design in Vietnam with cases that had either tuberculous meningitis or pulmonary tuberculosis. We genotyped 6 SNPs in the SIGIRR gene region (including the adjacent genes PKP3 and TMEM16J) in a discovery cohort of 352 patients with tuberculosis and 382 controls. Significant associations were genotyped in a validation cohort (339 patients with tuberculosis, 376 controls). Results. Three SNPs (rs10902158, rs7105848, rs7111432) were associated with tuberculosis in discovery and validation cohorts. The polymorphisms were associated with both tuberculous meningitis and pulmonary tuberculosis and were strongest with a recessive genetic model (odds ratios, 1.5–1.6; P = .0006–.001). Coinheritance of these polymorphisms with previously identified risk alleles in Toll-like receptor 2 and TIRAP was associated with an additive risk of tuberculosis susceptibility. Conclusions. These results demonstrate a strong association of SNPs in the PKP3-SIGIRR-TMEM16J gene region and tuberculosis in discovery and validation cohorts. To our knowledge, these are the first associations of polymorphisms in this region with any disease. PMID:22223854
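
For readers unfamiliar with the recessive genetic model mentioned in this abstract, the odds ratio is computed after collapsing genotypes into "two copies of the risk allele" versus "everything else". The sketch below shows only that arithmetic; the function name is hypothetical and the genotype counts are made up for illustration, not taken from the study.

```python
from typing import Tuple

# (non-risk homozygotes, heterozygotes, risk-allele homozygotes)
GenotypeCounts = Tuple[int, int, int]


def recessive_odds_ratio(cases: GenotypeCounts, controls: GenotypeCounts) -> float:
    """Odds ratio for risk-allele homozygotes versus all other genotypes."""
    case_risk = cases[2]
    case_other = cases[0] + cases[1]
    control_risk = controls[2]
    control_other = controls[0] + controls[1]
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical counts, chosen only to illustrate the calculation.
odds_ratio = recessive_odds_ratio(cases=(40, 30, 30), controls=(50, 30, 20))
```

Here the odds of carrying two risk alleles are 30/70 among cases and 20/80 among controls, so the ratio is (30 × 80) / (70 × 20), about 1.71.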

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oberhardt, Matthew A.; Zarecki, Raphy; Reshef, Leah

    Recent insights suggest that non-specific and/or promiscuous enzymes are common and active across life. Understanding the role of such enzymes is an important open question in biology. Here we develop a genome-wide method, PROPER, that uses a permissive PSI-BLAST approach to predict promiscuous activities of metabolic genes. Enzyme promiscuity is typically studied experimentally using multicopy suppression, in which over-expression of a promiscuous ‘replacer’ gene rescues lethality caused by inactivation of a ‘target’ gene. We use PROPER to predict multicopy suppression in Escherichia coli, achieving highly significant overlap with published cases (hypergeometric p = 4.4e-13). We then validate three novel predicted target-replacer gene pairs in new multicopy suppression experiments. We next go beyond PROPER and develop a network-based approach, GEM-PROPER, that integrates PROPER with genome-scale metabolic modeling to predict promiscuous replacements via alternative metabolic pathways. GEM-PROPER predicts a new indirect replacer (thiG) for an essential enzyme (pdxB) in production of pyridoxal 5’-phosphate (the active form of vitamin B6), which we validate experimentally via multicopy suppression. Here, we perform a structural analysis of thiG to determine its potential promiscuous active site, which we validate experimentally by inactivating the pertaining residues and showing a loss of replacer activity. Thus, this study is a successful example where a computational investigation leads to a network-based identification of an indirect promiscuous replacement of a key metabolic enzyme, which would have been extremely difficult to identify directly.
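
    The "hypergeometric p" quoted above is the probability of seeing at least the observed overlap between predicted and published pairs by chance. A minimal sketch of that tail probability follows, assuming a simple one-sided test; the function name and the counts are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are taken without replacement
    from `population` items, of which `successes` count as hits."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical counts for illustration only (not the study's data):
# 50 candidate pairs, 5 published cases, 10 predictions, 3 of them published.
p = hypergeom_tail(population=50, successes=5, draws=10, observed=3)
print(f"p = {p:.4g}")
```

    The smaller this value, the less plausible it is that the overlap arose from random prediction.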

  4. Reference genes for real-time PCR quantification of messenger RNAs and microRNAs in mouse model of obesity.

    PubMed

    Matoušková, Petra; Bártíková, Hana; Boušová, Iva; Hanušová, Veronika; Szotáková, Barbora; Skálová, Lenka

    2014-01-01

    Obesity and metabolic syndrome are an increasing health problem worldwide. Among other approaches, nutritional intervention using phytochemicals is an important method for the treatment and prevention of this disease. Recent studies have shown that certain phytochemicals could alter the expression of specific genes and microRNAs (miRNAs) that play a fundamental role in the pathogenesis of obesity. For the study of obesity and its treatment, monosodium glutamate (MSG)-injected mice with developed central obesity, insulin resistance and liver lipid accumulation are a frequently used animal model. To understand the mechanism of phytochemical action in obese animals, studying the expression of selected genes together with miRNA quantification is extremely important. For this purpose, real-time quantitative PCR is a sensitive and reproducible method, but it depends entirely on proper normalization. The aim of the present study was to identify appropriate reference genes for mRNA and miRNA quantification in MSG mice treated with green tea catechins, potential anti-obesity phytochemicals. Two sets of reference genes were tested: the first set contained seven genes commonly used for normalization of messenger RNA; the second set of candidate reference genes included ten small RNAs for normalization of miRNA. The expression stability of these reference genes was tested upon treatment of mice with catechins using the geNorm, NormFinder and BestKeeper algorithms. Selected normalizers for mRNA quantification were tested and validated on the expression of quinone oxidoreductase, a biotransformation enzyme known to be modified by catechins. The effect of selected normalizers for miRNA quantification was tested on two obesity- and diabetes-related miRNAs, miR-221 and miR-29b, respectively. Finally, the combinations of B2M/18S/HPRT1 and miR-16/sno234 were validated as optimal reference genes for mRNA and miRNA quantification in liver, and 18S/RPLP0/HPRT1 and sno234/miR-186 in small intestine, of MSG mice. These reference genes will be used for mRNA and miRNA normalization in further study of green tea catechin action in obese mice.
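
    Of the three algorithms named above, geNorm ranks candidates by a stability measure M: for each gene, the mean standard deviation of its log2 expression ratios against every other candidate, with lower M meaning more stable. The sketch below computes only that measure and omits geNorm's stepwise exclusion of the least stable gene; the gene names and quantities are invented for illustration.

```python
from math import log2
from statistics import mean, stdev

def genorm_m(expression):
    """expression maps gene name -> relative quantities, one per sample.
    Returns gene name -> stability measure M (lower is more stable)."""
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log ratio of gene j to gene k in each sample
            log_ratios = [log2(a / b) for a, b in zip(expression[j], expression[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values

# Invented relative quantities for three hypothetical candidates
toy = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],  # constant ratio to geneA across samples
    "geneC": [1.0, 8.0, 2.0],  # varies independently of the other two
}
m = genorm_m(toy)
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))
```

    In this toy input geneA and geneB track each other perfectly, so they share the lowest M, while geneC is ranked least stable.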

  5. A Systems' Biology Approach to Study MicroRNA-Mediated Gene Regulatory Networks

    PubMed Central

    Kunz, Manfred; Vera, Julio; Wolkenhauer, Olaf

    2013-01-01

    MicroRNAs (miRNAs) are potent effectors in gene regulatory networks where aberrant miRNA expression can contribute to human diseases such as cancer. For a better understanding of the regulatory role of miRNAs in coordinating gene expression, we here present a systems biology approach combining data-driven modeling and model-driven experiments. Such an approach is characterized by an iterative process, including biological data acquisition and integration, network construction, mathematical modeling and experimental validation. To demonstrate the application of this approach, we adopt it to investigate mechanisms of collective repression on p21 by multiple miRNAs. We first construct a p21 regulatory network based on data from the literature and further expand it using algorithms that predict molecular interactions. Based on the network structure, a detailed mechanistic model is established and its parameter values are determined using data. Finally, the calibrated model is used to study the effect of different miRNA expression profiles and cooperative target regulation on p21 expression levels in different biological contexts. PMID:24350286

  6. The mutational landscape of MYCN, Lin28b and ALK F1174L driven murine neuroblastoma mimics human disease.

    PubMed

    De Wilde, Bram; Beckers, Anneleen; Lindner, Sven; Kristina, Althoff; De Preter, Katleen; Depuydt, Pauline; Mestdagh, Pieter; Sante, Tom; Lefever, Steve; Hertwig, Falk; Peng, Zhiyu; Shi, Le-Ming; Lee, Sangkyun; Vandermarliere, Elien; Martens, Lennart; Menten, Björn; Schramm, Alexander; Fischer, Matthias; Schulte, Johannes; Vandesompele, Jo; Speleman, Frank

    2018-02-02

    Genetically engineered mouse models have proven to be essential tools for unraveling fundamental aspects of cancer biology and for testing novel therapeutic strategies. To optimally serve these goals, it is essential that the mouse model faithfully recapitulates the human disease. Recently, novel mouse models for neuroblastoma have been developed. Here, we report on the further genomic characterization through exome sequencing and DNA copy number analysis of four of the currently available murine neuroblastoma model systems (ALK, Th-MYCN, Dbh-MYCN and Lin28b). The murine tumors revealed a low number of genomic alterations - in keeping with human neuroblastoma - and a positive correlation of the number of genetic lesions with the time to onset of tumor formation was observed. Gene copy number alterations are the hallmark of both murine and human disease and frequently affect syntenic genomic regions. Despite low mutational load, the genes mutated in murine disease were found to be enriched for genes mutated in human disease. Taken together, our study further supports the validity of the tested mouse models for mechanistic and preclinical studies of human neuroblastoma.

  7. Predictive Models of Cognitive Outcomes of Developmental Insults

    NASA Astrophysics Data System (ADS)

    Chan, Yupo; Bouaynaya, Nidhal; Chowdhury, Parimal; Leszczynska, Danuta; Patterson, Tucker A.; Tarasenko, Olga

    2010-04-01

    Representatives of Arkansas medical, research and educational institutions have gathered over the past four years to discuss the relationship between functional developmental perturbations and their neurological consequences. We wish to track the effect of developmental perturbations on the nervous system over time and across species. Except for perturbations, the sequence of events that occur during neural development was found to be remarkably conserved across mammalian species. The tracking includes consequences on anatomical regions and behavioral changes. The ultimate goal is to develop a predictive model of long-term genotypic and phenotypic outcomes that includes developmental insults. Such a model can subsequently be developed into an educated intervention for therapeutic purposes. Several datasets were identified to test plausible hypotheses, ranging from evoked potential datasets to sleep-disorder datasets. An initial model may be mathematical and conceptual. However, we expect to see rapid progress as large-scale gene expression studies in the mammalian brain permit genome-wide searches to discover genes that are uniquely expressed in brain circuits and regions. These genes ultimately control behavior. By using a validated model we endeavor to make useful predictions.

  8. Combining inferred regulatory and reconstructed metabolic networks enhances phenotype prediction in yeast.

    PubMed

    Wang, Zhuo; Danziger, Samuel A; Heavner, Benjamin D; Ma, Shuyi; Smith, Jennifer J; Li, Song; Herricks, Thurston; Simeonidis, Evangelos; Baliga, Nitin S; Aitchison, John D; Price, Nathan D

    2017-05-01

    Gene regulatory and metabolic network models have been used successfully in many organisms, but inherent differences between them make networks difficult to integrate. Probabilistic Regulation Of Metabolism (PROM) provides a partial solution, but it does not incorporate network inference and underperforms in eukaryotes. We present an Integrated Deduced REgulation And Metabolism (IDREAM) method that combines statistically inferred Environment and Gene Regulatory Influence Network (EGRIN) models with the PROM framework to create enhanced metabolic-regulatory network models. We used IDREAM to predict phenotypes and genetic interactions between transcription factors and genes encoding metabolic activities in the eukaryote Saccharomyces cerevisiae. IDREAM models contain many fewer interactions than PROM and yet produce significantly more accurate growth predictions. IDREAM consistently outperformed PROM using any of three popular yeast metabolic models and across three experimental growth conditions. Importantly, IDREAM's enhanced accuracy makes it possible to identify subtle synthetic growth defects. With experimental validation, these novel genetic interactions involving the pyruvate dehydrogenase complex suggested a new role for fatty acid-responsive factor Oaf1 in regulating acetyl-CoA production in glucose-grown cells.

  9. Gene Expression Profile Analysis is Directly Affected by the Selected Reference Gene: The Case of Leaf-Cutting Atta Sexdens

    PubMed Central

    Máximo, Wesley P. F.; Zanetti, Ronald; Paiva, Luciano V.

    2018-01-01

    Although several ant species are important targets for the development of molecular control strategies, only a few studies focus on identifying and validating reference genes for quantitative reverse transcription polymerase chain reaction (RT-qPCR) data normalization. We provide here an extensive study to identify and validate suitable reference genes for gene expression analysis in the ant Atta sexdens, a threatening agricultural pest in South America. The optimal number of reference genes varied between samples, and the results generated by RefFinder differed as to which reference gene is the most suitable. Results suggest that the RPS16, NADH and SDHB genes were the best reference genes in the sample pool according to stability values. The SNF7 gene expression pattern was stable in all evaluated sample sets. In contrast, when less stable reference genes were used for normalization, a large variability in SNF7 gene expression was recorded. There is no universal reference gene suitable for all conditions under analysis, since these genes can also participate in different cellular functions, thus requiring a systematic validation of possible reference genes for each specific condition. The effect of reference gene choice on SNF7 normalization confirmed that unstable reference genes might drastically change the expression profile analysis of target candidate genes. PMID:29419794

  10. Evaluation of gene expression classification studies: factors associated with classification performance.

    PubMed

    Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C

    2014-01-01

    Classification methods used in microarray studies for gene expression are diverse in the way they deal with the underlying complexity of the data, as well as in the technique used to build the classification model. The MAQC II study on cancer classification problems has found that performance was affected by factors such as the classification algorithm, cross validation method, number of genes, and gene selection method. In this paper, we study the hypothesis that the disease under study significantly determines which method is optimal, and that additionally sample size, class imbalance, type of medical question (diagnostic, prognostic or treatment response), and microarray platform are potentially influential. A systematic literature review was used to extract the information from 48 published articles on non-cancer microarray classification studies. The impact of the various factors on the reported classification accuracy was analyzed through random-intercept logistic regression. The type of medical question and method of cross validation dominated the explained variation in accuracy among studies, followed by disease category and microarray platform. In total, 42% of the between study variation was explained by all the study specific and problem specific factors that we studied together.

  11. Considering RNAi experimental design in parasitic helminths.

    PubMed

    Dalzell, Johnathan J; Warnock, Neil D; McVeigh, Paul; Marks, Nikki J; Mousley, Angela; Atkinson, Louise; Maule, Aaron G

    2012-04-01

    Almost a decade has passed since the first report of RNA interference (RNAi) in a parasitic helminth. Whilst much progress has been made with RNAi informing gene function studies in disparate nematode and flatworm parasites, substantial and seemingly prohibitive difficulties have been encountered in some species, hindering progress. An appraisal of current practices, trends and ideals of RNAi experimental design in parasitic helminths is both timely and necessary for a number of reasons: firstly, the increasing availability of parasitic helminth genome/transcriptome resources means there is a growing need for gene function tools such as RNAi; secondly, fundamental differences and unique challenges exist for parasite species which do not apply to model organisms; thirdly, the inherent variation in experimental design, and reported difficulties with reproducibility undermine confidence. Ideally, RNAi studies of gene function should adopt standardised experimental design to aid reproducibility, interpretation and comparative analyses. Although the huge variations in parasite biology and experimental endpoints make RNAi experimental design standardization difficult or impractical, we must strive to validate RNAi experimentation in helminth parasites. To aid this process we identify multiple approaches to RNAi experimental validation and highlight those which we deem to be critical for gene function studies in helminth parasites.

  12. Predicting paclitaxel-induced neutropenia using the DMET platform.

    PubMed

    Nieuweboer, Annemieke J M; Smid, Marcel; de Graan, Anne-Joy M; Elbouazzaoui, Samira; de Bruijn, Peter; Martens, John W; Mathijssen, Ron H J; van Schaik, Ron H N

    2015-01-01

    The use of paclitaxel in cancer treatment is limited by paclitaxel-induced neutropenia. We investigated the ability of genetic variation in drug-metabolizing enzymes and transporters to predict hematological toxicity. Using a discovery and validation approach, we identified a pharmacogenetic predictive model for neutropenia. For this, a drug-metabolizing enzymes and transporters plus DNA chip was used, which contains 1936 SNPs in 225 metabolic enzyme and drug-transporter genes. Our 10-SNP model in 279 paclitaxel-dosed patients reached 43% sensitivity in the validation cohort. Analysis in 3-weekly treated patients only resulted in improved sensitivity of 79%, with a specificity of 33%. None of our models reached statistical significance. Our drug-metabolizing enzymes and transporters-based SNP-models are currently of limited value for predicting paclitaxel-induced neutropenia in clinical practice. Original submitted 9 March 2015; Revision submitted 20 May 2015.
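
    Sensitivity and specificity, as reported above, come from the four cells of a confusion matrix: sensitivity is the fraction of true cases that the model flags, specificity the fraction of non-cases it correctly leaves unflagged. A minimal sketch with invented counts (not the study's data) shows why high sensitivity paired with low specificity is of limited use.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # flagged cases / all true cases
    specificity = tn / (tn + fp)  # cleared non-cases / all true non-cases
    return sensitivity, specificity

# Invented counts for illustration only
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=3, fp=7)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

    With these numbers most true cases are caught, but 7 of 10 non-cases are flagged as well, so a positive prediction carries little information.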

  13. Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction

    NASA Astrophysics Data System (ADS)

    Hansen, Matthew; Everett, Logan; Singh, Larry; Hannenhalli, Sridhar

    Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue-types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach where a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in an (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.

  14. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    PubMed Central

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set. PMID:21423775

  15. THP-1 monocytes but not macrophages as a potential alternative for CD34+ dendritic cells to identify chemical skin sensitizers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lambrechts, Nathalie; Verstraelen, Sandra; Lodewyckx, Hanne

    2009-04-15

    Early detection of the sensitizing potential of chemicals is an emerging issue for chemical, pharmaceutical and cosmetic industries. In our institute, an in vitro classification model for prediction of chemical-induced skin sensitization based on gene expression signatures in human CD34+ progenitor-derived dendritic cells (DC) has been developed. This primary cell model is able to closely mimic the induction phase of sensitization by Langerhans cells in the skin, but it has drawbacks, such as the availability of cord blood. The aim of this study was to investigate whether human in vitro cultured THP-1 monocytes or macrophages display a similar expression profile for 13 predictive gene markers previously identified in DC and whether they also possess a discriminating capacity towards skin sensitizers and non-sensitizers based on these marker genes. To this end, the cell models were exposed to 5 skin sensitizers (ammonium hexachloroplatinate IV, 1-chloro-2,4-dinitrobenzene, eugenol, para-phenylenediamine, and tetramethylthiuram disulfide) and 5 non-sensitizers (L-glutamic acid, methyl salicylate, sodium dodecyl sulfate, tributyltin chloride, and zinc sulfate) for 6, 10, and 24 h, and mRNA expression of the 13 genes was analyzed using real-time RT-PCR. The transcriptional response of 7 out of 13 genes in THP-1 monocytes was significantly correlated with DC, whereas this held for only 2 out of 13 genes in THP-1 macrophages. After a cross-validation of a discriminant analysis of the gene expression profiles in the THP-1 monocytes, this cell model was also shown to have a capacity to distinguish skin sensitizers from non-sensitizers. However, the DC model was superior to the monocyte model for discrimination of (non-)sensitizing chemicals.

  16. Genomic variants in an inbred mouse model predict mania-like behaviors.

    PubMed

    Saul, Michael C; Stevenson, Sharon A; Zhao, Changjiu; Driessen, Terri M; Eisinger, Brian E; Gammie, Stephen C

    2018-01-01

    Contemporary rodent models for bipolar disorders split the bipolar spectrum into complementary behavioral endophenotypes representing mania and depression. Widely accepted mania models typically utilize single gene transgenics or pharmacological manipulations, but inbred rodent strains show great potential as mania models. Their acceptance is often limited by the lack of genotypic data needed to establish construct validity. In this study, we used a unique strategy to inexpensively explore and confirm population allele differences in naturally occurring candidate variants in a manic rodent strain, the Madison (MSN) mouse strain. Variants were identified using whole exome resequencing on a small population of animals. Interesting candidate variants were confirmed in a larger population with genotyping. We enriched these results with observations of locomotor behavior from a previous study. Resequencing identified 447 structural variants that are mostly fixed in the MSN strain relative to control strains. After filtering and annotation, we found 11 non-synonymous MSN variants that we believe alter protein function. The allele frequencies for 6 of these variants were consistent with explanatory variants for the Madison strain's phenotype. The variants are in the Npas2, Cp, Polr3c, Smarca4, Trpv1, and Slc5a7 genes, and many of these genes' products are in pathways implicated in human bipolar disorders. Variants in Smarca4 and Polr3c together explained over 40% of the variance in locomotor behavior in the Hsd:ICR founder strain. These results enhance the MSN strain's construct validity and implicate altered nucleosome structure and transcriptional regulation as a chief molecular system underpinning behavior.

  17. Inference of Gene Regulatory Networks Incorporating Multi-Source Biological Knowledge via a State Space Model with L1 Regularization

    PubMed Central

    Hasegawa, Takanori; Yamaguchi, Rui; Nagasaki, Masao; Miyano, Satoru; Imoto, Seiya

    2014-01-01

    Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in the field of systems biology. Currently, there are two main approaches in GRN analysis using time-course observation data, namely an ordinary differential equation (ODE)-based approach and a statistical model-based approach. The ODE-based approach can generate complex dynamics of GRNs according to biologically validated nonlinear models. However, it cannot be applied to ten or more genes to simultaneously estimate system dynamics and regulatory relationships due to the computational difficulties. The statistical model-based approach uses highly abstract models to simply describe biological systems and to infer relationships among several hundreds of genes from the data. However, the high abstraction generates false regulations that are not permitted biologically. Thus, when dealing with several tens of genes of which the relationships are partially known, a method that can infer regulatory relationships based on a model with low abstraction and that can emulate the dynamics of ODE-based models while incorporating prior knowledge is urgently required. To accomplish this, we propose a method for inference of GRNs using a state space representation of a vector auto-regressive (VAR) model with L1 regularization. This method can estimate the dynamic behavior of genes based on linear time-series modeling constructed from an ODE-based model and can infer the regulatory structure among several tens of genes maximizing prediction ability for the observational data. Furthermore, the method is capable of incorporating various types of existing biological knowledge, e.g., drug kinetics and literature-recorded pathways. The effectiveness of the proposed method is shown through a comparison of simulation studies with several previous methods. For an application example, we evaluated mRNA expression profiles over time upon corticosteroid stimulation in rats, thus incorporating corticosteroid kinetics/dynamics, literature-recorded pathways and transcription factor (TF) information. PMID:25162401
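
    For orientation, a generic first-order VAR model in state space form with an L1 penalty can be written as below. This is a textbook-style sketch, not the authors' exact formulation: the symbols A, H, Q, R and λ are assumptions introduced here, and the terms the abstract mentions for drug kinetics and prior pathway knowledge are left out.

```latex
\begin{align}
  x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) && \text{(hidden expression state)}\\
  y_t &= H\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R) && \text{(observed profile)}\\
  \hat{A} &= \arg\max_{A}\Big\{ \log p(y_{1:T} \mid A) - \lambda \sum_{i,j} \lvert a_{ij} \rvert \Big\}
\end{align}
```

    The L1 term drives many entries a_ij of A to exactly zero, so the nonzero entries that remain can be read as candidate regulatory edges from gene j to gene i.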

  18. Inferring evolution of gene duplicates using probabilistic models and nonparametric belief propagation.

    PubMed

    Zeng, Jia; Hannenhalli, Sridhar

    2013-01-01

    Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also represents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.

  19. Housekeeping while brain's storming: Validation of normalizing factors for gene expression studies in a murine model of traumatic brain injury

    PubMed Central

    Rhinn, Hervé; Marchand-Leroux, Catherine; Croci, Nicole; Plotkine, Michel; Scherman, Daniel; Escriou, Virginie

    2008-01-01

    Background Traumatic brain injury models are widely studied, especially through gene expression, either to further understand implied biological mechanisms or to assess the efficiency of potential therapies. A large number of biological pathways are affected in brain trauma models, whose elucidation might greatly benefit from transcriptomic studies. However, the suitability of reference genes needed for quantitative RT-PCR experiments has not been established for these models. Results We have compared five potential reference genes as well as total cDNA level monitored using Oligreen reagent in order to determine the best normalizing factors for quantitative RT-PCR expression studies in the early phase (0–48 h post-trauma (PT)) of a murine model of diffuse brain injury. The levels of 18S rRNA, and of transcripts of β-actin, glyceraldehyde-3P-dehydrogenase (GAPDH), β-microtubulin and S100β were determined in the injured brain region of traumatized mice sacrificed at 30 min, 3 h, 6 h, 12 h, 24 h and 48 h post-trauma. The stability of the reference gene candidates and of total cDNA was evaluated by three different methods, leading to the following rankings as normalization factors, from the most suitable to the least: by using geNorm VBA applet, we obtained the following sequence: cDNA(Oligreen); GAPDH > 18S rRNA > S100β > β-microtubulin > β-actin; by using NormFinder Excel Spreadsheet, we obtained the following sequence: GAPDH > cDNA(Oligreen) > S100β > 18S rRNA > β-actin > β-microtubulin; by using a Confidence-Interval calculation, we obtained the following sequence: cDNA(Oligreen) > 18S rRNA; GAPDH > S100β > β-microtubulin > β-actin. Conclusion This work suggests that Oligreen cDNA measurements, 18S rRNA and GAPDH or a combination of them may be used to efficiently normalize qRT-PCR gene expression in mouse brain trauma injury, and that β-actin and β-microtubulin should be avoided. The potential of total cDNA as measured by Oligreen as a first-intention normalizing factor with a broad field of applications is highlighted. Pros and cons of the three methods of normalization factors selection are discussed. A generic time- and cost-effective procedure for normalization factor validation is proposed. PMID:18611280

  20. Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.

    PubMed

    Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin

    2005-01-01

    DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful in pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly utilize shape-based assumptions or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a novel perspective, we propose a fractal clustering method to cluster genes using intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than to the points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. It shows that this method is superior to other traditional methods in identifying functionally related gene groups.

  1. Construction and Experimental Validation of a Petri Net Model of Wnt/β-Catenin Signaling.

    PubMed

    Jacobsen, Annika; Heijmans, Nika; Verkaar, Folkert; Smit, Martine J; Heringa, Jaap; van Amerongen, Renée; Feenstra, K Anton

    2016-01-01

    The Wnt/β-catenin signaling pathway is important for multiple developmental processes and tissue maintenance in adults. Consequently, deregulated signaling is involved in a range of human diseases including cancer and developmental defects. A better understanding of the intricate regulatory mechanism and effect of physiological (active) and pathophysiological (hyperactive) WNT signaling is important for predicting treatment response and developing novel therapies. The constitutively expressed CTNNB1 (commonly and hereafter referred to as β-catenin) is degraded by a destruction complex, composed of amongst others AXIN1 and GSK3. The destruction complex is inhibited during active WNT signaling, leading to β-catenin stabilization and induction of β-catenin/TCF target genes. In this study we investigated the mechanism and effect of β-catenin stabilization during active and hyperactive WNT signaling in a combined in silico and in vitro approach. We constructed a Petri net model of Wnt/β-catenin signaling including main players from the plasma membrane (WNT ligands and receptors), cytoplasmic effectors and the downstream negative feedback target gene AXIN2. We validated that our model can be used to simulate both active (WNT stimulation) and hyperactive (GSK3 inhibition) signaling by comparing our simulation and experimental data. We used this experimentally validated model to get further insights into the effect of the negative feedback regulator AXIN2 upon WNT stimulation and observed an attenuated β-catenin stabilization. We furthermore simulated the effect of APC inactivating mutations, yielding a stabilization of β-catenin levels comparable to the Wnt-pathway activities observed in colorectal and breast cancer. Our model can be used for further investigation and viable predictions of the role of Wnt/β-catenin signaling in oncogenesis and development.
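
To make the Petri net idea concrete, here is a deliberately tiny sketch: places hold token counts, and a transition fires when every input place holds enough tokens. The three transitions, the place names and the fixed firing order are assumptions made for illustration only; this is not the authors' model, which contains many more components.

```python
# Toy Petri net: each transition consumes tokens from input places and
# produces tokens in output places. All names here are hypothetical.
TRANSITIONS = [
    {"name": "wnt_inhibits_complex",
     "consume": {"wnt": 1, "active_complex": 1},
     "produce": {"wnt": 1, "inhibited_complex": 1}},
    {"name": "synthesis",
     "consume": {},
     "produce": {"beta_catenin": 1}},
    {"name": "degradation",
     "consume": {"beta_catenin": 1, "active_complex": 1},
     "produce": {"active_complex": 1}},
]


def is_enabled(marking, transition):
    """A transition is enabled when all input places hold enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["consume"].items())


def fire(marking, transition):
    """Return the new marking after firing one enabled transition."""
    new_marking = dict(marking)
    for place, n in transition["consume"].items():
        new_marking[place] -= n
    for place, n in transition["produce"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def simulate(marking, steps):
    """Fire every enabled transition once per step, in list order."""
    for _ in range(steps):
        for transition in TRANSITIONS:
            if is_enabled(marking, transition):
                marking = fire(marking, transition)
    return marking


resting = simulate({"wnt": 0, "active_complex": 1, "beta_catenin": 0}, steps=5)
stimulated = simulate({"wnt": 1, "active_complex": 1, "beta_catenin": 0}, steps=5)
print(resting["beta_catenin"], stimulated["beta_catenin"])
```

Without a WNT token the destruction complex stays active and removes each newly made β-catenin token; with a WNT token the complex is moved to an inhibited place and β-catenin tokens accumulate, which is the qualitative behavior the abstract describes as stabilization.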

  2. Construction and Experimental Validation of a Petri Net Model of Wnt/β-Catenin Signaling

    PubMed Central

    Heijmans, Nika; Verkaar, Folkert; Smit, Martine J.; Heringa, Jaap

    2016-01-01

    The Wnt/β-catenin signaling pathway is important for multiple developmental processes and tissue maintenance in adults. Consequently, deregulated signaling is involved in a range of human diseases including cancer and developmental defects. A better understanding of the intricate regulatory mechanism and effect of physiological (active) and pathophysiological (hyperactive) WNT signaling is important for predicting treatment response and developing novel therapies. The constitutively expressed CTNNB1 (commonly and hereafter referred to as β-catenin) is degraded by a destruction complex, composed of amongst others AXIN1 and GSK3. The destruction complex is inhibited during active WNT signaling, leading to β-catenin stabilization and induction of β-catenin/TCF target genes. In this study we investigated the mechanism and effect of β-catenin stabilization during active and hyperactive WNT signaling in a combined in silico and in vitro approach. We constructed a Petri net model of Wnt/β-catenin signaling including main players from the plasma membrane (WNT ligands and receptors), cytoplasmic effectors and the downstream negative feedback target gene AXIN2. We validated that our model can be used to simulate both active (WNT stimulation) and hyperactive (GSK3 inhibition) signaling by comparing our simulation and experimental data. We used this experimentally validated model to get further insights into the effect of the negative feedback regulator AXIN2 upon WNT stimulation and observed an attenuated β-catenin stabilization. We furthermore simulated the effect of APC inactivating mutations, yielding a stabilization of β-catenin levels comparable to the Wnt-pathway activities observed in colorectal and breast cancer. Our model can be used for further investigation and viable predictions of the role of Wnt/β-catenin signaling in oncogenesis and development. PMID:27218469

  3. Genome-wide essential gene identification in Streptococcus sanguinis

    PubMed Central

    Xu, Ping; Ge, Xiuchun; Chen, Lei; Wang, Xiaojing; Dou, Yuetan; Xu, Jerry Z.; Patel, Jenishkumar R.; Stone, Victoria; Trinh, My; Evans, Karra; Kitten, Todd; Bonchev, Danail; Buck, Gregory A.

    2011-01-01

    A clear perception of gene essentiality in bacterial pathogens is pivotal for identifying drug targets to combat emergence of new pathogens and antibiotic-resistant bacteria, for synthetic biology, and for understanding the origins of life. We have constructed a comprehensive set of deletion mutants and systematically identified a clearly defined set of essential genes for Streptococcus sanguinis. Our results were confirmed by growing S. sanguinis in minimal medium and by double-knockout of paralogous or isozyme genes. Careful examination revealed that these essential genes were associated with only three basic categories of biological functions: maintenance of the cell envelope, energy production, and processing of genetic information. Our finding was subsequently validated in two other pathogenic streptococcal species, Streptococcus pneumoniae and Streptococcus mutans and in two other gram-positive pathogens, Bacillus subtilis and Staphylococcus aureus. Our analysis has thus led to a simplified model that permits reliable prediction of gene essentiality. PMID:22355642

  4. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    PubMed

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
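
The combination of gene-based rules into a vote, and the sensitivity and specificity that result, can be sketched as follows. The rules, gene labels, thresholds and samples are invented for illustration; the abstract does not give the evolved rules, and simple majority voting is an assumption about how the votes are combined.

```python
def majority_vote(rules, sample):
    """Predict recurrence when more than half of the rules vote for it."""
    votes = sum(1 for rule in rules if rule(sample))
    return votes * 2 > len(rules)


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from boolean predictions and labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical rules over three hypothetical transcripts A, B and C.
rules = [
    lambda s: s["A"] > 1.0,
    lambda s: s["B"] < 0.5,
    lambda s: s["C"] > s["A"],
]

# Hypothetical expression values and recurrence labels.
samples = [
    {"A": 2.0, "B": 0.2, "C": 1.0},
    {"A": 0.5, "B": 0.9, "C": 0.4},
    {"A": 1.5, "B": 0.8, "C": 2.0},
    {"A": 0.8, "B": 0.3, "C": 0.5},
    {"A": 0.7, "B": 0.1, "C": 0.9},
    {"A": 0.9, "B": 0.7, "C": 0.2},
]
recurred = [True, False, False, True, True, False]

predicted = [majority_vote(rules, s) for s in samples]
sensitivity, specificity = sensitivity_specificity(predicted, recurred)
print(predicted, sensitivity, specificity)
```

As in the study, such figures are only meaningful when reported separately for training and held-out test data, since rules tuned on the training set usually perform worse on unseen samples.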

  5. Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis.

    PubMed

    Swindell, William R; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P; Voorhees, John J; Elder, James T; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P; DiGiovanni, John; Pittelkow, Mark R; Ward, Nicole L; Gudjonsson, Johann E

    2011-04-04

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis.

  6. Whole gene expression profile in blood reveals multiple pathways deregulation in R6/2 mouse model

    PubMed Central

    2013-01-01

    Background Huntington Disease (HD) is a progressive neurological disorder, with pathological manifestations in brain areas and in the periphery caused by the ubiquitous expression of mutant Huntingtin protein. Transcriptional dysregulation is considered a key molecular mechanism responsible for HD pathogenesis but, although numerous studies have investigated mRNA alterations in HD, so far none has evaluated a whole gene expression profile in blood of the R6/2 mouse model. Findings To discover novel pathogenic mechanisms and potential peripheral biomarkers useful to monitor disease progression or drug efficacy, a microarray study was performed in blood of R6/2 mice at manifest stage and wild type littermates. This approach allowed us to propose new peripheral molecular processes involved in HD and to suggest different panels of candidate biomarkers. Among the discovered deregulated processes, we focused on specific ones: complement and coagulation cascades, PPAR signaling, cardiac muscle contraction, and dilated cardiomyopathy pathways. Selected genes derived from these pathways were additionally investigated in other accessible tissues to validate these matrices as sources of biomarkers, and in brain, to link central and peripheral disease manifestations. Conclusions Our findings validated skeletal muscle as a suitable source to investigate peripheral transcriptional alterations in HD and supported the hypothesis that immunological alteration may contribute to neurological degeneration. Moreover, the identification of altered signaling in mouse blood reinforces the R6/2 transgenic mouse as a powerful HD model while suggesting novel disease biomarkers for pre-clinical investigation. PMID:24252798

  7. SNPs in stress-responsive rice genes: validation, genotyping, functional relevance and population structure

    PubMed Central

    2012-01-01

    Background Single nucleotide polymorphism (SNP) validation and large-scale genotyping are required to maximize the use of DNA sequence variation and determine the functional relevance of candidate genes for complex stress tolerance traits through genetic association in rice. We used the bead array platform-based Illumina GoldenGate assay to validate and genotype SNPs in a select set of stress-responsive genes to understand their functional relevance and study the population structure in rice. Results Of the 384 putative SNPs assayed, we successfully validated and genotyped 362 (94.3%). Of these 325 (84.6%) showed polymorphism among the 91 rice genotypes examined. Physical distribution, degree of allele sharing, admixtures and introgression, and amino acid replacement of SNPs in 263 abiotic and 62 biotic stress-responsive genes provided clues for identification and targeted mapping of trait-associated genomic regions. We assessed the functional and adaptive significance of validated SNPs in a set of contrasting drought tolerant upland and sensitive lowland rice genotypes by correlating their allelic variation with amino acid sequence alterations in catalytic domains and three-dimensional secondary protein structure encoded by stress-responsive genes. We found a strong genetic association among SNPs in the nine stress-responsive genes with upland and lowland ecological adaptation. Higher nucleotide diversity was observed in indica accessions compared with other rice sub-populations based on different population genetic parameters. The inferred ancestry of 16% among rice genotypes was derived from admixed populations with the maximum between upland aus and wild Oryza species. Conclusions SNPs validated in biotic and abiotic stress-responsive rice genes can be used in association analyses to identify candidate genes and develop functional markers for stress tolerance in rice. PMID:22921105
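
The percentages in this abstract can be checked with a few lines of arithmetic. The check suggests that both rates are expressed relative to the 384 SNPs assayed, and that the 263 abiotic and 62 biotic stress-responsive genes add up to the 325 polymorphic SNPs. Only the counts come from the abstract; the variable names are illustrative.

```python
# Counts quoted in the abstract.
assayed = 384
validated = 362
polymorphic = 325
abiotic, biotic = 263, 62

validated_pct = round(100 * validated / assayed, 1)
polymorphic_pct = round(100 * polymorphic / assayed, 1)
# Alternative denominator, shown only to make clear it does not match 84.6%.
polymorphic_of_validated_pct = round(100 * polymorphic / validated, 1)

print(validated_pct, polymorphic_pct, polymorphic_of_validated_pct)
print(abiotic + biotic == polymorphic)
```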

  8. Intricate interplay between astrocytes and motor neurons in ALS

    PubMed Central

    Phatnani, Hemali P.; Guarnieri, Paolo; Friedman, Brad A.; Carrasco, Monica A.; Muratet, Michael; O’Keeffe, Sean; Nwakeze, Chiamaka; Pauli-Behn, Florencia; Newberry, Kimberly M.; Meadows, Sarah K.; Tapia, Juan Carlos; Myers, Richard M.; Maniatis, Tom

    2013-01-01

    ALS results from the selective and progressive degeneration of motor neurons. Although the underlying disease mechanisms remain unknown, glial cells have been implicated in ALS disease progression. Here, we examine the effects of glial cell/motor neuron interactions on gene expression using the hSOD1G93A (the G93A allele of the human superoxide dismutase gene) mouse model of ALS. We detect striking cell autonomous and nonautonomous changes in gene expression in cocultured motor neurons and glia, revealing that the two cell types profoundly affect each other. In addition, we found a remarkable concordance between the cell culture data and expression profiles of whole spinal cords and acutely isolated spinal cord cells during disease progression in the G93A mouse model, providing validation of the cell culture approach. Bioinformatics analyses identified changes in the expression of specific genes and signaling pathways that may contribute to motor neuron degeneration in ALS, among which are TGF-β signaling pathways. PMID:23388633

  9. Altered Expression of Diabetes-Related Genes in Alzheimer's Disease Brains: The Hisayama Study

    PubMed Central

    Hokama, Masaaki; Oka, Sugako; Leon, Julio; Ninomiya, Toshiharu; Honda, Hiroyuki; Sasaki, Kensuke; Iwaki, Toru; Ohara, Tomoyuki; Sasaki, Tomio; LaFerla, Frank M.; Kiyohara, Yutaka; Nakabeppu, Yusaku

    2014-01-01

    Diabetes mellitus (DM) is considered to be a risk factor for dementia including Alzheimer's disease (AD). However, the molecular mechanism underlying this risk is not well understood. We examined gene expression profiles in postmortem human brains donated for the Hisayama study. Three-way analysis of variance of microarray data from frontal cortex, temporal cortex, and hippocampus was performed with the presence/absence of AD and vascular dementia, and sex, as factors. Comparative analyses of expression changes in the brains of AD patients and a mouse model of AD were also performed. Relevant changes in gene expression identified by microarray analysis were validated by quantitative real-time reverse-transcription polymerase chain reaction and western blotting. The hippocampi of AD brains showed the most significant alteration in gene expression profile. Genes involved in noninsulin-dependent DM and obesity were significantly altered in both AD brains and the AD mouse model, as were genes related to psychiatric disorders and AD. The alterations in the expression profiles of DM-related genes in AD brains were independent of peripheral DM-related abnormalities. These results indicate that altered expression of genes related to DM in AD brains is a result of AD pathology, which may thereby be exacerbated by peripheral insulin resistance or DM. PMID:23595620

  10. Selection and validation of reference genes for gene expression analysis in apomictic and sexual Cenchrus ciliaris

    PubMed Central

    2013-01-01

    Background Apomixis is a naturally occurring asexual mode of seed reproduction resulting in offspring genetically identical to the maternal plant. Identifying differential gene expression patterns between apomictic and sexual plants is valuable to help deconstruct the trait. Quantitative RT-PCR (qRT-PCR) is a popular method for analyzing gene expression. Normalizing gene expression data using proper reference genes which show stable expression under investigated conditions is critical in qRT-PCR analysis. We used qRT-PCR to validate expression and stability of six potential reference genes (EF1alpha, EIF4A, UBCE, GAPDH, ACT2 and TUBA) in vegetative and reproductive tissues of B-2S and B-12-9 accessions of C. ciliaris. Findings Among tissue types evaluated, EF1alpha showed the highest level of expression while TUBA showed the lowest. When all tissue types were evaluated and compared between genotypes, EIF4A was the most stable reference gene. Gene expression stability for specific ovary stages of B-2S and B-12-9 was also determined. Except for TUBA, all other tested reference genes could be used for any stage-specific ovary tissue normalization, irrespective of the mode of reproduction. Conclusion Our gene expression stability assay using six reference genes, in sexual and apomictic accessions of C. ciliaris, suggests that EIF4A is the most stable gene across all tissue types analyzed. All other tested reference genes, with the exception of TUBA, could be used for gene expression comparison studies between sexual and apomictic ovaries over multiple developmental stages. This reference gene validation data in C. ciliaris will serve as an important base for future apomixis-related transcriptome data validation. PMID:24083672

  11. Log-Linear Models for Gene Association

    PubMed Central

    Hu, Jianhua; Joshi, Adarsh; Johnson, Valen E.

    2009-01-01

    We describe a class of log-linear models for the detection of interactions in high-dimensional genomic data. This class of models leads to a Bayesian model selection algorithm that can be applied to data that have been reduced to contingency tables using ranks of observations within subjects, and discretization of these ranks within gene/network components. Many normalization issues associated with the analysis of genomic data are thereby avoided. A prior density based on Ewens’ sampling distribution is used to restrict the number of interacting components assigned high posterior probability, and the calculation of posterior model probabilities is expedited by approximations based on the likelihood ratio statistic. Simulation studies are used to evaluate the efficiency of the resulting algorithm for known interaction structures. Finally, the algorithm is validated in a microarray study for which it was possible to obtain biological confirmation of detected interactions. PMID:19655032

  12. Sequencing and Validation of Reference Genes to Analyze Endogenous Gene Expression and Quantify Yellow Dwarf Viruses Using RT-qPCR in Viruliferous Rhopalosiphum padi

    PubMed Central

    Wu, Keke; Liu, Wenwen; Mar, Thithi; Liu, Yan; Wu, Yunfeng; Wang, Xifeng

    2014-01-01

    The bird cherry-oat aphid (Rhopalosiphum padi), an important pest of cereal crops, not only directly sucks sap from plants, but also transmits a number of plant viruses, collectively the yellow dwarf viruses (YDVs). For quantifying changes in gene expression in vector aphids, reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is a touchstone method, but the selection and validation of housekeeping genes (HKGs) as reference genes to normalize the expression level of endogenous genes of the vector and for exogenous genes of the virus in the aphids is critical to obtaining valid results. Such an assessment has not been done, however, for R. padi and YDVs. Here, we tested three algorithms (GeNorm, NormFinder and BestKeeper) to assess the suitability of candidate reference genes (EF-1α, ACT1, GAPDH, 18S rRNA) in 6 combinations of YDV and vector aphid morph. EF-1α and ACT1 together or in combination with GAPDH or with GAPDH and 18S rRNA could confidently be used to normalize virus titre and expression levels of endogenous genes in winged or wingless R. padi infected with Barley yellow dwarf virus isolates (BYDV)-PAV and BYDV-GAV. The use of only one reference gene, whether the most stably expressed (EF-1α) or the least stably expressed (18S rRNA), was not adequate for obtaining valid relative expression data from the RT-qPCR. Because of discrepancies among values for changes in relative expression obtained using 3 regions of the same gene, different regions of an endogenous aphid gene, including each terminus and the middle, should be analyzed at the same time with RT-qPCR. Our results highlight the necessity of choosing the best reference genes to obtain valid experimental data and provide several HKGs for relative quantification of virus titre in YDV-viruliferous aphids. PMID:24810421
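
Normalizing against several reference genes at once, as recommended in this abstract, is usually done by dividing the target quantity by the geometric mean of the reference quantities. The sketch below assumes an amplification efficiency of 2 (perfect doubling per cycle) and uses invented Cq values; the function names are hypothetical.

```python
import math


def relative_quantity(cq, cq_calibrator, efficiency=2.0):
    """Quantity relative to a calibrator sample, from quantification cycles."""
    return efficiency ** (cq_calibrator - cq)


def normalized_expression(target_cq, ref_cqs, calib_target_cq, calib_ref_cqs,
                          efficiency=2.0):
    """Target quantity divided by the geometric mean of reference quantities."""
    target_rq = relative_quantity(target_cq, calib_target_cq, efficiency)
    ref_rqs = [relative_quantity(ref_cqs[g], calib_ref_cqs[g], efficiency)
               for g in ref_cqs]
    # Geometric mean computed in log space.
    norm_factor = math.exp(sum(math.log(r) for r in ref_rqs) / len(ref_rqs))
    return target_rq / norm_factor


# Hypothetical Cq values: a treated sample versus a calibrator sample.
fold_change = normalized_expression(
    target_cq=22.0,
    ref_cqs={"EF-1a": 17.0, "ACT1": 20.0},
    calib_target_cq=24.0,
    calib_ref_cqs={"EF-1a": 18.0, "ACT1": 21.0},
)
print(fold_change)
```

Here the target appears four-fold higher before normalization, but both reference genes are two-fold higher as well, so the normalized change is two-fold. With a single unstable reference gene the same raw data could yield a quite different answer, which is the concern the abstract raises.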

  13. A gene expression inflammatory signature specifically predicts multiple myeloma evolution and patients survival.

    PubMed

    Botta, C; Di Martino, M T; Ciliberto, D; Cucè, M; Correale, P; Rossi, M; Tagliaferri, P; Tassone, P

    2016-12-16

    Multiple myeloma (MM) is closely dependent on cross-talk between malignant plasma cells and cellular components of the inflammatory/immunosuppressive bone marrow milieu, which promotes disease progression, drug resistance, neo-angiogenesis, bone destruction and immune-impairment. We investigated the relevance of inflammatory genes in predicting disease evolution and patient survival. A bioinformatics study by Ingenuity Pathway Analysis on gene expression profiling dataset of monoclonal gammopathy of undetermined significance, smoldering and symptomatic-MM, identified inflammatory and cytokine/chemokine pathways as the most progressively affected during disease evolution. We then selected 20 candidate genes involved in B-cell inflammation and we investigated their role in predicting clinical outcome, through univariate and multivariate analyses (log-rank test, logistic regression and Cox-regression model). We defined an 8-genes signature (IL8, IL10, IL17A, CCL3, CCL5, VEGFA, EBI3 and NOS2) identifying each condition (MGUS/smoldering/symptomatic-MM) with 84% accuracy. Moreover, six genes (IFNG, IL2, LTA, CCL2, VEGFA, CCL3) were found independently correlated with patients' survival. Patients whose MM cells expressed high levels of Th1 cytokines (IFNG/LTA/IL2/CCL2) and low levels of CCL3 and VEGFA, experienced the longest survival. On these six genes, we built a prognostic risk score that was validated in three additional independent datasets. In this study, we provide proof-of-concept that inflammation has a critical role in MM patient progression and survival. The inflammatory-gene prognostic signature validated in different datasets clearly indicates novel opportunities for personalized anti-MM treatment.

  14. Novel prediction of anticancer drug chemosensitivity in cancer cell lines: evidence of moderation by microRNA expressions.

    PubMed

    Yang, Daniel S

    2014-01-01

    The objectives of this study are (1) to develop a novel "moderation" model of drug chemosensitivity and (2) to investigate if miRNA expression moderates the relationship between gene expression and drug chemosensitivity, specifically for HSP90 inhibitors applied to human cancer cell lines. A moderation model integrating the interaction between miRNA and gene expressions was developed to examine if miRNA expression affects the strength of the relationship between gene expression and chemosensitivity. Comprehensive datasets on miRNA expressions, gene expressions, and drug chemosensitivities were obtained from National Cancer Institute's NCI-60 cell lines including nine different cancer types. A workflow including steps of selecting genes, miRNAs, and compounds, correlating gene expression with chemosensitivity, and performing multivariate analysis was utilized to test the proposed model. The proposed moderation model identified 12 significantly-moderating miRNAs: miR-15b*, miR-16-2*, miR-9, miR-126*, miR-129*, miR-138, miR-519e*, miR-624*, miR-26b, miR-30e*, miR-32, and miR-196a, as well as two genes ERCC2 and SF3B1 which affect chemosensitivities of Tanespimycin and Alvespimycin - both HSP90 inhibitors. A bootstrap resampling of 2,500 times validates the significance of all 12 identified miRNAs. The results confirm that certain miRNA and gene expressions interact to produce an effect on drug response. The lack of correlation between miRNA and gene expression themselves suggests that miRNA transmits its effect through translation inhibition/control rather than mRNA degradation. The results suggest that miRNAs could serve not only as prognostic biomarkers for cancer treatment outcome but also as interventional agents to modulate desired chemosensitivity.
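
A moderation model of the kind described here is commonly written as a linear regression with an interaction term, response = b0 + b1·gene + b2·miRNA + b3·(gene × miRNA), where a non-zero b3 means that miRNA expression changes the strength of the gene-response relationship. The sketch below fits such a model by ordinary least squares using only the standard library. The data are synthetic and noise-free, and this specific formulation is an assumption; the abstract does not give the exact model or estimation method.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_moderation(gene, mirna, response):
    """Least-squares fit of response ~ 1 + gene + mirna + gene*mirna."""
    rows = [[1.0, g, m, g * m] for g, m in zip(gene, mirna)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(4)]
    return solve_linear_system(xtx, xty)


# Synthetic data generated with a known interaction coefficient of 0.8.
gene = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
mirna = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
response = [0.5 + 1.0 * g - 0.2 * m + 0.8 * g * m for g, m in zip(gene, mirna)]

b0, b1, b2, b3 = fit_moderation(gene, mirna, response)
print(b0, b1, b2, b3)
```

With real, noisy data the significance of the interaction coefficient would need to be assessed, for example by refitting on resampled data as the bootstrap in the abstract does.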

  15. Form-Deprivation Myopia in Chick Induces Limited Changes in Retinal Gene Expression

    PubMed Central

    McGlinn, Alice M.; Baldwin, Donald A.; Tobias, John W.; Budak, Murat T.; Khurana, Tejvir S.; Stone, Richard A.

    2007-01-01

    Purpose Evidence has implicated the retina as a principal controller of refractive development. In the present study, the retinal transcriptome was analyzed to identify alterations in gene expression and potential signaling pathways involved in form-deprivation myopia of the chick. Methods One-week-old white Leghorn chicks wore a unilateral image-degrading goggle for 6 hours or 3 days (n = 6 at each time). Total RNA from the retina/(retinal pigment epithelium) was used for expression profiling with chicken gene microarrays (Chicken GeneChips; Affymetrix, Santa Clara, CA). To identify gene expression level differences between goggled and contralateral nongoggled eyes, normalized microarray signal intensities were analyzed by the significance analysis of microarrays (SAM) approach. Differentially expressed genes were validated by real-time quantitative reverse transcription–polymerase chain reaction (qPCR) in independent biological replicates. Results Small changes were detected in differentially expressed genes in form-deprived eyes. In chickens that had 6 hours of goggle wear, downregulation of bone morphogenetic protein 2 and connective tissue growth factor was validated. In those with 3 days of goggle wear, downregulation of bone morphogenetic protein 2, vasoactive intestinal peptide, prepro-urotensin II–related peptide and mitogen-activated protein kinase phosphatase 2 was validated, and upregulation of endothelin receptor type B and interleukin-18 was validated. Conclusions Form-deprivation myopia, in its early stages, is associated with only minimal changes in retinal gene expression at the level of the transcriptome. While the list of validated genes is short, each merits further study for potential involvement in the signaling cascade mediating myopia development. PMID:17652709

  16. Creation of knock out and knock in mice by CRISPR/Cas9 to validate candidate genes for human male infertility, interest, difficulties and feasibility.

    PubMed

    Kherraf, Zine-Eddine; Conne, Beatrice; Amiri-Yekta, Amir; Kent, Marie Christou; Coutton, Charles; Escoffier, Jessica; Nef, Serge; Arnoult, Christophe; Ray, Pierre F

    2018-06-15

    High-throughput sequencing (HTS) and CRISPR/Cas9 are two recent technologies that are currently revolutionizing biological and clinical research. The two techniques are complementary: HTS makes it possible to identify new genetic variants and genes involved in various pathologies, and CRISPR/Cas9 makes it possible to create animal or cell models to validate the effect of the identified variants, to characterize their pathogenicity and the function of the genes of interest, and ultimately to provide ways of correcting the molecular defects. We analyzed a cohort of 78 infertile men presenting with multiple morphological anomalies of the sperm flagella (MMAF), a severe form of male infertility. Using whole exome sequencing (WES), homozygous mutations in autosomal candidate genes were identified in 63% of the tested subjects. We decided to produce, by CRISPR/Cas9, four knock-out (KO) and one knock-in (KI) mouse lines to confirm these results and to increase our understanding of the physiopathology associated with these genetic variations. Overall, 31% of the live pups obtained presented a mutational event in one of the targeted regions. All identified events were insertions or deletions localized near the PAM sequence. Surprisingly, we observed a high rate of germline mosaicism, as 30% of the F1 displayed a different mutation than the parental event characterized on somatic tissue (tail), indicating that CRISPR/Cas9 mutational events kept happening several cell divisions after the injection. Overall, we created mouse models for 5 distinct loci and in each case homozygous animals could be obtained in approximately 6 months. These results demonstrate that the combined use of WES and CRISPR/Cas9 is an efficient and timely strategy to identify and validate mutations responsible for infertility phenotypes in humans. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Multicenter validation of the diagnostic accuracy of a blood-based gene expression test for assessing obstructive coronary artery disease in nondiabetic patients.

    PubMed

    Rosenberg, Steven; Elashoff, Michael R; Beineke, Philip; Daniels, Susan E; Wingrove, James A; Tingley, Whittemore G; Sager, Philip T; Sehnert, Amy J; Yau, May; Kraus, William E; Newby, L Kristin; Schwartz, Robert S; Voros, Szilard; Ellis, Stephen G; Tahirkheli, Naeem; Waksman, Ron; McPherson, John; Lansky, Alexandra; Winn, Mary E; Schork, Nicholas J; Topol, Eric J

    2010-10-05

    Diagnosing obstructive coronary artery disease (CAD) in at-risk patients can be challenging and typically requires both noninvasive imaging methods and coronary angiography, the gold standard. Previous studies have suggested that peripheral blood gene expression can indicate the presence of CAD. To validate a previously developed 23-gene, expression-based classification test for diagnosis of obstructive CAD in nondiabetic patients. Multicenter prospective trial with blood samples obtained before coronary angiography. (ClinicalTrials.gov registration number: NCT00500617) SETTING: 39 centers in the United States. An independent validation cohort of 526 nondiabetic patients with a clinical indication for coronary angiography. Receiver-operating characteristic (ROC) analysis of classifier score measured by real-time polymerase chain reaction, additivity to clinical factors, and reclassification of patient disease likelihood versus disease status defined by quantitative coronary angiography. Obstructive CAD was defined as 50% or greater stenosis in 1 or more major coronary arteries by quantitative coronary angiography. The area under the ROC curve (AUC) was 0.70 ± 0.02 (P < 0.001); the test added to clinical variables (Diamond-Forrester method) (AUC, 0.72 with the test vs. 0.66 without; P = 0.003) and added somewhat to an expanded clinical model (AUC, 0.745 with the test vs. 0.732 without; P = 0.089). The test improved net reclassification over both the Diamond-Forrester method and the expanded clinical model (P < 0.001). At a score threshold that corresponded to a 20% likelihood of obstructive CAD (14.75), the sensitivity and specificity were 85% and 43% (yielding a negative predictive value of 83% and a positive predictive value of 46%), with 33% of patient scores below this threshold. Patients with chronic inflammatory disorders, elevated levels of leukocytes or cardiac protein markers, or diabetes were excluded. 
A noninvasive whole-blood test based on gene expression and demographic characteristics may be useful for assessing obstructive CAD in nondiabetic patients without known CAD. Primary funding source: CardioDx.
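The predictive values quoted in this abstract depend on disease prevalence as well as on sensitivity and specificity. The Python sketch below applies Bayes' rule to make that dependence explicit. The function name `predictive_values` and the prevalence of 0.36 are illustrative assumptions; the abstract does not report prevalence directly.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) computed from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity come from the abstract; the prevalence is an
# assumed value chosen for illustration only.
ppv, npv = predictive_values(sensitivity=0.85, specificity=0.43, prevalence=0.36)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under that assumed prevalence the computed values fall close to the reported 46% and 83%. A lower prevalence would raise the negative predictive value and lower the positive predictive value.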

  18. Gene expression studies of reference genes for quantitative real-time PCR: an overview in insects.

    PubMed

    Shakeel, Muhammad; Rodriguez, Alicia; Tahir, Urfa Bin; Jin, Fengliang

    2018-02-01

    Whenever gene expression is being examined, it is essential that a normalization process is carried out to eliminate non-biological variations. The use of reference genes, such as glyceraldehyde-3-phosphate dehydrogenase, actin, and ribosomal protein genes, is the usual method of choice for normalizing gene expression. Although reference genes are used to normalize target gene expression, a major problem is that the stability of these genes differs among tissues, developmental stages, species, and responses to abiotic factors. Therefore, the use and validation of multiple reference genes are required. This review discusses the reasons why RT-qPCR has become the preferred method for validating results of gene expression profiles, the use of specific and non-specific dyes, and the importance of primers and probes for qPCR, and it discusses several statistical algorithms developed to help validate potential reference genes. The conflicts arising from the use of classical reference genes in gene normalization and their replacement with novel references are also discussed by citing the high and low stability of classical and novel reference genes under various biotic and abiotic experimental conditions, as determined by various methods applied for reference gene amplification.

  19. Radiogenomics to characterize regional genetic heterogeneity in glioblastoma.

    PubMed

    Hu, Leland S; Ning, Shuluo; Eschbacher, Jennifer M; Baxter, Leslie C; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C; Peng, Sen; Smith, Kris A; Nakaji, Peter; Karis, John P; Quarles, C Chad; Wu, Teresa; Loftus, Joseph C; Jenkins, Robert B; Sicotte, Hugues; Kollmeyer, Thomas M; O'Neill, Brian P; Elmquist, William; Hoxworth, Joseph M; Frakes, David; Sarkaria, Jann; Swanson, Kristin R; Tran, Nhan L; Li, Jing; Mitchell, J Ross

    2017-01-01

    Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so called brain-around-tumor, [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out-cross-validation (LOOCV). We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. Highest accuracies were observed for PDGFRA (77.1%), EGFR (75%), CDKN2A (87.5%), and RB1 (87.5%), while lowest accuracy was observed in TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. © The Author(s) 2016. 
Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
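Leave-one-out cross-validation (LOOCV), as used in this abstract, holds out each sample once, trains on the rest, and scores the held-out prediction. The Python sketch below illustrates the procedure with a one-feature nearest-mean classifier standing in for the study's multivariate decision trees. The classifier, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def nearest_mean_predict(train_x: Sequence[float], train_y: Sequence[int],
                         x: float) -> int:
    """Predict the label whose training-set mean is closest to x."""
    means = {}
    for label in sorted(set(train_y)):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))


def loocv_accuracy(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Hold out each sample once, train on the others, and return accuracy."""
    correct = 0
    for i in range(len(xs)):
        train_x: List[float] = list(xs[:i]) + list(xs[i + 1:])
        train_y: List[int] = list(ys[:i]) + list(ys[i + 1:])
        if nearest_mean_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Invented toy values: one imaging feature and a binary alteration label.
toy_feature = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
toy_label = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(toy_feature, toy_label))
```

Because LOOCV yields one prediction per sample, accuracy over 48 biopsies is always a multiple of 1/48. The reported figures fit that pattern: 87.5% is 42/48, 75% is 36/48, and 37.5% is 18/48.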

  20. Complexity of Gene Expression Evolution after Duplication: Protein Dosage Rebalancing

    PubMed Central

    Rogozin, Igor B.

    2014-01-01

    Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes. PMID:25197576

  1. Modeling 3D Facial Shape from DNA

    PubMed Central

    Claes, Peter; Liberton, Denise K.; Daniels, Katleen; Rosana, Kerri Matthes; Quillen, Ellen E.; Pearson, Laurel N.; McEvoy, Brian; Bauchet, Marc; Zaidi, Arslan A.; Yao, Wei; Tang, Hua; Barsh, Gregory S.; Absher, Devin M.; Puts, David A.; Rocha, Jorge; Beleza, Sandra; Pereira, Rinaldo W.; Baynam, Gareth; Suetens, Paul; Vandermeulen, Dirk; Wagner, Jennifer K.; Boster, James S.; Shriver, Mark D.

    2014-01-01

    Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks to measure face shape in population samples with mixed West African and European ancestry from three locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting normal-range facial features and for approximating the appearance of a face from genetic markers. PMID:24651127

  2. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression.

    PubMed

    Catto, James W F; Abbod, Maysam F; Wild, Peter J; Linkens, Derek A; Pilarsky, Christian; Rehman, Ishtiaq; Rosario, Derek J; Denzinger, Stefan; Burger, Maximilian; Stoehr, Robert; Knuechel, Ruth; Hartmann, Arndt; Hamdy, Freddie C

    2010-03-01

    New methods for identifying bladder cancer (BCa) progression are required. Gene expression microarrays can reveal insights into disease biology and identify novel biomarkers. However, these experiments produce large datasets that are difficult to interpret. To develop a novel method of microarray analysis combining two forms of artificial intelligence (AI): neurofuzzy modelling (NFM) and artificial neural networks (ANN) and validate it in a BCa cohort. We used AI and statistical analyses to identify progression-related genes in a microarray dataset (n=66 tumours, n=2800 genes). The AI-selected genes were then investigated in a second cohort (n=262 tumours) using immunohistochemistry. We compared the accuracy of AI and statistical approaches to identify tumour progression. AI identified 11 progression-associated genes (odds ratio [OR]: 0.70; 95% confidence interval [CI], 0.56-0.87; p=0.0004), and these were more discriminate than genes chosen using statistical analyses (OR: 1.24; 95% CI, 0.96-1.60; p=0.09). The expression of six AI-selected genes (LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2) was determined using commercial antibodies and successfully identified tumour progression (concordance index: 0.66; log-rank test: p=0.01). AI-selected genes were more discriminate than pathologic criteria at determining progression (Cox multivariate analysis: p=0.01). Limitations include the use of statistical correlation to identify 200 genes for AI analysis and that we did not compare regression identified genes with immunohistochemistry. AI and statistical analyses use different techniques of inference to determine gene-phenotype associations and identify distinct prognostic gene signatures that are equally valid. We have identified a prognostic gene signature whose members reflect a variety of carcinogenic pathways that could identify progression in non-muscle-invasive BCa. 2009 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  3. Analysis of blood-based gene expression in idiopathic Parkinson disease.

    PubMed

    Shamir, Ron; Klein, Christine; Amar, David; Vollstedt, Eva-Juliane; Bonin, Michael; Usenovic, Marija; Wong, Yvette C; Maver, Ales; Poths, Sven; Safer, Hershel; Corvol, Jean-Christophe; Lesage, Suzanne; Lavi, Ofer; Deuschl, Günther; Kuhlenbaeumer, Gregor; Pawlack, Heike; Ulitsky, Igor; Kasten, Meike; Riess, Olaf; Brice, Alexis; Peterlin, Borut; Krainc, Dimitri

    2017-10-17

    To examine whether gene expression analysis of a large-scale Parkinson disease (PD) patient cohort produces a robust blood-based PD gene signature compared to previous studies that have used relatively small cohorts (≤220 samples). Whole-blood gene expression profiles were collected from a total of 523 individuals. After preprocessing, the data contained 486 gene profiles (n = 205 PD, n = 233 controls, n = 48 other neurodegenerative diseases) that were partitioned into training, validation, and independent test cohorts to identify and validate a gene signature. Batch-effect reduction and cross-validation were performed to ensure signature reliability. Finally, functional and pathway enrichment analyses were applied to the signature to identify PD-associated gene networks. A gene signature of 100 probes that mapped to 87 genes, corresponding to 64 upregulated and 23 downregulated genes differentiating between patients with idiopathic PD and controls, was identified with the training cohort and successfully replicated in both an independent validation cohort (area under the curve [AUC] = 0.79, p = 7.13E-6) and a subsequent independent test cohort (AUC = 0.74, p = 4.2E-4). Network analysis of the signature revealed gene enrichment in pathways, including metabolism, oxidation, and ubiquitination/proteasomal activity, and misregulation of mitochondria-localized genes, including downregulation of COX4I1, ATP5A1, and VDAC3. We present a large-scale study of PD gene expression profiling. This work identifies a reliable blood-based PD signature and highlights the importance of large-scale patient cohorts in developing potential PD biomarkers. © 2017 American Academy of Neurology.
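The AUC values reported for the validation and test cohorts can be read as the probability that a randomly chosen patient receives a higher signature score than a randomly chosen control. The Python sketch below computes AUC directly from that pairwise definition. The function name `roc_auc` and the scores are invented for illustration and are not data from the study.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented toy scores: higher values are meant to indicate disease.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(roc_auc(patients, controls))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values such as 0.79 and 0.74 indicate moderate discrimination.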

  4. A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.

    PubMed

    Klein, Eric A; Cooperberg, Matthew R; Magi-Galluzzi, Cristina; Simko, Jeffry P; Falzarano, Sara M; Maddala, Tara; Chan, June M; Li, Jianbo; Cowan, Janet E; Tsiatis, Athanasios C; Cherbavaz, Diana B; Pelham, Robert J; Tenggara-Hunter, Imelda; Baehner, Frederick L; Knezevic, Dejan; Febbo, Phillip G; Shak, Steven; Kattan, Michael W; Lee, Mark; Carroll, Peter R

    2014-09-01

    Prostate tumor heterogeneity and biopsy undersampling pose challenges to accurate, individualized risk assessment for men with localized disease. To identify and validate a biopsy-based gene expression signature that predicts clinical recurrence, prostate cancer (PCa) death, and adverse pathology. Gene expression was quantified by reverse transcription-polymerase chain reaction for three studies-a discovery prostatectomy study (n=441), a biopsy study (n=167), and a prospectively designed, independent clinical validation study (n=395)-testing retrospectively collected needle biopsies from contemporary (1997-2011) patients with low to intermediate clinical risk who were candidates for active surveillance (AS). The main outcome measures defining aggressive PCa were clinical recurrence, PCa death, and adverse pathology at prostatectomy. Cox proportional hazards regression models were used to evaluate the association between gene expression and time to event end points. Results from the prostatectomy and biopsy studies were used to develop and lock a multigene-expression-based signature, called the Genomic Prostate Score (GPS); in the validation study, logistic regression was used to test the association between the GPS and pathologic stage and grade at prostatectomy. Decision-curve analysis and risk profiles were used together with clinical and pathologic characteristics to evaluate clinical utility. Of the 732 candidate genes analyzed, 288 (39%) were found to predict clinical recurrence despite heterogeneity and multifocality, and 198 (27%) were predictive of aggressive disease after adjustment for prostate-specific antigen, Gleason score, and clinical stage. Further analysis identified 17 genes representing multiple biological pathways that were combined into the GPS algorithm. 
In the validation study, GPS predicted high-grade (odds ratio [OR] per 20 GPS units: 2.3; 95% confidence interval [CI], 1.5-3.7; p<0.001) and high-stage (OR per 20 GPS units: 1.9; 95% CI, 1.3-3.0; p=0.003) at surgical pathology. GPS predicted high-grade and/or high-stage disease after controlling for established clinical factors (p<0.005) such as an OR of 2.1 (95% CI, 1.4-3.2) when adjusting for Cancer of the Prostate Risk Assessment score. A limitation of the validation study was the inclusion of men with low-volume intermediate-risk PCa (Gleason score 3+4), for whom some providers would not consider AS. Genes representing multiple biological pathways discriminate PCa aggressiveness in biopsy tissue despite tumor heterogeneity, multifocality, and limited sampling at time of biopsy. The biopsy-based 17-gene GPS improves prediction of the presence or absence of adverse pathology and may help men with PCa make more informed decisions between AS and immediate treatment. Prostate cancer (PCa) is often present in multiple locations within the prostate and has variable characteristics. We identified genes with expression associated with aggressive PCa to develop a biopsy-based, multigene signature, the Genomic Prostate Score (GPS). GPS was validated for its ability to predict men who have high-grade or high-stage PCa at diagnosis and may help men diagnosed with PCa decide between active surveillance and immediate definitive treatment. Copyright © 2014 European Association of Urology. Published by Elsevier B.V. All rights reserved.
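The odds ratios in this abstract are quoted per 20 GPS units. Under a logistic regression model that is linear in the score, an odds ratio can be rescaled to any other increment through the log-odds coefficient. The Python sketch below shows the arithmetic; the function name `rescale_odds_ratio` is hypothetical, and the linearity assumption is stated here rather than taken from the abstract.

```python
import math


def rescale_odds_ratio(odds_ratio: float, from_units: float,
                       to_units: float) -> float:
    """Convert an odds ratio per `from_units` into one per `to_units`.

    Assumes log-odds change linearly with the score.
    """
    beta_per_unit = math.log(odds_ratio) / from_units
    return math.exp(beta_per_unit * to_units)


# The OR of 2.3 per 20 GPS units for high-grade disease is from the abstract.
print(rescale_odds_ratio(2.3, from_units=20, to_units=10))
print(rescale_odds_ratio(2.3, from_units=20, to_units=40))
```

Doubling the increment squares the odds ratio and halving it takes the square root, which is why the unit of the increment has to be reported alongside the estimate.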

  5. Genetically engineered mouse models in oncology research and cancer medicine.

    PubMed

    Kersten, Kelly; de Visser, Karin E; van Miltenburg, Martine H; Jonkers, Jos

    2017-02-01

    Genetically engineered mouse models (GEMMs) have contributed significantly to the field of cancer research. In contrast to cancer cell inoculation models, GEMMs develop de novo tumors in a natural immune-proficient microenvironment. Tumors arising in advanced GEMMs closely mimic the histopathological and molecular features of their human counterparts, display genetic heterogeneity, and are able to spontaneously progress toward metastatic disease. As such, GEMMs are generally superior to cancer cell inoculation models, which show no or limited heterogeneity and are often metastatic from the start. Given that GEMMs capture both tumor cell-intrinsic and cell-extrinsic factors that drive de novo tumor initiation and progression toward metastatic disease, these models are indispensable for preclinical research. GEMMs have successfully been used to validate candidate cancer genes and drug targets, assess therapy efficacy, dissect the impact of the tumor microenvironment, and evaluate mechanisms of drug resistance. In vivo validation of candidate cancer genes and therapeutic targets is further accelerated by recent advances in genetic engineering that enable fast-track generation and fine-tuning of GEMMs to more closely resemble human patients. In addition, aligning preclinical tumor intervention studies in advanced GEMMs with clinical studies in patients is expected to accelerate the development of novel therapeutic strategies and their translation into the clinic. © 2016 The Authors. Published under the terms of the CC BY 4.0 license.

  6. Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Auerbach, Scott S.; Shah, Ruchir R.; Mav, Deepak

    Identification of carcinogenic activity is the primary goal of the 2-year bioassay. The expense of these studies limits the number of chemicals that can be studied and therefore chemicals need to be prioritized based on a variety of parameters. We have developed an ensemble of support vector machine classification models based on male F344 rat liver gene expression following 2, 14 or 90 days of exposure to a collection of hepatocarcinogens (aflatoxin B1, 1-amino-2,4-dibromoanthraquinone, N-nitrosodimethylamine, methyleugenol) and non-hepatocarcinogens (acetaminophen, ascorbic acid, tryptophan). Seven models were generated based on individual exposure durations (2, 14 or 90 days) or a combination of exposures (2 + 14, 2 + 90, 14 + 90 and 2 + 14 + 90 days). All sets of data, with the exception of one yielded models with 0% cross-validation error. Independent validation of the models was performed using expression data from the liver of rats exposed at 2 dose levels to a collection of alkenylbenzene flavoring agents. Depending on the model used and the exposure duration of the test data, independent validation error rates ranged from 47% to 10%. The variable with the most notable effect on independent validation accuracy was exposure duration of the alkenylbenzene test data. All models generally exhibited improved performance as the exposure duration of the alkenylbenzene data increased. The models differentiated between hepatocarcinogenic (estragole and safrole) and non-hepatocarcinogenic (anethole, eugenol and isoeugenol) alkenylbenzenes previously studied in a carcinogenicity bioassay. In the case of safrole the models correctly differentiated between carcinogenic and non-carcinogenic dose levels. 
The models predict that two alkenylbenzenes not previously assessed in a carcinogenicity bioassay, myristicin and isosafrole, would be weakly hepatocarcinogenic if studied at a dose level of 2 mmol/kg bw/day for 2 years in male F344 rats; therefore suggesting that these chemicals should be a higher priority relative to other untested alkenylbenzenes for evaluation in the carcinogenicity bioassay. The results of the study indicate that gene expression-based predictive models are an effective tool for identifying hepatocarcinogens. Furthermore, we find that exposure duration is a critical variable in the success or failure of such an approach, particularly when evaluating chemicals with unknown carcinogenic potency.

  7. Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning.

    PubMed

    Auerbach, Scott S; Shah, Ruchir R; Mav, Deepak; Smith, Cynthia S; Walker, Nigel J; Vallant, Molly K; Boorman, Gary A; Irwin, Richard D

    2010-03-15

    Identification of carcinogenic activity is the primary goal of the 2-year bioassay. The expense of these studies limits the number of chemicals that can be studied and therefore chemicals need to be prioritized based on a variety of parameters. We have developed an ensemble of support vector machine classification models based on male F344 rat liver gene expression following 2, 14 or 90 days of exposure to a collection of hepatocarcinogens (aflatoxin B1, 1-amino-2,4-dibromoanthraquinone, N-nitrosodimethylamine, methyleugenol) and non-hepatocarcinogens (acetaminophen, ascorbic acid, tryptophan). Seven models were generated based on individual exposure durations (2, 14 or 90 days) or a combination of exposures (2+14, 2+90, 14+90 and 2+14+90 days). All sets of data, with the exception of one yielded models with 0% cross-validation error. Independent validation of the models was performed using expression data from the liver of rats exposed at 2 dose levels to a collection of alkenylbenzene flavoring agents. Depending on the model used and the exposure duration of the test data, independent validation error rates ranged from 47% to 10%. The variable with the most notable effect on independent validation accuracy was exposure duration of the alkenylbenzene test data. All models generally exhibited improved performance as the exposure duration of the alkenylbenzene data increased. The models differentiated between hepatocarcinogenic (estragole and safrole) and non-hepatocarcinogenic (anethole, eugenol and isoeugenol) alkenylbenzenes previously studied in a carcinogenicity bioassay. In the case of safrole the models correctly differentiated between carcinogenic and non-carcinogenic dose levels. 
The models predict that two alkenylbenzenes not previously assessed in a carcinogenicity bioassay, myristicin and isosafrole, would be weakly hepatocarcinogenic if studied at a dose level of 2 mmol/kg bw/day for 2 years in male F344 rats; therefore suggesting that these chemicals should be a higher priority relative to other untested alkenylbenzenes for evaluation in the carcinogenicity bioassay. The results of the study indicate that gene expression-based predictive models are an effective tool for identifying hepatocarcinogens. Furthermore, we find that exposure duration is a critical variable in the success or failure of such an approach, particularly when evaluating chemicals with unknown carcinogenic potency. Published by Elsevier Inc.

  8. Murine models of osteosarcoma: A piece of the translational puzzle.

    PubMed

    Walia, Mannu K; Castillo-Tandazo, Wilson; Mutsaers, Anthony J; Martin, Thomas John; Walkley, Carl R

    2018-06-01

    Osteosarcoma (OS) is the most common cancer of bone in children and young adults. Despite extensive research efforts, there has been no significant improvement in patient outcome for many years. An improved understanding of the biology of this cancer and how genes frequently mutated contribute to OS may help improve outcomes for patients. While our knowledge of the mutational burden of OS is approaching saturation, our understanding of how these mutations contribute to OS initiation and maintenance is less clear. Murine models of OS have now been demonstrated to be highly valid recapitulations of human OS. These models were originally based on the frequent disruption of p53 and Rb in familial OS syndromes, which are also common mutations in sporadic OS. They have been applied to significantly improve our understanding about the functions of recurrently mutated genes in disease. The murine models can be used as a platform for preclinical testing and identifying new therapeutic targets, in addition to testing the role of additional mutations in vivo. Most recently these models have begun to be used for discovery based approaches and screens, which hold significant promise in furthering our understanding of the genetic and therapeutic sensitivities of OS. In this review, we discuss the mouse models of OS that have been reported in the last 3-5 years and newly identified pathways from these studies. Finally, we discuss the preclinical utilization of the mouse models of OS for identifying and validating actionable targets to improve patient outcome. © 2017 Wiley Periodicals, Inc.

  9. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. 
Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
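The weighted-voting idea in this abstract can be illustrated with a small Python sketch. The abstract does not say how the weights were chosen, so the weights below are hypothetical (for example, each classifier's cross-validated accuracy), and the function name `weighted_vote` is an assumption made for illustration.

```python
from typing import List, Sequence


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[int]:
    """Combine binary predictions from several classifiers by weighted vote.

    predictions[c][s] is classifier c's 0/1 call for sample s. A sample is
    labeled 1 when the weight behind the "1" votes exceeds half the total.
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    combined = []
    for s in range(n_samples):
        weight_for_one = sum(w for calls, w in zip(predictions, weights)
                             if calls[s] == 1)
        combined.append(1 if weight_for_one > total / 2 else 0)
    return combined


# Invented toy calls from three classifiers on three genes
# (1 = predicted stress-responsive), with hypothetical weights.
calls = [[1, 0, 1],
         [1, 1, 0],
         [0, 0, 1]]
print(weighted_vote(calls, weights=[0.5, 0.3, 0.2]))
```

With these weights the second gene is rejected even though one classifier votes for it, because that classifier carries less than half of the total weight.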

  10. Evaluation and Validation of Housekeeping Genes as Reference for Gene Expression Studies in Pigeonpea (Cajanus cajan) Under Drought Stress Conditions

    PubMed Central

    Sinha, Pallavi; Singh, Vikas K.; Suryanarayana, V.; Krishnamurthy, L.; Saxena, Rachit K.; Varshney, Rajeev K.

    2015-01-01

    Gene expression analysis using quantitative real-time PCR (qRT-PCR) is a very sensitive technique and its sensitivity depends on the stable performance of reference gene(s) used in the study. A number of housekeeping genes have been used in various expression studies in many crops; however, their expression was found to be inconsistent under different stress conditions. As a result, species-specific housekeeping genes have been recommended for different expression studies in several crop species. However, such specific housekeeping genes have not been reported in the case of pigeonpea (Cajanus cajan) despite the fact that the genome sequence has become available for the crop. To identify the stable housekeeping genes in pigeonpea for expression analysis under drought stress conditions, the relative expression variations of 10 commonly used housekeeping genes (EF1α, UBQ10, GAPDH, 18SrRNA, 25SrRNA, TUB6, ACT1, IF4α, UBC and HSP90) were studied in root, stem and leaf tissues of Asha (ICPL 87119). Three statistical algorithms, geNorm, NormFinder and BestKeeper, were used to define the stability of candidate genes. geNorm analysis identified IF4α and TUB6 as the most stable housekeeping genes; however, NormFinder analysis determined IF4α and HSP90 as the most stable housekeeping genes under drought stress conditions. Subsequently, validation of the identified candidate genes was undertaken in qRT-PCR-based gene expression analysis of the uspA gene, which plays an important role under drought stress conditions in pigeonpea. The relative quantification of the uspA gene varied according to the internal controls (stable and least stable genes), thus highlighting the importance of the choice, as well as validation, of internal controls in such experiments. The identified stable and validated housekeeping genes will facilitate gene expression studies in pigeonpea, especially under drought stress conditions. PMID:25849964
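geNorm, one of the algorithms named in this abstract, ranks candidate reference genes by a stability measure M: for each gene, the average standard deviation of its log2 expression ratio against every other candidate, with lower M meaning more stable. The Python sketch below follows that definition. The function name, the use of the sample standard deviation, and the toy expression values are assumptions for illustration, not the study's data or the tool's own code.

```python
import math
from statistics import stdev
from typing import Dict, Sequence


def genorm_m_values(expression: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return a geNorm-style stability value M for each candidate gene.

    expression maps a gene name to its relative quantities across samples.
    """
    m_values = {}
    for gene_j, values_j in expression.items():
        pairwise_sd = []
        for gene_k, values_k in expression.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Invented relative quantities for three hypothetical candidates, three samples.
toy = {
    "gene_A": [1.0, 2.0, 4.0],
    "gene_B": [2.0, 4.0, 8.0],  # constant ratio to gene_A
    "gene_C": [1.0, 4.0, 1.0],  # varies independently of the others
}
for gene, m in sorted(genorm_m_values(toy).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

In this toy set gene_A and gene_B track each other and receive the lowest M, while gene_C ranks as least stable. Because the measure is relative to the other candidates, different algorithms can rank the same genes differently, as the abstract reports for geNorm and NormFinder.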

  11. Evaluation and validation of housekeeping genes as reference for gene expression studies in pigeonpea (Cajanus cajan) under drought stress conditions.

    PubMed

    Sinha, Pallavi; Singh, Vikas K; Suryanarayana, V; Krishnamurthy, L; Saxena, Rachit K; Varshney, Rajeev K

    2015-01-01

    Gene expression analysis using quantitative real-time PCR (qRT-PCR) is a very sensitive technique and its sensitivity depends on the stable performance of reference gene(s) used in the study. A number of housekeeping genes have been used in various expression studies in many crops; however, their expression was found to be inconsistent under different stress conditions. As a result, species-specific housekeeping genes have been recommended for different expression studies in several crop species. However, such specific housekeeping genes have not been reported in the case of pigeonpea (Cajanus cajan) despite the fact that the genome sequence has become available for the crop. To identify the stable housekeeping genes in pigeonpea for expression analysis under drought stress conditions, the relative expression variations of 10 commonly used housekeeping genes (EF1α, UBQ10, GAPDH, 18SrRNA, 25SrRNA, TUB6, ACT1, IF4α, UBC and HSP90) were studied in root, stem and leaf tissues of Asha (ICPL 87119). Three statistical algorithms, geNorm, NormFinder and BestKeeper, were used to define the stability of candidate genes. geNorm analysis identified IF4α and TUB6 as the most stable housekeeping genes; however, NormFinder analysis determined IF4α and HSP90 as the most stable housekeeping genes under drought stress conditions. Subsequently, validation of the identified candidate genes was undertaken in qRT-PCR-based gene expression analysis of the uspA gene, which plays an important role under drought stress conditions in pigeonpea. The relative quantification of the uspA gene varied according to the internal controls (stable and least stable genes), thus highlighting the importance of the choice, as well as validation, of internal controls in such experiments. The identified stable and validated housekeeping genes will facilitate gene expression studies in pigeonpea, especially under drought stress conditions.
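The observation that relative quantification of uspA changes with the internal control can be illustrated with the widely used 2^-ΔΔCt calculation. The abstract does not state which quantification formula was used, so the method choice, the function name, and every Ct value in this Python sketch are assumptions for illustration. The formula also assumes roughly 100% amplification efficiency for both genes.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the 2^-ddCt method."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target Cts are identical in both calculations,
# and only the reference gene differs.
stable_ref = fold_change_ddct(22.0, 18.0, 25.0, 18.0)    # reference unchanged by stress
unstable_ref = fold_change_ddct(22.0, 20.0, 25.0, 18.0)  # reference shifts by 2 cycles
print(stable_ref, unstable_ref)
```

A shift of only two cycles in the reference gene changes the apparent induction fourfold in this toy example, which is the kind of distortion that reference gene validation is meant to prevent.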

  12. A systems approach to model the relationship between aflatoxin gene cluster expression, environmental factors, growth and toxin production by Aspergillus flavus

    PubMed Central

    Abdel-Hadi, Ahmed; Schmidt-Heydt, Markus; Parra, Roberto; Geisen, Rolf; Magan, Naresh

    2012-01-01

    A microarray analysis was used to examine the effect of combinations of water activity (aw, 0.995–0.90) and temperature (20–42°C) on the activation of aflatoxin biosynthetic genes (30 genes) in Aspergillus flavus grown on a conducive YES (20 g yeast extract, 150 g sucrose, 1 g MgSO4·7H2O) medium. The relative expression of 10 key genes (aflF, aflD, aflE, aflM, aflO, aflP, aflQ, aflX, aflR and aflS) in the biosynthetic pathway was examined in relation to different environmental factors and phenotypic aflatoxin B1 (AFB1) production. These data, plus data on relative growth rates and AFB1 production under different aw × temperature conditions were used to develop a mixed-growth-associated product formation model. The gene expression data were normalized and then used as a linear combination of the data for all 10 genes and combined with the physical model. This was used to relate gene expression to aw and temperature conditions to predict AFB1 production. The relationship between the observed AFB1 production provided a good linear regression fit to the predicted production based in the model. The model was then validated by examining datasets outside the model fitting conditions used (37°C, 40°C and different aw levels). The relationship between structural genes (aflD, aflM) in the biosynthetic pathway and the regulatory genes (aflS, aflJ) was examined in relation to aw and temperature by developing ternary diagrams of relative expression. These findings are important in developing a more integrated systems approach by combining gene expression, ecophysiological influences and growth data to predict mycotoxin production. This could help in developing a more targeted approach to develop prevention strategies to control such carcinogenic natural metabolites that are prevalent in many staple food products. The model could also be used to predict the impact of climate change on toxin production. PMID:21880616
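
    The abstract does not give the equations of its mixed-growth-associated product formation model. A common form for such models is the Luedeking-Piret equation, dP/dt = alpha * dX/dt + beta * X, where X is biomass and P is product. The sketch below integrates that assumed form over a measured biomass series; the function name, parameters and numbers are hypothetical, and the coupling to the linear combination of gene expression data described in the abstract is not modelled.

```python
def luedeking_piret(times, biomass, alpha, beta, p0=0.0):
    """Integrate dP/dt = alpha * dX/dt + beta * X over a biomass time series.

    alpha scales growth-associated production, beta scales production that
    depends on standing biomass. The biomass integral uses the trapezoid rule.
    """
    product = [p0]
    biomass_integral = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        biomass_integral += 0.5 * (biomass[i] + biomass[i - 1]) * dt
        growth_term = alpha * (biomass[i] - biomass[0])
        maintenance_term = beta * biomass_integral
        product.append(p0 + growth_term + maintenance_term)
    return product


# Hypothetical values chosen only to make the arithmetic easy to follow.
predicted = luedeking_piret(
    times=[0.0, 1.0, 2.0], biomass=[1.0, 2.0, 3.0], alpha=2.0, beta=0.5
)
print(predicted)
```

    Setting beta to zero gives a purely growth-associated model and setting alpha to zero a purely non-growth-associated one, which is why the combined form is called mixed.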

  13. Integrating machine learning techniques into robust data enrichment approach and its application to gene expression data.

    PubMed

    Erdoğdu, Utku; Tan, Mehmet; Alhajj, Reda; Polat, Faruk; Rokne, Jon; Demetrick, Douglas

    2013-01-01

    The availability of enough samples for effective analysis and knowledge discovery has been a challenge in the research community, especially in the area of gene expression data analysis. Thus, the approaches being developed for data analysis have mostly suffered from the lack of enough data to train and test the constructed models. We argue that the process of sample generation could be successfully automated by employing some sophisticated machine learning techniques. An automated sample generation framework could successfully complement the actual sample generation from real cases. This argument is validated in this paper by describing a framework that integrates multiple models (perspectives) for sample generation. We illustrate its applicability for producing new gene expression data samples, a highly demanding area that has not received attention. The three perspectives employed in the process are based on models that are not closely related. The independence eliminates the bias of having the produced approach covering only certain characteristics of the domain and leading to samples skewed towards one direction. The first model is based on the Probabilistic Boolean Network (PBN) representation of the gene regulatory network underlying the given gene expression data. The second model integrates Hierarchical Markov Model (HIMM) and the third model employs a genetic algorithm in the process. Each model learns as much as possible characteristics of the domain being analysed and tries to incorporate the learned characteristics in generating new samples. In other words, the models base their analysis on domain knowledge implicitly present in the data itself. The developed framework has been extensively tested by checking how the new samples complement the original samples. The produced results are very promising in showing the effectiveness, usefulness and applicability of the proposed multi-model framework.

  14. MINER: exploratory analysis of gene interaction networks by machine learning from expression data.

    PubMed

    Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A

    2009-12-03

    The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.

  15. Integrating toxin gene expression, growth and fumonisin B1 and B2 production by a strain of Fusarium verticillioides under different environmental factors

    PubMed Central

    Medina, Angel; Schmidt-Heydt, Markus; Cárdenas-Chávez, Diana L.; Parra, Roberto; Geisen, Rolf; Magan, Naresh

    2013-01-01

    The objective of this study was to integrate data on the effect of water activity (aw; 0.995–0.93) and temperature (20–35°C) on activation of the biosynthetic FUM genes, growth and the mycotoxins fumonisin (FB1, FB2) by Fusarium verticillioides in vitro. The relative expression of nine biosynthetic cluster genes (FUM1, FUM7, FUM10, FUM11, FUM12, FUM13, FUM14, FUM16 and FUM19) in relation to the environmental factors was determined using a microarray analysis. The expression was related to growth and phenotypic FB1 and FB2 production. These data were used to develop a mixed-growth-associated product formation model and link this to a linear combination of the expression data for the nine genes. The model was then validated by examining datasets outside the model fitting conditions used (35°C). The relationship between the key gene (FUM1) and other genes in the cluster (FUM11, FUM13, FUM9, FUM14) were examined in relation to aw, temperature, FB1 and FB2 production by developing ternary diagrams of relative expression. This model is important in developing an integrated systems approach to develop prevention strategies to control fumonisin biosynthesis in staple food commodities and could also be used to predict the potential impact that climate change factors may have on toxin production. PMID:23697716

  16. Predicting features of breast cancer with gene expression patterns.

    PubMed

    Lu, Xuesong; Lu, Xin; Wang, Zhigang C; Iglehart, J Dirk; Zhang, Xuegong; Richardson, Andrea L

    2008-03-01

    Data from gene expression arrays hold an enormous amount of biological information. We sought to determine if global gene expression in primary breast cancers contained information about biologic, histologic, and anatomic features of the disease in individual patients. Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used and the prediction accuracy was determined by cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study. Models built from gene expression data accurately predict ER and HER2 status, and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data is inaccurate at predicting tumor size, lymph node status or LVI. The best model for prediction of nodal status included tumor size, LVI status and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); the addition of microarray-based prediction to this model failed to improve the prediction accuracy. Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. Results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.

  17. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks.

    PubMed

    Díaz-Montaña, Juan J; Díaz-Díaz, Norberto; Gómez-Vela, Francisco

    2017-04-01

    Since the popularization of biological network inference methods, it has become crucial to create methods to validate the resulting models. Here we present GFD-Net, the first methodology that applies the concept of semantic similarity to gene network analysis. GFD-Net combines the concept of semantic similarity with the use of gene network topology to analyze the functional dissimilarity of gene networks based on Gene Ontology (GO). The main innovation of GFD-Net lies in the way that semantic similarity is used to analyze gene networks taking into account the network topology. GFD-Net selects a functionality for each gene (specified by a GO term), weights each edge according to the dissimilarity between the nodes at its ends and calculates a quantitative measure of the network functional dissimilarity, i.e. a quantitative value of the degree of dissimilarity between the connected genes. The robustness of GFD-Net as a gene network validation tool was demonstrated by performing a ROC analysis on several network repositories. Furthermore, a well-known network was analyzed showing that GFD-Net can also be used to infer knowledge. The relevance of GFD-Net becomes more evident in Section "GFD-Net applied to the study of human diseases" where an example of how GFD-Net can be applied to the study of human diseases is presented. GFD-Net is available as an open-source Cytoscape app which offers a user-friendly interface to configure and execute the algorithm as well as the ability to visualize and interact with the results(http://apps.cytoscape.org/apps/gfdnet). Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Dosing algorithm for warfarin using CYP2C9 and VKORC1 genotyping from a multi-ethnic population: comparison with other equations.

    PubMed

    Wu, Alan H B; Wang, Ping; Smith, Andrew; Haller, Christine; Drake, Katherine; Linder, Mark; Valdes, Roland

    2008-02-01

    Polymorphism in the genes for cytochrome (CYP)2C9 and the vitamin K epoxide reductase complex subunit 1 (VKORC1) affect the pharmacokinetics and pharmacodynamics of warfarin. We developed and validated a warfarin-dosing algorithm for a multi-ethnic population that predicts the best dose for stable anticoagulation, and compared its performance against other regression equations. We determined the allele and haplotype frequencies of genes for CYP2C9 and VKORC1 on 167 Caucasian, African-American, Asian and Hispanic patients on warfarin. On a subset where complete data were available (n=92), we developed a dosing equation that predicts the actual dose needed to maintain target anticoagulation using demographic variables and genotypes. This regression was validated against an independent group of subjects. We also applied our data to five other published warfarin-dosing equations. The allele frequency for CYP2C9*2 and *3 and the A allele for VKORC1 3673 was similar to previously published reports. For Caucasians and Asians, VKORC1 SNPs were in Hardy-Weinberg linkage equilibrium. Some VKORC1 SNPs among the African-American population and one SNP among Hispanics were not in equilibrium. The linear regression of predicted versus actual warfarin dose produced r-values of 0.71 for the training set and 0.67 for the validation set. The regression coefficient improved (to r=0.78 and 0.75, respectively) when rare genotypes were eliminated or when the 7566 VKORC1 genotype was added to the model. All of the regression models tested produced a similar degree of correlation. The exclusion of rare genotypes that are more associated with certain ethnicities improved the model. Minor improvements in algorithms can be observed with the inclusion of ethnicity and more CYP2C9 and VKORC1 SNPs as variables. Major improvements will likely require the identification of new gene associations with warfarin dosing.
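
    The equilibrium statements in this abstract are the kind usually checked with a chi-square goodness-of-fit test of observed genotype counts against Hardy-Weinberg expectations. The abstract does not say which test the authors used, so the following Python sketch is only an illustration for a single biallelic SNP; the genotype counts are invented.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed counts of the three genotypes of a
    biallelic SNP. Compare the result with 3.84 (1 degree of freedom,
    alpha = 0.05) to judge whether the SNP departs from equilibrium.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: the first set matches expectations exactly,
# the second has no heterozygotes at all.
in_equilibrium = hwe_chi_square(25, 50, 25)
out_of_equilibrium = hwe_chi_square(50, 0, 50)
print(in_equilibrium, out_of_equilibrium)
```

    With small samples or rare alleles an exact test is usually preferred over the chi-square approximation.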

  19. Domain-swapped T cell receptors improve the safety of TCR gene therapy

    PubMed Central

    Bethune, Michael T; Gee, Marvin H; Bunse, Mario; Lee, Mark S; Gschweng, Eric H; Pagadala, Meghana S; Zhou, Jing; Cheng, Donghui; Heath, James R; Kohn, Donald B; Kuhns, Michael S; Uckert, Wolfgang; Baltimore, David

    2016-01-01

    T cells engineered to express a tumor-specific αβ T cell receptor (TCR) mediate anti-tumor immunity. However, mispairing of the therapeutic αβ chains with endogenous αβ chains reduces therapeutic TCR surface expression and generates self-reactive TCRs. We report a general strategy to prevent TCR mispairing: swapping constant domains between the α and β chains of a therapeutic TCR. When paired, domain-swapped (ds)TCRs assemble with CD3, express on the cell surface, and mediate antigen-specific T cell responses. By contrast, dsTCR chains mispaired with endogenous chains cannot properly assemble with CD3 or signal, preventing autoimmunity. We validate this approach in cell-based assays and in a mouse model of TCR gene transfer-induced graft-versus-host disease. We also validate a related approach whereby replacement of αβ TCR domains with corresponding γδ TCR domains yields a functional TCR that does not mispair. This work enables the design of safer TCR gene therapies for cancer immunotherapy. DOI: http://dx.doi.org/10.7554/eLife.19095.001 PMID:27823582

  20. Microarray-based characterization of differential gene expression during vocal fold wound healing in rats

    PubMed Central

    Welham, Nathan V.; Ling, Changying; Dawson, John A.; Kendziorski, Christina; Thibeault, Susan L.; Yamashita, Masaru

    2015-01-01

    The vocal fold (VF) mucosa confers elegant biomechanical function for voice production but is susceptible to scar formation following injury. Current understanding of VF wound healing is hindered by a paucity of data and is therefore often generalized from research conducted in skin and other mucosal systems. Here, using a previously validated rat injury model, expression microarray technology and an empirical Bayes analysis approach, we generated a VF-specific transcriptome dataset to better capture the system-level complexity of wound healing in this specialized tissue. We measured differential gene expression at 3, 14 and 60 days post-injury compared to experimentally naïve controls, pursued functional enrichment analyses to refine and add greater biological definition to the previously proposed temporal phases of VF wound healing, and validated the expression and localization of a subset of previously unidentified repair- and regeneration-related genes at the protein level. Our microarray dataset is a resource for the wider research community and has the potential to stimulate new hypotheses and avenues of investigation, improve biological and mechanistic insight, and accelerate the identification of novel therapeutic targets. PMID:25592437

  1. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

    PubMed Central

    Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar

    2016-01-01

    Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311

  2. Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes.

    PubMed

    Liu, Ruifeng; AbdulHameed, Mohamed Diwan M; Wallqvist, Anders

    2017-09-25

    The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very large number of models. To address this issue, we examined the performance of a variable nearest neighbor (v-NN) method that uses information on near neighbors conforming to the principle that similar structures have similar activities. Using a data set of gene expression signatures of 13 150 compounds derived from cell-based measurements in the NIH Library of Integrated Network-based Cellular Signatures program, we were able to make predictions for 62% of the compounds in a 10-fold cross validation test, with a correlation coefficient of 0.61 between the predicted and experimentally derived signatures-a reproducibility rivaling that of high-throughput gene expression measurements. To evaluate the utility of the predicted gene expression signatures, we compared the predicted and experimentally derived signatures in their ability to identify drugs known to cause specific liver, kidney, and heart injuries. Overall, the predicted and experimentally derived signatures had similar receiver operating characteristics, whose areas under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively, across the three organ injury models. However, detailed analyses of enrichment curves indicate that signatures predicted from multiple near neighbors outperformed those derived from experiments, suggesting that averaging information from near neighbors may help improve the signal from gene expression measurements. Our results demonstrate that the v-NN method can serve as a practical approach for modeling large-scale, genomewide, chemical-induced, gene expression changes.
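
    A variable nearest neighbor prediction can be sketched as a distance-weighted average over only those training compounds that lie within a distance threshold of the query, with no prediction returned when none qualify. That last property is consistent with predictions being made for only a fraction of compounds. The Gaussian-style weight, the threshold d0, the smoothing factor h and all numbers below are assumptions for illustration; the abstract does not specify them.

```python
import math


def vnn_predict(distances, activities, d0=0.3, h=0.2):
    """Predict an activity from near neighbors only.

    distances  : structural distances from the query to training compounds
                 (for example a Tanimoto distance between 0 and 1)
    activities : measured values for those training compounds
    d0         : neighbors farther than this are ignored (assumed value)
    h          : smoothing factor for the weights (assumed value)

    Returns None when no compound is within d0 of the query.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for d, y in zip(distances, activities):
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # closer neighbors count more
            weighted_sum += w * y
            weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical queries.
print(vnn_predict([0.0, 0.5], [1.0, 5.0]))   # only the first neighbor qualifies
print(vnn_predict([0.1, 0.1], [2.0, 4.0]))   # two equally close neighbors
print(vnn_predict([0.6, 0.9], [2.0, 4.0]))   # no qualifying neighbor
```

    Because the number of contributing neighbors varies per query, averaging over several of them can smooth measurement noise, in line with the abstract's remark about signatures predicted from multiple near neighbors.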

  3. Testing founder effect speciation: Divergence population genetics of the Spoonbills Platalea regia and Pl. minor (Threskiornithidae, Aves)

    USGS Publications Warehouse

    Yeung, Carol K.L.; Tsai, Pi-Wen; Chesser, R. Terry; Lin, Rong-Chien; Yao, Cheng-Te; Tian, Xiu-Hua; Li, Shou-Hsien

    2011-01-01

    Although founder effect speciation has been a popular theoretical model for the speciation of geographically isolated taxa, its empirical importance has remained difficult to evaluate due to the intractability of past demography, which in a founder effect speciation scenario would involve a speciational bottleneck in the emergent species and the complete cessation of gene flow following divergence. Using regression-weighted approximate Bayesian computation, we tested the validity of these two fundamental conditions of founder effect speciation in a pair of sister species with disjunct distributions: the royal spoonbill Platalea regia in Australasia and the black-faced spoonbill Pl. minor in eastern Asia. When compared with genetic polymorphism observed at 20 nuclear loci in the two species, simulations showed that the founder effect speciation model had an extremely low posterior probability (1.55 × 10⁻⁸) of producing the extant genetic pattern. In contrast, speciation models that allowed for postdivergence gene flow were much more probable (posterior probabilities were 0.37 and 0.50 for the bottleneck with gene flow and the gene flow models, respectively) and postdivergence gene flow persisted for a considerable period of time (more than 80% of the divergence history in both models) following initial divergence (median = 197,000 generations, 95% credible interval [CI]: 50,000-478,000, for the bottleneck with gene flow model; and 186,000 generations, 95% CI: 45,000-477,000, for the gene flow model). Furthermore, the estimated population size reduction in Pl. regia to 7,000 individuals (median, 95% CI: 487-12,000, according to the bottleneck with gene flow model) was unlikely to have been severe enough to be considered a bottleneck. Therefore, these results do not support founder effect speciation in Pl. regia but indicate instead that the divergence between Pl. regia and Pl. minor was probably driven by selection despite continuous gene flow. In this light, we discuss the potential importance of evolutionarily labile traits with significant fitness consequences, such as migratory behavior and habitat preference, in facilitating divergence of the spoonbills.

  4. Transcriptome Wide Identification and Validation of Calcium Sensor Gene Family in the Developing Spikes of Finger Millet Genotypes for Elucidating Its Role in Grain Calcium Accumulation

    PubMed Central

    Singh, Uma M.; Chandra, Muktesh; Shankhdhar, Shailesh C.; Kumar, Anil

    2014-01-01

    Background In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. Principal Finding In this study, transcriptome sequencing of spike tissues of two finger millet genotypes differing in their grain calcium content was performed for the first time. Out of 109,218 contigs in GP-1 (low-Ca genotype), 78 contigs, and out of 120,130 contigs in GP-45 (high-Ca genotype), 76 contigs, were identified as calcium sensor genes. Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz., CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to diverge from the rice orthologs. The differential expression analysis on the basis of FPKM values identified 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, selected calcium sensor responder genes were validated by qPCR in developing spikes of both genotypes grown at different concentrations of exogenous calcium. Conclusion Through de novo transcriptome assembly and analysis, we report the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding of the molecular basis of calcium accumulation and the development of calcium-biofortified crops. Moreover, this study also supports that identification and characterization of a gene family through Illumina paired-end sequencing is a potential tool for generating genomic information on gene families in non-model species. PMID:25157851
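
    The differential expression comparison in this abstract is based on FPKM (fragments per kilobase of transcript per million mapped fragments), which normalizes a raw count for both transcript length and sequencing depth. A minimal Python sketch of the standard formula follows; the counts are invented and do not come from the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments              : fragments assigned to this transcript or contig
    transcript_length_bp   : its length in base pairs
    total_mapped_fragments : all mapped fragments in the library
    """
    per_kilobase = transcript_length_bp / 1_000
    per_million = total_mapped_fragments / 1_000_000
    return fragments / (per_kilobase * per_million)


# Hypothetical contig: 100 fragments on a 1,000 bp contig in a library
# of one million mapped fragments.
value = fpkm(fragments=100, transcript_length_bp=1_000,
             total_mapped_fragments=1_000_000)
print(value)
```

    Doubling either the contig length or the library size halves the value, which is what makes contigs of different lengths and libraries of different depths comparable.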

  5. Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

    PubMed Central

    Yang, Fengxi; Zhu, Genfa

    2015-01-01

    Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data are available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb of clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutants corresponding to different floral morphogenesis, which validated their specialized roles in floral patterning and further supported the effectiveness of our in silico analysis. The dataset generated in our study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and provides a valuable resource for molecular breeding of the orchid plant. PMID:26580566

  6. Transcriptome wide identification and validation of calcium sensor gene family in the developing spikes of finger millet genotypes for elucidating its role in grain calcium accumulation.

    PubMed

    Singh, Uma M; Chandra, Muktesh; Shankhdhar, Shailesh C; Kumar, Anil

    2014-01-01

    In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. In this study, transcriptome sequencing of spike tissues of two finger millet genotypes differing in their grain calcium content was performed for the first time. Out of 109,218 contigs in GP-1 (low-Ca genotype), 78 contigs, and out of 120,130 contigs in GP-45 (high-Ca genotype), 76 contigs, were identified as calcium sensor genes. Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz., CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to diverge from the rice orthologs. The differential expression analysis on the basis of FPKM values identified 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, selected calcium sensor responder genes were validated by qPCR in developing spikes of both genotypes grown at different concentrations of exogenous calcium. Through de novo transcriptome assembly and analysis, we report the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding of the molecular basis of calcium accumulation and the development of calcium-biofortified crops. Moreover, this study also supports that identification and characterization of a gene family through Illumina paired-end sequencing is a potential tool for generating genomic information on gene families in non-model species.

  7. Evaluation and Validation of Reference Genes for qRT-PCR Normalization in Frankliniella occidentalis (Thysanoptera:Thripidae)

    PubMed Central

    Zheng, Yu-Tao; Li, Hong-Bo; Lu, Ming-Xing; Du, Yu-Zhou

    2014-01-01

    Quantitative real time PCR (qRT-PCR) has emerged as a reliable and reproducible technique for studying gene expression analysis. For accurate results, the normalization of data with reference genes is particularly essential. Once the transcriptome sequencing of Frankliniella occidentalis was completed, numerous unigenes were identified and annotated. Unfortunately, there are no studies on the stability of reference genes used in F. occidentalis. In this work, seven candidate reference genes, including actin, 18S rRNA, H3, tubulin, GAPDH, EF-1 and RPL32, were evaluated for their suitability as normalization genes under different experimental conditions using the statistical software programs BestKeeper, geNorm, Normfinder and the comparative ΔCt method. Because the rankings of the reference genes provided by each of the four programs were different, we chose a user-friendly web-based comprehensive tool RefFinder to get the final ranking. The result demonstrated that EF-1 and RPL32 displayed the most stable expression in different developmental stages; RPL32 and GAPDH showed the most stable expression at high temperatures, while 18S and EF-1 exhibited the most stable expression at low temperatures. In this study, we validated the suitable reference genes in F. occidentalis for gene expression profiling under different experimental conditions. The choice of internal standard is very important in the normalization of the target gene expression levels, thus validating and selecting the best genes will help improve the quality of gene expression data of F. occidentalis. What is more, these validated reference genes could serve as the basis for the selection of candidate reference genes in other insects. PMID:25356721
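
    Of the four ranking approaches named in this abstract, the comparative ΔCt method is the simplest to show: each candidate is compared pairwise with every other candidate, the standard deviation of the Ct difference across samples is taken for each pair, and candidates are ranked by their mean standard deviation. The Python sketch below illustrates that idea under the assumption of one Ct value per gene per sample; the gene labels and Ct values are invented and are not results from the study.

```python
from statistics import mean, stdev


def delta_ct_ranking(ct):
    """Rank candidate reference genes by the comparative delta-Ct method.

    ct maps a gene label to a list of Ct values (one per sample, same
    sample order for all genes). Returns (gene, score) pairs sorted from
    most stable (lowest mean SD of pairwise delta-Ct) to least stable.
    """
    scores = {}
    for gene, values in ct.items():
        pairwise_sd = []
        for other_gene, other_values in ct.items():
            if other_gene == gene:
                continue
            # Delta-Ct between the two genes in every sample.
            deltas = [a - b for a, b in zip(values, other_values)]
            pairwise_sd.append(stdev(deltas))
        scores[gene] = mean(pairwise_sd)
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical Ct values: GENE_1 and GENE_2 track each other, GENE_3 does not.
toy_ct = {
    "GENE_1": [20.0, 21.0, 22.0],
    "GENE_2": [25.0, 26.0, 27.0],
    "GENE_3": [18.0, 22.0, 18.0],
}
ranking = delta_ct_ranking(toy_ct)
for gene, score in ranking:
    print(gene, round(score, 3))
```

    Because each program weighs variation differently, rankings such as this one can disagree with those of geNorm, NormFinder or BestKeeper, which is the reason the authors give for turning to a combined ranking.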

  8. Evaluation and validation of reference genes for qRT-PCR normalization in Frankliniella occidentalis (Thysanoptera: Thripidae).

    PubMed

    Zheng, Yu-Tao; Li, Hong-Bo; Lu, Ming-Xing; Du, Yu-Zhou

    2014-01-01

    Quantitative real time PCR (qRT-PCR) has emerged as a reliable and reproducible technique for studying gene expression analysis. For accurate results, the normalization of data with reference genes is particularly essential. Once the transcriptome sequencing of Frankliniella occidentalis was completed, numerous unigenes were identified and annotated. Unfortunately, there are no studies on the stability of reference genes used in F. occidentalis. In this work, seven candidate reference genes, including actin, 18S rRNA, H3, tubulin, GAPDH, EF-1 and RPL32, were evaluated for their suitability as normalization genes under different experimental conditions using the statistical software programs BestKeeper, geNorm, Normfinder and the comparative ΔCt method. Because the rankings of the reference genes provided by each of the four programs were different, we chose a user-friendly web-based comprehensive tool RefFinder to get the final ranking. The result demonstrated that EF-1 and RPL32 displayed the most stable expression in different developmental stages; RPL32 and GAPDH showed the most stable expression at high temperatures, while 18S and EF-1 exhibited the most stable expression at low temperatures. In this study, we validated the suitable reference genes in F. occidentalis for gene expression profiling under different experimental conditions. The choice of internal standard is very important in the normalization of the target gene expression levels, thus validating and selecting the best genes will help improve the quality of gene expression data of F. occidentalis. What is more, these validated reference genes could serve as the basis for the selection of candidate reference genes in other insects.

  9. Genomic pathways modulated by Twist in breast cancer.

    PubMed

    Vesuna, Farhad; Bergman, Yehudit; Raman, Venu

    2017-01-13

    The basic helix-loop-helix transcription factor TWIST1 (Twist) is involved in embryonic cell lineage determination and mesodermal differentiation. There is evidence to indicate that Twist expression plays a role in breast tumor formation and metastasis, but the role of Twist in dysregulating pathways that drive the metastatic cascade is unclear. Moreover, many of the genes and pathways dysregulated by Twist in cell lines and mouse models have not been validated against data obtained from larger, independent datasets of breast cancer patients. We over-expressed the human Twist gene in non-metastatic MCF-7 breast cancer cells to generate the estrogen-independent metastatic breast cancer cell line MCF-7/Twist. These cells were inoculated in the mammary fat pad of female severe compromised immunodeficient mice, which subsequently formed xenograft tumors that metastasized to the lungs. Microarray data was collected from both in vitro (MCF-7 and MCF-7/Twist cell lines) and in vivo (primary tumors and lung metastases) models of Twist expression. Our data was compared to several gene datasets of various subtypes, classes, and grades of human breast cancers. Our data establishes a Twist over-expressing mouse model of breast cancer, which metastasizes to the lung and replicates some of the ontogeny of human breast cancer progression. Gene profiling data, following Twist expression, exhibited novel metastasis driver genes as well as cellular maintenance genes that were synonymous with the metastatic process. We demonstrated that the genes and pathways altered in the transgenic cell line and metastatic animal models parallel many of the dysregulated gene pathways observed in human breast cancers. Analogous gene expression patterns were observed in both in vitro and in vivo Twist preclinical models of breast cancer metastasis and breast cancer patient datasets supporting the functional role of Twist in promoting breast cancer metastasis. 
The data suggests that genetic dysregulation of Twist at the cellular level drives alterations in gene pathways in the Twist metastatic mouse model which are comparable to changes seen in human breast cancers. Lastly, we have identified novel genes and pathways that could be further investigated as targets for drugs to treat metastatic breast cancer.

  10. Long non-coding RNA expression patterns in lung tissues of chronic cigarette smoke induced COPD mouse model.

    PubMed

    Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju

    2018-05-15

    Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of a chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differentially expressed in lungs of the chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that the protein-coding genes associated with differentially expressed lncRNAs were mainly involved in the protein processing in endoplasmic reticulum pathway and the taurine and hypotaurine metabolism pathway. The combination of high-throughput data analysis and qRT-PCR validation in lungs of the chronic CS-induced COPD mouse model, 16HBE cells with CSE treatment, and PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD in both mouse and human. In conclusion, our study demonstrated that aberrant expression profiles of lncRNAs and mRNAs exist in lungs of the chronic CS-induced COPD mouse model. From an animal model perspective, these results might provide further clues for investigating the biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.

  11. A vector space model approach to identify genetically related diseases.

    PubMed

    Sarkar, Indra Neil

    2012-01-01

    The relationship between diseases and their causative genes can be complex, especially in the case of polygenic diseases. Further exacerbating the challenges in their study is that many genes may be causally related to multiple diseases. This study explored the relationship between diseases through the adaptation of an approach pioneered in the context of information retrieval: vector space models. A vector space model approach was developed that bridges gene disease knowledge inferred across three knowledge bases: Online Mendelian Inheritance in Man, GenBank, and Medline. The approach was then used to identify potentially related diseases for two target diseases: Alzheimer disease and Prader-Willi Syndrome. In the case of both Alzheimer Disease and Prader-Willi Syndrome, a set of plausible diseases were identified that may warrant further exploration. This study furthers seminal work by Swanson, et al. that demonstrated the potential for mining literature for putative correlations. Using a vector space modeling approach, information from both biomedical literature and genomic resources (like GenBank) can be combined towards identification of putative correlations of interest. To this end, the relevance of the predicted diseases of interest in this study using the vector space modeling approach were validated based on supporting literature. The results of this study suggest that a vector space model approach may be a useful means to identify potential relationships between complex diseases, and thereby enable the coordination of gene-based findings across multiple complex diseases.
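
    The vector space model described above can be illustrated with a minimal sketch in which each disease is a sparse vector of gene weights and relatedness is measured by cosine similarity. This is an assumption about the general technique, not the author's implementation. The disease labels, gene labels, and weights are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related(target, profiles):
    """Rank every other disease by similarity to the target disease."""
    scored = [
        (name, cosine_similarity(profiles[target], vector))
        for name, vector in profiles.items()
        if name != target
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical gene-weight profiles for illustration only.
profiles = {
    "disease_x": {"GENE1": 2.0, "GENE2": 1.0},
    "disease_y": {"GENE1": 1.0, "GENE2": 0.5},
    "disease_z": {"GENE3": 1.0},
}
related = rank_related("disease_x", profiles)
```

    Because cosine similarity ignores vector length, a well-studied disease with many gene mentions is not automatically ranked above a rarely studied one with the same gene pattern.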

  12. Genetic enhancement of macroautophagy in vertebrate models of neurodegenerative diseases.

    PubMed

    Ejlerskov, Patrick; Ashkenazi, Avraham; Rubinsztein, David C

    2018-04-03

    Most of the neurodegenerative diseases that afflict humans manifest with the intraneuronal accumulation of toxic proteins that are aggregate-prone. Extensive data in cell and neuronal models support the concept that such proteins, like mutant huntingtin or alpha-synuclein, are substrates for macroautophagy (hereafter autophagy). Furthermore, autophagy-inducing compounds lower the levels of such proteins and ameliorate their toxicity in diverse animal models of neurodegenerative diseases. However, most of these compounds also have autophagy-independent effects and it is important to understand if similar benefits are seen with genetic strategies that upregulate autophagy, as this strengthens the validity of this strategy in such diseases. Here we review studies in vertebrate models using genetic manipulations of core autophagy genes and describe how these improve pathology and neurodegeneration, supporting the validity of autophagy upregulation as a target for certain neurodegenerative diseases. Copyright © 2018 Elsevier Inc. All rights reserved.

  13. Development and validation of a gene expression-based signature to predict distant metastasis in locoregionally advanced nasopharyngeal carcinoma: a retrospective, multicentre, cohort study.

    PubMed

    Tang, Xin-Ran; Li, Ying-Qin; Liang, Shao-Bo; Jiang, Wei; Liu, Fang; Ge, Wen-Xiu; Tang, Ling-Long; Mao, Yan-Ping; He, Qing-Mei; Yang, Xiao-Jing; Zhang, Yuan; Wen, Xin; Zhang, Jian; Wang, Ya-Qin; Zhang, Pan-Pan; Sun, Ying; Yun, Jing-Ping; Zeng, Jing; Li, Li; Liu, Li-Zhi; Liu, Na; Ma, Jun

    2018-03-01

    Gene expression patterns can be used as prognostic biomarkers in various types of cancers. We aimed to identify a gene expression pattern for individual distant metastatic risk assessment in patients with locoregionally advanced nasopharyngeal carcinoma. In this multicentre, retrospective, cohort analysis, we included 937 patients with locoregionally advanced nasopharyngeal carcinoma from three Chinese hospitals: the Sun Yat-sen University Cancer Center (Guangzhou, China), the Affiliated Hospital of Guilin Medical University (Guilin, China), and the First People's Hospital of Foshan (Foshan, China). Using microarray analysis, we profiled mRNA gene expression between 24 paired locoregionally advanced nasopharyngeal carcinoma tumours from patients at Sun Yat-sen University Cancer Center with or without distant metastasis after radical treatment. Differentially expressed genes were examined using digital expression profiling in a training cohort (Guangzhou training cohort; n=410) to build a gene classifier using a penalised regression model. We validated the prognostic accuracy of this gene classifier in an internal validation cohort (Guangzhou internal validation cohort, n=204) and two external independent cohorts (Guilin cohort, n=165; Foshan cohort, n=158). The primary endpoint was distant metastasis-free survival. Secondary endpoints were disease-free survival and overall survival. We identified 137 differentially expressed genes between metastatic and non-metastatic locoregionally advanced nasopharyngeal carcinoma tissues. A distant metastasis gene signature for locoregionally advanced nasopharyngeal carcinoma (DMGN) that consisted of 13 genes was generated to classify patients into high-risk and low-risk groups in the training cohort. 
Patients with high-risk scores in the training cohort had shorter distant metastasis-free survival (hazard ratio [HR] 4·93, 95% CI 2·99-8·16; p<0·0001), disease-free survival (HR 3·51, 2·43-5·07; p<0·0001), and overall survival (HR 3·22, 2·18-4·76; p<0·0001) than patients with low-risk scores. The prognostic accuracy of DMGN was validated in the internal and external cohorts. Furthermore, among patients with low-risk scores in the combined training and internal cohorts, concurrent chemotherapy improved distant metastasis-free survival compared with those patients who did not receive concurrent chemotherapy (HR 0·40, 95% CI 0·19-0·83; p=0·011), whereas patients with high-risk scores did not benefit from concurrent chemotherapy (HR 1·03, 0·71-1·50; p=0·876). This was also validated in the two external cohorts combined. We developed a nomogram based on the DMGN and other variables that predicted an individual's risk of distant metastasis, which was strengthened by adding Epstein-Barr virus DNA status. The DMGN is a reliable prognostic tool for distant metastasis in patients with locoregionally advanced nasopharyngeal carcinoma and might be able to predict which patients benefit from concurrent chemotherapy. It has the potential to guide treatment decisions for patients at different risk of distant metastasis. The National Natural Science Foundation of China, the National Science & Technology Pillar Program during the Twelfth Five-year Plan Period, the Natural Science Foundation of Guang Dong Province, the National Key Research and Development Program of China, the Innovation Team Development Plan of the Ministry of Education, the Health & Medical Collaborative Innovation Project of Guangzhou City, China, and the Program of Introducing Talents of Discipline to Universities. Copyright © 2018 Elsevier Ltd. All rights reserved.
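
    A gene classifier built with a penalised regression model typically reduces to a weighted sum of expression values compared against a cutoff. The sketch below shows that general shape only. The abstract does not list the 13 genes, their coefficients, or the cutoff, so every name and number here is hypothetical.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(weight * expression[gene] for gene, weight in coefficients.items())


def classify(expression, coefficients, cutoff):
    """Assign a patient to the high-risk or low-risk group by score cutoff."""
    score = risk_score(expression, coefficients)
    return "high-risk" if score > cutoff else "low-risk"


# Hypothetical two-gene signature and patient, for illustration only.
example_coefficients = {"gene1": 0.5, "gene2": -0.25}
example_patient = {"gene1": 4.0, "gene2": 2.0}
example_score = risk_score(example_patient, example_coefficients)
```

    In a design like the one described, the coefficients and cutoff would be fixed in the training cohort and then applied unchanged to the validation cohorts.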

  14. Massive NGS Data Analysis Reveals Hundreds Of Potential Novel Gene Fusions in Human Cell Lines.

    PubMed

    Gioiosa, Silvia; Bolis, Marco; Flati, Tiziano; Massini, Annalisa; Garattini, Enrico; Chillemi, Giovanni; Fratelli, Maddalena; Castrignanò, Tiziana

    2018-06-01

    Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, considerable effort has gone into studying gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming to identify novel putative cell-line-specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by running a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the prediction tools yields a robust set of ∼ 1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well-known repositories. Taking advantage of the intuitive query forms, users can easily access, navigate, filter and select the putative gene fusions for further validation and study. They can also find suitable experimental models for a given fusion of interest. 
We believe that the LiGeA resource can represent not only the first compendium of both known and putative novel gene fusion events in the catalog of all of the human malignant cell lines, but it can also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.

  15. Ion channel gene expression predicts survival in glioma patients

    PubMed Central

    Wang, Rong; Gurguis, Christopher I.; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong

    2015-01-01

    Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from human genome in all the validation cohorts. More importantly, this signature performs better than the random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients. PMID:26235283
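
    The resampling tests mentioned above compare a signature against random gene sets of the same size. A minimal sketch of that comparison follows, assuming a generic scoring function supplied by the caller. The abstract does not specify the scoring statistic, so `score_fn`, the toy weights, and the gene labels are hypothetical.

```python
import random


def resampling_p_value(signature_score, universe, set_size, score_fn,
                       n_resamples=1000, seed=0):
    """Empirical p-value: share of random gene sets scoring at least as
    well as the signature (with the usual +1 correction)."""
    rng = random.Random(seed)
    at_least_as_good = 0
    for _ in range(n_resamples):
        random_set = rng.sample(universe, set_size)
        if score_fn(random_set) >= signature_score:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_resamples + 1)


# Toy universe in which gene g_i has weight i; a set's score is the sum.
toy_weights = {f"g{i}": i for i in range(10)}


def toy_score(genes):
    return sum(toy_weights[gene] for gene in genes)


# No 3-gene set can reach 100, so only the +1 correction remains.
p_strong = resampling_p_value(100, list(toy_weights), 3, toy_score)
# Every random set scores at least 0, so the p-value is 1.
p_weak = resampling_p_value(0, list(toy_weights), 3, toy_score)
```

    Restricting `universe` to glioma-associated genes instead of the whole genome gives the stricter comparison that the abstract reports for two of the three validation datasets.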

  16. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data.

    PubMed

    Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza

    2017-09-27

    Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma ( PPARG ) gene associated with diabetes.
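
    The stated equivalence between the BLUP and the ridge estimator is easiest to see in the Gaussian linear mixed model, which is a special case of the GLMM setting in the abstract. The derivation below is a sketch under that simplifying assumption. The symbols (X, Z, b, the two variance components, and the penalty) are our own notation and are not taken from the paper.

```latex
\begin{aligned}
y &= X\beta + Zb + e, \qquad
  b \sim N(0, \sigma_b^2 I), \qquad e \sim N(0, \sigma_e^2 I), \\
V &= \operatorname{Var}(y) = \sigma_b^2 ZZ^\top + \sigma_e^2 I, \\
\hat b_{\mathrm{BLUP}}
  &= \sigma_b^2 Z^\top V^{-1}\,(y - X\hat\beta)
   = Z^\top \left(ZZ^\top + \lambda I\right)^{-1} (y - X\hat\beta),
   \qquad \lambda = \sigma_e^2 / \sigma_b^2, \\
  &= \left(Z^\top Z + \lambda I\right)^{-1} Z^\top (y - X\hat\beta)
   \quad \text{since } Z^\top (ZZ^\top + \lambda I)^{-1}
   = (Z^\top Z + \lambda I)^{-1} Z^\top .
\end{aligned}
```

    The last line is the ridge regression solution for the residual y - X\hat\beta with penalty \lambda, so estimating the two variance components yields the ridge penalty directly, without a cross-validation search.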

  17. Systems Biology-Based Identification of Mycobacterium tuberculosis Persistence Genes in Mouse Lungs

    PubMed Central

    Dutta, Noton K.; Bandyopadhyay, Nirmalya; Veeramani, Balaji; Lamichhane, Gyanu; Karakousis, Petros C.; Bader, Joel S.

    2014-01-01

    ABSTRACT Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. PMID:24549847
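
    The fold enrichment reported above is a ratio of hit rates between the predicted set and random pools. A minimal sketch of that arithmetic follows. The function name and the counts are hypothetical and are not the study's data.

```python
def fold_enrichment(hits_predicted, n_predicted, hits_random, n_random):
    """Hit rate in the predicted set divided by hit rate in random pools."""
    if n_predicted <= 0 or n_random <= 0:
        raise ValueError("set sizes must be positive")
    if hits_random <= 0:
        raise ValueError("random pool needs at least one hit to form a ratio")
    return (hits_predicted / n_predicted) / (hits_random / n_random)


# Hypothetical counts: 12 of 20 predicted mutants are attenuated,
# versus 2 of 20 randomly selected mutants.
example_enrichment = fold_enrichment(12, 20, 2, 20)
```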

  18. Meta-review of protein network regulating obesity between validated obesity candidate genes in the white adipose tissue of high-fat diet-induced obese C57BL/6J mice.

    PubMed

    Kim, Eunjung; Kim, Eun Jung; Seo, Seung-Won; Hur, Cheol-Goo; McGregor, Robin A; Choi, Myung-Sook

    2014-01-01

    Worldwide obesity and related comorbidities are increasing, but identifying new therapeutic targets remains a challenge. A plethora of microarray studies in diet-induced obesity models has provided large datasets of obesity associated genes. In this review, we describe an approach to examine the underlying molecular network regulating obesity, and we discuss interactions between obesity candidate genes. We conducted network analysis on functional protein-protein interactions associated with 25 obesity candidate genes identified in a literature-driven approach based on published microarray studies of diet-induced obesity. The obesity candidate genes were closely associated with lipid metabolism and inflammation. Peroxisome proliferator activated receptor gamma (Pparg) appeared to be a core obesity gene, and obesity candidate genes were highly interconnected, suggesting a coordinately regulated molecular network in adipose tissue. In conclusion, the current network analysis approach may help elucidate the underlying molecular network regulating obesity and identify anti-obesity targets for therapeutic intervention.

  19. [Establishment of a human bladder cancer cell line stably co-expressing hSPRY2 and luciferase genes and its subcutaneous tumor xenograft model in nude mice].

    PubMed

    Yin, Xiaotao; Li, Fanglong; Jin, Yipeng; Yin, Zhaoyang; Qi, Siyong; Wu, Shuai; Wang, Zicheng; Wang, Lin; Yu, Jiyun; Gao, Jiangping

    2017-03-01

    Objective To establish a human bladder cancer cell line stably co-expressing human sprouty2 (hSPRY2) and luciferase (Luc) genes simultaneously, and develop its subcutaneous tumor xenograft model in nude mice. Methods The hSPRY2 and Luc gene segments were amplified by PCR, and were cloned into lentiviral vector pCDH and pLVX respectively to produce corresponding lentivirus particles. The J82 human bladder cancer cells were infected with these two kinds of lentivirus particles, and then further screened by puromycin and G418. The expressions of hSPRY2 and Luc genes were detected by bioluminescence, immunofluorescence and Western blot analysis. The screened J82-hSPRY2/Luc cells were injected subcutaneously into BALB/c nude mice, and the growth of tumor was monitored dynamically using in vivo fluorescence imaging system. Results J82-hSPRY2/Luc cell line stably expressing hSPRY2 and Luc genes was established successfully. Bioluminescence, immunofluorescence and Western blot analysis validated the expressions of hSPRY2 and Luc genes. The in vivo fluorescence imaging system showed obvious fluorescence in subcutaneous tumor xenograft in nude mice. Conclusion The J82-hSPRY2/Luc bladder cancer cell line and its subcutaneous tumor xenograft model in nude mice have been established successfully.

  20. CXCL4 Contributes to the Pathogenesis of Chronic Liver Allograft Dysfunction

    PubMed Central

    Li, Jing; Shi, Yuan; Xie, Ke-Liang; Yin, Hai-Fang; Yan, Lu-nan; Lau, Wan-yee; Wang, Guo-Lin

    2016-01-01

    Chronic liver allograft dysfunction (CLAD) remains the most common cause of patient morbidity and allograft loss in liver transplant patients. However, the pathogenesis of CLAD has not been completely elucidated. By establishing rat CLAD models, in this study, we identified the informative CLAD-associated genes using isobaric tags for relative and absolute quantification (iTRAQ) proteomics analysis and validated these results in recipient rat liver allografts. CXCL4, CXCR3, EGFR, JAK2, STAT3, and Collagen IV were associated with CLAD pathogenesis. We validated that CXCL4 is upstream of these informative genes in the isolated hepatic stellate cells (HSC). Blocking CXCL4 protects against CLAD by reducing liver fibrosis. Therefore, our results indicated that therapeutic approaches that neutralize CXCL4, a newly identified target of fibrosis, may represent a novel strategy for preventing and treating CLAD after liver transplantation. PMID:28053995

  1. CXCL4 Contributes to the Pathogenesis of Chronic Liver Allograft Dysfunction.

    PubMed

    Li, Jing; Liu, Bin; Shi, Yuan; Xie, Ke-Liang; Yin, Hai-Fang; Yan, Lu-Nan; Lau, Wan-Yee; Wang, Guo-Lin

    2016-01-01

    Chronic liver allograft dysfunction (CLAD) remains the most common cause of patient morbidity and allograft loss in liver transplant patients. However, the pathogenesis of CLAD has not been completely elucidated. By establishing rat CLAD models, in this study, we identified the informative CLAD-associated genes using isobaric tags for relative and absolute quantification (iTRAQ) proteomics analysis and validated these results in recipient rat liver allografts. CXCL4, CXCR3, EGFR, JAK2, STAT3, and Collagen IV were associated with CLAD pathogenesis. We validated that CXCL4 is upstream of these informative genes in the isolated hepatic stellate cells (HSC). Blocking CXCL4 protects against CLAD by reducing liver fibrosis. Therefore, our results indicated that therapeutic approaches that neutralize CXCL4, a newly identified target of fibrosis, may represent a novel strategy for preventing and treating CLAD after liver transplantation.

  2. A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

    PubMed

    Taşan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P

    2012-02-01

    The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented-alongside existing validated annotations-in a publicly accessible and searchable web interface.

  3. Reconstruction of the regulatory network for Bacillus subtilis and reconciliation with gene expression data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faria, Jose P.; Overbeek, Ross; Taylor, Ronald C.

    Here, we introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of B. subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches and small regulatory RNAs. Overall, regulatory information is included in the model for approximately 2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same “ON” and “OFF” gene expression profiles across multiple samples of experimental data. We show how atomic regulons for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how atomic regulons can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. 
During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.
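
    The abstract defines atomic regulons as sets of genes that share the same ON and OFF expression profiles across samples. The sketch below illustrates only that definition, by binarizing expression at a threshold and grouping genes with identical patterns. The actual inference procedure is likely more involved. The threshold, gene labels, and expression values are hypothetical.

```python
from collections import defaultdict


def group_by_on_off_profile(expression, threshold):
    """Group genes whose binarized (ON/OFF) profiles match in every sample."""
    groups = defaultdict(list)
    for gene, values in expression.items():
        profile = tuple(value >= threshold for value in values)
        groups[profile].append(gene)
    # Keep only groups with at least two genes sharing a profile.
    return sorted(sorted(genes) for genes in groups.values() if len(genes) > 1)


# Hypothetical expression values across three samples.
example_expression = {
    "geneA": [5.0, 0.0, 5.0],
    "geneB": [6.0, 1.0, 7.0],
    "geneC": [0.0, 5.0, 0.0],
}
example_groups = group_by_on_off_profile(example_expression, threshold=3.0)
```

    Genes that land in one group without any shared annotated regulator are the kind of candidates the abstract describes for expanding or validating the curated network.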

  4. Reconstruction of the regulatory network for Bacillus subtilis and reconciliation with gene expression data

    DOE PAGES

    Faria, Jose P.; Overbeek, Ross; Taylor, Ronald C.; ...

    2016-03-18

    Here, we introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of B. subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches and small regulatory RNAs. Overall, regulatory information is included in the model for approximately 2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same “ON” and “OFF” gene expression profiles across multiple samples of experimental data. We show how atomic regulons for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how atomic regulons can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. 
During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.

  5. A role for genetic susceptibility in sporadic focal segmental glomerulosclerosis

    PubMed Central

    Yu, Haiyang; Artomov, Mykyta; Brähler, Sebastian; Stander, M. Christine; Shamsan, Ghaidan; Sampson, Matthew G.; White, J. Michael; Kretzler, Matthias; Jain, Sanjay; Winkler, Cheryl A.; Mitra, Robi D.; Daly, Mark J.; Shaw, Andrey S.

    2016-01-01

    Focal segmental glomerulosclerosis (FSGS) is a syndrome that involves kidney podocyte dysfunction and causes chronic kidney disease. Multiple factors including chemical toxicity, inflammation, and infection underlie FSGS; however, highly penetrant disease genes have been identified in a small fraction of patients with a family history of FSGS. Variants of apolipoprotein L1 (APOL1) have been linked to FSGS in African Americans with HIV or hypertension, supporting the proposal that genetic factors enhance FSGS susceptibility. Here, we used sequencing to investigate whether genetics plays a role in the majority of FSGS cases that are identified as primary or sporadic FSGS and have no known cause. Given the limited number of biopsy-proven cases with ethnically matched controls, we devised an analytic strategy to identify and rank potential candidate genes and used an animal model for validation. Nine candidate FSGS susceptibility genes were identified in our patient cohort, and three were validated using a high-throughput mouse method that we developed. Specifically, we introduced a podocyte-specific, doxycycline-inducible transactivator into a murine embryonic stem cell line with an FSGS-susceptible genetic background that allows shRNA-mediated targeting of candidate genes in the adult kidney. Our analysis supports a broader role for genetic susceptibility of both sporadic and familial cases of FSGS and provides a tool to rapidly evaluate candidate FSGS-associated genes. PMID:26901816

  6. Analyzing Gene Expression Profiles with Preliminary Validations in Cardiac Hypertrophy Induced by Pressure-overload.

    PubMed

    Gao, Jing; Li, Yuhong; Wang, Tongmei; Shi, Zhuo; Zhang, Yiqi; Liu, Shuang; Wen, Pushuai; Ma, Chunyan

    2018-03-06

    The aim of this study was to identify the key genes involved in cardiac hypertrophy (CH) induced by pressure overload. The mRNA microarray datasets GSE5500 and GSE18801 were downloaded from the GEO database, and differentially expressed genes (DEGs) were screened using the Limma package; functional and pathway enrichment analyses were then performed for the common DEGs using the DAVID database. The top DEGs were further validated using qPCR in hypertrophic heart tissue induced by isoprenaline (ISO). A total of 113 common DEGs with absolute fold change >0.5, including 60 significantly up-regulated DEGs and 53 down-regulated DEGs, were obtained. GO term enrichment analysis suggested that the common up-regulated DEGs were mainly enriched in neutrophil chemotaxis, extracellular fibril organization and cell proliferation, and the common down-regulated genes were significantly enriched in ion transport, endoplasmic reticulum and dendritic spine. KEGG pathway analysis found that the common DEGs were mainly enriched in ECM-receptor interaction, phagosome, and focal adhesion. Additionally, the expression of Mfap4, Ltbp2, Aspn, Serpina3n, and Cnksr1 was up-regulated in the model of cardiac hypertrophy, while the expression of Anp32a was down-regulated. The current study identified the key deregulated genes and pathways involved in CH, which could shed new light on the mechanism of CH.

  7. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  8. Validation of Reference Genes for Real-Time Quantitative PCR (qPCR) Analysis of Avibacterium paragallinarum.

    PubMed

    Wen, Shuxiang; Chen, Xiaoling; Xu, Fuzhou; Sun, Huiling

    2016-01-01

    Real-time quantitative reverse transcription PCR (qRT-PCR) offers a robust method for measurement of gene expression levels. Selection of reliable reference gene(s) for gene expression studies helps to reduce variation arising from differing amounts of RNA and cDNA and from the efficiency of the reverse transcriptase or polymerase enzymes. Until now, reference genes identified for other members of the family Pasteurellaceae have not been validated for Avibacterium paragallinarum. The aim of this study was to validate nine reference genes of serovar A, B, and C strains of A. paragallinarum in different growth phases by qRT-PCR. Three of the most widely used statistical algorithms, geNorm, NormFinder and the ΔCT method, were used to evaluate the expression stability of reference genes. Data analyzed by overall rankings showed that in exponential and stationary phase of serovar A, the most stable reference genes were gyrA and atpD respectively; in exponential and stationary phase of serovar B, the most stable reference genes were atpD and recN respectively; in exponential and stationary phase of serovar C, the most stable reference genes were rpoB and recN respectively. This study provides recommendations for stable endogenous control genes for use in further studies involving measurement of gene expression levels.
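
    Of the three algorithms named above, the comparative ΔCT method is simple enough to sketch in a few lines of Python. The sketch below follows the usual description of that method: for every pair of candidate genes, take the per-sample Ct difference, compute its standard deviation across samples, and score each gene by the mean of the standard deviations of all pairs it takes part in (lower means more stable). The gene labels and Ct values are hypothetical and are not data from the study.

```python
from itertools import combinations
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Score candidate reference genes with the comparative delta-Ct approach.

    ct: dict mapping gene name -> list of Ct values, one per sample
        (same sample order for all genes).
    Returns a dict gene -> mean SD of pairwise delta-Ct (lower = more stable).
    """
    pair_sds = {gene: [] for gene in ct}
    for gene_a, gene_b in combinations(ct, 2):
        deltas = [a - b for a, b in zip(ct[gene_a], ct[gene_b])]
        sd = stdev(deltas)
        pair_sds[gene_a].append(sd)
        pair_sds[gene_b].append(sd)
    return {gene: mean(sds) for gene, sds in pair_sds.items()}


# Hypothetical Ct values for three candidate genes in four samples.
ct_values = {
    "ref1": [20.0, 20.5, 21.0, 20.2],
    "ref2": [18.0, 18.5, 19.0, 18.2],
    "ref3": [22.0, 24.5, 21.0, 25.0],
}
scores = delta_ct_stability(ct_values)
ranking = sorted(scores, key=scores.get)  # most stable first
print(ranking)
```

    Here ref1 and ref2 track each other closely across samples, so both score well, while ref3 varies independently and ranks last.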

  9. Novel insights into embryonic stem cell self-renewal revealed through comparative human and mouse systems biology networks.

    PubMed

    Dowell, Karen G; Simons, Allen K; Bai, Hao; Kell, Braden; Wang, Zack Z; Yun, Kyuson; Hibbs, Matthew A

    2014-05-01

    Embryonic stem cells (ESCs), characterized by their ability to both self-renew and differentiate into multiple cell lineages, are a powerful model for biomedical research and developmental biology. Human and mouse ESCs share many features, yet have distinctive aspects, including fundamental differences in the signaling pathways and cell cycle controls that support self-renewal. Here, we explore the molecular basis of human ESC self-renewal using Bayesian network machine learning to integrate cell-type-specific, high-throughput data for gene function discovery. We integrated high-throughput ESC data from 83 human studies (~1.8 million data points collected under 1,100 conditions) and 62 mouse studies (~2.4 million data points collected under 1,085 conditions) into separate human and mouse predictive networks focused on ESC self-renewal to analyze shared and distinct functional relationships among protein-coding gene orthologs. Computational evaluations show that these networks are highly accurate, literature validation confirms their biological relevance, and reverse transcriptase polymerase chain reaction (RT-PCR) validation supports our predictions. Our results reflect the importance of key regulatory genes known to be strongly associated with self-renewal and pluripotency in both species (e.g., POU5F1, SOX2, and NANOG), identify metabolic differences between species (e.g., threonine metabolism), clarify differences between human and mouse ESC developmental signaling pathways (e.g., leukemia inhibitory factor (LIF)-activated JAK/STAT in mouse; NODAL/ACTIVIN-A-activated fibroblast growth factor in human), and reveal many novel genes and pathways predicted to be functionally associated with self-renewal in each species. 
These interactive networks are available online at www.StemSight.org for stem cell researchers to develop new hypotheses, discover potential mechanisms involving sparsely annotated genes, and prioritize genes of interest for experimental validation. © 2013 AlphaMed Press.

  10. Systems-Wide Prediction of Enzyme Promiscuity Reveals a New Underground Alternative Route for Pyridoxal 5’-Phosphate Production in E. coli

    DOE PAGES

    Oberhardt, Matthew A.; Zarecki, Raphy; Reshef, Leah; ...

    2016-01-28

    Recent insights suggest that non-specific and/or promiscuous enzymes are common and active across life. Understanding the role of such enzymes is an important open question in biology. Here we develop a genome-wide method, PROPER, that uses a permissive PSI-BLAST approach to predict promiscuous activities of metabolic genes. Enzyme promiscuity is typically studied experimentally using multicopy suppression, in which over-expression of a promiscuous ‘replacer’ gene rescues lethality caused by inactivation of a ‘target’ gene. We use PROPER to predict multicopy suppression in Escherichia coli, achieving highly significant overlap with published cases (hypergeometric p = 4.4e-13). We then validate three novel predicted target-replacer gene pairs in new multicopy suppression experiments. We next go beyond PROPER and develop a network-based approach, GEM-PROPER, that integrates PROPER with genome-scale metabolic modeling to predict promiscuous replacements via alternative metabolic pathways. GEM-PROPER predicts a new indirect replacer (thiG) for an essential enzyme (pdxB) in production of pyridoxal 5’-phosphate (the active form of vitamin B6), which we validate experimentally via multicopy suppression. Here, we perform a structural analysis of thiG to determine its potential promiscuous active site, which we validate experimentally by inactivating the pertaining residues and showing a loss of replacer activity. Thus, this study is a successful example where a computational investigation leads to a network-based identification of an indirect promiscuous replacement of a key metabolic enzyme, which would have been extremely difficult to identify directly.
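
    The overlap significance quoted above is a hypergeometric tail probability. A minimal Python sketch of that calculation follows. The abstract does not report the underlying counts, so the numbers in the example are hypothetical and will not reproduce the published p-value; the function name is likewise an illustration.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric random variable X.

    population: total number of candidate pairs considered
    successes:  number of pairs in the published (reference) set
    draws:      number of pairs predicted by the method
    k:          observed overlap between predictions and reference set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 1000 candidate pairs, 40 published cases,
# 50 predictions, 12 of which match a published case.
p_value = hypergeom_upper_tail(12, population=1000, successes=40, draws=50)
print(f"p = {p_value:.3g}")
```

    Exact integer arithmetic via math.comb avoids the rounding problems that a naive factorial-ratio implementation in floating point can run into for small tail probabilities.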

  11. Automated identification of reference genes based on RNA-seq data.

    PubMed

    Carmona, Rosario; Arroyo, Macarena; Jiménez-Quesada, María José; Seoane, Pedro; Zafra, Adoración; Larrosa, Rafael; Alché, Juan de Dios; Claros, M Gonzalo

    2017-08-18

    Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions are available; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs.
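
    The selection criterion described above (high, invariant expression judged by reads per mapped million and the coefficient of variation) can be sketched in Python as follows. This is an illustration of the idea only, not the published workflow: the function names, the cut-offs min_rpm and max_cv, and the read counts are hypothetical, and the library size is assumed to be the sum of the mapped reads listed for each sample.

```python
from statistics import mean, stdev


def reads_per_million(counts):
    """Convert raw mapped-read counts to reads per mapped million (RPM).

    counts: dict mapping gene -> list of mapped-read counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    return {
        gene: [c * 1e6 / size for c, size in zip(values, library_sizes)]
        for gene, values in counts.items()
    }


def candidate_reference_genes(counts, min_rpm, max_cv):
    """Return genes with mean RPM >= min_rpm and CV <= max_cv, lowest CV first."""
    selected = []
    for gene, values in reads_per_million(counts).items():
        average = mean(values)
        cv = stdev(values) / average if average > 0 else float("inf")
        if average >= min_rpm and cv <= max_cv:
            selected.append((cv, gene))
    return [gene for cv, gene in sorted(selected)]


# Hypothetical mapped-read counts for four genes in three samples.
counts = {
    "g1": [1000, 2000, 1500],     # high and stable after normalization
    "g2": [500, 100, 2400],       # highly variable
    "g3": [10, 20, 15],           # stable but weakly expressed
    "g4": [8490, 17880, 11085],   # high, moderately stable
}
candidates = candidate_reference_genes(counts, min_rpm=50_000, max_cv=0.2)
print(candidates)
```

    Normalizing before computing the coefficient of variation matters: g1 looks variable in raw counts but is perfectly flat once library size is accounted for.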

  12. THE INVOLVEMENT OF HUMAN MONOGENIC CARDIOMYOPATHY GENES IN EXPERIMENTAL POLYGENIC CARDIAC HYPERTROPHY.

    PubMed

    Prestes, Priscilla R; Marques, Francine Z; Lopez-Campos, Guillermo; Lewandowski, Paul; Delbridge, Lea M D; Charchar, Fadi J; Harrap, Stephen B

    2018-05-18

    Hypertrophic cardiomyopathy thickens heart muscles reducing functionality and increasing risk of cardiac disease and morbidity. Genetic factors are involved, but their contribution is poorly understood. We used the hypertrophic heart rat (HHR), a unique normotensive polygenic model of cardiac hypertrophy and heart failure to investigate the role of genes associated with monogenic human cardiomyopathy. We selected 42 genes involved in monogenic human cardiomyopathies to study: 1) DNA variants, by sequencing the whole-genome of 13-week old HHR and age-matched normal heart rat (NHR), its genetic control strain; 2) mRNA expression, by targeted RNA-sequencing in left ventricles of HHR and NHR at five ages (2-days old, 4-, 13-, 33- and 50-weeks old) compared to human idiopathic dilated data; and 3) microRNA expression, with rat microRNA microarrays in left ventricles of 2-days old HHR and age-matched NHR. We also investigated experimentally validated microRNA-mRNA interactions. Whole-genome sequencing revealed unique variants mostly located in non-coding regions of HHR and NHR. We found 29 genes differentially expressed in at least one age. Genes encoding desmoglein 2 (Dsg2) and transthyretin (Ttr) were significantly differentially expressed at all ages in the HHR, but only Ttr was also differentially expressed in human idiopathic cardiomyopathy. Lastly, only two microRNAs differentially expressed in the HHR were present in our comparison of validated microRNA-mRNA interactions. These two microRNAs interact with five of the genes studied. Our study shows that genes involved in monogenic forms of human cardiomyopathies may also influence polygenic forms of the disease.

  13. Unique attributes of cyanobacterial metabolism revealed by improved genome-scale metabolic modeling and essential gene analysis

    DOE PAGES

    Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.; ...

    2016-12-20

    The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. In this paper, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting in the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Finally, coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology.

  14. Unique attributes of cyanobacterial metabolism revealed by improved genome-scale metabolic modeling and essential gene analysis

    PubMed Central

    Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.; Du, Niu; Mih, Nathan; Diamond, Spencer; Lee, Jenny J.; Golden, Susan S.; Palsson, Bernhard O.

    2016-01-01

    The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. Here, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting in the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology. PMID:27911809

  15. Unique attributes of cyanobacterial metabolism revealed by improved genome-scale metabolic modeling and essential gene analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.

    The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. In this paper, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting in the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Finally, coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology.

  16. Single-step generation of rabbits carrying a targeted allele of the tyrosinase gene using CRISPR/Cas9.

    PubMed

    Honda, Arata; Hirose, Michiko; Sankai, Tadashi; Yasmin, Lubna; Yuzawa, Kazuaki; Honsho, Kimiko; Izu, Haruna; Iguchi, Atsushi; Ikawa, Masahito; Ogura, Atsuo

    2015-01-01

    Targeted genome editing of nonrodent mammalian species has provided the potential for highly accurate interventions into gene function in humans and the generation of useful animal models of human diseases. Here we show successful clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (Cas)-mediated gene targeting via circular plasmid injection in rabbits. The rabbit tyrosinase gene (TYR) was effectively disrupted, and we confirmed germline transmission by pronuclear injection of a circular plasmid expressing humanized Cas9 (hCas9) and single-guide RNA. Direct injection into pronuclear stage zygotes was possible following an in vitro validation assay. Neither off-target mutagenesis nor hCas9 transgenesis was detected in any of the genetically targeted pups and embryos examined. Gene targeting with this rapid and simplified strategy will help accelerate the development of translational research using other nonrodent mammalian species.

  17. An Arabidopsis Gene Regulatory Network for Secondary Cell Wall Synthesis

    PubMed Central

    Taylor-Teeples, M; Lin, L; de Lucas, M; Turco, G; Toal, TW; Gaudinier, A; Young, NF; Trabucco, GM; Veling, MT; Lamothe, R; Handakumbura, PP; Xiong, G; Wang, C; Corwin, J; Tsoukalas, A; Zhang, L; Ware, D; Pauly, M; Kliebenstein, DJ; Dehesh, K; Tagkopoulos, I; Breton, G; Pruneda-Paz, JL; Ahnert, SE; Kay, SA; Hazen, SP; Brady, SM

    2014-01-01

    Summary: The plant cell wall is an important factor for determining cell shape, function and response to the environment. Secondary cell walls, such as those found in xylem, are composed of cellulose, hemicelluloses and lignin and account for the bulk of plant biomass. The coordination between transcriptional regulation of synthesis for each polymer is complex and vital to cell function. A regulatory hierarchy of developmental switches has been proposed, although the full complement of regulators remains unknown. Here, we present a protein-DNA network between Arabidopsis transcription factors and secondary cell wall metabolic genes with gene expression regulated by a series of feed-forward loops. This model allowed us to develop and validate new hypotheses about secondary wall gene regulation under abiotic stress. Distinct stresses are able to perturb targeted genes to potentially promote functional adaptation. These interactions will serve as a foundation for understanding the regulation of a complex, integral plant component. PMID:25533953

  18. Single-step generation of rabbits carrying a targeted allele of the tyrosinase gene using CRISPR/Cas9

    PubMed Central

    Honda, Arata; Hirose, Michiko; Sankai, Tadashi; Yasmin, Lubna; Yuzawa, Kazuaki; Honsho, Kimiko; Izu, Haruna; Iguchi, Atsushi; Ikawa, Masahito; Ogura, Atsuo

    2014-01-01

    Targeted genome editing of nonrodent mammalian species has provided the potential for highly accurate interventions into gene function in humans and the generation of useful animal models of human diseases. Here we show successful clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (Cas)-mediated gene targeting via circular plasmid injection in rabbits. The rabbit tyrosinase gene (TYR) was effectively disrupted, and we confirmed germline transmission by pronuclear injection of a circular plasmid expressing humanized Cas9 (hCas9) and single-guide RNA. Direct injection into pronuclear stage zygotes was possible following an in vitro validation assay. Neither off-target mutagenesis nor hCas9 transgenesis was detected in any of the genetically targeted pups and embryos examined. Gene targeting with this rapid and simplified strategy will help accelerate the development of translational research using other nonrodent mammalian species. PMID:25195632

  19. Effect of Temperature on Synthetic Positive and Negative Feedback Gene Networks

    NASA Astrophysics Data System (ADS)

    Charlebois, Daniel A.; Marshall, Sylvia; Balazsi, Gabor

    Synthetic biological systems are built and tested under well-controlled laboratory conditions. How altering the environment, such as the ambient temperature, affects their function is not well understood. To address this question for synthetic gene networks with positive and negative feedback, we used mathematical modeling coupled with experiments in the budding yeast Saccharomyces cerevisiae. We found that cellular growth rates and gene expression dose responses change significantly at temperatures above and below the physiological optimum for yeast. Gene expression distributions for the negative feedback-based circuit changed from unimodal to bimodal at high temperature, while the bifurcation point of the positive feedback circuit shifted up with temperature. These results demonstrate that synthetic gene network function is context-dependent. Temperature effects should thus be tested and incorporated into their design and validation for real-world applications. NSERC Postdoctoral Fellowship (Grant No. PDF-453977-2014).

  20. The emergence of overlapping scale-free genetic architecture in digital organisms.

    PubMed

    Gerlee, P; Lundh, T

    2008-01-01

    We have studied the evolution of genetic architecture in digital organisms and found that the gene overlap follows a scale-free distribution, which is commonly found in metabolic networks of many organisms. Our results show that the slope of the scale-free distribution depends on the mutation rate and that the gene development is driven by expansion of already existing genes, which is in direct correspondence to the preferential growth algorithm that gives rise to scale-free networks. To further validate our results we have constructed a simple model of gene development, which recapitulates the results from the evolutionary process and shows that the mutation rate affects the tendency of genes to cluster. In addition we could relate the slope of the scale-free distribution to the genetic complexity of the organisms and show that a high mutation rate gives rise to a more complex genetic architecture.

  1. confFuse: High-Confidence Fusion Gene Detection across Tumor Entities.

    PubMed

    Huang, Zhiqin; Jones, David T W; Wu, Yonghe; Lichter, Peter; Zapatka, Marc

    2017-01-01

    Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied in fusion gene detection for the last several years, and a number of NGS-based tools have been developed for identifying fusion genes during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), making it hard to prioritize candidates for experimental validation and further analysis. Selection of reliable fusion genes for downstream analysis becomes very important in cancer research. We therefore developed confFuse, a scoring algorithm to reliably select high-confidence fusion genes which are likely to be biologically relevant. Results: confFuse takes multiple parameters into account in order to assign each fusion candidate a confidence score, of which score ≥8 indicates high-confidence fusion gene predictions. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method can significantly reduce the number of fusion candidates (301 high-confidence from 8,083 total predicted fusion genes) and keep high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: confFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validations. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.

  2. Systems Nutrigenomics Reveals Brain Gene Networks Linking Metabolic and Brain Disorders.

    PubMed

    Meng, Qingying; Ying, Zhe; Noble, Emily; Zhao, Yuqi; Agrawal, Rahul; Mikhail, Andrew; Zhuang, Yumei; Tyagi, Ethika; Zhang, Qing; Lee, Jae-Hyung; Morselli, Marco; Orozco, Luz; Guo, Weilong; Kilts, Tina M; Zhu, Jun; Zhang, Bin; Pellegrini, Matteo; Xiao, Xinshu; Young, Marian F; Gomez-Pinilla, Fernando; Yang, Xia

    2016-05-01

    Nutrition plays a significant role in the increasing prevalence of metabolic and brain disorders. Here we employ systems nutrigenomics to scrutinize the genomic bases of nutrient-host interaction underlying disease predisposition or therapeutic potential. We conducted transcriptome and epigenome sequencing of hypothalamus (metabolic control) and hippocampus (cognitive processing) from a rodent model of fructose consumption, and identified significant reprogramming of DNA methylation, transcript abundance, alternative splicing, and gene networks governing cell metabolism, cell communication, inflammation, and neuronal signaling. These signals converged with genetic causal risks of metabolic, neurological, and psychiatric disorders revealed in humans. Gene network modeling uncovered the extracellular matrix genes Bgn and Fmod as main orchestrators of the effects of fructose, as validated using two knockout mouse models. We further demonstrate that an omega-3 fatty acid, DHA, reverses the genomic and network perturbations elicited by fructose, providing molecular support for nutritional interventions to counteract diet-induced metabolic and brain disorders. Our integrative approach complementing rodent and human studies supports the applicability of nutrigenomics principles to predict disease susceptibility and to guide personalized medicine. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  3. Validation of Suitable Reference Genes for Expression Normalization in Echinococcus spp. Larval Stages

    PubMed Central

    Espínola, Sergio Martin; Ferreira, Henrique Bunselmeyer; Zaha, Arnaldo

    2014-01-01

    In recent years, a significant amount of sequence data (both genomic and transcriptomic) for Echinococcus spp. has been published, thereby facilitating the analysis of genes expressed during a specific stage or involved in parasite development. To perform a suitable gene expression quantification analysis, the use of validated reference genes is strongly recommended. Thus, the aim of this work was to identify suitable reference genes to allow reliable expression normalization for genes of interest in Echinococcus granulosus sensu stricto (s.s.) (G1) and Echinococcus ortleppi upon induction of the early pre-adult development. Untreated protoscoleces (PS) and pepsin-treated protoscoleces (PSP) from E. granulosus s.s. (G1) and E. ortleppi metacestode were used. The gene expression stability of eleven candidate reference genes (βTUB, NDUFV2, RPL13, TBP, CYP-1, RPII, EF-1α, βACT-1, GAPDH, ETIF4A-III and MAPK3) was assessed using geNorm, Normfinder, and RefFinder. Our qPCR data showed a good correlation with the recently published RNA-seq data. Regarding expression stability, EF-1α and TBP were the most stable genes for both species. Interestingly, βACT-1 (the most commonly used reference gene), and GAPDH and ETIF4A-III (previously identified as housekeeping genes) did not behave stably in our assay conditions. We propose the use of EF-1α as a reference gene for studies involving gene expression analysis in both PS and PSP experimental conditions for E. granulosus s.s. and E. ortleppi. To demonstrate its applicability, EF-1α was used as a normalizer gene in the relative quantification of transcripts from genes coding for antigen B subunits. The same EF-1α reference gene may be used in studies with other Echinococcus sensu lato species. This report validates suitable reference genes for species of class Cestoda, phylum Platyhelminthes, thus providing a foundation for further validation in other epidemiologically important cestode species, such as those from the Taenia genus. PMID:25014071
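
    As a rough illustration of what using EF-1α as a normalizer gene means in a relative quantification, the sketch below applies the widely used 2^(-ΔΔCt) calculation. The abstract does not say which quantification formula the authors used, so the choice of method, the function name, and all Ct values here are assumptions made for illustration only. The calculation also assumes roughly 100% amplification efficiency for both genes.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Assumes ~100% PCR efficiency for both the target and the reference gene.
    """
    # Normalize the target to the reference gene within each condition.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two conditions.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (not from the study): a target gene measured in
# pepsin-treated (PSP) versus untreated (PS) protoscoleces, with EF-1a as
# the reference gene.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold_change)  # 4.0
```

    If the reference gene itself shifts between conditions, the same shift leaks into every fold change, which is why the stability ranking described above matters.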

  4. Validation of suitable reference genes for expression normalization in Echinococcus spp. larval stages.

    PubMed

    Espínola, Sergio Martin; Ferreira, Henrique Bunselmeyer; Zaha, Arnaldo

    2014-01-01

    In recent years, a significant amount of sequence data (both genomic and transcriptomic) for Echinococcus spp. has been published, thereby facilitating the analysis of genes expressed during a specific stage or involved in parasite development. To perform a suitable gene expression quantification analysis, the use of validated reference genes is strongly recommended. Thus, the aim of this work was to identify suitable reference genes to allow reliable expression normalization for genes of interest in Echinococcus granulosus sensu stricto (s.s.) (G1) and Echinococcus ortleppi upon induction of the early pre-adult development. Untreated protoscoleces (PS) and pepsin-treated protoscoleces (PSP) from E. granulosus s.s. (G1) and E. ortleppi metacestode were used. The gene expression stability of eleven candidate reference genes (βTUB, NDUFV2, RPL13, TBP, CYP-1, RPII, EF-1α, βACT-1, GAPDH, ETIF4A-III and MAPK3) was assessed using geNorm, Normfinder, and RefFinder. Our qPCR data showed a good correlation with the recently published RNA-seq data. Regarding expression stability, EF-1α and TBP were the most stable genes for both species. Interestingly, βACT-1 (the most commonly used reference gene), and GAPDH and ETIF4A-III (previously identified as housekeeping genes) did not behave stably in our assay conditions. We propose the use of EF-1α as a reference gene for studies involving gene expression analysis in both PS and PSP experimental conditions for E. granulosus s.s. and E. ortleppi. To demonstrate its applicability, EF-1α was used as a normalizer gene in the relative quantification of transcripts from genes coding for antigen B subunits. The same EF-1α reference gene may be used in studies with other Echinococcus sensu lato species. This report validates suitable reference genes for species of class Cestoda, phylum Platyhelminthes, thus providing a foundation for further validation in other epidemiologically important cestode species, such as those from the Taenia genus.

  5. Identification and validation of biomarkers of IgV(H) mutation status in chronic lymphocytic leukemia using microfluidics quantitative real-time polymerase chain reaction technology.

    PubMed

    Abruzzo, Lynne V; Barron, Lynn L; Anderson, Keith; Newman, Rachel J; Wierda, William G; O'brien, Susan; Ferrajoli, Alessandra; Luthra, Madan; Talwalkar, Sameer; Luthra, Rajyalakshmi; Jones, Dan; Keating, Michael J; Coombes, Kevin R

    2007-09-01

    To develop a model incorporating relevant prognostic biomarkers for untreated chronic lymphocytic leukemia patients, we re-analyzed the raw data from four published gene expression profiling studies. We selected 88 candidate biomarkers linked to immunoglobulin heavy-chain variable region gene (IgV(H)) mutation status and produced a reliable and reproducible microfluidics quantitative real-time polymerase chain reaction array. We applied this array to a training set of 29 purified samples from previously untreated patients. In an unsupervised analysis, the samples clustered into two groups. Using a cutoff point of 2% homology to the germline IgV(H) sequence, one group contained all 14 IgV(H)-unmutated samples; the other contained all 15 mutated samples. We confirmed the differential expression of 37 of the candidate biomarkers using two-sample t-tests. Next, we constructed 16 different models to predict IgV(H) mutation status and evaluated their performance on an independent test set of 20 new samples. Nine models correctly classified 11 of 11 IgV(H)-mutated cases and eight of nine IgV(H)-unmutated cases, with some models using three to seven genes. Thus, we can classify cases with 95% accuracy based on the expression of as few as three genes.
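
    The 95% figure follows directly from the test-set counts given in the abstract. The short check below reproduces that arithmetic; the variable names are illustrative, and only the four counts come from the text.

```python
# Counts reported in the abstract for the nine best models on the test set.
mutated_correct, mutated_total = 11, 11
unmutated_correct, unmutated_total = 8, 9

total = mutated_total + unmutated_total          # 20 test samples
correct = mutated_correct + unmutated_correct    # 19 correct calls

accuracy = correct / total
# Fraction of each class that was classified correctly.
recall_mutated = mutated_correct / mutated_total
recall_unmutated = unmutated_correct / unmutated_total

print(f"accuracy = {accuracy:.2f}")                  # accuracy = 0.95
print(f"recall (mutated) = {recall_mutated:.2f}")    # recall (mutated) = 1.00
print(f"recall (unmutated) = {recall_unmutated:.2f}")  # recall (unmutated) = 0.89
```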

  6. Genomic Models of Short-Term Exposure Accurately Predict Long-Term Chemical Carcinogenicity and Identify Putative Mechanisms of Action

    PubMed Central

    Gusenleitner, Daniel; Auerbach, Scott S.; Melia, Tisha; Gómez, Harold F.; Sherr, David H.; Monti, Stefano

    2014-01-01

    Background: Despite an overall decrease in incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to test carcinogenic chemicals adopt the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to chemical compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used in the prediction of chemical carcinogenicity. Results: In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds, including genotoxic and non-genotoxic carcinogens. We built a classifier that predicts a chemical's carcinogenic potential with an AUC of 0.78, and validated it on an independent dataset from the Japanese Toxicogenomics Project consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors. Conclusion: Here, we validate a toxicogenomic approach to predict carcinogenicity and provide strong evidence that, with a larger set of compounds, we should be able to improve the sensitivity and specificity of the predictions. We found that the prediction of carcinogenicity is tissue-dependent and that the results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the aryl hydrocarbon receptor, and regenerative pathology in the response to carcinogen exposure. PMID:25058030

  7. Genome-Wide Expression Profiling of Five Mouse Models Identifies Similarities and Differences with Human Psoriasis

    PubMed Central

    Swindell, William R.; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P.; Voorhees, John J.; Elder, James T.; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P.; DiGiovanni, John; Pittelkow, Mark R.; Ward, Nicole L.; Gudjonsson, Johann E.

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis. PMID:21483750

  8. Identification of reference genes and validation for gene expression studies in diverse axolotl (Ambystoma mexicanum) tissues.

    PubMed

    Guelke, Eileen; Bucan, Vesna; Liebsch, Christina; Lazaridis, Andrea; Radtke, Christine; Vogt, Peter M; Reimers, Kerstin

    2015-04-10

    Precise quantitative RT-PCR normalization requires a set of valid reference genes. The experimental conditions must also be taken into account, as they bias the regulation of reference genes. Until now, no reference targets have been described for the axolotl (Ambystoma mexicanum). In a search of the public database SalSite for genetic information on the axolotl, we identified fourteen presumptive reference genes, eleven of which were further tested for their gene expression stability. This study characterizes the expression patterns of 11 putative endogenous control genes during axolotl limb regeneration and in an axolotl tissue panel. All 11 reference genes showed variable expression. Strikingly, ACTB was found to be the most stably expressed in all comparative tissue groups, so we consider it suitable for all kinds of axolotl tissue-type investigations. We also suggest GAPDH and RPLP0 as suitable for certain axolotl tissue analyses. For axolotl limb regeneration, a validated pair of reference genes is ODC and RPLP0. With these findings, new insights into axolotl gene expression profiling might be gained. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.

    PubMed

    Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K

    2017-11-01

    Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Translational Profiles of Medullary Myofibroblasts during Kidney Fibrosis

    PubMed Central

    Grgic, Ivica; Krautzberger, A. Michaela; Hofmeister, Andreas; Lalli, Matthew; DiRocco, Derek P.; Fleig, Susanne V.; Liu, Jing; Duffield, Jeremy S.; McMahon, Andrew P.; Aronow, Bruce

    2014-01-01

    Myofibroblasts secrete matrix during chronic injury, and their ablation ameliorates fibrosis. Development of new biomarkers and therapies for CKD will be aided by a detailed analysis of myofibroblast gene expression during the early stages of fibrosis. However, dissociating myofibroblasts from fibrotic kidney is challenging. We therefore adapted translational ribosome affinity purification (TRAP) to isolate and profile mRNA from myofibroblasts and their precursors during kidney fibrosis. We generated and characterized a transgenic mouse expressing an enhanced green fluorescent protein (eGFP)–tagged L10a ribosomal subunit protein under control of the collagen1α1 promoter. We developed a one-step procedure for isolation of polysomal RNA from collagen1α1-eGFPL10a mice subject to unilateral ureteral obstruction and analyzed and validated the resulting transcriptional profiles. Pathway analysis revealed strong gene signatures for cell proliferation, migration, and shape change. Numerous novel genes and candidate biomarkers were upregulated during fibrosis, specifically in myofibroblasts, and we validated these results by quantitative PCR, in situ, and Western blot analysis. This study provides a comprehensive analysis of early myofibroblast gene expression during kidney fibrosis and introduces a new technique for cell-specific polysomal mRNA isolation in kidney injury models that is suited for RNA-sequencing technologies. PMID:24652793

  11. Action of multiple intra-QTL genes concerted around a co-localized transcription factor underpins a large effect QTL

    PubMed Central

    Dixit, Shalabh; Kumar Biswal, Akshaya; Min, Aye; Henry, Amelia; Oane, Rowena H.; Raorane, Manish L.; Longkumer, Toshisangba; Pabuayon, Isaiah M.; Mutte, Sumanth K.; Vardarajan, Adithi R.; Miro, Berta; Govindan, Ganesan; Albano-Enriquez, Blesilda; Pueffeld, Mandy; Sreenivasulu, Nese; Slamet-Loedin, Inez; Sundarvelpandian, Kalaipandian; Tsai, Yuan-Ching; Raghuvanshi, Saurabh; Hsing, Yue-Ie C.; Kumar, Arvind; Kohli, Ajay

    2015-01-01

    Sub-QTLs and multiple intra-QTL genes are hypothesized to underpin large-effect QTLs. Known QTLs over gene families, biosynthetic pathways or certain traits represent functional gene-clusters of genes of the same gene ontology (GO). Gene-clusters containing genes of different GO have not been elaborated, except in silico as coexpressed genes within QTLs. Here we demonstrate the requirement of multiple intra-QTL genes for the full impact of QTL qDTY12.1 on rice yield under drought. Multiple lines of evidence are presented for the need of the transcription factor ‘no apical meristem’ (OsNAM12.1) and its co-localized target genes of separate GO categories for qDTY12.1 function, raising a regulon-like model of genetic architecture. The molecular underpinnings of qDTY12.1 support its effectiveness in further improving a drought tolerant genotype and for its validity in multiple genotypes/ecosystems/environments. Resolving the combinatorial value of OsNAM12.1 with individual intra-QTL genes notwithstanding, identification and analyses of qDTY12.1 has fast-tracked rice improvement towards food security. PMID:26507552

  12. Validation of predictive models for germline mutations in DNA mismatch repair genes in colorectal cancer.

    PubMed

    Monzon, Jose G; Cremin, Carol; Armstrong, Linlea; Nuk, Jennifer; Young, Sean; Horsman, Doug E; Garbutt, Kristy; Bajdik, Chris D; Gill, Sharlene

    2010-02-15

    Lynch syndrome is defined by the presence of germline mutations in mismatch repair (MMR) genes. Several models have been recently devised that predict mutation carrier status (Myriad Genetics, Wijnen, Barnetson, PREMM and MMRpro models). Families at moderate-high risk for harboring a Lynch-associated mutation, referred to the BC Cancer Agency (BCCA) Hereditary Cancer Program (HCP), underwent mutation analysis, immunohistochemistry and/or microsatellite testing. Seventy-two tested cases were included. Twenty-five patients were mutation positive (34.7%) and 47 were mutation negative (65.3%). Nineteen of 43 patients who were both microsatellite stable and normal on immunohistochemistry for MLH1 and MSH2 were also genotyped for mutations in these genes; all 19 were negative for MMR gene mutations. Model-derived probabilities of harboring an MMR gene mutation in the proband were calculated and compared to observed results. The areas under the ROC curves were 0.75 (95% CI: 0.63-0.87), 0.86 (0.7-0.96), 0.89 (0.82-0.97), 0.89 (0.81-0.98) and 0.93 (0.86-0.99) for the Myriad, Barnetson, Wijnen, MMRpro and PREMM models, respectively. The Amsterdam II criteria had a sensitivity and specificity of 0.76 and 0.74, respectively, in this cohort. The PREMM model demonstrated the best performance for predicting carrier status based on the positive likelihood ratios at the >10%, >20% and >30% probability thresholds. In this referred cohort, the PREMM model had the most favorable concordance index and predictive performance for carrier status based on the positive LR. These prediction models (PREMM, MMRPro and Wijnen) may soon replace the Amsterdam II and revised Bethesda criteria as a prescreening tool for Lynch mutations.
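
    The abstract compares models by their positive likelihood ratio (LR) but does not give the model-specific sensitivity and specificity at each threshold. As a hedged illustration of the quantity itself, the sketch below applies the standard definitions to the one sensitivity and specificity pair the abstract does report, for the Amsterdam II criteria. The function name is illustrative, and the resulting values are derived here, not reported by the authors.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Amsterdam II criteria in this cohort, as reported in the abstract.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.76, specificity=0.74)
print(round(lr_pos, 2), round(lr_neg, 2))  # 2.92 0.32
```

    A larger LR+ means a positive result shifts the odds of carrying a mutation more strongly, which is the sense in which the models are ranked above.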

  13. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    PubMed

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  14. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    PubMed Central

    Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-01-01

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analysis transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes. PMID:25806041

  15. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    DOE PAGES

    Seaver, Samuel M.D.; Bradbury, Louis M.T.; Frelin, Océane; ...

    2015-03-10

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analysis transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.

  16. Endothelial pro-atherosclerotic response to extracellular diabetic-like environment: possible role of thioredoxin-interacting protein.

    PubMed

    Zitman-Gal, Tali; Green, Janice; Pasmanik-Chor, Metsada; Oron-Karni, Varda; Bernheim, Jacques

    2010-07-01

    BACKGROUND. High blood and tissue concentrations of glucose and advanced glycation end-products (AGEs) are thought to play an important role in the development of vascular diabetic complications. Therefore, the impact of extracellular AGEs and different glucose concentrations was evaluated by studying the gene expressions and the underlying cellular pathways involved in the development of inflammatory pro-atherosclerotic processes observed in cultured endothelial cells. METHODS. Fresh human umbilical vein cord endothelial cells (HUVEC) were treated in the presence of elevated extracellular glucose concentrations (5.5-28 mmol/l) with and without AGE-human serum albumin (HSA). Affymetrix GeneChip(R) Human Gene 1.0 ST arrays were used for gene expression analysis (total 20 chips). Genes of interest were further validated using real-time PCR and western blot techniques. RESULTS. Microarray analysis revealed significant changes in some gene expressions in the presence of the different stimuli, suggesting that different pathways are involved. Six genes were selected for validation as follows: thioredoxin-interacting protein (TXNIP), thioredoxin (TXN), nuclear factor of kappa B (NF-kappaB), interleukin 6 (IL6), interleukin 8 (IL8) and receptor of advanced glycation end-products (RAGE). Interestingly, it was found that the association of AGEs together with the highest pathophysiological concentration of glucose (28 mmol/l) diminished the expression of these specific genes, excluding TXN. CONCLUSIONS. In the present model that mimics a diabetic environment, the relatively short-term experimental conditions used showed an unexpected blunting action of AGEs in the presence of the highest glucose concentration (28 mmol/l). The interactive cellular pathways involved in these processes should be further investigated.

  17. Immunological network analysis in HPV associated head and neck squamous cancer and implications for disease prognosis.

    PubMed

    Chen, Xiaohang; Yan, Bingqing; Lou, Huihuang; Shen, Zhenji; Tong, Fangjia; Zhai, Aixia; Wei, Lanlan; Zhang, Fengmin

    2018-04-01

    Human papillomavirus-positive (HPV+) head and neck squamous cell cancer (HNSCC) exhibits a better prognosis than HPV-negative (HPV-) HNSCC. This difference may in part be due to enhanced immune activation in the HPV+ HNSCC tumor microenvironment. To characterize differences in immune activation between HPV+ and HPV- HNSCC tumors, we identified and annotated differentially expressed genes based upon mRNA expression data from The Cancer Genome Atlas (TCGA). Immune network between immune cells and cytokines was constructed by using single sample Gene Set Enrichment Analysis and conditional mutual information. Multivariate Cox regression analysis was used to determine the prognostic value of immune microenvironment characterization. A total of 1673 differentially expressed genes were functionally annotated. We found that genes upregulated in HPV+ HNSCC are enriched in immune-associated processes. And the up-regulated gene sets were validated by Gene Set Enrichment Analysis. The microenvironment of HPV+ HNSCC exhibited greater numbers of infiltrating B and T cells and fewer neutrophils than HPV- HNSCC. These findings were validated by two independent datasets in the Gene Expression Omnibus (GEO) database. Further analyses of T cell subtypes revealed that cytotoxic T cell subtypes predominated in HPV+ HNSCC. In addition, the ratio of M1/M2 macrophages was much higher in HPV+ HNSCC. The infiltration of these immune cells was correlated with differentially expressed cytokine-associated genes. Enhanced infiltration of B cells and CD8+ T cells were identified as independent protective factors, while high neutrophil infiltration was a risk enhancing factor for HPV+ HNSCC patients. A schematic model of immunological network was established for HPV+ HNSCC to summarize our findings. Copyright © 2018 Elsevier Ltd. All rights reserved.

  18. Whole Blood Gene Expression Profile Associated with Spontaneous Preterm Birth in Women with Threatened Preterm Labor

    PubMed Central

    Heng, Yujing Jan; Pennell, Craig Edward; Chua, Hon Nian; Perkins, Jonathan Edward; Lye, Stephen James

    2014-01-01

    Threatened preterm labor (TPTL) is defined as persistent premature uterine contractions between 20 and 37 weeks of gestation and is the most common condition that requires hospitalization during pregnancy. Most of these TPTL women continue their pregnancies to term while only an estimated 5% will deliver a premature baby within ten days. The aim of this work was to study differential whole blood gene expression associated with spontaneous preterm birth (sPTB) within 48 hours of hospital admission. Peripheral blood was collected at point of hospital admission from 154 women with TPTL before any medical treatment. Microarrays were utilized to investigate differential whole blood gene expression between TPTL women who did (n = 48) or did not have a sPTB (n = 106) within 48 hours of admission. Total leukocyte and neutrophil counts were significantly higher (35% and 41% respectively) in women who had sPTB than women who did not deliver within 48 hours (p<0.001). Fetal fibronectin (fFN) test was performed on 62 women. There was no difference in the urine, vaginal and placental microbiology and histopathology reports between the two groups of women. There were 469 significant differentially expressed genes (FDR<0.05); 28 differentially expressed genes were chosen for microarray validation using qRT-PCR and 20 out of 28 genes were successfully validated (p<0.05). An optimal random forest classifier model to predict sPTB was achieved using the top nine differentially expressed genes coupled with peripheral clinical blood data (sensitivity 70.8%, specificity 75.5%). These differentially expressed genes may further elucidate the underlying mechanisms of sPTB and pave the way for future systems biology studies to predict sPTB. PMID:24828675

  19. A polynomial based model for cell fate prediction in human diseases.

    PubMed

    Ma, Lichun; Zheng, Jie

    2017-12-21

    Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, within both the two considered gene selection methods, the prediction accuracies of polynomials of different degrees show little differences. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than others. When comparing the linear polynomials based on the two gene selection methods, it shows that although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieves 86.62%), the one within genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.
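
    To make the evaluation procedure concrete, here is a minimal sketch of 10-fold cross-validation around a degree-1 polynomial. It is not the authors' model: it uses a single synthetic feature, fits the line by ordinary least squares, and thresholds the prediction at 0.5, all of which are assumptions chosen to keep the example small. Function names and data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a degree-1 polynomial)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_accuracy(xs, labels, k=10):
    """Mean held-out accuracy of the thresholded linear fit over k folds."""
    folds = k_fold_indices(len(xs), k)
    fold_accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, then score the held-out fold.
        a, b = fit_line([xs[i] for i in train], [labels[i] for i in train])
        hits = sum((a + b * xs[i] >= 0.5) == (labels[i] == 1)
                   for i in held_out)
        fold_accuracies.append(hits / len(held_out))
    return sum(fold_accuracies) / k


# Synthetic toy data (not from the study): one expression-like feature
# whose mean is shifted by 2 units for the positive class.
rng = random.Random(42)
labels = [i % 2 for i in range(100)]
xs = [rng.gauss(2.0 * label, 1.0) for label in labels]
print(cross_validated_accuracy(xs, labels))
```

    Each sample is held out exactly once, so the reported accuracy is never computed on data the fit has already seen.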

  20. Prospecting for pig single nucleotide polymorphisms in the human genome: have we struck gold?

    PubMed

    Grapes, L; Rudd, S; Fernando, R L; Megy, K; Rocha, D; Rothschild, M F

    2006-06-01

    Gene-to-gene variation in the frequency of single nucleotide polymorphisms (SNPs) has been observed in humans, mice, rats, primates and pigs, but a relationship across species in this variation has not been described. Here, the frequency of porcine coding SNPs (cSNPs) identified by in silico methods, and the frequency of murine cSNPs, were compared with the frequency of human cSNPs across homologous genes. From 150,000 porcine expressed sequence tag (EST) sequences, a total of 452 SNP-containing sequence clusters were found, totalling 1394 putative SNPs. All the clustered porcine EST annotations and SNP data have been made publicly available at http://sputnik.btk.fi/project?name=swine. Human and murine cSNPs were identified from dbSNP and were characterized as either validated or total number of cSNPs (validated plus non-validated) for comparison purposes. The correlation between in silico pig cSNP and validated human cSNP densities was found to be 0.77 (p < 0.00001) for a set of 25 homologous genes, while a correlation of 0.48 (p < 0.0005) was found for a primarily random sample of 50 homologous human and mouse genes. This is the first evidence of conserved gene-to-gene variability in cSNP frequency across species and indicates that site-directed screening of porcine genes that are homologous to cSNP-rich human genes may rapidly advance cSNP discovery in pigs.

  1. Differential gene expression profiling of matched primary renal cell carcinoma and metastases reveals upregulation of extracellular matrix genes.

    PubMed

    Ho, T H; Serie, D J; Parasramka, M; Cheville, J C; Bot, B M; Tan, W; Wang, L; Joseph, R W; Hilton, T; Leibovich, B C; Parker, A S; Eckel-Passow, J E

    2017-03-01

    The majority of renal cell carcinoma (RCC) studies analyze primary tumors, and the corresponding results are extrapolated to metastatic RCC tumors. However, it is unknown if gene expression profiles from primary RCC tumors differ from patient-matched metastatic tumors. Thus, we sought to identify differentially expressed genes between patient-matched primary and metastatic RCC tumors in order to understand the molecular mechanisms underlying the development of RCC metastases. We compared gene expression profiles between patient-matched primary and metastatic RCC tumors using a two-stage design. First, we used Affymetrix microarrays on 15 pairs of primary RCC [14 clear cell RCC (ccRCC), 1 papillary] tumors and patient-matched pulmonary metastases. Second, we used a custom NanoString panel to validate seven candidate genes in an independent cohort of 114 ccRCC patients. Differential gene expression was evaluated using a mixed effect linear model; a random effect denoting patient was included to account for the paired data. Third, The Cancer Genome Atlas (TCGA) data were used to evaluate associations with metastasis-free and overall survival in primary ccRCC tumors. We identified and validated upregulation of seven genes functionally involved in the formation of the extracellular matrix (ECM): DCN, SLIT2, LUM, LAMA2, ADAMTS12, CEACAM6 and LMO3. In primary ccRCC, CEACAM6 and LUM were significantly associated with metastasis-free and overall survival (P < 0.01). We evaluated gene expression profiles using the largest set to date, to our knowledge, of patient-matched primary and metastatic ccRCC tumors and identified upregulation of ECM genes in metastases. Our study implicates upregulation of ECM genes as a critical molecular event leading to visceral, bone and soft tissue metastases in ccRCC. © The Author 2016. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved.

  2. Expression signature as a biomarker for prenatal diagnosis of trisomy 21.

    PubMed

    Volk, Marija; Maver, Aleš; Lovrečić, Luca; Juvan, Peter; Peterlin, Borut

    2013-01-01

    A universal biomarker panel with the potential to predict high-risk pregnancies or adverse pregnancy outcome does not exist. Transcriptome analysis is a powerful tool to capture differentially expressed genes (DEG), which can be used as a biomarker-diagnostic-predictive tool for various conditions in the prenatal setting. In search of a biomarker set for predicting high-risk pregnancies, we performed global expression profiling to find DEG in Ts21. Subsequently, we performed targeted validation and diagnostic performance evaluation on a larger group of case and control samples. Initially, transcriptomic profiles of 10 cultivated amniocyte samples with Ts21 and 9 with normal euploid constitution were determined using expression microarrays. Datasets from Ts21 transcriptomic studies from the GEO repository were incorporated. DEG were discovered using linear regression modelling and validated using RT-PCR quantification on an independent sample of 16 cases with Ts21 and 32 controls. Classification of Ts21 status based on expression profiling was performed using a supervised machine learning algorithm and evaluated using a leave-one-out cross-validation approach. Global gene expression profiling revealed significant expression changes between normal and Ts21 samples, which, in combination with data from previously performed Ts21 transcriptomic studies, were used to generate a multi-gene biomarker for Ts21 comprising 9 gene expression profiles. In addition to the biomarker's high performance in discriminating samples from global expression profiling, we were also able to show its discriminatory performance on a larger sample set 2, validated using an RT-PCR experiment (AUC=0.97), while its performance on data from previously published studies reached discriminatory AUC values of 1.00. Our results show that transcriptomic changes might potentially be used to discriminate trisomy of chromosome 21 in the prenatal setting. As expressional alterations reflect both causal and reactive cellular mechanisms, transcriptomic changes may thus have future potential in the diagnosis of a wide array of heterogeneous diseases that result from genetic disturbances.
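
    To make the leave-one-out evaluation and the AUC figures in this abstract concrete, here is a small sketch. It is not the study's classifier: the single expression value per sample, the nearest-class-mean score, the toy data, and the function names (`auc`, `loocv_scores`) are assumptions chosen only to keep the example short.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def loocv_scores(values, labels):
    """Score each sample using class means estimated from all other samples."""
    scores = []
    for i, x in enumerate(values):
        rest = [(v, y) for j, (v, y) in enumerate(zip(values, labels)) if j != i]
        case_vals = [v for v, y in rest if y == 1]
        ctrl_vals = [v for v, y in rest if y == 0]
        case_mean = sum(case_vals) / len(case_vals)
        ctrl_mean = sum(ctrl_vals) / len(ctrl_vals)
        # Higher score = closer to the case mean than to the control mean.
        scores.append(abs(x - ctrl_mean) - abs(x - case_mean))
    return scores


# Hypothetical expression values: 1 = Ts21 case, 0 = euploid control.
values = [2.1, 2.4, 1.9, 2.6, 1.0, 1.2, 0.8, 2.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
result = auc(loocv_scores(values, labels), labels)
print(f"LOOCV AUC: {result:.2f}")
```

    Because each sample is scored by a model that never saw it, the pooled AUC is less optimistic than one computed on the training data itself.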

  3. Reference Gene Validation for RT-qPCR, a Note on Different Available Software Packages

    PubMed Central

    De Spiegelaere, Ward; Dern-Wieloch, Jutta; Weigel, Roswitha; Schumacher, Valérie; Schorle, Hubert; Nettersheim, Daniel; Bergmann, Martin; Brehm, Ralph; Kliesch, Sabine; Vandekerckhove, Linos; Fink, Cornelia

    2015-01-01

    Background: An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Applying human cells, we here compare the three main methods to the online available RefFinder tool that integrates these algorithms along with R-based software packages which include the NormFinder and GeNorm algorithms. Results: 14 candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account. Conclusions: This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use. PMID:25825906
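
    The role of PCR efficiency in stability ranking can be illustrated with a sketch of the geNorm-style stability measure M: for each gene, the average over all other genes of the standard deviation of the log2 expression ratios across samples. This is a simplified reading of the published measure, not code from geNorm, NormqPCR, or RefFinder. The Cq values, gene names, and the use of the sample standard deviation are assumptions; `efficiency=2.0` corresponds to the 100% efficiency case discussed in the abstract.

```python
import math
import statistics


def relative_quantities(cq, efficiency=2.0):
    """Convert Cq values to relative quantities: E ** (min Cq - Cq)."""
    best = min(cq)
    return [efficiency ** (best - c) for c in cq]


def genorm_m(quantities):
    """Map each gene to M, the mean pairwise variation against every other gene.

    quantities: dict of gene name -> relative quantities in the same sample order.
    """
    m = {}
    for gene_j, q_j in quantities.items():
        sds = []
        for gene_k, q_k in quantities.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(q_j, q_k)]
            sds.append(statistics.stdev(log_ratios))
        m[gene_j] = sum(sds) / len(sds)
    return m


# Hypothetical Cq values for three candidate genes across four samples.
cq = {
    "A": [20.0, 21.0, 20.5, 22.0],
    "B": [25.0, 26.0, 25.5, 27.0],  # tracks A with a constant offset
    "C": [18.0, 21.0, 17.5, 22.5],  # fluctuates independently
}
m = genorm_m({gene: relative_quantities(values) for gene, values in cq.items()})
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))
```

    Passing a measured per-gene efficiency (for example 1.9 instead of 2.0) to `relative_quantities` changes the log ratios and can change the ranking, which is the kind of discrepancy the abstract attributes to tools that work from raw Cq values.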

  4. Reference gene validation for RT-qPCR, a note on different available software packages.

    PubMed

    De Spiegelaere, Ward; Dern-Wieloch, Jutta; Weigel, Roswitha; Schumacher, Valérie; Schorle, Hubert; Nettersheim, Daniel; Bergmann, Martin; Brehm, Ralph; Kliesch, Sabine; Vandekerckhove, Linos; Fink, Cornelia

    2015-01-01

    An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Applying human cells, we here compare the three main methods to the online available RefFinder tool that integrates these algorithms along with R-based software packages which include the NormFinder and GeNorm algorithms. 14 candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account. This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use.

  5. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zuniga, Cristal; Li, Chien -Ting; Huelsman, Tyler

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.

  6. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions

    DOE PAGES

    Zuniga, Cristal; Li, Chien -Ting; Huelsman, Tyler; ...

    2016-07-02

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.

  7. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions.

    PubMed

    Zuñiga, Cristal; Li, Chien-Ting; Huelsman, Tyler; Levering, Jennifer; Zielinski, Daniel C; McConnell, Brian O; Long, Christopher P; Knoshaug, Eric P; Guarnieri, Michael T; Antoniewicz, Maciek R; Betenbaugh, Michael J; Zengler, Karsten

    2016-09-01

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. © 2016 American Society of Plant Biologists. All rights reserved.

  8. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions

    PubMed Central

    Zuñiga, Cristal; Li, Chien-Ting; Zielinski, Daniel C.; Guarnieri, Michael T.; Antoniewicz, Maciek R.; Zengler, Karsten

    2016-01-01

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. PMID:27372244

  9. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.

  10. Genome-scale CRISPR-Cas9 knockout screening in human cells.

    PubMed

    Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelson, Tarjei; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng

    2014-01-03

    The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12, as well as novel hits NF2, CUL3, TADA2B, and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.

  11. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    PubMed Central

    Geiselman, Gina M; Ito, Masakazu; Mondo, Stephen J; Reilly, Morgann C; Cheng, Ya-Fang; Bauer, Stefan; Grigoriev, Igor V; Gladden, John M; Simmons, Blake A; Brem, Rachel B

    2018-01-01

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. These results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi. PMID:29521624

  12. Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides

    DOE PAGES

    Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.; ...

    2018-03-09

    The basidiomycete yeast Rhodosporidium toruloides (also known as Rhodotorula toruloides) accumulates high concentrations of lipids and carotenoids from diverse carbon sources. It has great potential as a model for the cellular biology of lipid droplets and for sustainable chemical production. We developed a method for high-throughput genetics (RB-TDNAseq), using sequence-barcoded Agrobacterium tumefaciens T-DNA insertions. We identified 1,337 putative essential genes with low T-DNA insertion rates. We functionally profiled genes required for fatty acid catabolism and lipid accumulation, validating results with 35 targeted deletion strains. We identified a high-confidence set of 150 genes affecting lipid accumulation, including genes with predicted function in signaling cascades, gene expression, protein modification and vesicular trafficking, autophagy, amino acid synthesis and tRNA modification, and genes of unknown function. Lastly, these results greatly advance our understanding of lipid metabolism in this oleaginous species and demonstrate a general approach for barcoded mutagenesis that should enable functional genomics in diverse fungi.

  13. Inference of Expanded Lrp-Like Feast/Famine Transcription Factor Targets in a Non-Model Organism Using Protein Structure-Based Prediction

    PubMed Central

    Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272

  14. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

    PubMed

    Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

    2014-01-01

    Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.

  15. miRWalk--database: prediction of possible miRNA binding sites by "walking" the genes of three genomes.

    PubMed

    Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert

    2011-10-01

    MicroRNAs are small, non-coding RNA molecules that can complementarily bind to the mRNA 3'-UTR region to regulate the gene expression by transcriptional repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites, information on all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed by using a newly developed algorithm named 'miRWalk' as well as with eight already established programs for putative miRNA binding sites. An automated and extensive text-mining search was performed on PubMed database to extract validated information on miRNAs. Combined information was put into a MySQL database. miRWalk presents predicted and validated information on miRNA-target interaction. Such a resource enables researchers to validate new targets of miRNA not only on 3'-UTR, but also on the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Methods to Study the Role of Progranulin in the Tumor Microenvironment.

    PubMed

    Elkabets, Moshe; Brook, Samuel

    2018-01-01

    Accurate measurement of progranulin (PGRN) in the circulation and in the tumor microenvironment is essential for understanding its role in cancer progression and metastasis. This chapter describes a number of approaches to measure the transcription level of the GRN gene and to detect and analyze PGRN expression in cancer cells and in the local environment of the tumor, in mouse and human samples. These validated protocols are utilized to investigate the functional role of PGRN in cancer. Finally, we discuss strategies to investigate the functions of PGRN in tumors using genetically modified mouse models and gene silencing techniques.

  17. Impact of interaction between the G870A and EFEMP1 gene polymorphism on glioma risk in Chinese Han population.

    PubMed

    Yang, Libin; Qu, Bo; Xia, Xun; Kuang, Yongqin; Li, Jian; Fan, Kexia; Guo, Heng; Zheng, Hui; Ma, Yuan

    2017-06-06

    We investigated the impact of CCND1 and EFEMP1 gene polymorphisms, their gene-gene interactions, and haplotypes within the EFEMP1 gene on glioma risk in a Chinese population. Logistic regression was performed to investigate the association between single-nucleotide polymorphisms (SNPs) and glioma risk, and generalized multifactor dimensionality reduction (GMDR) was used to analyze the gene-gene interaction. Glioma risks were higher in carriers of the homozygous mutant of rs603965 within the CCND1 gene, and of rs1346787 and rs3791679 in the EFEMP1 gene, than in those with wild-type homozygotes; ORs (95% CI) were 1.67 (1.23-2.02), 1.59 (1.25-2.01) and 1.42 (1.15-1.82), respectively. GMDR analysis indicated a significant two-locus model (p=0.0010) involving rs603965 within the CCND1 gene and rs1346787 within the EFEMP1 gene. Overall, the cross-validation consistency of the two-locus model was 10/10, and the testing accuracy was 60.17%. Participants with the rs603965 GA or AA and rs1346787 AG or GG genotypes had the highest glioma risk compared to participants with the rs603965 GG and rs1346787 AA genotypes; the OR (95% CI) was 3.65 (1.81-5.22). We conducted haplotype analysis for rs1346787 and rs3791679 because the D' value between rs1346787 and rs3791679 was more than 0.8. The most common haplotype was rs1346787-A and rs3791679-G, with frequencies of 0.4905 and 0.4428 in the case and control groups, respectively. Polymorphisms in rs603965 within the CCND1 gene and rs1346787 within the EFEMP1 gene, and their gene-gene interaction, were associated with increased glioma risk.
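
    For readers unfamiliar with the OR (95% CI) notation used in this abstract, the following sketch computes an unadjusted odds ratio and its Woolf (log-scale) confidence interval from a 2x2 table. The study itself reports estimates from logistic regression, which may be covariate-adjusted, so this is only an approximation of the idea. The genotype counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: carrier cases, b: carrier controls,
    c: non-carrier cases, d: non-carrier controls.
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

    An interval whose lower bound stays above 1 is what supports describing a genotype as associated with increased risk.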

  18. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM

    PubMed Central

    Hood, Heather M.; Ocasio, Linda R.; Sachs, Matthew S.; Galagan, James E.

    2013-01-01

    The filamentous fungus Neurospora crassa played a central role in the development of twentieth-century genetics, biochemistry and molecular biology, and continues to serve as a model organism for eukaryotic biology. Here, we have reconstructed a genome-scale model of its metabolism. This model consists of 836 metabolic genes, 257 pathways, 6 cellular compartments, and is supported by extensive manual curation of 491 literature citations. To aid our reconstruction, we developed three optimization-based algorithms, which together comprise Fast Automated Reconstruction of Metabolism (FARM). These algorithms are: LInear MEtabolite Dilution Flux Balance Analysis (limed-FBA), which predicts flux while linearly accounting for metabolite dilution; One-step functional Pruning (OnePrune), which removes blocked reactions with a single compact linear program; and Consistent Reproduction Of growth/no-growth Phenotype (CROP), which reconciles differences between in silico and experimental gene essentiality faster than previous approaches. Against an independent test set of more than 300 essential/non-essential genes that were not used to train the model, the model displays 93% sensitivity and specificity. We also used the model to simulate the biochemical genetics experiments originally performed on Neurospora by comprehensively predicting nutrient rescue of essential genes and synthetic lethal interactions, and we provide detailed pathway-based mechanistic explanations of our predictions. Our model provides a reliable computational framework for the integration and interpretation of ongoing experimental efforts in Neurospora, and we anticipate that our methods will substantially reduce the manual effort required to develop high-quality genome-scale metabolic models for other organisms. PMID:23935467
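
    The sensitivity and specificity reported against the independent gene essentiality test set can be computed as in the sketch below. Treating "essential" as the positive class is an assumption, and the prediction and observation lists are made up for illustration; the function name `sensitivity_specificity` is likewise hypothetical.

```python
def sensitivity_specificity(predicted, observed):
    """Compare predicted and observed essentiality (True = essential).

    Sensitivity is the fraction of observed essential genes predicted essential.
    Specificity is the fraction of observed non-essential genes predicted non-essential.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)
    tn = sum((not p) and (not o) for p, o in pairs)
    fp = sum(p and (not o) for p, o in pairs)
    fn = sum((not p) and o for p, o in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical in silico predictions versus experimental calls for five genes.
predicted = [True, True, False, False, True]
observed = [True, False, False, False, True]
sens, spec = sensitivity_specificity(predicted, observed)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```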

  19. Microarray analysis in rat liver slices correctly predicts in vivo hepatotoxicity.

    PubMed

    Elferink, M G L; Olinga, P; Draaisma, A L; Merema, M T; Bauerschmidt, S; Polman, J; Schoonen, W G; Groothuis, G M M

    2008-06-15

    The microarray technology, developed for the simultaneous analysis of a large number of genes, may be useful for the detection of toxicity in an early stage of the development of new drugs. The effect of different hepatotoxins was analyzed at the gene expression level in the rat liver both in vivo and in vitro. As in vitro model system the precision-cut liver slice model was used, in which all liver cell types are present in their natural architecture. This is important since drug-induced toxicity often is a multi-cellular process involving not only hepatocytes but also other cell types such as Kupffer and stellate cells. As model toxic compounds lipopolysaccharide (LPS, inducing inflammation), paracetamol (necrosis), carbon tetrachloride (CCl(4), fibrosis and necrosis) and gliotoxin (apoptosis) were used. The aim of this study was to validate the rat liver slice system as in vitro model system for drug-induced toxicity studies. The results of the microarray studies show that the in vitro profiles of gene expression cluster per compound and incubation time, and when analyzed in a commercial gene expression database, can predict the toxicity and pathology observed in vivo. Each toxic compound induces a specific pattern of gene expression changes. In addition, some common genes were up- or down-regulated with all toxic compounds. These data show that the rat liver slice system can be an appropriate tool for the prediction of multi-cellular liver toxicity. The same experiments and analyses are currently performed for the prediction of human specific toxicity using human liver slices.

  20. Microarray analysis in rat liver slices correctly predicts in vivo hepatotoxicity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elferink, M.G.L.; Olinga, P.; Draaisma, A.L.

    2008-06-15

    The microarray technology, developed for the simultaneous analysis of a large number of genes, may be useful for the detection of toxicity in an early stage of the development of new drugs. The effect of different hepatotoxins was analyzed at the gene expression level in the rat liver both in vivo and in vitro. As in vitro model system the precision-cut liver slice model was used, in which all liver cell types are present in their natural architecture. This is important since drug-induced toxicity often is a multi-cellular process involving not only hepatocytes but also other cell types such as Kupffer and stellate cells. As model toxic compounds lipopolysaccharide (LPS, inducing inflammation), paracetamol (necrosis), carbon tetrachloride (CCl4, fibrosis and necrosis) and gliotoxin (apoptosis) were used. The aim of this study was to validate the rat liver slice system as in vitro model system for drug-induced toxicity studies. The results of the microarray studies show that the in vitro profiles of gene expression cluster per compound and incubation time, and when analyzed in a commercial gene expression database, can predict the toxicity and pathology observed in vivo. Each toxic compound induces a specific pattern of gene expression changes. In addition, some common genes were up- or down-regulated with all toxic compounds. These data show that the rat liver slice system can be an appropriate tool for the prediction of multi-cellular liver toxicity. The same experiments and analyses are currently performed for the prediction of human specific toxicity using human liver slices.

  1. Comparative analysis of cis-regulation following stroke and seizures in subspaces of conserved eigensystems

    PubMed Central

    2010-01-01

    Background It is often desirable to separate effects of different regulators on gene expression, or to identify effects of the same regulator across several systems. Here, we focus on the rat brain following stroke or seizures, and demonstrate how the two tasks can be approached simultaneously. Results We applied SVD to time-series gene expression datasets from the rat experimental models of stroke and seizures. We demonstrate conservation of two eigensystems, reflecting inflammation and/or apoptosis (eigensystem 2) and neuronal synaptic activity (eigensystem 3), between the stroke and seizures. We analyzed cis-regulation of gene expression in the subspaces of the conserved eigensystems. Bayesian networks analysis was performed separately for either experimental model, with cross-system validation of the highest-ranking features. In this way, we correctly re-discovered the role of AP1 in the regulation of apoptosis, and the involvement of Creb and Egr in the regulation of synaptic activity-related genes. We identified a novel antagonistic effect of the motif recognized by the nuclear matrix attachment region-binding protein Satb1 on AP1-driven transcriptional activation, suggesting a link between chromatin loop structure and gene activation by AP1. The effects of motifs binding Satb1 and Creb on gene expression in brain conform to the assumption of the linear response model of gene regulation. Our data also suggest that numerous enhancers of neuronal-specific genes are important for their responsiveness to the synaptic activity. Conclusion Eigensystems conserved between stroke and seizures separate effects of inflammation/apoptosis and neuronal synaptic activity, exerted by different transcription factors, on gene expression in rat brain. PMID:20565733

  2. A reported 20-gene expression signature to predict lymph node-positive disease at radical cystectomy for muscle-invasive bladder cancer is clinically not applicable.

    PubMed

    van Kessel, Kim E M; van de Werken, Harmen J G; Lurkin, Irene; Ziel-van der Made, Angelique C J; Zwarthoff, Ellen C; Boormans, Joost L

    2017-01-01

    Neoadjuvant chemotherapy (NAC) for muscle-invasive bladder cancer (MIBC) provides a small but significant survival benefit. Nevertheless, controversies on applying NAC remain because the limited benefit must be weighed against chemotherapy-related toxicity and the delay of definitive local treatment. Therefore, there is a clear clinical need for tools to guide treatment decisions on NAC in MIBC. Here, we aimed to validate a previously reported 20-gene expression signature that predicted lymph node-positive disease at radical cystectomy in clinically node-negative MIBC patients, which would be a justification for upfront chemotherapy. We studied diagnostic transurethral resection of bladder tumors (dTURBT) of 150 MIBC patients (urothelial carcinoma) who were subsequently treated by radical cystectomy and pelvic lymph node dissection. RNA was isolated and the expression level of the 20 genes was determined on a qRT-PCR platform. Normalized Ct values were used to calculate a risk score to predict the presence of node-positive disease. The Cancer Genome Atlas (TCGA) RNA expression data was analyzed to subsequently validate the results. In a univariate regression analysis, none of the 20 genes significantly correlated with node-positive disease. The area under the curve of the risk score calculated by the 20-gene expression signature was 0.54 (95% Confidence Interval: 0.44-0.65) versus 0.67 for the model published by Smith et al. Node-negative patients had a significantly lower tumor grade at TURBT (p = 0.03), a lower pT stage (p<0.01) and less frequent lymphovascular invasion (13% versus 38%, p<0.01) at radical cystectomy than node-positive patients. In addition, in the TCGA data, none of the 20 genes was differentially expressed in node-negative versus node-positive patients. We conclude that a 20-gene expression signature developed for nodal staging of MIBC at radical cystectomy could not be validated on a qRT-PCR platform in a large cohort of dTURBT specimens.
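
    To make the reported area under the curve concrete, the sketch below computes an AUC for a risk score using the standard pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case has the higher score, with ties counted as one half. The function name and the toy scores are hypothetical; this is not the authors' analysis code, and it does not reproduce their confidence interval.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative case.

    scores: risk scores (higher = more likely node-positive, by assumption).
    labels: truthy for node-positive, falsy for node-negative.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(roc_auc(toy_scores, toy_labels))
```

    An AUC near 0.5, such as the 0.54 reported for the 20-gene signature, means the score ranks node-positive and node-negative patients little better than chance.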

  3. Pervasive and opposing effects of Unpredictable Chronic Mild Stress (UCMS) on hippocampal gene expression in BALB/cJ and C57BL/6J mouse strains.

    PubMed

    Malki, Karim; Mineur, Yann S; Tosto, Maria Grazia; Campbell, James; Karia, Priya; Jumabhoy, Irfan; Sluyter, Frans; Crusio, Wim E; Schalkwyk, Leonard C

    2015-04-03

    BALB/cJ is a strain susceptible to stress and extremely susceptible to a defective hedonic impact in response to chronic stressors. The strain offers much promise as an animal model for the study of stress related disorders. We present a comparative hippocampal gene expression study on the effects of unpredictable chronic mild stress on BALB/cJ and C57BL/6J mice. Affymetrix MOE 430 was used to measure hippocampal gene expression from 16 animals of two different strains (BALB/cJ and C57BL/6J) of both sexes and subjected to either unpredictable chronic mild stress (UCMS) or no stress. Differences were statistically evaluated through supervised and unsupervised linear modelling and using Weighted Gene Coexpression Network Analysis (WGCNA). In order to gain further understanding into mechanisms related to stress response, we cross-validated our results with a parallel study from the GENDEP project using WGCNA in a meta-analysis design. The effects of UCMS are visible through Principal Component Analysis which highlights the stress sensitivity of the BALB/cJ strain. A number of genes and gene networks related to stress response were uncovered including the Creb1 gene. WGCNA and pathway analysis revealed a gene network centered on Nfkb1. Results from the meta-analysis revealed a highly significant gene pathway centred on the Ubiquitin C (Ubc) gene. All pathways uncovered are associated with inflammation and immune response. The study investigated the molecular mechanisms underlying the response to adverse environment in an animal model using a GxE design. Stress-related differences were visible at the genomic level through PCA analysis highlighting the high sensitivity of BALB/cJ animals to environmental stressors. Several candidate genes and gene networks reported are associated with inflammation and neurogenesis and could serve to inform candidate gene selection in human studies and provide additional insight into the pathology of Major Depressive Disorder.

  4. Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells

    PubMed Central

    de Luis Balaguer, Maria Angels; Fisher, Adam P.; Clark, Natalie M.; Fernandez-Espinosa, Maria Guadalupe; Möller, Barbara K.; Weijers, Dolf; Williams, Cranos; Lorenzo, Oscar; Sozzani, Rosangela

    2017-01-01

    Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is essential for understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we used spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation. To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators. Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identify candidate factors that function in the stem cells. PMID:28827319

  5. Dynamics of embryonic stem cell differentiation inferred from single-cell transcriptomics show a series of transitions through discrete cell states

    PubMed Central

    Jang, Sumin; Choubey, Sandeep; Furchtgott, Leon; Zou, Ling-Nan; Doyle, Adele; Menon, Vilas; Loew, Ethan B; Krostag, Anne-Rachel; Martinez, Refugio A; Madisen, Linda; Levi, Boaz P; Ramanathan, Sharad

    2017-01-01

    The complexity of gene regulatory networks that lead multipotent cells to acquire different cell fates makes a quantitative understanding of differentiation challenging. Using a statistical framework to analyze single-cell transcriptomics data, we infer the gene expression dynamics of early mouse embryonic stem (mES) cell differentiation, uncovering discrete transitions across nine cell states. We validate the predicted transitions across discrete states using flow cytometry. Moreover, using live-cell microscopy, we show that individual cells undergo abrupt transitions from a naïve to primed pluripotent state. Using the inferred discrete cell states to build a probabilistic model for the underlying gene regulatory network, we further predict and experimentally verify that these states have unique response to perturbations, thus defining them functionally. Our study provides a framework to infer the dynamics of differentiation from single cell transcriptomics data and to build predictive models of the gene regulatory networks that drive the sequence of cell fate decisions during development. DOI: http://dx.doi.org/10.7554/eLife.20487.001 PMID:28296635

  6. An integrated approach to reconstructing genome-scale transcriptional regulatory networks

    DOE PAGES

    Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.; ...

    2015-02-27

    Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.

  7. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    PubMed

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus, which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research shows that features obtained by homology mapping alone can achieve results as good as or even better than those from integrated features. Meanwhile, the work indicates that a machine learning-based method can assign more efficient weight coefficients than an empirical formula based on biological knowledge.
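
    The tenfold cross-validation mentioned above can be sketched as an index-splitting routine. This is a generic illustration under stated assumptions (samples are assigned to folds round-robin, without shuffling or stratification); the function name is hypothetical and the abstract does not say how the authors formed their folds.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are assigned to folds round-robin (sample i goes to fold i % k),
    so every sample is used for testing exactly once.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Example: 25 genes split into 10 folds; a model would be fit on `train`
# and scored (for example by AUC) on `test` in each round.
splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), test)
```

    In practice the per-fold scores are averaged, and shuffling or stratifying by class is common when essential genes are a small minority.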

  8. Development and Validation of Targeted Next-Generation Sequencing Panels for Detection of Germline Variants in Inherited Diseases.

    PubMed

    Santani, Avni; Murrell, Jill; Funke, Birgit; Yu, Zhenming; Hegde, Madhuri; Mao, Rong; Ferreira-Gonzalez, Andrea; Voelkerding, Karl V; Weck, Karen E

    2017-06-01

    The number of targeted next-generation sequencing (NGS) panels for genetic diseases offered by clinical laboratories is rapidly increasing. Before an NGS-based test is implemented in a clinical laboratory, appropriate validation studies are needed to determine the performance characteristics of the test. To provide examples of assay design and validation of targeted NGS gene panels for the detection of germline variants associated with inherited disorders. The approaches used by 2 clinical laboratories for the development and validation of targeted NGS gene panels are described. Important design and validation considerations are examined. Clinical laboratories must validate performance specifications of each test prior to implementation. Test design specifications and validation data are provided, outlining important steps in validation of targeted NGS panels by clinical diagnostic laboratories.

  9. Convergent functional genomic studies of ω-3 fatty acids in stress reactivity, bipolar disorder and alcoholism.

    PubMed

    Le-Niculescu, H; Case, N J; Hulvershorn, L; Patel, S D; Bowker, D; Gupta, J; Bell, R; Edenberg, H J; Tsuang, M T; Kuczenski, R; Geyer, M A; Rodd, Z A; Niculescu, A B

    2011-04-26

    Omega-3 fatty acids have been proposed as an adjuvant treatment option in psychiatric disorders. Given their other health benefits and their relative lack of toxicity, teratogenicity and side effects, they may be particularly useful in children and in females of child-bearing age, especially during pregnancy and postpartum. A comprehensive mechanistic understanding of their effects is needed. Here we report translational studies demonstrating the phenotypic normalization and gene expression effects of dietary omega-3 fatty acids, specifically docosahexaenoic acid (DHA), in a stress-reactive knockout mouse model of bipolar disorder and co-morbid alcoholism, using a bioinformatic convergent functional genomics approach integrating animal model and human data to prioritize disease-relevant genes. Additionally, to validate at a behavioral level the novel observed effects on decreasing alcohol consumption, we also tested the effects of DHA in an independent animal model, alcohol-preferring (P) rats, a well-established animal model of alcoholism. Our studies uncover sex differences, brain region-specific effects and blood biomarkers that may underpin the effects of DHA. Of note, DHA modulates some of the same genes targeted by current psychotropic medications, as well as increases myelin-related gene expression. Myelin-related gene expression decrease is a common, if nonspecific, denominator of neuropsychiatric disorders. In conclusion, our work supports the potential utility of omega-3 fatty acids, specifically DHA, for a spectrum of psychiatric disorders such as stress disorders, bipolar disorder, alcoholism and beyond.

  10. Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (Danio rerio)

    EPA Science Inventory

    Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of human biomedical science. Many such classifiers discovered thus far lack vigorous statistical and experimental validations, with their stability and rel...

  11. A statistical framework for biomedical literature mining.

    PubMed

    Chung, Dongjun; Lawson, Andrew; Zheng, W Jim

    2017-09-30

    In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identification of these new pathway-modulating genes not only promotes understanding of pathway regulation mechanisms but also allows identification of novel targets for therapeutics. Recently, biomedical literature has been considered as a valuable resource to investigate pathway-modulating genes. While the majority of currently available approaches are based on the co-occurrence of genes within an abstract, it has been reported that these approaches show only sub-optimal performances because 70% of abstracts contain information only for a single gene. To overcome such limitation, we propose a novel statistical framework based on the concept of ontology fingerprint that uses gene ontology to extract information from large biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpreting functions of these new genes. We also propose a computationally efficient posterior inference procedure based on Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd.

  12. A quantitative framework to evaluate modeling of cortical development by neural stem cells

    PubMed Central

    Stein, Jason L.; de la Torre-Ubieta, Luis; Tian, Yuan; Parikshak, Neelroop N.; Hernandez, Israel A.; Marchetto, Maria C.; Baker, Dylan K.; Lu, Daning; Hinman, Cassidy R.; Lowe, Jennifer K.; Wexler, Eric M.; Muotri, Alysson R.; Gage, Fred H.; Kosik, Kenneth S.; Geschwind, Daniel H.

    2014-01-01

    Summary Neural stem cells have been adopted to model a wide range of neuropsychiatric conditions in vitro. However, how well such models correspond to in vivo brain has not been evaluated in an unbiased, comprehensive manner. We used transcriptomic analyses to compare in vitro systems to developing human fetal brain and observed strong conservation of in vivo gene expression and network architecture in differentiating primary human neural progenitor cells (phNPCs). Conserved modules are enriched in genes associated with ASD, supporting the utility of phNPCs for studying neuropsychiatric disease. We also developed and validated a machine learning approach called CoNTExT that identifies the developmental maturity and regional identity of in vitro models. We observed strong differences between in vitro models, including hiPSC-derived neural progenitors from multiple laboratories. This work provides a systems biology framework for evaluating in vitro systems and supports their value in studying the molecular mechanisms of human neurodevelopmental disease. PMID:24991955

  13. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    PubMed

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  14. Construction and Quantitative Validation of Chicken CXCR4 Expression Reporter.

    PubMed

    Es-Haghi, Masoumeh; Bassami, Mohammadreza; Dehghani, Hesam

    2016-03-01

    Site directional migration is an important biological event and an essential behavior for latent migratory cells. A migratory cell maintains its motility, survival, and proliferation abilities by a network of signaling pathways where CXCR4/SDF signaling route plays crucial role for directed homing of a polarized cell. The chicken embryo due to its specific vasculature modality has been used as a valuable model for organogenesis, migration, cancer, and metastasis. In this research, the regulatory regions of chicken CXCR4 gene have been characterized in a chicken hematopoietic lymphoblast cell line (MSB1). A region extending from -2000 bp upstream of CXCR4 gene to +68 after its transcriptional start site, in addition to two other mutant fragments were constructed and cloned in a promoter-less reporter vector. Promoter activity was analyzed by quantitative real-time RT-PCR and flow cytometry techniques. Our findings show that the full sequence from -2000 to +68 bp of CXCR4 regulatory region is required for maximum promoter functionality, while the mutant CXCR4 promoter fragments show a partial promoter activity. The chicken CXCR4 promoter validated in this study could be used for characterization of directed migratory cells in chicken development and disease models.

  15. The Mechanism of Gene Targeting in Human Somatic Cells

    PubMed Central

    Kan, Yinan; Ruis, Brian; Lin, Sherry; Hendrickson, Eric A.

    2014-01-01

    Gene targeting in human somatic cells is of importance because it can be used to either delineate the loss-of-function phenotype of a gene or correct a mutated gene back to wild-type. Both of these outcomes require a form of DNA double-strand break (DSB) repair known as homologous recombination (HR). The mechanism of HR leading to gene targeting, however, is not well understood in human cells. Here, we demonstrate that a two-end, ends-out HR intermediate is valid for human gene targeting. Furthermore, the resolution step of this intermediate occurs via the classic DSB repair model of HR while synthesis-dependent strand annealing and Holliday Junction dissolution are, at best, minor pathways. Moreover, and in contrast to other systems, the positions of Holliday Junction resolution are evenly distributed along the homology arms of the targeting vector. Most unexpectedly, we demonstrate that when a meganuclease is used to introduce a chromosomal DSB to augment gene targeting, the mechanism of gene targeting is inverted to an ends-in process. Finally, we demonstrate that the anti-recombination activity of mismatch repair is a significant impediment to gene targeting. These observations significantly advance our understanding of HR and gene targeting in human cells. PMID:24699519

  16. Injection of AAV2-BMP2 and AAV2-TIMP1 into the nucleus pulposus slows the course of intervertebral disc degeneration in an in vivo rabbit model.

    PubMed

    Leckie, Steven K; Bechara, Bernard P; Hartman, Robert A; Sowa, Gwendolyn A; Woods, Barrett I; Coelho, Joao P; Witt, William T; Dong, Qing D; Bowman, Brent W; Bell, Kevin M; Vo, Nam V; Wang, Bing; Kang, James D

    2012-01-01

    Intervertebral disc degeneration (IDD) is a common cause of back pain. Patients who fail conservative management may face the morbidity of surgery. Alternative treatment modalities could have a significant impact on disease progression and patients' quality of life. To determine if the injection of a virus vector carrying a therapeutic gene directly into the nucleus pulposus improves the course of IDD. Prospective randomized controlled animal study. Thirty-four skeletally mature New Zealand white rabbits were used. In the treatment group, L2-L3, L3-L4, and L4-L5 discs were punctured in accordance with a previously validated rabbit annulotomy model for IDD and then subsequently treated with adeno-associated virus serotype 2 (AAV2) vector carrying genes for either bone morphogenetic protein 2 (BMP2) or tissue inhibitor of metalloproteinase 1 (TIMP1). A nonoperative control group, nonpunctured sham surgical group, and punctured control group were also evaluated. Serial magnetic resonance imaging (MRI) studies at 0, 6, and 12 weeks were obtained, and a validated MRI analysis program was used to quantify degeneration. The rabbits were sacrificed at 12 weeks, and L4-L5 discs were analyzed histologically. Viscoelastic properties of the L3-L4 discs were analyzed using uniaxial load-normalized displacement testing. Creep curves were mathematically modeled according to a previously validated two-phase exponential model. Serum samples obtained at 0, 6, and 12 weeks were assayed for biochemical evidence of degeneration. The punctured group demonstrated MRI and histologic evidence of degeneration as expected. The treatment groups demonstrated less MRI and histologic evidence of degeneration than the punctured group. The serum biochemical marker C-telopeptide of collagen type II increased rapidly in the punctured group, but the treated groups returned to control values by 12 weeks. The treatment groups demonstrated several viscoelastic properties that were distinct from control and punctured values. Treatment of punctured rabbit intervertebral discs with AAV2-BMP2 or AAV2-TIMP1 helps delay degenerative changes, as seen on MRI, histologic sampling, serum biochemical analysis, and biomechanical testing. Although data from animal models should be extrapolated to the human condition with caution, this study supports the potential use of gene therapy for the treatment of IDD. Copyright © 2012 Elsevier Inc. All rights reserved.

  17. Transcriptional profiling of the host cell response to feline immunodeficiency virus infection.

    PubMed

    Ertl, Reinhard; Klein, Dieter

    2014-03-19

    Feline immunodeficiency virus (FIV) is a widespread pathogen of the domestic cat and an important animal model for human immunodeficiency virus (HIV) research. In contrast to HIV, only limited information is available on the transcriptional host cell response to FIV infections. This study aims to identify FIV-induced gene expression changes in feline T-cells during the early phase of the infection. Illumina RNA-sequencing (RNA-seq) was used to identify differentially expressed genes (DEGs) at 24 h after FIV infection. After removal of low-quality reads, the remaining sequencing data were mapped against the cat genome and the numbers of mapping reads were counted for each gene. Regulated genes were identified through the comparison of FIV and mock-infected data sets. After statistical analysis and the removal of genes with insufficient coverage, we detected a total of 69 significantly DEGs (44 up- and 25 down-regulated genes) upon FIV infection. The results obtained by RNA-seq were validated by reverse transcription qPCR analysis for 10 genes. Out of the most distinct DEGs identified in this study, several genes are already known to interact with HIV in humans, indicating comparable effects of both viruses on the host cell gene expression and furthermore, highlighting the importance of FIV as a model system for HIV. In addition, a set of new genes not previously linked to virus infections could be identified. The provided list of virus-induced genes may represent useful information for future studies focusing on the molecular mechanisms of virus-host interactions in FIV pathogenesis.

  18. Thiopurine pharmacogenomics: association of SNPs with clinical response and functional validation of candidate genes

    PubMed Central

    Matimba, Alice; Li, Fang; Livshits, Alina; Cartwright, Cher S; Scully, Stephen; Fridley, Brooke L; Jenkins, Gregory; Batzler, Anthony; Wang, Liewei; Weinshilboum, Richard; Lennard, Lynne

    2014-01-01

    Aim: We investigated candidate genes associated with thiopurine metabolism and clinical response in childhood acute lymphoblastic leukemia. Materials & methods: We performed genome-wide SNP association studies of 6-thioguanine and 6-mercaptopurine cytotoxicity using lymphoblastoid cell lines. We then genotyped the top SNPs associated with lymphoblastoid cell line cytotoxicity, together with tagSNPs for genes in the ‘thiopurine pathway’ (686 total SNPs), in DNA from 589 Caucasian UK ALL97 patients. Functional validation studies were performed by siRNA knockdown in cancer cell lines. Results: SNPs in the thiopurine pathway genes ABCC4, ABCC5, IMPDH1, ITPA, SLC28A3 and XDH, and SNPs located within or near ATP6AP2, FRMD4B, GNG2, KCNMA1 and NME1, were associated with clinical response and measures of thiopurine metabolism. Functional validation showed shifts in cytotoxicity for these genes. Conclusion: The clinical response to thiopurines may be regulated by variation in known thiopurine pathway genes and additional novel genes outside of the thiopurine pathway. PMID:24624911

  19. Development and validation of a gene profile predicting benefit of postmastectomy radiotherapy in patients with high-risk breast cancer: a study of gene expression in the DBCG82bc cohort.

    PubMed

    Tramm, Trine; Mohammed, Hayat; Myhre, Simen; Kyndi, Marianne; Alsner, Jan; Børresen-Dale, Anne-Lise; Sørlie, Therese; Frigessi, Arnoldo; Overgaard, Jens

    2014-10-15

    To identify genes predicting benefit of radiotherapy in patients with high-risk breast cancer treated with systemic therapy and randomized to receive or not receive postmastectomy radiotherapy (PMRT). The study was based on the Danish Breast Cancer Cooperative Group (DBCG82bc) cohort. Gene-expression analysis was performed in a training set of frozen tumor tissue from 191 patients. Genes were identified through the Lasso method with the endpoint being locoregional recurrence (LRR). A weighted gene-expression index (DBCG-RT profile) was calculated and transferred to quantitative real-time PCR (qRT-PCR) in corresponding formalin-fixed, paraffin-embedded (FFPE) samples, before validation in FFPE from 112 additional patients. Seven genes were identified, and the derived DBCG-RT profile divided the 191 patients into "high LRR risk" and "low LRR risk" groups. PMRT significantly reduced risk of LRR in "high LRR risk" patients, whereas "low LRR risk" patients showed no additional reduction in LRR rate. Technical transfer of the DBCG-RT profile to FFPE/qRT-PCR was successful, and the predictive impact was successfully validated in another 112 patients. A DBCG-RT gene profile was identified and validated, identifying patients with very low risk of LRR and no benefit from PMRT. The profile may provide a method to individualize treatment with PMRT. ©2014 American Association for Cancer Research.

  20. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298
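
    The hybrid data model described in the abstract above (pick a master gene as ground truth, generate phenotype labels from it, then resample smaller data sets) can be sketched roughly as follows. This is a minimal Python illustration only: the median split used to generate labels, the function names and the toy expression matrix are all assumptions made here, not details taken from the study.

```python
import random
import statistics


def make_hybrid_labels(expr, master_gene):
    """Label a sample 1 if the master gene is above its median, else 0.

    The median split is an assumption for illustration; the abstract does
    not say how the phenotype labels are generated from the master gene.
    """
    values = [row[master_gene] for row in expr]
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def resample(expr, labels, n, rng):
    """Draw n samples without replacement, keeping rows and labels paired."""
    idx = rng.sample(range(len(expr)), n)
    return [expr[i] for i in idx], [labels[i] for i in idx]


rng = random.Random(0)
# Toy expression matrix: 8 samples x 3 genes (made-up numbers).
expr = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
labels = make_hybrid_labels(expr, master_gene=0)
small_expr, small_labels = resample(expr, labels, n=4, rng=rng)
```

    Repeating the resampling step with different seeds or different master genes would yield the batch of smaller data sets on which gene-set rankings could then be compared.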

  1. Differential Splicing of Oncogenes and Tumor Suppressor Genes in African- and Caucasian-American Populations: Contributing Factor in Prostate Cancer Disparities

    DTIC Science & Technology

    2015-10-01

    signaling protein as defined by in vitro assays and mouse xenograft studies, ii) is associated with worse prognosis in patients, and iii) is resistant to...available. Specific Aim 2. To characterize oncogenic differences of splice variant pairs in vivo using xenograft animal models. Task 1. Validate...idelalisib as defined by in vitro assays and mouse xenograft models. In contrast, the corresponding EA isoform (PI3Kδ-L) encodes a less aggressive isoform

  2. Bioinformatics approach for choosing the correct reference genes when studying gene expression in human keratinocytes.

    PubMed

    Beer, Lucian; Mlitz, Veronika; Gschwandtner, Maria; Berger, Tanja; Narzt, Marie-Sophie; Gruber, Florian; Brunner, Patrick M; Tschachler, Erwin; Mildner, Michael

    2015-10-01

    Quantitative reverse transcription polymerase chain reaction (qRT-PCR) has become a mainstay in many areas of skin research. To enable quantitative analysis, it is necessary to analyse expression of reference genes (RGs) for normalization of target gene expression. The selection of reliable RGs therefore has an important impact on the experimental outcome. In this study, we aimed to identify and validate the best suited RGs for qRT-PCR in human primary keratinocytes (KCs) over a broad range of experimental conditions using the novel bioinformatics tool 'RefGenes', which is based on a manually curated database of published microarray data. Expression of 6 RGs identified by RefGenes software and 12 commonly used RGs was validated by qRT-PCR. We assessed whether these 18 markers fulfilled the requirements for a valid RG by the comprehensive ranking of four bioinformatics tools and the coefficient of variation (CV). In an overall ranking, we found GUSB to be the most stably expressed RG, whereas the expression values of the commonly used RGs, GAPDH and B2M, were significantly affected by varying experimental conditions. Our results identify RefGenes as a powerful tool for the identification of valid RGs and suggest GUSB as the most reliable RG for KCs. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
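
    As a rough illustration of the coefficient-of-variation criterion mentioned in the abstract above, the following Python sketch ranks candidate reference genes by CV (standard deviation divided by mean), lowest first. The gene names and Cq values are invented placeholders, and applying the CV directly to raw Cq values is an assumption; the study's exact procedure is not described in this record.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def rank_by_cv(cq_by_gene):
    """Return gene names ordered from most stable (lowest CV) to least."""
    return sorted(cq_by_gene, key=lambda g: coefficient_of_variation(cq_by_gene[g]))


# Hypothetical Cq values across four experimental conditions.
cq_by_gene = {
    "GENE_B": [18.0, 21.0, 16.5, 22.0],
    "GENE_A": [20.0, 20.1, 19.9, 20.0],
}
ranking = rank_by_cv(cq_by_gene)
```

    A lower CV indicates less variation across conditions, so the first entry of the ranking would be the preferred normalizer under this single criterion.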

  3. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome.

    PubMed

    Tothill, Richard W; Tinker, Anna V; George, Joshy; Brown, Robert; Fox, Stephen B; Lade, Stephen; Johnson, Daryl S; Trivett, Melanie K; Etemadmoghadam, Dariush; Locandro, Bianca; Traficante, Nadia; Fereday, Sian; Hung, Jillian A; Chiew, Yoke-Eng; Haviv, Izhak; Gertig, Dorota; DeFazio, Anna; Bowtell, David D L

    2008-08-15

    The study aimed to identify novel molecular subtypes of ovarian cancer by gene expression profiling with linkage to clinical and pathologic features. Microarray gene expression profiling was done on 285 serous and endometrioid tumors of the ovary, peritoneum, and fallopian tube. K-means clustering was applied to identify robust molecular subtypes. Statistical analysis identified differentially expressed genes, pathways, and gene ontologies. Laser capture microdissection, pathology review, and immunohistochemistry validated the array-based findings. Patient survival within k-means groups was evaluated using Cox proportional hazards models. Class prediction validated k-means groups in an independent dataset. A semisupervised survival analysis of the array data was used to compare against unsupervised clustering results. Optimal clustering of array data identified six molecular subtypes. Two subtypes represented predominantly serous low malignant potential and low-grade endometrioid subtypes, respectively. The remaining four subtypes represented higher grade and advanced stage cancers of serous and endometrioid morphology. A novel subtype of high-grade serous cancers reflected a mesenchymal cell type, characterized by overexpression of N-cadherin and P-cadherin and low expression of differentiation markers, including CA125 and MUC1. A poor prognosis subtype was defined by a reactive stroma gene expression signature, correlating with extensive desmoplasia in such samples. A similar poor prognosis signature could be found using a semisupervised analysis. Each subtype displayed distinct levels and patterns of immune cell infiltration. Class prediction identified similar subtypes in an independent ovarian dataset with similar prognostic trends. Gene expression profiling identified molecular subtypes of ovarian cancer of biological and clinical importance.

  4. Follicle Online: an integrated database of follicle assembly, development and ovulation.

    PubMed

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, 'Follicle Online', that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (up to 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.

  5. Pharmacoethnicity in Paclitaxel-Induced Sensory Peripheral Neuropathy

    PubMed Central

    Komatsu, Masaaki; Wheeler, Heather E.; Chung, Suyoun; Low, Siew-Kee; Wing, Claudia; Delaney, Shannon M.; Gorsic, Lidija K.; Takahashi, Atsushi; Kubo, Michiaki; Kroetz, Deanna L.; Zhang, Wei; Nakamura, Yusuke; Dolan, M. Eileen

    2015-01-01

    Purpose: Paclitaxel is used worldwide in the treatment of breast, lung, ovarian and other cancers. Sensory peripheral neuropathy is an associated adverse effect that cannot be predicted, prevented or mitigated. To better understand the contribution of germline genetic variation to paclitaxel-induced peripheral neuropathy, we undertook an integrative approach that combines genome-wide association study (GWAS) data generated from HapMap lymphoblastoid cell lines (LCLs) and Asian patients. Methods: GWAS was performed with paclitaxel-induced cytotoxicity generated in 363 LCLs and with paclitaxel-induced neuropathy from 145 Asian patients. A gene-based approach was used to identify overlapping genes and compare to a European clinical cohort of paclitaxel-induced neuropathy. Neurons derived from human induced pluripotent stem cells were used for functional validation of candidate genes. Results: SNPs near AIPL1 were significantly associated with paclitaxel-induced cytotoxicity in Asian LCLs (P < 10⁻⁶). Decreased expression of AIPL1 resulted in decreased sensitivity of neurons to paclitaxel by inducing neurite morphological changes as measured by increased relative total outgrowth, number of processes and mean process length. Using a gene-based analysis, there were 32 genes that overlapped between Asian LCL cytotoxicity and Asian patient neuropathy (P < 0.05) including BCR. Upon BCR knockdown, there was an increase in neuronal sensitivity to paclitaxel as measured by neurite morphological characteristics. Conclusion: We identified genetic variants associated with Asian paclitaxel-induced cytotoxicity and functionally validated the AIPL1 and BCR in a neuronal cell model. Furthermore, the integrative pharmacogenomics approach of LCL/patient GWAS may help prioritize target genes associated with chemotherapeutic-induced peripheral neuropathy. PMID:26015512

  6. Follicle Online: an integrated database of follicle assembly, development and ovulation

    PubMed Central

    Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

    2015-01-01

    Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, ‘Follicle Online’, that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (up to 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457

  7. Validation of Reference Genes for Gene Expression Studies in Virus-Infected Nicotiana benthamiana Using Quantitative Real-Time PCR

    PubMed Central

    Han, Chenggui; Yu, Jialin; Li, Dawei; Zhang, Yongliang

    2012-01-01

    Nicotiana benthamiana is the most widely-used experimental host in plant virology. The recent release of the draft genome sequence for N. benthamiana consolidates its role as a model for plant–pathogen interactions. Quantitative real-time PCR (qPCR) is commonly employed for quantitative gene expression analysis. For valid qPCR analysis, accurate normalisation of gene expression against an appropriate internal control is required. Yet there has been little systematic investigation of reference gene stability in N. benthamiana under conditions of viral infections. In this study, the expression profiles of 16 commonly used housekeeping genes (GAPDH, 18S, EF1α, SAMD, L23, UK, PP2A, APR, UBI3, SAND, ACT, TUB, GBP, F-BOX, PPR and TIP41) were determined in N. benthamiana and those with acceptable expression levels were further selected for transcript stability analysis by qPCR of complementary DNA prepared from N. benthamiana leaf tissue infected with one of five RNA plant viruses (Tobacco necrosis virus A, Beet black scorch virus, Beet necrotic yellow vein virus, Barley stripe mosaic virus and Potato virus X). Gene stability was analysed in parallel by three commonly-used dedicated algorithms: geNorm, NormFinder and BestKeeper. Statistical analysis revealed that the PP2A, F-BOX and L23 genes were the most stable overall, and that the combination of these three genes was sufficient for accurate normalisation. In addition, the suitability of PP2A, F-BOX and L23 as reference genes was illustrated by expression-level analysis of AGO2 and RdR6 in virus-infected N. benthamiana leaves. This is the first study to systematically examine and evaluate the stability of different reference genes in N. benthamiana. Our results not only provide researchers studying these viruses a shortlist of potential housekeeping genes to use as normalisers for qPCR experiments, but should also guide the selection of appropriate reference genes for gene expression studies of N. benthamiana under other biotic and abiotic stress conditions. PMID:23029521
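
    For readers unfamiliar with the stability algorithms named in the abstract above, the following Python sketch computes a pairwise-variation stability measure in the style commonly described for geNorm: for each gene, the average over all other genes of the standard deviation of the log2 expression ratios across samples. It is a simplified illustration, not the geNorm software itself; the gene names and relative quantities are invented placeholders.

```python
import math
import statistics


def stability_m(rel_qty):
    """Return a stability value M per gene (lower means more stable).

    rel_qty maps a gene name to its relative expression quantities,
    one value per sample, with the same sample order for every gene.
    """
    genes = list(rel_qty)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # log2 ratio of gene j to gene k in each sample
            log_ratios = [math.log2(a / b) for a, b in zip(rel_qty[j], rel_qty[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = statistics.mean(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three candidate genes in three samples.
rel_qty = {
    "gene_a": [1.0, 2.0, 4.0],
    "gene_b": [2.0, 4.0, 8.0],  # constant ratio to gene_a
    "gene_c": [1.0, 8.0, 2.0],  # varies independently
}
m = stability_m(rel_qty)
```

    In this toy input, gene_a and gene_b keep a constant ratio to each other and therefore receive the same, lower M, while gene_c receives a higher M and would be the first candidate to exclude.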

  8. Patient-derived Hormone-naive Prostate Cancer Xenograft Models Reveal Growth Factor Receptor Bound Protein 10 as an Androgen Receptor-repressed Gene Driving the Development of Castration-resistant Prostate Cancer.

    PubMed

    Hao, Jun; Ci, Xinpei; Xue, Hui; Wu, Rebecca; Dong, Xin; Choi, Stephen Yiu Chuen; He, Haiqing; Wang, Yu; Zhang, Fang; Qu, Sifeng; Zhang, Fan; Haegert, Anne M; Gout, Peter W; Zoubeidi, Amina; Collins, Colin; Gleave, Martin E; Lin, Dong; Wang, Yuzhuo

    2018-06-01

    Although androgen deprivation therapy is initially effective in controlling growth of hormone-naive prostate cancers (HNPCs) in patients, currently incurable castration-resistant prostate cancer (CRPC) inevitably develops. To identify CRPC driver genes that may provide new targets to enhance CRPC therapy. Patient-derived xenografts (PDXs) of HNPCs that develop CRPC following host castration were examined for changes in expression of genes at various time points after castration using transcriptome profiling analysis; particular attention was given to pre-CRPC changes in expression indicative of genes acting as potential CRPC drivers. The functionality of a potential CRPC driver was validated via its knockdown in cultured prostate cancer cells; its clinical relevance was established using data from prostate cancer patient databases. Eighty genes were found to be significantly upregulated at the CRPC stage, while seven of them also showed elevated expression prior to CRPC development. Among the latter, growth factor receptor bound protein 10 (GRB10) was the most significantly and consistently upregulated gene. Moreover, elevated GRB10 expression in clinical prostate cancer samples correlated with more aggressive tumor types and poorer patient treatment outcome. GRB10 knockdown markedly reduced prostate cancer cell proliferation and activity of AKT, a well-established CRPC mediator. A positive correlation between AKT activity and GRB10 expression was also found in clinical cohorts. GRB10 acts as a driver of CRPC and sensitizes androgen receptor pathway inhibitors, and hence GRB10 targeting provides a novel therapeutic strategy for the disease. Development of castration-resistant prostate cancer (CRPC) is a major problem in the management of the disease. 
Using state-of-the-art patient-derived hormone-naive prostate cancer xenograft models, we found and validated the growth factor receptor bound protein 10 gene as a driver of CRPC, indicating that it may be used as a new molecular target to enhance current CRPC therapy. Copyright © 2018 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  9. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  10. Genes associated with metabolic syndrome predict disease-free survival in stage II colorectal cancer patients. A novel link between metabolic dysregulation and colorectal cancer.

    PubMed

    Vargas, Teodoro; Moreno-Rubio, Juan; Herranz, Jesús; Cejas, Paloma; Molina, Susana; González-Vallinas, Margarita; Ramos, Ricardo; Burgos, Emilio; Aguayo, Cristina; Custodio, Ana B; Reglero, Guillermo; Feliu, Jaime; Ramírez de Molina, Ana

    2014-12-01

    Studies have recently suggested that metabolic syndrome and its components increase the risk of colorectal cancer. Both diseases are increasing in most countries, and the genetic association between them has not been fully elucidated. The objective of this study was to assess the association between genetic risk factors of metabolic syndrome or related conditions (obesity, hyperlipidaemia, diabetes mellitus type 2) and clinical outcome in stage II colorectal cancer patients. Expression levels of several genes related to metabolic syndrome and associated alterations were analysed by real-time qPCR in two equivalent but independent sets of stage II colorectal cancer patients. Using logistic regression models and cross-validation analysis with all tumour samples, we developed a metabolic syndrome-related gene expression profile to predict clinical outcome in stage II colorectal cancer patients. The results showed that a gene expression profile constituted by genes previously related to metabolic syndrome was significantly associated with clinical outcome of stage II colorectal cancer patients. This metabolic profile was able to identify patients with a low risk and high risk of relapse. Its predictive value was validated using an independent set of stage II colorectal cancer patients. The identification of a set of genes related to metabolic syndrome that predict survival in intermediate-stage colorectal cancer patients allows delineation of a high-risk group that may benefit from adjuvant therapy and avoid the toxic and unnecessary chemotherapy in patients classified as low risk. Our results also confirm the linkage between metabolic disorder and colorectal cancer and suggest the potential for cancer prevention and/or treatment by targeting these genes. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  11. The barley EST DNA Replication and Repair Database (bEST-DRRD) as a tool for the identification of the genes involved in DNA replication and repair.

    PubMed

    Gruszka, Damian; Marzec, Marek; Szarejko, Iwona

    2012-06-14

    The high level of conservation of genes that regulate DNA replication and repair indicates that they may serve as a source of information on the origin and evolution of the species and makes them a reliable system for the identification of cross-species homologs. Studies that had been conducted to date shed light on the processes of DNA replication and repair in bacteria, yeast and mammals. However, there is still much to be learned about the process of DNA damage repair in plants. These studies, which were conducted mainly using bioinformatics tools, enabled the list of genes that participate in various pathways of DNA repair in Arabidopsis thaliana (L.) Heynh to be outlined; however, information regarding these mechanisms in crop plants is still very limited. A similar, functional approach is particularly difficult for a species whose complete genomic sequences are still unavailable. One of the solutions is to apply ESTs (Expressed Sequence Tags) as the basis for gene identification. For the construction of the barley EST DNA Replication and Repair Database (bEST-DRRD), presented here, the Arabidopsis nucleotide and protein sequences involved in DNA replication and repair were used to browse for and retrieve the deposited sequences, derived from four barley (Hordeum vulgare L.) sequence databases, including the "Barley Genome version 0.05" database (encompassing ca. 90% of barley coding sequences) and from two databases covering the complete genomes of two monocot models: Oryza sativa L. and Brachypodium distachyon L. in order to identify homologous genes. Sequences of the categorised Arabidopsis queries are used for browsing the repositories, which are located on the ViroBLAST platform. The bEST-DRRD is currently used in our project during the identification and validation of the barley genes involved in DNA repair. 
The presented database provides information about the Arabidopsis genes involved in DNA replication and repair, their expression patterns and models of protein interactions. It was designed and established to provide an open-access tool for the identification of monocot homologs of known Arabidopsis genes that are responsible for DNA-related processes. The barley genes identified in the project are currently being analysed to validate their function.

  12. Modifier locus mapping of a transgenic F2 mouse population identifies CCDC115 as a novel aggressive prostate cancer modifier gene in humans.

    PubMed

    Winter, Jean M; Curry, Natasha L; Gildea, Derek M; Williams, Kendra A; Lee, Minnkyong; Hu, Ying; Crawford, Nigel P S

    2018-06-11

    It is well known that development of prostate cancer (PC) can be attributed to somatic mutations of the genome, acquired within proto-oncogenes or tumor-suppressor genes. What is less well understood is how germline variation contributes to disease aggressiveness in PC patients. To map germline modifiers of aggressive neuroendocrine PC, we generated a genetically diverse F2 intercross population using the transgenic TRAMP mouse model and the wild-derived WSB/EiJ (WSB) strain. The relevance of germline modifiers of aggressive PC identified in these mice was extensively correlated in human PC datasets and functionally validated in cell lines. Aggressive PC traits were quantified in a population of 30-week-old (TRAMP x WSB) F2 mice (n = 307). Correlation of germline genotype with aggressive disease phenotype revealed seven modifier loci that were significantly associated with aggressive disease. RNA-seq data were analyzed using cis-eQTL and trait correlation analyses to identify candidate genes within each of these loci. Analysis of 92 (TRAMP x WSB) F2 prostates revealed 25 candidate genes that harbored both a significant cis-eQTL and mRNA expression correlations with an aggressive PC trait. We further delineated these candidate genes based on their clinical relevance, by interrogating human PC GWAS and PC tumor gene expression datasets. We identified four genes (CCDC115, DNAJC10, RNF149, and STYXL1), which encompassed all of the following characteristics: 1) one or more germline variants associated with aggressive PC traits; 2) differential mRNA levels associated with aggressive PC traits; and 3) differential mRNA expression between normal and tumor tissue. Functional validation studies of these four genes using the human LNCaP prostate adenocarcinoma cell line revealed that ectopic overexpression of CCDC115 can significantly impede cell growth in vitro and tumor growth in vivo. 
Furthermore, CCDC115 human prostate tumor expression was associated with better survival outcomes. We have demonstrated how modifier locus mapping in mouse models of PC, coupled with in silico analyses of human PC datasets, can reveal novel germline modifier genes of aggressive PC. We have also characterized CCDC115 as being associated with less aggressive PC in humans, placing it as a potential prognostic marker of aggressive PC.

  13. Validation of reference genes for normalization of qPCR mRNA expression levels in Staphylococcus aureus exposed to osmotic and lactic acid stress conditions encountered during food production and preservation.

    PubMed

    Sihto, Henna-Maria; Tasara, Taurai; Stephan, Roger; Johler, Sophia

    2014-07-01

    Staphylococcus aureus represents the most prevalent cause of food-borne intoxications worldwide. While being repressed by competing bacteria in most matrices, this pathogen exhibits crucial competitive advantages during growth at high salt concentrations or low pH, conditions frequently encountered in food production and preservation. We aimed to identify reference genes that could be used to normalize qPCR mRNA expression levels during growth of S. aureus in food-related osmotic (NaCl) and acidic (lactic acid) stress adaptation models. Expression stability of nine housekeeping genes was evaluated in full (LB) and nutrient-deficient (CYGP w/o glucose) medium under conditions of osmotic (4.5% NaCl) and acidic stress (lactic acid, pH 6.0) after 2-h exposure. Among the set of candidate reference genes investigated, rplD, rpoB, gyrB, and rho were most stably expressed in LB and thus represent the most suitable reference genes for normalization of qPCR data in osmotic or lactic acid stress models in a rich medium. Under nutrient-deficient conditions, expression of rho and rpoB was highly stable across all tested conditions. The presented comprehensive data on changes in expression of various S. aureus housekeeping genes under conditions of osmotic and lactic acid stress facilitate selection of reference genes for qPCR-based stress response models. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  14. Gene Overexpression/Suppression Analysis of Candidate Virulence Factors of Candida albicans

    PubMed Central

    Fu, Yue; Luo, Guanpingsheng; Spellberg, Brad J.; Edwards, John E.; Ibrahim, Ashraf S.

    2008-01-01

    We developed a conditional overexpression/suppression genetic strategy in Candida albicans to enable simultaneous testing of gain or loss of function in order to identify new virulence factors. The strategy involved insertion of a strong, tetracycline-regulated promoter in front of the gene of interest. To validate the strategy, a library of genes encoding glycosylphosphatidylinositol (GPI)-anchored surface proteins was screened for virulence phenotypes in vitro. During the screening, overexpression of IFF4 was found to increase the adherence of C. albicans to plastic and to human epithelial cells, but not endothelial cells. Consistent with the in vitro results, IFF4 overexpression modestly increased the tissue fungal burden during murine vaginal candidiasis. In addition to the in vitro screening tests, IFF4 overexpression was found to increase C. albicans susceptibility to neutrophil-mediated killing. Furthermore, IFF4 overexpression decreased the severity of hematogenously disseminated candidiasis in normal mice, but not in neutropenic mice, again consistent with the in vitro phenotype. Overexpression of 12 other GPI proteins did not affect normal GPI protein cell surface accumulation, demonstrating that the overexpression strategy did not affect the cell capacity for making such proteins. These data indicate that the same gene can increase or decrease candidal virulence in distinct models of infection, emphasizing the importance of studying virulence genes in different anatomical contexts. Finally, these data validate the use of a conditional overexpression/suppression genetic strategy to identify candidal virulence factors. PMID:18178776

  15. Drosophila models of amyotrophic lateral sclerosis with defects in RNA metabolism.

    PubMed

    Zhang, Ke; Coyne, Alyssa N; Lloyd, Thomas E

    2018-05-09

    The fruit fly Drosophila melanogaster has been widely used to study neurodegenerative diseases. The conservation of nervous system biology coupled with the rapid life cycle and powerful genetic tools in the fly have enabled the identification of novel therapeutic targets that have been validated in vertebrate model systems and human patients. A recent example is in the study of the devastating motor neuron degenerative disease amyotrophic lateral sclerosis (ALS). Mutations in genes that regulate RNA metabolism are a major cause of inherited ALS, and functional analysis of these genes in the fly nervous system has shed light on how mutations cause disease. Importantly, unbiased genetic screens have identified key pathways that contribute to ALS pathogenesis such as nucleocytoplasmic transport and stress granule assembly. In this review, we will discuss the utilization of Drosophila models of ALS with defects in RNA metabolism. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. Genetic Basis of Atherosclerosis: Insights from Mice and Humans

    PubMed Central

    Stylianou, Ioannis M.; Bauer, Robert C.; Reilly, Muredach P.; Rader, Daniel J.

    2012-01-01

    Atherosclerosis is a complex and heritable disease involving multiple cell types and the interactions of many different molecular pathways. The genetic and molecular mechanisms of atherosclerosis have in part been elucidated by mouse models; at least 100 different genes have been shown to influence atherosclerosis in mice. Importantly, unbiased genome-wide association studies have recently identified a number of novel loci robustly associated with atherosclerotic coronary artery disease (CAD). Here we review the genetic data elucidated from mouse models of atherosclerosis, as well as significant associations for human CAD. Furthermore, we discuss in greater detail some of these novel human CAD loci. The combination of mouse and human genetics has the potential to identify and validate novel genes that influence atherosclerosis, some of which may be candidates for new therapeutic approaches. PMID:22267839

  17. Personalized Nutrition-Genes, Diet, and Related Interactive Parameters as Predictors of Cancer in Multiethnic Colorectal Cancer Families.

    PubMed

    Shiao, S Pamela K; Grayson, James; Lie, Amanda; Yu, Chong Ho

    2018-06-20

    To personalize nutrition, the purpose of this study was to examine five key genes in the folate metabolism pathway, and dietary parameters and related interactive parameters as predictors of colorectal cancer (CRC) by measuring the healthy eating index (HEI) in multiethnic families. The five genes included methylenetetrahydrofolate reductase (MTHFR) 677 and 1298, methionine synthase (MTR) 2756, methionine synthase reductase (MTRR 66), and dihydrofolate reductase (DHFR) 19bp, and they were used to compute a total gene mutation score. We included 53 families, 53 CRC patients and 53 paired family friend members of diverse population groups in Southern California. We measured multidimensional data using the ensemble bootstrap forest method to identify variables of importance within domains of genetic, demographic, and dietary parameters to achieve dimension reduction. We then constructed predictive generalized regression (GR) modeling with a supervised machine learning validation procedure with the target variable (cancer status) being specified to validate the results to allow enhanced prediction and reproducibility. The results showed that the CRC group had increased total gene mutation scores compared to the family members (p < 0.05). Using the Akaike's information criterion and Leave-One-Out cross validation GR methods, the HEI was interactive with thiamine (vitamin B1), which is a new finding for the literature. The natural food sources for thiamine include whole grains, legumes, and some meats and fish which HEI scoring included as part of healthy portions (versus limiting portions on salt, saturated fat and empty calories). Additional predictors included age, as well as gender and the interaction of MTHFR 677 with overweight status (measured by body mass index) in predicting CRC, with the cancer group having more men and overweight cases. The HEI score was significant when split at the median score of 77 into greater or less scores, confirmed through the machine-learning recursive tree method and predictive modeling, although an HEI score of greater than 80 is the US national standard set value for a good diet. The HEI and healthy eating are modifiable factors for healthy living in relation to dietary parameters and cancer prevention, and they can be used for personalized nutrition in the precision-based healthcare era.

  18. A lung cancer risk classifier comprising genome maintenance genes measured in normal bronchial epithelial cells.

    PubMed

    Yeo, Jiyoun; Crawford, Erin L; Zhang, Xiaolu; Khuder, Sadik; Chen, Tian; Levin, Albert; Blomquist, Thomas M; Willey, James C

    2017-05-02

    Annual low dose CT (LDCT) screening of individuals at high demographic risk reduces lung cancer mortality by more than 20%. However, subjects selected for screening based on demographic criteria typically have less than a 10% lifetime risk for lung cancer. Thus, there is need for a biomarker that better stratifies subjects for LDCT screening. Toward this goal, we previously reported a lung cancer risk test (LCRT) biomarker comprising 14 genome-maintenance (GM) pathway genes measured in normal bronchial epithelial cells (NBEC) that accurately classified cancer (CA) from non-cancer (NC) subjects. The primary goal of the studies reported here was to optimize the LCRT biomarker for high specificity and ease of clinical implementation. Targeted competitive multiplex PCR amplicon libraries were prepared for next generation sequencing (NGS) analysis of transcript abundance at 68 sites among 33 GM target genes in NBEC specimens collected from a retrospective cohort of 120 subjects, including 61 CA cases and 59 NC controls. Genes were selected for analysis based on contribution to the previously reported LCRT biomarker and/or prior evidence for association with lung cancer risk. Linear discriminant analysis was used to identify the most accurate classifier suitable to stratify subjects for screening. After cross-validation, a model comprising expression values from 12 genes (CDKN1A, E2F1, ERCC1, ERCC4, ERCC5, GPX1, GSTP1, KEAP1, RB1, TP53, TP63, and XRCC1) and demographic factors age, gender, and pack-years smoking, had Receiver Operator Characteristic area under the curve (ROC AUC) of 0.975 (95% CI: 0.96-0.99). The overall classification accuracy was 93% (95% CI 88%-98%) with sensitivity 93.1%, specificity 92.9%, positive predictive value 93.1% and negative predictive value 93%. The ROC AUC for this classifier was significantly better (p < 0.0001) than the best model comprising demographic features alone. The LCRT biomarker reported here displayed high accuracy and ease of implementation on a high throughput, quality-controlled targeted NGS platform. As such, it is optimized for clinical validation in specimens from the ongoing LCRT blinded prospective cohort study. Following validation, the biomarker is expected to have clinical utility by better stratifying subjects for annual lung cancer screening compared to current demographic criteria alone.
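
    For readers less familiar with the performance figures quoted above, the sketch below shows how sensitivity, specificity, and the predictive values follow from a 2x2 confusion matrix. The counts are hypothetical and are not the study's data; the function name is an illustrative choice.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all cancer cases
        "specificity": tn / (tn + fp),   # true negatives among all controls
        "ppv": tp / (tp + fp),           # cancer cases among positive calls
        "npv": tn / (tn + fn),           # controls among negative calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 subjects (55 cases, 45 controls).
m = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

    Unlike sensitivity and specificity, the predictive values depend on the case/control mix of the cohort, so values from a roughly balanced retrospective cohort need not carry over to a screening population.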

  19. Association of a History of Child Abuse With Impaired Myelination in the Anterior Cingulate Cortex: Convergent Epigenetic, Transcriptional, and Morphological Evidence.

    PubMed

    Lutz, Pierre-Eric; Tanti, Arnaud; Gasecka, Alicja; Barnett-Burns, Sarah; Kim, John J; Zhou, Yi; Chen, Gang G; Wakid, Marina; Shaw, Meghan; Almeida, Daniel; Chay, Marc-Aurele; Yang, Jennie; Larivière, Vanessa; M'Boutchou, Marie-Noël; van Kempen, Léon C; Yerko, Volodymyr; Prud'homme, Josée; Davoli, Maria Antonietta; Vaillancourt, Kathryn; Théroux, Jean-François; Bramoullé, Alexandre; Zhang, Tie-Yuan; Meaney, Michael J; Ernst, Carl; Côté, Daniel; Mechawar, Naguib; Turecki, Gustavo

    2017-12-01

    Child abuse has devastating and long-lasting consequences, considerably increasing the lifetime risk of negative mental health outcomes such as depression and suicide. Yet the neurobiological processes underlying this heightened vulnerability remain poorly understood. The authors investigated the hypothesis that epigenetic, transcriptomic, and cellular adaptations may occur in the anterior cingulate cortex as a function of child abuse. Postmortem brain samples from human subjects (N=78) and from a rodent model of the impact of early-life environment (N=24) were analyzed. The human samples were from depressed individuals who died by suicide, with (N=27) or without (N=25) a history of severe child abuse, as well as from psychiatrically healthy control subjects (N=26). Genome-wide DNA methylation and gene expression were investigated using reduced representation bisulfite sequencing and RNA sequencing, respectively. Cell type-specific validation of differentially methylated loci was performed after fluorescence-activated cell sorting of oligodendrocyte and neuronal nuclei. Differential gene expression was validated using NanoString technology. Finally, oligodendrocytes and myelinated axons were analyzed using stereology and coherent anti-Stokes Raman scattering microscopy. A history of child abuse was associated with cell type-specific changes in DNA methylation of oligodendrocyte genes and a global impairment of the myelin-related transcriptional program. These effects were absent in the depressed suicide completers with no history of child abuse, and they were strongly correlated with myelin gene expression changes observed in the animal model. Furthermore, a selective and significant reduction in the thickness of myelin sheaths around small-diameter axons was observed in individuals with history of child abuse. The results suggest that child abuse, in part through epigenetic reprogramming of oligodendrocytes, may lastingly disrupt cortical myelination, a fundamental feature of cerebral connectivity.

  20. Peripuberty stress leads to abnormal aggression, altered amygdala and orbitofrontal reactivity and increased prefrontal MAOA gene expression.

    PubMed

    Márquez, C; Poirier, G L; Cordero, M I; Larsen, M H; Groner, A; Marquis, J; Magistretti, P J; Trono, D; Sandi, C

    2013-01-15

    Although adverse early life experiences have been found to increase lifetime risk to develop violent behaviors, the neurobiological mechanisms underlying these long-term effects remain unclear. We present a novel animal model for pathological aggression induced by peripubertal exposure to stress with face, construct and predictive validity. We show that male rats submitted to fear-induction experiences during the peripubertal period exhibit high and sustained rates of increased aggression at adulthood, even against unthreatening individuals, and increased testosterone/corticosterone ratio. They also exhibit hyperactivity in the amygdala under both basal conditions (evaluated by 2-deoxy-glucose autoradiography) and after a resident-intruder (RI) test (evaluated by c-Fos immunohistochemistry), and hypoactivation of the medial orbitofrontal (MO) cortex after the social challenge. Alterations in the connectivity between the orbitofrontal cortex and the amygdala were linked to the aggressive phenotype. Increased and sustained expression levels of the monoamine oxidase A (MAOA) gene were found in the prefrontal cortex but not in the amygdala of peripubertally stressed animals. They were accompanied by increased activatory acetylation of histone H3, but not H4, at the promoter of the MAOA gene. Treatment with an MAOA inhibitor during adulthood reversed the peripuberty stress-induced antisocial behaviors. Beyond the characterization and validation of the model, we present novel data highlighting changes in the serotonergic system in the prefrontal cortex, and pointing at epigenetic control of the MAOA gene, in the establishment of the link between peripubertal stress and later pathological aggression. Our data emphasize the impact of biological factors triggered by peripubertal adverse experiences on the emergence of violent behaviors.

  1. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    PubMed

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory networks, most of them, when applied to sample-based data, cannot reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, whereas earlier studies either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely the gene dependency network. In this way, we have constructed a gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network to be scale-free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specifically related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as a signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for phenotypic change.
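
    The conditional mutual information mentioned above can be sketched with a simple plug-in estimate on discretized data. This is only an illustration of the quantity I(A; phenotype | B); the abstract does not describe the authors' estimator, discretization, or the threshold used to call a dependency pair, so the function name and the toy data are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written in counts
        ratio = n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * log2(ratio)
    return cmi


# Toy data: binarized gene A, gene B, and a phenotype label.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
phenotype_xor = [0, 1, 1, 0]    # A matters only in combination with B
phenotype_b_only = [0, 1, 0, 1]  # phenotype is fully explained by B

cmi_xor = conditional_mutual_information(gene_a, phenotype_xor, gene_b)
cmi_b_only = conditional_mutual_information(gene_a, phenotype_b_only, gene_b)
print(cmi_xor, cmi_b_only)  # 1.0 0.0
```

    In the first toy case gene A alone is uninformative about the phenotype, yet it carries one full bit once gene B is known, which is the kind of context-dependent influence a dependency pair (A,B) is meant to capture.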

  2. Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis.

    PubMed

    Koo, Byoung-Mo; Kritikos, George; Farelli, Jeremiah D; Todor, Horia; Tong, Kenneth; Kimsey, Harvey; Wapinski, Ilan; Galardini, Marco; Cabal, Angelo; Peters, Jason M; Hachmann, Anna-Barbara; Rudner, David Z; Allen, Karen N; Typas, Athanasios; Gross, Carol A

    2017-03-22

    A systems-level understanding of Gram-positive bacteria is important from both an environmental and health perspective and is most easily obtained when high-quality, validated genomic resources are available. To this end, we constructed two ordered, barcoded, erythromycin-resistance- and kanamycin-resistance-marked single-gene deletion libraries of the Gram-positive model organism, Bacillus subtilis. The libraries comprise 3,968 and 3,970 genes, respectively, and overlap in all but four genes. Using these libraries, we update the set of essential genes known for this organism, provide a comprehensive compendium of B. subtilis auxotrophic genes, and identify genes required for utilizing specific carbon and nitrogen sources, as well as those required for growth at low temperature. We report the identification of enzymes catalyzing several missing steps in amino acid biosynthesis. Finally, we describe a suite of high-throughput phenotyping methodologies and apply them to provide a genome-wide analysis of competence and sporulation. Altogether, we provide versatile resources for studying gene function and pathway and network architecture in Gram-positive bacteria. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  3. An Integrated Cell Purification and Genomics Strategy Reveals Multiple Regulators of Pancreas Development

    PubMed Central

    Benitez, Cecil M.; Qu, Kun; Sugiyama, Takuya; Pauerstein, Philip T.; Liu, Yinghua; Tsai, Jennifer; Gu, Xueying; Ghodasara, Amar; Arda, H. Efsun; Zhang, Jiajing; Dekker, Joseph D.; Tucker, Haley O.; Chang, Howard Y.; Kim, Seung K.

    2014-01-01

    The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus. PMID:25330008

  4. The road ahead: working towards effective clinical translation of myocardial gene therapies

    PubMed Central

    Katz, Michael G; Fargnoli, Anthony S; Williams, Richard D; Bridges, Charles R

    2014-01-01

    During the last two decades the fields of molecular and cellular cardiology, and more recently molecular cardiac surgery, have developed rapidly. The concept of delivering cDNA encoding a therapeutic gene to cardiomyocytes using a vector system with substantial cardiac tropism, allowing for long-term expression of a therapeutic protein, has moved from hypothesis to bench to clinical application. However, the clinical results to date are still disappointing. The ideal gene transfer method should be explored in clinically relevant animal models of heart disease to evaluate the relative roles of specific molecular pathways in disease pathogenesis, helping to validate the potential targets for therapeutic intervention. Successful clinical cardiovascular gene therapy also requires the use of nonimmunogenic cardiotropic vectors capable of expressing the requisite amount of therapeutic protein in vivo and in situ. Depending on the desired application either regional or global myocardial gene delivery is required. Cardiac-specific delivery techniques incorporating mapping technologies for regional delivery and highly efficient methodologies for global delivery should improve the precision and specificity of gene transfer to the areas of interest and minimize collateral organ gene expression. PMID:24341816

  5. Statistical algorithms improve accuracy of gene fusion detection

    PubMed Central

    Hsieh, Gillian; Bierman, Rob; Szabo, Linda; Lee, Alex Gia; Freeman, Donald E.; Watson, Nathaniel; Sweet-Cordero, E. Alejandro

    2017-01-01

    Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors. PMID:28541529

  6. Association between polymorphism within interleukin related genes and Graves' disease: a meta-analysis of 22 case-control studies

    PubMed Central

    Zeng, Tianshu; Cai, Xiong; Kong, Wen

    2017-01-01

    Graves’ disease (GD) is a common autoimmune disorder with a genetic predisposition. There is strong evidence to suggest that both Th1 and Th2 circulating cytokines are involved in the development of GD. In this study, we conducted a meta-analysis to assess the impact of seven variations of five IL-related genes on the susceptibility to GD. A total of 22 case-control studies involving 5338 GD patients and 6446 healthy controls were included. The results showed that only one SNP, rs1800795 in IL-6, was significantly associated with GD in the homozygous model (CC vs. GG: OR = 2.714, 95% CI = 1.047–7.039, p = 0.04), heterozygous model (CG vs. GG: OR = 1.295, 95% CI = 1.013–1.655, p = 0.039), dominant model (CC+CG vs. GG: OR = 1.418, 95% CI = 1.122–1.793, p = 0.003) and additive model (C vs. G: OR = 1.432, 95% CI = 1.087–1.886, p = 0.011). To explain the heterogeneity, we performed a subgroup analysis by ethnicity. The ethnicity stratification revealed that the association between rs1800795 and GD tended to be much stronger for Asian than European populations in homozygous, dominant, recessive, and additive models. The remaining 6 SNPs in 4 genes did not show any significant association with GD in any genetic model. Together, our data support that rs1800795 within the IL-6 gene confers genetic susceptibility for GD. Future large-scale studies are required to validate the associations between IL-6 and other IL-related genes and GD. PMID:29228744
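
    The odds ratios and confidence intervals reported above come from pooling per-study genotype counts. A minimal sketch of the arithmetic follows: a Wald interval for one 2x2 table and a fixed-effect inverse-variance pool across tables. The abstract does not say whether a fixed- or random-effects model was used, so the pooling choice, the function names, and all counts are assumptions for illustration.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for one 2x2 table.

    a, b: cases with the risk / reference genotype
    c, d: controls with the risk / reference genotype
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def pooled_odds_ratio(tables):
    """Fixed-effect (inverse-variance) pooled odds ratio across studies."""
    num = den = 0.0
    for a, b, c, d in tables:
        log_or = log((a * d) / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
        num += weight * log_or
        den += weight
    return exp(num / den)


# Hypothetical genotype counts for two small case-control studies.
study_1 = (30, 70, 20, 80)
study_2 = (45, 55, 35, 65)
or_1, lo_1, hi_1 = odds_ratio_ci(*study_1)
pooled = pooled_odds_ratio([study_1, study_2])
print(f"study 1: OR = {or_1:.3f} (95% CI {lo_1:.3f}-{hi_1:.3f})")
print(f"pooled OR = {pooled:.3f}")
```

    An interval whose lower bound sits just above 1, as in several of the models reported above, indicates an association that is statistically significant but only narrowly so.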

  7. Protein-coding genes combined with long noncoding RNA as a novel transcriptome molecular staging model to predict the survival of patients with esophageal squamous cell carcinoma.

    PubMed

    Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan

    2018-04-09

    Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. This study aimed to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from the GSE63624 and GSE63622 datasets. Univariate Cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of PCG-lncRNA PCs when applied to new patients was better than that of the tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated as the LSB staging model using CART analysis in the GSE63624 dataset. This LSB staging model classified the GSE63622 dataset of patients into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients. The LSB staging model has clinical significance for the prognosis prediction of patients with ESCC and may serve as a three-gene staging microarray.
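
    The AUC values compared above can be read as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it. The sketch below computes AUC directly from that definition; the scores are hypothetical, and the study's own ROC procedure for survival data is not described in the abstract.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores for patients with and without the event.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(auc, 3))  # 0.889
```

    On this scale 0.5 corresponds to random ranking, so a difference such as 0.69 versus 0.65 is a modest gain in discrimination.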

  8. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
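
    To make the role of the regularization parameter concrete, the sketch below fits an L1-penalized logistic regression (the penalty corresponding to a Laplace prior) by proximal gradient descent. It is neither the Shevade and Keerthi training procedure nor BLogReg: the parameter `lam` is fixed by hand here, which is exactly the tuning step BLogReg is described as removing. The function names, step size, and toy data are assumptions.

```python
from math import exp


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_sparse_logreg(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 (bias is unpenalized)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy expression data: "gene 0" separates the classes, "gene 1" is noise.
X = [[2.0, 0.1], [1.5, -0.2], [1.0, 0.3],
     [-1.0, 0.2], [-1.5, -0.1], [-2.0, -0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_sparse_logreg(X, y, lam=0.1)
print(w)  # the weight on gene 1 is driven exactly to zero
```

    Larger values of `lam` zero out more weights and smaller values fewer, so in this formulation the selected gene set depends directly on how that parameter is tuned.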

  9. Map making in the 21st century: charting breast cancer susceptibility pathways in rodent models.

    PubMed

    Blackburn, Anneke C; Jerry, D Joseph

    2011-04-01

    Genetic factors play an important role in determining risk and resistance to increased breast cancer. Recent technological advances have made it possible to analyze hundreds of thousands of single nucleotide polymorphisms in large-scale association studies in humans and have resulted in identification of alleles in over 20 genes that influence breast cancer risk. Despite these advances, the challenge remains in identifying what the functional polymorphisms are that confer the increased risk, and how these genetic variants interact with each other and with environmental factors. In rodents, the incidence of mammary tumors varies among strains, such that they can provide alternate ideas for candidate pathways involved in humans. Mapping studies in animals have unearthed numerous loci for breast cancer susceptibility that have been validated in human populations. In a reciprocal manner, knockin and knockout mice have been used to validate the tumorigenicity of risk alleles found in population studies. Rodent studies also underscore the complexity of interactions among alleles. The fact that genes affecting risk and resistance to mammary tumors in rodents depend greatly upon the carcinogenic challenge emphasizes the importance of gene x environment interactions. The challenge to rodent geneticists now is to capitalize on the ability to control the genetics and environment in rodent models of tumorigenesis to better understand the biology of breast cancer development, to identify those polymorphisms most relevant to human susceptibility and to identify compensatory pathways that can be targeted for improved prevention in women at highest risk of developing breast cancer.

  10. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis

    NASA Astrophysics Data System (ADS)

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan

    2015-03-01

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.

  11. Directed natural product biosynthesis gene cluster capture and expression in the model bacterium Bacillus subtilis.

    PubMed

    Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S; Qian, Pei-Yuan

    2015-03-24

    Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.

  12. Transcriptome-wide single nucleotide polymorphisms (SNPs) for abalone (Haliotis midae): validation and application using GoldenGate medium-throughput genotyping assays.

    PubMed

    Bester-Van Der Merwe, Aletta; Blaauw, Sonja; Du Plessis, Jana; Roodt-Wilding, Rouvay

    2013-09-23

    Haliotis midae is one of the most valuable commercial abalone species in the world, but is highly vulnerable, due to exploitation, habitat destruction and predation. In order to preserve wild and cultured stocks, genetic management and improvement of the species has become crucial. Fundamental to this is the availability and employment of molecular markers, such as microsatellites and single nucleotide polymorphisms (SNPs). Transcriptome sequences generated through sequencing-by-synthesis technology were utilized for the in vitro and in silico identification of 505 putative SNPs from a total of 316 selected contigs. A subset of 234 SNPs were further validated and characterized in wild and cultured abalone using two Illumina GoldenGate genotyping assays. Combined with VeraCode technology, this genotyping platform yielded a 65%-69% conversion rate (percentage polymorphic markers) with a global genotyping success rate of 76%-85% and provided a viable means for validating SNP markers in a non-model species. The utility of 31 of the validated SNPs in population structure analysis was confirmed, while a large number of SNPs (174) were shown to be informative and are, thus, good candidates for linkage map construction. The non-synonymous SNPs (50) located in coding regions of genes that showed similarities with known proteins will also be useful for genetic applications, such as the marker-assisted selection of genes of relevance to abalone aquaculture.

  13. Functional and evolutionary insights from the Ciona notochord transcriptome.

    PubMed

    Reeves, Wendy M; Wu, Yuye; Harder, Matthew J; Veeman, Michael T

    2017-09-15

    The notochord of the ascidian Ciona consists of only 40 cells, and is a longstanding model for studying organogenesis in a small, simple embryo. Here, we perform RNAseq on flow-sorted notochord cells from multiple stages to define a comprehensive Ciona notochord transcriptome. We identify 1364 genes with enriched expression and extensively validate the results by in situ hybridization. These genes are highly enriched for Gene Ontology terms related to the extracellular matrix, cell adhesion and cytoskeleton. Orthologs of 112 of the Ciona notochord genes have known notochord expression in vertebrates, more than twice as many as predicted by chance alone. This set of putative effector genes with notochord expression conserved from tunicates to vertebrates will be invaluable for testing hypotheses about notochord evolution. The full set of Ciona notochord genes provides a foundation for systems-level studies of notochord gene regulation and morphogenesis. We find only modest overlap between this set of notochord-enriched transcripts and the genes upregulated by ectopic expression of the key notochord transcription factor Brachyury, indicating that Brachyury is not a notochord master regulator gene as strictly defined. © 2017. Published by The Company of Biologists Ltd.

  14. Mathematical modeling of gene expression: a guide for the perplexed biologist

    PubMed Central

    Ay, Ahmet; Arnosti, David N.

    2011-01-01

    The detailed analysis of transcriptional networks holds a key for understanding central biological processes, and interest in this field has exploded due to new large-scale data acquisition techniques. Mathematical modeling can provide essential insights, but the diversity of modeling approaches can be a daunting prospect to investigators new to this area. For those interested in beginning a transcriptional mathematical modeling project we provide here an overview of major types of models and their applications to transcriptional networks. In this discussion of recent literature on thermodynamic, Boolean and differential equation models we focus on considerations critical for choosing and validating a modeling approach that will be useful for quantitative understanding of biological systems. PMID:21417596
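
    As an illustration of the differential-equation class of models named in this abstract, the following is a minimal sketch of a one-gene model, dm/dt = alpha - delta * m, integrated with the forward Euler method. The function name, the rate constants alpha (production) and delta (degradation), and the numerical values are hypothetical choices for illustration; they are not taken from the paper.

```python
def simulate_mrna(alpha, delta, m0=0.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of dm/dt = alpha - delta * m.

    alpha: constant production rate (hypothetical value, arbitrary units)
    delta: first-order degradation rate (hypothetical value)
    Returns the list of mRNA levels at each time step, starting with m0.
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (alpha - delta * m)  # one Euler step
        trajectory.append(m)
    return trajectory


# Illustrative run: the level should relax toward the analytic
# steady state alpha / delta.
trajectory = simulate_mrna(alpha=2.0, delta=0.5)
steady_state = 2.0 / 0.5
```

    Comparing the end of the simulated trajectory with the analytic steady state alpha / delta is one simple way to check such a model before fitting it to data.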

  15. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data.

    PubMed

    Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O

    2015-08-25

    Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

  16. Selection and Validation of Appropriate Reference Genes for qRT-PCR Analysis in Isatis indigotica Fort.

    PubMed Central

    Li, Tao; Wang, Jing; Lu, Miao; Zhang, Tianyi; Qu, Xinyun; Wang, Zhezhi

    2017-01-01

    Due to its sensitivity and specificity, real-time quantitative PCR (qRT-PCR) is a popular technique for investigating gene expression levels in plants. Based on the Minimum Information for Publication of Real-Time Quantitative PCR Experiments (MIQE) guidelines, it is necessary to select and validate putative appropriate reference genes for qRT-PCR normalization. In the current study, three algorithms, geNorm, NormFinder, and BestKeeper, were applied to assess the expression stability of 10 candidate reference genes across five different tissues and three different abiotic stresses in Isatis indigotica Fort. Additionally, the IiYUC6 gene associated with IAA biosynthesis was applied to validate the candidate reference genes. The analysis results of the geNorm, NormFinder, and BestKeeper algorithms indicated certain differences for the different sample sets and different experiment conditions. Considering all of the algorithms, PP2A-4 and TUB4 were recommended as the most stable reference genes for total and different tissue samples, respectively. Moreover, RPL15 and PP2A-4 were considered to be the most suitable reference genes for abiotic stress treatments. The obtained experimental results might contribute to improved accuracy and credibility for the expression levels of target genes by qRT-PCR normalization in I. indigotica. PMID:28702046
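
    The kind of stability ranking produced by geNorm can be sketched as follows. This is a simplified illustration of the pairwise-variation idea (for each gene, the average standard deviation of its log2 expression ratios against every other candidate), not the authors' pipeline. The gene names and expression values are toy data invented for the example, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def genorm_m_values(expression):
    """Return a geNorm-style stability measure M for each gene.

    expression: dict mapping gene name -> list of relative expression
    quantities, with samples in the same order for every gene.
    A lower M means more stable expression relative to the other candidates.
    """
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # log2 ratio of gene j to gene k in every sample
            log_ratios = [
                math.log2(a / b) for a, b in zip(expression[j], expression[k])
            ]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Toy data (hypothetical): geneA and geneB keep a constant ratio,
# while geneC fluctuates relative to both.
toy_expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],
    "geneC": [1.0, 4.0, 2.0, 16.0],
}
m_values = genorm_m_values(toy_expression)
ranking = sorted(m_values, key=m_values.get)  # most stable first
```

    In this toy set, geneC ends up last in the ranking because its ratio to the other two candidates varies between samples.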

  17. Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation

    PubMed Central

    Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan

    2014-01-01

    The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848

  18. SurvExpress: an online biomarker validation tool and database for cancer gene expression data using survival analysis.

    PubMed

    Aguirre-Gamboa, Raul; Gomez-Rueda, Hugo; Martínez-Ledesma, Emmanuel; Martínez-Torteya, Antonio; Chacolla-Huaringa, Rafael; Rodriguez-Barrientos, Alberto; Tamez-Peña, José G; Treviño, Victor

    2013-01-01

    Validation of multi-gene biomarkers for clinical outcomes is one of the most important issues for cancer prognosis. An important source of information for virtual validation is the high number of available cancer datasets. Nevertheless, assessing the prognostic performance of a gene expression signature across datasets is a difficult task for Biologists and Physicians and also time-consuming for Statisticians and Bioinformaticians. Therefore, to facilitate performance comparisons and validations of survival biomarkers for cancer outcomes, we developed SurvExpress, a cancer-wide gene expression database with clinical outcomes and a web-based tool that provides survival analysis and risk assessment of cancer datasets. The main input of SurvExpress is only the biomarker gene list. We generated a cancer database collecting more than 20,000 samples and 130 datasets with censored clinical information covering tumors over 20 tissues. We implemented a web interface to perform biomarker validation and comparisons in this database, where a multivariate survival analysis can be accomplished in about one minute. We show the utility and simplicity of SurvExpress in two biomarker applications for breast and lung cancer. Compared to other tools, SurvExpress is the largest, most versatile, and quickest free tool available. SurvExpress web can be accessed at http://bioinformatica.mty.itesm.mx/SurvExpress (a tutorial is included). The website was implemented in JSP, JavaScript, MySQL, and R.

  19. SurvExpress: An Online Biomarker Validation Tool and Database for Cancer Gene Expression Data Using Survival Analysis

    PubMed Central

    Aguirre-Gamboa, Raul; Gomez-Rueda, Hugo; Martínez-Ledesma, Emmanuel; Martínez-Torteya, Antonio; Chacolla-Huaringa, Rafael; Rodriguez-Barrientos, Alberto; Tamez-Peña, José G.; Treviño, Victor

    2013-01-01

    Validation of multi-gene biomarkers for clinical outcomes is one of the most important issues for cancer prognosis. An important source of information for virtual validation is the high number of available cancer datasets. Nevertheless, assessing the prognostic performance of a gene expression signature across datasets is a difficult task for Biologists and Physicians and also time-consuming for Statisticians and Bioinformaticians. Therefore, to facilitate performance comparisons and validations of survival biomarkers for cancer outcomes, we developed SurvExpress, a cancer-wide gene expression database with clinical outcomes and a web-based tool that provides survival analysis and risk assessment of cancer datasets. The main input of SurvExpress is only the biomarker gene list. We generated a cancer database collecting more than 20,000 samples and 130 datasets with censored clinical information covering tumors over 20 tissues. We implemented a web interface to perform biomarker validation and comparisons in this database, where a multivariate survival analysis can be accomplished in about one minute. We show the utility and simplicity of SurvExpress in two biomarker applications for breast and lung cancer. Compared to other tools, SurvExpress is the largest, most versatile, and quickest free tool available. SurvExpress web can be accessed at http://bioinformatica.mty.itesm.mx/SurvExpress (a tutorial is included). The website was implemented in JSP, JavaScript, MySQL, and R. PMID:24066126

  20. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
    Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875

  1. Comparative analysis of predictive models for nongenotoxic hepatocarcinogenicity using both toxicogenomics and quantitative structure-activity relationships.

    PubMed

    Liu, Zhichao; Kelly, Reagan; Fang, Hong; Ding, Don; Tong, Weida

    2011-07-18

    The primary testing strategy to identify nongenotoxic carcinogens largely relies on the 2-year rodent bioassay, which is time-consuming and labor-intensive. There is an increasing effort to develop alternative approaches to prioritize the chemicals for, supplement, or even replace the cancer bioassay. In silico approaches based on quantitative structure-activity relationships (QSAR) are rapid and inexpensive and thus have been investigated for such purposes. A slightly more expensive approach based on short-term animal studies with toxicogenomics (TGx) represents another attractive option for this application. Thus, the primary questions are how much better predictive performance using short-term TGx models can be achieved compared to that of QSAR models, and what length of exposure is sufficient for high quality prediction based on TGx. In this study, we developed predictive models for rodent liver carcinogenicity using gene expression data generated from short-term animal models at different time points and QSAR. The study was focused on the prediction of nongenotoxic carcinogenicity since the genotoxic chemicals can be inexpensively removed from further development using various in vitro assays individually or in combination. We identified 62 chemicals whose hepatocarcinogenic potential was available from the National Center for Toxicological Research liver cancer database (NCTRlcdb). The gene expression profiles of liver tissue obtained from rats treated with these chemicals at different time points (1 day, 3 days, and 5 days) are available from the Gene Expression Omnibus (GEO) database. Both TGx and QSAR models were developed on the basis of the same set of chemicals using the same modeling approach, a nearest-centroid method with a minimum redundancy and maximum relevancy-based feature selection with performance assessed using compound-based 5-fold cross-validation. We found that the TGx models outperformed QSAR in every aspect of modeling. 
    For example, the TGx models' predictive accuracy (0.77, 0.77, and 0.82 for the 1-day, 3-day, and 5-day models, respectively) was much higher for an independent validation set than that of a QSAR model (0.55). Permutation tests confirmed the statistical significance of the model's prediction performance. The study concluded that a short-term 5-day TGx animal model holds the potential to predict nongenotoxic hepatocarcinogenicity. © 2011 American Chemical Society
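
    The evaluation scheme described in this abstract, a nearest-centroid classifier assessed by compound-based 5-fold cross-validation, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the minimum-redundancy, maximum-relevancy feature selection step is omitted, all function names are hypothetical, and the data are synthetic values invented for the example. The key point is that folds are assigned per compound, so replicate samples of one compound never appear in both the training and test sets.

```python
import random
from collections import defaultdict


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_X, train_y, test_X):
    """Assign each test vector to the class with the closest centroid."""
    by_class = defaultdict(list)
    for x, label in zip(train_X, train_y):
        by_class[label].append(x)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return [
        min(centroids, key=lambda c: squared_distance(x, centroids[c]))
        for x in test_X
    ]


def compound_kfold_accuracy(X, y, compounds, k=5, seed=0):
    """Cross-validated accuracy with folds assigned per compound."""
    unique = sorted(set(compounds))
    random.Random(seed).shuffle(unique)
    fold_of = {c: i % k for i, c in enumerate(unique)}
    correct = 0
    for fold in range(k):
        train = [i for i, c in enumerate(compounds) if fold_of[c] != fold]
        test = [i for i, c in enumerate(compounds) if fold_of[c] == fold]
        if not test:
            continue
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)


# Synthetic example (hypothetical data): 10 compounds, 2 replicate
# samples each, with two well-separated classes.
X, y, compounds = [], [], []
for c in range(10):
    label = "carcinogen" if c < 5 else "non-carcinogen"
    base = 5.0 if c < 5 else -5.0
    for rep in range(2):
        X.append([base + 0.1 * c, base - 0.1 * rep])
        y.append(label)
        compounds.append(f"compound_{c}")

accuracy = compound_kfold_accuracy(X, y, compounds)
```

    Splitting by compound rather than by sample avoids an optimistic bias that would arise if replicates of the same compound were split between training and test folds.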

  2. Model-based design of RNA hybridization networks implemented in living cells

    PubMed Central

    Rodrigo, Guillermo; Prakash, Satya; Shen, Shensi; Majer, Eszter

    2017-01-01

    Abstract Synthetic gene circuits allow the behavior of living cells to be reprogrammed, and non-coding small RNAs (sRNAs) are increasingly being used as programmable regulators of gene expression. However, sRNAs (natural or synthetic) are generally used to regulate single target genes, while complex dynamic behaviors would require networks of sRNAs regulating each other. Here, we report a strategy for implementing such networks that exploits hybridization reactions carried out exclusively by multifaceted sRNAs that are both targets of and triggers for other sRNAs. These networks are ultimately coupled to the control of gene expression. We relied on a thermodynamic model of the different stable conformational states underlying this system at the nucleotide level. To test our model, we designed five different RNA hybridization networks with a linear architecture, and we implemented them in Escherichia coli. We validated the network architecture at the molecular level by native polyacrylamide gel electrophoresis, as well as the network function at the bacterial population and single-cell levels with a fluorescent reporter. Our results suggest that it is possible to engineer complex cellular programs based on RNA from first principles. Because these networks are mainly based on physical interactions, our designs could be expanded to other organisms as portable regulatory resources or to implement biological computations. PMID:28934501

  3. Genetic modifiers of abnormal organelle biogenesis in a Drosophila model of BLOC-1 deficiency

    PubMed Central

    Cheli, Verónica T.; Daniels, Richard W.; Godoy, Ruth; Hoyle, Diego J.; Kandachar, Vasundhara; Starcevic, Marta; Martinez-Agosto, Julian A.; Poole, Stephen; DiAntonio, Aaron; Lloyd, Vett K.; Chang, Henry C.; Krantz, David E.; Dell'Angelica, Esteban C.

    2010-01-01

    Biogenesis of lysosome-related organelles complex 1 (BLOC-1) is a protein complex formed by the products of eight distinct genes. Loss-of-function mutations in two of these genes, DTNBP1 and BLOC1S3, cause Hermansky–Pudlak syndrome, a human disorder characterized by defective biogenesis of lysosome-related organelles. In addition, haplotype variants within the same two genes have been postulated to increase the risk of developing schizophrenia. However, the molecular function of BLOC-1 remains unknown. Here, we have generated a fly model of BLOC-1 deficiency. Mutant flies lacking the conserved Blos1 subunit displayed eye pigmentation defects due to abnormal pigment granules, which are lysosome-related organelles, as well as abnormal glutamatergic transmission and behavior. Epistatic analyses revealed that BLOC-1 function in pigment granule biogenesis requires the activities of BLOC-2 and a putative Rab guanine-nucleotide-exchange factor named Claret. The eye pigmentation phenotype was modified by misexpression of proteins involved in intracellular protein trafficking; in particular, the phenotype was partially ameliorated by Rab11 and strongly enhanced by the clathrin-disassembly factor, Auxilin. These observations validate Drosophila melanogaster as a powerful model for the study of BLOC-1 function and its interactions with modifier genes. PMID:20015953

  4. Lentiviral vector-based insertional mutagenesis identifies genes associated with liver cancer

    PubMed Central

    Ranzani, Marco; Cesana, Daniela; Bartholomae, Cynthia C.; Sanvito, Francesca; Pala, Mauro; Benedicenti, Fabrizio; Gallina, Pierangela; Sergi, Lucia Sergi; Merella, Stefania; Bulfone, Alessandro; Doglioni, Claudio; von Kalle, Christof; Kim, Yoon Jun; Schmidt, Manfred; Tonon, Giovanni; Naldini, Luigi; Montini, Eugenio

    2013-01-01

    Transposons and γ-retroviruses have been efficiently used as insertional mutagens in different tissues to identify molecular culprits of cancer. However, these systems are characterized by recurring integrations that accumulate in tumor cells, hampering the identification of early cancer-driving events amongst bystander and progression-related events. We developed an insertional mutagenesis platform based on lentiviral vectors (LVV) by which we could efficiently induce hepatocellular carcinoma (HCC) in 3 different mouse models. By virtue of LVV’s replication-deficient nature and broad genome-wide integration pattern, LVV-based insertional mutagenesis allowed identification of 4 new liver cancer genes from a limited number of integrations. We validated the oncogenic potential of all the identified genes in vivo, with different levels of penetrance. Our newly identified cancer genes are likely to play a role in human disease, since they are upregulated and/or amplified/deleted in human HCCs and can predict clinical outcome of patients. PMID:23314173

  5. ADHD-associated dopamine transporter, latrophilin and neurofibromin share a dopamine-related locomotor signature in Drosophila

    PubMed Central

    van der Voet, M; Harich, B; Franke, B; Schenck, A

    2016-01-01

    Attention-deficit/hyperactivity disorder (ADHD) is a common, highly heritable neuropsychiatric disorder with hyperactivity as one of the hallmarks. Aberrant dopamine signaling is thought to be a major theme in ADHD, but how this relates to the vast majority of ADHD candidate genes is elusive. Here we report a Drosophila dopamine-related locomotor endophenotype that is shared by pan-neuronal knockdown of orthologs of the ADHD-associated genes Dopamine transporter (DAT1) and Latrophilin (LPHN3), and of a gene causing a monogenic disorder with frequent ADHD comorbidity: Neurofibromin (NF1). The locomotor signature was not found in control models and could be ameliorated by methylphenidate, validating its relevance to symptoms of the disorder. The Drosophila ADHD endophenotype can be further exploited in high throughput to characterize the growing number of candidate genes. It represents an equally useful outcome measure for testing chemical compounds to define novel treatment options. PMID:25962619

  6. Global transcriptome analysis reveals extensive gene remodeling, alternative splicing and differential transcription profiles in non-seed vascular plant Selaginella moellendorffii.

    PubMed

    Zhu, Yan; Chen, Longxian; Zhang, Chengjun; Hao, Pei; Jing, Xinyun; Li, Xuan

    2017-01-25

    Selaginella moellendorffii, a lycophyte, is a model plant to study the early evolution and development of vascular plants. As the first and only sequenced lycophyte to date, the genome of S. moellendorffii revealed many conserved genes and pathways, as well as specialized genes different from flowering plants. Despite the progress made, little is known about long noncoding RNAs (lncRNA) and the alternative splicing (AS) of coding genes in S. moellendorffii. Its coding gene models have not been fully validated with transcriptome data. Furthermore, it remains important to understand whether the regulatory mechanisms similar to flowering plants are used, and how they operate in a non-seed primitive vascular plant. RNA-sequencing (RNA-seq) was performed for three S. moellendorffii tissues, root, stem, and leaf, by constructing strand-specific RNA-seq libraries from RNA purified using RiboMinus isolation protocol. A total of 176 million reads (44 Gbp) were obtained from three tissue types, and were mapped to S. moellendorffii genome. By comparing with 22,285 existing gene models of S. moellendorffii, we identified 7930 high-confidence novel coding genes (a 35.6% increase), and for the first time reported 4422 lncRNAs in a lycophyte. Further, we refined 2461 (11.0%) of existing gene models, and identified 11,030 AS events (for 5957 coding genes) revealed for the first time for lycophytes. Tissue-specific gene expression with functional implication was analyzed, and 1031, 554, and 269 coding genes, and 174, 39, and 17 lncRNAs were identified in root, stem, and leaf tissues, respectively. The expression of critical genes for vascular development stages, i.e. formation of provascular cells, xylem specification and differentiation, and phloem specification and differentiation, was compared in S. moellendorffii tissues, indicating a less complex regulatory mechanism in lycophytes than in flowering plants. 
    The results were further strengthened by the evolutionary trend of seven transcription factor families related to vascular development, which was observed among four representative species of seed and non-seed vascular plants, and nonvascular land and aquatic plants. The deep RNA-seq study of S. moellendorffii discovered extensive new gene content, including novel coding genes, lncRNAs, AS events, and refined gene models. Compared to flowering vascular plants, S. moellendorffii displayed less complexity in gene structure, alternative splicing, and regulatory elements of vascular development. The study offered important insight into the evolution of vascular plants, and the regulation mechanism of vascular development in a non-seed plant.

  7. An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    PubMed Central

    Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055

  8. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways.

    PubMed

    Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei

    2013-01-01

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.

  9. Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

    PubMed Central

    2011-01-01

    Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. PMID:21834981
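
    As a rough illustration of the model-assessment criterion named in the abstract above, the sketch below computes a generalized cross-validation score in the form commonly given for MARS, GCV = (RSS / N) / (1 - C / N)^2, where C is an effective number of parameters. The function name `gcv_score`, the toy values, and the way C is chosen are assumptions made for illustration; none of them is taken from the study.

```python
def gcv_score(y_true, y_pred, effective_params):
    """Generalized cross-validation score: (RSS / N) / (1 - C / N) ** 2.

    effective_params (C) is a hypothetical complexity penalty supplied by
    the caller; how it is derived from a fitted model is not shown here.
    """
    n = len(y_true)
    if n == 0 or len(y_pred) != n:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if not 0 <= effective_params < n:
        raise ValueError("effective_params must satisfy 0 <= C < N")
    # Residual sum of squares between observed and predicted responses.
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Inflate the mean squared error by the complexity penalty.
    return (rss / n) / (1.0 - effective_params / n) ** 2
```

    A lower score indicates a better trade-off between fit and model complexity, which is how competing enrichment-estimation inputs could be compared under this criterion.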

  10. Inference of quantitative models of bacterial promoters from time-series reporter gene data.

    PubMed

    Stefan, Diana; Pinel, Corinne; Pinhal, Stéphane; Cinquemani, Eugenio; Geiselmann, Johannes; de Jong, Hidde

    2015-01-01

    The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions on the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly-made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which protein concentrations have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks. In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.

  11. An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence

    PubMed Central

    Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.

    2008-01-01

    Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
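
    The hub-gene query described in the abstract above was expressed in SPARQL over an RDF knowledge base. As a loose analogue only, the sketch below counts distinct pathways per gene in plain Python. The function name `hub_genes`, the `min_pathways` threshold, and the placeholder identifiers in the usage note are hypothetical and do not come from the paper or from any of the named resources.

```python
from collections import defaultdict


def hub_genes(gene_pathway_pairs, min_pathways):
    """Return (gene, pathway_count) pairs for genes in at least min_pathways pathways.

    gene_pathway_pairs is an iterable of (gene_id, pathway_id) tuples;
    duplicate pairs are counted once.
    """
    pathways_by_gene = defaultdict(set)
    for gene, pathway in gene_pathway_pairs:
        pathways_by_gene[gene].add(pathway)
    hubs = [
        (gene, len(pathways))
        for gene, pathways in pathways_by_gene.items()
        if len(pathways) >= min_pathways
    ]
    # Most-connected genes first; ties broken alphabetically for stable output.
    return sorted(hubs, key=lambda item: (-item[1], item[0]))
```

    For instance, calling `hub_genes([("geneA", "p1"), ("geneA", "p2"), ("geneB", "p1")], 2)` keeps only `geneA`, since it is the only placeholder gene that appears in two pathways.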

  12. An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    PubMed

    Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P

    2008-10-01

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/

  13. Cross-species transcriptomic approach reveals genes in hamster implantation sites.

    PubMed

    Lei, Wei; Herington, Jennifer; Galindo, Cristi L; Ding, Tianbing; Brown, Naoko; Reese, Jeff; Paria, Bibhash C

    2014-12-01

    The mouse model has greatly contributed to understanding molecular mechanisms involved in the regulation of the progesterone (P4) plus estrogen (E)-dependent blastocyst implantation process. However, little is known about contributory molecular mechanisms of the P4-only-dependent blastocyst implantation process that occurs in species such as hamsters, guinea pigs, rabbits, pigs, rhesus monkeys, and perhaps humans. We used the hamster as a model of P4-only-dependent blastocyst implantation and carried out cross-species microarray (CSM) analyses to reveal differentially expressed genes at the blastocyst implantation site (BIS), in order to advance the understanding of molecular mechanisms of implantation. Upregulation of 112 genes and downregulation of 77 genes at the BIS were identified using a mouse microarray platform, while use of the human microarray revealed 62 up- and 38 down-regulated genes at the BIS. Excitingly, a sizable number of genes (30 up- and 11 down-regulated genes) were identified as a shared pool by both CSMs. Real-time RT-PCR and in situ hybridization validated the expression patterns of several up- and down-regulated genes identified by both CSMs at the hamster and mouse BIS to demonstrate the merit of CSM findings across species, in addition to revealing genes specific to hamsters. Functional annotation analysis found that genes involved in the spliceosome, proteasome, and ubiquitination pathways are enriched at the hamster BIS, while genes associated with tight junction, SAPK/JNK signaling, and PPARα/RXRα signaling are repressed at the BIS. Overall, this study provides a pool of genes and evidence of their participation in up- and down-regulated cellular functions/pathways at the hamster BIS. © 2014 Society for Reproduction and Fertility.

  14. Discovering time-lagged rules from microarray data using gene profile classifiers

    PubMed Central

    2011-01-01

    Background Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. Conclusions A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308

  15. PTEN Loss as Determined by Clinical-grade Immunohistochemistry Assay Is Associated with Worse Recurrence-free Survival in Prostate Cancer.

    PubMed

    Lotan, Tamara L; Wei, Wei; Morais, Carlos L; Hawley, Sarah T; Fazli, Ladan; Hurtado-Coll, Antonio; Troyer, Dean; McKenney, Jesse K; Simko, Jeffrey; Carroll, Peter R; Gleave, Martin; Lance, Raymond; Lin, Daniel W; Nelson, Peter S; Thompson, Ian M; True, Lawrence D; Feng, Ziding; Brooks, James D

    2016-06-01

    PTEN is the most commonly deleted tumor suppressor gene in primary prostate cancer (PCa) and its loss is associated with poor clinical outcomes and ERG gene rearrangement. We tested whether PTEN loss is associated with shorter recurrence-free survival (RFS) in surgically treated PCa patients with known ERG status. A genetically validated, automated PTEN immunohistochemistry (IHC) protocol was used for 1275 primary prostate tumors from the Canary Foundation retrospective PCa tissue microarray cohort to assess homogeneous (in all tumor tissue sampled) or heterogeneous (in a subset of tumor tissue sampled) PTEN loss. ERG status as determined by a genetically validated IHC assay was available for a subset of 938 tumors. Associations between PTEN and ERG status were assessed using Fisher's exact test. Kaplan-Meier and multivariate weighted Cox proportional models for RFS were constructed. When compared to intact PTEN, homogeneous (hazard ratio [HR] 1.66, p = 0.001) but not heterogeneous (HR 1.24, p = 0.14) PTEN loss was significantly associated with shorter RFS in multivariate models. Among ERG-positive tumors, homogeneous (HR 3.07, p < 0.0001) but not heterogeneous (HR 1.46, p = 0.10) PTEN loss was significantly associated with shorter RFS. Among ERG-negative tumors, PTEN did not reach significance for inclusion in the final multivariate models. The interaction term for PTEN and ERG status with respect to RFS did not reach statistical significance (p = 0.11) for the current sample size. These data suggest that PTEN is a useful prognostic biomarker and that there is no statistically significant interaction between PTEN and ERG status for RFS. We found that loss of the PTEN tumor suppressor gene in prostate tumors as assessed by tissue staining is correlated with shorter time to prostate cancer recurrence after radical prostatectomy.

  16. Systematic network assessment of the carcinogenic activities of cadmium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Peizhan; Duan, Xiaohua; Li, Mian

    Cadmium has been defined as a type I carcinogen for humans, but the underlying mechanisms of its carcinogenic activity and its influence on protein-protein interactions in cells are not fully elucidated. The aim of the current study was to evaluate, systematically, the carcinogenic activity of cadmium with systems biology approaches. From a literature search of 209 studies that were performed with cellular models, 208 proteins influenced by cadmium exposure were identified. All of these were assessed by Western blotting and were recognized as key nodes in network analyses. The protein-protein functional interaction networks were constructed with NetBox software and visualized with Cytoscape software. These cadmium-rewired genes were used to construct a scale-free, highly connected biological protein interaction network with 850 nodes and 8770 edges. Of the network, nine key modules were identified and 60 key signaling pathways, including the estrogen, RAS, PI3K-Akt, NF-κB, HIF-1α, Jak-STAT, and TGF-β signaling pathways, were significantly enriched. With breast cancer, colorectal and prostate cancer cellular models, we validated the key node genes in the network that had been previously reported or inferred from the network by Western blotting methods, including STAT3, JNK, p38, SMAD2/3, P65, AKT1, and HIF-1α. These results suggested the established network was robust and provided a systematic view of the carcinogenic activities of cadmium in humans. - Highlights: • A cadmium-influenced network with 850 nodes and 8770 edges was established. • The cadmium-rewired gene network was scale-free and highly connected. • Nine modules were identified, and 60 key signaling pathways related to cadmium-induced carcinogenesis were found. • Key mediators in the network were validated in multiple cellular models.

  17. Integrative molecular network analysis identifies emergent enzalutamide resistance mechanisms in prostate cancer

    PubMed Central

    King, Carly J.; Woodward, Josha; Schwartzman, Jacob; Coleman, Daniel J.; Lisac, Robert; Wang, Nicholas J.; Van Hook, Kathryn; Gao, Lina; Urrutia, Joshua; Dane, Mark A.; Heiser, Laura M.; Alumkal, Joshi J.

    2017-01-01

    Recent work demonstrates that castration-resistant prostate cancer (CRPC) tumors harbor countless genomic aberrations that control many hallmarks of cancer. While some specific mutations in CRPC may be actionable, many others are not. We hypothesized that genomic aberrations in cancer may operate in concert to promote drug resistance and tumor progression, and that organization of these genomic aberrations into therapeutically targetable pathways may improve our ability to treat CRPC. To identify the molecular underpinnings of enzalutamide-resistant CRPC, we performed transcriptional and copy number profiling studies using paired enzalutamide-sensitive and resistant LNCaP prostate cancer cell lines. Gene networks associated with enzalutamide resistance were revealed by performing an integrative genomic analysis with the PAthway Representation and Analysis by Direct Reference on Graphical Models (PARADIGM) tool. Amongst the pathways enriched in the enzalutamide-resistant cells were those associated with MEK, EGFR, RAS, and NFKB. Functional validation studies of 64 genes identified 10 candidate genes whose suppression led to greater effects on cell viability in enzalutamide-resistant cells as compared to sensitive parental cells. Examination of a patient cohort demonstrated that several of our functionally-validated gene hits are deregulated in metastatic CRPC tumor samples, suggesting that they may be clinically relevant therapeutic targets for patients with enzalutamide-resistant CRPC. Altogether, our approach demonstrates the potential of integrative genomic analyses to clarify determinants of drug resistance and rational co-targeting strategies to overcome resistance. PMID:29340039

  18. Chemical Memory Reactions Induced Bursting Dynamics in Gene Expression

    PubMed Central

    Tian, Tianhai

    2013-01-01

    Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memory phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions are a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool for studying memory processes and conditional chemical reactions in a wide range of complex biological systems. PMID:23349679
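
    For context on the conventional approach that the abstract above contrasts with, the sketch below is a standard, memoryless Gillespie stochastic simulation of a two-state gene that switches on and off and produces mRNA only while on. It is not the memory stochastic simulation algorithm proposed in the paper; the function name `gillespie_telegraph`, the rate parameters, and the values in the usage note are all illustrative assumptions.

```python
import random


def gillespie_telegraph(k_on, k_off, k_tx, k_deg, t_end, seed=0):
    """Simulate gene switching, transcription, and mRNA decay up to time t_end.

    Returns a list of (time, gene_on, mrna_count) tuples, one per event.
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, 0, 0
    trajectory = [(t, gene_on, mrna)]
    while True:
        # Propensities: activation, inactivation, transcription, degradation.
        rates = [k_on * (1 - gene_on), k_off * gene_on, k_tx * gene_on, k_deg * mrna]
        total = sum(rates)
        if total == 0:
            break  # No reaction can fire any more.
        t += rng.expovariate(total)  # Exponential waiting time to next event.
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its rate.
        threshold = rng.random() * total
        cumulative = 0.0
        for index, rate in enumerate(rates):
            cumulative += rate
            if threshold < cumulative:
                break
        if index == 0:
            gene_on = 1
        elif index == 1:
            gene_on = 0
        elif index == 2:
            mrna += 1
        elif mrna > 0:
            mrna -= 1
        trajectory.append((t, gene_on, mrna))
    return trajectory
```

    With slow activation and fast transcription, for example `gillespie_telegraph(0.1, 1.0, 20.0, 1.0, 50.0)`, mRNA tends to be produced in short episodes while the gene is on, which is the usual memoryless account of bursting.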

  19. Chemical memory reactions induced bursting dynamics in gene expression.

    PubMed

    Tian, Tianhai

    2013-01-01

    Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memory phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions are a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool for studying memory processes and conditional chemical reactions in a wide range of complex biological systems.

  20. Systems Perturbation Analysis of a Large-Scale Signal Transduction Model Reveals Potentially Influential Candidates for Cancer Therapeutics

    PubMed Central

    Puniya, Bhanwar Lal; Allen, Laura; Hochfelder, Colleen; Majumder, Mahbubul; Helikar, Tomáš

    2016-01-01

    Dysregulation in signal transduction pathways can lead to a variety of complex disorders, including cancer. Computational approaches such as network analysis are important tools to understand system dynamics as well as to identify critical components that could be further explored as therapeutic targets. Here, we performed perturbation analysis of a large-scale signal transduction model in extracellular environments that stimulate cell death, growth, motility, and quiescence. Each of the model’s components was perturbed under both loss-of-function and gain-of-function mutations. Using 1,300 simulations under both types of perturbations across various extracellular conditions, we identified the most and least influential components based on the magnitude of their influence on the rest of the system. Based on the premise that the most influential components might serve as better drug targets, we characterized them for biological functions, housekeeping genes, essential genes, and druggable proteins. The most influential components under all environmental conditions were enriched with several biological processes. The inositol pathway was found to be the most influential under inactivating perturbations, whereas the kinase and small lung cancer pathways were identified as the most influential under activating perturbations. The most influential components were enriched with essential genes and druggable proteins. Moreover, known cancer drug targets were also classified in influential components based on the affected components in the network. Additionally, the systemic perturbation analysis of the model revealed a network motif of most influential components which affect each other. Furthermore, our analysis predicted novel combinations of cancer drug targets with various effects on other most influential components. We found that the combinatorial perturbation consisting of PI3K inactivation and overactivation of IP3R1 can lead to increased activity levels of apoptosis-related components and tumor-suppressor genes, suggesting that this combinatorial perturbation may lead to a better target for decreasing cell proliferation and inducing apoptosis. Finally, our approach shows a potential to identify and prioritize therapeutic targets through systemic perturbation analysis of large-scale computational models of signal transduction. Although some components of the presented computational results have been validated against independent gene expression data sets, more laboratory experiments are warranted to more comprehensively validate the presented results. PMID:26904540

  1. Modeling antibiotic and cytotoxic effects of the dimeric isoquinoline IQ-143 on metabolism and its regulation in Staphylococcus aureus, Staphylococcus epidermidis and human cells

    PubMed Central

    2011-01-01

    Background Xenobiotics represent an environmental stress and as such are a source for antibiotics, including the isoquinoline (IQ) compound IQ-143. Here, we demonstrate the utility of complementary analysis of both host and pathogen datasets in assessing bacterial adaptation to IQ-143, a synthetic analog of the novel type N,C-coupled naphthyl-isoquinoline alkaloid ancisheynine. Results Metabolite measurements, gene expression data and functional assays were combined with metabolic modeling to assess the effects of IQ-143 on Staphylococcus aureus, Staphylococcus epidermidis and human cell lines, as a potential paradigm for novel antibiotics. Genome annotation and PCR validation identified novel enzymes in the primary metabolism of staphylococci. Gene expression response analysis and metabolic modeling demonstrated the adaptation of enzymes to IQ-143, including those not affected by significant gene expression changes. At lower concentrations, IQ-143 was bacteriostatic, and at higher concentrations bactericidal, while the analysis suggested that the mode of action was a direct interference in nucleotide and energy metabolism. Experiments in human cell lines supported the conclusions from pathway modeling and found that IQ-143 had low cytotoxicity. Conclusions The data suggest that IQ-143 is a promising lead compound for antibiotic therapy against staphylococci. The combination of gene expression and metabolite analyses with in silico modeling of metabolite pathways allowed us to study metabolic adaptations in detail and can be used for the evaluation of metabolic effects of other xenobiotics. PMID:21418624

  2. Mutant characterization and in vivo conditional repression identify aromatic amino acid biosynthesis to be essential for Aspergillus fumigatus virulence

    PubMed Central

    Sasse, Anna; Hamer, Stefanie N; Amich, Jorge; Binder, Jasmin; Krappmann, Sven

    2016-01-01

    Pathogenicity of the saprobe Aspergillus fumigatus strictly depends on nutrient acquisition during infection, as fungal growth determines colonisation and invasion of a susceptible host. Primary metabolism has to be considered as a valid target for antimycotic therapy, based on the fact that several fungal anabolic pathways are not conserved in higher eukaryotes. To test whether fungal proliferation during invasive aspergillosis relies on endogenous biosynthesis of aromatic amino acids, defined auxotrophic mutants of A. fumigatus were generated and assessed for their infectious capacities in neutropenic mice and found to be strongly attenuated in virulence. Moreover, essentiality of the complete biosynthetic pathway could be demonstrated, corroborated by conditional gene expression in infected animals and inhibitor studies. This brief report not only validates the aromatic amino acid biosynthesis pathway of A. fumigatus to be a promising antifungal target but furthermore demonstrates feasibility of conditional gene expression in a murine infection model of aspergillosis. PMID:26605426

  3. Workshop overview: approaches to the assessment of the allergenic potential of food from genetically modified crops.

    PubMed

    Ladics, Gregory S; Holsapple, Michael P; Astwood, James D; Kimber, Ian; Knippels, Leon M J; Helm, Ricki M; Dong, Wumin

    2003-05-01

    There is a need to assess the safety of foods deriving from genetically modified (GM) crops, including the allergenic potential of novel gene products. Presently, there is no single in vitro or in vivo model that has been validated for the identification or characterization of potential food allergens. Instead, the evaluation focuses on risk factors such as source of the gene (i.e., allergenic vs. nonallergenic sources), physicochemical and genetic comparisons to known allergens, and exposure assessments. The purpose of this workshop was to gather together researchers working on various strategies for assessing protein allergenicity: (1) to describe the current state of knowledge and progress that has been made in the development and evaluation of appropriate testing strategies and (2) to identify critical issues that must now be addressed. This overview begins with a consideration of the current issues involved in assessing the allergenicity of GM foods. The second section presents information on in vitro models of digestibility, bioinformatics, and risk assessment in the context of clinical prevention and management of food allergy. Data on rodent models are presented in the next two sections. Finally, nonrodent models for assessing protein allergenicity are discussed. Collectively, these studies indicate that significant progress has been made in developing testing strategies. However, further efforts are needed to evaluate and validate the sensitivity, specificity, and reproducibility of many of these assays for determining the allergenicity potential of GM foods.

  4. Differential susceptibility to plasticity: a 'missing link' between gene-culture co-evolution and neuropsychiatric spectrum disorders?

    PubMed Central

    2012-01-01

    Brüne's proposal that erstwhile 'vulnerability' genes need to be reconsidered as 'plasticity' genes, given the potential for certain environments to yield increased positive function in the same domain as potential dysfunction, has implications for psychiatric nosology as well as a more dynamic understanding of the relationship between genes and culture. In addition to validating neuropsychiatric spectrum disorder nosologies by calling for similar methodological shifts in gene-environment-interaction studies, Brüne's position elevates the importance of environmental contexts - inclusive of socio-cultural variables - as mechanisms that contribute to clinical presentation. We assert that when models of susceptibility to plasticity and neuropsychiatric spectrum disorders are concomitantly considered, a new line of inquiry emerges into the co-evolution and co-determination of socio-cultural contexts and endophenotypes. This presents potentially unique opportunities, benefits, challenges, and responsibilities for research and practice in psychiatry. Please see related manuscript: http://www.biomedcentral.com/1741-7015/10/38 PMID:22510307

  5. A Sleeping Beauty forward genetic screen identifies new genes and pathways driving osteosarcoma development and metastasis

    PubMed Central

    Moriarity, Branden S; Otto, George M; Rahrmann, Eric P; Rathe, Susan K; Wolf, Natalie K; Weg, Madison T; Manlove, Luke A; LaRue, Rebecca S; Temiz, Nuri A; Molyneux, Sam D; Choi, Kwangmin; Holly, Kevin J; Sarver, Aaron L; Scott, Milcah C; Forster, Colleen L; Modiano, Jaime F; Khanna, Chand; Hewitt, Stephen M; Khokha, Rama; Yang, Yi; Gorlick, Richard; Dyer, Michael A; Largaespada, David A

    2016-01-01

    Osteosarcomas are sarcomas of the bone, derived from osteoblasts or their precursors, with a high propensity to metastasize. Osteosarcoma is associated with massive genomic instability, making it problematic to identify driver genes using human tumors or prototypical mouse models, many of which involve loss of Trp53 function. To identify the genes driving osteosarcoma development and metastasis, we performed a Sleeping Beauty (SB) transposon-based forward genetic screen in mice with and without somatic loss of Trp53. Common insertion site (CIS) analysis of 119 primary tumors and 134 metastatic nodules identified 232 sites associated with osteosarcoma development and 43 sites associated with metastasis, respectively. Analysis of CIS-associated genes identified numerous known and new osteosarcoma-associated genes enriched in the ErbB, PI3K-AKT-mTOR and MAPK signaling pathways. Lastly, we identified several oncogenes involved in axon guidance, including Sema4d and Sema6d, which we functionally validated as oncogenes in human osteosarcoma. PMID:25961939

  6. Differential susceptibility to plasticity: a 'missing link' between gene-culture co-evolution and neuropsychiatric spectrum disorders?

    PubMed

    Wurzman, Rachel; Giordano, James

    2012-04-17

    Brüne's proposal that erstwhile 'vulnerability' genes need to be reconsidered as 'plasticity' genes, given the potential for certain environments to yield increased positive function in the same domain as potential dysfunction, has implications for psychiatric nosology as well as a more dynamic understanding of the relationship between genes and culture. In addition to validating neuropsychiatric spectrum disorder nosologies by calling for similar methodological shifts in gene-environment-interaction studies, Brüne's position elevates the importance of environmental contexts - inclusive of socio-cultural variables - as mechanisms that contribute to clinical presentation. We assert that when models of susceptibility to plasticity and neuropsychiatric spectrum disorders are concomitantly considered, a new line of inquiry emerges into the co-evolution and co-determination of socio-cultural contexts and endophenotypes. This presents potentially unique opportunities, benefits, challenges, and responsibilities for research and practice in psychiatry. Please see related manuscript: http://www.biomedcentral.com/1741-7015/10/38.

  7. Analysis of the dynamic co-expression network of heart regeneration in the zebrafish

    PubMed Central

    Rodius, Sophie; Androsova, Ganna; Götz, Lou; Liechti, Robin; Crespo, Isaac; Merz, Susanne; Nazarov, Petr V.; de Klein, Niek; Jeanty, Céline; González-Rosa, Juan M.; Muller, Arnaud; Bernardin, Francois; Niclou, Simone P.; Vallar, Laurent; Mercader, Nadia; Ibberson, Mark; Xenarios, Ioannis; Azuaje, Francisco

    2016-01-01

    The zebrafish has the capacity to regenerate its heart after severe injury. While the function of a few genes during this process has been studied, we are far from fully understanding how genes interact to coordinate heart regeneration. To enable systematic insights into this phenomenon, we generated and integrated a dynamic co-expression network of heart regeneration in the zebrafish and linked systems-level properties to the underlying molecular events. Across multiple post-injury time points, the network displays topological attributes of biological relevance. We show that regeneration steps are mediated by modules of transcriptionally coordinated genes, and by genes acting as network hubs. We also established direct associations between hubs and validated drivers of heart regeneration with murine and human orthologs. The resulting models and interactive analysis tools are available at http://infused.vital-it.ch. Using a worked example, we demonstrate the usefulness of this unique open resource for hypothesis generation and in silico screening for genes involved in heart regeneration. PMID:27241320

  8. Circuit-wide Transcriptional Profiling Reveals Brain Region-Specific Gene Networks Regulating Depression Susceptibility.

    PubMed

    Bagot, Rosemary C; Cates, Hannah M; Purushothaman, Immanuel; Lorsch, Zachary S; Walker, Deena M; Wang, Junshi; Huang, Xiaojie; Schlüter, Oliver M; Maze, Ian; Peña, Catherine J; Heller, Elizabeth A; Issler, Orna; Wang, Minghui; Song, Won-Min; Stein, Jason L; Liu, Xiaochuan; Doyle, Marie A; Scobie, Kimberly N; Sun, Hao Sheng; Neve, Rachael L; Geschwind, Daniel; Dong, Yan; Shen, Li; Zhang, Bin; Nestler, Eric J

    2016-06-01

    Depression is a complex, heterogeneous disorder and a leading contributor to the global burden of disease. Most previous research has focused on individual brain regions and genes contributing to depression. However, emerging evidence in humans and animal models suggests that dysregulated circuit function and gene expression across multiple brain regions drive depressive phenotypes. Here, we performed RNA sequencing on four brain regions from control animals and those susceptible or resilient to chronic social defeat stress at multiple time points. We employed an integrative network biology approach to identify transcriptional networks and key driver genes that regulate susceptibility to depressive-like symptoms. Further, we validated in vivo several key drivers and their associated transcriptional networks that regulate depression susceptibility and confirmed their functional significance at the levels of gene transcription, synaptic regulation, and behavior. Our study reveals novel transcriptional networks that control stress susceptibility and offers fundamentally new leads for antidepressant drug discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. An Arabidopsis gene regulatory network for secondary cell wall synthesis

    DOE PAGES

    Taylor-Teeples, M.; Lin, L.; de Lucas, M.; ...

    2014-12-24

    The plant cell wall is an important factor for determining cell shape, function and response to the environment. Secondary cell walls, such as those found in xylem, are composed of cellulose, hemicelluloses and lignin and account for the bulk of plant biomass. The coordination between transcriptional regulation of synthesis for each polymer is complex and vital to cell function. A regulatory hierarchy of developmental switches has been proposed, although the full complement of regulators remains unknown. In this paper, we present a protein–DNA network between Arabidopsis thaliana transcription factors and secondary cell wall metabolic genes with gene expression regulated by a series of feed-forward loops. This model allowed us to develop and validate new hypotheses about secondary wall gene regulation under abiotic stress. Distinct stresses are able to perturb targeted genes to potentially promote functional adaptation. Finally, these interactions will serve as a foundation for understanding the regulation of a complex, integral plant component.

  10. Analysis of the dynamic co-expression network of heart regeneration in the zebrafish

    NASA Astrophysics Data System (ADS)

    Rodius, Sophie; Androsova, Ganna; Götz, Lou; Liechti, Robin; Crespo, Isaac; Merz, Susanne; Nazarov, Petr V.; de Klein, Niek; Jeanty, Céline; González-Rosa, Juan M.; Muller, Arnaud; Bernardin, Francois; Niclou, Simone P.; Vallar, Laurent; Mercader, Nadia; Ibberson, Mark; Xenarios, Ioannis; Azuaje, Francisco

    2016-05-01

    The zebrafish has the capacity to regenerate its heart after severe injury. While the function of a few genes during this process has been studied, we are far from fully understanding how genes interact to coordinate heart regeneration. To enable systematic insights into this phenomenon, we generated and integrated a dynamic co-expression network of heart regeneration in the zebrafish and linked systems-level properties to the underlying molecular events. Across multiple post-injury time points, the network displays topological attributes of biological relevance. We show that regeneration steps are mediated by modules of transcriptionally coordinated genes, and by genes acting as network hubs. We also established direct associations between hubs and validated drivers of heart regeneration with murine and human orthologs. The resulting models and interactive analysis tools are available at http://infused.vital-it.ch. Using a worked example, we demonstrate the usefulness of this unique open resource for hypothesis generation and in silico screening for genes involved in heart regeneration.

  11. The Genome of the Western Clawed Frog Xenopus tropicalis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.

    2009-10-01

    The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.

  12. Conserved gene regulation during acute inflammation between zebrafish and mammals

    PubMed Central

    Forn-Cuní, G.; Varela, M.; Pereiro, P.; Novoa, B.; Figueras, A.

    2017-01-01

    Zebrafish (Danio rerio), largely used as a model for studying developmental processes, has also emerged as a valuable system for modelling human inflammatory diseases. However, in a context where even mice have been questioned as a valid model for these analyses, a systematic study evaluating the reproducibility of human and mammalian inflammatory diseases in zebrafish is still lacking. In this report, we characterize the transcriptomic regulation to lipopolysaccharide in adult zebrafish kidney, liver, and muscle tissues using microarrays and demonstrate how the zebrafish genomic responses can effectively reproduce the mammalian inflammatory process induced by acute endotoxin stress. We provide evidence that immune signaling pathways and single gene expression are well conserved throughout evolution and that the zebrafish and mammal acute genomic responses after lipopolysaccharide stimulation are highly correlated despite the differential susceptibility between species to that compound. Therefore, we formally confirm that zebrafish inflammatory models are suited to study the basic mechanisms of inflammation in human inflammatory diseases, with great translational impact potential. PMID:28157230

  13. A de novo transcriptome and valid reference genes for quantitative real-time PCR in Colaphellus bowringi.

    PubMed

    Tan, Qian-Qian; Zhu, Li; Li, Yi; Liu, Wen; Ma, Wei-Hua; Lei, Chao-Liang; Wang, Xiao-Ping

    2015-01-01

    The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers and undergoes reproductive diapause in soil. An understanding of the molecular mechanisms of diapause regulation, insecticide resistance, and other physiological processes is helpful for developing new management strategies for this beetle. However, the lack of genomic information and valid reference genes limits knowledge on the molecular bases of these physiological processes in this species. Using Illumina sequencing, we obtained more than 57 million sequence reads derived from C. bowringi, which were assembled into 39,390 unique sequences. A Clusters of Orthologous Groups classification was obtained for 9,048 of these sequences, covering 25 categories, and 16,951 were assigned to 255 Kyoto Encyclopedia of Genes and Genomes pathways. Eleven candidate reference gene sequences from the transcriptome were then identified through reverse transcriptase polymerase chain reaction. Among these candidate genes, EF1α, ACT1, and RPL19 proved to be the most stable reference genes for different reverse transcriptase quantitative polymerase chain reaction experiments in C. bowringi. Conversely, aTUB and GAPDH were the least stable reference genes. The abundant putative C. bowringi transcript sequences reported enrich the genomic resources of this beetle. Importantly, the larger number of gene sequences and valid reference genes provide a valuable platform for future gene expression studies, especially with regard to exploring the molecular mechanisms of different physiological processes in this species.

  14. Down-regulation of miR-146a-5p and its potential targets in hepatocellular carcinoma validated by a TCGA- and GEO-based study.

    PubMed

    Zhang, Xin; Ye, Zhi-Hua; Liang, Hai-Wei; Ren, Fang-Hui; Li, Ping; Dang, Yi-Wu; Chen, Gang

    2017-04-01

    Our previous research has demonstrated that miR-146a-5p is down-regulated in hepatocellular carcinoma (HCC) and might play a tumor-suppressive role. In this study, we sought to validate the decreased expression with a larger cohort and to explore potential molecular mechanisms. GEO and TCGA databases were used to gather miR-146a-5p expression data in HCC, which included 762 HCC and 454 noncancerous liver tissues. A meta-analysis of the GEO-based microarrays, TCGA-based RNA-seq data, and additional qRT-PCR data validated the down-regulation of miR-146a-5p in HCC and no publication bias was observed. Integrated genes were generated by overlapping miR-146a-5p-related genes from predicted and formerly reported HCC-related genes using natural language processing. The overlaps were comprehensively analyzed to discover the potential gene signatures, regulatory pathways, and networks of miR-146a-5p in HCC. A total of 251 miR-146a-5p potential target genes were predicted by bioinformatics platforms and 104 genes were considered as both HCC- and miR-146a-5p-related overlaps. RAC1 was the most connected hub gene for miR-146a-5p and four pathways with high enrichment (VEGF signaling pathway, adherens junction, toll-like receptor signaling pathway, and neurotrophin signaling pathway) were denoted for the overlapped genes. The down-regulation of miR-146a-5p in HCC has been validated with the most complete data possible. The potential gene signatures, regulatory pathways, and networks identified for miR-146a-5p in HCC could prove useful for molecular-targeted diagnostics and therapeutics.

  15. Preclinical modeling highlights the therapeutic potential of hematopoietic stem cell gene editing for correction of SCID-X1.

    PubMed

    Schiroli, Giulia; Ferrari, Samuele; Conway, Anthony; Jacob, Aurelien; Capo, Valentina; Albano, Luisa; Plati, Tiziana; Castiello, Maria C; Sanvito, Francesca; Gennery, Andrew R; Bovolenta, Chiara; Palchaudhuri, Rahul; Scadden, David T; Holmes, Michael C; Villa, Anna; Sitia, Giovanni; Lombardo, Angelo; Genovese, Pietro; Naldini, Luigi

    2017-10-11

    Targeted genome editing in hematopoietic stem/progenitor cells (HSPCs) is an attractive strategy for treating immunohematological diseases. However, the limited efficiency of homology-directed editing in primitive HSPCs constrains the yield of corrected cells and might affect the feasibility and safety of clinical translation. These concerns need to be addressed in stringent preclinical models and overcome by developing more efficient editing methods. We generated a humanized X-linked severe combined immunodeficiency (SCID-X1) mouse model and evaluated the efficacy and safety of hematopoietic reconstitution from limited input of functional HSPCs, establishing thresholds for full correction upon different types of conditioning. Unexpectedly, conditioning before HSPC infusion was required to protect the mice from lymphoma developing when transplanting small numbers of progenitors. We then designed a one-size-fits-all IL2RG (interleukin-2 receptor common γ-chain) gene correction strategy and, using the same reagents suitable for correction of human HSPC, validated the edited human gene in the disease model in vivo, providing evidence of targeted gene editing in mouse HSPCs and demonstrating the functionality of the IL2RG-edited lymphoid progeny. Finally, we optimized editing reagents and protocol for human HSPCs and attained the threshold of IL2RG editing in long-term repopulating cells predicted to safely rescue the disease, using clinically relevant HSPC sources and highly specific zinc finger nucleases or CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR-associated protein 9). Overall, our work establishes the rationale and guiding principles for clinical translation of SCID-X1 gene editing and provides a framework for developing gene correction for other diseases. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  16. Experimental validation of predicted cancer genes using FRET

    NASA Astrophysics Data System (ADS)

    Guala, Dimitri; Bernhem, Kristoffer; Ait Blal, Hammou; Jans, Daniel; Lundberg, Emma; Brismar, Hjalmar; Sonnhammer, Erik L. L.

    2018-07-01

    Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.

  17. Neuroprotective changes of thalamic degeneration-related gene expression by acupuncture in an MPTP mouse model of parkinsonism: microarray analysis.

    PubMed

    Yeo, Sujung; Choi, Yeong-Gon; Hong, Yeon-Mi; Lim, Sabina

    2013-02-25

    Acupuncture stimulations at GB34 and LR3 inhibit the reduction of tyrosine hydroxylase in the nigrostriatal dopaminergic neurons in the parkinsonism animal models. Especially, behavioral tests showed that acupuncture stimulations improved the motor dysfunction in a previous study by almost 87.7%. The thalamus is a crucial area for the motor circuit and has been identified as one of the most markedly damaged areas in Parkinson's disease (PD), so acupuncture stimulations might also have an effect on the thalamic damage. In this study, gene expression changes following acupuncture at the acupoints were investigated in the thalamus of a 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP)-induced parkinsonism model using a whole transcript array. It was confirmed that acupuncture at these acupoints could inhibit the decrease of tyrosine hydroxylase in the thalamic regions of the MPTP model, while acupuncture at the non-acupoints could not suppress this decrease to the level seen at the acupoints. GeneChip gene array analysis showed that 18 (5 annotated genes: Dnase1l2, Dusp4, Mafg, Ndph and Pgm5) of the probes down-regulated in MPTP, as compared to the control, were exclusively up-regulated by acupuncture at the acupoints, but not at the non-acupoints. In addition, 14 (3 annotated genes: Serinc2, Sp2 and Ucp2) of the probes up-regulated in MPTP, as compared to the control, were exclusively down-regulated by acupuncture at the acupoints, but not at the non-acupoints. The expression levels of the representative genes in the microarray were validated by real-time RT-PCR. These results suggest that the 32 probes (8 annotated genes) which are affected by MPTP and acupuncture may be responsible for exerting the inhibitory effect of acupuncture in the thalamus which can be damaged by MPTP intoxication. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Genome-Wide Screening and Characterization of the Dof Gene Family in Physic Nut (Jatropha curcas L.).

    PubMed

    Wang, Peipei; Li, Jing; Gao, Xiaoyang; Zhang, Di; Li, Anlin; Liu, Changning

    2018-05-29

    Physic nut (Jatropha curcas L.) is a species of flowering plant with great potential for biofuel production and as an emerging model organism for functional genomic analysis, particularly in the Euphorbiaceae family. DNA binding with one finger (Dof) transcription factors play critical roles in numerous biological processes in plants. Nevertheless, the knowledge about members, and the evolutionary and functional characteristics of the Dof gene family in physic nut is insufficient. Therefore, we performed a genome-wide screening and characterization of the Dof gene family within the physic nut draft genome. In total, 24 JcDof genes (encoding 33 JcDof proteins) were identified. All the JcDof genes were divided into three major groups based on phylogenetic inference, which was further validated by the subsequent gene structure and motif analysis. Genome comparison revealed that segmental duplication may have played crucial roles in the expansion of the JcDof gene family, and gene expansion was mainly subjected to positive selection. The expression profile demonstrated the broad involvement of JcDof genes in response to various abiotic stresses, hormonal treatments and functional divergence. This study provides valuable information for better understanding the evolution of JcDof genes, and lays a foundation for future functional exploration of JcDof genes.

  19. Genome-wide identification of bacterial plant colonization genes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cole, Benjamin J.; Feltcher, Meghan E.; Waters, Robert J.

    Diverse soil-resident bacteria can contribute to plant growth and health, but the molecular mechanisms enabling them to effectively colonize their plant hosts remain poorly understood. We used randomly barcoded transposon mutagenesis sequencing (RB-TnSeq) in Pseudomonas simiae, a model root-colonizing bacterium, to establish a genome-wide map of bacterial genes required for colonization of the Arabidopsis thaliana root system. We identified 115 genes (2% of all P. simiae genes) with functions that are required for maximal competitive colonization of the root system. Among the genes we identified were some with obvious colonization-related roles in motility and carbon metabolism, as well as 44 other genes that had no or vague functional predictions. Independent validation assays of individual genes confirmed colonization functions for 20 of 22 (91%) cases tested. To further characterize genes identified by our screen, we compared the functional contributions of P. simiae genes to growth in 90 distinct in vitro conditions by RB-TnSeq, highlighting specific metabolic functions associated with root colonization genes. Here, our analysis of bacterial genes by sequence-driven saturation mutagenesis revealed a genome-wide map of the genetic determinants of plant root colonization and offers a starting point for targeted improvement of the colonization capabilities of plant-beneficial microbes.

  20. Genome-wide identification of bacterial plant colonization genes

    DOE PAGES

    Cole, Benjamin J.; Feltcher, Meghan E.; Waters, Robert J.; ...

    2017-09-22

    Diverse soil-resident bacteria can contribute to plant growth and health, but the molecular mechanisms enabling them to effectively colonize their plant hosts remain poorly understood. We used randomly barcoded transposon mutagenesis sequencing (RB-TnSeq) in Pseudomonas simiae, a model root-colonizing bacterium, to establish a genome-wide map of bacterial genes required for colonization of the Arabidopsis thaliana root system. We identified 115 genes (2% of all P. simiae genes) with functions that are required for maximal competitive colonization of the root system. Among the genes we identified were some with obvious colonization-related roles in motility and carbon metabolism, as well as 44 other genes that had no or vague functional predictions. Independent validation assays of individual genes confirmed colonization functions for 20 of 22 (91%) cases tested. To further characterize genes identified by our screen, we compared the functional contributions of P. simiae genes to growth in 90 distinct in vitro conditions by RB-TnSeq, highlighting specific metabolic functions associated with root colonization genes. Here, our analysis of bacterial genes by sequence-driven saturation mutagenesis revealed a genome-wide map of the genetic determinants of plant root colonization and offers a starting point for targeted improvement of the colonization capabilities of plant-beneficial microbes.
