Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis
2016-09-02
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine (GC) content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the numbers of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals.
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
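The partial-correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes Pearson correlation and uses the standard recursive formula for removing the linear effect of control variables. The function names and the toy vectors are hypothetical; in the study's setting, `x` would be a gene property, `y` a measure of phylogenetic signal, and the controls would be evolutionary rate and alignment length.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y after controlling for each sequence in `controls`.

    Uses the recursion
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied once per control variable.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values (hypothetical): x and y are related only through z.
z = [1, 1, 1, 1, -1, -1, -1, -1]
w = [1, -1, -1, 1, 1, -1, -1, 1]
x = [2, 2, 0, 0, 0, 0, -2, -2]
y = [2, 0, 2, 0, 0, -2, 0, -2]
print("raw correlation:", pearson(x, y))
print("controlling for z and w:", partial_corr(x, y, [z, w]))
```

In this toy case the raw correlation between `x` and `y` is entirely explained by `z`, which is the kind of pattern the partial-correlation analysis is meant to expose.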
A three-gene expression signature model for risk stratification of patients with neuroblastoma.
Garcia, Idoia; Mayol, Gemma; Ríos, José; Domenech, Gema; Cheung, Nai-Kong V; Oberthuer, André; Fischer, Matthias; Maris, John M; Brodeur, Garrett M; Hero, Barbara; Rodríguez, Eva; Suñol, Mariona; Galvan, Patricia; de Torres, Carmen; Mora, Jaume; Lavarino, Cinzia
2012-04-01
Neuroblastoma is an embryonal tumor with contrasting clinical courses. Despite elaborate stratification strategies, precise clinical risk assessment still remains a challenge. The purpose of this study was to develop a PCR-based predictor model to improve clinical risk assessment of patients with neuroblastoma. The model was developed using real-time PCR gene expression data from 96 samples and tested on separate expression data sets obtained from real-time PCR and microarray studies comprising 362 patients. On the basis of our prior study of differentially expressed genes in favorable and unfavorable neuroblastoma subgroups, we identified three genes, CHD5, PAFAH1B1, and NME1, strongly associated with patient outcome. The expression pattern of these genes was used to develop a PCR-based single-score predictor model. The model discriminated patients into two groups with significantly different clinical outcome [set 1: 5-year overall survival (OS): 0.93 ± 0.03 vs. 0.53 ± 0.06, 5-year event-free survival (EFS): 0.85 ± 0.04 vs. 0.042 ± 0.06, both P < 0.001; set 2 OS: 0.97 ± 0.02 vs. 0.61 ± 0.1, P = 0.005, EFS: 0.91 ± 0.8 vs. 0.56 ± 0.1, P = 0.005; and set 3 OS: 0.99 ± 0.01 vs. 0.56 ± 0.06, EFS: 0.96 ± 0.02 vs. 0.43 ± 0.05, both P < 0.001]. Multivariate analysis showed that the model was an independent marker for survival (P < 0.001, for all). In comparison with accepted risk stratification systems, the model robustly classified patients in the total cohort and in different clinically relevant risk subgroups. We propose for the first time in neuroblastoma, a technically simple PCR-based predictor model that could help refine current risk stratification systems. ©2012 AACR.
A Three-Gene Expression Signature Model for Risk Stratification of Patients with Neuroblastoma
Garcia, Idoia; Mayol, Gemma; Ríos, José; Domenech, Gema; Cheung, Nai-Kong V.; Oberthuer, André; Fischer, Matthias; Maris, John M.; Brodeur, Garrett M.; Hero, Barbara; Rodríguez, Eva; Suñol, Mariona; Galvan, Patricia; de Torres, Carmen; Mora, Jaume; Lavarino, Cinzia
2014-01-01
Purpose: Neuroblastoma is an embryonal tumor with contrasting clinical courses. Despite elaborate stratification strategies, precise clinical risk assessment still remains a challenge. The purpose of this study was to develop a PCR-based predictor model to improve clinical risk assessment of patients with neuroblastoma. Experimental Design: The model was developed using real-time PCR gene expression data from 96 samples and tested on separate expression data sets obtained from real-time PCR and microarray studies comprising 362 patients. Results: On the basis of our prior study of differentially expressed genes in favorable and unfavorable neuroblastoma subgroups, we identified three genes, CHD5, PAFAH1B1, and NME1, strongly associated with patient outcome. The expression pattern of these genes was used to develop a PCR-based single-score predictor model. The model discriminated patients into two groups with significantly different clinical outcome [set 1: 5-year overall survival (OS): 0.93 ± 0.03 vs. 0.53 ± 0.06, 5-year event-free survival (EFS): 0.85 ± 0.04 vs. 0.042 ± 0.06, both P < 0.001; set 2 OS: 0.97 ± 0.02 vs. 0.61 ± 0.1, P = 0.005, EFS: 0.91 ± 0.8 vs. 0.56 ± 0.1, P = 0.005; and set 3 OS: 0.99 ± 0.01 vs. 0.56 ± 0.06, EFS: 0.96 ± 0.02 vs. 0.43 ± 0.05, both P < 0.001]. Multivariate analysis showed that the model was an independent marker for survival (P < 0.001, for all). In comparison with accepted risk stratification systems, the model robustly classified patients in the total cohort and in different clinically relevant risk subgroups. Conclusion: We propose, for the first time in neuroblastoma, a technically simple PCR-based predictor model that could help refine current risk stratification systems. PMID:22328561
In vitro transcriptomic prediction of hepatotoxicity for early drug discovery
Cheng, Feng; Theodorescu, Dan; Schulman, Ira G.; Lee, Jae K.
2012-01-01
Liver toxicity (hepatotoxicity) is a critical issue in drug discovery and development. Standard preclinical evaluation of drug hepatotoxicity is generally performed using in vivo animal systems. However, only a small number of preselected compounds can be examined in vivo due to high experimental costs. A more efficient yet accurate screening technique which can identify potentially hepatotoxic compounds in the early stages of drug development would thus be valuable. Here, we develop and apply a novel genomic prediction technique for screening hepatotoxic compounds based on in vitro human liver cell tests. Using a training set of in vivo rodent experiments for drug hepatotoxicity evaluation, we discovered common biomarkers of drug-induced liver toxicity among six heterogeneous compounds. This gene set was further triaged to a subset of 32 genes that can be used as a multi-gene expression signature to predict hepatotoxicity. This multi-gene predictor was independently validated and showed consistently high prediction performance on five test sets of in vitro human liver cell and in vivo animal toxicity experiments. The predictor also demonstrated utility in evaluating different degrees of toxicity in response to drug concentrations which may be useful not only for discerning a compound’s general hepatotoxicity but also for determining its toxic concentration. PMID:21884709
Johnson, Brent A
2009-10-01
We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such a partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables, but the clinical effects are assumed to be nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database: new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Hieke, Stefanie; Benner, Axel; Schlenk, Richard F; Schumacher, Martin; Bullinger, Lars; Binder, Harald
2016-08-30
High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task will often be complicated by a limited overlap of molecular measurements at different levels between patients, i.e. measurements from all molecular levels are available only for a smaller proportion of patients. We propose a sequential strategy for building clinical risk prediction models that integrate genome-wide measurements from two molecular levels in a complementary way. To deal with partial overlap, we develop an imputation approach that allows us to use all available data. This approach is investigated in two acute myeloid leukemia applications combining gene expression with either SNP or DNA methylation data. First, a sparse risk prediction signature is obtained from one molecular level by componentwise likelihood-based boosting (e.g. an automatically selected set of prognostic SNPs from SNP data). The corresponding linear predictor is then imputed by a linking model that incorporates measurements from the other level (e.g. gene expression). The imputed linear predictor is then used for adjustment when building a prognostic signature from the gene expression data. For evaluation, we consider stability, as quantified by inclusion frequencies across resampling data sets. Despite an extremely small overlap in the application example with gene expression and SNPs, several genes are seen to be more stably identified when taking the (imputed) linear predictor from the SNP data into account.
In the application with gene expression and DNA methylation, prediction performance with respect to survival also indicates that the proposed approach might work well. We consider imputation of linear predictor values to be a feasible and sensible approach for dealing with partial overlap in complementary integrative analysis of molecular measurements at different levels. More generally, these results indicate that a complementary strategy for integrating different molecular levels can result in more stable risk prediction signatures, potentially providing a more reliable insight into the underlying biology.
Oberthuer, André; Berthold, Frank; Warnat, Patrick; Hero, Barbara; Kahlert, Yvonne; Spitz, Rüdiger; Ernestus, Karen; König, Rainer; Haas, Stefan; Eils, Roland; Schwab, Manfred; Brors, Benedikt; Westermann, Frank; Fischer, Matthias
2006-11-01
To develop a gene expression-based classifier for neuroblastoma patients that reliably predicts courses of the disease. Two hundred fifty-one neuroblastoma specimens were analyzed using a customized oligonucleotide microarray comprising 10,163 probes for transcripts with differential expression in clinical subgroups of the disease. Subsequently, the prediction analysis for microarrays (PAM) was applied to a first set of patients with maximally divergent clinical courses (n = 77). The classification accuracy was estimated by a complete 10-times-repeated 10-fold cross validation, and a 144-gene predictor was constructed from this set. This classifier's predictive power was evaluated in an independent second set (n = 174) by comparing results of the gene expression-based classification with those of risk stratification systems of current trials from Germany, Japan, and the United States. The first set of patients was accurately predicted by PAM (cross-validated accuracy, 99%). Within the second set, the PAM classifier significantly separated cohorts with distinct courses (3-year event-free survival [EFS] 0.86 +/- 0.03 [favorable; n = 115] v 0.52 +/- 0.07 [unfavorable; n = 59] and 3-year overall survival 0.99 +/- 0.01 v 0.84 +/- 0.05; both P < .0001) and separated risk groups of current neuroblastoma trials into subgroups with divergent outcome (NB2004: low-risk 3-year EFS 0.86 +/- 0.04 v 0.25 +/- 0.15, P < .0001; intermediate-risk 1.00 v 0.57 +/- 0.19, P = .018; high-risk 0.81 +/- 0.10 v 0.56 +/- 0.08, P = .06). In a multivariate Cox regression model, the PAM predictor classified patients of the second set more accurately than risk stratification of current trials from Germany, Japan, and the United States (P < .001; hazard ratio, 4.756 [95% CI, 2.544 to 8.893]). Integration of gene expression-based class prediction of neuroblastoma patients may improve risk estimation of current neuroblastoma trials.
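The 10-times-repeated 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is only an illustration of how the splits are formed, not the study's pipeline: the function name, the seed, and the round-robin fold assignment are assumptions, and the PAM classifier itself is not implemented here. In a "complete" cross-validation, any gene selection would presumably be repeated on each training part rather than done once on all samples.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples.

    Each repeat reshuffles the samples and partitions them into
    `n_folds` disjoint test folds of near-equal size.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Round-robin assignment keeps fold sizes within one of each other.
        folds = [idx[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train, test


# Usage with the training-set size reported in the abstract (n = 77).
for rep, k, train, test in repeated_kfold(77):
    # Hypothetical placeholder: select genes and fit a classifier on `train`
    # only, then score it on `test`.
    pass
```

Averaging the per-fold accuracies over all 100 train/test splits gives the cross-validated accuracy estimate.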
Bartlett, Thomas E.; Jones, Allison; Goode, Ellen L.; Fridley, Brooke L.; Cunningham, Julie M.; Berns, Els M. J. J.; Wik, Elisabeth; Salvesen, Helga B.; Davidson, Ben; Trope, Claes G.; Lambrechts, Sandrina; Vergote, Ignace; Widschwendter, Martin
2015-01-01
We introduce a novel per-gene measure of intra-gene DNA methylation variability (IGV) based on the Illumina Infinium HumanMethylation450 platform, which is prognostic independently of well-known predictors of clinical outcome. Using IGV, we derive a robust gene-panel prognostic signature for ovarian cancer (OC, n = 221), which validates in two independent data sets from Mayo Clinic (n = 198) and TCGA (n = 358), with significance of p = 0.004 in both sets. The OC prognostic signature gene-panel is comprised of four gene groups, which represent distinct biological processes. We show the IGV measurements of these gene groups are most likely a reflection of a mixture of intra-tumour heterogeneity and transcription factor (TF) binding/activity. IGV can be used to predict clinical outcome in patients individually, providing a surrogate read-out of hard-to-measure disease processes. PMID:26629914
Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei
2006-06-23
Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
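The abstract does not state how the DDP enters the selection criterion, so the sketch below rests on an explicit assumption: a predictor set is scored as relevance raised to the power `alpha` times antiredundancy raised to the power `1 - alpha`, with `alpha` playing the role of the DDP. The relevance values, the correlation table, and all function names are hypothetical, and the greedy forward search is a stand-in rather than the authors' procedure.

```python
from itertools import combinations


def ddp_score(subset, relevance, corr, alpha):
    """Score a gene subset as V**alpha * U**(1 - alpha) (assumed form).

    V: mean relevance of the genes in the subset.
    U: mean of (1 - |correlation|) over gene pairs (antiredundancy).
    """
    v = sum(relevance[g] for g in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    if pairs:
        u = sum(1 - abs(corr[frozenset(p)]) for p in pairs) / len(pairs)
    else:
        u = 1.0  # a single gene has no redundancy
    return (v ** alpha) * (u ** (1 - alpha))


def greedy_select(relevance, corr, size, alpha):
    """Greedy forward selection of `size` genes under the assumed score."""
    genes = sorted(relevance)
    selected = [max(genes, key=relevance.get)]
    while len(selected) < min(size, len(genes)):
        candidates = [g for g in genes if g not in selected]
        best = max(
            candidates,
            key=lambda g: ddp_score(selected + [g], relevance, corr, alpha),
        )
        selected.append(best)
    return selected


# Hypothetical toy data: genes "a" and "b" are relevant but nearly redundant.
relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
corr = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.10,
    frozenset(("b", "c")): 0.10,
}
print(greedy_select(relevance, corr, size=2, alpha=1.0))
print(greedy_select(relevance, corr, size=2, alpha=0.3))
```

Under this assumed form, `alpha` near 1 prioritizes relevance alone, while smaller values trade relevance for a less redundant predictor set.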
The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.
Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun
2017-01-01
Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Copyright © 2017 by the Genetics Society of America.
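To make the spike-and-slab mixture double-exponential prior more concrete, here is a minimal sketch of a two-component mixture of double-exponential (Laplace) densities. It is an illustration under stated assumptions, not the BhGLM implementation: the parameter names `theta`, `s0`, and `s1`, the example scale values, and the function names are all hypothetical. The second function computes the conditional probability that a coefficient belongs to the slab, which is the kind of quantity an EM-style E-step would update.

```python
import math


def de_pdf(beta, scale):
    """Double-exponential (Laplace) density centered at zero."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def ssl_prior_pdf(beta, theta, s0, s1):
    """Mixture prior: narrow spike (scale s0) plus wide slab (scale s1 > s0).

    theta is the assumed prior probability that a coefficient is in the slab.
    """
    return (1.0 - theta) * de_pdf(beta, s0) + theta * de_pdf(beta, s1)


def slab_probability(beta, theta, s0, s1):
    """Probability that a coefficient with value beta comes from the slab."""
    slab = theta * de_pdf(beta, s1)
    spike = (1.0 - theta) * de_pdf(beta, s0)
    return slab / (slab + spike)


# Hypothetical scales: a tight spike and a diffuse slab.
for b in (0.0, 0.1, 0.5, 2.0):
    print(b, slab_probability(b, theta=0.5, s0=0.04, s1=1.0))
```

Coefficients near zero are attributed to the spike and shrunk strongly, while large coefficients are attributed to the slab and shrunk only weakly, matching the behavior the abstract describes.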
Siano, Marco; Espeli, Vittoria; Mach, Nicolas; Bossi, Paolo; Licitra, Lisa; Ghielmini, Michele; Frattini, Milo; Canevari, Silvana; De Cecco, Loris
2018-07-01
Platinum-based chemotherapy plus the anti-EGFR monoclonal antibody (mAb) cetuximab is used to treat recurrent/metastatic (RM) head-neck squamous cell carcinoma (HNSCC). Recently, we defined Cluster3 gene-expression signature as a potential predictor of favorable progression-free survival (PFS) in cetuximab-treated RM-HNSCC patients and predictor of partial metabolic FDG-PET response in an afatinib window-of-opportunity trial. Another anti-EGFR-mAb (panitumumab) was used as the treatment agent in RM-HNSCC patients in the phase II PANI01 trial. PANI01 tumor samples were analyzed using functional genomics to explore response predictors to anti-EGFR therapy. Whole-gene expression and real-time PCR analyses were applied to pre-treatment samples from 25 PANI01 patients. Three gene signatures (Cluster3 score, RAS onco-signature, microenvironment score) and seven selected miRNAs were separately analyzed for association with panitumumab efficacy. Cluster3 expression levels had a profile with a significant bimodal separation of samples (P = 3.08 E-13). Higher RAS activation, microenvironment score, and miRNA expression were associated with low-Cluster3 patients. The same biomarkers were separately associated with PFS. Patients with high-Cluster3 had significantly longer PFS than patients with low-Cluster3 (median PFS: 174 versus 51 days; log-rank P = 0.0021). ROC analysis demonstrated accuracy in predicting PFS (AUC = 0.877). Despite differences in clinical settings and anti-EGFR inhibitors used for treatment, response prediction by the Cluster3 signature and selected miRNAs was essentially the same. Translation into a useful clinical assay requires validation in a broader setting. Copyright © 2018 Elsevier Ltd. All rights reserved.
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
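The rankwise averaging of predictions from several methods can be sketched in a few lines. This is not the published NIMEFI implementation: it assumes each method returns a score for every candidate regulatory link (higher meaning more plausible), converts scores to ranks per method, and averages the ranks. The function names, the tie-breaking rule, and the toy scores are hypothetical.

```python
def scores_to_ranks(scores):
    """Map each edge to its rank (1 = highest score); ties broken by edge name."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: pos for pos, edge in enumerate(ordered, start=1)}


def rank_average(method_scores):
    """Combine several edge-score tables by averaging per-method ranks.

    method_scores: list of dicts mapping (regulator, target) -> score.
    Returns a list of (edge, mean_rank), best candidates first.
    """
    edges = set(method_scores[0])
    if any(set(scores) != edges for scores in method_scores):
        raise ValueError("all methods must score the same set of edges")
    tables = [scores_to_ranks(scores) for scores in method_scores]
    mean_rank = {e: sum(t[e] for t in tables) / len(tables) for e in edges}
    return sorted(mean_rank.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical importance scores from two feature-importance methods.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
method_b = {("g1", "g2"): 0.4, ("g1", "g3"): 0.1, ("g2", "g3"): 0.8}
for edge, rank in rank_average([method_a, method_b]):
    print(edge, rank)
```

Working on ranks rather than raw scores avoids having to put importance measures from different algorithms on a common scale.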
Good, Benjamin M; Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I
2014-07-29
Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. 
In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.
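The abstract says that game-play information was aggregated with a voting approach to rank genes, without giving the exact scheme. As a minimal sketch under that caveat, the code below assumes each game contributes one vote per distinct gene the player picked and ranks genes by vote count. The function name and the example picks are hypothetical.

```python
from collections import Counter


def rank_genes_by_votes(games):
    """Rank genes by the number of games in which they were picked.

    games: iterable of gene-symbol lists, one list per played game.
    Returns (gene, votes) pairs, most-voted first; ties sorted by name.
    """
    votes = Counter()
    for picked in games:
        votes.update(set(picked))  # at most one vote per gene per game
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical picks from three games.
games = [
    ["BRCA1", "TP53"],
    ["TP53", "ESR1"],
    ["TP53", "BRCA1", "BRCA1"],
]
print(rank_genes_by_votes(games))
```

The top of such a ranking could then be passed to enrichment analysis or used as the feature set for a survival classifier, as the abstract describes.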
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi
Highlights: • Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. • In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer. • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.
Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P
2015-03-11
The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M
2006-04-21
Genetic epidemiologists have taken up the challenge of identifying genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors and disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN), and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach to select and model important predictors, but its ability to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. 
As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
Chang, Yu-Chun; Ding, Yan; Dong, Lingsheng; Zhu, Lang-Jing; Jensen, Roderick V.
2018-01-01
Background Using DNA microarrays, we previously identified 451 genes expressed in 19 different human tissues. Although ubiquitously expressed, the variable expression patterns of these “housekeeping genes” (HKGs) could separate one normal human tissue type from another. Current focus on identifying “specific disease markers” is problematic as single gene expression in a given sample represents the specific cellular states of the sample at the time of collection. In this study, we examine the diagnostic and prognostic potential of the variable expressions of HKGs in lung cancers. Methods Microarray and RNA-seq data for normal lungs, lung adenocarcinomas (AD), squamous cell carcinomas of the lung (SQCLC), and small cell carcinomas of the lung (SCLC) were collected from online databases. Using 374 of 451 HKGs, differentially expressed genes between pairs of sample types were determined via two-sided, homoscedastic t-test. Principal component analysis and hierarchical clustering classified normal lung and lung cancer subtypes according to relative gene expression variations. We used uni- and multivariate Cox regressions to identify significant predictors of overall survival in AD patients. Classifying genes were selected using a set of training samples and then validated using an independent test set. Gene Ontology was examined by PANTHER. Results This study showed that the differential expression patterns of 242, 245, and 99 HKGs were able to distinguish normal lung from AD, SCLC, and SQCLC, respectively. From these, 70 HKGs were common across the three lung cancer subtypes. These HKGs have low expression variation compared to current lung cancer markers (e.g., EGFR, KRAS) and were involved in the most common biological processes (e.g., metabolism, stress response). In addition, the expression pattern of 106 HKGs alone was a significant classifier of AD versus SQCLC. 
We further highlighted that a panel of 13 HKGs was an independent predictor of overall survival and cumulative risk in AD patients. Discussion Here we report HKG expression patterns may be an effective tool for evaluation of lung cancer states. For example, the differential expression pattern of 70 HKGs alone can separate normal lung tissue from various lung cancers while a panel of 106 HKGs was a capable class predictor of subtypes of non-small cell carcinomas. We also reported that HKGs have significantly lower variance compared to traditional cancer markers across samples, highlighting the robustness of a panel of genes over any one specific biomarker. Using RNA-seq data, we showed that the expression pattern of 13 HKGs is a significant, independent predictor of overall survival for AD patients. This reinforces the predictive power of a HKG panel across different gene expression measurement platforms. Thus, we propose the expression patterns of HKGs alone may be sufficient for the diagnosis and prognosis of individuals with lung cancer. PMID:29761043
Combining Gene Signatures Improves Prediction of Breast Cancer Survival
Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian
2011-01-01
Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. 
The presented methodology is broadly applicable to breast cancer risk assessment using any new identified gene set. PMID:21423775
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Low, Yee Syuen; Blöcker, Christopher; McPherson, John R; Tang, See Aik; Cheng, Ying Ying; Wong, Joyner Y S; Chua, Clarinda; Lim, Tony K H; Tang, Choong Leong; Chew, Min Hoe; Tan, Patrick; Tan, Iain B; Rozen, Steven G; Cheah, Peh Yean
2017-09-10
Approximately 20% of early-stage (I/II) colorectal cancer (CRC) patients develop metastases despite curative surgery. We aim to develop a formalin-fixed and paraffin-embedded (FFPE)-based predictor of metastases in early-stage, clinically-defined low risk, microsatellite-stable (MSS) CRC patients. We considered genome-wide mRNA and miRNA expression and mutation status of 20 genes assayed in 150 fresh-frozen tumours with known metastasis status. We selected 193 genes for further analysis using NanoString nCounter arrays on corresponding FFPE tumours. Neither mutation status nor miRNA expression improved the estimated prediction. The final predictor, ColoMet19, based on the top 19 genes' mRNA levels and trained with a Random Forest machine-learning strategy, had an estimated positive-predictive-value (PPV) of 0.66. We tested ColoMet19 on an independent test-set of 131 tumours and obtained a population-adjusted PPV of 0.67, indicating that early-stage CRC patients who tested positive have a 67% risk of developing metastases, substantially higher than the metastasis risk of 40% for node-positive (Stage III) patients who are generally treated with chemotherapy. Predicted-positive patients also had poorer metastasis-free survival (hazard ratios [HR] = 1.92, design-set; HR = 2.05, test-set). Thus, early-stage CRC patients who test positive may be considered for adjuvant therapy after surgery. Copyright © 2017 Elsevier B.V. All rights reserved.
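The abstract above does not say how the population-adjusted PPV was computed. One common way to adjust a PPV to a target population is Bayes' rule applied to sensitivity, specificity and prevalence, sketched below. The function name and the sensitivity and specificity values are hypothetical; only the roughly 20% metastasis rate is taken from the abstract, and the result is not expected to reproduce the reported 0.67.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(event | test positive) via Bayes' rule.

    All three arguments are probabilities in [0, 1].
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    total_positive_rate = true_positive_rate + false_positive_rate
    if total_positive_rate == 0:
        raise ValueError("the test never returns a positive result")
    return true_positive_rate / total_positive_rate


# Hypothetical sensitivity and specificity; the prevalence of 0.2 mirrors the
# "approximately 20%" metastasis rate quoted in the abstract.
ppv = positive_predictive_value(sensitivity=0.5, specificity=0.9, prevalence=0.2)
print(f"PPV = {ppv:.3f}")
```

Because the PPV depends on prevalence, the same classifier can give quite different PPVs in a design set enriched for metastatic cases and in the general early-stage population, which is presumably why an adjustment was reported.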
Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I
2014-01-01
Background Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. Objective The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player’s prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Methods We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Results Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. 
In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. Conclusions The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge. PMID:25654473
Cangelosi, Davide; Muselli, Marco; Parodi, Stefano; Blengio, Fabiola; Becherini, Pamela; Versteeg, Rogier; Conte, Massimo; Varesio, Luigi
2014-01-01
A cancer patient's outcome is written, in part, in the gene expression profile of the tumor. We previously identified a 62-probe-set signature (NB-hypo) to identify tissue hypoxia in neuroblastoma tumors and showed that NB-hypo stratified neuroblastoma patients into good and poor outcome groups. It was important to develop a prognostic classifier to cluster patients into risk groups benefiting from defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. We explored the application to gene expression data of Rulex, a novel software suite including the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones and the Logic Learning Machine (LLM) model for intelligible rule generation. We applied Rulex components to the problem of predicting the outcome of neuroblastoma patients on the basis of the 62-probe-set NB-hypo gene expression signature. The resulting classifier consisted of 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the LLM algorithm applied to microarray data and patients' classification. The LLM performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant in predicting outcome among the 62 of the NB-hypo signature. Rules are easily interpretable as they involve only a few conditions. 
Our findings provided evidence that the application of Rulex to the expression values of NB-hypo signature created a set of accurate, high quality, consistent and interpretable rules for the prediction of neuroblastoma patients' outcome. We identified the Rulex weighted classification as a flexible tool that can support clinical decisions. For these reasons, we consider Rulex to be a useful tool for cancer classification from microarray gene expression data.
Common Genetic Variation in Circadian Rhythm Genes and Risk of Epithelial Ovarian Cancer (EOC)
Jim, Heather S.L.; Lin, Hui-Yi; Tyrer, Jonathan P.; Lawrenson, Kate; Dennis, Joe; Chornokur, Ganna; Chen, Zhihua; Chen, Ann Y.; Permuth-Wey, Jennifer; Aben, Katja KH.; Anton-Culver, Hoda; Antonenkova, Natalia; Bruinsma, Fiona; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Bisogna, Maria; Bjorge, Line; Bogdanova, Natalia; Brinton, Louise A.; Brooks-Wilson, Angela; Bunker, Clareann H.; Butzow, Ralf; Campbell, Ian G.; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S.; Cramer, Daniel W.; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; du Bois, Andreas; Despierre, Evelyn; Sieh, Weiva; Doherty, Jennifer A.; Dörk, Thilo; Dürst, Matthias; Easton, Douglas F.; Eccles, Diana M.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goodman, Marc T.; Gronwald, Jacek; Harter, Philipp; Hasmad, Hanis N.; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Claus K.; Hogdall, Estrid; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y.; Kellar, Melissa; Kiemeney, Lambertus A.; Krakstad, Camilla; Kjaer, Susanne K.; Kupryjanczyk, Jolanta; Vierkant, Robert A.; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lim, Boon Kiong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon F.A.G.; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Ian; Menon, Usha; Milne, Roger L.; Modugno, Francesmary; Thomsen, Lotte; Moysich, Kirsten B.; Ness, Roberta B.; Nevanlinna, Heli; Eilber, Ursula; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Palmieri Weber, Rachel; Paul, James; Pearce, Celeste L.; Pejovic, Tanja; Pelttari, Liisa M.; Pike, Malcolm C.; Poole, Elizabeth M.; Schernhammer, Eva; Risch, Harvey A.; Rosen, Barry; Rossing, Mary 
Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B.; Siddiqui, Nadeem; Song, Honglin; Southey, Melissa C.; Spiewankiewicz, Beata; Sucheston-Campbell, Lara; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J.; Tangen, Ingvild L.; Tworoger, Shelley S.; van Altena, Anne M.; Vergote, Ignace; Walsh, Christine S.; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Wu, Anna H.; Wu, Xifeng; Woo, Yin-Ling; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Amankwah, Ernest; Berchuck, Andrew; Schildkraut, Joellen M.; Kelemen, Linda E.; Ramus, Susan J.; Monteiro, Alvaro N.A.; Goode, Ellen L.; Narod, Steven A.; Gayther, Simon A.; Pharoah, Paul D. P.; Sellers, Thomas A.; Phelan, Catherine M.
2016-01-01
Disruption in circadian gene expression, whether due to genetic variation or environmental factors (e.g., light at night, shiftwork), is associated with increased incidence of breast, prostate, gastrointestinal and hematologic cancers and gliomas. Circadian genes are highly expressed in the ovaries where they regulate ovulation; circadian disruption is associated with several ovarian cancer risk factors (e.g., endometriosis). However, no studies have examined variation in germline circadian genes as predictors of ovarian cancer risk and invasiveness. The goal of the current study was to examine single nucleotide polymorphisms (SNPs) in circadian genes BMAL1, CRY2, CSNK1E, NPAS2, PER3, REV1 and TIMELESS and downstream transcription factors KLF10 and SENP3 as predictors of risk of epithelial ovarian cancer (EOC) and histopathologic subtypes. The study included a test set of 3,761 EOC cases and 2,722 controls and a validation set of 44,308 samples including 18,174 (10,316 serous) cases and 26,134 controls from 43 studies participating in the Ovarian Cancer Association Consortium (OCAC). Analysis of genotype data from 36 genotyped SNPs and 4600 imputed SNPs indicated that the most significant association was rs117104877 in BMAL1 (OR = 0.79, 95% CI = 0.68–0.90, p = 5.59 × 10⁻⁴). Functional analysis revealed a significant down regulation of BMAL1 expression following cMYC overexpression and increasing transformation in ovarian surface epithelial (OSE) cells as well as alternative splicing of BMAL1 exons in ovarian and granulosa cells. These results suggest that variation in circadian genes, and specifically BMAL1, may be associated with risk of ovarian cancer, likely through disruption of hormonal pathways. PMID:26807442
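For readers unfamiliar with the "OR = ..., 95% CI = ..." notation in the abstract above, the following is a minimal sketch of an odds ratio with a Wald confidence interval computed from a 2x2 table. It is a simplified illustration: the study's estimates most likely come from regression models on genotype data rather than a raw table, and the function name and the counts used here are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald interval on the log odds ratio; z = 1.96 gives an
    approximate 95% confidence interval.
    """
    counts = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    standard_error = math.sqrt(sum(1.0 / count for count in counts))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate a protective association (OR < 1).
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 80, 40, 60)
print(f"OR = {or_value:.3f}, 95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

An interval that lies entirely below 1, as in the abstract, indicates that the variant is associated with reduced odds of disease at the chosen confidence level.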
NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. 
PMID:24667482
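The rankwise averaging step described in the NIMEFI abstract can be illustrated with a small sketch. It assumes each method has already produced an importance score for every candidate regulator-to-target edge; the function names, the tie-breaking rule and the toy scores are hypothetical and are not taken from the published NIMEFI implementation.

```python
def ranks_from_scores(scores):
    """Map each edge to its rank, where rank 1 is the highest score.

    Ties are broken by edge name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position for position, edge in enumerate(ordered, start=1)}


def rankwise_average(score_tables):
    """Combine several edge-score tables by averaging per-method ranks.

    Returns the edges ordered from most to least supported, plus the
    mean rank of every edge.
    """
    edges = set(score_tables[0])
    for table in score_tables:
        if set(table) != edges:
            raise ValueError("every method must score the same set of edges")
    rank_tables = [ranks_from_scores(table) for table in score_tables]
    mean_rank = {
        edge: sum(ranks[edge] for ranks in rank_tables) / len(rank_tables)
        for edge in edges
    }
    ordered_edges = sorted(edges, key=lambda edge: (mean_rank[edge], edge))
    return ordered_edges, mean_rank


# Toy importance scores from two hypothetical methods; edges are
# (regulator, target) pairs.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
method_b = {("g1", "g2"): 0.4, ("g1", "g3"): 0.1, ("g2", "g3"): 0.8}
consensus_order, consensus_rank = rankwise_average([method_a, method_b])
print(consensus_order)
```

Averaging ranks rather than raw scores sidesteps the fact that importance values from different methods, such as random forests and the elastic net, are not on a common scale.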
Fekete, Tibor; Rásó, Erzsébet; Pete, Imre; Tegze, Bálint; Liko, István; Munkácsy, Gyöngyi; Sipos, Norbert; Rigó, János; Györffy, Balázs
2012-07-01
Transcriptomic analysis of global gene expression in ovarian carcinoma can identify dysregulated genes capable of serving as molecular markers for histology subtypes and survival. The aim of our study was to validate previous candidate signatures in an independent setting and to identify single genes capable of serving as biomarkers for ovarian cancer progression. As several datasets are available in the GEO today, we were able to perform a true meta-analysis. First, 829 samples (11 datasets) were downloaded, and the predictive power of 16 previously published gene sets was assessed. Of these, eight were capable of discriminating histology subtypes, and none was capable of predicting survival. To overcome the differences in previous studies, we used the 829 samples to identify new predictors. Then, we collected 64 ovarian cancer samples (median relapse-free survival 24.5 months) and performed TaqMan Real Time Polymerase Chain Reaction (RT-PCR) analysis for the best 40 genes associated with histology subtypes and survival. Over 90% of subtype-associated genes were confirmed. Overall survival was effectively predicted by hormone receptors (PGR and ESR2) and by TSPAN8. Relapse-free survival was predicted by MAPT and SNCG. In summary, we successfully validated several gene sets in a meta-analysis in large datasets of ovarian samples. Additionally, several individual genes identified were validated in a clinical cohort. Copyright © 2011 UICC.
2010-01-01
Introduction Various multigene predictors of breast cancer clinical outcome have been commercialized, but proved to be prognostic only for hormone receptor (HR) subsets overexpressing estrogen or progesterone receptors. Hormone receptor negative (HRneg) breast cancers, particularly those lacking HER2/ErbB2 overexpression and known as triple-negative (Tneg) cases, are heterogeneous and generally aggressive breast cancer subsets in need of prognostic subclassification, since most early stage HRneg and Tneg breast cancer patients are cured with conservative treatment yet invariably receive aggressive adjuvant chemotherapy. Methods An unbiased search for genes predictive of distant metastatic relapse was undertaken using a training cohort of 199 node-negative, adjuvant treatment naïve HRneg (including 154 Tneg) breast cancer cases curated from three public microarray datasets. Prognostic gene candidates were subsequently validated using a different cohort of 75 node-negative, adjuvant naïve HRneg cases curated from three additional datasets. The HRneg/Tneg gene signature was prognostically compared with eight other previously reported gene signatures, and evaluated for cancer network associations by two commercial pathway analysis programs. Results A novel set of 14 prognostic gene candidates was identified as outcome predictors: CXCL13, CLIC5, RGS4, RPS28, RFX7, EXOC7, HAPLN1, ZNF3, SSX3, HRBL, PRRG3, ABO, PRTN3, MATN1. A composite HRneg/Tneg gene signature index proved more accurate than any individual candidate gene or other reported multigene predictors in identifying cases likely to remain free of metastatic relapse. Significant positive correlations between the HRneg/Tneg index and three independent immune-related signatures (STAT1, IFN, and IR) were observed, as were consistent negative associations between the three immune-related signatures and five other proliferation module-containing signatures (MS-14, ONCO-RS, GGI, CSR/wound and NKI-70). 
Network analysis identified 8 genes within the HRneg/Tneg signature as being functionally linked to immune/inflammatory chemokine regulation. Conclusions A multigene HRneg/Tneg signature linked to immune/inflammatory cytokine regulation was identified from pooled expression microarray data and shown to be superior to other reported gene signatures in predicting the metastatic outcome of early stage and conservatively managed HRneg and Tneg breast cancer. Further validation of this prognostic signature may lead to new therapeutic insights and spare many newly diagnosed breast cancer patients the need for aggressive adjuvant chemotherapy. PMID:20946665
NASA Astrophysics Data System (ADS)
Auslander, Noam; Yizhak, Keren; Weinstock, Adam; Budhu, Anuradha; Tang, Wei; Wang, Xin Wei; Ambs, Stefan; Ruppin, Eytan
2016-07-01
Disrupted regulation of cellular processes is considered one of the hallmarks of cancer. We analyze metabolomic and transcriptomic profiles jointly collected from breast cancer and hepatocellular carcinoma patients to explore the associations between the expression of metabolic enzymes and the levels of the metabolites participating in the reactions they catalyze. Surprisingly, both breast cancer and hepatocellular tumors exhibit an increase in their gene-metabolites associations compared to noncancerous adjacent tissues. Following, we build predictors of metabolite levels from the expression of the enzyme genes catalyzing them. Applying these predictors to a large cohort of breast cancer samples we find that depleted levels of key cancer-related metabolites including glucose, glycine, serine and acetate are significantly associated with improved patient survival. Thus, we show that the levels of a wide range of metabolites in breast cancer can be successfully predicted from the transcriptome, going beyond the limited set of those measured.
The essential gene set of a photosynthetic organism
Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; ...
2015-10-27
Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ~250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism's 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA-Leu, which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism's physiology and defines the essential gene set required for the growth of a photosynthetic organism.
2015-01-01
Background Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other Single Nucleotide Polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor.
These conditional logistic regression models are adjusted for minor allele frequency (MAF); otherwise, ascertainment bias is a huge factor in all eQTL data sets. Relationships between Gene Ontology categories, positive selection and eQTL specificity were replicated with H|H in a single larger data set. Our measure, Adjusted Haplotype Conditional Entropy (H|H), was essential in generating all of the results above because it: 1) is a stronger overall predictor for eQTLs than comparable existing approaches, and 2) shows low sequential auto-correlation, overcoming problems with convergence of these conditional regression statistical models. Conclusions Our new method, H|H, provides a consistently more robust signal associated with cis-eQTLs compared to existing methods. We interpret this to indicate that some cis-eQTLs are under positive selection compared to their surrounding genes. Conditional entropy indicative of a selective sweep is an especially strong predictor of eQTLs for genes in several biological processes of medical interest. Where conditional entropy is a weak or negative predictor of eQTLs, such as innate immune genes, this would be consistent with balancing selection acting on such eQTLs over long time periods. Different measures of selection may be needed for variant prioritization under other modes of evolutionary selection. PMID:26111110
Identifying gnostic predictors of the vaccine response.
Haining, W Nicholas; Pulendran, Bali
2012-06-01
Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis of their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of vaccine prediction. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. Copyright © 2012 Elsevier Ltd. All rights reserved.
Identifying gnostic predictors of the vaccine response
Haining, W. Nicholas; Pulendran, Bali
2012-01-01
Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis for their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of outcome classification. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. PMID:22633886
Takahashi, Hiro; Kobayashi, Takeshi; Honda, Hiroyuki
2005-01-15
To establish prognostic predictors of various diseases using DNA microarray analysis technology, it is desirable to select significant genes for constructing the prognostic model, and it is also necessary to eliminate non-specific genes or genes with error before constructing the model. We applied projective adaptive resonance theory (PART) to gene screening for DNA microarray data. Genes selected by PART were subjected to our FNN-SWEEP modeling method for the construction of a cancer class prediction model. The model performance was evaluated through comparison with a conventional screening signal-to-noise (S2N) method or nearest shrunken centroids (NSC) method. The FNN-SWEEP predictor with PART screening could discriminate classes of acute leukemia in blinded data with 97.1% accuracy and classes of lung cancer with 90.0% accuracy, whereas the predictor with S2N achieved only 85.3% and 70.0%, and the predictor with NSC achieved 88.2% and 90.0%, respectively. The results have proven that PART was superior for gene screening. The software is available upon request from the authors. honda@nubio.nagoya-u.ac.jp
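As a rough illustration of the conventional S2N screening that PART is compared against above, the following sketch ranks genes by a signal-to-noise score. It assumes the commonly used definition (difference of class means divided by the sum of class standard deviations); the abstract does not state which variant was used, and the gene names and expression values are hypothetical.

```python
from statistics import mean, stdev

def s2n_score(class_a, class_b):
    """Signal-to-noise score for one gene (assumed definition):
    (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def rank_genes(expr_a, expr_b):
    """Return gene names sorted by absolute S2N, most discriminating first."""
    scores = {gene: s2n_score(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)

# Hypothetical toy expression values for two classes of samples.
expr_class1 = {"geneA": [5.0, 6.0, 7.0], "geneB": [2.0, 2.1, 1.9]}
expr_class2 = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 1.8, 2.2]}
ranking = rank_genes(expr_class1, expr_class2)
```

In this toy data geneA separates the two classes clearly while geneB does not, so geneA is ranked first; a screening step would keep only the top-ranked genes before model construction.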
Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos
2005-01-01
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
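The cross-platform concordance figure reported above (the share of matched genes with Pearson r ≥ 0.7) can be sketched as follows. This is a minimal illustration and not the authors' pipeline; the dictionaries of per-gene expression vectors and the function names are assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def fraction_concordant(platform_a, platform_b, threshold=0.7):
    """Fraction of genes measured on both platforms with r >= threshold."""
    shared = [gene for gene in platform_a if gene in platform_b]
    hits = sum(
        1 for gene in shared
        if pearson_r(platform_a[gene], platform_b[gene]) >= threshold
    )
    return hits / len(shared)

# Hypothetical measurements of the same four samples on two platforms.
platform_a = {"gene1": [1.0, 2.0, 3.0, 4.0], "gene2": [1.0, 2.0, 3.0, 4.0]}
platform_b = {"gene1": [2.0, 4.0, 6.0, 8.0], "gene2": [4.0, 1.0, 3.0, 2.0]}
concordance = fraction_concordant(platform_a, platform_b)
```

How probes are matched between platforms (UniGene identifiers versus sequence identity, as the abstract discusses) determines which keys end up shared, which is why the matching strategy changes the resulting fraction.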
A stochastic model for optimizing composite predictors based on gene expression profiles.
Ramanathan, Murali
2003-07-01
This project was done to develop a mathematical model for optimizing composite predictors based on gene expression profiles from DNA arrays and proteomics. The problem was amenable to a formulation and solution analogous to the portfolio optimization problem in mathematical finance: it requires the optimization of a quadratic function subject to linear constraints. The performance of the approach was compared to that of neighborhood analysis using a data set containing cDNA array-derived gene expression profiles from 14 multiple sclerosis patients receiving intramuscular interferon-beta1a. The Markowitz portfolio model predicts that the covariance between genes can be exploited to construct an efficient composite. The model predicts that a composite is not needed for maximizing the mean value of a treatment effect: only a single gene is needed, but the usefulness of the effect measure may be compromised by high variability. The model optimized the composite to yield the highest mean for a given level of variability or the least variability for a given mean level. The choices that meet these optimization criteria lie on a curve in the plot of composite mean vs. composite variability, referred to as the "efficient frontier." When a composite is constructed using the model, it outperforms the composite constructed using the neighborhood analysis method. The Markowitz portfolio model may find potential applications in constructing composite biomarkers and in the pharmacogenomic modeling of treatment effects derived from gene expression endpoints.
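To make the portfolio analogy concrete, here is a minimal sketch of the textbook two-asset minimum-variance solution applied to two genes. It is not the author's full model: it assumes only two genes, weights that sum to one, and known variances and covariance, and all names and numbers are hypothetical.

```python
def min_variance_weights(var1, var2, cov12):
    """Weights (w1, w2), with w1 + w2 = 1, minimizing the composite variance
    w1^2*var1 + w2^2*var2 + 2*w1*w2*cov12 (standard two-asset result)."""
    w1 = (var2 - cov12) / (var1 + var2 - 2.0 * cov12)
    return w1, 1.0 - w1

def composite_variance(w1, w2, var1, var2, cov12):
    """Variance of the weighted composite of the two genes."""
    return w1 * w1 * var1 + w2 * w2 * var2 + 2.0 * w1 * w2 * cov12

# Hypothetical treatment-effect variances for two uncorrelated genes.
var_gene1, var_gene2, cov_genes = 4.0, 1.0, 0.0
w1, w2 = min_variance_weights(var_gene1, var_gene2, cov_genes)
combined = composite_variance(w1, w2, var_gene1, var_gene2, cov_genes)
```

With these numbers the composite is less variable than either gene alone, which is the sense in which covariance structure can be exploited; sweeping a target mean while minimizing variance would trace out the efficient frontier described in the abstract.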
Suchting, Robert; Gowin, Joshua L; Green, Charles E; Walss-Bass, Consuelo; Lane, Scott D
2018-01-01
Rationale: Given datasets with a large or diverse set of predictors of aggression, machine learning (ML) provides efficient tools for identifying the most salient variables and building a parsimonious statistical model. ML techniques permit efficient exploration of data, have not been widely used in aggression research, and may have utility for those seeking prediction of aggressive behavior. Objectives: The present study examined predictors of aggression and constructed an optimized model using ML techniques. Predictors were derived from a dataset that included demographic, psychometric and genetic predictors, specifically FK506 binding protein 5 (FKBP5) polymorphisms, which have been shown to alter response to threatening stimuli, but have not been tested as predictors of aggressive behavior in adults. Methods: The data analysis approach utilized component-wise gradient boosting and model reduction via backward elimination to: (a) select variables from an initial set of 20 to build a model of trait aggression; and then (b) reduce that model to maximize parsimony and generalizability. Results: From a dataset of N = 47 participants, component-wise gradient boosting selected 8 of 20 possible predictors to model Buss-Perry Aggression Questionnaire (BPAQ) total score, with R² = 0.66. This model was simplified using backward elimination, retaining six predictors: smoking status, psychopathy (interpersonal manipulation and callous affect), childhood trauma (physical abuse and neglect), and the FKBP5_13 gene (rs1360780). The six-factor model approximated the initial eight-factor model at 99.4% of R². Conclusions: Using an inductive data science approach, the gradient boosting model identified predictors consistent with previous experimental work in aggression; specifically psychopathy and trauma exposure.
Additionally, allelic variants in FKBP5 were identified for the first time, but the relatively small sample size limits generality of results and calls for replication. This approach provides utility for the prediction of aggressive behavior, particularly in the context of large multivariate datasets.
Tamura, Takeyuki; Akutsu, Tatsuya
2007-11-30
Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.
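The amino acid composition feature vector mentioned above can be sketched as below. This shows only the composition half of the method in its simplest form (a 20-dimensional frequency vector); the alignment-derived features and the SVM training are not shown, and the example sequence is hypothetical.

```python
# The 20 standard amino acids in a fixed order, one vector position each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(sequence):
    """Return a 20-element list of amino acid frequencies for a protein.
    Characters outside the 20 standard residues are ignored."""
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]

# Hypothetical short peptide used only to show the output shape.
features = aa_composition("MKKA")
```

A vector like this depends only on the sequence, which matches the abstract's point that sequence-only features remain usable for newly discovered proteins lacking Gene Ontology annotation.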
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2014-01-01
Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset
Lipps, David; Devineni, Sree
2016-01-01
MiRNAs are short non-coding RNAs of about 22 nucleotides, which play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict whether RNA transcripts contain miRNAs. Although very successful, these predictors have begun to face multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples. The sizes of these datasets are much smaller than the number of known miRNAs. Consequently, the prediction accuracy of these predictors on large datasets is unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity. These optimization strategies may bring in serious limitations in applications. Moreover, to meet continuously rising expectations of these computational tools, improving the prediction accuracy becomes extremely important. In this study, a meta-predictor mirMeta was developed by integrating a set of non-linear transformations with meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations, and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. The final accuracy of the meta-predictor on a newly designed large dataset is improved by 7% to 93%. The meta-predictor also proved to be less dependent on datasets and to have a more refined balance between sensitivity and specificity. This study is important in two respects: First, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors.
Second, a new miRNA predictor with significantly improved prediction accuracy is developed for the community for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta PMID:28002428
Lustgarten, Jonathan Lyle; Balasubramanian, Jeya Balaji; Visweswaran, Shyam; Gopalakrishnan, Vanathi
2017-03-01
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters and therefore the number of rules are combinatorial to the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. 
We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data.
Shahriyari, Leili
2017-11-03
One of the main challenges in machine learning (ML) is choosing an appropriate normalization method. Here, we examine the effect of various normalization methods on analyzing FPKM upper quartile (FPKM-UQ) RNA sequencing data sets. We collect the HTSeq-FPKM-UQ files of patients with colon adenocarcinoma from the TCGA-COAD project. We compare the three most common normalization methods (scaling, standardizing using z-score, and vector normalization) by visualizing the normalized data set and evaluating the performance of 12 supervised learning algorithms on the normalized data set. Additionally, for each of these normalization methods, we use two different normalization strategies: normalizing samples (files) or normalizing features (genes). Regardless of normalization methods, a support vector machine (SVM) model with the radial basis function kernel had the maximum accuracy (78%) in predicting the vital status of the patients. However, the fitting time of SVM depended on the normalization methods, and it reached its minimum fitting time when files were normalized to the unit length. Furthermore, among all 12 learning algorithms and 6 different normalization techniques, the Bernoulli naive Bayes model after standardizing files had the best performance in terms of maximizing the accuracy as well as minimizing the fitting time. We also investigated the effect of dimensionality reduction methods on the performance of the supervised ML algorithms. Reducing the dimension of the data set did not increase the maximum accuracy of 78%. However, it led to the discovery of the 7SK RNA gene expression as a predictor of survival in patients with colon adenocarcinoma with accuracy of 78%. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
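The three normalization methods and two strategies compared above (six combinations in total) can be sketched as follows. This is an illustrative sketch with assumed function names and a toy matrix whose rows are samples (files) and whose columns are features (genes); it is not the code used in the study.

```python
from math import sqrt

def min_max(values):
    """Scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in values]

def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((x - m) ** 2 for x in values) / n)
    return [(x - m) / sd if sd > 0 else 0.0 for x in values]

def unit_length(values):
    """Divide by the Euclidean norm so the vector has length 1."""
    norm = sqrt(sum(x * x for x in values))
    return [x / norm if norm > 0 else 0.0 for x in values]

def normalize(matrix, method, by="samples"):
    """Apply `method` to each row (by='samples') or each column (by='features')."""
    if by == "samples":
        return [method(list(row)) for row in matrix]
    columns = [method(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

# Hypothetical expression matrix: 2 samples x 2 genes.
toy = [[3.0, 4.0], [0.0, 5.0]]
by_sample = normalize(toy, unit_length, by="samples")
by_gene = normalize(toy, min_max, by="features")
```

Normalizing by samples makes each file comparable in overall magnitude, while normalizing by features puts each gene on a common scale; the two choices can lead to quite different inputs for the same learning algorithm.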
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
Ensemble positive unlabeled learning for disease gene identification.
Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong
2014-01-01
An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data and a single machine learning predictor prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.
Su, Yuhua; Nielsen, Dahlia; Zhu, Lei; Richards, Kristy; Suter, Steven; Breen, Matthew; Motsinger-Reif, Alison; Osborne, Jason
2013-01-05
A bivariate mixture model utilizing information across two species was proposed to solve the fundamental problem of identifying differentially expressed genes in microarray experiments. The model utility was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as being differentially expressed in both species and the human genes in this cluster serve as a good predictor for classifying diffuse large-B-cell lymphoma (DLBCL) patients into two subgroups, the germinal center B-cell-like diffuse large B-cell lymphoma and the activated B-cell-like diffuse large B-cell lymphoma. The number of human genes that were observed to be significantly differentially expressed (21) from the two-species analysis was very small compared to the number of human genes (190) identified with only one-species analysis (human data). The genes may be clinically relevant/important, as this small set achieved low misclassification rates of DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that the stratification based on gene-expression profiling using the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.
Relative codon adaptation: a generic codon bias index for prediction of gene expression.
Fox, Jesse M; Erill, Ivan
2010-06-01
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
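To make the idea of a reference-set-based index concrete, the following minimal sketch computes the classical codon adaptation index (CAI) as the geometric mean of relative adaptiveness weights. It is not the RCA index, whose formula is not given in the abstract; the two-family codon table, the toy reference counts, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny table of synonymous codon families.
# A real analysis would use the full genetic code.
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}

def relative_adaptiveness(reference_codons):
    """Weight of a codon = its count divided by the count of the most
    frequent synonymous codon in the reference set."""
    counts = Counter(reference_codons)
    weights = {}
    for codons in SYNONYMS.values():
        top = max(counts[c] for c in codons)
        for c in codons:
            # Codons absent from the reference set get no weight and are skipped later.
            if top > 0 and counts[c] > 0:
                weights[c] = counts[c] / top
    return weights

def cai(gene_codons, weights):
    """Geometric mean of the weights of the gene's codons (codons without a weight are ignored)."""
    logs = [math.log(weights[c]) for c in gene_codons if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

# Toy reference set: AAA is preferred over AAG, GAA over GAG.
reference = ["AAA"] * 8 + ["AAG"] * 2 + ["GAA"] * 6 + ["GAG"] * 3
weights = relative_adaptiveness(reference)
score = cai(["AAA", "AAG", "GAA", "GAG"], weights)
print(round(score, 3))
```

In this toy case the weights are 1, 0.25, 1 and 0.5, so the score is the fourth root of 0.125. The choice of reference set drives the weights, which is why the abstract compares indices on the same (CAI) reference set.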
Bhuju, Sabin; Aranday-Cortes, Elihu; Villarreal-Ramos, Bernardo; Xing, Zhou; Singh, Mahavir; Vordermeier, H Martin
2012-12-01
Bovine tuberculosis (bTB) is a chronic disease of cattle caused by Mycobacterium bovis, a member of the Mycobacterium tuberculosis complex group of bacteria. Vaccination of cattle might offer a long-term solution for controlling the disease and priority has been given to the development of a cattle vaccine against bTB. Identification of biomarkers in tuberculosis research remains elusive and the goal is to identify host correlates of protection. We hypothesized that by studying global gene expression we could identify in vitro predictors of protection that could help to facilitate vaccine development. Calves were vaccinated with BCG or with a heterologous BCG prime adenovirally vectored subunit boosting protocol. Protective efficacy was determined after M. bovis challenge. RNA was prepared from PPD-stimulated PBMC obtained from vaccinated-protected, vaccinated-unprotected and unvaccinated control cattle prior to M. bovis challenge and global gene expression determined by RNA-seq. 668 genes were differentially expressed in vaccinated-protected cattle compared with vaccinated-unprotected and unvaccinated control cattle. Cytokine-cytokine receptor interaction was the most significant pathway related to this dataset with IL-22 expression identified as the dominant surrogate of protection besides IFN-γ. Finally, the expression of these candidate genes identified by RNA-seq was evaluated by RT-qPCR in an independent set of PBMC samples from BCG vaccinated and unvaccinated calves. This experiment confirmed the importance of IL-22 as predictor of vaccine efficacy.
Uncertainties of statistical downscaling from predictor selection: Equifinality and transferability
NASA Astrophysics Data System (ADS)
Fu, Guobin; Charles, Stephen P.; Chiew, Francis H. S.; Ekström, Marie; Potter, Nick J.
2018-05-01
The nonhomogeneous hidden Markov model (NHMM) statistical downscaling model, 38 catchments in southeast Australia and 19 general circulation models (GCMs) were used in this study to demonstrate statistical downscaling uncertainties caused by equifinality and transferability. That is to say, there could be multiple sets of predictors that give similar daily rainfall simulation results for both calibration and validation periods, but project different amounts (or even directions) of rainfall change in the future. Results indicated that two sets of predictors (Set 1 with predictors of sea level pressure north-south gradient, u-wind at 700 hPa, v-wind at 700 hPa, and specific humidity at 700 hPa and Set 2 with predictors of sea level pressure north-south gradient, u-wind at 700 hPa, v-wind at 700 hPa, and dewpoint temperature depression at 850 hPa) as inputs to the NHMM produced satisfactory results of seasonal rainfall in comparison with observations. For example, during the model calibration period, the relative errors across the 38 catchments ranged from 0.48 to 1.76% with a mean value of 1.09% for the predictor Set 1, and from 0.22 to 2.24% with a mean value of 1.16% for the predictor Set 2. However, the NHMM projections of future rainfall change based on 19 GCMs had different signs for these two sets of predictors: Set 1 predictors project an increase of future rainfall with magnitudes depending on future time periods and emission scenarios, but Set 2 predictors project a decline of future rainfall. Such divergent projections may present a significant challenge for applications of statistical downscaling as well as climate change impact studies, and could potentially imply caveats in many existing studies in the literature.
2009-01-01
Background The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential gene discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426
On construction of stochastic genetic networks based on gene expression sequences.
Ching, Wai-Ki; Ng, Michael M; Fung, Eric S; Akutsu, Tatsuya
2005-08-01
Reconstruction of genetic regulatory networks from time series data of gene expression patterns is an important research topic in bioinformatics. Probabilistic Boolean Networks (PBNs) have been proposed as an effective model for gene regulatory networks. PBNs are able to cope with uncertainty, incorporate rule-based dependencies between genes and discover the sensitivity of genes in their interactions with other genes. However, PBNs are unlikely to be used directly in practice because of the huge computational cost of obtaining predictors and their corresponding probabilities. In this paper, we propose a multivariate Markov model for approximating PBNs and describing the dynamics of a genetic network for gene expression sequences. The main contribution of the new model is to preserve the strength of PBNs and reduce the complexity of the networks. The number of parameters of our proposed model is O(n^2) where n is the number of genes involved. We also develop efficient estimation methods for solving the model parameters. Numerical examples on synthetic data sets and practical yeast data sequences are given to demonstrate the effectiveness of the proposed model.
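The quadratic parameter count can be illustrated with a minimal sketch. It assumes binarized expression states and estimates, by simple frequency counting, one transition matrix per ordered pair of genes; this is an illustration of why the number of building blocks grows as n^2, not the authors' estimation method (the mixing weights of a multivariate Markov model are not estimated here). The sequences and the function name are hypothetical.

```python
from itertools import product

def transition_matrix(source, target, n_states=2):
    """Column-stochastic matrix P where P[a][b] estimates
    Pr(target at time t+1 = a | source at time t = b) from frequency counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for b, a in zip(source[:-1], target[1:]):
        counts[a][b] += 1
    matrix = []
    for a in range(n_states):
        row = []
        for b in range(n_states):
            col_total = sum(counts[r][b] for r in range(n_states))
            # Fall back to a uniform column when state b was never observed in the source.
            row.append(counts[a][b] / col_total if col_total else 1.0 / n_states)
        matrix.append(row)
    return matrix

# Hypothetical binarized expression sequences (0 = off, 1 = on) for three genes.
sequences = {
    "g1": [0, 1, 1, 0, 1, 0],
    "g2": [1, 1, 0, 0, 1, 1],
    "g3": [0, 0, 1, 1, 0, 1],
}

# One matrix per ordered gene pair, including each gene with itself: n * n matrices in total.
matrices = {
    (k, j): transition_matrix(sequences[k], sequences[j])
    for k, j in product(sequences, repeat=2)
}
print(len(matrices))
```

With three genes this yields nine matrices; for a fixed number of expression states the total number of entries therefore scales with the square of the number of genes.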
Predictors of CPAP compliance in different clinical settings: primary care versus sleep unit.
Nadal, Núria; de Batlle, Jordi; Barbé, Ferran; Marsal, Josep Ramon; Sánchez-de-la-Torre, Alicia; Tarraubella, Nuria; Lavega, Merce; Sánchez-de-la-Torre, Manuel
2018-03-01
Good adherence to continuous positive airway pressure (CPAP) treatment improves the patient's quality of life and decreases the risk of cardiovascular disease. Previous studies that have analyzed the adherence to CPAP were performed in a sleep unit (SU) setting. The involvement of primary care (PC) in the management of obstructive sleep apnea (OSA) patients receiving CPAP treatment could introduce factors related to the adherence to treatment. The objective was to compare the baseline predictors of CPAP compliance in SU and PC settings. OSA patients treated with CPAP were followed for 6 months in SU or PC setting. We included baseline clinical and anthropometrical variables, the Epworth Sleep Scale (ESS) score, the quality of life index, and the Charlson index. A logistic regression was performed for each group to determine the CPAP compliance predictors. Discrimination and calibration were performed using the area under the curve and Hosmer-Lemeshow tests. We included 191 patients: 91 in the PC group and 100 in the SU group. In 74.9% of the patients, the compliance was ≥ 4 h per day, with 80% compliance in the SU setting and 69.2% compliance in the PC setting (p = 0.087). The predictors of CPAP compliance were different between SU and PC settings. Body mass index, ESS, and CPAP pressure were predictors in the SU setting, and ESS, gender, and waist circumference were predictors in the PC setting. The predictors of adequate CPAP compliance vary between SU and PC settings. Detecting compliance predictors could help in the planning of early interventions to improve CPAP adherence.
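The reported compliance figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the per-setting patient counts can be recovered by rounding the reported percentages, since the abstract does not state the counts themselves; the variable names are illustrative.

```python
# Group sizes and compliance rates as reported in the abstract.
su_patients, pc_patients = 100, 91
su_rate, pc_rate = 0.80, 0.692

# Assumption: compliant-patient counts are the rounded products (not given in the abstract).
su_compliant = round(su_patients * su_rate)
pc_compliant = round(pc_patients * pc_rate)

overall = (su_compliant + pc_compliant) / (su_patients + pc_patients)
print(f"{overall:.1%}")
```

Under that assumption the pooled rate agrees with the 74.9% quoted for all 191 patients.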
Predicting fatty acid profiles in blood based on food intake and the FADS1 rs174546 SNP.
Hallmann, Jacqueline; Kolossa, Silvia; Gedrich, Kurt; Celis-Morales, Carlos; Forster, Hannah; O'Donovan, Clare B; Woolhead, Clara; Macready, Anna L; Fallaize, Rosalind; Marsaux, Cyril F M; Lambrinou, Christina-Paulina; Mavrogianni, Christina; Moschonis, George; Navas-Carretero, Santiago; San-Cristobal, Rodrigo; Godlewska, Magdalena; Surwiłło, Agnieszka; Mathers, John C; Gibney, Eileen R; Brennan, Lorraine; Walsh, Marianne C; Lovegrove, Julie A; Saris, Wim H M; Manios, Yannis; Martinez, Jose Alfredo; Traczyk, Iwona; Gibney, Michael J; Daniel, Hannelore
2015-12-01
A high intake of n-3 PUFA provides health benefits via changes in the n-6/n-3 ratio in blood. In addition to such dietary PUFAs, variants in the fatty acid desaturase 1 (FADS1) gene are also associated with altered PUFA profiles. We used mathematical modeling to predict levels of PUFA in whole blood, based on multiple hypothesis testing and bootstrapped LASSO selected food items, anthropometric and lifestyle factors, and the rs174546 genotypes in FADS1 from 1607 participants (Food4Me Study). The models were developed using data from the first reported time point (training set) and their predictive power was evaluated using data from the last reported time point (test set). Among other food items, fish, pizza, chicken, and cereals were identified as being associated with the PUFA profiles. Using these food items and the rs174546 genotypes as predictors, models explained 26-43% of the variability in PUFA concentrations in the training set and 22-33% in the test set. Selecting food items using multiple hypothesis testing is a valuable contribution to determine predictors, as our models' predictive power is higher compared to analogous studies. As a unique feature, we additionally confirmed our models' power based on a test set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gao, Shanwu; Tibiche, Chabane; Zou, Jinfeng; Zaman, Naif; Trifiro, Mark; O'Connor-McCourt, Maureen; Wang, Edwin
2016-01-01
Decisions regarding adjuvant therapy in patients with stage II colorectal cancer (CRC) have been among the most challenging and controversial in oncology over the past 20 years. To develop robust combinatory cancer hallmark-based gene signature sets (CSS sets) that more accurately predict prognosis and identify a subset of patients with stage II CRC who could gain survival benefits from adjuvant chemotherapy. Thirteen retrospective studies of patients with stage II CRC who had clinical follow-up and adjuvant chemotherapy were analyzed. Totals of 162 and 843 patients from 2 and 11 independent cohorts were used as the discovery and validation cohorts, respectively. A total of 1005 patients with stage II CRC were included in the 13 cohorts. Among them, 84 of 416 patients in 3 independent cohorts received fluorouracil-based adjuvant chemotherapy. Identification of CSS sets to predict relapse-free survival and identify a subset of patients with stage II CRC who could gain substantial survival benefits from fluorouracil-based adjuvant chemotherapy. Eight cancer hallmark-based gene signatures (30 genes each) were identified and used to construct CSS sets for determining prognosis. The CSS sets were validated in 11 independent cohorts of 767 patients with stage II CRC who did not receive adjuvant chemotherapy. The CSS sets accurately stratified patients into low-, intermediate-, and high-risk groups. Five-year relapse-free survival rates were 94%, 78%, and 45%, respectively, representing 60%, 28%, and 12% of patients with stage II disease. The 416 patients with CSS set-defined high-risk stage II CRC who received fluorouracil-based adjuvant chemotherapy showed a substantial gain in survival benefits from the treatment (ie, recurrence reduced by 30%-40% in 5 years). The CSS sets substantially outperformed other prognostic predictors of stage II CRC.
They are more accurate and robust for prognostic predictions and facilitate the identification of patients with stage II disease who could gain survival benefit from fluorouracil-based adjuvant chemotherapy.
SFM: A novel sequence-based fusion method for disease genes identification and prioritization.
Yousef, Abdulaziz; Moghadam Charkari, Nasrollah
2015-10-21
The identification of disease genes from the human genome is of great importance to improve diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes) where there is no investigational approach to find them and the classification methods used to make the final decision. In this work, a novel Sequence-based fusion method (SFM) is proposed to identify disease genes. In this regard, unlike existing methods, instead of using noisy and incomplete prior knowledge, the amino acid sequence of the proteins, which is universal data, has been used to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection set of four negative sets which are generated using a distance approach is considered. Then, Decision Tree (C4.5) has been applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm, and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. Copyright © 2015 Elsevier Ltd. All rights reserved.
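The three reported measures are linked by a standard identity: the F-measure (F1) is the harmonic mean of precision and recall. A short check, assuming the abstract's F-measure is the usual balanced F1 score:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f_measure(0.826, 0.856)
print(f"{f1:.1%}")
```

The result, about 84.1%, is consistent with the rounded F-measure quoted above.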
Tabchy, Adel; Valero, Vicente; Vidaurre, Tatiana; Lluch, Ana; Gomez, Henry; Martin, Miguel; Qi, Yuan; Barajas-Figueroa, Luis Javier; Souchon, Eduardo; Coutant, Charles; Doimi, Franco D; Ibrahim, Nuhad K; Gong, Yun; Hortobagyi, Gabriel N; Hess, Kenneth R; Symmans, W Fraser; Pusztai, Lajos
2010-01-01
Purpose We examined in a prospective, randomized, international clinical trial the performance of a previously defined 30-gene predictor (DLDA-30) of pathologic complete response (pCR) to preoperative weekly paclitaxel and fluorouracil, doxorubicin, cyclophosphamide (T/FAC) chemotherapy, and assessed if DLDA-30 also predicts increased sensitivity to FAC-only chemotherapy. We compared the pCR rates after T/FAC versus FAC×6 preoperative chemotherapy. We also performed an exploratory analysis to identify novel candidate genes that differentially predict response in the two treatment arms. Experimental Design 273 patients were randomly assigned to receive either weekly paclitaxel × 12 followed by FAC × 4 (T/FAC, n=138), or FAC × 6 (n=135) neoadjuvant chemotherapy. All patients underwent a pretreatment FNA biopsy of the tumor for gene expression profiling and treatment response prediction. Results The pCR rates were 19% and 9% in the T/FAC and FAC arms, respectively (p<0.05). In the T/FAC arm, the positive predictive value (PPV) of the genomic predictor was 38% (95%CI:21–56%), the negative predictive value (NPV) 88% (CI:77–95%) and the AUC 0.711. In the FAC arm, the PPV was 9% (CI:1–29%) and the AUC 0.584. This suggests that the genomic predictor may have regimen-specificity. Its performance was similar to a clinical variable-based predictor nomogram. Conclusions Gene expression profiling for prospective response prediction was feasible in this international trial. The 30-gene predictor can identify patients with greater than average sensitivity to T/FAC chemotherapy. However, it captured molecular equivalents of clinical phenotype. Next generation predictive markers will need to be developed separately for different molecular subsets of breast cancers. PMID:20829329
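For readers less familiar with the metrics quoted above, positive and negative predictive values follow directly from a two-by-two table of predicted versus observed response. The sketch below uses hypothetical counts chosen only for illustration; they are not the trial's data, and the function name is an assumption.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of predicted responders who respond.
    NPV = TN / (TN + FN): fraction of predicted non-responders who do not respond.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Hypothetical counts, NOT taken from the trial.
ppv, npv = predictive_values(tp=30, fp=20, fn=10, tn=40)
print(f"PPV={ppv:.0%}, NPV={npv:.0%}")
```

Both values depend on how common the outcome is in the group being tested, so predictive values for the same predictor can differ between treatment arms with different pCR rates.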
Dimova, Violeta; Lötsch, Jörn; Hühne, Kathrin; Winterpacht, Andreas; Heesen, Michael; Parthum, Andreas; Weber, Peter G; Carbon, Roman; Griessinger, Norbert; Sittl, Reinhard; Lautenbacher, Stefan
2015-01-01
The genetic control of pain has been repeatedly demonstrated in human association studies. In the present study, we assessed the relative contribution of 16 single nucleotide polymorphisms in pain-related genes, such as catechol-O-methyltransferase gene (COMT), fatty acid amide hydrolase gene (FAAH), transient receptor potential cation channel, subfamily V, member 1 gene (TRPV1), and δ-opioid receptor gene (OPRD1), for postsurgical pain chronification. Ninety preoperatively pain-free male patients were assigned to good or poor outcome groups according to their intensity or disability score assessed at 1 week, 3 months, 6 months, and 1 year after funnel chest correction. The genetic effects were compared with those of two psychological predictors, the attentional bias toward positive words (dot-probe task) and the self-reported pain vigilance (Pain Vigilance and Awareness Questionnaire [PVAQ]), which were already shown to be the best predictors for pain intensity and disability at 6 months after surgery in the same sample, respectively. Cox regression analyses revealed no significant effects of any of the genetic predictors up to the end point of survival time at 1 year after surgery. When the genetic predictors were added to the prediction by the attentional bias to positive words for pain intensity and the PVAQ for pain disability, again no significant additional explanation was gained. In contrast, the preoperative PVAQ score was also, in the present enlarged sample, a meaningful predictor for lasting pain disability after surgery. Effect size measures suggested some genetic variables, for example, the polymorphism rs1800587G>A in the interleukin 1 alpha gene (IL1A) and the COMT haplotype rs4646312T>C/rs165722T>C/rs6269A>G/rs4633T>C/rs4818C>G/rs4680A>G, as possible relevant modulators of long-term postsurgical pain outcome.
A comparison between pathophysiologically different predictor groups appears to be helpful in identifying clinically relevant predictors of chronic pain. PMID:26664154
Clinical Trials With Large Numbers of Variables: Important Advantages of Canonical Analysis.
Cleophas, Ton J
2016-01-01
Canonical analysis assesses the combined effects of a set of predictor variables on a set of outcome variables, but it is little used in clinical trials despite the omnipresence of multiple variables. The aim of this study was to assess the performance of canonical analysis as compared with traditional multivariate methods using multivariate analysis of covariance (MANCOVA). As an example, a simulated data file with 12 gene expression levels and 4 drug efficacy scores was used. The correlation coefficient between the 12 predictor and 4 outcome variables was 0.87 (P = 0.0001), meaning that 76% of the variability in the outcome variables was explained by the 12 covariates. Repeated testing after the removal of 5 unimportant predictor variables and 1 outcome variable produced virtually the same overall result. The MANCOVA identified identical unimportant variables, but it was unable to provide overall statistics. (1) Canonical analysis is remarkable, because it can handle many more variables than traditional multivariate methods such as MANCOVA can. (2) At the same time, it accounts for the relative importance of the separate variables, their interactions and differences in units. (3) Canonical analysis provides overall statistics of the effects of sets of variables, whereas traditional multivariate methods only provide the statistics of the separate variables. (4) Unlike other methods for combining the effects of multiple variables such as factor analysis/partial least squares, canonical analysis is scientifically entirely rigorous. (5) Limitations include that it is less flexible than factor analysis/partial least squares, because only 2 sets of variables are used and because multiple solutions instead of one are offered. We do hope that this article will stimulate clinical investigators to start using this remarkable method.
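The step from a correlation of 0.87 to "76% of the variability" is the usual squaring of a correlation coefficient. A one-line check, assuming the abstract interprets the squared canonical correlation as the proportion of shared variance:

```python
# Canonical correlation as reported in the abstract.
r = 0.87

# Squared correlation, read as the proportion of variance explained.
explained = r ** 2
print(f"{explained:.0%}")
```

The exact value is 0.7569, which rounds to the 76% quoted above.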
Extensive complementarity between gene function prediction methods.
Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran
2016-12-01
The number of sequenced genomes rises steadily but we still lack knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/. Contact: fran.supek@irb.hr. Supplementary information: Supplementary materials are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin
2015-01-01
Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. 
We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite using far fewer predictors, over existing state-of-the-art methods.
Evaluation of variable selection methods for random forests and omics data sets.
Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke
2017-10-16
Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
Nguyen, Quan; Lukowski, Samuel; Chiu, Han; Senabouth, Anne; Bruxner, Timothy; Christ, Angelika; Palpant, Nathan; Powell, Joseph
2018-05-11
Heterogeneity of cell states represented in pluripotent cultures has not been described at the transcriptional level. Since gene expression is highly heterogeneous between cells, single-cell RNA sequencing can be used to identify how individual pluripotent cells function. Here, we present results from the analysis of single-cell RNA sequencing data from 18,787 individual WTC CRISPRi human induced pluripotent stem cells. We developed an unsupervised clustering method, and through this identified four subpopulations distinguishable on the basis of their pluripotent state including: a core pluripotent population (48.3%), proliferative (47.8%), early-primed for differentiation (2.8%) and late-primed for differentiation (1.1%). For each subpopulation we were able to identify the genes and pathways that define differences in pluripotent cell states. Our method identified four discrete predictor gene sets comprising 165 unique genes that denote the specific pluripotency states; and using these sets, we developed a multigenic machine learning prediction method to accurately classify single cells into each of the subpopulations. Compared against a set of established pluripotency markers, our method increases prediction accuracy by 10%, specificity by 20%, and explains a substantially larger proportion of deviance (up to 3-fold) from the prediction model. Finally, we developed an innovative method to predict cells transitioning between subpopulations, and support our conclusions with results from two orthogonal pseudotime trajectory methods. Published by Cold Spring Harbor Laboratory Press.
The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A.; Kim, Sungjoon; Wilson, Christopher J.; Lehár, Joseph; Kryukov, Gregory V.; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F.; Monahan, John E.; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A.; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H.; Cheng, Jill; Yu, Guoying K.; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D.; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C.; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P.; Gabriel, Stacey B.; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E.; Weber, Barbara L.; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L.; Meyerson, Matthew; Golub, Todd R.; Morrissey, Michael P.; Sellers, William R.; Schlegel, Robert; Garraway, Levi A.
2012-01-01
The systematic translation of cancer genomic data into knowledge of tumor biology and therapeutic avenues remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacologic annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacologic profiles for 24 anticancer drugs across 479 of the lines, this collection allowed identification of genetic, lineage, and gene expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Altogether, our results suggest that large, annotated cell line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of “personalized” therapeutic regimens. PMID:22460905
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A; Kim, Sungjoon; Wilson, Christopher J; Lehár, Joseph; Kryukov, Gregory V; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F; Monahan, John E; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H; Cheng, Jill; Yu, Guoying K; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P; Gabriel, Stacey B; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E; Weber, Barbara L; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L; Meyerson, Matthew; Golub, Todd R; Morrissey, Michael P; Sellers, William R; Schlegel, Robert; Garraway, Levi A
2012-03-28
The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of 'personalized' therapeutic regimens.
Empuku, Shinichiro; Nakajima, Kentaro; Akagi, Tomonori; Kaneko, Kunihiko; Hijiya, Naoki; Etoh, Tsuyoshi; Shiraishi, Norio; Moriyama, Masatsugu; Inomata, Masafumi
2016-05-01
Preoperative chemoradiotherapy (CRT) for locally advanced rectal cancer not only improves the postoperative local control rate, but also induces downstaging. However, it has not been established how to individually select patients who receive effective preoperative CRT. The aim of this study was to identify a predictor of response to preoperative CRT for locally advanced rectal cancer. This study is additional to our multicenter phase II study evaluating the safety and efficacy of preoperative CRT using oral fluorouracil (UMIN ID: 03396). From April, 2009 to August, 2011, 26 biopsy specimens obtained prior to CRT were analyzed by cyclopedic microarray analysis. Response to CRT was evaluated according to a histological grading system using surgically resected specimens. To decide on the number of genes for dividing into responder and non-responder groups, we statistically analyzed the data using a dimension reduction method, a principal component analysis. Of the 26 cases, 11 were responders and 15 non-responders. No significant difference was found in clinical background data between the two groups. We determined that the optimal number of genes for the prediction of response was 80 of 40,000 and the functions of these genes were analyzed. When comparing non-responders with responders, genes expressed at a high level functioned in alternative splicing, whereas those expressed at a low level functioned in the septin complex. Thus, an 80-gene expression set that predicts response to preoperative CRT for locally advanced rectal cancer was identified using a novel statistical method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uehara, Takeki, E-mail: takeki.uehara@shionogi.co.jp; Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, 7-6-8 Asagi, Ibaraki, Osaka 567-0085; Minowa, Yohsuke
2011-09-15
The present study was performed to develop a robust gene-based prediction model for early assessment of potential hepatocarcinogenicity of chemicals in rats by using our toxicogenomics database, TG-GATEs (Genomics-Assisted Toxicity Evaluation System developed by the Toxicogenomics Project in Japan). The positive training set consisted of high- or middle-dose groups that received 6 different non-genotoxic hepatocarcinogens during a 28-day period. The negative training set consisted of high- or middle-dose groups of 54 non-carcinogens. Support vector machine combined with wrapper-type gene selection algorithms was used for modeling. Consequently, our best classifier yielded prediction accuracies for hepatocarcinogenicity of 99% sensitivity and 97% specificity in the training data set, and false positive prediction was almost completely eliminated. Pathway analysis of feature genes revealed that the mitogen-activated protein kinase p38- and phosphatidylinositol-3-kinase-centered interactome and the v-myc myelocytomatosis viral oncogene homolog-centered interactome were the 2 most significant networks. The usefulness and robustness of our predictor were further confirmed in an independent validation data set obtained from the public database. Interestingly, similar positive predictions were obtained in several genotoxic hepatocarcinogens as well as non-genotoxic hepatocarcinogens. These results indicate that the expression profiles of our newly selected candidate biomarker genes might be common characteristics in the early stage of carcinogenesis for both genotoxic and non-genotoxic carcinogens in the rat liver. Our toxicogenomic model might be useful for the prospective screening of hepatocarcinogenicity of compounds and prioritization of compounds for carcinogenicity testing. Highlights: We developed a toxicogenomic model to predict hepatocarcinogenicity of chemicals. The optimized model consisting of 9 probes had 99% sensitivity and 97% specificity. This model enables us to detect genotoxic as well as non-genotoxic hepatocarcinogens.
Zimmermann, N.E.; Edwards, T.C.; Moisen, Gretchen G.; Frescino, T.S.; Blackard, J.A.
2007-01-01
1. Compared to bioclimatic variables, remote sensing predictors are rarely used for predictive species modelling. When used, the predictors represent typically habitat classifications or filters rather than gradual spectral, surface or biophysical properties. Consequently, the full potential of remotely sensed predictors for modelling the spatial distribution of species remains unexplored. Here we analysed the partial contributions of remotely sensed and climatic predictor sets to explain and predict the distribution of 19 tree species in Utah. We also tested how these partial contributions were related to characteristics such as successional types or species traits. 2. We developed two spatial predictor sets of remotely sensed and topo-climatic variables to explain the distribution of tree species. We used variation partitioning techniques applied to generalized linear models to explore the combined and partial predictive powers of the two predictor sets. Non-parametric tests were used to explore the relationships between the partial model contributions of both predictor sets and species characteristics. 3. More than 60% of the variation explained by the models represented contributions by one of the two partial predictor sets alone, with topo-climatic variables outperforming the remotely sensed predictors. However, the partial models derived from only remotely sensed predictors still provided high model accuracies, indicating a significant correlation between climate and remote sensing variables. The overall accuracy of the models was high, but small sample sizes had a strong effect on cross-validated accuracies for rare species. 4. Models of early successional and broadleaf species benefited significantly more from adding remotely sensed predictors than did late seral and needleleaf species. The core-satellite species types differed significantly with respect to overall model accuracies. 
Models of satellite and urban species, both with low prevalence, benefited more from use of remotely sensed predictors than did the more frequent core species. 5. Synthesis and applications. If carefully prepared, remotely sensed variables are useful additional predictors for the spatial distribution of trees. Major improvements resulted for deciduous, early successional, satellite and rare species. The ability to improve model accuracy for species having markedly different life history strategies is a crucial step for assessing effects of global change. © 2007 The Authors.
ZIMMERMANN, N E; EDWARDS, T C; MOISEN, G G; FRESCINO, T S; BLACKARD, J A
2007-01-01
Compared to bioclimatic variables, remote sensing predictors are rarely used for predictive species modelling. When used, the predictors represent typically habitat classifications or filters rather than gradual spectral, surface or biophysical properties. Consequently, the full potential of remotely sensed predictors for modelling the spatial distribution of species remains unexplored. Here we analysed the partial contributions of remotely sensed and climatic predictor sets to explain and predict the distribution of 19 tree species in Utah. We also tested how these partial contributions were related to characteristics such as successional types or species traits. We developed two spatial predictor sets of remotely sensed and topo-climatic variables to explain the distribution of tree species. We used variation partitioning techniques applied to generalized linear models to explore the combined and partial predictive powers of the two predictor sets. Non-parametric tests were used to explore the relationships between the partial model contributions of both predictor sets and species characteristics. More than 60% of the variation explained by the models represented contributions by one of the two partial predictor sets alone, with topo-climatic variables outperforming the remotely sensed predictors. However, the partial models derived from only remotely sensed predictors still provided high model accuracies, indicating a significant correlation between climate and remote sensing variables. The overall accuracy of the models was high, but small sample sizes had a strong effect on cross-validated accuracies for rare species. Models of early successional and broadleaf species benefited significantly more from adding remotely sensed predictors than did late seral and needleleaf species. The core-satellite species types differed significantly with respect to overall model accuracies. 
Models of satellite and urban species, both with low prevalence, benefited more from use of remotely sensed predictors than did the more frequent core species. Synthesis and applications. If carefully prepared, remotely sensed variables are useful additional predictors for the spatial distribution of trees. Major improvements resulted for deciduous, early successional, satellite and rare species. The ability to improve model accuracy for species having markedly different life history strategies is a crucial step for assessing effects of global change. PMID:18642470
More on the Best Evolutionary Rate for Phylogenetic Analysis
Massingham, Tim; Goldman, Nick
2017-01-01
Abstract The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility. Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long-apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well. 
Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite them being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data. PMID:28595363
Salient Predictors of School Dropout among Secondary Students with Learning Disabilities
ERIC Educational Resources Information Center
Doren, Bonnie; Murray, Christopher; Gau, Jeff M.
2014-01-01
The purpose of this study was to identify the unique contributions of a comprehensive set of predictors and the most salient predictors of school dropout among a nationally representative sample of students with learning disabilities (LD). A comprehensive set of theoretically and empirically relevant factors was selected for examination. Analyses…
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.
Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi
2013-10-04
Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC. Copyright © 2013 Elsevier Inc. All rights reserved.
Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia
2012-01-01
Objectives To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with or/and without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. Classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with larger than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially-expressed genes than using single study-based differential analysis. 
Conclusion Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
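The abstract above states that a random effects model was used to integrate per-experiment effect sizes but does not say which estimator was applied. As a minimal sketch only, assuming the common DerSimonian-Laird estimator (an assumption, not the authors' stated method), pooling could look like this; the function name `random_effects_pool` and its inputs are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with the DerSimonian-Laird estimator.

    effects:   one effect size per study (e.g. change in gene expression)
    variances: the within-study variance of each effect size
    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with the fixed-effect, inverse-variance pooled estimate.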
Exogenous and Endogenous Determinants of Blood Trihalomethane Levels after Showering
Backer, Lorraine C.; Lan, Qing; Blount, Benjamin C.; Nuckols, J.R.; Branch, Robert; Lyu, Christopher W.; Kieszak, Stephanie M.; Brinkman, Marielle C.; Gordon, Sydney M.; Flanders, W. Dana; Romkes, Marjorie; Cantor, Kenneth P.
2008-01-01
Background We previously conducted a study to assess whether household exposures to tap water increased an individual’s internal dose of trihalomethanes (THMs). Increases in blood THM levels among subjects who showered or bathed were variable, with increased levels tending to cluster in two groups. Objectives Our goal was to assess the importance of personal characteristics, previous exposures, genetic polymorphisms, and environmental exposures in determining THM concentrations in blood after showering. Methods One hundred study participants completed a health symptom questionnaire, a 48-hr food and water consumption diary, and took a 10-min shower in a controlled setting. We examined THM levels in blood samples collected at baseline and 10 and 30 min after the shower. We assessed the significance of personal characteristics, previous exposures to THMs, and specific gene polymorphisms in predicting postshower blood THM concentrations. Results We did not observe the clustering of blood THM concentrations observed in our earlier study. We found that environmental THM concentrations were important predictors of blood THM concentrations immediately after showering. For example, the chloroform concentration in the shower stall air was the most important predictor of blood chloroform levels 10 min after the shower (p < 0.001). Personal characteristics, previous exposures to THMs, and specific polymorphisms in CYP2D6 and GSTT1 genes were significant predictors of both baseline and postshowering blood THM concentrations as well as of changes in THM concentrations associated with showering. Conclusion The inclusion of information about individual physiologic characteristics and environmental measurements would be valuable in future studies to assess human health effects from exposures to THMs in tap water. PMID:18197300
Blood Gene Expression Predicts Bronchiolitis Obliterans Syndrome
Danger, Richard; Royer, Pierre-Joseph; Reboulleau, Damien; Durand, Eugénie; Loy, Jennifer; Tissot, Adrien; Lacoste, Philippe; Roux, Antoine; Reynaud-Gaubert, Martine; Gomez, Carine; Kessler, Romain; Mussot, Sacha; Dromer, Claire; Brugière, Olivier; Mornex, Jean-François; Guillemain, Romain; Dahan, Marcel; Knoop, Christiane; Botturi, Karine; Foureau, Aurore; Pison, Christophe; Koutsokera, Angela; Nicod, Laurent P.; Brouard, Sophie; Magnan, Antoine; Jougon, J.
2018-01-01
Bronchiolitis obliterans syndrome (BOS), the main manifestation of chronic lung allograft dysfunction, leads to poor long-term survival after lung transplantation. Identifying predictors of BOS is essential to prevent the progression of dysfunction before irreversible damage occurs. By using a large set of 107 samples from lung recipients, we performed microarray gene expression profiling of whole blood to identify early biomarkers of BOS, including samples from 49 patients with stable function for at least 3 years, 32 samples collected at least 6 months before BOS diagnosis (prediction group), and 26 samples at or after BOS diagnosis (diagnosis group). An independent set from 25 lung recipients was used for validation by quantitative PCR (13 stables, 11 in the prediction group, and 8 in the diagnosis group). We identified 50 transcripts differentially expressed between stable and BOS recipients. Three genes, namely POU class 2 associating factor 1 (POU2AF1), T-cell leukemia/lymphoma protein 1A (TCL1A), and B cell lymphocyte kinase, were validated as predictive biomarkers of BOS more than 6 months before diagnosis, with areas under the curve of 0.83, 0.77, and 0.78 respectively. These genes allow stratification based on BOS risk (log-rank test p < 0.01) and are not associated with time posttransplantation. This is the first published large-scale gene expression analysis of blood after lung transplantation. The three-gene blood signature could provide clinicians with new tools to improve follow-up and adapt treatment of patients likely to develop BOS. PMID:29375549
Hunegnaw, Emirie; Tiruneh, Moges
2017-01-01
Background Tuberculosis, mainly in prisoners, is a major public health problem in Ethiopia where there is no medical screening during prison admission. This creates scarcity of TB data in such settings. Objective To determine prevalence and associated factors of TB in prisons in East Gojjam Zone, Northwest Ethiopia. Methods A cross-sectional study was conducted from February to May 2016 among 265 prisoners in three prison sites. Sputum was processed using GeneXpert MTB/RIF. Data were analyzed using SPSS version 20.0. Multivariable logistic regression was used; p values = 0.05 were considered statistically significant. Results Of 265 prisoners, 9 (3.4%) were TB positive (males); 77.8%, 55.6%, and 55.6% of cases were rural dwellers, married, and farmers, respectively. Seven (2.6%) prisoners were HIV positive, and 3 (1.13%) had TB/HIV coinfection. One (0.4%) TB case was rifampicin resistant. Marriage (AOR = 1.5; 95% CI: 1.7, 13.03), HIV (AOR = 0.14; 95% CI: 0.001, 0.17), and sharing of rooms (AOR = 1.62; 95% CI: 2.6, 10.20) were predictors for TB. Conclusion Nine prisoners were TB positive. One case showed rifampicin resistance and three had TB/HIV coinfection. Marriage, HIV, and sharing of rooms were predictors for TB. Prevention/control and monitoring are mandatory in such settings. PMID:29226216
Painful Temporomandibular Disorder: Decade of Discovery from OPPERA Studies.
Slade, G D; Ohrbach, R; Greenspan, J D; Fillingim, R B; Bair, E; Sanders, A E; Dubner, R; Diatchenko, L; Meloto, C B; Smith, S; Maixner, W
2016-09-01
In 2006, the OPPERA project (Orofacial Pain: Prospective Evaluation and Risk Assessment) set out to identify risk factors for development of painful temporomandibular disorder (TMD). A decade later, this review summarizes its key findings. At 4 US study sites, OPPERA recruited and examined 3,258 community-based TMD-free adults assessing genetic and phenotypic measures of biological, psychosocial, clinical, and health status characteristics. During follow-up, 4% of participants per annum developed clinically verified TMD, although that was a "symptom iceberg" when compared with the 19% annual rate of facial pain symptoms. The most influential predictors of clinical TMD were simple checklists of comorbid health conditions and nonpainful orofacial symptoms. Self-reports of jaw parafunction were markedly stronger predictors than corresponding examiner assessments. The strongest psychosocial predictor was frequency of somatic symptoms, although not somatic reactivity. Pressure pain thresholds measured at cranial sites only weakly predicted incident TMD yet were strongly associated with chronic TMD, cross-sectionally, in OPPERA's separate case-control study. The puzzle was resolved in OPPERA's nested case-control study where repeated measures of pressure pain thresholds revealed fluctuation that coincided with TMD's onset, persistence, and recovery but did not predict its incidence. The nested case-control study likewise furnished novel evidence that deteriorating sleep quality predicted TMD incidence. Three hundred genes were investigated, implicating 6 single-nucleotide polymorphisms (SNPs) as risk factors for chronic TMD, while another 6 SNPs were associated with intermediate phenotypes for TMD. One study identified a serotonergic pathway in which multiple SNPs influenced risk of chronic TMD. Two other studies investigating gene-environment interactions found that effects of stress on pain were modified by variation in the gene encoding catechol O-methyltransferase. 
Lessons learned from OPPERA have verified some implicated risk factors for TMD and refuted others, redirecting our thinking. Now it is time to apply those lessons to studies investigating treatment and prevention of TMD. © International & American Associations for Dental Research 2016.
The Emerging Role of miR-223 in Platelet Reactivity: Implications in Antiplatelet Therapy
Shi, Rui; Zhou, Xin; Ji, Wen-Jie; Zhang, Ying-Ying; Ma, Yong-Qiang; Zhang, Jian-Qi
2015-01-01
Platelets are anuclear cells and are devoid of genomic DNA, but they are capable of de novo protein synthesis from mRNA derived from their progenitor cells, megakaryocytes. There is mounting evidence that microRNA (miRNA) plays an important role in regulating gene expression in platelets. miR-223 is the most abundant miRNA in megakaryocytes and platelets. One of the miR-223-regulated genes is ADP P2Y12, a key target for current antiplatelet drug therapy. Recent studies showed that a blunted response to P2Y12 antagonists, that is, high on-treatment platelet reactivity (HTPR), is a strong predictor of major cardiovascular events (MACEs) in coronary heart disease (CHD) patients receiving antiplatelet treatment. A recent clinical cohort study showed that the level of circulating miR-223 is inversely associated with MACE in CHD patients. In addition, our recent data demonstrated that the level of both intraplatelet and circulating miR-223 is an independent predictor for HTPR, thus providing a link between miR-223 and MACE. These lines of evidence indicate that miR-223 may serve as a potential regulatory target for HTPR, as well as a diagnostic tool for identification of HTPR in clinical settings. PMID:26221610
Shiao, S Pamela K; Grayson, James; Lie, Amanda; Yu, Chong Ho
2018-06-20
To personalize nutrition, the purpose of this study was to examine five key genes in the folate metabolism pathway, and dietary parameters and related interactive parameters as predictors of colorectal cancer (CRC) by measuring the healthy eating index (HEI) in multiethnic families. The five genes included methylenetetrahydrofolate reductase (MTHFR) 677 and 1298, methionine synthase (MTR) 2756, methionine synthase reductase (MTRR 66), and dihydrofolate reductase (DHFR) 19bp, and they were used to compute a total gene mutation score. We included 53 families, 53 CRC patients and 53 paired family friend members of diverse population groups in Southern California. We measured multidimensional data using the ensemble bootstrap forest method to identify variables of importance within domains of genetic, demographic, and dietary parameters to achieve dimension reduction. We then constructed predictive generalized regression (GR) modeling with a supervised machine learning validation procedure with the target variable (cancer status) being specified to validate the results to allow enhanced prediction and reproducibility. The results showed that the CRC group had increased total gene mutation scores compared to the family members (p < 0.05). Using the Akaike's information criterion and Leave-One-Out cross validation GR methods, the HEI was interactive with thiamine (vitamin B1), which is a new finding for the literature. The natural food sources for thiamine include whole grains, legumes, and some meats and fish which HEI scoring included as part of healthy portions (versus limiting portions on salt, saturated fat and empty calories). Additional predictors included age, as well as gender and the interaction of MTHFR 677 with overweight status (measured by body mass index) in predicting CRC, with the cancer group having more men and overweight cases.
The HEI score was significant when split at the median score of 77 into greater or less scores, confirmed through the machine-learning recursive tree method and predictive modeling, although an HEI score of greater than 80 is the US national standard set value for a good diet. The HEI and healthy eating are modifiable factors for healthy living in relation to dietary parameters and cancer prevention, and they can be used for personalized nutrition in the precision-based healthcare era.
Effects of advancing gestation and non-Caucasian race on ductus arteriosus gene expression
Waleh, Nahid; Barrette, Anne Marie; Dagle, John M.; Momany, Allison; Jin, Chengshi; Hills, Nancy K.; Shelton, Elaine L.; Reese, Jeff; Clyman, Ronald I.
2015-01-01
Objective To identify genes affected by advancing gestation and racial/ethnic origin in human ductus arteriosus (DA). Study design We collected three sets of DA tissue (n=93, n=89, n=91; total = 273 fetuses) from second trimester pregnancies. We examined four genes, with DNA polymorphisms that distribute along racial lines, to identify "Caucasian" and "Non-Caucasian" DA. We used RT-PCR to measure RNA expression of 48 candidate genes involved in functional closure of the DA, and used multivariable regression analyses to examine the relationships between advancing gestation, "Non-Caucasian" race, and gene expression. Results Mature gestation and Non-Caucasian race are significant predictors for identifying infants who will close their patent DA when treated with indomethacin. Advancing gestation consistently altered gene expression in pathways involved with oxygen-induced constriction (e.g., calcium-channels, potassium-channels, and endothelin signaling), contractile protein maturation, tissue remodeling, and prostaglandin and nitric oxide signaling in all three tissue sets. None of the pathways involved with oxygen-induced constriction appeared to be altered in "Non-Caucasian" DA. Two genes, SLCO2A1 and NOS3, (involved with prostaglandin reuptake/metabolism and nitric oxide production, respectively) were consistently decreased in "Non-Caucasian" DA. Conclusions Prostaglandins and nitric oxide are the most important vasodilators opposing DA closure. Indomethacin inhibits prostaglandin production, but not nitric oxide production. Because decreased SLCO2A1 and NOS3 expression can lead to increased prostaglandin and decreased nitric oxide concentrations, we speculate that prostaglandin-mediated vasodilation may play a more dominant role in maintaining the "Non-Caucasian" PDA, making it more likely to close when inhibited by indomethacin. PMID:26265282
Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.
2013-01-01
Background Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin
2009-01-01
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
Modeling 3D Facial Shape from DNA
Claes, Peter; Liberton, Denise K.; Daniels, Katleen; Rosana, Kerri Matthes; Quillen, Ellen E.; Pearson, Laurel N.; McEvoy, Brian; Bauchet, Marc; Zaidi, Arslan A.; Yao, Wei; Tang, Hua; Barsh, Gregory S.; Absher, Devin M.; Puts, David A.; Rocha, Jorge; Beleza, Sandra; Pereira, Rinaldo W.; Baynam, Gareth; Suetens, Paul; Vandermeulen, Dirk; Wagner, Jennifer K.; Boster, James S.; Shriver, Mark D.
2014-01-01
Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks to measure face shape in population samples with mixed West African and European ancestry from three locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting normal-range facial features and for approximating the appearance of a face from genetic markers. PMID:24651127
Gong, Ping; Nan, Xiaofei; Barker, Natalie D; Boyd, Robert E; Chen, Yixin; Wilkins, Dawn E; Johnson, David R; Suedel, Burton C; Perkins, Edward J
2016-03-08
Chemical bioavailability is an important dose metric in environmental risk assessment. Although many approaches have been used to evaluate bioavailability, no single approach is free from limitations. Previously, we developed a new genomics-based approach that integrated microarray technology and regression modeling for predicting bioavailability (tissue residue) of explosive compounds in exposed earthworms. In the present study, we further compared 18 different regression models and performed variable selection simultaneously with parameter estimation. This refined approach was applied to both previously collected and newly acquired earthworm microarray gene expression datasets for three explosive compounds. Our results demonstrate that a prediction accuracy of R² = 0.71-0.82 was achievable at a relatively low model complexity with as few as 3-10 predictor genes per model. These results are much more encouraging than our previous ones. This study has demonstrated that our approach is promising for bioavailability measurement, which warrants further studies of mixed contamination scenarios in field settings.
Martín-Navarro, Antonio; Gaudioso-Simón, Andrés; Álvarez-Jarreta, Jorge; Montoya, Julio; Mayordomo, Elvira; Ruiz-Pesini, Eduardo
2017-03-07
Several methods have been developed to predict the pathogenicity of missense mutations but none has been specifically designed for classification of variants in mtDNA-encoded polypeptides. Moreover, there is no curated dataset of neutral and damaging mtDNA missense variants available to test the accuracy of predictors. Because mtDNA sequencing of patients suffering from mitochondrial diseases is revealing many missense mutations, candidate substitutions need to be prioritized for further confirmation. Predictors can be useful as screening tools but their performance must be improved. We have developed an SVM classifier (Mitoclass.1) specific for mtDNA missense variants. Training and validation of the model were performed with 2,835 mtDNA damaging and neutral amino acid substitutions, previously curated by a set of rigorous pathogenicity criteria with high specificity. Each instance is described by a set of three attributes based on evolutionary conservation in Eukaryota of wildtype and mutant amino acids as well as coevolution and a novel evolutionary analysis of specific substitutions belonging to the same domain of mitochondrial polypeptides. Our classifier performed better than the other web-available predictors tested. We checked the performance of three broadly used predictors on all mutations in our curated dataset. PolyPhen-2 showed the best results for screening purposes, with good sensitivity. Nevertheless, the number of false positive predictions was too high. Our method has improved sensitivity and better specificity relative to PolyPhen-2. We also publish predictions for the complete set of 24,201 possible missense variants in the 13 human mtDNA-encoded polypeptides. Mitoclass.1 allows a better selection of candidate damaging missense variants from mtDNA. 
A careful search of discriminatory attributes and a training step based on a curated dataset of amino acid substitutions belonging exclusively to human mtDNA genes allow improved performance. Mitoclass.1 accuracy could be improved in the future when more mtDNA missense substitutions become available for updating the attributes and retraining the model.
Searching for missing heritability: Designing rare variant association studies
Zuk, Or; Schaffner, Stephen F.; Samocha, Kaitlin; Do, Ron; Hechter, Eliana; Kathiresan, Sekar; Daly, Mark J.; Neale, Benjamin M.; Sunyaev, Shamil R.; Lander, Eric S.
2014-01-01
Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set. PMID:24443550
Development and Validation of a qRT-PCR Classifier for Lung Cancer Prognosis
Chen, Guoan; Kim, Sinae; Taylor, Jeremy MG; Wang, Zhuwen; Lee, Oliver; Ramnath, Nithya; Reddy, Rishindra M; Lin, Jules; Chang, Andrew C; Orringer, Mark B; Beer, David G
2011-01-01
Purpose This prospective study aimed to develop a robust and clinically-applicable method to identify high-risk early stage lung cancer patients and then to validate this method for use in future translational studies. Patients and Methods Three published Affymetrix microarray data sets representing 680 primary tumors were used in the survival-related gene selection procedure using clustering, Cox model and random survival forest (RSF) analysis. A final set of 91 genes was selected and tested as a predictor of survival using a qRT-PCR-based assay utilizing an independent cohort of 101 lung adenocarcinomas. Results The RSF model built from 91 genes in the training set predicted patient survival in an independent cohort of 101 lung adenocarcinomas, with a prediction error rate of 26.6%. The mortality risk index (MRI) was significantly related to survival (Cox model p < 0.00001) and separated all patients into low, medium, and high-risk groups (HR = 1.00, 2.82, 4.42). The MRI was also related to survival in stage 1 patients (Cox model p = 0.001), separating patients into low, medium, and high-risk groups (HR = 1.00, 3.29, 3.77). Conclusions The development and validation of this robust qRT-PCR platform allows prediction of patient survival with early stage lung cancer. Utilization will now allow investigators to evaluate it prospectively by incorporation into new clinical trials with the goal of personalized treatment of lung cancer patients and improving patient survival. PMID:21792073
Goal Setting in Principal Evaluation: Goal Quality and Predictors of Achievement
ERIC Educational Resources Information Center
Sinnema, Claire E. L.; Robinson, Viviane M. J.
2012-01-01
This article draws on goal-setting theory to investigate the goals set by experienced principals during their performance evaluations. While most goals were about teaching and learning, they tended to be vaguely expressed and only partially achieved. Five predictors (commitment, challenge, learning, effort, and support) explained a significant…
NASA Astrophysics Data System (ADS)
Fritz, Andreas; Enßle, Fabian; Zhang, Xiaoli; Koch, Barbara
2016-08-01
The present study analyses the two earth observation sensors regarding their capability of modelling forest above ground biomass and forest density. Our research is carried out at two different demonstration sites. The first is located in south-western Germany (region Karlsruhe) and the second is located in southern China in Jiangle County (Province Fujian). A set of spectral and spatial predictors is computed from both Sentinel-2A and WorldView-2 data. Window sizes in the range of 3*3 pixels to 21*21 pixels are computed in order to cover the full range of the canopy sizes of mature forest stands. Textural predictors of first and second order (grey-level co-occurrence matrix) are calculated and are further used within a feature selection procedure. Additionally, common spectral predictors from WorldView-2 and Sentinel-2A data, such as all relevant spectral bands and NDVI, are integrated in the analyses. To identify the most important predictors, a predictor selection algorithm is applied to the entire predictor set of more than 1000 predictors. Out of the original set only the most important predictors are then further analysed. Predictor selection is done with the Boruta package in R (Kursa and Rudnicki (2010)), and regression is computed with random forest. Prior to classification and regression, parameters are tuned by a repetitive model selection (100 runs) based on the .632 bootstrap. Both are implemented in the caret R package (Kuhn et al. (2016)). To account for the variability in the data set, 100 independent runs are performed. Within each run 80 percent of the data is used for training and the remaining 20 percent are used for an independent validation. With the subset of original predictors mapping of above ground biomass is performed.
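The repeated 80/20 validation scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R/caret workflow: the function name `repeated_holdout`, the seed, and the sample count of 50 are assumptions made for illustration, and the sketch covers only the splitting step, not Boruta selection or random forest fitting.

```python
import random

def repeated_holdout(n_samples, runs=100, train_frac=0.8, seed=0):
    """Yield (train_idx, test_idx) for `runs` independent random splits.

    Hypothetical helper: each run reshuffles all sample indices and keeps
    the first `train_frac` share for training, the rest for validation.
    """
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_frac))
    for _ in range(runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

# Example: 50 plots (an assumed number), 100 runs, 80 percent training share.
splits = list(repeated_holdout(50))
```

In a full analysis a model would be fitted on each training index set and scored on the matching validation set, so that the spread of the 100 scores reflects variability in the data.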
A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks.
Zhou, Xiaobo; Wang, Xiaodong; Pal, Ranadip; Ivanov, Ivan; Bittner, Michael; Dougherty, Edward R
2004-11-22
We have hypothesized that the construction of transcriptional regulatory networks using a method that optimizes connectivity would lead to regulation consistent with biological expectations. A key expectation is that the hypothetical networks should produce a few, very strong attractors, highly similar to the original observations, mimicking biological state stability and determinism. Another central expectation is that, since it is expected that the biological control is distributed and mutually reinforcing, interpretation of the observations should lead to a very small number of connection schemes. We propose a fully Bayesian approach to constructing probabilistic gene regulatory networks (PGRNs) that emphasizes network topology. The method computes the possible parent sets of each gene, the corresponding predictors and the associated probabilities based on a nonlinear perceptron model, using a reversible jump Markov chain Monte Carlo (MCMC) technique, and an MCMC method is employed to search the network configurations to find those with the highest Bayesian scores to construct the PGRN. The Bayesian method has been used to construct a PGRN based on the observed behavior of a set of genes whose expression patterns vary across a set of melanoma samples exhibiting two very different phenotypes with respect to cell motility and invasiveness. Key biological features have been faithfully reflected in the model. Its steady-state distribution contains attractors that are either identical or very similar to the states observed in the data, and many of the attractors are singletons, which mimics the biological propensity to stably occupy a given state. Most interestingly, the connectivity rules for the most optimal generated networks constituting the PGRN are remarkably similar, as would be expected for a network operating on a distributed basis, with strong interactions between the components.
Functional pathway analysis of genes associated with response to treatment for chronic hepatitis C.
Birerdinc, A; Afendy, A; Stepanova, M; Younossi, I; Manyam, G; Baranova, A; Younossi, Z M
2010-10-01
Chronic hepatitis C (CH-C) is among the most common causes of chronic liver disease. Approximately 50% of patients with CH-C treated with pegylated interferon-α and ribavirin (PEG-IFN-α + RBV) achieve a sustained virological response (SVR). Several factors such as genotype 1, African American (AA) race, obesity and the absence of an early virological response (EVR) are associated with low SVR. This study elucidates molecular pathways deregulated in patients with CH-C with negative predictors of response to antiviral therapy. Sixty-eight patients with CH-C who underwent a full course of treatment with PEG-IFN-α + RBV were included in the study. Pretreatment blood samples were collected in PAXgene™ RNA tubes. EVR, complete EVR (cEVR), and SVR rates were 76%, 57% and 41%, respectively. Total RNA was extracted from pretreatment peripheral blood mononuclear cells, quantified and used for one-step RT-PCR to profile 154 mRNAs. The expression of mRNAs was normalized with six 'housekeeping' genes. Differentially expressed genes were separated into up and downregulated gene lists according to the presence or absence of a risk factor and subjected to KEGG Pathway Painter which allows high-throughput visualization of the pathway-specific changes in expression profiles. The genes were consolidated into the networks associated with known predictors of response. Before treatment, various genes associated with core components of the JAK/STAT pathway were activated in the cohorts least likely to achieve SVR. Genes related to focal adhesion and TGF-β pathways were activated in some patients with negative predictors of response. Pathway-centred analysis of gene expression profiles from treated patients with CH-C points to the Janus kinase-signal transducers and activators of transcription signalling cascade as the major pathogenetic component responsible for not achieving SVR. In addition, focal adhesion and TGF-β pathways are associated with some predictors of response. 
© 2009 Blackwell Publishing Ltd.
Shiao, S Pamela K; Grayson, James; Yu, Chong Ho; Wasek, Brandi; Bottiglieri, Teodoro
2018-02-16
For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanics, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05), on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp except MTHFR A1298C. Four racial groups presented different polymorphism rates for four genes (all p < 0.05) except MTHFR A1298C. Following the ensemble method, the most influential factors were identified, and the best predictive models were generated by using the generalized regression models, with Akaike's information criterion and leave-one-out cross validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and the exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene-environment interactions in the prevention of CRC.
Fourati, Slim; Cristescu, Razvan; Loboda, Andrey; Talla, Aarthi; Filali, Ali; Railkar, Radha; Schaeffer, Andrea K.; Favre, David; Gagnon, Dominic; Peretz, Yoav; Wang, I-Ming; Beals, Chan R.; Casimiro, Danilo R.; Carayannopoulos, Leonidas N.; Sékaly, Rafick-Pierre
2016-01-01
Aging is associated with hyporesponse to vaccination, whose mechanisms remain unclear. In this study hepatitis B virus (HBV)-naive older adults received three vaccines, including one against HBV. Here we show, using transcriptional and cytometric profiling of whole blood collected before vaccination, that heightened expression of genes that augment B-cell responses and higher memory B-cell frequencies correlate with stronger responses to HBV vaccine. In contrast, higher levels of inflammatory response transcripts and increased frequencies of pro-inflammatory innate cells correlate with weaker responses to this vaccine. Increased numbers of erythrocytes and the haem-induced response also correlate with poor response to the HBV vaccine. A transcriptomics-based pre-vaccination predictor of response to HBV vaccine is built and validated in distinct sets of older adults. This moderately accurate (area under the curve≈65%) but robust signature is supported by flow cytometry and cytokine profiling. This study is the first that identifies baseline predictors and mechanisms of response to the HBV vaccine. PMID:26742691
Golimbet, V E; Volel', B A; Kopylov, F Iu; Dolzhikov, A V; Korovaitseva, G I; Kasparov, S V; Isaeva, M I
2015-01-01
As part of a search for early predictors of depression in patients with ischemic heart disease (IHD), we studied the effects of molecular-genetic factors (polymorphism of brain-derived neurotrophic factor, BDNF), personality traits (anxiety, neuroticism), IHD severity, and psychosocial stressors on manifestations of depression in men with a verified diagnosis of IHD. Severity of depression was assessed by the 21-item Hamilton Depression Rating Scale (HAMD 21); anxiety and neuroticism were evaluated by the Spielberger State-Trait Anxiety Inventory and the "Big Five" questionnaire, respectively. It was shown that personal anxiety and the ValVal genotype of the BDNF gene were predictors of moderate and severe depression.
Missing value imputation for gene expression data by tailored nearest neighbors.
Faisal, Shahla; Tutz, Gerhard
2017-04-25
High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes the distance is computed for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
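The idea of imputing from weighted nearest neighbors, with distances computed only over a chosen subset of features, can be sketched as follows. This is an illustrative simplification and not the procedure of the paper: the Gaussian kernel, the `bandwidth` parameter, the function name `impute_weighted_nn`, and the assumption that `selected_cols` has already been chosen are all hypothetical.

```python
import math

def impute_weighted_nn(data, row, col, selected_cols, bandwidth=1.0):
    """Impute data[row][col] as a kernel-weighted mean over the other rows.

    Distances use only `selected_cols` (assumed to be chosen beforehand)
    and only columns observed in both rows. None marks a missing value.
    """
    num = den = 0.0
    for i, other in enumerate(data):
        if i == row or other[col] is None:
            continue  # a donor row must have the target value observed
        shared = [c for c in selected_cols
                  if c != col and data[row][c] is not None and other[c] is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((data[row][c] - other[c]) ** 2 for c in shared) / len(shared))
        weight = math.exp(-(dist / bandwidth) ** 2)  # closer rows weigh more
        num += weight * other[col]
        den += weight
    return num / den if den > 0 else None

# Toy data: row 0 is missing column 2; row 1 is identical on columns 0 and 1,
# row 2 is far away, so the imputed value is dominated by row 1.
toy = [[1.0, 2.0, None],
       [1.0, 2.0, 5.0],
       [9.0, 9.0, 50.0]]
imputed = impute_weighted_nn(toy, row=0, col=2, selected_cols=[0, 1])
```

Restricting the distance to a few informative columns is what keeps the neighbor search meaningful when the number of predictors far exceeds the number of samples.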
Siegfried, Jill M; Lin, Yan; Diergaarde, Brenda; Lin, Hui-Min; Dacic, Sanja; Pennathur, Arjun; Weissfeld, Joel L; Romkes, Marjorie; Nukui, Tomoko; Stabile, Laura P
2015-11-01
Non-small cell lung cancers (NSCLCs) frequently express estrogen receptor (ER) β, and estrogen signaling is active in many lung tumors. We investigated the ability of genes contained in the prediction analysis of microarray 50 (PAM50) breast cancer risk predictor gene signature to provide prognostic information in NSCLC. Supervised principal component analysis of mRNA expression data was used to evaluate the ability of the PAM50 panel to provide prognostic information in a stage I NSCLC cohort, in an all-stage NSCLC cohort, and in The Cancer Genome Atlas data. Immunohistochemistry was used to determine status of ERβ and other proteins in lung tumor tissue. Associations with prognosis were observed in the stage I cohort. Cross-validation identified seven genes that, when analyzed together, consistently showed survival associations. In pathway analysis, the seven-gene panel described one network containing the ER and progesterone receptor, as well as human epidermal growth factor receptor (HER)2/HER3 and neuregulin-1. NSCLC cases also showed a significant association between ERβ and HER2 protein expression. Cases positive for HER2 expression were more likely to express HER3, and ERβ-positive cases were less likely to be both HER2 and HER3 negative. Prognostic ability of genes in the PAM50 panel was verified in an ERβ-positive cohort representing all NSCLC stages. In The Cancer Genome Atlas data sets, the PAM50 gene set was prognostic in both adenocarcinoma and squamous cell carcinoma, whereas the seven-gene panel was prognostic only in squamous cell carcinoma. Genes in the PAM50 panel, including those linking ER and HER2, identify lung cancer patients at risk for poor outcome, especially among ERβ-positive cases and squamous cell carcinoma. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Review of Opioid Pharmacogenetics and Considerations for Pain Management.
Owusu Obeng, Aniwaa; Hamadeh, Issam; Smith, Michael
2017-09-01
Opioid analgesics are the standard of care for the treatment of moderate to severe nociceptive pain, particularly in the setting of cancer and surgery. Their analgesic properties mainly emanate from stimulation of the μ receptors, which are encoded by the OPRM1 gene. Hepatic metabolism represents the major route of elimination, which, for some opioids, namely codeine and tramadol, is necessary for their bioactivation into more potent analgesics. The highly polymorphic nature of the genes coding for phase I and phase II enzymes (pharmacokinetics genes) that are involved in the metabolism and bioactivation of opioids suggests a potential interindividual variation in their disposition and, most likely, response. In fact, such an association has been substantiated in several pharmacokinetic studies described in this review, in which drug exposure and/or metabolism differed significantly based on the presence of polymorphisms in these pharmacokinetics genes. Furthermore, in some studies, the observed variability in drug exposure translated into differences in the incidence of opioid-related adverse effects, particularly nausea, vomiting, constipation, and respiratory depression. Although the influence of polymorphisms in pharmacokinetics genes, as well as pharmacodynamics genes (OPRM1 and COMT), on response to opioids has been a subject of intense research, the results have been somewhat conflicting, with some evidence suggesting a potential role for OPRM1. The Clinical Pharmacogenetics Implementation Consortium guidelines provide CYP2D6-guided therapeutic recommendations to individualize treatment with tramadol and codeine. However, implementation guidelines for other opioids, which are more commonly used in real-world settings for pain management, are currently lacking. Hence, further studies are warranted to bridge this gap in our knowledge base and ultimately ascertain the role of pharmacogenetic markers as predictors of response to opioid analgesics. 
© 2017 Pharmacotherapy Publications, Inc.
Genetic Marker Discovery in Complex Traits: A Field Example on Fat Content and Composition in Pigs.
Pena, Ramona Natacha; Ros-Freixedes, Roger; Tor, Marc; Estany, Joan
2016-12-14
Among the large number of attributes that define pork quality, fat content and composition have attracted the attention of breeders in recent years due to their interaction with human health and technological and sensorial properties of meat. In livestock species, fat accumulates in different depots following a temporal pattern that is also recognized in humans. Intramuscular fat deposition rate and fatty acid composition change with life. Despite indication that it might be possible to select for intramuscular fat without affecting other fat depots, to date only one depot-specific genetic marker (PCK1 c.2456C>A) has been reported. In contrast, identification of polymorphisms related to fat composition has been more successful. For instance, our group has described a variant in the stearoyl-coA desaturase (SCD) gene that improves the desaturation index of fat without affecting overall fatness or growth. Identification of mutations in candidate genes can be a tedious and costly process. Genome-wide association studies can help in narrowing down the number of candidate genes by highlighting those which contribute most to the genetic variation of the trait. Results from our group and others indicate that fat content and composition are highly polygenic and that very few genes explain more than 5% of the variance of the trait. Moreover, as the complexity of the genome emerges, the role of non-coding genes and regulatory elements cannot be disregarded. Prediction of breeding values from genomic data is discussed in comparison with conventional best linear predictors of breeding values. An example based on real data is given, and the implications in phenotype prediction are discussed in detail. The benefits and limitations of using large SNP sets versus a few very informative markers as predictors of genetic merit of breeding candidates are evaluated using field data as an example.
Verdejo-García, Antonio; Albein-Urios, Natalia; Molina, Esther; Ching-López, Ana; Martínez-González, José M; Gutiérrez, Blanca
2013-11-01
Based on previous evidence of a MAOA gene*cocaine use interaction on orbitofrontal cortex volume attrition, we tested whether the MAOA low activity variant and cocaine use severity are interactively associated with impulsivity and behavioral indices of orbitofrontal dysfunction: emotion recognition and decision-making. 72 cocaine dependent individuals and 52 non-drug using controls (including healthy individuals and problem gamblers) were genotyped for the MAOA gene and tested using the UPPS-P Impulsive Behavior Scale, the Iowa Gambling Task and the Ekman's Facial Emotions Recognition Test. To test the main hypothesis, we conducted hierarchical multiple regression analyses including three sets of predictors: (1) age, (2) MAOA genotype and severity of cocaine use, and (3) the interaction between MAOA genotype and severity of cocaine use. UPPS-P, Ekman Test and Iowa Gambling Task's scores were the outcome measures. We computed the statistical significance of the prediction change yielded by each consecutive set, with 'a priori' interest in the MAOA*cocaine severity interaction. We found significant effects of the MAOA gene*cocaine use severity interaction on the emotion recognition scores and the UPPS-P's dimensions of Positive Urgency and Sensation Seeking: Low activity carriers with higher cocaine exposure had poorer emotion recognition and higher Positive Urgency and Sensation Seeking. Cocaine users carrying the MAOA low activity show a greater impact of cocaine use on impulsivity and behavioral measures of orbitofrontal cortex dysfunction. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
On the statistical assessment of classifiers using DNA microarray data
Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F
2006-01-01
Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively.
Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019), respectively. Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology according to the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of genes are used. It then degrades progressively, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed. Conclusions The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171
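The permutation test on a cross-validated error rate described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: a nearest-centroid classifier stands in for the WVA, RLS and SVM classifiers, leave-one-out replaces Leave-K-Out, and the function names and toy data are hypothetical rather than taken from the paper.

```python
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean (centroid) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_error(xs, ys):
    """Leave-one-out cross-validation error rate (stand-in for Leave-K-Out)."""
    mistakes = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            mistakes += 1
    return mistakes / len(xs)

def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    """Share of label shufflings whose error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            at_least_as_good += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Hypothetical toy data: two well-separated groups of ten samples each.
toy_x = [[i * 0.1, 1 - i * 0.1] for i in range(10)] + \
        [[5 + i * 0.1, 6 - i * 0.1] for i in range(10)]
toy_y = [0] * 10 + [1] * 10
error, p_value = permutation_p_value(toy_x, toy_y)
```

A small p-value indicates that an error rate this low is unlikely to arise when the labels carry no information, which is the sense in which the abstract attaches p-values to its error rates.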
NASA Technical Reports Server (NTRS)
Lewis, Michael
1994-01-01
Statistical encoding techniques enable the reduction of the number of bits required to encode a set of symbols, and are derived from their probabilities. Huffman encoding is an example of statistical encoding that has been used for error-free data compression. The degree of compression given by Huffman encoding in this application can be improved by the use of prediction methods. These replace the set of elevations by a set of corrections that have a more advantageous probability distribution. In particular, the method of Lagrange Multipliers for minimization of the mean square error has been applied to local geometrical predictors. Using this technique, an 8-point predictor achieved about a 7 percent improvement over an existing simple triangular predictor.
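As a rough illustration of why prediction helps Huffman coding, the Python sketch below compares the Huffman-coded size of raw elevations with that of prediction residuals. It assumes a one-dimensional profile and a trivial "previous sample" predictor, not the Lagrange-multiplier-optimized local geometrical predictors of the report; the data and function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_total_bits(symbols):
    """Total length in bits of a Huffman encoding of the symbol sequence.

    Uses the fact that the encoded length equals the sum of the weights
    of all internal nodes created while building the Huffman tree.
    """
    counts = Counter(symbols)
    if len(counts) == 1:
        return len(symbols)  # a lone symbol still needs one bit per occurrence
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Hypothetical elevation profile built from small steps between neighbours.
steps = [1, 1, 0, 1, 2, 1, 0, -1, 1, 1, 0, 1, 1, 2, 1, 0]
elevations = [100]
for step in steps:
    elevations.append(elevations[-1] + step)

# Predictor: each elevation is predicted by its predecessor, so only the
# correction (residual) has to be encoded; the first value is sent as-is.
residuals = [b - a for a, b in zip(elevations, elevations[1:])]

raw_bits = huffman_total_bits(elevations[1:])
residual_bits = huffman_total_bits(residuals)
```

The residuals take only a few distinct values with a skewed distribution, so their Huffman code is much shorter than the code for the raw elevations; a better predictor concentrates the distribution further.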
A robust prognostic signature for hormone-positive node-negative breast cancer.
Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W
2013-01-01
Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.
A robust prognostic signature for hormone-positive node-negative breast cancer
2013-01-01
Background Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
Conclusions RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773
Yao, Z; Peng, Y; Bi, J; Xie, C; Chen, X; Li, Y; Ye, X; Zhou, J
2016-03-01
Multidrug-resistant Pseudomonas aeruginosa (MDRPA) infections are major threats to healthcare-associated infection control and the intrinsic molecular mechanisms of MDRPA are also unclear. We examined 348 isolates of P. aeruginosa, including 188 MDRPA and 160 non-MDRPA, obtained from five tertiary-care hospitals in Guangzhou, China. Significant correlations were found between gene/enzyme carriage and increased rates of antimicrobial resistance (P < 0·01). gyrA mutation, OprD loss and metallo-β-lactamase (MBL) presence were identified as crucial molecular risk factors for MDRPA acquisition by a combination of univariate logistic regression and a multifactor dimensionality reduction approach. The MDRPA rate was also elevated with the increase in positive numbers of those three determinants (P < 0·001). Thus, gyrA mutation, OprD loss and MBL presence may serve as predictors for early screening of MDRPA infections in clinical settings.
Badal, Brateil; Solovyov, Alexander; Di Cecilia, Serena; Chan, Joseph Minhow; Chang, Li-Wei; Iqbal, Ramiz; Aydin, Iraz T.; Rajan, Geena S.; Chen, Chen; Abbate, Franco; Arora, Kshitij S.; Tanne, Antoine; Gruber, Stephen B.; Johnson, Timothy M.; Fullen, Douglas R.; Phelps, Robert; Bhardwaj, Nina; Bernstein, Emily; Ting, David T.; Brunner, Georg; Schadt, Eric E.; Greenbaum, Benjamin D.; Celebi, Julide Tok
2017-01-01
BACKGROUND. Melanoma is a heterogeneous malignancy. We set out to identify the molecular underpinnings of high-risk melanomas, those that are likely to progress rapidly, metastasize, and result in poor outcomes. METHODS. We examined transcriptome changes from benign states to early-, intermediate-, and late-stage tumors using a set of 78 treatment-naive melanocytic tumors consisting of primary melanomas of the skin and benign melanocytic lesions. We utilized a next-generation sequencing platform that enabled a comprehensive analysis of protein-coding and -noncoding RNA transcripts. RESULTS. Gene expression changes unequivocally discriminated between benign and malignant states, and a dual epigenetic and immune signature emerged defining this transition. To our knowledge, we discovered previously unrecognized melanoma subtypes. A high-risk primary melanoma subset was distinguished by a 122-epigenetic gene signature (“epigenetic” cluster) and TP53 family gene deregulation (TP53, TP63, and TP73). This subtype associated with poor overall survival and showed enrichment of cell cycle genes. Noncoding repetitive element transcripts (LINEs, SINEs, and ERVs) that can result in immunostimulatory signals recapitulating a state of “viral mimicry” were significantly repressed. The high-risk subtype and its poor predictive characteristics were validated in several independent cohorts. Additionally, primary melanomas distinguished by specific immune signatures (“immune” clusters) were identified. CONCLUSION. The TP53 family of genes and genes regulating the epigenetic machinery demonstrate strong prognostic and biological relevance during progression of early disease. Gene expression profiling of protein-coding and -noncoding RNA transcripts may be a better predictor for disease course in melanoma. This study outlines the transcriptional interplay of the cancer cell’s epigenome with the immune milieu with potential for future therapeutic targeting. FUNDING. 
National Institutes of Health (CA154683, CA158557, CA177940, CA087497-13), Tisch Cancer Institute, Melanoma Research Foundation, the Dow Family Charitable Foundation, and the Icahn School of Medicine at Mount Sinai. PMID:28469092
Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI.
Carraro, Marco; Minervini, Giovanni; Giollo, Manuel; Bromberg, Yana; Capriotti, Emidio; Casadio, Rita; Dunbrack, Roland; Elefanti, Lisa; Fariselli, Pietro; Ferrari, Carlo; Gough, Julian; Katsonis, Panagiotis; Leonardi, Emanuela; Lichtarge, Olivier; Menin, Chiara; Martelli, Pier Luigi; Niroula, Abhishek; Pal, Lipika R; Repo, Susanna; Scaini, Maria Chiara; Vihinen, Mauno; Wei, Qiong; Xu, Qifang; Yang, Yuedong; Yin, Yizhou; Zaucha, Jan; Zhao, Huiying; Zhou, Yaoqi; Brenner, Steven E; Moult, John; Tosatto, Silvio C E
2017-09-01
Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants. © 2017 Wiley Periodicals, Inc.
Quick survey of avirulence genes in field isolates of Magnaporthe oryzae in the past 60 years
USDA-ARS's Scientific Manuscript database
Avirulence (AVR) genes in Magnaporthe oryzae determine deployment of effective corresponding resistance (R) genes. Instability of AVR genes is the major cause for resistance breakdown. Information on the presence or absence (P/A) of AVR genes can be used as a predictor of the stability of deployed R...
Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza
2017-09-27
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma ( PPARG ) gene associated with diabetes.
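The BLUP and ridge equivalence mentioned in this abstract can be written out for the simpler linear mixed model; the Gaussian response and identity link are assumptions made here for illustration (the paper works with a GLMM), and the symbols are generic rather than the authors' notation.

```latex
% Assumed special case: linear mixed model with SNP effects u as random effects.
\[
  y = X\beta + Zu + \varepsilon, \qquad
  u \sim N(0, \sigma_u^2 I), \qquad
  \varepsilon \sim N(0, \sigma_e^2 I)
\]
% BLUP of u, rewritten with the identity
% Z^\top (Z Z^\top + \lambda I)^{-1} = (Z^\top Z + \lambda I)^{-1} Z^\top:
\[
  \hat{u}_{\mathrm{BLUP}}
  = \sigma_u^2 Z^\top \left( \sigma_u^2 Z Z^\top + \sigma_e^2 I \right)^{-1} (y - X\beta)
  = \left( Z^\top Z + \lambda I \right)^{-1} Z^\top (y - X\beta),
  \qquad \lambda = \frac{\sigma_e^2}{\sigma_u^2}
\]
```

The right-hand side is the ridge regression estimator with penalty \lambda, so in this special case the ridge penalty follows from the estimated variance components instead of a cross-validation search.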
A single determinant dominates the rate of yeast protein evolution.
Drummond, D Allan; Raval, Alpan; Wilke, Claus O
2006-02-01
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
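Principal component regression, as described in this abstract, regresses the response on principal components of the predictors instead of on the correlated predictors themselves. The Python sketch below is a minimal illustration under stated assumptions: only two hypothetical, strongly correlated predictors, only the first component, and a plain power iteration in place of a full eigendecomposition. None of the names or numbers come from the paper.

```python
import math

def standardize(column):
    """Centre a column and scale it to unit sample standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
    return [(v - mean) / sd for v in column]

def leading_component(columns, iterations=200):
    """Loadings of the first principal component via power iteration.

    Assumes the predictors are positively correlated, so the uniform
    starting vector is not orthogonal to the leading eigenvector.
    """
    n, p = len(columns[0]), len(columns)
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def r_squared_on_first_pc(predictors, response):
    """R^2 of a simple regression of the response on first-component scores."""
    cols = [standardize(c) for c in predictors]
    loadings = leading_component(cols)
    scores = [sum(w * col[k] for w, col in zip(loadings, cols))
              for k in range(len(response))]
    y = standardize(response)
    sxy = sum(s * v for s, v in zip(scores, y))
    sxx = sum(s * s for s in scores)
    syy = sum(v * v for v in y)
    return loadings, sxy ** 2 / (sxx * syy)

# Hypothetical data: two noisy measurements of one underlying variable.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.1]
rate = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
loadings, r2 = r_squared_on_first_pc([x1, x2], rate)
```

Because the two predictors load almost equally on the first component, the regression attributes the explained variance to their shared signal rather than splitting it arbitrarily between them, which is the difficulty with partial correlation and ordinary multivariate regression that the abstract points to.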
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Classification based upon gene expression data: bias and precision of error rates.
Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L
2007-06-01
Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
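The two-level external cross-validation recommended in this abstract can be sketched in Python as follows. This is an illustrative assumption-laden version: a one-dimensional k-nearest-neighbour classifier stands in for the PAMR classifier of the paper's R code, the tuning parameter is k, and all names and data are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D features, odd k)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def folds(n, n_folds):
    """Interleaved index folds: fold i holds indices i, i + n_folds, ..."""
    return [list(range(i, n, n_folds)) for i in range(n_folds)]

def cv_error(data, k, n_folds):
    """Single-level cross-validation error for a fixed k."""
    mistakes = 0
    for held_out in folds(len(data), n_folds):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        mistakes += sum(knn_predict(train, data[i][0], k) != data[i][1]
                        for i in held_out)
    return mistakes / len(data)

def nested_cv_error(data, candidates, outer_folds=5, inner_folds=4):
    """Two-level CV: k is chosen inside each outer training set only."""
    mistakes = 0
    for held_out in folds(len(data), outer_folds):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        best_k = min(candidates, key=lambda k: cv_error(train, k, inner_folds))
        mistakes += sum(knn_predict(train, data[i][0], best_k) != data[i][1]
                        for i in held_out)
    return mistakes / len(data)

# Hypothetical non-informative data: labels are independent of the feature.
rng = random.Random(1)
noise = [(rng.random(), rng.randint(0, 1)) for _ in range(40)]
candidates = (1, 3, 5)
optimized = min(cv_error(noise, k, 5) for k in candidates)  # optimization bias
nested = nested_cv_error(noise, candidates)
```

Reporting `optimized` picks the best of several error estimates after the fact and is therefore biased downward; `nested` keeps the held-out samples out of the tuning step, so on non-informative data it should sit close to the baseline error.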
Valhondo, Álvaro; Fernández-Echeverría, Carmen; González-Silva, Jara; Claver, Fernando; Moreno, M. Perla
2018-01-01
Abstract The objective of this study was to determine the variables that predicted serve efficacy in elite men’s volleyball, in sets with different quality of opposition. 3292 serve actions were analysed, of which 2254 were carried out in high quality of opposition sets and 1038 actions were in low quality of opposition sets, corresponding to a total of 24 matches played during the Men’s European Volleyball Championships held in 2011. The independent variables considered in this study were the serve zone, serve type, serving player, serve direction, reception zone, receiving player and reception type; the dependent variable was serve efficacy and the situational variable was quality of opposition sets. The variables that acted as predictors in both high and low quality of opposition sets were the serving player, reception zone and reception type. The serve type variable only acted as a predictor in high quality of opposition sets, while the serve zone variable only acted as a predictor in low quality of opposition sets. These results may provide important guidance in men’s volleyball training processes. PMID:29599869
ERIC Educational Resources Information Center
Froehlich, Tanya E.; Epstein, Jeffery N.; Nick, Todd G.; Melguizo Castro, Maria S.; Stein, Mark A.; Brinkman, William B.; Graham, Amanda J.; Langberg, Joshua M.; Kahn, Robert S.
2011-01-01
Objective: Because of significant individual variability in attention-deficit/hyperactivity disorder (ADHD) medication response, there is increasing interest in identifying genetic predictors of treatment effects. This study examined the role of four catecholamine-related candidate genes in moderating methylphenidate (MPH) dose-response. Method:…
Evaluating mallard adaptive management models with time series
Conn, P.B.; Kendall, W.L.
2004-01-01
Wildlife practitioners concerned with midcontinent mallard (Anas platyrhynchos) management in the United States have instituted a system of adaptive harvest management (AHM) as an objective format for setting harvest regulations. Under the AHM paradigm, predictions from a set of models that reflect key uncertainties about processes underlying population dynamics are used in coordination with optimization software to determine an optimal set of harvest decisions. Managers use comparisons of the predictive abilities of these models to gauge the relative truth of different hypotheses about density-dependent recruitment and survival, with better-predicting models giving more weight to the determination of harvest regulations. We tested the effectiveness of this strategy by examining convergence rates of 'predictor' models when the true model for population dynamics was known a priori. We generated time series for cases when the a priori model was 1 of the predictor models as well as for several cases when the a priori model was not in the model set. We further examined the addition of different levels of uncertainty into the variance structure of predictor models, reflecting different levels of confidence about estimated parameters. We showed that in certain situations, the model-selection process favors a predictor model that incorporates the hypotheses of additive harvest mortality and weakly density-dependent recruitment, even when the model is not used to generate data. Higher levels of predictor model variance led to decreased rates of convergence to the model that generated the data, but model weight trajectories were in general more stable. We suggest that predictive models should incorporate all sources of uncertainty about estimated parameters, that the variance structure should be similar for all predictor models, and that models with different functional forms for population dynamics should be considered for inclusion in predictor model sets.
All of these suggestions should help lower the probability of erroneous learning in mallard AHM and adaptive management in general.
Classification of samples into two or more ordered populations with application to a cancer trial.
Conde, D; Fernández, M A; Rueda, C; Salvador, B
2012-12-10
In many applications, especially in cancer treatment and diagnosis, investigators are interested in classifying patients into various diagnosis groups on the basis of molecular data such as gene expression or proteomic data. Often, some of the diagnosis groups are known to be related to higher or lower values of some of the predictors. The standard methods of classifying patients into various groups do not take into account the underlying order. This could potentially result in high misclassification rates, especially when the number of groups is larger than two. In this article, we develop classification procedures that exploit the underlying order among the mean values of the predictor variables and the diagnostic groups by using ideas from order-restricted inference. We generalize the existing methodology on discrimination under restrictions and provide empirical evidence to demonstrate that the proposed methodology improves over the existing unrestricted methodology. The proposed methodology is applied to a bladder cancer data set where the researchers are interested in classifying patients into various groups. Copyright © 2012 John Wiley & Sons, Ltd.
Roy, Janine; Aust, Daniela; Knösel, Thomas; Rümmele, Petra; Jahnke, Beatrix; Hentrich, Vera; Rückert, Felix; Niedergethmann, Marco; Weichert, Wilko; Bahra, Marcus; Schlitt, Hans J.; Settmacher, Utz; Friess, Helmut; Büchler, Markus; Saeger, Hans-Detlev; Schroeder, Michael; Pilarsky, Christian; Grützmann, Robert
2012-01-01
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice. PMID:22615549
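The PageRank-like ranking described in this abstract can be illustrated with a small Python sketch. The update rule, the damping value, the gene names and the scores are all assumptions chosen for illustration, not the authors' implementation: each gene keeps part of its own prognostic score and receives the rest from its network neighbours.

```python
def network_rank(scores, edges, damping=0.5, iterations=100):
    """Personalized-PageRank-style ranking of genes.

    scores: gene -> non-negative relevance (e.g. |correlation| with survival).
    edges:  undirected gene-gene relationships; every gene is assumed to
            have at least one neighbour so that no rank mass is lost.
    """
    genes = sorted(scores)
    total = sum(scores.values())
    prior = {g: scores[g] / total for g in genes}
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    rank = dict(prior)
    for _ in range(iterations):
        # Each gene splits its current rank evenly among its neighbours.
        rank = {g: (1 - damping) * prior[g]
                   + damping * sum(rank[n] / len(neighbours[n])
                                   for n in neighbours[g])
                for g in genes}
    return rank

# Hypothetical star network: hub gene A has a weak score of its own
# but is connected to three genes with stronger scores.
toy_scores = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.3}
toy_edges = [("A", "B"), ("A", "C"), ("A", "D")]
toy_rank = network_rank(toy_scores, toy_edges)
ordering = sorted(toy_rank, key=toy_rank.get, reverse=True)
```

In this toy network the hub gene ends up ranked first even though its own score is the lowest, which shows how network information can promote a gene that expression data alone would overlook.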
Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes
Parker, Joel S.; Mullins, Michael; Cheang, Maggie C.U.; Leung, Samuel; Voduc, David; Vickery, Tammi; Davies, Sherri; Fauron, Christiane; He, Xiaping; Hu, Zhiyuan; Quackenbush, John F.; Stijleman, Inge J.; Palazzo, Juan; Marron, J.S.; Nobel, Andrew B.; Mardis, Elaine; Nielsen, Torsten O.; Ellis, Matthew J.; Perou, Charles M.; Bernard, Philip S.
2009-01-01
Purpose To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression–based “intrinsic” subtypes luminal A, luminal B, HER2-enriched, and basal-like. Methods A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. Results The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. Conclusion Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy. PMID:19204204
ERIC Educational Resources Information Center
Stephan, Yannick; Caudroit, Johan; Boiche, Julie; Sarrazin, Philippe
2011-01-01
Background: Although psychological disengagement is a well-documented phenomenon in the academic setting, the attempts to identify its predictors are scarce. In addition, existing research has mainly focused on chronic disengagement and less is known on the determinants of situational disengagement. Aims: The purpose of the present study was to…
Early Predictors of ASD in Young Children Using a Nationally Representative Data Set
ERIC Educational Resources Information Center
Jeans, Laurie M.; Santos, Rosa Milagros; Laxman, Daniel J.; McBride, Brent A.; Dyer, W. Justin
2013-01-01
Current clinical diagnosis of Autism Spectrum Disorders (ASD) occurs between 3 and 4 years of age, but increasing evidence indicates that intervention begun earlier may improve outcomes. Using secondary analysis of the Early Childhood Longitudinal Study-Birth Cohort data set, the current study identifies early predictors prior to the diagnosis of…
Moving on? Predictors of Intent to Leave among Rural and Remote RNs in Canada
ERIC Educational Resources Information Center
Stewart, Norma J.; D'Arcy, Carl; Kosteniuk, Julie; Andrews, Mary Ellen; Morgan, Debra; Forbes, Dorothy; MacLeod, Martha L. P.; Kulig, Judith C.; Pitblado, J. Roger
2011-01-01
Context: Examination of factors related to the retention or voluntary turnover of Registered Nurses (RNs) has mainly focused on urban, acute care settings. Purpose: This paper explored predictors of intent to leave (ITL) a nursing position in all rural and remote practice settings in Canada. Based on the conceptual framework developed for this…
AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.
Zhao, X G; Dai, W; Li, Y; Tian, L
2011-11-01
The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, AUC-based ensemble methods are scarce, largely because the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We have proposed novel AUC-based statistical ensemble methods for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function by a convex surrogate loss function, whose minimizer can be efficiently identified. With the established framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new methods to the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores to differentiate elderly women with low bone mineral density (BMD) and those with normal BMD. The AUCs of the resulting scores in the independent test dataset have been satisfactory. Aiming to maximize the AUC directly, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings. Supplementary data are available at Bioinformatics online.
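The contrast between the discontinuous AUC objective and a convex surrogate can be illustrated with a minimal sketch. The abstract does not name the surrogate the authors chose, so the pairwise logistic loss below is an assumption, and the function names are hypothetical.

```python
import math

def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def pairwise_logistic_loss(scores_pos, scores_neg):
    """Assumed convex surrogate: mean of log(1 + exp(-(p - n))) over all pairs."""
    losses = [math.log1p(math.exp(-(p - n)))
              for p in scores_pos for n in scores_neg]
    return sum(losses) / len(losses)

# Hypothetical scores for two cases and two controls.
auc = empirical_auc([0.9, 0.8], [0.1, 0.85])            # 3 of 4 pairs correct
loss = pairwise_logistic_loss([0.9, 0.8], [0.1, 0.85])
```

The AUC is a step function of the scores, whereas the surrogate changes smoothly with them, which is what makes gradient-based minimization and lasso-type penalties practical.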
Wei, Lei; Wang, Jianmin; Lampert, Erika; Schlanger, Simon; DePriest, Adam D.; Hu, Qiang; Gomez, Eduardo Cortes; Murakam, Mitsuko; Glenn, Sean T.; Conroy, Jeffrey; Morrison, Carl; Azabdaftari, Gissou; Mohler, James L.; Liu, Song; Heemers, Hannelore V.
2018-01-01
Background Next-generation sequencing is revealing genomic heterogeneity in localized prostate cancer (CaP). Incomplete sampling of CaP multiclonality has limited the implications for molecular subtyping, stratification, and systemic treatment. Objective To determine the impact of genomic and transcriptomic diversity within and among intraprostatic CaP foci on CaP molecular taxonomy, predictors of progression, and actionable therapeutic targets. Design, setting, and participants Four consecutive patients with clinically localized National Comprehensive Cancer Network intermediate- or high-risk CaP who did not receive neoadjuvant therapy underwent radical prostatectomy at Roswell Park Cancer Institute in June–July 2014. Presurgical information on CaP content and a customized tissue procurement procedure were used to isolate nonmicroscopic and noncontiguous CaP foci in radical prostatectomy specimens. Three cores were obtained from the index lesion and one core from smaller lesions. RNA and DNA were extracted simultaneously from 26 cores with ≥90% CaP content and analyzed using whole-exome sequencing, single-nucleotide polymorphism arrays, and RNA sequencing. Outcome measurements and statistical analysis Somatic mutations, copy number alterations, gene expression, gene fusions, and phylogeny were defined. The impact of genomic alterations on CaP molecular classification, gene sets measured in Oncotype DX, Prolaris, and Decipher assays, and androgen receptor activity among CaP cores was determined. Results and limitations There was considerable variability in genomic alterations among CaP cores, and between RNA- and DNA-based platforms. Heterogeneity was found in molecular grouping of individual CaP foci and the activity of gene sets underlying the assays for risk stratification and androgen receptor activity, and was validated in independent genomic data sets. Determination of the implications for clinical decision-making requires follow-up studies.
Conclusions Genomic make-up varies widely among CaP foci, so care should be taken when making treatment decisions based on a single biopsy or index lesions. Patient summary We examined the molecular composition of individual cancers in a patient’s prostate. We found a lot of genetic diversity among these cancers, and concluded that information from a single cancer biopsy is not sufficient to guide treatment decisions. PMID:27451135
Fine motor skills and early comprehension of the world: two new school readiness indicators.
Grissmer, David; Grimm, Kevin J; Aiyer, Sophie M; Murrah, William M; Steele, Joel S
2010-09-01
Duncan et al. (2007) presented a new methodology for identifying kindergarten readiness factors and quantifying their importance by determining which of children's developing skills measured around kindergarten entrance would predict later reading and math achievement. This article extends Duncan et al.'s work to identify kindergarten readiness factors with 6 longitudinal data sets. Their results identified kindergarten math and reading readiness and attention as the primary long-term predictors but found no effects from social skills or internalizing and externalizing behavior. We incorporated motor skills measures from 3 of the data sets and found that fine motor skills are an additional strong predictor of later achievement. Using one of the data sets, we also predicted later science scores and incorporated an additional early test of general knowledge of the social and physical world as a predictor. We found that the test of general knowledge was by far the strongest predictor of science and reading and also contributed significantly to predicting later math, making the content of this test another important kindergarten readiness indicator. Together, attention, fine motor skills, and general knowledge are much stronger overall predictors of later math, reading, and science scores than early math and reading scores alone.
Irons, Richard D; Le, Anh; Bao, Liming; Zhu, Xiongzeng; Ryder, John; Wang, Xiao Qin; Ji, Meirong; Chen, Yan; Wu, Xichun; Lin, Guowei
2009-12-01
The clinical, cytogenetic and molecular features of chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), a disease previously considered to be rare in Asia, were examined in a consecutive series of 70 cases diagnosed by our laboratory over a 30-month period. Clonal abnormalities were observed in 80% of CLL/SLL cases using a combination of conventional cytogenetic and fluorescence in situ hybridization (FISH) analysis. Those involving 14q32/IGH were the most frequent (24 cases), followed by trisomy 12 and 11q abnormalities. IgV(H) gene usage was non-random with over-representation of V(H)4-34, V(H)3-23 and a previously unreported increase in V(H)3-48 gene use. Somatic hypermutation (SHM) of IgV(H) germline sequences was observed in 56.5% of cases with stereotyped patterns of SHM observed in V(H)4-34 heavy chain complementarity-determining region 1 (HCDR1) and framework region CFR2 sequences. These findings in a Chinese population suggest subtle geographical differences in IgV(H) gene usage, while the remarkably specific pattern of SHM suggests that a relatively limited set of antigens may be involved in the development of this disease worldwide. IgV(H) gene mutation status was a significant predictor of initial survival in CLL/SLL. However, an influence of karyotype on prognosis was not observed.
Smoothed Particle Hydrodynamics: Applications Within DSTO
2006-10-01
Most SPH codes use either an improved Euler method (a mid-point predictor-corrector method) [50] or a leapfrog predictor-corrector algorithm for...in the next section we used the predictor-corrector leapfrog algorithm for time stepping. If we write the set of equations describing the change in... predictor-corrector or leapfrog method is used when solving the equations. Monaghan has also noted [53] that, with a correctly chosen time step, total
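As a rough illustration of the mid-point predictor-corrector idea mentioned in this excerpt, the sketch below advances a generic system dy/dt = f(t, y) by predicting the state at the half step and then correcting with the derivative evaluated there. It is not taken from the report: the harmonic oscillator test problem and all function names are assumptions chosen for illustration, and real SPH codes differ in detail.

```python
import math

def midpoint_step(f, t, y, dt):
    """One mid-point predictor-corrector step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    # Predictor: estimate the state at the half step.
    y_half = [yi + 0.5 * dt * ki for yi, ki in zip(y, k1)]
    # Corrector: advance the full step using the half-step derivative.
    k2 = f(t + 0.5 * dt, y_half)
    return [yi + dt * ki for yi, ki in zip(y, k2)]

def oscillator(t, y):
    """Hypothetical test problem: unit harmonic oscillator, y = [x, v]."""
    x, v = y
    return [v, -x]

def integrate(f, y0, dt, steps):
    """Apply midpoint_step repeatedly, starting from t = 0."""
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = midpoint_step(f, t, y, dt)
        t += dt
    return y

# One full period: the state should return close to its starting point.
final_state = integrate(oscillator, [1.0, 0.0], 2 * math.pi / 1000, 1000)
```

The scheme is second-order accurate in the time step, so halving dt should cut the error by roughly a factor of four.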
van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W
2016-10-01
Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (inter quartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
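The develop-on-one, validate-on-the-rest design can be sketched by enumerating ordered pairs of studies. The study labels below are hypothetical. With 15 studies there are 15 × 14 = 210 ordered pairs; multiplying by the three predictor sets gives 630, which matches the total in the abstract, although the abstract does not spell out this breakdown.

```python
from itertools import permutations

def external_validation_pairs(study_ids):
    """All ordered (development, validation) pairs of distinct studies."""
    return list(permutations(study_ids, 2))

# Hypothetical labels standing in for the 15 TBI studies.
studies = [f"study_{i:02d}" for i in range(1, 16)]
pairs = external_validation_pairs(studies)

n_pairs = len(pairs)            # 15 * 14 ordered pairs
n_with_sets = n_pairs * 3       # assumed: one validation per predictor set
```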
Network selection, Information filtering and Scalable computation
NASA Astrophysics Data System (ADS)
Ye, Changqing
This dissertation explores two application scenarios of sparsity pursuit methods on large-scale data sets. The first scenario is classification and regression in analyzing high-dimensional structured data, where predictors correspond to nodes of a given directed graph. This arises in, for instance, identification of disease genes for Parkinson's disease from a network of candidate genes. In such a situation, the directed graph describes dependencies among the genes, where the directions of edges represent certain causal effects. Key to high-dimensional structured classification and regression is how to utilize dependencies among predictors as specified by directions of the graph. In this dissertation, we develop a novel method that fully takes into account such dependencies formulated through certain nonlinear constraints. We apply the proposed method to two applications, feature selection in large margin binary classification and in linear regression. We implement the proposed method through difference convex programming for the cost function and constraints. Finally, theoretical and numerical analyses suggest that the proposed method achieves the desired objectives. An application to disease gene identification is presented. The second application scenario is personalized information filtering, which extracts the information specifically relevant to a user, predicting his/her preference over a large number of items, based on the opinions of users who think alike or on item content. This problem is cast into the framework of regression and classification, where we introduce novel partial latent models to integrate additional user-specific and content-specific predictors, for higher predictive accuracy. In particular, we factorize a user-over-item preference matrix into a product of two matrices, each representing a user's preference and an item preference by users.
Then we propose a likelihood method to seek a sparsest latent factorization, from a class of over-complete factorizations, possibly with a high percentage of missing values. This promotes additional sparsity beyond rank reduction. Computationally, we design methods based on a "decomposition and combination" strategy, to break large-scale optimization into many small subproblems to solve in a recursive and parallel manner. On this basis, we implement the proposed methods through multi-platform shared-memory parallel programming, and through Mahout, a library for scalable machine learning and data mining, for MapReduce computation. For example, our methods scale to a dataset of three billion observations on a single machine with sufficient memory, with good timings. Both theoretical and numerical investigations show that the proposed methods exhibit significant improvement in accuracy over state-of-the-art scalable methods.
Co-acting gene networks predict TRAIL responsiveness of tumour cells with high accuracy.
O'Reilly, Paul; Ortutay, Csaba; Gernon, Grainne; O'Connell, Enda; Seoighe, Cathal; Boyce, Susan; Serrano, Luis; Szegezdi, Eva
2014-12-19
Identification of differentially expressed genes from transcriptomic studies is one of the most common ways to identify tumor biomarkers. This approach, however, is not well suited to identifying interactions between genes whose protein products potentially influence each other, which limits its power to identify the molecular wiring of tumour cells dictating response to a drug. Because signal transduction pathways are nonlinear and highly interlinked, the biological response they drive may be better described by the relative amount of their components and their functional relationships than by their individual, absolute expression. Gene expression microarray data for 109 tumor cell lines with known sensitivity to the death ligand cytokine tumor necrosis factor-related apoptosis-inducing ligand (TRAIL) were used to identify genes with potential functional relationships determining responsiveness to TRAIL-induced apoptosis. The machine learning technique Random Forest in the statistical environment "R" with backward elimination was used to identify the key predictors of TRAIL sensitivity, and differentially expressed genes were identified using the software GeneSpring. Gene co-regulation and statistical interaction were assessed with q-order partial correlation analysis and non-rejection rate. Biological (functional) interactions amongst the co-acting genes were studied with Ingenuity network analysis. Prediction accuracy was assessed by calculating the area under the receiver operator curve using an independent dataset. We show that the gene panel identified could predict TRAIL-sensitivity with a very high degree of sensitivity and specificity (AUC = 0.84). The genes in the panel are co-regulated and at least 40% of them functionally interact in signal transduction pathways that regulate cell death and cell survival, cellular differentiation and morphogenesis.
Importantly, only 12% of the TRAIL-predictor genes were differentially expressed highlighting the importance of functional interactions in predicting the biological response. The advantage of co-acting gene clusters is that this analysis does not depend on differential expression and is able to incorporate direct- and indirect gene interactions as well as tissue- and cell-specific characteristics. This approach (1) identified a descriptor of TRAIL sensitivity which performs significantly better as a predictor of TRAIL sensitivity than any previously reported gene signatures, (2) identified potential novel regulators of TRAIL-responsiveness and (3) provided a systematic view highlighting fundamental differences between the molecular wiring of sensitive and resistant cell types.
Pre-Veterinary Medical Grade Point Averages as Predictors of Academic Success in Veterinary College.
ERIC Educational Resources Information Center
Julius, Marcia F.; Kaiser, Herbert E.
1978-01-01
A five-year longitudinal study was designed to find the best predictors of academic success in veterinary school at Kansas State University and to set up a multiple regression formula to be used in selecting students. The preveterinary grade point average was found to be the best predictor. (JMD)
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analysis and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared prediction accuracy of three different state-of-the-art predictor methods. From our results, TMCSVM outperforms the other methods and suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR by elucidating importance of features for their contribution in predictive accuracy of RNA-binding protein domains subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.
Clock gene evolution: seasonal timing, phylogenetic signal, or functional constraint?
Krabbenhoft, Trevor J; Turner, Thomas F
2014-01-01
Genetic determinants of seasonal reproduction are not fully understood but may be important predictors of organism responses to climate change. We used a comparative approach to study the evolution of seasonal timing within a fish community in a natural common garden setting. We tested the hypothesis that allelic length variation in the PolyQ domain of a circadian rhythm gene, Clock1a, corresponded to interspecific differences in seasonal reproductive timing across 5 native and 1 introduced cyprinid fishes (n = 425 individuals) that co-occur in the Rio Grande, NM, USA. Most common allele lengths were longer in native species that initiated reproduction earlier (Spearman's r = -0.70, P = 0.23). Clock1a allele length exhibited strong phylogenetic signal and earlier spawners were evolutionarily derived. Aside from length variation in Clock1a, all other amino acids were identical across native species, suggesting functional constraint over evolutionary time. Interestingly, the endangered Rio Grande silvery minnow (Hybognathus amarus) exhibited less allelic variation in Clock1a and observed heterozygosity was 2- to 6-fold lower than the 5 other (nonimperiled) species. Reduced genetic variation in this functionally important gene may impede this species' capacity to respond to ongoing environmental change.
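For readers unfamiliar with the statistic reported above, Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below is a plain-Python version with average ranks for ties; the allele lengths and spawning dates are hypothetical values for six species and do not reproduce the r = -0.70 in the abstract.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: most common allele length vs. day of first spawning.
allele_length = [30, 34, 28, 40, 36, 32]
spawning_day = [150, 140, 170, 110, 160, 120]
rho = spearman(allele_length, spawning_day)
```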
Giménez-Espert, María Del Carmen; Prado-Gascó, Vicente Javier
2018-03-01
To analyse the link between empathy and emotional intelligence as a predictor of nurses' attitudes towards communication while comparing the contribution of emotional aspects and attitudinal elements on potential behaviour. Nurses' attitudes towards communication, empathy and emotional intelligence are key skills for nurses involved in patient care. There are currently no studies analysing this link, and its investigation is needed because attitudes may influence communication behaviours. Correlational study. To attain this goal, self-reported instruments (attitudes towards communication of nurses, trait emotional intelligence [Trait Emotional Meta-Mood Scale] and nursing empathy [Jefferson Scale of Nursing Empathy]) were collected from 460 nurses between September 2015 and February 2016. Two different analytical methodologies were used: traditional regression models and fuzzy-set qualitative comparative analysis models. The results of the regression model suggest that cognitive dimensions of attitude are a significant and positive predictor of the behavioural dimension. The perspective-taking dimension of empathy and the emotional-clarity dimension of emotional intelligence were significant positive predictors of the dimensions of attitudes towards communication, except for the affective dimension (for which the association was negative). The results of the fuzzy-set qualitative comparative analysis models confirm that the combination of high levels of cognitive dimension of attitudes, perspective-taking and emotional clarity explained high levels of the behavioural dimension of attitude. Empathy and emotional intelligence are predictors of nurses' attitudes towards communication, and the cognitive dimension of attitude is a good predictor of the behavioural dimension of attitudes towards communication of nurses in both regression models and fuzzy-set qualitative comparative analysis.
In general, the fuzzy-set qualitative comparative analysis models appear to be better predictors than the regression models are. To evaluate current practices, establish intervention strategies and evaluate their effectiveness. The evaluation of these variables and their relationships are important in creating a satisfied and sustainable workforce and improving quality of care and patient health. © 2018 John Wiley & Sons Ltd.
Jelinski, Nicolas A; Broz, Karen; Jonkers, Wilfried; Ma, Li-Jun; Kistler, H Corby
2017-07-01
Seventy-four Fusarium oxysporum soil isolates were assayed for known effector genes present in an F. oxysporum f. sp. lycopersici race 3 tomato wilt strain (FOL MN-25) obtained from the same fields in Manatee County, Florida. Based on the presence or absence of these genes, four haplotypes were defined, two of which represented 96% of the surveyed isolates. These two most common effector haplotypes contained either all or none of the assayed race 3 effector genes. We hypothesized that soil isolates with all surveyed effector genes, similar to FOL MN-25, would be pathogenic toward tomato, whereas isolates lacking all effectors would be nonpathogenic. However, inoculation experiments revealed that presence of the effector genes alone was not sufficient to ensure pathogenicity on tomato. Interestingly, a nonpathogenic isolate containing the full suite of unmutated effector genes (FOS 4-4) appears to have undergone a chromosomal rearrangement yet remains vegetatively compatible with FOL MN-25. These observations confirm the highly dynamic nature of the F. oxysporum genome and support the conclusion that pathogenesis among free-living populations of F. oxysporum is a complex process. Therefore, the presence of effector genes alone may not be an accurate predictor of pathogenicity among soil isolates of F. oxysporum.
Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m-2, displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors according to their individual performance was outperformed by the two procedures which accounted for predictor interaction. PMID:27128736
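The "five repetitions of a tenfold cross-validation" used to compare the algorithms can be sketched as an index generator. This is a generic illustration, not the authors' code: the sample count of 53 and the function name are hypothetical, and the model fitting step is left out.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                      # fresh random partition per repeat
        folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test

# Hypothetical number of sampling sites.
splits = list(repeated_kfold(53, k=10, repeats=5, seed=1))
n_model_fits = len(splits)   # 5 repeats x 10 folds
```

Each candidate algorithm would be fitted once per split on the training indices and scored on the held-out fold, so that all algorithms are compared on identical partitions.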
Gene sensitizes cancer cells to chemotherapy drugs
NCI scientists have found that a gene, Schlafen-11 (SLFN11), sensitizes cells to substances known to cause irreparable damage to DNA. As part of their study, the researchers used a repository of 60 cell types to identify predictors of cancer cell respons
Kenneth B. Jr. Pierce; C. Kenneth Brewer; Janet L. Ohmann
2010-01-01
This study was designed to test the feasibility of combining a method designed to populate pixels with inventory plot data at the 30-m scale with a new national predictor data set. The new national predictor data set was developed by the USDA Forest Service Remote Sensing Applications Center (hereafter RSAC) at the 250-m scale. Gradient Nearest Neighbor (GNN)...
Gene-Environment Interactions in Cardiovascular Disease
Flowers, Elena; Froelicher, Erika Sivarajan; Aouizerat, Bradley E.
2011-01-01
Background Historically, models to describe disease were exclusively nature-based or nurture-based. Current theoretical models for complex conditions such as cardiovascular disease acknowledge the importance of both biologic and non-biologic contributors to disease. A critical feature is the occurrence of interactions between numerous risk factors for disease. The interaction between genetic (i.e. biologic, nature) and environmental (i.e. non-biologic, nurture) causes of disease is an important mechanism for understanding both the etiology and public health impact of cardiovascular disease. Objectives The purpose of this paper is to describe theoretical underpinnings of gene-environment interactions, models of interaction, methods for studying gene-environment interactions, and the related concept of interactions between epigenetic mechanisms and the environment. Discussion Advances in methods for measurement of genetic predictors of disease have enabled an increasingly comprehensive understanding of the causes of disease. In order to fully describe the effects of genetic predictors of disease, it is necessary to place genetic predictors within the context of known environmental risk factors. The additive or multiplicative effect of the interaction between genetic and environmental risk factors is often greater than the contribution of either risk factor alone. PMID:21684212
Lowe, Sarah R; Meyers, Jacquelyn L; Galea, Sandro; Aiello, Allison E; Uddin, Monica; Wildman, Derek E; Koenen, Karestan C
2015-01-01
Background Longitudinal studies of posttraumatic stress (PTS) have documented environmental factors as predictors of trajectories of higher, versus lower, symptoms, among them experiences of childhood physical abuse. Although it is now well-accepted that genes and environments jointly shape the risk of PTS, no published studies have investigated genes, or gene-by-environment interactions (GxEs), as predictors of PTS trajectories. The purpose of this study was to fill this gap. Methods and Materials We examined associations between variants of the retinoid-related orphan receptor alpha (RORA) gene and trajectory membership among a sample of predominantly non-Hispanic Black urban adults (N = 473). The RORA gene was selected based on its association with posttraumatic stress disorder (PTSD) in the first PTSD genome wide association study. Additionally, we explored GxEs between RORA variants and childhood physical abuse history. Results We found that the minor allele of the RORA SNP rs893290 was a significant predictor of membership in a trajectory of consistently high PTS, relative to a trajectory of consistently low PTS. Additionally, the GxE of rs893290 with childhood physical abuse was significant. Decomposition of the interaction showed that minor allele frequency was more strongly associated with membership in consistently high or decreasing PTS trajectories, relative to a consistently low PTS trajectory, among participants with higher levels of childhood physical abuse. Conclusion The results of the study provide preliminary evidence that variation in the RORA gene is associated with membership in trajectories of higher PTS and that these associations are stronger among persons exposed to childhood physical abuse. Replication and analysis of functional data are needed to further our understanding of how RORA relates to PTS trajectories. PMID:25798337
Rautaharju, Pentti M; Zhang, Zhu-Ming; Vitolins, Mara; Perez, Marco; Allison, Matthew A; Greenland, Philip; Soliman, Elsayed Z
2014-07-28
We evaluated 25 repolarization-related ECG variables for the risk of coronary heart disease (CHD) death in 52 994 postmenopausal women from the Women's Health Initiative study. Hazard ratios from Cox regression were computed for subgroups of women with and without cardiovascular disease (CVD). During the average follow-up of 16.9 years, 941 CHD deaths occurred. Based on electrophysiological considerations, 2 sets of ECG variables with low correlations were considered as candidates for independent predictors of CHD death: Set 1, θ(Tp|Tref), the spatial angle between T peak (Tp) and normal T reference (Tref) vectors; θ(Tinit|Tterm), the angle between the initial and terminal T vectors; STJ depression in V6 and rate-adjusted QTp interval (QTpa); and Set 2, TaVR and TV1 amplitudes, heart rate, and QRS duration. Strong independent predictors with over 2-fold increased risk for CHD death in women with and without CVD were θ(Tp|Tref) >42° from Set 1 and TaVR amplitude >-100 μV from Set 2. The risk for these CHD death predictors remained significant after multivariable adjustment for demographic/clinical factors. Other significant predictors for CHD death in fully adjusted risk models were θ(Tinit|Tterm) >30°, TV1 >175 μV, and QRS duration >100 ms. θ(Tp|Tref) angle and TaVR amplitude are associated with CHD mortality in postmenopausal women. The use of these measures to identify high-risk women for further diagnostic evaluation or more intense preventive intervention warrants further study. http://www.clinicaltrials.gov. Unique identifier: NCT00000611. © 2014 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori-defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. 
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
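The empirical p-value mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the null distribution of OOB error rates has already been obtained by refitting the classifier on resampled (for example, label-permuted) data; the abstract says only "an adequate resampling method", so the permutation scheme, the +1 correction, the function name `empirical_p_value`, and the simulated numbers are all assumptions made for illustration.

```python
import random


def empirical_p_value(observed_error, null_errors):
    """Share of null OOB error rates that are at least as low as the observed one.

    The +1 in numerator and denominator is a common correction (an assumption
    here) that keeps the estimate away from exactly zero.
    """
    as_extreme = sum(1 for e in null_errors if e <= observed_error)
    return (as_extreme + 1) / (len(null_errors) + 1)


# Hypothetical placeholder data: in practice each null error would come from
# a classifier refitted on resampled data, not from a random number generator.
rng = random.Random(0)
null_errors = [rng.uniform(0.35, 0.65) for _ in range(999)]

p = empirical_p_value(0.20, null_errors)
print(f"empirical p-value: {p:.3f}")
```

A low OOB error relative to the null distribution gives a small p-value, which is the sense in which a gene set would be flagged as discriminating between phenotypes.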
In Pursuit of the Elusive Elixir: Predictors of First Grade Reading.
ERIC Educational Resources Information Center
Porter, Robin
Multivariate sets of predictor variables including both cognitive and social variables, different types of preschool experiences, and family environment variables were used to predict the first-grade reading achievement of 144 first-grade boys and girls. Measures for the predictor variables had been taken at school entry and at the end of the…
Hatzis, Christos; Pusztai, Lajos; Valero, Vicente; Booser, Daniel J.; Esserman, Laura; Lluch, Ana; Vidaurre, Tatiana; Holmes, Frankie; Souchon, Eduardo; Martin, Miguel; Cotrina, José; Gomez, Henry; Hubbard, Rebekah; Chacón, J. Ignacio; Ferrer-Lozano, Jaime; Dyer, Richard; Buxton, Meredith; Gong, Yun; Wu, Yun; Ibrahim, Nuhad; Andreopoulou, Eleni; Ueno, Naoto T.; Hunt, Kelly; Yang, Wei; Nazario, Arlene; DeMichele, Angela; O’Shaughnessy, Joyce; Hortobagyi, Gabriel N.; Symmans, W. Fraser
2017-01-01
CONTEXT Accurate prediction of who will (or won’t) have high probability of survival benefit from standard treatments is fundamental for individualized cancer treatment strategies. OBJECTIVE To develop a predictor of response and survival from chemotherapy for newly diagnosed invasive breast cancer. DESIGN Development of different predictive signatures for resistance and response to neoadjuvant chemotherapy (stratified according to estrogen receptor (ER) status) from gene expression microarrays of newly diagnosed breast cancer (310 patients). Then prediction of breast cancer treatment-sensitivity using the combination of signatures for: 1) sensitivity to endocrine therapy, 2) chemo-resistance, and 3) chemo-sensitivity. Independent validation (198 patients) and comparison with other reported genomic predictors of chemotherapy response. SETTING Prospective multicenter study to develop and test genomic predictors for neoadjuvant chemotherapy. PATIENTS Newly diagnosed HER2-negative breast cancer treated with chemotherapy containing sequential taxane and anthracycline-based regimens then endocrine therapy (if hormone receptor-positive). MAIN OUTCOME MEASURES Distant relapse-free survival (DRFS) if predicted treatment-sensitive and absolute risk reduction (ARR, difference in DRFS of the two predicted groups) at median follow-up (3 years), and their 95% confidence intervals (CI). RESULTS Patients in the independent validation cohort (99% clinical Stage II–III) who were predicted to be treatment-sensitive (28% of total) had DRFS of 92% (CI 85–100) and survival benefit compared to others (absolute risk reduction (ARR) 18%; CI 6–28). Predictions were accurate if breast cancer was ER-positive (30% predicted sensitive, DRFS 97%, CI 91–100; ARR 11%, CI 0.1–21) or ER-negative (26% predicted sensitive, DRFS 83%, CI 68–100; ARR 26%, CI 4–28), and were significant in multivariate analysis after adjusting for relevant clinical-pathologic characteristics. 
Other genomic predictors showed paradoxically worse survival if predicted to be responsive to chemotherapy. CONCLUSION A genomic predictor combining ER status, predicted chemo-resistance, predicted chemo-sensitivity, and predicted endocrine sensitivity accurately identified patients with survival benefit following taxane-anthracycline chemotherapy. PMID:21558518
USDA-ARS's Scientific Manuscript database
This is the first study examining the distribution of fungal effector genes among soil populations of Fusarium oxysporum in a tomato field undergoing a wilt disease epidemic. 74 F. oxysporum soil isolates were assayed for known effector genes present in a Race 3 tomato wilt strain (FOL MN-25) obtain...
Budczies, Jan; Mechtersheimer, Gunhild; Denkert, Carsten; Klauschen, Frederick; Jöhrens, Korinna; Endris, Volker; Lier, Amelie; Lasitschka, Felix; Penzel, Roland; Dietel, Manfred; Brors, Benedikt; Gröschel, Stefan; Glimm, Hanno; Schirmacher, Peter; Renner, Marcus; Fröhling, Stefan; Stenzinger, Albrecht
2017-01-01
ABSTRACT Soft-tissue sarcomas (STS) are rare malignancies that account for 1% of adult cancers and comprise more than 50 entities. Current therapeutic options for advanced-stage STS are limited. Immune checkpoint inhibitors targeting the PD-1/PD-L1 signaling axis are being explored as a new treatment modality in STS; however, the determinants of response to these agents are largely unknown. Using the sarcoma data set of The Cancer Genome Atlas (TCGA) and an independent cohort of untreated high-grade STS, we analyzed DNA copy number status and mRNA expression of PD-L1 in a total of 335 STS cases. Copy number gains (CNG) were detected in 54 TCGA cases (21.1%), of which 21 (8.2%) harbored focal PD-L1 CNG and that were most prevalent in myxofibrosarcoma (35%) and undifferentiated pleomorphic sarcoma (34%). In the untreated high-grade STS cohort, we detected CNG in six cases (7.6%). Analysis of co-amplified genes identified a 5.6-Mb core region comprising 27 genes, including JAK2. Patients with PD-L1 CNG had higher PD-L1 expression compared with STS without CNG (fold change, 1.8; p = 0.02), an effect that was most pronounced in the setting of focal PD-L1 CNG (fold change, 3.0; p = 0.0027). STS with PD-L1 CNG showed a significantly higher mutational load compared with tumors with a diploid PD-L1 locus (median number of mutated genes; 58 vs. 40; p = 3.6E-06), and PD-L1 CNG were associated with inferior survival (HR = 1.82; p = 0.025). In contrast, T-cell infiltrates quantified by mRNA expression of CD3Z were associated with improved survival (HR = 0.88; p = 0.024) and consequently influenced the prognostic power of PD-L1 CNG, with low CD3Z levels conferring poor survival in cases with PD-L1 CNG (HR = 1.8; p = 0.049). 
These data demonstrate that PD-L1 CNG and elevated expression of PD-L1 occur in a substantial proportion of STS, have prognostic impact that is modulated by T-cell infiltrates, and thus warrant investigation as response predictors for immune checkpoint inhibition. PMID:28405504
Budczies, Jan; Mechtersheimer, Gunhild; Denkert, Carsten; Klauschen, Frederick; Mughal, Sadaf S; Chudasama, Priya; Bockmayr, Michael; Jöhrens, Korinna; Endris, Volker; Lier, Amelie; Lasitschka, Felix; Penzel, Roland; Dietel, Manfred; Brors, Benedikt; Gröschel, Stefan; Glimm, Hanno; Schirmacher, Peter; Renner, Marcus; Fröhling, Stefan; Stenzinger, Albrecht
2017-01-01
Soft-tissue sarcomas (STS) are rare malignancies that account for 1% of adult cancers and comprise more than 50 entities. Current therapeutic options for advanced-stage STS are limited. Immune checkpoint inhibitors targeting the PD-1/PD-L1 signaling axis are being explored as a new treatment modality in STS; however, the determinants of response to these agents are largely unknown. Using the sarcoma data set of The Cancer Genome Atlas (TCGA) and an independent cohort of untreated high-grade STS, we analyzed DNA copy number status and mRNA expression of PD-L1 in a total of 335 STS cases. Copy number gains (CNG) were detected in 54 TCGA cases (21.1%), of which 21 (8.2%) harbored focal PD-L1 CNG and that were most prevalent in myxofibrosarcoma (35%) and undifferentiated pleomorphic sarcoma (34%). In the untreated high-grade STS cohort, we detected CNG in six cases (7.6%). Analysis of co-amplified genes identified a 5.6-Mb core region comprising 27 genes, including JAK2. Patients with PD-L1 CNG had higher PD-L1 expression compared with STS without CNG (fold change, 1.8; p = 0.02), an effect that was most pronounced in the setting of focal PD-L1 CNG (fold change, 3.0; p = 0.0027). STS with PD-L1 CNG showed a significantly higher mutational load compared with tumors with a diploid PD-L1 locus (median number of mutated genes; 58 vs. 40; p = 3.6E-06), and PD-L1 CNG were associated with inferior survival (HR = 1.82; p = 0.025). In contrast, T-cell infiltrates quantified by mRNA expression of CD3Z were associated with improved survival (HR = 0.88; p = 0.024) and consequently influenced the prognostic power of PD-L1 CNG, with low CD3Z levels conferring poor survival in cases with PD-L1 CNG (HR = 1.8; p = 0.049). These data demonstrate that PD-L1 CNG and elevated expression of PD-L1 occur in a substantial proportion of STS, have prognostic impact that is modulated by T-cell infiltrates, and thus warrant investigation as response predictors for immune checkpoint inhibition.
Kudo, Itsuhiro; Esumi, Mariko; Kida, Akihiro; Ikeda, Minoru
2010-10-01
To predict the efficacy of cisplatin and radiation therapy for maxillary squamous cell carcinoma, we examined the mRNA expression of 14 cisplatin-resistant genes and p53 mutation in specimens biopsied from patients prior to initiation of therapy. Five of 10 patients had mutations in the p53 gene, of whom four had residual tumors pathologically following chemoradiotherapy (p=0.0476). Of 14 genes examined, the mRNA expression of ATP7B was significantly lower in cases that were resistant to chemoradiotherapy. Six genes including multidrug resistance protein 1 (MDR-1), multidrug resistance associated protein 1 (MRP-1), Cu++ transporting, beta polypeptide (ATP7B), xeroderma pigmentosum, complementation group A (XPA), excision repair cross-complementing rodent repair deficiency, complementation group 1 (ERCC-1) and B-cell CLL/lymphoma 2 (BCL2) were down-regulated in cases of recurrent cancers. These results show that the evaluation of p53 mutation provides the most useful predictor of therapeutic effects. In responder cases, the drug-resistant genes that were determined in cell lines by culture do not necessarily translate into clinical relevance.
Lobato, Robert L; White, William D; Mathew, Joseph P; Newman, Mark F; Smith, Peter K; McCants, Charles B; Alexander, John H; Podgoreanu, Mihai V
2011-09-13
We tested the hypothesis that genetic variation in thrombotic and inflammatory pathways is independently associated with long-term mortality after coronary artery bypass graft (CABG) surgery. Two separate cohorts of patients undergoing CABG surgery at a single institution were examined, and all-cause mortality between 30 days and 5 years after the index CABG was ascertained from the National Death Index. In a discovery cohort of 1018 patients, a panel of 90 single-nucleotide polymorphisms (SNPs) in 49 candidate genes was tested with Cox proportional hazard models to identify clinical and genomic multivariate predictors of incident death. After adjustment for multiple comparisons and clinical predictors of mortality, the homozygote minor allele of a common variant in the thrombomodulin (THBD) gene (rs1042579) was independently associated with significantly increased risk of all-cause mortality (hazard ratio, 2.26; 95% CI, 1.31 to 3.92; P=0.003). Six tag SNPs in the THBD gene, 1 of which (rs3176123) was in complete linkage disequilibrium with rs1042579, were then assessed in an independent validation cohort of 930 patients. After multivariate adjustment for the clinical predictors identified in the discovery cohort and multiple testing, the homozygote minor allele of rs3176123 independently predicted all-cause mortality (hazard ratio, 3.6; 95% CI, 1.67 to 7.78; P=0.001). In 2 independent cardiac surgery cohorts, linked common allelic variants in the THBD gene are independently associated with increased long-term mortality risk after CABG and significantly improve the classification ability of traditional postoperative mortality prediction models.
Fumoto, Shoichi; Shibata, Tomotaka; Nishiki, Kohei; Tsukamoto, Yoshiyuki; Etoh, Tsuyoshi; Moriyama, Masatsugu; Shiraishi, Norio; Inomata, Masafumi
2017-01-01
Background Recently, neoadjuvant chemotherapy with docetaxel/cisplatin/5-fluorouracil (NAC-DCF) was identified as a novel strong regimen with a high rate of pathological complete response (pCR) in advanced esophageal cancer in Japan. Predicting pCR will contribute to the therapeutic strategy and the prevention of surgical invasion. However, a predictor of pCR after NAC-DCF has not yet been developed. The aim of this study was to identify a novel predictor of pCR in locally advanced esophageal cancer treated with NAC-DCF. Patients and methods A total of 32 patients who received NAC-DCF followed by esophagectomy between June 2013 and March 2016 were enrolled in this study. We divided the patients into the following 2 groups: pCR group (9 cases) and non-pCR group (23 cases), and compared gene expressions between these groups using DNA microarray data and KeyMolnet. Subsequently, a validation study of candidate molecular expression was performed in 7 additional cases. Results Seventeen molecules, including transcription factor E2F, T-cell-specific transcription factor, Src (known as “proto-oncogene tyrosine-protein kinase of sarcoma”), interferon regulatory factor 1, thymidylate synthase, cyclin B, cyclin-dependent kinase (CDK) 4, CDK, caspase-1, vitamin D receptor, histone deacetylase, MAPK/ERK kinase, bcl-2-associated X protein, runt-related transcription factor 1, PR domain zinc finger protein 1, platelet-derived growth factor receptor, and interleukin 1, were identified as candidate molecules. The molecules were mainly associated with pathways, such as transcriptional regulation by SMAD, RB/E2F, and STAT. The validation study indicated that 12 of the 17 molecules (71%) matched the trends of molecular expression. Conclusions A 17-molecule set that predicts pCR after NAC-DCF for locally advanced esophageal cancer was identified. PMID:29136005
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
Shrader, Sarah; Kern, Donna; Zoller, James; Blue, Amy
2013-01-01
Teaching interprofessional (IP) teamwork skills is a goal of interprofessional education. The purpose of this study was to examine the relationship between IP teamwork skills, attitudes and clinical outcomes in a simulated clinical setting. One hundred twenty health professions students (medicine, pharmacy, physician assistant) worked in interprofessional teams to manage a "patient" in a health care simulation setting. Students completed the Interdisciplinary Education Perception Scale (IEPS) attitudinal survey instrument. Students' responses were averaged by team to create an IEPS attitudes score. Teamwork skills for each team were rated by trained observers using a checklist to calculate a teamwork score (TWS). Clinical outcome scores (COS) were determined by summation of completed clinical tasks performed by the team based on an expert-developed checklist. Regression analyses were conducted to determine the relationship of IEPS and TWS with COS. IEPS score was not a significant predictor of COS (p=0.054), but TWS was a significant predictor (p<0.001) of COS. Results suggest that in a simulated clinical setting, students' interprofessional teamwork skills are significant predictors of positive clinical outcomes. Interprofessional curricular models that produce effective teamwork skills can improve student performance in clinical environments and likely improve teamwork practice to positively affect patient care outcomes.
Lin, Hao; Deng, En-Ze; Ding, Hui; Chen, Wei; Chou, Kuo-Chen
2014-01-01
The σ54 promoters are unique in prokaryotic genomes and responsible for transcribing carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites was governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters. PMID:25361964
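As a rough illustration of the feature encoding named in the abstract above, the sketch below computes only the plain k-tuple nucleotide composition, that is, the normalized frequency of every length-k word over A, C, G, T. Any additional "pseudo" components of the published feature vector are not reproduced here, and the function name `ktuple_composition` and the example sequence are hypothetical.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return the 4**k normalized k-tuple frequencies of a DNA sequence.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    tuples = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(tuples, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    # Guard against sequences shorter than k (no windows at all).
    return [counts[t] / total if total else 0.0 for t in tuples]


# Hypothetical toy sequence; real inputs would be promoter-length DNA samples.
features = ktuple_composition("ACGTAC", k=2)
print(len(features), features[1])  # 16 dinucleotides; index 1 is "AC"
```

A fixed-length vector of this kind lets sequences of different lengths be passed to a standard classifier, which is the role the feature vector plays in the predictor described above.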
Down-weighting overlapping genes improves gene set analysis
2012-01-01
Background The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org. PMID:22713124
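The scoring idea in the abstract above (mean of absolute weighted t-scores, with larger weights for genes that appear in few gene sets) can be sketched as follows. This is not the PADOG R package: the weighting formula, which maps gene frequency onto the range 1 to 2, is an illustrative assumption not given in the abstract, the moderated t-scores are assumed to be computed elsewhere, and all names and numbers are hypothetical.

```python
from math import sqrt


def gene_weights(gene_sets):
    """Assumed weighting: 2.0 for the rarest genes, 1.0 for the most shared."""
    freq = {}
    for genes in gene_sets.values():
        for g in set(genes):
            freq[g] = freq.get(g, 0) + 1
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:  # every gene is equally frequent, so weight them equally
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def set_score(genes, t_scores, weights):
    """Mean of absolute values of weighted t-scores over the set's genes."""
    vals = [abs(t_scores[g] * weights[g]) for g in genes if g in t_scores]
    if not vals:
        raise ValueError("no gene in this set has a t-score")
    return sum(vals) / len(vals)


# Hypothetical toy data: g2 is shared by all three sets and is down-weighted.
gene_sets = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g2", "g4"]}
t_scores = {"g1": 2.0, "g2": -3.0, "g3": 0.5, "g4": 1.0}

weights = gene_weights(gene_sets)
scores = {name: set_score(genes, t_scores, weights) for name, genes in gene_sets.items()}
print(scores)
```

In the toy data the shared gene g2 has the largest t-score in absolute value, yet it contributes to each set with the smallest weight, so the set-specific genes drive the ranking.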
Novel gene sets improve set-level classification of prokaryotic gene expression data.
Holec, Matěj; Kuželka, Ondřej; Železný, Filip
2015-10-28
Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable the learning of more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
Polderman, Tinca J C; Gosso, M Florencia; Posthuma, Danielle; Van Beijsterveldt, Toos C E M; Heutink, Peter; Verhulst, Frank C; Boomsma, Dorret I
2006-12-01
Variation in human behavior may be caused by differences in genotype and by non-genetic differences ("environment") between individuals. The relative contributions of genotype (G) and environment (E) to phenotypic variation can be assessed with the classical twin design. We illustrate this approach with longitudinal data collected in 5 and 12-year-old Dutch twins. At age 5 data on cognitive abilities as assessed with a standard intelligence test (IQ), working memory, selective and sustained attention, and attention problems were collected in 237 twin pairs. Seven years later, 172 twin pairs participated again when they were 12 years old and underwent a similar protocol. Results showed that variation in all phenotypes was influenced by genetic factors. For IQ the heritability estimates increased from 30% at age 5, to 80% at age 12. For executive functioning performance genetic factors accounted for around 50% of the variance at both ages. Attention problems showed high heritabilities (above 60%) at both ages, for maternal and teacher ratings. Longitudinal analyses revealed that executive functioning during childhood was weakly correlated with IQ scores at age 12. Attention problems during childhood, as rated by the mother and the teacher were stronger predictors (r = -0.28 and -0.36, respectively). This association could be attributed to a partly overlapping set of genes influencing attention problems at age 5 and IQ at age 12. IQ performance at age 5 was the best predictor of IQ at age 12. IQ at both ages was influenced by the same genes, whose influence was amplified during development.
2013-01-01
Background Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression. PMID:24308539
Empirical constrained Bayes predictors accounting for non-detects among repeated measures.
Moore, Reneé H; Lyles, Robert H; Manatunga, Amita K
2010-11-10
When the prediction of subject-specific random effects is of interest, constrained Bayes predictors (CB) have been shown to reduce the shrinkage of the widely accepted Bayes predictor while still maintaining desirable properties, such as optimizing mean-square error subsequent to matching the first two moments of the random effects of interest. However, occupational exposure and other epidemiologic (e.g. HIV) studies often present a further challenge because data may fall below the measuring instrument's limit of detection. Although methodology exists in the literature to compute Bayes estimates in the presence of non-detects (Bayes(ND)), CB methodology has not been proposed in this setting. By combining methodologies for computing CBs and Bayes(ND), we introduce two novel CBs that accommodate an arbitrary number of observable and non-detectable measurements per subject. Based on application to real data sets (e.g. occupational exposure, HIV RNA) and simulation studies, these CB predictors are markedly superior to the Bayes predictor and to alternative predictors computed using ad hoc methods in terms of meeting the goal of matching the first two moments of the true random effects distribution. Copyright © 2010 John Wiley & Sons, Ltd.
Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A
2013-01-29
Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
2013-01-01
Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878
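The significance criterion used in this abstract (false discovery rate-corrected p-values < 0.05) can be illustrated with a short sketch. The abstract does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up adjustment below is an assumption, and the raw p-values are hypothetical numbers chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's exact FDR procedure is not stated; BH is used
    here only as a common example of FDR correction.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five gene sets (not taken from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
```

In this toy input, two raw p-values below 0.05 (0.039 and 0.041) no longer pass the threshold after adjustment, which is the behaviour the correction is meant to provide when many gene sets are tested at once.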
USDA-ARS's Scientific Manuscript database
Background: Several single nucleotide polymorphisms have been proposed as potential predictors of the development of age-related diseases. Objective: To explore whether Tumor Necrosis Factor Alpha (TNFA) gene variants were associated with inflammatory status, thus facilitating the rate of telomere s...
Ross-Adams, H.; Lamb, A.D.; Dunning, M.J.; Halim, S.; Lindberg, J.; Massie, C.M.; Egevad, L.A.; Russell, R.; Ramos-Montoya, A.; Vowler, S.L.; Sharma, N.L.; Kay, J.; Whitaker, H.; Clark, J.; Hurst, R.; Gnanapragasam, V.J.; Shah, N.C.; Warren, A.Y.; Cooper, C.S.; Lynch, A.G.; Stark, R.; Mills, I.G.; Grönberg, H.; Neal, D.E.
2015-01-01
Background Understanding the heterogeneous genotypes and phenotypes of prostate cancer is fundamental to improving the way we treat this disease. As yet, there are no validated descriptions of prostate cancer subgroups derived from integrated genomics linked with clinical outcome. Methods In a study of 482 tumour, benign and germline samples from 259 men with primary prostate cancer, we used integrative analysis of copy number alterations (CNA) and array transcriptomics to identify genomic loci that affect expression levels of mRNA in an expression quantitative trait loci (eQTL) approach, to stratify patients into subgroups that we then associated with future clinical behaviour, and compared with either CNA or transcriptomics alone. Findings We identified five separate patient subgroups with distinct genomic alterations and expression profiles based on 100 discriminating genes in our separate discovery and validation sets of 125 and 103 men. These subgroups were able to consistently predict biochemical relapse (p = 0.0017 and p = 0.016 respectively) and were further validated in a third cohort with long-term follow-up (p = 0.027). We show the relative contributions of gene expression and copy number data on phenotype, and demonstrate the improved power gained from integrative analyses. We confirm alterations in six genes previously associated with prostate cancer (MAP3K7, MELK, RCBTB2, ELAC2, TPD52, ZBTB4), and also identify 94 genes not previously linked to prostate cancer progression that would not have been detected using either transcript or copy number data alone. We confirm a number of previously published molecular changes associated with high risk disease, including MYC amplification, and NKX3-1, RB1 and PTEN deletions, as well as over-expression of PCA3 and AMACR, and loss of MSMB in tumour tissue. 
A subset of the 100 genes outperforms established clinical predictors of poor prognosis (PSA, Gleason score), as well as previously published gene signatures (p = 0.0001). We further show how our molecular profiles can be used for the early detection of aggressive cases in a clinical setting, and inform treatment decisions. Interpretation For the first time in prostate cancer this study demonstrates the importance of integrated genomic analyses incorporating both benign and tumour tissue data in identifying molecular alterations leading to the generation of robust gene sets that are predictive of clinical outcome in independent patient cohorts. PMID:26501111
Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M
2012-01-01
Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.
PID-controller with predictor and auto-tuning algorithm: study of efficiency for thermal plants
NASA Astrophysics Data System (ADS)
Kuzishchin, V. F.; Merzlikina, E. I.; Hoang, Van Va
2017-09-01
The problem of estimating the efficiency of an automatic control system (ACS) with a Smith predictor and a PID algorithm for thermal plants is considered. To make the predictor usable, it is proposed to include an auto-tuning module (ATC) in the controller; the module calculates the parameters of a second-order plant model with a time delay. The study was conducted using programmable logic controllers (PLCs), one of which performed the control, ATC, and predictor functions. The controlled plant was represented in two variants: a simulation model running on a separate PLC, and a physical model of a thermal plant in the form of an electrical heater. The efficiency of the ACS with the predictor was analyzed for several variants of the second-order plant model with time delay by comparing the transient processes in the system after a set-point change and after a disturbance acting on the plant. Recommendations are given for correcting the PID-algorithm parameters, by means of a correcting coefficient k, when the predictor is used. It is shown that, when the set point is changed, the predictor is effective provided the parameters are corrected with k = 2. When disturbances act on the plant, the benefit of the predictor is doubtful, because the transient process is too long. The reason is that, in the neighborhood of zero frequency, the amplitude-frequency characteristic (AFC) of the system with the predictor is higher than the AFC of the system without the predictor.
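The control structure described in this abstract (a PID algorithm combined with a Smith predictor for a second-order plant with time delay) can be sketched as a small discrete-time simulation. This is an illustrative sketch under stated assumptions, not the authors' PLC implementation: the plant gain, time constants, delay, PID gains, and the forward-Euler discretization are all hypothetical choices, the internal model is assumed to match the plant exactly, and neither the auto-tuning module nor the correcting coefficient k is modelled.

```python
from collections import deque


def simulate(use_predictor, t_end=300.0, dt=0.1):
    """Set-point step response of a PID loop, optionally with a Smith predictor."""
    # Hypothetical plant: K * exp(-delay*s) / ((T1*s + 1) * (T2*s + 1))
    K, T1, T2, delay = 1.0, 10.0, 5.0, 5.0
    # Hypothetical PID gains (not taken from the paper).
    kp, ki, kd = 2.0, 0.2, 1.0

    n_delay = int(round(delay / dt))
    u_buf = deque([0.0] * n_delay)    # dead time acting on the plant input
    ym_buf = deque([0.0] * n_delay)   # delayed copy of the model output
    x1 = x2 = 0.0                     # plant states (x2 is the plant output)
    m1 = m2 = 0.0                     # delay-free model states
    integral = 0.0
    prev_fb = 0.0
    setpoint = 1.0
    outputs = []

    for _ in range(int(t_end / dt)):
        ym_delayed = ym_buf[0]
        # Smith predictor: feed back the delay-free model output plus the
        # mismatch between the real plant and the delayed model output.
        fb = x2 + (m2 - ym_delayed) if use_predictor else x2

        error = setpoint - fb
        integral += error * dt
        # Derivative acts on the feedback signal to avoid a set-point kick.
        u = kp * error + ki * integral - kd * (fb - prev_fb) / dt
        prev_fb = fb

        # Real plant: input passes through the dead time first.
        u_buf.append(u)
        u_delayed = u_buf.popleft()
        x1 += dt * (K * u_delayed - x1) / T1
        x2 += dt * (x1 - x2) / T2

        # Internal model: store the current output for later, then update
        # the delay-free states with the undelayed control signal.
        ym_buf.append(m2)
        ym_buf.popleft()
        m1 += dt * (K * u - m1) / T1
        m2 += dt * (m1 - m2) / T2

        outputs.append(x2)
    return outputs


response = simulate(use_predictor=True)
```

Calling `simulate(use_predictor=False)` runs the same loop without the predictor, which allows a set-point response comparison of the kind the abstract describes. A disturbance input, which the abstract identifies as the problematic case, would have to be added to the plant equations separately.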
Compton, Michael T; Kelley, Mary E; Lloyd, Robert Brett; McClam, Tamela; Ramsay, Claire E; Haggard, Patrick J; Augustin, Sara
2011-02-01
Little is known about determinants of second-generation antipsychotic dosages during initial hospitalization of first-episode psychosis. This study examined potential predictors of dosage of an atypical antipsychotic agent, risperidone, at hospital discharge after initial evaluation and treatment of first-episode nonaffective psychosis in 3 naturalistic, public-sector treatment settings. The number of psychotropic agents prescribed and discharge antipsychotic dosage were abstracted from the medical record. Demographic and extensive clinical characteristics were assessed through a clinical research study conducted at the 3 sites. One-way analyses of variance, trend tests using specific linear combinations of estimates, and χ² tests assessed for associations between atypical antipsychotic dosage and 5 hypothesized predictors, as well as 12 exploratory variables. Among 155 hospitalized first-episode patients, 121 (78.1%) were discharged on risperidone, and subsequent analyses focused on that subset. The mean risperidone dosage among those 121 patients was 4.26 mg; 31 received 1 to 2 mg, 45 received 3 to 4 mg, 37 received 5 to 6 mg, and 8 received more than 6 mg. Analyses suggested that older age at hospitalization, the number of psychotropic agents prescribed, excited symptoms, and premorbid social functioning may be predictors of the discharge dosage. Although several factors emerged, in general, predictors of discharge dosages of second-generation agents, here exemplified by risperidone, in real-world practice settings remain to be clarified. Given the importance of antipsychotic initiation during first hospitalization, future research should test an even broader array of potential predictors.
NASA Astrophysics Data System (ADS)
Andersen, Hendrik; Cermak, Jan
2015-04-01
This contribution studies the determinants of low cloud properties based on the application of various global observation data sets in machine learning algorithms. Clouds play a crucial role in the climate system as their radiative properties and precipitation patterns significantly impact the Earth's energy balance. Cloud properties are determined by environmental conditions, as cloud formation requires the availability of water vapour ("precipitable water") and condensation nuclei in sufficiently saturated conditions. A main challenge in the research of aerosol-cloud interactions is the separation of aerosol effects from meteorological influence. Understanding the processes that govern low cloud properties, in order to increase the accuracy of climate models and of predictions of future changes in the climate system, is thus of great importance. In this study, artificial neural networks are used to relate a selection of predictors (meteorological parameters, aerosol loading) to a set of predictands (cloud microphysical and optical properties). As meteorological parameters, wind direction and velocity, sea level pressure, static stability of the lower troposphere, atmospheric water vapour and temperature at the surface are used (re-analysis data by the European Centre for Medium-Range Weather Forecasts). In addition to meteorological conditions, aerosol loading is used as a predictor of cloud properties (MODIS collection 6 aerosol optical depth). The statistical model reveals significant relationships between predictors and predictands and is able to represent the aerosol-cloud-meteorology system better than frequently used bivariate relationships. The most important predictors can be identified by the additional error incurred when excluding one predictor at a time. The sensitivity of each predictand to each of the predictors is analyzed.
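The idea of ranking predictors by the additional error incurred when one predictor is excluded at a time can be sketched as follows. Note the substitution: the study uses artificial neural networks, whereas this sketch uses an ordinary least-squares fit so that it stays short and dependency-free. The predictor names and the synthetic data are hypothetical and carry no physical meaning.

```python
import random


def fit_mse(rows, y):
    """Least-squares fit with intercept; returns the training mean squared error."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations A b = c.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    c = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    residuals = [t - sum(bi * xi for bi, xi in zip(b, x)) for x, t in zip(X, y)]
    return sum(e * e for e in residuals) / len(y)


# Synthetic data with hypothetical predictor names (not the study's data).
rng = random.Random(0)
names = ["water_vapour", "aerosol_depth", "unrelated"]
data = [[rng.gauss(0.0, 1.0) for _ in names] for _ in range(200)]
target = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0.0, 0.1) for r in data]

full_error = fit_mse(data, target)
importance = {}
for j, name in enumerate(names):
    # Refit without predictor j and record the increase in error.
    reduced = [[v for k, v in enumerate(r) if k != j] for r in data]
    importance[name] = fit_mse(reduced, target) - full_error
```

A larger value in `importance` means the model loses more accuracy without that predictor. With correlated predictors this measure can understate the importance of each one, since the remaining predictors partly compensate for the one removed.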
What is the Optimal Strategy for Adaptive Servo-Ventilation Therapy?
Imamura, Teruhiko; Kinugawa, Koichiro
2018-05-23
Clinical advantages of adaptive servo-ventilation (ASV) therapy have been reported in selected heart failure patients with or without sleep-disordered breathing, whereas multicenter randomized controlled trials could not demonstrate such advantages. Given this discrepancy, optimal patient selection and device setting may be key to successful ASV therapy. Hemodynamic and echocardiographic parameters indicating pulmonary congestion, such as elevated pulmonary capillary wedge pressure, have been reported as predictors of a good response to ASV therapy. Recently, parameters indicating right ventricular dysfunction have also been reported as good predictors. An optimal device setting, with appropriate pressure applied for an appropriate duration, may also be key. A large-scale prospective trial with optimal patient selection and optimal device setting is warranted.
2014-01-01
Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
Zhu, Junyong; Chen, Zuhua; Yong, Lei
2018-02-01
The majority of genes are alternatively spliced and growing evidence suggests that alternative splicing is modified in cancer and is associated with cancer progression. Systematic analysis of alternative splicing signatures in ovarian cancer is lacking and greatly needed. We profiled genome-wide alternative splicing events in 408 ovarian serous cystadenocarcinoma (OV) patients in TCGA. Seven types of alternative splicing events were curated and prognostic analyses were performed with predictive models and splicing network built for OV patients. Among 48,049 mRNA splicing events in 10,582 genes, we detected 2,611 alternative splicing events in 2,036 genes which were significantly associated with overall survival of OV patients. Exon skip events were the most powerful prognostic factors among the seven types. The area under the curve of the receiver-operator characteristic curve for the prognostic predictor, which was built with the top significant alternative splicing events, was 0.937 at 2,000 days of overall survival, indicating powerful efficiency in distinguishing patient outcome. Interestingly, the splicing correlation network suggested obvious trends in the role of splicing factors in OV. In summary, we built powerful prognostic predictors for OV patients and uncovered interesting splicing networks which could be underlying mechanisms. Copyright © 2017 Elsevier Inc. All rights reserved.
Tal, Reshef; Seifer, David B; Wantman, Ethan; Baker, Valerie; Tal, Oded
2018-02-01
Objective To determine if serum antimüllerian hormone (AMH) is associated with and/or predictive of live birth assisted reproductive technology (ART) outcomes. Design Retrospective analysis of the Society for Assisted Reproductive Technology Clinic Outcome Reporting System database from 2012 to 2013. Setting Not applicable. Patients A total of 69,336 (81.8%) fresh and 15,458 (18.2%) frozen embryo transfer (FET) cycles with AMH values. Interventions None. Main Outcome Measure Live birth. Results A total of 85,062 out of 259,499 (32.7%) fresh and frozen-thawed autologous non-preimplantation genetic diagnosis cycles had AMH reported for cycles over this 2-year period. Of those, 70,565 cycles which had embryo transfers were included in the analysis. Serum AMH was significantly associated with live birth outcome per transfer in both fresh and FET cycles. Multiple logistic regression demonstrated that AMH is an independent predictor of live birth in fresh transfer cycles and FET cycles when controlling for age, body mass index, race, day of transfer, and number of embryos transferred. Receiver operating characteristic (ROC) curves demonstrated that the areas under the curve (AUC) for AMH as a predictor of live birth in fresh cycles and thawed cycles were 0.631 and 0.540, respectively, suggesting that AMH alone is a weak independent predictor of live birth after ART. Similar ROC curves were also obtained when elective single-embryo transfer (eSET) cycles were analyzed separately in either fresh (AUC 0.655) or FET (AUC 0.533) cycles, although AMH was not found to be an independent predictor in eSET cycles. Conclusions AMH is a poor independent predictor of live birth outcome in either fresh or frozen embryo transfer for both eSET and non-SET transfers. Copyright © 2017 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
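To make the reported AUC values easier to interpret (0.5 corresponds to chance, 1.0 to perfect separation), the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The marker values and outcomes below are hypothetical and are not taken from the registry data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical marker values and outcomes (1 = live birth, 0 = no live birth).
marker = [0.5, 1.2, 2.0, 2.8, 3.1, 4.0]
outcome = [0, 0, 1, 0, 1, 1]
auc = roc_auc(marker, outcome)
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations only; for registry-scale data a rank-based computation gives the same value far more cheaply.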
Genomic analysis reveals major determinants of cis-regulatory variation in Capsella grandiflora
Steige, Kim A.; Laenen, Benjamin; Reimegård, Johan; Slotte, Tanja
2017-01-01
Understanding the causes of cis-regulatory variation is a long-standing aim in evolutionary biology. Although cis-regulatory variation has long been considered important for adaptation, we still have a limited understanding of the selective importance and genomic determinants of standing cis-regulatory variation. To address these questions, we studied the prevalence, genomic determinants, and selective forces shaping cis-regulatory variation in the outcrossing plant Capsella grandiflora. We first identified a set of 1,010 genes with common cis-regulatory variation using analyses of allele-specific expression (ASE). Population genomic analyses of whole-genome sequences from 32 individuals showed that genes with common cis-regulatory variation (i) are under weaker purifying selection and (ii) undergo less frequent positive selection than other genes. We further identified genomic determinants of cis-regulatory variation. Gene body methylation (gbM) was a major factor constraining cis-regulatory variation, whereas presence of nearby transposable elements (TEs) and tissue specificity of expression increased the odds of ASE. Our results suggest that most common cis-regulatory variation in C. grandiflora is under weak purifying selection, and that gene-specific functional constraints are more important for the maintenance of cis-regulatory variation than genome-scale variation in the intensity of selection. Our results agree with previous findings that suggest TE silencing affects nearby gene expression, and provide evidence for a link between gbM and cis-regulatory constraint, possibly reflecting greater dosage sensitivity of body-methylated genes. Given the extensive conservation of gbM in flowering plants, this suggests that gbM could be an important predictor of cis-regulatory variation in a wide range of plant species. PMID:28096395
Zaretzki, Jed; Bergeron, Charles; Rydberg, Patrik; Huang, Tao-wei; Bennett, Kristin P; Breneman, Curt M
2011-07-25
This article describes RegioSelectivity-Predictor (RS-Predictor), a new in silico method for generating predictive models of P450-mediated metabolism for drug-like compounds. Within this method, potential sites of metabolism (SOMs) are represented as "metabolophores", a concept that describes the hierarchical combination of topological and quantum chemical descriptors needed to represent the reactivity of potential metabolic reaction sites. RS-Predictor modeling involves the use of metabolophore descriptors together with multiple-instance ranking (MIRank) to generate an optimized descriptor weight vector that encodes regioselectivity trends across all cases in a training set. The resulting pathway-independent (O-dealkylation vs N-oxidation vs Csp(3) hydroxylation, etc.), isozyme-specific regioselectivity model may be used to predict potential metabolic liabilities. In the present work, cross-validated RS-Predictor models were generated for a set of 394 substrates of CYP 3A4 as a proof-of-principle for the method. Rank aggregation was then employed to merge independently generated predictions for each substrate into a single consensus prediction. The resulting consensus RS-Predictor models were shown to reliably identify at least one observed site of metabolism in the top two rank-positions on 78% of the substrates. Comparisons between RS-Predictor and previously described regioselectivity prediction methods reveal new insights into how in silico metabolite prediction methods should be compared.
Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.
Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K
2011-01-01
Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.
Genomic and transcriptomic predictors of triglyceride response to regular exercise
Sarzynski, Mark A; Davidsen, Peter K; Sung, Yun Ju; Hesselink, Matthijs K C; Schrauwen, Patrick; Rice, Treva K; Rao, D C; Falciani, Francesco; Bouchard, Claude
2015-01-01
Aim We performed genome-wide and transcriptome-wide profiling to identify genes and single nucleotide polymorphisms (SNPs) associated with the response of triglycerides (TG) to exercise training. Methods Plasma TG levels were measured before and after a 20-week endurance training programme in 478 white participants from the HERITAGE Family Study. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. Affymetrix HG-U133+2 arrays were used to quantitate gene expression levels from baseline muscle biopsies of a subset of participants (N=52). Genome-wide association study (GWAS) analysis was performed using MERLIN, while transcriptomic predictor models were developed using the R-package GALGO. Results The GWAS results showed that eight SNPs were associated with TG training-response (ΔTG) at p<9.9×10⁻⁶, while another 31 SNPs showed p values <1×10⁻⁴. In multivariate regression models, the top 10 SNPs explained 32.0% of the variance in ΔTG, while conditional heritability analysis showed that four SNPs statistically accounted for all of the heritability of ΔTG. A molecular signature based on the baseline expression of 11 genes predicted 27% of ΔTG in HERITAGE, which was validated in an independent study. A composite SNP score based on the top four SNPs, each from the genomic and transcriptomic analyses, was the strongest predictor of ΔTG (R²=0.14, p=3.0×10⁻⁶⁸). Conclusions Our results indicate that skeletal muscle transcript abundance at 11 genes and SNPs at a number of loci contribute to TG response to exercise training. Combining data from genomics and transcriptomics analyses identified a SNP-based gene signature that should be further tested in independent samples. PMID:26491034
Hirdes, John P; Poss, Jeffrey W; Mitchell, Lori; Korngut, Lawrence; Heckman, George
2014-01-01
Persons with certain neurological conditions have higher mortality rates than the population without neurological conditions, but the risk factors for increased mortality within diagnostic groups are less well understood. The interRAI CHESS scale has been shown to be a strong predictor of mortality in the overall population of persons receiving health care in community and institutional settings. This study examines the performance of CHESS as a predictor of mortality among persons with 11 different neurological conditions. Survival analyses were done with interRAI assessments linked to mortality data among persons in home care (n = 359,940), complex continuing care hospitals/units (n = 88,721), and nursing homes (n = 185,309) in seven Canadian provinces/territories. CHESS was a significant predictor of mortality in all 3 care settings for the 11 neurological diagnostic groups considered after adjusting for age and sex. The distribution of CHESS scores varied between diagnostic groups and within diagnostic groups in different care settings. CHESS is a valid predictor of mortality in neurological populations in community and institutional care. It may prove useful for several clinical, administrative, policy-development, evaluation and research purposes. Because it is routinely gathered as part of normal clinical practice in jurisdictions (like Canada) that have implemented interRAI assessment instruments, CHESS can be derived without additional need for data collection.
Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo
2015-01-01
Background The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied on the list of 50 genes from the PAM50 method. Conclusions The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing single sample classifiers to predict breast cancer intrinsic subtypes. 
PMID:26132585
Pharmacogenetics of Antidepressants
Crisafulli, Concetta; Fabbri, Chiara; Porcelli, Stefano; Drago, Antonio; Spina, Edoardo; De Ronchi, Diana; Serretti, Alessandro
2010-01-01
Up to 60% of depressed patients do not respond completely to antidepressants (ADs) and up to 30% do not respond at all. Genetic factors account for about 50% of the AD response. In recent years, the possible influence of a set of candidate genes as genetic predictors of AD response efficacy has been investigated by us and others. They include the cytochrome P450 superfamily, the P-glycoprotein (ABCB1), the tryptophan hydroxylase, the catechol-O-methyltransferase, the monoamine oxidase A, the serotonin transporter (5-HTTLPR), the norepinephrine transporter, the dopamine transporter, variants in the 5-hydroxytryptamine receptors (5-HT1A, 5-HT2A, 5-HT3A, 5-HT3B, and 5-HT6), adrenoreceptor beta-1 and alpha-2, the dopamine receptors (D2), the G protein beta 3 subunit, the corticotropin releasing hormone receptors (CRHR1 and CRHR2), the glucocorticoid receptors, the c-AMP response-element binding protein, and the brain-derived neurotrophic factor. Marginal associations were reported for angiotensin I converting enzyme, circadian locomotor output cycles kaput protein, glutamatergic system, nitric oxide synthase, and interleukin 1-beta gene. In conclusion, gene variants seem to influence human behavior, liability to disorders and treatment response. Nonetheless, gene × environment interactions have been hypothesized to modulate several of these effects. PMID:21687501
MSH6 and MSH3 are rarely involved in genetic predisposition to nonpolypotic colon cancer.
Huang, J; Kuismanen, S A; Liu, T; Chadwick, R B; Johnson, C K; Stevens, M W; Richards, S K; Meek, J E; Gao, X; Wright, F A; Mecklin, J P; Järvinen, H J; Grönberg, H; Bisgaard, M L; Lindblom, A; Peltomäki, P
2001-02-15
A set of 90 nonpolypotic colon cancer families in which germ-line mutations of MSH2 and MLH1 had been excluded were screened for mutations in two additional DNA mismatch repair genes, MSH6 and MSH3. Kindreds fulfilling and not fulfilling the Amsterdam I criteria, showing early and late onset colorectal (and other) cancers, and having microsatellite stable and unstable tumors were included. Two partly parallel approaches were used: genetic linkage analysis (19 large families) and the protein truncation test (85, mostly smaller, families). Whereas MSH3 was not involved in any family, a large Amsterdam-positive, late-onset family showed a novel germ-line mutation in MSH6 (deletion of CT at nucleotide 3052 in exon 4). The mutation was identified through genetic linkage (multipoint lod score 2.4) and subsequent sequencing of MSH6. Furthermore, the entire MSH6 gene was sequenced exon by exon in families with frameshift mutations in the (C)8 tract in tumors, previously suggested as a predictor of MSH6 germ-line mutations; no mutations were found. We conclude that germ-line involvement of MSH6 and MSH3 is rare and that other genes are likely to account for a majority of MSH2-, MLH1-mutation negative families with nonpolypotic colon cancer.
Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H
2017-07-01
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. 
A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.
Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew
2012-08-08
Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong
2015-01-01
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378
Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.
Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M
2011-11-01
Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.
MAGMA: Generalized Gene-Set Analysis of GWAS Data
de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle
2015-01-01
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
2012-09-03
practice to solve these initial value problems. Additionally, the predictor/corrector methods are combined with adaptive stepsize and adaptive ... for implementing a numerical path tracking algorithm is to decide which predictor/corrector method to employ, how large to take the step Δt, and what ... the endgame algorithm. Output: A steady state solution. Set ε = 1; while ε ≥ ε_end do: set the stepsize Δε by using adaptive stepsize control algorithm
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
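The kappa statistic mentioned above for pair-wise relationships between annotation gene sets can be illustrated with a minimal sketch. It assumes Cohen's kappa computed over a shared gene universe, where each gene is scored as a member or non-member of each set. The function name `cohen_kappa`, the explicit `universe` argument, and the example gene identifiers are hypothetical and are not taken from GARNET.

```python
def cohen_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement of two gene sets over a gene universe.

    Each gene in the universe is treated as one rating: "in the set" or
    "not in the set". This is an illustrative assumption, not GARNET's code.
    """
    universe = set(universe)
    if not universe:
        raise ValueError("universe must not be empty")
    a = set(set_a) & universe
    b = set(set_b) & universe
    n = len(universe)

    both = len(a & b)             # genes in both sets
    neither = n - len(a | b)      # genes in neither set
    observed = (both + neither) / n

    # Agreement expected by chance from the marginal membership rates.
    rate_a, rate_b = len(a) / n, len(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both sets empty, or both cover the whole universe
    return (observed - expected) / (1 - expected)


# Hypothetical universe of ten genes and two overlapping annotation sets.
genes = [f"g{i}" for i in range(10)]
kappa = cohen_kappa({"g0", "g1", "g2", "g3"}, {"g2", "g3", "g4", "g5"}, genes)
```

A kappa near 1 indicates two annotation terms that label almost the same genes, a value near 0 indicates overlap no better than chance, and negative values indicate less overlap than chance would give.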
Comparison of molecular breeding values based on within- and across-breed training in beef cattle.
Kachman, Stephen D; Spangler, Matthew L; Bennett, Gary L; Hanford, Kathryn J; Kuehn, Larry A; Snelling, Warren M; Thallman, R Mark; Saatchi, Mahdi; Garrick, Dorian J; Schnabel, Robert D; Taylor, Jeremy F; Pollak, E John
2013-08-16
Although the efficacy of genomic predictors based on within-breed training looks promising, it is necessary to develop and evaluate across-breed predictors for the technology to be fully applied in the beef industry. The efficacies of genomic predictors trained in one breed and utilized to predict genetic merit in differing breeds based on simulation studies have been reported, as have the efficacies of predictors trained using data from multiple breeds to predict the genetic merit of purebreds. However, comparable studies using beef cattle field data have not been reported. Molecular breeding values for weaning and yearling weight were derived and evaluated using a database containing BovineSNP50 genotypes for 7294 animals from 13 breeds in the training set and 2277 animals from seven breeds (Angus, Red Angus, Hereford, Charolais, Gelbvieh, Limousin, and Simmental) in the evaluation set. Six single-breed and four across-breed genomic predictors were trained using pooled data from purebred animals. Molecular breeding values were evaluated using field data, including genotypes for 2227 animals and phenotypic records of animals born in 2008 or later. Accuracies of molecular breeding values were estimated based on the genetic correlation between the molecular breeding value and trait phenotype. With one exception, the estimated genetic correlations of within-breed molecular breeding values with trait phenotype were greater than 0.28 when evaluated in the breed used for training. Most estimated genetic correlations for the across-breed trained molecular breeding values were moderate (> 0.30). When molecular breeding values were evaluated in breeds that were not in the training set, estimated genetic correlations clustered around zero. Even for closely related breeds, within- or across-breed trained molecular breeding values have limited prediction accuracy for breeds that were not in the training set. 
For breeds in the training set, across- and within-breed trained molecular breeding values had similar accuracies. The benefit of adding data from other breeds to a within-breed training population is the ability to produce molecular breeding values that are more robust across breeds and these can be utilized until enough training data has been accumulated to allow for a within-breed training set.
Estimation of gene induction enables a relevance-based ranking of gene sets.
Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens
2009-07-01
In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.
Weng, Li; Du, Juan; Zhou, Qinghui; Cheng, Binbin; Li, Jun; Zhang, Denghai; Ling, Changquan
2012-06-08
Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide. Frequent tumor recurrence after surgery is related to its poor prognosis. Although gene expression signatures have been associated with outcome, the molecular basis of HCC recurrence is not fully understood, and there is no method to predict recurrence using peripheral blood mononuclear cells (PBMCs), which can be easily obtained for recurrence prediction in the clinical setting. According to the microarray analysis results, we constructed a co-expression network using the k-core algorithm to determine which genes play pivotal roles in the recurrence of HCC associated with the hepatitis B virus (HBV) infection. Furthermore, we evaluated the mRNA and protein expressions in the PBMCs from 80 patients with or without recurrence and 30 healthy subjects. The stability of the signatures was determined in HCC tissues from the same 80 patients. Data analysis included ROC analysis, correlation analysis, log-rank tests, and Cox modeling to identify independent predictors of tumor recurrence. The tumor-associated proteins cyclin B1, Sec62, and Birc3 were highly expressed in a subset of samples of recurrent HCC; cyclin B1, Sec62, and Birc3 positivity was observed in 80%, 65.7%, and 54.2% of the samples, respectively. The Kaplan-Meier analysis revealed that high expression levels of these proteins were associated with significantly reduced recurrence-free survival. Cox proportional hazards model analysis revealed that cyclin B1 (hazard ratio [HR], 4.762; p = 0.002) and Sec62 (HR, 2.674; p = 0.018) were independent predictors of HCC recurrence. These results revealed that cyclin B1 and Sec62 may be candidate biomarkers and potential therapeutic targets for HBV-related HCC recurrence after surgery.
Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin
2011-04-14
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
The Gene Set Builder: collation, curation, and distribution of sets of genes
Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W
2005-01-01
Background In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. 
Conclusion The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of genes in a user-friendly environment. The application can be accessed via . PMID:16371163
NASA Technical Reports Server (NTRS)
Lin, S. S.; Tiong, I. Y.; Asher, C. R.; Murphy, M. T.; Thomas, J. D.; Griffin, B. P.
2000-01-01
Identification of thrombus-related mechanical prosthetic valve dysfunction (MPVD) has important therapeutic implications. We sought to develop an algorithm, combining clinical and echocardiographic parameters, for prediction of thrombus-related MPVD in a series of 53 patients (24 men, age 52 ± 16 years) who had intraoperative diagnosis of thrombus or pannus from 1992 to 1997. Clinical and echocardiographic parameters were analyzed to identify predictors of thrombus and pannus. Prevalence of thrombus and diagnostic yields relative to the number of predictors were determined. There were 22 patients with thrombus, 19 patients with pannus, and 12 patients with both. Forty-two of 53 masses were visualized using transesophageal echocardiography (TEE), including 29 of 34 thrombi or both thrombi and panni and 13 of 19 isolated panni. Predictors of thrombus or mixed presentation include mobile mass (p = 0.009), attachment to occluder (p = 0.02), elevated gradients (p = 0.04), and an international normalized ratio of ≤ 2.5 (p = 0.03). All 34 patients with thrombus or mixed presentation had ≥ 1 predictor. The prevalence of thrombus in the presence of ≤ 1, 2, and ≥ 3 predictors is 14%, 69%, and 91%, respectively. Thus, TEE is sensitive in the identification of abnormal mass in the setting of MPVD. An algorithm based on clinical and transesophageal echocardiographic predictors may be useful to estimate the likelihood of thrombus in the setting of MPVD. In the presence of ≥ 3 predictors, the probability of thrombus is high.
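The counting rule reported in this abstract can be restated as a small lookup. It is only a sketch that encodes the four predictors and the three prevalence figures quoted above. The function name and the boolean inputs are hypothetical, and the abstract gives no numeric cut-off for "elevated gradients", so that predictor is passed in as a yes/no value.

```python
def thrombus_prevalence_by_predictors(mobile_mass, attached_to_occluder,
                                      elevated_gradients, inr):
    """Return (number of predictors present, prevalence reported in the abstract).

    Predictors from the abstract: mobile mass, attachment to occluder,
    elevated gradients, and an international normalized ratio (INR) <= 2.5.
    """
    count = sum([
        bool(mobile_mass),
        bool(attached_to_occluder),
        bool(elevated_gradients),
        inr <= 2.5,
    ])
    # Prevalences quoted for <= 1, exactly 2, and >= 3 predictors.
    if count <= 1:
        return count, 0.14
    if count == 2:
        return count, 0.69
    return count, 0.91


# Hypothetical case: mobile mass attached to the occluder, INR of 3.1.
n_predictors, prevalence = thrombus_prevalence_by_predictors(True, True, False, 3.1)
```

The returned figures are the series-level prevalences from the 53-patient cohort, so they describe that sample and are not individual risk estimates.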
Spectral gene set enrichment (SGSE).
Frost, H Robert; Li, Zhigang; Moore, Jason H
2015-03-03
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
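The combination step named in this abstract, the weighted Z-method, can be sketched with the Python standard library. The sketch assumes one-sided p-values strictly between 0 and 1 and takes the weights as given inputs. How SGSE derives those weights from PC variances and Tracy-Widom p-values is not reproduced here, and the function name `weighted_z` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method (weighted Stouffer).

    Each p-value is mapped to a standard normal quantile, the quantiles are
    averaged with the given weights, and the result is mapped back to a
    p-value. p-values must lie strictly between 0 and 1.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal length")
    norm = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    combined = sum(w * z for w, z in zip(weights, z_scores))
    combined /= sqrt(sum(w * w for w in weights))
    return 1.0 - norm.cdf(combined)


# Hypothetical PC-level p-values for one gene set, with weights that
# down-weight the later (lower-variance) components.
p_combined = weighted_z([0.01, 0.20, 0.60], [0.5, 0.3, 0.1])
```

Because each quantile is multiplied by its weight, a component with a weight near zero contributes almost nothing to the combined statistic. This matches the abstract's stated aim of ignoring elements of the data most likely to represent noise.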
Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi
2013-01-01
Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present software called SiBIC, which, from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set assigned to one node has coexpressed genes. We evaluated each step of this procedure: 1) significance of the generated biclusters biologically and statistically, 2) biological quality of merged biclusters, and 3) biological significance of gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.
Rich, S. S.; Goodarzi, M. O.; Palmer, N. D.; Langefeld, C. D.; Ziegler, J.; Haffner, S. M.; Bryer-Ash, M.; Norris, J. M.; Taylor, K. D.; Haritunians, T.; Rotter, J. I.; Chen, Y-D. I.; Wagenknecht, L. E.; Bowden, D. W.; Bergman, R. N.
2009-01-01
Aims/Hypothesis: The goal of this study was to identify genes and regions in the human genome that are associated with the acute insulin response to glucose (AIRg), an important predictor of type 2 diabetes, in Hispanic-American participants from the Insulin Resistance Atherosclerosis Family Study (IRAS FS). Methods: A two-stage genome-wide association scan (GWAS) was performed in IRAS FS Hispanic-American samples. In the first stage, 318K single nucleotide polymorphisms (SNPs) were assessed in 229 Hispanic-American DNA samples (from 34 families) from San Antonio, TX. SNPs with the most significant associations with AIRg were genotyped in the entire set of IRAS FS Hispanic-American samples (n = 1190). In chromosomal regions with evidence of association, additional SNPs were genotyped to capture variation in genes. Results: No individual SNP achieved genome-wide levels of significance (P < 5 × 10^-7); however, two regions (chromosomes 6p21 and 20p11) had multiple highly-ranked SNPs that were associated with AIRg. Additional genotyping in these regions supported the initial evidence for variants contributing to variation in AIRg. One region resides in a gene desert between PXT1 and KCTD20 on 6p21 while the region on 20p11 has several viable candidate genes (ENTPD6, PYGB, GINS1 and RP4-691N24.1). Conclusions/Interpretation: A GWAS in Hispanic-American samples identified several candidate genes and loci that may be associated with AIRg. These associations explain a small component of variation in AIRg. The genes identified are involved in phosphorylation and ion transport and provide preliminary evidence that these processes have importance in beta cell response. PMID:19430760
Rich, S S; Goodarzi, M O; Palmer, N D; Langefeld, C D; Ziegler, J; Haffner, S M; Bryer-Ash, M; Norris, J M; Taylor, K D; Haritunians, T; Rotter, J I; Chen, Y-D I; Wagenknecht, L E; Bowden, D W; Bergman, R N
2009-07-01
This study sought to identify genes and regions in the human genome that are associated with the acute insulin response to glucose (AIRg), an important predictor of type 2 diabetes, in Hispanic-American participants from the Insulin Resistance Atherosclerosis Family Study (IRAS FS). A two-stage genome-wide association scan (GWAS) was performed in IRAS FS Hispanic-American samples. In the first stage, 317K single nucleotide polymorphisms (SNPs) were assessed in 229 Hispanic-American DNA samples from 34 families from San Antonio, TX, USA. SNPs with the most significant associations with AIRg were genotyped in the entire set of IRAS FS Hispanic-American samples (n = 1,190). In chromosomal regions with evidence of association, additional SNPs were genotyped to capture variation in genes. No individual SNP achieved genome-wide levels of significance (p < 5 × 10^-7); however, two regions (chromosomes 6p21 and 20p11) had multiple highly ranked SNPs that were associated with AIRg. Additional genotyping in these regions supported the initial evidence of variants contributing to variation in AIRg. One region resides in a gene desert between PXT1 and KCTD20 on 6p21, while the region on 20p11 has several viable candidate genes (ENTPD6, PYGB, GINS1 and RP4-691N24.1). A GWAS in Hispanic-American samples identified several candidate genes and loci that may be associated with AIRg. These associations explain a small component of variation in AIRg. The genes identified are involved in phosphorylation and ion transport, and provide preliminary evidence that these processes are important in beta cell response.
Gaye, Amadou; Doumatey, Ayo P; Davis, Sharon K; Rotimi, Charles N; Gibbons, Gary H
2018-01-01
Several clinical guidelines have been proposed to distinguish metabolically healthy obesity (MHO) from other subgroups of obesity but the molecular mechanisms by which MHO individuals remain metabolically healthy despite having a high fat mass are yet to be elucidated. We conducted the first whole blood transcriptomic study designed to identify specific sets of genes that might shed novel insights into the molecular mechanisms that protect or delay the occurrence of obesity-related co-morbidities in MHO. The study included 29 African-American obese individuals, 8 MHO and 21 metabolically abnormal obese (MAO). Unbiased transcriptome-wide network analysis was carried out to identify molecular modules of co-expressed genes that are collectively associated with MHO. Network analysis identified a group of 23 co-expressed genes, including ribosomal protein genes (RPs), which were significantly downregulated in MHO subjects. The three pathways enriched in the group of co-expressed genes are EIF2 signaling, regulation of eIF4 and p70S6K signaling, and mTOR signaling. The expression of ten of the RPs collectively predicted MHO status with an area under the curve of 0.81. Triglycerides/HDL (TG/HDL) ratio, an index of insulin resistance, was the best predictor of the expression of genes in the MHO group. The higher TG/HDL values observed in the MAO subjects may underlie the activation of endoplasmic reticulum (ER) and related-stress pathways that lead to a chronic inflammatory state. In summary, these findings suggest that controlling ER stress and/or ribosomal stress by downregulating RPs or controlling TG/HDL ratio may represent effective strategies to prevent or delay the occurrence of metabolic disorders in obese individuals.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating characteristic curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
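To make the idea of a gene-sampling test with an absolute gene statistic concrete, here is a minimal sketch. It is an illustration only and not the procedure evaluated in the paper: the function name `gene_sampling_p_value`, the mean-of-scores set statistic, and the default parameters are all assumptions. The gene set score is compared against scores of randomly drawn gene sets of the same size.

```python
import random


def gene_sampling_p_value(gene_stats, gene_set, n_permutations=1000,
                          use_absolute=True, seed=0):
    """Gene-sampling p-value for one gene set (illustrative sketch).

    gene_stats maps gene name -> per-gene statistic (e.g. a t-score).
    With use_absolute=True, up- and down-regulated genes both raise the
    set score instead of cancelling each other out.
    """
    rng = random.Random(seed)
    scores = {g: (abs(s) if use_absolute else s) for g, s in gene_stats.items()}
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("gene_set shares no genes with gene_stats")

    observed = sum(scores[g] for g in members) / len(members)
    all_genes = sorted(scores)

    exceed = 0
    for _ in range(n_permutations):
        # Null distribution: random gene sets of the same size.
        sampled = rng.sample(all_genes, len(members))
        null_score = sum(scores[g] for g in sampled) / len(sampled)
        if null_score >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value away from zero.
    return (exceed + 1) / (n_permutations + 1)
```

For a set whose members change strongly in opposite directions, the signed version averages to roughly zero while the absolute version does not, which is the bidirectional-change behaviour the abstract mentions.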
Schummers, Laura; Himes, Katherine P; Bodnar, Lisa M; Hutcheon, Jennifer A
2016-09-21
Compelled by the intuitive appeal of predicting each individual patient's risk of an outcome, there is a growing interest in risk prediction models. While the statistical methods used to build prediction models are increasingly well understood, the literature offers little insight to researchers seeking to gauge a priori whether a prediction model is likely to perform well for their particular research question. The objective of this study was to inform the development of new risk prediction models by evaluating model performance under a wide range of predictor characteristics. Data from all births to overweight or obese women in British Columbia, Canada from 2004 to 2012 (n = 75,225) were used to build a risk prediction model for preeclampsia. The data were then augmented with simulated predictors of the outcome with pre-set prevalence values and univariable odds ratios. We built 120 risk prediction models that included known demographic and clinical predictors, and one, three, or five of the simulated variables. Finally, we evaluated standard model performance criteria (discrimination, risk stratification capacity, calibration, and Nagelkerke's R²) for each model. Findings from our models built with simulated predictors demonstrated the predictor characteristics required for a risk prediction model to adequately discriminate cases from non-cases and to adequately classify patients into clinically distinct risk groups. Several predictor characteristics can yield well performing risk prediction models; however, these characteristics are not typical of predictor-outcome relationships in many population-based or clinical data sets. Novel predictors must be both strongly associated with the outcome and prevalent in the population to be useful for clinical prediction modeling (e.g., one predictor with prevalence ≥20% and odds ratio ≥8, or 3 predictors with prevalence ≥10% and odds ratios ≥4).
Area under the receiver operating characteristic curve values of >0.8 were necessary to achieve reasonable risk stratification capacity. Our findings provide a guide for researchers to estimate the expected performance of a prediction model before a model has been built based on the characteristics of available predictors.
The limitations of simple gene set enrichment analysis assuming gene independence.
Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P
2016-02-01
Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
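The variance inflation mentioned in this abstract can be illustrated with the standard result for the mean of correlated variables: if a gene set has n genes with common variance σ² and average pairwise correlation ρ, the variance of the set mean is (σ²/n)·(1 + (n − 1)ρ) rather than σ²/n. The sketch below encodes that textbook formula; the function names are hypothetical and this is not claimed to be the exact computation used in the paper.

```python
def variance_inflation_factor(set_size, mean_correlation):
    """Factor by which gene-gene correlation inflates the variance of a
    gene set mean, relative to the independence assumption."""
    return 1.0 + (set_size - 1) * mean_correlation


def variance_of_set_mean(set_size, gene_variance, mean_correlation):
    """Variance of the mean of set_size equally correlated gene scores."""
    independent_variance = gene_variance / set_size
    return independent_variance * variance_inflation_factor(
        set_size, mean_correlation
    )
```

For example, a set of 50 genes with an average pairwise correlation of only 0.1 has a set-mean variance about 5.9 times larger than independence would suggest, so a test that ignores correlation would be noticeably anti-conservative.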
Salnikova, L E; Kolobkov, D S
2016-06-01
Oncologists have pointed out an urgent need for biomarkers that can be useful for clinical application to predict the susceptibility of patients to preoperative therapy. This review collects, evaluates and combines data on the influence of reported somatic and germline genetic variations on histological tumor regression in neoadjuvant settings of rectal and esophageal cancers. Five hundred and twenty-seven articles were identified, 204 retrieved and 61 studies included. Among 24 and 14 genetic markers reported for rectal and esophageal cancers, respectively, significant associations in meta-analyses were demonstrated for the following markers. In rectal cancer, major response was more frequent in carriers of the TYMS genotype 2R/2R-2R/3R (rs34743033), MTHFR genotype 677C/C (rs1801133), wild-type TP53 and KRAS genes. In esophageal cancer, successful therapy appeared to correlate with wild-type TP53. These results may be useful for future research directions to translate reported data into practical clinical use.
Functional analysis of limb enhancers in the developing fin
Booker, Betty M.; Murphy, Karl K.
2013-01-01
Despite diverging ~365 million years ago, tetrapod limbs and pectoral fins express similar genes that could be regulated by shared regulatory elements. In this study, we set out to analyze the ability of enhancers to maintain tissue specificity in these two divergent structures. We tested 22 human sequences that were previously reported as mouse limb enhancers for their enhancer activity in zebrafish (Danio rerio). Using a zebrafish enhancer assay, we found that 10/22 (45 %) were positive for pectoral fin activity. Analysis of the various criteria that correlated with positive fin activity found that both spatial limb activity and evolutionary conservation are not good predictors of fin enhancer activity. These results suggest that zebrafish enhancer assays may be limited in detecting human limb enhancers, and this limitation does not improve by the use of limb spatial expression or evolutionary conservation. PMID:24068387
NASA Astrophysics Data System (ADS)
Hsu, Shih-Jang
The major purpose of this study was to determine the relative contribution of nine variables in predicting teachers' responsible environmental behavior (REB). The theoretical framework of this study was based on the Hines model, the Hungerford and Volk model, and the environmental literacy framework proposed by the Environmental Literacy Assessment Consortium. A nine-page instrument was administered by mailed questionnaire to 300 randomly selected secondary teachers in Hualien County of Taiwan with a 78.7% response rate. Correlation and stepwise multiple regression analyses were conducted. The following conclusions were drawn: (1) For all the respondents, all the nine environmental literacy variables were significant correlates of REB. These correlates included: perceived knowledge of environmental action strategies (KNOW; r =.46), intention to act (IA; r =.46), perceived skill in using environmental action strategies (SKILL; r =.45), perceived knowledge of environmental problems and issues (KISSU; r =.34), environmental sensitivity (r =.28), environmental responsibility (r =.27), perceived knowledge of ecology and environmental science (r =.27), locus of control (r =.27), and environmental attitudes (r =.21). (2) When only the nine environmental literacy variables were considered, the most parsimonious set of predictors of REB for all the teachers included: (a) KNOW, (Rsp2 =.2116); (b) IA, (Rsp2 =.0916); and (c) SKILL, (Rsp2 =.0205). For the urban teachers, the most parsimonious set of predictors included: (a) IA (Rsp2 =.2559); (b) SKILL (Rsp2 =.0926); and (c) environmental responsibility (Rsp2 =.0219). For the rural teachers, the most parsimonious set of predictors included: (a) KNOW (Rsp2 =.1872); (b) IA (Rsp2 =.0816); and (c) KISSU (Rsp2 =.0318).
(3) When the environmental literacy variables as well as demographic and experience variables were considered, the most parsimonious set of predictors for all the teachers included: (a) KNOW, (Rsp2 =.2834); (b) IA, (Rsp2 =.0696); (c) area of residence, (Rsp2 =.0174); and (d) SKILL, (Rsp2 =.0163). For the urban teachers, the most parsimonious set of predictors included: (a) IA (Rsp2 =.3199); (b) SKILL (Rsp2 =.0840); (c) major sources of environmental information (Rsp2 =.0432); and (d) membership in environmental organizations, (Rsp2 =.0240). Implications for environmental education program development and instructional practice were presented. Recommendations for further research were also provided.
Wallert, John; Tomasoni, Mattia; Madison, Guy; Held, Claes
2017-07-05
Machine learning algorithms hold potential for improved prediction of all-cause mortality in cardiovascular patients, yet have not previously been developed with high-quality population data. This study compared four popular machine learning algorithms trained on unselected, nation-wide population data from Sweden to solve the binary classification problem of predicting survival versus non-survival 2 years after first myocardial infarction (MI). This prospective national registry study for prognostic accuracy validation of predictive models used data from 51,943 complete first MI cases as registered during 6 years (2006-2011) in the national quality register SWEDEHEART/RIKS-HIA (90% coverage of all MIs in Sweden) with follow-up in the Cause of Death register (> 99% coverage). Primary outcome was AUROC (C-statistic) performance of each model on the untouched test set (40% of cases) after model development on the training set (60% of cases) with the full (39) predictor set. Model AUROCs were bootstrapped and compared, correcting the P-values for multiple comparisons with the Bonferroni method. Secondary outcomes were derived when varying sample size (1-100% of total) and predictor sets (39, 10, and 5) for each model. Analyses were repeated on 79,869 completed cases after multivariable imputation of predictors. A Support Vector Machine with a radial basis kernel developed on 39 predictors had the highest complete cases performance on the test set (AUROC = 0.845, PPV = 0.280, NPV = 0.966) outperforming Boosted C5.0 (0.845 vs. 0.841, P = 0.028) but not significantly higher than Logistic Regression or Random Forest. Models converged to the point of algorithm indifference with increased sample size and predictors. Using the top five predictors also produced good classifiers. Imputed analyses had slightly higher performance. 
Improved mortality prediction at hospital discharge after first MI is important for identifying high-risk individuals eligible for intensified treatment and care. All models performed accurately and similarly and because of the superior national coverage, the best model can potentially be used to better differentiate new patients, allowing for improved targeting of limited resources. Future research should focus on further model development and investigate possibilities for implementation.
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses in various tissues, respectively. We tested differential gene expression by an empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate the expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also showed that 12.8% and 59.0% of pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% of pairwise studies had independent expression association for genes, but no studies were observed to differ significantly in expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes in different tissues.
2006-07-01
ATM genetic variant identified affects radiosensitivity and levels of the protein encoded by the ATM gene for each mutation examined. 15. SUBJECT...women without breast cancer. An additional objective is to determine the functional impact upon the protein encoded by the ATM gene for each mutation ...each ATM variant identified affects radiosensitivity and levels of the protein encoded by the ATM gene for mutations identified. Body STATEMENT
Cystic fibrosis modifier genes.
Davies, Jane; Alton, Eric; Griesenbach, Uta
2005-01-01
Since the recognition that CFTR genotype was not a good predictor of pulmonary disease severity in CF, several candidate modifier genes have been identified. It is unlikely that a single modifier gene will be found, but more probable that several haplotypes in combination may contribute, which in itself presents a major methodological challenge. The aims of such studies are to increase our understanding of disease pathogenesis, to aid prognosis and ultimately to lead to the development of novel treatments. PMID:16025767
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
Triggers of Eating in Everyday Life
Tomiyama, A. Janet; Mann, Traci; Comer, Lisa
2009-01-01
Understanding the triggers of eating in everyday life is crucial for the creation of interventions to promote healthy eating and to prevent overeating. Here, the proximal predictors of eating are explored in a natural setting. Research from laboratory settings suggests that restrained eaters overeat after experiencing anxiety, distraction, and the presence of positive or negative moods, but not hunger; whereas the only factor that triggers eating in unrestrained eaters is hunger. In this study, 137 female participants reported hourly for two days on these potential predictors and their eating using electronic diaries, allowing us to establish the relationships between these factors while participants went about their normal daily activities. The main outcome variables were the number of servings eaten and whether or not food was eaten. Contrary to findings from laboratory settings, in everyday life restrained eaters (1) did not overeat in response to anxiety; (2) ate less in the presence of positive or negative moods; and (3) ate more in response to hunger. The relationships between these factors and eating among unrestrained eaters were closer to those found in laboratory settings. In conclusion, predictors of eating must be studied in everyday life to develop successful interventions. PMID:18773931
Yang, Yang; Fu, Xiaofeng; Qu, Wenhao; Xiao, Yiqun; Shen, Hong-Bin
2018-04-27
Benefiting from high-throughput experimental technologies, whole-genome analysis of microRNAs (miRNAs) has become increasingly common to uncover important regulatory roles of miRNAs and identify miRNA biomarkers for disease diagnosis. As complementary information to the high-throughput experimental data, domain knowledge like the Gene Ontology and KEGG pathway is usually used to guide gene function analysis. However, functional annotation for miRNAs is scarce in the public databases. Until now, only a few methods have been proposed for measuring the functional similarity between miRNAs based on public annotation data, and these methods cover a very limited number of miRNAs, so they are not applicable to large-scale miRNA analysis. In this paper, we propose a new method to measure the functional similarity for miRNAs, called miRGOFS, which has two notable features: I) it adopts a new GO semantic similarity metric which considers both common ancestors and descendants of GO terms; II) it computes similarity between GO sets in an asymmetric manner, and weights each GO term by its statistical significance. The miRGOFS-based predictor achieves an F1 of 61.2% on a benchmark data set of miRNA localization, and AUC values of 87.7% and 81.1% on two benchmark sets of miRNA-disease association, respectively. Compared with the existing functional similarity measurements of miRNAs, miRGOFS has the advantages of higher accuracy and larger coverage of human miRNAs (over 1000 miRNAs). Availability: http://www.csbio.sjtu.edu.cn/bioinf/MiRGOFS/. Contact: yangyang@cs.sjtu.edu.cn or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
Baker, Stuart G
2018-02-01
When using risk prediction models, an important consideration is weighing performance against the cost (monetary and harms) of ascertaining predictors. The minimum test tradeoff (MTT) for ruling out a model is the minimum number of all-predictor ascertainments per correct prediction to yield a positive overall expected utility. The MTT for ruling out an added predictor is the minimum number of added-predictor ascertainments per correct prediction to yield a positive overall expected utility. An approximation to the MTT for ruling out a model is $1/[P\,H(\mathrm{AUC}_{\mathrm{Model}})]$, where $H(\mathrm{AUC}) = \mathrm{AUC} - \{\tfrac{1}{2}(1-\mathrm{AUC})\}^{1/2}$, AUC is the area under the receiver operating characteristic (ROC) curve, and P is the probability of the predicted event in the target population. An approximation to the MTT for ruling out an added predictor is $1/[P\,\{H(\mathrm{AUC}_{\mathrm{Model\,2}}) - H(\mathrm{AUC}_{\mathrm{Model\,1}})\}]$, where Model 2 includes an added predictor relative to Model 1. The latter approximation requires the Tangent Condition that the true positive rate at the point on the ROC curve with a slope of 1 is larger for Model 2 than Model 1. These approximations are suitable for back-of-the-envelope calculations. For example, in a study predicting the risk of invasive breast cancer, Model 2 adds to the predictors in Model 1 a set of 7 single nucleotide polymorphisms (SNPs). Based on the AUCs and the Tangent Condition, an MTT of 7200 was computed, which indicates that 7200 sets of SNPs are needed for every correct prediction of breast cancer to yield a positive overall expected utility. If ascertaining the SNPs costs $500, this MTT suggests that SNP ascertainment is not likely worthwhile for this risk prediction.
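The two approximations in this abstract are simple enough for a back-of-the-envelope helper. The sketch below assumes the reading H(AUC) = AUC − √(½(1 − AUC)) of the formula as printed; the function names are hypothetical, and the code does not check the Tangent Condition, which the abstract states is required for the added-predictor approximation.

```python
from math import inf, sqrt


def h(auc):
    """H(AUC) = AUC - sqrt(0.5 * (1 - AUC)), as read from the abstract."""
    return auc - sqrt(0.5 * (1.0 - auc))


def mtt_model(event_probability, auc):
    """Approximate MTT for ruling out a model: 1 / [P * H(AUC)]."""
    denominator = event_probability * h(auc)
    # A non-positive denominator means no finite tradeoff exists.
    return 1.0 / denominator if denominator > 0 else inf


def mtt_added_predictor(event_probability, auc_model1, auc_model2):
    """Approximate MTT for ruling out an added predictor:
    1 / [P * (H(AUC_model2) - H(AUC_model1))].
    Assumes the Tangent Condition holds; it is not verified here."""
    denominator = event_probability * (h(auc_model2) - h(auc_model1))
    return 1.0 / denominator if denominator > 0 else inf
```

Under this reading, H(0.5) is 0, so a model with no discrimination yields an infinite MTT, and small AUC gains for a rare event give a large MTT, in line with the breast cancer example in the abstract.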
Teacher Instruction as a Predictor for Student Engagement and Disruptive Behaviors
ERIC Educational Resources Information Center
Scott, Terrance M.; Hirn, Regina G.; Alter, Peter J.
2014-01-01
Effective instruction is a critical predictor of student achievement. As students with exceptionalities such as emotional and behavioral disorders and learning disabilities, who typically struggle with academic achievement, spend increasing amounts of time in general education settings, the need for precise instructional behaviors becomes more imperative.…
Individual- and School-Level Predictors of Student Office Disciplinary Referrals
ERIC Educational Resources Information Center
Martinez, Andrew; McMahon, Susan D.; Treger, Stan
2016-01-01
Research has widely documented the over-representation of office disciplinary referrals (ODRs) among specific student groups (e.g., African American, boys). Despite extant research documenting individual-level predictors of ODRs, few studies have accounted for the nested structure of the settings in which these events occur. Guided by critical…
The Molecular Signatures Database (MSigDB) hallmark gene set collection.
Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo
2015-12-23
The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.
Sperm RNA elements as markers of health.
Burl, Rayanne B; Clough, Stephanie; Sendler, Edward; Estill, Molly; Krawetz, Stephen A
2018-02-01
Idiopathic infertility, an etiology not identified as part of standard clinical assessment, represents approximately 20% of all infertility cases. Current male infertility diagnosis focuses on the concentration, motility, and morphology of spermatozoa. This is of limited value when predicting birth success and of limited utility when selecting the optimum treatment. At fertilization, spermatozoa provide their genomic contribution, as well as a set of RNAs and proteins that have distinct roles in development. The potential of spermatozoal RNAs to be used as a prognostic of live birth has been shown [Jodar et al. (2015) Science Translational Medicine 7(295):295re6]. This relied on a set of 648 sperm RNA elements derived from 285 genes that are perhaps indicative of future health status. To address this tenet, the present study correlated the levels of each transcript among all samples to assess linkage between transcript absence, birth success, and possible disease association. Correlations between transcript levels of the 285 genes were analyzed amongst themselves, and within the context of the entire transcript population for these samples. The transcripts ACE, GIGYF2, and ODF2 had many negative correlations and form the majority of correlations, suggesting an important function for these transcripts. Eleven of the 285 queried genes had disease-associated variants within a sperm RNA element. Three genes, GPX4, NDRG1, and RPS24, had SREs that were absent in at least one individual from the test cohort. GPX4 and RPS24 are associated with developmental defects and/or neonatal lethality. This leaves the intriguing possibility that, while sperm RNAs delivered to the oocyte inform the success of live birth, they may also be predictors of human health.
GO: Gene Ontology; ART: assisted reproductive technology; IVF: in vitro fertilization; ICSI: intra-cytoplasmic sperm injection; RNA-seq: RNA-sequencing; TIC: timed intercourse; IUI: intrauterine insemination; SRE: sperm RNA elements; HPA: Human Protein Atlas; SMDS: sedaghatian-type spondylometaphyseal dysplasia; DBA: Diamond-Blackfan anemia; RPKM: reads per kilobase per million; TPM: transcripts per million; IPA: Ingenuity Pathway Analysis; OMIM: Online Mendelian Inheritance in Man.
ERIC Educational Resources Information Center
Pandey, Ghanshyam N.; Rizavi, Hooriyah S.; Dwivedi, Yogesh; Pavuluri, Mani N.
2008-01-01
The study determines the gene expression of brain-derived neurotrophic factor (BDNF) in the lymphocytes of subjects with pediatric bipolar disorder (PBD) before and during treatment with mood stabilizers and in drug-free normal control subjects. Results indicate the potential of BDNF levels as a biomarker for PBD and as a treatment predictor and…
Time-Course Gene Set Analysis for Longitudinal Gene Expression Data
Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe
2015-01-01
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets that were ultimately found to be linked with the influenza vaccine as well, although previous analyses had found them to be associated with the pneumococcal vaccine only. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374
Is geography an accurate predictor of evolutionary history in the millipede family Xystodesmidae?
Marek, Paul E.
2017-01-01
For the past several centuries, millipede taxonomists have used the morphology of male copulatory structures (modified legs called gonopods), which are strongly variable and suggestive of species-level differences, as a source to understand taxon relationships. Millipedes in the family Xystodesmidae are blind, dispersal-limited and have narrow habitat requirements. Therefore, geographical proximity may instead be a better predictor of evolutionary relationship than morphology, especially since gonopodal anatomy is extremely divergent and similarities may be masked by evolutionary convergence. Here we provide a phylogenetics-based test of the power of morphological versus geographical character sets for resolving phylogenetic relationships in xystodesmid millipedes. Molecular data from 90 species-group taxa in the family were included in a six-gene phylogenetic analysis to provide the basis for comparing trees generated from these alternative character sets. The molecular phylogeny was compared to topologies representing three hypotheses: (1) a prior classification formulated using morphological and geographical data, (2) hierarchical groupings derived from Euclidean geographical distance, and (3) one based solely on morphological data. Euclidean geographical distance was not found to be a better predictor of evolutionary relationship than the prior classification, the latter of which was the most similar to the molecular topology. However, all three of the alternative topologies were highly divergent (Bayes factor >10) from the molecular topology, with the tree inferred exclusively from morphology being the most divergent. The results of this analysis show that a high degree of morphological convergence from substantial gonopod shape divergence generated spurious phylogenetic relationships. These results indicate the impact that a high degree of morphological homoplasy may have had on prior treatments of the family. 
Using the results of our phylogenetic analysis, we make several changes to the classification of the family, including transferring the rare state-threatened species Sigmoria whiteheadi Shelley, 1986 to the genus Apheloria Chamberlin, 1921—a relationship not readily apparent based on morphology alone. We show that while gonopod differences are a premier source of taxonomic characters to diagnose species pairwise, the traits should be viewed critically as taxonomic features uniting higher levels. PMID:29038750
Response to Early AED Therapy and Its Prognostic Implications
French, Jacqueline A.
2002-01-01
Determining the prognosis of patients when they first present with epilepsy is a difficult task. Several clinical studies have shed light on this very important topic. Potential predictors of the refractory state, including seizure etiology, duration of epilepsy before treatment, and epilepsy type, have not been successful indicators of long-term outcome. One predictor of the refractory state appears to be early response to AED therapy. Inadequate seizure control after initial treatment is a poor prognostic sign. Recent research into genetic causes of the refractory state has included investigation of the multiple drug resistance gene, and polymorphisms at drug targets. More work is needed to determine the causes and predictors of drug resistance. PMID:15309146
Predictors of fibromyalgia: a population-based twin cohort study.
Markkula, Ritva A; Kalso, Eija A; Kaprio, Jaakko A
2016-01-15
Fibromyalgia (FM) is a pain syndrome, the mechanisms and predictors of which are still unclear. We have earlier validated a set of FM-symptom questions for detecting possible FM in an epidemiological survey and thereby identified a cluster with "possible FM". This study explores prospectively predictors for membership of that FM-symptom cluster. A population-based sample of 8343 subjects of the older Finnish Twin Cohort replied to health questionnaires in 1975, 1981, and 1990. Their answers to the set of FM-symptom questions in 1990 classified them in three latent classes (LC): LC1 with no or few symptoms, LC2 with some symptoms, and LC3 with many FM symptoms. We analysed putative predictors for these symptom classes using baseline (1975 and 1981) data on regional pain, headache, migraine, sleeping, body mass index (BMI), physical activity, smoking, and zygosity, adjusted for age, gender, and education. Those with a high likelihood of having fibromyalgia at baseline were excluded from the analysis. In the final multivariate regression model, regional pain, sleeping problems, and overweight were all predictors for membership in the class with many FM symptoms. The strongest non-genetic predictor was frequent headache (OR 8.6, CI 95% 3.8-19.2), followed by persistent back pain (OR 4.7, CI 95% 3.3-6.7) and persistent neck pain (OR 3.3, CI 95% 1.8-6.0). Regional pain, frequent headache, and persistent back or neck pain, sleeping problems, and overweight are predictors for having a cluster of symptoms consistent with fibromyalgia.
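For readers unfamiliar with the odds ratios and 95% confidence intervals quoted above, the sketch below shows how an unadjusted odds ratio and a Wald-type interval are computed from a 2x2 table. It is only an illustration under stated assumptions: the study's estimates come from a multivariate regression model adjusted for covariates, which this does not reproduce, and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from the twin cohort.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR {or_:.1f}, 95% CI {lo:.2f}-{hi:.2f}")
```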
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
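The gene set enrichment step mentioned above can be sketched with a simple over-representation test. The abstract does not state which statistic snpGeneSets uses, so the hypergeometric upper-tail test below is an assumption chosen for illustration, written in Python rather than as a call into the R package; the function name and parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: total number of genes considered
    n_in_set:   genes in the universe that belong to the gene set
    n_selected: genes flagged by the study (e.g. mapped from associated SNPs)
    n_overlap:  flagged genes that fall inside the gene set
    """
    if not (0 <= n_overlap <= min(n_in_set, n_selected) <= n_universe):
        raise ValueError("inconsistent counts")
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(hypergeom_enrichment_p(n_universe=10, n_in_set=5, n_selected=5, n_overlap=5))
```

In practice a p-value like this would be computed for each gene set and then corrected for multiple testing; that correction is omitted here.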
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
Turning publicly available gene expression data into discoveries using gene set context analysis.
Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai
2016-01-08
Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Learning style and concept acquisition of community college students in introductory biology
NASA Astrophysics Data System (ADS)
Bobick, Sandra Burin
This study investigated the influence of learning style on concept acquisition within a sample of community college students in a general biology course. There are two subproblems within the larger problem: (1) the influence of demographic variables (age, gender, number of college credits, prior exposure to scientific information) on learning style, and (2) the correlations between prior scientific knowledge, learning style and student understanding of the concept of the gene. The sample included all students enrolled in an introductory general biology course during two consecutive semesters at an urban community college. Initial data was gathered during the first week of the semester, at which time students filled in a short questionnaire (age, gender, number of college credits, prior exposure to science information either through reading/visual sources or a prior biology course). Subjects were then given the Inventory of Learning Processes-Revised (ILP-R), which measures general preferences in five learning styles: Deep Learning, Elaborative Learning, Agentic Learning, Methodical Learning and Literal Memorization. Subjects were then given the Gene Conceptual Knowledge pretest: a 15 question objective section and an essay section. Subjects were exposed to specific concepts during lecture and laboratory exercises. At the last lab, students were given the Genetics Conceptual Knowledge Posttest. Pretest/posttest gains were correlated with demographic variables and learning styles were analyzed for significant correlations. Learning styles, as the independent variable in a simultaneous multiple regression, were significant predictors of results on the gene assessment tests, including pretest, posttest and gain. Of the learning styles, Deep Learning accounted for the greatest positive predictive value of pretest essay and pretest objective results. Literal Memorization was a significant negative predictor for posttest essay, essay gain and objective gain.
Simultaneous multiple regression indicated that demographic variables were significant positive predictors for Methodical, Deep and Elaborative Learning Styles. Stepwise multiple regression resulted in number of credits, Read Science and gender (female) as significant predictors of learning styles. The findings of this study emphasize the importance of learning styles in conceptual understanding of the gene and the correlation of nonformal exposure to science information with learning style and conceptual understanding.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis
Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...
2014-01-01
Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
Rajkumar, A P; Poonkuzhali, B; Kuruvilla, A; Srivastava, A; Jacob, M; Jacob, K S
2012-12-01
Pharmacogenetics of schizophrenia has not yet delivered anticipated clinical dividends. Clinical heterogeneity of schizophrenia contributes to the poor replication of the findings of pharmacogenetic association studies. Functionally important HTR3A gene single-nucleotide polymorphisms (SNPs) were reported to be associated with response to clozapine. The aim of this study was to investigate how the association between HTR3A gene SNP and response to clozapine is influenced by various clinical predictors and by differing outcome definitions in patients with treatment-resistant schizophrenia (TRS). We recruited 101 consecutive patients with TRS, on stable doses of clozapine, and evaluated their HTR3A gene SNP (rs1062613 and rs2276302), psychopathology, and serum clozapine levels. We assessed their socio-demographic and clinical profiles, premorbid adjustment, traumatic events, cognition, and disability using standard assessment schedules. We evaluated their response to clozapine, by employing six differing outcome definitions. We employed appropriate multivariate statistics to calculate allelic and genotypic association, accounting for the effects of various clinical variables. T allele of rs1062613 and G allele of rs2276302 were significantly associated with good clinical response to clozapine (p = 0.02). However, varying outcome definitions make these associations inconsistent. rs1062613 and rs2276302 could explain only 13.8 % variability in the responses to clozapine, while combined clinical predictors and HTR3A pharmacogenetic association model could explain 38 % variability. We demonstrated that the results of pharmacogenetic studies in schizophrenia depend heavily on their outcome definitions and that combined clinical and pharmacogenetic models have better predictive values. Future pharmacogenetic studies should employ multiple outcome definitions and should evaluate associated clinical variables.
Premzl, Marko
2015-01-01
Using the eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and D-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks of future experiments of two eutherian gene data sets were proposed. PMID:25941635
Boosting for detection of gene-environment interactions.
Pashova, H; LeBlanc, M; Kooperberg, C
2013-01-30
In genetic association studies, it is typically thought that genetic variants and environmental variables jointly will explain more of the inheritance of a phenotype than either of these two components separately. Traditional methods to identify gene-environment interactions typically consider only one measured environmental variable at a time. However, in practice, multiple environmental factors may each be imprecise surrogates for the underlying physiological process that actually interacts with the genetic factors. In this paper, we develop a variant of L(2) boosting that is specifically designed to identify combinations of environmental variables that jointly modify the effect of a gene on a phenotype. Because the effect modifiers might have a small signal compared with the main effects, working in a space that is orthogonal to the main predictors allows us to focus on the interaction space. In a simulation study that investigates some plausible underlying model assumptions, our method outperforms the least absolute shrinkage and selection operator (LASSO), Akaike Information Criterion, and Bayesian Information Criterion model selection procedures, achieving the lowest test error. In an example for the Women's Health Initiative-Population Architecture using Genomics and Epidemiology study, the dedicated boosting method was able to pick out two single-nucleotide polymorphisms for which effect modification appears present. The performance was evaluated on an independent test set, and the results are promising. Copyright © 2012 John Wiley & Sons, Ltd.
Training set selection for the prediction of essential genes.
Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng
2014-01-01
Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as the training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited a remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.
Excavating Culture: Ethnicity and Context as Predictors of Parenting Behavior
ERIC Educational Resources Information Center
Hill, Nancy E.; Tyson, Diana F.
2008-01-01
Ethnic, socioeconomic, and contextual predictors of parenting and family socialization practices were examined among African American and European American families. This is one of a set of coordinated studies presented in this special issue (Le et al.). With the goal of sampling African American and European American children and families that…
Significant Predictors for Effectiveness of Blended Learning in a Language Course
ERIC Educational Resources Information Center
Wichadee, Saovapa
2018-01-01
A wide variety of technologies combined with traditional classroom methods can make learning easier in the digital age. This paper studied undergraduate students' learning performance and satisfaction after they had studied in a blended setting and investigated if variables of learner characteristics and course features would be predictors for…
Predictors of Employment and Postsecondary Education of Youth with Autism
ERIC Educational Resources Information Center
Migliore, Alberto; Timmons, Jaimie; Butterworth, John; Lugas, Jaime
2012-01-01
Using logistic and multiple regressions, the authors investigated predictors of employment and postsecondary education outcomes of youth with autism in the Vocational Rehabilitation Program. Data were obtained from the RSA911 data set, fiscal year 2008. Findings showed that the odds of gaining employment were greater for youth who received job…
ERIC Educational Resources Information Center
Crowson, H. Michael; Brandes, Joyce A.
2014-01-01
This study addressed predictors of pre-service teachers' opposition toward the practice of educating students with disabilities in mainstream classroom settings--a practice known as inclusion. We tested a hypothesized path model that incorporated social dominance orientation (SDO) and contact as distal predictors, and intergroup anxiety,…
Longitudinal Predictors of Achievement: Achievement History, Family Environment, and Mental Health.
ERIC Educational Resources Information Center
Petersen, Anne C.; Kellam, Sheppard G.
In this seven-year longitudinal study, predictors of achievement for first graders were measured against actual school achievement of the same students in the seventh and eighth grades. Three sets of variables were obtained in the first grade. Achievement history, family environment, and mental health were used as measures. Mental health was…
Predictors of Paternal and Maternal Controlling Feeding Practices with 2- to 5-Year-Old Children
ERIC Educational Resources Information Center
Haycraft, Emma; Blissett, Jackie
2012-01-01
Objective: This study aimed to identify predictors of controlling feeding practices in both mothers and fathers of young children. Design: Cross-sectional, questionnaire design. Setting: Nursery schools within the United Kingdom recruited participants. Participants: Ninety-six mothers and fathers comprising 48 mother-father pairs of male and…
ERIC Educational Resources Information Center
Shyman, Eric
2010-01-01
The purpose of this preliminary study was to identify predictors of emotional exhaustion among special education paraeducators. A sample of 100 paraeducators in public and specialized alternative setting schools was used to determine whether self-reported levels of emotional exhaustion and other job-related factors were reported. Using…
Predictors of Career Adaptability Skill among Higher Education Students in Nigeria
ERIC Educational Resources Information Center
Ebenehi, Amos Shaibu; Rashid, Abdullah Mat; Bakar, Ab Rahim
2016-01-01
This paper examined predictors of career adaptability skill among higher education students in Nigeria. A sample of 603 higher education students randomly selected from six colleges of education in Nigeria participated in this study. A self-report questionnaire was used for data collection, and multiple linear regression analysis was used…
Intelligent Predictor of Energy Expenditure with the Use of Patch-Type Sensor Module
Li, Meina; Kwak, Keun-Chang; Kim, Youn-Tae
2012-01-01
This paper is concerned with an intelligent predictor of energy expenditure (EE) using a developed patch-type sensor module for wireless monitoring of heart rate (HR) and movement index (MI). For this purpose, an intelligent predictor is designed by an advanced linguistic model (LM) with interval prediction based on fuzzy granulation that can be realized by context-based fuzzy c-means (CFCM) clustering. The system components consist of a sensor board, the rubber case, and the communication module with built-in analysis algorithm. This sensor is patched onto the user's chest to obtain physiological data in indoor and outdoor environments. The prediction performance was demonstrated by root mean square error (RMSE). The prediction performance was obtained as the number of contexts and clusters increased from 2 to 6, respectively. Thirty participants were recruited from Chosun University to take part in this study. The data sets were recorded during normal walking, brisk walking, slow running, and jogging in an outdoor environment and treadmill running in an indoor environment, respectively. We randomly divided the data set into training (60%) and test data set (40%) in the normalized space during 10 iterations. The training data set is used for model construction, while the test set is used for model validation. The experimental results revealed that the prediction error on treadmill running simulation was improved by about 51% and 12% in comparison to conventional LM for training and checking data set, respectively. PMID:23202166
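The evaluation protocol described above (a random 60%/40% training/test split repeated over 10 iterations, scored by RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions: the placeholder predictor simply returns the training mean and stands in for the paper's linguistic model, which is not reproduced here, and all function names and the example data are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def split_indices(n, train_fraction=0.6, rng=None):
    """Shuffle 0..n-1 and split into training and test index lists."""
    rng = rng or random.Random()
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def repeated_holdout(x, y, fit, n_iter=10, seed=0):
    """Average test RMSE over n_iter random 60/40 splits.

    fit(x_train, y_train) must return a callable mapping one input to a prediction.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        train, test = split_indices(len(x), 0.6, rng)
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = [model(x[i]) for i in test]
        scores.append(rmse([y[i] for i in test], preds))
    return sum(scores) / len(scores)


def fit_mean(x_train, y_train):
    """Placeholder predictor (an assumption): always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda _x: mean


if __name__ == "__main__":
    # Hypothetical heart-rate inputs and energy-expenditure targets.
    hr = [70, 85, 95, 110, 120, 135, 150, 160, 170, 180]
    ee = [1.2, 2.0, 2.6, 3.9, 4.5, 5.8, 7.1, 8.0, 8.8, 9.9]
    print(repeated_holdout(hr, ee, fit_mean))
```

Any model exposing the same `fit` interface could be swapped in for `fit_mean` to compare test errors under the same splits.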
Comparison of molecular breeding values based on within- and across-breed training in beef cattle
2013-01-01
Background Although the efficacy of genomic predictors based on within-breed training looks promising, it is necessary to develop and evaluate across-breed predictors for the technology to be fully applied in the beef industry. The efficacies of genomic predictors trained in one breed and utilized to predict genetic merit in differing breeds based on simulation studies have been reported, as have the efficacies of predictors trained using data from multiple breeds to predict the genetic merit of purebreds. However, comparable studies using beef cattle field data have not been reported. Methods Molecular breeding values for weaning and yearling weight were derived and evaluated using a database containing BovineSNP50 genotypes for 7294 animals from 13 breeds in the training set and 2277 animals from seven breeds (Angus, Red Angus, Hereford, Charolais, Gelbvieh, Limousin, and Simmental) in the evaluation set. Six single-breed and four across-breed genomic predictors were trained using pooled data from purebred animals. Molecular breeding values were evaluated using field data, including genotypes for 2227 animals and phenotypic records of animals born in 2008 or later. Accuracies of molecular breeding values were estimated based on the genetic correlation between the molecular breeding value and trait phenotype. Results With one exception, the estimated genetic correlations of within-breed molecular breeding values with trait phenotype were greater than 0.28 when evaluated in the breed used for training. Most estimated genetic correlations for the across-breed trained molecular breeding values were moderate (> 0.30). When molecular breeding values were evaluated in breeds that were not in the training set, estimated genetic correlations clustered around zero. Conclusions Even for closely related breeds, within- or across-breed trained molecular breeding values have limited prediction accuracy for breeds that were not in the training set. 
For breeds in the training set, across- and within-breed trained molecular breeding values had similar accuracies. The benefit of adding data from other breeds to a within-breed training population is the ability to produce molecular breeding values that are more robust across breeds and these can be utilized until enough training data has been accumulated to allow for a within-breed training set. PMID:23953034
Maesato, Akira; Higa, Satoshi; Lin, Yenn-Jiang; Chinen, Ichiro; Ishigaki, Sugako; Yajima, Machiko; Masuzaki, Hiroaki; Chen, Shih-Ann
2011-01-01
Predictors of T wave oversensing with implantable cardioverter-defibrillator (ICD) systems remain to be clarified. Thirteen consecutive patients who underwent ICD implantations were included. The depolarization (R) and repolarization (T) of bipolar electrograms during baseline, AAI and DDD modes, and an isoproterenol (ISO) infusion were evaluated. The R wave amplitude during DDD was significantly lower than that during the other conditions in all high-pass filter settings. In contrast, there was no significant difference in the T wave amplitude during DDD as compared to the other conditions. With DDD, there was a significantly higher incidence of a T/R ratio greater than 0.25 than with the other conditions. T wave amplitude in Brugada syndrome was significantly higher than that in non-Brugada syndrome. The existence of Brugada syndrome and the T/R ratio during AAI with a high-pass filter setting of 10/20 Hz were excellent predictors of T wave oversensing in the follow-up period. DDD significantly reduced the R wave amplitude, and the T/R ratio during AAI can be a predictor of T wave oversensing. These findings have important implications for inappropriate shocks due to T wave oversensing.
Sex-specific predictors of inpatient rehabilitation outcomes after traumatic brain injury
Chan, Vincy; Mollayeva, Tatyana; Ottenbacher, Kenneth J.; Colantonio, Angela
2016-01-01
Objective To identify sex-specific predictors of inpatient rehabilitation outcomes among patients with a traumatic brain injury (TBI) from a population based perspective. Design Retrospective cohort study Setting Ontario, Canada Participants Patients in inpatient rehabilitation for a TBI within one year of acute care discharge between 2008/09 and 2011/12 (N=1,730, 70% male, 30% female). Interventions None Main Outcome Measures Inpatient rehabilitation length of stay, total Functional Independence Measure (FIM™) score, and motor and cognitive FIM™ ratings at discharge. Results Sex, as a covariate in multivariable linear regression models, was not a significant predictor of rehabilitation outcomes. While many of the predictors examined were similar across males and females, sex-specific multivariable models identified some predictors of rehabilitation outcome that are specific for males and females; mechanism of injury (p<.0001) was a significant predictor of functional outcome only among females while comorbidities (p<.0001) was a significant predictor for males only. Conclusions Predictors of outcomes after inpatient rehabilitation differed by sex, providing evidence for a sex-specific approach in planning and resource allocation for inpatient rehabilitation services for patients with TBI. PMID:26836952
Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.
2016-01-01
Purpose To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967
Adaptation of clinical prediction models for application in local settings.
Kappen, Teus H; Vergouwe, Yvonne; van Klei, Wilton A; van Wolfswinkel, Leo; Kalkman, Cor J; Moons, Karel G M
2012-01-01
When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix than the original validation population may result in inaccurate risk predictions. We aimed to demonstrate how clinical information can direct the updating of a prediction model and the development of a strategy for handling missing predictor values in clinical practice. A previously derived and validated prediction model for postoperative nausea and vomiting was updated using a data set of 1847 patients. The update consisted of 1) changing the definition of an existing predictor, 2) reestimating the regression coefficient of a predictor, and 3) adding a new predictor to the model. The updated model was then validated in a new series of 3822 patients. Furthermore, several imputation models were considered to handle real-time missing values, so that possible missing predictor values could be anticipated during actual model use. Differences in clinical practice between our local population and the original derivation population guided the update strategy of the prediction model. The predictive accuracy of the updated model was better (c statistic, 0.68; calibration slope, 1.0) than that of the original model (c statistic, 0.62; calibration slope, 0.57). Inclusion of logistical variables in the imputation models, besides observed patient characteristics, contributed to a strategy to deal with missing predictor values at the time of risk calculation. Extensive knowledge of local, clinical processes provides crucial information to guide the process of adapting a prediction model to new clinical practices.
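The c statistic quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition for a binary outcome: the share of event/non-event patient pairs in which the patient with the event received the higher predicted risk, with ties counted as half. The function name and the toy data are hypothetical and are not taken from the study.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (c) statistic for a binary outcome.

    Returns the fraction of event/non-event pairs in which the event
    received the higher predicted risk; tied risks count as half.
    """
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (hypothetical): four patients, two of whom had the outcome.
risks = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
print(c_statistic(risks, observed))  # 0.75: three of four pairs are concordant
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is the scale on which the reported 0.62 and 0.68 should be read.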
Modeling Predictors of Duties Not Including Flying Status.
Tvaryanas, Anthony P; Griffith, Converse
2018-01-01
The purpose of this study was to reuse available datasets to conduct an analysis of potential predictors of U.S. Air Force aircrew nonavailability in terms of being in "duties not to include flying" (DNIF) status. This study was a retrospective cohort analysis of U.S. Air Force aircrew on active duty during the period from 2003-2012. Predictor variables included age, Air Force Specialty Code (AFSC), clinic location, diagnosis, gender, pay grade, and service component. The response variable was DNIF duration. Nonparametric methods were used for the exploratory analysis and parametric methods were used for model building and statistical inference. Out of a set of 783 potential predictor variables, 339 variables were identified from the nonparametric exploratory analysis for inclusion in the parametric analysis. Of these, 54 variables had significant associations with DNIF duration in the final model fitted to the validation data set. The predicted results of this model for DNIF duration had a correlation of 0.45 with the actual number of DNIF days. Predictor variables included age, 6 AFSCs, 7 clinic locations, and 40 primary diagnosis categories. Specific demographic (i.e., age), occupational (i.e., AFSC), and health (i.e., clinic location and primary diagnosis category) DNIF drivers were identified. Subsequent research should focus on the application of primary, secondary, and tertiary prevention measures to ameliorate the potential impact of these DNIF drivers where possible. Tvaryanas AP, Griffith C Jr. Modeling predictors of duties not including flying status. Aerosp Med Hum Perform. 2018; 89(1):52-57.
Rong, Panying; Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L; Berry, James D; Perry, Bridget; Green, Jordan R
2016-01-01
To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred.
Positive-unlabeled learning for disease gene identification
Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong
2012-01-01
Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. The weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. 
Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/∼xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:22923290
ERIC Educational Resources Information Center
Propper, Cathi; Moore, Ginger A.; Mills-Koonce, W. Roger; Halpern, Carolyn Tucker; Hill-Soderlund, Ashley L.; Calkins, Susan D.; Carbone, Mary Anna; Cox, Martha
2008-01-01
This study investigated dopamine receptor genes ("DRD2" and "DRD4") and maternal sensitivity as predictors of infant respiratory sinus arrhythmia (RSA) and RSA reactivity, purported indices of vagal tone and vagal regulation, in a challenge task at 3, 6, and 12 months in 173 infant-mother dyads. Hierarchical linear modeling (HLM) revealed that at…
Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi
2016-01-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi
2015-11-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.
Computation and application of tissue-specific gene set weights.
Frost, H Robert
2018-04-06
Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which impacts statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses are publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/∼hrfrost/TissueSpecificGeneSets. Contact: rob.frost@dartmouth.edu.
NASA Astrophysics Data System (ADS)
Hofer, Marlis; Nemec, Johanna
2016-04-01
This study presents first steps towards verifying the hypothesis that uncertainty in global and regional glacier mass simulations can be reduced considerably by reducing the uncertainty in the high-resolution atmospheric input data. To this aim, we systematically explore the potential of different predictor strategies for improving the performance of regression-based downscaling approaches. The investigated local-scale target variables are precipitation, air temperature, wind speed, relative humidity and global radiation, all at a daily time scale. Observations of these target variables are assessed from three sites in geo-environmentally and climatologically very distinct settings, all within highly complex topography and in close proximity to mountain glaciers: (1) the Vernagtbach station in the Northern European Alps (VERNAGT), (2) the Artesonraju measuring site in the tropical South American Andes (ARTESON), and (3) the Brewster measuring site in the Southern Alps of New Zealand (BREWSTER). As the large-scale predictors, ERA-Interim reanalysis data are used. In the applied downscaling model training and evaluation procedures, particular emphasis is put on appropriately accounting for the pitfalls of limited and/or patchy observation records that are usually the only (if at all) available data from the glacierized mountain sites. Generalized linear models and beta regression are investigated as alternatives to ordinary least squares regression for the non-Gaussian target variables. 
By analyzing results for the three different sites, five predictands and for different times of the year, we look for systematic improvements in the downscaling models' skill specifically obtained by (i) using predictor data at the optimum scale rather than the minimum scale of the reanalysis data, (ii) identifying the optimum predictor allocation in the vertical, and (iii) considering multiple (variable, level and/or grid point) predictor options combined with state-of-the-art empirical feature selection tools. First results show that, in particular for air temperature, downscaling models based on direct predictor selection show skill comparable to that of models based on multiple predictors. For all other target variables, however, multiple predictor approaches can considerably outperform those models based on single predictors. Including multiple variable types emerges as the most promising predictor option (in particular for wind speed at all sites), even if the same predictor set is used across the different cases.
Predicting Response to Histone Deacetylase Inhibitors Using High-Throughput Genomics.
Geeleher, Paul; Loboda, Andrey; Lenkala, Divya; Wang, Fan; LaCroix, Bonnie; Karovic, Sanja; Wang, Jacqueline; Nebozhyn, Michael; Chisamore, Michael; Hardwick, James; Maitland, Michael L; Huang, R Stephanie
2015-11-01
Many disparate biomarkers have been proposed as predictors of response to histone deacetylase inhibitors (HDI); however, all have failed when applied clinically. Rather than this being entirely an issue of reproducibility, response to the HDI vorinostat may be determined by the additive effect of multiple molecular factors, many of which have previously been demonstrated. We conducted a large-scale gene expression analysis using the Cancer Genome Project for discovery and generated another large independent cancer cell line dataset across different cancers for validation. We compared different approaches in terms of how accurately vorinostat response can be predicted on an independent out-of-batch set of samples and applied the polygenic marker prediction principles in a clinical trial. Using machine learning, the small effects that aggregate, resulting in sensitivity or resistance, can be recovered from gene expression data in a large panel of cancer cell lines. This approach can predict vorinostat response accurately, whereas single gene or pathway markers cannot. Our analyses recapitulated and contextualized many previous findings and suggest an important role for processes such as chromatin remodeling, autophagy, and apoptosis. As a proof of concept, we also discovered a novel causative role for CHD4, a helicase involved in the histone deacetylase complex that is associated with poor clinical outcome. As a clinical validation, we demonstrated that a common dose-limiting toxicity of vorinostat, thrombocytopenia, can be predicted (r = 0.55, P = .004) several days before it is detected clinically. Our work suggests a paradigm shift from single-gene/pathway evaluation to simultaneously evaluating multiple independent high-throughput gene expression datasets, which can be easily extended to other investigational compounds where similar issues are hampering clinical adoption. © The Author 2015. Published by Oxford University Press. All rights reserved. 
Transcriptomics of cortical gray matter thickness decline during normal aging
Kochunov, P; Charlesworth, J; Winkler, A; Hong, LE; Nichols, T; Curran, JE; Sprooten, E; Jahanshad, N; Thompson, PM; Johnson, MP; Kent, JW; Landman, BA; Mitchell, B; Cole, SA; Dyer, TD; Moses, EK; Goring, HHH; Almasy, L; Duggirala, R; Olvera, RL; Glahn, DC; Blangero, J
2013-01-01
Introduction We performed a whole-transcriptome correlation analysis, followed by the pathway enrichment and testing of innate immune response pathways analyses to evaluate the hypothesis that transcriptional activity can predict cortical gray matter thickness (GMT) variability during normal cerebral aging. Methods Transcriptome and GMT data were available for 379 individuals (age range=28–85) community-dwelling members of large extended Mexican-American families. Collection of transcriptome data preceded that of neuroimaging data by 17 years. Genome-wide gene transcriptome data consisted of 20,413 heritable lymphocytes-based transcripts. GMT measurements were performed from high-resolution (isotropic 800µm) T1-weighted MRI. Transcriptome-wide and pathway enrichment analysis was used to classify genes correlated with GMT. Transcripts for sixty genes from seven innate immune pathways were tested as specific predictors of GMT variability. Results Transcripts for eight genes (IGFBP3, LRRN3, CRIP2, SCD, IDS, TCF4, GATA3, HN1) passed the transcriptome-wide significance threshold. Four orthogonal factors extracted from this set predicted 31.9% of the variability in the whole-brain and between 23.4 and 35% of regional GMT measurements. Pathway enrichment analysis identified six functional categories including cellular proliferation, aggregation, differentiation, viral infection, and metabolism. The integrin signaling pathway was significantly (p<10^-6) enriched with GMT. Finally, three innate immune pathways (complement signaling, toll-receptors and scavenger and immunoglobulins) were significantly associated with GMT. Conclusion Expression activity for the genes that regulate cellular proliferation, adhesion, differentiation and inflammation can explain a significant proportion of individual variability in cortical GMT. Our findings suggest that normal cerebral aging is the product of a progressive decline in regenerative capacity and increased neuroinflammation. 
PMID:23707588
Transcriptomics of cortical gray matter thickness decline during normal aging.
Kochunov, P; Charlesworth, J; Winkler, A; Hong, L E; Nichols, T E; Curran, J E; Sprooten, E; Jahanshad, N; Thompson, P M; Johnson, M P; Kent, J W; Landman, B A; Mitchell, B; Cole, S A; Dyer, T D; Moses, E K; Goring, H H H; Almasy, L; Duggirala, R; Olvera, R L; Glahn, D C; Blangero, J
2013-11-15
We performed a whole-transcriptome correlation analysis, followed by the pathway enrichment and testing of innate immune response pathway analyses to evaluate the hypothesis that transcriptional activity can predict cortical gray matter thickness (GMT) variability during normal cerebral aging. Transcriptome and GMT data were available for 379 individuals (age range=28-85) community-dwelling members of large extended Mexican American families. Collection of transcriptome data preceded that of neuroimaging data by 17 years. Genome-wide gene transcriptome data consisted of 20,413 heritable lymphocytes-based transcripts. GMT measurements were performed from high-resolution (isotropic 800 μm) T1-weighted MRI. Transcriptome-wide and pathway enrichment analysis was used to classify genes correlated with GMT. Transcripts for sixty genes from seven innate immune pathways were tested as specific predictors of GMT variability. Transcripts for eight genes (IGFBP3, LRRN3, CRIP2, SCD, IDS, TCF4, GATA3, and HN1) passed the transcriptome-wide significance threshold. Four orthogonal factors extracted from this set predicted 31.9% of the variability in the whole-brain and between 23.4 and 35% of regional GMT measurements. Pathway enrichment analysis identified six functional categories including cellular proliferation, aggregation, differentiation, viral infection, and metabolism. The integrin signaling pathway was significantly (p<10^-6) enriched with GMT. Finally, three innate immune pathways (complement signaling, toll-receptors and scavenger and immunoglobulins) were significantly associated with GMT. Expression activity for the genes that regulate cellular proliferation, adhesion, differentiation and inflammation can explain a significant proportion of individual variability in cortical GMT. Our findings suggest that normal cerebral aging is the product of a progressive decline in regenerative capacity and increased neuroinflammation. Copyright © 2013 Elsevier Inc. 
All rights reserved.
Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.
Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L
2016-02-19
Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73% of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those whose treatment did not (p = 0.0006, Log Rank), and this predictor was significantly associated with longer disease-free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.
GO-based functional dissimilarity of gene sets.
Díaz-Díaz, Norberto; Aguilar-Ruiz, Jesús S
2011-09-01
The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.
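The pairwise style of coherence measure that the abstract above contrasts GFD with can be sketched as follows. This is not GFD itself, and it is not the Resnik, Wang or GS2 measure: it is a simplified stand-in that assumes each gene carries a flat set of annotation terms and uses the Jaccard distance between those sets. The gene names and term labels are hypothetical placeholders.

```python
from itertools import combinations


def jaccard_distance(terms_a, terms_b):
    """1 - |intersection| / |union| of two annotation-term sets."""
    union = terms_a | terms_b
    if not union:
        return 0.0  # two unannotated genes are treated as identical
    return 1.0 - len(terms_a & terms_b) / len(union)


def mean_pairwise_distance(annotations):
    """Average distance over every pair of genes in the set."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene set: two genes share all terms, the third shares none.
gene_terms = {
    "geneA": {"term1", "term2"},
    "geneB": {"term1", "term2"},
    "geneC": {"term3"},
}
print(mean_pairwise_distance(gene_terms))
```

Because every pair contributes equally, a single unrelated gene raises the score of an otherwise coherent set, which is one motivation for measures that instead summarize the most relevant functions of the whole set.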
Pluess, Andrea R; Frank, Aline; Heiri, Caroline; Lalagüe, Hadrien; Vendramin, Giovanni G; Oddou-Muratorio, Sylvie
2016-04-01
The evolutionary potential of long-lived species, such as forest trees, is fundamental for their local persistence under climate change (CC). Genome-environment association (GEA) analyses reveal if species in heterogeneous environments at the regional scale are under differential selection resulting in populations with potential preadaptation to CC within this area. In 79 natural Fagus sylvatica populations, neutral genetic patterns were characterized using 12 simple sequence repeat (SSR) markers, and genomic variation (144 single nucleotide polymorphisms (SNPs) out of 52 candidate genes) was related to 87 environmental predictors in the latent factor mixed model, logistic regressions and isolation by distance/environmental (IBD/IBE) tests. SSR diversity revealed relatedness at up to 150 m intertree distance but an absence of large-scale spatial genetic structure and IBE. In the GEA analyses, 16 SNPs in 10 genes responded to one or several environmental predictors and IBE, corrected for IBD, was confirmed. The GEA often reflected the proposed gene functions, including indications for adaptation to water availability and temperature. Genomic divergence and the lack of large-scale neutral genetic patterns suggest that gene flow allows the spread of advantageous alleles in adaptive genes. Thereby, adaptation processes are likely to take place in species occurring in heterogeneous environments, which might reduce their regional extinction risk under CC. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Ynalvez, Ruby; Garza-Gongora, Claudia; Ynalvez, Marcus Antonius; Hara, Noriko
2014-01-01
Although doctoral mentors recognize the benefits of providing quality advisement and close guidance, those of sharing project management responsibilities with mentees are still not well recognized. We observed that mentees, who have the opportunity to co-manage projects, generate more written output. Here we examine the link between research productivity, doctoral mentoring practices (DMP), and doctoral research experiences (DRE) of mentees in programs in the non-West. Inspired by previous findings that early career productivity is a strong predictor of later productivity, we examine the research productivity of 210 molecular biology doctoral students in selected programs in Japan, Singapore, and Taiwan. Using principal component (PC) analysis, we derive two sets of PCs: one set from 15 DMP and another set from 16 DRE items. We model research productivity using Poisson and negative-binomial regression models with these sets as predictors. Our findings suggest a need to re-think extant practices and to allocate resources toward professional career development in training future scientists. We contend that doctoral science training must not only be an occasion for future scientists to learn scientific and technical skills, but it must also be the opportunity to experience, to acquire, and to hone research management skills. © 2014 The International Union of Biochemistry and Molecular Biology.
Prediction of epigenetically regulated genes in breast cancer cell lines.
Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria E H; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram
2010-06-04
Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation. 
Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
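The permutation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a plain Pearson correlation stands in for their logistic-regression ranking, and the function names, the toy methylation and expression values, and the choice of 2000 permutations are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(methylation, expression, n_perm=2000, seed=0):
    """One-sided permutation p-value for a negative correlation.

    Expression values are shuffled across cell lines; the p-value is the
    share of shuffles whose correlation is at least as negative as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(methylation, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical values for one methylation cluster / gene pair in 8 cell lines.
toy_methylation = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_expression = [1.0, 1.5, 2.1, 2.4, 3.9, 4.2, 5.0, 5.5]
r, p = permutation_p_value(toy_methylation, toy_expression)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A gene whose expression falls as promoter methylation rises gives a strongly negative r and a small p-value, which is the kind of methylation-expression association the abstract reports ranking.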
Bostrom, Meredith A.; Kao, W.H. Linda; Li, Man; Abboud, Hanna E.; Adler, Sharon G.; Iyengar, Sudha K.; Kimmel, Paul L.; Hanson, Robert L.; Nicholas, Susanne B.; Rasooly, Rebekah S.; Sedor, John R.; Coresh, Josef; Kohn, Orly F.; Leehey, David J.; Thornley-Brown, Denyse; Bottinger, Erwin P.; Lipkowitz, Michael S.; Meoni, Lucy A.; Klag, Michael J.; Lu, Lingyi; Hicks, Pamela J.; Langefeld, Carl D.; Parekh, Rulan S.; Bowden, Donald W.; Freedman, Barry I.
2011-01-01
Background African Americans (AAs) have increased susceptibility to non-diabetic nephropathy relative to European Americans. Study Design Follow-up of a pooled genome-wide association study (GWAS) in AA dialysis patients with nondiabetic nephropathy; novel gene-gene interaction analyses. Setting & Participants Wake Forest sample: 962 AA nondiabetic nephropathy cases; 931 non-nephropathy controls. Replication sample: 668 Family Investigation of Nephropathy and Diabetes (FIND) AA nondiabetic nephropathy cases; 804 non-nephropathy controls. Predictors Individual genotyping of top 1420 pooled GWAS-associated single nucleotide polymorphisms (SNPs) and 54 SNPs in six nephropathy susceptibility genes. Outcomes APOL1 genetic association and additional candidate susceptibility loci interacting with, or independently from, APOL1. Results The strongest GWAS associations included two non-coding APOL1 SNPs, rs2239785 (odds ratio [OR], 0.33; dominant; p = 5.9 × 10^−24) and rs136148 (OR, 0.54; additive; p = 1.1 × 10^−7) with replication in FIND (p = 5.0 × 10^−21 and 1.9 × 10^−5, respectively). Rs2239785 remained significantly associated after controlling for the APOL1 G1 and G2 coding variants. Additional top hits included a CFH SNP (OR from meta-analysis in above 3367 AA cases and controls, 0.81; additive; p = 6.8 × 10^−4). The 1420 SNPs were tested for interaction with APOL1 G1 and G2 variants. Several interactive SNPs were detected; the most significant was rs16854341 in the podocin gene (NPHS2) (p = 0.0001). Limitations Non-pooled GWAS have not been performed in AA nondiabetic nephropathy. Conclusions This follow-up of a pooled GWAS provides additional and independent evidence that APOL1 variants contribute to nondiabetic nephropathy in AAs and identified additional associated and interactive non-diabetic nephropathy susceptibility genes. PMID:22119407
van Manen, Janine G; Andrea, Helene; van den Eijnden, Ellen; Meerman, Anke M M A; Thunnissen, Moniek M; Hamers, Elisabeth F M; Huson, Nelleke; Ziegler, Uli; Stijnen, Theo; Busschbach, Jan J V; Timman, Reinier; Verheul, Roel
2011-10-01
Within a large multi-center study in patients with personality disorders, we investigated the relationship between patient characteristics and treatment allocation. Personality pathology, symptom distress, treatment history, motivational factors, and sociodemographics were measured at intake in 923 patients, who subsequently enrolled in short-term or long-term outpatient, day hospital, or inpatient psychotherapy for personality pathology. Logistic regressions were used to examine the predictors of allocation decisions. We found a moderate relationship (R² = 0.36) between patient characteristics and treatment setting, and a weak relationship (R² = 0.18) between patient characteristics and treatment duration. The most prominent predictors for setting were: symptom distress, cluster C personality pathology, level of identity integration, treatment history, motivation, and parental responsibility. For duration the most prominent predictor was age. We conclude from this study that, in addition to pathology and motivation factors, sociodemographics and treatment history are related to treatment allocation in clinical practice.
Cortese, Michael J; Khanna, Maya M
2007-08-01
Age of acquisition (AoA) ratings were obtained and were used in hierarchical regression analyses to predict naming and lexical-decision performance for 2,342 words (from Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004). In the analyses, AoA was included in addition to the set of predictors used by Balota et al. (2004). AoA significantly predicted latency performance on both tasks above and beyond the standard predictor set. However, AoA was more strongly related to lexical-decision performance than to naming performance. Finally, the previously reported effect of imageability on naming latencies by Balota et al. was not significant with AoA included as a factor. These results are consistent with the idea either that AoA has a semantic/lexical locus or that AoA effects emerge primarily in situations in which the input-output mapping is arbitrary.
Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won
2014-01-01
Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumors in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that DDB2, RRM2B and GADD45A, genes related to the repair and damage prevention of DNA, were consistently up-regulated for genotoxic carcinogens. 
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways. PMID:24497971
The use of generalised additive models (GAM) in dentistry.
Helfenstein, U; Steiner, M; Menghini, G
1997-12-01
Ordinary multiple regression and logistic multiple regression are widely applied statistical methods which allow a researcher to 'explain' or 'predict' a response variable from a set of explanatory variables or predictors. In these models it is usually assumed that quantitative predictors such as age enter linearly into the model. During recent years these methods have been further developed to allow more flexibility in the way explanatory variables 'act' on a response variable. The methods are called 'generalised additive models' (GAM). The rigid linear terms characterising the association between response and predictors are replaced in an optimal way by flexible curved functions of the predictors (the 'profiles'). Plotting the 'profiles' allows the researcher to visualise easily the shape by which predictors 'act' over the whole range of values. The method facilitates detection of particular shapes such as 'bumps', 'U-shapes', 'J-shapes', 'threshold values' etc. Information about the shape of the association is not revealed by traditional methods. The shapes of the profiles may be checked by performing a Monte Carlo simulation ('bootstrapping'). After the presentation of the GAM a relevant case study is presented in order to demonstrate application and use of the method. The dependence of caries in primary teeth on a set of explanatory variables is investigated. Since GAMs may not be easily accessible to dentists, this article presents them in an introductory condensed form. It was thought that a nonmathematical summary and a worked example might encourage readers to consider the methods described. GAMs may be of great value to dentists in allowing visualisation of the shape by which predictors 'act' and obtaining a better understanding of the complex relationships between predictors and response.
Rassenti, Laura Z; Huynh, Lang; Toy, Tracy L; Chen, Liguang; Keating, Michael J; Gribben, John G; Neuberg, Donna S; Flinn, Ian W; Rai, Kanti R; Byrd, John C; Kay, Neil E; Greaves, Andrew; Weiss, Arthur; Kipps, Thomas J
2004-08-26
The course of chronic lymphocytic leukemia (CLL) is variable. In aggressive disease, the CLL cells usually express an unmutated immunoglobulin heavy-chain variable-region gene (IgV(H)) and the 70-kD zeta-associated protein (ZAP-70), whereas in indolent disease, the CLL cells usually express mutated IgV(H) but lack expression of ZAP-70. We evaluated the CLL B cells from 307 patients with CLL for ZAP-70 and mutations in the rearranged IgV(H) gene. We then investigated the association between the results and the time from diagnosis to initial therapy. We found that ZAP-70 was expressed above a defined threshold level in 117 of the 164 patients with an unmutated IgV(H) gene (71 percent), but in only 24 of the 143 patients with a mutated IgV(H) gene (17 percent, P<0.001). Among the patients with ZAP-70-positive CLL cells, the median time from diagnosis to initial therapy in those who had an unmutated IgV(H) gene (2.8 years) was not significantly different from the median time in those who had a mutated IgV(H) gene (4.2 years, P=0.07). However, the median time from diagnosis to initial treatment in each of these groups was significantly shorter than the time in patients with ZAP-70-negative CLL cells who had either mutated or unmutated IgV(H) genes (P<0.001). The median time from diagnosis to initial therapy among patients who did not have ZAP-70 was 11.0 years in those with a mutated IgV(H) gene and 7.1 years in those with an unmutated IgV(H) gene (P<0.001). Although the presence of an unmutated IgV(H) gene is strongly associated with the expression of ZAP-70, ZAP-70 is a stronger predictor of the need for treatment in B-cell CLL. Copyright 2004 Massachusetts Medical Society
ERIC Educational Resources Information Center
Bowman-Perrott, Lisa; Benz, Michael R.; Hsu, Hsien-Yuan; Kwok, Oi-Man; Eisterhold, Leigh Ann; Zhang, Dalun
2013-01-01
Disciplinary exclusion practices are on the rise nationally, as are concerns about their disproportionate use and lack of effectiveness. This study used data from the Special Education Elementary Longitudinal Study to examine patterns and predictors of disciplinary exclusion over time. Students with emotional/behavioral disorders were most likely…
ERIC Educational Resources Information Center
Shittu, Ahmed Tajudeen; Kareem, Bamidele Wahab; Obielodan, Omotayo Olabo; Fakomogbon, Michael Ayodele
2017-01-01
This study examined predictors of pre-service science teachers' behavioral intention toward e-resources use for teaching in Nigeria. The study used cross-sectional survey research method and a questionnaire with a set of items that measure technology preparedness, perceived usefulness, perceived ease of use and behavioral intention to gather the…
Hope & Achievement Goals as Predictors of Student Behavior & Achievement in a Rural Middle School
ERIC Educational Resources Information Center
Walker, Christopher O.; Winn, Tina D.; Adams, Blakely N.; Shepard, Misty R.; Huddleston, Chelsea D.; Godwin, Kayce L.
2009-01-01
Relations among a set of cognitive-motivational variables were examined with the intent being to assess and clarify the nature of their interconnections within a middle school sample. Student perception of hope, which includes perceptions of agency and pathways, was investigated, along with personal achievement goal orientation, as predictors of…
Bowers, John C.; Griffitt, Kimberly J.; Molina, Vanessa; Clostio, Rachel W.; Pei, Shaofeng; Laws, Edward; Paranjpye, Rohinee N.; Strom, Mark S.; Chen, Arlene; Hasan, Nur A.; Huq, Anwar; Noriea, Nicholas F.; Grimes, D. Jay; Colwell, Rita R.
2012-01-01
Vibrio parahaemolyticus and Vibrio vulnificus, which are native to estuaries globally, are agents of seafood-borne or wound infections, both potentially fatal. Like all vibrios autochthonous to coastal regions, their abundance varies with changes in environmental parameters. Sea surface temperature (SST), sea surface height (SSH), and chlorophyll have been shown to be predictors of zooplankton and thus factors linked to vibrio populations. The contribution of salinity, conductivity, turbidity, and dissolved organic carbon to the incidence and distribution of Vibrio spp. has also been reported. Here, a multicoastal, 21-month study was conducted to determine relationships between environmental parameters and V. parahaemolyticus and V. vulnificus populations in water, oysters, and sediment in three coastal areas of the United States. Because ecologically unique sites were included in the study, it was possible to analyze individual parameters over wide ranges. Molecular methods were used to detect genes for thermolabile hemolysin (tlh), thermostable direct hemolysin (tdh), and tdh-related hemolysin (trh) as indicators of V. parahaemolyticus and the hemolysin gene vvhA for V. vulnificus. SST and suspended particulate matter were found to be strong predictors of total and potentially pathogenic V. parahaemolyticus and V. vulnificus. Other predictors included chlorophyll a, salinity, and dissolved organic carbon. For the ecologically unique sites included in the study, SST was confirmed as an effective predictor of annual variation in vibrio abundance, with other parameters explaining a portion of the variation not attributable to SST. PMID:22865080
GeneTopics - interpretation of gene sets via literature-driven topic models
2013-01-01
Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875
Molecular proxies for climate maladaptation in a long-lived tree (Pinus pinaster Aiton, Pinaceae).
Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G; González-Martínez, Santiago C
2015-03-01
Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP-climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. Copyright © 2015 by the Genetics Society of America.
Molecular Proxies for Climate Maladaptation in a Long-Lived Tree (Pinus pinaster Aiton, Pinaceae)
Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H.; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G.; González-Martínez, Santiago C.
2015-01-01
Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP–climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. PMID:25549630
Gene Selection and Cancer Classification: A Rough Sets Based Approach
NASA Astrophysics Data System (ADS)
Sun, Lijun; Miao, Duoqian; Zhang, Hongyun
Identification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough sets theory, correspond to minimal sets of essential genes for discerning samples and are an efficient tool for gene selection. Due to the computational complexity of the existing reduct algorithms, feature ranking is usually used to narrow down the gene space as the first step, and top-ranked genes are selected. In this paper, we define a novel criterion for scoring genes, based on the expression level difference between classes and the contribution of the gene to classification, and present an algorithm for generating all possible reducts from informative genes. The algorithm takes the whole attribute set into account and finds short reducts with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that it is successful in selecting highly discriminative genes and that the classification accuracy is impressive.
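The notion of a reduct used in this abstract can be made concrete with a small brute-force sketch. This is not the authors' algorithm, which is designed to avoid exhaustive search; the function names and the toy discretized expression table below are hypothetical, and the table is assumed to be consistent (no two identical samples carry different class labels).

```python
from itertools import combinations


def discerns(samples, labels, genes):
    """True if every pair of samples from different classes differs
    on at least one of the gene indices in `genes`."""
    for i, j in combinations(range(len(samples)), 2):
        if labels[i] != labels[j] and all(
            samples[i][g] == samples[j][g] for g in genes
        ):
            return False
    return True


def all_reducts(samples, labels):
    """Enumerate all reducts: minimal gene subsets that still discern
    every pair of samples belonging to different classes."""
    n_genes = len(samples[0])
    reducts = []
    for size in range(1, n_genes + 1):
        for subset in combinations(range(n_genes), size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if discerns(samples, labels, subset):
                reducts.append(subset)
    return reducts


# Hypothetical table: 4 samples, 3 genes discretized to 0 (low) / 1 (high).
toy_samples = [(1, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1)]
toy_labels = ["tumor", "tumor", "normal", "normal"]
print(all_reducts(toy_samples, toy_labels))
```

In this toy table gene 0 alone separates the two classes, and genes 1 and 2 do so only jointly, so both subsets are reducts. The search is exponential in the number of genes, which is why feature ranking is used first to narrow the gene space.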
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi
2016-01-01
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
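As a hedged illustration of what enrichment analysis computes, the sketch below scores the overlap between an input gene list and one annotated gene set with a one-sided hypergeometric test, a common choice for this task. It is not Enrichr's implementation or API; the function name and the toy counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_input, n_overlap):
    """Probability of seeing at least `n_overlap` annotated genes when
    `n_input` genes are drawn at random from a universe of `n_universe`
    genes, of which `n_in_set` belong to the annotated gene set."""
    total = comb(n_universe, n_input)
    upper = min(n_in_set, n_input)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 150-gene pathway,
# a 300-gene input list, and 12 genes shared between the two.
print(hypergeom_enrichment_p(20000, 150, 300, 12))
```

Repeating this test for every gene set in a library and sorting by p-value gives a ranked list of enriched terms; in practice a multiple-testing correction would be applied across the library.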
Almeida, Luciana O; Neto, Marinaldo P C; Sousa, Lucas O; Tannous, Maryna A; Curti, Carlos; Leopoldino, Andreia M
2017-04-18
Epigenetic modifications are essential in the control of normal cellular processes and cancer development. DNA methylation and histone acetylation are major epigenetic modifications involved in gene transcription and abnormal events driving the oncogenic process. SET protein accumulates in many cancer types, including head and neck squamous cell carcinoma (HNSCC); SET is a member of the INHAT complex that inhibits gene transcription associating with histones and preventing their acetylation. We explored how SET protein accumulation impacts on the regulation of gene expression, focusing on DNA methylation and histone acetylation. DNA methylation profile of 24 tumour suppressors evidenced that SET accumulation decreased DNA methylation in association with loss of 5-methylcytidine, formation of 5-hydroxymethylcytosine and increased TET1 levels, indicating an active DNA demethylation mechanism. However, the expression of some suppressor genes was lowered in cells with high SET levels, suggesting that loss of methylation is not the main mechanism modulating gene expression. SET accumulation also downregulated the expression of 32 genes of a panel of 84 transcription factors, and SET directly interacted with chromatin at the promoter of the downregulated genes, decreasing histone acetylation. Gene expression analysis after cell treatment with 5-aza-2'-deoxycytidine (5-AZA) and Trichostatin A (TSA) revealed that histone acetylation reversed transcription repression promoted by SET. These results suggest a new function for SET in the regulation of chromatin dynamics. In addition, TSA diminished both SET protein levels and SET capability to bind to gene promoter, suggesting that administration of epigenetic modifier agents could be efficient to reverse SET phenotype in cancer.
Combining multiple tools outperforms individual methods in gene set enrichment analyses.
Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E
2017-02-01
Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/ . The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/ . Contact: monther.alhamdoosh@csl.com.au or mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
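The idea of combining per-method results into one collective ranking can be illustrated with a small sketch. It assumes a simple average-rank scheme over per-method p-values; the abstract does not say which combination rule EGSEA uses, and the function name, method names, and gene set names below are hypothetical.

```python
def average_rank_ensemble(pvalues_by_method):
    """Rank gene sets by their mean rank across several enrichment methods.

    pvalues_by_method: dict mapping method name -> dict of gene set -> p-value.
    Every method is assumed to report the same gene sets (an assumption of
    this sketch). Ties within a method are broken alphabetically.
    """
    gene_sets = sorted(next(iter(pvalues_by_method.values())))
    rank_sums = {g: 0.0 for g in gene_sets}
    for pvals in pvalues_by_method.values():
        # Smallest p-value gets rank 1 within each method.
        ordered = sorted(gene_sets, key=lambda g: pvals[g])
        for rank, g in enumerate(ordered, start=1):
            rank_sums[g] += rank
    n_methods = len(pvalues_by_method)
    # Return (mean rank, gene set) pairs, best (lowest mean rank) first.
    return sorted((rank_sums[g] / n_methods, g) for g in gene_sets)


# Hypothetical p-values from three hypothetical methods.
example = {
    "method_a": {"set1": 0.01, "set2": 0.20, "set3": 0.50},
    "method_b": {"set1": 0.03, "set2": 0.60, "set3": 0.04},
    "method_c": {"set1": 0.20, "set2": 0.10, "set3": 0.30},
}
for mean_rank, name in average_rank_ensemble(example):
    print(name, round(mean_rank, 3))
```

A gene set that no single method ranks first can still come out on top if it is ranked consistently well by all of them, which is the intuition behind ensemble scoring.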
Blatti, Charles; Sinha, Saurabh
2016-07-15
Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties' such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene-gene or gene-property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. DRaWR was implemented as an R package available at veda.cs.illinois.edu/DRaWR. 
Contact: blatti@illinois.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
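A random walk with restarts, the building block named in the abstract above, can be sketched in a few lines. This is a generic single-stage version on a small toy network, not the two-stage DRaWR algorithm itself; the function name, the restart probability of 0.3, and the node names are assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seeds`.

    adj: dict node -> dict neighbor -> edge weight (list both directions
         for an undirected edge). seeds: set of query nodes.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the query gene set.
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(r)
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: restart * r[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Node without edges: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * r[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy heterogeneous network: genes g1..g3 linked through two property nodes.
toy = {
    "g1": {"pathwayA": 1.0},
    "g2": {"pathwayA": 1.0, "pathwayB": 1.0},
    "g3": {"pathwayB": 1.0},
    "pathwayA": {"g1": 1.0, "g2": 1.0},
    "pathwayB": {"g2": 1.0, "g3": 1.0},
}
scores = random_walk_with_restart(toy, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the query set in the network accumulate more probability, so sorting by score gives a relevance ranking of both genes and properties.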
GSCALite: A Web Server for Gene Set Cancer Analysis.
Liu, Chun-Jie; Hu, Fei-Fei; Xia, Mengxuan; Han, Leng; Zhang, Qiong; Guo, An-Yuan
2018-05-22
The availability of cancer genomic data makes it possible to analyze genes related to cancer. Cancer usually results from the action of a set of genes, and the signal of a single gene can be masked by background noise. Here, we present a web server named Gene Set Cancer Analysis (GSCALite) to analyze a set of genes in cancers with the following functional modules: (i) differential expression in tumor vs normal tissue, with survival analysis; (ii) genomic variations and their survival analysis; (iii) cancer pathway activity associated with gene expression; (iv) miRNA regulatory network for genes; (v) drug sensitivity for genes; (vi) normal tissue expression and eQTL for genes. GSCALite is a user-friendly web server for dynamic analysis and visualization of gene sets in cancer and of drug sensitivity correlation, which will be of broad utility to cancer researchers. GSCALite is available at http://bioinfo.life.hust.edu.cn/web/GSCALite/. Contact: guoay@hust.edu.cn or zhangqiong@hust.edu.cn. Supplementary data are available at Bioinformatics online.
Forecasting malaria in a highly endemic country using environmental and clinical predictors.
Zinszer, Kate; Kigozi, Ruth; Charland, Katia; Dorsey, Grant; Brewer, Timothy F; Brownstein, John S; Kamya, Moses R; Buckeridge, David L
2015-06-18
Malaria thrives in poor tropical and subtropical countries where local resources are limited. Accurate disease forecasts can provide public and clinical health services with the information needed to implement targeted approaches for malaria control that make effective use of limited resources. The objective of this study was to determine the relevance of environmental and clinical predictors of malaria across different settings in Uganda. Forecasting models were based on health facility data collected by the Uganda Malaria Surveillance Project and satellite-derived rainfall, temperature, and vegetation estimates from 2006 to 2013. Facility-specific forecasting models of confirmed malaria were developed using multivariate autoregressive integrated moving average models and produced weekly forecast horizons over a 52-week forecasting period. The model with the most accurate forecasts varied by site and by forecast horizon. Clinical predictors were retained in the models with the highest predictive power for all facility sites. The average error over the 52 forecasting horizons ranged from 26 to 128% whereas the cumulative burden forecast error ranged from 2 to 22%. Clinical data, such as drug treatment, could be used to improve the accuracy of malaria predictions in endemic settings when coupled with environmental predictors. Further exploration of malaria forecasting is necessary to improve its accuracy and value in practice, for example by examining other environmental and intervention predictors such as insecticide-treated nets.
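The gap between the average per-horizon error and the cumulative burden error reported above can be illustrated with a short sketch. The abstract does not give the exact error definitions used in the study, so the two formulas below (mean absolute percentage error, and percentage error of the summed forecast) are assumptions, and the counts are made up.

```python
def mean_absolute_percentage_error(observed, forecast):
    """Mean of per-week absolute errors, each as a percentage of the observed count.

    Assumes every observed count is non-zero.
    """
    errors = [abs(f - o) / o * 100 for o, f in zip(observed, forecast)]
    return sum(errors) / len(errors)


def cumulative_burden_error(observed, forecast):
    """Absolute error of the total forecast burden, as a percentage of the observed total."""
    total_observed = sum(observed)
    return abs(sum(forecast) - total_observed) / total_observed * 100


# Hypothetical weekly case counts at one facility.
observed = [100, 120, 80, 100]
forecast = [130, 90, 100, 85]

weekly_error = mean_absolute_percentage_error(observed, forecast)
burden_error = cumulative_burden_error(observed, forecast)
```

Because weekly over- and under-predictions partly cancel when summed, the cumulative error can be far smaller than the average weekly error, which is consistent with the pattern of ranges quoted in the abstract.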
Eguzo, KN; Lawal, AK; Umezurike, CC; Eseigbe, CE
2015-01-01
Background: Patient attrition has been a challenge in managing HIV programs in resource-limited settings. Aim: This study reviews the predictors of loss to follow-up (LTFU) in our hospital and suggests the best practices for dealing with the issue. Subjects and Methods: A 5-year retrospective cohort study of 1256 HIV-infected patients. Baseline CD4 counts, age, gender, year of enrolment, and antiretroviral therapy combination regimen were considered in this study. Kaplan–Meier models were used to estimate the univariate time-to-LTFU and Cox proportional hazards models to identify the multivariate predictors of LTFU. Results: Twenty-four percent (23.9% [301/1256]) of patients were lost to follow-up. Baseline CD4 count, year of enrolment, and drug combination were significant predictors of LTFU. Patients enrolled earlier (2008/2009) were twice as likely to be LTFU compared with those enrolled later (2010–2013). Gender and age did not significantly predict LTFU nor confound other predictors. Conclusion: The program showed higher LTFU rates than most studies in Nigeria and Africa, maybe due to difficulties with the access to the hospital and possible treatment fatigue. This study recommends the provision of transportation subsidies and proactive patient follow-up with “peer-tracking” to reduce LTFU among HIV infected patients, especially in resource-limited settings. PMID:27057373
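The Kaplan–Meier estimate of time-to-LTFU mentioned above follows the product-limit rule, which can be sketched as below. The follow-up times and event flags are invented for illustration and are not taken from the study; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the probability of remaining in care.

    times:  follow-up duration for each patient.
    events: 1 if the patient was lost to follow-up at that time,
            0 if the observation was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        lost = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            lost += data[i][1]
            leaving += 1
            i += 1
        if lost > 0:
            survival *= 1 - lost / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months) for five patients.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 0, 1])
```

Censored patients reduce the number at risk without lowering the curve, which is what distinguishes this estimate from a simple proportion such as 301/1256.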
Bayesian estimation of the discrete coefficient of determination.
Chen, Ting; Braga-Neto, Ulisses M
2016-12-01
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
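For orientation, the quantity being estimated can be written out. The abstract does not restate the definition, so the form below is the one commonly used in the Genomic Signal Processing literature for a binary target $Y$ and a discrete predictor vector $X$, and should be read as an assumption about the notation rather than as the authors' exact formulation.

```latex
\[
  \mathrm{CoD} \;=\; \frac{\varepsilon_0 - \varepsilon_{\mathrm{opt}}}{\varepsilon_0},
  \qquad
  \varepsilon_0 = \min\{P(Y=0),\, P(Y=1)\},
  \qquad
  \varepsilon_{\mathrm{opt}} = \sum_{x} \min\{P(X=x, Y=0),\, P(X=x, Y=1)\}.
\]
```

Here $\varepsilon_0$ is the error of the best prediction of $Y$ made without observing $X$, and $\varepsilon_{\mathrm{opt}}$ is the error of the optimal predictor that uses $X$, so the CoD lies between 0 (the predictors do not help) and 1 (perfect prediction).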
Prognostic values of soluble CD30 and CD30 gene polymorphisms in heart transplantation.
Frisaldi, Elisa; Conca, Raffaele; Magistroni, Paola; Fasano, Maria Edvige; Mazzola, Gina; Patanè, Francesco; Zingarelli, Edoardo; Dall'omo, Anna M; Brusco, Alfredo; Amoroso, Antonio
2006-04-27
Pretransplant soluble CD30 (sCD30) is a predictor of kidney graft outcome. Its status as a predictor of heart transplant (HT) outcome has not been established. We have studied this question by assessing sCD30 levels and the number of (CCAT)n repeats of the microsatellite in the CD30 promoter region, which is able alone to repress gene transcription, in the sera of 83 HT patients and 77 of their donors. sCD30 was non-significantly increased in the patients, whereas there were no differences in the CD30 microsatellite allele frequencies. A negative correlation between the number of (CCAT)n and sCD30 levels was evident in the donors. Patients with pretransplant sCD30
Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano
2011-04-11
In the last decades, microarray technology has spread, leading to a dramatic increase in publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, whereas Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question, to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). 
In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.
Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets
Shuryak, Igor
2017-01-01
The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). 
We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401
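Technique (1) above, using synthetic noise variables as a benchmark, can be sketched on simulated data. To stay dependency-free the sketch swaps the random-forest importance used in the paper for a much simpler score, the absolute Pearson correlation with the outcome; that substitution, the function names, and the simulated data are all assumptions of this example.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def noise_benchmark(predictors, outcome, n_noise=20, seed=0):
    """Score real predictors and report the best score reached by pure noise.

    predictors: dict name -> list of values. Returns (scores, benchmark),
    where benchmark is the largest |r| among n_noise synthetic Gaussian
    variables that are unrelated to the outcome by construction.
    """
    rng = random.Random(seed)
    n = len(outcome)
    scores = {name: abs(pearson(values, outcome))
              for name, values in predictors.items()}
    noise_scores = [
        abs(pearson([rng.gauss(0, 1) for _ in range(n)], outcome))
        for _ in range(n_noise)
    ]
    return scores, max(noise_scores)


# Simulated small data set: the outcome depends on `dose` only.
rng = random.Random(1)
dose = [rng.gauss(0, 1) for _ in range(30)]
ph = [rng.gauss(0, 1) for _ in range(30)]
abundance = [2 * d + rng.gauss(0, 0.1) for d in dose]

scores, benchmark = noise_benchmark({"dose": dose, "ph": ph}, abundance)
credible = [name for name, s in scores.items() if s > benchmark]
```

A predictor whose score does not exceed the best noise variable gives no evidence of an effect in a data set of that size; repeating the run with other seeds corresponds to technique (4).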
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.
Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi
2016-07-08
Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
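As background for readers new to enrichment analysis, a common way to score the overlap between an input gene list and one annotated gene set is the one-sided hypergeometric (Fisher exact) test. The abstract above does not describe Enrichr's scoring, so the sketch below is a generic illustration and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, query_size, set_size, universe):
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_size, query_size).

    overlap:    genes shared by the query list and the annotated gene set
    query_size: number of genes in the query list
    set_size:   number of genes in the annotated gene set
    universe:   number of genes that could have been drawn
    """
    upper = min(query_size, set_size)
    total = comb(universe, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 8 of 50 query genes fall in a 200-gene set,
# out of a background of 20,000 genes.
p_value = hypergeom_enrichment_p(overlap=8, query_size=50, set_size=200, universe=20000)
```

When many gene set libraries are tested at once, as with the 102 libraries mentioned above, the resulting p-values would normally be corrected for multiple testing before interpretation.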
Independent Predictors of Prognosis Based on Oral Cavity Squamous Cell Carcinoma Surgical Margins.
Buchakjian, Marisa R; Ginader, Timothy; Tasche, Kendall K; Pagedar, Nitin A; Smith, Brian J; Sperry, Steven M
2018-05-01
Objective: To conduct a multivariate analysis of a large cohort of oral cavity squamous cell carcinoma (OCSCC) cases for independent predictors of local recurrence (LR) and overall survival (OS), with emphasis on the relationship between (1) prognosis and (2) main specimen permanent margins and intraoperative tumor bed frozen margins. Study Design: Retrospective cohort study. Setting: Tertiary academic head and neck cancer program. Subjects and Methods: This study included 426 patients treated with OCSCC resection between 2005 and 2014 at University of Iowa Hospitals and Clinics. Patients underwent excision of OCSCC with intraoperative tumor bed frozen margin sampling and main specimen permanent margin assessment. Multivariate analysis of the data set to predict LR and OS was performed. Results: Independent predictors of LR included nodal involvement, histologic grade, and main specimen permanent margin status. Specifically, the presence of a positive margin (odds ratio, 6.21; 95% CI, 3.3-11.9) or <1-mm/carcinoma in situ margin (odds ratio, 2.41; 95% CI, 1.19-4.87) on the main specimen was an independent predictor of LR, whereas intraoperative tumor bed margins were not predictive of LR on multivariate analysis. Similarly, independent predictors of OS on multivariate analysis included nodal involvement, extracapsular extension, and a positive main specimen margin. Tumor bed margins did not independently predict OS. Conclusion: The main specimen margin is a strong independent predictor of LR and OS on multivariate analysis. Intraoperative tumor bed frozen margins do not independently predict prognosis. We conclude that emphasis should be placed on evaluating the main specimen margins when estimating prognosis after OCSCC resection.
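To make the odds ratios and confidence intervals above easier to read, the sketch below computes an unadjusted odds ratio with a Wald (Woolf) 95% interval from a 2x2 table. The ratios reported in the abstract come from a multivariate model and are adjusted for other predictors, so this is only an illustration of the quantity; the counts and the function name are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    All four counts are assumed to be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: recurrence among positive-margin vs negative-margin cases.
estimate, lower, upper = odds_ratio_ci(a=20, b=10, c=30, d=60)
```

An interval that excludes 1, as both intervals quoted in the abstract do, indicates an association that is statistically significant at the corresponding level.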
Calibration of Predictor Models Using Multiple Validation Experiments
NASA Technical Reports Server (NTRS)
Crespo, Luis G.; Kenny, Sean P.; Giesy, Daniel P.
2015-01-01
This paper presents a framework for calibrating computational models using data from several and possibly dissimilar validation experiments. The offset between model predictions and observations, which might be caused by measurement noise, model-form uncertainty, and numerical error, drives the process by which uncertainty in the model's parameters is characterized. The resulting description of uncertainty along with the computational model constitute a predictor model. Two types of predictor models are studied: Interval Predictor Models (IPMs) and Random Predictor Models (RPMs). IPMs use sets to characterize uncertainty, whereas RPMs use random vectors. The propagation of a set through a model makes the response an interval-valued function of the state, whereas the propagation of a random vector yields a random process. Optimization-based strategies for calculating both types of predictor models are proposed. Whereas the formulations used to calculate IPMs target solutions leading to the interval-valued function of minimal spread containing all observations, those for RPMs seek to maximize the models' ability to reproduce the distribution of observations. Regarding RPMs, we choose a structure for the random vector (i.e., the assignment of probability to points in the parameter space) solely dependent on the prediction error. As such, the probabilistic description of uncertainty is not a subjective assignment of belief, nor is it expected to asymptotically converge to a fixed value, but instead it casts the model's ability to reproduce the experimental data. This framework enables evaluating the spread and distribution of the predicted response of target applications depending on the same parameters beyond the validation domain.
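The notion of an interval-valued function of minimal spread that contains all observations can be illustrated with a deliberately simplified sketch. It assumes a straight-line model with a constant-width band and searches a user-supplied grid of slopes, which is far more restrictive than the optimization-based formulations the paper proposes; all names and data are hypothetical.

```python
def fit_constant_width_ipm(xs, ys, slopes):
    """Pick the slope whose band of constant width is narrowest yet covers all data.

    For a candidate slope m the residuals y - m*x must lie in [lo, hi];
    the tightest such band has lo = min residual and hi = max residual.
    Returns (m, lo, hi) for the candidate with the smallest hi - lo.
    """
    best = None
    for m in slopes:
        residuals = [y - m * x for x, y in zip(xs, ys)]
        lo, hi = min(residuals), max(residuals)
        if best is None or hi - lo < best[0]:
            best = (hi - lo, m, lo, hi)
    _, m, lo, hi = best
    return m, lo, hi


def predict_interval(model, x):
    """Interval-valued prediction [m*x + lo, m*x + hi] at input x."""
    m, lo, hi = model
    return m * x + lo, m * x + hi


# Hypothetical observations and a grid of candidate slopes from 0.00 to 2.00.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.9]
model = fit_constant_width_ipm(xs, ys, slopes=[i / 100 for i in range(201)])
lower, upper = predict_interval(model, 4.0)
```

The prediction at x = 4.0 lies outside the range of the data, which mirrors the abstract's point about evaluating the spread of the predicted response beyond the validation domain.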
Predictors of physical activity in persons with mental illness: Testing a social cognitive model.
Zechner, Michelle R; Gill, Kenneth J
2016-12-01
This study examined whether the social cognitive theory (SCT) model can be used to explain the variance in physical exercise among persons with serious mental illnesses. A cross-sectional, correlational design was employed. Participants from community mental health centers and supported housing programs (N = 120) completed 9 measures on exercise, social support, self-efficacy, outcome expectations, barriers, and goal-setting. Hierarchical regression tested the relationship between self-report physical activity and SCT determinants while controlling for personal characteristics. The model explained 25% of the variance in exercise. Personal characteristics explained 18% of the variance in physical activity. The SCT variables of social support, self-efficacy, outcome expectations, barriers, and goals were entered simultaneously and added an r2 change value of .07. Gender (β = -.316, p = .001) and Brief Symptom Inventory Depression subscale (β = -2.08, p < .040) contributed significantly to the prediction of exercise. In a separate stepwise multiple regression, we entered only SCT variables as potential predictors of exercise. Goal-setting was the single significant predictor, F(1, 118) = 13.59, p < .01, r2 = .10. SCT shows promise as an explanatory model of exercise in persons with mental illnesses. Goal-setting practices, self-efficacy, outcome expectations and social support from friends for exercise should be encouraged by psychiatric rehabilitation practitioners. People with more depressive symptoms and women exercise less. More work is needed on theoretical exploration of predictors of exercise. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
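The reported statistics can be cross-checked with two short identities. The first is the additive decomposition of explained variance in a hierarchical regression; the second is the standard relation between $F$ and $R^2$ for a regression with a single predictor. Applying them to the rounded values in the abstract is only a consistency check, not a reanalysis.

```latex
\[
  R^2_{\text{full}} \;=\; R^2_{\text{step 1}} + \Delta R^2 \;=\; 0.18 + 0.07 \;=\; 0.25,
\]
\[
  R^2 \;=\; \frac{F}{F + df_2} \;=\; \frac{13.59}{13.59 + 118} \;\approx\; 0.103 .
\]
```

Both results agree with the 25% of variance explained by the full model and with the $r^2 = .10$ reported for goal-setting.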
USDA-ARS's Scientific Manuscript database
The number of females genotyped in the US has increased to 12,650 per month, comprising 74% of the total genotypes received in 2013. Concerns of increased computing time of the ever-growing predictor population set and linkage decay between the ancestral population and the current animals have arise...
ERIC Educational Resources Information Center
Bagamery, Bruce D.; Lasik, John J.; Nixon, Don R.
2005-01-01
Extending previous studies, the authors examined a larger set of variables to identify predictors of student performance on the Educational Testing Service Major Field Exam in Business, which has been shown to be an externally valid measure of student learning outcomes. Significant predictors include gender, whether students took the SAT, and…
ERIC Educational Resources Information Center
Hermans, Mikaela; Korhonen, Johan
2017-01-01
The aim of this study is to examine Finnish ninth graders' attitudes towards the consequences of climate change, their views on climate change mitigation and the impact of a set of selected predictors on their willingness to act in climate change mitigation. Students (N = 549) from 11 secondary schools participated in the questionnaire-based…
Motivational Factors and Predictors for Attending a Continuing Education Program for Older Adults
ERIC Educational Resources Information Center
Cachioni, Meire; Nascimento Ordonez, Tiago; Lima da Silva, Thais Bento; Tavares Batistoni, Samila Sathler; Sanches Yassuda, Mônica; Caldeira Melo, Ruth; Rodrigues da Costa Domingues, Marisa Accioly; Lopes, Andrea
2014-01-01
The objectives were to describe the stated motives of participants who enrolled in a program at the Open University for the Elderly (UnATI, in Portuguese), identify correlations between the stated motives and sociodemographic data, and find a set of predictors related to the listed motives. A total of 306 middle-aged and elderly adults aged 50 or…
ERIC Educational Resources Information Center
Lavigne, John V.; LeBailly, Susan A.; Gouze, Karen R.; Binns, Helen J.; Keller, Jennifer; Pate, Lindsay
2010-01-01
This study examined the role of pretreatment demographic and clinical predictors of attendance as well as barriers to treatment and consumer satisfaction on attendance at therapist-led parent training with 86 families of children ages 3 to 6 years conducted in pediatric primary care settings. Only socioeconomic status (SES) and minority group…
Modjarrad, Kayvon; Zulu, Isaac; Redden, David T.; Njobvu, Lungowe; Freedman, David O.; Vermund, Sten H.
2009-01-01
Sub-Saharan Africa is disproportionately burdened by intestinal helminth and human immunodeficiency virus (HIV)-1 infection. Recent evidence suggests detrimental immunologic effects from concomitant infection with the two pathogens. Few studies, however, have assessed the prevalence of and predictors for intestinal helminth infection among HIV-1–infected adults in urban African settings where HIV infection rates are highest. We collected and analyzed sociodemographic and parasitologic data from 297 HIV-1–infected adults (mean age = 31.1 years, 69% female) living in Lusaka, Zambia to assess the prevalence and associated predictors of helminth infection. We found at least one type of intestinal helminth in 24.9% of HIV-infected adults. Of those infected, thirty-nine (52.7%) were infected with Ascaris lumbricoides, and 29 (39.2%) were infected with hookworm. More than 80% were light-intensity infections. A recent visit to a rural area, food shortage, and prior history of helminth infection were significant predictors of current helminth status. The high helminth prevalence and potential for adverse interactions between helminths and HIV suggest that helminth diagnosis and treatment should be part of routine HIV care. PMID:16222025
Chiu, Herng-Chia; Ho, Te-Wei; Lee, King-Teh; Chen, Hong-Yaw; Ho, Wen-Hsien
2013-01-01
The aim of the present study is, first, to compare significant predictors of mortality for hepatocellular carcinoma (HCC) patients undergoing resection between artificial neural network (ANN) and logistic regression (LR) models and, second, to evaluate the predictive accuracy of ANN and LR in different survival year estimation models. We constructed a prognostic model for 434 patients with 21 potential input variables by Cox regression model. Model performance was measured by numbers of significant predictors and predictive accuracy. The results indicated that ANN had two to three times as many significant predictors in the 1-, 3-, and 5-year survival models as the LR models. Scores of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of 1-, 3-, and 5-year survival estimation models using ANN were superior to those of LR in all the training sets and most of the validation sets. The study demonstrated that ANN not only had a greater number of significant predictors of mortality but also provided more accurate prediction than conventional methods. It is suggested that physicians consider using data mining methods as supplemental tools for clinical decision-making and prognostic evaluation. PMID:23737707
Evaluation of a deep learning architecture for MR imaging prediction of ATRX in glioma patients
NASA Astrophysics Data System (ADS)
Korfiatis, Panagiotis; Kline, Timothy L.; Erickson, Bradley J.
2018-02-01
Predicting mutation/loss of the alpha-thalassemia/mental retardation syndrome X-linked (ATRX) gene utilizing MR imaging is of high importance since it is a predictor of response and prognosis in brain tumors. In this study, we compare a deep neural network approach based on a residual deep neural network (ResNet) architecture and one based on a classical machine learning approach and evaluate their ability to predict ATRX mutation status without the need for a distinct tumor segmentation step. We found that the ResNet50 (50 layers) architecture, pre-trained on ImageNet data, was the best performing model, achieving an F1 score of 0.91 on a test set of 35 cases (classification of a slice as no tumor, ATRX mutated, or ATRX not mutated). The SVM classifier achieved 0.63 for differentiating the FLAIR signal abnormality regions from the test patients based on their mutation status. We report a method that alleviates the need for extensive preprocessing and acts as a proof of concept that deep neural network architectures can be used to predict molecular biomarkers from routine medical images.
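Since the result above is reported as an F1 score on a three-class slice classification, a short sketch of how per-class and macro-averaged F1 are computed may help. The abstract does not say which averaging was used, so macro-averaging is an assumption here, and the labels and predictions are invented.

```python
def f1_per_class(y_true, y_pred):
    """F1 score of each class, treating it as the positive class in turn."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical slice labels for the three classes named in the abstract.
y_true = ["no_tumor", "no_tumor", "mutated", "mutated", "wild_type", "wild_type"]
y_pred = ["no_tumor", "no_tumor", "mutated", "wild_type", "wild_type", "wild_type"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Macro-averaging gives each class equal weight regardless of how many slices it contains, which matters when tumor-free slices greatly outnumber the others.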
Cruz-Fuentes, Carlos S; Benjet, Corina; Martínez-Levy, Gabriela A; Pérez-Molina, Amado; Briones-Velasco, Magdalena; Suárez-González, Jesús
2014-03-01
The interplay among lifetime adversities and the genetic background has been previously examined on a variety of measures of depression; however, only a few studies have focused on major depressive disorder (MDD) in adolescence. Using clinical data and DNA samples from mouthwash gathered from an epidemiological study on the prevalence of mental disorders in youths between 12 and 17 years old, we tested the statistical interaction between a set of psychosocial adversities experienced during childhood (CAs) with two common polymorphisms in the brain-derived neurotrophic factor (BDNF) (Val66Met) and SLC6A4 (L/S) genes on the probability of suffering MDD in adolescence. Genotype or allele frequencies for both polymorphisms were similar between groups of comparison (MDD N = 246; controls N = 270). The following CA factors were predictors of clinical depression in adolescents: abuse, neglect, and family dysfunction; parental maladjustment; parental death; and having experienced a life-threatening physical illness. Remarkably, the cumulative number of psychosocial adversities was distinctly associated with an increase in the prevalence of depression, but only in Val/Val BDNF individuals, while the possession of at least one copy of the BDNF Met allele (i.e., Met +) was statistically linked with a "refractory" or resilient phenotype to the noticeable influence of CAs. Liability or resilience to develop MDD in adolescence is dependent on a complex interplay between particular environmental exposures and a set of plasticity genes including BDNF. A better understanding of these factors is important for developing better prevention and early intervention measures.
Zhu, Xinyu; Ma, Hong; Chen, Zhiduan
2011-03-09
Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improve genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were early integrated into primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.
Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.
Park, Inho; Lee, Kwang H; Lee, Doheon
2010-06-15
Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.
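The abstract above scores a candidate Boolean rule by the class entropy of the samples the rule selects. The following Python sketch illustrates that idea only. The data layout, the helper names `class_entropy` and `select_by_rule`, and the toy samples are assumptions made for illustration; the authors' actual significance procedure is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of class labels; 0.0 means a pure selection."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over the observed classes.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def select_by_rule(samples, rule):
    """Return the class labels of the samples whose gene set states satisfy the rule."""
    return [s["label"] for s in samples if rule(s["state"])]


def both_down(state):
    """Hypothetical rule: set A (down) AND set B (down)."""
    return state["set_a"] == "down" and state["set_b"] == "down"


# Hypothetical toy samples with inferred gene set activity states.
toy_samples = [
    {"label": "cancer", "state": {"set_a": "down", "set_b": "down"}},
    {"label": "cancer", "state": {"set_a": "down", "set_b": "down"}},
    {"label": "normal", "state": {"set_a": "up", "set_b": "down"}},
    {"label": "normal", "state": {"set_a": "down", "set_b": "up"}},
]
selected_labels = select_by_rule(toy_samples, both_down)
rule_entropy = class_entropy(selected_labels)
```

A rule that selects samples of a single phenotype has low entropy, which is the sense in which low class entropy marks a rule as discriminative in this sketch.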
Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.
Zhu, Xinyu; Chen, Caoyi; Wang, Baohua
2012-04-01
Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we made a comprehensive evolutionary analysis on this gene family from 13 major representatives of green plants. A novel clustering (here named as cpTrx clade), which included the III-1, III-2, and III-4 orthologous groups, previously resolved was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were early integrated into primordial SET-PostSET domain organization of cpTrx clade. We suggested that the PostSET domain was lost in some members of III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from the Physcomitrella patens (moss) were identified, and supposedly, their parental genes have been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.
ADGO: analysis of differentially expressed gene sets using composite GO annotation.
Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun
2006-09-15
Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.
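The composite annotation step described above, intersecting gene sets from two different GO categories, can be sketched with plain set operations. This is an illustration under assumptions: the function name `composite_sets`, the `min_size` threshold, and the toy gene identifiers are hypothetical, and the expression-change scoring that ADGO applies afterwards is omitted.

```python
def composite_sets(function_sets, component_sets, min_size=2):
    """Intersect every molecular-function set with every cellular-component set.

    Returns a dict keyed by (function_term, component_term) holding the shared
    genes, keeping only intersections with at least `min_size` genes.
    """
    composites = {}
    for f_term, f_genes in function_sets.items():
        for c_term, c_genes in component_sets.items():
            shared = f_genes & c_genes
            if len(shared) >= min_size:
                composites[(f_term, c_term)] = shared
    return composites


# Hypothetical toy annotations (gene identifiers are placeholders).
toy_function = {"F1": {"g1", "g2", "g3"}, "F2": {"g4", "g5"}}
toy_component = {"C1": {"g2", "g3", "g6"}, "C2": {"g5", "g7"}}
toy_composites = composite_sets(toy_function, toy_component)
```

Each composite set would then be scored alongside the single-category sets, so that a composite term can be reported when it is significant even though its parent sets are not.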
Shieh, Gwowen
2010-05-28
Due to its extensive applicability and computational ease, moderated multiple regression (MMR) has been widely employed to analyze interaction effects between 2 continuous predictor variables. Accordingly, considerable attention has been drawn toward the supposed multicollinearity problem between predictor variables and their cross-product term. This article attempts to clarify the misconception of multicollinearity in MMR studies. The counterintuitive yet beneficial effects of multicollinearity on the ability to detect moderator relationships are explored. Comprehensive treatments and numerical investigations are presented for the simplest interaction model and more complex three-predictor setting. The results provide critical insight that both helps avoid misleading interpretations and yields better understanding for the impact of intercorrelation among predictor variables in MMR analyses.
Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A
2014-11-15
Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10^-5, FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10^-6, FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10^-5, FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction: External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods: We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results: The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion: The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. 
To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W
2016-01-01
External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated it in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogeneous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
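The c-statistic discussed above is the probability that a randomly chosen subject with the event receives a higher predicted risk than a randomly chosen subject without it, with ties counted as one half. The following Python sketch computes it by direct pair counting. The function name `c_statistic` and the toy data are illustrative assumptions; the permutation test and the benchmark framework evaluated in the study are not implemented here.

```python
def c_statistic(y_true, scores):
    """Concordance (c) statistic for binary outcomes (1 = event, 0 = no event)."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0  # event ranked above non-event
            elif e == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy validation data: outcomes and predicted risks.
toy_outcomes = [1, 0, 1, 0]
toy_risks = [0.9, 0.1, 0.4, 0.4]
toy_c = c_statistic(toy_outcomes, toy_risks)
```

Because the statistic depends only on how events and non-events are ranked against each other, a narrower spread of predicted risks in a less heterogeneous validation population can lower it even when the regression coefficients are correct, which is the case-mix effect the abstract describes.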
Comparative study on gene set and pathway topology-based enrichment methods.
Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim
2015-10-22
Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so-called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods; however, their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways; however, they were not conclusively better in the other scenarios. This suggests that a simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. 
Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
Pitfalls in statistical landslide susceptibility modelling
NASA Astrophysics Data System (ADS)
Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut
2010-05-01
The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest applying a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assessment examines how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. 
In order to assess a possible violation of the assumption of independence in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial autocorrelation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interaction plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.
Assessing the accuracy and stability of variable selection ...
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substanti
Vuoristo-Myllys, Salla; Lipsanen, Jari; Lahti, Jari; Kalska, Hely; Alho, Hannu
2014-03-01
The opioid antagonist naltrexone, combined with cognitive behavioural therapy (CBT), has proven efficacious for patients with alcohol dependence, but studies examining how this treatment works in a naturalistic treatment setting are lacking. This study examined predictors of the outcome of targeted naltrexone and CBT in a real-life outpatient setting. Participants were 315 patients who attended a treatment program providing CBT combined with the targeted use of naltrexone. Mixture models for estimating developmental trajectories were used to examine change in patients' alcohol consumption and symptoms of alcohol craving from treatment entry until the end of the treatment (20 weeks) or dropout. Predictors of treatment outcome were examined with multinomial logistic regression analyses. Minimal exclusion criteria were applied to enhance the generalizability of the findings. Regular drinking pattern, having no history of previous treatments, and high-risk alcohol consumption level before the treatment were associated with less change in alcohol use during the treatment. The patients with low-risk alcohol consumption level before the treatment had the most rapid reduction in alcohol craving. Patients who drank more alcohol during the treatment had lower adherence to naltrexone. Medication non-adherence is a major barrier to naltrexone's effectiveness in a real-life treatment setting. Patients with more severe alcohol problems may need more intensive treatment to achieve better treatment outcomes in real-world treatment settings.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Contributing factors to the use of health-related websites.
Hong, Traci
2006-03-01
This study explicates the influence of audience factors on website credibility and the subsequent effect that credibility has on the intention to revisit a site. It does so in an experimental setting in which participants were given two health-related search tasks. Reliance on the web for health-related information positively influenced website credibility in both searches. Knowledge was a significant predictor for the search task that required more cognitive ability. Of the credibility dimensions, trust/expertise and depth were significant predictors of intention to revisit a site in both searches. Fairness and goodwill were nonsignificant predictors in both searches.
Husek, Petr; Pacovsky, Jaroslav; Chmelarova, Marcela; Podhola, Miroslav; Brodak, Milos
2017-06-01
Genetic and epigenetic alterations play an important role in urothelial cancer pathogenesis. Deeper understanding of these processes could help us achieve better diagnosis and management of this life-threatening disease. The aim of this research was to evaluate the methylation status of selected tumor suppressor genes for predicting BCG response in patients with high grade non-muscle-invasive bladder tumor (NMIBC). We retrospectively evaluated 82 patients with high grade non-muscle-invasive bladder tumor (stage Ta, T1, CIS) who had undergone BCG instillation therapy. We compared epigenetic methylation status in BCG-responsive and BCG-failure groups. We used the MS-MLPA (Methylation-Specific Multiplex Ligation-Dependent Probe Amplification) probe sets ME001 and ME004. The control group was 13 specimens of normal urothelium (bladder tissue). Newly identified methylations in high grade NMIBC were found in MUS81a, NTRK1 and PCCA. The methylation status of CDKN2B (P=0.00312) and MUS81a (P=0.0191) is associated with clinical outcomes of BCG instillation therapy response. CDKN2B and MUS81a unmethylation was found in BCG failure patients. The results show that the methylation status of selected tumor suppressor genes (TSGs) has the potential for predicting BCG response in patients with NMIBC high grade tumors. Tumor suppressor genes such as CDKN2B, MUS81a, PFM-1, MSH6 and THBS1 are very promising for future research.
Wang, Xiao; Zhang, Jun; Li, Guo-Zheng
2015-01-01
Predicting bacterial protein subcellular locations using computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, the majority of these methods can only deal with single-location proteins. Unfortunately, many multi-location proteins exist in bacterial cells. Moreover, multi-location proteins have special biological functions that can help the development of new drugs. It is therefore necessary to develop new computational methods for accurately predicting subcellular locations of multi-location bacterial proteins. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively. 
The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-01
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-11
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
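For context, the baseline that LEGO is compared against, over-representation analysis by Fisher's exact test, reduces to a one-sided hypergeometric tail probability on gene counts. The following Python sketch shows that baseline only. The function name `ora_p_value`, its parameter names, and the toy counts are assumptions for illustration; LEGO's network-based gene weights are not implemented here.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X is the number of gene set members among `n_selected` genes drawn without
    replacement from a universe of `n_universe` genes, of which `n_in_set`
    belong to the gene set (hypergeometric distribution).
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 10 genes in the universe, 4 in the pathway,
# 3 genes in the input list, 2 of which fall in the pathway.
toy_p = ora_p_value(10, 4, 3, 2)
```

Every gene contributes equally to this count-based test, which is the limitation the abstract refers to when it says such methods reduce a pathway to a list of genes.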
PNAC: a protein nucleolar association classifier
2011-01-01
Background Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction of each protein pool that is nucleolus-associated nor whether their association is permanent or conditional. Results To describe the dynamic localisation of proteins in the nucleolus, we investigated the extent of nucleolar association of proteins by first collating an extensively curated literature-derived dataset. This dataset then served to train a probabilistic predictor which integrates gene and protein characteristics. Unlike most previous experimental and computational studies of the nucleolar proteome that produce large static lists of nucleolar proteins regardless of their extent of nucleolar association, our predictor models the fluidity of the nucleolus by considering different classes of nucleolar-associated proteins. The new method predicts all human proteins as either nucleolar-enriched, nucleolar-nucleoplasmic, nucleolar-cytoplasmic or non-nucleolar. Leave-one-out cross validation tests reveal sensitivity values for these four classes ranging from 0.72 to 0.90 and positive predictive values ranging from 0.63 to 0.94. The overall accuracy of the classifier was measured to be 0.85 on an independent literature-based test set and 0.74 using a large independent quantitative proteomics dataset. While the three nucleolar-association groups display vastly different Gene Ontology biological process signatures and evolutionary characteristics, they collectively represent the most well characterised nucleolar functions. Conclusions Our proteome-wide classification of nucleolar association provides a novel representation of the dynamic content of the nucleolus. 
This model of nucleolar localisation thus increases the coverage while providing accurate and specific annotations of the nucleolar proteome. It will be instrumental in better understanding the central role of the nucleolus in the cell and its interaction with other subcellular compartments. PMID:21272300
Douville, Christopher; Masica, David L.; Stenson, Peter D.; Cooper, David N.; Gygax, Derek M.; Kim, Rick; Ryan, Michael
2015-01-01
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in‐frame and frameshift indels (VEST‐indel) as pathogenic or benign. We apply 24 features, including a new “PubMed” feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false‐positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta‐predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta‐predictor with improved performance over any individual method. PMID:26442818
Douville, Christopher; Masica, David L; Stenson, Peter D; Cooper, David N; Gygax, Derek M; Kim, Rick; Ryan, Michael; Karchin, Rachel
2016-01-01
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method. © 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.
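The meta-predictor search mentioned at the end of this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions: it enumerates only the pure AND and pure OR of each subset of classifiers (the abstract's "all possible" combinations may also include nested mixtures), and the classifier names `A`, `B`, `C`, their calls, and the labels are toy values invented for illustration.

```python
from itertools import combinations

def evaluate(calls, labels):
    """Return (sensitivity, specificity) of boolean calls against labels."""
    tp = sum(c and y for c, y in zip(calls, labels))
    tn = sum((not c) and (not y) for c, y in zip(calls, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg

def meta_predictors(predictions):
    """Yield (name, calls) for the AND and the OR of every subset of
    two or more classifiers. `predictions` maps a classifier name to a
    list of boolean calls (True = pathogenic)."""
    names = sorted(predictions)
    n = len(predictions[names[0]])
    for r in range(2, len(names) + 1):
        for subset in combinations(names, r):
            yield (" AND ".join(subset),
                   [all(predictions[m][i] for m in subset) for i in range(n)])
            yield (" OR ".join(subset),
                   [any(predictions[m][i] for m in subset) for i in range(n)])

# Toy data (hypothetical): three classifiers scored on six variants.
labels = [True, True, True, False, False, False]
predictions = {
    "A": [True, True, False, True, False, False],
    "B": [True, False, True, False, True, False],
    "C": [True, True, True, True, True, False],
}
for name, calls in meta_predictors(predictions):
    sens, spec = evaluate(calls, labels)
    print(f"{name:15s} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Conjunctions can only raise specificity at the cost of sensitivity, and disjunctions do the reverse, which is why a search over combinations can find a better trade-off than any single classifier.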
A support vector machine based test for incongruence between sets of trees in tree space
2012-01-01
Background The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. Results Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two set of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. Conclusions The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. 
The software GeneOut is freely available under the GNU public license. PMID:22909268
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
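To make the WLS idea concrete, here is a minimal sketch of a weighted least-squares fit for a single predictor, using the closed-form weighted-moment solution. It assumes the weights are already known (for example, inverse variances); the abstract stresses that in practice they must be estimated, which this sketch does not attempt. The function name and data are hypothetical.

```python
def wls_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Minimizes sum(w_i * (y_i - intercept - slope * x_i) ** 2).
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data whose scatter grows with x; the weights assume
# the error variance is proportional to x squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.8, 6.5, 7.4, 11.0]
w = [1.0 / xi ** 2 for xi in x]
intercept, slope = wls_fit(x, y, w)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

Setting every weight to 1 recovers ordinary least squares, so the effect of the weighting can be checked by comparing the two fits.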
Counting Dependence Predictors
2008-05-02
sophisticated dependence predictors, such as Store Sets, have been tightly coupled to the fetch and execution streams, requiring global knowledge of...applicable to any architecture with distributed fetch and distributed memory banks, in which the comprehensive event completion knowledge needed by previous...adapted for Core Fusion [5] by giving its steering management unit (SMU) the responsibilities of the controller core. While Ipek et al. describe how a
The use of genetic programming to develop a predictor of swash excursion on sandy beaches
NASA Astrophysics Data System (ADS)
Passarella, Marinella; Goldstein, Evan B.; De Muro, Sandro; Coco, Giovanni
2018-02-01
We use genetic programming (GP), a type of machine learning (ML) approach, to predict the total and infragravity swash excursion using previously published data sets that have been used extensively in swash prediction studies. Three previously published works with a range of new conditions are added to this data set to extend the range of measured swash conditions. Using this newly compiled data set we demonstrate that a ML approach can reduce the prediction errors compared to well-established parameterizations and therefore it may improve coastal hazards assessment (e.g. coastal inundation). Predictors obtained using GP can also be physically sound and replicate the functionality and dependencies of previous published formulas. Overall, we show that ML techniques are capable of both improving predictability (compared to classical regression approaches) and providing physical insight into coastal processes.
Reimers, Marlies S; Kuppen, Peter J K; Lee, Mark; Lopatin, Margarita; Tezcan, Haluk; Putter, Hein; Clark-Langone, Kim; Liefers, Gerrit Jan; Shak, Steve; van de Velde, Cornelis J H
2014-11-01
The 12-gene Recurrence Score assay is a validated predictor of recurrence risk in stage II and III colon cancer patients. We conducted a prospectively designed study to validate this assay for prediction of recurrence risk in stage II and III rectal cancer patients from the Dutch Total Mesorectal Excision (TME) trial. RNA was extracted from fixed paraffin-embedded primary rectal tumor tissue from stage II and III patients randomized to TME surgery alone, without (neo)adjuvant treatment. Recurrence Score was assessed by quantitative real time-polymerase chain reaction using previously validated colon cancer genes and algorithm. Data were analysed by Cox proportional hazards regression, adjusting for stage and resection margin status. All statistical tests were two-sided. Recurrence Score predicted risk of recurrence (hazard ratio [HR] = 1.57, 95% confidence interval [CI] = 1.11 to 2.21, P = .01), risk of distant recurrence (HR = 1.50, 95% CI = 1.04 to 2.17, P = .03), and rectal cancer-specific survival (HR = 1.64, 95% CI = 1.15 to 2.34, P = .007). The effect of Recurrence Score was most prominent in stage II patients and attenuated with more advanced stage (P(interaction) ≤ .007 for each endpoint). In stage II, five-year cumulative incidence of recurrence ranged from 11.1% in the predefined low Recurrence Score group (48.5% of patients) to 43.3% in the high Recurrence Score group (23.1% of patients). The 12-gene Recurrence Score is a predictor of recurrence risk and cancer-specific survival in rectal cancer patients treated with surgery alone, suggesting a similar underlying biology in colon and rectal cancers. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Virulotyping of Shigella spp. isolated from pediatric patients in Tehran, Iran.
Ranjbar, Reza; Bolandian, Masomeh; Behzadi, Payam
2017-03-01
Shigellosis is a significant infectious disease with high morbidity and mortality among children worldwide. In this survey, the prevalence of four important virulence genes (ial, ipaH, set1A, and set1B) was investigated among Shigella strains, and the related gene profiles were identified. Stool specimens were collected over 3 years (2008-2010) from children suspected of having shigellosis who were referred to two hospitals in Tehran, Iran. Shigella spp. were identified through microbiological and serological tests and then subjected to PCR for virulotyping. Shigella sonnei ranked first (65.5%), followed by Shigella flexneri (25.9%), Shigella boydii (6.9%), and Shigella dysenteriae (1.7%). The ial gene was the most frequent virulence gene among the isolated strains, followed by ipaH, set1B, and set1A. S. flexneri possessed all of the studied virulence genes (ial 65.51%, ipaH 58.62%, set1A 12.07%, and set1B 22.41%). Moreover, the virulence gene profiles ial, ial-ipaH, ial-ipaH-set1B, and ial-ipaH-set1B-set1A were identified among the isolated Shigella spp. strains. The pattern of virulence genes has changed in the Shigella strains isolated in this study, with the ial gene ranking first and ipaH second.
Generated effect modifiers (GEM’s) in randomized clinical trials
Petkova, Eva; Tarpey, Thaddeus; Su, Zhe; Ogden, R. Todd
2017-01-01
In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an “effect modifier”. Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration. PMID:27465235
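The definition of an effect modifier given in this abstract can be written out explicitly. The display below is a sketch for the simplest case, assuming two treatment arms coded $T \in \{0, 1\}$, a vector of baseline characteristics $\mathbf{x}$, and a linear outcome model; the symbols $\alpha$, $\beta_0, \dots, \beta_3$ and $z$ are notation chosen here, and the abstract does not specify the criteria the article uses to choose $\alpha$.

```latex
% Composite variable: a linear combination of baseline characteristics
z = \alpha^{\top} \mathbf{x}

% Outcome model with a treatment-by-composite interaction
y = \beta_0 + \beta_1 T + \beta_2 z + \beta_3 \, (T \cdot z) + \varepsilon

% z is an effect modifier when the interaction coefficient is non-zero
\beta_3 \neq 0
```

Under this coding the slope of $y$ on $z$ is $\beta_2$ in the reference arm and $\beta_2 + \beta_3$ in the other arm, so $\beta_3$ measures how much the relationship between the composite and the outcome differs by treatment.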
Spatio-temporal Bayesian model selection for disease mapping
Carroll, R; Lawson, AB; Faes, C; Kirby, RS; Aregay, M; Watjou, K
2016-01-01
Spatio-temporal analysis of small area health data often involves choosing a fixed set of predictors prior to the final model fit. In this paper, we propose a spatio-temporal approach of Bayesian model selection to implement model selection for certain areas of the study region as well as certain years in the study time line. Here, we examine the usefulness of this approach by way of a large-scale simulation study accompanied by a case study. Our results suggest that a special case of the model selection methods, a mixture model allowing a weight parameter to indicate if the appropriate linear predictor is spatial, spatio-temporal, or a mixture of the two, offers the best option to fitting these spatio-temporal models. In addition, the case study illustrates the effectiveness of this mixture model within the model selection setting by easily accommodating lifestyle, socio-economic, and physical environmental variables to select a predominantly spatio-temporal linear predictor. PMID:28070156
Li, Qiyuan; Eklund, Aron C.; Juul, Nicolai; Haibe-Kains, Benjamin; Workman, Christopher T.; Richardson, Andrea L.; Szallasi, Zoltan; Swanton, Charles
2010-01-01
Background Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold[1], [2]. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance with clinical outcome. Methodology/Principal Findings Firstly, ER status was discriminated by fitting the bimodal expression of ESR1 to a mixed Gaussian model. The discriminative power of ESR1 suggested bimodal expression as an efficient way to stratify breast cancer; therefore we identified a set of genes whose expression was both strongly bimodal, mimicking ESR expression status, and highly expressed in breast epithelial cell lines, to derive a 23-gene ER expression signature-based classifier. We assessed our classifiers in seven published breast cancer cohorts by comparing the gene expression-based ER status to IHC-based ER status as a predictor of clinical outcome in both untreated and tamoxifen treated cohorts. In untreated breast cancer cohorts, the 23 gene signature-based ER status provided significantly improved prognostic power compared to IHC-based ER status (P = 0.006). In tamoxifen-treated cohorts, the 23 gene ER expression signature predicted clinical outcome (HR = 2.20, P = 0.00035). 
These complementary ER signature-based strategies estimated that between 15.1% and 21.8% patients of IHC-based negative ER status would be classified with ER positive breast cancer. Conclusion/Significance Expression-based ER status classification may complement IHC to minimise false negative ER status classification and optimise patient stratification for endocrine therapies. PMID:21152022
Khan, Burhan A.; Robinson, Renee; Fohner, Alison E.; Muzquiz, LeeAnna I.; Schilling, Brian D.; Beans, Julie A.; Olnes, Matthew J.; Trawicki, Laura; Frydenlund, Holly; Laukes, Cindi; Beatty, Patrick; Phillips, Brian; Nickerson, Deborah; Howlett, Kevin; Dillard, Denise A.; Thornton, Timothy A.; Thummel, Kenneth E.
2018-01-01
Abstract Despite evidence that pharmacogenetics can improve tamoxifen pharmacotherapy, there are few studies with American Indian and Alaska Native (AIAN) people. We examined variation in cytochrome P450 (CYP) genes (CYP2D6, CYP3A4, CYP3A5, and CYP2C9) and tamoxifen biotransformation in AIAN patients with breast cancer (n = 42) from the Southcentral Foundation in Alaska and the Confederated Salish and Kootenai Tribes in Montana. We tested for associations between CYP diplotypes and plasma concentrations of tamoxifen and metabolites. Only the CYP2D6 variation was significantly associated with concentrations of endoxifen (P = 0.0008) and 4‐hydroxytamoxifen (P = 0.0074), tamoxifen's principal active metabolites, as well as key metabolic ratios. The CYP2D6 was also the most significant predictor of active metabolites and metabolic ratios in a multivariate regression model, including all four genes as predictors, with minor roles for other CYP genes. In AIAN populations, CYP2D6 is the largest contributor to tamoxifen bioactivation, illustrating the importance of validating pharmacogenetic testing for therapy optimization in an understudied population. PMID:29436156
Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng
2017-08-01
Neuroticism is a fundamental personality trait with significant genetic determinants. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from genome-wide association studies (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of the GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single-gene analysis identified 6 significant genes for neuroticism: MSRA (p value = 2.27 × 10⁻¹⁰), MGC57346 (p value = 6.92 × 10⁻⁷), BLK (p value = 1.01 × 10⁻⁶), XKR6 (p value = 1.11 × 10⁻⁶), C17ORF69 (p value = 1.12 × 10⁻⁶), and KIAA1267 (p value = 4.00 × 10⁻⁶). Gene set enrichment analysis detected a significant association for the Chr8p23 gene set (false discovery rate = 0.033). Our results provide novel clues for studies of the genetic mechanisms of neuroticism. Copyright © 2017. Published by Elsevier Inc.
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
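Of the standard meta-analysis options listed in this abstract, Fisher's method is the simplest to show in a few lines. The sketch below combines independent p-values using the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. It is a generic illustration, not ExAtlas code; the function name and input p-values are hypothetical, and the closed-form tail used here applies only because the degrees of freedom are even.

```python
from math import exp, factorial, log

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    For X ~ chi-square with 2k degrees of freedom, the upper tail is
    P(X > x) = exp(-x / 2) * sum_{i=0}^{k-1} (x / 2) ** i / i!
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # this is X / 2
    return exp(-half_stat) * sum(
        half_stat ** i / factorial(i) for i in range(k)
    )

# Hypothetical p-values for one gene across three independent data sets.
print(f"{fisher_combined_p([0.04, 0.10, 0.20]):.4f}")
```

The method assumes the input p-values come from independent tests and that none of them is exactly zero; correlated data sets would call for a different combination rule.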
Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis
Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill
2016-01-01
Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells and comparative methods examining adherent and sphere cells are widely used to investigate mechanism underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. Additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. 
Distinct from other meta-analysis, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956
Docking and scoring protein complexes: CAPRI 3rd Edition.
Lensink, Marc F; Méndez, Raúl; Wodak, Shoshana J
2007-12-01
The performance of methods for predicting protein-protein interactions at the atomic scale is assessed by evaluating blind predictions performed during 2005-2007 as part of Rounds 6-12 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). These Rounds also included a new scoring experiment, where a larger set of models contributed by the predictors was made available to groups developing scoring functions. These groups scored the uploaded set and submitted their own best models for assessment. The structures of nine protein complexes including one homodimer were used as targets. These targets represent biologically relevant interactions involved in gene expression, signal transduction, RNA, or protein processing and membrane maintenance. For all the targets except one, predictions started from the experimentally determined structures of the free (unbound) components or from models derived by homology, making it mandatory for docking methods to model the conformational changes that often accompany association. In total, 63 groups and eight automatic servers, a substantial increase from previous years, submitted docking predictions, of which 1994 were evaluated here. Fifteen groups submitted 305 models for five targets in the scoring experiment. Assessment of the predictions reveals that 31 different groups produced models of acceptable and medium accuracy-but only one high accuracy submission-for all the targets, except the homodimer. In the latter, none of the docking procedures reproduced the large conformational adjustment required for correct assembly, underscoring yet again that handling protein flexibility remains a major challenge. In the scoring experiment, a large fraction of the groups attained the set goal of singling out the correct association modes from incorrect solutions in the limited ensembles of contributed models. 
But in general they seemed unable to identify the best models, indicating that current scoring methods are probably not sensitive enough. With the increased focus on protein assemblies, in particular by structural genomics efforts, the growing community of CAPRI predictors is engaged more actively than ever in the development of better scoring functions and means of modeling conformational flexibility, which hold promise for much progress in the future. (c) 2007 Wiley-Liss, Inc.
TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.
Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun
2016-11-01
The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the "intermediate" decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys .
Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis.
Cho, Seoae; Kim, Haseong; Oh, Sohee; Kim, Kyunga; Park, Taesung
2009-12-15
The current trend in genome-wide association studies is to identify regions where the true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. However, many challenges exist in detecting disease-causing genes among the thousands of SNPs. Examples include multicollinearity and multiple testing issues, especially when a large number of correlated SNPs are simultaneously tested. Multicollinearity can often occur when predictor variables in a multiple regression model are highly correlated, and can cause imprecise estimation of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that allows one to address multicollinearity. At Step 1, the single-marker association analysis was conducted to screen SNPs. At Step 2, the multiple-marker association was scanned based on the elastic-net regularization. The proposed approach was applied to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the selected SNPs at the screening step are located mostly on chromosome 6, the elastic-net approach identified putative RA-related SNPs on other chromosomes in an increased proportion. For some of those putative RA-related SNPs, we identified the interactions with sex, a well known factor affecting RA susceptibility.
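For readers unfamiliar with the elastic-net regularization used at Step 2, its general form is sketched below. The abstract does not state the loss function or the parameterization the authors used, so $L(\beta)$ is a placeholder for whatever loss fits the data (for case-control data, a logistic negative log-likelihood would be a natural assumption), and $\lambda_1$, $\lambda_2$ are the two non-negative tuning parameters of the standard elastic-net penalty.

```latex
\hat{\beta} = \arg\min_{\beta}
  \left\{ L(\beta)
    + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \lambda_2 \sum_{j=1}^{p} \beta_j^{2} \right\},
\qquad \lambda_1 \ge 0, \; \lambda_2 \ge 0
```

The $\lambda_1$ term sets some coefficients exactly to zero (variable selection, as in the lasso), while the $\lambda_2$ term shrinks correlated predictors toward each other (as in ridge regression), which is why the combination is suited to highly correlated SNPs.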
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2014-01-01
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
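The mixture-model idea in the abstract above can be written out schematically. The following is only a sketch under stated assumptions, not the authors' exact formulation: it assumes each gene g has a vector of z-scores z_g across K data sets, that the hidden state s_g (negative change, no change, positive change) is shared across data sets, and that the symbols \pi_j, \mu_j and \Sigma_j are illustrative names that do not appear in the abstract.

```latex
% Three-component multivariate normal mixture for the K-vector z_g
% (assumed notation; \phi_K is the K-variate normal density)
f(z_g) \;=\; \sum_{j \in \{-,\,0,\,+\}} \pi_j \,\phi_K\!\left(z_g;\ \mu_j,\ \Sigma_j\right),
\qquad \sum_{j} \pi_j = 1 .

% Posterior probability that gene g is in state j, by Bayes' rule
P\!\left(s_g = j \mid z_g\right) \;=\;
\frac{\pi_j \,\phi_K\!\left(z_g;\ \mu_j,\ \Sigma_j\right)}
     {\sum_{l \in \{-,\,0,\,+\}} \pi_l \,\phi_K\!\left(z_g;\ \mu_l,\ \Sigma_l\right)} .
```

Under these assumptions, posterior probabilities of this kind for the genes in a set would be the raw material for a gene-set-level enrichment probability and its false discovery rate; how the abstract's method combines them is not specified here.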
Langer, David A.; Wood, Jeffrey J.; Wood, Patricia A.; Garland, Ann F.; Landsverk, John; Hough, Richard L.
2015-01-01
Researchers have consistently documented a gap between the large number of US youth meeting criteria for a mental health disorder with significant associated impairment, and the comparatively few youth receiving services. School-based mental health care may address the need–services gap by offering services more equitably to youth in need, irrespective of family economic resources, availability of transportation, and other factors that can impede access to community clinics. However, diagnoses alone do not fully capture the severity of an individual's mental health status and need for services. Studying service use only in relation to diagnoses may restrict our understanding of the degree to which service use is reflective of service need, and inhibit our ability to compare school and non-school-based outpatient settings on their responsiveness to service need. The present study evaluated predictors of mental health service use in school- and community-based settings for youth who had had an active case in one of two public sectors of care, comparing empirically-derived dimensional measurements of youth mental health service need and impairment ratings against non-need variables (e.g., ethnicity, income). Three dimensions of youth mental health service need were identified. Mental health service need and non-need variables each played a significant predictive role. Parent-rated impairment was the strongest need-based predictor of service use across settings. The impact of non-need variables varied by service setting, with parental income having a particularly noticeable effect on school-based services. Across time, preceding service use and impairment each significantly predicted future service use. PMID:26442131
Langer, David A; Wood, Jeffrey J; Wood, Patricia A; Garland, Ann F; Landsverk, John; Hough, Richard L
2015-09-01
Researchers have consistently documented a gap between the large number of US youth meeting criteria for a mental health disorder with significant associated impairment, and the comparatively few youth receiving services. School-based mental health care may address the need-services gap by offering services more equitably to youth in need, irrespective of family economic resources, availability of transportation, and other factors that can impede access to community clinics. However, diagnoses alone do not fully capture the severity of an individual's mental health status and need for services. Studying service use only in relation to diagnoses may restrict our understanding of the degree to which service use is reflective of service need, and inhibit our ability to compare school and non-school-based outpatient settings on their responsiveness to service need. The present study evaluated predictors of mental health service use in school- and community-based settings for youth who had had an active case in one of two public sectors of care, comparing empirically-derived dimensional measurements of youth mental health service need and impairment ratings against non-need variables (e.g., ethnicity, income). Three dimensions of youth mental health service need were identified. Mental health service need and non-need variables each played a significant predictive role. Parent-rated impairment was the strongest need-based predictor of service use across settings. The impact of non-need variables varied by service setting, with parental income having a particularly noticeable effect on school-based services. Across time, preceding service use and impairment each significantly predicted future service use.
Predictors of Spoken Language Learning
ERIC Educational Resources Information Center
Wong, Patrick C. M.; Ettlinger, Marc
2011-01-01
We report two sets of experiments showing that the large individual variability in language learning success in adults can be attributed to neurophysiological, neuroanatomical, cognitive, and perceptual factors. In the first set of experiments, native English-speaking adults learned to incorporate lexically meaningful pitch patterns in words. We…
Dwivedi, Bhakti; Kowalski, Jeanne
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.
Dwivedi, Bhakti
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010
Binder, Elisabeth B.; Bradley, Rebekah G.; Liu, Wei; Epstein, Michael P.; Deveau, Todd C.; Mercer, Kristina B.; Tang, Yilang; Gillespie, Charles F.; Heim, Christine M.; Nemeroff, Charles B.; Schwartz, Ann C.; Cubells, Joseph F.; Ressler, Kerry J.
2008-01-01
Context In addition to trauma exposure, other factors contribute to risk for development of posttraumatic stress disorder (PTSD) in adulthood. Both genetic and environmental factors are contributory, with child abuse providing significant risk liability. Objective To increase understanding of genetic and environmental risk factors as well as their interaction in the development of PTSD by gene × environment interactions of child abuse, level of non–child abuse trauma exposure, and genetic polymorphisms at the stress-related gene FKBP5. Design, Setting, and Participants A cross-sectional study examining genetic and psychological risk factors in 900 non psychiatric clinic patients (762 included for all genotype studies) with significant levels of childhood abuse as well as non–child abuse trauma using a verbally presented survey combined with single-nucleotide polymorphism (SNP) genotyping. Participants were primarily urban, low-income, black (>95%) men and women seeking care in the general medical care and obstetrics-gynecology clinics of an urban public hospital in Atlanta, Georgia, between 2005 and 2007. Main Outcome Measures Severity of adult PTSD symptomatology, measured with the modified PTSD Symptom Scale, non–child abuse (primarily adult) trauma exposure and child abuse measured using the traumatic events inventory and 8 SNPs spanning the FKBP5 locus. Results Level of child abuse and non–child abuse trauma each separately predicted level of adult PTSD symptomatology (mean [SD], PTSD Symptom Scale for no child abuse, 8.03 [10.48] vs ≥2 types of abuse, 20.93 [14.32]; and for no non–child abuse trauma, 3.58 [6.27] vs ≥4 types, 16.74 [12.90]; P<.001). 
Although FKBP5 SNPs did not directly predict PTSD symptom outcome or interact with level of non–child abuse trauma to predict PTSD symptom severity, 4 SNPs in the FKBP5 locus significantly interacted (rs9296158, rs3800373, rs1360780, and rs9470080; minimum P=.0004) with the severity of child abuse to predict level of adult PTSD symptoms after correcting for multiple testing. This gene × environment interaction remained significant when controlling for depression severity scores, age, sex, levels of non–child abuse trauma exposure, and genetic ancestry. This genetic interaction was also paralleled by FKBP5 genotype-dependent and PTSD-dependent effects on glucocorticoid receptor sensitivity, measured by the dexamethasone suppression test. Conclusions Four SNPs of the FKBP5 gene interacted with severity of child abuse as a predictor of adult PTSD symptoms. There were no main effects of the SNPs on PTSD symptoms and no significant genetic interactions with level of non–child abuse trauma as predictor of adult PTSD symptoms, suggesting a potential gene-childhood environment interaction for adult PTSD. PMID:18349090
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarvik, G.P.; Larson, E.B.; Goddard, K.
1996-01-01
The ε4 allele of the apolipoprotein E locus (APOE) has been found to be an important predictor of Alzheimer disease (AD). However, linkage analysis has not clarified the role of APOE in the transmission of AD. The results of the current study provide evidence that the pattern of transmission of memory disorders differs in nuclear families in which the AD-affected proband did carry an ε4 allele versus those families in which the AD-affected proband did not carry an ε4 allele. Further, risk of AD due to APOE genotype in the probands is modified by family history of memory disorders, suggesting gene-by-gene interactions. Family history remained a significant predictor of AD for affected probands with some, but not all, APOE genotypes in a logistic regression analysis. Though nonadditive in the prediction of AD, APOE genotype and family history acted additively in the prediction of age at AD onset. The results of complex segregation analysis were inconsistent with Mendelian segregation of memory disorders both in families of affected probands who did or did not carry an ε4 allele, yet these two groups had significantly different parameter estimates for their transmission models. These results are consistent with gene-by-gene interactions, but also could result from common elements in the familial environment. 41 refs., 1 fig., 7 tabs.
Office-based treatment and outcomes for febrile infants with clinically diagnosed bronchiolitis.
Luginbuhl, Lynn M; Newman, Thomas B; Pantell, Robert H; Finch, Stacia A; Wasserman, Richard C
2008-11-01
The goals were to describe the (1) frequency of sepsis evaluation and empiric antibiotic treatment, (2) clinical predictors of management, and (3) serious bacterial illness frequency for febrile infants with clinically diagnosed bronchiolitis seen in office settings. The Pediatric Research in Office Settings network conducted a prospective cohort study of 3066 febrile infants (<3 months of age with temperatures ≥38°C) in 219 practices in 44 states. We compared the frequency of sepsis evaluation, parenteral antibiotic treatment, and serious bacterial illness in infants with and without clinically diagnosed bronchiolitis. We identified predictors of sepsis evaluation and parenteral antibiotic treatment in infants with bronchiolitis by using logistic regression models. Practitioners were less likely to perform a complete sepsis evaluation, urine testing, and cerebrospinal fluid culture and to administer parenteral antibiotic treatment for infants with bronchiolitis, compared with those without bronchiolitis. Significant predictors of sepsis evaluation in infants with bronchiolitis included younger age, higher maximal temperature, and respiratory syncytial virus testing. Predictors of parenteral antibiotic use included initial ill appearance, age of <30 days, higher maximal temperature, and general signs of infant distress. Among infants with bronchiolitis (N = 218), none had serious bacterial illness and those with respiratory distress signs were less likely to receive parenteral antibiotic treatment. Diagnoses among 2848 febrile infants without bronchiolitis included bacterial meningitis (n = 14), bacteremia (n = 49), and urinary tract infection (n = 167). In office settings, serious bacterial illness in young febrile infants with clinically diagnosed bronchiolitis is uncommon. Limited testing for bacterial infections seems to be an appropriate management strategy.
NASA Astrophysics Data System (ADS)
Madonna, Erica; Ginsbourger, David; Martius, Olivia
2018-05-01
In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure, however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month) and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear and the month. This model well captures the intra-annual variability and slightly underestimates its inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase of the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
Hvidhjelm, Jacob; Sestoft, Dorte; Skovgaard, Lene Theil; Bue Bjorner, Jakob
2014-11-01
Violence and aggressive behavior within psychiatric facilities are serious work environment problems, which have negative consequences for both patients and staff. It is therefore of great importance to reduce both the number and the severity of these violent incidents to improve quality of care. This study aimed to evaluate the specificity and sensitivity of the Brøset Violence Checklist (BVC) as a predictor of violent incidents for Danish forensic psychiatry patients. A total of 156 patients were assessed three times daily with the BVC for 24 months. All aggressive or violent incidents were recorded using the Staff Observation Aggression Scale-Revised (SOAS-R). SOAS-R scores of 9 or more defined violent incidents. Data were analyzed using standard logistic regression models as well as models incorporating a random person effect. We used receiver operating characteristic (ROC) curve analysis to evaluate different BVC thresholds. Of a total of 139,579 BVC registrations we found 1999 scores above 0 and 419 violent incidents. The BVC score was a strong predictor of violence. For the standard cut-off point of 3, specificity was 0.997 and sensitivity was 0.656. For the general risk of violence seen in this study, the risk of violence given a BVC score > 3 (positive predictive value) was 37.2%, and the risk of violence given a BVC score < 3 (negative predictive value) was 0.1%. The BVC showed satisfactory specificity and sensitivity as a predictor of the short-term risk of violence against staff and others by patients in a forensic setting.
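The link between sensitivity, specificity and the predictive values quoted above can be illustrated with Bayes' rule. The sketch below is not the authors' analysis: it assumes, for illustration only, that the base rate of violence equals 419 incidents divided by 139,579 registrations, and the function name `predictive_values` is hypothetical. Because the inputs are the rounded figures from the abstract, the result only approximates the reported 37.2%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(violence | positive score)
    npv = true_neg / (true_neg + false_neg)   # P(no violence | negative score)
    return ppv, npv


# Assumed base rate: incidents per registration (an illustrative simplification).
base_rate = 419 / 139579
ppv, npv = predictive_values(sensitivity=0.656, specificity=0.997,
                             prevalence=base_rate)
print(f"PPV ~ {ppv:.3f}, risk after a negative score ~ {1 - npv:.4f}")
```

With these rounded inputs the positive predictive value comes out near 0.40 and the risk after a negative score near 0.1%. The example shows why a test with very high specificity can still have a modest positive predictive value when the event is rare: false positives from the large non-violent majority are comparable in number to true positives.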
Ellison, Marsha Langer; Schutt, Russell K; Glickman, Mark E; Schultz, Mark R; Chinman, Matthew; Jensen, Kristina; Mitchell-Miland, Chantele; Smelson, David; Eisen, Susan
2016-09-01
Patterns and predictors of engagement in peer support services were examined among 50 previously homeless veterans with co-occurring mental health conditions and substance use histories receiving services from the Veterans Health Administration supported housing program. Veteran peer specialists were trained to deliver sessions focusing on mental health and substance use recovery to veterans for an intended 1-hr weekly contact over 9 months. Trajectories of peer engagement over the study's duration are summarized. A mixed-effects log-linear model of the rate of peer engagement is tested with three sets of covariates representing characteristics of the veterans. These sets were demographics, mental health and substance use status, and indicators of community participation and support. Data indicate that veterans engaged with peers about once per month rather than the intended once per week. However, frequency of contacts varied greatly. The best predictor of engagement was time, with most contacts occurring within the first 6 months. No other veteran characteristic was a statistically significant predictor of engagement. Older veterans tended to have higher rates of engagement with peer supporters. Planners of peer support services could consider yardsticks of monthly services up to 6 months. Peer support services need a flexible strategy with varying levels of intensity according to need. Peer support services will need to be tailored to better engage younger veterans. Future research should consider other sources of variation in engagement with peer support such as characteristics of the peer supporters and service content and setting. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Predictors and enablers of mental health nurses' family-focused practice.
Grant, Anne; Reupert, Andrea; Maybery, Darryl; Goodyear, Melinda
2018-06-27
Family-focused practice improves outcomes for families where parents have a mental illness. However, there is limited understanding regarding the factors that predict and enable these practices. This study aimed to identify factors that predict and enable mental health nurses' family-focused practice. A sequential mixed methods design was used. A total of 343 mental health nurses, practicing in 12 mental health services (in acute inpatient and community settings), throughout Ireland completed the Family Focused Mental Health Practice Questionnaire, measuring family-focused behaviours and other factors that impact family-focused activities. Hierarchical multiple regression identified 14 predictors of family-focused practice. The most important predictors noted were nurses' skill and knowledge, own parenting experience, and work setting (i.e. community). Fourteen nurses, who achieved high scores on the questionnaire, subsequently participated in semistructured interviews to elaborate on enablers of family-focused practice. Participants described drawing on their parenting experiences to normalize parenting challenges, encouraging service users to disclose parenting concerns, and promoting trust. The opportunity to visit a service user's home allowed them to observe how the parent was coping and forge a close relationship with them. Nurses' personal characteristics and work setting are key factors in determining family-focused practice. This study extends current research by clearly highlighting predictors of family-focused practice and reporting how various enablers promoted family-focused practice. The capacity of nurses to support families has training, organizational and policy implications within adult mental health services in Ireland and elsewhere. © 2018 Australian College of Mental Health Nurses Inc.
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.
Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J
2017-01-01
The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
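One of the operations named above, finding genes common to multiple gene sets, reduces to counting set membership. The following is a minimal stand-alone sketch and does not use or represent the GeneWeaver API; the function name `genes_in_at_least` and the placeholder gene and set names are hypothetical.

```python
from collections import Counter


def genes_in_at_least(gene_sets, min_sets):
    """Return sorted genes that occur in at least `min_sets` of the given sets.

    gene_sets: mapping of set name -> iterable of gene identifiers.
    """
    counts = Counter(
        gene
        for genes in gene_sets.values()
        for gene in set(genes)          # de-duplicate within each set
    )
    return sorted(gene for gene, n in counts.items() if n >= min_sets)


# Placeholder data for illustration only.
example_sets = {
    "qtl_candidates": ["GeneA", "GeneB", "GeneC"],
    "diff_expressed": ["GeneB", "GeneC", "GeneD"],
    "curated_pathway": ["GeneC", "GeneE"],
}
shared_by_all = genes_in_at_least(example_sets, min_sets=len(example_sets))
shared_by_two = genes_in_at_least(example_sets, min_sets=2)
```

Relaxing `min_sets` below the total number of sets gives a simple way to rank candidates by how many independent sets support them.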
McCollum, Eric D; King, Carina; Hollowell, Robert; Zhou, Janet; Colbourn, Tim; Nambiar, Bejoy; Mukanga, David; Burgess, Deborah C Hay
2015-07-09
Improved referral algorithms for children with non-severe pneumonia at the community level are desirable. We sought to identify predictors of oral antibiotic failure in children who fulfill the case definition of World Health Organization (WHO) non-severe pneumonia. Predictors of greatest interest were those not currently utilized in referral algorithms and feasible to obtain at the community level. We systematically reviewed prospective studies reporting independent predictors of oral antibiotic failure for children 2-59 months of age in resource-limited settings with WHO non-severe pneumonia (either fast breathing for age and/or lower chest wall indrawing without danger signs), with an emphasis on predictors not currently utilized for referral and reasonable for community health workers. We searched PubMed, Cochrane, and Embase and qualitatively analyzed publications from 1997-2014. To supplement the limited published evidence in this subject area we also surveyed respiratory experts. Nine studies met criteria, seven of which were performed in south Asia. One eligible study occurred exclusively at the community level. Overall, oral antibiotic failure rates ranged between 7.8-22.9%. Six studies found excess age-adjusted respiratory rate (either WHO-defined very fast breathing for age or 10-15 breaths/min faster than normal WHO age-adjusted thresholds) and four reported young age as predictive for oral antibiotic failure. Of the seven predictors identified by the expert panel, abnormal oxygen saturation and malnutrition were most highly favored per the panel's rankings and comments. This review identified several candidate predictors of oral antibiotic failure not currently utilized in childhood pneumonia referral algorithms: excess age-specific respiratory rate, young age, abnormal oxygen saturation, and moderate malnutrition.
However, the data were limited and there are clear evidence gaps; research in rural, low-resource settings with community health workers is needed.
Zhang, Xu; Chu, Qin; Guo, Gang; Dong, Ganghui; Li, Xizhi; Zhang, Qin; Zhang, Shengli; Zhang, Zhiwu; Wang, Yachun
2017-01-01
The growth and maturity of cattle body size affect not only feed efficiency, but also productivity and longevity. Dissecting the genetic architecture of body size is critical for cattle breeding to improve both efficiency and productivity. The volume and weight of body size are indicated by several measurements. Among them, Heart Girth (HG) and Hip Height (HH) are the most important traits. They are widely used as predictors of body weight (BW). Few association studies have been conducted for HG and HH in cattle, and those have focused on a single growth stage. In this study, we extended genome-wide association studies (GWAS) to a full spectrum of four growth stages (6, 12, 18, and 24 months after birth) in Chinese Holstein heifers. Genome-wide single nucleotide polymorphisms (SNPs) were obtained from the Illumina BovineSNP50 v2 BeadChip genotyped on 3,325 individuals. Estimated breeding values (EBVs) were derived for both HG and HH at the four different ages and analyzed separately for GWAS by using the Fixed and random model Circuitous Probability Unification (FarmCPU) method. In total, 27 SNPs were identified to be significantly associated with HG and HH at different growth stages. We found 66 candidate genes located near the associated SNPs, including nine genes that were known to be highly related to development and skeletal and muscular growth. In addition, biological function analysis was performed by Ingenuity Pathway Analysis and an interaction network related to development was obtained, which contained 16 genes out of the 66 candidates. The set of putative genes provides valuable resources and can help elucidate the genomic architecture and mechanisms underlying growth traits in dairy cattle.
Gene selection for tumor classification using neighborhood rough sets and entropy measures.
Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu
2017-03-01
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper presents an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Using this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection method is effective for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.
Curated eutherian third party data gene data sets.
Premzl, Marko
2016-03-01
The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance for protecting against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable to updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.
Hester, Susan D; Nesnow, Stephen
2008-03-15
Conazoles are azole-containing fungicides that are used in agriculture and medicine. Conazoles can induce follicular cell adenomas of the thyroid in rats after chronic bioassay. The goal of this study was to identify pathways and networks of genes that were associated with thyroid tumorigenesis through transcriptional analyses. To this end, we compared transcriptional profiles from tissues of rats treated with a tumorigenic and a non-tumorigenic conazole. Triadimefon, a rat thyroid tumorigen, and myclobutanil, which was not tumorigenic in rats after a 2-year bioassay, were administered in the feed to male Wistar/Han rats for 30 or 90 days, similar to the treatment conditions previously used in their chronic bioassays. Thyroid gene expression was determined using high density Affymetrix GeneChips (Rat 230_2). Gene expression was analyzed by the Gene Set Expression Analyses method, which clearly separated the tumorigenic treatments (tumorigenic response group (TRG)) from the non-tumorigenic treatments (non-tumorigenic response group (NRG)). Core genes from these gene sets were mapped to canonical, metabolic, and GeneGo processes, and these processes were compared across group and treatment time. Extensive analyses were performed on the 30-day gene sets as they represented the major perturbations. Gene sets in the 30-day TRG group had overrepresentation of fatty acid metabolism, oxidation, and degradation processes (including PPARgamma and CYP involvement), and of cell proliferation responses. Core genes from these gene sets were combined into networks and found to possess signaling interactions. In addition, the core genes in each gene set were compared with genes known to be associated with human thyroid cancer. Among the genes that appeared in both rat and human data sets were: Acaca, Asns, Cebpg, Crem, Ddit3, Gja1, Grn, Jun, Junb, and Vegf. These genes were major contributors in the previously developed network from triadimefon-treated rat thyroids.
It is postulated that triadimefon induces oxidative response genes and activates the nuclear receptor, Ppargamma, initiating transcription of gene products and signaling to a series of genes involved in cell proliferation.
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis.
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Verheijen, Mark H G; Posthuma, Danielle
2015-11-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis.
Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis
de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Scharf, J M; Pauls, D L; Yu, D; Illmann, C; Osiecki, L; Neale, B M; Mathews, C A; Reus, V I; Lowe, T L; Freimer, N B; Cox, N J; Davis, L K; Rouleau, G A; Chouinard, S; Dion, Y; Girard, S; Cath, D C; Posthuma, D; Smit, J H; Heutink, P; King, R A; Fernandez, T; Leckman, J F; Sandor, P; Barr, C L; McMahon, W; Lyon, G; Leppert, M; Morgan, J; Weiss, R; Grados, M A; Singer, H; Jankovic, J; Tischfield, J A; Heiman, G A; Verheijen, Mark H G; Posthuma, Danielle
2015-01-01
Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis. PMID:25735483
Psychological and Social Work Factors as Predictors of Mental Distress: A Prospective Study
Finne, Live Bakke; Christensen, Jan Olav; Knardahl, Stein
2014-01-01
Studies exploring psychological and social work factors in relation to mental health problems (anxiety and depression) have mainly focused on a limited set of exposures. The current study investigated prospectively a broad set of specific psychological and social work factors as predictors of potentially clinically relevant mental distress (anxiety and depression), i.e. “caseness” level of distress. Employees were recruited from 48 Norwegian organizations, representing a wide variety of job types. A total of 3644 employees responded at both baseline and at follow-up two years later. Respondents were distributed across 832 departments within the 48 organizations. Nineteen work factors were measured. Two prospective designs were tested: (i) with baseline predictors and (ii) with average exposure over time ([T1+T2]/2) as predictors. Random intercept logistic regressions were conducted to account for clustering of the data. Baseline “cases” were excluded (n = 432). Age, sex, skill level, and mental distress as a continuous variable at T1 were adjusted for. Fourteen of 19 factors showed some prospective association with mental distress. The most consistent risk factor was role conflict (highest odds ratio [OR] 2.08, 99% confidence interval [CI]: 1.45–3.00). The most consistent protective factors were support from immediate superior (lowest OR 0.56, 99% CI: 0.43–0.72), fair leadership (lowest OR 0.52, 99% CI: 0.40–0.68), and positive challenge (lowest OR 0.60, 99% CI: 0.41–0.86). The present study demonstrated that a broad set of psychological and social work factors predicted mental distress of potential clinical relevance. Some of the most consistent predictors were different from those traditionally studied. This highlights the importance of expanding the range of factors beyond commonly studied concepts like the demand-control model and the effort-reward imbalance model. PMID:25048033
Kayala, Matthew A; Baldi, Pierre
2012-10-22
Proposing reasonable mechanisms and predicting the course of chemical reactions is important to the practice of organic chemistry. Approaches to reaction prediction have historically used obfuscating representations and manually encoded patterns or rules. Here we present ReactionPredictor, a machine learning approach to reaction prediction that models elementary, mechanistic reactions as interactions between approximate molecular orbitals (MOs). A training data set of productive reactions known to occur at reasonable rates and yields and verified by inclusion in the literature or textbooks is derived from an existing rule-based system and expanded upon with manual curation from graduate level textbooks. Using this training data set of complex polar, hypervalent, radical, and pericyclic reactions, a two-stage machine learning prediction framework is trained and validated. In the first stage, filtering models trained at the level of individual MOs are used to reduce the space of possible reactions to consider. In the second stage, ranking models over the filtered space of possible reactions are used to order the reactions such that the productive reactions are the top ranked. The resulting model, ReactionPredictor, perfectly ranks polar reactions 78.1% of the time and recovers all productive reactions 95.7% of the time when allowing for small numbers of errors. Pericyclic and radical reactions are perfectly ranked 85.8% and 77.0% of the time, respectively, rising to >93% recovery for both reaction types with a small number of allowed errors. Decisions about which of the polar, pericyclic, or radical reaction type ranking models to use can be made with >99% accuracy. Finally, for multistep reaction pathways, we implement the first mechanistic pathway predictor using constrained tree-search to discover a set of reasonable mechanistic steps from given reactants to given products. 
Webserver implementations of both the single step and pathway versions of ReactionPredictor are available via the chemoinformatics portal http://cdb.ics.uci.edu/.
Prediction of epigenetically regulated genes in breast cancer cell lines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen
Methylation of CpG islands within the DNA promoter regions is one mechanism that leads to aberrant gene expression in cancer. In particular, the abnormal methylation of CpG islands may silence associated genes. Therefore, using high-throughput microarrays to measure CpG island methylation will lead to better understanding of tumor pathobiology and progression, while revealing potentially new biomarkers. We have examined a recently developed high-throughput technology for measuring genome-wide methylation patterns called mTACL. Here, we propose a computational pipeline for integrating gene expression and CpG island methylation profiles to identify epigenetically regulated genes for a panel of 45 breast cancer cell lines, which is widely used in the Integrative Cancer Biology Program (ICBP). The pipeline (i) reduces the dimensionality of the methylation data, (ii) associates the reduced methylation data with gene expression data, and (iii) ranks methylation-expression associations according to their epigenetic regulation. Dimensionality reduction is performed in two steps: (i) methylation sites are grouped across the genome to identify regions of interest, and (ii) methylation profiles are clustered within each region. Associations between the clustered methylation and the gene expression data sets generate candidate matches within a fixed neighborhood around each gene. Finally, the methylation-expression associations are ranked through a logistic regression, and their significance is quantified through permutation analysis. Our two-step dimensionality reduction compressed 90% of the original data, reducing 137,688 methylation sites to 14,505 clusters. Methylation-expression associations produced 18,312 correspondences, which were used to further analyze epigenetic regulation.
Logistic regression was used to identify 58 genes from these correspondences that showed a statistically significant negative correlation between methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.
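The permutation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a Pearson correlation stands in for the logistic-regression score, the test is one-sided (looking for negative methylation-expression association), and the function names and toy values are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(methylation, expression, n_perm=999, seed=0):
    """One-sided permutation p-value for a negative correlation.

    Expression values are shuffled across cell lines to break any real
    association; the p-value is the share of shuffles whose correlation
    is at least as negative as the observed one (with the usual +1
    correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(methylation, shuffled) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)

# Hypothetical toy values for one methylation cluster and one gene.
methylation = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
expression = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.0, 8.3]
r, p = permutation_pvalue(methylation, expression)
```

With many candidate associations, the resulting p-values would still need a multiple-testing correction, which this sketch leaves out.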
Opportunities for genetic improvement of metabolic diseases
USDA-ARS's Scientific Manuscript database
Metabolic disorders are disturbances to one or more of the metabolic processes in dairy cattle. Dysfunction of any of these processes is associated with the manifestation of metabolic diseases or disorders. In this review, data recording, incidences, genetic parameters, predictors and status of gene...
More Precise Estimation of Lower-Level Interaction Effects in Multilevel Models.
Loeys, Tom; Josephy, Haeike; Dewitte, Marieke
2018-01-01
In hierarchical data, the effect of a lower-level predictor on a lower-level outcome may often be confounded by an (un)measured upper-level factor. When such confounding is left unaddressed, the effect of the lower-level predictor is estimated with bias. Separating this effect into a within- and between-component removes such bias in a linear random intercept model under a specific set of assumptions for the confounder. When the effect of the lower-level predictor is additionally moderated by another lower-level predictor, an interaction between both lower-level predictors is included into the model. To address unmeasured upper-level confounding, this interaction term ought to be decomposed into a within- and between-component as well. This can be achieved by first multiplying both predictors and centering that product term next, or vice versa. We show that while both approaches, on average, yield the same estimates of the interaction effect in linear models, the former decomposition is much more precise and robust against misspecification of the effects of cross-level and upper-level terms, compared to the latter.
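The two decompositions contrasted in this abstract can be written out in a few lines. The sketch below is only an illustration under simple assumptions (cluster means as the between-component, plain lists as data); the function names are hypothetical and no model fitting is shown.

```python
from collections import defaultdict

def cluster_means(values, groups):
    """Mean of `values` within each upper-level cluster."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def product_then_center(x, z, groups):
    """Multiply the predictors first, then center the product per cluster."""
    prod = [a * b for a, b in zip(x, z)]
    means = cluster_means(prod, groups)
    within = [p - means[g] for p, g in zip(prod, groups)]
    between = [means[g] for g in groups]
    return within, between

def center_then_product(x, z, groups):
    """Center each predictor per cluster first, then multiply."""
    mx = cluster_means(x, groups)
    mz = cluster_means(z, groups)
    within = [(a - mx[g]) * (b - mz[g]) for a, b, g in zip(x, z, groups)]
    between = [mx[g] * mz[g] for g in groups]
    return within, between

# Hypothetical toy data: two clusters with two lower-level units each.
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 0.0, 1.0, 3.0]
groups = ["a", "a", "b", "b"]
w1, b1 = product_then_center(x, z, groups)   # within: [1.0, -1.0, -4.5, 4.5]
w2, b2 = center_then_product(x, z, groups)   # within: [-0.5, -0.5, 0.5, 0.5]
```

Only the product-then-center within-component is guaranteed to sum to zero inside every cluster, which is one way to see that the two constructions are not interchangeable.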
Case-based retrieval framework for gene expression data.
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J
2015-01-01
The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
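A k-nearest-neighbor retrieval with feature weights, as described in this abstract, might look roughly like the sketch below. The weighted Euclidean distance, the majority vote, and all names and values are assumptions made for illustration; the article's actual similarity measure and weighting scheme are not given here.

```python
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature is scaled by its weight."""
    return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)) ** 0.5

def retrieve_similar(query, cases, weights, k=3):
    """Return the k stored (profile, label) cases closest to the query."""
    ranked = sorted(cases, key=lambda c: weighted_distance(query, c[0], weights))
    return ranked[:k]

def classify(query, cases, weights, k=3):
    """Label the query by majority vote among its k retrieved cases."""
    votes = Counter(label for _, label in retrieve_similar(query, cases, weights, k))
    return votes.most_common(1)[0][0]

# Hypothetical case base: two expression features per patient.
cases = [
    ([1.0, 5.0], "A"),
    ([1.2, 0.0], "A"),
    ([3.0, 5.1], "B"),
    ([3.1, 0.2], "B"),
    ([2.9, 2.0], "B"),
]
query = [1.1, 5.0]
print(classify(query, cases, weights=[1.0, 0.0]))  # only feature 0 counts
print(classify(query, cases, weights=[0.0, 1.0]))  # only feature 1 counts
```

The two calls show why the weights matter: the same query is retrieved into different neighborhoods depending on which feature is treated as informative.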
ERIC Educational Resources Information Center
Ernst, Julie
2014-01-01
In efforts to encourage use of natural outdoor settings as learning environments within early childhood education, survey research was conducted with 46 early childhood educators from northern Minnesota (United States) to explore their beliefs and practices regarding natural outdoor settings, as well as to investigate predictors of and barriers to the…
Su, Hang; Li, Zhibin; Du, Jiang; Jiang, Haifeng; Chen, Zhikang; Sun, Haiming; Zhao, Min
2015-12-01
Relapse is a typical feature of heroin addiction and is rooted in genetic and psychological determinants. The aim of this study was to evaluate the effect of personality traits, impulsivity, and COMT gene polymorphism (rs4680) on relapse to heroin use during a 5-year follow-up. A total of 564 heroin-dependent patients were enrolled in a compulsory drug rehabilitation center. Twelve months prior to their release, personality traits were measured by the BIS-11 (Barratt Impulsiveness Scale-11) and the Temperament and Character Inventory (TCI). The COMT gene rs4680 polymorphism was genotyped using a DNA sequence detection system. Heroin use status was evaluated for 5 years after discharge. Among the 564 heroin-dependent patients, 500 were followed for 5 years after discharge, and 53.0% (n = 265) were considered to have relapsed to heroin use according to a strict monitoring system. Univariate analysis showed that age, having ever been in methadone maintenance treatment (MMT), the total and non-planning scores of the BIS-11, and the COMT rs4680 gene variants differed between the relapse and abstinent groups. Logistic regression analysis showed that a higher BIS total score, having ever been in MMT, and a younger age at first heroin use were predictors of relapse to heroin use during the 5-year follow-up, and that the COMT rs4680 gene had an interaction with BIS scores. Our findings indicated that impulsive personality traits, methadone use history, and onset age could predict relapse in heroin-dependent patients during 5 years of follow-up. The COMT gene partly moderated the relationship between impulsivity and heroin relapse. © 2015 Wiley Periodicals, Inc.
Barbosa, P R; Stabler, S P; Machado, A L K; Braga, R C; Hirata, R D C; Hirata, M H; Sampaio-Neto, L F; Allen, R H; Guerra-Shinohara, E M
2008-08-01
To examine the association between methylenetetrahydrofolate reductase (MTHFR) (C677T and A1298C), methionine synthase (MTR) A2756G and methionine synthase reductase (MTRR) A66G gene polymorphisms and total homocysteine (tHcy), methylmalonic acid (MMA) and S-adenosylmethionine/S-adenosylhomocysteine (SAM/SAH) levels; and to evaluate the potential interactions with folate or cobalamin (Cbl) status. Two hundred seventy-five healthy women at labor who delivered full-term normal babies. Cbl, folate, tHcy, MMA, SAM and SAH were measured in serum specimens. The genotypes for polymorphisms were determined by PCR-restriction fragment length polymorphism (RFLP). Serum folate, MTHFR 677T allele and MTR 2756AA genotypes were the predictors of tHcy levels in pregnant women. Serum Cbl and creatinine were the predictors of SAM/SAH ratio and MMA levels, respectively. The gene polymorphisms were not determinants for MMA levels and SAM/SAH ratios. Low levels of serum folate were associated with elevated tHcy in pregnant women, independently of the gene polymorphisms. In pregnant women carrying MTHFR 677T allele, or MTHFR 1298AA or MTRR 66AA genotypes, lower Cbl levels were associated with higher levels of tHcy. Lower SAM/SAH ratio was found in MTHFR 677CC or MTRR A2756AA genotypes carriers when Cbl levels were lower than 142 pmol/l. Serum folate and MTHFR C677T and MTR A2756G gene polymorphisms were the determinants for tHcy levels. The interaction between low levels of serum Cbl and MTHFR (C677T or A1298C) or MTRR A66G gene polymorphisms was associated with increased tHcy.
Byappanahalli, M.N.; Przybyla-Kelly, K.; Shively, D.A.; Whitman, R.L.
2008-01-01
The enterococcal surface protein (esp) gene found in Enterococcus faecalis and E. faecium has recently been explored as a marker of sewage pollution in recreational waters but its occurrence and distribution in environmental enterococci has not been well-documented. If the esp gene is found in environmental samples, there are potential implications for microbial source tracking applications. In the current study, a total of 452 samples (lake water, 100; stream water, 129; nearshore sand, 96; and backshore sand, 71; Cladophora sp. (Chlorophyta), 41; and periphyton (mostly Bacillariophyceae), 15) collected from the coastal watersheds of southern Lake Michigan were selectively cultured for enterococci and then analyzed for the esp gene by PCR, targeting E. faecalis/ E. faecium (espfs/fm) and E. faecium (espfm). Overall relative frequencies for espfs/fm and espfm were 27.4 and 5.1%. Respective percent frequency for the espfs/fm and espfm was 36 and 14% in lake water; 38.8 and 2.3% in stream water; 24 and 6.3% in nearshore sand; 0% in backshore sand; 24.4 and 0% in Cladophora sp.; and 33.3 and 0% in periphyton. The overall occurrence of both espfs/fm and espfm was significantly related (χ2 = 49, P espfs/fm increased in lake and stream water and nearshore sand. Further, E. coli and enterococci cell densities were significant predictors for espfs/fm occurrence in post-rain lake water, but espfm was not. F+ coliphage densities were not significant predictors for espfm or espfs/fm gene incidence. In summary, the differential occurrence of the esp gene in the environment suggests that it is not limited to human fecal sources and thus may weaken its use as a reliable tool in discriminating contaminant sources (i.e., human vs nonhuman).
A Deep Machine Learning Algorithm to Optimize the Forecast of Atmospherics
NASA Astrophysics Data System (ADS)
Russell, A. M.; Alliss, R. J.; Felton, B. D.
Space-based applications from imaging to optical communications are significantly impacted by the atmosphere. Specifically, the occurrence of clouds and optical turbulence can determine whether a mission is a success or a failure. In the case of space-based imaging applications, clouds produce atmospheric transmission losses that can make it impossible for an electro-optical platform to image its target. Hence, accurate predictions of negative atmospheric effects are a high priority in order to facilitate the efficient scheduling of resources. This study seeks to revolutionize our understanding of and our ability to predict such atmospheric events through the mining of data from a high-resolution Numerical Weather Prediction (NWP) model. Specifically, output from the Weather Research and Forecasting (WRF) model is mined using a Random Forest (RF) ensemble classification and regression approach in order to improve the prediction of low cloud cover over the Haleakala summit of the Hawaiian island of Maui. RF techniques have a number of advantages including the ability to capture non-linear associations between the predictors (in this case physical variables from WRF such as temperature, relative humidity, wind speed and pressure) and the predictand (clouds), which becomes critical when dealing with the complex non-linear occurrence of clouds. In addition, RF techniques are capable of representing complex spatial-temporal dynamics to some extent. Input predictors to the WRF-based RF model are strategically selected based on expert knowledge and a series of sensitivity tests. Ultimately, three types of WRF predictors are chosen: local surface predictors, regional 3D moisture predictors and regional inversion predictors. A suite of RF experiments is performed using these predictors in order to evaluate the performance of the hybrid RF-WRF technique. The RF model is trained and tuned on approximately half of the input dataset and evaluated on the other half. 
The RF approach is validated using in-situ observations of clouds. All of the hybrid RF-WRF experiments demonstrated here significantly outperform the base WRF local low cloud cover forecasts in terms of the probability of detection and the overall bias. In particular, RF experiments that use only regional three-dimensional moisture predictors from the WRF model produce the highest accuracy when compared to RF experiments that use local surface predictors only or regional inversion predictors only. Furthermore, adding multiple types of WRF predictors and additional WRF predictors to the RF algorithm does not necessarily add more value in the resulting forecasts, indicating that it is better to have a small set of meaningful predictors than to have a vast set of indiscriminately-chosen predictors. This work also reveals that the WRF-based RF approach is highly sensitive to the time period over which the algorithm is trained and evaluated. Future work will focus on developing a similar WRF-based RF model for high cloud prediction and expanding the algorithm to two-dimensions horizontally.
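The two scores named in this abstract, probability of detection and bias, can be computed from a yes/no contingency table. The sketch below assumes the standard forecast-verification definitions (POD = hits / (hits + misses); frequency bias = forecast "yes" count / observed "yes" count); the abstract does not state which bias definition was used, and the toy series are hypothetical.

```python
def contingency(forecast, observed):
    """Count hits, misses and false alarms for paired yes/no series."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if not f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    return hits, misses, false_alarms

def probability_of_detection(forecast, observed):
    """Share of observed cloud events that were also forecast."""
    hits, misses, _ = contingency(forecast, observed)
    return hits / (hits + misses)

def frequency_bias(forecast, observed):
    """Ratio of forecast events to observed events (1.0 means unbiased)."""
    hits, misses, false_alarms = contingency(forecast, observed)
    return (hits + false_alarms) / (hits + misses)

# Hypothetical low-cloud flags: 1 = cloud, 0 = clear.
forecast = [1, 1, 0, 0, 1]
observed = [1, 0, 1, 0, 1]
pod = probability_of_detection(forecast, observed)   # 2 hits, 1 miss
bias = frequency_bias(forecast, observed)            # 3 forecast vs 3 observed
```

Both functions divide by the number of observed events, so they are undefined for a period with no observed cloud.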
Anomaly Detection Using an Ensemble of Feature Models
Noto, Keith; Brodley, Carla; Slonim, Donna
2011-01-01
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among some features that fail to appear in the anomalous examples. Our approach learns to predict the values of training set features from the values of other features. After we have formed an ensemble of predictors, we apply this ensemble to new data points. To combine the contribution of each predictor in our ensemble, we have developed a novel, information-theoretic anomaly measure that our experimental results show selects against noisy and irrelevant features. Our results on 47 data sets show that for most data sets, this approach significantly improves performance over current state-of-the-art feature space distance and density-based approaches. PMID:22020249
Consistency of gene starts among Burkholderia genomes
2011-01-01
Background: Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results: Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions: Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528
NASA Astrophysics Data System (ADS)
Cadier, E.; Rossel, F.; Pouyaud, B.; Raymond, M.
2003-04-01
Rainfall in the coastal regions of southern Ecuador and northern Peru is well known for its sensitivity to the El Niño/Southern Oscillation (ENSO) phenomenon. New monthly rainfall index series were set up from a network of 200 rainfall stations in the Ecuadorian and Peruvian coastal region. Throughout the study, rainfall was modelled keeping a distinction between a "dependent" data set used as a training period and an "independent" portion of the record reserved for validation. Multiple regression models were proposed to predict monthly rainfall in the Guayaquil zone and in northern coastal Peru, using as predictors sea surface temperature, precipitation, and meridional and zonal wind in the eastern equatorial Pacific. The resulting equations were then used to predict rainfall anomalies in the independent data set. In the Guayaquil zone, there is considerable predictive skill for the rainy months of the year, with the best predictability found from March to May. The multiple linear correlations explain 60 to 82% of the monthly precipitation variance. Preseason rainfall in the northern coastal Ecuadorian region is the most powerful predictor of the rainy season peak in Guayaquil, while the eastern equatorial Pacific sea surface temperature is the most powerful predictor of the end of the rainy season. KEY WORDS: El Niño, Rainfall Prediction, Ecuador.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.
Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen
2015-05-01
Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Suarthana, Eva; Vergouwe, Yvonne; Moons, Karel G; de Monchy, Jan; Grobbee, Diederick; Heederik, Dick; Meijer, Evert
2010-09-01
To develop and validate a prediction model to detect sensitization to wheat allergens in bakery workers. The prediction model was developed in 867 Dutch bakery workers (development set, prevalence of sensitization 13%) and included questionnaire items (candidate predictors). First, principal component analysis was used to reduce the number of candidate predictors. Then, multivariable logistic regression analysis was used to develop the model. Internal validation and the extent of optimism were assessed with bootstrapping. External validation was studied in 390 independent Dutch bakery workers (validation set, prevalence of sensitization 20%). The prediction model contained the predictors nasoconjunctival symptoms, asthma symptoms, shortness of breath and wheeze, work-related upper and lower respiratory symptoms, and traditional bakery. The model showed good discrimination, with an area under the receiver operating characteristic (ROC) curve of 0.76 (and 0.75 after internal validation). Application of the model in the validation set gave reasonable discrimination (ROC area=0.69) and good calibration after a small adjustment of the model intercept. A simple model with questionnaire items only can be used to stratify bakers according to their risk of sensitization to wheat allergens. Its use may increase the cost-effectiveness of (subsequent) medical surveillance.
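The ROC area reported in this abstract measures discrimination: the probability that a randomly chosen sensitized worker receives a higher predicted risk than a randomly chosen non-sensitized one. A minimal sketch of that pairwise definition follows; the function name and the toy risks are hypothetical, and this is not the study's own software.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for four workers (1 = sensitized).
risks = [0.1, 0.4, 0.35, 0.8]
sensitized = [0, 0, 1, 1]
print(roc_auc(risks, sensitized))  # 3 of 4 pairs ordered correctly -> 0.75
```

The double loop is quadratic in sample size, which is fine for an illustration but would normally be replaced by a rank-based calculation on larger data.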
Salnikova, Lyubov E; Smelaya, Tamara V; Golubev, Arkadiy M; Rubanovich, Alexander V; Moroz, Viktor V
2013-11-01
This study was conducted to establish the possible contribution of functional gene polymorphisms in detoxification/oxidative stress and vascular remodeling pathways to community-acquired pneumonia (CAP) susceptibility in the case-control study (350 CAP patients, 432 control subjects) and to predisposition to the development of CAP complications in the prospective study. All subjects were genotyped for 16 polymorphic variants in the 14 genes of xenobiotics detoxification CYP1A1, AhR, GSTM1, GSTT1, ABCB1, redox-status SOD2, CAT, GCLC, and vascular homeostasis ACE, AGT, AGTR1, NOS3, MTHFR, VEGFα. Risk of pulmonary complications (PC) in the single locus analysis was associated with CYP1A1, GCLC and AGTR1 genes. Extra PC (toxic shock syndrome and myocarditis) were not associated with these genes. We evaluated gene-gene interactions using multi-factor dimensionality reduction, and cumulative gene risk score approaches. The final model which included >5 risk alleles in the CYP1A1 (rs2606345, rs4646903, rs1048943), GCLC, AGT, and AGTR1 genes was associated with pleuritis, empyema, acute respiratory distress syndrome, all PC and acute respiratory failure (ARF). We considered CYP1A1, GCLC, AGT, AGTR1 gene set using Set Distiller mode implemented in GeneDecks for discovering gene-set relations via the degree of sharing descriptors within a given gene set. N-acetylcysteine and oxygen were defined by Set Distiller as the best descriptors for the gene set associated in the present study with PC and ARF. Results of the study are in line with literature data and suggest that genetically determined oxidative stress exacerbation may contribute to the progression of lung inflammation.
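A cumulative gene risk score of the kind mentioned in this abstract amounts to counting risk alleles across the selected variants and comparing the total with a cut-off (here, more than 5, as in the final model). The sketch below is illustrative only: the variant labels and the choice of risk allele at each site are hypothetical placeholders, not the study's data.

```python
def risk_allele_count(genotype, risk_alleles):
    """Total number of risk alleles carried across the listed variants.

    genotype: dict mapping variant label -> two-letter allele string.
    risk_alleles: dict mapping variant label -> the allele counted as risk.
    Variants missing from the genotype contribute zero.
    """
    return sum(genotype.get(v, "").count(a) for v, a in risk_alleles.items())

def exceeds_threshold(genotype, risk_alleles, threshold=5):
    """True when the carrier has more than `threshold` risk alleles."""
    return risk_allele_count(genotype, risk_alleles) > threshold

# Hypothetical variant labels and risk alleles, for illustration only.
risk_alleles = {"variant_1": "A", "variant_2": "G", "variant_3": "T"}
patient_a = {"variant_1": "AA", "variant_2": "AG", "variant_3": "TT"}  # 5 alleles
patient_b = {"variant_1": "AA", "variant_2": "GG", "variant_3": "TT"}  # 6 alleles
print(exceeds_threshold(patient_a, risk_alleles))  # False: 5 is not > 5
print(exceeds_threshold(patient_b, risk_alleles))  # True
```

An unweighted count treats every allele as contributing equally; a weighted score would need per-variant effect sizes, which the abstract does not provide.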
Insecticide treated bednet strategy in rural settings: can we exploit women's decision making power?
Tilak, Rina; Tilak, V W; Bhalwar, R
2007-01-01
Use of insecticide treated bednets in prevention of malaria is a widely propagated global strategy; however, its use has been reported to be influenced and limited by many variables, especially gender bias. A cross-sectional field epidemiological study was conducted in a rural setting with two outcome variables, 'Bednet use' (primary outcome variable) and 'Women's Decision Making Power', which were studied in reference to various predictor variables. Analysis reveals a significant effect on the primary outcome variable 'Bednet use' of the predictor variables age, occupation, bednet purchase decision, women's decision making power, husband's education, and knowledge about malaria and its prevention. The study recommends that IEC on treated bednets be disseminated through TV, targeting the elderly women who have better decision making power and mobilizing younger women, who were found to prefer bednets for prevention of mosquito bites, to optimize the use of treated bednets in similar settings.
Jaiswal, Deepika; Jezek, Meagan; Quijote, Jeremiah; Lum, Joanna; Choi, Grace; Kulkarni, Rushmie; Park, DoHwan; Green, Erin M.
2017-01-01
The conserved yeast histone methyltransferase Set1 targets H3 lysine 4 (H3K4) for mono, di, and trimethylation and is linked to active transcription due to the euchromatic distribution of these methyl marks and the recruitment of Set1 during transcription. However, loss of Set1 results in increased expression of multiple classes of genes, including genes adjacent to telomeres and middle sporulation genes, which are repressed under normal growth conditions because they function in meiotic progression and spore formation. The mechanisms underlying Set1-mediated gene repression are varied, and still unclear in some cases, although repression has been linked to both direct and indirect action of Set1, associated with noncoding transcription, and is often dependent on the H3K4me2 mark. We show that Set1, and particularly the H3K4me2 mark, are implicated in repression of a subset of middle sporulation genes during vegetative growth. In the absence of Set1, there is loss of the DNA-binding transcriptional regulator Sum1 and the associated histone deacetylase Hst1 from chromatin in a locus-specific manner. This is linked to increased H4K5ac at these loci and aberrant middle gene expression. These data indicate that, in addition to DNA sequence, histone modification status also contributes to proper localization of Sum1. Our results also show that the role for Set1 in middle gene expression control diverges as cells receive signals to undergo meiosis. Overall, this work dissects an unexplored role for Set1 in gene-specific repression, and provides important insights into a new mechanism associated with the control of gene expression linked to meiotic differentiation. PMID:29066473
Age is no barrier: predictors of academic success in older learners
NASA Astrophysics Data System (ADS)
Imlach, Abbie-Rose; Ward, David D.; Stuart, Kimberley E.; Summers, Mathew J.; Valenzuela, Michael J.; King, Anna E.; Saunders, Nichole L.; Summers, Jeffrey; Srikanth, Velandai K.; Robinson, Andrew; Vickers, James C.
2017-11-01
Although predictors of academic success have been identified in young adults, such predictors are unlikely to translate directly to an older student population, where such information is scarce. The current study aimed to examine cognitive, psychosocial, lifetime, and genetic predictors of university-level academic performance in older adults (50-79 years old). Participants were mostly female (71%) and had a greater than high school education level (M = 14.06 years, SD = 2.76), on average. Two multiple linear regression analyses were conducted. The first examined all potential predictors of grade point average (GPA) in the subset of participants who had volunteered samples for genetic analysis (N = 181). Significant predictors of GPA were then re-examined in a second multiple linear regression using the full sample (N = 329). Our data show that the cognitive domains of episodic memory and language processing, in conjunction with midlife engagement in cognitively stimulating activities, have a role in predicting academic performance as measured by GPA in the first year of study. In contrast, it was determined that age, IQ, gender, working memory, psychosocial factors, and common brain gene polymorphisms linked to brain function, plasticity and degeneration (APOE, BDNF, COMT, KIBRA, SERT) did not influence academic performance. These findings demonstrate that ageing does not impede academic achievement, and that discrete cognitive skills as well as lifetime engagement in cognitively stimulating activities can promote academic success in older adults.
McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.
2016-01-01
ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. 
Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
Rose, Danielle E; Tisnado, Diana M; Tao, May L; Malin, Jennifer L; Adams, John L; Ganz, Patricia A; Kahn, Katherine L
2012-06-01
Physician co-management, representing joint participation in the planning, decision-making, and delivery of care, is often cited in association with coordination of care. Yet little is known about how physicians manage tasks and how their management style impacts patient outcomes. To describe physician practice style using breast cancer as a model. We characterize correlates and predictors of physician practice style for 10 clinical tasks, and then test for associations between physician practice style and patient ratings of care. We queried 347 breast cancer physicians identified by a population-based cohort of women with incident breast cancer regarding care using a clinical vignette about a hypothetical 65-year-old diabetic woman with incident breast cancer. To test the association between physician practice style and patient outcomes, we linked medical oncologists' responses to patient ratings of care (physician n=111; patient n=411). After adjusting for physician and practice setting characteristics, physician practice style varied by physician specialty, practice setting, financial incentives, and barriers to referrals. Patients with medical oncologists who co-managed tasks had higher patient ratings of care. Physician practice style for breast cancer is influenced by provider and practice setting characteristics, and it is an important predictor of patient ratings. We identify physician and practice setting factors associated with physician practice style and found associations between physician co-management and patient outcomes (e.g., patient ratings of care). © Health Research and Educational Trust.
Wirth, Christian; Schumacher, Jens; Schulze, Ernst-Detlef
2004-02-01
To facilitate future carbon and nutrient inventories, we used mixed-effect linear models to develop new generic biomass functions for Norway spruce (Picea abies (L.) Karst.) in Central Europe. We present both the functions and their respective variance-covariance matrices and illustrate their application for biomass prediction and uncertainty estimation for Norway spruce trees ranging widely in size, age, competitive status and site. We collected biomass data for 688 trees sampled in 102 stands by 19 authors. The total number of trees in the "base" model data sets containing the predictor variables diameter at breast height (D), height (H), age (A), site index (SI) and site elevation (HSL) varied according to compartment (roots: n = 114, stem: n = 235, dry branches: n = 207, live branches: n = 429 and needles: n = 551). "Core" data sets with about 40% fewer trees could be extracted containing the additional predictor variables crown length and social class. A set of 43 candidate models representing combinations of lnD, lnH, lnA, SI and HSL, including second-order polynomials and interactions, was established. The categorical variable "author" subsuming mainly methodological differences was included as a random effect in a mixed linear model. The Akaike Information Criterion was used for model selection. The best models for stem, root and branch biomass contained only combinations of D, H and A as predictors. More complex models that included site-related variables resulted for needle biomass. Adding crown length as a predictor for needles, branches and roots reduced both the bias and the confidence interval of predictions substantially. Applying the best models to a test data set of 17 stands ranging in age from 16 to 172 years produced realistic allocation patterns at the tree and stand levels. 
The 95% confidence intervals (% of mean prediction) were highest for crown compartments (approximately +/- 12%) and lowest for stem biomass (approximately +/- 5%), and within each compartment, they were highest for the youngest and oldest stands, respectively.
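The model-selection step described above can be illustrated with a minimal sketch. It assumes ordinary least squares on log-transformed data rather than the mixed-effect models used in the study, and the diameters and biomasses are hypothetical values invented for illustration, not data from the abstract. It compares a simple allometric model, ln(biomass) = a + b ln(D), with an intercept-only model using the Akaike Information Criterion.

```python
import math

def ols_fit(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def aic_gaussian(rss, n, k):
    """AIC (up to an additive constant) for a Gaussian model with k estimated parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical diameters at breast height (cm) and biomasses (kg); not from the study.
diam = [8.0, 12.0, 17.0, 23.0, 30.0, 38.0, 47.0]
biomass = [11.0, 32.0, 78.0, 160.0, 330.0, 560.0, 1000.0]
ln_d = [math.log(d) for d in diam]
ln_b = [math.log(m) for m in biomass]
n = len(ln_b)

# Candidate 1: allometric model with intercept, slope and error variance (k = 3).
a, b, rss_allometric = ols_fit(ln_d, ln_b)
aic_allometric = aic_gaussian(rss_allometric, n, k=3)

# Candidate 2: intercept-only model with mean and error variance (k = 2).
mean_ln_b = sum(ln_b) / n
rss_null = sum((v - mean_ln_b) ** 2 for v in ln_b)
aic_null = aic_gaussian(rss_null, n, k=2)

print(f"slope b = {b:.2f}")
print(f"AIC allometric = {aic_allometric:.1f}, AIC intercept-only = {aic_null:.1f}")
```

The candidate with the lower AIC is preferred. Extending the comparison to the 43 candidate models and the random "author" effect mentioned in the abstract would need a mixed-model library, which this sketch deliberately leaves out.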
Al-Khafaji, Ahmed S K; Marcus, Michael W; Davies, Michael P A; Risk, Janet M; Shaw, Richard J; Field, John K; Liloglou, Triantafillos
2017-06-01
Deregulation of mitotic spindle genes has been reported to contribute to the development and progression of malignant tumours. The aim of the present study was to explore the association between the expression profiles of Aurora kinases (AURKA, AURKB and AURKC), cytoskeleton-associated protein 5 (CKAP5), discs large-associated protein 5 (DLGAP5), kinesin-like protein 11 (KIF11), microtubule nucleation factor (TPX2), monopolar spindle 1 kinase (TTK), and β-tubulin (TUBB and TUBB3) genes and clinicopathological characteristics in human non-small cell lung carcinoma (NSCLC). Reverse transcription-quantitative polymerase chain reaction-based RNA gene expression profiles of 132 NSCLC and 44 adjacent wild-type tissues were generated, and Cox's proportional hazard regression was used to examine associations. With the exception of AURKC, all genes exhibited increased expression in NSCLC tissues. Of the 10 genes examined, only AURKA was significantly associated with prognosis in NSCLC. Multivariate Cox's regression analysis demonstrated that AURKA mRNA expression [hazard ratio (HR), 1.81; 95% confidence interval (CI), 1.16-2.84; P=0.009], age (HR, 1.03; 95% CI, 1.00-1.06; P=0.020), pathological tumour stage 2 (HR, 2.43; 95% CI, 1.16-5.10; P=0.019) and involvement of distal nodes (pathological node stage 2) (HR, 3.14; 95% CI, 1.24-7.99; P=0.016) were independent predictors of poor prognosis in patients with NSCLC. Poor prognosis of patients with increased AURKA expression suggests that those patients may benefit from surrogate therapy with AURKA inhibitors.
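The relation between a reported hazard ratio, its 95% confidence interval and its P value can be checked with a short sketch. It assumes the interval is a Wald interval that is symmetric on the log scale; the abstract does not state how the interval was computed, so this is an assumption. The function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, Wald z and two-sided P value
    from a hazard ratio and its 95% CI, assuming a symmetric log-scale interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# AURKA values as reported in the abstract: HR 1.81, 95% CI 1.16-2.84.
se, z, p = wald_from_ci(1.81, 1.16, 2.84)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.4f}")
```

Under this assumption the recovered P value is close to the reported P=0.009. Rounding of the published HR and CI limits means such a check is only approximate.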
Discrimination, Acculturation and Other Predictors of Depression among Pregnant Hispanic Women
Walker, Janiece L.; Ruiz, R. Jeanne; Chinn, Juanita J.; Marti, Nathan; Ricks, Tiffany N.
2012-01-01
Objective: The purpose of our study was to examine the effects of socioeconomic status, acculturative stress, discrimination, and marginalization as predictors of depression in pregnant Hispanic women. Design: A prospective observational design was used. Setting: Central and Gulf coast areas of Texas in obstetrical offices. Participants: A convenience sample of 515 pregnant, low income, low medical risk, and self-identified Hispanic women who were between 22–24 weeks gestation was used to collect data. Measures: The predictor variables were socioeconomic status, discrimination, acculturative stress, and marginalization. The outcome variable was depression. Results: Education, frequency of discrimination, age, and Anglo marginality were significant predictors of depressive symptoms in a linear regression model, F (6, 458) = 8.36, P<.0001. Greater frequency of discrimination was the strongest positive predictor of increased depressive symptoms. Conclusions: It is important that health care providers further understand the impact that age and experiences of discrimination throughout the life course have on depressive symptoms during pregnancy. PMID:23140083
Laska, Matthias; Genzel, Daria; Wieser, Alexandra
2005-02-01
The ability of four squirrel monkeys and three pigtail macaques to distinguish between nine enantiomeric odor pairs sharing an isopropenyl group at the chiral center was investigated in terms of a conditioning paradigm. All animals from both species were able to discriminate between the optical isomers of limonene, carvone, dihydrocarvone, dihydrocarveole and dihydrocarvyl acetate, whereas they failed to distinguish between the (+)- and (-)-forms of perillaaldehyde and limonene oxide. The pigtail macaques, but not the squirrel monkeys, also discriminated between the antipodes of perillaalcohol and isopulegol. A comparison of the across-task patterns of discrimination performance shows a high degree of similarity among the two primate species and also between these nonhuman primates and human subjects tested in an earlier study on the same tasks. These findings suggest that between-species comparisons of the relative size of olfactory brain structures or of the number of functional olfactory receptor genes are poor predictors of olfactory discrimination performance with enantiomers.
Translational systems pharmacology‐based predictive assessment of drug‐induced cardiomyopathy
Messinis, Dimitris E.; Melas, Ioannis N.; Hur, Junguk; Varshney, Navya; Alexopoulos, Leonidas G.
2018-01-01
Drug‐induced cardiomyopathy contributes to drug attrition. We compared two pipelines of predictive modeling: (1) applying elastic net (EN) to differentially expressed genes (DEGs) of drugs; (2) applying integer linear programming (ILP) to construct each drug's signaling pathway starting from its targets to downstream proteins, to transcription factors, and to its DEGs in human cardiomyocytes, and then subjecting the genes/proteins in the drugs' signaling networks to EN regression. We classified 31 drugs with availability of DEGs into 13 toxic and 18 nontoxic drugs based on a clinical cardiomyopathy incidence cutoff of 0.1%. The ILP‐augmented modeling increased prediction accuracy from 79% to 88% (sensitivity: 88%; specificity: 89%) under leave‐one‐out cross validation. The ILP‐constructed signaling networks of drugs were better predictors than DEGs. Per literature, the microRNAs that reportedly regulate expression of our six top predictors are of diagnostic value for natural heart failure or doxorubicin‐induced cardiomyopathy. This translational predictive modeling might uncover potential biomarkers. PMID:29341478
Inference of Evolutionary Forces Acting on Human Biological Pathways
Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent
2015-01-01
Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280
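For orientation, the classical single-gene McDonald–Kreitman test that the 2DNS procedure builds on can be summarized with two standard quantities, the neutrality index and the estimated proportion of adaptive substitutions. The sketch below covers only these classical summaries, not the 2DNS decomposition, whose details are not given in the abstract. The counts are illustrative values and the function name `mk_summary` is hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Classical McDonald-Kreitman summaries.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within species
    """
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # estimated fraction of adaptive substitutions
    return neutrality_index, alpha

# Illustrative counts chosen for the example.
ni, alpha = mk_summary(dn=7, ds=17, pn=2, ps=42)
print(f"neutrality index = {ni:.3f}, alpha = {alpha:.3f}")
```

A neutrality index below 1 (alpha above 0) points to an excess of nonsynonymous divergence, as expected under positive selection. An index above 1 points to an excess of nonsynonymous polymorphism, as expected under weak purifying or balancing selection.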
Smith predictor based-sliding mode controller for integrating processes with elevated deadtime.
Camacho, Oscar; De la Cruz, Francisco
2004-04-01
An approach to control integrating processes with elevated deadtime using a Smith predictor sliding mode controller is presented. A PID sliding surface and an integrating first-order plus deadtime model have been used to synthesize the controller. Since the performance of existing controllers with a Smith predictor decreases in the presence of modeling errors, this paper presents a simple approach to combining the Smith predictor with the sliding mode concept, which is a proven, simple, and robust procedure. The proposed scheme has a set of tuning equations as a function of the characteristic parameters of the model. For implementation of the proposed approach, computer-based industrial controllers that execute PID algorithms can be used. The performance and robustness of the proposed controller are compared with the Matausek-Micić scheme for linear systems using simulations.
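The Smith predictor idea referred to above can be sketched in a few lines. This is not the sliding mode controller of the paper: a plain proportional controller is swapped in, the plant is a pure integrator with deadtime, and the model is assumed to match the plant exactly. All parameter values (K, theta, Kc, dt) are hypothetical.

```python
from collections import deque

def simulate(K=1.0, theta=2.0, Kc=0.5, dt=0.01, t_end=30.0, setpoint=1.0):
    """Euler simulation of a Smith predictor with a proportional controller
    on an integrating process with deadtime, G(s) = K * exp(-theta*s) / s."""
    delay_steps = int(round(theta / dt))
    u_hist = deque([0.0] * delay_steps, maxlen=delay_steps)   # past controller outputs
    ym_hist = deque([0.0] * delay_steps, maxlen=delay_steps)  # past delay-free model outputs
    y = 0.0    # measured plant output (sees the input theta seconds late)
    ym = 0.0   # delay-free model output
    ys = []
    for _ in range(int(round(t_end / dt))):
        ym_delayed = ym_hist[0]
        # Smith predictor feedback: delay-free prediction plus model-mismatch correction.
        feedback = ym + (y - ym_delayed)
        u = Kc * (setpoint - feedback)
        y += dt * K * u_hist[0]   # plant integrates the delayed input
        u_hist.append(u)
        ym_hist.append(ym)
        ym += dt * K * u          # model integrates the current input
        ys.append(y)
    return ys

ys = simulate()
print(f"output at t = 1 s: {ys[100]:.3f}, final output: {ys[-1]:.4f}")
```

With a perfect model the mismatch term is zero, so the controller effectively acts on the delay-free model and the output follows the setpoint after the deadtime. Modeling errors break this cancellation, which is the weakness the abstract says the sliding mode design is meant to address.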
Lester, David
2012-10-01
In a sample of 140 undergraduate students, measures of defeat and entrapment and of haplessness, helplessness, and hopelessness were similar in their ability to predict depression and suicidality. It was concluded that the two sets of measures may tap the same cognitive mind set.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; Taylor, Ronald C.; Weisenhorn, Pamela; Olson, Robert D.; Stevens, Rick L.; Rocha, Miguel; Rocha, Isabel; Best, Aaron A.; DeJongh, Matthew; Tintle, Nathan L.; Parrello, Bruce; Overbeek, Ross; Henry, Christopher S.
2016-01-01
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. An important step toward meeting the challenge of understanding gene function and regulation is the identification of sets of genes that are always co-expressed. These gene sets, Atomic Regulons (ARs), represent fundamental units of function within a cell and could be used to associate genes of unknown function with cellular processes and to enable rational genetic engineering of cellular systems. Here, we describe an approach for inferring ARs that leverages large-scale expression data sets, gene context, and functional relationships among genes. We computed ARs for Escherichia coli based on 907 gene expression experiments and compared our results with gene clusters produced by two prevalent data-driven methods: Hierarchical clustering and k-means clustering. We compared ARs and purely data-driven gene clusters to the curated set of regulatory interactions for E. coli found in RegulonDB, showing that ARs are more consistent with gold standard regulons than are data-driven gene clusters. We further examined the consistency of ARs and data-driven gene clusters in the context of gene interactions predicted by Context Likelihood of Relatedness (CLR) analysis, finding that the ARs show better agreement with CLR predicted interactions. We determined the impact of increasing amounts of expression data on AR construction and find that while more data improve ARs, it is not necessary to use the full set of gene expression experiments available for E. coli to produce high quality ARs. In order to explore the conservation of co-regulated gene sets across different organisms, we computed ARs for Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus, each of which represents increasing degrees of phylogenetic distance from E. coli. 
Comparison of the organism-specific ARs showed that the consistency of AR gene membership correlates with phylogenetic distance, but there is clear variability in the regulatory networks of closely related organisms. As large scale expression data sets become increasingly common for model and non-model organisms, comparative analyses of atomic regulons will provide valuable insights into fundamental regulatory modules used across the bacterial domain. PMID:27933038
Hudson, Jennifer L.; Keers, Robert; Roberts, Susanna; Coleman, Jonathan R.I.; Breen, Gerome; Arendt, Kristian; Bögels, Susan; Cooper, Peter; Creswell, Cathy; Hartman, Catharina; Heiervang, Einar R.; Hötzel, Katrin; In-Albon, Tina; Lavallee, Kristen; Lyneham, Heidi J.; Marin, Carla E.; McKinnon, Anna; Meiser-Stedman, Richard; Morris, Talia; Nauta, Maaike; Rapee, Ronald M.; Schneider, Silvia; Schneider, Sophie C.; Silverman, Wendy K.; Thastum, Mikael; Thirlwall, Kerstin; Waite, Polly; Wergeland, Gro Janne; Lester, Kathryn J.; Eley, Thalia C.
2015-01-01
Objective: The Genes for Treatment study is an international, multisite collaboration exploring the role of genetic, demographic, and clinical predictors in response to cognitive-behavioral therapy (CBT) in pediatric anxiety disorders. The current article, the first from the study, examined demographic and clinical predictors of response to CBT. We hypothesized that the child’s gender, type of anxiety disorder, initial severity and comorbidity, and parents’ psychopathology would significantly predict outcome. Method: A sample of 1,519 children 5 to 18 years of age with a primary anxiety diagnosis received CBT across 11 sites. Outcome was defined as response (change in diagnostic severity) and remission (absence of the primary diagnosis) at each time point (posttreatment, 3-, 6-, and/or 12-month follow-up) and analyzed using linear and logistic mixed models. Separate analyses were conducted using data from posttreatment and follow-up assessments to explore the relative importance of predictors at these time points. Results: Individuals with social anxiety disorder (SoAD) had significantly poorer outcomes (poorer response and lower rates of remission) than those with generalized anxiety disorder (GAD). Although individuals with specific phobia (SP) also had poorer outcomes than those with GAD at posttreatment, these differences were not maintained at follow-up. Both comorbid mood and externalizing disorders significantly predicted poorer outcomes at posttreatment and follow-up, whereas self-reported parental psychopathology had little effect on posttreatment outcomes but significantly predicted response (although not remission) at follow-up. Conclusion: SoAD, nonanxiety comorbidity, and parental psychopathology were associated with poorer outcomes after CBT. The results highlight the need for enhanced treatments for children at risk for poorer outcomes. PMID:26004660
Hudson, Jennifer L; Keers, Robert; Roberts, Susanna; Coleman, Jonathan R I; Breen, Gerome; Arendt, Kristian; Bögels, Susan; Cooper, Peter; Creswell, Cathy; Hartman, Catharina; Heiervang, Einar R; Hötzel, Katrin; In-Albon, Tina; Lavallee, Kristen; Lyneham, Heidi J; Marin, Carla E; McKinnon, Anna; Meiser-Stedman, Richard; Morris, Talia; Nauta, Maaike; Rapee, Ronald M; Schneider, Silvia; Schneider, Sophie C; Silverman, Wendy K; Thastum, Mikael; Thirlwall, Kerstin; Waite, Polly; Wergeland, Gro Janne; Lester, Kathryn J; Eley, Thalia C
2015-06-01
The Genes for Treatment study is an international, multisite collaboration exploring the role of genetic, demographic, and clinical predictors in response to cognitive-behavioral therapy (CBT) in pediatric anxiety disorders. The current article, the first from the study, examined demographic and clinical predictors of response to CBT. We hypothesized that the child's gender, type of anxiety disorder, initial severity and comorbidity, and parents' psychopathology would significantly predict outcome. A sample of 1,519 children 5 to 18 years of age with a primary anxiety diagnosis received CBT across 11 sites. Outcome was defined as response (change in diagnostic severity) and remission (absence of the primary diagnosis) at each time point (posttreatment, 3-, 6-, and/or 12-month follow-up) and analyzed using linear and logistic mixed models. Separate analyses were conducted using data from posttreatment and follow-up assessments to explore the relative importance of predictors at these time points. Individuals with social anxiety disorder (SoAD) had significantly poorer outcomes (poorer response and lower rates of remission) than those with generalized anxiety disorder (GAD). Although individuals with specific phobia (SP) also had poorer outcomes than those with GAD at posttreatment, these differences were not maintained at follow-up. Both comorbid mood and externalizing disorders significantly predicted poorer outcomes at posttreatment and follow-up, whereas self-reported parental psychopathology had little effect on posttreatment outcomes but significantly predicted response (although not remission) at follow-up. SoAD, nonanxiety comorbidity, and parental psychopathology were associated with poorer outcomes after CBT. The results highlight the need for enhanced treatments for children at risk for poorer outcomes. Copyright © 2015 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Bonizzoni, Paola; Rizzi, Raffaella; Pesole, Graziano
2005-10-05
Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems, hence the need to develop novel strategies. We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid the limitations of splice site prediction (mainly over-predictions) that arise from independent single EST alignments to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.
Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila
2017-07-12
The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. Like in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes. Copyright © 2017 the authors 0270-6474/17/376661-12$15.00/0.
Elloumi, Fathi; Hu, Zhiyuan; Li, Yan; Parker, Joel S; Gulley, Margaret L; Amos, Keith D; Troester, Melissa A
2011-06-30
Genomic tests are available to predict breast cancer recurrence and to guide clinical decision making. These predictors provide recurrence risk scores along with a measure of uncertainty, usually a confidence interval. The confidence interval conveys random error and not systematic bias. Standard tumor sampling methods make this problematic, as it is common to have a substantial proportion (typically 30-50%) of a tumor sample comprised of histologically benign tissue. This "normal" tissue could represent a source of non-random error or systematic bias in genomic classification. To assess the performance characteristics of genomic classification to systematic error from normal contamination, we collected 55 tumor samples and paired tumor-adjacent normal tissue. Using genomic signatures from the tumor and paired normal, we evaluated how increasing normal contamination altered recurrence risk scores for various genomic predictors. Simulations of normal tissue contamination caused misclassification of tumors in all predictors evaluated, but different breast cancer predictors showed different types of vulnerability to normal tissue bias. While two predictors had unpredictable direction of bias (either higher or lower risk of relapse resulted from normal contamination), one signature showed predictable direction of normal tissue effects. Due to this predictable direction of effect, this signature (the PAM50) was adjusted for normal tissue contamination and these corrections improved sensitivity and negative predictive value. For all three assays quality control standards and/or appropriate bias adjustment strategies can be used to improve assay reliability. Normal tissue sampled concurrently with tumor is an important source of bias in breast genomic predictors. All genomic predictors show some sensitivity to normal tissue contamination and ideal strategies for mitigating this bias vary depending upon the particular genes and computational methods used in the predictor.
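How normal-tissue contamination can shift a genomic risk score may be illustrated with a toy mixing model. The abstract does not describe how its simulations were set up, so the linear mixing of expression profiles is an assumption, as are the expression values, the signature weights and the classification threshold, all of which are hypothetical.

```python
def mix_profiles(tumor, normal, normal_fraction):
    """Linear mixture of tumor and normal expression profiles (assumed mixing model)."""
    return [(1 - normal_fraction) * t + normal_fraction * n
            for t, n in zip(tumor, normal)]

def risk_score(profile, weights):
    """Weighted sum of gene expression values, standing in for a genomic predictor."""
    return sum(w * x for w, x in zip(weights, profile))

# Hypothetical four-gene example; none of these numbers come from the study.
tumor = [8.0, 6.5, 3.0, 9.0]
normal = [5.0, 7.0, 6.0, 4.0]
weights = [0.5, -0.2, -0.4, 0.6]
threshold = 5.0  # scores above this are called "high risk" in this toy example

scores = {}
for fraction in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    scores[fraction] = risk_score(mix_profiles(tumor, normal, fraction), weights)
    label = "high" if scores[fraction] > threshold else "low"
    print(f"normal fraction {fraction:.1f}: score {scores[fraction]:.2f} ({label} risk)")
```

In this toy case the score falls steadily as the normal fraction rises, and the call flips from high to low risk between 30% and 40% contamination. A predictable direction of bias like this is what makes a correction feasible, as the abstract reports for one signature.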
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, H; Wang, J; Chuong, M
2015-06-15
Purpose: To evaluate the role of mid-treatment and post-treatment FDG-PET/CT in predicting progression-free survival (PFS) and distant metastasis (DM) of anal cancer patients treated with chemoradiotherapy (CRT). Methods: 17 anal cancer patients treated with CRT were retrospectively studied. The median prescription dose was 56 Gy (range, 50–62.5 Gy). All patients underwent FDG-PET/CT scans before and after CRT. 16 of the 17 patients had an additional FDG-PET/CT image at 3–5 weeks into the treatment (denoted as mid-treatment FDG-PET/CT). 750 features were extracted from these three sets of scans, which included both traditional PET/CT measures (SUVmax, SUVpeak, tumor diameters, etc.) and spatial-temporal PET/CT features (comprehensively quantify a tumor’s FDG uptake intensity and distribution, spatial variation (texture), geometric property and their temporal changes relative to baseline). 26 clinical parameters (age, gender, TNM stage, histology, GTV dose, etc.) were also analyzed. Advanced analytics, including methods to select an optimal set of predictors and a model selection engine that identifies the most accurate machine learning algorithm for predictive analysis, were developed. Results: Comparing the baseline + mid-treatment PET/CT set to the baseline + post-treatment PET/CT set, 14 predictors were selected from each feature group. The same three clinical parameters (tumor size, T stage and whether 5-FU was held during any cycle of chemotherapy) and two traditional measures (pre-CRT SUVmin and SUVmedian) were selected by both predictor groups. A different mix of spatial-temporal PET/CT features was selected. Using the 14 predictors and Naive Bayes, the mid-treatment PET/CT set achieved 87.5% accuracy (2 PFS patients misclassified, all local recurrence and DM patients correctly classified). 
Post-treatment PET/CT set achieved 94.0% accuracy (all PFS and DM patients correctly predicted, 1 local recurrence patient misclassified) with logistic regression, neural network or support vector machine model. Conclusion: Applying radiomics approach to either midtreatment or post-treatment PET/CT could achieve high accuracy in predicting anal cancer treatment outcomes. This work was supported in part by the National Cancer Institute Grant R01CA172638.« less
Allegra, S; Cusato, J; De Francia, S; Arduino, A; Longo, F; Pirro, E; Massano, D; De Nicolò, A; Piga, A; D'Avolio, A
2018-05-22
β-Thalassemia patients develop deficiency in vitamin D absorption and liver hydroxylation, resulting in extremely low calcitriol levels. We explored the role of single-nucleotide polymorphisms (SNPs) involved in vitamin D metabolism, transport and activity on deferasirox pharmacokinetics and outcomes (effectiveness trough levels (Ctrough) and the area under the curve (AUC) cutoffs of 20 μg ml⁻¹ and 360 μg ml⁻¹ h⁻¹, respectively; nonresponse AUC limit of 250 μg ml⁻¹ h⁻¹). Ninety-nine β-thalassemic patients were enrolled. Drug plasma Ctrough and AUC were measured by the high-performance liquid chromatography system coupled with an ultraviolet determination method. Allelic discrimination for VDR, CYP24A1, CYP27B1 and GC gene SNPs was performed by real-time PCR. CYP24A1 22776 TT significantly influenced Cmin and negatively predicted it in regression analysis. CYP24A1 3999 CC was associated with Ctrough and Cmin and was a negative predictor of Tmax, whereas CYP24A1 8620 GG seemed to have a role in Ctrough, AUC, t1/2 and Cmin, and was an AUC negative predictor factor. Considering treatment outcome, Cdx2 and GC 1296 were retained in regression analysis as AUC efficacy cutoff negative predictors.
Fisher, Diana E.; Li, Chuan-Ming; Hoffman, Howard J.; Chiu, May S.; Themann, Christa L.; Petersen, Hannes; Jonsson, Palmi V.; Jonsson, Helgi; Jonasson, Fridbert; Sverrisdottir, Johanna Eyrun; Launer, Lenore J.; Eiriksdottir, Gudny; Gudnason, Vilmundur; Cotch, Mary Frances
2015-01-01
Objective: We estimate the prevalence of hearing-aid use in Iceland and identify sex-specific factors associated with use. Design: Population-based cohort study. Study sample: A total of 5172 Age, Gene/Environment Susceptibility-Reykjavik Study (AGES-RS) participants, aged 67 to 96 years (mean age 76.5 years), who completed air-conduction and pure-tone audiometry. Results: Hearing-aid use was reported by 23.0% of men and 15.9% of women in the cohort, although among participants with at least moderate hearing loss in the better ear (pure-tone average [PTA] of thresholds at 0.5, 1, 2, and 4 kHz ≥ 35 dB hearing level [HL]) it was 49.9% and did not differ by sex. Self-reported hearing loss was the strongest predictor of hearing-aid use in men [OR: 2.68 (95% CI: 1.77, 4.08)] and women [OR: 3.07 (95% CI: 1.94, 4.86)], followed by hearing loss severity based on audiometry. Having diabetes or osteoarthritis were significant positive predictors of use in men, whereas greater physical activity and unimpaired cognitive status were important in women. Conclusions: Hearing-aid use was comparable in Icelandic men and women with moderate or greater hearing loss. Self-recognition of hearing loss was the factor most predictive of hearing-aid use; other influential factors differed for men and women. PMID:25816699
McKay, Adam; Liew, Carine; Schönberger, Michael; Ross, Pamela; Ponsford, Jennie
(1) To examine the relations between performance on cognitive tests and on-road driving assessment in a sample of persons with traumatic brain injury (TBI). (2) To compare cognitive predictors of the on-road assessment with demographic and injury-related predictors. Ninety-nine people with mild-severe TBI who completed an on-road driving assessment in an Australian rehabilitation setting. Retrospective case series. Wechsler Test of Adult Reading or National Adult Reading Test-Revised; 4 subtests from the Wechsler Adult Intelligence Scale-III; Rey Auditory Verbal Learning Test; Rey Complex Figure Test; Trail Making Test; demographic factors (age, sex, years licensed); and injury-related factors (duration of posttraumatic amnesia; time postinjury). Participants who failed the driving assessment did worse on measures of attention, visual memory, and executive processing; however, cognitive tests were weak correlates (r values <0.3) and poor predictors of the driving assessment. Posttraumatic amnesia duration mediated by time postinjury was the strongest predictor of the driving assessment; that is, participants with more severe TBIs had later driving assessments and were more likely to fail. Cognitive tests are not reliable predictors of the on-road driving assessment outcome. Traumatic brain injury severity may be a better predictor of on-road driving; however, further research is needed to identify the best predictors of driving behavior after TBI.
Mountifield, Réme; Andrews, Jane M; Mikocka-Walus, Antonina; Bampton, Peter
2015-03-28
To examine the frequency of regular complementary and alternative therapy (CAM) use in three Australian cohorts of contrasting care setting and geography, and identify independent attitudinal and psychological predictors of CAM use across all cohorts. A cross sectional questionnaire was administered to inflammatory bowel disease (IBD) patients in 3 separate cohorts which differed by geographical region and care setting. Demographics and frequency of regular CAM use were assessed, along with attitudes towards IBD medication and psychological parameters such as anxiety, depression, personality traits and quality of life (QOL), and compared across cohorts. Independent attitudinal and psychological predictors of CAM use were determined using binary logistic regression analysis. In 473 respondents (mean age 50.3 years, 60.2% female) regular CAM use was reported by 45.4%, and did not vary between cohorts. Only 54.1% of users disclosed CAM use to their doctor. Independent predictors of CAM use which confirm those reported previously were: covert conventional medication dose reduction (P < 0.001), seeking psychological treatment (P < 0.001), adverse effects of conventional medication (P = 0.043), and higher QOL (P < 0.001). Newly identified predictors were CAM use by family or friends (P < 0.001), dissatisfaction with patient-doctor communication (P < 0.001), and lower depression scores (P < 0.001). In addition to previously identified predictors of CAM use, these data show that physician attention to communication and the patient-doctor relationship is important as these factors influence CAM use. Patient reluctance to discuss CAM with physicians may promote greater reliance on social contacts to influence CAM decisions.
Kirst, Maritt; Mecredy, Graham; Borland, Tracey; Chaiton, Michael
2014-11-01
Young adulthood has been shown to be a time of increased substance use. Yet, not enough is known about which factors contribute to initiation and progression of substance use among young adults specifically during the transition year away from high school. A narrative review was undertaken to increase understanding of the predictors of changes in use of tobacco, alcohol, cannabis, other illicit drugs, and mental health problems among young adults during the transition period after high school. The review covered academic literature examining predictors of the use of tobacco, alcohol and cannabis, and co-morbidities (e.g., co-occurring substance use and/or mental health issues) among young adults transitioning from high school to post-secondary education or the workforce. Twenty-six studies were included in the review. The majority of the studies (19) examined substance use during the transition from high school to post-secondary settings. Seven studies examined substance use in post-secondary settings. The studies consistently found that substance use increases among young adults as they transition away from high school. During the transition away from high school, common predictors of substance use include substance use in high school and peer influence. Common predictors of substance use in post-secondary education include previous substance use, peer influence, psychological factors and mental health issues. Conclusions/Importance: Further research on social contextual influences on substance use, mental health issues, gender differences and availability of substances during the transition period is needed to inform the development of new preventive interventions for this age group.
NASA Astrophysics Data System (ADS)
Fang, Wei; Huang, Shengzhi; Huang, Qiang; Huang, Guohe; Meng, Erhao; Luan, Jinkai
2018-06-01
In this study, reference evapotranspiration (ET0) forecasting models are developed for the least economically developed regions subject to meteorological data scarcity. Firstly, the partial mutual information (PMI), which is capable of capturing linear and nonlinear dependence, is investigated regarding its utility to identify relevant predictors and exclude those that are redundant, through comparison with partial linear correlation. An efficient input selection technique is crucial for decreasing model data requirements. Then, the interconnection between global climate indices and regional ET0 is identified. Relevant climatic indices are introduced as additional predictors to supply information regarding ET0 that would otherwise be provided by the unavailable meteorological data. The case study in the Jing River and Beiluo River basins, China, reveals that PMI outperforms the partial linear correlation in excluding the redundant information, favouring the yield of smaller predictor sets. The teleconnection analysis identifies the correlation between Nino 1 + 2 and regional ET0, indicating influences of ENSO events on the evapotranspiration process in the study area. Furthermore, introducing Nino 1 + 2 as predictors helps to yield more accurate ET0 forecasts. A model performance comparison also shows that non-linear stochastic models (SVR or RF with input selection through PMI) do not always outperform linear models (MLR with inputs screened by linear correlation). However, the former can offer quite comparable performance depending on smaller predictor sets. Therefore, efforts such as screening model inputs through PMI and incorporating global climatic indices interconnected with ET0 can benefit the development of ET0 forecasting models suitable for data-scarce regions.
Mountifield, Réme; Andrews, Jane M; Mikocka-Walus, Antonina; Bampton, Peter
2015-01-01
AIM: To examine the frequency of regular complementary and alternative therapy (CAM) use in three Australian cohorts of contrasting care setting and geography, and identify independent attitudinal and psychological predictors of CAM use across all cohorts. METHODS: A cross sectional questionnaire was administered to inflammatory bowel disease (IBD) patients in 3 separate cohorts which differed by geographical region and care setting. Demographics and frequency of regular CAM use were assessed, along with attitudes towards IBD medication and psychological parameters such as anxiety, depression, personality traits and quality of life (QOL), and compared across cohorts. Independent attitudinal and psychological predictors of CAM use were determined using binary logistic regression analysis. RESULTS: In 473 respondents (mean age 50.3 years, 60.2% female) regular CAM use was reported by 45.4%, and did not vary between cohorts. Only 54.1% of users disclosed CAM use to their doctor. Independent predictors of CAM use which confirm those reported previously were: covert conventional medication dose reduction (P < 0.001), seeking psychological treatment (P < 0.001), adverse effects of conventional medication (P = 0.043), and higher QOL (P < 0.001). Newly identified predictors were CAM use by family or friends (P < 0.001), dissatisfaction with patient-doctor communication (P < 0.001), and lower depression scores (P < 0.001). CONCLUSION: In addition to previously identified predictors of CAM use, these data show that physician attention to communication and the patient-doctor relationship is important as these factors influence CAM use. Patient reluctance to discuss CAM with physicians may promote greater reliance on social contacts to influence CAM decisions. PMID:25834335
Performance of Polygenic Scores for Predicting Phobic Anxiety
Walter, Stefan; Glymour, M. Maria; Koenen, Karestan; Liang, Liming; Tchetgen Tchetgen, Eric J.; Cornelis, Marilyn; Chang, Shun-Chiao; Rimm, Eric; Kawachi, Ichiro; Kubzansky, Laura D.
2013-01-01
Context: Anxiety disorders are common, with a lifetime prevalence of 20% in the U.S., and are responsible for substantial burdens of disability, missed work days and health care utilization. To date, no causal genetic variants have been identified for anxiety, anxiety disorders, or related traits. Objective: To investigate whether a phobic anxiety symptom score was associated with 3 alternative polygenic risk scores, derived from external genome-wide association studies of anxiety, an internally estimated agnostic polygenic score, or previously identified candidate genes. Design: Longitudinal follow-up study. Using linear and logistic regression we investigated whether phobic anxiety was associated with polygenic risk scores derived from internal, leave-one-out genome-wide association studies, from 31 candidate genes, and from out-of-sample genome-wide association weights previously shown to predict depression and anxiety in another cohort. Setting and Participants: Study participants (n = 11,127) were individuals from the Nurses' Health Study and Health Professionals Follow-up Study. Main Outcome Measure: Anxiety symptoms were assessed via the 8-item phobic anxiety scale of the Crown Crisp Index at two time points, from which a continuous phenotype score was derived. Results: We found no genome-wide significant associations with phobic anxiety. Phobic anxiety was also not associated with a polygenic risk score derived from the genome-wide association study beta weights using liberal p-value thresholds; with a previously published genome-wide polygenic score; or with a candidate gene risk score based on 31 genes previously hypothesized to predict anxiety. Conclusion: There is a substantial gap between twin-study heritability estimates of anxiety disorders ranging between 20–40% and heritability explained by genome-wide association results. 
New approaches such as improved genome imputations, application of gene expression and biological pathways information, and incorporating social or environmental modifiers of genetic risks may be necessary to identify significant genetic predictors of anxiety. PMID:24278274
Oelsner, Kathryn Tully; Guo, Yan; To, Sophie Bao-Chieu; Non, Amy L; Barkin, Shari L
2017-01-09
The study of epigenetic processes and mechanisms presents a dynamic approach to assess complex individual variation in obesity susceptibility. However, few studies have examined epigenetic patterns in preschool-age children at-risk for obesity despite the relevance of this developmental stage to trajectories of weight gain. We hypothesized that salivary DNA methylation patterns of key obesogenic genes in Hispanic children would 1) correlate with maternal BMI and 2) allow for identification of pathways associated with children at-risk for obesity. Genome-wide DNA methylation was conducted on 92 saliva samples collected from Hispanic preschool children using the Infinium Illumina HumanMethylation 450 K BeadChip (Illumina, San Diego, CA, USA), which interrogates >484,000 CpG sites associated with ~24,000 genes. The analysis was limited to 936 genes that have been associated with obesity in a prior GWAS. Child DNA methylation at 17 CpG sites was found to be significantly associated with maternal BMI, with increased methylation at 12 CpG sites and decreased methylation at 5 CpG sites. Pathway analysis revealed methylation at these sites related to homocysteine and methionine degradation as well as cysteine biosynthesis and circadian rhythm. Furthermore, eight of the 17 CpG sites reside in genes (FSTL1, SORCS2, NRF1, DLC1, PPARGC1B, CHN2, NXPH1) that have prior known associations with obesity, diabetes, and the insulin pathway. Our study confirms that saliva is a practical human tissue to obtain in community settings and in pediatric populations. These salivary findings indicate potential epigenetic differences in Hispanic preschool children at risk for pediatric obesity. Identifying early biomarkers and understanding pathways that are epigenetically regulated during this critical stage of child development may present an opportunity for prevention or early intervention for addressing childhood obesity. 
The clinical trial protocol is available at ClinicalTrials.gov ( NCT01316653 ). Registered 3 March 2011.
Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiaohan; Jawdy, Sara; Tschaplinski, Timothy J
2009-01-01
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 192, 641 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.
Cross-Study Homogeneity of Psoriasis Gene Expression in Skin across a Large Expression Range
Kerkof, Keith; Timour, Martin; Russell, Christopher B.
2013-01-01
Background: In psoriasis, only limited overlap between sets of genes identified as differentially expressed (psoriatic lesional [PP] vs. psoriatic non-lesional [PN]) was found using statistical and fold-change cut-offs. To provide a framework for utilizing prior psoriasis data sets we sought to understand the consistency of those sets. Methodology/Principal Findings: Microarray expression profiling and qRT-PCR were used to characterize gene expression in PP and PN skin from psoriasis patients. cDNA (three new data sets) and cRNA hybridization (four existing data sets) data were compared using a common analysis pipeline. Agreement between data sets was assessed using varying qualitative and quantitative cut-offs to generate a differentially expressed gene (DEG) list in a source data set and then using other data sets to validate the list. Concordance increased from 67% across all probe sets to over 99% across more than 10,000 probe sets when statistical filters were employed. The fold-change behavior of individual genes tended to be consistent across the multiple data sets. We found that genes with <2-fold change values were quantitatively reproducible between pairs of data sets. In a subset of transcripts with a role in inflammation, changes detected by microarray were confirmed by qRT-PCR with high concordance. For transcripts with both PN and PP levels within the microarray dynamic range, microarray and qRT-PCR were quantitatively reproducible, including minimal fold-changes in IL13, TNFSF11, and TNFRSF11B and genes with >10-fold changes in either direction such as CHRM3, IL12B and IFNG. Conclusions/Significance: Gene expression changes in psoriatic lesions were consistent across different studies, despite differences in patient selection, sample handling, and microarray platforms, but between-study comparisons showed stronger agreement within than between platforms. We could use cut-offs as low as log10(ratio) = 0.1 (fold-change = 1.26), generating larger gene lists that validate on independent data sets. 
The reproducibility of PP signatures across data sets suggests that different sample sets can be productively compared. PMID:23308107
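The cut-off quoted in the abstract above can be checked with one line of arithmetic. The conversion below assumes only that "ratio" means the lesional-to-non-lesional expression ratio and that fold-change is that ratio on a linear scale; the symbol FC is introduced here for illustration.

```latex
\log_{10}(\mathrm{ratio}) = 0.1
\quad\Longrightarrow\quad
\mathrm{FC} = 10^{0.1} \approx 1.26,
\qquad
\log_{10}(\mathrm{ratio}) = -0.1
\quad\Longrightarrow\quad
\mathrm{FC} = 10^{-0.1} \approx 0.79 .
```

On this reading, a symmetric threshold of |log10(ratio)| ≥ 0.1 would admit genes that change by roughly 26% upward or 21% downward, far below the conventional 2-fold cut-off (log10(2) ≈ 0.30).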
Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R
1997-03-01
We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous.
Discovering monotonic stemness marker genes from time-series stem cell microarray data.
Wang, Hsei-Wei; Sun, Hsing-Jen; Chang, Ting-Yu; Lo, Hung-Hao; Cheng, Wei-Chung; Tseng, George C; Lin, Chin-Teng; Chang, Shing-Jyh; Pal, Nikhil; Chung, I-Fang
2015-01-01
Identification of genes with ascending or descending monotonic expression patterns over time or stages of stem cells is an important issue in time-series microarray data analysis. We propose a method named Monotonic Feature Selector (MFSelector) based on a concept of total discriminating error (DEtotal) to identify monotonic genes. MFSelector considers various time stages in stage order (i.e., Stage One vs. other stages, Stages One and Two vs. remaining stages and so on) and computes DEtotal of each gene. MFSelector can successfully identify genes with monotonic characteristics. We have demonstrated the effectiveness of MFSelector on two synthetic data sets and two stem cell differentiation data sets: embryonic stem cell neurogenesis (ESCN) and embryonic stem cell vasculogenesis (ESCV) data sets. We have also performed extensive quantitative comparisons of the three monotonic gene selection approaches. Some of the monotonic marker genes such as OCT4, NANOG, BLBP, discovered from the ESCN dataset exhibit consistent behavior with that reported in other studies. The role of monotonic genes found by MFSelector in either stemness or differentiation is validated using information obtained from Gene Ontology analysis and other literature. We justify and demonstrate that descending genes are involved in the proliferation or self-renewal activity of stem cells, while ascending genes are involved in differentiation of stem cells into variant cell lineages. We have developed a novel system, easy to use even with no pre-existing knowledge, to identify gene sets with monotonic expression patterns in multi-stage as well as in time-series genomics matrices. The case studies on ESCN and ESCV have helped to get a better understanding of stemness and differentiation. The novel monotonic marker genes discovered from a data set are found to exhibit consistent behavior in another independent data set, demonstrating the utility of the proposed method. 
The MFSelector R function and data sets can be downloaded from: http://microarray.ym.edu.tw/tools/MFSelector/.
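The stage-ordered comparison described in the abstract above (Stage One vs. the rest, Stages One and Two vs. the rest, and so on) can be illustrated with a small sketch. The abstract does not define DEtotal, so the score below is a simplified stand-in chosen for illustration, not the published measure: for each cut between consecutive stages it counts (earlier, later) sample pairs that break an ascending order. The function name `stagewise_violations` is hypothetical.

```python
from typing import Sequence


def stagewise_violations(expr: Sequence[float], stage: Sequence[int]) -> int:
    """Illustrative stand-in score for an ascending pattern (NOT the
    published DEtotal). For every cut between consecutive stages, count
    the (earlier, later) sample pairs whose expression values are not
    strictly increasing. A score of 0 means every cut separates cleanly."""
    levels = sorted(set(stage))
    total = 0
    # Cut k puts stages below levels[k] on one side and the rest on the other.
    for cut in levels[1:]:
        early = [v for v, s in zip(expr, stage) if s < cut]
        late = [v for v, s in zip(expr, stage) if s >= cut]
        total += sum(1 for a in early for b in late if a >= b)
    return total


stages = [1, 1, 2, 2, 3, 3]
print(stagewise_violations([1, 2, 3, 4, 5, 6], stages))  # cleanly ascending gene
print(stagewise_violations([1, 3, 2, 4, 5, 6], stages))  # one overlapping pair
```

A descending pattern can be scored with the same function by negating the expression values first. Ranking genes by such a score, lowest first, is one plausible way to shortlist monotonic candidates under these assumptions.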
Rosswog, Carolina; Schmidt, Rene; Oberthuer, André; Juraeva, Dilafruz; Brors, Benedikt; Engesser, Anne; Kahlert, Yvonne; Volland, Ruth; Bartenhagen, Christoph; Simon, Thorsten; Berthold, Frank; Hero, Barbara; Faldum, Andreas; Fischer, Matthias
2017-12-01
Current risk stratification systems for neuroblastoma patients consider clinical, histopathological, and genetic variables, and additional prognostic markers have been proposed in recent years. We here sought to select highly informative covariates in a multistep strategy based on consecutive Cox regression models, resulting in a risk score that integrates hazard ratios of prognostic variables. A cohort of 695 neuroblastoma patients was divided into a discovery set (n=75) for multigene predictor generation, a training set (n=411) for risk score development, and a validation set (n=209). Relevant prognostic variables were identified by stepwise multivariable L1-penalized least absolute shrinkage and selection operator (LASSO) Cox regression, followed by backward selection in multivariable Cox regression, and then integrated into a novel risk score. The variables stage, age, MYCN status, and two multigene predictors, NB-th24 and NB-th44, were selected as independent prognostic markers by LASSO Cox regression analysis. Following backward selection, only the multigene predictors were retained in the final model. Integration of these classifiers in a risk scoring system distinguished three patient subgroups that differed substantially in their outcome. The scoring system discriminated patients with diverging outcome in the validation cohort (5-year event-free survival, 84.9±3.4 vs 63.6±14.5 vs 31.0±5.4; P<.001), and its prognostic value was validated by multivariable analysis. We here propose a translational strategy for developing risk assessment systems based on hazard ratios of relevant prognostic variables. Our final neuroblastoma risk score comprised two multigene predictors only, supporting the notion that molecular properties of the tumor cells strongly impact clinical courses of neuroblastoma patients. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Sperschneider, Jana; Williams, Angela H; Hane, James K; Singh, Karam B; Taylor, Jennifer M
2015-01-01
The steadily increasing number of sequenced fungal and oomycete genomes has enabled detailed studies of how these eukaryotic microbes infect plants and cause devastating losses in food crops. During infection, fungal and oomycete pathogens secrete effector molecules which manipulate host plant cell processes to the pathogen's advantage. Proteinaceous effectors are synthesized intracellularly and must be externalized to interact with host cells. Computational prediction of secreted proteins from genomic sequences is an important technique to narrow down the candidate effector repertoire for subsequent experimental validation. In this study, we benchmark secretion prediction tools on experimentally validated fungal and oomycete effectors. We observe that for a set of fungal SwissProt protein sequences, SignalP 4 and the neural network predictors of SignalP 3 (D-score) and SignalP 2 perform best. For effector prediction in particular, the use of a sensitive method can be desirable to obtain the most complete candidate effector set. We show that the neural network predictors of SignalP 2 and 3, as well as TargetP were the most sensitive tools for fungal effector secretion prediction, whereas the hidden Markov model predictors of SignalP 2 and 3 were the most sensitive tools for oomycete effectors. Thus, previous versions of SignalP retain value for oomycete effector prediction, as the current version, SignalP 4, was unable to reliably predict the signal peptide of the oomycete Crinkler effectors in the test set. Our assessment of subcellular localization predictors shows that cytoplasmic effectors are often predicted as not extracellular. This limits the reliability of secretion predictions that depend on these tools. We present our assessment with a view to informing future pathogenomics studies and suggest revised pipelines for secretion prediction to obtain optimal effector predictions in fungi and oomycetes.
NASA Astrophysics Data System (ADS)
Pande, Saket; Sharma, Ashish
2014-05-01
This study is motivated by the need to robustly specify, identify, and forecast runoff generation processes for hydroelectricity production. It at least requires the identification of significant predictors of runoff generation and the influence of each such significant predictor on runoff response. To this end, we compare two non-parametric algorithms of predictor subset selection. One is based on information theory and assesses predictor significance (and hence selection) using the Partial Information (PI) rationale of Sharma and Mehrotra (2014). The other algorithm is based on a frequentist approach that uses the bounds on probability of error concept of Pande (2005), assesses all possible predictor subsets on-the-go and converges to a predictor subset in a computationally efficient manner. Both algorithms approximate the underlying system by locally constant functions and select predictor subsets corresponding to these functions. The performance of the two algorithms is compared on a set of synthetic case studies as well as a real world case study of inflow forecasting. References: Sharma, A., and R. Mehrotra (2014), An information theoretic alternative to model a natural system using observational information alone, Water Resources Research, 49, doi:10.1002/2013WR013845. Pande, S. (2005), Generalized local learning in water resource management, PhD dissertation, Utah State University, UT-USA, 148p.
Predictors of Complications in Patients Receiving Head and Neck Free Flap Reconstructive Procedures.
Eskander, Antoine; Kang, Stephen; Tweel, Ben; Sitapara, Jigar; Old, Matthew; Ozer, Enver; Agrawal, Amit; Carrau, Ricardo; Rocco, James W; Teknos, Theodoros N
2018-05-01
Objective: To (1) determine the overall complication rate, wound healing, and wound infection complications and (2) identify preoperative, intraoperative, and postoperative predictors of these complications. Study Design: Case series with chart review. Setting: Tertiary academic cancer hospital. Subjects and Methods: All head and neck free flap patients at The Ohio State University (2006-2012) were assessed. Multivariable logistic regression assessed the impact of patient factors, flap and wound factors, and intraoperative factors on the aforementioned quality metric outcomes. Results: Of the 515 patients identified, 54% had a complication predicted by longer operating room (OR) time, higher comorbidity index, and oral cavity and pharyngeal tumor sites. Predictors of wound-healing complications (15%) were longer OR time, volume of crystalloid given intraoperatively, and oral cavity and pharyngeal tumor sites. Predictors of wound infection (12%) were younger age, diabetes mellitus, and malnutrition. Conclusions: Wound healing and infectious complications account for most complications in patients with head and neck cancer undergoing free flap reconstruction. Clean contaminated wounds are a significant predictor of wound complications. Advanced OR time, advanced age, and comorbidity status, including diabetes mellitus and malnutrition, are other important predictors. Crystalloid administration is also an important predictor of wound-healing complications, and this warrants further study.
A statistical approach to identify, monitor, and manage incomplete curated data sets.
Howe, Douglas G
2018-04-02
Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval, respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval. 
This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.
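The residual-based flagging described in this abstract can be sketched as follows. This is a minimal illustration rather than the author's model: it uses a single hypothetical predictor (publication count) instead of the five variables listed, invented data, and it assumes the 95% band is 1.96 standard deviations of the training residuals.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y, slope, intercept):
    """Observed minus predicted value for each record."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical training genes: publications vs. curated expression experiments.
train_pubs = [1, 2, 3, 4, 5, 6]
train_expts = [3, 3, 7, 7, 11, 11]
slope, intercept = fit_line(train_pubs, train_expts)

# Assumed 95% band: 1.96 standard deviations of the training residuals.
band = 1.96 * stdev(residuals(train_pubs, train_expts, slope, intercept))

# Hypothetical test genes; flag those whose residual falls outside the band.
test_pubs = [3, 5, 8, 10, 4]
test_expts = [6, 10, 15, 2, 14]
test_res = residuals(test_pubs, test_expts, slope, intercept)
below = [i for i, r in enumerate(test_res) if r < -band]
above = [i for i, r in enumerate(test_res) if r > band]
print(below, above)
```

Fitting on one subset and flagging on another mirrors the train/test split in the abstract; how the interval was actually computed there is not stated, so the band above is only one plausible choice.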
Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B
2015-08-01
The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes
Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.
2005-01-01
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
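One common way to test whether a binding site is over-represented in a gene set is a one-tailed hypergeometric test. The sketch below assumes that test and uses invented counts; the abstract does not specify which statistics oPOSSUM applies, so this should not be read as its implementation.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the site."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 1000 background genes, 100 of them with the site;
# 20 co-expressed genes, 8 of them with the site (2 expected by chance).
p_value = hypergeom_tail(k=8, n=20, K=100, N=1000)
print(f"over-representation p-value: {p_value:.2e}")
```

A small p-value says the site occurs in the co-expressed set more often than random sampling from the background would explain.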
Gene integrated set profile analysis: a context-based approach for inferring biological endpoints
Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.
2016-01-01
The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710
Discovery of cancer common and specific driver gene sets
2017-01-01
Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295
Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R
2018-05-03
We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Generated effect modifiers (GEM's) in randomized clinical trials.
Petkova, Eva; Tarpey, Thaddeus; Su, Zhe; Ogden, R Todd
2017-01-01
In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an "effect modifier". Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
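With a binary treatment, the interaction coefficient for a predictor equals the difference between the outcome-on-predictor slopes in the two arms. The sketch below uses that fact to measure how strongly a composite of two baseline covariates modifies the treatment effect. The covariates, weights, and simulated outcomes are all hypothetical, and the code does not implement the optimality criteria the article proposes.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def composite(covariates, weights):
    """Linear combination of each patient's baseline covariates."""
    return [sum(w * c for w, c in zip(weights, row)) for row in covariates]

def interaction(z, outcome, treated):
    """Slope in the treated arm minus slope in the control arm."""
    z1 = [v for v, t in zip(z, treated) if t]
    y1 = [v for v, t in zip(outcome, treated) if t]
    z0 = [v for v, t in zip(z, treated) if not t]
    y0 = [v for v, t in zip(outcome, treated) if not t]
    return slope(z1, y1) - slope(z0, y0)

# Hypothetical patients: two baseline covariates and a treatment indicator.
covariates = [(1, 0), (2, 1), (3, 1), (4, 2), (1, 1), (2, 2), (3, 0), (4, 1)]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
z = composite(covariates, weights=(1.0, 0.5))  # hypothetical weights

# Simulated noise-free outcomes: slope 0.5 under control, 2.0 under treatment.
outcome = [1 + (2.0 if t else 0.5) * v for v, t in zip(z, treated)]
effect = interaction(z, outcome, treated)
print(effect)
```

A nonzero value marks the composite as an effect modifier in the sense defined above; choosing the weights to optimize a criterion is the part the article addresses.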
Estimation of relative effectiveness of phylogenetic programs by machine learning.
Krivozubov, Mikhail; Goebels, Florian; Spirin, Sergei
2014-04-01
Reconstruction of the phylogeny of a protein family from a sequence alignment can produce results of different quality. Our goal is to predict the quality of phylogeny reconstruction based on features that can be extracted from the input alignment. We used the Fitch-Margoliash (FM) method of phylogeny reconstruction and a random forest as the predictor. For training and testing the predictor, alignments of orthologous series (OS) were used, for which the result of phylogeny reconstruction can be evaluated by comparison with trees of the corresponding organisms. Our results show that the quality of phylogeny reconstruction can be predicted with more than 80% precision. We also tried to predict which phylogeny reconstruction method, FM or UPGMA, is better for a particular alignment. With the set of features used, among alignments for which the obtained predictor predicts a better performance of UPGMA, 56% really give a better result with UPGMA. Given that UPGMA performs better for only 34% of the alignments in our testing set, this result shows that it is possible in principle to predict the better phylogeny reconstruction method based on features of a sequence alignment.
Moul, Caroline; Dobson-Stone, Carol; Brennan, John; Hawes, David; Dadds, Mark
2013-01-01
Background: The serotonin system is thought to play a role in the aetiology of antisocial and aggressive behaviour in both adults and children; however, previous findings have been inconsistent. Recently, research has suggested that the function of the serotonin system may be specifically altered in a sub-set of antisocial populations – those with psychopathic (callous-unemotional) personality traits. We explored the relationships between callous-unemotional traits and functional polymorphisms of selected serotonin-system genes, and tested the association between callous-unemotional traits and serum serotonin levels independently of antisocial and aggressive behaviour. Method: Participants were boys with antisocial behaviour problems aged 3–16 years referred to University of New South Wales Child Behaviour Research Clinics. Participants volunteered either a blood or saliva sample from which levels of serum serotonin (N = 66) and/or serotonin-system single nucleotide polymorphisms (N = 157) were assayed. Results: Functional single nucleotide polymorphisms from the serotonin 1b receptor gene (HTR1B) and 2a receptor gene (HTR2A) were found to be associated with callous-unemotional traits. Serum serotonin level was a significant predictor of callous-unemotional traits; levels were significantly lower in boys with high callous-unemotional traits than in boys with low callous-unemotional traits. Conclusion: Results provide support to the emerging literature that argues for a genetically-driven system-wide alteration in serotonin function in the aetiology of callous-unemotional traits. The findings should be interpreted as preliminary and future research that aims to replicate and further investigate these results is required. PMID:23457595
Colacino, Justin A.; Dolinoy, Dana C.; Duffy, Sonia A.; Sartor, Maureen A.; Chepeha, Douglas B.; Bradford, Carol R.; McHugh, Jonathan B.; Patel, Divya A.; Virani, Shama; Walline, Heather M.; Bellile, Emily; Terrell, Jeffrey E.; Stoerker, Jay A.; Taylor, Jeremy M. G.; Carey, Thomas E.; Wolf, Gregory T.; Rozek, Laura S.
2013-01-01
Head and neck squamous cell carcinoma (HNSCC) is the eighth most commonly diagnosed cancer in the United States. The risk of developing HNSCC increases with exposure to tobacco, alcohol and infection with human papilloma virus (HPV). HPV-associated HNSCCs have a distinct risk profile and improved prognosis compared to cancers associated with tobacco and alcohol exposure. Epigenetic changes are an important mechanism in carcinogenic progression, but how these changes differ between viral- and chemical-induced cancers remains unknown. CpG methylation at 1505 CpG sites across 807 genes in 68 well-annotated HNSCC tumor samples from the University of Michigan Head and Neck SPORE patient population were quantified using the Illumina Goldengate Methylation Cancer Panel. Unsupervised hierarchical clustering based on methylation identified 6 distinct tumor clusters, which significantly differed by age, HPV status, and three year survival. Weighted linear modeling was used to identify differentially methylated genes based on epidemiological characteristics. Consistent with previous in vitro findings by our group, methylation of sites in the CCNA1 promoter was found to be higher in HPV(+) tumors, which was validated in an additional sample set of 128 tumors. After adjusting for cancer site, stage, age, gender, alcohol consumption, and smoking status, HPV status was found to be a significant predictor for DNA methylation at an additional 11 genes, including CASP8 and SYBL1. These findings provide insight into the epigenetic regulation of viral vs. chemical carcinogenesis and could provide novel targets for development of individualized therapeutic and prevention regimens based on environmental exposures. PMID:23358896
Prediction of gestational age based on genome-wide differentially methylated regions.
Bohlin, J; Håberg, S E; Magnus, P; Reese, S E; Gjessing, H K; Magnus, M C; Parr, C L; Page, C M; London, S J; Nystad, W
2016-10-07
We explored the association between gestational age and cord blood DNA methylation at birth and whether DNA methylation could be effective in predicting gestational age due to limitations with the presently used methods. We used data from the Norwegian Mother and Child Birth Cohort study (MoBa) with Illumina HumanMethylation450 data measured for 1753 newborns in two batches: MoBa 1, n = 1068; and MoBa 2, n = 685. Gestational age was computed using both ultrasound and the last menstrual period. We evaluated associations between DNA methylation and gestational age and developed a statistical model for predicting gestational age using MoBa 1 for training and MoBa 2 for predictions. The prediction model was additionally used to compare ultrasound and last menstrual period-based gestational age predictions. Furthermore, both CpGs and associated genes detected in the training models were compared to those detected in a published prediction model for chronological age. There were 5474 CpGs associated with ultrasound gestational age after adjustment for a set of covariates, including estimated cell type proportions, and Bonferroni-correction for multiple testing. Our model predicted ultrasound gestational age more accurately than it predicted last menstrual period gestational age. DNA methylation at birth appears to be a good predictor of gestational age. Ultrasound gestational age is more strongly associated with methylation than last menstrual period gestational age. The CpGs linked with our gestational age prediction model, and their associated genes, differed substantially from the corresponding CpGs and genes associated with a chronological age prediction model.
Pointwise influence matrices for functional-response regression.
Reiss, Philip T; Huang, Lei; Wu, Pei-Shien; Chen, Huaihou; Colcombe, Stan
2017-12-01
We extend the notion of an influence or hat matrix to regression with functional responses and scalar predictors. For responses depending linearly on a set of predictors, our definition is shown to reduce to the conventional influence matrix for linear models. The pointwise degrees of freedom, the trace of the pointwise influence matrix, are shown to have an adaptivity property that motivates a two-step bivariate smoother for modeling nonlinear dependence on a single predictor. This procedure adapts to varying complexity of the nonlinear model at different locations along the function, and thereby achieves better performance than competing tensor product smoothers in an analysis of the development of white matter microstructure in the brain. © 2017, The International Biometric Society.
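For the conventional linear model that the pointwise definition is said to reduce to, the trace of the hat matrix equals the number of fitted coefficients. The sketch below checks this for simple linear regression, where the diagonal entries have the closed form h_ii = 1/n + (x_i - mean)^2 / Sxx. The data are hypothetical, and the functional-response extension itself is not implemented here.

```python
from statistics import mean

def leverages(x):
    """Diagonal of the hat matrix for the model y = b0 + b1 * x."""
    n, mx = len(x), mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

x = [0.0, 1.0, 2.0, 4.0, 8.0]  # hypothetical predictor values
h = leverages(x)
degrees_of_freedom = sum(h)     # trace of the hat matrix
print(h, degrees_of_freedom)
```

The trace comes out as 2 (intercept plus slope) whatever the x values are, and the point farthest from the mean carries the largest leverage.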
iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition.
Chen, Wei; Feng, Peng-Mian; Lin, Hao; Chou, Kuo-Chen
2014-01-01
In eukaryotic genes, exons are generally interrupted by introns. Accurately removing introns and joining exons together are essential processes in eukaryotic gene expression. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapid and effective detection of splice sites that play important roles in gene structure annotation and even in RNA splicing. Although a series of computational methods were proposed for splice site identification, most of them neglected the intrinsic local structural properties. In the present study, a predictor called "iSS-PseDNC" was developed for identifying splice sites. In the new predictor, the sequences were formulated by a novel feature-vector called "pseudo dinucleotide composition" (PseDNC) into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on two benchmark datasets that the overall success rates achieved by iSS-PseDNC in identifying splice donor site and splice acceptor site were 85.45% and 87.73%, respectively. It is anticipated that iSS-PseDNC may become a useful tool for identifying splice sites and that the six DNA local structural properties described in this paper may provide novel insights for in-depth investigations into the mechanism of RNA splicing.
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling; Wang, Xianhui; Kang, Le
2017-06-01
The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain-containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. © The Authors 2017. Published by Oxford University Press.
Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling
2017-01-01
The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351
Genome-Scale Analysis of Translation Elongation with a Ribosome Flow Model
Meilijson, Isaac; Kupiec, Martin; Ruppin, Eytan
2011-01-01
We describe the first large scale analysis of gene translation that is based on a model that takes into account the physical and dynamical nature of this process. The Ribosomal Flow Model (RFM) predicts fundamental features of the translation process, including translation rates, protein abundance levels, ribosomal densities and the relation between all these variables, better than alternative (‘non-physical’) approaches. In addition, we show that the RFM can be used for accurate inference of various other quantities including genes' initiation rates and translation costs. These quantities could not be inferred by previous predictors. We find that increasing the number of available ribosomes (or equivalently the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point maximizes the predictive power of the model with respect to experimental data. This result suggests that in all organisms that were analyzed (from bacteria to Human), the global initiation rate is optimized to attain the pre-saturation point. The fact that similar results were not observed for heterologous genes indicates that this feature is under selection. Remarkably, the gap between the performance of the RFM and alternative predictors is strikingly large in the case of heterologous genes, testifying to the model's promising biotechnological value in predicting the abundance of heterologous proteins before expressing them in the desired host. PMID:21909250
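The saturation behaviour described here can be reproduced with a small simulation. The sketch below assumes the usual ribosome flow model equations, which the abstract does not spell out: each site i has an occupancy p_i between 0 and 1, ribosomes enter at the initiation rate, hop from site i to site i+1 at rate lambda_i * p_i * (1 - p_{i+1}), and leave the last site at rate lambda_n * p_n, which is the translation rate. The chain length, rates, and Euler step settings are hypothetical.

```python
def rfm_translation_rate(init_rate, hop_rates, dt=0.01, steps=20000):
    """Integrate the flow equations with Euler steps; return the exit flow."""
    n = len(hop_rates)
    p = [0.0] * n  # site occupancies, starting from an empty mRNA
    for _ in range(steps):
        # flows[0] is initiation, flows[i] is the hop into site i, flows[n] is exit
        flows = [init_rate * (1 - p[0])]
        for i in range(n - 1):
            flows.append(hop_rates[i] * p[i] * (1 - p[i + 1]))
        flows.append(hop_rates[-1] * p[-1])
        # occupancy changes by inflow minus outflow at each site
        p = [p[i] + dt * (flows[i] - flows[i + 1]) for i in range(n)]
    return hop_rates[-1] * p[-1]

hop_rates = [1.0] * 10  # hypothetical homogeneous elongation rates
rate_low = rfm_translation_rate(0.1, hop_rates)
rate_mid = rfm_translation_rate(1.0, hop_rates)
rate_high = rfm_translation_rate(10.0, hop_rates)
print(rate_low, rate_mid, rate_high)
```

Raising the initiation rate from 0.1 to 1 increases the translation rate far more than raising it from 1 to 10, which is the kind of saturation the abstract refers to.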
Pernat Drobež, Cvetka; Ferkolj, Ivan; Potočnik, Uroš; Repnik, Katja
2018-03-01
Crohn's disease (CD) patients are mostly diagnosed with the uncomplicated inflammatory form of disease; however, the majority will progress to complicated stricturing or penetrating disease over time. It is important to identify patients at risk for disease progression at an early stage. The aim of our study was to examine the role of 33 candidate CD genes as possible predictors of disease progression and their influence on time to progression from an inflammatory to a stricturing or penetrating phenotype. Patients with an inflammatory phenotype at diagnosis were followed for 10 years and 33 CD-associated polymorphisms were genotyped. To test for association with CD, 449 healthy individuals were analyzed as the control group. Ten years after diagnosis, 39.1% of patients had not progressed beyond an inflammatory phenotype, but 60.9% had progressed to complicated disease, with average time to progression being 5.91 years. Association analyses of selected single nucleotide polymorphisms (SNPs) confirmed associations with CD for 12 SNPs. Furthermore, seven loci were associated with disease progression, out of which SNP rs4263839 in the gene TNFSF15 showed the strongest association with disease progression and the frameshift mutation rs2066847 in the gene NOD2 showed the strongest association with time to progression. The results of our study identified specific genetic biomarkers as useful predictors of both disease progression and speed of disease progression in patients with CD.
Wei, Zhe; He, Jin-Wei; Fu, Wen-Zhen; Zhang, Zhen-Lin
2016-12-01
Adefovir dipivoxil (ADV) was an important cause of adult-onset hypophosphatemic osteomalacia. However, its clinical characteristics and mechanisms have not been well defined. The objective of the study was to summarize the clinical characteristics of ADV-induced osteomalacia and to explore the association between ADV-associated tubulopathy and polymorphisms in genes encoding drug transporters. Seventy-six affected patients were clinically studied. The SLC22A6 and ABCC2 genes were screened and compared with healthy people from the HapMap. Hypophosphatemia, high serum alkaline phosphatase (ALP) levels, hypouricemia, nondiabetic glycosuria, proteinuria, metabolic acidosis and high bone turnover markers were the main metabolic characteristics. Fractures and pseudofractures occurred in 39 patients. Stopping ADV administration, supplementing calcitriol and calcium was effective during the follow-up period. Single SNP analysis revealed a higher percentage of the G/A genotype at c.2934 in exon 22 of the ABCC2 gene (rs3740070) in patients than in healthy people (12% [7 of 58 patients] vs. 0% [0 of 45 patients]; P=0.017), while there was no subject with homozygosity for the A allele at c.2934. ADV can be nephrotoxic at a conventional dosage. The G/A genotype at c.2934 of the ABCC2 gene may be a predictor of patients at greater risk for developing ADV-associated tubulopathy. Larger case-control studies are needed to further verify this finding. Copyright © 2016 Elsevier Inc. All rights reserved.
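The genotype comparison reported above (7 of 58 patients versus 0 of 45 controls) can be checked with a two-sided Fisher exact test. The abstract does not name the test used, so the choice of test is an assumption; the counts are taken from the abstract.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x carriers in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Rows: patients, controls. Columns: G/A genotype, other genotypes.
p_value = fisher_exact_two_sided(7, 51, 0, 45)
print(round(p_value, 3))
```

Under this assumption the result is about 0.017, in line with the reported P=0.017.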
2012-01-01
Background: Limited controlled data exist to guide treatment choices for clinicians caring for patients with major depressive disorder (MDD). Although many putative predictors of treatment response have been reported, most were identified through retrospective analyses of existing datasets and very few have been replicated in a manner that can impact clinical practice. One major confound in previous studies examining predictors of treatment response is the patient’s treatment history, which may affect both the predictor of interest and treatment outcomes. Moreover, prior treatment history provides an important source of selection bias, thereby limiting generalizability. Consequently, we initiated a randomized clinical trial designed to identify factors that moderate response to three treatments for MDD among patients never treated previously for the condition. Methods/design: Treatment-naïve adults aged 18 to 65 years with moderate-to-severe, non-psychotic MDD are randomized equally to one of three 12-week treatment arms: (1) cognitive behavior therapy (CBT, 16 sessions); (2) duloxetine (30–60 mg/d); or (3) escitalopram (10–20 mg/d). Prior to randomization, patients undergo multiple assessments, including resting state functional magnetic resonance imaging (fMRI), immune markers, DNA and gene expression products, and dexamethasone-corticotropin-releasing hormone (Dex/CRH) testing. Prior to or shortly after randomization, patients also complete a comprehensive personality assessment. Repeat assessment of the biological measures (fMRI, immune markers, and gene expression products) occurs at an early time-point in treatment, and upon completion of 12-week treatment, when a second Dex/CRH test is also conducted. Patients remitting by the end of this acute treatment phase are then eligible to enter a 21-month follow-up phase, with quarterly visits to monitor for recurrence. 
Non-remitters are offered augmentation treatment for a second 12-week course of treatment, during which they receive a combination of CBT and antidepressant medication. Predictors of the primary outcome, remission, will be identified for overall and treatment-specific effects, and a statistical model incorporating multiple predictors will be developed to predict outcomes. Discussion: The PReDICT study’s evaluation of biological, psychological, and clinical factors that may differentially impact treatment outcomes represents a sizeable step toward developing personalized treatments for MDD. Identified predictors should help guide the selection of initial treatments, and identify those patients most vulnerable to recurrence, who thus warrant maintenance or combination treatments to achieve and maintain wellness. Trial registration: Clinicaltrials.gov Identifier: NCT00360399. Registered 02 AUG 2006. First patient randomized 09 FEB 2007. PMID:22776534
shRNA-Induced Gene Knockdown In Vivo to Investigate Neutrophil Function.
Basit, Abdul; Tang, Wenwen; Wu, Dianqing
2016-01-01
To silence genes in neutrophils efficiently, we exploited RNA interference and developed an shRNA-based gene knockdown technique. This method involves transfection of mouse bone marrow-derived hematopoietic stem cells with a retroviral vector carrying shRNA directed at a specific gene. Transfected stem cells are then transplanted into irradiated wild-type mice. After engraftment of the stem cells, the transplanted mice have two sets of circulating neutrophils. One set has the gene of interest knocked down, while the other set has the full complement of expressed genes. This efficient technique provides a unique way to directly compare the response of neutrophils with a knocked-down gene to that of neutrophils with the full complement of expressed genes in the same environment.
Bioclimatic predictors for supporting ecological applications in the conterminous United States
O'Donnel, Michael S.; Ignizio, Drew A.
2012-01-01
The U.S. Geological Survey (USGS) has developed climate indices, referred to as bioclimatic predictors, which highlight climate conditions best related to species physiology. A set of 20 bioclimatic predictors was developed as Geographic Information Systems (GIS) continuous raster surfaces for each year between 1895 and 2009. The Parameter-elevation Regression on Independent Slopes Model (PRISM) and down-scaled PRISM data, which included both averaged multi-year and averaged monthly climate summaries, were used to develop these multi-scale bioclimatic predictors. Bioclimatic predictors capture information about annual conditions (annual mean temperature, annual precipitation, annual range in temperature and precipitation), as well as seasonal mean climate conditions and intra-year seasonality (temperature of the coldest and warmest months, precipitation of the wettest and driest quarters). Examining climate over time is useful when quantifying the effects of climate changes on species' distributions for past, current, and forecasted scenarios. These data, which have not been readily available to scientists, can provide biologists and ecologists with relevant and multi-scaled climate data to augment research on the responses of species to changing climate conditions. The relationships established between species demographics and distributions and bioclimatic predictors can inform land managers of climatic effects on species during decision-making processes.
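As a rough illustration of how annual summaries of this kind can be derived from monthly data, the sketch below computes three of the quantities named above for a single raster cell. The function name, the input layout and the sample values are hypothetical and are not taken from the USGS data set or its processing code; the temperature range is simplified to the spread of monthly means.

```python
def annual_summaries(monthly_mean_temp_c, monthly_precip_mm):
    """Return annual mean temperature, total precipitation and temperature range."""
    if len(monthly_mean_temp_c) != 12 or len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly values for each variable")
    return {
        "annual_mean_temp_c": sum(monthly_mean_temp_c) / 12,
        "annual_precip_mm": sum(monthly_precip_mm),
        # Simplification: spread of the monthly means, not of daily extremes.
        "annual_temp_range_c": max(monthly_mean_temp_c) - min(monthly_mean_temp_c),
    }


# Made-up monthly values for one raster cell (assumption, for illustration only).
temps = [-2.0, 0.0, 4.0, 9.0, 14.0, 19.0, 22.0, 21.0, 16.0, 10.0, 4.0, 3.0]
precip = [30, 25, 40, 50, 60, 45, 35, 30, 40, 45, 35, 30]
print(annual_summaries(temps, precip))
```

Applying the same function to every cell and every year would give one raster surface per predictor and year, which is the general shape of the product described in the abstract.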
Windhorst, Dafna A; Mileva-Seitz, Viara R; Rippe, Ralph C A; Tiemeier, Henning; Jaddoe, Vincent W V; Verhulst, Frank C; van IJzendoorn, Marinus H; Bakermans-Kranenburg, Marian J
2016-08-01
In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome-wide environmental interaction (GWEI) studies in the search for genetic variation underlying individual differences in behavior. Genetic variants in 12 autosomal dopaminergic genes were available in an ethnically homogenous part of a population-based cohort. Harsh parenting was assessed with maternal (n = 1881) and paternal (n = 1710) reports at age 3. Externalizing behavior was assessed with the Child Behavior Checklist (CBCL) at age 5 (71 ± 3.7 months). We conducted gene-set analyses of the association between variation in dopaminergic genes and externalizing behavior, stratified for harsh parenting. The association was statistically significant or approached significance for children without harsh parenting experiences, but was absent in the group with harsh parenting. Similarly, significant associations between single genes and externalizing behavior were only found in the group without harsh parenting. Effect sizes in the groups with and without harsh parenting did not differ significantly. Gene-environment interaction tests were conducted for individual genetic variants, resulting in two significant interaction effects (rs1497023 and rs4922132) after correction for multiple testing. Our findings are suggestive of G × E interplay, with associations between dopamine genes and externalizing behavior present in children without harsh parenting, but not in children with harsh parenting experiences. Harsh parenting may overrule the role of genetic factors in externalizing behavior. 
Gene-based and gene-set analyses offer promising new alternatives to analyses focusing on single candidate polymorphisms when examining the interplay between genetic and environmental factors.
About miRNAs, miRNA seeds, target genes and target pathways.
Kehl, Tim; Backes, Christina; Kern, Fabian; Fehlmann, Tobias; Ludwig, Nicole; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas
2017-12-05
miRNAs typically repress gene expression by binding to the 3' UTR, leading to degradation of the mRNA. This process is dominated by the eight-base seed region of the miRNA. Further, miRNAs are known not only to target genes but also to target significant parts of pathways. A logical line of thought is that miRNAs with similar (seed) sequences target similar sets of genes and thus similar sets of pathways. By calculating similarity scores for all 3.25 million pairs of 2,550 human miRNAs, we found that this pattern frequently holds, while we also observed exceptions. Corresponding results were obtained for both predicted target genes and experimentally validated targets. We note that miRNA target gene set similarity follows a bimodal distribution, pointing to a set of 282 miRNAs that seem to target genes with very high specificity. Further, we discuss miRNAs with different (seed) sequences that nonetheless regulate similar gene sets or pathways. Most intriguingly, we found miRNA pairs that regulate different gene sets but similar pathways, such as miR-6886-5p and miR-3529-5p, which jointly target different parts of the MAPK signaling cascade. The main goal of this study is to provide a general overview of the results, to highlight a selection of relevant results on miRNAs, miRNA seeds, target genes and target pathways, and to raise awareness of artifacts in such comparisons. The full set of information needed to infer detailed results on each miRNA has been included in miRPathDB, the miRNA target pathway database (https://mpd.bioinf.uni-sb.de).
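The pair count quoted above can be checked directly, and a set-overlap score gives a feel for what "target gene set similarity" might look like. The sketch below is only an illustration: the abstract does not say which similarity score was used, so the Jaccard index here is an assumption, and the miRNA and gene names are placeholders.

```python
from math import comb


def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Number of unordered pairs among 2,550 miRNAs, roughly 3.25 million.
print(comb(2550, 2))  # 3249975

# Hypothetical target gene sets for two placeholder miRNAs.
targets = {
    "miR-A": {"GENE1", "GENE2", "GENE3"},
    "miR-B": {"GENE2", "GENE3", "GENE4"},
}
print(jaccard(targets["miR-A"], targets["miR-B"]))  # 0.5
```

The same function could be applied to sets of pathways instead of genes, which is one way two miRNAs could score low on gene overlap yet high on pathway overlap.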
Shimoni, Yishai
2018-02-01
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.
PMID:29470520
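The comparison against random gene sets described in this abstract can be pictured as an empirical null distribution. The sketch below is a minimal illustration under stated assumptions: `score_fn` stands in for whatever survival-prediction score is used, the toy weights are invented, and none of this is taken from the authors' code.

```python
import random


def empirical_p_value(observed_score, score_fn, all_genes, set_size,
                      n_random=1000, seed=0):
    """Fraction of random gene sets scoring at least as high as the observed set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(all_genes, set_size)
        if score_fn(random_set) >= observed_score:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)


# Toy data: a hypothetical per-gene weight stands in for a real survival model.
weights = {f"G{i}": i for i in range(100)}
genes = list(weights)


def toy_score(gene_set):
    return sum(weights[g] for g in gene_set)


signature = ["G95", "G96", "G97", "G98", "G99"]
print(empirical_p_value(toy_score(signature), toy_score, genes, len(signature)))
```

If many random sets score as well as a published signature, the resulting value is large, which is the situation the abstract reports for most cancer types.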
Lee, Hyeonjeong; Shin, Miyoung
2017-01-01
The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Although many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on markers consisting of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) in case and control samples, as observed from gene expression and/or methylation data. The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, for these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways that often work together in activity profiles are finally chosen as the distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we found interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. 
Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
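Association rule mining over active pathways rests on two standard quantities, support and confidence. The sketch below shows both on invented data; the pathway labels and samples are hypothetical, and the authors' actual mining procedure and thresholds are not described in the abstract.

```python
def support(itemset, transactions):
    """Share of samples in which every pathway in `itemset` is active."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent active | antecedent active)."""
    s_antecedent = support(antecedent, transactions)
    if s_antecedent == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_antecedent


# Each set lists the pathways found active in one sample (made-up labels).
case_samples = [
    {"P1", "P2"},
    {"P1", "P2", "P3"},
    {"P1"},
    {"P2", "P3"},
]
print(support({"P1", "P2"}, case_samples))       # 0.5
print(confidence({"P1"}, {"P2"}, case_samples))
```

Pathway-sets whose support clears a chosen threshold in one class but not the other would be natural candidates for edges in a network such as the PAN.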
ERIC Educational Resources Information Center
Ames, Megan E.; Wintre, Maxine G.; Pancer, S. Mark; Pratt, Michael W.; Birnie-Lefcovitch, Shelly; Polivy, Janet; Adams, Gerald R.
2014-01-01
Undergraduates (N = 2,823) at 6 universities were surveyed longitudinally to examine the relevance of student home setting on the transition to university. Preliminary results indicated that rural students seem less likely to attend large, ethnically diverse universities. Hierarchical linear models revealed that "proximal rural" students…
Microgravity and Immunity: Changes in Lymphocyte Gene Expression
NASA Technical Reports Server (NTRS)
Risin, D.; Pellis, N. R.; Ward, N. E.; Risin, S. A.
2006-01-01
Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, and affects signal transduction mechanisms as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero gravity and 1g. In the second set, activated T-cells, after culturing for 24 hours in 1g and MMG, were exposed three hours before harvesting to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. In the first set of experiments, MMG exposure resulted in altered expression of 89 genes; 10 of them were up-regulated and 79 down-regulated. In the second set, changes in expression were revealed in 85 genes; 20 were up-regulated and 65 were down-regulated. The analysis revealed that significant numbers of MGS genes are associated with signal transduction and apoptotic pathways. Interestingly, the majority of genes that responded by up- or down-regulation in the alternative sets of experiments were not the same, possibly reflecting different functional states of the examined T-lymphocyte populations. 
The responder genes (MGSG) might play an essential role in adaptation to MG and/or be responsible for pathologic changes encountered in space and thus represent potential targets for molecular-based countermeasures.
Cheung, Y M; Leung, W M; Xu, L
1997-01-01
We propose a prediction model called Rival Penalized Competitive Learning (RPCL) and Combined Linear Predictor method (CLP), which involves a set of local linear predictors such that a prediction is made by the combination of some activated predictors through a gating network (Xu et al., 1994). Furthermore, we present its improved variant named Adaptive RPCL-CLP, which includes an adaptive learning mechanism as well as a data pre- and post-processing scheme. We compare them with some existing models by demonstrating their performance on two real-world financial time series: a China stock price and an exchange-rate series of US Dollar (USD) versus Deutschmark (DEM). Experiments have shown that Adaptive RPCL-CLP not only outperforms the other approaches with the smallest prediction error and training costs, but also brings in considerably high profits in the trading simulation of the foreign exchange market.
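The idea of combining local linear predictors through a gate can be sketched in a few lines. The code below is a generic illustration, not the RPCL-CLP method: the centers, weights and biases are supplied by hand rather than learned, and the distance-based softmax gate is an assumption chosen only to make the combination step concrete.

```python
import math


def gate_weights(x, centers, sharpness=1.0):
    """Softmax over negative squared distances from x to each predictor's center."""
    scores = [-sharpness * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
              for c in centers]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, centers, weights, biases, sharpness=1.0):
    """Gate-weighted combination of the local linear predictions w.x + b."""
    gates = gate_weights(x, centers, sharpness)
    local = [sum(wi * xi for wi, xi in zip(w, x)) + b
             for w, b in zip(weights, biases)]
    return sum(g * y for g, y in zip(gates, local))


# Two hand-set local predictors on a one-dimensional input (illustrative values).
centers = [[0.0], [10.0]]
weights = [[1.0], [-1.0]]
biases = [0.0, 0.0]
print(predict([1.0], centers, weights, biases))
```

Near a center the corresponding predictor dominates; halfway between the two centers both contribute equally.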
Parenting Style and Behavior as Longitudinal Predictors of Adolescent Alcohol Use.
Minaie, Matin Ghayour; Hui, Ka Kit; Leung, Rachel K; Toumbourou, John W; King, Ross M
2015-09-01
Adolescent alcohol use is a serious problem in Australia and other nations. Longitudinal data on family predictors are valuable to guide parental education efforts. The present study tested Baumrind's proposal that parenting styles are direct predictors of adolescent alcohol use. Latent class modeling was used to investigate adolescent perceptions of parenting styles and multivariate regression to examine their predictive effect on the development of adolescent alcohol use. The data set comprised 2,081 secondary school students (55.9% female) from metropolitan Melbourne, Australia, who completed three waves of annual longitudinal data starting in 2004. Baumrind's parenting styles were significant predictors in unadjusted analyses, but these effects were not maintained in multivariate models that also included parenting behavior dimensions. Family influences on the development of adolescent alcohol use appear to operate more directly through specific family management behaviors rather than through more global parenting styles.
Development of the Meharry Medical College Prostate Cancer Research Program
2007-03-01
Immunology Jul;62 Suppl 1:73-83, 2005. Marshall D, Sabek O, Fraga D, Kotb M and Gaber AO. Examination of the molecular signature associated...Sabek OM, Marshall DR, Minoru O, Fraga DW, Gaber AO. Gene expression profile of nonfunctional human pancreatic islets:predictors of transplant
Identifying prognostic signature in ovarian cancer using DirGenerank
Wang, Jian-Yong; Chen, Ling-Ling; Zhou, Xiong-Hui
2017-01-01
Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, it is still a big challenge to select prognostic genes that can distinguish the risk of cancer patients across various data sets because of tumor heterogeneity. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed the DirGenerank (Generank in directed network) algorithm, which considers both the gene dependency network and genes’ correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as the training data set, 40 genes with the highest importance were selected as the prognostic signature. Survival analysis of the patients divided by the prognostic signature in the testing data set and four independent data sets showed that the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed that the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients. PMID:28615526
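A ranking that combines a directed network with a per-gene prior can be sketched as a PageRank-style iteration. The code below is not the DirGenerank algorithm, whose details are not given in the abstract: the update rule, the choice to let score flow from a gene back to the genes pointing at it (so that high out-degree is rewarded), and all gene names are assumptions made for illustration.

```python
def directed_rank(edges, prior, damping=0.5, iterations=100):
    """Iteratively mix a normalized prior with score passed back along edges.

    edges: iterable of (source, target) pairs; every gene must be a key of prior.
    prior: dict gene -> non-negative weight (e.g. correlation with prognosis).
    """
    genes = list(prior)
    total = sum(prior.values())
    base = {g: prior[g] / total for g in genes}
    # regulators[t] lists the sources that have an edge source -> t.
    regulators = {g: [] for g in genes}
    for source, target in edges:
        regulators[target].append(source)
    rank = dict(base)
    for _ in range(iterations):
        new = {g: (1 - damping) * base[g] for g in genes}
        for target, sources in regulators.items():
            for s in sources:
                # Each target shares its current score equally among its regulators.
                new[s] += damping * rank[target] / len(sources)
        rank = new
    return rank


# Toy network: hypothetical gene A points at B and C; equal priors.
ranks = directed_rank([("A", "B"), ("A", "C")], {"A": 1.0, "B": 1.0, "C": 1.0})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy network the gene with two outgoing edges ends up ranked first, matching the intuition in the abstract that high out-degree genes are more likely modulators.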
Pathway Distiller - multisource biological pathway consolidation
Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong
2012-01-01
Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636
Newton, Richard; Wernisch, Lorenz
2014-01-01
Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention are often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets, and we present predicted potential trans-acting regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247
Esperón-Rodríguez, Manuel; Baumgartner, John B.; Beaumont, Linda J.
2017-01-01
Background Shrubs play a key role in biogeochemical cycles, prevent soil and water erosion, provide forage for livestock, and are a source of food, wood and non-wood products. However, despite their ecological and societal importance, the influence of different environmental variables on shrub distributions remains unclear. We evaluated the influence of climate and soil characteristics, and whether including soil variables improved the performance of a species distribution model (SDM), Maxent. Methods This study assessed variation in predictions of environmental suitability for 29 Australian shrub species (representing dominant members of six shrubland classes) due to the use of alternative sets of predictor variables. Models were calibrated with (1) climate variables only, (2) climate and soil variables, and (3) soil variables only. Results The predictive power of SDMs differed substantially across species, but generally models calibrated with both climate and soil data performed better than those calibrated only with climate variables. Models calibrated solely with soil variables were the least accurate. We found regional differences in potential shrub species richness across Australia due to the use of different sets of variables. Conclusions Our study provides evidence that predicted patterns of species richness may be sensitive to the choice of predictor set when multiple, plausible alternatives exist, and demonstrates the importance of considering soil properties when modeling availability of habitat for plants. PMID:28652933
Predicting story goodness performance from cognitive measures following traumatic brain injury.
Lê, Karen; Coelho, Carl; Mozeiko, Jennifer; Krueger, Frank; Grafman, Jordan
2012-05-01
This study examined the prediction of performance on measures of the Story Goodness Index (SGI; Lê, Coelho, Mozeiko, & Grafman, 2011) from executive function (EF) and memory measures following traumatic brain injury (TBI). It was hypothesized that EF and memory measures would significantly predict SGI outcomes. One hundred sixty-seven individuals with TBI participated in the study. Story retellings were analyzed using the SGI protocol. Three cognitive measures--Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) Sorting Test, Wechsler Memory Scale--Third Edition (WMS-III; Wechsler, 1997) Working Memory Primary Index (WMI), and WMS-III Immediate Memory Primary Index (IMI)--were entered into a multiple linear regression model for each discourse measure. Two sets of regression analyses were performed, the first with the Sorting Test as the first predictor and the second with it as the last. The first set of regression analyses identified the Sorting Test and IMI as the only significant predictors of performance on measures of the SGI. The second set identified all measures as significant predictors when evaluating each step of the regression function. The cognitive variables predicted performance on the SGI measures, although there were differences in the amount of explained variance. The results (a) suggest that storytelling ability draws on a number of underlying skills and (b) underscore the importance of using discrete cognitive tasks rather than broad cognitive indices to investigate the cognitive substrates of discourse.
Analysis of genetic association using hierarchical clustering and cluster validation indices.
Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L
2017-10-01
It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes, based on some criteria of similarity, is an important task. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
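Scoring a single candidate gene group with a validation index can be illustrated with the silhouette. The sketch below is an assumption-laden example: the abstract does not name the indices used, so the silhouette and the correlation-based distance are stand-ins, and the expression profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(x, y):
    return 1.0 - pearson(x, y)


def mean_silhouette(group, rest, profiles):
    """Average silhouette of `group`, with `rest` as the only other cluster."""
    scores = []
    for g in group:
        others = [h for h in group if h != g]
        if not others:
            scores.append(0.0)  # convention for singleton groups
            continue
        a = sum(distance(profiles[g], profiles[h]) for h in others) / len(others)
        b = sum(distance(profiles[g], profiles[h]) for h in rest) / len(rest)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented expression profiles across four experiments.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
print(mean_silhouette(["g1", "g2"], ["g3"], profiles))
```

Candidate groups produced by different hierarchical clustering variants could then be ranked by such a score, with values close to 1 indicating a tight, well-separated group.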
Superior Cross-Species Reference Genes: A Blueberry Case Study
Die, Jose V.; Rowland, Lisa J.
2013-01-01
The advent of affordable Next Generation Sequencing technologies has had a major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high-throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and necessitate the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR, and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry was identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469
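Normalizing against several reference genes is commonly done with the geometric mean of their relative quantities. The sketch below shows that convention on invented numbers; it is a general illustration of multi-reference normalization, not the workflow or the data of this study, and the function names are hypothetical.

```python
import math


def normalization_factor(reference_quantities):
    """Geometric mean of the relative quantities of the reference genes."""
    if len(reference_quantities) < 2:
        raise ValueError("use at least two reference genes")
    log_sum = sum(math.log(q) for q in reference_quantities)
    return math.exp(log_sum / len(reference_quantities))


def normalize(target_quantity, reference_quantities):
    """Target gene quantity divided by the normalization factor of the sample."""
    return target_quantity / normalization_factor(reference_quantities)


# Invented relative quantities for one sample: a target gene and three references.
refs = [0.8, 1.0, 1.25]
print(normalize(2.0, refs))
```

Using the geometric rather than the arithmetic mean keeps one outlying reference gene from dominating the factor, which is one reason several references are preferred over a single one.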
Naaijen, J; Bralten, J; Poelmans, G; Glennon, J C; Franke, B; Buitelaar, J K
2017-01-10
Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are the main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analyses for ADHD symptom severity, divided into inattention and hyperactivity/impulsivity symptoms, autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed a nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of the single-gene or single-variant associations was significant on its own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD.
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
A Review of Feature Extraction Software for Microarray Gene Expression Data
Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini
2014-01-01
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
NASA Astrophysics Data System (ADS)
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2016-04-01
Scenarios of surface weather required for impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriate. Outputs of these models have to be post-processed, which is often carried out with Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected with an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights at 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably increases the prediction performance by adapting the downscaling link for each prediction day.
Secondly, the selected predictors for a given prediction depend on the large scale situation and on the considered region. Finally, even with such an adaptive predictor identification, the downscaling link appears to be robust: for the same prediction day, predictors selected for different locations of a given region are similar and the regression parameters are consistent within the region of interest.
Randomized Trial of Infusion Set Function: Steel Versus Teflon
Patel, Parul J.; Benasi, Kari; Ferrari, Gina; Evans, Mark G.; Shanmugham, Satya; Wilson, Darrell M.
2014-01-01
Abstract Background: This study compared infusion set function for up to 1 week using either a Teflon® (Dupont™, Wilmington, DE) catheter or a steel catheter for insulin pump therapy in type 1 diabetes mellitus. Subjects and Methods: Twenty subjects participating in a randomized, open-labeled, crossover study were asked to wear two Quick-Set® and two Sure-T® infusion sets (both from Medtronic Minimed, Northridge, CA) until the infusion set failed or was worn for 1 week. All subjects wore a MiniMed continuous glucose monitoring system for the duration of the study. Results: One subject withdrew from the study. There were 38 weeks of Sure-T wear and 39 weeks of Quick-Set wear with no difference in the survival curves of the infusion sets. There was, however, a 15% initial failure rate with the Teflon infusion set. After 7 days, both types of infusion sets had a 64% failure rate. Overall, 30% failed because of hyperglycemia and a failed correction dose, 13% were removed for pain, 10% were pulled out by accident, 10% had erythema and/or induration of >10 mm, 5% fell out because of loss of adhesion, and 4% were removed for infection. The main predictor of length of wear was the individual subject. There was no increase in hyperglycemia or daily insulin requirements when an infusion set was successfully used for 7 days (n=25 of 77 weeks). Conclusions: We found no difference between steel and Teflon infusion sets in their function over 7 days, although 15% of Teflon sets failed because of kinking on insertion. The strongest predictor of prolonged 7-day infusion set function was the individual subject, not the type of infusion set. PMID:24090124
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several equations, each with two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. It can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of this regression model to modeling the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification, and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression fits the data well, as indicated by a high coefficient of determination of 90 percent.
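As a rough illustration of the model structure described in this abstract, the Python sketch below builds one design row that combines linear (parametric) terms with a truncated polynomial spline basis for the nonparametric predictor. The function names, the spline degree, the knot locations, and the input values are all illustrative assumptions; none of them come from the paper.

```python
def truncated_power(t, knot, degree=1):
    """Truncated power function (t - knot)_+^degree."""
    diff = t - knot
    return diff ** degree if diff > 0 else 0.0


def design_row(parametric, t, knots, degree=1):
    """Build one design row: intercept, linear terms, then the spline part.

    parametric : values of the parametric predictors (e.g. literacy, electrification)
    t          : value of the nonparametric predictor (e.g. economic growth)
    knots      : hypothetical knot locations for the truncated spline
    """
    row = [1.0] + [float(v) for v in parametric]
    # Polynomial part of the spline: t, t^2, ..., t^degree
    row += [float(t) ** p for p in range(1, degree + 1)]
    # Truncated part: one term per knot, zero to the left of the knot
    row += [truncated_power(float(t), k, degree) for k in knots]
    return row


# Made-up values for a single region; knots at 4.0 and 6.0 are assumptions.
row = design_row(parametric=[92.5, 88.0], t=5.2, knots=[4.0, 6.0])
print(row)
```

In a multiresponse setting, the same row would be paired with each response variable, and the coefficients would be estimated jointly; that estimation step is not shown here.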
Dreams Fulfilled and Shattered: Determinants of Segmented Assimilation in the Second Generation*
Haller, William; Portes, Alejandro; Lynch, Scott M.
2013-01-01
We summarize prior theories on the adaptation process of the contemporary immigrant second generation as a prelude to presenting additive and interactive models showing the impact of family variables, school contexts and academic outcomes on the process. For this purpose, we regress indicators of educational and occupational achievement in early adulthood on predictors measured three and six years earlier. The Children of Immigrants Longitudinal Study (CILS), used for the analysis, allows us to establish a clear temporal order among exogenous predictors and the two dependent variables. We also construct a Downward Assimilation Index (DAI), based on six indicators and regress it on the same set of predictors. Results confirm a pattern of segmented assimilation in the second generation, with a significant proportion of the sample experiencing downward assimilation. Predictors of the latter are the obverse of those of educational and occupational achievement. Significant interaction effects emerge between these predictors and early school contexts, defined by different class and racial compositions. Implications of these results for theory and policy are examined. PMID:24223437
Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su
2015-01-01
Background It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when typical methods depending on differentially expressed genes were used. Results In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779
Frequency and predictors of return to incentive spirometry volume baseline after cardiac surgery.
Harton, Suzanne C; Grap, Mary Jo; Savage, Laura; Elswick, R K
2007-01-01
Incentive spirometry (IS) is routinely used in most clinical settings, but evaluation of patient efficacy of IS is not standardized. The purpose of this study was to describe the degree and predictors of return to preoperative IS volume after cardiac surgery. IS volumes were documented in 69 subjects (71% men; mean age, 59 years) undergoing cardiac surgery during the preoperative evaluation and twice daily postoperatively. Nineteen percent of subjects achieved their IS preoperative volume by hospital discharge. Based on highest volume achieved, subjects achieved an average of 75% of their preoperative volume by discharge, and only age and number of bypass grafts predicted return to preoperative IS volume. These data may assist nurses and patients to set realistic goals for postoperative IS volume achievement.
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance with highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage in that it produces more interpretable posterior probabilities for each variable unlike LASSO and other penalized regression methods.
iSS-PC: Identifying Splicing Sites via Physical-Chemical Properties Using Deep Sparse Auto-Encoder.
Xu, Zhao-Chun; Wang, Peng; Qiu, Wang-Ren; Xiao, Xuan
2017-08-15
Gene splicing is one of the most significant biological processes in eukaryotic gene expression; through RNA splicing, a pre-mRNA can produce one or more mature messenger RNAs containing the coded information with multiple biological functions. Thus, identifying splicing sites in DNA/RNA sequences is significant for both bio-medical research and the discovery of new drugs. However, identification based only on experimental techniques is expensive and time consuming, so new computational methods are needed. To identify the splice donor sites and splice acceptor sites accurately and quickly, a deep sparse auto-encoder model with two hidden layers, called iSS-PC, was constructed based on the minimum error law, in which we incorporated twelve physical-chemical properties of the dinucleotides within DNA into PseDNC to formulate given sequence samples via a battery of cross-covariance and auto-covariance transformations. In this paper, five-fold cross-validation test results based on the same benchmark data-sets indicated that the new predictor remarkably outperformed the existing prediction methods in this field. Furthermore, it is expected that many other related problems can also be studied by this approach. To implement classification accurately and quickly, an easy-to-use web-server for identifying splicing sites has been established for free access at: http://www.jci-bioinfo.cn/iSS-PC.
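The auto-covariance transformation mentioned in this abstract can be sketched as follows. This is a minimal illustration of a dinucleotide auto-covariance feature for a single property, assuming the commonly used form that averages the product of mean-centred property values at a fixed lag. The property table `TOY_PROPERTY` is a placeholder with made-up values; it is not one of the twelve physical-chemical properties used by iSS-PC, and the function names are hypothetical.

```python
from itertools import product

# Placeholder property table: one made-up value per dinucleotide (assumption).
TOY_PROPERTY = {a + b: 0.1 * i for i, (a, b) in enumerate(product("ACGT", repeat=2))}


def dinucleotides(seq):
    """Overlapping dinucleotides of a DNA sequence, e.g. 'ACGT' -> AC, CG, GT."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def auto_covariance(seq, prop, lag):
    """Auto-covariance of one dinucleotide property at the given lag."""
    values = [prop[d] for d in dinucleotides(seq)]
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < number of dinucleotides")
    mean = sum(values) / n
    # Average product of mean-centred values separated by `lag` positions
    total = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return total / (n - lag)


feature = auto_covariance("ACAC", TOY_PROPERTY, lag=1)
print(feature)
```

A full feature vector would repeat this over several lags and several properties, and a cross-covariance variant would pair two different properties; both extensions are omitted here.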
Towards an understanding of the role of the environment in the development of early callous behavior
Waller, Rebecca; Shaw, Daniel S.; Neiderhiser, Jenae M.; Ganiban, Jody M.; Natsuaki, Misaki N.; Reiss, David; Trentacosta, Christopher; Leve, Leslie D.; Hyde, Luke W.
2015-01-01
Key to understanding the long-term impact of social inequalities is identifying early behaviors that may signal higher risk for later poor psychosocial outcomes, such as psychopathology. A set of early-emerging characteristics that may signal risk for later externalizing psychopathology is Callous-Unemotional (CU) behavior. CU behavior predicts severe and chronic trajectories of externalizing behaviors in youth. However, much research on CU behavior has focused on late childhood and adolescence, with little attention paid to early childhood when preventative interventions may be most effective. In this paper, we summarize our recent work showing that: (1) CU behavior can be identified in early childhood using items from common behavior checklists; (2) CU behavior predicts worse outcomes across early childhood; (3) CU behavior exhibits a distinct nomological network from other early externalizing behaviors; and (4) malleable environmental factors, particularly parenting, may play a role in the development of early CU behaviors. We discuss the challenges of studying contextual contributors to the development of CU behavior in terms of gene-environment correlations and present initial results from work examining CU behavior in an adoption study in which gene-environment correlations are examined in early childhood. We find that parenting is a predictor of early CU behavior even in a sample in which parents are not genetically related to the children. PMID:26291075
Freed, Nikki E; Bumann, Dirk; Silander, Olin K
2016-09-06
Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
El Desoky, Ehab S; Abdelhafez, Alaa T; Cusato, Jessica; Kamel, Sherif I; Hussein, Abeer Mr; De Nicolo, Amedeo; Di Perri, Giovanni; D'Avolio, Antonio
2017-09-01
Few data are available concerning the roles of polymorphisms of the inosine triphosphatase (ITPA) gene and ribavirin (RBV) transporter genes in the prediction of RBV-induced anaemia among Egyptians with chronic hepatitis C (CHC). Genotyping of three ITPA gene variants and two variants of RBV transporter genes was performed in 123 patients under pegylated interferon-α/ribavirin treatment. Baseline haemoglobin and ITPA rs1127354 CA/AA were found to be predictors of anaemia at 4, 8 and 12 weeks of RBV therapy. In addition, ITPA rs7270101 AC/CC and age predicted anaemia after 12 weeks of therapy. In conclusion, the ITPA variant rs1127354C>A significantly predicts RBV-induced anaemia during the first 3 months of treatment, and it is recommended that it be assessed before RBV administration. © 2017 John Wiley & Sons Australia, Ltd.
Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome
Naville, Magali; Ishibashi, Minaka; Ferg, Marco; Bengani, Hemant; Rinkwitz, Silke; Krecsmarik, Monika; Hawkins, Thomas A.; Wilson, Stephen W.; Manning, Elizabeth; Chilamakuri, Chandra S. R.; Wilson, David I.; Louis, Alexandra; Lucy Raymond, F.; Rastegar, Sepand; Strähle, Uwe; Lenhard, Boris; Bally-Cuif, Laure; van Heyningen, Veronica; FitzPatrick, David R.; Becker, Thomas S.; Roest Crollius, Hugues
2015-01-01
Enhancers can regulate the transcription of genes over long genomic distances. This is thought to lead to selection against genomic rearrangements within such regions that may disrupt this functional linkage. Here we test this concept experimentally using the human X chromosome. We describe a scoring method to identify evolutionary maintenance of linkage between conserved noncoding elements and neighbouring genes. Chromatin marks associated with enhancer function are strongly correlated with this linkage score. We test >1,000 putative enhancers by transgenesis assays in zebrafish to ascertain the identity of the target gene. The majority of active enhancers drive a transgenic expression in a pattern consistent with the known expression of a linked gene. These results show that evolutionary maintenance of linkage is a reliable predictor of an enhancer's function, and provide new information to discover the genetic basis of diseases caused by the mis-regulation of gene expression. PMID:25908307
Gangopadhyay, Aparna
2018-01-01
To identify risk factors that lower the efficacy of antibiotic prophylaxis of febrile neutropenia among older patients on chemoradiation. An audit of institutional data showed that older adults are at higher risk of febrile neutropenia during chemoradiation. In limited-resource settings, widespread use of Granulocyte-Colony Stimulating Factor (G-CSF) is not economically feasible and antibiotics are commonly used. Despite compliance with antibiotics, prophylaxis is inadequate in many patients owing to patient- and tumor-related factors. Data from records of 219 older patients receiving antibiotic prophylaxis during chemoradiation were studied. Baseline assessment data and predisposing factors for febrile neutropenia were recorded. All patients received prophylactic fluoroquinolones. The incidence of febrile neutropenia and its association with predisposing factors at baseline were analyzed by multiple logistic regression. 38.4% developed febrile neutropenia despite compliance. Multiple logistic regression revealed geriatric assessment (G8) score and tumor stage to be significant predictors of febrile neutropenia while on antibiotics (p < 0.0001). Odds ratios for the two significant predictors, G8 score and tumor stage, were 2.9 (95% CI 1.8036-4.6815) and 2.7 (95% CI 1.7501-4.1318), respectively. Correlation between these two significant predictors was found to be low in our cohort (Spearman's coefficient of rank correlation (rho) - 0.431, p < 0.0001). G8 score and tumor burden are significant predictors of the efficacy of antibiotic prophylaxis among older adults receiving chemoradiation. In older patients having poor G8 scores and advanced tumors, antibiotic prophylaxis is unsuitable. Interestingly, co-morbidities and poor performance status did not impact the efficacy of antibiotic prophylaxis among our elderly patients.
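The reported odds ratios and confidence intervals can be cross-checked with a short calculation. The sketch below assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which is the usual output of logistic regression software but is not stated in the abstract; the function name is hypothetical.

```python
import math


def or_and_se_from_ci(lower, upper, z=1.96):
    """Recover the odds ratio and the log-odds standard error from a 95% CI.

    Assumes a Wald interval: exp(log_or - z * se) to exp(log_or + z * se).
    """
    log_or = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_or), se


g8_or, g8_se = or_and_se_from_ci(1.8036, 4.6815)
stage_or, stage_se = or_and_se_from_ci(1.7501, 4.1318)
print(round(g8_or, 1), round(stage_or, 1))  # -> 2.9 2.7
```

Under that assumption, the interval midpoints on the log scale reproduce the reported odds ratios of 2.9 and 2.7, so the figures in the abstract are internally consistent.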
Kautzky, Alexander; Dold, Markus; Bartova, Lucie; Spies, Marie; Vanicek, Thomas; Souery, Daniel; Montgomery, Stuart; Mendlewicz, Julien; Zohar, Joseph; Fabbri, Chiara; Serretti, Alessandro; Lanzenberger, Rupert; Kasper, Siegfried
The study objective was to generate a prediction model for treatment-resistant depression (TRD) using machine learning featuring a large set of 47 clinical and sociodemographic predictors of treatment outcome. 552 patients diagnosed with major depressive disorder (MDD) according to DSM-IV criteria were enrolled between 2011 and 2016. TRD was defined as failure to reach response to antidepressant treatment, characterized by a Montgomery-Asberg Depression Rating Scale (MADRS) score below 22 after at least 2 antidepressant trials of adequate length and dosage were administered. RandomForest (RF) was used for predicting treatment outcome phenotypes in a 10-fold cross-validation. The full model with 47 predictors yielded an accuracy of 75.0%. When the number of predictors was reduced to 15, accuracies between 67.6% and 71.0% were attained for different test sets. The most informative predictors of treatment outcome were baseline MADRS score for the current episode; impairment of family, social, and work life; the timespan between first and last depressive episode; severity; suicidal risk; age; body mass index; and the number of lifetime depressive episodes as well as lifetime duration of hospitalization. With the application of the machine learning algorithm RF, an efficient prediction model with an accuracy of 75.0% for forecasting treatment outcome could be generated, thus surpassing the predictive capabilities of clinical evaluation. We also supply a simplified algorithm of 15 easily collected clinical and sociodemographic predictors that can be obtained within approximately 10 minutes, which reached an accuracy of 70.6%. Thus, we are confident that our model will be validated within other samples to advance an accurate prediction model fit for clinical usage in TRD. © Copyright 2017 Physicians Postgraduate Press, Inc.
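The 10-fold cross-validation scheme named in this abstract can be illustrated with a small index-splitting sketch. It only shows how 552 samples might be partitioned into ten train/test splits; the shuffling seed, the fold assignment rule, and the function name are assumptions, and the classifier itself (a random forest in the study) is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(552, k=10))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    print(len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so averaging the ten test accuracies gives an estimate that uses every patient once for evaluation.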
Finkelstein, Julia L; Mehta, Saurabh; Duggan, Christopher P; Spiegelman, Donna; Aboud, Said; Kupka, Roland; Msamanga, Gernard I; Fawzi, Wafaie W
2012-01-01
Objective Anaemia is common during pregnancy, and prenatal Fe supplementation is the standard of care. However, the persistence of anaemia despite Fe supplementation, particularly in HIV infection, suggests that its aetiology may be more complex and warrants further investigation. The present study was conducted to examine predictors of incident haematological outcomes in HIV-infected pregnant women in Tanzania. Design Prospective cohort study. Cox proportional hazards and binomial regression models were used to identify predictors of incident haematological outcomes: anaemia (Hb < 110 g/l), severe anaemia (Hb < 85 g/l) and hypochromic microcytosis, during the follow-up period. Setting Antenatal clinics in Dar es Salaam, Tanzania. Subjects Participants were 904 HIV-infected pregnant women enrolled in a randomized trial of vitamins (1995–1997). Results Malaria, pathogenic protozoan and hookworm infections at baseline were associated with a two-fold increase in the risk of anaemia and hypochromic microcytosis during follow-up. Higher baseline erythrocyte sedimentation rate and CD8 T-cell concentrations, and lower Hb concentrations and CD4 T-cell counts, were independent predictors of incident anaemia and Fe deficiency. Low baseline vitamin D (<32 ng/ml) concentrations predicted a 1·4 and 2·3 times greater risk of severe anaemia and hypochromic microcytosis, respectively, during the follow-up period. Conclusions Parasitic infections, vitamin D insufficiency, low CD4 T-cell count and high erythrocyte sedimentation rate were the main predictors of anaemia and Fe deficiency in pregnancy and the postpartum period in this population. A comprehensive approach to prevent and manage anaemia, including micronutrient supplementation and infectious disease control, is warranted in HIV-infected women in resource-limited settings – particularly during the pre- and postpartum periods. PMID:22014374
Jones, Eric C; Faas, Albert J; Murphy, Arthur D; Tobin, Graham A; Whiteford, Linda M; McCarty, Christopher
2013-03-01
Although virtually all comparative research about risk perception focuses on which hazards are of concern to people in different culture groups, much can be gained by focusing on predictors of levels of risk perception in various countries and places. In this case, we examine standard and novel predictors of risk perception in seven sites among communities affected by a flood in Mexico (one site) and volcanic eruptions in Mexico (one site) and Ecuador (five sites). We conducted more than 450 interviews with questions about how people feel at the time (after the disaster) regarding what happened in the past, their current concerns, and their expectations for the future. We explore how aspects of the context in which people live have an effect on how strongly people perceive natural hazards in relationship with demographic, well-being, and social network factors. Generally, our research indicates that levels of risk perception for past, present, and future aspects of a specific hazard are similar across these two countries and seven sites. However, these contexts produced different predictors of risk perception-in other words, there was little overlap between sites in the variables that predicted the past, present, or future aspects of risk perception in each site. Generally, current stress was related to perception of past danger of an event in the Mexican sites, but not in Ecuador; network variables were mainly important for perception of past danger (rather than future or present danger), although specific network correlates varied from site to site across the countries.
Pareto Tracer: a predictor-corrector method for multi-objective optimization problems
NASA Astrophysics Data System (ADS)
Martín, Adanay; Schütze, Oliver
2018-03-01
This article proposes a novel predictor-corrector (PC) method for the numerical treatment of multi-objective optimization problems (MOPs). The algorithm, Pareto Tracer (PT), is capable of performing a continuation along the set of (local) solutions of a given MOP with k objectives, and can cope with equality and box constraints. Additionally, the first steps towards a method that manages general inequality constraints are also introduced. The properties of PT are first discussed theoretically and later numerically on several examples.
Yang, Xinan Holly; Li, Meiyi; Wang, Bin; Zhu, Wanqi; Desgardin, Aurelie; Onel, Kenan; de Jong, Jill; Chen, Jianjun; Chen, Luonan; Cunningham, John M
2015-03-24
Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-sets signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses complex disease acute myeloid leukemia (AML) as a research platform. We introduced an adjustable "soft threshold" to a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. We identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells in parallel; both mark favorable-prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03 respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We successfully validated the prognosis of both signatures in two independent cohorts of 91 and 242 patients respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08). 
The proposed algorithms and computational framework will harness systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about AML and other complex diseases.
Raychaudhuri, Soumya; Korn, Joshua M.; McCarroll, Steven A.; Altshuler, David; Sklar, Pamela; Purcell, Shaun; Daly, Mark J.
2010-01-01
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package. 
PMID:20838587
Radiation Quality Effects on Transcriptome Profiles in 3-d Cultures After Particle Irradiation
NASA Technical Reports Server (NTRS)
Patel, Z. S.; Kidane, Y. H.; Huff, J. L.
2014-01-01
In this work, we evaluate the differential effects of low- and high-LET radiation on 3-D organotypic cultures in order to investigate radiation quality impacts on gene expression and cellular responses. Reducing uncertainties in current risk models requires new knowledge on the fundamental differences in biological responses (the so-called radiation quality effects) triggered by heavy ion particle radiation versus low-LET radiation associated with Earth-based exposures. We are utilizing novel 3-D organotypic human tissue models that provide a format for study of human cells within a realistic tissue framework, thereby bridging the gap between 2-D monolayer culture and animal models for risk extrapolation to humans. To identify biological pathway signatures unique to heavy ion particle exposure, functional gene set enrichment analysis (GSEA) was used with whole transcriptome profiling. GSEA has been used extensively as a method to garner biological information in a variety of model systems but has not been commonly used to analyze radiation effects. It is a powerful approach for assessing the functional significance of radiation quality-dependent changes from datasets where the changes are subtle but broad, and where single gene based analysis using rankings of fold-change may not reveal important biological information. We identified 45 statistically significant gene sets at 0.05 q-value cutoff, including 14 gene sets common to gamma and titanium irradiation, 19 gene sets specific to gamma irradiation, and 12 titanium-specific gene sets. Common gene sets largely align with DNA damage, cell cycle, early immune response, and inflammatory cytokine pathway activation. The top gene set enriched for the gamma- and titanium-irradiated samples involved KRAS pathway activation and genes activated in TNF-treated cells, respectively. Another difference noted for the high-LET samples was an apparent enrichment in gene sets involved in cell cycle/mitotic control. 
It is plausible that the enrichment in these particular pathways results from the complex DNA damage resulting from high-LET exposure where repair processes are not completed during the same time scale as the less complex damage resulting from low-LET radiation.
In vitro downregulated hypoxia transcriptome is associated with poor prognosis in breast cancer.
Abu-Jamous, Basel; Buffa, Francesca M; Harris, Adrian L; Nandi, Asoke K
2017-06-15
Hypoxia is a characteristic of breast tumours indicating poor prognosis. Based on the assumption that those genes which are up-regulated under hypoxia in cell-lines are expected to be predictors of poor prognosis in clinical data, many signatures of poor prognosis were identified. However, it was observed that cell line data do not always concur with clinical data, and therefore conclusions from cell line analysis should be considered with caution. As many transcriptomic cell-line datasets from hypoxia related contexts are available, integrative approaches which investigate these datasets collectively, while not ignoring clinical data, are required. We analyse sixteen heterogeneous breast cancer cell-line transcriptomic datasets in hypoxia-related conditions collectively by employing the unique capabilities of the method, UNCLES, which integrates clustering results from multiple datasets and can address questions that cannot be answered by existing methods. This has been demonstrated by comparison with the state-of-the-art iCluster method. From this collection of genome-wide datasets, which include 15,588 genes, UNCLES identified a relatively high number of genes (>1000 overall) which are consistently co-regulated over all of the datasets, and some of which are still poorly understood and represent new potential HIF targets, such as RSBN1 and KIAA0195. Two main, anti-correlated, clusters were identified; the first is enriched with MYC targets participating in growth and proliferation, while the other is enriched with HIF targets directly participating in the hypoxia response. Surprisingly, in six clinical datasets, some sub-clusters of growth genes are found consistently positively correlated with hypoxia response genes, unlike the observation in cell lines. 
Moreover, the ability to predict bad prognosis by a combined signature of one sub-cluster of growth genes and one sub-cluster of hypoxia-induced genes appears to be comparable to, and perhaps greater than, that of known hypoxia signatures. We present a clustering approach suitable to integrate data from diverse experimental set-ups. Its application to breast cancer cell line datasets reveals new hypoxia-regulated signatures of genes which behave differently when in vitro (cell-line) data is compared with in vivo (clinical) data, and are of a prognostic value comparable to or exceeding that of the state-of-the-art hypoxia signatures.
Hopman, J; Hakizimana, B; Meintjes, W A J; Nillessen, M; de Both, E; Voss, A; Mehtar, S
2016-01-01
Hospital-associated infections (HAIs) are more frequently encountered in low- than in high-resource settings. There is a need to identify and implement feasible and sustainable approaches to strengthen HAI prevention in low-resource settings. To evaluate the biological contamination of routinely cleaned mattresses in both high- and low-resource settings. In this two-stage observational study, routine manual bed cleaning was evaluated at two university hospitals using adenosine triphosphate (ATP). Standardized training of cleaning personnel was achieved in both high- and low-resource settings. Qualitative analysis of the cleaning process was performed to identify predictors of cleaning outcome in low-resource settings. Mattresses in low-resource settings were highly contaminated prior to cleaning. Cleaning significantly reduced biological contamination of mattresses in low-resource settings (P < 0.0001). After training, the contamination observed after cleaning in both the high- and low-resource settings seemed comparable. Cleaning with appropriate type of cleaning materials reduced the contamination of mattresses adequately. Predictors for mattresses that remained contaminated in a low-resource setting included: type of product used, type of ward, training, and the level of contamination prior to cleaning. In low-resource settings mattresses were highly contaminated as noted by ATP levels. Routine manual cleaning by trained staff can be as effective in a low-resource setting as in a high-resource setting. We recommend a multi-modal cleaning strategy that consists of training of domestic services staff, availability of adequate time to clean beds between patients, and application of the correct type of cleaning products. Copyright © 2015 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading
ERIC Educational Resources Information Center
Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.
2016-01-01
Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…
Cloud Computing Adoption and Usage in Community Colleges
ERIC Educational Resources Information Center
Behrend, Tara S.; Wiebe, Eric N.; London, Jennifer E.; Johnson, Emily C.
2011-01-01
Cloud computing is gaining popularity in higher education settings, but the costs and benefits of this tool have gone largely unexplored. The purpose of this study was to examine the factors that lead to technology adoption in a higher education setting. Specifically, we examined a range of predictors and outcomes relating to the acceptance of a…
ERIC Educational Resources Information Center
Kolko, David J.; Herschell, Amy D.; Scharf, Deborah M.
2006-01-01
Given the relative absence of treatment outcome studies, information about the specificity and utility of interventions for children who set fires has not been reported. In a treatment outcome study with young boys referred for firesetting that compared brief home visitation from a firefighter, fire safety education (FSE), and cognitive-behavioral…
Happell, Brenda; Platania-Phung, Chris; Scott, David; Stanton, Robert
2015-07-01
A cardiometabolic specialist nursing role could potentially improve physical health of people with serious mental illness. A national survey of Australian nurses working in mental health settings investigated predictors of support for the role. Predictors included belief in physical healthcare neglect, interest in training; higher perceived value of improving physical health care. The findings suggest that nurses see the cardiometabolic health nurse role as a promising initiative for closing gaps in cardiometabolic health care and skilling other nurses in mental health. However, as the majority of variance in cardiometabolic health nurse support was unexplained, more research is urgently needed on factors that explain differences in cardiometabolic health nurse endorsement. © 2014 Wiley Periodicals, Inc.
Cankorur-Cetinkaya, Ayca; Dereli, Elif; Eraslan, Serpil; Karabekmez, Erkan; Dikicioglu, Duygu; Kirdar, Betul
2012-01-01
Background Understanding the dynamic mechanism behind the transcriptional organization of genes in response to varying environmental conditions requires time-dependent data. The dynamic transcriptional response obtained by real-time RT-qPCR experiments could only be correctly interpreted if suitable reference genes are used in the analysis. The lack of available studies on the identification of candidate reference genes in dynamic gene expression studies necessitates the identification and the verification of a suitable gene set for the analysis of transient gene expression response. Principal Findings In this study, a candidate reference gene set for RT-qPCR analysis of dynamic transcriptional changes in Saccharomyces cerevisiae was determined using 31 different publicly available time series transcriptome datasets. Ten of the twelve candidates (TPI1, FBA1, CCW12, CDC19, ADH1, PGK1, GCN4, PDC1, RPS26A and ARF1) we identified were not previously reported as potential reference genes. Our method also identified the commonly used reference genes ACT1 and TDH3. The most stable reference genes from this pool were determined as TPI1, FBA1, CDC19 and ACT1 in response to a perturbation in the amount of available glucose and as FBA1, TDH3, CCW12 and ACT1 in response to a perturbation in the amount of available ammonium. The use of these newly proposed gene sets outperformed the use of common reference genes in the determination of dynamic transcriptional response of the target genes, HAP4 and MEP2, in response to relaxation from glucose and ammonium limitations, respectively. Conclusions A candidate reference gene set to be used in dynamic real-time RT-qPCR expression profiling in yeast was proposed for the first time in the present study. Suitable pools of stable reference genes to be used under different experimental conditions could be selected from this candidate set in order to successfully determine the expression profiles for the genes of interest. PMID:22675547
Chen, Wei-Hua; Lu, Guanting; Chen, Xiao; Zhao, Xing-Ming; Bork, Peer
2017-01-04
OGEE is an Online GEne Essentiality database. To enhance our understanding of the essentiality of genes, in OGEE we collected experimentally tested essential and non-essential genes, as well as associated gene properties known to contribute to gene essentiality. We focus on large-scale experiments, and complement our data with text-mining results. We organized tested genes into data sets according to their sources, and tagged those with variable essentiality statuses across data sets as conditionally essential genes, intending to highlight the complex interplay between gene functions and environments/experimental perturbations. Developments since the last public release include increased numbers of species and gene essentiality data sets, inclusion of non-coding essential sequences and genes with intermediate essentiality statuses. In addition, we included 16 essentiality data sets from cancer cell lines, corresponding to 9 human cancers; with OGEE, users can easily explore the shared and differentially essential genes within and between cancer types. These genes, especially those derived from cell lines that are similar to tumor samples, could reveal the oncogenic drivers, paralogous gene expression pattern and chromosomal structure of the corresponding cancer types, and can be further screened to identify targets for cancer therapy and/or new drug development. OGEE is freely available at http://ogee.medgenius.info. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
2013-01-01
Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent while top ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. 
Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432
Emir, Birol; Johnson, Kjell; Kuhn, Max; Parsons, Bruce
2017-01-01
This post hoc analysis used 11 predictive models of data from a large observational study in Germany to evaluate potential predictors of achieving at least 50% pain reduction by week 6 after treatment initiation (50% pain response) with pregabalin (150-600 mg/d) in patients with neuropathic pain (NeP). The potential predictors evaluated included baseline demographic and clinical characteristics, such as patient-reported pain severity (0 [no pain] to 10 [worst possible pain]) and pain-related sleep disturbance scores (0 [sleep not impaired] to 10 [severely impaired sleep]) that were collected during clinic visits (baseline and weeks 1, 3, and 6). Baseline characteristics were also evaluated combined with pain change at week 1 or weeks 1 and 3 as potential predictors of end-of-treatment 50% pain response. The 11 predictive models were linear, nonlinear, and tree based, and all predictors in the training dataset were ranked according to their variable importance and normalized to 100%. The training dataset comprised 9187 patients, and the testing dataset had 6114 patients. To adjust for the high imbalance in the responder distribution (75% of patients were 50% responders), which can skew the parameter tuning process, the training set was balanced into sets of 1000 responders and 1000 nonresponders. The predictive modeling approaches that were used produced consistent results. Baseline characteristics alone had fair predictive value (accuracy range, 0.61-0.72; κ range, 0.17-0.30). Baseline predictors combined with pain change at week 1 had moderate predictive value (accuracy, 0.73-0.81; κ range, 0.37-0.49). Baseline predictors with pain change at weeks 1 and 3 had substantial predictive value (accuracy, 0.83-0.89; κ range, 0.54-0.71). 
When variable importance across the models was estimated, the best predictor of 50% responder status was pain change at week 3 (average importance 100.0%), followed by pain change at week 1 (48.1%), baseline pain score (14.1%), baseline depression (13.9%), and using pregabalin as a monotherapy (11.7%). The finding that pain changes by week 1 or weeks 1 and 3 are the best predictors of pregabalin response at 6 weeks suggests that adhering to a pregabalin medication regimen is important for an optimal end-of-treatment outcome. Regarding baseline predictors alone, considerable published evidence supports the importance of high baseline pain score and presence of depression as factors that can affect treatment response. Future research would be required to elucidate why using pregabalin as a monotherapy also had more than a 10% variable importance as a potential predictor. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
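The abstract above reports both accuracy and κ for each model. A minimal sketch of how the two can be computed from a 2×2 confusion matrix follows, assuming κ refers to Cohen's kappa (the abstract does not name the variant). The function name and the example counts are hypothetical and are not taken from the study.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives. Assumes Cohen's kappa; the
    source abstract only writes "kappa" without further detail.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement is plain accuracy.
    observed = (tp + tn) / n

    # Agreement expected by chance, from the marginal frequencies of
    # predicted and actual labels.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no

    if expected == 1.0:
        # Degenerate case: only one class is present and predicted.
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical illustration: a model that labels everyone a responder
# in a population where 75% respond.
acc, kappa = accuracy_and_kappa(tp=75, fp=25, fn=0, tn=0)
```

In this toy case accuracy equals the responder prevalence while κ is zero. That gap is one reason to report κ alongside accuracy when the classes are as imbalanced as described above.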
Multiconstrained gene clustering based on generalized projections
2010-01-01
Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple pieces of constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints from different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints from different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions. PMID:20356386
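The iterative projection idea described above can be illustrated with a generic projection-onto-convex-sets loop. This is only a sketch of classical POCS on two simple sets in the plane (a disc and a half-plane), not the authors' generalized projector or their gene-clustering constraints. All function names and the example sets are assumptions made for illustration.

```python
import math


def project_onto_ball(x, center, radius):
    """Euclidean projection of x onto the ball {z : |z - center| <= radius}."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(x)  # already inside the set
    scale = radius / norm
    return [ci + d * scale for ci, d in zip(center, diff)]


def project_onto_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a.z <= b}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    if dot <= b:
        return list(x)  # constraint already satisfied
    step = (dot - b) / sum(ai * ai for ai in a)
    return [xi - step * ai for xi, ai in zip(x, a)]


def pocs(x0, projectors, iterations=100):
    """Cyclically apply each projector; for closed convex sets with a
    non-empty intersection the iterates approach a point in it."""
    x = list(x0)
    for _ in range(iterations):
        for project in projectors:
            x = project(x)
    return x


# Hypothetical constraints: the unit disc and the half-plane x >= 0.5.
constraints = [
    lambda p: project_onto_ball(p, [0.0, 0.0], 1.0),
    lambda p: project_onto_halfspace(p, [-1.0, 0.0], -0.5),
]
solution = pocs([-3.0, 4.0], constraints)
```

Each constraint only needs to supply its own projection operator, which is why the framework can combine constraints of different kinds without rewriting any of them.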
What more do we need to know for a world without violence?
Ercan, Oya; Baltas, Zuhal; Tuzun, Umran; Alikasifoglu, Mujgan
2007-01-01
Violence, a universal health issue, presents serious implications for general health and interpersonal relations. Roots of violence appear in early childhood and instances of extreme violence may become apparent in adolescence. Serious antisocial behavior in adolescence is a predictor of violence in later age. Risk factors for violent behavior could be categorized as individual and environmental. Environmental risk factors can be familial, social, and chemical environmental. Maltreatment in childhood is an important predictor of violent behavior in later age. The presence of mental illness is another important predictor of violence. Contemporary television has a visual and auditory power to promote violence with all its elements. Computers are another field where children confront violence. For identification of individuals who have an increased propensity or susceptibility, for violent behavior, research has suggested that polymorphisms related to certain genes might be important. However, we should emphasize that the expression of such behavior would always depend on interactions between various genes, environmental factors, and genetic-environmental interactions. Experiments in rhesus monkeys have shown that optimal early social experiences might overcome the deleterious effects of susceptible alleles. The effective prevention of violence should consist of interventions that aim to reduce the number of risk factors during early childhood, such as home visitation programs and giving individuals the skills and opportunities for engaging in positive behaviors during school years and adolescence, coupled with the identification of new barriers and reassessment of needs.
Kaushik, Abhinav; Bhatia, Yashuma; Ali, Shakir; Gupta, Dinesh
2015-01-01
Metastatic melanoma patients have a poor prognosis, mainly attributable to the underlying heterogeneity in melanoma driver genes and altered gene expression profiles. These characteristics of melanoma also make the development of drugs and identification of novel drug targets for metastatic melanoma a daunting task. Systems biology offers an alternative approach to re-explore the genes or gene sets that display dysregulated behaviour without being differentially expressed. In this study, we have performed systems biology studies to enhance our knowledge about the conserved property of disease genes or gene sets among mutually exclusive datasets representing melanoma progression. We meta-analysed 642 microarray samples to generate melanoma reconstructed networks representing four different stages of melanoma progression to extract genes with altered molecular circuitry wiring as compared to a normal cellular state. Intriguingly, a majority of the melanoma network-rewired genes are not differentially expressed and the disease genes involved in melanoma progression consistently modulate its activity by rewiring network connections. We found that the shortlisted disease genes in the study show strong and abnormal network connectivity, which enhances with the disease progression. Moreover, the deviated network properties of the disease gene sets allow ranking/prioritization of different enriched, dysregulated and conserved pathway terms in metastatic melanoma, in agreement with previous findings. Our analysis also reveals presence of distinct network hubs in different stages of metastasizing tumor for the same set of pathways in the statistically conserved gene sets. The study results are also presented as a freely available database at http://bioinfo.icgeb.res.in/m3db/. The web-based database resource consists of results from the analysis presented here, integrated with cytoscape web and user-friendly tools for visualization, retrieval and further analysis. 
PMID:26558755
JGI Plant Genomics Gene Annotation Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Shengqiang; Rokhsar, Dan; Goodstein, David
2014-07-14
Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.
Zotova, T Yu; Kubanova, A P; Azova, M M; Aissa, A Ait; Gigani, O O; Frolov, V A
2016-07-01
Changes in the frequencies of genotypes and mutant alleles of ACE, AGTR1, AGT, and ITGB3 genes were analyzed in patients with arterial hypertension coupled with metabolic syndrome (N=15) and compared with population data and corresponding parameters in patients with isolated hypertension (N=15). Increased frequency of genotype ID of ACE gene (hypertension predictor) was confirmed for both groups. In case of isolated hypertension, M235M genotype (gene AGT) was more frequent, in case of hypertension combined with metabolic syndrome, the frequency of genotypes A1166C and C1166C of the gene AGTR1 was higher in comparison with population data. Comparison of mutant allele frequencies in the two groups showed that at the 90% significance level allele T of the AGT gene was more frequent in hypertension coupled with metabolic syndrome (OR=1.26) and genotype A1166A of the AGTR1 gene was more frequent in the group with isolated hypertension.
Repressors Nrg1 and Nrg2 Regulate a Set of Stress-Responsive Genes in Saccharomyces cerevisiae§
Vyas, Valmik K.; Berkey, Cristin D.; Miyao, Takenori; Carlson, Marian
2005-01-01
The yeast Saccharomyces cerevisiae responds to environmental stress by rapidly altering the expression of large sets of genes. We report evidence that the transcriptional repressors Nrg1 and Nrg2 (Nrg1/Nrg2), which were previously implicated in glucose repression, regulate a set of stress-responsive genes. Genome-wide expression analysis identified 150 genes that were upregulated in nrg1Δ nrg2Δ double mutant cells, relative to wild-type cells, during growth in glucose. We found that many of these genes are regulated by glucose repression. Stress response elements (STREs) and STRE-like elements are overrepresented in the promoters of these genes, and a search of available expression data sets showed that many are regulated in response to a variety of environmental stress signals. In accord with these findings, mutation of NRG1 and NRG2 enhanced the resistance of cells to salt and oxidative stress and decreased tolerance to freezing. We present evidence that Nrg1/Nrg2 not only contribute to repression of target genes in the absence of stress but also limit induction in response to salt stress. We suggest that Nrg1/Nrg2 fine-tune the regulation of a set of stress-responsive genes. PMID:16278455
Maxwell, Rochelle R; Cole, Peter D
2017-06-01
The aim of this review is to summarize the most recent and most robust pharmacogenetic predictors of treatment-related toxicity (TRT) in childhood acute lymphoblastic leukemia (ALL). Multiple studies have examined the toxicities of the primary chemotherapeutic agents used to treat childhood ALL in relation to host genetic factors. However, few results have been replicated independently, largely due to cohort differences in ancestry, chemotherapy treatment protocols, and definitions of toxicities. To date, there is only one widely accepted clinical guideline for dose modification based on gene status: thiopurine dosing based on TPMT genotype. Based on recent data, it is likely that this guideline will be modified to incorporate other gene variants, such as NUDT15. We highlight genetic variants that have been consistently associated with TRT across treatment groups, as well as those that best illustrate the underlying pathophysiology of TRT. In the coming decade, we expect that survivorship care will routinely specify screening recommendations based on genetics. Furthermore, clinical trials testing protective interventions may modify inclusion criteria based on genetically determined risk of specific TRTs.
Tsai, Chia-Ti; Hsieh, Chia-Shan; Chang, Sheng-Nan; Chuang, Eric Y.; Ueng, Kwo-Chang; Tsai, Chin-Feng; Lin, Tsung-Hsien; Wu, Cho-Kai; Lee, Jen-Kuang; Lin, Lian-Yu; Wang, Yi-Chih; Yu, Chih-Chieh; Lai, Ling-Ping; Tseng, Chuen-Den; Hwang, Juey-Jen; Chiang, Fu-Tien; Lin, Jiunn-Lee
2016-01-01
Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia. Previous genome-wide association studies identified single-nucleotide polymorphisms in several genomic regions that are associated with AF. In the human genome, copy number variations (CNVs) are known to contribute to disease susceptibility. Using a genome-wide multistage approach to identify AF susceptibility CNVs, we show here that a common 4,470-bp diallelic CNV in the first intron of the potassium interacting channel 1 gene (KCNIP1) is strongly associated with AF in Taiwanese populations (odds ratio=2.27 for the insertion allele; P=6.23 × 10^-24). The KCNIP1 insertion is associated with higher KCNIP1 mRNA expression. The KCNIP1-encoded protein potassium interacting channel 1 (KCHIP1) is physically associated with potassium Kv channels and modulates the atrial transient outward current in cardiac myocytes. Overexpression of KCNIP1 results in inducible AF in zebrafish. In conclusion, a common CNV in the KCNIP1 gene is a genetic predictor of AF risk, possibly pointing to a functional pathway. PMID:26831368
Naaijen, J; Bralten, J; Poelmans, G; Faraone, Stephen; Asherson, Philip; Banaschewski, Tobias; Buitelaar, Jan; Franke, Barbara; Ebstein, Richard P; Gill, Michael; Miranda, Ana; Oades, Robert D; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph; Sonuga-Barke, Edmund; Anney, Richard; Mulas, Fernando; Steinhausen, Hans-Christoph; Glennon, J C; Franke, B; Buitelaar, J K
2017-01-01
Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are the main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach that investigates multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analyses for ADHD symptom severity (divided into inattention and hyperactivity/impulsivity symptoms), autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed a nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of the single-gene or single-variant associations was significant on its own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD. PMID:28072412