Sample records for cross-study gene set

  1. An integrated analysis of genes and functional pathways for aggression in human and rodent models.

    PubMed

    Zhang-James, Yanli; Fernàndez-Castillo, Noèlia; Hess, Jonathan L; Malki, Karim; Glatt, Stephen J; Cormand, Bru; Faraone, Stephen V

    2018-06-01

    Human genome-wide association studies (GWAS), transcriptome analyses of animal models, and candidate gene studies have advanced our understanding of the genetic architecture of aggressive behaviors. However, each of these methods presents unique limitations. To generate a more confident and comprehensive view of the complex genetics underlying aggression, we undertook an integrated, cross-species approach. We focused on human and rodent models to derive eight gene lists from three main categories of genetic evidence: two sets of genes identified in GWAS studies, four sets implicated by transcriptome-wide studies of rodent models, and two sets of genes with causal evidence from online Mendelian inheritance in man (OMIM) and knockout (KO) mice reports. These gene sets were evaluated for overlap and pathway enrichment to extract their similarities and differences. We identified enriched common pathways such as the G-protein coupled receptor (GPCR) signaling pathway, axon guidance, reelin signaling in neurons, and ERK/MAPK signaling. Also, individual genes were ranked based on their cumulative weights to quantify their importance as risk factors for aggressive behavior, which resulted in 40 top-ranked and highly interconnected genes. The results of our cross-species and integrated approach provide insights into the genetic etiology of aggression.
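
    The cumulative-weight ranking described in this abstract can be illustrated with a minimal sketch. The list names, gene identifiers, and weights below are hypothetical placeholders, not values from the study, and the scoring rule (sum the weight of every list that contains a gene) is an assumption about how such a ranking might be computed.

```python
from collections import defaultdict

def rank_genes(gene_lists, weights):
    """Rank genes by the summed weight of every list they appear in."""
    scores = defaultdict(float)
    for list_name, genes in gene_lists.items():
        for gene in set(genes):  # count each gene once per list
            scores[gene] += weights[list_name]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical evidence lists and weights (placeholders only).
gene_lists = {
    "gwas": ["GENE_A", "GENE_B", "GENE_C"],
    "rodent_transcriptome": ["GENE_A", "GENE_C", "GENE_D"],
    "omim": ["GENE_A"],
}
weights = {"gwas": 1.0, "rodent_transcriptome": 0.5, "omim": 2.0}

ranking = rank_genes(gene_lists, weights)
for gene, score in ranking:
    print(gene, score)
```

    Genes supported by several categories of evidence rise to the top, which is the intuition behind combining lists from different study types.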

  2. A Guideline to Family-Wide Comparative State-of-the-Art Quantitative RT-PCR Analysis Exemplified with a Brassicaceae Cross-Species Seed Germination Case Study

    PubMed Central

    Graeber, Kai; Linkies, Ada; Wood, Andrew T.A.; Leubner-Metzger, Gerhard

    2011-01-01

    Comparative biology includes the comparison of transcriptome and quantitative real-time RT-PCR (qRT-PCR) data sets in a range of species to detect evolutionarily conserved and divergent processes. Transcript abundance analysis of target genes by qRT-PCR requires a highly accurate and robust workflow. This includes reference genes with high expression stability (i.e., low intersample transcript abundance variation) for correct target gene normalization. Cross-species qRT-PCR for proper comparative transcript quantification requires reference genes suitable for different species. We addressed this issue using tissue-specific transcriptome data sets of germinating Lepidium sativum seeds to identify new candidate reference genes. We investigated their expression stability in germinating seeds of L. sativum and Arabidopsis thaliana by qRT-PCR, combined with in silico analysis of Arabidopsis and Brassica napus microarray data sets. This revealed that reference gene expression stability is higher for a given developmental process between distinct species than for distinct developmental processes within a given single species. The identified superior cross-species reference genes may be used for family-wide comparative qRT-PCR analysis of Brassicaceae seed germination. Furthermore, using germinating seeds, we exemplify optimization of the qRT-PCR workflow for challenging tissues regarding RNA quality, transcript stability, and tissue abundance. Our work therefore can serve as a guideline for moving beyond Arabidopsis by establishing high-quality cross-species qRT-PCR. PMID:21666000
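
    The abstract defines expression stability as low intersample variation in transcript abundance. A minimal sketch of one simple way to compare candidates is to rank them by coefficient of variation. This is an illustrative stand-in, not the stability measure used in the paper, and the gene names and abundance values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def rank_by_stability(expression):
    """Return gene names ordered from most to least stable (lowest CV first)."""
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))

# Hypothetical relative transcript abundances across four samples.
expression = {
    "REF_A": [100, 102, 98, 101],
    "REF_B": [100, 150, 60, 120],
    "REF_C": [50, 55, 45, 52],
}

stability_order = rank_by_stability(expression)
print(stability_order)
```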

  3. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
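
    The contrast between random CV and clustering-based CV can be sketched as two ways of building folds. The sketch below assumes cluster labels have already been computed by some clustering step (not shown), and it uses a greedy assignment of whole clusters to folds, which is an illustrative choice rather than the procedure from the paper.

```python
import random

def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cluster_folds(cluster_labels, k):
    """Clustering-based CV: keep every cluster inside a single fold."""
    clusters = {}
    for index, label in enumerate(cluster_labels):
        clusters.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    # Place the largest clusters first, always into the smallest fold so far.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds

# Hypothetical cluster labels for ten samples (e.g. experimental conditions).
labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
rcv = random_folds(len(labels), k=2)
ccv = cluster_folds(labels, k=2)
print(rcv)
print(ccv)
```

    With random folds, samples from the same cluster usually land in both the training and test sets, which is the situation the abstract identifies as a source of over-optimistic estimates.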

  4. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing-at-random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The value of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified 4 gene sets that were ultimately found to be linked with the influenza vaccine as well, although previous analyses had associated them only with the pneumococcal vaccine. In our simulation study TcGSA exhibits good statistical properties, and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package. PMID:26111374

  5. Expression of HOXB genes is significantly different in acute myeloid leukemia with a partial tandem duplication of MLL vs. a MLL translocation: a cross-laboratory study.

    PubMed

    Liu, Hsi-Che; Shih, Lee-Yung; May Chen, Mei-Ju; Wang, Chien-Chih; Yeh, Ting-Chi; Lin, Tung-Huei; Chen, Chien-Yu; Lin, Chih-Jen; Liang, Der-Cherng

    2011-05-01

    In acute myeloid leukemia (AML), the mixed lineage leukemia (MLL) gene may be rearranged to generate a partial tandem duplication (PTD), or fused to partner genes through a chromosomal translocation (tMLL). In this study, we first explored the differentially expressed genes between MLL-PTD and tMLL using gene expression profiling of our cohort (15 MLL-PTD and 10 tMLL) and one published data set. The top 250 probes were chosen from each set, resulting in 29 probes (21 unique genes) common to both sets. The selected genes include four HOXB genes, HOXB2, B3, B5, and B6. The expression values of these HOXB genes significantly differ between MLL-PTD and tMLL cases. Clustering and classification analyses were thoroughly conducted to support our gene selection results. Second, as MLL-PTD, FLT3-ITD, and NPM1 mutations are identified in AML with normal karyotypes, we briefly studied their impact on the HOXB genes. Another contribution of this study is to demonstrate that using public data from other studies enriches samples for analysis and yields more conclusive results.

  6. Superior Cross-Species Reference Genes: A Blueberry Case Study

    PubMed Central

    Die, Jose V.; Rowland, Lisa J.

    2013-01-01

    The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the application of high throughput gene expression approaches that should relatively quickly increase our understanding of blueberry physiology. These studies, however, require a highly accurate and robust workflow and make necessary the identification of reference genes with high expression stability for correct target gene normalization. To create a set of superior reference genes for blueberry expression analyses, we mined a publicly available transcriptome data set from blueberry for orthologs to a set of Arabidopsis genes that showed the most stable expression in a developmental series. In total, the expression stability of 13 putative reference genes was evaluated by qPCR and a set of new references with high stability values across a developmental series in fruits and floral buds of blueberry were identified. We also demonstrated the need to use at least two, preferably three, reference genes to avoid inconsistencies in results, even when superior reference genes are used. The new references identified here provide a valuable resource for accurate normalization of gene expression in Vaccinium spp. and may be useful for other members of the Ericaceae family as well. PMID:24058469
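
    The recommendation to use at least two, preferably three, reference genes can be illustrated with a minimal normalization sketch. It assumes one common convention, dividing the target quantity by the geometric mean of the reference quantities. The abstract does not state the exact formula, so both the convention and the numbers below are assumptions for illustration.

```python
import statistics

def normalize_to_references(target_quantity, reference_quantities):
    """Divide a target's relative quantity by the geometric mean of the references.

    Inputs are linear-scale relative quantities, not raw Cq values.
    """
    if len(reference_quantities) < 2:
        raise ValueError("use at least two reference genes")
    return target_quantity / statistics.geometric_mean(reference_quantities)

# Hypothetical relative quantities for one sample.
normalized = normalize_to_references(8.0, [2.0, 4.0, 8.0])
print(normalized)
```

    Averaging several references dampens the effect of any single reference gene that happens to vary in a given sample.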

  7. Frameshift Suppression in SACCHAROMYCES CEREVISIAE VI. Complete Genetic Map of Twenty-Five Suppressor Genes

    PubMed Central

    Gaber, Richard F.; Mathison, Lorilee; Edelman, Irv; Culbertson, Michael R.

    1983-01-01

    Five previously unmapped frameshift suppressor genes have been located on the yeast genetic map. In addition, we have further characterized the map positions of two suppressors whose approximate locations were determined in an earlier study. These results represent the completion of genetic mapping studies on all 25 of the known frameshift suppressor genes in yeast.—The approximate location of each suppressor gene was initially determined through the use of a set of mapping strains containing 61 signal markers distributed throughout the yeast genome. Standard meiotic linkage was assayed in crosses between strains carrying the suppressors and the mapping strains. Subsequent to these approximate linkage determinations, each suppressor gene was more precisely located in multi-point crosses. The implications of these mapping results for the genomic distribution of frameshift suppressor genes, which include both glycine and proline tRNA genes, are discussed. PMID:17246112

  8. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801

  9. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.

    PubMed

    Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo

    2018-05-17

    The gene expression profile has the characteristics of a high dimension, low sample, and continuous type, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are used to select the microarray data in order to obtain a plurality of the pre-selected feature subsets with a different classification ability. The top N genes with the highest rank of each subset are integrated so as to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, the wrapper method, which is based on forward feature selection, is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy under fewer characteristic genes.

  10. Bayesian Population Genomic Inference of Crossing Over and Gene Conversion

    PubMed Central

    Padhukasahasram, Badri; Rannala, Bruce

    2011-01-01

    Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms, and its different forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences, while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857

  11. Statistical assessment of crosstalk enrichment between gene groups in biological networks.

    PubMed

    McCormack, Theodore; Frings, Oliver; Alexeyenko, Andrey; Sonnhammer, Erik L L

    2013-01-01

    Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
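
    The idea of scoring crosstalk with a z-score against a null distribution can be sketched as follows. This is not the CrossTalkZ implementation: the null model here is a plain node-label permutation, which is simpler than the topology-preserving network randomizations the abstract evaluates, and the toy network and group names are hypothetical.

```python
import random
import statistics

def count_crosstalk(edges, group_a, group_b):
    """Count edges that connect a member of group_a to a member of group_b."""
    return sum(
        1 for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )

def crosstalk_z(edges, group_a, group_b, n_perm=500, seed=0):
    """z-score of the observed crosstalk against a label-permutation null."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    observed = count_crosstalk(edges, group_a, group_b)
    null_counts = []
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = [(relabel[u], relabel[v]) for u, v in edges]
        null_counts.append(count_crosstalk(permuted, group_a, group_b))
    spread = statistics.pstdev(null_counts)
    if spread == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.mean(null_counts)) / spread

# Hypothetical toy network: groups A and B are fully interconnected.
group_a = {"a1", "a2", "a3"}
group_b = {"b1", "b2", "b3"}
edges = [(a, b) for a in sorted(group_a) for b in sorted(group_b)]
edges += [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5"), ("c5", "c6"), ("a1", "c1")]

z = crosstalk_z(edges, group_a, group_b)
print(round(z, 2))
```

    A large positive z indicates more links between the two groups than the null model produces. How the null is built matters, which is why the abstract compares several randomization methods.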

  12. Cross-Study Homogeneity of Psoriasis Gene Expression in Skin across a Large Expression Range

    PubMed Central

    Kerkof, Keith; Timour, Martin; Russell, Christopher B.

    2013-01-01

    Background: In psoriasis, only limited overlap between sets of genes identified as differentially expressed (psoriatic lesional, PP, vs. psoriatic non-lesional, PN) was found using statistical and fold-change cut-offs. To provide a framework for utilizing prior psoriasis data sets we sought to understand the consistency of those sets. Methodology/Principal Findings: Microarray expression profiling and qRT-PCR were used to characterize gene expression in PP and PN skin from psoriasis patients. cDNA (three new data sets) and cRNA hybridization (four existing data sets) data were compared using a common analysis pipeline. Agreement between data sets was assessed using varying qualitative and quantitative cut-offs to generate a DEG list in a source data set and then using other data sets to validate the list. Concordance increased from 67% across all probe sets to over 99% across more than 10,000 probe sets when statistical filters were employed. The fold-change behavior of individual genes tended to be consistent across the multiple data sets. We found that genes with <2-fold change values were quantitatively reproducible between pairs of data sets. In a subset of transcripts with a role in inflammation, changes detected by microarray were confirmed by qRT-PCR with high concordance. For transcripts with both PN and PP levels within the microarray dynamic range, microarray and qRT-PCR were quantitatively reproducible, including minimal fold-changes in IL13, TNFSF11, and TNFRSF11B and genes with >10-fold changes in either direction such as CHRM3, IL12B and IFNG. Conclusions/Significance: Gene expression changes in psoriatic lesions were consistent across different studies, despite differences in patient selection, sample handling, and microarray platforms, but between-study comparisons showed stronger agreement within than between platforms. We could use cut-offs as low as log10(ratio) = 0.1 (fold-change = 1.26), generating larger gene lists that validate on independent data sets. The reproducibility of PP signatures across data sets suggests that different sample sets can be productively compared. PMID:23308107
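
    The correspondence between a log10(ratio) cut-off of 0.1 and a fold-change of 1.26 quoted in this abstract follows from 10^0.1 ≈ 1.259. A minimal sketch of the conversion is given below; the function names are illustrative only.

```python
import math

def log10_ratio_to_fold_change(log_ratio):
    """Convert a log10 expression ratio to a fold-change magnitude (>= 1)."""
    return 10 ** abs(log_ratio)

def fold_change_to_log10_ratio(fold_change):
    """Convert a fold-change magnitude back to a log10 ratio."""
    return math.log10(fold_change)

def passes_cutoff(log_ratio, cutoff=0.1):
    """True if the change is at least as large as the cut-off, in either direction."""
    return abs(log_ratio) >= cutoff

print(round(log10_ratio_to_fold_change(0.1), 2))
print(round(fold_change_to_log10_ratio(2.0), 3))
```

    For comparison, the conventional 2-fold threshold corresponds to a log10(ratio) of about 0.301, so a cut-off of 0.1 admits considerably smaller changes.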

  13. yStreX: yeast stress expression database

    PubMed Central

    Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina

    2014-01-01

    Over the past decade, genome-wide expression analyses have often been used to study how the expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses, and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351

  14. Superior cross-species reference genes: a blueberry case study

    USDA-ARS's Scientific Manuscript database

    The advent of affordable Next Generation Sequencing technologies has had major impact on studies of many crop species, where access to genomic technologies and genome-scale data sets has been extremely limited until now. The recent development of genomic resources in blueberry will enable the applic...

  15. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

    PubMed

    Doungpan, Narumol; Engchuan, Worrawat; Chan, Jonathan H; Meechai, Asawin

    2016-12-05

    Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationship among those genes. Network-based methods have thus been considered for inferring the interaction within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using an activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interactions as network data to consider the interactions among significant genes is discussed. Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms. The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The identified lung cancer subnetwork using the proposed searching algorithm resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search, while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method. The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified from the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using different searching algorithms are genes known to play a role in lung cancer. The improvement of classification performance and the gene/gene-set level agreement, and the biological relevance, indicated the effectiveness of the GSNFS method for gene subnetwork identification using expression data.

  16. Self-incompatibility in passionfruit: evidence of gametophytic-sporophytic control.

    PubMed

    Suassuna, T de M F; Bruckner, H; de Carvalho, R; Borém, A

    2003-01-01

    Self-incompatibility in passionfruit was studied in families originating from crosses among plants that presented differences in reciprocal crosses. The three families, obtained by crossing S(3) plants, exhibited one incompatible group; no reciprocal differences were observed. The phenotype of the families was the same as that of the parent plants, S(3). These results suggest the presence of a gene (G), gametophytic in its action, associated with the sporophytic gene S, modifying the incompatibility reaction in passionfruit. The reciprocal difference exhibited in the crosses among the parents could be explained as a matching between plants homozygous for S, but homozygous and heterozygous for G. This would in fact be a partially compatible cross, not detectable when the evaluation is based on fruit set data. As the family originating from this kind of cross is homozygous for S and heterozygous for G, no reciprocal differences are expected, and the phenotype should be the same as that of the parental plants, as observed in the present work.

  17. Gene expression-based molecular diagnostic system for malignant gliomas is superior to histological diagnosis.

    PubMed

    Shirahata, Mitsuaki; Iwao-Koizumi, Kyoko; Saito, Sakae; Ueno, Noriko; Oda, Masashi; Hashimoto, Nobuo; Takahashi, Jun A; Kato, Kikuya

    2007-12-15

    Current morphology-based glioma classification methods do not adequately reflect the complex biology of gliomas, thus limiting their prognostic ability. In this study, we focused on anaplastic oligodendroglioma and glioblastoma, which typically follow distinct clinical courses. Our goal was to construct a clinically useful molecular diagnostic system based on gene expression profiling. The expression of 3,456 genes in 32 patients, 12 and 20 of whom had prognostically distinct anaplastic oligodendroglioma and glioblastoma, respectively, was measured by PCR array. Next to unsupervised methods, we did supervised analysis using a weighted voting algorithm to construct a diagnostic system discriminating anaplastic oligodendroglioma from glioblastoma. The diagnostic accuracy of this system was evaluated by leave-one-out cross-validation. The clinical utility was tested on a microarray-based data set of 50 malignant gliomas from a previous study. Unsupervised analysis showed divergent global gene expression patterns between the two tumor classes. A supervised binary classification model showed 100% (95% confidence interval, 89.4-100%) diagnostic accuracy by leave-one-out cross-validation using 168 diagnostic genes. Applied to a gene expression data set from a previous study, our model correlated better with outcome than histologic diagnosis, and also displayed 96.6% (28 of 29) consistency with the molecular classification scheme used for these histologically controversial gliomas in the original article. Furthermore, we observed that histologically diagnosed glioblastoma samples that shared anaplastic oligodendroglioma molecular characteristics tended to be associated with longer survival. Our molecular diagnostic system showed reproducible clinical utility and prognostic ability superior to traditional histopathologic diagnosis for malignant glioma.
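
    Leave-one-out cross-validation of a weighted voting classifier can be sketched in a few lines. The abstract does not give the algorithm's details, so the signal-to-noise weighting below (class-mean difference divided by the sum of class standard deviations, with the midpoint of the class means as decision boundary) is an assumption based on the commonly described form of weighted voting, and the two-gene toy data are hypothetical.

```python
import statistics

def train_weighted_voting(samples, labels):
    """Return one (weight, boundary) pair per gene for classes 0 and 1."""
    model = []
    for g in range(len(samples[0])):
        x0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        x1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        mu0, mu1 = statistics.mean(x0), statistics.mean(x1)
        # Signal-to-noise weight; positive when the gene is higher in class 1.
        weight = (mu1 - mu0) / (statistics.stdev(x0) + statistics.stdev(x1))
        model.append((weight, (mu0 + mu1) / 2))
    return model

def predict(model, sample):
    """Sum the per-gene votes; a positive total means class 1."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return 1 if total > 0 else 0

def leave_one_out_accuracy(samples, labels):
    """Hold out each sample in turn, retrain on the rest, and score it."""
    correct = 0
    for i in range(len(samples)):
        model = train_weighted_voting(samples[:i] + samples[i + 1:],
                                      labels[:i] + labels[i + 1:])
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)

# Hypothetical expression values for two genes in six samples.
samples = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1],
           [3.0, 2.0], [3.2, 2.1], [2.9, 1.8]]
labels = [0, 0, 0, 1, 1, 1]

accuracy = leave_one_out_accuracy(samples, labels)
print(accuracy)
```

    In a real analysis, any gene selection step would also need to be repeated inside each held-out iteration; otherwise the accuracy estimate is optimistically biased.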

  18. Development of a cross-platform biomarker signature to detect renal transplant tolerance in humans

    PubMed Central

    Sagoo, Pervinder; Perucha, Esperanza; Sawitzki, Birgit; Tomiuk, Stefan; Stephens, David A.; Miqueu, Patrick; Chapman, Stephanie; Craciun, Ligia; Sergeant, Ruhena; Brouard, Sophie; Rovis, Flavia; Jimenez, Elvira; Ballow, Amany; Giral, Magali; Rebollo-Mesa, Irene; Le Moine, Alain; Braudeau, Cecile; Hilton, Rachel; Gerstmayer, Bernhard; Bourcier, Katarzyna; Sharif, Adnan; Krajewska, Magdalena; Lord, Graham M.; Roberts, Ian; Goldman, Michel; Wood, Kathryn J.; Newell, Kenneth; Seyfert-Margolis, Vicki; Warrens, Anthony N.; Janssen, Uwe; Volk, Hans-Dieter; Soulillou, Jean-Paul; Hernandez-Fuentes, Maria P.; Lechler, Robert I.

    2010-01-01

    Identifying transplant recipients in whom immunological tolerance is established or is developing would allow an individually tailored approach to their posttransplantation management. In this study, we aimed to develop reliable and reproducible in vitro assays capable of detecting tolerance in renal transplant recipients. Several biomarkers and bioassays were screened on a training set that included 11 operationally tolerant renal transplant recipients, recipient groups following different immunosuppressive regimes, recipients undergoing chronic rejection, and healthy controls. Highly predictive assays were repeated on an independent test set that included 24 tolerant renal transplant recipients. Tolerant patients displayed an expansion of peripheral blood B and NK lymphocytes, fewer activated CD4+ T cells, a lack of donor-specific antibodies, donor-specific hyporesponsiveness of CD4+ T cells, and a high ratio of forkhead box P3 to α-1,2-mannosidase gene expression. Microarray analysis further revealed in tolerant recipients a bias toward differential expression of B cell–related genes and their associated molecular pathways. By combining these indices of tolerance as a cross-platform biomarker signature, we were able to identify tolerant recipients in both the training set and the test set. This study provides an immunological profile of the tolerant state that, with further validation, should inform and shape drug-weaning protocols in renal transplant recipients. PMID:20501943

  19. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    PubMed Central

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. 
    The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set. PMID:21423775
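
    The penalized Cox fit described in this abstract can be illustrated with a minimal sketch of the objective being maximized. The abstract does not give the exact form, so the Breslow-style partial likelihood, the 1/2 scaling of the L2 term, and all function and parameter names below are assumptions made for illustration only.

```python
import math


def penalized_cox_loglik(times, events, X, beta, penalty):
    """Cox partial log-likelihood minus an L2 (ridge) penalty.

    times   : follow-up time per individual
    events  : 1 if the event (e.g. recurrence) was observed, 0 if censored
    X       : list of covariate rows (e.g. expression of genes in one gene set)
    beta    : coefficient vector
    penalty : ridge weight (hypothetical name; tuned by cross-validation in practice)
    """
    # Linear predictor x_i . beta for every individual
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk)
    # Assumed convention: subtract (penalty / 2) * ||beta||^2
    return loglik - 0.5 * penalty * sum(b * b for b in beta)
```

    In such a setup, the penalty weight would be chosen by maximizing a cross-validated version of this quantity, and the fitted linear predictor would serve as the predicted risk for each test individual.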

  20. Revisiting adverse effects of cross-hybridization in Affymetrix gene expression data: do they matter for correlation analysis?

    PubMed

    Klebanov, Lev; Chen, Linlin; Yakovlev, Andrei

    2007-11-07

    This work was undertaken in response to a recently published paper by Okoniewski and Miller (BMC Bioinformatics 2006, 7: Article 276). The authors of that paper came to the conclusion that the process of multiple targeting in short oligonucleotide microarrays induces spurious correlations and this effect may deteriorate the inference on correlation coefficients. The design of their study and supporting simulations cast serious doubt upon the validity of this conclusion. The work by Okoniewski and Miller drove us to revisit the issue by means of experimentation with biological data and probabilistic modeling of cross-hybridization effects. We have identified two serious flaws in the study by Okoniewski and Miller: (1) The data used in their paper are not amenable to correlation analysis; (2) The proposed simulation model is inadequate for studying the effects of cross-hybridization. Using two other data sets, we have shown that removing multiply targeted probe sets does not lead to a shift in the histogram of sample correlation coefficients towards smaller values. A more realistic approach to mathematical modeling of cross-hybridization demonstrates that this process is by far more complex than the simplistic model considered by the authors. A diversity of correlation effects (such as the induction of positive or negative correlations) caused by cross-hybridization can be expected in theory but there are natural limitations on the ability to provide quantitative insights into such effects due to the fact that they are not directly observable. The proposed stochastic model is instrumental in studying general regularities in hybridization interaction between probe sets in microarray data. As the problem stands now, there is no compelling reason to believe that multiple targeting causes a large-scale effect on the correlation structure of Affymetrix gene expression data. 
    Our analysis suggests that the observed long-range correlations in microarray data are of a biological nature rather than a technological flaw.

  1. High Diversity of Genes for Nonhost Resistance of Barley to Heterologous Rust Fungi

    PubMed Central

    Jafary, Hossein; Albertazzi, Giorgia; Marcel, Thierry C.; Niks, Rients E.

    2008-01-01

    Inheritance studies on the nonhost resistance of plants would normally require interspecific crosses that suffer from sterility and abnormal segregation. Therefore, we developed the barley–Puccinia rust model system to study, using forward genetics, the specificity, number, and diversity of genes involved in nonhost resistance. We developed two mapping populations by crossing the line SusPtrit, with exceptional susceptibility to heterologous rust species, with the immune barley cultivars Vada and Cebada Capa. These two mapping populations along with the Oregon Wolfe Barley population, which showed unexpected segregation for resistance to heterologous rusts, were phenotyped with four heterologous rust fungal species. Positions of QTL conferring nonhost resistance in the three mapping populations were compared using an integrated consensus map. The results confirmed that nonhost resistance in barley to heterologous rust species is controlled by QTL with different and overlapping specificities and by an occasional contribution of an R-gene for hypersensitivity. In each population, different sets of loci were implicated in resistance. Few genes were common between the populations, suggesting a high diversity of genes conferring nonhost resistance to heterologous pathogens. These loci were significantly associated with QTL for partial resistance to the pathogen Puccinia hordei and with defense-related genes. PMID:18430953

  2. Supervised group Lasso with applications to microarray data analysis

    PubMed Central

    Ma, Shuangge; Song, Xiao; Huang, Jian

    2007-01-01

    Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436

  3. Twenty-four signature genes predict the prognosis of oral squamous cell carcinoma with high accuracy and repeatability

    PubMed Central

    Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang

    2018-01-01

    Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in the ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303

  4. Towards the integration, annotation and association of historical microarray experiments with RNA-seq.

    PubMed

    Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J

    2013-01-01

    Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.

  5. A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children.

    PubMed

    Verhagen, Lilly M; Zomer, Aldert; Maes, Mailis; Villalba, Julian A; Del Nogal, Berenice; Eleveld, Marc; van Hijum, Sacha Aft; de Waard, Jacobus H; Hermans, Peter Wm

    2013-02-01

    Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
    Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts.

  6. A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children

    PubMed Central

    2013-01-01

    Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
    Conclusions Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts. PMID:23375113

  7. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    PubMed

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
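
    The reported sensitivity and specificity of a rule-based voting classifier can be illustrated with a small sketch. The abstract does not specify the voting scheme, so the simple majority vote below is an assumption, and the votes and labels are made-up values rather than study data.

```python
def majority_vote(rule_votes):
    """Return 1 (predicted recurrence) when more than half of the rules vote 1."""
    return 1 if 2 * sum(rule_votes) > len(rule_votes) else 0


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical votes from three rules for four samples
votes = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
predicted = [majority_vote(v) for v in votes]
observed = [1, 1, 0, 0]  # 1 = recurrence (made-up labels)
sens, spec = sensitivity_specificity(observed, predicted)
```

    Computing both measures separately on training and test samples, as the abstract reports, shows how much performance degrades on unseen data.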

  8. Cross-kingdom amplification using bacteria-specific primers: complications for studies of coral microbial ecology.

    PubMed

    Galkiewicz, Julia P; Kellogg, Christina A

    2008-12-01

    PCR amplification of pure bacterial DNA is vital to the study of bacterial interactions with corals. Commonly used Bacteria-specific primers 8F and 27F paired with the universal primer 1492R amplify both eukaryotic and prokaryotic rRNA genes. An alternative primer set, 63F/1542R, is suggested to resolve this problem.

  9. Cross-Kingdom Amplification Using Bacteria-Specific Primers: Complications for Studies of Coral Microbial Ecology

    PubMed Central

    Galkiewicz, Julia P.; Kellogg, Christina A.

    2008-01-01

    PCR amplification of pure bacterial DNA is vital to the study of bacterial interactions with corals. Commonly used Bacteria-specific primers 8F and 27F paired with the universal primer 1492R amplify both eukaryotic and prokaryotic rRNA genes. An alternative primer set, 63F/1542R, is suggested to resolve this problem. PMID:18931299

  10. Innate factors causing differences in gene flow frequency from transgenic rice to different weedy rice biotypes.

    PubMed

    Zuo, Jiao; Zhang, Lianju; Song, Xiaoling; Dai, Weimin; Qiang, Sheng

    2011-06-01

    The compatibility and outcrossing rates between transgenic rice and weedy rice biotypes have been studied in some previous cases. However, few studies have addressed the reasons for these differences. The present study compared the compatibility and outcrossing rates between transgenic rice and selected weedy rice biotypes using manual and natural crossing experiments to elucidate the key innate factors causing the different outcrossing rates. Hybrid seed sets from manual crossing between transgenic rice and weedy rice varied from 31.8 to 82.7%, which correlated directly with genetic compatibility. Moreover, the significant differences in the quantity of germinated donor pollens and pollen tubes entering the weedy rice ovule directly contributed to the different seed sets. The natural outcrossing rates varied from 0 to 6.66‰. The duration of flowering overlap was the key factor influencing natural outcrossing. Plant and panicle height also affected outcrossing success. From this study, it is concluded that the likelihood of gene flow between transgenic rice and weedy rice biotypes is primarily determined by floral synchronisation and secondarily influenced by genetic compatibility and some morphological characteristics. Copyright © 2011 Society of Chemical Industry.

  11. Classification based upon gene expression data: bias and precision of error rates.

    PubMed

    Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L

    2007-06-01

    Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
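
    The recommendation to report class error rates and a conditional risk can be sketched as follows. This is one common formulation (expected misclassification cost weighted by prior class probabilities) and may differ in detail from the definition used in the paper; the function names and the example numbers are hypothetical.

```python
def class_error_rates(confusion):
    """Per-class error rate.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    return [1.0 - row[i] / sum(row) for i, row in enumerate(confusion)]


def expected_cost(confusion, priors, cost):
    """Expected misclassification cost.

    priors[i]  : assumed prior probability of class i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    total = 0.0
    for i, row in enumerate(confusion):
        n_i = sum(row)
        for j, count in enumerate(row):
            # P(predict j | true i) is estimated by count / n_i
            total += priors[i] * cost[i][j] * count / n_i
    return total
```

    With a 0/1 cost matrix and priors equal to the observed class proportions, this quantity reduces to the simple overall error rate.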

  12. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform as well as the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss incurred by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
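
    For orientation, the standard single-variant score-based meta-analysis that this abstract takes as its baseline is commonly formulated as a sum of per-study score statistics divided by the square root of the summed variances. The sketch below shows only that standard form, not the improved statistics proposed by the authors; the function names are hypothetical.

```python
import math


def meta_score_z(scores, variances):
    """Standard meta-analysis Z statistic from per-study scores U_k and variances V_k."""
    return sum(scores) / math.sqrt(sum(variances))


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

    Only the summary statistics from each study are needed here, which is what makes this approach more convenient than pooling individual-level data.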

  13. APPRIS 2017: principal isoforms for multiple gene sets

    PubMed Central

    Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

    2018-01-01

    The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475

  14. Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data

    PubMed Central

    Toker, Lilah; Rocco, Brad; Sibille, Etienne

    2017-01-01

    Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single-cell RNA-sequencing (RNA-seq) studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets (MGSs) in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at www.neuroexpresso.org. PMID:29204516

  15. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis.

    PubMed

    Kim, SungHwan; Lin, Chien-Wei; Tseng, George C

    2016-07-01

    Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of single expression profile, the performance usually greatly reduces in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical values of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies. We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients. An R package MetaKTSP is available online. (http://tsenglab.biostat.pitt.edu/software.htm). ctseng@pitt.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. 
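
    The idea of averaging top scoring pair (TSP) scores across studies can be sketched as below. Within one study, the score of a gene pair compares how often gene a is expressed below gene b in each class. In this sketch the signed per-study differences are averaged before taking the absolute value, so that pairs whose direction flips between studies cancel out; that choice, the data layout, and all names are assumptions and may differ from the MetaKTSP package.

```python
def tsp_signed_score(expr, labels, gene_a, gene_b):
    """P(a < b | class 0) - P(a < b | class 1) within one study.

    expr   : dict mapping gene name -> list of expression values per sample
    labels : list of 0/1 class labels, one per sample
    """
    def prob_a_below_b(cls):
        idx = [k for k, y in enumerate(labels) if y == cls]
        return sum(expr[gene_a][k] < expr[gene_b][k] for k in idx) / len(idx)

    return prob_a_below_b(0) - prob_a_below_b(1)


def mean_tsp_score(studies, gene_a, gene_b):
    """Average the signed scores over (expr, labels) studies, then take |.|."""
    scores = [tsp_signed_score(e, y, gene_a, gene_b) for e, y in studies]
    return abs(sum(scores) / len(scores))
```

    Because the score depends only on the within-sample ordering of two genes, it is unaffected by monotone per-sample normalization differences between studies, which is one reason rank-based pairs are attractive for cross-study use.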

  16. Altered gene expression changes in Arabidopsis leaf tissues and protoplasts in response to Plum pox virus infection

    PubMed Central

    Babu, Mohan; Griffiths, Jonathan S; Huang, Tyng-Shyan; Wang, Aiming

    2008-01-01

    Background Virus infection induces the activation and suppression of global gene expression in the host. Profiling gene expression changes in the host may provide insights into the molecular mechanisms that underlie host physiological and phenotypic responses to virus infection. In this study, the Arabidopsis Affymetrix ATH1 array was used to assess global gene expression changes in Arabidopsis thaliana plants infected with Plum pox virus (PPV). To identify early genes in response to PPV infection, an Arabidopsis synchronized single-cell transformation system was developed. Arabidopsis protoplasts were transfected with a PPV infectious clone and global gene expression changes in the transfected protoplasts were profiled. Results Microarray analysis of PPV-infected Arabidopsis leaf tissues identified 2013 and 1457 genes that were significantly (Q ≤ 0.05) up- (≥ 2.5 fold) and downregulated (≤ -2.5 fold), respectively. Genes associated with soluble sugar, starch and amino acid, intracellular membrane/membrane-bound organelles, chloroplast, and protein fate were upregulated, while genes related to development/storage proteins, protein synthesis and translation, and cell wall-associated components were downregulated. These gene expression changes were associated with PPV infection and symptom development. Further transcriptional profiling of protoplasts transfected with a PPV infectious clone revealed the upregulation of defence and cellular signalling genes as early as 6 hours post transfection. A cross sequence comparison analysis of genes differentially regulated by PPV-infected Arabidopsis leaves against uniEST sequences derived from PPV-infected leaves of Prunus persica, a natural host of PPV, identified orthologs related to defence, metabolism and protein synthesis. The cross comparison of genes differentially regulated by PPV infection and by the infections of other positive sense RNA viruses revealed a common set of 416 genes. 
    These identified genes, particularly the early responsive genes, may be critical in virus infection. Conclusion Gene expression changes in PPV-infected Arabidopsis are the molecular basis of stress and defence-like responses, PPV pathogenesis and symptom development. The differentially regulated genes, particularly the early responsive genes, and a common set of genes regulated by infections of PPV and other positive sense RNA viruses identified in this study are candidates suitable for further functional characterization to shed light on molecular virus-host interactions. PMID:18613973
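
    The selection criteria quoted in this abstract (Q ≤ 0.05 with at least 2.5-fold up- or downregulation) and the cross comparison that yields a common gene set can be sketched as a simple filter followed by a set intersection. The data layout and function names are hypothetical, and signed fold changes are assumed to be encoded as in the abstract (for example, -2.5 for 2.5-fold downregulation).

```python
def classify_genes(results, fold_cutoff=2.5, q_cutoff=0.05):
    """Split genes into up- and downregulated sets.

    results : dict mapping gene ID -> (signed fold change, q-value)
    """
    up = {g for g, (fc, q) in results.items()
          if q <= q_cutoff and fc >= fold_cutoff}
    down = {g for g, (fc, q) in results.items()
            if q <= q_cutoff and fc <= -fold_cutoff}
    return up, down


def common_genes(*gene_sets):
    """Genes present in every supplied set (e.g. one set per virus infection)."""
    return set.intersection(*gene_sets)
```

    For example, passing one set of differentially regulated genes per virus to `common_genes` returns only the genes shared by all of them.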

  17. Genetically diverse CC-founder mouse strains replicate the human influenza gene expression signature.

    PubMed

    Elbahesh, Husni; Schughart, Klaus

    2016-05-19

    Influenza A viruses (IAV) are zoonotic pathogens that pose a major threat to human and animal health. Influenza virus disease severity is influenced by viral virulence factors as well as individual differences in host response. We analyzed gene expression changes in the blood of infected mice using a previously defined set of signature genes that was derived from changes in the blood transcriptome of IAV-infected human volunteers. We found that the human signature was reproduced well in the founder strains of the Collaborative Cross (CC) mice, thus demonstrating the relevance and importance of mouse experimental model systems for studying human influenza disease.

  18. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine.

    PubMed

    Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida

    2016-03-15

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.

  19. Cross-Study Comparison Reveals Common Genomic, Network, and Functional Signatures of Desiccation Resistance in Drosophila melanogaster

    PubMed Central

    Telonis-Scott, Marina; Sgrò, Carla M.; Hoffmann, Ary A.; Griffin, Philippa C.

    2016-01-01

    Repeated attempts to map the genomic basis of complex traits often yield different outcomes because of the influence of genetic background, gene-by-environment interactions, and/or statistical limitations. However, where repeatability is low at the level of individual genes, overlap often occurs in gene ontology categories, genetic pathways, and interaction networks. Here we report on the genomic overlap for natural desiccation resistance from a Pool-genome-wide association study experiment and a selection experiment in flies collected from the same region in southeastern Australia in different years. We identified over 600 single nucleotide polymorphisms associated with desiccation resistance in flies derived from almost 1,000 wild-caught genotypes, a similar number of loci to that observed in our previous genomic study of selected lines, demonstrating the genetic complexity of this ecologically important trait. By harnessing the power of cross-study comparison, we narrowed the candidates from almost 400 genes in each study to a core set of 45 genes, enriched for stimulus, stress, and defense responses. In addition to gene-level overlap, there was higher order congruence at the network and functional levels, suggesting genetic redundancy in key stress sensing, stress response, immunity, signaling, and gene expression pathways. We also identified variants linked to different molecular aspects of desiccation physiology previously verified from functional experiments. Our approach provides insight into the genomic basis of a complex and ecologically important trait and predicts candidate genetic pathways to explore in multiple genetic backgrounds and related species within a functional framework. PMID:26733490
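
    One common way to judge whether an overlap between two candidate gene lists is larger than chance is a hypergeometric tail probability. The sketch below is an illustration under stated assumptions, not necessarily the test used by the authors: the list sizes (400 each) and the overlap (45) are rounded from the abstract, while the background size of 14,000 genes is a hypothetical value chosen only for this example.

```python
from math import comb


def overlap_pvalue(background, size_a, size_b, overlap):
    """P(X >= overlap) when list B is drawn at random from the background.

    X follows a hypergeometric distribution: `size_b` genes are drawn
    without replacement from `background` genes, of which `size_a`
    belong to list A.
    """
    total = comb(background, size_b)
    tail = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


# Background size is an assumption; list sizes are rounded from the abstract.
p = overlap_pvalue(background=14_000, size_a=400, size_b=400, overlap=45)
print(f"P(overlap >= 45) = {p:.3g}")
```

    The result depends strongly on the background chosen, so the background should be the set of genes that could have been detected in both studies.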

  20. hSAGEing: an improved SAGE-based software for identification of human tissue-specific or common tumor markers and suppressors.

    PubMed

    Yang, Cheng-Hong; Chuang, Li-Yeh; Shih, Tsung-Mu; Chang, Hsueh-Wei

    2010-12-17

    SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common- and tissue-specific tumor markers. To improve the SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data by combining the mathematical set theory and logic with a unique "multi-pool method" that analyzes multiple pools of pair-wise case controls individually. When all the settings are in "inclusion", the common SAGE tag sequences are mined. When one tissue type is in "inclusion" and the other types of tissues are not in "inclusion", the selected tissue-specific SAGE tag sequences are generated. They are displayed in tags-per-million (TPM) and fold values, as well as visually displayed in four kinds of scales in a color gradient pattern. In the fold visualization display, the top scores of the SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries. The hSAGEing tool provides a combination of friendly cross-tissue analysis and an interface for comparing SAGE libraries for the first time. Some up- or down-regulated genes with tissue-specific or common tumor markers and suppressors are identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing.
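
    The inclusion logic described in this abstract maps onto ordinary set operations. The following is a minimal sketch of that idea only; it is not the hSAGEing implementation, and the tissue names and tag strings are hypothetical placeholders.

```python
def common_tags(pools):
    """Tags found in every tissue pool (all pools set to 'inclusion')."""
    tag_sets = [set(tags) for tags in pools.values()]
    return set.intersection(*tag_sets) if tag_sets else set()


def tissue_specific_tags(pools, tissue):
    """Tags found in `tissue` and in none of the other pools."""
    others = set().union(
        *(set(tags) for name, tags in pools.items() if name != tissue)
    )
    return set(pools[tissue]) - others


# Hypothetical pools of differentially expressed tags, one per tissue.
pools = {
    "breast": {"TAG01", "TAG02", "TAG03"},
    "colon": {"TAG01", "TAG03", "TAG04"},
    "lung": {"TAG01", "TAG05"},
}
print(common_tags(pools))                     # {'TAG01'}
print(tissue_specific_tags(pools, "breast"))  # {'TAG02'}
```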

  1. Cross-kingdom amplification using Bacteria-specific primers: Complications for studies of coral microbial ecology

    USGS Publications Warehouse

    Galkiewicz, J.P.; Kellogg, C.A.

    2008-01-01

    PCR amplification of pure bacterial DNA is vital to the study of bacterial interactions with corals. Commonly used Bacteria-specific primers 8F and 27F paired with the universal primer 1492R amplify both eukaryotic and prokaryotic rRNA genes. An alternative primer set, 63F/1542R, is suggested to resolve this problem. Copyright © 2008, American Society for Microbiology. All Rights Reserved.

  2. Identifying differentially expressed genes in cancer patients using a non-parameter Ising model.

    PubMed

    Li, Xumeng; Feltus, Frank A; Sun, Xiaoqian; Wang, James Z; Luo, Feng

    2011-10-01

    Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than those by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by Ising model can improve classification performance compared with DE genes identified by Markov random field model. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Latent Gammaherpesvirus 68 Infection Induces Distinct Transcriptional Changes in Different Organs

    PubMed Central

    Canny, Susan P.; Goel, Gautam; Reese, Tiffany A.; Zhang, Xin; Xavier, Ramnik

    2014-01-01

    Previous studies identified a role for latent herpesvirus infection in cross-protection against infection and exacerbation of chronic inflammatory diseases. Here, we identified more than 500 genes differentially expressed in spleens, livers, or brains of mice latently infected with gammaherpesvirus 68 and found that distinct sets of genes linked to different pathways were altered in the spleen compared to those in the liver. Several of the most differentially expressed latency-specific genes (e.g., the gamma interferon [IFN-γ], Cxcl9, and Ccl5 genes) are associated with known latency-specific phenotypes. Chronic herpesvirus infection, therefore, significantly alters the transcriptional status of host organs. We speculate that such changes may influence host physiology, the status of the immune system, and disease susceptibility. PMID:24155394

  4. Gene expression profiles in whole blood and associations with metabolic dysregulation in obesity.

    PubMed

    Cox, Amanda J; Zhang, Ping; Evans, Tiffany J; Scott, Rodney J; Cripps, Allan W; West, Nicholas P

    Gene expression data provides one tool to gain further insight into the complex biological interactions linking obesity and metabolic disease. This study examined associations between blood gene expression profiles and metabolic disease in obesity. Whole blood gene expression profiles, performed using the Illumina HT-12v4 Human Expression Beadchip, were compared between (i) individuals with obesity (O) or lean (L) individuals (n=21 each), (ii) individuals with (M) or without (H) Metabolic Syndrome (n=11 each) matched on age and gender. Enrichment of differentially expressed genes (DEG) into biological pathways was assessed using Ingenuity Pathway Analysis. Association between sets of genes from biological pathways considered functionally relevant and Metabolic Syndrome were further assessed using an area under the curve (AUC) and cross-validated classification rate (CR). For OvL, only 50 genes were significantly differentially expressed based on the selected differential expression threshold (1.2-fold, p<0.05). For MvH, 582 genes were significantly differentially expressed (1.2-fold, p<0.05) and pathway analysis revealed enrichment of DEG into a diverse set of pathways including immune/inflammatory control, insulin signalling and mitochondrial function pathways. Gene sets from the mTOR signalling pathways demonstrated the strongest association with Metabolic Syndrome (p=8.1×10⁻⁸; AUC: 0.909, CR: 72.7%). These results support the use of expression profiling in whole blood in the absence of more specific tissue types for investigations of metabolic disease. Using a pathway analysis approach it was possible to identify an enrichment of DEG into biological pathways that could be targeted for in vitro follow-up. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
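
    For readers unfamiliar with the AUC reported here, the sketch below computes it as the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The abstract does not state how the authors computed their AUC, so this is a generic illustration; the gene-set scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the Mann-Whitney U statistic divided by the number
    of case/control pairs.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical gene-set scores for three cases and three controls.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```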

  5. Reranking candidate gene models with cross-species comparison for improved gene prediction

    PubMed Central

    Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

    2008-01-01

    Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050
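
    The reranking step can be pictured as re-sorting the K best candidates by a combined score. The sketch below is a simplified illustration, assuming a linear combination of the gene finder's score and a single ortholog-similarity value; the weight, the tuple layout, and the model names are hypothetical and do not reproduce the scoring rules used in the paper.

```python
def rerank(candidates, weight=1.0):
    """Re-sort candidate gene models by base score plus weighted similarity.

    candidates: list of (model_id, base_score, ortholog_similarity).
    Both the layout and the linear combination are assumptions made
    for illustration.
    """
    return sorted(
        candidates,
        key=lambda c: c[1] + weight * c[2],
        reverse=True,
    )


# Hypothetical K = 3 candidates for one locus.
k_best = [
    ("model_1", 0.90, 0.10),  # top choice of the gene finder
    ("model_2", 0.85, 0.60),  # lower base score, strong ortholog match
    ("model_3", 0.80, 0.10),
]
print([model for model, _, _ in rerank(k_best)])
```

    With these values the second model moves to the top, which mirrors the abstract's point that a lower-scoring model may be selected when it closely resembles probable orthologs.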

  6. CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.

    PubMed

    Paananen, Jussi; Storvik, Markus; Wong, Garry

    2006-09-22

    Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.

  7. Expression profiling and cross-species RNA interference (RNAi) of desiccation-induced transcripts in the anhydrobiotic nematode Aphelenchus avenae

    PubMed Central

    2010-01-01

    Background Some organisms can survive extreme desiccation by entering a state of suspended animation known as anhydrobiosis. The free-living mycophagous nematode Aphelenchus avenae can be induced to enter anhydrobiosis by pre-exposure to moderate reductions in relative humidity (RH) prior to extreme desiccation. This preconditioning phase is thought to allow modification of the transcriptome by activation of genes required for desiccation tolerance. Results To identify such genes, a panel of expressed sequence tags (ESTs) enriched for sequences upregulated in A. avenae during preconditioning was created. A subset of 30 genes with significant matches in databases, together with a number of apparently novel sequences, were chosen for further study. Several of the recognisable genes are associated with water stress, encoding, for example, two new hydrophilic proteins related to the late embryogenesis abundant (LEA) protein family. Expression studies confirmed EST panel members to be upregulated by evaporative water loss, and the majority of genes was also induced by osmotic stress and cold, but rather fewer by heat. We attempted to use RNA interference (RNAi) to demonstrate the importance of this gene set for anhydrobiosis, but found A. avenae to be recalcitrant with the techniques used. Instead, therefore, we developed a cross-species RNAi procedure using A. avenae sequences in another anhydrobiotic nematode, Panagrolaimus superbus, which is amenable to gene silencing. Of 20 A. avenae ESTs screened, a significant reduction in survival of desiccation in treated P. superbus populations was observed with two sequences, one of which was novel, while the other encoded a glutathione peroxidase. To confirm a role for glutathione peroxidases in anhydrobiosis, RNAi with cognate sequences from P. superbus was performed and was also shown to reduce desiccation tolerance in this species. 
Conclusions This study has identified and characterised the expression profiles of members of the anhydrobiotic gene set in A. avenae. It also demonstrates the potential of RNAi for the analysis of anhydrobiosis and provides the first genetic data to underline the importance of effective antioxidant systems in metazoan desiccation tolerance. PMID:20085654

  8. Baghdadite ceramics modulate the cross talk between human adipose stem cells and osteoblasts for bone regeneration.

    PubMed

    Lu, Zufu; Wang, Guocheng; Roohani-Esfahani, Iman; Dunstan, Colin R; Zreiqat, Hala

    2014-03-01

    Understanding interactions among the three elements (cells, scaffolds, and bioactive factors) is critical for successful tissue engineering. This study was aimed to investigate how scaffolds would affect osteogenic gene expression in human adipose tissue-derived stem cells (ASCs) or human primary osteoblasts (HOBs), and their cross talk. Either ASCs or HOBs were seeded on Baghdadite (Ca3ZrSi2O9) and hydroxyapatite/tricalcium phosphate (HA/TCP) scaffolds, and osteogenic gene expression was assessed. To further evaluate how substrate affected HOB and ASC cross talk, an indirect co-culture system with semipermeable inserts placed on the culture plate was set up to co-culture ASCs or HOBs, which were grown in monolayer or seeded on Baghdadite or HA/TCP scaffolds, and osteogenic differentiation of the cells was assessed. We found that Baghdadite scaffolds induced a significantly greater increase in RUNX2, osteopontin, bone sialoprotein, and osteocalcin gene expression in HOBs in comparison to HA/TCP scaffolds; Baghdadite scaffolds also significantly induced RUNX2 and osteopontin, but not bone sialoprotein and osteocalcin gene expression in ASCs. In the co-culture system, the HOBs on Baghdadite scaffolds more markedly promoted osteogenic gene expression in ASCs compared to HOBs in monolayer or the HOBs on HA/TCP scaffolds. In addition, the ASCs seeded on Baghdadite scaffolds more markedly promoted osteogenic gene expression in HOBs than did the ASCs on HA/TCP scaffolds. BMP-2 expression in ASCs or HOBs was increased when they were seeded on Baghdadite scaffolds, and adding Noggin into the co-culture medium largely abrogated Baghdadite scaffold-modulated ASC-HOB cross talk. In summary, Baghdadite scaffolds not only promote the osteogenic differentiation of HOBs or ASCs but also modulate the cross talk between ASCs and HOBs, in part via increasing BMP2 expression, thereby promoting their osteogenic differentiation.

  9. Validation of RNAi Silencing Efficiency Using Gene Array Data shows 18.5% Failure Rate across 429 Independent Experiments.

    PubMed

    Munkácsy, Gyöngyi; Sztupinszki, Zsófia; Herman, Péter; Bán, Bence; Pénzváltó, Zsófia; Szarvas, Nóra; Győrffy, Balázs

    2016-09-27

    No independent cross-validation of success rate for studies utilizing small interfering RNA (siRNA) for gene silencing has been completed before. To assess the influence of experimental parameters like cell line, transfection technique, validation method, and type of control, we have to validate these in a large set of studies. We utilized gene chip data published for siRNA experiments to assess success rate and to compare methods used in these experiments. We searched NCBI GEO for samples with whole transcriptome analysis before and after gene silencing and evaluated the efficiency for the target and off-target genes using the array-based expression data. Wilcoxon signed-rank test was used to assess silencing efficacy and Kruskal-Wallis tests and Spearman rank correlation were used to evaluate study parameters. Altogether, 1,643 samples representing 429 experiments published in 207 studies were evaluated. The fold change (FC) of down-regulation of the target gene was above 0.7 in 18.5% and was above 0.5 in 38.7% of experiments. Silencing efficiency was lowest in MCF7 and highest in SW480 cells (FC = 0.59 and FC = 0.30, respectively, P = 9.3E-06). Studies utilizing Western blot for validation performed better than those with quantitative polymerase chain reaction (qPCR) or microarray (FC = 0.43, FC = 0.47, and FC = 0.55, respectively, P = 2.8E-04). There was no correlation between type of control, transfection method, publication year, and silencing efficiency. Although gene silencing is a robust feature successfully cross-validated in the majority of experiments, efficiency remained insufficient in a significant proportion of studies. Selection of cell line model and validation method had the highest influence on silencing proficiency.
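
    The failure rates quoted above are proportions of experiments whose target-gene fold change exceeds a cutoff. A minimal sketch of that calculation follows, assuming FC is the ratio of target expression after silencing to expression in the control (the abstract does not spell out the definition). The FC values are hypothetical and do not come from the study.

```python
def fraction_above(fold_changes, threshold):
    """Share of experiments whose fold change exceeds `threshold`."""
    if not fold_changes:
        raise ValueError("no experiments supplied")
    return sum(fc > threshold for fc in fold_changes) / len(fold_changes)


# Hypothetical target-gene fold changes for eight experiments.
fcs = [0.20, 0.35, 0.55, 0.75, 0.90, 0.40, 0.30, 0.65]
print(fraction_above(fcs, 0.7))  # 0.25 (2 of 8 experiments)
print(fraction_above(fcs, 0.5))  # 0.5  (4 of 8 experiments)
```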

  10. Heterologous oligonucleotide microarrays for transcriptomics in a non-model species; a proof-of-concept study of drought stress in Musa

    PubMed Central

    Davey, Mark W; Graham, Neil S; Vanholme, Bartel; Swennen, Rony; May, Sean T; Keulemans, Johan

    2009-01-01

    Background 'Systems-wide' approaches such as microarray RNA-profiling are ideally suited to the study of the complex overlapping responses of plants to biotic and abiotic stresses. However, commercial microarrays are only available for a limited number of plant species and development costs are so substantial as to be prohibitive for most research groups. Here we evaluate the use of cross-hybridisation to Affymetrix oligonucleotide GeneChip® microarrays to profile the response of the banana (Musa spp.) leaf transcriptome to drought stress using a genomic DNA (gDNA)-based probe-selection strategy to improve the efficiency of detection of differentially expressed Musa transcripts. Results Following cross-hybridisation of Musa gDNA to the Rice GeneChip® Genome Array, ~33,700 gene-specific probe-sets had a sufficiently high degree of homology to be retained for transcriptomic analyses. In a proof-of-concept approach, pooled RNA representing a single biological replicate of control and drought stressed leaves of the Musa cultivar 'Cachaco' were hybridised to the Affymetrix Rice Genome Array. A total of 2,910 Musa gene homologues with a >2-fold difference in expression levels were subsequently identified. These drought-responsive transcripts included many functional classes associated with plant biotic and abiotic stress responses, as well as a range of regulatory genes known to be involved in coordinating abiotic stress responses. This latter group included members of the ERF, DREB, MYB, bZIP and bHLH transcription factor families. Fifty-two of these drought-sensitive Musa transcripts were homologous to genes underlying QTLs for drought and cold tolerance in rice, including in 2 instances QTLs associated with a single underlying gene. The list of drought-responsive transcripts also included genes identified in publicly-available comparative transcriptomics experiments. 
Conclusion Our results demonstrate that despite the general paucity of nucleotide sequence data in Musa and only distant phylogenetic relations to rice, gDNA probe-based cross-hybridisation to the Rice GeneChip® is a highly promising strategy to study complex biological responses and illustrates the potential of such strategies for gene discovery in non-model species. PMID:19758430

  11. Unique molecular changes in kidney allografts after simultaneous liver-kidney compared with solitary kidney transplantation.

    PubMed

    Taner, Timucin; Park, Walter D; Stegall, Mark D

    2017-05-01

    Kidney allografts transplanted simultaneously with liver allografts from the same donor are known to be immunologically privileged. This is especially evident in recipients with high levels of donor-specific anti-HLA antibodies. Here we investigated the mechanisms of liver's protective impact using gene expression in the kidney allograft. Select solitary kidney transplant or simultaneous liver-kidney transplant recipients were retrospectively reviewed and separated into four groups: 16 cross-match negative kidney transplants, 15 cross-match positive kidney transplants, 12 cross-match negative simultaneous liver-kidney transplants, and nine cross-match-positive simultaneous liver-kidney transplants. Surveillance biopsies of cross-match-positive kidney transplants had increased expression of genes associated with donor-specific antigens, inflammation, and endothelial cell activation compared to cross-match-negative kidney transplants. These changes were not found in cross-match-positive simultaneous liver-kidney transplant biopsies when compared to cross-match-negative simultaneous liver-kidney transplants. In addition, simultaneously transplanting a liver markedly increased renal expression of genes associated with tissue integrity/metabolism, regardless of the cross-match status. While the expression of inflammatory gene sets in cross-match-positive simultaneous liver-kidney transplants was not completely reduced to the level of cross-match-negative kidney transplants, the downstream effects of donor-specific anti-HLA antibodies were blocked. Thus, simultaneous liver-kidney transplants can have a profound impact on the kidney allograft, not only by decreasing inflammation and avoiding endothelial cell activation in cross-match-positive recipients, but also by increasing processes associated with tissue integrity/metabolism by unknown mechanisms. Copyright © 2017 International Society of Nephrology. Published by Elsevier Inc. All rights reserved.

  12. Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification.

    PubMed

    Oberthuer, André; Berthold, Frank; Warnat, Patrick; Hero, Barbara; Kahlert, Yvonne; Spitz, Rüdiger; Ernestus, Karen; König, Rainer; Haas, Stefan; Eils, Roland; Schwab, Manfred; Brors, Benedikt; Westermann, Frank; Fischer, Matthias

    2006-11-01

    To develop a gene expression-based classifier for neuroblastoma patients that reliably predicts courses of the disease. Two hundred fifty-one neuroblastoma specimens were analyzed using a customized oligonucleotide microarray comprising 10,163 probes for transcripts with differential expression in clinical subgroups of the disease. Subsequently, the prediction analysis for microarrays (PAM) was applied to a first set of patients with maximally divergent clinical courses (n = 77). The classification accuracy was estimated by a complete 10-times-repeated 10-fold cross validation, and a 144-gene predictor was constructed from this set. This classifier's predictive power was evaluated in an independent second set (n = 174) by comparing results of the gene expression-based classification with those of risk stratification systems of current trials from Germany, Japan, and the United States. The first set of patients was accurately predicted by PAM (cross-validated accuracy, 99%). Within the second set, the PAM classifier significantly separated cohorts with distinct courses (3-year event-free survival [EFS] 0.86 +/- 0.03 [favorable; n = 115] v 0.52 +/- 0.07 [unfavorable; n = 59] and 3-year overall survival 0.99 +/- 0.01 v 0.84 +/- 0.05; both P < .0001) and separated risk groups of current neuroblastoma trials into subgroups with divergent outcome (NB2004: low-risk 3-year EFS 0.86 +/- 0.04 v 0.25 +/- 0.15, P < .0001; intermediate-risk 1.00 v 0.57 +/- 0.19, P = .018; high-risk 0.81 +/- 0.10 v 0.56 +/- 0.08, P = .06). In a multivariate Cox regression model, the PAM predictor classified patients of the second set more accurately than risk stratification of current trials from Germany, Japan, and the United States (P < .001; hazard ratio, 4.756 [95% CI, 2.544 to 8.893]). Integration of gene expression-based class prediction of neuroblastoma patients may improve risk estimation of current neuroblastoma trials.
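
    The 10-times-repeated 10-fold cross-validation mentioned in this abstract can be sketched as an index generator. This is a generic illustration using only the Python standard library, not the authors' pipeline; the function name and the seed are assumptions. Only the sample count of the first set (n = 77) is taken from the abstract.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Samples are reshuffled before every repeat, then dealt into k folds
    whose sizes differ by at most one.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [
                i for j, fold in enumerate(folds) if j != f for i in fold
            ]
            yield rep, f, train_idx, test_idx


splits = list(repeated_kfold(77))
print(len(splits))  # 100 train/test splits (10 repeats x 10 folds)
```

    To keep the accuracy estimate unbiased, any gene selection and model fitting would be redone inside each training split, so that the held-out samples never influence the predictor being evaluated.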

  13. Type-2 diabetes-associated variants with cross-trait relevance: Post-GWAs strategies for biological function interpretation.

    PubMed

    Frau, Francesca; Crowther, Daniel; Ruetten, Hartmut; Allebrandt, Karla V

    2017-05-01

    Genome-wide association studies (GWAs) for type 2 diabetes (T2D) have been successful in identifying many loci with robust association signals. Nevertheless, there is a clear need for post-GWAs strategies to understand mechanism of action and clinical relevance of these variants. The association of several comorbidities with T2D suggests a common etiology for these phenotypes and complicates the management of the disease. In this study, we focused on the genetics underlying these relationships, using systems genomics to identify genetic variation associated with T2D and 12 other traits. GWAs summary statistics for pairwise comparisons were obtained for glycemic traits, obesity, coronary artery disease, and lipids from large consortia GWAs meta-analyses. We used a network medicine approach to leverage experimental information about the identified genes and variants with cross-trait effects for biological function interpretation. We identified a set of 38 genetic variants with cross-trait effects that point to a main network of genes that should be relevant for T2D and its comorbidities. We prioritized the T2D associated genes based on the number of traits they showed association with and the experimental evidence showing their relation to the disease etiology. In this study, we demonstrated how systems genomics and network medicine approaches can shed light on GWAs discoveries, translating findings into a more therapeutically relevant context. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  14. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common as results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. 
Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.
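
    The idea of an external leave-one-out loop that flags possibly mislabeled samples can be sketched as follows. This is a minimal illustration only, not the MFMW-outlier model: a simple mean-difference filter and a nearest-centroid classifier stand in for the filters, wrappers and L1-norm SVM of the abstract, and the function names and toy data are assumptions made for the example. The key point it shows is that gene selection is repeated inside every fold, so the held-out sample never influences which genes are chosen.

```python
from statistics import mean

def select_genes(X, y, k):
    """Filter step (stand-in): rank genes by |mean(class 0) - mean(class 1)|."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(mean(a) - mean(b)))
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]

def nearest_centroid(X, y, genes, sample):
    """Classifier (stand-in): assign the label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_suspects(X, y, k=2):
    """External LOOCV: select genes and train without sample i, then test on i.
    Samples whose given label disagrees with the prediction are returned."""
    suspects = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k)  # selection inside the fold
        if nearest_centroid(X_train, y_train, genes, X[i]) != y[i]:
            suspects.append(i)
    return suspects

# Hypothetical toy data: 7 samples x 3 genes; sample 5 carries label 0
# although its expression values resemble class 1.
X = [[1.0, 1.1, 5.0], [0.9, 1.0, 4.0], [1.1, 0.9, 6.0],
     [3.0, 3.1, 5.5], [3.1, 2.9, 4.5], [2.9, 3.0, 5.0], [3.0, 3.0, 4.8]]
y = [0, 0, 0, 1, 1, 0, 1]
print(loocv_suspects(X, y))
```

    A flagged sample is only a candidate for relabeling; in practice it would be checked against the original annotations before being removed.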

  15. Profiling conserved biological pathways in Autosomal Dominant Polycystic Kidney Disorder (ADPKD) to elucidate key transcriptomic alterations regulating cystogenesis: A cross-species meta-analysis approach.

    PubMed

    Chatterjee, Shatakshee; Verma, Srikant Prasad; Pandey, Priyanka

    2017-09-05

    Initiation and progression of fluid filled cysts mark Autosomal Dominant Polycystic Kidney Disease (ADPKD). Thus, improved therapeutics targeting cystogenesis remains a constant challenge. Microarray studies in single ADPKD animal model species with limited sample sizes tend to provide scattered views on underlying ADPKD pathogenesis. Thus we aim to perform a cross-species meta-analysis to profile conserved biological pathways that might be key targets for therapy. Nine ADPKD microarray datasets on rat, mice and human fulfilled our study criteria and were chosen. Intra-species combined analysis was performed after considering removal of batch effect. Significantly enriched GO biological processes and KEGG pathways were computed and their overlap was observed. For the conserved pathways, biological modules and gene regulatory networks were observed. Additionally, Gene Set Enrichment Analysis (GSEA) using Molecular Signature Database (MSigDB) was performed for genes found in conserved pathways. We obtained 28 modules of significantly enriched GO processes and 5 major functional categories from significantly enriched KEGG pathways conserved in human, mice and rats that in turn suggest a global transcriptomic perturbation affecting cyst formation, growth and progression. Significantly enriched pathways obtained from up-regulated genes such as Genomic instability, Protein localization in ER and Insulin Resistance were found to regulate cyst formation and growth whereas cyst progression due to increased cell adhesion and inflammation was suggested by perturbations in Angiogenesis, TGF-beta, CAMs, and Infection related pathways. Additionally, networks revealed shared genes among pathways e.g. SMAD2 and SMAD7 in Endocytosis and TGF-beta. Our study suggests cyst formation and progression to be an outcome of interplay between a set of several key deregulated pathways. Thus, further translational research is warranted focusing on developing a combinatorial therapeutic approach for ADPKD redressal. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Genes associated with metabolic syndrome predict disease-free survival in stage II colorectal cancer patients. A novel link between metabolic dysregulation and colorectal cancer.

    PubMed

    Vargas, Teodoro; Moreno-Rubio, Juan; Herranz, Jesús; Cejas, Paloma; Molina, Susana; González-Vallinas, Margarita; Ramos, Ricardo; Burgos, Emilio; Aguayo, Cristina; Custodio, Ana B; Reglero, Guillermo; Feliu, Jaime; Ramírez de Molina, Ana

    2014-12-01

    Studies have recently suggested that metabolic syndrome and its components increase the risk of colorectal cancer. Both diseases are increasing in most countries, and the genetic association between them has not been fully elucidated. The objective of this study was to assess the association between genetic risk factors of metabolic syndrome or related conditions (obesity, hyperlipidaemia, diabetes mellitus type 2) and clinical outcome in stage II colorectal cancer patients. Expression levels of several genes related to metabolic syndrome and associated alterations were analysed by real-time qPCR in two equivalent but independent sets of stage II colorectal cancer patients. Using logistic regression models and cross-validation analysis with all tumour samples, we developed a metabolic syndrome-related gene expression profile to predict clinical outcome in stage II colorectal cancer patients. The results showed that a gene expression profile constituted by genes previously related to metabolic syndrome was significantly associated with clinical outcome of stage II colorectal cancer patients. This metabolic profile was able to identify patients with a low risk and high risk of relapse. Its predictive value was validated using an independent set of stage II colorectal cancer patients. The identification of a set of genes related to metabolic syndrome that predict survival in intermediate-stage colorectal cancer patients allows delineation of a high-risk group that may benefit from adjuvant therapy and avoid the toxic and unnecessary chemotherapy in patients classified as low risk. Our results also confirm the linkage between metabolic disorder and colorectal cancer and suggest the potential for cancer prevention and/or treatment by targeting these genes. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  17. Identification and fine-mapping of Xa33, a novel gene for resistance to Xanthomonas oryzae pv. oryzae.

    PubMed

    Kumar, P Natraj; Sujatha, K; Laha, G S; Rao, K Srinivasa; Mishra, B; Viraktamath, B C; Hari, Y; Reddy, C S; Balachandran, S M; Ram, T; Madhav, M Sheshu; Rani, N Shobha; Neeraja, C N; Reddy, G Ashok; Shaik, H; Sundaram, R M

    2012-02-01

    Broadening of the genetic base for identification and transfer of genes for resistance to insect pests and diseases from wild relatives of rice is an important strategy in resistance breeding programs across the world. An accession of Oryza nivara, International Rice Germplasm Collection (IRGC) accession number 105710, was identified to exhibit a high level of broad-spectrum resistance to Xanthomonas oryzae pv. oryzae. In order to study the genetics of resistance and to tag and map the resistance gene or genes present in IRGC 105710, it was crossed with the bacterial blight (BB)-susceptible varieties 'TN1' and 'Samba Mahsuri' (SM) and then backcrossed to generate backcross mapping populations. Analysis of these populations and their progeny testing revealed that a single dominant gene controls resistance in IRGC 105710. The BC(1)F(2) population derived from the cross IRGC 105710/TN1//TN1 was screened with a set of 72 polymorphic simple-sequence repeat (SSR) markers distributed across the rice genome and the resistance gene was coarse mapped on chromosome 7 between the SSR markers RM5711 and RM6728 at a genetic distance of 17.0 and 19.3 centimorgans (cM), respectively. After analysis involving 49 SSR markers located between the genomic interval spanned by RM5711 and RM6728, and a BC(2)F(2) population consisting of 2,011 individuals derived from the cross IRGC 105710/TN1//TN1, the gene was fine mapped between two SSR markers (RMWR7.1 and RMWR7.6) located at a genetic distance of 0.9 and 1.2 cM, respectively, from the gene and flanking it. The linkage distances were validated in a BC(1)F(2) mapping population derived from the cross IRGC 105710/SM//2 × SM. The BB resistance gene present in the O. nivara accession was identified to be novel based on its unique map location on chromosome 7 and wider spectrum of BB resistance; this gene has been named Xa33. The genomic region between the two closely flanking SSR markers was in silico analyzed for putatively expressed candidate genes. In total, eight genes were identified in the region and a putative gene encoding serine/threonine kinase appears to be a candidate for the Xa33 gene.

  18. DNA Repair Mechanism Gene, XRCC1A (Arg194Trp) but not XRCC3 (Thr241Met) Polymorphism Increased the Risk of Breast Cancer in Premenopausal Females: A Case-Control Study in Northeastern Region of India.

    PubMed

    Devi, K Rekha; Ahmed, Jishan; Narain, Kanwar; Mukherjee, Kaustab; Majumdar, Gautam; Chenkual, Saia; Zonunmawia, Jason C

    2017-12-01

    X-ray repair cross complementary group gene is one of the most studied candidate genes involved in different types of cancers. Studies have shown that X-ray repair cross complementary genes are significantly associated with increased risk of breast cancer in females. Moreover, studies have revealed that X-ray repair cross complementary gene polymorphism significantly varies between and within different ethnic groups globally. The present case-control study was aimed to investigate the association of X-ray repair cross complementary 1A (Arg194Trp) and X-ray repair cross complementary 3 (Thr241Met) polymorphism with the risk of breast cancer in females from northeastern region of India. The present case-control study includes 464 histopathologically confirmed and newly diagnosed cases with breast cancer and 534 apparently healthy neighborhood community controls. Information on sociodemographic factors and putative risk factors was collected from each study participant by conducting face-to-face interviews. Genotyping of X-ray repair cross complementary 1A (Arg194Trp) and X-ray repair cross complementary 3 (Thr241Met) was carried out by polymerase chain reaction-restriction fragment length polymorphism. For statistical analysis, both univariate and multivariate logistic regression analyses were performed. We also performed stratified analysis to find out the association of X-ray repair cross complementary genes with the risk of breast cancer stratified based on menstrual status. This study revealed that tryptophan allele (R/W-W/W genotype) in X-ray repair cross complementary 1A (Arg194Trp) gene significantly increased the risk of breast cancer (adjusted odds ratio = 1.44, 95% confidence interval = 1.06-1.97, P < .05 for R/W-W/W genotype). Moreover, it was found that tryptophan allele (W/W genotype) at codon 194 of X-ray repair cross complementary 1A (Arg194Trp) gene significantly increased the risk of breast cancer in premenopausal females (crude odds ratio = 1.66, 95% confidence interval = 1.11-2.46, P < .05 for R/W-W/W genotype). The present study did not reveal any significant association of X-ray repair cross complementary 3 (Thr241Met) polymorphism with the risk of breast cancer. The present study has shown that X-ray repair cross complementary 1A (Arg194Trp) gene polymorphism is significantly associated with the increased risk of breast cancer in premenopausal females from northeastern region of India, which may be beneficial for prognostic purposes.
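
    For readers unfamiliar with how a crude odds ratio and its 95% confidence interval are obtained from a case-control 2x2 table, a minimal sketch follows. The abstract does not report genotype counts, so the counts used here are purely hypothetical, and the function name is an assumption. The interval uses the standard log-odds (Woolf) approximation; an adjusted odds ratio such as the one reported above would instead come from a multivariate logistic regression.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald-type confidence interval.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    z: normal quantile (1.96 gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, NOT taken from the study
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

    An interval that excludes 1 corresponds to a significant association at the chosen level.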

  19. DNA Repair Mechanism Gene, XRCC1A (Arg194Trp) but not XRCC3 (Thr241Met) Polymorphism Increased the Risk of Breast Cancer in Premenopausal Females: A Case–Control Study in Northeastern Region of India

    PubMed Central

    Ahmed, Jishan; Narain, Kanwar; Mukherjee, Kaustab; Majumdar, Gautam; Chenkual, Saia; Zonunmawia, Jason C.

    2017-01-01

    X-ray repair cross complementary group gene is one of the most studied candidate genes involved in different types of cancers. Studies have shown that X-ray repair cross complementary genes are significantly associated with increased risk of breast cancer in females. Moreover, studies have revealed that X-ray repair cross complementary gene polymorphism significantly varies between and within different ethnic groups globally. The present case–control study was aimed to investigate the association of X-ray repair cross complementary 1A (Arg194Trp) and X-ray repair cross complementary 3 (Thr241Met) polymorphism with the risk of breast cancer in females from northeastern region of India. The present case–control study includes 464 histopathologically confirmed and newly diagnosed cases with breast cancer and 534 apparently healthy neighborhood community controls. Information on sociodemographic factors and putative risk factors was collected from each study participant by conducting face-to-face interviews. Genotyping of X-ray repair cross complementary 1A (Arg194Trp) and X-ray repair cross complementary 3 (Thr241Met) was carried out by polymerase chain reaction-restriction fragment length polymorphism. For statistical analysis, both univariate and multivariate logistic regression analyses were performed. We also performed stratified analysis to find out the association of X-ray repair cross complementary genes with the risk of breast cancer stratified based on menstrual status. This study revealed that tryptophan allele (R/W-W/W genotype) in X-ray repair cross complementary 1A (Arg194Trp) gene significantly increased the risk of breast cancer (adjusted odds ratio = 1.44, 95% confidence interval = 1.06-1.97, P < .05 for R/W-W/W genotype). Moreover, it was found that tryptophan allele (W/W genotype) at codon 194 of X-ray repair cross complementary 1A (Arg194Trp) gene significantly increased the risk of breast cancer in premenopausal females (crude odds ratio = 1.66, 95% confidence interval = 1.11-2.46, P < .05 for R/W-W/W genotype). The present study did not reveal any significant association of X-ray repair cross complementary 3 (Thr241Met) polymorphism with the risk of breast cancer. The present study has shown that X-ray repair cross complementary 1A (Arg194Trp) gene polymorphism is significantly associated with the increased risk of breast cancer in premenopausal females from northeastern region of India, which may be beneficial for prognostic purposes. PMID:29332455

  20. Maternal exposure to carbamazepine at environmental concentrations can cross intestinal and placental barriers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaushik, Gaurav, E-mail: kausgaur@isu.edu; Department of Medical Pathology and Laboratory Medicine, University of California at Davis, Davis, CA 95817; Institute for Pediatric Regenerative Medicine, Shriners Hospitals for Children, Northern California, 2425 Stockton Boulevard, Sacramento, CA 95817

    Psychoactive pharmaceuticals have been found as teratogens at clinical dosage during pregnancy. These pharmaceuticals have also been detected in minute (ppb) concentrations in drinking water in the US, and are environmental contaminants that may be complicit in triggering neurological disorders in genetically susceptible individuals. Previous studies have determined that psychoactive pharmaceuticals (fluoxetine, venlafaxine and carbamazepine) at environmentally relevant concentrations enriched sets of genes regulating development and function of the nervous system in fathead minnows. Altered gene sets were also associated with potential neurological disorders, including autism spectrum disorders (ASD). Subsequent in vitro studies indicated that psychoactive pharmaceuticals altered ASD-associated synaptic protein expression and gene expression in human neuronal cells. However, it is unknown if environmentally relevant concentrations of these pharmaceuticals are able to cross biological barriers from mother to fetus, thus potentially posing risks to nervous system development. The main objective of this study was to test whether psychoactive pharmaceuticals (fluoxetine, venlafaxine, and carbamazepine) administered through the drinking water at environmental concentrations to pregnant mice could reach the brain of the developing embryo by crossing intestinal and placental barriers. We addressed this question by adding ²H-isotope labeled pharmaceuticals to the drinking water of female mice for 20 days (10 pre- and 10 post-conception days), and quantifying ²H-isotope enrichment signals in the dam liver and brain of developing embryos using isotope ratio mass spectrometry. Significant levels of ²H enrichment were detected in the brain of embryos and livers of carbamazepine-treated mice but not in those of control dams, or for fluoxetine or venlafaxine application. These results provide the first evidence that carbamazepine in drinking water and at typical environmental concentrations is transmitted from mother to embryo. Our results, combined with previous evidence that carbamazepine may be associated with ASD in infants, warrant the closer examination of psychoactive pharmaceuticals in drinking water and their potential association with neurodevelopmental disorders.

  1. Chromosome doubling to overcome the chrysanthemum cross barrier based on insight from transcriptomic and proteomic analyses.

    PubMed

    Zhang, Fengjiao; Hua, Lichun; Fei, Jiangsong; Wang, Fan; Liao, Yuan; Fang, Weimin; Chen, Fadi; Teng, Nianjun

    2016-08-09

    Cross breeding is the most commonly used method in chrysanthemum (Chrysanthemum morifolium) breeding; however, cross barriers always exist in these combinations. Many studies have shown that paternal chromosome doubling can often overcome hybridization barriers during cross breeding, although the underlying mechanism has seldom been investigated. In this study, we performed two crosses: C. morifolium (pollen receptor) × diploid C. nankingense (pollen donor) and C. morifolium × tetraploid C. nankingense. Seeds were obtained only from the latter cross. RNA-Seq and isobaric tags for relative and absolute quantitation (iTRAQ) were used to investigate differentially expressed genes and proteins during key embryo development stages in the latter cross. A previously performed cross, C. morifolium × diploid C. nankingense, was compared to our results and revealed that transcription factors (i.e., the agamous-like MADS-box protein AGL80 and the leucine-rich repeat receptor protein kinase EXS), hormone-responsive genes (auxin-binding protein 1), genes and proteins related to metabolism (ATP-citrate synthase, citrate synthase and malate dehydrogenase) and other genes reported to contribute to embryo development (i.e., LEA, elongation factor and tubulin) had higher expression levels in the C. morifolium × tetraploid C. nankingense cross. In contrast, genes related to senescence and cell death were down-regulated in the C. morifolium × tetraploid C. nankingense cross. The data resources helped elucidate the gene and protein expression profiles and identify functional genes during different development stages. When the chromosomes from the male parent are doubled, the genes contributing to normal embryo development are more abundant. However, genes with negative functions were suppressed, suggesting that chromosome doubling may epigenetically inhibit the expression of these genes and allow the embryo to develop normally.

  2. Multi-species data integration and gene ranking enrich significant results in an alcoholism genome-wide association study.

    PubMed

    Zhao, Zhongming; Guo, An-Yuan; van den Oord, Edwin J C G; Aliev, Fazil; Jia, Peilin; Edenberg, Howard J; Riley, Brien P; Dick, Danielle M; Bettinger, Jill C; Davies, Andrew G; Grotewiel, Michael S; Schuckit, Marc A; Agrawal, Arpana; Kramer, John; Nurnberger, John I; Kendler, Kenneth S; Webb, Bradley T; Miles, Michael F

    2012-01-01

    A variety of species and experimental designs have been used to study genetic influences on alcohol dependence, ethanol response, and related traits. Integration of these heterogeneous data can be used to produce a ranked target gene list for additional investigation. In this study, we performed a unique multi-species evidence-based data integration using three microarray experiments in mice or humans that generated an initial alcohol dependence (AD) related genes list, human linkage and association results, and gene sets implicated in C. elegans and Drosophila. We then used permutation and false discovery rate (FDR) analyses on the genome-wide association studies (GWAS) dataset from the Collaborative Study on the Genetics of Alcoholism (COGA) to evaluate the ranking results and weighting matrices. We found one weighting score matrix could increase FDR based q-values for a list of 47 genes with a score greater than 2. Our follow up functional enrichment tests revealed these genes were primarily involved in brain responses to ethanol and neural adaptations occurring with alcoholism. These results, along with our experimental validation of specific genes in mice, C. elegans and Drosophila, suggest that a cross-species evidence-based approach is useful to identify candidate genes contributing to alcoholism.
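
    The weighting-and-ranking step described above can be illustrated with a small sketch. It assumes each evidence source contributes a fixed weight to every gene it implicates and that a gene's score is the sum of those weights; the source names, weights and gene identifiers below are hypothetical placeholders, not the study's weighting matrix. Only the cutoff "score greater than 2" is taken from the abstract.

```python
def rank_genes(evidence, weights):
    """Sum the weight of every evidence source that implicates a gene,
    then return (gene, score) pairs sorted by descending score."""
    scores = {}
    for source, genes in evidence.items():
        for gene in genes:
            scores[gene] = scores.get(gene, 0.0) + weights[source]
    # Ties are broken alphabetically so the ordering is deterministic
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical evidence sets and weights (placeholders for illustration)
evidence = {
    "mouse_microarray": {"GENE_A", "GENE_B"},
    "human_linkage": {"GENE_A", "GENE_C"},
    "worm_fly_screens": {"GENE_A", "GENE_B", "GENE_D"},
}
weights = {"mouse_microarray": 1.0, "human_linkage": 1.5, "worm_fly_screens": 0.5}

ranked = rank_genes(evidence, weights)
candidates = [gene for gene, score in ranked if score > 2]
print(ranked)
print(candidates)
```

    Different weighting matrices can be compared by rerunning the ranking and checking how the selected genes perform in an independent dataset, which is the role the GWAS permutation and FDR analyses play in the study.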

  3. Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.

    PubMed

    Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas

    2017-01-21

    We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses at various tissues, respectively. We tested differential gene expression by empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also presented that 12.8% and 59.0% pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% pairwise studies had independent expression association for genes, but no studies were observed significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to the gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes at different tissues.
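
    The abstract does not name the meta-analysis methods it applied. As one common way of pooling per-study evidence for a gene, Fisher's combined probability test is sketched here; it is offered as an assumption for illustration, not as the authors' procedure. It assumes the studies are independent, and the function name is hypothetical. For k p-values, the statistic -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic x = -2 * sum(ln p) is chi-square with 2k degrees of
    freedom; for even degrees of freedom the upper tail probability is
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical per-study p-values for one gene
print(fisher_combined_p([0.05, 0.05]))
```

    Because the method ignores the direction of effect, it would normally be applied separately to up- and down-regulation when direction matters.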

  4. Genetic analysis and fine mapping of LH1 and LH2, a set of complementary genes controlling late heading in rice (Oryza sativa L.)

    PubMed Central

    Liu, Shuang; Wang, Feng; Gao, Li Jun; Li, Jin Hua; Li, Rong Bai; Gao, Han Liang; Deng, Guo Fu; Yang, Jin Shui; Luo, Xiao Jin

    2012-01-01

    Heading date in rice (Oryza sativa L.) is a critical agronomic trait with a complex inheritance. To investigate the genetic basis and mechanism of gene interaction in heading date, we conducted genetic analysis on segregation populations derived from crosses among the indica cultivars Bo B, Yuefeng B and Baoxuan 2. A set of dominant complementary genes controlling late heading, designated LH1 and LH2, were detected by molecular marker mapping. Genetic analysis revealed that Baoxuan 2 contains both dominant genes, while Bo B and Yuefeng B each possess either LH1 or LH2. Using larger populations with segregant ratios of 3 : 1, we fine-mapped LH1 to a 63-kb region near the centromere of chromosome 7 flanked by markers RM5436 and RM8034, and LH2 to a 177-kb region on the short arm of chromosome 8 between flanking markers Indel22468-3 and RM25. Some candidate genes were identified through sequencing of Bo B and Yuefeng B in these target regions. Our work provides a solid foundation for further study on gene interaction in heading date and has application in marker-assisted breeding of photosensitive hybrid rice in China. PMID:23341744
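
    Whether a mapping population fits the 3 : 1 segregation ratio mentioned above is usually checked with a chi-square goodness-of-fit test. A minimal sketch follows; the plant counts are hypothetical (the abstract gives none), and the function name is an assumption. With two phenotype classes the test has one degree of freedom, for which the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_3_to_1(n_major, n_minor):
    """Goodness-of-fit test of observed counts against a 3:1 ratio.

    n_major: plants in the class expected at 3/4
    n_minor: plants in the class expected at 1/4
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    total = n_major + n_minor
    exp_major, exp_minor = 0.75 * total, 0.25 * total
    chi2 = ((n_major - exp_major) ** 2 / exp_major
            + (n_minor - exp_minor) ** 2 / exp_minor)
    # Upper tail of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts for illustration only
chi2, p = chi_square_3_to_1(290, 110)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

    A p-value above the chosen threshold means the data are compatible with a 3 : 1 ratio; it does not by itself prove single-gene segregation.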

  5. Genetic analysis and fine mapping of LH1 and LH2, a set of complementary genes controlling late heading in rice (Oryza sativa L.).

    PubMed

    Liu, Shuang; Wang, Feng; Gao, Li Jun; Li, Jin Hua; Li, Rong Bai; Gao, Han Liang; Deng, Guo Fu; Yang, Jin Shui; Luo, Xiao Jin

    2012-12-01

    Heading date in rice (Oryza sativa L.) is a critical agronomic trait with a complex inheritance. To investigate the genetic basis and mechanism of gene interaction in heading date, we conducted genetic analysis on segregation populations derived from crosses among the indica cultivars Bo B, Yuefeng B and Baoxuan 2. A set of dominant complementary genes controlling late heading, designated LH1 and LH2, were detected by molecular marker mapping. Genetic analysis revealed that Baoxuan 2 contains both dominant genes, while Bo B and Yuefeng B each possess either LH1 or LH2. Using larger populations with segregant ratios of 3 : 1, we fine-mapped LH1 to a 63-kb region near the centromere of chromosome 7 flanked by markers RM5436 and RM8034, and LH2 to a 177-kb region on the short arm of chromosome 8 between flanking markers Indel22468-3 and RM25. Some candidate genes were identified through sequencing of Bo B and Yuefeng B in these target regions. Our work provides a solid foundation for further study on gene interaction in heading date and has application in marker-assisted breeding of photosensitive hybrid rice in China.

  6. Cross-referencing yeast genetics and mammalian genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hieter, P.; Basset, D.; Boguski, M.

    1994-09-01

    We have initiated a project that will systematically transfer information about yeast genes onto the genetic maps of mice and human beings. Rapidly expanding human EST data will serve as a source of candidate human homologs that will be repeatedly searched using yeast protein sequence queries. Search results will be automatically reported to participating labs. Human cDNA sequences from which the ESTs are derived will be mapped at high resolution in the human and mouse genomes. The comparative mapping information cross-references the genomic position of novel human cDNAs with functional information known about the cognate yeast genes. This should facilitate the initial identification of genes responsible for mammalian mutant phenotypes, including human disease. In addition, the identification of mammalian homologs of yeast genes provides reagents for determining evolutionary conservation and for performing direct experiments in multicellular eukaryotes to enhance study of the yeast protein's function. For example, ESTs homologous to CDC27 and CDC16 were identified, and the corresponding cDNA clones were obtained from ATTC, completely sequenced, and mapped on human and mouse chromosomes. In addition, the CDC17hs cDNA has been used to raise antisera to the CDC27Hs protein and used in subcellular localization experiments and junctional studies in mammalian cells. We have received funding from the National Center for Human Genome Research to provide a community resource which will establish comprehensive cross-referencing among yeast, human, and mouse loci. The project is set up as a service and information on how to communicate with this effort will be provided.

  7. A combined analysis of genome-wide expression profiling of bipolar disorder in human prefrontal cortex.

    PubMed

    Wang, Jinglu; Qu, Susu; Wang, Weixiao; Guo, Liyuan; Zhang, Kunlin; Chang, Suhua; Wang, Jing

    2016-11-01

    A number of gene expression profiling studies of bipolar disorder have been published. Besides differences in array chips and tissues, the variety of data processing procedures used in different cohorts has aggravated the inconsistency of results across these genome-wide gene expression profiling studies. By searching the gene expression databases, we obtained six data sets for prefrontal cortex (PFC) of bipolar disorder with raw data and combinable platforms. We used standardized pre-processing and quality control procedures to analyze each data set separately and then combined them into a large gene expression matrix with 101 bipolar disorder subjects and 106 controls. A standard linear mixed-effects model was used to calculate the differentially expressed genes (DEGs). Multiple levels of sensitivity analyses and cross validation with genetic data were conducted. Functional and network analyses were carried out on the basis of the DEGs. As a result, we identified 198 unique genes differentially expressed in the PFC between bipolar disorder and control subjects. Among them, 115 DEGs were robust to at least three leave-one-out tests or different pre-processing methods; 51 DEGs were validated with genetic association signals. Pathway enrichment analysis showed these DEGs were related to regulation of neurological system, cell death and apoptosis, and several basic binding processes. Protein-protein interaction network further identified one key hub gene. We have contributed the most comprehensive integrated analysis of bipolar disorder expression profiling studies in PFC to date. The DEGs, especially those with multiple validations, may denote a common signature of bipolar disorder and contribute to the pathogenesis of the disease. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06

    PubMed Central

    2010-01-01

    Background The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. Results Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. Conclusion The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed. PMID:21092259

  9. Measurement of hygromycin B phosphotransferase activity in crude mammalian cell extracts by a simple dot-blot assay.

    PubMed

    Sørensen, M S; Duch, M; Paludan, K; Jørgensen, P; Pedersen, F S

    1992-03-15

    Hygromycin B (Hy) resistance, encoded by the prokaryotic gene hph, is commonly used as a dominant selectable marker for gene transfer experiments in mammalian cells. We describe a simple, quantitative dot-blot assay for measuring the activity in crude mammalian cell extracts of Hy phosphotransferase, the product of the hph gene. The assay shows no cross interference with substrates for neomycin phosphotransferase II, the product of the commonly used marker gene neo; hph and neo may thus be useful as a set of two non-interfering selectable marker and reporter genes for gene transfer experiments in mammalian cells.

  10. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    PubMed

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.
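
    The abstract does not spell out its LD-aware step-down procedure. As a simpler stand-in that shows what "step-down adjusted P-values" means, the sketch below implements Holm's step-down adjustment. This is a swapped-in, more conservative technique that ignores LD. The function name and the input values are hypothetical.

```python
def holm_step_down(p_values):
    """Holm step-down adjusted p-values, returned in the input order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three gene-sets.
adjusted = holm_step_down([0.01, 0.04, 0.03])
```

    A resampling-based step-down procedure follows the same ordering logic but replaces the fixed multipliers with quantiles estimated from permuted data, which is how correlation among tests can be taken into account.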

  11. Marker-assisted identification of restorer gene(s) in iso-cytoplasmic restorer lines of WA cytoplasm in rice and assessment of their fertility restoration potential across environments.

    PubMed

    Kumar, Amit; Bhowmick, Prolay Kumar; Singh, Vikram Jeet; Malik, Manoj; Gupta, Ashish Kumar; Seth, R; Nagarajan, M; Krishnan, S Gopala; Singh, Ashok Kumar

    2017-10-01

    Iso-cytoplasmic restorers possess the same male sterile cytoplasm as the cytoplasmic male sterile (CMS) lines, thereby minimizing the potential cyto-nuclear conflict in the hybrids. Restoration of fertility of the wild abortive CMS is governed by two major genes, namely Rf3 and Rf4. Therefore, assessing the allelic status of these restorer genes in the iso-cytoplasmic restorers using molecular markers will not only help in estimating the efficiency of these genes, either alone or in combination, in fertility restoration in the hybrids in different environments, but will also be useful in determining the efficacy of these markers. In the present study, the efficiency of molecular markers in identifying genotypes carrying the restorer allele of the gene(s) Rf3 and Rf4, which restore male fertility of WA cytoplasm in rice, was assessed in a set of 100 iso-cytoplasmic rice restorers using gene-linked as well as candidate gene-based markers. In order to validate the efficacy of markers in identifying the restorers, a subset of 25 selected iso-cytoplasmic rice restorers were crossed with four different cytoplasmic male sterile lines, namely IR 79156A, IR 58025A, Pusa 6A and RTN 12A, and the pollen and spikelet fertility of the F1s were evaluated at three different locations. Marker analysis showed that Rf4 was the predominant fertility restorer gene in the iso-cytoplasmic restorers and Rf3 had a synergistic effect on fertility restoration. The efficiency of the gene-based markers DRCG-RF4-14 and DRRM-RF3-10 for the Rf4 (87%) and Rf3 (84%) genes was higher than that of the respective gene-linked SSR markers RM6100 (80%) and RM3873 (82%). It is concluded that the gene-based markers can be effectively used in identifying fertility restorer lines, obviating the need for making crosses and evaluating the F1s. Though gene-based markers are more efficient, there is a need to identify functional polymorphisms which can provide 100% efficiency.
    Three iso-cytoplasmic restorers namely, PRR 300, PRR 363 and PRR 396 possessing both Rf4 and Rf3 genes and good fertility restoration have been identified which could be used further in hybrid rice breeding.

  12. Analysis of ripening-related gene expression in papaya using an Arabidopsis-based microarray

    PubMed Central

    2012-01-01

    Background Papaya (Carica papaya L.) is a commercially important crop that produces climacteric fruits with a soft and sweet pulp that contain a wide range of health promoting phytochemicals. Despite its importance, little is known about transcriptional modifications during papaya fruit ripening and their control. In this study we report the analysis of ripe papaya transcriptome by using a cross-species (XSpecies) microarray technique based on the phylogenetic proximity between papaya and Arabidopsis thaliana. Results Papaya transcriptome analyses resulted in the identification of 414 ripening-related genes with some having their expression validated by qPCR. The transcription profile was compared with that from ripening tomato and grape. There were many similarities between papaya and tomato especially with respect to the expression of genes encoding proteins involved in primary metabolism, regulation of transcription, biotic and abiotic stress and cell wall metabolism. XSpecies microarray data indicated that transcription factors (TFs) of the MADS-box, NAC and AP2/ERF gene families were involved in the control of papaya ripening and revealed that cell wall-related gene expression in papaya had similarities to the expression profiles seen in Arabidopsis during hypocotyl development. Conclusion The cross-species array experiment identified a ripening-related set of genes in papaya allowing the comparison of transcription control between papaya and other fruit bearing taxa during the ripening process. PMID:23256600

  13. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.

    PubMed

    Soneson, Charlotte; Lilljebjörn, Henrik; Fioretos, Thoas; Fontes, Magnus

    2010-04-15

    With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia. Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. 
    Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA. We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.

  14. University of California San Francisco (UCSF-2): Gene Expression Profiling of Normal Mouse Skin, Hras WT and Hras -/- | Office of Cancer Genomics

    Cancer.gov

    This data set contains the transcriptional profiles of 20 dorsal skin samples from eight-week-old mice. Mice were generated by crossing FVB/N to Mus spretus mice to generate F1 mice, and then crossing F1 mice back to the FVB/N strain. 10 FVB/N mice lacking Hras1 (aka HrasKO, Hras-/-) and 10 FVB/N mice with wild-type Hras1 were generated.

  15. CrossLink: a novel method for cross-condition classification of cancer subtypes.

    PubMed

    Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

    2016-08-22

    We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions, as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address this problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on its gene expression profile. A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.

  16. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients.
    In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
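
    The idea of an empirical p-value for an observed classification error can be sketched by permuting class labels. The example below is an assumption-laden illustration, not the authors' method: to stay within the standard library it swaps the Random Forest OOB error for the leave-one-out error of a one-dimensional nearest-centroid classifier. All function names and the toy scores are hypothetical.

```python
import random


def loo_centroid_error(values, labels):
    """Leave-one-out error of a 1-D nearest-centroid classifier.

    Stand-in for a Random Forest OOB error; `values` holds one summary
    score per sample (for example, a gene-set score).
    """
    errors = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(values, labels)):
            if j != i:
                sums[yj] = sums.get(yj, 0.0) + xj
                counts[yj] = counts.get(yj, 0) + 1
        # Predict the class whose centroid is closest to the held-out sample.
        predicted = min(sums, key=lambda c: abs(x - sums[c] / counts[c]))
        errors += predicted != y
    return errors / len(values)


def empirical_p_value(values, labels, error_fn, n_perm=200, seed=0):
    """Share of label permutations whose error is as low as the observed one."""
    rng = random.Random(seed)
    observed = error_fn(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if error_fn(values, shuffled) <= observed:
            hits += 1
    # The +1 terms count the observed labelling itself and avoid p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical gene-set scores for four controls and four cases.
scores = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
classes = ["ctrl"] * 4 + ["case"] * 4
observed_error, p_value = empirical_p_value(scores, classes, loo_centroid_error)
```

    A gene set whose observed error is rarely matched under permuted labels would be flagged as discriminating between the phenotype classes.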

  17. Early Cone Setting in Picea abies acrocona Is Associated with Increased Transcriptional Activity of a MADS Box Transcription Factor

    PubMed Central

    Uddenberg, Daniel; Reimegård, Johan; Clapham, David; Almqvist, Curt; von Arnold, Sara; Emanuelsson, Olof; Sundström, Jens F.

    2013-01-01

    Conifers normally go through a long juvenile period, for Norway spruce (Picea abies) around 20 to 25 years, before developing male and female cones. We have grown plants from inbred crosses of a naturally occurring spruce mutant (acrocona). One-fourth of the segregating acrocona plants initiate cones already in their second growth cycle, suggesting control by a single locus. The early cone-setting properties of the acrocona mutant were utilized to identify candidate genes involved in vegetative-to-reproductive phase change in Norway spruce. Poly(A+) RNA samples from apical and basal shoots of cone-setting and non-cone-setting plants were subjected to high-throughput sequencing (RNA-seq). We assembled and investigated 33,383 expressed putative protein-coding acrocona transcripts. Eight transcripts were differentially expressed between selected sample pairs. One of these (Acr42124_1) was significantly up-regulated in apical shoot samples from cone-setting acrocona plants, and the encoded protein belongs to the MADS box gene family of transcription factors. Using quantitative real-time polymerase chain reaction with independently derived plant material, we confirmed that the MADS box gene is up-regulated in both needles and buds of cone-inducing shoots when reproductive identity is determined. Our results constitute important steps for the development of a rapid cycling model system that can be used to study gene function in conifers. In addition, our data suggest the involvement of a MADS box transcription factor in the vegetative-to-reproductive phase change in Norway spruce. PMID:23221834

  18. Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources

    PubMed Central

    Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel

    2013-01-01

    An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity. PMID:23950964

  19. Mining functionally relevant gene sets for analyzing physiologically novel clinical expression data.

    PubMed

    Turcan, Sevin; Vetter, Douglas E; Maron, Jill L; Wei, Xintao; Slonim, Donna K

    2011-01-01

    Gene set analyses have become a standard approach for increasing the sensitivity of transcriptomic studies. However, analytical methods incorporating gene sets require the availability of pre-defined gene sets relevant to the underlying physiology being studied. For novel physiological problems, relevant gene sets may be unavailable or existing gene set databases may bias the results towards only the best-studied of the relevant biological processes. We describe a successful attempt to mine novel functional gene sets for translational projects where the underlying physiology is not necessarily well characterized in existing annotation databases. We choose targeted training data from public expression data repositories and define new criteria for selecting biclusters to serve as candidate gene sets. Many of the discovered gene sets show little or no enrichment for informative Gene Ontology terms or other functional annotation. However, we observe that such gene sets show coherent differential expression in new clinical test data sets, even if derived from different species, tissues, and disease states. We demonstrate the efficacy of this method on a human metabolic data set, where we discover novel, uncharacterized gene sets that are diagnostic of diabetes, and on additional data sets related to neuronal processes and human development. Our results suggest that our approach may be an efficient way to generate a collection of gene sets relevant to the analysis of data for novel clinical applications where existing functional annotation is relatively incomplete.

  20. Context dependency of Set1/COMPASS-mediated histone H3 Lys4 trimethylation

    PubMed Central

    Thornton, Janet L.; Westfield, Gerwin H.; Takahashi, Yoh-hei; Cook, Malcolm; Gao, Xin; Woodfin, Ashley R.; Lee, Jung-Shin; Morgan, Marc A.; Jackson, Jessica; Smith, Edwin R.; Couture, Jean-Francois; Skiniotis, Georgios; Shilatifard, Ali

    2014-01-01

    The stimulation of trimethylation of histone H3 Lys4 (H3K4) by H2B monoubiquitination (H2Bub) has been widely studied, with multiple mechanisms having been proposed for this form of histone cross-talk. Cps35/Swd2 within COMPASS (complex of proteins associated with Set1) is considered to bridge these different processes. However, a truncated form of Set1 (762-Set1) is reported to function in H3K4 trimethylation (H3K4me3) without interacting with Cps35/Swd2, and such cross-talk is attributed to the n-SET domain of Set1 and its interaction with the Cps40/Spp1 subunit of COMPASS. Here, we used biochemical, structural, in vivo, and chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq) approaches to demonstrate that Cps40/Spp1 and the n-SET domain of Set1 are required for the stability of Set1 and not the cross-talk. Furthermore, the apparent wild-type levels of H3K4me3 in the 762-Set1 strain are due to the rogue methylase activity of this mutant, resulting in the mislocalization of H3K4me3 from the promoter-proximal regions to the gene bodies and intergenic regions. We also performed detailed screens and identified yeast strains lacking H2Bub but containing intact H2Bub enzymes that have normal levels of H3K4me3, suggesting that monoubiquitination may not directly stimulate COMPASS but rather works in the context of the PAF and Rad6/Bre1 complexes. Our study demonstrates that the monoubiquitination machinery and Cps35/Swd2 function to focus COMPASS's H3K4me3 activity at promoter-proximal regions in a context-dependent manner. PMID:24402317

  1. Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations.

    PubMed

    Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang; Demissie, Serkalem; Cupples, L Adrienne; Kiel, Douglas P; Karasik, David

    2011-06-01

    Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes--SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. 
    This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk. Copyright © 2011 American Society for Bone and Mineral Research.

  2. Global Mapping of the Yeast Genetic Interaction Network

    NASA Astrophysics Data System (ADS)

    Tong, Amy Hin Yan; Lesage, Guillaume; Bader, Gary D.; Ding, Huiming; Xu, Hong; Xin, Xiaofeng; Young, James; Berriz, Gabriel F.; Brost, Renee L.; Chang, Michael; Chen, YiQun; Cheng, Xin; Chua, Gordon; Friesen, Helena; Goldberg, Debra S.; Haynes, Jennifer; Humphries, Christine; He, Grace; Hussein, Shamiza; Ke, Lizhu; Krogan, Nevan; Li, Zhijian; Levinson, Joshua N.; Lu, Hong; Ménard, Patrice; Munyana, Christella; Parsons, Ainslie B.; Ryan, Owen; Tonikian, Raffi; Roberts, Tania; Sdicu, Anne-Marie; Shapiro, Jesse; Sheikh, Bilal; Suter, Bernhard; Wong, Sharyl L.; Zhang, Lan V.; Zhu, Hongwei; Burd, Christopher G.; Munro, Sean; Sander, Chris; Rine, Jasper; Greenblatt, Jack; Peter, Matthias; Bretscher, Anthony; Bell, Graham; Roth, Frederick P.; Brown, Grant W.; Andrews, Brenda; Bussey, Howard; Boone, Charles

    2004-02-01

    A genetic interaction network containing ~1000 genes and ~4000 interactions was mapped by crossing mutations in 132 different query genes into a set of ~4700 viable yeast gene deletion mutants and scoring the double mutant progeny for fitness defects. Network connectivity was predictive of function because interactions often occurred among functionally related genes, and similar patterns of interactions tended to identify components of the same pathway. The genetic network exhibited dense local neighborhoods; therefore, the position of a gene on a partially mapped network is predictive of other genetic interactions. Because digenic interactions are common in yeast, similar networks may underlie the complex genetics associated with inherited phenotypes in other organisms.

  3. Comparative Genomics and Host Resistance against Infectious Diseases

    PubMed Central

    Qureshi, Salman T.; Skamene, Emil

    1999-01-01

    The large size and complexity of the human genome have limited the identification and functional characterization of components of the innate immune system that play a critical role in front-line defense against invading microorganisms. However, advances in genome analysis (including the development of comprehensive sets of informative genetic markers, improved physical mapping methods, and novel techniques for transcript identification) have reduced the obstacles to discovery of novel host resistance genes. Study of the genomic organization and content of widely divergent vertebrate species has shown a remarkable degree of evolutionary conservation and enables meaningful cross-species comparison and analysis of newly discovered genes. Application of comparative genomics to host resistance will rapidly expand our understanding of human immune defense by facilitating the translation of knowledge acquired through the study of model organisms. We review the rationale and resources for comparative genomic analysis and describe three examples of host resistance genes successfully identified by this approach. PMID:10081670

  4. Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite.

    PubMed

    Peng, Hui; Lan, Chaowang; Zheng, Yi; Hutvagner, Gyorgy; Tao, Dacheng; Li, Jinyan

    2017-03-24

    MicroRNAs always function cooperatively in their regulation of gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We are interested in those multi-disease associated co-functional microRNAs that regulate their common dysfunctional target genes cooperatively in the development of multiple diseases. The research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The DGR tripartite network is constructed by integrating newly predicted disease-microRNA associations with the relationships among diseases, microRNAs and genes maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA association and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. With the prioritization of the co-functional microRNAs that relate to a series of diseases, we found that the co-function phenomenon is not unusual. We also confirmed that the regulation by microRNAs in the development of cancers is more complex and has more unique properties than that in non-cancer diseases.

  5. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

    PubMed

    Zheng, Qi; Wang, Xiu-Jie

    2008-07-01

    Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although many software tools offer GO-related analysis functions, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purposes. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), and therefore provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) one unique feature of GOEAST is that it allows cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
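
    A common way to test whether a GO term is statistically overrepresented in a gene set is the one-sided hypergeometric test. The abstract does not say which test GOEAST applies, so the sketch below is a generic illustration rather than the tool's implementation. The function name, parameter names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe  : genes in the background set
    n_annotated : background genes carrying the GO term
    n_selected  : genes in the query set
    n_overlap   : query genes carrying the GO term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 10 background genes, 4 annotated, 3 selected, 2 in the overlap.
p_enriched = hypergeom_enrichment_p(10, 4, 3, 2)
```

    In practice the resulting p-values are computed for every GO term and then adjusted for multiple testing.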

  6. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  7. Pathway Analysis in Attention Deficit Hyperactivity Disorder: An Ensemble Approach

    PubMed Central

    Mooney, Michael A.; McWeeney, Shannon K.; Faraone, Stephen V.; Hinney, Anke; Hebebrand, Johannes; Nigg, Joel T.; Wilmot, Beth

    2016-01-01

    Despite a wealth of evidence for the role of genetics in attention deficit hyperactivity disorder (ADHD), specific and definitive genetic mechanisms have not been identified. Pathway analyses, a subset of gene-set analyses, extend the knowledge gained from genome-wide association studies (GWAS) by providing functional context for genetic associations. However, there are numerous methods for association testing of gene sets and no real consensus regarding the best approach. The present study applied six pathway analysis methods to identify pathways associated with ADHD in two GWAS datasets from the Psychiatric Genomics Consortium. Methods that utilize genotypes to model pathway-level effects identified more replicable pathway associations than methods using summary statistics. In addition, pathways implicated by more than one method were significantly more likely to replicate. A number of brain-relevant pathways, such as RhoA signaling, glycosaminoglycan biosynthesis, fibroblast growth factor receptor activity, and pathways containing potassium channel genes, were nominally significant by multiple methods in both datasets. These results support previous hypotheses about the role of regulation of neurotransmitter release, neurite outgrowth and axon guidance in contributing to the ADHD phenotype and suggest the value of cross-method convergence in evaluating pathway analysis results. PMID:27004716
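
    Cross-method convergence, as discussed in this abstract, amounts to counting how many methods implicate each pathway. A minimal sketch follows, with a hypothetical function name and placeholder method and pathway labels rather than results from the study.

```python
from collections import Counter


def pathways_by_support(results, min_methods=2):
    """Pathways reported by at least `min_methods` methods.

    `results` maps a method name to the pathways it flagged. The output is
    sorted by decreasing support, then alphabetically.
    """
    # set() guards against a method listing the same pathway twice.
    counts = Counter(p for hits in results.values() for p in set(hits))
    supported = [p for p, c in counts.items() if c >= min_methods]
    return sorted(supported, key=lambda p: (-counts[p], p))


# Placeholder labels for illustration only.
results = {
    "method_A": ["pathway_1", "pathway_2"],
    "method_B": ["pathway_1"],
    "method_C": ["pathway_1", "pathway_2", "pathway_3"],
}
convergent = pathways_by_support(results)
```

    Raising `min_methods` trades sensitivity for specificity when deciding which pathways to carry forward.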

  8. Interconnected microbiomes and resistomes in low-income human habitats

    PubMed Central

    Pehrsson, Erica C.; Tsukayama, Pablo; Patel, Sanket; Mejía-Bautista, Melissa; Sosa-Soto, Giordano; Navarrete, Karla M.; Calderon, Maritza; Cabrera, Lilia; Hoyos-Arango, William; Bertoli, M. Teresita; Berg, Douglas E.; Gilman, Robert H.; Dantas, Gautam

    2016-01-01

    Summary: Antibiotic-resistant infections annually claim hundreds of thousands of lives worldwide. This problem is exacerbated by resistance gene exchange between pathogens and benign microbes from diverse habitats. Mapping resistance gene dissemination between humans and their environment is a public health priority. We characterized the bacterial community structure and resistance exchange networks of hundreds of interconnected human fecal and environmental samples from two low-income Latin American communities. We found that resistomes across habitats are generally structured by bacterial phylogeny along ecological gradients, but identified key resistance genes that cross habitat boundaries and determined their association with mobile genetic elements. We also assessed the effectiveness of widely-used excreta management strategies in reducing fecal bacteria and resistance genes in these settings representative of low- and middle-income countries. Our results lay the foundation for quantitative risk assessment and surveillance of resistance dissemination across interconnected habitats in settings representing over two-thirds of the world’s population. PMID:27172044

  9. Neurobehavioral Integrity of Chimpanzee Newborns: Comparisons across groups and across species reveal gene-environment interaction effects

    PubMed Central

    Bard, Kim A.; Brent, Linda; Lester, Barry; Worobey, John; Suomi, Stephen J.

    2014-01-01

    The aims of this article are to describe the neurobehavioral integrity of chimpanzee newborns, to investigate how early experiences affect the neurobehavioral organization of chimpanzees, and to explore species differences by comparing chimpanzee newborns to a group of typically developing human newborns. Neurobehavioral integrity related to orientation, motor performance, arousal, and state regulation of 55 chimpanzee (raised in four different settings) and 42 human newborns was measured with the Neonatal Behavioral Assessment Scale (NBAS), a semi-structured 25-minute interactive assessment. Thirty-eight chimpanzees were tested every other day from birth, and analyses revealed significant developmental changes in 19 of 27 NBAS scores. The cross-group and cross-species comparisons were conducted at 2 and 30 days of age. Among the 4 chimpanzee groups, significant differences were found in 23 of 24 NBAS scores. Surprisingly, the cross-species comparisons revealed that the human group was distinct in only 1 of 25 NBAS scores (the human group had significantly less muscle tone than all the chimpanzee groups). The human group was indistinguishable from at least one of the chimpanzee groups in the remaining 24 of 25 NBAS scores. The results of this study support the conclusion that the interplay between genes and environment, rather than genes alone or environment alone, accounts for phenotypic expressions of newborn neurobehavioral integrity in hominids. PMID:25110465

  10. Disease modeling in genetic kidney diseases: zebrafish.

    PubMed

    Schenk, Heiko; Müller-Deile, Janina; Kinast, Mark; Schiffer, Mario

    2017-07-01

    Growing numbers of translational genomics studies are based on the highly efficient and versatile zebrafish (Danio rerio) vertebrate model. The increasing types of zebrafish models have improved our understanding of inherited kidney diseases, since they not only display pathophysiological changes but also give us the opportunity to develop and test novel treatment options in a high-throughput manner. New paradigms in inherited kidney diseases have been developed on the basis of the distinct genome conservation of approximately 70% between zebrafish and humans in terms of existing gene orthologs. Several options are available to determine the functional role of a specific gene or gene sets. Permanent genome editing can be induced via complete gene knockout by using the CRISPR/Cas-system, among others, or via transient modification by using various morpholino techniques. Cross-species rescues succeeding knockdown techniques are employed to determine the functional significance of a target gene or a specific mutation. This article summarizes the current techniques and discusses their perspectives.

  11. A large-scale benchmark of gene prioritization methods.

    PubMed

    Guala, Dimitri; Sonnhammer, Erik L L

    2017-04-21

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.

  12. SYBR green-based real-time reverse transcription-PCR for typing and subtyping of all hemagglutinin and neuraminidase genes of avian influenza viruses and comparison to standard serological subtyping tests.

    PubMed

    Tsukamoto, Kenji; Panei, Carlos Javier; Javier, Panei Carlos; Shishido, Makiko; Noguchi, Daigo; Pearce, John; Kang, Hyun-Mi; Jeong, Ok Mi; Lee, Youn-Jeong; Nakanishi, Koji; Ashizawa, Takayoshi

    2012-01-01

    Continuing outbreaks of H5N1 highly pathogenic (HP) avian influenza virus (AIV) infections of wild birds and poultry worldwide emphasize the need for global surveillance of wild birds. To support the future surveillance activities, we developed a SYBR green-based, real-time reverse transcriptase PCR (rRT-PCR) for detecting nucleoprotein (NP) genes and subtyping 16 hemagglutinin (HA) and 9 neuraminidase (NA) genes simultaneously. Primers were improved by focusing on Eurasian or North American lineage genes; the number of mixed-base positions per primer was set to five or fewer, and the concentration of each primer set was optimized empirically. Also, 30 cycles of amplification of 1:10 dilutions of cDNAs from cultured viruses effectively reduced minor cross- or nonspecific reactions. Under these conditions, 346 HA and 345 NA genes of 349 AIVs were detected, with average sensitivities of NP, HA, and NA genes of 10^1.5, 10^2.3, and 10^3.1 50% egg infective doses, respectively. Utility of rRT-PCR for subtyping AIVs was compared with that of current standard serological tests by using 104 recent migratory duck virus isolates. As a result, all HA genes and 99% of the NA genes were genetically subtyped, while only 45% of HA genes and 74% of NA genes were serologically subtyped. Additionally, direct subtyping of AIVs in fecal samples was possible by 40 cycles of amplification: approximately 70% of HA and NA genes of NP gene-positive samples were successfully subtyped. This validation study indicates that rRT-PCR with optimized primers and reaction conditions is a powerful tool for subtyping varied AIVs in clinical and cultured samples.

  13. Interactions Between Secondhand Smoke and Genes That Affect Cystic Fibrosis Lung Disease

    PubMed Central

    Collaco, J. Michael; Vanscoy, Lori; Bremer, Lindsay; McDougal, Kathryn; Blackman, Scott M.; Bowers, Amanda; Naughton, Kathleen; Jennings, Jacky; Ellen, Jonathan; Cutting, Garry R.

    2011-01-01

    Context: Disease variation can be substantial even in conditions with a single gene etiology such as cystic fibrosis (CF). Simultaneously studying the effects of genes and environment may provide insight into the causes of variation. Objective: To determine whether secondhand smoke exposure is associated with lung function and other outcomes in individuals with CF, whether socioeconomic status affects the relationship between secondhand smoke exposure and lung disease severity, and whether specific gene-environment interactions influence the effect of secondhand smoke exposure on lung function. Design, Setting, and Participants: Retrospective assessment of lung function, stratified by environmental and genetic factors. Data were collected by the US Cystic Fibrosis Twin and Sibling Study with missing data supplemented by the Cystic Fibrosis Foundation Data Registry. All participants were diagnosed with CF, were recruited between October 2000 and October 2006, and were primarily from the United States. Main Outcome Measures: Disease-specific cross-sectional and longitudinal measures of lung function. Results: Of 812 participants with data on secondhand smoke in the home, 188 (23.2%) were exposed. Of 780 participants with data on active maternal smoking during gestation, 129 (16.5%) were exposed. Secondhand smoke exposure in the home was associated with significantly lower cross-sectional (9.8 percentile point decrease; P<.001) and longitudinal lung function (6.1 percentile point decrease; P=.007) compared with those not exposed. Regression analysis demonstrated that socioeconomic status did not confound the adverse effect of secondhand smoke exposure on lung function. Interaction between gene variants and secondhand smoke exposure resulted in significant percentile point decreases in lung function, namely in CFTR non-ΔF508 homozygotes (12.8 percentile point decrease; P=.001), TGFβ1-509 TT homozygotes (22.7 percentile point decrease; P=.006), and TGFβ1 codon 10 CC homozygotes (20.3 percentile point decrease; P=.005). Conclusions: Any exposure to secondhand smoke adversely affects both cross-sectional and longitudinal measures of lung function in individuals with CF. Variations in the gene that causes CF (CFTR) and a CF-modifier gene (TGFβ1) amplify the negative effects of secondhand smoke exposure. PMID:18230779

  14. A Compendium of Canine Normal Tissue Gene Expression

    PubMed Central

    Chen, Qing-Rong; Wen, Xinyu; Khan, Javed; Khanna, Chand

    2011-01-01

    Background: Our understanding of disease is increasingly informed by changes in gene expression between normal and abnormal tissues. The release of the canine genome sequence in 2005 provided an opportunity to better understand human health and disease using the dog as a clinically relevant model. Accordingly, we now present the first genome-wide, canine normal tissue gene expression compendium with corresponding human cross-species analysis. Methodology/Principal Findings: The Affymetrix platform was utilized to catalogue gene expression signatures of 10 normal canine tissues including: liver, kidney, heart, lung, cerebrum, lymph node, spleen, jejunum, pancreas and skeletal muscle. The quality of the database was assessed in several ways. Organ defining gene sets were identified for each tissue and functional enrichment analysis revealed themes consistent with known physio-anatomic functions for each organ. In addition, a comparison of orthologous gene expression between matched canine and human normal tissues uncovered remarkable similarity. To demonstrate the utility of this dataset, novel canine gene annotations were established based on comparative analysis of dog and human tissue selective gene expression and manual curation of canine probeset mapping. Public access, using infrastructure identical to that currently in use for human normal tissues, has been established and allows for additional comparisons across species. Conclusions/Significance: These data advance our understanding of the canine genome through a comprehensive analysis of gene expression in a diverse set of tissues, contributing to improved functional annotation that has been lacking. Importantly, it will be used to inform future studies of disease in the dog as a model for human translational research and provides a novel resource to the community at large. PMID:21655323

  15. Defining Aggressive Prostate Cancer Using a 12-Gene Model

    PubMed Central

    Riva, Alberto; Kim, Robert; Varambally, Sooryanarayana; He, Le; Kutok, Jeff; Aster, Jonathan C; Tang, Jeffery; Kuefer, Rainer; Hofer, Matthias D; Febbo, Phillip G; Chinnaiyan, Arul M; Rubin, Mark A

    2006-01-01

    The critical clinical question in prostate cancer research is: How do we develop means of distinguishing aggressive disease from indolent disease? Using a combination of proteomic and expression array data, we identified a set of 36 genes with concordant dysregulation of protein products that could be evaluated in situ by quantitative immunohistochemistry. Another five prostate cancer biomarkers were included. Using linear discriminant analysis, we determined that the optimal model used to predict prostate cancer progression consisted of 12 proteins. Using a separate patient population, transcriptional levels of the 12 genes encoding for these proteins predicted prostate-specific antigen failure in 79 men following surgery for clinically localized prostate cancer (P = .0015). This study demonstrates that cross-platform models can lead to predictive models with the possible advantage of being more robust through this selection process. PMID:16533427

  16. Genetic dissection of the Gpnmb network in the eye.

    PubMed

    Lu, Hong; Wang, Xusheng; Pullen, Matthew; Guan, Huaijin; Chen, Hui; Sahu, Shwetapadma; Zhang, Bing; Chen, Hao; Williams, Robert W; Geisert, Eldon E; Lu, Lu; Jablonski, Monica M

    2011-06-13

    To use a systematic genetics approach to investigate the regulation of Gpnmb, a gene that contributes to pigmentary dispersion syndrome (PDS) and pigmentary glaucoma (PG) in the DBA/2J (D2) mouse. Global patterns of gene expression were studied in whole eyes of a large family of BXD mouse strains (n = 67) generated by crossing the PDS- and PG-prone parent (DBA/2J) with a resistant strain (C57BL/6J). Expression quantitative trait locus (eQTL) mapping methods and gene set analysis were used to evaluate Gpnmb coexpression networks in wild-type and mutant cohorts. The level of Gpnmb expression was associated with a highly significant cis-eQTL at the location of the gene itself. This autocontrol of Gpnmb is likely to be a direct consequence of the known premature stop codon in exon 4. Both gene ontology and coexpression network analyses demonstrated that the mutation in Gpnmb radically modified the set of genes with which Gpnmb expression is correlated. The covariates of wild-type Gpnmb are involved in biological processes including melanin synthesis and cell migration, whereas the covariates of mutant Gpnmb are involved in the biological processes of posttranslational modification, stress activation, and sensory processing. These results demonstrated that a systematic genetics approach provides a powerful tool for constructing coexpression networks that define the biological process categories within which similarly regulated genes function. The authors showed that the R150X mutation in Gpnmb dramatically modified its list of genetic covariates, which may explain the associated ocular pathology.

  17. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    PubMed

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  18. A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

    PubMed Central

    Seok, Junhee; Davis, Ronald W.; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn’t been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge. PMID:25933378

  19. Chromosomal localization of murine and human oligodendrocyte-specific protein genes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronstein, J.M.; Wu, S.; Korenberg, J.R.

    1996-06-01

    Oligodendrocyte-specific protein (OSP) is a recently described protein present only in myelin of the central nervous system. Several inherited disorders of myelin are caused by mutations in myelin genes but the etiology of many remain unknown. We mapped the location of the mouse OSP gene to the proximal region of chromosome 3 using two sets of multilocus crosses and to human chromosome 3 using somatic cell hybrids. Fine mapping with fluorescence in situ hybridization placed the OSP gene at human chromosome 3q26.2-q26.3. To date, there are no known inherited neurological disorders that localize to these regions. 24 refs., 2 figs.

  20. Prioritization of candidate disease genes by combining topological similarity and semantic similarity.

    PubMed

    Liu, Bin; Jin, Min; Zeng, Pan

    2015-10-01

    The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate gene and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with a restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
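
    The random walk with restart described in this abstract can be sketched on a toy network. The code below is a minimal illustration, not the authors' implementation: the four-node network, the restart probability of 0.5, and the similarity-derived seed weights are all assumed values, and the function name is hypothetical.

```python
def random_walk_with_restart(adjacency, seed_weights, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.

    adjacency:    symmetric 0/1 (or weighted) matrix as a list of lists
    seed_weights: non-negative initial weights, e.g. similarity scores
    """
    n = len(adjacency)
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    # Normalise the seed weights into the initial probability vector p0.
    total = sum(seed_weights)
    p0 = [w / total for w in seed_weights]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Toy path network 0 - 1 - 2 - 3; nodes 0 and 1 stand in for known disease
# genes, weighted by an assumed similarity to the candidate under study.
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adjacency, [0.7, 0.3, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

    Candidate genes would then be ranked by their steady-state scores; nodes closer to heavily weighted seeds receive higher values.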

  1. Bioreducible Fluorinated Peptide Dendrimers Capable of Circumventing Various Physiological Barriers for Highly Efficient and Safe Gene Delivery.

    PubMed

    Cai, Xiaojun; Jin, Rongrong; Wang, Jiali; Yue, Dong; Jiang, Qian; Wu, Yao; Gu, Zhongwei

    2016-03-09

    Polymeric vectors have shown great promise in the development of safe and efficient gene delivery systems; however, only a few have been developed in clinical settings due to poor transport across multiple physiological barriers. To address this issue and promote clinical translation of polymeric vectors, a new type of polymeric vector, bioreducible fluorinated peptide dendrimers (BFPDs), was designed and synthesized by reversible cross-linking of fluorinated low generation peptide dendrimers. Through masterly integration of all of the features of reversible cross-linking, fluorination, and polyhedral oligomeric silsesquioxane (POSS) core-based peptide dendrimers, this novel vector exhibited many unique features, including (i) inactive surface to resist protein interactions; (ii) virus-mimicking surface topography to augment cellular uptake; (iii) fluorination-mediated efficient cellular uptake, endosome escape, cytoplasm trafficking, and nuclear entry, and (iv) disulfide-cleavage-mediated polyplex disassembly and DNA release that allows efficient DNA transcription. Notably, all of these features are functionally important and can synergistically facilitate DNA transport from solution to the nucleus. As a consequence, BFPDs showed excellent gene transfection efficiency in several cell lines (∼95% in HEK293 cells) and superior biocompatibility compared with polyethylenimine (PEI). Meanwhile, BFPDs provided excellent serum resistance in gene delivery. More importantly, BFPDs offer considerable in vivo gene transfection efficiency (in muscular tissues and in HepG2 tumor xenografts), which was approximately 77-fold higher than that of PEI in luciferase activity. These results suggest bioreducible fluorinated peptide dendrimers are a new class of highly efficient and safe gene delivery vectors and should be used in clinical settings.

  2. Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

    PubMed Central

    Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P

    2008-01-01

    Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951

  3. Sexual reproduction, sporophyte development and molecular variation in the model moss Physcomitrella patens: introducing the ecotype Reute.

    PubMed

    Hiss, Manuel; Meyberg, Rabea; Westermann, Jens; Haas, Fabian B; Schneider, Lucas; Schallenberg-Rüdinger, Mareike; Ullrich, Kristian K; Rensing, Stefan A

    2017-05-01

    Rich ecotype collections are used for several plant models to unravel the molecular causes of phenotypic differences, and to investigate the effects of environmental adaption and acclimation. For the model moss Physcomitrella patens collections of accessions are available, and have been used for phylogenetic and taxonomic studies, for example, but few have been investigated further for phenotypic differences. Here, we focus on the Reute accession and provide expression profiling and comparative developmental data for several stages of sporophyte development, as well as information on genetic variation via genomic sequencing. We analysed cross-technology and cross-laboratory data to define a confident set of 15 mature sporophyte-specific genes. We find that the standard laboratory strain Gransden produces fewer sporophytes than Reute or Villersexel, although gametangia develop with the same time course and do not show evident morphological differences. Reute exhibits less genetic variation relative to Gransden than Villersexel, yet we found variation between Gransden and Reute in the expression profiles of several genes, as well as variation hot spots and genes that appear to evolve under positive Darwinian selection. We analyzed expression differences between the ecotypes for selected candidate genes in the GRAS transcription factor family, the chalcone synthase family and in genes involved in cell wall modification that are potentially related to phenotypic differences. We confirm that Reute is a P. patens ecotype, and suggest its use for reverse-genetics studies that involve progression through the life cycle and multiple generations. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  4. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
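
    The cross-platform concordance measure used here (the share of matched genes with Pearson r of at least 0.7) can be sketched as follows. The gene names and expression values are toy data invented for illustration, and the helper names are hypothetical; the study's actual probe matching and preprocessing are not reproduced.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fraction_concordant(platform_a, platform_b, threshold=0.7):
    """Fraction of genes measured on both platforms whose r >= threshold."""
    shared = sorted(set(platform_a) & set(platform_b))
    passing = [g for g in shared
               if pearson_r(platform_a[g], platform_b[g]) >= threshold]
    return len(passing) / len(shared), passing

# Toy data: one expression value per sample (4 samples) for each gene.
platform_a = {"gene_1": [1, 2, 3, 4], "gene_2": [1, 2, 3, 4]}
platform_b = {"gene_1": [2, 4, 6, 8], "gene_2": [4, 1, 3, 2]}
fraction, concordant_genes = fraction_concordant(platform_a, platform_b)
print(fraction, concordant_genes)
```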

  5. Good genes, genetic compatibility and the evolution of polyandry: use of the diallel cross to address competing hypotheses.

    PubMed

    Ivy, T M

    2007-03-01

    Genetic benefits can enhance the fitness of polyandrous females through the high intrinsic genetic quality of females' mates or through the interaction between female and male genes. I used a full diallel cross, a quantitative genetics design that involves all possible crosses among a set of genetically homogeneous lines, to determine the mechanism through which polyandrous female decorated crickets (Gryllodes sigillatus) obtain genetic benefits. I measured several traits related to fitness and partitioned the phenotypic variance into components representing the contribution of additive genetic variance ('good genes'), nonadditive genetic variance (genetic compatibility), as well as maternal and paternal effects. The results reveal a significant variance attributable to both nonadditive and additive sources in the measured traits, and their influence depended on which trait was considered. The lack of congruence in sources of phenotypic variance among these fitness-related traits suggests that the evolution and maintenance of polyandry are unlikely to have resulted from one selective influence, but rather are the result of the collective effects of a number of factors.
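
    The layout of a full diallel cross, in which every line is crossed with every other line in both directions, can be enumerated in a few lines. This is only a sketch of the mating design: the line labels are hypothetical, and whether within-line crosses are included is treated as an option because the abstract does not specify it.

```python
from itertools import product

def full_diallel(lines, include_within_line=True):
    """List (dam, sire) pairs for a full diallel, including reciprocal crosses."""
    return [(dam, sire) for dam, sire in product(lines, repeat=2)
            if include_within_line or dam != sire]

# Three hypothetical inbred lines.
lines = ["A", "B", "C"]
crosses = full_diallel(lines)
reciprocals_only = full_diallel(lines, include_within_line=False)
print(len(crosses), len(reciprocals_only))
```

    With n lines the design has n * n crosses when within-line matings are included and n * (n - 1) otherwise; keeping reciprocal pairs such as (A, B) and (B, A) separate is what allows maternal and paternal effects to be estimated.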

  6. A Transcriptomic Comparison of Two Bambara Groundnut Landraces under Dehydration Stress

    PubMed Central

    Khan, Faraz; Chai, Hui Hui; Ajmera, Ishan; Hodgman, Charlie; Mayes, Sean; Lu, Chungui

    2017-01-01

    The ability to grow crops under low-water conditions is a significant advantage in relation to global food security. Bambara groundnut is an underutilised crop grown by subsistence farmers in Africa and is known to survive in regions of water deficit. This study focuses on the analysis of the transcriptomic changes in two bambara groundnut landraces in response to dehydration stress. A cross-species hybridisation approach based on the Soybean Affymetrix GeneChip array has been employed. The differential gene expression analysis of a water-limited treatment, however, showed that the two landraces responded with almost completely different sets of genes. Hence, both landraces with very similar genotypes (as assessed by the hybridisation of genomic DNA onto the Soybean Affymetrix GeneChip) showed contrasting transcriptional behaviour in response to dehydration stress. In addition, both genotypes showed a high expression of dehydration-associated genes, even under water-sufficient conditions. Several gene regulators were identified as potentially important. Some are already known, such as WRKY40, but others may also be considered, namely PRR7, ATAUX2-11, CONSTANS-like 1, MYB60, AGL-83, and a Zinc-finger protein. These data provide a basis for drought trait research in the bambara groundnut, which will facilitate functional genomics studies. An analysis of this dataset has identified that both genotypes appear to be in a dehydration-ready state, even in the absence of dehydration stress, and may have adapted in different ways to achieve drought resistance. This will help in understanding the mechanisms underlying the ability of crops to produce viable yields under drought conditions. In addition, cross-species hybridisation to the soybean microarray has been shown to be informative for investigating the bambara groundnut transcriptome. PMID:28420201

  7. Altered expression of testis-specific genes, piRNAs, and transposons in the silkworm ovary masculinized by a W chromosome mutation

    PubMed Central

    2012-01-01

    Background In the silkworm, Bombyx mori, femaleness is strongly controlled by the female-specific W chromosome. Originally, it was presumed that the W chromosome encodes female-determining gene(s), accordingly called Fem. However, to date, neither Fem nor any protein-coding gene has been identified from the W chromosome. Instead, the W chromosome is occupied with numerous transposon-related sequences. Interestingly, the silkworm W chromosome is a source of female-enriched PIWI-interacting RNAs (piRNAs). piRNAs are small RNAs of 23-30 nucleotides in length, which are required for controlling transposon activity in animal gonads. A recent study has identified a novel mutant silkworm line called KG, whose mutation in the W chromosome causes severe female masculinization. However, the molecular nature of KG line has not been well characterized yet. Results Here we molecularly characterize the KG line. Genomic PCR analyses using currently available W chromosome-specific PCR markers indicated that no large deletion existed in the KG W chromosome. Genetic analyses demonstrated that sib-crosses within the KG line suppressed masculinization. Masculinization reactivated when crossing KG females with wild type males. Importantly, the KG ovaries exhibited a significantly abnormal transcriptome. First, the KG ovaries misexpressed testis-specific genes. Second, a set of female-enriched piRNAs was downregulated in the KG ovaries. Third, several transposons were overexpressed in the KG ovaries. Conclusions Collectively, the mutation in the KG W chromosome causes broadly altered expression of testis-specific genes, piRNAs, and transposons. To our knowledge, this is the first study that describes a W chromosome mutant with such an intriguing phenotype. PMID:22452797

  8. Dynamic gene expression response to altered gravity in human T cells.

    PubMed

    Thiel, Cora S; Hauschild, Swantje; Huge, Andreas; Tauber, Svantje; Lauber, Beatrice A; Polzer, Jennifer; Paulsen, Katrin; Lier, Hartwin; Engelmann, Frank; Schmitz, Burkhard; Schütte, Andreas; Layer, Liliana E; Ullrich, Oliver

    2017-07-12

    We investigated the dynamics of immediate and initial gene expression response to different gravitational environments in human Jurkat T lymphocytic cells and compared expression profiles to identify potential gravity-regulated genes and adaptation processes. We used the Affymetrix GeneChip® Human Transcriptome Array 2.0 containing 44,699 protein coding genes and 22,829 non-protein coding genes and performed the experiments during a parabolic flight and a suborbital ballistic rocket mission to cross-validate gravity-regulated gene expression through independent research platforms and different sets of control experiments to exclude other factors than alteration of gravity. We found that gene expression in human T cells rapidly responded to altered gravity in the time frame of 20 s and 5 min. The initial response to microgravity involved mostly regulatory RNAs. We identified three gravity-regulated genes which could be cross-validated in both completely independent experiment missions: ATP6V1A/D, a vacuolar H+-ATPase (V-ATPase) responsible for acidification during bone resorption, IGHD3-3/IGHD3-10, diversity genes of the immunoglobulin heavy-chain locus participating in V(D)J recombination, and LINC00837, a long intergenic non-protein coding RNA. Due to the extensive and rapid alteration of gene expression associated with regulatory RNAs, we conclude that human cells are equipped with a robust and efficient adaptation potential when challenged with altered gravitational environments.

  9. A meta-data based method for DNA microarray imputation.

    PubMed

    Jörnsten, Rebecka; Ouyang, Ming; Wang, Hui-Yu

    2007-03-29

    DNA microarray experiments are conducted in logical sets, such as time course profiling after a treatment is applied to the samples, or comparisons of the samples under two or more conditions. Due to cost and design constraints of spotted cDNA microarray experiments, each logical set commonly includes only a small number of replicates per condition. Despite the vast improvement of the microarray technology in recent years, missing values are prevalent. Intuitively, imputation of missing values is best done using many replicates within the same logical set. In practice, there are few replicates and thus reliable imputation within logical sets is difficult. However, it is in the case of few replicates that the presence of missing values, and how they are imputed, can have the most profound impact on the outcome of downstream analyses (e.g. significance analysis and clustering). This study explores the feasibility of imputation across logical sets, using the vast amount of publicly available microarray data to improve imputation reliability in the small sample size setting. We download all cDNA microarray data of Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans from the Stanford Microarray Database. Through cross-validation and simulation, we find that, for all three species, our proposed imputation using data from public databases is far superior to imputation within a logical set, sometimes to an astonishing degree. Furthermore, the imputation root mean square error for significant genes is generally a lot less than that of non-significant ones. Since downstream analysis of significant genes, such as clustering and network analysis, can be very sensitive to small perturbations of estimated gene effects, it is highly recommended that researchers apply reliable data imputation prior to further analysis. Our method can also be applied to cDNA microarray experiments from other species, provided good reference data are available.

  10. Expression signature as a biomarker for prenatal diagnosis of trisomy 21.

    PubMed

    Volk, Marija; Maver, Aleš; Lovrečić, Luca; Juvan, Peter; Peterlin, Borut

    2013-01-01

    A universal biomarker panel with the potential to predict high-risk pregnancies or adverse pregnancy outcomes does not exist. Transcriptome analysis is a powerful tool to capture differentially expressed genes (DEG), which can be used as a biomarker, diagnostic, or predictive tool for various conditions in the prenatal setting. In search of a biomarker set for predicting high-risk pregnancies, we performed global expression profiling to find DEG in Ts21. Subsequently, we performed targeted validation and diagnostic performance evaluation on a larger group of case and control samples. Initially, transcriptomic profiles of 10 cultivated amniocyte samples with Ts21 and 9 with normal euploid constitution were determined using expression microarrays. Datasets from Ts21 transcriptomic studies from the GEO repository were incorporated. DEG were discovered using linear regression modelling and validated using RT-PCR quantification on an independent sample of 16 cases with Ts21 and 32 controls. Classification of Ts21 status based on expression profiling was performed using a supervised machine learning algorithm and evaluated using a leave-one-out cross-validation approach. Global gene expression profiling revealed significant expression changes between normal and Ts21 samples, which, in combination with data from previously performed Ts21 transcriptomic studies, were used to generate a multi-gene biomarker for Ts21 comprising 9 gene expression profiles. In addition to the biomarker's high performance in discriminating samples from global expression profiling, we were also able to show its discriminatory performance on the larger sample set 2, validated using an RT-PCR experiment (AUC=0.97), while its performance on data from previously published studies reached discriminatory AUC values of 1.00. Our results show that transcriptomic changes might potentially be used to discriminate trisomy of chromosome 21 in the prenatal setting. As expressional alterations reflect both causal and reactive cellular mechanisms, transcriptomic changes may thus have future potential in the diagnosis of a wide array of heterogeneous diseases that result from genetic disturbances.
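
    The evaluation scheme named in this abstract (leave-one-out cross-validation scored by AUC) can be sketched as follows. This is a minimal illustration only: the abstract does not name the supervised algorithm, so the nearest-centroid scorer, the function names, and the toy expression values are all assumptions.

```python
from statistics import mean


def centroid_score(train_x, train_y, x):
    """Hypothetical scorer: distance to control centroid minus distance to case centroid.

    Higher scores mean the sample looks more like the cases (label 1).
    """
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [mean(col) for col in zip(*rows)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    return dist(x, centroid(0)) - dist(x, centroid(1))


def leave_one_out_scores(xs, ys):
    """Score each sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(centroid_score(train_x, train_y, xs[i]))
    return scores


def auc(scores, ys):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy two-gene expression profiles (invented values): 3 cases, 3 controls.
toy_x = [[2.1, 1.9], [2.4, 2.2], [1.9, 2.5], [1.0, 1.1], [0.8, 1.3], [1.2, 0.9]]
toy_y = [1, 1, 1, 0, 0, 0]
toy_auc = auc(leave_one_out_scores(toy_x, toy_y), toy_y)
```

    Because each sample is scored by a model that never saw it, the resulting AUC is less optimistic than one computed on the training data, which matters with sample sizes as small as those reported here.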

  11. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine.

    PubMed

    Becnel, Lauren B; Ochsner, Scott A; Darlington, Yolanda F; McOwiti, Apollo; Kankanamge, Wasula H; Dehart, Michael; Naumov, Alexey; McKenna, Neil J

    2017-04-25

    We previously developed a web tool, Transcriptomine, to explore expression profiling data sets involving small-molecule or genetic manipulations of nuclear receptor signaling pathways. We describe advances in biocuration, query interface design, and data visualization that enhance the discovery of uncharacterized biology in these pathways using this tool. Transcriptomine currently contains about 45 million data points encompassing more than 2000 experiments in a reference library of nearly 550 data sets retrieved from public archives and systematically curated. To make the underlying data points more accessible to bench biologists, we classified experimental small molecules and gene manipulations into signaling pathways and experimental tissues and cell lines into physiological systems and organs. Incorporation of these mappings into Transcriptomine enables the user to readily evaluate tissue-specific regulation of gene expression by nuclear receptor signaling pathways. Data points from animal and cell model experiments and from clinical data sets elucidate the roles of nuclear receptor pathways in gene expression events accompanying various normal and pathological cellular processes. In addition, data sets targeting non-nuclear receptor signaling pathways highlight transcriptional cross-talk between nuclear receptors and other signaling pathways. We demonstrate with specific examples how data points that exist in isolation in individual data sets validate each other when connected and made accessible to the user in a single interface. In summary, Transcriptomine allows bench biologists to routinely develop research hypotheses, validate experimental data, or model relationships between signaling pathways, genes, and tissues. Copyright © 2017, American Association for the Advancement of Science.

  12. Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies.

    PubMed

    Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano

    2011-04-11

    In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question, to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches, allowing the direct comparison of different pathologies and different studies on a gene set activation level.

  13. Gene set analysis of purine and pyrimidine antimetabolites cancer therapies.

    PubMed

    Fridley, Brooke L; Batzler, Anthony; Li, Liang; Li, Fang; Matimba, Alice; Jenkins, Gregory D; Ji, Yuan; Wang, Liewei; Weinshilboum, Richard M

    2011-11-01

    Responses to therapies, either with regard to toxicities or efficacy, are expected to involve complex relationships of gene products within the same molecular pathway or functional gene set. Therefore, pathways or gene sets, as opposed to single genes, may better reflect the true underlying biology and may be more appropriate units for analysis of pharmacogenomic studies. Application of such methods to pharmacogenomic studies may enable the detection of more subtle effects of multiple genes in the same pathway that may be missed by assessing each gene individually. A gene set analysis of 3821 gene sets is presented assessing the association between basal messenger RNA expression and drug cytotoxicity using ethnically defined human lymphoblastoid cell lines for two classes of drugs: pyrimidines [gemcitabine (dFdC) and arabinoside] and purines [6-thioguanine and 6-mercaptopurine]. The gene set nucleoside-diphosphatase activity was found to be significantly associated with both dFdC and arabinoside, whereas gene set γ-aminobutyric acid catabolic process was associated with dFdC and 6-thioguanine. These gene sets were significantly associated with the phenotype even after adjusting for multiple testing. In addition, five associated gene sets were found in common between the pyrimidines and two gene sets for the purines (3',5'-cyclic-AMP phosphodiesterase activity and γ-aminobutyric acid catabolic process) with a P value of less than 0.0001. Functional validation was attempted with four genes each in gene sets for thiopurine and pyrimidine antimetabolites. All four genes selected from the pyrimidine gene sets (PSME3, CANT1, ENTPD6, ADRM1) were validated, but only one (PDE4D) was validated for the thiopurine gene sets. In summary, results from the gene set analysis of pyrimidine and purine therapies, used often in the treatment of various cancers, provide novel insight into the relationship between genomic variation and drug response.

  14. Development of a multiplex PCR assay for detection and discrimination of Theileria annulata and Theileria sergenti in cattle.

    PubMed

    Junlong, Liu; Li, Youquan; Liu, Aihong; Guan, Guiquan; Xie, Junren; Yin, Hong; Luo, Jianxun

    2015-07-01

    To construct a simple and efficient diagnostic assay for Theileria annulata and Theileria sergenti, a multiplex polymerase chain reaction (PCR) method was developed in this study. Following alignment of the related sequences, two primer sets were designed to specifically target the T. annulata cytochrome b (COB) gene and the T. sergenti internal transcribed spacer (ITS) sequences. The designed primers could react in one PCR system, generating amplification products of 818 and 393 base pairs for T. sergenti and T. annulata, respectively. Standard genomic DNA of both Theileria species was serially diluted tenfold to test sensitivity, while the specificity test confirmed that neither primer set cross-reacts with other Theileria and Babesia species. In addition, 378 field samples were used to evaluate the utility of the multiplex PCR assay for detecting infection with these pathogens. The detection results were compared with two other published PCR methods, which target the T. annulata COB gene and the T. sergenti major piroplasm surface protein (MPSP) gene, respectively. The developed multiplex PCR assay showed detection efficiency similar to that of the COB and MPSP PCRs, which indicates that this multiplex PCR may be a valuable assay for epidemiological studies of T. annulata and T. sergenti.

  15. Comparative study on gene set and pathway topology-based enrichment methods.

    PubMed

    Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

    2015-10-22

    Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. 
    Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps.
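
    To make the "pathway as a simple gene list" idea concrete, the sketch below computes a one-sided hypergeometric over-representation p-value, one common form of gene set enrichment test. The abstract does not name the three gene set methods that were compared, so this is an illustrative stand-in, and the function name and gene identifiers are hypothetical.

```python
from math import comb


def overrepresentation_p(universe, pathway, deg):
    """One-sided hypergeometric p-value for pathway genes among the DEG list.

    universe: all measured gene IDs
    pathway:  gene IDs annotated to the pathway (topology is ignored)
    deg:      differentially expressed gene IDs
    Returns P(overlap >= observed) under random draws without replacement.
    """
    universe = set(universe)
    pathway = set(pathway) & universe
    deg = set(deg) & universe
    n_universe, n_set, n_deg = len(universe), len(pathway), len(deg)
    observed = len(pathway & deg)
    total = comb(n_universe, n_deg)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_deg - k)
        for k in range(observed, min(n_set, n_deg) + 1)
    )
    return tail / total


# Hypothetical example: 10 measured genes, a 3-gene pathway, 3 DEG.
genes = [f"g{i}" for i in range(10)]
p_full_overlap = overrepresentation_p(genes, ["g0", "g1", "g2"], ["g0", "g1", "g2"])
p_no_overlap = overrepresentation_p(genes, ["g0", "g1", "g2"], ["g7", "g8", "g9"])
```

    When many pathways share genes, as the abstract notes for KEGG, the tests for different pathways are not independent, which is one reason overlapping pathways complicate the interpretation of such p-values.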

  16. University of California San Francisco (UCSF-2): Gene Expression Profiling of Normal Mouse Skin, Hras WT and Hras -/- | Office of Cancer Genomics

    Cancer.gov

    This data set contains the transcriptional profiles of 20 dorsal skin samples from eight-week-old mice. Mice were generated by crossing FVB/N to Mus spretus mice to generate F1 mice, and then crossing F1 mice back to the FVB/N strain. 10 FVB/N mice lacking Hras1 (aka HrasKO, Hras-/-) and 10 FVB/N mice with wild-type Hras1 were generated.

  17. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGES

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, when studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
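
    As a rough illustration of how per-gene-set P values can be turned into FDR-adjusted values, the following sketch applies the Benjamini-Hochberg step-up adjustment. The abstract does not say which FDR procedure MAVTgsa uses, so Benjamini-Hochberg is an assumption here, and the function name and example P values are invented.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Assumption: this is a generic FDR adjustment, not necessarily the
    procedure implemented in the MAVTgsa package.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P values for four gene sets.
example_q = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

    Gene sets whose adjusted value falls below a chosen threshold (for example 0.05) would then be reported as significant at that false discovery rate.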

  18. A robust prognostic signature for hormone-positive node-negative breast cancer.

    PubMed

    Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W

    2013-01-01

    Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.

  19. A robust prognostic signature for hormone-positive node-negative breast cancer

    PubMed Central

    2013-01-01

    Background Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    Conclusions RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773

  20. Genome-wide Meta-analyses of Breast, Ovarian and Prostate Cancer Association Studies Identify Multiple New Susceptibility Loci Shared by At Least Two Cancer Types

    PubMed Central

    Kar, Siddhartha P.; Beesley, Jonathan; Al Olama, Ali Amin; Michailidou, Kyriaki; Tyrer, Jonathan; Kote-Jarai, ZSofia; Lawrenson, Kate; Lindstrom, Sara; Ramus, Susan J.; Thompson, Deborah J.; Kibel, Adam S.; Dansonka-Mieszkowska, Agnieszka; Michael, Agnieszka; Dieffenbach, Aida K.; Gentry-Maharaj, Aleksandra; Whittemore, Alice S.; Wolk, Alicja; Monteiro, Alvaro; Peixoto, Ana; Kierzek, Andrzej; Cox, Angela; Rudolph, Anja; Gonzalez-Neira, Anna; Wu, Anna H.; Lindblom, Annika; Swerdlow, Anthony; Ziogas, Argyrios; Ekici, Arif B.; Burwinkel, Barbara; Karlan, Beth Y.; Nordestgaard, Børge G.; Blomqvist, Carl; Phelan, Catherine; McLean, Catriona; Pearce, Celeste Leigh; Vachon, Celine; Cybulski, Cezary; Slavov, Chavdar; Stegmaier, Christa; Maier, Christiane; Ambrosone, Christine B.; Høgdall, Claus K.; Teerlink, Craig C.; Kang, Daehee; Tessier, Daniel C.; Schaid, Daniel J.; Stram, Daniel O.; Cramer, Daniel W.; Neal, David E.; Eccles, Diana; Flesch-Janys, Dieter; Velez Edwards, Digna R.; Wokozorczyk, Dominika; Levine, Douglas A.; Yannoukakos, Drakoulis; Sawyer, Elinor J.; Bandera, Elisa V.; Poole, Elizabeth M.; Goode, Ellen L.; Khusnutdinova, Elza; Høgdall, Estrid; Song, Fengju; Bruinsma, Fiona; Heitz, Florian; Modugno, Francesmary; Hamdy, Freddie C.; Wiklund, Fredrik; Giles, Graham G.; Olsson, Håkan; Wildiers, Hans; Ulmer, Hans-Ulrich; Pandha, Hardev; Risch, Harvey A.; Darabi, Hatef; Salvesen, Helga B.; Nevanlinna, Heli; Gronberg, Henrik; Brenner, Hermann; Brauch, Hiltrud; Anton-Culver, Hoda; Song, Honglin; Lim, Hui-Yi; McNeish, Iain; Campbell, Ian; Vergote, Ignace; Gronwald, Jacek; Lubiński, Jan; Stanford, Janet L.; Benítez, Javier; Doherty, Jennifer A.; Permuth, Jennifer B.; Chang-Claude, Jenny; Donovan, Jenny L.; Dennis, Joe; Schildkraut, Joellen M.; Schleutker, Johanna; Hopper, John L.; Kupryjanczyk, Jolanta; Park, Jong Y.; Figueroa, Jonine; Clements, Judith A.; Knight, Julia A.; Peto, Julian; Cunningham, Julie M.; Pow-Sang, Julio; Batra, Jyotsna; Czene, Kamila; Lu, 
Karen H.; Herkommer, Kathleen; Khaw, Kay-Tee; Matsuo, Keitaro; Muir, Kenneth; Offitt, Kenneth; Chen, Kexin; Moysich, Kirsten B.; Aittomäki, Kristiina; Odunsi, Kunle; Kiemeney, Lambertus A.; Massuger, Leon F.A.G.; Fitzgerald, Liesel M.; Cook, Linda S.; Cannon-Albright, Lisa; Hooning, Maartje J.; Pike, Malcolm C.; Bolla, Manjeet K.; Luedeke, Manuel; Teixeira, Manuel R.; Goodman, Marc T.; Schmidt, Marjanka K.; Riggan, Marjorie; Aly, Markus; Rossing, Mary Anne; Beckmann, Matthias W.; Moisse, Matthieu; Sanderson, Maureen; Southey, Melissa C.; Jones, Michael; Lush, Michael; Hildebrandt, Michelle A. T.; Hou, Ming-Feng; Schoemaker, Minouk J.; Garcia-Closas, Montserrat; Bogdanova, Natalia; Rahman, Nazneen; Le, Nhu D.; Orr, Nick; Wentzensen, Nicolas; Pashayan, Nora; Peterlongo, Paolo; Guénel, Pascal; Brennan, Paul; Paulo, Paula; Webb, Penelope M.; Broberg, Per; Fasching, Peter A.; Devilee, Peter; Wang, Qin; Cai, Qiuyin; Li, Qiyuan; Kaneva, Radka; Butzow, Ralf; Kopperud, Reidun Kristin; Schmutzler, Rita K.; Stephenson, Robert A.; MacInnis, Robert J.; Hoover, Robert N.; Winqvist, Robert; Ness, Roberta; Milne, Roger L.; Travis, Ruth C.; Benlloch, Sara; Olson, Sara H.; McDonnell, Shannon K.; Tworoger, Shelley S.; Maia, Sofia; Berndt, Sonja; Lee, Soo Chin; Teo, Soo-Hwang; Thibodeau, Stephen N.; Bojesen, Stig E.; Gapstur, Susan M.; Kjær, Susanne Krüger; Pejovic, Tanja; Tammela, Teuvo L.J.; Dörk, Thilo; Brüning, Thomas; Wahlfors, Tiina; Key, Tim J.; Edwards, Todd L.; Menon, Usha; Hamann, Ute; Mitev, Vanio; Kosma, Veli-Matti; Setiawan, Veronica Wendy; Kristensen, Vessela; Arndt, Volker; Vogel, Walther; Zheng, Wei; Sieh, Weiva; Blot, William J.; Kluzniak, Wojciech; Shu, Xiao-Ou; Gao, Yu-Tang; Schumacher, Fredrick; Freedman, Matthew L.; Berchuck, Andrew; Dunning, Alison M.; Simard, Jacques; Haiman, Christopher A.; Spurdle, Amanda; Sellers, Thomas A.; Hunter, David J.; Henderson, Brian E.; Kraft, Peter; Chanock, Stephen J.; Couch, Fergus J.; Hall, Per; Gayther, Simon A.; Easton, 
Douglas F.; Chenevix-Trench, Georgia; Eeles, Rosalind; Pharoah, Paul D.P.; Lambrechts, Diether

    2016-01-01

    Breast, ovarian, and prostate cancers are hormone-related and may have a shared genetic basis but this has not been investigated systematically by genome-wide association (GWA) studies. Meta-analyses combining the largest GWA meta-analysis data sets for these cancers totaling 112,349 cases and 116,421 controls of European ancestry, all together and in pairs, identified at P < 10^-8 seven new cross-cancer loci: three associated with susceptibility to all three cancers (rs17041869/2q13/BCL2L11; rs7937840/11q12/INCENP; rs1469713/19p13/GATAD2A), two breast and ovarian cancer risk loci (rs200182588/9q31/SMC2; rs8037137/15q26/RCCD1), and two breast and prostate cancer risk loci (rs5013329/1p34/NSUN4; rs9375701/6q23/L3MBTL3). Index variants in five additional regions previously associated with only one cancer also showed clear association with a second cancer type. Cell-type specific expression quantitative trait locus and enhancer-gene interaction annotations suggested target genes with potential cross-cancer roles at the new loci. Pathway analysis revealed significant enrichment of death receptor signaling genes near loci with P < 10^-5 in the three-cancer meta-analysis. PMID:27432226
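
    For readers unfamiliar with how per-study association results are pooled, the sketch below shows a fixed-effect inverse-variance meta-analysis of one variant across studies. The abstract does not state which meta-analysis model was used, so this standard approach is an assumption, and the function name and effect sizes are invented.

```python
from math import erfc, sqrt


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates with inverse-variance weights.

    betas: per-study effect sizes (e.g. log odds ratios) for one variant
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # normal approximation
    return pooled_beta, pooled_se, z, p_two_sided


# Invented log odds ratios for one variant in two cancer data sets.
beta, se, z, p = fixed_effect_meta([0.1, 0.1], [0.02, 0.02])
```

    Neither invented study reaches a stringent genome-wide threshold on its own (each has z = 5), but the pooled z of about 7.07 does, which illustrates why combining data sets can reveal loci that single-cancer analyses miss.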

  1. Integrating genome-wide association study and expression quantitative trait loci data identifies multiple genes and gene set associated with neuroticism.

    PubMed

    Fan, Qianrui; Wang, Wenyu; Hao, Jingcan; He, Awen; Wen, Yan; Guo, Xiong; Wu, Cuiyan; Ning, Yujie; Wang, Xi; Wang, Sen; Zhang, Feng

    2017-08-01

    Neuroticism is a fundamental personality trait with a significant genetic determinant. To identify novel susceptibility genes for neuroticism, we conducted an integrative analysis of genomic and transcriptomic data from a genome-wide association study (GWAS) and an expression quantitative trait locus (eQTL) study. GWAS summary data were derived from published studies of neuroticism, involving a total of 170,906 subjects. An eQTL dataset containing 927,753 eQTLs was obtained from an eQTL meta-analysis of 5311 samples. Integrative analysis of GWAS and eQTL data was conducted with summary data-based Mendelian randomization (SMR) analysis software. To identify neuroticism-associated gene sets, the SMR analysis results were further subjected to gene set enrichment analysis (GSEA). The gene set annotation dataset (containing 13,311 annotated gene sets) of the GSEA Molecular Signatures Database was used. SMR single gene analysis identified 6 significant genes for neuroticism, including MSRA (p value = 2.27 × 10^-10), MGC57346 (p value = 6.92 × 10^-7), BLK (p value = 1.01 × 10^-6), XKR6 (p value = 1.11 × 10^-6), C17ORF69 (p value = 1.12 × 10^-6) and KIAA1267 (p value = 4.00 × 10^-6). Gene set enrichment analysis detected a significant association for the Chr8p23 gene set (false discovery rate = 0.033). Our results provide novel clues for studies of the genetic mechanism of neuroticism. Copyright © 2017. Published by Elsevier Inc.

  2. A Cross-Cancer Genetic Association Analysis of the DNA Repair and DNA Damage Signaling Pathways for Lung, Ovary, Prostate, Breast, and Colorectal Cancer.

    PubMed

    Scarbrough, Peter M; Weber, Rachel Palmieri; Iversen, Edwin S; Brhane, Yonathan; Amos, Christopher I; Kraft, Peter; Hung, Rayjean J; Sellers, Thomas A; Witte, John S; Pharoah, Paul; Henderson, Brian E; Gruber, Stephen B; Hunter, David J; Garber, Judy E; Joshi, Amit D; McDonnell, Kevin; Easton, Doug F; Eeles, Ros; Kote-Jarai, Zsofia; Muir, Kenneth; Doherty, Jennifer A; Schildkraut, Joellen M

    2016-01-01

    DNA damage is an established mediator of carcinogenesis, although genome-wide association studies (GWAS) have identified few significant loci. This cross-cancer site, pooled analysis was performed to increase the power to detect common variants of DNA repair genes associated with cancer susceptibility. We conducted a cross-cancer analysis of 60,297 single nucleotide polymorphisms, at 229 DNA repair gene regions, using data from the NCI Genetic Associations and Mechanisms in Oncology (GAME-ON) Network. Our analysis included data from 32 GWAS and 48,734 controls and 51,537 cases across five cancer sites (breast, colon, lung, ovary, and prostate). Because of the unavailability of individual data, data were analyzed at the aggregate level. Meta-analysis was performed using the Association analysis for SubSETs (ASSET) software. To test for genetic associations that might escape individual variant testing due to small effect sizes, pathway analysis of eight DNA repair pathways was performed using hierarchical modeling. We identified three susceptibility DNA repair genes, RAD51B (P < 5.09 × 10^-6), MSH5 (P < 5.09 × 10^-6), and BRCA2 (P = 5.70 × 10^-6). Hierarchical modeling identified several pleiotropic associations with cancer risk in the base excision repair, nucleotide excision repair, mismatch repair, and homologous recombination pathways. Only three susceptibility loci were identified, which had all been previously reported. In contrast, hierarchical modeling identified several pleiotropic cancer risk associations in key DNA repair pathways. Results suggest that many common variants in DNA repair genes are likely associated with cancer susceptibility through small effect sizes that do not meet stringent significance testing criteria. ©2015 American Association for Cancer Research.

  3. Prevalence and Characterization of Carbapenem-Resistant Enterobacteriaceae Isolated from Mulago National Referral Hospital, Uganda

    PubMed Central

    Okoche, Deogratius; Asiimwe, Benon B.; Katabazi, Fred Ashaba; Kato, Laban; Najjuka, Christine F.

    2015-01-01

    Introduction Carbapenemases have increasingly been reported in Enterobacteriaceae worldwide. Most carbapenemases are plasmid encoded; hence, resistance can easily spread. Carbapenem-resistant Enterobacteriaceae are reported to cause mortality in up to 50% of patients who acquire bloodstream infections. We set out to determine the burden of carbapenem resistance as well as establish the genes encoding carbapenemases in Enterobacteriaceae clinical isolates obtained from Mulago National Referral Hospital, Uganda. Methods This was a cross-sectional study with a total of 196 clinical isolates previously collected from pus swabs, urine, blood, sputum, tracheal aspirates, cervical swabs, endometrial aspirates, rectal swabs, vaginal swabs, ear swabs, products of conception, wound biopsy and amniotic fluid. All isolates were subjected to phenotypic carbapenemase screening using boronic acid-based inhibition, modified Hodge and EDTA double combined disk tests. In addition, all the isolates were subjected to a PCR assay to confirm the presence of carbapenemase-encoding genes. Results The study found a carbapenemase prevalence of 22.4% (44/196) in the isolates using phenotypic tests, with the genotypic prevalence slightly higher at 28.6% (56/196). Overall, the most prevalent gene was blaVIM (21, 10.7%), followed by blaOXA-48 (19, 9.7%), blaIMP (12, 6.1%), blaKPC (10, 5.1%) and blaNDM-1 (5, 2.6%). Among 56 isolates positive for 67 carbapenemase-encoding genes, Klebsiella pneumoniae was the species with the highest number (52.2%). Most (32/67, 47.7%) of these resistance genes were in bacteria isolated from pus swabs. Conclusion There is a high prevalence of carbapenemases and carbapenem-resistance encoding genes among third-generation cephalosporin-resistant Enterobacteriaceae in Uganda, indicating a danger of limited treatment options in this setting in the near future. PMID:26284519
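
    The prevalence figures reported in this abstract can be rechecked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers quoted above; the variable and function names are illustrative and do not come from the study.

```python
# Counts quoted in the abstract: 196 isolates in total, per-gene detections.
TOTAL_ISOLATES = 196
gene_counts = {
    "blaVIM": 21,
    "blaOXA-48": 19,
    "blaIMP": 12,
    "blaKPC": 10,
    "blaNDM-1": 5,
}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


phenotypic = percent(44, TOTAL_ISOLATES)   # phenotypic screening positives
genotypic = percent(56, TOTAL_ISOLATES)    # PCR-positive isolates
per_gene = {gene: percent(n, TOTAL_ISOLATES) for gene, n in gene_counts.items()}
total_genes = sum(gene_counts.values())    # the abstract reports 67 genes

print(phenotypic, genotypic, per_gene, total_genes)
```

    The per-gene counts sum to the 67 genes mentioned in the abstract, and each percentage uses the 196 isolates as the denominator.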

  4. Prediction of missing common genes for disease pairs using network based module separation on incomplete human interactome.

    PubMed

    Akram, Pakeeza; Liao, Li

    2017-12-06

    Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem to minimize network-based module separation from two subgraphs produced by mapping genes associated with each disease onto the interactome. Using cross-validation on more than 600 disease pairs, our method achieves a significantly higher average receiver operating characteristic (ROC) score of 0.95 compared with a baseline ROC score of 0.60 using randomized data. Missing common gene prediction aims to complete the gene set associated with comorbid diseases for a better understanding of biological intervention. It will also be useful for gene-targeted therapeutics related to comorbid diseases. This method can be further considered for prediction of missing edges to complete the subgraph associated with a disease pair.

  5. Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation

    PubMed Central

    Delorenzi, Mauro

    2014-01-01

    Background With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. Focus The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. Data We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. Methods We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data. PMID:24967636
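
    The bias described in this abstract can be illustrated with a toy simulation. The sketch below is not the study's setup: it uses a single simulated feature, a nearest-centroid rule as a stand-in classifier, plain (not nested) 5-fold cross-validation, and no true group effect at all, so any apparent accuracy comes from the batch shift alone. All names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(n_per_group, batch_shift, rng):
    """Return (value, label) pairs. There is no true group effect; group 1
    is fully confounded with a batch that adds `batch_shift` to the value."""
    return [
        (rng.gauss(0.0, 1.0) + batch_shift * label, label)
        for label in (0, 1)
        for _ in range(n_per_group)
    ]


def fit_centroids(train):
    """Nearest-centroid 'training': one mean per class label."""
    return {lab: statistics.fmean([x for x, y in train if y == lab]) for lab in (0, 1)}


def accuracy(centroids, data):
    """Fraction of samples whose nearest centroid matches their label."""
    hits = sum(
        min(centroids, key=lambda lab: abs(x - centroids[lab])) == y for x, y in data
    )
    return hits / len(data)


def cross_val_accuracy(data, k, rng):
    """Plain k-fold cross-validation estimate of accuracy."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit_centroids(train), folds[i]))
    return statistics.fmean(scores)


rng = random.Random(0)
confounded = simulate(100, 4.0, rng)    # batch perfectly aligned with group
independent = simulate(100, 0.0, rng)   # new data without the batch shift

cv_estimate = cross_val_accuracy(confounded, 5, rng)
independent_accuracy = accuracy(fit_centroids(confounded), independent)
print(f"cross-validated: {cv_estimate:.2f}, independent: {independent_accuracy:.2f}")
```

    Under these assumptions the cross-validated estimate is high because the classifier learns the batch shift, while accuracy on independent data stays near chance, which is the kind of optimistic bias the study investigates.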

  6. Optimized Probe Masking for Comparative Transcriptomics of Closely Related Species

    PubMed Central

    Poeschl, Yvonne; Delker, Carolin; Trenner, Jana; Ullrich, Kristian Karsten; Quint, Marcel; Grosse, Ivo

    2013-01-01

    Microarrays are commonly applied to study the transcriptome of specific species. However, many available microarrays are restricted to model organisms, and the design of custom microarrays for other species is often not feasible. Hence, transcriptomics approaches for non-model organisms as well as comparative transcriptomics studies among two or more species often make use of cost-intensive RNAseq studies or, alternatively, hybridize transcripts of a query species to a microarray of a closely related species. When analyzing these cross-species microarray expression data, differences in the transcriptome of the query species can cause problems, such as the following: (i) lower hybridization accuracy of probes due to mismatches or deletions, (ii) probes binding multiple transcripts of different genes, and (iii) probes binding transcripts of non-orthologous genes. So far, methods for (i) exist, but these neglect (ii) and (iii). Here, we propose an approach for comparative transcriptomics addressing problems (i) to (iii), which retains only transcript-specific probes binding transcripts of orthologous genes. We apply this approach to an Arabidopsis lyrata expression data set measured on a microarray designed for Arabidopsis thaliana, and compare it to two alternative approaches, a sequence-based approach and a genomic DNA hybridization-based approach. We investigate the number of retained probe sets, and we validate the resulting expression responses by qRT-PCR. We find that the proposed approach combines the benefit of sequence-based stringency and accuracy while allowing the expression analysis of many more genes than the alternative sequence-based approach. As an added benefit, the proposed approach requires probes to detect transcripts of orthologous genes only, which provides a superior base for biological interpretation of the measured expression responses. PMID:24260119

  7. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data.

    PubMed

    Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza

    2017-09-27

    Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
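
    The BLUP and ridge equivalence mentioned in this abstract can be written out in the Gaussian linear mixed model, which is a simpler special case than the paper's GLMM. The notation below (X, Z, u, lambda) is generic textbook notation assumed for illustration, not the paper's own.

```latex
% Gaussian special case with independent random effects (an assumption of this sketch)
y = X\beta + Zu + \varepsilon, \qquad
u \sim N(0, \sigma_u^2 I), \quad \varepsilon \sim N(0, \sigma_e^2 I)

% BLUP of u from the mixed model equations, given the estimate \hat\beta
\hat{u}_{\mathrm{BLUP}} = (Z^\top Z + \lambda I)^{-1} Z^\top (y - X\hat\beta),
\qquad \lambda = \sigma_e^2 / \sigma_u^2

% Ridge estimator with penalty \lambda on the same residualized response
\hat{u}_{\mathrm{ridge}}
  = \arg\min_u \; \lVert y - X\hat\beta - Zu \rVert^2 + \lambda \lVert u \rVert^2
  = (Z^\top Z + \lambda I)^{-1} Z^\top (y - X\hat\beta)
```

    In this special case the ridge penalty equals the ratio of the two variance components, so it can be obtained from the fitted mixed model rather than from a cross-validation search.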

  8. Fish and chips: Various methodologies demonstrate utility of a 16,006-gene salmonid microarray

    PubMed Central

    von Schalburg, Kristian R; Rise, Matthew L; Cooper, Glenn A; Brown, Gordon D; Gibbs, A Ross; Nelson, Colleen C; Davidson, William S; Koop, Ben F

    2005-01-01

    Background We have developed and fabricated a salmonid microarray containing cDNAs representing 16,006 genes. The genes spotted on the array have been stringently selected from Atlantic salmon and rainbow trout expressed sequence tag (EST) databases. The EST databases presently contain over 300,000 sequences from over 175 salmonid cDNA libraries derived from a wide variety of tissues and different developmental stages. In order to evaluate the utility of the microarray, a number of hybridization techniques and screening methods have been developed and tested. Results We have analyzed and evaluated the utility of a microarray containing 16,006 (16K) salmonid cDNAs in a variety of potential experimental settings. We quantified the amount of transcriptome binding that occurred in cross-species, organ complexity and intraspecific variation hybridization studies. We also developed a methodology to rapidly identify and confirm the contents of a bacterial artificial chromosome (BAC) library containing Atlantic salmon genomic DNA. Conclusion We validate and demonstrate the usefulness of the 16K microarray over a wide range of teleosts, even for transcriptome targets from species distantly related to salmonids. We show the potential of the use of the microarray in a variety of experimental settings through hybridization studies that examine the binding of targets derived from different organs and tissues. Intraspecific variation in transcriptome expression is evaluated and discussed. Finally, BAC hybridizations are demonstrated as a rapid and accurate means to identify gene content. PMID:16164747

  9. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in-house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross-validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
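
    A weighted-voting combination of classifiers can be sketched in a few lines. The abstract does not state how the weights are chosen, so the sketch below assumes each classifier is weighted by a hypothetical cross-validated accuracy; the function name, the weights and the 0.5 threshold are all illustrative.

```python
def weighted_vote(votes, weights, threshold=0.5):
    """Combine binary votes (True = 'responds to stress') into one call.

    Each vote counts in proportion to its classifier's weight. Returns the
    weighted fraction of positive votes and the resulting binary call.
    """
    if not votes or len(votes) != len(weights):
        raise ValueError("votes and weights must be non-empty and the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w for v, w in zip(votes, weights) if v) / total
    return score, score >= threshold


# Hypothetical example: three classifiers with assumed accuracies as weights.
example_votes = [True, False, True]
example_weights = [0.9, 0.6, 0.7]
score, call = weighted_vote(example_votes, example_weights)
print(f"weighted score = {score:.3f}, predicted stress-responsive = {call}")
```

    Weighting by held-out accuracy lets stronger classifiers dominate the call without discarding the weaker ones, which is one plausible reading of a weighted-voting scheme.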

  10. Learning a Markov Logic network for supervised gene regulatory network inference

    PubMed Central

    2013-01-01

    Background Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. Results We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate “regulates”, starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. Conclusions The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge. PMID:24028533

  11. Learning a Markov Logic network for supervised gene regulatory network inference.

    PubMed

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.

  12. A gene block causing cross-incompatibility hidden in wild and cultivated rice.

    PubMed Central

    Matsubara, Kazuki; Khin-Thidar; Sano, Yoshio

    2003-01-01

    Unidirectional cross-incompatibility was detected in advanced generations of backcrossing between wild (Oryza rufipogon) and cultivated (O. sativa) rice strains. The near-isogenic line (NIL) of T65wx (Japonica type) carrying an alien segment of chromosome 6 from a wild strain gave a reduced seed setting only when crossed with T65wx as the male. Cytological observations showed that abortion of hybrid seeds occurred as a consequence of a failure of early endosperm development followed by abnormalities in embryo development. The genetic basis of cross-incompatibility reactions in the female and male was investigated by testcrosses using recombinant inbred lines (RILs) that were established through dissecting the introgressed segments of wild and cultivated (Indica type) strains. The results revealed that the cross-incompatibility reaction was controlled by Cif in the female and by cim in the male. When the female plant with Cif was crossed with the male plant with cim, a failure of early endosperm development was observed in the hybrid zygotes. Among cultivars of O. sativa, cim was distributed predominantly in the Japonica type but not in the Indica type. In addition, a dominant suppressor, Su-Cif, which changes the reaction in the female from incompatible to compatible, was proposed to be present near the centromere of chromosome 6 of the Indica type. Further, the death of young F1 zygotes was controlled by the parental genotypes rather than by the genotype of the hybrid zygote itself since all three genes acted sporophytically, which strongly suggests an involvement of parent-of-origin effects. We discuss the results in relation to the origin of crossing barriers as well as their maintenance within the primary gene pool. PMID:14504241

  13. Chromosomal Organization of rRNA Operons in Bacillus subtilis

    PubMed Central

    Jarvis, E. D.; Widom, R. L.; LaFauci, G.; Setoguchi, Y.; Richter, I. R.; Rudner, R.

    1988-01-01

    Integrative mapping with vectors containing ribosomal DNA sequences was used to complete the mapping of the 10 rRNA gene sets in the endospore-forming bacterium Bacillus subtilis. Southern hybridizations allowed the assignment of nine operons to distinct BclI restriction fragments, and their genetic loci were identified by transductional crosses. Nine of the ten rRNA gene sets are located between 0 and 70° on the genomic map. In the region surrounding cysA14, two sets of closely spaced tandem clusters are present. The first (rrnJ and rrnW) is located between purA16 and cysA14, closely linked to the latter; the second (rrnI, rrnH and rrnG), previously mapped within this area, is located between attSPO2 and glpT6. The operons at or near the origin of replication (rrnO, rrnA and rrnJ, rrnW) represent "hot spots" of plasmid insertion. PMID:2465199

  14. Training set selection for the prediction of essential genes.

    PubMed

    Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng

    2014-01-01

    Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly-studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of the integrated training sets with the four criteria and with random selection. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.

  15. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  16. SYBR green-based real-time reverse transcription-PCR for typing and subtyping of all hemagglutinin and neuraminidase genes of avian influenza viruses and comparison to standard serological subtyping tests

    USGS Publications Warehouse

    Tsukamoto, K.; Javier, P.C.; Shishido, M.; Noguchi, D.; Pearce, J.; Kang, H.-M.; Jeong, O.M.; Lee, Y.-J.; Nakanishi, K.; Ashizawa, T.

    2012-01-01

    Continuing outbreaks of H5N1 highly pathogenic (HP) avian influenza virus (AIV) infections of wild birds and poultry worldwide emphasize the need for global surveillance of wild birds. To support the future surveillance activities, we developed a SYBR green-based, real-time reverse transcriptase PCR (rRT-PCR) for detecting nucleoprotein (NP) genes and subtyping 16 hemagglutinin (HA) and 9 neuraminidase (NA) genes simultaneously. Primers were improved by focusing on Eurasian or North American lineage genes; the number of mixed-base positions per primer was set to five or fewer, and the concentration of each primer set was optimized empirically. Also, 30 cycles of amplification of 1:10 dilutions of cDNAs from cultured viruses effectively reduced minor cross- or nonspecific reactions. Under these conditions, 346 HA and 345 NA genes of 349 AIVs were detected, with average sensitivities of NP, HA, and NA genes of 10^1.5, 10^2.3, and 10^3.1 50% egg infective doses, respectively. Utility of rRT-PCR for subtyping AIVs was compared with that of current standard serological tests by using 104 recent migratory duck virus isolates. As a result, all HA genes and 99% of the NA genes were genetically subtyped, while only 45% of HA genes and 74% of NA genes were serologically subtyped. Additionally, direct subtyping of AIVs in fecal samples was possible by 40 cycles of amplification: approximately 70% of HA and NA genes of NP gene-positive samples were successfully subtyped. This validation study indicates that rRT-PCR with optimized primers and reaction conditions is a powerful tool for subtyping varied AIVs in clinical and cultured samples. Copyright © 2012, American Society for Microbiology. All Rights Reserved.

  17. Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system

    PubMed Central

    Sunkin, Susan M.; Ng, Lydia; Lau, Chris; Dolbeare, Tim; Gilbert, Terri L.; Thompson, Carol L.; Hawrylycz, Michael; Dang, Chinh

    2013-01-01

    The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal. PMID:23193282

  18. CYP1A1, GCLC, AGT, AGTR1 gene-gene interactions in community-acquired pneumonia pulmonary complications.

    PubMed

    Salnikova, Lyubov E; Smelaya, Tamara V; Golubev, Arkadiy M; Rubanovich, Alexander V; Moroz, Viktor V

    2013-11-01

    This study was conducted to establish the possible contribution of functional gene polymorphisms in detoxification/oxidative stress and vascular remodeling pathways to community-acquired pneumonia (CAP) susceptibility in the case-control study (350 CAP patients, 432 control subjects) and to predisposition to the development of CAP complications in the prospective study. All subjects were genotyped for 16 polymorphic variants in the 14 genes of xenobiotics detoxification CYP1A1, AhR, GSTM1, GSTT1, ABCB1, redox-status SOD2, CAT, GCLC, and vascular homeostasis ACE, AGT, AGTR1, NOS3, MTHFR, VEGFα. Risk of pulmonary complications (PC) in the single locus analysis was associated with CYP1A1, GCLC and AGTR1 genes. Extra PC (toxic shock syndrome and myocarditis) were not associated with these genes. We evaluated gene-gene interactions using multi-factor dimensionality reduction, and cumulative gene risk score approaches. The final model which included >5 risk alleles in the CYP1A1 (rs2606345, rs4646903, rs1048943), GCLC, AGT, and AGTR1 genes was associated with pleuritis, empyema, acute respiratory distress syndrome, all PC and acute respiratory failure (ARF). We considered CYP1A1, GCLC, AGT, AGTR1 gene set using Set Distiller mode implemented in GeneDecks for discovering gene-set relations via the degree of sharing descriptors within a given gene set. N-acetylcysteine and oxygen were defined by Set Distiller as the best descriptors for the gene set associated in the present study with PC and ARF. Results of the study are in line with literature data and suggest that genetically determined oxidative stress exacerbation may contribute to the progression of lung inflammation.
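
    The cumulative gene risk score mentioned above can be illustrated with a minimal sketch. It assumes each genotype is coded as the number of risk alleles (0, 1 or 2) at each locus; the locus keys, the coding, and the function names are hypothetical, and the abstract does not state which allele is the risk allele at each site. Only the ">5 risk alleles" cut-off comes from the abstract.

```python
# Hypothetical locus keys: three CYP1A1 SNPs named in the abstract plus one
# (unspecified) variant each in GCLC, AGT and AGTR1.
RISK_LOCI = (
    "CYP1A1_rs2606345",
    "CYP1A1_rs4646903",
    "CYP1A1_rs1048943",
    "GCLC",
    "AGT",
    "AGTR1",
)


def risk_allele_count(genotype):
    """Sum the risk alleles (0, 1 or 2 per locus) over all loci in RISK_LOCI."""
    total = 0
    for locus in RISK_LOCI:
        n_alleles = genotype[locus]
        if n_alleles not in (0, 1, 2):
            raise ValueError(f"{locus}: expected 0, 1 or 2 risk alleles, got {n_alleles!r}")
        total += n_alleles
    return total


def is_high_risk(genotype, threshold=5):
    """Flag a subject carrying more than `threshold` risk alleles in total."""
    return risk_allele_count(genotype) > threshold


# Made-up subject, for illustration only.
subject = {
    "CYP1A1_rs2606345": 2,
    "CYP1A1_rs4646903": 1,
    "CYP1A1_rs1048943": 1,
    "GCLC": 1,
    "AGT": 1,
    "AGTR1": 0,
}
print(risk_allele_count(subject), is_high_risk(subject))
```

    An unweighted count treats every locus as contributing equally; a weighted score would need per-locus effect sizes, which the abstract does not report.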

  19. Cadmium exposure and the epigenome

    PubMed Central

    Sanders, Alison P; Smeester, Lisa; Rojas, Daniel; DeBussycher, Tristan; Wu, Michael C; Wright, Fred A; Zhou, Yi-Hui; Laine, Jessica E; Rager, Julia E; Swamy, Geeta K; Ashley-Koch, Allison; Lynn Miranda, Marie; Fry, Rebecca C

    2014-01-01

    Cadmium (Cd) is prevalent in the environment yet understudied as a developmental toxicant. Cd partially crosses the placental barrier from mother to fetus and is linked to detrimental effects in newborns. Here we examine the relationship between levels of Cd during pregnancy and 5-methylcytosine (5mC) levels in leukocyte DNA collected from 17 mother-newborn pairs. The methylation of cytosines is an epigenetic mechanism known to impact transcriptional signaling and influence health endpoints. A methylated cytosine-guanine (CpG) island recovery assay was used to assess over 4.6 million sites spanning 16,421 CpG islands. Exposure to Cd was classified for each mother-newborn pair according to maternal blood levels and compared with levels of cotinine. Subsets of genes were identified that showed altered DNA methylation levels in their promoter regions in fetal DNA associated with levels of Cd (n = 61), cotinine (n = 366), or both (n = 30). Likewise, in maternal DNA, differentially methylated genes were identified that were associated with Cd (n = 92) or cotinine (n = 134) levels. While the gene sets were largely distinct between maternal and fetal DNA, functional similarities at the biological pathway level were identified including an enrichment of genes that encode for proteins that control transcriptional regulation and apoptosis. Furthermore, conserved DNA motifs with sequence similarity to specific transcription factor binding sites were identified within the CpG islands of the gene sets. This study provides evidence for distinct patterns of DNA methylation or “footprints” in fetal and maternal DNA associated with exposure to Cd. PMID:24169490

  20. A TALE of shrimps: Genome-wide survey of homeobox genes in 120 species from diverse crustacean taxa.

    PubMed

    Chang, Wai Hoong; Lai, Alvina G

    2018-01-01

    The homeodomain-containing proteins are an important group of transcription factors found in most eukaryotes including animals, plants and fungi. Homeobox genes are responsible for a wide range of critical developmental and physiological processes, ranging from embryonic development, innate immune homeostasis to whole-body regeneration. With continued fascination on this key class of proteins by developmental and evolutionary biologists, multiple efforts have thus far focused on the identification and characterization of homeobox orthologs from key model organisms in attempts to infer their evolutionary origin and how this underpins the evolution of complex body plans. Despite their importance, the genetic complement of homeobox genes has yet to be described in one of the most valuable groups of animals representing economically important food crops. With crustacean aquaculture being a growing industry worldwide, it is clear that systematic and cross-species identification of crustacean homeobox orthologs is necessary in order to harness this genetic circuitry for the improvement of aquaculture sustainability. Using publicly available transcriptome data sets, we identified a total of 4183 putative homeobox genes from 120 crustacean species that include food crop species, such as lobsters, shrimps, crayfish and crabs. Additionally, we identified 717 homeobox orthologs from 6 other non-crustacean arthropods, which include the scorpion, deer tick, mosquitoes and centipede. This high confidence set of homeobox genes will now serve as a key resource to the broader community for future functional and comparative genomics studies.

  1. Signal signature and transcriptome changes of Arabidopsis during pathogen and insect attack.

    PubMed

    De Vos, Martin; Van Oosten, Vivian R; Van Poecke, Remco M P; Van Pelt, Johan A; Pozo, Maria J; Mueller, Martin J; Buchala, Antony J; Métraux, Jean-Pierre; Van Loon, L C; Dicke, Marcel; Pieterse, Corné M J

    2005-09-01

    Plant defenses against pathogens and insects are regulated differentially by cross-communicating signaling pathways in which salicylic acid (SA), jasmonic acid (JA), and ethylene (ET) play key roles. To understand how plants integrate pathogen- and insect-induced signals into specific defense responses, we monitored the dynamics of SA, JA, and ET signaling in Arabidopsis after attack by a set of microbial pathogens and herbivorous insects with different modes of attack. Arabidopsis plants were exposed to a pathogenic leaf bacterium (Pseudomonas syringae pv. tomato), a pathogenic leaf fungus (Alternaria brassicicola), tissue-chewing caterpillars (Pieris rapae), cell-content-feeding thrips (Frankliniella occidentalis), or phloem-feeding aphids (Myzus persicae). Monitoring the signal signature in each plant-attacker combination showed that the kinetics of SA, JA, and ET production varies greatly in both quantity and timing. Analysis of global gene expression profiles demonstrated that the signal signature characteristic of each Arabidopsis-attacker combination is orchestrated into a surprisingly complex set of transcriptional alterations in which, in all cases, stress-related genes are overrepresented. Comparison of the transcript profiles revealed that consistent changes induced by pathogens and insects with very different modes of attack can show considerable overlap. Of all consistent changes induced by A. brassicicola, Pieris rapae, and F. occidentalis, more than 50% also were induced consistently by P. syringae. Notably, although these four attackers all stimulated JA biosynthesis, the majority of the changes in JA-responsive gene expression were attacker specific. All together, our study shows that SA, JA, and ET play a primary role in the orchestration of the plant's defense response, but other regulatory mechanisms, such as pathway cross-talk or additional attacker-induced signals, eventually shape the highly complex attacker-specific defense response.

  2. Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

    PubMed Central

    Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

    2012-01-01

    Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum–shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F2 mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5. PMID:22690373

  3. Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes

    PubMed Central

    Premzl, Marko

    2015-01-01

    Using eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and d-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks of future experiments of two eutherian gene data sets were proposed. PMID:25941635

  4. Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment.

    PubMed

    Severgnini, Marco; Bicciato, Silvio; Mangano, Eleonora; Scarlatti, Francesca; Mezzelani, Alessandra; Mattioli, Michela; Ghidoni, Riccardo; Peano, Clelia; Bonnal, Raoul; Viti, Federica; Milanesi, Luciano; De Bellis, Gianluca; Battaglia, Cristina

    2006-06-01

    Meta-analysis of microarray data is increasingly important, considering both the availability of multiple platforms using disparate technologies and the accumulation in public repositories of data sets from different laboratories. We addressed the issue of comparing gene expression profiles from two microarray platforms by devising a standardized investigative strategy. We tested this procedure by studying MDA-MB-231 cells, which undergo apoptosis on treatment with resveratrol. Gene expression profiles were obtained using high-density, short-oligonucleotide, single-color microarray platforms: GeneChip (Affymetrix) and CodeLink (Amersham). Interplatform analyses were carried out on 8414 common transcripts represented on both platforms, as identified by LocusLink ID, representing 70.8% and 88.6% of annotated GeneChip and CodeLink features, respectively. We identified 105 differentially expressed genes (DEGs) on CodeLink and 42 DEGs on GeneChip. Among them, only 9 DEGs were commonly identified by both platforms. Multiple analyses (BLAST alignment of probes with target sequences, gene ontology, literature mining, and quantitative real-time PCR) permitted us to investigate the factors contributing to the generation of platform-dependent results in single-color microarray experiments. An effective approach to cross-platform comparison involves microarrays of similar technologies, samples prepared by identical methods, and a standardized battery of bioinformatic and statistical analyses.
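
    To put the reported overlap in context, the following sketch computes the overlap expected by chance and a hypergeometric tail probability for the figures quoted above (8414 common transcripts, 105 and 42 DEGs, 9 shared). It assumes that both DEG lists were drawn from the 8414 common transcripts, which the abstract implies but does not state outright. The function names are illustrative and the calculation is not part of the authors' published strategy.

```python
from math import comb


def overlap_pvalue(universe, size_a, size_b, shared):
    """P(overlap >= shared) when list B (size_b genes) is drawn at random,
    without replacement, from a universe that contains list A (size_a genes)."""
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(universe, size_b)


def jaccard(size_a, size_b, shared):
    """Jaccard index of two lists given their sizes and the size of their overlap."""
    return shared / (size_a + size_b - shared)


# Figures quoted in the abstract.
UNIVERSE, CODELINK_DEGS, GENECHIP_DEGS, SHARED = 8414, 105, 42, 9

expected_by_chance = CODELINK_DEGS * GENECHIP_DEGS / UNIVERSE
print(f"expected overlap by chance: {expected_by_chance:.2f}")
print(f"Jaccard index: {jaccard(CODELINK_DEGS, GENECHIP_DEGS, SHARED):.3f}")
print(f"hypergeometric tail: {overlap_pvalue(UNIVERSE, CODELINK_DEGS, GENECHIP_DEGS, SHARED):.2e}")
```

    Under these assumptions, nine shared genes is more than chance alone would be expected to produce, yet it remains a small fraction of either list, which is consistent with the abstract's emphasis on platform-dependent results.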

  5. Meiotic gene-conversion rate and tract length variation in the human genome.

    PubMed

    Padhukasahasram, Badri; Rannala, Bruce

    2013-02-27

    Meiotic recombination occurs in the form of two different mechanisms called crossing-over and gene-conversion and both processes have an important role in shaping genetic variation in populations. Although variation in crossing-over rates has been studied extensively using sperm-typing experiments, pedigree studies and population genetic approaches, our knowledge of variation in gene-conversion parameters (ie, rates and mean tract lengths) remains far from complete. To explore variability in population gene-conversion rates and its relationship to crossing-over rate variation patterns, we have developed and validated using coalescent simulations a comprehensive Bayesian full-likelihood method that can jointly infer crossing-over and gene-conversion rates as well as tract lengths from population genomic data under general variable rate models with recombination hotspots. Here, we apply this new method to SNP data from multiple human populations and attempt to characterize for the first time the fine-scale variation in gene-conversion parameters along the human genome. We find that the estimated ratio of gene-conversion to crossing-over rates varies considerably across genomic regions as well as between populations. However, there is a great degree of uncertainty associated with such estimates. We also find substantial evidence for variation in the mean conversion tract length. The estimated tract lengths did not show any negative relationship with the local heterozygosity levels in our analysis. European Journal of Human Genetics advance online publication, 27 February 2013; doi:10.1038/ejhg.2013.30.

  6. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
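
    A generic gene-sampling test can make the role of the absolute statistic concrete. The sketch below scores a gene set by the mean of its gene-level statistics and builds the null distribution from random gene sets of the same size. It is a simplified illustration under assumed inputs (a dictionary of per-gene scores such as t-statistics), not the specific procedure evaluated in the paper. All names and the toy data are hypothetical.

```python
import random
from statistics import mean


def gene_sampling_pvalue(scores, gene_set, n_perm=2000, use_abs=True, seed=0):
    """Empirical p-value for a gene set under a gene-sampling null.

    scores   : dict mapping gene -> gene-level statistic (e.g. a t-statistic)
    gene_set : iterable of genes; genes missing from `scores` are ignored
    use_abs  : if True, score genes by |statistic| so that up- and
               down-regulated members both count towards the set score
    """
    rng = random.Random(seed)
    values = {g: (abs(s) if use_abs else s) for g, s in scores.items()}
    members = [g for g in gene_set if g in values]
    if not members:
        raise ValueError("gene set shares no genes with the scored genes")
    observed = mean(values[g] for g in members)
    pool = list(values.values())
    # Null distribution: set scores of random gene sets of the same size.
    hits = sum(
        mean(rng.sample(pool, len(members))) >= observed for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy data: 94 background genes with small scores, plus a six-gene set whose
# members change strongly but in opposite directions.
toy_scores = {f"g{i}": ((i % 7) - 3) * 0.2 for i in range(6, 100)}
toy_set = [f"g{i}" for i in range(6)]
toy_scores.update({g: (3.0 if i % 2 == 0 else -3.0) for i, g in enumerate(toy_set)})

print("absolute statistic:", gene_sampling_pvalue(toy_scores, toy_set, use_abs=True))
print("signed statistic:  ", gene_sampling_pvalue(toy_scores, toy_set, use_abs=False))
```

    In the toy data the signed scores of the set cancel out, so only the absolute statistic detects the bidirectional change. This shows the motivation for the absolute statistic; it does not reproduce the paper's false-positive comparison.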

  7. Epistasis in intra- and inter-gene pool crosses of the common bean.

    PubMed

    Borel, J C; Ramalho, M A P; Abreu, A F B

    2016-02-26

    Epistasis has been shown to have an important role in the genetic control of several quantitative traits in the common bean. This study aimed to investigate the occurrence of epistasis in intra- and inter-pool gene crosses of the common bean. Four elite lines adapted to Brazilian conditions were used as parents, two from the Andean gene pool (ESAL 686; BRS Radiante) and two from the Mesoamerican gene pool (BRSMG Majestoso; BRS Valente). Four F2 populations were obtained: "A" (ESAL 686 x BRS Radiante), "B" (BRSMG Majestoso x BRS Valente), "C" (BRS Radiante x BRSMG Majestoso), and "D" (BRS Valente x ESAL 686). A random sample of F2 plants from each population was backcrossed to parents and F1 individuals, according to the triple test cross. Three types of progenies from each population were evaluated in contiguous trials. Seed yield and 100-seed weight were evaluated. Dominance genetic variance was predominant in most cases. However, the estimates of genetic variance may be biased by the occurrence of linkage disequilibrium and epistasis. Epistasis was detected for both traits; however, the occurrence differed among the populations and between the two traits. The results of this study reinforce the hypothesis that epistasis is present in the genetic control of traits in the common bean and suggest that the phenomenon is more frequent in inter-gene pool crosses than in intra-gene pool crosses.

  8. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

    PubMed Central

    Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

    2008-01-01

    Background: Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results: We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion: EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects. PMID:18402700

  9. EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries.

    PubMed

    Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P

    2008-04-10

    Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.

  10. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  11. The molecular systematics of blowflies and screwworm flies (Diptera: Calliphoridae) using 28S rRNA, COX1 and EF-1α: insights into the evolution of dipteran parasitism.

    PubMed

    McDonagh, Laura M; Stevens, Jamie R

    2011-11-01

    The Calliphoridae include some of the most economically significant myiasis-causing flies in the world - blowflies and screwworm flies - with many being notorious for their parasitism of livestock. However, despite more than 50 years of research, key taxonomic relationships within the family remain unresolved. This study utilizes nucleotide sequence data from the protein-coding genes COX1 (mitochondrial) and EF1α (nuclear), and the 28S rRNA (nuclear) gene, from 57 blowfly taxa to improve resolution of key evolutionary relationships within the family Calliphoridae. Bayesian phylogenetic inference was carried out for each single-gene data set, demonstrating significant topological difference between the three gene trees. Nevertheless, all gene trees supported a Calliphorinae-Luciliinae subfamily sister-lineage, with respect to Chrysomyinae. In addition, this study also elucidates the taxonomic and evolutionary status of several less well-studied groups, including the genus Bengalia (either within Calliphoridae or as a separate sister-family), genus Onesia (as a sister-genera to, or sub-genera within, Calliphora), genus Dyscritomyia and Lucilia bufonivora, a specialised parasite of frogs and toads. The occurrence of cross-species hybridisation within Calliphoridae is also further explored, focusing on the two economically significant species Lucilia cuprina and Lucilia sericata. In summary, this study represents the most comprehensive molecular phylogenetic analysis of family Calliphoridae undertaken to date.

  12. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    PubMed

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
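
    The idea of finding genes common to multiple gene sets can be sketched in a few lines. The example below counts how many sets each gene appears in and ranks genes by that recurrence. It is a plain-Python illustration with made-up set names and placeholder gene symbols. It does not use GeneWeaver's tools or API, and it ignores the cross-species homology mapping that a real multispecies comparison would need.

```python
from collections import Counter


def rank_by_recurrence(gene_sets):
    """Return (gene, n_sets) pairs, most widely shared genes first.

    gene_sets: dict mapping a set name to an iterable of gene symbols.
    Ties are broken alphabetically so the ranking is reproducible.
    """
    counts = Counter(
        gene for genes in gene_sets.values() for gene in set(genes)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def common_genes(gene_sets, min_sets):
    """Genes that occur in at least `min_sets` of the supplied gene sets."""
    return {gene for gene, n in rank_by_recurrence(gene_sets) if n >= min_sets}


# Hypothetical gene sets with placeholder symbols, for illustration only.
example_sets = {
    "qtl_candidates": ["GeneA", "GeneB", "GeneC", "GeneD"],
    "expression_study": ["GeneB", "GeneC", "GeneE"],
    "curated_annotation": ["GeneC", "GeneB", "GeneF", "GeneB"],
}

print(rank_by_recurrence(example_sets)[:3])
print(sorted(common_genes(example_sets, min_sets=3)))
```

    Each set is deduplicated before counting, so a gene listed twice in one set still contributes only once to its recurrence.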

  13. The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks

    PubMed Central

    Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex

    2010-01-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062

  14. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks.

    PubMed

    Ficklin, Stephen P; Luo, Feng; Feltus, F Alex

    2010-09-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.

  15. Chronic Antibody-Mediated Rejection in Nonhuman Primate Renal Allografts: Validation of Human Histological and Molecular Phenotypes.

    PubMed

    Adam, B A; Smith, R N; Rosales, I A; Matsunami, M; Afzali, B; Oura, T; Cosimi, A B; Kawai, T; Colvin, R B; Mengel, M

    2017-11-01

    Molecular testing represents a promising adjunct for the diagnosis of antibody-mediated rejection (AMR). Here, we apply a novel gene expression platform in sequential formalin-fixed paraffin-embedded samples from nonhuman primate (NHP) renal transplants. We analyzed 34 previously described gene transcripts related to AMR in humans in 197 archival NHP samples, including 102 from recipients that developed chronic AMR, 80 from recipients without AMR, and 15 normal native nephrectomies. Three endothelial genes (VWF, DARC, and CAV1), derived from 10-fold cross-validation receiver operating characteristic curve analysis, demonstrated excellent discrimination between AMR and non-AMR samples (area under the curve = 0.92). This three-gene set correlated with classic features of AMR, including glomerulitis, capillaritis, glomerulopathy, C4d deposition, and DSAs (r = 0.39-0.63, p < 0.001). Principal component analysis confirmed the association between three-gene set expression and AMR and highlighted the ambiguity of v lesions and ptc lesions between AMR and T cell-mediated rejection (TCMR). Elevated three-gene set expression corresponded with the development of immunopathological evidence of rejection and often preceded it. Many recipients demonstrated mixed AMR and TCMR, suggesting that this represents the natural pattern of rejection. These data provide NHP animal model validation of recent updates to the Banff classification including the assessment of molecular markers for diagnosing AMR. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.
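
    The area under the ROC curve reported above can be read as the probability that a randomly chosen AMR sample scores higher than a randomly chosen non-AMR sample. The sketch below computes it directly from that pairwise definition. The way the three genes are combined into one score (a simple mean) is an assumption made here for illustration, since the abstract does not describe the scoring rule. The expression values are invented, and the 10-fold cross-validation step is not reproduced.

```python
def three_gene_score(sample):
    """Hypothetical score: mean expression of VWF, DARC and CAV1 in one sample."""
    return (sample["VWF"] + sample["DARC"] + sample["CAV1"]) / 3


def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented expression values, for illustration only.
amr_samples = [
    {"VWF": 9.0, "DARC": 8.0, "CAV1": 10.0},
    {"VWF": 7.0, "DARC": 6.0, "CAV1": 8.0},
]
non_amr_samples = [
    {"VWF": 5.0, "DARC": 4.0, "CAV1": 6.0},
    {"VWF": 8.0, "DARC": 7.0, "CAV1": 6.0},
]

auc = roc_auc(
    [three_gene_score(s) for s in amr_samples],
    [three_gene_score(s) for s in non_amr_samples],
)
print(f"AUC on toy data: {auc:.2f}")
```

    The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred biopsies; a rank-based implementation would scale better.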

  16. Potential gene flow from transgenic rice (Oryza sativa L.) to different weedy rice (Oryza sativa f. spontanea) accessions based on reproductive compatibility.

    PubMed

    Song, Xiaoling; Liu, Linli; Wang, Zhou; Qiang, Sheng

    2009-08-01

    The possibility of gene flow from transgenic crops to wild relatives may be affected by reproductive capacity between them. The potential gene flow from two transgenic rice lines containing the bar gene to five accessions of weedy rice (WR1-WR5) was determined through examination of reproductive compatibility under controlled pollination. The pollen grain germination of two transgenic rice lines on the stigma of all weedy rice, rice pollen tube growth down the style and entry into the weedy rice ovary were similar to self-pollination in weedy rice. However, delayed double fertilisation and embryo abortion in crosses between WR2 and Y0003 were observed. Seed sets between transgenic rice lines and weedy rice varied from 8 to 76%. Although repeated pollination increased seed set significantly, the rank of the seed set between the weedy rice accessions and rice lines was not changed. The germination rates of F(1) hybrids were similar or greater compared with respective females. All F(1) plants expressed glufosinate resistance in the presence of glufosinate selection pressure. The frequency of gene flow between different weedy rice accessions and transgenic herbicide-resistant rice may differ owing to different reproductive compatibility. This result suggests that, when wild relatives are selected as experimental materials for assessing the gene flow of transgenic rice, it is necessary to address the compatibility between transgenic rice and wild relatives.

  17. Phylogenetics and evolution of Su(var)3-9 SET genes in land plants: rapid diversification in structure and function.

    PubMed

    Zhu, Xinyu; Ma, Hong; Chen, Zhiduan

    2011-03-09

    Plants contain numerous Su(var)3-9 homologues (SUVH) and related (SUVR) genes, some of which await functional characterization. Although there have been studies on the evolution of plant Su(var)3-9 SET genes, a systematic evolutionary study including major land plant groups has not been reported. Large-scale phylogenetic and evolutionary analyses can help to elucidate the underlying molecular mechanisms and contribute to improve genome annotation. Putative orthologs of plant Su(var)3-9 SET protein sequences were retrieved from major representatives of land plants. A novel clustering that included most members analyzed, henceforth referred to as core Su(var)3-9 homologues and related (cSUVHR) gene clade, was identified as well as all orthologous groups previously identified. Our analysis showed that plant Su(var)3-9 SET proteins possessed a variety of domain organizations, and can be classified into five types and ten subtypes. Plant Su(var)3-9 SET genes also exhibit a wide range of gene structures among different paralogs within a family, even in the regions encoding conserved PreSET and SET domains. We also found that the majority of SUVH members were intronless and formed three subclades within the SUVH clade. A detailed phylogenetic analysis of the plant Su(var)3-9 SET genes was performed. A novel deep phylogenetic relationship including most plant Su(var)3-9 SET genes was identified. Additional domains such as SAR, ZnF_C2H2 and WIYLD were early integrated into primordial PreSET/SET/PostSET domain organization. At least three classes of gene structures had been formed before the divergence of Physcomitrella patens (moss) from other land plants. One or multiple retroposition events might have occurred among SUVH genes with the donor genes leading to the V-2 orthologous group. The structural differences among evolutionary groups of plant Su(var)3-9 SET genes with different functions were described, contributing to the design of further experimental studies.

  18. Developing an Apicomplexan DNA Barcoding System to Detect Blood Parasites of Small Coral Reef Fishes.

    PubMed

    Renoux, Lance P; Dolan, Maureen C; Cook, Courtney A; Smit, Nico J; Sikkel, Paul C

    2017-08-01

    Apicomplexan parasites are obligate parasites of many species of vertebrates. To date, there is very limited understanding of these parasites in the most-diverse group of vertebrates, actinopterygian fishes. While DNA barcoding targeting the eukaryotic 18S small subunit rRNA gene sequence has been useful in identifying apicomplexans in tetrapods, identification of apicomplexans infecting fishes has relied solely on morphological identification by microscopy. In this study, a DNA barcoding method was developed that uses primers targeting the 18S rRNA gene to identify apicomplexans parasitizing certain actinopterygian fishes. A lead primer set was selected that showed no cross-reactivity to the overwhelmingly abundant host DNA and successfully confirmed 37 of the 41 (90.2%) microscopically verified parasitized fish blood samples analyzed in this study. Furthermore, this DNA barcoding method identified 4 additional samples that screened negative for parasitemia, suggesting this molecular method may provide improved sensitivity over morphological characterization by microscopy. In addition, this PCR screening method for fish apicomplexans, using Whatman FTA preserved DNA, was tested in efforts leading to a more simplified field collection, transport, and sample storage method as well as streamlined sample processing, which is important for DNA barcoding of large sample sets.

  19. QTL analysis on rice grain appearance quality, as exemplifying the typical events of transgenic or backcrossing breeding

    PubMed Central

    Yan, Bao; Liu, Rongjia; Li, Yibo; Wang, Yan; Gao, Guanjun; Zhang, Qinglu; Liu, Xing; Jiang, Gonghao; He, Yuqing

    2014-01-01

    Rice grain shape and yield are usually controlled by multiple quantitative trait loci (QTL). This study used a set of F9–10 recombinant inbred lines (RILs) derived from a cross of Huahui 3 (Bt/Xa21) and Zhongguoxiangdao, and detected 27 QTLs on ten rice chromosomes. Among them, twelve QTLs responsible for grain shape and/or yield were mostly reproducibly detected and had not been reported before. Interestingly, the two known genes involved in the materials, one the insect-resistance Bt gene and the other the disease-resistance Xa21 gene, were found to be closely linked to the QTLs responsible for grain shape and weight. The Bt fragment insertion was first mapped to chromosome 10 in Huahui 3 and may disrupt grain-related QTLs, resulting in weaker yield performance in transgenic plants. The introgression of the Xa21 gene by backcrossing from donor material into the receptor Minghui 63 may also contain a donor linkage drag which included minor-effect QTL alleles positively affecting grain shape and yield. The QTL analysis on rice grain appearance quality exemplified the typical events of transgenic or backcrossing breeding. The QTL findings in this study will in the future facilitate gene isolation and breeding applications for improvement of rice grain shape and yield. PMID:25320558

  20. QTL analysis on rice grain appearance quality, as exemplifying the typical events of transgenic or backcrossing breeding.

    PubMed

    Yan, Bao; Liu, Rongjia; Li, Yibo; Wang, Yan; Gao, Guanjun; Zhang, Qinglu; Liu, Xing; Jiang, Gonghao; He, Yuqing

    2014-09-01

    Rice grain shape and yield are usually controlled by multiple quantitative trait loci (QTL). This study used a set of F9-10 recombinant inbred lines (RILs) derived from a cross of Huahui 3 (Bt/Xa21) and Zhongguoxiangdao, and detected 27 QTLs on ten rice chromosomes. Among them, twelve QTLs responsible for grain shape and/or yield were mostly reproducibly detected and had not been reported before. Interestingly, the two known genes involved in the materials, one the insect-resistance Bt gene and the other the disease-resistance Xa21 gene, were found to be closely linked to the QTLs responsible for grain shape and weight. The Bt fragment insertion was first mapped to chromosome 10 in Huahui 3 and may disrupt grain-related QTLs, resulting in weaker yield performance in transgenic plants. The introgression of the Xa21 gene by backcrossing from donor material into the receptor Minghui 63 may also contain a donor linkage drag which included minor-effect QTL alleles positively affecting grain shape and yield. The QTL analysis on rice grain appearance quality exemplified the typical events of transgenic or backcrossing breeding. The QTL findings in this study will in the future facilitate gene isolation and breeding applications for improvement of rice grain shape and yield.

  1. Metallo-Beta-Lactamase Producing Pseudomonas aeruginosa in a Healthcare Setting in Alexandria, Egypt.

    PubMed

    Abaza, Amani F; El Shazly, Soraya A; Selim, Heba S A; Aly, Gehan S A

    2017-09-27

    Pseudomonas aeruginosa has emerged as a major healthcare associated pathogen that creates a serious public health disaster in both developing and developed countries. In this work we aimed at studying the occurrence of metallo-beta-lactamase (MBL) producing P. aeruginosa in a healthcare setting in Alexandria, Egypt. This cross sectional study included 1583 clinical samples that were collected from patients admitted to Alexandria University Students' Hospital. P. aeruginosa isolates were identified using standard microbiological methods and were tested for their antimicrobial susceptibility patterns using single disc diffusion method according to the Clinical and Laboratory Standards Institute recommendations. Thirty P. aeruginosa isolates were randomly selected and tested for their MBL production by both phenotypic and genotypic methods. Diagnostic Epsilometer test was done to detect metallo-beta-lactamase enzyme producers and polymerase chain reaction test was done to detect imipenemase (IMP), Verona integron-encoded (VIM) and Sao Paulo metallo-beta-lactamase (SPM) encoding genes. Of the 1583 clinical samples, 175 (11.3%) P. aeruginosa isolates were identified. All the 30 (100%) selected P. aeruginosa isolates that were tested for MBL production by Epsilometer test were found to be positive; where 19 (63.3%) revealed blaSPM gene and 11 (36.7%) had blaIMP gene. blaVIM gene was not detected in any of the tested isolates. Isolates of MBL producing P. aeruginosa were highly susceptible to polymyxin B 26 (86.7%) and highly resistant to amikacin 26 (86.7%). MBL producers were detected phenotypically by Epsilometer test in both carbapenem susceptible and resistant P. aeruginosa isolates. blaSPM was the most commonly detected MBL gene in P. aeruginosa isolates.

  2. A 10-Gene Classifier for Indeterminate Thyroid Nodules: Development and Multicenter Accuracy Study

    PubMed Central

    González, Hernán E.; Martínez, José R.; Vargas-Salas, Sergio; Solar, Antonieta; Veliz, Loreto; Cruz, Francisco; Arias, Tatiana; Loyola, Soledad; Horvath, Eleonora; Tala, Hernán; Traipe, Eufrosina; Meneses, Manuel; Marín, Luis; Wohllk, Nelson; Diaz, René E.; Véliz, Jesús; Pineda, Pedro; Arroyo, Patricia; Mena, Natalia; Bracamonte, Milagros; Miranda, Giovanna; Bruce, Elsa

    2017-01-01

    Background: In most of the world, diagnostic surgery remains the most frequent approach for indeterminate thyroid cytology. Although several molecular tests are available for testing in centralized commercial laboratories in the United States, there are no available kits for local laboratory testing. The aim of this study was to develop a prototype in vitro diagnostic (IVD) gene classifier for the further characterization of nodules with an indeterminate thyroid cytology. Methods: In a first stage, the expression of 18 genes was determined by quantitative polymerase chain reaction (qPCR) in a broad histopathological spectrum of 114 fresh-tissue biopsies. Expression data were used to train several classifiers by supervised machine learning approaches. Classifiers were tested in an independent set of 139 samples. In a second stage, the best classifier was chosen as a model to develop a multiplexed-qPCR IVD prototype assay, which was tested in a prospective multicenter cohort of fine-needle aspiration biopsies. Results: In tissue biopsies, the best classifier, using only 10 genes, reached an optimal and consistent performance in the ninefold cross-validated testing set (sensitivity 93% and specificity 81%). In the multicenter cohort of fine-needle aspiration biopsy samples, the 10-gene signature, built into a multiplexed-qPCR IVD prototype, showed an area under the curve of 0.97, a positive predictive value of 78%, and a negative predictive value of 98%. By Bayes' theorem, the IVD prototype is expected to achieve a positive predictive value of 64–82% and a negative predictive value of 97–99% in patients with a cancer prevalence range of 20–40%. Conclusions: A new multiplexed-qPCR IVD prototype is reported that accurately classifies thyroid nodules and may provide a future solution suitable for local reference laboratory testing. PMID:28521616
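
    The Bayes' theorem step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `predictive_values` is hypothetical, and the inputs are the tissue-biopsy sensitivity and specificity (93% and 81%) quoted above. The sensitivity and specificity of the fine-needle aspiration cohort are not stated in the abstract, so the printed values are not expected to reproduce the reported predictive-value ranges.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    All three arguments are proportions in [0, 1].
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (assumption): tissue-biopsy figures from the abstract.
SENSITIVITY, SPECIFICITY = 0.93, 0.81

for prevalence in (0.20, 0.30, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

    The sketch makes the dependence on prevalence explicit: for fixed sensitivity and specificity, the positive predictive value rises and the negative predictive value falls as cancer prevalence increases.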

  3. Absence of Measles Virus Detection from Stapes of Patients with Otosclerosis.

    PubMed

    Flores-García, María de Lourdes; Colín-Castro, Claudia Adriana; Hernández-Palestina, Mario Sabas; Sánchez-Larios, Roberto; Franco-Cendejas, Rafael

    2018-01-01

    Objective To determine molecularly the presence of measles virus genetic material in the stapes of patients with otosclerosis. Study Design A cross-sectional study. Setting A tertiary referral hospital. Subjects and Methods Genetic material was extracted from the stapes of patients with otosclerosis (n = 93) during the period from March 2011 to April 2012. The presence of viral measles sequences was evaluated by the real-time reverse transcriptase polymerase chain reaction (RT-PCR). The expression of the CD46 gene was determined. Results Ninety-three patients were included in the study. No sample was positive for any of 3 measles virus genes (H, N, and F). Measles virus RNA was not detected in any sample by real-time RT-PCR. CD46 levels were positive in 3.3% (n = 3) and negative in 96.7% (n = 90). Conclusion This study does not support the theory of measles virus as the cause of otosclerosis. It is necessary to do more research about other causal theories to clarify its etiology and prevention.

  4. Probable secondary transmission of antimicrobial-resistant Escherichia coli between people living with and without pets

    PubMed Central

    CHUNG, Yeon Soo; PARK, Young Kyung; PARK, Yong Ho; PARK, Kun Taek

    2017-01-01

    Companion animals are considered one of the reservoirs of antimicrobial-resistant (AR) bacteria that can be cross-transmitted to humans. However, limited information is available on the possibility of AR bacteria originating from companion animals being transmitted secondarily from owners to non-owners sharing the same space. To address this issue, the present study investigated clonal relatedness among AR E. coli isolated from dog owners and non-owners in the same college classroom or household. Anal samples (n=48) were obtained from 14 owners and 34 non-owners; 31 E. coli isolates were collected (nine from owners and 22 from non-owners). Of 31 E. coli, 20 isolates (64.5%) were resistant to at least one antimicrobial, and 16 isolates (51.6%) were determined as multi-drug resistant E. coli. Six isolates (19.4%) harbored integrase genes (five harbored the class 1 integrase gene and one harbored the class 2 integrase gene). Pulsed-field gel electrophoretic analysis identified three different E. coli clonal sets among isolates, indicating that cross-transmission of AR E. coli can easily occur between owners and non-owners. The findings emphasize a potential risk of spread of AR bacteria originating from pets within human communities, once they are transferred to humans. Further studies are needed to evaluate the exact risk and identify the risk factors of secondary transmission by investigating larger numbers of isolates from pets, their owners and non-owners in a community. PMID:28190823

  5. Probable secondary transmission of antimicrobial-resistant Escherichia coli between people living with and without pets.

    PubMed

    Chung, Yeon Soo; Park, Young Kyung; Park, Yong Ho; Park, Kun Taek

    2017-03-18

    Companion animals are considered one of the reservoirs of antimicrobial-resistant (AR) bacteria that can be cross-transmitted to humans. However, limited information is available on the possibility of AR bacteria originating from companion animals being transmitted secondarily from owners to non-owners sharing the same space. To address this issue, the present study investigated clonal relatedness among AR E. coli isolated from dog owners and non-owners in the same college classroom or household. Anal samples (n=48) were obtained from 14 owners and 34 non-owners; 31 E. coli isolates were collected (nine from owners and 22 from non-owners). Of 31 E. coli, 20 isolates (64.5%) were resistant to at least one antimicrobial, and 16 isolates (51.6%) were determined as multi-drug resistant E. coli. Six isolates (19.4%) harbored integrase genes (five harbored the class 1 integrase gene and one harbored the class 2 integrase gene). Pulsed-field gel electrophoretic analysis identified three different E. coli clonal sets among isolates, indicating that cross-transmission of AR E. coli can easily occur between owners and non-owners. The findings emphasize a potential risk of spread of AR bacteria originating from pets within human communities, once they are transferred to humans. Further studies are needed to evaluate the exact risk and identify the risk factors of secondary transmission by investigating larger numbers of isolates from pets, their owners and non-owners in a community.

  6. Transcriptome analysis of genes and gene networks involved in aggressive behavior in mouse and zebrafish.

    PubMed

    Malki, Karim; Du Rietz, Ebba; Crusio, Wim E; Pain, Oliver; Paya-Cano, Jose; Karadaghi, Rezhaw L; Sluyter, Frans; de Boer, Sietse F; Sandnabba, Kenneth; Schalkwyk, Leonard C; Asherson, Philip; Tosto, Maria Grazia

    2016-09-01

    Despite moderate heritability estimates, the molecular architecture of aggressive behavior remains poorly characterized. This study compared gene expression profiles from a genetic mouse model of aggression with zebrafish, an animal model traditionally used to study aggression. A meta-analytic, cross-species approach was used to identify genomic variants associated with aggressive behavior. The Rankprod algorithm was used to evaluate mRNA differences from prefrontal cortex tissues of three sets of mouse lines (N = 18) selectively bred for low and high aggressive behavior (SAL/LAL, TA/TNA, and NC900/NC100). The same approach was used to evaluate mRNA differences in zebrafish (N = 12) exposed to aggressive or non-aggressive social encounters. Results were compared to uncover genes consistently implicated in aggression across both studies. Seventy-six genes were differentially expressed (PFP < 0.05) in aggressive compared to non-aggressive mice. Seventy genes were differentially expressed in zebrafish exposed to a fight encounter compared to isolated zebrafish. Seven genes (Fos, Dusp1, Hdac4, Ier2, Bdnf, Btg2, and Nr4a1) were differentially expressed across both species, five of which belong to a gene network centred on the c-Fos gene hub. Network analysis revealed an association with the MAPK signaling cascade. In human studies HDAC4 haploinsufficiency is a key genetic mechanism associated with brachydactyly mental retardation syndrome (BDMR), which is associated with aggressive behaviors. Moreover, the HDAC4 receptor is a drug target for valproic acid, which is being employed as an effective pharmacological treatment for aggressive behavior in geriatric, psychiatric, and brain-injury patients. © 2016 Wiley Periodicals, Inc.
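
    The rank-product statistic underlying the Rankprod approach named above can be sketched as follows. This is a simplified illustration under stated assumptions, not the RankProd package itself: the function name `rank_products` and the gene identifiers are hypothetical, ties are broken alphabetically, and the permutation step that yields the percentage of false positives (PFP) is omitted.

```python
from math import prod


def rank_products(fold_changes):
    """Geometric mean of per-replicate ranks for each gene.

    fold_changes maps a gene identifier to a list of fold changes, one per
    replicate comparison. Rank 1 is the most up-regulated gene in a replicate,
    so a small rank product marks a gene that is consistently near the top.
    """
    genes = sorted(fold_changes)
    n_reps = len(fold_changes[genes[0]])
    ranks = {gene: [] for gene in genes}
    for rep in range(n_reps):
        # Stable sort: genes with equal fold change keep alphabetical order.
        ordered = sorted(genes, key=lambda g: fold_changes[g][rep], reverse=True)
        for position, gene in enumerate(ordered, start=1):
            ranks[gene].append(position)
    return {gene: prod(ranks[gene]) ** (1.0 / n_reps) for gene in genes}


# Hypothetical fold changes for three placeholder genes in two replicates.
example = {"g1": [3.0, 2.5], "g2": [1.0, 3.0], "g3": [0.5, 0.2]}
for gene, rp in sorted(rank_products(example).items(), key=lambda kv: kv[1]):
    print(gene, round(rp, 3))
```

    In the published method, significance is assessed by comparing observed rank products with those obtained from permuted data; that step would be added on top of this function.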

  7. Genetic and physical fine mapping of the novel brown midrib gene bm6 in maize (Zea mays L.) to a 180 kb region on chromosome 2.

    PubMed

    Chen, Yongsheng; Liu, Hongjun; Ali, Farhad; Scott, M Paul; Ji, Qing; Frei, Ursula Karoline; Lübberstedt, Thomas

    2012-10-01

    Brown midrib mutants in maize are known to be associated with reduced lignin content and increased cell wall digestibility, which leads to better forage quality and higher efficiency of cellulosic biomass conversion into ethanol. Four well known brown midrib (bm) mutants, named bm1-4, were identified several decades ago. Additional recessive brown midrib mutants have been identified by allelism tests and designated as bm5 and bm6. In this study, we determined that bm6 increases cell wall digestibility and decreases plant height. bm6 was confirmed to be located on the short arm of chromosome 2 using a small mapping set of 181 plants from an F(2) segregating population, derived from crossing B73 and a bm6 mutant line. Subsequently, 960 brown midrib individuals were selected from the same but larger F(2) population for genetic and physical mapping. With newly developed markers in the target region, the bm6 gene was assigned to a 180 kb interval flanked by markers SSR_308337 and SSR_488638. In this region, ten gene models are predicted in the maize B73 sequence. Analysis of these ten genes as well as genes in the syntenic rice region revealed that four of them are promising candidate genes for bm6. Our study will facilitate isolation of the underlying gene of bm6 and advance our understanding of brown midrib gene functions.

  8. Transcriptional profile of isoproterenol-induced cardiomyopathy and comparison to exercise-induced cardiac hypertrophy and human cardiac failure

    PubMed Central

    2009-01-01

    Background Isoproterenol-induced cardiac hypertrophy in mice has been used in a number of studies to model human cardiac disease. In this study, we compared the transcriptional response of the heart in this model to other animal models of heart failure, as well as to the transcriptional response of human hearts suffering heart failure. Results We performed microarray analyses on RNA from mice with isoproterenol-induced cardiac hypertrophy and mice with exercise-induced physiological hypertrophy and identified 865 and 2,534 genes that were significantly altered in pathological and physiological cardiac hypertrophy models, respectively. We compared our results to 18 different microarray data sets (318 individual arrays) representing various other animal models and four human cardiac diseases and identified a canonical set of 64 genes that are generally altered in failing hearts. We also produced a pairwise similarity matrix to illustrate relatedness of animal models with human heart disease and identified ischemia as the human condition that most resembles isoproterenol treatment. Conclusion The overall patterns of gene expression are consistent with observed structural and molecular differences between normal and maladaptive cardiac hypertrophy and support a role for the immune system (or immune cell infiltration) in the pathology of stress-induced hypertrophy. Cross-study comparisons such as the results presented here provide targets for further research of cardiac disease that might generally apply to maladaptive cardiac stresses and are also a means of identifying which animal models best recapitulate human disease at the transcriptional level. PMID:20003209
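
    A pairwise similarity matrix between gene sets, of the kind described above, can be sketched as follows. The abstract does not say which similarity measure was used, so the Jaccard index here is an assumption chosen for simplicity; the model names, gene identifiers, and function names are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(gene_sets):
    """Return {(name_i, name_j): similarity} for every ordered pair of named sets."""
    names = sorted(gene_sets)
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x in names
        for y in names
    }


# Placeholder data (assumption): not taken from the study.
altered_genes = {
    "model_A": {"G1", "G2", "G3"},
    "model_B": {"G2", "G3", "G4"},
    "human_condition": {"G3", "G4", "G5", "G6"},
}

matrix = similarity_matrix(altered_genes)
names = sorted(altered_genes)
for x in names:
    print(x.ljust(16), " ".join(f"{matrix[(x, y)]:.2f}" for y in names))
```

    Ranking the entries in the row for a given animal model then indicates which human condition shares the most altered genes with it under this measure.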

  9. Reproductive isolation between populations of Iris atropurpurea is associated with ecological differentiation

    PubMed Central

    Yardeni, Gil; Tessler, Naama; Imbert, Eric; Sapir, Yuval

    2016-01-01

    Background and Aims Speciation is often described as a continuous dynamic process, expressed by different magnitudes of reproductive isolation (RI) among groups in different levels of divergence. Studying intraspecific partial RI can shed light on mechanisms underlying processes of population divergence. Intraspecific divergence can be driven by spatially stochastic accumulation of genetic differences following reduced gene flow, resulting in increased RI with increased geographical distance, or by local adaptation, resulting in increased RI with environmental difference. Methods We tested for RI as a function of both geographical distance and ecological differentiation in Iris atropurpurea, an endemic Israeli coastal plant. We crossed plants in the Netanya Iris Reserve population with plants from 14 populations across the species’ full distribution, and calculated RI and reproductive success based on fruit set, seed set and fraction of seed viability. Key Results We found that total RI was not significantly associated with geographical distance, but significantly increased with ecological distance. Similarly, reproductive success of the crosses, estimated while controlling for the dependency of each component on the previous stage, significantly reduced with increased ecological distance. Conclusions Our results indicate that the rise of post-pollination reproductive barriers in I. atropurpurea is more affected by ecological differentiation between populations than by geographical distance, supporting the hypothesis that ecological differentiation is predominant over isolation by distance and by reduced gene flow in this species. These findings also affect conservation management, such as genetic rescue, in the highly fragmented and endangered I. atropurpurea. PMID:27436798

  10. Multifractal detrended cross correlation analysis of neuro-degenerative diseases-An in depth study

    NASA Astrophysics Data System (ADS)

    Dutta, Srimonti; Ghosh, Dipak; Chatterjee, Sucharita

    2018-02-01

    This work revisits our previous study on human gait diseases (Dutta et al., 2013), where we studied the autocorrelation of human gait patterns in normal and diseased sets. A significant difference in results was observed between the normal and diseased sets. However, we were not able to distinguish between the sets for Parkinson's and Huntington's disease. In this paper we attempt to study whether cross correlations between the two feet in human gait patterns can help to distinguish between different diseased sets. The results reveal that the study of cross correlations can help to distinguish between Parkinson's and Huntington's disease.

  11. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

    PubMed

    Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin

    2011-04-14

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
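
    One way a Fisher's exact test could turn gene-to-gene similarities into a cohesion p-value is sketched below. The abstract does not give the construction of the contingency table, so this layout (gene pairs inside versus outside the set, similarity above versus below a threshold) is an assumption, as are the function names, the threshold, and the toy similarity values.

```python
from itertools import combinations
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Probability, under the hypergeometric null with fixed margins, of a
    top-left cell count of at least a.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def cohesion_p_value(similarity, genes, all_genes, threshold):
    """Hypothetical LPv-style score for one gene set.

    similarity maps an alphabetically ordered gene pair to a score; missing
    pairs count as 0.0. Pairs are cross-classified by membership in the gene
    set and by whether their score reaches the threshold.
    """
    members = set(genes)
    in_hi = in_lo = out_hi = out_lo = 0
    for g1, g2 in combinations(sorted(all_genes), 2):
        high = similarity.get((g1, g2), 0.0) >= threshold
        inside = g1 in members and g2 in members
        if inside and high:
            in_hi += 1
        elif inside:
            in_lo += 1
        elif high:
            out_hi += 1
        else:
            out_lo += 1
    return fisher_exact_greater(in_hi, in_lo, out_hi, out_lo)


# Toy similarities (assumption): placeholder genes A-D, not real LSI output.
toy = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7}
print(cohesion_p_value(toy, ["A", "B", "C"], ["A", "B", "C", "D"], threshold=0.5))
```

    A small p-value indicates that highly similar pairs are concentrated within the gene set more than expected by chance, which is the sense of functional cohesion described in the abstract.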

  12. The Comparative Toxicogenomics Database (CTD): A Resource for Comparative Toxicological Studies

    PubMed Central

    Mattingly, CJ; Rosenstein, MC; Colby, GT; Forrest, JN; Boyer, JL

    2006-01-01

    The etiology of most chronic diseases involves interactions between environmental factors and genes that modulate important biological processes (Olden and Wilson, 2000). We are developing the publicly available Comparative Toxicogenomics Database (CTD) to promote understanding about the effects of environmental chemicals on human health. CTD identifies interactions between chemicals and genes and facilitates cross-species comparative studies of these genes. The use of diverse animal models and cross-species comparative sequence studies has been critical for understanding basic physiological mechanisms and gene and protein functions. Similarly, these approaches will be valuable for exploring the molecular mechanisms of action of environmental chemicals and the genetic basis of differential susceptibility. PMID:16902965

  13. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
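
    The collate-and-count step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `rank_by_recurrence`, the `min_count` parameter, and the study labels are hypothetical, and the membership of each gene in each toy data set is invented for the example. Only the gene symbols themselves come from the abstract.

```python
from collections import Counter


def rank_by_recurrence(data_sets, min_count=2):
    """Rank genes by the number of data sets in which they appear.

    data_sets maps a study label to an iterable of gene symbols. Symbols are
    upper-cased as a crude standardisation step, and each gene is counted at
    most once per data set.
    """
    counts = Counter()
    for genes in data_sets.values():
        counts.update({symbol.upper() for symbol in genes})
    ranked = [(gene, n) for gene, n in counts.items() if n >= min_count]
    # Most frequently detected first; alphabetical order breaks ties.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy input (assumption): study labels and memberships are invented.
toy_data_sets = {
    "study_1": ["IRF7", "IFI27"],
    "study_2": ["irf7", "OAS3"],
    "study_3": ["IRF7", "IFI27", "IRF1"],
}

for gene, n in rank_by_recurrence(toy_data_sets):
    print(gene, n)
```

    Counting each gene once per data set keeps a single large study from dominating the ranking, which matches the stated assumption that detection across multiple studies is what signals importance.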

  14. When Genomes Collide: Aberrant Seed Development Following Maize Interploidy Crosses

    PubMed Central

    Pennington, Paul D.; Costa, Liliana M.; Gutierrez-Marcos, Jose F.; Greenland, Andy J.; Dickinson, Hugh G.

    2008-01-01

    Background and Aims The results of wide- or interploidy crosses in angiosperms are unpredictable and often lead to seed abortion. The consequences of reciprocal interploidy crosses have been explored in maize in detail, focusing on alterations to tissue domains in the maize endosperm, and changes in endosperm-specific gene expression. Methods Following reciprocal interploidy crosses between diploid and tetraploid maize lines, development of endosperm domains was studied using GUS reporter lines, and gene expression in resulting kernels was investigated using semi-quantitative RT-PCR on endosperms isolated at different stages of development. Key Results Reciprocal interploidy crosses result in very small, largely infertile seeds with defective endosperms. Seeds with maternal genomic excess are smaller than those with paternal genomic excess, their endosperms cellularize earlier and they accumulate significant quantities of starch. Endosperms from the reciprocal cross undergo an extended period of cell proliferation, and accumulate little starch. Analysis of reporter lines and gene expression studies confirm that functional domains of the endosperm are severely disrupted, and are modified differently according to the direction of the interploidy cross. Conclusions Interploidy crosses affect factors which regulate the balance between cell proliferation and cell differentiation within the endosperm. In particular, unbalanced crosses in maize affect transfer cell differentiation, and lead to the temporal deregulation of the ontogenic programme of endosperm development. PMID:18276791

  15. The mycoheterotrophic symbiosis between orchids and mycorrhizal fungi possesses major components shared with mutualistic plant-mycorrhizal symbioses.

    PubMed

    Miura, Chihiro; Yamaguchi, Katsushi; Miyahara, Ryohei; Yamamoto, Tatsuki; Fuji, Masako; Yagame, Takahiro; Imaizumi-Anraku, Haruko; Yamato, Masahide; Shigenobu, Shuji; Kaminaka, Hironori

    2018-04-12

    Achlorophyllous and early developmental stages of chlorophyllous orchids are highly dependent on carbon and other nutrients provided by mycorrhizal fungi, in a nutritional mode termed mycoheterotrophy. Previous findings have implied that some common properties at least partially underlie the mycorrhizal symbioses of mycoheterotrophic orchids and that of autotrophic arbuscular mycorrhizal (AM) plants; however, information about the molecular mechanisms of the relationship between orchids and their mycorrhizal fungi is limited. In this study, we characterized the molecular basis of an orchid-mycorrhizal (OM) symbiosis by analyzing the transcriptome of Bletilla striata at an early developmental stage associated with the mycorrhizal fungus Tulasnella sp. The essential components required for the establishment of mutual symbioses with AM fungi and/or rhizobia in most terrestrial plants were identified from the B. striata gene set. A cross-species gene complementation analysis showed that one of the component genes, the calcium and calmodulin-dependent protein kinase gene CCaMK in B. striata, retains functional characteristics of that in AM plants. The expression analysis revealed the activation of homologs of AM-related genes during the OM symbiosis. Our results suggest that orchids possess, at least partly, the molecular mechanisms common to AM plants.

  16. Biological Networks for Predicting Chemical Hepatocarcinogenicity Using Gene Expression Data from Treated Mice and Relevance across Human and Rat Species

    PubMed Central

    Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.

    2013-01-01

    Background Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
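    The AUC values quoted in this record (0.66, 0.74, 0.91) can be read as the probability that a randomly chosen positive chemical receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly from labels and scores. It is a minimal illustration, not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical, and the random-forest model itself is not reproduced.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties contribute one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = tumor-positive chemical, 0 = negative.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the cross-validated value of 0.66 is described only as reasonable.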

  17. Biological networks for predicting chemical hepatocarcinogenicity using gene expression data from treated mice and relevance across human and rat species.

    PubMed

    Thomas, Reuben; Thomas, Russell S; Auerbach, Scott S; Portier, Christopher J

    2013-01-01

    Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. The objectives were to develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for their ability to extrapolate results across multiple species.

  18. Global identification of genes regulated by estrogen signaling and demethylation in MCF-7 breast cancer cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Putnik, Milica, E-mail: milica.putnik@ki.se; Zhao, Chunyan, E-mail: chunyan.zhao@ki.se; Gustafsson, Jan-Ake, E-mail: jan-ake.gustafsson@ki.se

    Highlights: Estrogen signaling and demethylation can both control gene expression in breast cancers. Cross-talk between these mechanisms is investigated in human MCF-7 breast cancer cells. 137 genes are influenced by both 17β-estradiol and demethylating agent 5-aza-2′-deoxycytidine. A set of genes is identified as targets of both estrogen signaling and demethylation. There is no direct molecular interplay of mediators of estrogen and epigenetic signaling. Abstract: Estrogen signaling and epigenetic modifications, in particular DNA methylation, are involved in regulation of gene expression in breast cancers. Here we investigated a potential regulatory cross-talk between these two pathways by identifying their common target genes and exploring underlying molecular mechanisms in human MCF-7 breast cancer cells. Gene expression profiling revealed that the expression of approximately 140 genes was influenced by both 17β-estradiol (E2) and a demethylating agent 5-aza-2′-deoxycytidine (DAC). Gene ontology (GO) analysis suggests that these genes are involved in intracellular signaling cascades, regulation of cell proliferation and apoptosis. Based on previously reported association with breast cancer, estrogen signaling and/or DNA methylation, CpG island prediction and GO analysis, we selected six genes (BTG3, FHL2, PMAIP1, BTG2, CDKN1A and TGFB2) for further analysis. Tamoxifen reverses the effect of E2 on the expression of all selected genes, suggesting that they are direct targets of estrogen receptor. Furthermore, DAC treatment reactivates the expression of all selected genes in a dose-dependent manner. Promoter CpG island methylation status analysis revealed that only the promoters of BTG3 and FHL2 genes are methylated, with DAC inducing demethylation, suggesting DNA methylation directs repression of these genes in MCF-7 cells. In a further analysis of the potential interplay between estrogen signaling and DNA methylation, E2 treatment showed no effect on the methylation status of these promoters. Additionally, we show that the ERα recruitment occurs at the FHL2 promoter in an E2- and DAC-independent fashion. In conclusion, we identified a set of genes regulated by both estrogen signaling and DNA methylation. However, our data does not support a direct molecular interplay of mediators of estrogen and epigenetic signaling at promoters of regulated genes.

  19. The parthenocarpic gene Pat-k is generated by a natural mutation of SlAGL6 affecting fruit development in tomato (Solanum lycopersicum L.).

    PubMed

    Takisawa, Rihito; Nakazaki, Tetsuya; Nunome, Tsukasa; Fukuoka, Hiroyuki; Kataoka, Keiko; Saito, Hiroki; Habu, Tsuyoshi; Kitajima, Akira

    2018-04-27

    Parthenocarpy is a desired trait in tomato because it can overcome problems with fruit setting under unfavorable environmental conditions. A parthenocarpic tomato cultivar, 'MPK-1', with a parthenocarpic gene, Pat-k, exhibits stable parthenocarpy that produces few seeds. Because 'MPK-1' produces few seeds, seedlings are propagated inefficiently via cuttings. It was reported that Pat-k is located on chromosome 1. However, the gene had not been isolated and the relationship between the parthenocarpy and low seed set in 'MPK-1' remained unclear. In this study, we isolated Pat-k to clarify the relationship between parthenocarpy and low seed set in 'MPK-1'. Using quantitative trait locus (QTL) analysis for parthenocarpy and seed production, we detected a major QTL for each trait on nearly the same region of the Pat-k locus on chromosome 1. To isolate Pat-k, we performed fine mapping using an F4 population following the cross between a non-parthenocarpic cultivar, 'Micro-Tom' and 'MPK-1'. The results showed that Pat-k was located in the 529 kb interval between two markers, where 60 genes exist. By using data from a whole genome re-sequencing and genome sequence analysis of 'MPK-1', we could identify that the SlAGAMOUS-LIKE 6 (SlAGL6) gene of 'MPK-1' was mutated by a retrotransposon insertion. The transcript level of SlAGL6 was significantly lower in ovaries of 'MPK-1' than a non-parthenocarpic cultivar. From these results, we could conclude that Pat-k is SlAGL6, and its down-regulation in 'MPK-1' causes parthenocarpy and low seed set. In addition, we observed abnormal micropyles only in plants homozygous for the 'MPK-1' allele at the Pat-k/SlAGL6 locus. This result suggests that Pat-k/SlAGL6 is also related to ovule formation and that the low seed set in 'MPK-1' is likely caused by abnormal ovule formation through down-regulation of Pat-k/SlAGL6. Pat-k is identical to SlAGL6, and its down-regulation causes parthenocarpy and low seed set in 'MPK-1'. Moreover, down-regulation of Pat-k/SlAGL6 could cause abnormal ovule formation, leading to a reduction in the number of seeds.

  20. oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes

    PubMed Central

    Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.

    2005-01-01

    Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
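    Over-representation of a binding site in a co-expressed gene set is commonly scored with a one-sided hypergeometric (Fisher exact) tail probability. The abstract above does not say which statistics oPOSSUM applies, so the following is a generic sketch under that assumption rather than the tool's actual method. The function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_background_hits: int,
                           n_set: int, n_set_hits: int) -> float:
    """One-sided hypergeometric p-value for over-representation.

    n_background       genes in the background (must include the gene set)
    n_background_hits  background genes carrying the binding site
    n_set              genes in the co-expressed set
    n_set_hits         genes in the set carrying the binding site

    Returns P(X >= n_set_hits) when n_set genes are drawn at random,
    without replacement, from the background.
    """
    total = comb(n_background, n_set)
    upper = min(n_set, n_background_hits)
    tail = sum(
        comb(n_background_hits, k)
        * comb(n_background - n_background_hits, n_set - k)
        for k in range(n_set_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 with the site;
# both genes of a 2-gene set carry the site.
toy_p = hypergeom_enrichment_p(10, 5, 2, 2)
```

    A small p-value indicates that the site occurs in the co-expressed set more often than random sampling from the background would explain; in practice the p-values would also need correction for testing many binding-site profiles.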

  1. CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

    PubMed Central

    Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC

    2008-01-01

    Background The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion The program CORE_TF is accessible in a user friendly web interface at . It provides a table of over-represented transcription factor binding sites in the promoters of the user's input genes and a graphical view of evolutionarily conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135

  2. Virulotyping of Shigella spp. isolated from pediatric patients in Tehran, Iran.

    PubMed

    Ranjbar, Reza; Bolandian, Masomeh; Behzadi, Payam

    2017-03-01

    Shigellosis is a significant infectious disease with high morbidity and mortality among children worldwide. In this survey, the prevalence of four important virulence genes (ial, ipaH, set1A, and set1B) was investigated among Shigella strains and the related gene profiles were identified. In the present investigation, stool specimens were collected from children who were referred to two hospitals in Tehran, Iran. The samples were collected over 3 years (2008-2010) from children who were suspected of having shigellosis. Shigella spp. were identified through microbiological and serological tests and then subjected to PCR for virulotyping. Shigella sonnei ranked first (65.5%), followed by Shigella flexneri (25.9%), Shigella boydii (6.9%), and Shigella dysenteriae (1.7%). The ial gene was the most frequent virulence gene among the isolated bacterial strains, followed by ipaH, set1B, and set1A. S. flexneri possessed all of the studied virulence genes (ial 65.51%, ipaH 58.62%, set1A 12.07%, and set1B 22.41%). Moreover, the virulence gene profiles ial, ial-ipaH, ial-ipaH-set1B, and ial-ipaH-set1B-set1A were identified for the isolated Shigella spp. strains. The pattern of virulence genes has changed in the Shigella strains isolated in this study: the ial gene now ranks first and ipaH second.

  3. Differential gene expression between African American and European American colorectal cancer patients.

    PubMed

    Jovov, Biljana; Araujo-Perez, Felix; Sigel, Carlie S; Stratford, Jeran K; McCoy, Amber N; Yeh, Jen Jen; Keku, Temitope

    2012-01-01

    The incidence and mortality of colorectal cancer (CRC) is higher in African Americans (AAs) than other ethnic groups in the U.S., but reasons for the disparities are unknown. We performed gene expression profiling of sporadic CRCs from AAs vs. European Americans (EAs) to assess the contribution to CRC disparities. We evaluated the gene expression of 43 AA and 43 EA CRC tumors matched by stage and 40 matching normal colorectal tissues using the Agilent human whole genome 4x44K cDNA arrays. Gene and pathway analyses were performed using Significance Analysis of Microarrays (SAM), ten-fold cross validation, and Ingenuity Pathway Analysis (IPA). SAM revealed that 95 genes were differentially expressed between AA and EA patients at a false discovery rate of ≤5%. Using IPA we determined that the most prominent disease and pathway associations of differentially expressed genes were related to inflammation and immune response. Ten-fold cross validation demonstrated that the following 10 genes can predict ethnicity with an accuracy of 94%: CRYBB2, PSPH, ADAL, VSIG10L, C17orf81, ANKRD36B, ZNF835, ARHGAP6, TRNT1 and WDR8. Expression of these 10 genes was validated by qRT-PCR in an independent test set of 28 patients (10 AA, 18 EA). Our results are the first to implicate differential gene expression in CRC racial disparities and indicate a prominent difference in CRC inflammation between AA and EA patients. Differences in susceptibility to inflammation support the existence of distinct tumor microenvironments in these two patient populations.
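    Ten-fold cross validation, as used in this record, partitions the samples into ten disjoint folds and holds out each fold once while training on the remaining nine. The sketch below shows only the index bookkeeping for the 86 tumors (43 AA and 43 EA) mentioned above. The function name, the random seed, and the round-robin fold assignment are illustrative assumptions, and no classifier is included.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross validation.

    Indices are shuffled once with a fixed seed, dealt round-robin into
    k folds, and each fold serves as the held-out test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i
                 for j in fold]
        yield train, test


# 86 tumors in total (43 AA + 43 EA), split into ten folds.
splits = list(k_fold_indices(86, 10))
```

    With 86 samples and ten folds, six folds hold nine samples and four hold eight. A stratified variant that keeps the AA/EA ratio constant in every fold would be a natural refinement, although the abstract does not say whether stratification was used.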

  4. Differential Gene Expression between African American and European American Colorectal Cancer Patients

    PubMed Central

    Jovov, Biljana; Araujo-Perez, Felix; Sigel, Carlie S.; Stratford, Jeran K.; McCoy, Amber N.; Yeh, Jen Jen; Keku, Temitope

    2012-01-01

    The incidence and mortality of colorectal cancer (CRC) is higher in African Americans (AAs) than other ethnic groups in the U.S., but reasons for the disparities are unknown. We performed gene expression profiling of sporadic CRCs from AAs vs. European Americans (EAs) to assess the contribution to CRC disparities. We evaluated the gene expression of 43 AA and 43 EA CRC tumors matched by stage and 40 matching normal colorectal tissues using the Agilent human whole genome 4x44K cDNA arrays. Gene and pathway analyses were performed using Significance Analysis of Microarrays (SAM), ten-fold cross validation, and Ingenuity Pathway Analysis (IPA). SAM revealed that 95 genes were differentially expressed between AA and EA patients at a false discovery rate of ≤5%. Using IPA we determined that the most prominent disease and pathway associations of differentially expressed genes were related to inflammation and immune response. Ten-fold cross validation demonstrated that the following 10 genes can predict ethnicity with an accuracy of 94%: CRYBB2, PSPH, ADAL, VSIG10L, C17orf81, ANKRD36B, ZNF835, ARHGAP6, TRNT1 and WDR8. Expression of these 10 genes was validated by qRT-PCR in an independent test set of 28 patients (10 AA, 18 EA). Our results are the first to implicate differential gene expression in CRC racial disparities and indicate a prominent difference in CRC inflammation between AA and EA patients. Differences in susceptibility to inflammation support the existence of distinct tumor microenvironments in these two patient populations. PMID:22276153

  5. Sequence-based model of gap gene regulatory network.

    PubMed

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts great interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of the genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross validation test and fitting to a random dataset to validate the model and prove its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from the literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3) functionally important sites are not exclusively located in cis-regulatory elements, but are rather dispersed through the regulatory region. It is of importance that some of the sites with high functional impact in hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in DNase I footprint assays.
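    The site regulatory weight described in this record compares the model's residual sum of squares (RSS) with all annotated sites against the RSS obtained after excluding one site. The abstract does not state how the difference is normalized, so the sketch below assumes division by the full-model RSS. That normalization, the function name, and the RSS values in the example are all hypothetical.

```python
from typing import Dict


def regulatory_weights(rss_all: float,
                       rss_without: Dict[str, float]) -> Dict[str, float]:
    """Regulatory weight of each binding site.

    rss_all      RSS of the model fitted with all annotated sites
    rss_without  maps a site id to the RSS when that site is excluded

    Assumed normalization: (RSS_without_site - RSS_all) / RSS_all, so a
    weight of 0 means that removing the site leaves the fit unchanged.
    """
    if rss_all <= 0:
        raise ValueError("rss_all must be positive")
    return {site: (rss - rss_all) / rss_all
            for site, rss in rss_without.items()}


# Hypothetical RSS values for two sites.
toy_weights = regulatory_weights(2.0, {"site_a": 3.0, "site_b": 2.0})
```

    Under this convention a larger weight means that excluding the site degrades the fit more, which matches the abstract's use of the weight as a measure of a site's impact on model output.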

  6. Differential expression of alternatively spliced transcripts related to energy metabolism in colorectal cancer.

    PubMed

    Snezhkina, Anastasiya Vladimirovna; Krasnov, George Sergeevich; Zaretsky, Andrew Rostislavovich; Zhavoronkov, Alex; Nyushko, Kirill Mikhailovich; Moskalev, Alexey Alexandrovich; Karpova, Irina Yurievna; Afremova, Anastasiya Isaevna; Lipatova, Anastasiya Valerievna; Kochetkov, Dmitriy Vladimitovich; Fedorova, Maria Sergeena; Volchenko, Nadezhda Nikolaevna; Sadritdinova, Asiya Fayazovna; Melnikova, Nataliya Vladimirovna; Sidorov, Dmitry Vladimirovich; Popov, Anatoly Yurievich; Kalinin, Dmitry Valerievich; Kaprin, Andrey Dmitrievich; Alekseev, Boris Yakovlevich; Dmitriev, Alexey Alexandrovich; Kudryavtseva, Anna Viktorovna

    2016-12-28

    Colorectal cancer (CRC) is one of the most common malignant tumors worldwide. CRC molecular pathogenesis is heterogeneous and may be followed by mutations in oncogenes and tumor suppressor genes, chromosomal and microsatellite instability, alternative splicing alterations, hypermethylation of CpG islands, oxidative stress, impairment of different signaling pathways and energy metabolism. In the present work, we have studied the alterations of alternative splicing patterns of genes related to energy metabolism in CRC. Using CrossHub software, we analyzed The Cancer Genome Atlas (TCGA) RNA-Seq datasets derived from colon tumor and matched normal tissues. The expression of 1014 alternative mRNA isoforms involved in cell energy metabolism was examined. We found 7 genes with differentially expressed alternative transcripts whereas overall expression of these genes was not significantly altered in CRC. A set of 8 differentially expressed transcripts of interest has been validated by qPCR. These eight isoforms encoded by OGDH, COL6A3, ICAM1, PHPT1, PPP2R5D, SLC29A1, and TRIB3 genes were up-regulated in colorectal tumors, and this is in concordance with the bioinformatics data. The alternative transcript NM_057167 of COL6A3 was also strongly up-regulated in breast, lung, prostate, and kidney tumors. Alternative transcript of SLC29A1 (NM_001078177) was up-regulated only in CRC samples, but not in the other tested tumor types. We identified tumor-specific expression of alternative spliced transcripts of seven genes involved in energy metabolism in CRC. Our results bring new knowledge on alternative splicing in colorectal cancer and suggest a set of mRNA isoforms that could be used for cancer diagnosis and development of treatment methods.

  7. Comprehensive QTL mapping survey dissects the complex fruit texture physiology in apple (Malus x domestica Borkh.).

    PubMed

    Longhi, Sara; Moretto, Marco; Viola, Roberto; Velasco, Riccardo; Costa, Fabrizio

    2012-02-01

    Fruit ripening is a complex physiological process in plants whereby cell wall programmed changes occur mainly to promote seed dispersal. Cell wall modification also directly regulates the textural properties, a fundamental aspect of fruit quality. In this study, two full-sib populations of apple, with 'Fuji' as the common maternal parent, crossed with 'Delearly' and 'Pink Lady', were used to understand the control of fruit texture by QTL mapping and in silico gene mining. Texture was dissected with a novel high resolution phenomics strategy, simultaneously profiling both mechanical and acoustic fruit texture components. In 'Fuji × Delearly' nine linkage groups were associated with QTLs accounting for 15.6% to 49% of the total variance, and a highly significant QTL cluster for both textural components was mapped on chromosome 10 and co-located with Md-PG1, a polygalacturonase gene that, in apple, is known to be involved in cell wall metabolism processes. In addition, other candidate genes related to Md-NOR and Md-RIN transcription factors, Md-Pel (pectate lyase), and Md-ACS1 were mapped within statistical intervals. In 'Fuji × Pink Lady', a smaller set of linkage groups associated with the QTLs identified for fruit texture (15.9-34.6% variance) was observed. The analysis of the phenotypic variance over a two-dimensional PCA plot highlighted a transgressive segregation for this progeny, revealing two QTL sets distinctively related to both mechanical and acoustic texture components. The mining of the apple genome allowed the discovery of the gene inventory underlying each QTL, and functional profile assessment unravelled specific gene expression patterns of these candidate genes.

  8. Effector-mediated discovery of a novel resistance gene against Bremia lactucae in a nonhost lettuce species.

    PubMed

    Giesbers, Anne K J; Pelgrom, Alexandra J E; Visser, Richard G F; Niks, Rients E; Van den Ackerveken, Guido; Jeuken, Marieke J W

    2017-11-01

    Candidate effectors from lettuce downy mildew (Bremia lactucae) enable high-throughput germplasm screening for the presence of resistance (R) genes. The nonhost species Lactuca saligna comprises a source of B. lactucae R genes that has hardly been exploited in lettuce breeding. Its cross-compatibility with the host species L. sativa enables the study of inheritance of nonhost resistance (NHR). We performed transient expression of candidate RXLR effector genes from B. lactucae in a diverse Lactuca germplasm set. Responses to two candidate effectors (BLR31 and BLN08) were genetically mapped and tested for co-segregation with disease resistance. BLN08 induced a hypersensitive response (HR) in 55% of the L. saligna accessions, but responsiveness did not co-segregate with resistance to Bl:24. BLR31 triggered an HR in 5% of the L. saligna accessions, and revealed a novel R gene providing complete B. lactucae race Bl:24 resistance. Resistant hybrid plants that were BLR31 nonresponsive indicated other unlinked R genes and/or nonhost QTLs. We have identified a candidate avirulence effector of B. lactucae (BLR31) and its cognate R gene in L. saligna. Concurrently, our results suggest that R genes are not required for NHR of L. saligna. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.

  9. Mapping Linked Genes in "Drosophila Melanogaster" Using Data from the F2 Generation of a Dihybrid Cross

    ERIC Educational Resources Information Center

    Marshall, Pamela A.

    2008-01-01

    "Drosophila melanogaster" is a commonly utilized organism for testing hypotheses about inheritance of traits. Students in both high school and university labs study the genetics of inheritance by analyzing offspring of appropriate "Drosophila" crosses to determine inheritance patterns, including gene linkage. However, most genetics investigations…

  10. Gene Network Rewiring to Study Melanoma Stage Progression and Elements Essential for Driving Melanoma

    PubMed Central

    Kaushik, Abhinav; Bhatia, Yashuma; Ali, Shakir; Gupta, Dinesh

    2015-01-01

    Metastatic melanoma patients have a poor prognosis, mainly attributable to the underlying heterogeneity in melanoma driver genes and altered gene expression profiles. These characteristics of melanoma also make the development of drugs and identification of novel drug targets for metastatic melanoma a daunting task. Systems biology offers an alternative approach to re-explore the genes or gene sets that display dysregulated behaviour without being differentially expressed. In this study, we have performed systems biology studies to enhance our knowledge about the conserved property of disease genes or gene sets among mutually exclusive datasets representing melanoma progression. We meta-analysed 642 microarray samples to generate melanoma reconstructed networks representing four different stages of melanoma progression to extract genes with altered molecular circuitry wiring as compared to a normal cellular state. Intriguingly, a majority of the melanoma network-rewired genes are not differentially expressed and the disease genes involved in melanoma progression consistently modulate its activity by rewiring network connections. We found that the shortlisted disease genes in the study show strong and abnormal network connectivity, which enhances with the disease progression. Moreover, the deviated network properties of the disease gene sets allow ranking/prioritization of different enriched, dysregulated and conserved pathway terms in metastatic melanoma, in agreement with previous findings. Our analysis also reveals presence of distinct network hubs in different stages of metastasizing tumor for the same set of pathways in the statistically conserved gene sets. The study results are also presented as a freely available database at http://bioinfo.icgeb.res.in/m3db/. The web-based database resource consists of results from the analysis presented here, integrated with cytoscape web and user-friendly tools for visualization, retrieval and further analysis. PMID:26558755

  11. Genome-Wide Temporal Expression Profiling in Caenorhabditis elegans Identifies a Core Gene Set Related to Long-Term Memory.

    PubMed

    Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila

    2017-07-12

    The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. Like in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes.
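    The 50% overlap reported in this record is a cross-study gene set comparison: the fraction of one study's genes that also appear in another study's list. A minimal sketch of that calculation follows. The function name and the gene identifiers are hypothetical placeholders, not genes taken from either dataset.

```python
from typing import Iterable


def overlap_fraction(query: Iterable[str], reference: Iterable[str]) -> float:
    """Fraction of genes in `query` that are also present in `reference`."""
    query_set, reference_set = set(query), set(reference)
    if not query_set:
        raise ValueError("query gene set is empty")
    return len(query_set & reference_set) / len(query_set)


# Hypothetical identifiers standing in for two studies' upregulated genes.
this_study = {"gene1", "gene2", "gene3", "gene4"}
previous_study = {"gene2", "gene4", "gene9"}
toy_overlap = overlap_fraction(this_study, previous_study)
```

    The measure is asymmetric: swapping the two arguments changes the denominator, so it matters which study is treated as the query. Gene identifiers from different studies also need to be mapped to a common namespace before sets are compared.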

  12. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    PubMed

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigating essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria with characterized essential genes. The predictions yielded a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716 in a tenfold cross-validation test. Suitable features were used to construct models for making predictions in distantly related bacteria. The accuracy of predictions was evaluated by the consistency between the predictions and the known essential genes of the target species. A highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. A recently released independent dataset from Synechococcus elongatus was obtained to further assess the performance of our model. The AUC of these predictions is 0.7855, which is higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results as good as, or even better than, those obtained with integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
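
    The AUC values reported here can be read as the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene. The sketch below computes AUC that way (the Mann-Whitney formulation); the labels and scores are made-up illustrative values, and the function is not taken from the authors' software.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative data: 1 = essential, 0 = non-essential.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

    The pairwise loop is quadratic in the number of genes; it is kept for clarity, and a rank-based version would be preferable for genome-scale inputs.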

  13. Development of a set of SNP markers present in expressed genes of the apple.

    PubMed

    Chagné, David; Gasic, Ksenija; Crowhurst, Ross N; Han, Yuepeng; Bassett, Heather C; Bowatte, Deepa R; Lawrence, Timothy J; Rikkerink, Erik H A; Gardiner, Susan E; Korban, Schuyler S

    2008-11-01

    Molecular markers associated with gene coding regions are useful tools for bridging functional and structural genomics. Due to their high abundance in plant genomes, single nucleotide polymorphisms (SNPs) are present within virtually all genomic regions, including most coding sequences. The objective of this study was to develop a set of SNPs for the apple by taking advantage of the wealth of genomics resources available for the apple, including a large collection of expressed sequence tags (ESTs). Using bioinformatics tools, a search for SNPs within an EST database of approximately 350,000 sequences developed from a variety of apple accessions was conducted. This resulted in the identification of a total of 71,482 putative SNPs. As the apple genome is reported to be an ancient polyploid, attempts were made to verify whether those SNPs detected in silico were attributable either to allelic polymorphisms or to gene duplication or paralogous or homeologous sequence variations. To this end, a set of 464 PCR primer pairs was designed, PCR amplification was performed on two subsets of plants, and the PCR products were sequenced. The SNPs retrieved from these sequences were then mapped onto apple genetic maps, including a newly constructed map of a Royal Gala x A689-24 cross and a Malling 9 x Robusta 5 map, using a bin mapping strategy. The SNP genotyping was performed using the high-resolution melting (HRM) technique. A total of 93 new markers containing 210 coding SNPs were successfully mapped. This new set of SNP markers for the apple offers new opportunities for understanding the genetic control of important horticultural traits using quantitative trait loci (QTL) or linkage disequilibrium analysis. These also serve as useful markers for aligning physical and genetic maps, and as potential transferable markers across the Rosaceae family.

  14. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.

    2011-02-18

    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant links between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.

  15. Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts.

    PubMed

    Hakenberg, Jörg; Cheng, Wei-Yi; Thomas, Philippe; Wang, Ying-Chih; Uzilov, Andrew V; Chen, Rong

    2016-01-08

    Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms. We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples. RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization. A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.

  16. In search of causal variants: refining disease association signals using cross-population contrasts.

    PubMed

    Saccone, Nancy L; Saccone, Scott F; Goate, Alison M; Grucza, Richard A; Hinrichs, Anthony L; Rice, John P; Bierut, Laura J

    2008-08-29

    Genome-wide association (GWA) using large numbers of single nucleotide polymorphisms (SNPs) is now a powerful, state-of-the-art approach to mapping human disease genes. When a GWA study detects association between a SNP and the disease, this signal usually represents association with a set of several highly correlated SNPs in strong linkage disequilibrium. The challenge we address is to distinguish among these correlated loci to highlight potential functional variants and prioritize them for follow-up. We implemented a systematic method for testing association across diverse population samples having differing histories and LD patterns, using a logistic regression framework. The hypothesis is that important underlying biological mechanisms are shared across human populations, and we can filter correlated variants by testing for heterogeneity of genetic effects in different population samples. This approach formalizes the descriptive comparison of p-values that has typified similar cross-population fine-mapping studies to date. We applied this method to correlated SNPs in the cholinergic nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4, in a case-control study of cocaine dependence composed of 504 European-American and 583 African-American samples. Of the 10 SNPs genotyped in the r² ≥ 0.8 bin for rs16969968, three demonstrated significant cross-population heterogeneity and are filtered from priority follow-up; the remaining SNPs include rs16969968 (heterogeneity p = 0.75). Though the power to filter out rs16969968 is reduced due to the difference in allele frequency in the two groups, the results nevertheless focus attention on a smaller group of SNPs that includes the non-synonymous SNP rs16969968, which retains a similar effect size (odds ratio) across both population samples. Filtering out SNPs that demonstrate cross-population heterogeneity enriches for variants more likely to be important and causative. Our approach provides an important and effective tool to help interpret results from the many GWA studies now underway.
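
    The idea of filtering SNPs by heterogeneity of effect across populations can be sketched with a simpler stand-in for the logistic regression framework the abstract describes: Woolf's test, which compares log odds ratios from two 2x2 tables. This is an assumption-laden illustration, not the authors' method, and the counts below are invented.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for a 2x2 table.

    a, b: cases with / without the allele; c, d: controls with / without.
    All four counts must be positive.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def heterogeneity_two_groups(table_1, table_2):
    """Woolf's heterogeneity statistic Q and its p-value for two populations."""
    (lor_1, var_1) = log_odds_ratio(*table_1)
    (lor_2, var_2) = log_odds_ratio(*table_2)
    w_1, w_2 = 1 / var_1, 1 / var_2
    pooled = (w_1 * lor_1 + w_2 * lor_2) / (w_1 + w_2)
    q = w_1 * (lor_1 - pooled) ** 2 + w_2 * (lor_2 - pooled) ** 2
    # With two groups, Q follows a chi-square distribution with 1 degree of
    # freedom under homogeneity, so P(X > q) = erfc(sqrt(q / 2)).
    return q, math.erfc(math.sqrt(q / 2))


# Invented counts: the effect points in opposite directions in the two samples.
q_stat, p_het = heterogeneity_two_groups((80, 20, 50, 50), (50, 50, 80, 20))
print(f"Q = {q_stat:.2f}, heterogeneity p = {p_het:.2e}")
```

    Under this sketch, a SNP with a small heterogeneity p-value would be filtered from priority follow-up, while a SNP with a large one (such as the p = 0.75 reported for rs16969968) would be retained.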

  17. A Transcriptomic Analysis of Echinococcus granulosus Larval Stages: Implications for Parasite Biology and Host Adaptation

    PubMed Central

    Parkinson, John; Wasmuth, James D.; Salinas, Gustavo; Bizarro, Cristiano V.; Sanford, Chris; Berriman, Matthew; Ferreira, Henrique B.; Zaha, Arnaldo; Blaxter, Mark L.; Maizels, Rick M.; Fernández, Cecilia

    2012-01-01

    Background The cestode Echinococcus granulosus - the agent of cystic echinococcosis, a zoonosis affecting humans and domestic animals worldwide - is an excellent model for the study of host-parasite cross-talk that interfaces with two mammalian hosts. To develop the molecular analysis of these interactions, we carried out an EST survey of E. granulosus larval stages. We report the salient features of this study with a focus on genes reflecting physiological adaptations of different parasite stages. Methodology/Principal Findings We generated ∼10,000 ESTs from two sets of full-length enriched libraries (derived from oligo-capped and trans-spliced cDNAs) prepared with three parasite materials: hydatid cyst wall, larval worms (protoscoleces), and pepsin/H+-activated protoscoleces. The ESTs were clustered into 2700 distinct gene products. In the context of the biology of E. granulosus, our analyses reveal: (i) a diverse group of abundant long non-protein coding transcripts showing homology to a middle repetitive element (EgBRep) that could either be active molecular species or represent precursors of small RNAs (like piRNAs); (ii) an up-regulation of fermentative pathways in the tissue of the cyst wall; (iii) highly expressed thiol- and selenol-dependent antioxidant enzyme targets of thioredoxin glutathione reductase, the functional hub of redox metabolism in parasitic flatworms; (iv) candidate apomucins for the external layer of the tissue-dwelling hydatid cyst, a mucin-rich structure that is critical for survival in the intermediate host; (v) a set of tetraspanins, a protein family that appears to have expanded in the cestode lineage; and (vi) a set of platyhelminth-specific gene products that may offer targets for novel pan-platyhelminth drug development. Conclusions/Significance This survey has greatly increased the quality and the quantity of the molecular information on E. granulosus and constitutes a valuable resource for gene prediction on the parasite genome and for further genomic and proteomic analyses focused on cestodes and platyhelminths. PMID:23209850

  18. Successful Wide Hybridization and Introgression Breeding in a Diverse Set of Common Peppers (Capsicum annuum) Using Different Cultivated Ají (C. baccatum) Accessions as Donor Parents.

    PubMed

    Manzur, Juan Pablo; Fita, Ana; Prohens, Jaime; Rodríguez-Burruezo, Adrián

    2015-01-01

    Capsicum baccatum, commonly known as ají, has been reported as a source of variation for many different traits to improve common pepper (C. annuum), one of the most important vegetables in the world. However, strong interspecific hybridization barriers exist between them. A comparative study of two wide hybridization approaches for introgressing C. baccatum genes into C. annuum was performed: i) genetic bridge (GB) using C. chinense and C. frutescens as bridge species; and, ii) direct cross between C. annuum and C. baccatum combined with in vitro embryo rescue (ER). A diverse and representative collection of 18 accessions from four cultivated species of Capsicum was used, including C. annuum (12), C. baccatum (3), C. chinense (2), and C. frutescens (1). More than 5000 crosses were made and over 1000 embryos were rescued in the present study. C. chinense performed as a good bridge species between C. annuum and C. baccatum, with the best results being obtained with the cross combination [C. baccatum (♀) × C. chinense (♂)] (♀) × C. annuum (♂), while C. frutescens gave poor results as bridge species due to strong prezygotic and postzygotic barriers. Virus-like-syndrome or dwarfism was observed in F1 hybrids when both C. chinense and C. frutescens were used as female parents. Regarding the ER strategy, the best response was found in C. annuum (♀) × C. baccatum (♂) crosses. First backcrosses to C. annuum (BC1s) were obtained according to the crossing scheme [C. annuum (♀) × C. baccatum (♂)] (♀) × C. annuum (♂) using ER. Advantages and disadvantages of each strategy are discussed in relation to their application to breeding programmes. These results provide breeders with useful practical information for the regular utilization of the C. baccatum gene pool in C. annuum breeding.

  19. Successful Wide Hybridization and Introgression Breeding in a Diverse Set of Common Peppers (Capsicum annuum) Using Different Cultivated Ají (C. baccatum) Accessions as Donor Parents

    PubMed Central

    2015-01-01

    Capsicum baccatum, commonly known as ají, has been reported as a source of variation for many different traits to improve common pepper (C. annuum), one of the most important vegetables in the world. However, strong interspecific hybridization barriers exist between them. A comparative study of two wide hybridization approaches for introgressing C. baccatum genes into C. annuum was performed: i) genetic bridge (GB) using C. chinense and C. frutescens as bridge species; and, ii) direct cross between C. annuum and C. baccatum combined with in vitro embryo rescue (ER). A diverse and representative collection of 18 accessions from four cultivated species of Capsicum was used, including C. annuum (12), C. baccatum (3), C. chinense (2), and C. frutescens (1). More than 5000 crosses were made and over 1000 embryos were rescued in the present study. C. chinense performed as a good bridge species between C. annuum and C. baccatum, with the best results being obtained with the cross combination [C. baccatum (♀) × C. chinense (♂)] (♀) × C. annuum (♂), while C. frutescens gave poor results as bridge species due to strong prezygotic and postzygotic barriers. Virus-like-syndrome or dwarfism was observed in F1 hybrids when both C. chinense and C. frutescens were used as female parents. Regarding the ER strategy, the best response was found in C. annuum (♀) × C. baccatum (♂) crosses. First backcrosses to C. annuum (BC1s) were obtained according to the crossing scheme [C. annuum (♀) × C. baccatum (♂)] (♀) × C. annuum (♂) using ER. Advantages and disadvantages of each strategy are discussed in relation to their application to breeding programmes. These results provide breeders with useful practical information for the regular utilization of the C. baccatum gene pool in C. annuum breeding. PMID:26642059

  20. Pathway-based analysis of GWAs data identifies association of sex determination genes with susceptibility to testicular germ cell tumors.

    PubMed

    Koster, Roelof; Mitra, Nandita; D'Andrea, Kurt; Vardhanabhuti, Saran; Chung, Charles C; Wang, Zhaoming; Loren Erickson, R; Vaughn, David J; Litchfield, Kevin; Rahman, Nazneen; Greene, Mark H; McGlynn, Katherine A; Turnbull, Clare; Chanock, Stephen J; Nathanson, Katherine L; Kanetsky, Peter A

    2014-11-15

    Genome-wide association (GWA) studies of testicular germ cell tumor (TGCT) have identified 18 susceptibility loci, some containing genes encoding proteins important in male germ cell development. Deletions of one of these genes, DMRT1, lead to male-to-female sex reversal and are associated with development of gonadoblastoma. To further explore genetic association with TGCT, we undertook a pathway-based analysis of SNP marker associations in the Penn GWAs (349 TGCT cases and 919 controls). We analyzed a custom-built sex determination gene set consisting of 32 genes using three different methods of pathway-based analysis. The sex determination gene set ranked highly compared with canonical gene sets, and it was associated with TGCT (FDRG = 2.28 × 10^-5, FDRM = 0.014 and FDRI = 0.008 for Gene Set Analysis-SNP (GSA-SNP), Meta-Analysis Gene Set Enrichment of Variant Associations (MAGENTA) and Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS) analysis, respectively). The association remained after removal of DMRT1 from the gene set (FDRG = 0.0002, FDRM = 0.055 and FDRI = 0.009). Using data from the NCI GWA scan (582 TGCT cases and 1056 controls) and UK scan (986 TGCT cases and 4946 controls), we replicated these findings (NCI: FDRG = 0.006, FDRM = 0.014, FDRI = 0.033, and UK: FDRG = 1.04 × 10^-6, FDRM = 0.016, FDRI = 0.025). After removal of DMRT1 from the gene set, the sex determination gene set remains associated with TGCT in the NCI (FDRG = 0.039, FDRM = 0.050 and FDRI = 0.055) and UK scans (FDRG = 3.00 × 10^-5, FDRM = 0.056 and FDRI = 0.044). With the exception of DMRT1, genes in the sex determination gene set have not previously been identified as TGCT susceptibility loci in these GWA scans, demonstrating the complementary nature of a pathway-based approach for genome-wide analysis of TGCT.

  1. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e., Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  2. Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

    PubMed Central

    2009-01-01

    Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443
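
    Of the scoring functions this abstract names as limiting cases of GSZ, the hypergeometric test is the simplest to show concretely. The sketch below is not the GSZ score itself; it only computes the hypergeometric upper-tail probability for over-representation of a functional class in a selected gene list. The parameter names and example numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(population, class_size, list_size, hits):
    """P(X >= hits) when list_size genes are drawn without replacement from
    a population containing class_size members of the functional class."""
    total = comb(population, list_size)
    max_hits = min(class_size, list_size)
    favourable = sum(
        comb(class_size, i) * comb(population - class_size, list_size - i)
        for i in range(hits, max_hits + 1)
    )
    return favourable / total


# Illustrative numbers: 10 genes in total, 5 in the class, and a top list of
# 5 genes that all belong to the class.
p_value = hypergeom_upper_tail(10, 5, 5, 5)
print(p_value)
```

    This test depends on a fixed cut-off for the top list, which is the kind of threshold that a threshold-free score such as GSZ is designed to avoid.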

  3. Chromatin structure and methylation of rat rRNA genes studied by formaldehyde fixation and psoralen cross-linking.

    PubMed Central

    Stancheva, I; Lucchini, R; Koller, T; Sogo, J M

    1997-01-01

    By using formaldehyde cross-linking of histones to DNA and gel retardation assays we show that formaldehyde fixation, similar to previously established psoralen photocross-linking, discriminates between nucleosome-packed (inactive) and nucleosome-free (active) fractions of ribosomal RNA genes. By both cross-linking techniques we were able to purify fragments from agarose gels, corresponding to coding, enhancer and promoter sequences of rRNA genes, which were further investigated with respect to DNA methylation. This approach allows us to analyse independently and in detail methylation patterns of active and inactive rRNA gene copies by the combination of Hpa II and Msp I restriction enzymes. We found CpG methylation mainly present in enhancer and promoter regions of inactive rRNA gene copies. The methylation of one single Hpa II site, located in the promoter region, showed particularly strong correlation with the transcriptional activity. PMID:9108154

  4. Identification of genes and gene pathways associated with major depressive disorder by integrative brain analysis of rat and human prefrontal cortex transcriptomes

    PubMed Central

    Malki, K; Pain, O; Tosto, M G; Du Rietz, E; Carboni, L; Schalkwyk, L C

    2015-01-01

    Despite moderate heritability estimates, progress in uncovering the molecular substrate underpinning major depressive disorder (MDD) has been slow. In this study, we used prefrontal cortex (PFC) gene expression from a genetic rat model of MDD to inform probe set prioritization in PFC in a human post-mortem study to uncover genes and gene pathways associated with MDD. Gene expression differences between Flinders sensitive (FSL) and Flinders resistant (FRL) rat lines were statistically evaluated using the RankProd, non-parametric algorithm. Top ranking probe sets in the rat study were subsequently used to prioritize orthologous selection in a human PFC in a case–control post-mortem study on MDD from the Stanley Brain Consortium. Candidate genes in the human post-mortem study were then tested against a matched control sample using the RankProd method. A total of 1767 probe sets were differentially expressed in the PFC between FSL and FRL rat lines at (q⩽0.001). A total of 898 orthologous probe sets was found on Affymetrix's HG-U95A chip used in the human study. Correcting for the number of multiple, non-independent tests, 20 probe sets were found to be significantly dysregulated between human cases and controls at q⩽0.05. These probe sets tagged the expression profile of 18 human genes (11 upregulated and seven downregulated). Using an integrative rat–human study, a number of convergent genes that may have a role in pathogenesis of MDD were uncovered. Eighty percent of these genes were functionally associated with a key stress response signalling cascade, involving NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells), AP-1 (activator protein 1) and ERK/MAPK, which has been systematically associated with MDD, neuroplasticity and neurogenesis. PMID:25734512

  5. Scuba: scalable kernel-based gene prioritization.

    PubMed

    Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio

    2018-01-25

    The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge, however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba integrates also a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .

  6. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain

    PubMed Central

    Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.

    2016-01-01

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559

  7. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain.

    PubMed

    Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C

    2016-01-26

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.

  8. Identification of type 2 diabetes-associated combination of SNPs using support vector machine.

    PubMed

    Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon

    2010-04-23

    Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.

  9. Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach

    PubMed Central

    Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J.; Georgakilas, Alexandros G.

    2017-01-01

    The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a ‘plant radiation biodosimeter’, that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana—based plant system monitoring the role of cancer-related DNA repair genes BRCA1, BARD1 and PARP1 in processing DNA lesions. PMID:28587301

  10. Bridging Plant and Human Radiation Response and DNA Repair through an In Silico Approach.

    PubMed

    Nikitaki, Zacharenia; Pavlopoulou, Athanasia; Holá, Marcela; Donà, Mattia; Michalopoulos, Ioannis; Balestrazzi, Alma; Angelis, Karel J; Georgakilas, Alexandros G

    2017-06-06

    The mechanisms of response to radiation exposure are conserved in plants and animals. The DNA damage response (DDR) pathways are the predominant molecular pathways activated upon exposure to radiation, both in plants and animals. The conserved features of DDR in plants and animals might facilitate interdisciplinary studies that cross traditional boundaries between animal and plant biology in order to expand the collection of biomarkers currently used for radiation exposure monitoring (REM) in environmental and biomedical settings. Genes implicated in trans-kingdom conserved DDR networks often triggered by ionizing radiation (IR) and UV light are deposited into biological databases. In this study, we have applied an innovative approach utilizing data pertinent to plant and human genes from publicly available databases towards the design of a 'plant radiation biodosimeter', that is, a plant and DDR gene-based platform that could serve as a REM reliable biomarker for assessing environmental radiation exposure and associated risk. From our analysis, in addition to REM biomarkers, a significant number of genes, both in human and Arabidopsis thaliana, not yet characterized as DDR, are suggested as possible DNA repair players. Last but not least, we provide an example on the applicability of an Arabidopsis thaliana-based plant system monitoring the role of cancer-related DNA repair genes BRCA1, BARD1 and PARP1 in processing DNA lesions.

  11. HNdb: an integrated database of gene and protein information on head and neck squamous cell carcinoma

    PubMed Central

    Henrique, Tiago; José Freitas da Silveira, Nelson; Henrique Cunha Volpato, Arthur; Mioto, Mayra Mataruco; Carolina Buzzo Stefanini, Ana; Bachir Fares, Adil; Gustavo da Silva Castro Andrade, João; Masson, Carolina; Verónica Mendoza López, Rossana; Daumas Nunes, Fabio; Paulo Kowalski, Luis; Severino, Patricia; Tajara, Eloiza Helena

    2016-01-01

    The total amount of scientific literature has grown rapidly in recent years. Specifically, there are several million citations in the field of cancer. This makes it difficult, if not impossible, to manually retrieve relevant information on the mechanisms that govern tumor behavior or the neoplastic process. Furthermore, cancer is a complex disease or, more accurately, a set of diseases. The heterogeneity that permeates many tumors is particularly evident in head and neck (HN) cancer, one of the most common types of cancer worldwide. In this study, we present HNdb, a free database that aims to provide a unified and comprehensive resource of information on genes and proteins involved in HN squamous cell carcinoma, covering data on genomics, transcriptomics, proteomics, literature citations and also cross-references of external databases. Different literature searches of MEDLINE abstracts were performed using specific Medical Subject Headings (MeSH terms) for oral, oropharyngeal, hypopharyngeal and laryngeal squamous cell carcinomas. A curated gene-to-publication assignment yielded a total of 1370 genes related to HN cancer. The diversity of results allowed identifying novel and mostly unexplored gene associations, revealing, for example, that processes linked to response to steroid hormone stimulus are significantly enriched in genes related to HN carcinomas. Thus, our database expands the possibilities for gene networks investigation, providing potential hypothesis to be tested. Database URL: http://www.gencapo.famerp.br/hndb PMID:27013077

  12. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

  13. Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill

    2016-01-01

    Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells, and comparative methods examining adherent and sphere cells are widely used to investigate the mechanisms underlying cancer stem cells, because sphere cells are known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets, such as cytokine-cytokine receptor interaction, that demonstrated significance in both datasets. Our observations demonstrated that among the genes of the four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. An additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. In contrast to other meta-analyses, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956

  14. A Novel Strategy for Selection and Validation of Reference Genes in Dynamic Multidimensional Experimental Design in Yeast

    PubMed Central

    Cankorur-Cetinkaya, Ayca; Dereli, Elif; Eraslan, Serpil; Karabekmez, Erkan; Dikicioglu, Duygu; Kirdar, Betul

    2012-01-01

    Background: Understanding the dynamic mechanism behind the transcriptional organization of genes in response to varying environmental conditions requires time-dependent data. The dynamic transcriptional response obtained by real-time RT-qPCR experiments could only be correctly interpreted if suitable reference genes are used in the analysis. The lack of available studies on the identification of candidate reference genes in dynamic gene expression studies necessitates the identification and the verification of a suitable gene set for the analysis of transient gene expression response. Principal Findings: In this study, a candidate reference gene set for RT-qPCR analysis of dynamic transcriptional changes in Saccharomyces cerevisiae was determined using 31 different publicly available time series transcriptome datasets. Ten of the twelve candidates (TPI1, FBA1, CCW12, CDC19, ADH1, PGK1, GCN4, PDC1, RPS26A and ARF1) we identified were not previously reported as potential reference genes. Our method also identified the commonly used reference genes ACT1 and TDH3. The most stable reference genes from this pool were determined as TPI1, FBA1, CDC19 and ACT1 in response to a perturbation in the amount of available glucose and as FBA1, TDH3, CCW12 and ACT1 in response to a perturbation in the amount of available ammonium. The use of these newly proposed gene sets outperformed the use of common reference genes in the determination of dynamic transcriptional response of the target genes, HAP4 and MEP2, in response to relaxation from glucose and ammonium limitations, respectively. Conclusions: A candidate reference gene set to be used in dynamic real-time RT-qPCR expression profiling in yeast was proposed for the first time in the present study. Suitable pools of stable reference genes to be used under different experimental conditions could be selected from this candidate set in order to successfully determine the expression profiles for the genes of interest. PMID:22675547
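
    One simple way to picture "expression stability" across a time series is to rank candidate genes by their coefficient of variation (standard deviation divided by mean). The Python sketch below does only that. The abstract above does not state which stability measure the authors used, so the coefficient of variation is an assumption, and the gene names and expression values are hypothetical placeholders.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean expression level."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean expression is zero; coefficient of variation is undefined")
    return pstdev(values) / m


def rank_by_stability(expression):
    """Return gene names ordered from most stable (lowest CV) to least stable."""
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]))


# Hypothetical expression values at four time points (not real data).
toy_series = {
    "geneA": [10.0, 10.0, 10.0, 10.0],
    "geneB": [8.0, 12.0, 9.0, 11.0],
    "geneC": [2.0, 20.0, 5.0, 13.0],
}
ranking = rank_by_stability(toy_series)
print(ranking)
```

    A gene at the top of such a ranking varies least relative to its own expression level, which is the property a reference gene needs for normalization.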

  15. Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data.

    PubMed

    Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su

    2015-01-01

    It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases has mostly failed when typical methods that depend on differentially expressed genes were used. In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.

  16. Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data

    PubMed Central

    2015-01-01

    Background: It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases has mostly failed when typical methods that depend on differentially expressed genes were used. Results: In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions: This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779

  17. Genome-Wide Meta-Analyses of Breast, Ovarian, and Prostate Cancer Association Studies Identify Multiple New Susceptibility Loci Shared by at Least Two Cancer Types.

    PubMed

    Kar, Siddhartha P; Beesley, Jonathan; Amin Al Olama, Ali; Michailidou, Kyriaki; Tyrer, Jonathan; Kote-Jarai, ZSofia; Lawrenson, Kate; Lindstrom, Sara; Ramus, Susan J; Thompson, Deborah J; Kibel, Adam S; Dansonka-Mieszkowska, Agnieszka; Michael, Agnieszka; Dieffenbach, Aida K; Gentry-Maharaj, Aleksandra; Whittemore, Alice S; Wolk, Alicja; Monteiro, Alvaro; Peixoto, Ana; Kierzek, Andrzej; Cox, Angela; Rudolph, Anja; Gonzalez-Neira, Anna; Wu, Anna H; Lindblom, Annika; Swerdlow, Anthony; Ziogas, Argyrios; Ekici, Arif B; Burwinkel, Barbara; Karlan, Beth Y; Nordestgaard, Børge G; Blomqvist, Carl; Phelan, Catherine; McLean, Catriona; Pearce, Celeste Leigh; Vachon, Celine; Cybulski, Cezary; Slavov, Chavdar; Stegmaier, Christa; Maier, Christiane; Ambrosone, Christine B; Høgdall, Claus K; Teerlink, Craig C; Kang, Daehee; Tessier, Daniel C; Schaid, Daniel J; Stram, Daniel O; Cramer, Daniel W; Neal, David E; Eccles, Diana; Flesch-Janys, Dieter; Edwards, Digna R Velez; Wokozorczyk, Dominika; Levine, Douglas A; Yannoukakos, Drakoulis; Sawyer, Elinor J; Bandera, Elisa V; Poole, Elizabeth M; Goode, Ellen L; Khusnutdinova, Elza; Høgdall, Estrid; Song, Fengju; Bruinsma, Fiona; Heitz, Florian; Modugno, Francesmary; Hamdy, Freddie C; Wiklund, Fredrik; Giles, Graham G; Olsson, Håkan; Wildiers, Hans; Ulmer, Hans-Ulrich; Pandha, Hardev; Risch, Harvey A; Darabi, Hatef; Salvesen, Helga B; Nevanlinna, Heli; Gronberg, Henrik; Brenner, Hermann; Brauch, Hiltrud; Anton-Culver, Hoda; Song, Honglin; Lim, Hui-Yi; McNeish, Iain; Campbell, Ian; Vergote, Ignace; Gronwald, Jacek; Lubiński, Jan; Stanford, Janet L; Benítez, Javier; Doherty, Jennifer A; Permuth, Jennifer B; Chang-Claude, Jenny; Donovan, Jenny L; Dennis, Joe; Schildkraut, Joellen M; Schleutker, Johanna; Hopper, John L; Kupryjanczyk, Jolanta; Park, Jong Y; Figueroa, Jonine; Clements, Judith A; Knight, Julia A; Peto, Julian; Cunningham, Julie M; Pow-Sang, Julio; Batra, Jyotsna; Czene, Kamila; Lu, Karen H; Herkommer, Kathleen; Khaw, Kay-Tee; Matsuo, Keitaro; Muir, Kenneth; Offitt, Kenneth; Chen, Kexin; Moysich, Kirsten B; Aittomäki, Kristiina; Odunsi, Kunle; Kiemeney, Lambertus A; Massuger, Leon F A G; Fitzgerald, Liesel M; Cook, Linda S; Cannon-Albright, Lisa; Hooning, Maartje J; Pike, Malcolm C; Bolla, Manjeet K; Luedeke, Manuel; Teixeira, Manuel R; Goodman, Marc T; Schmidt, Marjanka K; Riggan, Marjorie; Aly, Markus; Rossing, Mary Anne; Beckmann, Matthias W; Moisse, Matthieu; Sanderson, Maureen; Southey, Melissa C; Jones, Michael; Lush, Michael; Hildebrandt, Michelle A T; Hou, Ming-Feng; Schoemaker, Minouk J; Garcia-Closas, Montserrat; Bogdanova, Natalia; Rahman, Nazneen; Le, Nhu D; Orr, Nick; Wentzensen, Nicolas; Pashayan, Nora; Peterlongo, Paolo; Guénel, Pascal; Brennan, Paul; Paulo, Paula; Webb, Penelope M; Broberg, Per; Fasching, Peter A; Devilee, Peter; Wang, Qin; Cai, Qiuyin; Li, Qiyuan; Kaneva, Radka; Butzow, Ralf; Kopperud, Reidun Kristin; Schmutzler, Rita K; Stephenson, Robert A; MacInnis, Robert J; Hoover, Robert N; Winqvist, Robert; Ness, Roberta; Milne, Roger L; Travis, Ruth C; Benlloch, Sara; Olson, Sara H; McDonnell, Shannon K; Tworoger, Shelley S; Maia, Sofia; Berndt, Sonja; Lee, Soo Chin; Teo, Soo-Hwang; Thibodeau, Stephen N; Bojesen, Stig E; Gapstur, Susan M; Kjær, Susanne Krüger; Pejovic, Tanja; Tammela, Teuvo L J; Dörk, Thilo; Brüning, Thomas; Wahlfors, Tiina; Key, Tim J; Edwards, Todd L; Menon, Usha; Hamann, Ute; Mitev, Vanio; Kosma, Veli-Matti; Setiawan, Veronica Wendy; Kristensen, Vessela; Arndt, Volker; Vogel, Walther; Zheng, Wei; Sieh, Weiva; Blot, William J; Kluzniak, Wojciech; Shu, Xiao-Ou; Gao, Yu-Tang; Schumacher, Fredrick; Freedman, Matthew L; Berchuck, Andrew; Dunning, Alison M; Simard, Jacques; Haiman, Christopher A; Spurdle, Amanda; Sellers, Thomas A; Hunter, David J; Henderson, Brian E; Kraft, Peter; Chanock, Stephen J; Couch, Fergus J; Hall, Per; Gayther, Simon A; Easton, Douglas F; Chenevix-Trench, Georgia; Eeles, Rosalind; Pharoah, Paul D P; Lambrechts, Diether

    2016-09-01

    Breast, ovarian, and prostate cancers are hormone-related and may have a shared genetic basis, but this has not been investigated systematically by genome-wide association (GWA) studies. Meta-analyses combining the largest GWA meta-analysis data sets for these cancers totaling 112,349 cases and 116,421 controls of European ancestry, all together and in pairs, identified at P < 10^-8 seven new cross-cancer loci: three associated with susceptibility to all three cancers (rs17041869/2q13/BCL2L11; rs7937840/11q12/INCENP; rs1469713/19p13/GATAD2A), two breast and ovarian cancer risk loci (rs200182588/9q31/SMC2; rs8037137/15q26/RCCD1), and two breast and prostate cancer risk loci (rs5013329/1p34/NSUN4; rs9375701/6q23/L3MBTL3). Index variants in five additional regions previously associated with only one cancer also showed clear association with a second cancer type. Cell-type-specific expression quantitative trait locus and enhancer-gene interaction annotations suggested target genes with potential cross-cancer roles at the new loci. Pathway analysis revealed significant enrichment of death receptor signaling genes near loci with P < 10^-5 in the three-cancer meta-analysis. We demonstrate that combining large-scale GWA meta-analysis findings across cancer types can identify completely new risk loci common to breast, ovarian, and prostate cancers. We show that the identification of such cross-cancer risk loci has the potential to shed new light on the shared biology underlying these hormone-related cancers. Cancer Discov; 6(9); 1052-67. ©2016 AACR.

  18. Beyond main effects of gene-sets: harsh parenting moderates the association between a dopamine gene-set and child externalizing behavior.

    PubMed

    Windhorst, Dafna A; Mileva-Seitz, Viara R; Rippe, Ralph C A; Tiemeier, Henning; Jaddoe, Vincent W V; Verhulst, Frank C; van IJzendoorn, Marinus H; Bakermans-Kranenburg, Marian J

    2016-08-01

    In a longitudinal cohort study, we investigated the interplay of harsh parenting and genetic variation across a set of functionally related dopamine genes, in association with children's externalizing behavior. This is one of the first studies to employ gene-based and gene-set approaches in tests of Gene by Environment (G × E) effects on complex behavior. This approach can offer an important alternative or complement to candidate gene and genome-wide environmental interaction (GWEI) studies in the search for genetic variation underlying individual differences in behavior. Genetic variants in 12 autosomal dopaminergic genes were available in an ethnically homogenous part of a population-based cohort. Harsh parenting was assessed with maternal (n = 1881) and paternal (n = 1710) reports at age 3. Externalizing behavior was assessed with the Child Behavior Checklist (CBCL) at age 5 (71 ± 3.7 months). We conducted gene-set analyses of the association between variation in dopaminergic genes and externalizing behavior, stratified for harsh parenting. The association was statistically significant or approached significance for children without harsh parenting experiences, but was absent in the group with harsh parenting. Similarly, significant associations between single genes and externalizing behavior were only found in the group without harsh parenting. Effect sizes in the groups with and without harsh parenting did not differ significantly. Gene-environment interaction tests were conducted for individual genetic variants, resulting in two significant interaction effects (rs1497023 and rs4922132) after correction for multiple testing. Our findings are suggestive of G × E interplay, with associations between dopamine genes and externalizing behavior present in children without harsh parenting, but not in children with harsh parenting experiences. Harsh parenting may overrule the role of genetic factors in externalizing behavior. Gene-based and gene-set analyses offer promising new alternatives to analyses focusing on single candidate polymorphisms when examining the interplay between genetic and environmental factors.

  19. Identification of a QTL in Mus musculus for Alcohol Preference, Withdrawal, and Ap3m2 Expression Using Integrative Functional Genomics and Precision Genetics

    PubMed Central

    Bubier, Jason A.; Jay, Jeremy J.; Baker, Christopher L.; Bergeson, Susan E.; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C.; Chesler, Elissa J.

    2014-01-01

    Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. PMID:24923803

  20. Identification of a QTL in Mus musculus for alcohol preference, withdrawal, and Ap3m2 expression using integrative functional genomics and precision genetics.

    PubMed

    Bubier, Jason A; Jay, Jeremy J; Baker, Christopher L; Bergeson, Susan E; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C; Chesler, Elissa J

    2014-08-01

    Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these are evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrate sufficiency of the genetic locus as a source of variation underlying two traits. Copyright © 2014 by the Genetics Society of America.

  1. Molecular Analysis and Genomic Organization of Major DNA Satellites in Banana (Musa spp.)

    PubMed Central

    Čížková, Jana; Hřibová, Eva; Humplíková, Lenka; Christelová, Pavla; Suchánková, Pavla; Doležel, Jaroslav

    2013-01-01

    Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands of nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extrachromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity were analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and a high homology between Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana which may aid in determining genomic constitution in interspecific hybrids. In addition to improving knowledge of Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes that can be identified in Musa. PMID:23372772

  2. Molecular analysis and genomic organization of major DNA satellites in banana (Musa spp.).

    PubMed

    Čížková, Jana; Hřibová, Eva; Humplíková, Lenka; Christelová, Pavla; Suchánková, Pavla; Doležel, Jaroslav

    2013-01-01

    Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands of nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extrachromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity were analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and a high homology between Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana which may aid in determining genomic constitution in interspecific hybrids. In addition to improving knowledge of Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes that can be identified in Musa.

  3. Changes in Gene Expression Predicting Local Control in Cervical Cancer: Results from Radiation Therapy Oncology Group 0128

    PubMed Central

    Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David

    2009-01-01

    Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. Two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate are 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
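
The leave-one-out validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the nearest-centroid classifier, the function names, and the toy expression values are all hypothetical. In a real analysis, gene selection (for example, the t test) would normally be repeated inside each fold so that the held-out sample does not influence the signature.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        # Column-wise mean of all training rows carrying this label.
        centroids[label] = [mean(col) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-gene expression changes for six patients (not study data).
toy_x = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
toy_y = ["control", "control", "control", "failure", "failure", "failure"]

print(leave_one_out_accuracy(toy_x, toy_y))
```

With very small cohorts, a high leave-one-out rate on well-separated toy data like this says little about performance on new patients, which is consistent with the abstract's call for further validation.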

  4. Combining Human Epigenetics and Sleep Studies in Caenorhabditis elegans: A Cross-Species Approach for Finding Conserved Genes Regulating Sleep.

    PubMed

    Huang, Huiyan; Zhu, Yong; Eliot, Melissa N; Knopik, Valerie S; McGeary, John E; Carskadon, Mary A; Hart, Anne C

    2017-06-01

    We aimed to test a combined approach to identify conserved genes regulating sleep and to explore the association between DNA methylation and sleep length. We identified candidate genes associated with shorter versus longer sleep duration in college students based on DNA methylation using Illumina Infinium HumanMethylation450 BeadChip arrays. Orthologous genes in Caenorhabditis elegans were identified, and we examined whether their loss of function affected C. elegans sleep. For genes whose perturbation affected C. elegans sleep, we subsequently undertook a small pilot study to re-examine DNA methylation in an independent set of human participants with shorter versus longer sleep durations. Eighty-seven out of 485,577 CpG sites had significant differential methylation in young adults with shorter versus longer sleep duration, corresponding to 52 candidate genes. We identified 34 C. elegans orthologs, including NPY/flp-18 and flp-21, which are known to affect sleep. Loss of five additional genes alters developmentally timed C. elegans sleep (B4GALT6/bre-4, DOCK180/ced-5, GNB2L1/rack-1, PTPRN2/ida-1, ZFYVE28/lst-2). For one of these genes, ZFYVE28 (also known as hLst2), the pilot replication study again found decreased DNA methylation associated with shorter sleep duration at the same two CpG sites in the first intron of ZFYVE28. Using an approach that combines human epigenetics and C. elegans sleep studies, we identified five genes that play previously unidentified roles in C. elegans sleep. We suggest sleep duration in humans may be associated with differential DNA methylation at specific sites and that the conserved genes identified here likely play roles in C. elegans sleep and in other species. © Sleep Research Society 2017. Published by Oxford University Press on behalf of the Sleep Research Society. All rights reserved. For permissions, please e-mail journals.permissions@oup.com.

  5. Panels of tumor-derived RNA markers in peripheral blood of patients with non-small cell lung cancer: their dependence on age, gender and clinical stages.

    PubMed

    Chian, Chih-Feng; Hwang, Yi-Ting; Terng, Harn-Jing; Lee, Shih-Chun; Chao, Tsui-Yi; Chang, Hung; Ho, Ching-Liang; Wu, Yi-Ying; Perng, Wann-Cherng

    2016-08-02

    Peripheral blood mononuclear cell (PBMC)-derived gene signatures were investigated for their potential use in the early detection of non-small cell lung cancer (NSCLC). Our study included 187 patients with NSCLC, 310 age- and gender-matched controls, and an independent validation set of 29 patients. Eight significant NSCLC-associated genes were identified: DUSP6, EIF2S3, GRB2, MDM2, NF1, POLDIP2, RNF4, and WEE1. The logistic model containing these significant markers was able to distinguish subjects with NSCLC from controls with excellent performance: 80.7% sensitivity, 90.6% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.924. Repeated random sub-sampling (100 times) was used to validate the performance of the classification training models, giving an average AUC of 0.92. Additional cross-validation using the independent set resulted in a sensitivity of 75.86%. Furthermore, six age/gender-dependent genes (CPEB4, EIF2S3, GRB2, MCM4, RNF4, and STAT2) were identified using an age and gender stratification approach. STAT2 and WEE1 were explored as stage-dependent using stage-stratified subpopulations. We conclude that these logistic models using different signatures for total and stratified samples are potential complementary tools for assessing the risk of NSCLC.
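
The performance measures quoted in this abstract can be computed as in the sketch below. The function names and the toy scores are assumptions made for illustration; only the 29-patient set size and the 75.86% sensitivity come from the abstract. The count of 22 correctly classified patients is an inference from those two figures, not a number the abstract states.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual cases that the model flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual controls that the model flags as controls."""
    return true_neg / (true_neg + false_pos)


def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# 22 of 29 detected would reproduce the reported 75.86% (inferred count).
print(round(100 * sensitivity(22, 7), 2))

# Hypothetical model scores, not study data.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```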

  6. A whole blood gene expression-based signature for smoking status

    PubMed Central

    2012-01-01

    Background: Smoking is the leading cause of preventable death worldwide and has been shown to increase the risk of multiple diseases including coronary artery disease (CAD). We sought to identify genes whose levels of expression in whole blood correlate with self-reported smoking status. Methods: Microarrays were used to identify gene expression changes in whole blood which correlated with self-reported smoking status; a set of significant genes from the microarray analysis was validated by qRT-PCR in an independent set of subjects. Stepwise forward logistic regression was performed using the qRT-PCR data to create a predictive model whose performance was validated in an independent set of subjects and compared to cotinine, a nicotine metabolite. Results: Microarray analysis of whole blood RNA from 209 PREDICT subjects (41 current smokers, 4 quit ≤ 2 months, 64 quit > 2 months, 100 never smoked; NCT00500617) identified 4214 genes significantly correlated with self-reported smoking status. qRT-PCR was performed on 1,071 PREDICT subjects across 256 microarray genes significantly correlated with smoking or CAD. A five gene (CLDND1, LRRN3, MUC1, GOPC, LEF1) predictive model, derived from the qRT-PCR data using stepwise forward logistic regression, had a cross-validated mean AUC of 0.93 (sensitivity=0.78; specificity=0.95), and was validated using 180 independent PREDICT subjects (AUC=0.82, CI 0.69-0.94; sensitivity=0.63; specificity=0.94). Plasma from the 180 validation subjects was used to assess levels of cotinine; a model using a threshold of 10 ng/ml cotinine resulted in an AUC of 0.89 (CI 0.81-0.97; sensitivity=0.81; specificity=0.97; kappa with expression model = 0.53). Conclusion: We have constructed and validated a whole blood gene expression score for the evaluation of smoking status, demonstrating that clinical and environmental factors contributing to cardiovascular disease risk can be assessed by gene expression. PMID:23210427
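
The agreement statistic quoted here (kappa between the cotinine model and the expression model) can be illustrated with a short sketch of Cohen's kappa. Everything below is an assumption made for illustration: the function names, the choice of "at or above 10 ng/ml" as the positive rule, and the toy values are hypothetical and are not taken from the study.

```python
def classify_by_threshold(values, threshold=10.0):
    """Label a sample 1 when its value reaches the threshold, else 0.

    Treating a value equal to the threshold as positive is an assumption.
    """
    return [1 if value >= threshold else 0 for value in values]


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two sets of calls on the same samples."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if the two raters labelled samples independently.
    expected = sum(
        (labels_a.count(label) / n) * (labels_b.count(label) / n)
        for label in set(labels_a) | set(labels_b)
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical cotinine levels (ng/ml) and hypothetical expression-model calls.
cotinine_levels = [25.0, 12.0, 3.0, 0.5]
cotinine_calls = classify_by_threshold(cotinine_levels)
expression_calls = [1, 0, 0, 0]

print(cohens_kappa(cotinine_calls, expression_calls))
```

Kappa discounts the agreement two classifiers would reach by chance, which is why it can be moderate even when both classifiers perform well against self-reported status.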

  7. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

  8. Differential gene expression in Varroa jacobsoni mites following a host shift to European honey bees (Apis mellifera).

    PubMed

    Andino, Gladys K; Gribskov, Michael; Anderson, Denis L; Evans, Jay D; Hunt, Greg J

    2016-11-16

    Varroa mites are widely considered the biggest honey bee health problem worldwide. Until recently, Varroa jacobsoni has been found to live and reproduce only in Asian honey bee (Apis cerana) colonies, while V. destructor successfully reproduces in both A. cerana and A. mellifera colonies. However, we have identified an island population of V. jacobsoni that is highly destructive to A. mellifera, the primary species used for pollination and honey production. The ability of these populations of mites to cross the host species boundary potentially represents an enormous threat to apiculture, and is presumably due to genetic variation that exists among populations of V. jacobsoni that influences gene expression and reproductive status. In this work, we investigate differences in gene expression between populations of V. jacobsoni reproducing on A. cerana and those either reproducing or not capable of reproducing on A. mellifera, in order to gain insight into differences that allow V. jacobsoni to overcome its normal species tropism. We sequenced and assembled a de novo transcriptome of V. jacobsoni. We also performed a differential gene expression analysis contrasting biological replicates of V. jacobsoni populations that differ in their ability to reproduce on A. mellifera. Using the edgeR, EBSeq and DESeq R packages for differential gene expression analysis, we found 287 differentially expressed genes (FDR ≤ 0.05), of which 91% were up-regulated in mites reproducing on A. mellifera. In addition, mites found reproducing on A. mellifera showed substantially more variation in expression among replicates. We searched for orthologous genes in public databases and were able to associate 100 of these 287 differentially expressed genes with a functional description. There is differential gene expression between the two mite groups, with more variation in gene expression among mites that were able to reproduce on A. mellifera. A small set of genes showed reduced expression in mites on the A. mellifera host, including putative transcription factors and digestive tract developmental genes. The vast majority of differentially expressed genes were up-regulated in this host. This gene set showed enrichment for genes associated with mitochondrial respiratory function and apoptosis, suggesting that mites on this host may be experiencing higher stress, and may be less optimally adapted to parasitize it. Some genes involved in reproduction and oogenesis were also overexpressed, which should be further studied with regard to this host shift.

  9. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298

  10. Lessons Learned from a Cross-Model Validation between a Discrete Event Simulation Model and a Cohort State-Transition Model for Personalized Breast Cancer Treatment.

    PubMed

    Jahn, Beate; Rochau, Ursula; Kurzthaler, Christina; Paulden, Mike; Kluibenschädl, Martina; Arvandi, Marjan; Kühne, Felicitas; Goehler, Alexander; Krahn, Murray D; Siebert, Uwe

    2016-04-01

    Breast cancer is the most common malignancy among women in developed countries. We developed a model (the Oncotyrol breast cancer outcomes model) to evaluate the cost-effectiveness of a 21-gene assay when used in combination with Adjuvant! Online to support personalized decisions about the use of adjuvant chemotherapy. The goal of this study was to perform a cross-model validation. The Oncotyrol model evaluates the 21-gene assay by simulating a hypothetical cohort of 50-year-old women over a lifetime horizon using discrete event simulation. Primary model outcomes were life-years, quality-adjusted life-years (QALYs), costs, and incremental cost-effectiveness ratios (ICERs). We followed the International Society for Pharmacoeconomics and Outcomes Research-Society for Medical Decision Making (ISPOR-SMDM) best practice recommendations for validation and compared modeling results of the Oncotyrol model with the state-transition model developed by the Toronto Health Economics and Technology Assessment (THETA) Collaborative. Both models were populated with Canadian THETA model parameters, and outputs were compared. The differences between the models varied among the different validation end points. The smallest relative differences were in costs, and the greatest were in QALYs. All relative differences were less than 1.2%. The cost-effectiveness plane showed that small differences in the model structure can lead to different sets of nondominated test-treatment strategies with different efficiency frontiers. We faced several challenges: distinguishing between differences in outcomes due to different modeling techniques and initial coding errors, defining meaningful differences, and selecting measures and statistics for comparison (means, distributions, multivariate outcomes). Cross-model validation was crucial to identify and correct coding errors and to explain differences in model outcomes. In our comparison, small differences in either QALYs or costs led to changes in ICERs because of changes in the set of dominated and nondominated strategies. © The Author(s) 2015.

  11. Genetic Changes Accompanying the Domestication of Pisum sativum: Is there a Common Genetic Basis to the ‘Domestication Syndrome’ for Legumes?

    PubMed Central

    Weeden, Norman F.

    2007-01-01

    Background and Aims: The changes that occur during the domestication of crops such as maize and common bean appear to be controlled by relatively few genes. This study investigates the genetic basis of domestication in pea (Pisum sativum) and compares the genes involved with those determined to be important in common bean domestication. Methods: Quantitative trait loci and classical genetic analysis are used to investigate and identify the genes modified at three stages of the domestication process. Five recombinant inbred populations involving crosses between different lines representing different stages are examined. Key Results: A minimum of 15 known genes, in addition to relatively few major quantitative trait loci, are identified as being critical to the domestication process. These genes control traits such as pod dehiscence, seed dormancy, seed size and other seed quality characters, stem height, root mass, and harvest index. Several of the genes have pleiotropic effects that in species possessing a more rudimentary genetic characterization might have been interpreted as clusters of genes. Very little evidence for gene clustering was found in pea. When compared with common bean, pea has used a different set of genes to produce the same or similar phenotypic changes. Conclusions: Similar to results for common bean, relatively few genes appear to have been modified during the domestication of pea. However, the genes involved are different, and there does not appear to be a common genetic basis to ‘domestication syndrome’ in the Fabaceae. PMID:17660515

  12. Expression Analysis of Stress-Related Genes in Kernels of Different Maize (Zea mays L.) Inbred Lines with Different Resistance to Aflatoxin Contamination

    PubMed Central

    Jiang, Tingbo; Zhou, Boru; Luo, Meng; Abbas, Hamed K.; Kemerait, Robert; Lee, Robert Dewey; Scully, Brian T.; Guo, Baozhu

    2011-01-01

    This research examined the expression patterns of 94 stress-related genes in seven maize inbred lines with differential expressions of resistance to aflatoxin contamination. The objective was to develop a set of genes/probes associated with resistance to A. flavus and/or aflatoxin contamination. Ninety-four genes were selected from previous gene expression studies with abiotic stress to test the differential expression in maize lines A638, B73, Lo964, Lo1016, Mo17, Mp313E, and Tex6, using real-time RT-PCR. Based on the relative-expression levels, the seven maize inbred lines clustered into two different groups. One group included B73, Lo1016 and Mo17, which had higher levels of aflatoxin contamination and lower levels of overall gene expression. The second group, which included Tex6, Mp313E, Lo964 and A638, had lower levels of aflatoxin contamination and higher overall levels of gene expression. A total of six “cross-talking” genes were identified between the two groups, which are highly expressed in the resistant Group 2 but down-regulated in susceptible Group 1. When further subjected to drought stress, Tex6 had more up-regulated genes and B73 had fewer. The transcript patterns and interactions measured in these experiments indicate that the resistance mechanism is an interconnected process involving many gene products and transcriptional regulators, as well as various host interactions with environmental factors, particularly drought and high temperature. PMID:22069724

  13. PRGdb: a bioinformatics platform for plant resistance gene analysis

    PubMed Central

    Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella

    2010-01-01

    PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694

  14. Epistasis and Pleiotropy Affect the Modularity of the Genotype-Phenotype Map of Cross-Resistance in HIV-1.

    PubMed

    Polster, Robert; Petropoulos, Christos J; Bonhoeffer, Sebastian; Guillaume, Frédéric

    2016-12-01

    The genotype-phenotype (GP) map is a central concept in evolutionary biology as it describes the mapping of molecular genetic variation onto phenotypic trait variation. Our understanding of that mapping remains partial, especially when trying to link functional clustering of pleiotropic gene effects with patterns of phenotypic trait co-variation. Studies have only rarely been able to explore that link fully, and those that have tend to show poor correspondence between modular structures within the GP map and among phenotypes. By dissecting the structure of the GP map of the replicative capacity of HIV-1 in 15 drug environments, we provide a detailed view of that mapping from mutational pleiotropic variation to phenotypic co-variation, including epistatic effects of a set of amino-acid substitutions in the reverse transcriptase and protease genes. We show that epistasis increases the pleiotropic degree of single mutations and provides modularity to the GP map of drug resistance in HIV-1. Moreover, modules of epistatic pleiotropic effects within the GP map match the phenotypic modules of correlated replicative capacity among drug classes. Epistasis thus increases the evolvability of cross-resistance in HIV by providing more drug- and class-specific pleiotropic profiles to the main effects of the mutations. We discuss the implications for the evolution of cross-resistance in HIV. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Joint genetic analysis of hippocampal size in mouse and human identifies a novel gene linked to neurodegenerative disease.

    PubMed

    Ashbrook, David G; Williams, Robert W; Lu, Lu; Stein, Jason L; Hibar, Derrek P; Nichols, Thomas E; Medland, Sarah E; Thompson, Paul M; Hager, Reinmar

    2014-10-03

    Variation in hippocampal volume has been linked to significant differences in memory, behavior, and cognition among individuals. To identify genetic variants underlying such differences and associated disease phenotypes, multinational consortia such as ENIGMA have used large magnetic resonance imaging (MRI) data sets in human GWAS studies. In addition, mapping studies in mouse model systems have identified genetic variants for brain structure variation with great power. A key challenge is to understand how genetically based differences in brain structure lead to the propensity to develop specific neurological disorders. We combine the largest human GWAS of brain structure with the largest mammalian model system, the BXD recombinant inbred mouse population, to identify novel genetic targets influencing brain structure variation that are linked to increased risk for neurological disorders. We first use a novel cross-species, comparative analysis using mouse and human genetic data to identify a candidate gene, MGST3, associated with adult hippocampus size in both systems. We then establish the coregulation and function of this gene in a comprehensive systems-analysis. We find that MGST3 is associated with hippocampus size and is linked to a group of neurodegenerative disorders, such as Alzheimer's.

  16. Lamellar ichthyosis maps to chromosome 14q11

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Russell, L.J.; Compton, J.G.; Bale, S.J.

    1994-09-01

    Lamellar ichthyosis (LI) is a serious skin disorder inherited as an autosomal recessive trait and characterized by large, brown plate-like scales covering the body. Skin involvement is apparent at birth, often as a collodion membrane. Scarring alopecia, ectropion, and secondary hypohidrosis are frequent. We used a panel of candidate genes that are expressed in the epidermis to study seven multiplex Caucasian families in the U.S. and six inbred (multiplex and simplex) families in Egypt. We find no recombination (Z=9.11 at θ=0) in either set of families with transglutaminase 1 (TGM1), the gene encoding the enzyme responsible for cross-linking proteins to the cell envelope in the upper-most layer of the epidermis. In addition, striking homozygosity is observed in the inbred families for markers neighboring TGM1, defining a 9.3 cM candidate region which is bounded by MYH7 and D14S275. This is the first report of linkage in LI and suggests that further study of the TGM1 gene may identify the underlying pathogenesis of this severe, disfiguring disorder. Linkage-based genetic counseling and prenatal diagnosis are now available for informative at-risk families.

  17. Adaptive Evolution of Extreme Acidophile Sulfobacillus thermosulfidooxidans Potentially Driven by Horizontal Gene Transfer and Gene Loss

    PubMed Central

    Zhang, Xian; Liu, Xueduan; Liang, Yili; Guo, Xue; Xiao, Yunhua; Ma, Liyuan; Miao, Bo; Liu, Hongwei; Peng, Deliang; Huang, Wenkun; Zhang, Yuguang

    2017-01-01

    ABSTRACT: Recent phylogenomic analysis has suggested that three strains isolated from different copper mine tailings around the world were taxonomically affiliated with Sulfobacillus thermosulfidooxidans. Here, we present a detailed investigation of their genomic features, particularly with respect to metabolic potentials and stress tolerance mechanisms. Comprehensive analysis of the Sulfobacillus genomes identified a core set of essential genes with specialized biological functions in the survival of acidophiles in their habitats, despite differences in their metabolic pathways. The Sulfobacillus strains also showed evidence for stress management, thereby enabling them to efficiently respond to harsh environments. Further analysis of metabolic profiles provided novel insights into the presence of genomic streamlining, highlighting the importance of gene loss as a main mechanism that potentially contributes to cellular economization. Another important evolutionary force, especially in larger genomes, is gene acquisition via horizontal gene transfer (HGT), which might play a crucial role in the recruitment of novel functionalities. Also, a successful integration of genes acquired from archaeal donors appears to be an effective way of enhancing the adaptive capacity to cope with environmental changes. Taken together, the findings of this study significantly expand the spectrum of HGT and genome reduction in shaping the evolutionary history of Sulfobacillus strains. IMPORTANCE: Horizontal gene transfer (HGT) and gene loss are recognized as major driving forces that contribute to the adaptive evolution of microbial genomes, although their relative importance remains elusive. The findings of this study suggest that highly frequent gene turnovers within microorganisms via HGT were necessary to incur additional novel functionalities to increase the capacity of acidophiles to adapt to changing environments. Evidence also reveals a fascinating phenomenon of potential cross-kingdom HGT. Furthermore, genome streamlining may be a critical force in driving the evolution of microbial genomes. Taken together, this study provides insights into the importance of both HGT and gene loss in the evolution and diversification of bacterial genomes. PMID:28115381

  18. Adaptive Evolution of Extreme Acidophile Sulfobacillus thermosulfidooxidans Potentially Driven by Horizontal Gene Transfer and Gene Loss.

    PubMed

    Zhang, Xian; Liu, Xueduan; Liang, Yili; Guo, Xue; Xiao, Yunhua; Ma, Liyuan; Miao, Bo; Liu, Hongwei; Peng, Deliang; Huang, Wenkun; Zhang, Yuguang; Yin, Huaqun

    2017-04-01

    Recent phylogenomic analysis has suggested that three strains isolated from different copper mine tailings around the world were taxonomically affiliated with Sulfobacillus thermosulfidooxidans. Here, we present a detailed investigation of their genomic features, particularly with respect to metabolic potentials and stress tolerance mechanisms. Comprehensive analysis of the Sulfobacillus genomes identified a core set of essential genes with specialized biological functions in the survival of acidophiles in their habitats, despite differences in their metabolic pathways. The Sulfobacillus strains also showed evidence for stress management, thereby enabling them to efficiently respond to harsh environments. Further analysis of metabolic profiles provided novel insights into the presence of genomic streamlining, highlighting the importance of gene loss as a main mechanism that potentially contributes to cellular economization. Another important evolutionary force, especially in larger genomes, is gene acquisition via horizontal gene transfer (HGT), which might play a crucial role in the recruitment of novel functionalities. Also, a successful integration of genes acquired from archaeal donors appears to be an effective way of enhancing the adaptive capacity to cope with environmental changes. Taken together, the findings of this study significantly expand the spectrum of HGT and genome reduction in shaping the evolutionary history of Sulfobacillus strains. IMPORTANCE: Horizontal gene transfer (HGT) and gene loss are recognized as major driving forces that contribute to the adaptive evolution of microbial genomes, although their relative importance remains elusive. The findings of this study suggest that highly frequent gene turnovers within microorganisms via HGT were necessary to incur additional novel functionalities to increase the capacity of acidophiles to adapt to changing environments. Evidence also reveals a fascinating phenomenon of potential cross-kingdom HGT. Furthermore, genome streamlining may be a critical force in driving the evolution of microbial genomes. Taken together, this study provides insights into the importance of both HGT and gene loss in the evolution and diversification of bacterial genomes. Copyright © 2017 American Society for Microbiology.

  19. Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS.

    PubMed

    Kwon, Ji-Sun; Kim, Jihye; Nam, Dougu; Kim, Sangsoo

    2012-06-01

    Gene set analysis (GSA) is useful in interpreting a genome-wide association study (GWAS) result in terms of biological mechanism. We compared the performance of two different GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs of the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. On the other hand, GSA-SNP reported 94 and 38 hits for the same two datasets, respectively. Similar trends, although to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets as well. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact due to the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be due to the fact that it relies on Z-statistics, which are sensitive to variations in the background level of associations. Judicious evaluation of the GSA outcomes, perhaps based on multiple programs, is recommended.
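
    The abstract's remark that a Z-statistic is sensitive to the background level of associations can be illustrated with a minimal sketch. The formula below is a generic gene-set Z-statistic (set mean compared with the genome-wide mean, scaled by the standard error); it is an assumption for illustration and not necessarily the exact statistic implemented in GSA-SNP. The function name `gene_set_z`, the gene labels, and the p-values are all hypothetical.

```python
import math
from statistics import mean, pstdev


def gene_set_z(scores, gene_set):
    """Generic Z-statistic of a gene set against the genome-wide background.

    scores   -- dict mapping gene name to a per-gene score, e.g. -log10(p)
    gene_set -- iterable of gene names belonging to the set
    """
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        raise ValueError("no gene in the set has a score")
    background = list(scores.values())
    mu = mean(background)        # background mean over all scored genes
    sigma = pstdev(background)   # background standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    # Set mean minus background mean, divided by the standard error.
    return (mean(members) - mu) / (sigma / math.sqrt(len(members)))


# Hypothetical per-gene p-values (not taken from the study).
p_values = {"G1": 1e-6, "G2": 1e-5, "G3": 0.2, "G4": 0.5, "G5": 0.8, "G6": 0.9}
scores = {gene: -math.log10(p) for gene, p in p_values.items()}

z_strong = gene_set_z(scores, {"G1", "G2"})  # set of strongly associated genes
z_weak = gene_set_z(scores, {"G5", "G6"})    # set of weakly associated genes
print(z_strong, z_weak)
```

    Because both the background mean and the background standard deviation enter the formula, adding many moderately associated SNPs or genes to the background (as imputation may do) can lower the Z-statistic of a set whose own scores are unchanged.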

  20. Common and unique elements of the ABA-regulated transcriptome of Arabidopsis guard cells

    PubMed Central

    2011-01-01

    Background In the presence of drought and other desiccating stresses, plants synthesize and redistribute the phytohormone abscisic acid (ABA). ABA promotes plant water conservation by acting on specialized cells in the leaf epidermis, guard cells, which border and regulate the apertures of stomatal pores through which transpirational water loss occurs. Following ABA exposure, solute uptake into guard cells is rapidly inhibited and solute loss is promoted, resulting in inhibition of stomatal opening and promotion of stomatal closure, with consequent plant water conservation. There is a wealth of information on the guard cell signaling mechanisms underlying these rapid ABA responses. To investigate ABA regulation of gene expression in guard cells in a systematic genome-wide manner, we analyzed data from global transcriptomes of guard cells generated with Affymetrix ATH1 microarrays, and compared these results to ABA regulation of gene expression in leaves and other tissues. Results The 1173 ABA-regulated genes of guard cells identified by our study share significant overlap with ABA-regulated genes of other tissues, and are associated with well-defined ABA-related promoter motifs such as ABREs and DREs. However, we also computationally identified a unique cis-acting motif, GTCGG, associated with ABA-induction of gene expression specifically in guard cells. In addition, approximately 300 genes showing ABA-regulation unique to this cell type were newly uncovered by our study. Within the ABA-regulated gene set of guard cells, we found that many of the genes known to encode ion transporters associated with stomatal opening are down-regulated by ABA, providing one mechanism for long-term maintenance of stomatal closure during drought. 
We also found examples of both negative and positive feedback in the transcriptional regulation by ABA of known ABA-signaling genes, particularly with regard to the PYR/PYL/RCAR class of soluble ABA receptors and their downstream targets, the type 2C protein phosphatases. Our data also provide evidence for cross-talk at the transcriptional level between ABA and another hormonal inhibitor of stomatal opening, methyl jasmonate. Conclusions Our results engender new insights into the basic cell biology of guard cells, reveal common and unique elements of ABA-regulation of gene expression in guard cells, and set the stage for targeted biotechnological manipulations to improve plant water use efficiency. PMID:21554708

  1. miRNA-mediated 'tug-of-war' model reveals ceRNA propensity of genes in cancers.

    PubMed

    Swain, Arpit Chandan; Mallick, Bibekanand

    2018-06-01

    Competing endogenous RNA (ceRNA) are transcripts that cross-regulate each other at the post-transcriptional level by competing for shared microRNA response elements (MREs). These have been implicated in various biological processes impacting cell-fate decisions and diseases including cancer. There are several studies that predict possible ceRNA pairs by adopting various machine-learning and mathematical approaches; however, there is no method that enables us to gauge as well as compare the propensity of the ceRNA of a gene and precisely envisages which among a pair exerts a stronger pull on the shared miRNA pool. In this study, we developed a method that uses the 'tug of war of genes' concept to predict and quantify ceRNA potential of a gene for the shared miRNA pool in cancers based on a score represented by SoCeR (score of competing endogenous RNA). The method was executed on the RNA-Seq transcriptional profiles of genes and miRNA available at TCGA along with CLIP-supported miRNA-target sites to predict ceRNA in 32 cancer types which were validated with already reported cases. The proposed method can be used to determine the sequestering capability of the gene of interest as well as in ranking the probable ceRNA candidates of a gene. Finally, we developed standalone applications (SoCeR tool) to aid researchers in easier implementation of the method in analysing different data sets or diseases. © 2018 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.

  2. Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis.

    PubMed

    de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Verheijen, Mark H G; Posthuma, Danielle

    2015-11-01

    Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis.

  3. Involvement of astrocyte metabolic coupling in Tourette syndrome pathogenesis

    PubMed Central

    de Leeuw, Christiaan; Goudriaan, Andrea; Smit, August B; Yu, Dongmei; Mathews, Carol A; Scharf, Jeremiah M; Scharf, J M; Pauls, D L; Yu, D; Illmann, C; Osiecki, L; Neale, B M; Mathews, C A; Reus, V I; Lowe, T L; Freimer, N B; Cox, N J; Davis, L K; Rouleau, G A; Chouinard, S; Dion, Y; Girard, S; Cath, D C; Posthuma, D; Smit, J H; Heutink, P; King, R A; Fernandez, T; Leckman, J F; Sandor, P; Barr, C L; McMahon, W; Lyon, G; Leppert, M; Morgan, J; Weiss, R; Grados, M A; Singer, H; Jankovic, J; Tischfield, J A; Heiman, G A; Verheijen, Mark H G; Posthuma, Danielle

    2015-01-01

    Tourette syndrome is a heritable neurodevelopmental disorder whose pathophysiology remains unknown. Recent genome-wide association studies suggest that it is a polygenic disorder influenced by many genes of small effect. We tested whether these genes cluster in cellular function by applying gene-set analysis using expert curated sets of brain-expressed genes in the current largest available Tourette syndrome genome-wide association data set, involving 1285 cases and 4964 controls. The gene sets included specific synaptic, astrocytic, oligodendrocyte and microglial functions. We report association of Tourette syndrome with a set of genes involved in astrocyte function, specifically in astrocyte carbohydrate metabolism. This association is driven primarily by a subset of 33 genes involved in glycolysis and glutamate metabolism through which astrocytes support synaptic function. Our results indicate for the first time that the process of astrocyte-neuron metabolic coupling may be an important contributor to Tourette syndrome pathogenesis. PMID:25735483

  4. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

    PubMed Central

    Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar

    2016-01-01

    Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both tenfold and leave-one-out cross-validations in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311

  5. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity.
Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  6. Investigating the different mechanisms of genotoxic and non-genotoxic carcinogens by a gene set analysis.

    PubMed

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumors in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that DDB2, RRM2B and GADD45A, genes related to DNA repair and damage prevention, were consistently up-regulated for genotoxic carcinogens.
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways.

  7. Investigating the Different Mechanisms of Genotoxic and Non-Genotoxic Carcinogens by a Gene Set Analysis

    PubMed Central

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumors in carcinogenic bioassays in animals. Thus evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that DDB2, RRM2B and GADD45A, genes related to DNA repair and damage prevention, were consistently up-regulated for genotoxic carcinogens.
Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed understanding of the perturbation of significant pathways. PMID:24497971

  8. Phylogenetics and evolution of Trx SET genes in fully sequenced land plants.

    PubMed

    Zhu, Xinyu; Chen, Caoyi; Wang, Baohua

    2012-04-01

    Plant Trx SET proteins are involved in H3K4 methylation and play a key role in plant floral development. Genes encoding Trx SET proteins constitute a multigene family in which the copy number varies among plant species and functional divergence appears to have occurred repeatedly. To investigate the evolutionary history of the Trx SET gene family, we performed a comprehensive evolutionary analysis of this gene family in 13 major representatives of green plants. A novel cluster (here named the cpTrx clade), which included the previously resolved III-1, III-2, and III-4 orthologous groups, was identified. Our analysis showed that plant Trx proteins possessed a variety of domain organizations and gene structures among paralogs. Additional domains such as PHD, PWWP, and FYR were integrated early into the primordial SET-PostSET domain organization of the cpTrx clade. We suggest that the PostSET domain was lost in some members of the III-4 orthologous group during the evolution of land plants. At least four classes of gene structures had been formed at the early evolutionary stage of land plants. Three intronless orphan Trx SET genes from Physcomitrella patens (moss) were identified, and their parental genes have supposedly been eliminated from the genome. The structural differences among evolutionary groups of plant Trx SET genes with different functions were described, contributing to the design of further experimental studies.

  9. Alteration of gene expression by zinc oxide nanoparticles or zinc sulfate in vivo and comparison with in vitro data: A harmonious case.

    PubMed

    Zhang, Wei-Dong; Zhao, Yong; Zhang, Hong-Fu; Wang, Shu-Kun; Hao, Zhi-Hui; Liu, Jing; Yuan, Yu-Qing; Zhang, Peng-Fei; Yang, Hong-Di; Shen, Wei; Li, Lan

    2016-08-01

    Granulosa cells (GCs) are those somatic cells closest to the female germ cell. GCs play a vital role in oocyte growth and development, and the oocyte is necessary for multiplication of a species. Zinc oxide (ZnO) nanoparticles (NPs) readily cross biologic barriers to be absorbed into biologic systems, which makes them promising candidates as food additives. The objective of the present investigation was to explore the impact of intact NPs on gene expression and the functional classification of altered genes in hen GCs in vivo, to compare the data from in vivo and in vitro studies, and finally to point out the adverse effects of ZnO NPs on the reproductive system. After a 24-week treatment, hen GCs were isolated and gene expression was quantified. Intact NPs were found in the ovary and other organs. Zn levels were similar in ZnO-NP-100 mg/kg- and ZnSO4-100 mg/kg-treated hen ovaries. ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg regulated the expression of the same sets of genes, and they also altered the expression of different sets of genes individually. The number of genes altered by the ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg treatments was different. Gene Ontology (GO) functional analysis reported different results for the two treatments and, in Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, 12 pathways (out of the top 20 pathways) in each treatment were different. These results suggested that intact NPs and Zn(2+) had different effects on gene expression in GCs in vivo. In our recent publication, we noted that intact NPs and Zn(2+) differentially altered gene expression in GCs in vitro. However, GO functional classification and KEGG pathway enrichment analyses revealed close similarities for the changed genes in vivo and in vitro after ZnO NP treatment. Furthermore, close similarities were observed for the changed genes after ZnSO4 treatments in vivo and in vitro by GO functional classification and KEGG pathway enrichment analyses.
Therefore, the effects of ZnO NPs on gene expression in vitro might represent their effects on gene expression in vivo. The results from this study and our earlier studies support previous findings indicating ZnO NPs promote adverse effects on organisms. Therefore, precautions should be taken when ZnO NPs are used as diet additives for hens because they might cause reproductive issues. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Minimal shortening of leukocyte telomere length across age groups in a cross-sectional study for carriers of a longevity-associated FOXO3 allele.

    PubMed

    Davy, Philip M C; Willcox, D Craig; Shimabukuro, Michio; Donlon, Timothy A; Torigoe, Trevor; Suzuki, Makoto; Higa, Moritake; Masuzaki, Hiroaki; Sata, Masataka; Chen, Randi; Murkofsky, Rachel; Morris, Brian J; Lim, Eunjung; Allsopp, Richard C; Willcox, Bradley J

    2018-04-21

    FOXO3 is one of the most prominent genes demonstrating a consistently reproducible genetic association with human longevity. The mechanisms by which these individual gene variants confer greater organismal lifespan are not well understood. We assessed the effect of longevity-associated FOXO3 alleles on age-related leukocyte telomere dynamics in a cross-sectional study comprising samples from 121 healthy Okinawan-Japanese donors aged 21-95 years. We found that telomere length for carriers of the longevity-associated allele of FOXO3 single nucleotide polymorphism rs2802292 displayed no significant correlation with age, an effect that was most pronounced in older (>50) participants. This is the first validated longevity gene variant identified to date showing an association with negligible loss of telomere length with age in humans in a cross-sectional study. Reduced telomere attrition may be a key mechanism for the longevity-promoting effect of the FOXO3 genotype studied.

  11. Mapping Flagellar Genes in Chlamydomonas Using Restriction Fragment Length Polymorphisms

    PubMed Central

    Ranum, LPW.; Thompson, M. D.; Schloss, J. A.; Lefebvre, P. A.; Silflow, C. D.

    1988-01-01

    To correlate cloned nuclear DNA sequences with previously characterized mutations in Chlamydomonas and to gain insight into the organization of its nuclear genome, we have begun to map molecular markers using restriction fragment length polymorphisms (RFLPs). A Chlamydomonas reinhardtii strain (CC-29) containing phenotypic markers on nine of the 19 linkage groups was crossed to the interfertile species Chlamydomonas smithii. DNA from each member of 22 randomly selected tetrads was analyzed for the segregation of RFLPs associated with cloned genes detected by hybridization with radioactive DNA probes. The current set of markers allows the detection of linkage to new molecular markers over approximately 54% of the existing genetic map. This study focused on mapping cloned flagellar genes and genes whose transcripts accumulate after deflagellation. Twelve different molecular clones have been assigned to seven linkage groups. The α-1 tubulin gene maps to linkage group III and is linked to the genomic sequence homologous to pcf6-100, a cDNA clone whose corresponding transcript accumulates after deflagellation. The α-2 tubulin gene maps to linkage group IV. The two β-tubulin genes are linked, with the β-1 gene being approximately 12 cM more distal from the centromere than the β-2 gene. A clone corresponding to a 73-kD dynein protein maps to the opposite arm of the same linkage group. The gene corresponding to the cDNA clone pcf6-187, whose mRNA accumulates after deflagellation, maps very close to the tightly linked pf-26 and pf-1 mutations on linkage group V. PMID:2906025

  12. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data

    PubMed Central

    2014-01-01

    Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time consuming and complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in approximately one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103

  13. Sarcosine influences apoptosis and growth of prostate cells via cell-type specific regulation of distinct sets of genes.

    PubMed

    Rodrigo, Miguel A Merlos; Strmiska, Vladislav; Horackova, Eva; Buchtelova, Hana; Michalek, Petr; Stiborova, Marie; Eckschlager, Tomas; Adam, Vojtech; Heger, Zbynek

    2018-02-01

    Sarcosine is a widely discussed oncometabolite of prostate cells. Although several reports described connections between sarcosine and various phenotypic changes of prostate cancer (PCa) cells, there is still a lack of insight into the complex phenomena of its effects on gene expression patterns, particularly in non-malignant and non-metastatic cells. To shed more light on this phenomenon, we performed parallel microarray profiling of RNA isolated from non-malignant (PNT1A), malignant (22Rv1), and metastatic (PC-3) prostate cell lines treated with sarcosine. Microarray results were experimentally verified using semi-quantitative RT-PCR, clonogenic assay, testing of the susceptibility of cells pre-incubated with sarcosine to anticancer agents with different modes of action (inhibitors of topoisomerase II, DNA cross-linking agent, antimicrotubule agent and inhibitor of histone deacetylases), and evaluation of activation of executioner caspases 3/7. We identified that, irrespective of the cell type, sarcosine stimulates up-regulation of distinct sets of genes involved in cell cycle and mitosis, while down-regulating expression of genes driving apoptosis. Moreover, it was found that in all cell types, sarcosine had pronounced stimulatory effects on clonogenicity. Except for the histone deacetylase inhibitor valproic acid, the efficiency of all agents was significantly (P < 0.05) decreased in sarcosine pre-incubated cells. Our comparative study brings evidence that sarcosine affects not only metastatic PCa cells, but also their malignant and non-malignant counterparts, and induces very similar changes in cell behavior, but via distinct cell-type specific targets. © 2017 Wiley Periodicals, Inc.

  14. Restoring pollen fertility in transgenic male-sterile eggplant by Cre/loxp-mediated site-specific recombination system.

    PubMed

    Cao, Bihao; Huang, Zhiyin; Chen, Guoju; Lei, Jianjun

    2010-04-01

    This study was designed to control plant fertility for heterosis breeding and hybrid seed production in eggplant by expressing the cell-lethal gene Barnase at a specific developmental stage and in a specific tissue of the male organ, under the control of the Cre/loxP system. The Barnase-coding region was flanked by loxP recognition sites for Cre-recombinase. The eggplant inbred/pure line ('E-38') was transformed with the Cre gene and the inbred/pure line ('E-8') was transformed with the Barnase gene situated between loxP sites. The experiments were done separately, by means of Agrobacterium co-culture. Four T0 plants with the Barnase gene were obtained; all proved to be male-sterile and incapable of producing viable pollen. Flower stamens were shorter, but the vegetative phenotype was similar to wild-type. Five T0 plants with the Cre gene developed well, flowered and set fruit normally. The crossing of male-sterile Barnase plants with Cre-expressing transgenic eggplants resulted in site-specific excision, with the male-sterile plants producing normal fruits. With Barnase excised, pollen fertility was fully restored in the hybrids. The phenotype of these restored plants was the same as that of the wild-type. Thus, the Barnase and Cre genes were capable of stable inheritance and expression in progenies of transgenic plants.

  15. Comparative mRNA analysis of behavioral and genetic mouse models of aggression.

    PubMed

    Malki, Karim; Tosto, Maria G; Pain, Oliver; Sluyter, Frans; Mineur, Yann S; Crusio, Wim E; de Boer, Sietse; Sandnabba, Kenneth N; Kesserwani, Jad; Robinson, Edward; Schalkwyk, Leonard C; Asherson, Philip

    2016-04-01

    Mouse models of aggression have traditionally compared strains, most notably BALB/cJ and C57BL/6. However, these strains were not designed to study aggression despite differences in aggression-related traits and distinct reactivity to stress. This study compared expression of genes differentially regulated in a stress (behavioral) mouse model of aggression with those from a recent genetic mouse model of aggression. The study used a discovery-replication design using two independent mRNA studies from mouse brain tissue. The discovery study identified strain (BALB/cJ and C57BL/6J) × stress (chronic mild stress or control) interactions. Probe sets differentially regulated in the discovery set were intersected with those uncovered in the replication study, which evaluated differences between high and low aggressive animals from three strains specifically bred to study aggression. Network analysis was conducted on overlapping genes uncovered across both studies. A significant overlap was found with the genetic mouse study sharing 1,916 probe sets with the stress model. Fifty-one probe sets were found to be strongly dysregulated across both studies, mapping to 50 known genes. Network analysis revealed two plausible pathways including one centered on the UBC gene hub which encodes ubiquitin, a protein well-known for protein degradation, and another on P38 MAPK. Findings from this study support the stress model of aggression, which showed remarkable molecular overlap with a genetic model. The study uncovered a set of candidate genes including the Erg2 gene, which has previously been implicated in different psychopathologies. The gene networks uncovered point to a redox pathway as potentially being implicated in aggression-related behaviors. © 2016 Wiley Periodicals, Inc.

  16. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
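
    The baseline that this abstract contrasts LEGO with, an over-representation test that treats a pathway as a flat list of genes, can be sketched as a one-sided Fisher's exact test (equivalently, a hypergeometric upper tail). This is only an illustration of that baseline, not of LEGO's network-based weighting; the function names and the toy gene identifiers are assumptions made for the example.

```python
from math import comb


def ora_p_value(universe_size, set_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of background genes
    set_size:      number of genes in the pathway / gene set
    query_size:    number of genes in the query list
    overlap:       number of query genes that fall in the gene set
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


def over_representation(query, gene_set, universe):
    """Restrict both lists to the universe, then test their overlap."""
    universe = set(universe)
    query = set(query) & universe
    gene_set = set(gene_set) & universe
    overlap = len(query & gene_set)
    return ora_p_value(len(universe), len(gene_set), len(query), overlap)


# Toy data (hypothetical gene identifiers, not from the paper).
universe = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2"]
query = ["g0", "g1", "g2"]
p = over_representation(query, pathway, universe)
```

    Every gene in the set contributes equally to this statistic, which is the limitation the abstract points out.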

  17. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  18. Relocation of a rust resistance gene R2 and its marker-assisted gene pyramiding in confection sunflower (Helianthus annuus L.).

    PubMed

    Qi, L L; Ma, G J; Long, Y M; Hulke, B S; Gong, L; Markell, S G

    2015-03-01

    The rust resistance gene R2 was reassigned to linkage group 14 of the sunflower genome. DNA markers linked to R2 were identified and used for marker-assisted gene pyramiding in a confection type genetic background. Due to the frequent evolution of new pathogen races, sunflower rust is a recurring threat to sunflower production worldwide. The inbred line Morden Cross 29 (MC29) carries the rust resistance gene, R2, conferring resistance to numerous races of rust fungus in the US, Canada, and Australia, and can be used as a broad-spectrum resistance resource. Based on phenotypic assessments and SSR marker analyses on the 117 F2 individuals derived from a cross of HA 89 with MC29 (USDA), R2 was mapped to linkage group (LG) 14 of the sunflower, and not to the previously reported location on LG9. The closest SSR marker HT567 was located at 4.3 cM distal to R2. Furthermore, 36 selected SNP markers from LG14 were used to saturate the R2 region. Two SNP markers, NSA_002316 and SFW01272, flanked R2 at a genetic distance of 2.8 and 1.8 cM, respectively. Of the three closely linked markers, SFW00211 amplified an allele specific for the presence of R2 in a marker validation set of 46 breeding lines, and SFW01272 was also shown to be diagnostic for R2. These newly developed markers, together with the previously identified markers linked to the gene R13a, were used to screen 524 F2 individuals from a cross of a confection R2 line and HA-R6 carrying R13a. Eleven homozygous double-resistant F2 plants with the gene combination of R2 and R13a were obtained. This double-resistant line will be extremely useful in confection sunflower, where few rust R genes are available, risking evolution of new virulence phenotypes and further disease epidemics.

  19. Probing the Xenopus laevis inner ear transcriptome for biological function

    PubMed Central

    2012-01-01

    Background The senses of hearing and balance depend upon mechanoreception, a process that originates in the inner ear and shares features across species. Amphibians have been widely used for physiological studies of mechanotransduction by sensory hair cells. In contrast, much less is known of the genetic basis of auditory and vestibular function in this class of animals. Among amphibians, the genus Xenopus is a well-characterized genetic and developmental model that offers unique opportunities for inner ear research because of the amphibian capacity for tissue and organ regeneration. For these reasons, we implemented a functional genomics approach as a means to undertake a large-scale analysis of the Xenopus laevis inner ear transcriptome through microarray analysis. Results Microarray analysis uncovered genes within the X. laevis inner ear transcriptome associated with inner ear function and impairment in other organisms, thereby supporting the inclusion of Xenopus in cross-species genetic studies of the inner ear. The use of gene categories (inner ear tissue; deafness; ion channels; ion transporters; transcription factors) facilitated the assignment of functional significance to probe set identifiers. We enhanced the biological relevance of our microarray data by using a variety of curation approaches to increase the annotation of the Affymetrix GeneChip® Xenopus laevis Genome array. In addition, annotation analysis revealed the prevalence of inner ear transcripts represented by probe set identifiers that lack functional characterization. Conclusions We identified an abundance of targets for genetic analysis of auditory and vestibular function. The orthologues to human genes with known inner ear function and the highly expressed transcripts that lack annotation are particularly interesting candidates for future analyses. 
We used informatics approaches to impart biologically relevant information to the Xenopus inner ear transcriptome, thereby addressing the impediment imposed by insufficient gene annotation. These findings heighten the relevance of Xenopus as a model organism for genetic investigations of inner ear organogenesis, morphogenesis, and regeneration. PMID:22676585

  20. Profiling of gene duplication patterns of sequenced teleost genomes: evidence for rapid lineage-specific genome expansion mediated by recent tandem duplications.

    PubMed

    Lu, Jianguo; Peatman, Eric; Tang, Haibao; Lewis, Joshua; Liu, Zhanjiang

    2012-06-15

    Gene duplication has had a major impact on genome evolution. Localized (or tandem) duplication resulting from unequal crossing over and whole genome duplication are believed to be the two dominant mechanisms contributing to vertebrate genome evolution. While much scrutiny has been directed toward discerning patterns indicative of whole-genome duplication events in teleost species, less attention has been paid to the continuous nature of gene duplications and their impact on the size, gene content, functional diversity, and overall architecture of teleost genomes. Here, using a Markov clustering algorithm directed approach we catalogue and analyze patterns of gene duplication in the four model teleost species with chromosomal coordinates: zebrafish, medaka, stickleback, and Tetraodon. Our analyses based on set size, duplication type, synonymous substitution rate (Ks), and gene ontology emphasize shared and lineage-specific patterns of genome evolution via gene duplication. Most strikingly, our analyses highlight the extraordinary duplication and retention rate of recent duplicates in zebrafish and their likely role in the structural and functional expansion of the zebrafish genome. We find that the zebrafish genome is remarkable in its large number of duplicated genes, small duplicate set size, biased Ks distribution toward minimal mutational divergence, and proportion of tandem and intra-chromosomal duplicates when compared with the other teleost model genomes. The observed gene duplication patterns have played significant roles in shaping the architecture of teleost genomes and appear to have contributed to the recent functional diversification and divergence of important physiological processes in zebrafish. We have analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting their origin of independent and continuous duplication. 
This is particularly true for the zebrafish genome. Further analysis of the duplicated gene sets indicated that a significant portion of duplicated genes in the zebrafish genome were of recent, lineage-specific duplication events. Most strikingly, a subset of duplicated genes is enriched among the recently duplicated genes involved in immune or sensory response pathways. Such findings demonstrated the significance of continuous gene duplication as well as that of whole genome duplication in the course of genome evolution.

  1. Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.

    PubMed

    Park, Inho; Lee, Kwang H; Lee, Doheon

    2010-06-15

    Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.
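
    The idea of scoring a Boolean rule by the class entropy of the samples it selects can be sketched as follows. The abstract does not give the exact significance measure, so this shows only the entropy step under an assumed data layout: each sample carries a class label and a per-gene-set activity state. The sample values are invented for illustration; only the two gene set names come from the abstract.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 if empty."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def select_by_rule(samples, rule):
    """Keep samples whose gene set states match every term of an AND rule.

    rule maps a gene set name to the required state ("up" or "down").
    """
    return [
        s for s in samples
        if all(s["states"].get(name) == state for name, state in rule.items())
    ]


GLU = "glutathione metabolism"
PGS = "prostaglandin synthesis regulation"

# Invented toy samples (not data from the paper).
samples = [
    {"label": "cancer", "states": {GLU: "down", PGS: "down"}},
    {"label": "cancer", "states": {GLU: "down", PGS: "down"}},
    {"label": "normal", "states": {GLU: "up", PGS: "down"}},
    {"label": "normal", "states": {GLU: "down", PGS: "up"}},
]

rule = {GLU: "down", PGS: "down"}
selected = select_by_rule(samples, rule)
entropy_all = class_entropy([s["label"] for s in samples])
entropy_selected = class_entropy([s["label"] for s in selected])
```

    A rule is informative when the entropy of the selected samples is much lower than the entropy of the full cohort, as it is in this toy case, where the rule selects only one class.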

  2. Artificial neural network classifier predicts neuroblastoma patients' outcome.

    PubMed

    Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi

    2016-11-08

    More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the aggressive tumor phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. A multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting the NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We used the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We used gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia-related gene sets in patients predicted with "Poor" or "Good" outcome. We used the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop an MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent set of 82 tumors. The NB-hypo classifier predicted the patients' outcome with the remarkable accuracy of 87 %. NB-hypo classifier prediction resulted in 2 % classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100 % accurate in assessing the death of five low/intermediate-risk patients. 
GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.
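
    The leave-one-out cross-validation scheme mentioned in this abstract can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier stands in for the multi-layer perceptron used by the authors; the feature vectors and labels are invented toy values, not expression data from the study.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.

    This is a stand-in classifier, not the MLP described in the abstract.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def leave_one_out_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, return fraction correct."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Invented two-feature toy profiles with "Good" / "Poor" outcome labels.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
accuracy = leave_one_out_accuracy(xs, ys)
```

    An external test set, such as the 82 held-back tumors in the abstract, would be scored separately with a model trained on the full training cohort.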

  3. Wildlife friendly roads: the impacts of roads on wildlife in urban areas and potential remedies

    USGS Publications Warehouse

    Riley, Seth P D; Brown, Justin L.; Sikich, Jeff A.; Schoonmaker, Catherine M.; Boydston, Erin E.

    2014-01-01

    Roads are one of the most important factors affecting the ability of wildlife to live and move within an urban area. Roads physically replace wildlife habitat and often reduce habitat quality nearby, fragment the remaining habitat, and cause increased mortality through vehicle collisions. Much ecological research on roads has focused on whether animals are successfully crossing roads, or if the road is a barrier to wildlife movement, gene flow, or functional connectivity. Roads can alter survival and reproduction for wildlife, even among species such as birds that cross roads easily. Here we examine the suite of potential impacts of roads on wildlife, but we focus particularly on urban settings. We report on studies, both in the literature and from our own experience, that have addressed wildlife and roads in urban landscapes. Although road ecology is a growing field of study, relatively little of this research, and relatively few mitigation projects, have been done in urban landscapes. We also draw from the available science on road impacts in rural areas when urban case studies have not fully addressed key topics.

  4. Mobilization of lipids and fortification of cell wall and cuticle are important in host defense against Hessian fly

    PubMed Central

    2013-01-01

    Background Wheat – Hessian fly interaction follows a typical gene-for-gene model. Hessian fly larvae die in wheat plants carrying an effective resistance gene, or thrive in susceptible plants that carry no effective resistance gene. Results Gene sets affected by Hessian fly attack in resistant plants were found to be very different from those in susceptible plants. Differential expression of gene sets was associated with differential accumulation of intermediates in defense pathways. Our results indicated that resources were rapidly mobilized in resistant plants for defense, including extensive membrane remodeling and release of lipids, sugar catabolism, and amino acid transport and degradation. These resources were likely rapidly converted into defense molecules such as oxylipins; toxic proteins including cysteine proteases, inhibitors of digestive enzymes, and lectins; phenolics; and cell wall components. However, toxicity alone does not cause immediate lethality to Hessian fly larvae. Toxic defenses might slow down Hessian fly development and therefore give plants more time for other types of defense to become effective. Conclusion Our gene expression and metabolic profiling results suggested that remodeling and fortification of cell wall and cuticle by increased deposition of phenolics and enhanced cross-linking were likely to be crucial for insect mortality by depriving Hessian fly larvae of nutrients from host cells. The identification of a large number of genes that were differentially expressed at different time points during compatible and incompatible interactions also provided a foundation for further research on the molecular pathways that lead to wheat resistance and susceptibility to Hessian fly infestation. PMID:23800119

  5. Discovering monotonic stemness marker genes from time-series stem cell microarray data.

    PubMed

    Wang, Hsei-Wei; Sun, Hsing-Jen; Chang, Ting-Yu; Lo, Hung-Hao; Cheng, Wei-Chung; Tseng, George C; Lin, Chin-Teng; Chang, Shing-Jyh; Pal, Nikhil; Chung, I-Fang

    2015-01-01

    Identification of genes with ascending or descending monotonic expression patterns over time or stages of stem cells is an important issue in time-series microarray data analysis. We propose a method named Monotonic Feature Selector (MFSelector) based on a concept of total discriminating error (DEtotal) to identify monotonic genes. MFSelector considers various time stages in stage order (i.e., Stage One vs. other stages, Stages One and Two vs. remaining stages and so on) and computes DEtotal of each gene. MFSelector can successfully identify genes with monotonic characteristics. We have demonstrated the effectiveness of MFSelector on two synthetic data sets and two stem cell differentiation data sets: embryonic stem cell neurogenesis (ESCN) and embryonic stem cell vasculogenesis (ESCV) data sets. We have also performed extensive quantitative comparisons of the three monotonic gene selection approaches. Some of the monotonic marker genes such as OCT4, NANOG, BLBP, discovered from the ESCN dataset exhibit consistent behavior with that reported in other studies. The role of monotonic genes found by MFSelector in either stemness or differentiation is validated using information obtained from Gene Ontology analysis and other literature. We justify and demonstrate that descending genes are involved in the proliferation or self-renewal activity of stem cells, while ascending genes are involved in differentiation of stem cells into variant cell lineages. We have developed a novel system, easy to use even with no pre-existing knowledge, to identify gene sets with monotonic expression patterns in multi-stage as well as in time-series genomics matrices. The case studies on ESCN and ESCV have helped to get a better understanding of stemness and differentiation. The novel monotonic marker genes discovered from a data set are found to exhibit consistent behavior in another independent data set, demonstrating the utility of the proposed method. 
The MFSelector R function and data sets can be downloaded from: http://microarray.ym.edu.tw/tools/MFSelector/.
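
    The stage-ordered comparisons described above (Stage One vs. the rest, Stages One and Two vs. the rest, and so on) can be sketched as follows. The abstract does not define the discriminating error, so the error count used here, the number of samples lying on the wrong side of the opposite group's extreme value, is an assumption made for illustration and should not be read as the published DEtotal formula.

```python
def stage_splits(n_stages):
    """Yield (early, late) stage-index lists in stage order."""
    for k in range(1, n_stages):
        yield list(range(k)), list(range(k, n_stages))


def total_error_ascending(values_by_stage):
    """Sum an assumed error count over all ordered splits for an ascending gene.

    values_by_stage: one list of expression values per stage.
    For each split, an early sample is an error if it is >= the smallest late
    value, and a late sample is an error if it is <= the largest early value.
    """
    total = 0
    for early_idx, late_idx in stage_splits(len(values_by_stage)):
        early = [v for i in early_idx for v in values_by_stage[i]]
        late = [v for i in late_idx for v in values_by_stage[i]]
        lowest_late, highest_early = min(late), max(early)
        total += sum(v >= lowest_late for v in early)
        total += sum(v <= highest_early for v in late)
    return total


# Invented toy expression values: three stages, two samples per stage.
clean_gene = [[1, 2], [3, 4], [5, 6]]
noisy_gene = [[1, 5], [3, 4], [2, 6]]
clean_error = total_error_ascending(clean_gene)
noisy_error = total_error_ascending(noisy_gene)
```

    Under this assumed definition a perfectly ascending gene scores zero, and a descending pattern can be scored by negating the values first.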

  6. GeneTopics - interpretation of gene sets via literature-driven topic models

    PubMed Central

    2013-01-01

    Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. 
Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875

  7. Development of loop-mediated isothermal amplification (LAMP) assays for the rapid detection of allergic peanut in processed food.

    PubMed

    Sheu, Shyang-Chwen; Tsou, Po-Chuan; Lien, Yi-Yang; Lee, Meng-Shiou

    2018-08-15

    Peanut is widely and commonly used in many cuisines around the world. However, peanut is also one of the most important food allergens causing anaphylactic reactions. The best way to prevent an allergic reaction is to avoid the allergen, or food containing an allergenic ingredient such as peanut, before consumption. Efficient and precise detection of peanut or peanut-derived products is therefore essential for protecting consumers' health and interests. In this study, a loop-mediated isothermal amplification (LAMP) assay was developed for the detection of allergenic peanut using specifically designed primer sets. Two sets of specific LAMP primers, targeting respectively the internal transcribed spacer 1 (ITS1) region of nuclear ribosomal DNA and the Ara h1 gene sequence of Arachis hypogaea (peanut), were used to evaluate LAMP for detecting peanut in processed food. The results demonstrated that identification of peanut with the newly designed ITS1 primers is more sensitive than with the Ara h1 primers in the LAMP assay. In addition, the sensitivity of LAMP for detecting peanut is higher than that of the traditional PCR method. These LAMP primer sets showed high specificity for peanut and no cross-reaction with other nut species, including walnut, hazelnut, almond, cashew and macadamia nut. Moreover, when as little as 0.1% peanut was mixed with other nut ingredients at different ratios, no cross-reactivity was evident during LAMP. Finally, when genomic DNA extracted from boiled and steamed peanut was used as template, detection of peanut by LAMP was unaffected and reproducible. The established LAMP assay can detect not only peanut ingredients but also commercial foods containing peanut. 
This assay is potentially useful for the rapid detection of peanut in food markets. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Evaluation of endogenous control genes for gene expression studies across multiple tissues and in the specific sets of fat- and muscle-type samples of the pig.

    PubMed

    Gu, Y R; Li, M Z; Zhang, K; Chen, L; Jiang, A A; Wang, J Y; Li, X W

    2011-08-01

    To normalize a set of quantitative real-time PCR (q-PCR) data, it is essential to determine an optimal number/set of housekeeping genes, as the abundance of housekeeping genes can vary across tissues or cells during different developmental stages, or even under certain environmental conditions. In this study, of the 20 commonly used endogenous control genes, 13, 18 and 17 genes exhibited credible stability in 56 different tissues, 10 types of adipose tissue and five types of muscle tissue, respectively. Our analysis clearly showed that three optimal housekeeping genes are adequate for an accurate normalization, which correlated well with the theoretical optimal number (r ≥ 0.94). In terms of economical and experimental feasibility, we recommend the use of the three most stable housekeeping genes for calculating the normalization factor. Based on our results, the three most stable housekeeping genes in all analysed samples (TOP2B, HSPCB and YWHAZ) are recommended for accurate normalization of q-PCR data. We also suggest that two different sets of housekeeping genes are appropriate for 10 types of adipose tissue (the HSPCB, ALDOA and GAPDH genes) and five types of muscle tissue (the TOP2B, HSPCB and YWHAZ genes), respectively. Our report will serve as a valuable reference for other studies aimed at measuring tissue-specific mRNA abundance in porcine samples. © 2011 Blackwell Verlag GmbH.
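
    The normalization step recommended above can be sketched as follows. The abstract does not state how the normalization factor is computed from the three housekeeping genes; the geometric mean used here is an assumption, following common practice in multi-reference q-PCR normalization. The gene names come from the abstract, while the relative quantities are invented.

```python
from math import prod


def normalization_factor(reference_quantities):
    """Geometric mean of the relative quantities of the reference genes."""
    n = len(reference_quantities)
    return prod(reference_quantities) ** (1.0 / n)


def normalize(target_quantity, reference_quantities):
    """Divide a target gene's relative quantity by the normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


# Invented relative quantities for one sample.
references = {"TOP2B": 1.0, "HSPCB": 2.0, "YWHAZ": 4.0}
factor = normalization_factor(list(references.values()))
normalized_target = normalize(6.0, list(references.values()))
```

    A geometric mean is less sensitive than an arithmetic mean to a single reference gene with an outlying abundance, which is one reason it is a common choice here.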

  9. Climacteric ripening of apple fruit is regulated by transcriptional circuits stimulated by cross-talks between ethylene and auxin.

    PubMed

    Busatto, Nicola; Tadiello, Alice; Trainotti, Livio; Costa, Fabrizio

    2017-01-02

    Apple is a fleshy fruit distinguished by a climacteric type of ripening, since most of the relevant physiological changes are triggered and governed by the action of ethylene. After its production, this hormone is perceived by a series of receptors to regulate, through a signaling cascade, downstream ethylene related genes. The possibility to control the effect of ethylene opened new horizons to the improvement of the postharvest fruit quality. To this end, 1-methylcyclopropene (1-MCP), an ethylene antagonist, is routinely used to modulate the ripening progression increasing storage life. In a recent work published in The Plant Journal, the whole transcriptome variation throughout fruit development and ripening, with the adjunct comparison between normal and impaired postharvest ripening, has been illustrated. In particular, besides the expected downregulation of ethylene-regulated genes, we shed light on a regulatory circuit leading to de-repressing the expression of a specific set of genes following 1-MCP treatment, such as AUX/IAA, NAC and MADS. These findings suggested the existence of a possible ethylene/auxin cross-talk in apple, regulated by a transcriptional circuit stimulated by the interference at the ethylene receptor level.

  10. Dissecting the genetic architecture of F1 hybrid sterility in house mice.

    PubMed

    Dzur-Gejdosova, Maria; Simecek, Petr; Gregorova, Sona; Bhattacharyya, Tanmoy; Forejt, Jiri

    2012-11-01

    Hybrid sterility as a postzygotic reproductive isolation mechanism has been studied for over 80 years, yet the first identifications of hybrid sterility genes in Drosophila and mouse are quite recent. To study the genetic architecture of F1 hybrid sterility between young subspecies of house mouse Mus m. domesticus and M. m. musculus, we conducted QTL analysis of a backcross between inbred strains representing these two subspecies and probed the role of individual chromosomes in hybrid sterility using the intersubspecific chromosome substitution strains. We provide direct evidence that the asymmetry in male infertility between reciprocal crosses is conferred by the middle region of M. m. musculus Chr X, thus excluding other potential candidates such as Y, imprinted genes, and mitochondrial DNA. QTL analysis identified strong hybrid sterility loci on Chr 17 and Chr X and predicted a set of interchangeable autosomal loci, a subset of which is sufficient to activate the Dobzhansky-Muller incompatibility of the strong loci. Overall, our results indicate the oligogenic nature of F1 hybrid sterility, which should be amenable to reconstruction by proper combination of chromosome substitution strains. Such a prefabricated model system should help to uncover the gene networks and molecular mechanisms underlying hybrid sterility. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.

  11. ADGO: analysis of differentially expressed gene sets using composite GO annotation.

    PubMed

    Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun

    2006-09-15

    Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. chu@kribb.re.kr http://array.kobic.re.kr/ADGO.
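
    The composite-annotation idea, intersecting gene sets from two GO categories and scoring both the single and the intersected sets, can be sketched as follows. The scoring statistic (mean of standardized gene scores times the square root of the set size) is an assumption made for this example, and the gene identifiers and scores are invented; the term names F and C mirror the abstract's own example.

```python
from math import sqrt
from statistics import mean


def z_score(gene_set, gene_scores):
    """Mean of standardized per-gene scores times sqrt(set size)."""
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members:
        return 0.0
    return mean(members) * sqrt(len(members))


def composite_sets(category_a, category_b, min_size=2):
    """Intersect every term of one GO category with every term of another."""
    composites = {}
    for name_a, genes_a in category_a.items():
        for name_b, genes_b in category_b.items():
            shared = set(genes_a) & set(genes_b)
            if len(shared) >= min_size:
                composites[(name_a, name_b)] = shared
    return composites


# Invented standardized expression-change scores.
gene_scores = {"g1": 2.0, "g2": 2.0, "g3": -2.0, "g4": -2.0, "g5": -2.0, "g6": -2.0}
molecular_function = {"F": {"g1", "g2", "g3", "g4"}}
cellular_component = {"C": {"g1", "g2", "g5", "g6"}}

composites = composite_sets(molecular_function, cellular_component)
score_f = z_score(molecular_function["F"], gene_scores)
score_c = z_score(cellular_component["C"], gene_scores)
score_fc = z_score(composites[("F", "C")], gene_scores)
```

    In this toy case neither single term shows a net change, while their intersection does, which is the situation the abstract describes.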

  12. Development of duplex PCR for simultaneous detection of Theileria spp. and Anaplasma spp. in sheep and goats.

    PubMed

    Cui, Yanyan; Zhang, Yan; Jian, Fuchun; Zhang, Longxian; Wang, Rongjun; Cao, Shuxuan; Wang, Xiaoxing; Yan, Yaqun; Ning, Changshen

    2017-05-01

    Theileria spp. and Anaplasma spp., which are important tick-borne pathogens (TBPs), impact the health of humans and animals in tropical and subtropical areas. Theileria and Anaplasma co-infections are common in sheep and goats. Following alignment of the relevant DNA sequences, two primer sets were designed to specifically target the Theileria spp. 18S rRNA and Anaplasma spp. 16S rRNA gene sequences. Genomic DNA from the two genera was serially diluted tenfold for testing the sensitivities of detection of the primer sets. The specificities of the primer sets were confirmed when DNA from Anaplasma and Theileria (positive controls), other related hematoparasites (negative controls) and ddH₂O were used as templates. Fifty field samples were also used to evaluate the utility of single PCR and duplex PCR assays, and the detection results were compared with those of the PCR methods previously published. An optimized duplex PCR assay was established from the two primer sets based on the relevant genes from the two TBPs, and this assay generated products of 298-bp (Theileria spp.) and 139-bp (Anaplasma spp.). The detection limit of the assay was 29.4 × 10⁻³ ng per μl, and there was no cross-reaction with the DNA from other hematoparasites. The results showed that the newly developed duplex PCR assay had an efficiency of detection (P > 0.05) similar to other published PCR methods. In this study, a duplex PCR assay was developed that can simultaneously identify Theileria spp. and Anaplasma spp. in sheep and goats. This duplex PCR is a potentially valuable assay for epidemiological studies of TBPs in that it can detect cases of mixed infections of the pathogens.

  13. A cross-species bi-clustering approach to identifying conserved co-regulated genes.

    PubMed

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-06-15

    A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective in dealing with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages in which the clustered genes are co-regulated. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/.
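
    As a rough illustration of a joint sparse rank-one factorization, the Python sketch below alternates between unit-norm stage vectors (one per species) and a shared, soft-thresholded gene vector. This is a simplified alternating scheme written for this listing, not the algorithm in the authors' package; the function names, the penalty value, and the toy matrices are all assumptions.

```python
import math

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def joint_sparse_rank_one(matrices, lam, n_iter=50):
    """Approximate each matrix X_s (genes x stages) by u * v_s^T with a shared sparse u."""
    n_genes = len(matrices[0])
    u = [1.0] * n_genes
    stage_vectors = []
    for _ in range(n_iter):
        # Update each species' stage vector: v_s = X_s^T u, scaled to unit length.
        stage_vectors = []
        for X in matrices:
            v = [sum(X[i][j] * u[i] for i in range(n_genes)) for j in range(len(X[0]))]
            norm = math.sqrt(sum(x * x for x in v))
            stage_vectors.append([x / norm for x in v] if norm > 0 else v)
        # Update the shared gene vector with an L1 (soft-threshold) penalty.
        u = [
            soft_threshold(
                sum(sum(X[i][j] * v[j] for j in range(len(v)))
                    for X, v in zip(matrices, stage_vectors)),
                lam,
            ) / len(matrices)
            for i in range(n_genes)
        ]
    return u, stage_vectors

# Toy data: genes 0 and 1 are co-regulated in both species, genes 2 and 3 are noise.
human = [[2.0, 2.0, 0.0], [2.0, 2.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0]]
mouse = [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.1]]

u, (v_human, v_mouse) = joint_sparse_rank_one([human, mouse], lam=0.5)
cluster = [i for i, weight in enumerate(u) if weight != 0.0]
print(cluster, v_human, v_mouse)
```

    The nonzero entries of u mark the shared gene cluster, while each v_s shows in which stages of that species the cluster is active (stages 0 and 1 in the toy human matrix, stages 1 and 2 in the toy mouse matrix).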

  14. Genetic connectivity for two bear species at wildlife crossing structures in Banff National Park.

    PubMed

    Sawaya, Michael A; Kalinowski, Steven T; Clevenger, Anthony P

    2014-04-07

    Roads can fragment and isolate wildlife populations, which will eventually decrease genetic diversity within populations. Wildlife crossing structures may counteract these impacts, but most crossings are relatively new, and there is little evidence that they facilitate gene flow. We conducted a three-year research project in Banff National Park, Alberta, to evaluate the effectiveness of wildlife crossings to provide genetic connectivity. Our main objective was to determine how the Trans-Canada Highway and crossing structures along it affect gene flow in grizzly (Ursus arctos) and black bears (Ursus americanus). We compared genetic data generated from wildlife crossings with data collected from greater bear populations. We detected a genetic discontinuity at the highway in grizzly bears but not in black bears. We assigned grizzly bears that used crossings to populations north and south of the highway, providing evidence of bidirectional gene flow and genetic admixture. Parentage tests showed that 47% of black bears and 27% of grizzly bears that used crossings successfully bred, including multiple males and females of both species. Differentiating between dispersal and gene flow is difficult, but we documented gene flow by showing migration, reproduction and genetic admixture. We conclude that wildlife crossings allow sufficient gene flow to prevent genetic isolation.

  15. Genetic connectivity for two bear species at wildlife crossing structures in Banff National Park

    PubMed Central

    Sawaya, Michael A.; Kalinowski, Steven T.; Clevenger, Anthony P.

    2014-01-01

    Roads can fragment and isolate wildlife populations, which will eventually decrease genetic diversity within populations. Wildlife crossing structures may counteract these impacts, but most crossings are relatively new, and there is little evidence that they facilitate gene flow. We conducted a three-year research project in Banff National Park, Alberta, to evaluate the effectiveness of wildlife crossings to provide genetic connectivity. Our main objective was to determine how the Trans-Canada Highway and crossing structures along it affect gene flow in grizzly (Ursus arctos) and black bears (Ursus americanus). We compared genetic data generated from wildlife crossings with data collected from greater bear populations. We detected a genetic discontinuity at the highway in grizzly bears but not in black bears. We assigned grizzly bears that used crossings to populations north and south of the highway, providing evidence of bidirectional gene flow and genetic admixture. Parentage tests showed that 47% of black bears and 27% of grizzly bears that used crossings successfully bred, including multiple males and females of both species. Differentiating between dispersal and gene flow is difficult, but we documented gene flow by showing migration, reproduction and genetic admixture. We conclude that wildlife crossings allow sufficient gene flow to prevent genetic isolation. PMID:24552834

  16. Heterotic patterns in rapeseed (Brassica napus L.): I. Crosses between spring and Chinese semi-winter lines.

    PubMed

    Qian, W; Sass, O; Meng, J; Li, M; Frauen, M; Jung, C

    2007-06-01

    Chinese semi-winter rapeseed is genetically diverse from Canadian and European spring rapeseed. This study was conducted to evaluate the potential of semi-winter rapeseed for spring rapeseed hybrid breeding, to assess the genetic effects involved, and to estimate the correlation of parental genetic distance (GD) with hybrid performance, heterosis, general combining ability (GCA) and specific combining ability (SCA) in crosses between spring and semi-winter rapeseed lines. Four spring male sterile lines from Germany and Canada as testers were crossed with 13 Chinese semi-winter rapeseed lines to develop 52 hybrids, which were evaluated together with their parents and commercial hybrids for seed yield and oil content in three sets of field trials with 8 environments in Canada and Europe. The Chinese parental lines were not adapted to local environmental conditions as demonstrated by poor seed yields per se. However, the hybrids between the Chinese parents and the adapted spring rapeseed lines exhibited high heterosis for seed yield. The average mid-parent heterosis was 15% and ca. 50% of the hybrids were superior to the respective hybrid control across three sets of field trials. Additive gene effects mainly contributed to hybrid performance since the mean squares of GCA were higher as compared to SCA. The correlation between parental GD and hybrid performance and heterosis was found to be low whereas the correlation between GCA(f + m) and hybrid performance was high and significant in each set of field trials, with an average of r = 0.87 for seed yield and r = 0.89 for oil content, indicating that hybrid performance can be predicted by GCA(f + m). These results demonstrate that Chinese semi-winter rapeseed germplasm has a great potential to increase seed yield in spring rapeseed hybrid breeding programs in Canada and Europe.
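
    The quantities named in this abstract follow standard definitions, sketched below in Python: mid-parent heterosis as the percentage gain of a hybrid over the mean of its parents, and GCA(f + m) as the sum of the female and male general combining ability effects. The yield values and the parent labels (T1, T2, L1, L2) are hypothetical, and a balanced crossing design is assumed.

```python
def mid_parent_heterosis(hybrid, parent_a, parent_b):
    """Percent superiority of the hybrid over the mean of its two parents."""
    mid_parent = (parent_a + parent_b) / 2.0
    return 100.0 * (hybrid - mid_parent) / mid_parent

def gca_effects(hybrid_values):
    """GCA of each parent: mean of its hybrids minus the grand mean (balanced design)."""
    grand_mean = sum(hybrid_values.values()) / len(hybrid_values)
    by_parent = {}
    for (female, male), value in hybrid_values.items():
        by_parent.setdefault(female, []).append(value)
        by_parent.setdefault(male, []).append(value)
    return {p: sum(v) / len(v) - grand_mean for p, v in by_parent.items()}

# Hypothetical seed yields (t/ha): two testers (females) crossed with two lines (males).
hybrids = {("T1", "L1"): 3.0, ("T1", "L2"): 2.0,
           ("T2", "L1"): 4.0, ("T2", "L2"): 3.0}

mph = mid_parent_heterosis(hybrid=2.875, parent_a=2.0, parent_b=3.0)
gca = gca_effects(hybrids)
gca_sum_t2_l1 = gca["T2"] + gca["L1"]  # GCA(f + m) for the cross T2 x L1
print(mph, gca, gca_sum_t2_l1)
```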

  17. Differential expression of human lysyl hydroxylase genes, lysine hydroxylation, and cross-linking of type I collagen during osteoblastic differentiation in vitro

    NASA Technical Reports Server (NTRS)

    Uzawa, K.; Grzesik, W. J.; Nishiura, T.; Kuznetsov, S. A.; Robey, P. G.; Brenner, D. A.; Yamauchi, M.

    1999-01-01

    The pattern of lysyl hydroxylation in the nontriple helical domains of collagen is critical in determining the cross-linking pathways that are tissue specific. We hypothesized that the tissue specificity of type I collagen cross-linking is, in part, due to the differential expression of lysyl hydroxylase genes (Procollagen-lysine,2-oxoglutarate,5-dioxygenase 1, 2, and 3 [PLOD1, PLOD2, and PLOD3]). In this study, we have examined the expression patterns of these three genes during the course of in vitro differentiation of human osteoprogenitor cells (bone marrow stromal cells [BMSCs]) and normal skin fibroblasts (NSFs). In addition, using the medium and cell layer/matrix fractions in these cultures, lysine hydroxylation of type I collagen alpha chains and collagen cross-linking chemistries have been characterized. High levels of PLOD1 and PLOD3 genes were expressed in both BMSCs and NSFs, and the expression levels did not change in the course of differentiation. In contrast to the PLOD1 and PLOD3 genes, both cell types showed low PLOD2 gene expression in undifferentiated and early differentiated conditions. However, fully differentiated BMSCs, but not NSFs, exhibited a significantly elevated level (6-fold increase) of PLOD2 mRNA. This increase coincided with the onset of matrix mineralization and with the increase in lysyl hydroxylation in the nontriple helical domains of alpha chains of type I collagen molecule. Furthermore, the collagen cross-links that are derived from the nontriple helical hydroxylysine-aldehyde were found only in fully differentiated BMSC cultures. The data suggest that PLOD2 expression is associated with lysine hydroxylation in the nontriple helical domains of collagen and, thus, could be partially responsible for the tissue-specific collagen cross-linking pattern.

  18. Cross-organism learning method to discover new gene functionalities.

    PubMed

    Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro

    2016-04-01

    Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted.

  19. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.

    PubMed

    Lee, Hyeonjeong; Shin, Miyoung

    2017-01-01

    The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Although many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) in case and control samples, respectively, which can be observed from gene expression and/or methylation data. The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
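
    The pathway-set step can be pictured with a small Python sketch that counts which pathways are active together in at least a minimum fraction of samples (the frequent-itemset step of association rule mining), then keeps the sets found in cases but not controls. It uses brute-force enumeration rather than any specific mining algorithm; the pathway names, sample data, and support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pathway_sets(samples, min_support, max_size=3):
    """Return {pathway-set: support} for sets co-active in >= min_support of samples."""
    counts = Counter()
    for active in samples:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(active), size):
                counts[frozenset(combo)] += 1
    n = len(samples)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Each sample is the set of pathways called active (e.g. by enrichment analysis).
case_samples = [
    {"cell cycle", "p53 signaling", "apoptosis"},
    {"cell cycle", "p53 signaling"},
    {"cell cycle", "p53 signaling", "ECM receptor"},
    {"apoptosis", "ECM receptor"},
]
control_samples = [
    {"cell cycle"},
    {"apoptosis", "ECM receptor"},
    {"apoptosis", "ECM receptor", "p53 signaling"},
    {"ECM receptor"},
]

case_sets = frequent_pathway_sets(case_samples, min_support=0.5)
control_sets = frequent_pathway_sets(control_samples, min_support=0.5)
distinctive = set(case_sets) - set(control_sets)
print(distinctive)
```

    Each distinctive pathway-set would then contribute edges to a class-specific network such as the PAN described in the abstract.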

  20. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density.

    PubMed

    Wang, W; Huang, S; Hou, W; Liu, Y; Fan, Q; He, A; Wen, Y; Hao, J; Guo, X; Zhang, F

    2017-10-01

    Several genome-wide association studies (GWAS) of bone mineral density (BMD) have successfully identified multiple susceptibility genes, yet isolated susceptibility genes are often difficult to interpret biologically. The aim of this study was to unravel the genetic background of BMD at pathway level, by integrating BMD GWAS data with genome-wide expression quantitative trait loci (eQTLs) and methylation quantitative trait loci (meQTLs) data. Method: We employed the GWAS datasets of BMD from the Genetic Factors for Osteoporosis Consortium (GEFOS), analysing patients' BMD. The areas studied included 32 735 femoral necks, 28 498 lumbar spines, and 8143 forearms. Genome-wide eQTLs (containing 923 021 eQTLs) and meQTLs (containing 683 152 unique methylation sites with local meQTLs) data sets were collected from recently published studies. Gene scores were first calculated by summary data-based Mendelian randomisation (SMR) software and meQTL-aligned GWAS results. Gene set enrichment analysis (GSEA) was then applied to identify BMD-associated gene sets with a predefined significance level of 0.05. We identified multiple gene sets associated with BMD in one or more regions, including relevant known biological gene sets such as the Reactome Circadian Clock (GSEA p-value = 1.0 × 10⁻⁴ for LS and 2.7 × 10⁻² for femoral necks BMD in eQTLs-based GSEA) and insulin-like growth factor receptor binding (GSEA p-value = 5.0 × 10⁻⁴ for femoral necks and 2.6 × 10⁻² for lumbar spines BMD in meQTLs-based GSEA). Our results provided novel clues for subsequent functional analysis of bone metabolism, and illustrated the benefit of integrating eQTLs and meQTLs data into pathway association analysis for genetic studies of complex human diseases. Cite this article: W. Wang, S. Huang, W. Hou, Y. Liu, Q. Fan, A. He, Y. Wen, J. Hao, X. Guo, F. Zhang. Integrative analysis of GWAS, eQTLs and meQTLs data suggests that multiple gene sets are associated with bone mineral density. Bone Joint Res 2017;6:572-576.
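
    For readers unfamiliar with GSEA, the Python sketch below computes the classic unweighted running-sum enrichment score: walking down a ranked gene list, the sum rises at each gene in the set and falls otherwise, and the score is the largest deviation from zero. How the study ranked genes and assessed significance is not shown here; the gene list and the gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) GSEA enrichment score."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for set members, down for non-members.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical genes ordered from strongest to weakest association score.
ranked = ["CLOCK", "PER1", "G3", "G4", "G5", "G6"]

es_top = enrichment_score(ranked, {"CLOCK", "PER1"})   # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G5", "G6"})     # set concentrated at the bottom
print(es_top, es_bottom)
```

    In practice a p-value is obtained by comparing the observed score with scores from permuted data, which this sketch omits.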

  1. Microgravity and Immunity: Changes in Lymphocyte Gene Expression

    NASA Technical Reports Server (NTRS)

    Risin, D.; Pellis, N. R.; Ward, N. E.; Risin, S. A.

    2006-01-01

    Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, affects signal transduction mechanisms, as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1g gravity. In the second set, activated T-cells cultured for 24 hours in 1g and MMG were exposed, three hours before harvesting, to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. In the first set of experiments MMG exposure resulted in altered expression of 89 genes; 10 were up-regulated and 79 down-regulated. In the second set, changes in expression were revealed in 85 genes; 20 were up-regulated and 65 down-regulated. The analysis revealed that significant numbers of MGS genes are associated with signal transduction and apoptotic pathways. Interestingly, the majority of genes that responded by up- or down-regulation in the alternative sets of experiments were not the same, possibly reflecting different functional states of the examined T-lymphocyte populations. The responder genes (MGSG) might play an essential role in adaptation to MG and/or be responsible for pathologic changes encountered in Space and thus represent potential targets for molecular-based countermeasures.

  2. Bioinformatics approaches for cross-species liver cancer analysis based on microarray gene expression profiling

    PubMed Central

    Fang, H; Tong, W; Perkins, R; Shi, L; Hong, H; Cao, X; Xie, Q; Yim, SH; Ward, JM; Pitot, HC; Dragan, YP

    2005-01-01

    Background: The completion of the sequencing of human, mouse and rat genomes and knowledge of cross-species gene homologies enable studies of differential gene expression in animal models. These types of studies have the potential to greatly enhance our understanding of diseases such as liver cancer in humans. Genes co-expressed across multiple species are most likely to have conserved functions. We have used various bioinformatics approaches to examine microarray expression profiles from liver neoplasms that arise in albumin-SV40 transgenic rats to elucidate genes, chromosome aberrations and pathways that might be associated with human liver cancer. Results: In this study, we first identified 2223 differentially expressed genes by comparing gene expression profiles for two control, two adenoma and two carcinoma samples using an F-test. These genes were subsequently mapped to the rat chromosomes using a novel visualization tool, the Chromosome Plot. Using the same plot, we further mapped the significant genes to orthologous chromosomal locations in human and mouse. Many genes expressed in rat 1q that are amplified in rat liver cancer map to the human chromosomes 10, 11 and 19 and to the mouse chromosomes 7, 17 and 19, which have been implicated in studies of human and mouse liver cancer. Using Comparative Genomics Microarray Analysis (CGMA), we identified regions of potential aberrations in human. Lastly, a pathway analysis was conducted to predict altered human pathways based on statistical analysis and extrapolation from the rat data. All of the identified pathways have been known to be important in the etiology of human liver cancer, including cell cycle control, cell growth and differentiation, apoptosis, transcriptional regulation, and protein metabolism. Conclusion: The study demonstrates that the hepatic gene expression profiles from the albumin-SV40 transgenic rat model revealed genes, pathways and chromosome alterations consistent with experimental and clinical research in human liver cancer. The bioinformatics tools presented in this paper are essential for cross species extrapolation and mapping of microarray data, its analysis and interpretation. PMID:16026603
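
    The per-gene F-test mentioned in the Results can be sketched in Python as a one-way ANOVA F statistic across the three sample groups (control, adenoma, carcinoma, two samples each). The expression values are hypothetical, and the study's exact test settings and significance cut-off are not given in the abstract, so this is only an illustration of the statistic.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical log-expression of one gene in two control, two adenoma
# and two carcinoma samples.
control, adenoma, carcinoma = [1.0, 2.0], [3.0, 4.0], [8.0, 9.0]
f_stat = one_way_f([control, adenoma, carcinoma])
print(f_stat)
```

    With 2 and 3 degrees of freedom, a gene would be called differentially expressed when its F statistic exceeds the critical value for the chosen significance level.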

  3. The Genome-Wide Interaction Network of Nutrient Stress Genes in Escherichia coli.

    PubMed

    Côté, Jean-Philippe; French, Shawn; Gehrke, Sebastian S; MacNair, Craig R; Mangat, Chand S; Bharat, Amrita; Brown, Eric D

    2016-11-22

    Conventional efforts to describe essential genes in bacteria have typically emphasized nutrient-rich growth conditions. Of note, however, are the set of genes that become essential when bacteria are grown under nutrient stress. For example, more than 100 genes become indispensable when the model bacterium Escherichia coli is grown on nutrient-limited media, and many of these nutrient stress genes have also been shown to be important for the growth of various bacterial pathogens in vivo. To better understand the genetic network that underpins nutrient stress in E. coli, we performed a genome-scale cross of strains harboring deletions in some 82 nutrient stress genes with the entire E. coli gene deletion collection (Keio) to create 315,400 double deletion mutants. An analysis of the growth of the resulting strains on rich microbiological media revealed an average of 23 synthetic sick or lethal genetic interactions for each nutrient stress gene, suggesting that the network defining nutrient stress is surprisingly complex. A vast majority of these interactions involved genes of unknown function or genes of unrelated pathways. The most profound synthetic lethal interactions were between nutrient acquisition and biosynthesis. Further, the interaction map reveals remarkable metabolic robustness in E. coli through pathway redundancies. In all, the genetic interaction network provides a powerful tool to mine and identify missing links in nutrient synthesis and to further characterize genes of unknown function in E. coli. Moreover, understanding of bacterial growth under nutrient stress could aid in the development of novel antibiotic discovery platforms. With the rise of antibiotic drug resistance, there is an urgent need for new antibacterial drugs. Here, we studied a group of genes that are essential for the growth of Escherichia coli under nutrient limitation, culture conditions that arguably better represent nutrient availability during an infection than rich microbiological media. Indeed, many such nutrient stress genes are essential for infection in a variety of pathogens. Thus, the respective proteins represent a pool of potential new targets for antibacterial drugs that have been largely unexplored. We have created all possible double deletion mutants through a genetic cross of nutrient stress genes and the E. coli deletion collection. An analysis of the growth of the resulting clones on rich media revealed a robust, dense, and complex network for nutrient acquisition and biosynthesis. Importantly, our data reveal new genetic connections to guide innovative approaches for the development of new antibacterial compounds targeting bacteria under nutrient stress.

  4. Parallel evolution of chordate cis-regulatory code for development.

    PubMed

    Doglio, Laura; Goode, Debbie K; Pelleri, Maria C; Pauls, Stefan; Frabetti, Flavia; Shimeld, Sebastian M; Vavouri, Tanya; Elgar, Greg

    2013-11-01

    Urochordates are the closest relatives of vertebrates and at the larval stage, possess a characteristic bilateral chordate body plan. In vertebrates, the genes that orchestrate embryonic patterning are in part regulated by highly conserved non-coding elements (CNEs), yet these elements have not been identified in urochordate genomes. Consequently the evolution of the cis-regulatory code for urochordate development remains largely uncharacterised. Here, we use genome-wide comparisons between C. intestinalis and C. savignyi to identify putative urochordate cis-regulatory sequences. Ciona conserved non-coding elements (ciCNEs) are associated with largely the same key regulatory genes as vertebrate CNEs. Furthermore, some of the tested ciCNEs are able to activate reporter gene expression in both zebrafish and Ciona embryos, in a pattern that at least partially overlaps that of the gene they associate with, despite the absence of sequence identity. We also show that the ability of a ciCNE to up-regulate gene expression in vertebrate embryos can in some cases be localised to short sub-sequences, suggesting that functional cross-talk may be defined by small regions of ancestral regulatory logic, although functional sub-sequences may also be dispersed across the whole element. We conclude that the structure and organisation of cis-regulatory modules is very different between vertebrates and urochordates, reflecting their separate evolutionary histories. However, functional cross-talk still exists because the same repertoire of transcription factors has likely guided their parallel evolution, exploiting similar sets of binding sites but in different combinations.

  5. Identification of VEGF-regulated genes associated with increased lung metastatic potential: functional involvement of tenascin-C in tumor growth and lung metastasis

    PubMed Central

    Calvo, A; Catena, R; Noble, MS; Carbott, D; Gil-Bazo, I; Gonzalez-Moreno, O; Huh, J-I; Sharp, R; Qiu, T-H; Anver, MR; Merlino, G; Dickson, RB; Johnson, MD; Green, JE

    2009-01-01

    Metastasis is the primary cause of death in patients with breast cancer. Overexpression of c-myc in humans correlates with metastases, but transgenic mice only show low rates of micrometastases. We have generated transgenic mice that overexpress both c-myc and vascular endothelial growth factor (VEGF) (Myc/VEGF) in the mammary gland, which develop high rates of pulmonary macrometastases. Gene expression profiling revealed a set of deregulated genes in Myc/VEGF tumors compared to Myc tumors associated with the increased metastatic phenotype. Cross-comparisons between this set of genes with a human breast cancer lung metastasis gene signature identified five common targets: tenascin-C (TNC), matrix metalloprotease-2, collagen-6-A1, mannosidase-α-1A and HLA-DPA1. Signaling blockade or knockdown of TNC in MDA-MB-435 cells resulted in a significant impairment of cell migration and anchorage-independent cell proliferation. Mice injected with clonal MDA-MB-435 cells with reduced expression of TNC demonstrated a significant decrease (P < 0.05) in (1) primary tumor growth; (2) tumor relapse after surgical removal of the primary tumor and (3) incidence of lung metastasis. Our results demonstrate that VEGF induces complex alterations in tissue architecture and gene expression. The TNC signaling pathway plays an important role in mammary tumor growth and metastases, suggesting that TNC may be a relevant target for therapy against metastatic breast cancer. PMID:18504437

  6. Mammary epithelial-specific disruption of focal adhesion kinase retards tumor formation and metastasis in a transgenic mouse model of human breast cancer.

    PubMed

    Provenzano, Paolo P; Inman, David R; Eliceiri, Kevin W; Beggs, Hilary E; Keely, Patricia J

    2008-11-01

    Focal adhesion kinase (FAK) is a central regulator of the focal adhesion, influencing cell proliferation, survival, and migration. Despite evidence demonstrating FAK overexpression in human cancer, its role in tumor initiation and progression is not well understood. Using Cre/LoxP technology to specifically knockout FAK in the mammary epithelium, we showed that FAK is not required for tumor initiation but is required for tumor progression. The mechanistic underpinnings of these results suggested that FAK regulates clinically relevant gene signatures and multiple signaling complexes associated with tumor progression and metastasis, such as Src, ERK, and p130Cas. Furthermore, a systems-level analysis identified FAK as a major regulator of the tumor transcriptome, influencing genes associated with adhesion and growth factor signaling pathways, and their cross talk. Additionally, FAK was shown to down-regulate the expression of clinically relevant proliferation- and metastasis-associated gene signatures, as well as an enriched group of genes associated with the G2 and G2/M phases of the cell cycle. Computational analysis of transcription factor-binding sites within ontology-enriched or clustered gene sets suggested that the differentially expressed proliferation- and metastasis-associated genes in FAK-null cells were regulated through a common set of transcription factors, including p53. Therefore, FAK acts as a primary node in the activated signaling network in transformed motile cells and is a prime candidate for novel therapeutic interventions to treat aggressive human breast cancers.

  7. Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure.

    PubMed

    Lustgarten, Jonathan Lyle; Balasubramanian, Jeya Balaji; Visweswaran, Shyam; Gopalakrishnan, Vanathi

    2017-03-01

    The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, is combinatorial in the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data.

  8. Ocimum basilicum miRNOME revisited: A cross kingdom approach.

    PubMed

    Patel, Maulikkumar; Patel, Shanaya; Mangukia, Naman; Patel, Saumya; Mankad, Archana; Pandya, Himanshu; Rawal, Rakesh

    2018-05-15

    O. basilicum is a medicinally important herb with a significant role in human health. However, its mechanism of action is largely unknown. The present study aims to understand the regulation of key human target genes that could plausibly be modulated by O. basilicum miRNAs in a cross-kingdom manner, using computational and systems biology approaches. O. basilicum miRNA sequences were retrieved, and their corresponding human target genes were identified using psRNATarget and interaction analysis of hub nodes. Six O. basilicum-derived miRNAs were found to modulate 26 human target genes, which were associated with the PI3K-AKT and MAPK signaling pathways, with PTPN11, EIF2S2, NOS1, IRS1 and USO1 as the top 5 hub nodes. O. basilicum miRNAs not only regulate key human target genes of significance in various diseases but also pave the way for future studies that might explore the potential of miRNA-mediated cross-kingdom regulation in the prevention and treatment of various human diseases, including cancer. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Quantitative genetic analysis of agronomic and morphological traits in sorghum, Sorghum bicolor

    PubMed Central

    Mohammed, Riyazaddin; Are, Ashok K.; Bhavanasi, Ramaiah; Munghate, Rajendra S.; Kavi Kishor, Polavarapu B.; Sharma, Hari C.

    2015-01-01

    The productivity in sorghum is low, owing to various biotic and abiotic constraints. Combining insect resistance with desirable agronomic and morphological traits is important to increase sorghum productivity. Therefore, it is important to understand the variability for various agronomic traits, their heritabilities and nature of gene action to develop appropriate strategies for crop improvement. A full diallel set of 10 parents and their 90 crosses, including reciprocals, was evaluated in replicated trials during the 2013–14 rainy and postrainy seasons. Crosses between early- and late-flowering parents flowered early, indicating dominance of earliness for anthesis in the test material used. Association between the shoot fly resistance, morphological, and agronomic traits suggested complex interactions between shoot fly resistance and morphological traits. Significance of the mean sum of squares for GCA (general combining ability) and SCA (specific combining ability) of all the studied traits suggested the importance of both additive and non-additive components in inheritance of these traits. The GCA/SCA and predictability ratios indicated predominance of additive gene effects for the majority of the traits studied. High broad-sense and narrow-sense heritability estimates were observed for most of the morphological and agronomic traits. The significance of reciprocal combining ability effects for days to 50% flowering, plant height and 100 seed weight suggested maternal effects for inheritance of these traits. Plant height and grain yield across seasons, and days to 50% flowering, inflorescence exsertion, and panicle shape in the postrainy season, showed greater specific combining ability variance, indicating the predominance of non-additive gene action/epistatic interactions in controlling the expression of these traits. Additive gene action in the rainy season, and dominance in the postrainy season, for days to 50% flowering and plant height suggested G × E interactions for these traits. PMID:26579183

  10. The PhenX Toolkit: Get the Most From Your Measures

    PubMed Central

    Hamilton, Carol M.; Strader, Lisa C.; Pratt, Joseph G.; Maiese, Deborah; Hendershot, Tabitha; Kwok, Richard K.; Hammond, Jane A.; Huggins, Wayne; Jackman, Dean; Pan, Huaqin; Nettles, Destiney S.; Beaty, Terri H.; Farrer, Lindsay A.; Kraft, Peter; Marazita, Mary L.; Ordovas, Jose M.; Pato, Carlos N.; Spitz, Margaret R.; Wagener, Diane; Williams, Michelle; Junkins, Heather A.; Harlan, William R.; Ramos, Erin M.; Haines, Jonathan

    2011-01-01

    The potential for genome-wide association studies to relate phenotypes to specific genetic variation is greatly increased when data can be combined or compared across multiple studies. To facilitate replication and validation across studies, RTI International (Research Triangle Park, North Carolina) and the National Human Genome Research Institute (Bethesda, Maryland) are collaborating on the consensus measures for Phenotypes and eXposures (PhenX) project. The goal of PhenX is to identify 15 high-priority, well-established, and broadly applicable measures for each of 21 research domains. PhenX measures are selected by working groups of domain experts using a consensus process that includes input from the scientific community. The selected measures are then made freely available to the scientific community via the PhenX Toolkit. Thus, the PhenX Toolkit provides the research community with a core set of high-quality, well-established, low-burden measures intended for use in large-scale genomic studies. PhenX measures will have the most impact when included at the experimental design stage. The PhenX Toolkit also includes links to standards and resources in an effort to facilitate data harmonization to legacy data. Broad acceptance and use of PhenX measures will promote cross-study comparisons to increase statistical power for identifying and replicating variants associated with complex diseases and with gene-gene and gene-environment interactions. PMID:21749974

  11. Use of homologous and heterologous gene expression profiling tools to characterize transcription dynamics during apple fruit maturation and ripening.

    PubMed

    Costa, Fabrizio; Alba, Rob; Schouten, Henk; Soglio, Valeria; Gianfranceschi, Luca; Serra, Sara; Musacchi, Stefano; Sansavini, Silviero; Costa, Guglielmo; Fei, Zhangjun; Giovannoni, James

    2010-10-25

    Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes lead to final fruit quality, and understanding of the functional machinery underlying these processes is of both biological and practical importance. To date many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-methylcyclopropene. To gain insight into the molecular process regulating ripening in apple, and to compare to tomato (model species for ripening studies), we utilized both homologous and heterologous (tomato) microarrays to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated. The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter using data previously published and available at the TED: tomato expression database) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies. The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening. Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species.

  12. NCBI GEO: archive for functional genomics data sets--update.

    PubMed

    Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

  13. categoryCompare, an analytical tool based on feature annotations

    PubMed Central

    Flight, Robert M.; Harrison, Benjamin J.; Mohammad, Fahim; Bunge, Mary B.; Moon, Lawrence D. F.; Petruska, Jeffrey C.; Rouchka, Eric C.

    2014-01-01

    Assessment of high-throughput -omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations, including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative and meta-analysis methods are dependent on the availability of identical features across data sets, and concentrate on determining features that are differentially expressed across experiments, some of which may be considered “biomarkers.” The heterogeneity of measurement platforms and inherent variability of biological systems confound the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them. We developed a methodology, categoryCompare, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data, as well as determining similarities and differences in the set of processes in two instances: (1) denervated skin vs. denervated muscle, and (2) colon from Crohn's disease vs. colon from ulcerative colitis (UC). The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results depended as well on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressing in unenriched annotation categories, and the specific method in which samples are combined. In the skin vs. muscle denervation comparison, the tissues demonstrated markedly different responses. The Crohn's vs. UC comparison showed gross similarities in inflammatory response in the two diseases, with particular processes specific to each disease. PMID:24808906

  14. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI.

    PubMed

    Vimaleswaran, Karani S; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N; Dudbridge, Frank; Loos, Ruth J F

    2012-10-15

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10^-7. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.
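
    The hypergeometric enrichment test named in this abstract can be illustrated with a minimal sketch. The function name, the one-sided upper-tail formulation, and all counts other than the 547 candidate genes are assumptions made for illustration; the abstract does not state the authors' exact inputs or implementation.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_candidates, n_top, n_overlap):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `n_overlap` candidate genes among
    `n_top` genes drawn without replacement from `n_total` genes, of
    which `n_candidates` are candidates.
    """
    denom = comb(n_total, n_top)
    upper = min(n_candidates, n_top)
    tail = sum(
        comb(n_candidates, k) * comb(n_total - n_candidates, n_top - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


# Hypothetical example: 20,000 genes in total, 547 candidates (from the
# abstract), 1,000 genes in the top 5% P-value quantile, and an assumed
# overlap of 35 candidates among those 1,000.
p_value = hypergeom_enrichment_p(20_000, 547, 1_000, 35)
print(f"enrichment p-value: {p_value:.4f}")
```

    Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies; for large-scale use a statistics library would normally be preferred.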

  15. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  16. Reverse Engineering of Modified Genes by Bayesian Network Analysis Defines Molecular Determinants Critical to the Development of Glioblastoma

    PubMed Central

    Kunkle, Brian W.; Yoo, Changwon; Roy, Deodutta

    2013-01-01

    In this study we have identified key genes that are critical in development of astrocytic tumors. Meta-analysis of microarray studies which compared normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytomas. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I–IV), and ‘key genes’ within each grade were identified. Genes found to be most influential to development of the highest grade of astrocytoma, glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down-regulated). These 10 genes were able to predict tumor status with 96–100% confidence when using logistic regression, cross validation, and support vector machine analysis. Markov genes interact with NF-κB, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes, EGFR, COL4A1, and CDK4, in particular seemed to be potential ‘hubs of activity’. Modified expression of these 10 Markov Blanket genes increases lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates were dramatically increased with joint effects of 4 or more than 4 Markov Blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov Blanket genes produced 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, increase in lifetime risk of developing glioblastoma compared to normal population. In summary, it appears that modified expression of several ‘key genes’ may be required for the development of glioblastoma. Further studies are needed to validate these ‘key genes’ as useful tools for early detection and novel therapeutic options for these tumors. PMID:23737970

  17. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.

  18. Diagnostic testing for pandemic influenza in Singapore: a novel dual-gene quantitative real-time RT-PCR for the detection of influenza A/H1N1/2009.

    PubMed

    Lee, Hong Kai; Lee, Chun Kiat; Loh, Tze Ping; Tang, Julian Wei-Tze; Chiu, Lily; Tambyah, Paul A; Sethi, Sunil K; Koay, Evelyn Siew-Chuan

    2010-09-01

    With the relative global lack of immunity to the pandemic influenza A/H1N1/2009 virus that emerged in April 2009 as well as the sustained susceptibility to infection, rapid and accurate diagnostic assays are essential to detect this novel influenza A variant. Among the molecular diagnostic methods that have been developed to date, most are in tandem monoplex assays targeting either different regions of a single viral gene segment or different viral gene segments. We describe a dual-gene (duplex) quantitative real-time RT-PCR method selectively targeting pandemic influenza A/H1N1/2009. The assay design includes a primer-probe set specific to only the hemagglutinin (HA) gene of this novel influenza A variant and a second set capable of detecting the nucleoprotein (NP) gene of all swine-origin influenza A virus. In silico analysis of the specific HA oligonucleotide sequence used in the assay showed that it targeted only the swine-origin pandemic strain; there was also no cross-reactivity against a wide spectrum of noninfluenza respiratory viruses. The assay has a diagnostic sensitivity and specificity of 97.7% and 100%, respectively, a lower detection limit of 50 viral gene copies/PCR, and can be adapted to either a qualitative or quantitative mode. It was first applied to 3512 patients with influenza-like illnesses at a tertiary hospital in Singapore, during the containment phase of the pandemic (May to July 2009).

  19. The potential for crop to wild hybridization in eggplant (Solanum melongena; Solanaceae) in southern India.

    PubMed

    Davidar, Priya; Snow, Allison A; Rajkumar, Muthu; Pasquet, Remy; Daunay, Marie-Christine; Mutegi, Evans

    2015-01-01

    In India and elsewhere, transgenic Bt eggplant (Solanum melongena) has been developed to reduce insect herbivore damage, but published studies of the potential for pollen-mediated, crop-to-wild gene flow are scant. This information is useful for risk assessments as well as in situ conservation strategies for wild germplasm. In 2010-2014, we surveyed 23 populations of wild/weedy eggplant (Solanum insanum; known as wild brinjal), carried out hand-pollination experiments, and observed pollinators to assess the potential for crop-to-wild gene flow in southern India. Wild brinjal is a spiny, low-growing perennial commonly found in disturbed sites such as roadsides, wastelands, and sparsely vegetated areas near villages and agricultural fields. Fourteen of the 23 wild populations in our study occurred within 0.5 km of cultivated brinjal and at least nine flowered in synchrony with the crop. Hand crosses between wild and cultivated brinjal resulted in seed set and viable F1 progeny. Wild brinjal flowers that were bagged to exclude pollinators did not set fruit, and fruit set from manual self-pollination was low. The exserted stigmas of wild brinjal are likely to promote outcrossing. The most effective pollinators appeared to be bees (Amegilla, Xylocopa, Nomia, and Heterotrigona spp.), which also were observed foraging for pollen on crop brinjal. Our findings suggest that hybridization is possible between cultivated and wild brinjal in southern India. Thus, as part of the risk assessment process, we assume that transgenes from the crop could spread to wild brinjal populations that occur nearby. © 2015 Botanical Society of America, Inc.

  20. Down-weighting overlapping genes improves gene set analysis

    PubMed Central

    2012-01-01

    Background: The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results: In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods, which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions: PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org. PMID:22713124
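
    The scoring idea described in this abstract (a gene set score equal to the mean of absolute weighted gene t-scores, with larger weights for genes that appear in few gene sets) can be sketched as follows. This is not the PADOG R package: the function names are hypothetical, the specific weight formula (a value between 1 and 2 that decreases with a gene's frequency across sets) is an assumption chosen for illustration, and the moderated t-scores are taken as given rather than computed.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Weight each gene by how rarely it occurs across gene sets.

    Assumed formula: 1 + sqrt((max_f - f) / (max_f - min_f)), so a gene
    in the fewest sets gets weight 2 and one in the most sets gets 1.
    """
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:  # every gene equally frequent: no down-weighting
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def set_score(genes, t_scores, weights):
    """Mean of absolute weighted t-scores over genes with a t-score."""
    vals = [abs(weights[g] * t_scores[g]) for g in genes if g in t_scores]
    return sum(vals) / len(vals) if vals else 0.0


# Toy data (hypothetical): g2 is shared by all three sets.
toy_sets = {"setA": ["g1", "g2"], "setB": ["g2", "g3"], "setC": ["g2"]}
toy_t = {"g1": 1.0, "g2": -2.0, "g3": 0.5}

w = gene_weights(toy_sets)
scores = {name: set_score(genes, toy_t, w) for name, genes in toy_sets.items()}
print(scores)
```

    In practice the raw scores would still need to be standardized and compared against a permutation-based null distribution before sets are ranked; that step is omitted here.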

  1. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    PubMed

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable the learning of more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.

  2. Survival bias and drug interaction can attenuate cross-sectional case-control comparisons of genes with health outcomes. An example of the kinesin-like protein 6 (KIF6) Trp719Arg polymorphism and coronary heart disease.

    PubMed

    Williams, Paul; Pendyala, Lakshmana; Superko, Robert

    2011-03-24

    Case-control studies typically exclude fatal endpoints from the case set, which we hypothesize will substantially underestimate risk if survival is genotype-dependent. The loss of fatal cases is particularly nontrivial for studies of coronary heart disease (CHD) because of significantly reduced survival (34% one-year fatality following a coronary attack). A case in point is the KIF6 Trp719Arg polymorphism (rs20455). Whereas six prospective studies have shown that carriers of the KIF6 Trp719Arg risk allele have 20% to 50% greater CHD risk than non-carriers, several cross-sectional case-control studies failed to show that carrier status is related to CHD. Computer simulations were therefore employed to assess the impact of the loss of fatal events on gene associations in cross-sectional case-control studies, using KIF6 Trp719Arg as an example. Ten replicates of 1,000,000 observations each were generated reflecting Canadian demographics. Cardiovascular disease (CVD) risks were assigned by the Framingham equation and events distributed among KIF6 Trp719Arg genotypes according to published prospective studies. Logistic regression analysis was used to estimate odds ratios between KIF6 genotypes. Results were examined for 33%, 41.5%, and 50% fatality rates for incident CVD. In the absence of any difference in percent fatalities between genotypes, the odds ratios (carriers vs. noncarriers) were unaffected by survival bias; otherwise, the odds ratios were increasingly attenuated as the disparity in fatality rates between genotypes increased. Additional simulations demonstrated that statin usage, shown in four clinical trials to substantially reduce the excess CHD risk in the KIF6 719Arg variant, should also attenuate the KIF6 719Arg odds ratio in case-control studies. These computer simulations show that exclusions of prior CHD fatalities attenuate odds ratios of case-control studies in proportion to the difference in the percent fatalities between genotypes. Disproportionate CHD survival for KIF6 Trp719Arg carriers is suggested by their 50% greater risk for recurrent myocardial infarction. This, and the attenuation of KIF6 719Arg carrier risk with statin use, may explain the genotype's weak association with CHD in cross-sectional case-control studies. The results may be relevant to the underestimation of risk in cross-sectional case-control studies of other genetic CHD-risk factors affecting survival.
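
    The attenuation described in this abstract can be illustrated with a small expected-value calculation. This is a simplified sketch, not the authors' simulation: it uses closed-form expected proportions instead of simulated individuals, ignores the Framingham risk model and demographics, and the function name and all numeric inputs except the 34% fatality figure are hypothetical.

```python
def cross_sectional_odds_ratio(risk_carrier, risk_noncarrier,
                               fatal_carrier, fatal_noncarrier):
    """Expected carrier-vs-noncarrier odds ratio when fatal cases are excluded.

    Surviving cases are those with an event who did not die; controls are
    those without an event. The carrier frequency cancels out of the ratio.
    """
    odds_carrier = risk_carrier * (1 - fatal_carrier) / (1 - risk_carrier)
    odds_noncarrier = (
        risk_noncarrier * (1 - fatal_noncarrier) / (1 - risk_noncarrier)
    )
    return odds_carrier / odds_noncarrier


# Hypothetical event risks: 12% in carriers vs. 10% in non-carriers.
# Equal fatality (34% in both groups) leaves the odds ratio unbiased.
or_equal = cross_sectional_odds_ratio(0.12, 0.10, 0.34, 0.34)

# Higher fatality among carriers (assumed 45% vs. 34%) attenuates it.
or_biased = cross_sectional_odds_ratio(0.12, 0.10, 0.45, 0.34)

print(f"equal fatality:   OR = {or_equal:.3f}")
print(f"unequal fatality: OR = {or_biased:.3f}")
```

    Under these assumptions the observed odds ratio equals the true odds ratio multiplied by the ratio of survival fractions, (1 - fatal_carrier) / (1 - fatal_noncarrier), which is why the bias disappears when fatality does not depend on genotype.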

  3. Convergent functional genomics of anxiety disorders: translational identification of genes, biomarkers, pathways and mechanisms.

    PubMed

    Le-Niculescu, H; Balaraman, Y; Patel, S D; Ayalew, M; Gupta, J; Kuczenski, R; Shekhar, A; Schork, N; Geyer, M A; Niculescu, A B

    2011-05-24

    Anxiety disorders are prevalent and disabling yet understudied from a genetic standpoint, compared with other major psychiatric disorders such as bipolar disorder and schizophrenia. The fact that they are more common, diverse and perceived as embedded in normal life may explain this relative oversight. In addition, as for other psychiatric disorders, there are technical challenges related to the identification and validation of candidate genes and peripheral biomarkers. Human studies, particularly genetic ones, are susceptible to the issue of being underpowered, because of genetic heterogeneity, the effect of variable environmental exposure on gene expression, and difficulty of accrual of large, well phenotyped cohorts. Animal model gene expression studies, in a genetically homogeneous and experimentally tractable setting, can avoid artifacts and provide sensitivity of detection. Subsequent translational integration of the animal model datasets with human genetic and gene expression datasets can ensure cross-validatory power and specificity for illness. We have used a pharmacogenomic mouse model (involving treatments with an anxiogenic drug--yohimbine, and an anti-anxiety drug--diazepam) as a discovery engine for identification of anxiety candidate genes as well as potential blood biomarkers. Gene expression changes in key brain regions for anxiety (prefrontal cortex, amygdala and hippocampus) and blood were analyzed using a convergent functional genomics (CFG) approach, which integrates our new data with published human and animal model data, as a translational strategy of cross-matching and prioritizing findings. 
Our work identifies top candidate genes (such as FOS, GABBR1, NR4A2, DRD1, ADORA2A, QKI, RGS2, PTGDS, HSPA1B, DYNLL2, CCKBR and DBP), brain-blood biomarkers (such as FOS, QKI and HSPA1B), pathways (such as cAMP signaling) and mechanisms for anxiety disorders--notably signal transduction and reactivity to environment, with a prominent role for the hippocampus. Overall, this work complements our previous similar work (on bipolar mood disorders and schizophrenia) conducted over the last decade. It concludes our programmatic first pass mapping of the genomic landscape of the triad of major psychiatric disorder domains using CFG, and permitted us to uncover the significant genetic overlap between anxiety and these other major psychiatric disorders, notably the under-appreciated overlap with schizophrenia. PDE10A, TAC1 and other genes uncovered by our work provide a molecular basis for the frequently observed clinical co-morbidity and interdependence between anxiety and other major psychiatric disorders, and suggest schizo-anxiety as a possible new nosological domain.

  4. Detecting the QTL-allele system of seed isoflavone content in Chinese soybean landrace population for optimal cross design and gene system exploration.

    PubMed

    Meng, Shan; He, Jianbo; Zhao, Tuanjie; Xing, Guangnan; Li, Yan; Yang, Shouping; Lu, Jiangjie; Wang, Yufeng; Gai, Junyi

    2016-08-01

    Utilizing an innovative GWAS in CSLRP, 44 QTL with 199 alleles, contributing 72.2% of SIFC variation, were detected and organized into a QTL-allele matrix for cross design and gene annotation. The seed isoflavone content (SIFC) of soybeans is of great importance to health care. The Chinese soybean landrace population (CSLRP) as a genetic reservoir was studied for its whole-genome quantitative trait loci (QTL) system of the SIFC using an innovative restricted two-stage multi-locus genome-wide association study procedure (RTM-GWAS). A sample of 366 landraces was tested under four environments and sequenced using the RAD-seq (restriction-site-associated DNA sequencing) technique to obtain 116,769 single nucleotide polymorphisms (SNPs), which were then organized into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) for GWAS. The 44 detected QTL with 199 alleles on 16 chromosomes (explaining 72.2% of the total phenotypic variation), with the allele effects (92 positive and 107 negative) of the CSLRP, were organized into a QTL-allele matrix showing the SIFC population genetic structure. Additional differentiation among eco-regions due to the SIFC in addition to that of genome-wide markers was found. All accessions comprised both positive and negative alleles, implying a great potential for recombination within the population. The optimal crosses were predicted from the matrices, showing transgressive potentials in the CSLRP. From the detected QTL system, 55 candidate genes related to 11 biological processes were χ²-tested as an SIFC candidate gene system. The present study explored the genome-wide SIFC QTL/gene system with the innovative RTM-GWAS and found the potentials of the QTL-allele matrix in optimal cross design and population genetic and genomic studies, which may have provided a solution to match the breeding by design strategy at both QTL and gene levels in breeding programs.

  5. Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2017-01-25

    With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection. We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top list of detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.

  6. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    PubMed Central

    Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-01-01

    While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490

  7. Inference of Evolutionary Forces Acting on Human Biological Pathways

    PubMed Central

    Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent

    2015-01-01

    Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280
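
    For orientation, the classical McDonald–Kreitman comparison that this abstract builds on can be sketched for a gene set by pooling counts across genes. This is not the authors' 2DNS procedure, which the abstract does not specify in detail; the function name, the dictionary keys, and the counts below are hypothetical, and simple pooling of this kind is known to be sensitive to heterogeneity among genes.

```python
def pooled_mk(genes):
    """Pool MK counts over a gene set; return (neutrality index, alpha)."""
    pn = sum(g["Pn"] for g in genes)  # nonsynonymous polymorphisms
    ps = sum(g["Ps"] for g in genes)  # synonymous polymorphisms
    dn = sum(g["Dn"] for g in genes)  # nonsynonymous fixed differences
    ds = sum(g["Ds"] for g in genes)  # synonymous fixed differences
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, 1 - neutrality_index


# Hypothetical counts for a two-gene set.
gene_set = [
    {"Pn": 2, "Ps": 10, "Dn": 6, "Ds": 10},
    {"Pn": 3, "Ps": 10, "Dn": 4, "Ds": 10},
]
ni, alpha = pooled_mk(gene_set)
print(ni, alpha)
```

    A neutrality index below 1 indicates an excess of nonsynonymous divergence relative to polymorphism, which is usually read as a signal of positive selection; a value above 1 indicates the opposite excess.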

  8. Controlling false-negative errors in microarray differential expression analysis: a PRIM approach.

    PubMed

    Cole, Steve W; Galic, Zoran; Zack, Jerome A

    2003-09-22

    Theoretical considerations suggest that current microarray screening algorithms may fail to detect many true differences in gene expression (Type II analytic errors). We assessed 'false negative' error rates in differential expression analyses by conventional linear statistical models (e.g. t-test), microarray-adapted variants (e.g. SAM, Cyber-T), and a novel strategy based on hold-out cross-validation. The latter approach employs the machine-learning algorithm Patient Rule Induction Method (PRIM) to infer minimum thresholds for reliable change in gene expression from Boolean conjunctions of fold-induction and raw fluorescence measurements. Monte Carlo analyses based on four empirical data sets show that conventional statistical models and their microarray-adapted variants overlook more than 50% of genes showing significant up-regulation. Conjoint PRIM prediction rules recover approximately twice as many differentially expressed transcripts while maintaining strong control over false-positive (Type I) errors. As a result, experimental replication rates increase and total analytic error rates decline. RT-PCR studies confirm that gene inductions detected by PRIM but overlooked by other methods represent true changes in mRNA levels. PRIM-based conjoint inference rules thus represent an improved strategy for high-sensitivity screening of DNA microarrays. Freestanding JAVA application at http://microarray.crump.ucla.edu/focus

  9. Transcriptome response signatures associated with the overexpression of a mitochondrial uncoupling protein (AtUCP1) in tobacco.

    PubMed

    Laitz, Alessandra Vasconcellos Nunes; Acencio, Marcio Luis; Budzinski, Ilara G F; Labate, Mônica T V; Lemke, Ney; Ribolla, Paulo Eduardo Martins; Maia, Ivan G

    2015-01-01

    Mitochondrial inner membrane uncoupling proteins (UCP) dissipate the proton electrochemical gradient established by the respiratory chain, thus affecting the yield of ATP synthesis. UCP overexpression in plants has been correlated with oxidative stress tolerance, improved photosynthetic efficiency and increased mitochondrial biogenesis. This study reports the main transcriptomic responses associated with the overexpression of a UCP (AtUCP1) in tobacco seedlings. Compared to wild-type (WT), AtUCP1 transgenic seedlings showed unaltered ATP levels and higher accumulation of serine. By using RNA-sequencing, a total of 816 differentially expressed genes between the investigated overexpressor lines and the untransformed WT control were identified. Among them, 239 were up-regulated and 577 were down-regulated. As a general response to AtUCP1 overexpression, noticeable changes in the expression of genes involved in energy metabolism and redox homeostasis were detected. A substantial set of differentially expressed genes code for products targeted to the chloroplast and mainly involved in photosynthesis. The overall results demonstrate that the alterations in mitochondrial function provoked by AtUCP1 overexpression require important transcriptomic adjustments to maintain cell homeostasis. Moreover, the occurrence of an important cross-talk between chloroplast and mitochondria, which culminates in the transcriptional regulation of several genes involved in different pathways, was evidenced.

  10. Expression Atlas: gene and protein expression across multiple studies and organisms

    PubMed Central

    Tang, Y Amy; Bazant, Wojciech; Burke, Melissa; Fuentes, Alfonso Muñoz-Pomer; George, Nancy; Koskinen, Satu; Mohammed, Suhaib; Geniza, Matthew; Preece, Justin; Jarnuczak, Andrew F; Huber, Wolfgang; Stegle, Oliver; Brazma, Alvis; Petryszak, Robert

    2018-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions. PMID:29165655
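
    The abstract does not say how Expression Atlas scores gene set overlaps. One common way to judge whether an overlap is larger than expected by chance, assumed here purely for illustration, is a one-sided hypergeometric test. The function name, the gene sets, and the universe size below are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, set_a, set_b):
    """Return (overlap size, P(overlap >= observed)) under random sampling.

    Assumes set_b is a random draw of len(set_b) genes from a universe of
    universe_size genes, of which len(set_a) belong to set_a.
    """
    k = len(set_a & set_b)
    n_a, n_b = len(set_a), len(set_b)
    total = comb(universe_size, n_b)
    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    tail = sum(comb(n_a, i) * comb(universe_size - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return k, tail / total


# Hypothetical query: two three-gene sets drawn from a ten-gene universe.
size, p = overlap_p_value(10, {"FOS", "QKI", "HSPA1B"}, {"FOS", "QKI", "DBP"})
print(size, round(p, 4))
```

    Exact integer binomial coefficients keep the calculation free of floating-point underflow for small sets; for genome-scale universes a statistics library would be the more practical choice.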

  11. Functionally Enigmatic Genes: A Case Study of the Brain Ignorome

    PubMed Central

    Pandey, Ashutosh K.; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W.

    2014-01-01

    What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed—the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum—a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases—ELMOD1, TMEM88B, and DZANK1—we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes. PMID:24523945

  12. Functionally enigmatic genes: a case study of the brain ignorome.

    PubMed

    Pandey, Ashutosh K; Lu, Lu; Wang, Xusheng; Homayouni, Ramin; Williams, Robert W

    2014-01-01

    What proportion of genes with intense and selective expression in specific tissues, cells, or systems are still almost completely uncharacterized with respect to biological function? In what ways do these functionally enigmatic genes differ from well-studied genes? To address these two questions, we devised a computational approach that defines so-called ignoromes. As proof of principle, we extracted and analyzed a large subset of genes with intense and selective expression in brain. We find that publications associated with this set are highly skewed--the top 5% of genes absorb 70% of the relevant literature. In contrast, approximately 20% of genes have essentially no neuroscience literature. Analysis of the ignorome over the past decade demonstrates that it is stubbornly persistent, and the rapid expansion of the neuroscience literature has not had the expected effect on numbers of these genes. Surprisingly, ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum--a genomic bandwagon effect. Finally we ask to what extent massive genomic, imaging, and phenotype data sets can be used to provide high-throughput functional annotation for an entire ignorome. In a majority of cases we have been able to extract and add significant information for these neglected genes. In several cases--ELMOD1, TMEM88B, and DZANK1--we have exploited sequence polymorphisms, large phenome data sets, and reverse genetic methods to evaluate the function of ignorome genes.

  13. Cross-Lagged Analysis of Interplay Between Differential Traits in Sibling Pairs: Validation and Application to Parenting Behavior and ADHD Symptomatology.

    PubMed

    Moscati, Arden; Verhulst, Brad; McKee, Kevin; Silberg, Judy; Eaves, Lindon

    2018-01-01

    Understanding the factors that contribute to behavioral traits is a complex task, and partitioning variance into latent genetic and environmental components is a useful beginning, but it should not also be the end. Many constructs are influenced by their contextual milieu, and accounting for background effects (such as gene-environment correlation) is necessary to avoid bias. This study introduces a method for examining the interplay between traits, in a longitudinal design using differential items in sibling pairs. The model is validated via simulation and power analysis, and we conclude with an application to paternal praise and ADHD symptoms in a twin sample. The model can help identify what type of genetic and environmental interplay may contribute to the dynamic relationship between traits using a cross-lagged panel framework. Overall, it presents a way to estimate and explicate the developmental interplay between a set of traits, free from many common sources of bias.

  14. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.

    PubMed

    Hettne, Kristina M; Boorsma, André; van Dartel, Dorien A M; Goeman, Jelle J; de Jong, Esther; Piersma, Aldert H; Stierum, Rob H; Kleinjans, Jos C; Kors, Jan A

    2013-01-29

    Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.
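
    The abstract reports false discovery rate-corrected p-values below 0.05 without naming the correction procedure. Assuming the widely used Benjamini-Hochberg step-up adjustment, which may not be what the authors applied, the correction can be sketched as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

    A gene set would then be called significantly differentially expressed when its adjusted value falls below the 0.05 threshold quoted in the abstract.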

  15. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a toxicity pattern similar to that of the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect. PMID:23356878

  16. Associations of candidate genes to age-related macular degeneration among racial/ethnic groups in the multi-ethnic study of atherosclerosis.

    PubMed

    Klein, Ronald; Li, Xiaohui; Kuo, Jane Z; Klein, Barbara E K; Cotch, Mary Frances; Wong, Tien Y; Taylor, Kent D; Rotter, Jerome I

    2013-11-01

    Purpose: To describe the relationships of selected candidate genes to the prevalence of early age-related macular degeneration (AMD) in a cohort of whites, blacks, Hispanics, and Chinese Americans. Design: Cross-sectional study. Setting: Multicenter study. Study population: A total of 2456 persons aged 45-84 years with genotype information and fundus photographs. Procedures: Twelve of 2862 single nucleotide polymorphisms (SNPs) from 11 of 233 candidate genes for cardiovascular disease were selected for analysis based on screening with marginal unadjusted P value <.001 within 1 or more racial/ethnic groups. Logistic regression models tested for association in case-control samples. Main outcome measure: Prevalence of early AMD. Results: Early AMD was present in 4.0% of the cohort and varied from 2.4% in blacks to 6.0% in whites. The odds ratio increased from 2.3 for 1 to 10.0 for 4 risk alleles in a joint effect analysis of Age-Related Maculopathy Susceptibility 2 rs10490924 and Complement Factor H Y402H (P for trend = 4.2×10⁻⁷). Frequencies of each SNP varied among the racial/ethnic groups. Adjusting for age and other factors, few statistically significant associations of the 12 SNPs with AMD were consistent across all groups. In a multivariate model, most candidate genes did not attenuate the comparatively higher odds of AMD in whites. The higher frequency of risk alleles for several SNPs in Chinese Americans may partially explain their AMD frequency's approaching that of whites. Conclusions: The relationships of 11 candidate genes to early AMD varied among 4 racial/ethnic groups, and partially explained the observed variations in early AMD prevalence among them.

  17. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-07-07

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. 

  18. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. 
PMID:27172218
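
    The TRN20-TST80 strategy described in this record can be sketched as repeated random partitions in which 20% of the accessions train the model and the remaining 80% are predicted. The function names, the number of partitions, and the use of Pearson correlation between observed and predicted values as the accuracy measure are assumptions made for illustration; the abstract does not state them.

```python
import random
from math import sqrt


def random_trn_tst_partitions(n_accessions, train_fraction=0.20,
                              n_partitions=5, seed=0):
    """Yield (training, testing) index lists for repeated random partitions.

    train_fraction=0.20 mirrors the TRN20-TST80 layout; n_partitions and
    seed are illustrative choices, not values taken from the study.
    """
    rng = random.Random(seed)
    n_train = round(n_accessions * train_fraction)
    indices = list(range(n_accessions))
    for _ in range(n_partitions):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def pearson(x, y):
    """Pearson correlation, assumed here as the prediction-accuracy measure."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Partition sizes for a collection the size of the Mexican landrace panel.
for trn, tst in random_trn_tst_partitions(8416, n_partitions=2):
    print(len(trn), len(tst))
```

    In practice a genomic prediction model would be fitted on the training indices and `pearson` applied to the observed and predicted values of the testing indices, averaged over partitions.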

  19. Expression profiling reveals distinct sets of genes altered during induction and regression of cardiac hypertrophy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friddle, Carl J; Koga, Teiichiro; Rubin, Edward M.

    2000-03-15

    While cardiac hypertrophy has been the subject of intensive investigation, regression of hypertrophy has been significantly less studied, precluding large-scale analysis of the relationship between these processes. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40% respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points during cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes or pathways previously known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.

  20. Discovery of cancer common and specific driver gene sets

    PubMed Central

    2017-01-01

    Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  1. Gene-set analysis based on the pharmacological profiles of drugs to identify repurposing opportunities in schizophrenia.

    PubMed

    de Jong, Simone; Vidler, Lewis R; Mokrab, Younes; Collier, David A; Breen, Gerome

    2016-08-01

    Genome-wide association studies (GWAS) have identified thousands of novel genetic associations for complex genetic disorders, leading to the identification of potential pharmacological targets for novel drug development. In schizophrenia, 108 conservatively defined loci that meet genome-wide significance have been identified and hundreds of additional sub-threshold associations harbour information on the genetic aetiology of the disorder. In the present study, we used gene-set analysis based on the known binding targets of chemical compounds to identify the 'drug pathways' most strongly associated with schizophrenia-associated genes, with the aim of identifying potential drug repositioning opportunities and clues for novel treatment paradigms, especially in multi-target drug development. We compiled 9389 gene sets (2496 with unique gene content) and interrogated gene-based p-values from the PGC2-SCZ analysis. Although no single drug exceeded experiment-wide significance (corrected p<0.05), highly ranked gene sets reaching suggestive significance included the dopamine receptor antagonists metoclopramide and trifluoperazine and the tyrosine kinase inhibitor neratinib. This is a proof-of-principle analysis showing the potential utility of GWAS data of schizophrenia for the direct identification of candidate drugs and molecules that show polypharmacy.

  2. Genome-wide association between DNA methylation and alternative splicing in an invertebrate

    PubMed Central

    2012-01-01

    Background: Gene bodies are the most evolutionarily conserved targets of DNA methylation in eukaryotes. However, the regulatory functions of gene body DNA methylation remain largely unknown. DNA methylation in insects appears to be primarily confined to exons. Two recent studies in Apis mellifera (honeybee) and Nasonia vitripennis (jewel wasp) analyzed transcription and DNA methylation data for one gene in each species to demonstrate that exon-specific DNA methylation may be associated with alternative splicing events. In this study we investigated the relationship between DNA methylation, alternative splicing, and cross-species gene conservation on a genome-wide scale using genome-wide transcription and DNA methylation data. Results: We generated RNA deep sequencing data (RNA-seq) to measure genome-wide mRNA expression at the exon and gene level. We produced a de novo transcriptome from these RNA-seq data and computationally predicted splice variants for the honeybee genome. We found that exons that are included in transcription are more highly methylated than exons that are skipped during transcription. We detected enrichment for alternative splicing among methylated genes compared to unmethylated genes using Fisher's exact test. We performed a statistical analysis to reveal that the presence of DNA methylation and the presence of alternative splicing are both associated with a longer gene length and a greater number of exons in genes. In concordance with this observation, a conservation analysis using BLAST revealed that each of these factors is also associated with higher cross-species gene conservation. Conclusions: This study constitutes the first genome-wide analysis exhibiting a positive relationship between exon-level DNA methylation and mRNA expression in the honeybee. Our finding that methylated genes are enriched for alternative splicing suggests that, in invertebrates, exon-level DNA methylation may play a role in the construction of splice variants by positively influencing exon inclusion during transcription. The results from our cross-species homology analysis suggest that DNA methylation and alternative splicing are genetic mechanisms whose utilization could contribute to a longer gene length and a slower rate of gene evolution. PMID:22978521
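
    The enrichment test mentioned in this record can be illustrated with a one-sided Fisher's exact test on a 2x2 table of genes (methylated or unmethylated by alternatively spliced or not). The sketch below computes the hypergeometric tail with the standard library only; the function name and all gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing `a` or
    more counts in the top-left cell (the hypergeometric upper tail).
    """
    row1 = a + b          # e.g. methylated genes
    col1 = a + c          # e.g. alternatively spliced genes
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts: rows are methylated / unmethylated genes,
# columns are alternatively spliced / not alternatively spliced.
p_value = fisher_exact_greater(300, 700, 100, 900)
print(f"one-sided p = {p_value:.3g}")
```

    A small p-value here would indicate that alternatively spliced genes are over-represented among methylated genes relative to what the margins alone predict.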

  3. A multiplex branched DNA assay for parallel quantitative gene expression profiling.

    PubMed

    Flagella, Michael; Bui, Son; Zheng, Zhi; Nguyen, Cung Tuong; Zhang, Aiguo; Pastor, Larry; Ma, Yunqing; Yang, Wen; Crawford, Kimberly L; McMaster, Gary K; Witney, Frank; Luo, Yuling

    2006-05-01

    We describe a novel method to quantitatively measure messenger RNA (mRNA) expression of multiple genes directly from crude cell lysates and tissue homogenates without the need for RNA purification or target amplification. The multiplex branched DNA (bDNA) assay adapts the bDNA technology to the Luminex fluorescent bead-based platform through the use of cooperative hybridization, which ensures an exceptionally high degree of assay specificity. Using in vitro transcribed RNA as reference standards, we demonstrated that the assay is highly specific, with cross-reactivity less than 0.2%. We also determined that the assay detection sensitivity is 25,000 RNA transcripts with intra- and interplate coefficients of variation of less than 10% and less than 15%, respectively. Using three 10-gene panels designed to measure proinflammatory and apoptosis responses, we demonstrated sensitive and specific multiplex gene expression profiling directly from cell lysates. The gene expression change data demonstrate a high correlation (R² = 0.94) with measurements obtained using the single-plex bDNA assay. Thus, the multiplex bDNA assay provides a powerful means to quantify the gene expression profile of a defined set of target genes in large sample populations.

  4. Numerical analysis of the dynamic interaction between wheel set and turnout crossing using the explicit finite element method

    NASA Astrophysics Data System (ADS)

    Xin, L.; Markine, V. L.; Shevtsov, I. Y.

    2016-03-01

    A three-dimensional (3-D) explicit dynamic finite element (FE) model is developed to simulate the impact of the wheel on the crossing nose. The model consists of a wheel set moving over the turnout crossing. Realistic wheel, wing rail and crossing geometries have been used in the model. Using this model, the dynamic responses of the system, such as the contact forces between the wheel and the crossing, crossing nose displacements and accelerations, and stresses in the rail material as well as in sleepers and ballast, can be obtained. Detailed analysis of the wheel set and crossing interaction using the local contact stress state in the rail is possible as well, which provides a good basis for prediction of the long-term behaviour of the crossing (fatigue analysis). To tune and validate the FE model, field measurements conducted on several turnouts in the railway network in the Netherlands are used. The parametric study, which includes variations of the crossing nose geometries, demonstrates the capabilities of the developed model. The results of the validation and parametric study are presented and discussed.

  5. Association of High Myopia with Crystallin Beta A4 (CRYBA4) Gene Polymorphisms in the Linkage-Identified MYP6 Locus

    PubMed Central

    Ho, Daniel W. H.; Yap, Maurice K. H.; Ng, Po Wah; Fung, Wai Yan; Yip, Shea Ping

    2012-01-01

    Background: Myopia is the most common ocular disorder worldwide and imposes a tremendous burden on society. It is a complex disease. The MYP6 locus at 22q12 is of particular interest because many studies have detected linkage signals at this interval. The MYP6 locus is likely to contain susceptibility gene(s) for myopia, but none has yet been identified. Methodology/Principal Findings: Two independent subject groups of southern Chinese in Hong Kong participated in the study: an initial study using a discovery sample set of 342 cases and 342 controls, and a follow-up study using a replication sample set of 316 cases and 313 controls. Cases with high myopia were defined by spherical equivalent ≤ -8 dioptres and emmetropic controls by spherical equivalent within ±1.00 dioptre for both eyes. Manual candidate gene selection from the MYP6 locus was supported by objective in silico prioritization. DNA samples of the discovery sample set were genotyped for 178 tagging single nucleotide polymorphisms (SNPs) from 26 genes. For replication, 25 SNPs (tagging or located at predicted transcription factor or microRNA binding sites) from 4 genes were subsequently examined using the replication sample set. Fisher P values were calculated for all SNPs and overall association results were summarized by meta-analysis. Based on the initial and replication studies, rs2009066, located in the crystallin beta A4 (CRYBA4) gene, was identified as the SNP most significantly associated with high myopia (initial study: P = 0.02; replication study: P = 1.88e-4; meta-analysis: P = 1.54e-5) among all the SNPs tested. The association result survived correction for multiple comparisons. Under the allelic genetic model for the combined sample set, the odds ratio of the minor allele G was 1.41 (95% confidence interval, 1.21-1.64). Conclusions/Significance: A novel susceptibility gene (CRYBA4) was discovered for high myopia. Our study also signified the potential importance of appropriate gene prioritization in candidate selection. PMID:22792142
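
    An allelic odds ratio with a 95% confidence interval, of the kind reported in this record, can be computed from a 2x2 table of allele counts. The sketch below uses the Wald (Woolf) interval on the log odds ratio, which is an assumption: the abstract does not say how its interval was obtained. The allele counts are hypothetical and do not reproduce the reported 1.41 (1.21-1.64).

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_minor, case_major, control_minor, control_major,
                       z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table of allele counts.

    Uses the Wald interval on log(OR); z=1.96 gives an approximate 95%
    interval. All four counts must be positive.
    """
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    se_log_or = sqrt(1 / case_minor + 1 / case_major
                     + 1 / control_minor + 1 / control_major)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (two alleles per subject), for illustration only.
or_, lo, hi = allelic_odds_ratio(case_minor=20, case_major=80,
                                 control_minor=10, control_major=90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    An interval that excludes 1, as in the reported result, indicates that the minor allele is associated with case status at roughly the 5% level under this approximation.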

  6. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.

    PubMed

    Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R

    2017-01-01

    Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is the model-based study to examine the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.

  7. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Performance progressively degrades, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed. Conclusions: The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171
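
    The idea of attaching a permutation p-value to a cross-validated error rate can be sketched as follows. This is a simplified stand-in, not the authors' procedure: it uses leave-one-out rather than Leave-K-Out cross-validation, a one-gene nearest-mean classifier rather than WVA, RLS, or SVM, and made-up expression values. All function names are hypothetical.

```python
import random


def loo_nearest_mean_error(values, labels):
    """Leave-one-out error of a nearest-class-mean classifier on one feature."""
    errors = 0
    for i, value in enumerate(values):
        means = {}
        for cls in set(labels):
            members = [v for j, (v, lab) in enumerate(zip(values, labels))
                       if lab == cls and j != i]
            if members:
                means[cls] = sum(members) / len(members)
        predicted = min(means, key=lambda cls: abs(value - means[cls]))
        errors += predicted != labels[i]
    return errors / len(values)


def permutation_p_value(observed_error, labels, error_fn,
                        n_permutations=1000, seed=0):
    """Fraction of label shufflings whose error is as low as the observed one.

    The +1 terms count the observed labelling itself, so p is never zero.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if error_fn(shuffled) <= observed_error:
            as_good += 1
    return (as_good + 1) / (n_permutations + 1)


# Made-up expression values for one gene: four "normal", four "tumor".
values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
error = loo_nearest_mean_error(values, labels)
p = permutation_p_value(error, labels,
                        lambda lab: loo_nearest_mean_error(values, lab))
print(f"e = {error:.0%}, permutation p = {p:.3f}")
```

    The same wrapper applies to any classifier and any cross-validation scheme: only `error_fn` changes, while the label-shuffling logic stays the same.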

  8. Transcriptional responses in thyroid tissues from rats treated with a tumorigenic and a non-tumorigenic triazole conazole fungicide.

    PubMed

    Hester, Susan D; Nesnow, Stephen

    2008-03-15

    Conazoles are azole-containing fungicides that are used in agriculture and medicine. Conazoles can induce follicular cell adenomas of the thyroid in rats after chronic bioassay. The goal of this study was to identify pathways and networks of genes that were associated with thyroid tumorigenesis through transcriptional analyses. To this end, we compared transcriptional profiles from tissues of rats treated with a tumorigenic and a non-tumorigenic conazole. Triadimefon, a rat thyroid tumorigen, and myclobutanil, which was not tumorigenic in rats after a 2-year bioassay, were administered in the feed to male Wistar/Han rats for 30 or 90 days similar to the treatment conditions previously used in their chronic bioassays. Thyroid gene expression was determined using high density Affymetrix GeneChips (Rat 230_2). Gene expression was analyzed by the Gene Set Expression Analyses method which clearly separated the tumorigenic treatments (tumorigenic response group (TRG)) from the non-tumorigenic treatments (non-tumorigenic response group (NRG)). Core genes from these gene sets were mapped to canonical, metabolic, and GeneGo processes and these processes compared across group and treatment time. Extensive analyses were performed on the 30-day gene sets as they represented the major perturbations. Gene sets in the 30-day TRG group had overrepresentation of fatty acid metabolism, oxidation, and degradation processes (including PPARgamma and CYP involvement), and of cell proliferation responses. Core genes from these gene sets were combined into networks and found to possess signaling interactions. In addition, the core genes in each gene set were compared with genes known to be associated with human thyroid cancer. Among the genes that appeared in both rat and human data sets were: Acaca, Asns, Cebpg, Crem, Ddit3, Gja1, Grn, Jun, Junb, and Vegf. These genes were major contributors in the previously developed network from triadimefon-treated rat thyroids. It is postulated that triadimefon induces oxidative response genes and activates the nuclear receptor, Ppargamma, initiating transcription of gene products and signaling to a series of genes involved in cell proliferation.

  9. Transcriptome-wide selection of a reliable set of reference genes for gene expression studies in potato cyst nematodes (Globodera spp.).

    PubMed

    Sabeh, Michael; Duceppe, Marc-Olivier; St-Arnaud, Marc; Mimee, Benjamin

    2018-01-01

    Relative gene expression analyses by qRT-PCR (quantitative reverse transcription PCR) require an internal control to normalize the expression data of genes of interest and eliminate the unwanted variation introduced by sample preparation. A perfect reference gene should have a constant expression level under all the experimental conditions. However, the same few housekeeping genes selected from the literature or successfully used in previous unrelated experiments are often routinely used in new conditions without proper validation of their stability across treatments. The advent of RNA-Seq and the availability of public datasets for numerous organisms are opening the way to finding better reference genes for expression studies. Globodera rostochiensis is a plant-parasitic nematode that is particularly yield-limiting for potato. The aim of our study was to identify a reliable set of reference genes to study G. rostochiensis gene expression. Gene expression levels from an RNA-Seq database were used to identify putative reference genes and were validated with qRT-PCR analysis. Three genes, GR, PMP-3, and aaRS, were found to be very stable within the experimental conditions of this study and are proposed as reference genes for future work.
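
    One simple way to shortlist candidate reference genes from RNA-Seq expression levels is to rank genes by their coefficient of variation across conditions and keep the most stable ones for qRT-PCR validation. This is a sketch under that assumption; the abstract does not state which stability metric the authors used, and the gene names and expression values below are invented.

```python
from statistics import mean, pstdev


def rank_by_cv(expression):
    """Rank genes from most to least stable by coefficient of variation.

    `expression` maps a gene name to its expression level in each condition.
    Genes with a mean of zero must be filtered out beforehand.
    """
    cvs = {gene: pstdev(levels) / mean(levels)
           for gene, levels in expression.items()}
    return sorted(cvs.items(), key=lambda item: item[1])


# Invented normalized expression levels across four conditions.
expression = {
    "candidate_A": [100.0, 101.0, 99.0, 100.0],
    "candidate_B": [10.0, 200.0, 50.0, 400.0],
    "candidate_C": [55.0, 60.0, 52.0, 58.0],
}
for gene, cv in rank_by_cv(expression):
    print(f"{gene}: CV = {cv:.3f}")
```

    Low variation in sequencing data only nominates a candidate; as the record stresses, stability still has to be confirmed by qRT-PCR under the experimental conditions of interest.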

  10. A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

    PubMed Central

    Newton, Richard; Wernisch, Lorenz

    2014-01-01

    Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247

  11. EcoGene 3.0

    PubMed Central

    Zhou, Jindan; Rudd, Kenneth E.

    2013-01-01

    EcoGene (http://ecogene.org) is a database and website devoted to continuously improving the structural and functional annotation of Escherichia coli K-12, one of the most well understood model organisms, represented by the MG1655(Seq) genome sequence and annotations. Major improvements to EcoGene in the past decade include (i) graphic presentations of genome map features; (ii) ability to design Boolean queries and Venn diagrams from EcoArray, EcoTopics or user-provided GeneSets; (iii) the genome-wide clone and deletion primer design tool, PrimerPairs; (iv) sequence searches using a customized EcoBLAST; (v) a Cross Reference table of synonymous gene and protein identifiers; (vi) proteome-wide indexing with GO terms; (vii) EcoTools access to >2000 complete bacterial genomes in EcoGene-RefSeq; (viii) establishment of a MySql relational database; and (ix) use of web content management systems. The biomedical literature is surveyed daily to provide citation and gene function updates. As of September 2012, the review of 37 397 abstracts and articles led to creation of 98 425 PubMed-Gene links and 5415 PubMed-Topic links. Annotation updates to Genbank U00096 are transmitted from EcoGene to NCBI. Experimental verifications include confirmation of a CTG start codon, pseudogene restoration and quality assurance of the Keio strain collection. PMID:23197660

  12. EcoGene 3.0.

    PubMed

    Zhou, Jindan; Rudd, Kenneth E

    2013-01-01

    EcoGene (http://ecogene.org) is a database and website devoted to continuously improving the structural and functional annotation of Escherichia coli K-12, one of the most well understood model organisms, represented by the MG1655(Seq) genome sequence and annotations. Major improvements to EcoGene in the past decade include (i) graphic presentations of genome map features; (ii) ability to design Boolean queries and Venn diagrams from EcoArray, EcoTopics or user-provided GeneSets; (iii) the genome-wide clone and deletion primer design tool, PrimerPairs; (iv) sequence searches using a customized EcoBLAST; (v) a Cross Reference table of synonymous gene and protein identifiers; (vi) proteome-wide indexing with GO terms; (vii) EcoTools access to >2000 complete bacterial genomes in EcoGene-RefSeq; (viii) establishment of a MySql relational database; and (ix) use of web content management systems. The biomedical literature is surveyed daily to provide citation and gene function updates. As of September 2012, the review of 37 397 abstracts and articles led to creation of 98 425 PubMed-Gene links and 5415 PubMed-Topic links. Annotation updates to Genbank U00096 are transmitted from EcoGene to NCBI. Experimental verifications include confirmation of a CTG start codon, pseudogene restoration and quality assurance of the Keio strain collection.

  13. Longitudinal and Cross-Sectional Genetic Diversity in the Korean Peninsula Based on the P vivax Merozoite Surface Protein Gene.

    PubMed

    Kim, Jung-Yeon; Suh, Eun-Jung; Yu, Hyo-Soon; Jung, Hyun-Sik; Park, In-Ho; Choi, Yien-Kyeoug; Choi, Kyoung-Mi; Cho, Shin-Hyeong; Lee, Won-Ja

    2011-12-01

    Vivax malaria has reemerged and become endemic in Korea. Our study aimed to analyze both the longitudinal and the cross-sectional genetic diversity of this malaria based on the P vivax Merozoite Surface Protein (PvMSP) gene of parasites recently found in the Korean peninsula. PvMSP-1 gene sequences from P vivax isolates (n = 835) collected during the 1996-2010 period were longitudinally analyzed, and the isolates from the Korean peninsula through South Korea, the demilitarized zone and North Korea collected in 2008-2010 were enrolled in an overall analysis of MSP-1 gene diversity. New recombinant subtypes and severe multiple-clone infection rates were observed in recent vivax parasites. Regional variation was also observed in the study sites. This study revealed the great complexity of genetic variation and rapid dissemination of genes in P vivax. It also showed interesting patterns of diversity depending on the region in the Korean Peninsula. Understanding the parasite genetic variation may help to analyze trends and assess the extent of endemic malaria in Korea.

  14. Inheritance of tristyly in Oxalis tuberosa (Oxalidaceae).

    PubMed

    Trognitz, B R; Hermann, M

    2001-05-01

    Frequencies of floral morphs in progenies obtained from a complete set of diallelic crosses among three accessions of tristylous, octoploid oca (Oxalis tuberosa) were used for a Mendelian analysis of floral morph inheritance. The frequencies observed had the best fit to a model of tetrasomic inheritance with two diallelic factors, S, s and M, m, with S being epistatic over M. No explanation could be found for the unexpected formation of a small percentage of short-styled individuals in crosses between the mid-styled and the long-styled parent. For the acceptance of models of disomic and octosomic inheritance several additional assumptions would have to be made and therefore these modes of inheritance are less likely. Dosage-dependent inheritance of floral morph was rejected. Only a small frequency (36%) of the cross progenies flowered, in contrast to the greater propensity for flowering of O. tuberosa accessions held at gene banks.

  15. Transcriptional analysis of liver from chickens with fast (meat bird), moderate (F1 layer x meat bird cross) and low (layer bird) growth potential.

    PubMed

    Willson, Nicky-Lee; Forder, Rebecca E A; Tearle, Rick; Williams, John L; Hughes, Robert J; Nattrass, Greg S; Hynd, Philip I

    2018-05-02

    Divergent selection for meat and egg production in poultry has resulted in strains of birds differing widely in traits related to these products. Modern strains of meat birds can reach live weights of 2 kg in 35 d, while layer strains are now capable of producing more than 300 eggs per annum but grow slowly. In this study, RNA-Seq was used to investigate hepatic gene expression between three groups of birds with large differences in growth potential: meat bird, layer strain, and an F1 layer x meat bird cross. The objective was to identify differentially expressed (DE) genes between all three strains to elucidate biological factors underpinning variations in growth performance. RNA-Seq analysis was carried out on total RNA extracted from the liver of meat bird (n = 6), F1 layer x meat bird cross (n = 6) and layer strain (n = 6) males. Differential expression of genes was considered significant at P < 0.05 and a false discovery rate of < 0.05, with any fold change considered. In total, 6278 genes were found to be DE, with 5832 DE between meat birds and layers (19%), 2935 DE between meat birds and the cross (9.6%) and 493 DE between the cross and layers (1.6%). Comparisons between the three groups identified 155 significant DE genes. Gene ontology (GO) enrichment and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway analysis of the 155 DE genes showed the FoxO signalling pathway was most enriched (P = 0.001), including genes related to cell cycle regulation and insulin signalling. Significant GO terms included 'positive regulation of glucose import' and 'cellular response to oxidative stress', which is also consistent with FoxO regulation of glucose metabolism. There were high correlations between FoxO pathway genes and bodyweight, as well as between genes related to glycolysis and bodyweight. This study revealed large transcriptome differences between meat and layer birds. There was significant evidence implicating the FoxO signalling pathway (via cell cycle regulation and altered metabolism) as an active driver of growth variations in chicken. Functional analysis of the FoxO genes is required to understand how they regulate growth and egg production.
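
    A false discovery rate threshold such as the one used in this record is commonly applied through Benjamini-Hochberg adjusted p-values. The sketch below assumes that procedure, which the abstract does not name, and uses invented p-values; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank, then a running minimum taken
    from the largest rank downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented raw p-values for a handful of genes.
raw = [0.001, 0.20, 0.03, 0.004, 0.60]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 and q < 0.05 for p, q in zip(raw, adjusted)]
print(adjusted, significant)
```

    Requiring both the raw p-value and the adjusted value to fall below 0.05 mirrors the two-part criterion quoted in the record, although the adjusted value is the stricter of the two.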

  16. Autogenous cross-regulation of Quaking mRNA processing and translation balances Quaking functions in splicing and translation

    PubMed Central

    Liu, Naiyou; Fair, Jeffrey Haskell; Shiue, Lily; Katzman, Sol; Donohue, John Paul

    2017-01-01

    Quaking protein isoforms arise from a single Quaking gene and bind the same RNA motif to regulate splicing, translation, decay, and localization of a large set of RNAs. However, the mechanisms by which Quaking expression is controlled to ensure that appropriate amounts of each isoform are available for such disparate gene expression processes are unknown. Here we explore how levels of two isoforms, nuclear Quaking-5 (Qk5) and cytoplasmic Qk6, are regulated in mouse myoblasts. We found that Qk5 and Qk6 proteins have distinct functions in splicing and translation, respectively, enforced through differential subcellular localization. We show that Qk5 and Qk6 regulate distinct target mRNAs in the cell and act in distinct ways on their own and each other's transcripts to create a network of autoregulatory and cross-regulatory feedback controls. Morpholino-mediated inhibition of Qk translation confirms that Qk5 controls Qk RNA levels by promoting accumulation and alternative splicing of Qk RNA, whereas Qk6 promotes its own translation while repressing Qk5. This Qk isoform cross-regulatory network responds to additional cell type and developmental controls to generate a spectrum of Qk5/Qk6 ratios, where they likely contribute to the wide range of functions of Quaking in development and cancer. PMID:29021242

  17. Bayesian approach to transforming public gene expression repositories into disease diagnosis databases.

    PubMed

    Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine

    2010-04-13

    The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.

  18. The Hawk-Dove game in phenotypically homogeneous and heterogeneous populations of finite dimension

    NASA Astrophysics Data System (ADS)

    Laruelle, Annick; da Silva Rocha, André Barreira; Escobedo, Ramón

    2018-02-01

    The Hawk-Dove game played between individuals in populations of finite dimension is analyzed by means of a stochastic model. We take into account both cases when all individuals in the population are either phenotypically homogeneous or heterogeneous. A strategy in the model is a gene representing the probability of playing the Hawk strategy. Individual interactions at the microscopic level are described by a genetic algorithm where evolution results from the interplay among selection, mutation, drift and cross-over of genes. We show that the behavioral patterns observed at the macroscopic level can be reproduced as the emergent result of individual interactions governed by the rules of the Hawk-Dove game at the microscopic level. We study how the results of the genetic algorithm compare with those obtained in evolutionary game theory, finding that, although genes continuously change both their presence and frequency in the population over time, the population average behavior always achieves stationarity and, when this happens, the final average strategy played in the population oscillates around the evolutionarily stable strategy in the homogeneous population case or the neutrally stable set in the heterogeneous population case.
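
    As a reference point for the evolutionarily stable strategy mentioned in this abstract, the sketch below computes the mixed-strategy equilibrium of the textbook Hawk-Dove game. The payoff values (resource value V, fight cost C, with the usual payoffs (V - C)/2, V, 0 and V/2) are the standard formulation and are an assumption here; the abstract does not state the payoffs or parameters used in the paper, and this is not the authors' genetic algorithm.

```python
def expected_payoff(p, q, V, C):
    """Expected payoff to a player who plays Hawk with probability p
    against an opponent who plays Hawk with probability q.

    Assumed textbook payoffs: Hawk vs Hawk (V - C) / 2, Hawk vs Dove V,
    Dove vs Hawk 0, Dove vs Dove V / 2.
    """
    hawk = q * (V - C) / 2 + (1 - q) * V
    dove = (1 - q) * V / 2
    return p * hawk + (1 - p) * dove


def ess_hawk_probability(V, C):
    """Mixed ESS of the textbook game: play Hawk with probability V / C
    when V < C, and always play Hawk otherwise."""
    return min(1.0, V / C)


# Hypothetical parameters chosen only for illustration.
V, C = 2.0, 4.0
p_star = ess_hawk_probability(V, C)
print(p_star)
# At the ESS, pure Hawk and pure Dove earn the same against the population.
print(expected_payoff(1.0, p_star, V, C), expected_payoff(0.0, p_star, V, C))
```

    In the abstract's model a gene encodes the probability of playing Hawk, so a value such as `p_star` is the kind of quantity the population average is reported to oscillate around in the homogeneous case.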

  19. The GENCODE exome: sequencing the complete human exome

    PubMed Central

    Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

    2011-01-01

    Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695

  20. Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

    PubMed

    Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

    2017-04-01

    Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value < 0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10^-5; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10^-5; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on Alcoholism.

  1. Functional characterization of MAT1-1-specific mating-type genes in the homothallic ascomycete Sordaria macrospora provides new insights into essential and nonessential sexual regulators.

    PubMed

    Klix, V; Nowrousian, M; Ringelberg, C; Loros, J J; Dunlap, J C; Pöggeler, S

    2010-06-01

    Mating-type genes in fungi encode regulators of mating and sexual development. Heterothallic ascomycete species require different sets of mating-type genes to control nonself-recognition and mating of compatible partners of different mating types. Homothallic (self-fertile) species also carry mating-type genes in their genome that are essential for sexual development. To analyze the molecular basis of homothallism and the role of mating-type genes during fruiting-body development, we deleted each of the three genes, SmtA-1 (MAT1-1-1), SmtA-2 (MAT1-1-2), and SmtA-3 (MAT1-1-3), contained in the MAT1-1 part of the mating-type locus of the homothallic ascomycete species Sordaria macrospora. Phenotypic analysis of deletion mutants revealed that the PPF domain protein-encoding gene SmtA-2 is essential for sexual reproduction, whereas the alpha domain protein-encoding genes SmtA-1 and SmtA-3 play no role in fruiting-body development. By means of cross-species microarray analysis using Neurospora crassa oligonucleotide microarrays hybridized with S. macrospora targets and quantitative real-time PCR, we identified genes expressed under the control of SmtA-1 and SmtA-2. Both genes are involved in the regulation of gene expression, including that of pheromone genes.

  2. Functional Characterization of MAT1-1-Specific Mating-Type Genes in the Homothallic Ascomycete Sordaria macrospora Provides New Insights into Essential and Nonessential Sexual Regulators

    PubMed Central

    Klix, V.; Nowrousian, M.; Ringelberg, C.; Loros, J. J.; Dunlap, J. C.; Pöggeler, S.

    2010-01-01

    Mating-type genes in fungi encode regulators of mating and sexual development. Heterothallic ascomycete species require different sets of mating-type genes to control nonself-recognition and mating of compatible partners of different mating types. Homothallic (self-fertile) species also carry mating-type genes in their genome that are essential for sexual development. To analyze the molecular basis of homothallism and the role of mating-type genes during fruiting-body development, we deleted each of the three genes, SmtA-1 (MAT1-1-1), SmtA-2 (MAT1-1-2), and SmtA-3 (MAT1-1-3), contained in the MAT1-1 part of the mating-type locus of the homothallic ascomycete species Sordaria macrospora. Phenotypic analysis of deletion mutants revealed that the PPF domain protein-encoding gene SmtA-2 is essential for sexual reproduction, whereas the α domain protein-encoding genes SmtA-1 and SmtA-3 play no role in fruiting-body development. By means of cross-species microarray analysis using Neurospora crassa oligonucleotide microarrays hybridized with S. macrospora targets and quantitative real-time PCR, we identified genes expressed under the control of SmtA-1 and SmtA-2. Both genes are involved in the regulation of gene expression, including that of pheromone genes. PMID:20435701

  3. Cytogenetic analysis and mapping of leaf rust resistance in Aegilops speltoides Tausch derived bread wheat line Selection2427 carrying putative gametocidal gene(s).

    PubMed

    Niranjana, M; Vinod; Sharma, J B; Mallick, Niharika; Tomar, S M S; Jha, S K

    2017-12-01

    Leaf rust (Puccinia triticina) is a major biotic stress affecting wheat yields worldwide. Host-plant resistance is the best method for controlling leaf rust. Aegilops speltoides is a good source of resistance against wheat rusts. To date, five Lr genes, Lr28, Lr35, Lr36, Lr47, and Lr51, have been transferred from Ae. speltoides to bread wheat. In Selection2427, a bread wheat introgressed line with Ae. speltoides as the donor parent, a dominant gene for leaf rust resistance was mapped to the long arm of chromosome 3B (LrS2427). None of the Lr genes introgressed from Ae. speltoides have been mapped to chromosome 3B. Since none of the designated seedling leaf rust resistance genes have been located on chromosome 3B, LrS2427 seems to be a novel gene. Selection2427 showed a unique property typical of gametocidal genes: when crossed to other bread wheat cultivars, the F1 showed partial pollen sterility and poor seed setting, whilst Selection2427 itself showed reasonable male and female fertility. Accidental co-transfer of gametocidal genes with LrS2427 may have occurred in Selection2427. Though LrS2427 did not show any segregation distortion and assorted independently of putative gametocidal gene(s), its utilization will be difficult due to the selfish behavior of gametocidal genes.

  4. Systematic assessment of cervical cancer initiation and progression uncovers genetic panels for deep learning-based early diagnosis and proposes novel diagnostic and prognostic biomarkers.

    PubMed

    Long, Nguyen Phuoc; Jung, Kyung Hee; Yoon, Sang Jun; Anh, Nguyen Hoang; Nghi, Tran Diem; Kang, Yun Pyo; Yan, Hong Hua; Min, Jung Eun; Hong, Soon-Sun; Kwon, Sung Won

    2017-12-12

    Although many outstanding achievements in the management of cervical cancer (CxCa) have been obtained, it still imposes a major burden, which has prompted scientists to discover and validate new CxCa biomarkers to improve the diagnostic and prognostic assessment of CxCa. In this study, eight different gene expression data sets containing 202 cancer, 115 cervical intraepithelial neoplasia (CIN), and 105 normal samples were utilized for an integrative systems biology assessment in a multi-stage carcinogenesis manner. Deep learning-based diagnostic models were established based on the genetic panels of intrinsic genes of cervical carcinogenesis as well as on the unbiased variable selection approach. Survival analysis was also conducted to explore the potential biomarker candidates for prognostic assessment. Our results showed that cell cycle, RNA transport, mRNA surveillance, and one carbon pool by folate were the key regulatory mechanisms involved in the initiation, progression, and metastasis of CxCa. Various genetic panels combined with machine learning algorithms successfully differentiated CxCa from CIN and normalcy in cross-study normalized data sets. In particular, the 168-gene deep learning model for the differentiation of cancer from normalcy achieved an externally validated accuracy of 97.96% (99.01% sensitivity and 95.65% specificity). Survival analysis revealed that ZNF281 and EPHB6 were the two most promising prognostic genetic markers for CxCa among others. Our findings open new opportunities to enhance current understanding of the characteristics of CxCa pathobiology. In addition, the combination of transcriptomics-based signatures and deep learning classification may become an important approach to improve CxCa diagnosis and management in clinical practice.

  5. Quantitative trait loci for maternal performance for offspring survival in mice.

    PubMed Central

    Peripato, Andréa C; De Brito, Reinaldo A; Vaughn, Ty T; Pletscher, L Susan; Matioli, Sergio R; Cheverud, James M

    2002-01-01

    Maternal performance refers to the effect that the environment provided by mothers has on their offspring's phenotypes, such as offspring survival and growth. Variations in maternal behavior and physiology are responsible for variations in maternal performance, which in turn affects offspring survival. In our study we found females that failed to nurture their offspring and showed abnormal maternal behaviors. The genetic architecture of maternal performance for offspring survival was investigated in 241 females of an F(2) intercross of the SM/J and LG/J inbred mouse strains. Using interval-mapping methods we found two quantitative trait loci (QTL) affecting maternal performance at D2Mit17 + 6 cM and D7Mit21 + 2 cM on chromosomes 2 and 7, respectively. In a two-way genome-wide epistasis scan we found 15 epistatic interactions involving 23 QTL distributed across all chromosomes except 12, 16, and 17. These loci form several small sets of interacting QTL, suggesting a complex set of mechanisms operating to determine maternal performance for offspring survival. Taken all together and correcting for the large number of significant factors, QTL and their interactions explain almost 35% of the phenotypic variation for maternal performance for offspring survival in this cross. This study allowed the identification of many possible candidate genes, as well as the relative size of gene effects and patterns of gene action affecting maternal performance in mice. Detailed behavior observation of mothers from later generations suggests that offspring survival in the first week is related to maternal success in building nests, grooming their pups, providing milk, and/or manifesting aggressive behavior against intruders. PMID:12454078

  6. Molecular and epidemiological characterisation of clinical isolates of carbapenem-resistant Acinetobacter baumannii from public and private sector intensive care units in Karachi, Pakistan.

    PubMed

    Irfan, S; Turton, J F; Mehraj, J; Siddiqui, S Z; Haider, S; Zafar, A; Memon, B; Afzal, O; Hasan, R

    2011-06-01

    The purpose of this study was to identify molecular and epidemiological characteristics of hospital-acquired carbapenem-resistant Acinetobacter baumannii (CRAB) from two different intensive care unit (ICU) settings in Karachi, Pakistan. A cross-sectional study was performed in the adult ICUs of a private sector tertiary care hospital (PS-ICU) and of a government sector hospital (GS-ICU) between November 2007 and August 2008. Deduplicated CRAB isolates from clinical specimens were examined for carbapenemase and class 1 integrase genes. Isolates were typed using sequence-based multiplex polymerase chain reaction, pulsed-field gel electrophoresis (PFGE) and variable number tandem repeat (VNTR). A total of 50 patients (33 from PS-ICU and 17 from GS-ICU) were recruited. There were statistically significant differences between patients in the two ICUs in terms of mean age, comorbidities, the presence of central venous pressure lines, urinary catheters, and average length of stay. bla(OxA-23-like) acquired-oxacillinase genes were found in 47/50 isolates. Class 1 integrase genes were found in 50% (25/50) of the organisms. The majority of isolates belonged to strains of European clones I and II. PFGE typing grouped the isolates into eight distinct clusters, three of which were found in both hospitals. Most of the isolates within each PFGE cluster shared identical or highly similar VNTR profiles, suggesting close epidemiological association. Irrespective of differences in risk factors and infection control policies and practices, the extent of clonality among CRAB isolates was very similar in both ICU settings. Copyright © 2011 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  7. Genome-wide association study of clinical dimensions of schizophrenia: polygenic effect on disorganized symptoms.

    PubMed

    Fanous, Ayman H; Zhou, Baiyu; Aggen, Steven H; Bergen, Sarah E; Amdur, Richard L; Duan, Jubao; Sanders, Alan R; Shi, Jianxin; Mowry, Bryan J; Olincy, Ann; Amin, Farooq; Cloninger, C Robert; Silverman, Jeremy M; Buccola, Nancy G; Byerley, William F; Black, Donald W; Freedman, Robert; Dudbridge, Frank; Holmans, Peter A; Ripke, Stephan; Gejman, Pablo V; Kendler, Kenneth S; Levinson, Douglas F

    2012-12-01

    Multiple sources of evidence suggest that genetic factors influence variation in clinical features of schizophrenia. The authors present the first genome-wide association study (GWAS) of dimensional symptom scores among individuals with schizophrenia. Based on the Lifetime Dimensions of Psychosis Scale ratings of 2,454 case subjects of European ancestry from the Molecular Genetics of Schizophrenia (MGS) sample, three symptom factors (positive, negative/disorganized, and mood) were identified with exploratory factor analysis. Quantitative scores for each factor from a confirmatory factor analysis were analyzed for association with 696,491 single-nucleotide polymorphisms (SNPs) using linear regression, with correction for age, sex, clinical site, and ancestry. Polygenic score analysis was carried out to determine whether case and comparison subjects in 16 Psychiatric GWAS Consortium (PGC) schizophrenia samples (excluding MGS samples) differed in scores computed by weighting their genotypes by MGS association test results for each symptom factor. No genome-wide significant associations were observed between SNPs and factor scores. Most of the SNPs producing the strongest evidence for association were in or near genes involved in neurodevelopment, neuroprotection, or neurotransmission, including genes playing a role in Mendelian CNS diseases, but no statistically significant effect was observed for any defined gene pathway. Finally, polygenic scores based on MGS GWAS results for the negative/disorganized factor were significantly different between case and comparison subjects in the PGC data set; for MGS subjects, negative/disorganized factor scores were correlated with polygenic scores generated using case-control GWAS results from the other PGC samples. The polygenic signal that has been observed in cross-sample analyses of schizophrenia GWAS data sets could be in part related to genetic effects on negative and disorganized symptoms (i.e., core features of chronic schizophrenia).
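
    The polygenic scores described in this abstract are computed by weighting each subject's genotypes by association test results from a separate sample. A minimal sketch of that weighted sum follows; the allele-count coding (0, 1 or 2 copies of the effect allele), the weights and the genotypes are all hypothetical, and the abstract does not specify how SNPs were selected or how weights were scaled.

```python
def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele counts across SNPs.

    genotypes: allele counts (0, 1 or 2) for one subject, one per SNP.
    weights:   per-SNP effect estimates from a discovery sample
               (hypothetical values in the example below).
    """
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must cover the same SNPs")
    return sum(g * w for g, w in zip(genotypes, weights))


# Hypothetical discovery-sample weights for three SNPs.
toy_weights = [0.5, -0.25, 1.0]
# Hypothetical genotypes for two subjects in a target sample.
subject_a = [2, 1, 0]
subject_b = [0, 0, 2]
print(polygenic_score(subject_a, toy_weights))
print(polygenic_score(subject_b, toy_weights))
```

    Scores computed this way for case and comparison subjects in a target sample can then be compared, which is the kind of cross-sample test the abstract reports.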

  8. Systematic assessment of cervical cancer initiation and progression uncovers genetic panels for deep learning-based early diagnosis and proposes novel diagnostic and prognostic biomarkers

    PubMed Central

    Long, Nguyen Phuoc; Jung, Kyung Hee; Yoon, Sang Jun; Anh, Nguyen Hoang; Nghi, Tran Diem; Kang, Yun Pyo; Yan, Hong Hua; Min, Jung Eun; Hong, Soon-Sun; Kwon, Sung Won

    2017-01-01

    Although many outstanding achievements in the management of cervical cancer (CxCa) have been obtained, it still imposes a major burden, which has prompted scientists to discover and validate new CxCa biomarkers to improve the diagnostic and prognostic assessment of CxCa. In this study, eight different gene expression data sets containing 202 cancer, 115 cervical intraepithelial neoplasia (CIN), and 105 normal samples were utilized for an integrative systems biology assessment in a multi-stage carcinogenesis manner. Deep learning-based diagnostic models were established based on the genetic panels of intrinsic genes of cervical carcinogenesis as well as on the unbiased variable selection approach. Survival analysis was also conducted to explore the potential biomarker candidates for prognostic assessment. Our results showed that cell cycle, RNA transport, mRNA surveillance, and one carbon pool by folate were the key regulatory mechanisms involved in the initiation, progression, and metastasis of CxCa. Various genetic panels combined with machine learning algorithms successfully differentiated CxCa from CIN and normalcy in cross-study normalized data sets. In particular, the 168-gene deep learning model for the differentiation of cancer from normalcy achieved an externally validated accuracy of 97.96% (99.01% sensitivity and 95.65% specificity). Survival analysis revealed that ZNF281 and EPHB6 were the two most promising prognostic genetic markers for CxCa among others. Our findings open new opportunities to enhance current understanding of the characteristics of CxCa pathobiology. In addition, the combination of transcriptomics-based signatures and deep learning classification may become an important approach to improve CxCa diagnosis and management in clinical practice. PMID:29312619

  9. Epistasis interaction of QTL effects as a genetic parameter influencing estimation of the genetic additive effect.

    PubMed

    Bocianowski, Jan

    2013-03-01

    Epistasis, an additive-by-additive interaction between quantitative trait loci, has been defined as a deviation from the sum of independent effects of individual genes. Epistasis between QTLs assayed in populations segregating for an entire genome has been found at a frequency close to that expected by chance alone. Recently, epistatic effects have been considered by many researchers as important for complex traits. In order to understand the genetic control of complex traits, it is necessary to clarify additive-by-additive interactions among genes. Herein we compare estimates of a parameter connected with the additive gene action calculated on the basis of two models: a model excluding epistasis and a model with additive-by-additive interaction effects. In this paper two data sets were analysed: 1) 150 barley doubled haploid lines derived from the Steptoe × Morex cross, and 2) 145 DH lines of barley obtained from the Harrington × TR306 cross. The results showed that in cases when the effect of epistasis was different from zero, the coefficient of determination was larger for the model with epistasis than for the one excluding epistasis. These results indicate that epistatic interaction plays an important role in controlling the expression of complex traits.
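
    The comparison described in this abstract, a model excluding epistasis against a model with an additive-by-additive interaction term, can be sketched for two loci with ordinary least squares. The -1/+1 genotype coding for doubled haploid lines, the two-locus setup and the trait values below are assumptions made for illustration; they are not the barley data sets or the estimation method from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(design, y):
    """Coefficient of determination of a least-squares fit of y on design."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical doubled haploid lines: genotype at two loci coded -1 or +1.
genotypes = [(1, 1), (1, 1), (1, -1), (1, -1),
             (-1, 1), (-1, 1), (-1, -1), (-1, -1)]
# Hypothetical trait values with a non-zero interaction effect built in.
trait = [14.7, 14.3, 9.6, 9.4, 7.2, 7.8, 8.7, 8.3]

additive = [[1, a, b] for a, b in genotypes]
with_epistasis = [[1, a, b, a * b] for a, b in genotypes]
r2_add = r_squared(additive, trait)
r2_epi = r_squared(with_epistasis, trait)
print(r2_add, r2_epi)
```

    Because the additive model is nested in the epistatic one, the least-squares coefficient of determination cannot decrease when the interaction term is added; it increases strictly when the estimated interaction effect is non-zero, which matches the pattern the abstract reports.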

  10. A Comprehensive Analysis of Nuclear-Encoded Mitochondrial Genes in Schizophrenia.

    PubMed

    Gonçalves, Vanessa F; Cappi, Carolina; Hagen, Christian M; Sequeira, Adolfo; Vawter, Marquis P; Derkach, Andriy; Zai, Clement C; Hedley, Paula L; Bybjerg-Grauholm, Jonas; Pouget, Jennie G; Cuperfain, Ari B; Sullivan, Patrick F; Christiansen, Michael; Kennedy, James L; Sun, Lei

    2018-05-01

    The genetic risk factors of schizophrenia (SCZ), a severe psychiatric disorder, are not yet fully understood. Multiple lines of evidence suggest that mitochondrial dysfunction may play a role in SCZ, but comprehensive association studies are lacking. We hypothesized that variants in nuclear-encoded mitochondrial genes influence susceptibility to SCZ. We conducted gene-based and gene-set analyses using summary association results from the Psychiatric Genomics Consortium Schizophrenia Phase 2 (PGC-SCZ2) genome-wide association study comprising 35,476 cases and 46,839 control subjects. We applied the MAGMA method to three sets of nuclear-encoded mitochondrial genes: oxidative phosphorylation genes, other nuclear-encoded mitochondrial genes, and genes involved in nucleus-mitochondria crosstalk. Furthermore, we conducted a replication study using the iPSYCH SCZ sample of 2290 cases and 21,621 control subjects. In the PGC-SCZ2 sample, 1186 mitochondrial genes were analyzed, among which 159 had p values < .05 and 19 remained significant after multiple testing correction. A meta-analysis of 818 genes combining the PGC-SCZ2 and iPSYCH samples resulted in 104 nominally significant and nine significant genes, suggesting a polygenic model for the nuclear-encoded mitochondrial genes. Gene-set analysis, however, did not show significant results. In an in silico protein-protein interaction network analysis, 14 mitochondrial genes interacted directly with 158 SCZ risk genes identified in PGC-SCZ2 (permutation p = .02), and aldosterone signaling in epithelial cells and mitochondrial dysfunction pathways appeared to be overrepresented in this network of mitochondrial and SCZ risk genes. This study provides evidence that specific aspects of mitochondrial function may play a role in SCZ, but we did not observe its broad involvement even using a large sample. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  11. Fast and robust group-wise eQTL mapping using sparse graphical models.

    PubMed

    Cheng, Wei; Shi, Yu; Zhang, Xiang; Wang, Wei

    2015-01-16

    Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways. We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. Our model is a linear-Gaussian model and uses two types of hidden variables. One captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure which makes this approach suitable for large scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that cannot be identified by the state-of-the-art eQTL mapping methods. Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the successful multi-layer regression model opens a new approach to understand how multiple SNPs interact with each other to jointly affect the expression level of a group of genes.

  12. Two non-allelic nuclear genes restore fertility in a gametophytic pattern and enhance abiotic stress tolerance in the hybrid rice plant.

    PubMed

    Huang, Wenchao; Hu, Jun; Yu, Changchun; Huang, Qi; Wan, Lei; Wang, Lili; Qin, Xiaojian; Ji, Yanxiao; Zhu, Renshan; Li, Shaoqing; Zhu, Yingguo

    2012-03-01

    In indica rice, the HongLian (HL)-type combination of cytoplasmic male sterility (CMS) and fertility restoration (Rf) is widely used for the production of commercial hybrid seeds in China, Laos, Vietnam and other Southeast Asian countries. Generally, in any member of the gametophytic fertility restoration system, 50% of the pollen in hybrid F(1) plants displays recovered fertility. In this study, however, a HL-type hybrid variety named HongLian You6 had approximately 75% normal (viable) pollen rather than the expected 50%. To resolve this discrepancy, several fertility segregation populations, including F(2) and BC(1)F(1) derived from the HL-CMS line Yuetai A crossed with the restorer line 9311, were constructed and subjected to genetic analysis. A gametophytic restoration model was discovered to involve two non-allelic nuclear restorer genes, Rf5 and Rf6. Rf5 had been previously identified using a positional cloning strategy. The Rf6 gene represents a new restorer gene locus, which was mapped to the short arm of chromosome 8. The hybrid F(1) plants containing one restorer gene, either Rf5 or Rf6, displayed 50% normal pollen grains when stained with I(2)-KI solution; however, those with both Rf5 and Rf6 displayed 75% normal pollen. We also established that the hybrid F(1) plants including both non-allelic restorer genes exhibited an increased stable seed setting when subjected to stress versus the F(1) plants with only one restorer gene. Finally, we discuss the breeding scheme for the plant gametophytic CMS/Rf system.
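
    The 50% and 75% normal pollen figures in this abstract follow from simple gamete counting, sketched below. The sketch assumes that pollen fertility depends only on the pollen grain's own genotype (the gametophytic pattern), that the F(1) plant is heterozygous at each restorer locus, that the loci assort independently, and that carrying any one restorer allele is sufficient; these are modelling assumptions for illustration, not statements taken from the paper's methods.

```python
from fractions import Fraction
from itertools import product


def fertile_pollen_fraction(n_heterozygous_loci):
    """Fraction of pollen grains carrying at least one restorer allele.

    Each heterozygous locus passes the restorer allele (True) or the
    non-restorer allele (False) to a pollen grain with equal probability,
    and loci are assumed to assort independently.
    """
    gametes = list(product([True, False], repeat=n_heterozygous_loci))
    fertile = sum(1 for alleles in gametes if any(alleles))
    return Fraction(fertile, len(gametes))


print(fertile_pollen_fraction(1))  # one restorer locus, e.g. Rf5 or Rf6 alone
print(fertile_pollen_fraction(2))  # both Rf5 and Rf6 heterozygous
```

    Under these assumptions one locus gives 1/2 and two loci give 1 - (1/2)^2 = 3/4, consistent with the observed percentages.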

  13. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

    PubMed

    Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

    2014-04-11

    Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

  14. Genetic dissection of ethanol tolerance in the budding yeast Saccharomyces cerevisiae.

    PubMed

    Hu, X H; Wang, M H; Tan, T; Li, J R; Yang, H; Leach, L; Zhang, R M; Luo, Z W

    2007-03-01

    Uncovering genetic control of variation in ethanol tolerance in natural populations of yeast Saccharomyces cerevisiae is essential for understanding the evolution of fermentation, the dominant lifestyle of the species, and for improving efficiency of selection for strains with high ethanol tolerance, a character of great economic value for the brewing and biofuel industries. To date, as many as 251 genes have been predicted to be involved in influencing this character. Candidacy of these genes was determined from a tested phenotypic effect following gene knockout, from an induced change in gene function under an ethanol stress condition, or by mutagenesis. This article represents the first genomics approach for dissecting genetic variation in ethanol tolerance between two yeast strains with a highly divergent trait phenotype. We developed a simple but reliable experimental protocol for scoring the phenotype and a set of STR/SNP markers evenly covering the whole genome. We created a mapping population comprising 319 segregants from crossing the parental strains. On the basis of the data sets, we find that the tolerance trait has a high heritability and that additive genetic variance dominates genetic variation of the trait. Segregation at five QTL detected has explained approximately 50% of phenotypic variation; in particular, the major QTL mapped on yeast chromosome 9 has accounted for a quarter of the phenotypic variation. We integrated the QTL analysis with the predicted candidacy of ethanol resistance genes and found that only a few of these candidates fall in the QTL regions.

  15. Learning contextual gene set interaction networks of cancer with condition specificity

    PubMed Central

    2013-01-01

    Background Identifying similarities and differences in the molecular constitutions of various types of cancer is one of the key challenges in cancer research. The appearances of a cancer depend on complex molecular interactions, including gene regulatory networks and gene-environment interactions. This complexity makes it challenging to decipher the molecular origin of the cancer. In recent years, many studies reported methods to uncover heterogeneous depictions of complex cancers, which are often categorized into different subtypes. The challenge is to identify diverse molecular contexts within a cancer, to relate them to different subtypes, and to learn underlying molecular interactions specific to molecular contexts so that we can recommend context-specific treatment to patients. Results In this study, we describe a novel method to discern molecular interactions specific to certain molecular contexts. Unlike conventional approaches to build modular networks of individual genes, our focus is to identify cancer-generic and subtype-specific interactions between contextual gene sets, where a contextual gene set is a set of genes that share coherent transcriptional patterns across a subset of samples. We then apply a novel formulation for quantitating the effect of the samples from each subtype on the calculated strength of interactions observed. Two cancer data sets were analyzed to support the validity of condition-specificity of identified interactions. When compared to an existing approach, the proposed method was much more sensitive in identifying condition-specific interactions even in heterogeneous data sets. The results also revealed that network components specific to different types of cancer are related to different biological functions than cancer-generic network components. We found not only results that are consistent with previous studies, but also new hypotheses on the biological mechanisms specific to certain cancer types that warrant further investigation. Conclusions The analysis of the contextual gene sets and the characterization of the interaction networks composed of these sets revealed distinct functional differences underlying various types of cancer. The results show that our method successfully reveals many subtype-specific regions in the identified maps of biological contexts, which well represent biological functions that can be connected to specific subtypes. PMID:23418942

  16. SZGR 2.0: a one-stop shop of schizophrenia candidate genes

    PubMed Central

    Jia, Peilin; Han, Guangchun; Zhao, Junfei; Lu, Pinyi; Zhao, Zhongming

    2017-01-01

    SZGR 2.0 is a comprehensive resource of candidate variants and genes for schizophrenia, covering genetic, epigenetic, transcriptomic, translational and many other types of evidence. By systematic review and curation of multiple lines of evidence, we included almost all variants and genes that have ever been reported to be associated with schizophrenia. In particular, we collected ∼4200 common variants reported in genome-wide association studies, ∼1000 de novo mutations discovered by large-scale sequencing of family samples, 215 genes spanning rare and replication copy number variations, 99 genes overlapping with linkage regions, 240 differentially expressed genes, 4651 differentially methylated genes and 49 genes as antipsychotic drug targets. To facilitate interpretation, we included various functional annotation data, especially brain eQTL, methylation QTL, brain expression featured in deep categorization of brain areas and developmental stages and brain-specific promoter and enhancer annotations. Furthermore, we conducted cross-study, cross-data type and integrative analyses of the multidimensional data deposited in SZGR 2.0, and made the data and results available through a user-friendly interface. In summary, SZGR 2.0 provides a one-stop shop of schizophrenia variants and genes and their function and regulation, providing an important resource in the schizophrenia and other mental disease community. SZGR 2.0 is available at https://bioinfo.uth.edu/SZGR/. PMID:27733502

  17. Association between the oxytocin receptor (OXTR) gene and autism: relationship to Vineland Adaptive Behavior Scales and cognition.

    PubMed

    Lerer, E; Levi, S; Salomon, S; Darvasi, A; Yirmiya, N; Ebstein, R P

    2008-10-01

    Evidence both from animal and human studies suggests that common polymorphisms in the oxytocin receptor (OXTR) gene are likely candidates to confer risk for autism spectrum disorders (ASD). In lower mammals, oxytocin is important in a wide range of social behaviors, and recent human studies have shown that administration of oxytocin modulates behavior in both clinical and non-clinical groups. Additionally, two linkage studies and two recent association investigations also underscore a possible role for the OXTR gene in predisposing to ASD. We undertook a comprehensive study of all 18 tagged SNPs across the entire OXTR gene region identified using HapMap data and the Haploview algorithm. Altogether 152 subjects diagnosed with ASDs (that is, DSM IV autistic disorder or pervasive developmental disorder--NOS) from 133 families were genotyped (parents and affected siblings). Both individual SNPs and haplotypes were tested for association using family-based association tests as provided in the UNPHASED set of programs. Significant association with single SNPs and haplotypes (global P-values <0.05, following permutation test adjustment) were observed with ASD. Association was also observed with IQ and the Vineland Adaptive Behavior Scales (VABS). In particular, a five-locus haplotype block (rs237897-rs13316193-rs237889-rs2254298-rs2268494) was significantly associated with ASD (nominal global P=0.000019; adjusted global P=0.009) and a single haplotype (carried by 7% of the population) within that block showed highly significant association (P=0.00005). This is the third association study, in a third ethnic group, showing that SNPs and haplotypes in the OXTR gene confer risk for ASD. The current investigation also shows association with IQ and total VABS scores (as well as the communication, daily living skills and socialization subdomains), suggesting that this gene shapes both cognition and daily living skills that may cross diagnostic boundaries.

  18. iSS-PC: Identifying Splicing Sites via Physical-Chemical Properties Using Deep Sparse Auto-Encoder.

    PubMed

    Xu, Zhao-Chun; Wang, Peng; Qiu, Wang-Ren; Xiao, Xuan

    2017-08-15

    Gene splicing is one of the most significant biological processes in eukaryotic gene expression, such as RNA splicing, which can cause a pre-mRNA to produce one or more mature messenger RNAs containing the coded information with multiple biological functions. Thus, identifying splicing sites in DNA/RNA sequences is significant for both bio-medical research and the discovery of new drugs. However, relying only on experimental techniques is expensive and time consuming, so new computational methods are needed. To identify the splice donor sites and splice acceptor sites accurately and quickly, a deep sparse auto-encoder model with two hidden layers, called iSS-PC, was constructed based on minimum error law, in which we incorporated twelve physical-chemical properties of the dinucleotides within DNA into PseDNC to formulate given sequence samples via a battery of cross-covariance and auto-covariance transformations. In this paper, five-fold cross-validation test results based on the same benchmark data-sets indicated that the new predictor remarkably outperformed the existing prediction methods in this field. Furthermore, it is expected that many other related problems can also be studied by this approach. To implement classification accurately and quickly, an easy-to-use web-server for identifying splicing sites has been established for free access at: http://www.jci-bioinfo.cn/iSS-PC.

  19. System analysis of metabolism and the transcriptome in Arabidopsis thaliana roots reveals differential co-regulation upon iron, sulfur and potassium deficiency.

    PubMed

    Forieri, Ilaria; Sticht, Carsten; Reichelt, Michael; Gretz, Norbert; Hawkesford, Malcolm J; Malagoli, Mario; Wirtz, Markus; Hell, Ruediger

    2017-01-01

    Deprivation of mineral nutrients causes significant retardation of plant growth. This retardation is associated with nutrient-specific and general stress-induced transcriptional responses. In this study, we adjusted the external supply of iron, potassium and sulfur to cause the same retardation of shoot growth. Nevertheless, limitation by individual nutrients resulted in specific morphological adaptations and distinct shifts within the root metabolite fingerprint. The metabolic shifts affected key metabolites of primary metabolism and the stress-related phytohormones, jasmonic, salicylic and abscisic acid. These phytohormone signatures contributed to specific nutrient deficiency-induced transcriptional regulation. Limitation by the micronutrient iron caused the strongest regulation and affected 18% of the root transcriptome. Only 130 genes were regulated by all nutrients. Specific co-regulation between the iron and sulfur metabolic routes upon iron or sulfur deficiency was observed. Interestingly, iron deficiency caused regulation of a different set of genes of the sulfur assimilation pathway compared with sulfur deficiency itself, which demonstrates the presence of specific signal-transduction systems for the cross-regulation of the pathways. Combined iron and sulfur starvation experiments demonstrated that a requirement for a specific nutrient can overrule this cross-regulation. The comparative metabolomics and transcriptomics approach used dissected general stress from nutrient-specific regulation in roots of Arabidopsis. © 2016 John Wiley & Sons Ltd.

  20. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their occurring probabilities in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were revealed to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way to deal with study heterogeneity.

  1. Cross-species chemogenomic profiling reveals evolutionarily conserved drug mode of action

    PubMed Central

    Kapitzky, Laura; Beltrao, Pedro; Berens, Theresa J; Gassner, Nadine; Zhou, Chunshui; Wüster, Arthur; Wu, Julie; Babu, M Madan; Elledge, Stephen J; Toczyski, David; Lokey, R Scott; Krogan, Nevan J

    2010-01-01

    We present a cross-species chemogenomic screening platform using libraries of haploid deletion mutants from two yeast species, Saccharomyces cerevisiae and Schizosaccharomyces pombe. We screened a set of compounds of known and unknown mode of action (MoA) and derived quantitative drug scores (or D-scores), identifying mutants that are either sensitive or resistant to particular compounds. We found that compound–functional module relationships are more conserved than individual compound–gene interactions between these two species. Furthermore, we observed that combining data from both species allows for more accurate prediction of MoA. Finally, using this platform, we identified a novel small molecule that acts as a DNA damaging agent and demonstrate that its MoA is conserved in human cells. PMID:21179023

  2. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    PubMed

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research.
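
    The abstract above says the per-gene Cox regression coefficients were ranked and passed to GSEA. The core of GSEA is a running-sum statistic over the ranked list: it steps up at genes in the set and down at genes outside it, and the enrichment score is the largest deviation from zero. The sketch below is a simplified version of that statistic only; it omits the permutation-based significance and normalization, it is not the authors' pipeline, and the function and parameter names are assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, exponent=1.0):
    """Simplified GSEA-style running-sum enrichment score.

    ranked_genes: gene identifiers sorted by score (e.g. Cox coefficient), descending.
    scores:       the ranking scores, in the same order as ranked_genes.
    gene_set:     collection of gene identifiers to test.
    exponent:     weight exponent; 0 gives an unweighted (Kolmogorov-Smirnov-like) walk.
    """
    members = set(gene_set)
    in_set = [g in members for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits are weighted by |score|^exponent; misses all share the same step size.
    weights = [abs(s) ** exponent if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("all hit weights are zero")
    running, best = 0.0, 0.0
    for weight, hit in zip(weights, in_set):
        running += weight / total_weight if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best
```

    A positive score means the set is concentrated near the top of the ranking (here, larger coefficients), and a negative score means it is concentrated near the bottom.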

  3. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach

    PubMed Central

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Introduction Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. Aim The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Methods Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Results Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. Conclusion To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research. PMID:26080057

  4. Uncovering the Role of BMP Signaling in Melanocyte Development and Melanoma Tumorigenesis

    DTIC Science & Technology

    2014-07-01

    clear that these mutations are not sufficient for melanoma formation and other genes are involved. Using genomic studies and cross-species...studies and cross-species comparisons to identify several candidates. One of these candidates, GDF6, is a BMP factor that is recurrently amplified...having to cross cell membranes. Antibodies, such as the VEGF blocker bevacizumab, epitomize this type of therapy. We are currently investigating the

  5. Conjugal transfer of aac(6')Ie-aph(2″)Ia gene from native species and mechanism of regulation and cross resistance in Enterococcus faecalis MCC3063 by real time-PCR.

    PubMed

    Jaimee, G; Halami, P M

    2017-09-01

    High level aminoglycoside resistance (HLAR) in the lactic acid bacteria (LAB) derived from food animals is detrimental. The aim of this study was to investigate the localization and conjugal transfer of aminoglycoside resistance genes, aac(6')Ie-aph(2″)Ia and aph(3')IIIa in different Enterococcus species. The cross resistance patterns in Enterococcus faecalis MCC3063 to clinically important aminoglycosides by real time PCR were also studied. Southern hybridization experiments revealed the presence of aac(6')Ie-aph(2″)Ia and aph(3')IIIa genes conferring HLAR in high molecular weight plasmids except in Lactobacillus plantarum. The plasmid encoded bifunctional aac(6')Ie-aph(2″)Ia gene was transferable from Enterococcus avium (n = 2), E. cecorum (n = 1), E. faecalis (n = 1) and Pediococcus lolii (n = 1) species into the recipient strain, E. faecalis JH2-2, by filter mating experiments, thus indicating the possible risks of gene transfer into pathogenic strains. Cross resistance patterns in the native isolate E. faecalis MCC3063, which carries the aac(6')Ie-aph(2″)Ia and aph(3')IIIa genes, were analyzed by quantification of the mRNA levels. For this, the culture was induced with increasing concentrations of gentamicin, kanamycin and streptomycin (2048, 4096, 8192, 16384 μg/mL) individually. The increasing concentrations of gentamicin and kanamycin induced the expression of the aac(6')Ie-aph(2″)Ia and aph(3')IIIa resistance genes, respectively. Interestingly, it was observed that induction with streptomycin triggered a significant fold increase in the expression of the aph(3')IIIa gene which otherwise was not known to modify the aminoglycoside. This is noteworthy as streptomycin was found to confer cross resistance to structurally unrelated kanamycin. Also, expression of the aph(3')IIIa gene when induced with streptomycin revealed that bacteria harbouring this gene will be able to overcome streptomycin bactericidal action at specific concentrations. HLAR in E. faecalis MCC3063 may be due to the combined expression of both the aac(6')Ie-aph(2″)Ia and aph(3')IIIa genes, which could be therapeutically challenging. The study highlights the significant alterations in the mRNA expression levels of aac(6')Ie-aph(2″)Ia and aph(3')IIIa in resistant pathogens, upon exposure to clinically vital aminoglycosides. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  7. Single Nucleotide Polymorphism Markers for Genetic Mapping in Drosophila melanogaster

    PubMed Central

    Hoskins, Roger A.; Phan, Alexander C.; Naeemuddin, Mohammed; Mapa, Felipa A.; Ruddy, David A.; Ryan, Jessica J.; Young, Lynn M.; Wells, Trent; Kopczynski, Casey; Ellis, Michael C.

    2001-01-01

    For nearly a century, genetic analysis in Drosophila melanogaster has been a powerful tool for analyzing gene function, yet Drosophila lacks the molecular genetic mapping tools that recently have revolutionized human, mouse, and plant genetics. Here, we describe the systematic characterization of a dense set of molecular markers in Drosophila by using a sequence tagged site-based physical map of the genome. We identify 474 biallelic markers in standard laboratory strains of Drosophila that span the genome. Most of these markers are single nucleotide polymorphisms and sequences for these variants are provided in an accessible format. The average density of the new markers is one per 225 kb on the autosomes and one per megabase on the X chromosome. We include in this survey a set of P-element strains that provide additional use for high-resolution mapping. We show one application of the new markers in a simple set of crosses to map a mutation in the hedgehog gene to an interval of <1 Mb. This new map resource significantly increases the efficiency and resolution of recombination mapping and will be of immediate value to the Drosophila research community. PMID:11381036

  8. Single nucleotide polymorphism markers for genetic mapping in Drosophila melanogaster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoskins, Roger A.; Phan, Alexander C.; Naeemuddin, Mohammed

    2001-04-16

    For nearly a century, genetic analysis in Drosophila melanogaster has been a powerful tool for analyzing gene function, yet Drosophila lacks the molecular genetic mapping tools that have recently revolutionized human, mouse and plant genetics. Here, we describe the systematic characterization of a dense set of molecular markers in Drosophila using an STS-based physical map of the genome. We identify 474 biallelic markers in standard laboratory strains of Drosophila that span the genome. The majority of these markers are single nucleotide polymorphisms (SNPs) and sequences for these variants are provided in an accessible format. The average density of the new markers is 1 marker per 225 kb on the autosomes and 1 marker per 1 Mb on the X chromosome. We include in this survey a set of P-element strains that provide additional utility for high-resolution mapping. We demonstrate one application of the new markers in a simple set of crosses to map a mutation in the hedgehog gene to an interval of <1 Mb. This new map resource significantly increases the efficiency and resolution of recombination mapping and will be of immediate value to the Drosophila research community.

  9. Investigation of exomic variants associated with overall survival in ovarian cancer

    PubMed Central

    Ann Chen, Yian; Larson, Melissa C; Fogarty, Zachary C; Earp, Madalene A; Anton-Culver, Hoda; Bandera, Elisa V; Cramer, Daniel; Doherty, Jennifer A; Goodman, Marc T; Gronwald, Jacek; Karlan, Beth Y; Kjaer, Susanne K; Levine, Douglas A; Menon, Usha; Ness, Roberta B; Pearce, Celeste L; Pejovic, Tanja; Rossing, Mary Anne; Wentzensen, Nicolas; Bean, Yukie T; Bisogna, Maria; Brinton, Louise A; Carney, Michael E; Cunningham, Julie M; Cybulski, Cezary; deFazio, Anna; Dicks, Ed M; Edwards, Robert P; Gayther, Simon A; Gentry-Maharaj, Aleksandra; Gore, Martin; Iversen, Edwin S; Jensen, Allan; Johnatty, Sharon E; Lester, Jenny; Lin, Hui-Yi; Lissowska, Jolanta; Lubinski, Jan; Menkiszak, Janusz; Modugno, Francesmary; Moysich, Kirsten B; Orlow, Irene; Pike, Malcolm C; Ramus, Susan J; Song, Honglin; Terry, Kathryn L; Thompson, Pamela J; Tyrer, Jonathan P; van den Berg, David J; Vierkant, Robert A; Vitonis, Allison F; Walsh, Christine; Wilkens, Lynne R; Wu, Anna H; Yang, Hannah; Ziogas, Argyrios; Berchuck, Andrew; Chenevix-Trench, Georgia; Schildkraut, Joellen M; Permuth-Wey, Jennifer; Phelan, Catherine M; Pharoah, Paul D P; Fridley, Brooke L

    2016-01-01

    Background While numerous susceptibility loci for epithelial ovarian cancer (EOC) have been identified, few associations have been reported with overall survival. In the absence of common prognostic genetic markers, we hypothesize that rare coding variants may be associated with overall EOC survival and assessed their contribution in two exome-based genotyping projects of the Ovarian Cancer Association Consortium (OCAC). Methods The primary patient set (Set 1) included 14 independent EOC studies (4293 patients) and 227,892 variants, and a secondary patient set (Set 2) included six additional EOC studies (1744 patients) and 114,620 variants. Because power to detect rare variants individually is reduced, gene-level tests were conducted. Sets were analyzed separately at individual variants and by gene, and then combined with meta-analyses (73,203 variants and 13,163 genes overlapped). Results No individual variant reached genome-wide statistical significance. A SNP previously implicated to be associated with EOC risk and, to a lesser extent, survival, rs8170, showed the strongest evidence of association with survival and similar effect size estimates across sets (Pmeta=1.1E-6, HRSet1=1.17, HRSet2=1.14). Rare variants in ATG2B, an autophagy gene important for apoptosis, were significantly associated with survival after multiple testing correction (Pmeta=1.1E-6; Pcorrected=0.01). Conclusions Common variant rs8170 and rare variants in ATG2B may be associated with EOC overall survival, although further study is needed. Impact This study represents the first exome-wide association study of EOC survival to include rare variant analyses, and suggests that complementary single variant and gene-level analyses in large studies are needed to identify rare variants that warrant follow-up study. PMID:26747452
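
    The abstract above states that the two patient sets were analyzed separately and then combined by meta-analysis, but it does not say which model was used. One standard choice for pooling hazard ratios is fixed-effect inverse-variance weighting on the log scale; the sketch below assumes that model purely for illustration. The function name is hypothetical, and no standard errors are given in the abstract, so any inputs are placeholders rather than study data.

```python
from math import erfc, sqrt


def fixed_effect_meta(estimates):
    """Pool (log hazard ratio, standard error) pairs by inverse-variance weighting.

    Returns (pooled log HR, pooled standard error, two-sided p-value).
    Assumes a fixed-effect model, which the source abstract does not confirm.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(se <= 0 for _, se in estimates):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return pooled, pooled_se, p_value
```

    Inputs are log hazard ratios, so a reported HR would be passed as `math.log(HR)` together with its standard error; the pooled hazard ratio is `math.exp(pooled)`.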

  10. Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data.

    PubMed

    Tintle, Nathan L; Sitarik, Alexandra; Boerema, Benjamin; Young, Kylie; Best, Aaron A; Dejongh, Matthew

    2012-08-08

    Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.
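
    The abstract above refers to "gene set consistency metrics" without defining them. One simple stand-in is the mean pairwise Pearson correlation of expression profiles among the genes in a set, which is high when the genes move together across arrays. The sketch below illustrates that idea only; it is an assumed metric rather than the one the authors defined, and the data layout (a dict mapping each gene to a list of expression values, one per array) is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(expression, gene_set):
    """Average Pearson correlation over all gene pairs in the set.

    expression: dict of gene -> list of values across arrays (assumed layout).
    Genes missing from `expression` are ignored.
    """
    genes = [g for g in gene_set if g in expression]
    if len(genes) < 2:
        raise ValueError("need at least two measured genes")
    pairs = list(combinations(genes, 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)
```

    Comparing this value for sets drawn from different annotation sources, at matched set sizes, is one way a consistency comparison of this kind could be framed.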

  11. Comparative genomic analysis of SET domain family reveals the origin, expansion, and putative function of the arthropod-specific SmydA genes as histone modifiers in insects.

    PubMed

    Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling; Wang, Xianhui; Kang, Le

    2017-06-01

    The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain-containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. © The Authors 2017. Published by Oxford University Press.

  12. Comparative genomic analysis of SET domain family reveals the origin, expansion, and putative function of the arthropod-specific SmydA genes as histone modifiers in insects

    PubMed Central

    Jiang, Feng; Liu, Qing; Wang, Yanli; Zhang, Jie; Wang, Huimin; Song, Tianqi; Yang, Meiling

    2017-01-01

    Abstract The SET domain is an evolutionarily conserved motif present in histone lysine methyltransferases, which are important in the regulation of chromatin and gene expression in animals. In this study, we searched for SET domain–containing genes (SET genes) in all of the 147 arthropod genomes sequenced at the time of carrying out this experiment to understand the evolutionary history by which SET domains have evolved in insects. Phylogenetic and ancestral state reconstruction analysis revealed an arthropod-specific SET gene family, named SmydA, that is ancestral to arthropod animals and specifically diversified during insect evolution. Considering that pseudogenization is the most probable fate of the new emerging gene copies, we provided experimental and evolutionary evidence to demonstrate their essential functions. Fluorescence in situ hybridization analysis and in vitro methyltransferase activity assays showed that the SmydA-2 gene was transcriptionally active and retained the original histone methylation activity. Expression knockdown by RNA interference significantly increased mortality, implying that the SmydA genes may be essential for insect survival. We further showed predominantly strong purifying selection on the SmydA gene family and a potential association between the regulation of gene expression and insect phenotypic plasticity by transcriptome analysis. Overall, these data suggest that the SmydA gene family retains essential functions that may possibly define novel regulatory pathways in insects. This work provides insights into the roles of lineage-specific domain duplication in insect evolution. PMID:28444351

  13. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  14. Polygenic overlap between schizophrenia risk and antipsychotic response: a genomic medicine approach

    PubMed Central

    Ruderfer, Douglas M; Charney, Alexander W; Readhead, Ben; Kidd, Brian A; Kähler, Anna K; Kenny, Paul J; Keiser, Michael J; Moran, Jennifer L; Hultman, Christina M; Scott, Stuart A; Sullivan, Patrick F; Purcell, Shaun M; Dudley, Joel T; Sklar, Pamela

    2016-01-01

    Summary Background Therapeutic treatments for schizophrenia do not alleviate symptoms for all patients and efficacy is limited by common, often severe, side-effects. Genetic studies of disease can identify novel drug targets, and drugs for which the mechanism has direct genetic support have increased likelihood of clinical success. Large-scale genetic studies of schizophrenia have increased the number of genes and gene sets associated with risk. We aimed to examine the overlap between schizophrenia risk loci and gene targets of a comprehensive set of medications to potentially inform and improve treatment of schizophrenia. Methods We defined schizophrenia risk loci as genomic regions reaching genome-wide significance in the latest Psychiatric Genomics Consortium schizophrenia genome-wide association study (GWAS) of 36 989 cases and 113 075 controls and loss of function variants observed only once among 5079 individuals in an exome-sequencing study of 2536 schizophrenia cases and 2543 controls (Swedish Schizophrenia Study). Using two large and orthogonally created databases, we collated drug targets into 167 gene sets targeted by pharmacologically similar drugs and examined enrichment of schizophrenia risk loci in these sets. We further linked the exome-sequenced data with a national drug registry (the Swedish Prescribed Drug Register) to assess the contribution of rare variants to treatment response, using clozapine prescription as a proxy for treatment resistance. Findings We combined results from testing rare and common variation and, after correction for multiple testing, two gene sets were associated with schizophrenia risk: agents against amoebiasis and other protozoal diseases (106 genes, p=0·00046, pcorrected =0·024) and antipsychotics (347 genes, p=0·00078, pcorrected=0·046). Further analysis pointed to antipsychotics as having independent enrichment after removing genes that overlapped these two target sets. 
We noted significant enrichment both in known targets of antipsychotics (70 genes, p=0·0078) and novel predicted targets (277 genes, p=0·019). Patients with treatment-resistant schizophrenia had an excess of rare disruptive variants in gene targets of antipsychotics (347 genes, p=0·0067) and in genes with evidence for a role in antipsychotic efficacy (91 genes, p=0·0029). Interpretation Our results support genetic overlap between schizophrenia pathogenesis and antipsychotic mechanism of action. This finding is consistent with treatment efficacy being polygenic and suggests that single-target therapeutics might be insufficient. We provide evidence of a role for rare functional variants in antipsychotic treatment response, pointing to a subset of patients where their genetic information could inform treatment. Finally, we present a novel framework for identifying treatments from genetic data and improving our understanding of therapeutic mechanism. PMID:26915512

  15. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting.

    PubMed

    Zhao, Wei; Ware, Erin B; He, Zihuai; Kardia, Sharon L R; Faul, Jessica D; Smith, Jennifer A

    2017-09-29

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) (p = 0.07).

  16. Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting

    PubMed Central

    Zhao, Wei; He, Zihuai; Kardia, Sharon L. R.; Faul, Jessica D.

    2017-01-01

    Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) (p = 0.07). PMID:28961216

  17. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710
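
    The regression structure described for the gene-set layer can be illustrated with a heavily simplified sketch: regress per-gene association Z-scores on a 0/1 indicator of set membership. This is not MAGMA's implementation; it omits covariates such as gene size and any adjustment for correlation between genes, and the function name and toy Z-scores are hypothetical.

```python
from statistics import mean


def set_membership_slope(gene_z, gene_set):
    """OLS slope of gene Z-scores on a 0/1 set-membership indicator.

    With a single binary predictor plus intercept, the slope equals the
    difference in mean Z between genes inside and outside the set.
    The set must contain some, but not all, of the genes.
    """
    x = [1.0 if gene in gene_set else 0.0 for gene in gene_z]
    z = list(gene_z.values())
    x_bar, z_bar = mean(x), mean(z)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxz / sxx


# Hypothetical gene-level Z-scores (higher = stronger association).
gene_z = {"g1": 2.0, "g2": 3.0, "g3": 0.0, "g4": 1.0, "g5": -1.0}
slope = set_membership_slope(gene_z, {"g1", "g2"})
print(slope)  # mean Z in the set (2.5) minus mean Z outside it (0.0)
```

    A positive slope indicates that genes in the set are, on average, more strongly associated than the remaining genes, which is the competitive question a gene-set test of this kind asks.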

  18. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  19. Evaluation of gene expression classification studies: factors associated with classification performance.

    PubMed

    Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C

    2014-01-01

    Classification methods used in microarray studies for gene expression are diverse in the way they deal with the underlying complexity of the data, as well as in the technique used to build the classification model. The MAQC II study on cancer classification problems has found that performance was affected by factors such as the classification algorithm, cross validation method, number of genes, and gene selection method. In this paper, we study the hypothesis that the disease under study significantly determines which method is optimal, and that additionally sample size, class imbalance, type of medical question (diagnostic, prognostic or treatment response), and microarray platform are potentially influential. A systematic literature review was used to extract the information from 48 published articles on non-cancer microarray classification studies. The impact of the various factors on the reported classification accuracy was analyzed through random-intercept logistic regression. The type of medical question and method of cross validation dominated the explained variation in accuracy among studies, followed by disease category and microarray platform. In total, 42% of the between study variation was explained by all the study specific and problem specific factors that we studied together.

  20. Network-based integration of GWAS and gene expression identifies a HOX-centric network associated with serous ovarian cancer risk

    PubMed Central

    Kar, Siddhartha P.; Tyrer, Jonathan P.; Li, Qiyuan; Lawrenson, Kate; Aben, Katja K.H.; Anton-Culver, Hoda; Antonenkova, Natalia; Chenevix-Trench, Georgia; Baker, Helen; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Berchuck, Andrew; Bisogna, Maria; Bjørge, Line; Bogdanova, Natalia; Brinton, Louise; Brooks-Wilson, Angela; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Chen, Yian Ann; Chen, Zhihua; Cook, Linda S.; Cramer, Daniel; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; Dennis, Joe; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; du Bois, Andreas; Dürst, Matthias; Eccles, Diana; Easton, Douglas F.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goode, Ellen L.; Goodman, Marc T.; Grownwald, Jacek; Harrington, Patricia; Harter, Philipp; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Estrid; Hogdall, Claus K.; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Paul, James; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Kjaer, Susanne K.; Kelemen, Linda E.; Kellar, Melissa; Kelley, Joseph; Kiemeney, Lambertus A.; Krakstad, Camilla; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Iain A.; Menon, Usha; Modugno, Francesmary; Moysich, Kirsten B.; Narod, Steven A.; Nedergaard, Lotte; Ness, Roberta B.; Nevanlinna, Heli; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Pearce, Celeste Leigh; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Phelan, Catherine M.; Pike, Malcolm C.; Poole, Elizabeth M.; Ramus, Susan J.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schildkraut, Joellen M.; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Sucheston-Campbell, Lara E.; Tangen, Ingvild L.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J; Timorek, Agnieszka; Tsai, Ya-Yu; Tworoger, Shelley S.; van Altena, Anne M.; Van Nieuwenhuysen, Els; Vergote, Ignace; Vierkant, Robert A.; Wang-Gohrke, Shan; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Woo, Yin-Ling; Wu, Xifeng; Wu, Anna; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Sellers, Thomas A.; Monteiro, Alvaro N. A.; Freedman, Matthew L.; Gayther, Simon A.; Pharoah, Paul D. P.

    2015-01-01

    Background Genome-wide association studies (GWAS) have so far reported 12 loci associated with serous epithelial ovarian cancer (EOC) risk. We hypothesized that some of these loci function through nearby transcription factor (TF) genes and that putative target genes of these TFs as identified by co-expression may also be enriched for additional EOC risk associations. Methods We selected TF genes within 1 Mb of the top signal at the 12 genome-wide significant risk loci. Mutual information, a form of correlation, was used to build networks of genes strongly co-expressed with each selected TF gene in the unified microarray data set of 489 serous EOC tumors from The Cancer Genome Atlas. Genes represented in this data set were subsequently ranked using a gene-level test based on results for germline SNPs from a serous EOC GWAS meta-analysis (2,196 cases/4,396 controls). Results Gene set enrichment analysis identified six networks centered on TF genes (HOXB2, HOXB5, HOXB6, HOXB7 at 17q21.32 and HOXD1, HOXD3 at 2q31) that were significantly enriched for genes from the risk-associated end of the ranked list (P<0.05 and FDR<0.05). These results were replicated (P<0.05) using an independent association study (7,035 cases/21,693 controls). Genes underlying enrichment in the six networks were pooled into a combined network. Conclusion We identified a HOX-centric network associated with serous EOC risk containing several genes with known or emerging roles in serous EOC development. Impact Network analysis integrating large, context-specific data sets has the potential to offer mechanistic insights into cancer susceptibility and prioritize genes for experimental characterization. PMID:26209509
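
    Mutual information between a transcription factor and a candidate target can be sketched for discretized expression values as follows. The abstract does not say which estimator or discretization was used, so the binning into "hi"/"lo" labels, the function name, and the toy data are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Hypothetical binned expression of a TF gene and two other genes
# across four tumors.
tf_gene = ["hi", "hi", "lo", "lo"]
co_expressed = ["hi", "hi", "lo", "lo"]   # tracks the TF exactly
unrelated = ["hi", "lo", "hi", "lo"]      # independent of the TF

mi_co = mutual_information(tf_gene, co_expressed)
mi_un = mutual_information(tf_gene, unrelated)
print(mi_co, mi_un)
```

    In a network-building setting, gene pairs whose mutual information exceeds some chosen threshold would be connected; the threshold itself is not given in the abstract.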

  1. Profiling of Genes Related to Cross Protection and Competition for NbTOM1 by HLSV and TMV

    PubMed Central

    Wen, Yi; Lim, Grace Xiao-Yun; Wong, Sek-Man

    2013-01-01

    Cross protection is the phenomenon through which a mild strain virus suppresses symptoms induced by a closely related severe strain virus in infected plants. Hibiscus latent Singapore virus (HLSV) and Tobacco mosaic virus (TMV) are species within the genus Tobamovirus. HLSV can protect Nicotiana benthamiana against TMV-U1 strain, resulting in mild symptoms instead of severe systemic necrosis. The mechanism of cross protection between HLSV and TMV is unknown. In the past, some researchers suggested that the protecting virus strain might occupy virus-specific replication sites within a cell, leaving no room for the challenge virus. Quantitative real-time RT-PCR was performed to detect viral RNA levels during cross protection. HLSV accumulation increased in cross protected plants compared with that of single HLSV infected plants, while TMV decreased in cross protected plants. This suggests that there is a competition for host factors between HLSV and TMV for replication. To investigate the mechanism underlying cross protection between HLSV and TMV, microarray analysis was conducted to examine the transcriptional levels of global host genes during cross protection, using Tobacco Gene Expression Microarray, 4x44 k slides. The transcriptional level of some host genes corresponded to the accumulation level of TMV. Some host genes were up-regulated only by HLSV. Tobamovirus multiplication gene 1 (TOM1), essential for tobamovirus multiplication, was involved in competition for replication by HLSV and TMV during cross protection. Both HLSV and TMV accumulation decreased when NbTOM1 was silenced. A large quantity of HLSV resulted in decreased TMV accumulation in HLSV+TMV (100:1) co-infection. These results indicate that host genes involved in the plant defense response and virus multiplication are up-regulated by challenge virus TMV but not by protecting virus HLSV during cross protection. PMID:24023899

  2. Filling gaps in PPAR-alpha signaling through comparative nutrigenomics analysis.

    PubMed

    Cavalieri, Duccio; Calura, Enrica; Romualdi, Chiara; Marchi, Emmanuela; Radonjic, Marijana; Van Ommen, Ben; Müller, Michael

    2009-12-11

    The application of high-throughput genomic tools in nutrition research is a widespread practice. However, it is becoming increasingly clear that the outcome of individual expression studies is insufficient for the comprehensive understanding of such a complex field. Currently, the availability of large amounts of expression data in public repositories has opened up new challenges in microarray data analysis. We have focused on PPARalpha, a ligand-activated transcription factor functioning as a fatty acid sensor that regulates the expression of a large set of genes in various metabolic organs such as liver, small intestine or heart. The function of PPARalpha is strictly connected to the function of its target genes and, although many of these have already been identified, major elements of its physiological function remain to be uncovered. To further investigate the function of PPARalpha, we have applied a cross-species meta-analysis approach to integrate sixteen microarray datasets studying high fat diet and PPARalpha signal perturbations in different organisms. We identified 164 genes (MDEGs) that were differentially expressed in a constant way in response to a high fat diet or to perturbations in PPARs signalling. In particular, we found five genes in yeast which were highly conserved and homologous to PPARalpha targets in mammals, potential candidates to be used as models for the equivalent mammalian genes. Moreover, a screening of the MDEGs for all known transcription factor binding sites, and the comparison with a human genome-wide screening of Peroxisome Proliferating Response Elements (PPRE), enabled us to identify 20 new potential candidate genes that show both a binding site and a change in expression in the condition studied. Lastly, we found a non-random localization of the differentially expressed genes in the genome. The results presented are potentially of great interest for summarizing the currently available expression data, exploiting the power of in silico analysis filtered by evolutionary conservation. The analysis enabled us to indicate potential gene candidates that could fill in the gaps with regard to the signalling of PPARalpha and, moreover, the non-random localization of the differentially expressed genes in the genome suggests that epigenetic mechanisms are of importance in the regulation of the transcription operated by PPARalpha.

  3. Prevalence of the Pro12Ala missense mutation in the PPARG2 gene in Kuwaiti patients with primary knee osteoarthritis

    PubMed Central

    Al-Jarallah, Khaled F.; Shehab, Diaa K.; Haider, Mohammad Z.

    2011-01-01

    BACKGROUND AND OBJECTIVES: Peroxisome proliferator–activated receptors (PPARs) play an important role in a number of cellular and metabolic functions. This study was carried out to determine the prevalence of a missense mutation (Pro12Ala) in the PPARG2 gene in Kuwaiti Arab patients with primary knee osteoarthritis (OA) and healthy controls with the aim of identifying a possible association. DESIGN AND SETTING: A prospective cross-sectional study carried out at three major teaching hospitals (referral centers) in the country over a one-year period. PATIENTS AND METHODS: The prevalence of PPARG2 gene Pro12Ala missense mutation was determined in 104 Kuwaiti Arab patients with primary knee OA and 111 ethnically matched healthy controls. The prevalence of this Pro12Ala missense mutation was also determined in clinical subgroups of OA patients divided on the basis of age at onset, function and radiologic grading. RESULTS: The Pro-Pro genotype of the PPARG2 gene Pro12Ala missense mutation was detected in 95/104 (91.3%) cases compared to 111/111 (100%) in the control subjects. The heterozygous Pro-Ala genotype was detected in 9/104 (8.7%) of the OA patients, while it was not detected in any of the controls. The Ala-Ala genotype was not detected in any of the OA patients or the controls. No significant differences were detected in the PPARG2 gene Pro12Ala genotypes in the subgroups of patients classified on the basis of age at onset, functional assessment using Lequesne’s functional index, and radiological grading using Kellgren-Lawrence (K-L) grading. CONCLUSIONS: This study found no significant association between the PPARG2 gene Pro12Ala missense mutation and knee OA. However, the presence of the Pro-Pro genotype of the PPARG2 gene mutation has a protective effect against development of OA. PMID:21245597

  4. A complex of serine protease genes expressed preferentially in cytotoxic T-lymphocytes is closely linked to the T-cell receptor alpha- and delta-chain genes on mouse chromosome 14.

    PubMed

    Crosby, J L; Bleackley, R C; Nadeau, J H

    1990-02-01

    A complex of genes encoding serine proteases that are preferentially expressed in cytotoxic T-cells was shown to be closely linked to the T-cell receptor alpha- and delta-chain genes on mouse chromosome 14. A striking difference in recombination frequencies among linkage crosses was reported. Two genes, Np-1 and Tcra, which fail to recombine in crosses involving conventional strains of mice, were shown to recombine readily in interspecific crosses involving Mus spretus. This difference in recombination frequency suggests chromosomal rearrangements that suppress recombination in conventional crosses, recombination hot spots in interspecific crosses, or selection against recombinant haplotypes during development of recombinant inbred strains. Finally, a mutation called disorganization, which is located near the serine protease complex, is of considerable interest because it causes an extraordinarily wide variety of congenital defects. Because of the involvement of serine protease loci in several homeotic mutations in Drosophila, disorganization must be considered a candidate for a mutation in a serine protease-encoding gene.

  5. Metatranscriptomic Study of Common and Host-Specific Patterns of Gene Expression between Pines and Their Symbiotic Ectomycorrhizal Fungi in the Genus Suillus

    PubMed Central

    Liao, Hui-Ling; Chen, Yuan; Vilgalys, Rytas

    2016-01-01

    Ectomycorrhizal fungi (EMF) represent one of the major guilds of symbiotic fungi associated with roots of forest trees, where they function to improve plant nutrition and fitness in exchange for plant carbon. Many groups of EMF exhibit preference or specificity for different plant host genera; a good example is the genus Suillus, which grows in association with the conifer family Pinaceae. We investigated genetics of EMF host-specificity by cross-inoculating basidiospores of five species of Suillus onto ten species of Pinus, and screened them for their ability to form ectomycorrhizae. Several Suillus spp. including S. granulatus, S. spraguei, and S. americanus readily formed ectomycorrhizae (compatible reaction) with white pine hosts (subgenus Strobus), but were incompatible with other pine hosts (subgenus Pinus). Metatranscriptomic analysis of inoculated roots reveals that plant and fungus each express unique gene sets during incompatible vs. compatible pairings. The Suillus-Pinus metatranscriptomes utilize highly conserved gene regulatory pathways, including fungal G-protein signaling, secretory pathways, leucine-rich repeat and pathogen resistance proteins that are similar to those associated with host-pathogen interactions in other plant-fungal systems. Metatranscriptomic study of the combined Suillus-Pinus transcriptome has provided new insight into mechanisms of adaptation and coevolution of forest trees with their microbial community, and revealed that genetic regulation of ectomycorrhizal symbiosis utilizes universal gene regulatory pathways used by other types of fungal-plant interactions including pathogenic fungal-host interactions. PMID:27736883

  6. Functional and evolutionary insights from the Ciona notochord transcriptome.

    PubMed

    Reeves, Wendy M; Wu, Yuye; Harder, Matthew J; Veeman, Michael T

    2017-09-15

    The notochord of the ascidian Ciona consists of only 40 cells, and is a longstanding model for studying organogenesis in a small, simple embryo. Here, we perform RNAseq on flow-sorted notochord cells from multiple stages to define a comprehensive Ciona notochord transcriptome. We identify 1364 genes with enriched expression and extensively validate the results by in situ hybridization. These genes are highly enriched for Gene Ontology terms related to the extracellular matrix, cell adhesion and cytoskeleton. Orthologs of 112 of the Ciona notochord genes have known notochord expression in vertebrates, more than twice as many as predicted by chance alone. This set of putative effector genes with notochord expression conserved from tunicates to vertebrates will be invaluable for testing hypotheses about notochord evolution. The full set of Ciona notochord genes provides a foundation for systems-level studies of notochord gene regulation and morphogenesis. We find only modest overlap between this set of notochord-enriched transcripts and the genes upregulated by ectopic expression of the key notochord transcription factor Brachyury, indicating that Brachyury is not a notochord master regulator gene as strictly defined. © 2017. Published by The Company of Biologists Ltd.

  7. About miRNAs, miRNA seeds, target genes and target pathways.

    PubMed

    Kehl, Tim; Backes, Christina; Kern, Fabian; Fehlmann, Tobias; Ludwig, Nicole; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas

    2017-12-05

    miRNAs typically repress gene expression by binding to the 3' UTR, leading to degradation of the mRNA. This process is dominated by the eight-base seed region of the miRNA. Further, miRNAs are known not only to target genes but also to target significant parts of pathways. A logical line of thought is that miRNAs with similar (seed) sequences target similar sets of genes and thus similar sets of pathways. By calculating similarity scores for all 3.25 million pairs of 2,550 human miRNAs, we found that this pattern frequently holds, while we also observed exceptions. Corresponding results were obtained for both predicted target genes and experimentally validated targets. We note that miRNA target gene set similarity follows a bimodal distribution, pointing at a set of 282 miRNAs that seems to target genes with very high specificity. Further, we discuss miRNAs with different (seed) sequences that nonetheless regulate similar gene sets or pathways. Most intriguingly, we found miRNA pairs that regulate different gene sets but similar pathways, such as miR-6886-5p and miR-3529-5p. These jointly target different parts of the MAPK signaling cascade. The main goal of this study is to provide a general overview of the results, to highlight a selection of relevant results on miRNAs, miRNA seeds, target genes and target pathways, and to raise awareness of artifacts in such comparisons. The full set of information that allows detailed results to be inferred for each miRNA has been included in miRPathDB, the miRNA target pathway database (https://mpd.bioinf.uni-sb.de).
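
The pairwise comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity score was used, so the Jaccard index below is an assumption, and the miRNA names and target sets are hypothetical placeholders rather than data from the study. The pair count only checks the arithmetic behind "3.25 million pairs of 2,550 human miRNAs".

```python
from itertools import combinations
from math import comb


def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Number of unordered pairs among 2,550 miRNAs (figure taken from the abstract).
N_MIRNAS = 2550
n_pairs = comb(N_MIRNAS, 2)  # 2550 * 2549 / 2

# Hypothetical target gene sets, for illustration only.
targets = {
    "miR-A": {"GENE1", "GENE2", "GENE3"},
    "miR-B": {"GENE2", "GENE3", "GENE4"},
    "miR-C": {"GENE5"},
}

# Similarity score for every unordered pair of miRNAs.
scores = {
    (m1, m2): jaccard(targets[m1], targets[m2])
    for m1, m2 in combinations(sorted(targets), 2)
}

print(n_pairs)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

The same loop applies unchanged if the gene sets are replaced by sets of pathway identifiers, which is how pairs with different targets but similar pathways would show up.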

  8. Human chromosome Y and SRY.

    PubMed

    Shah, V C; Smart, V

    1996-01-01

    The precise location of the SRY gene on the human Y chromosome has been revealed through studies of sex reversal cases involving deletion, cross-linking and mutations of the SRY gene. Its DNA sequence and mechanism of action are being understood. Similarity of SRY with Sry of mice and its interaction with other genes in male sex determination are discussed.

  9. Molecular characterization of the PR-toxin gene cluster in Penicillium roqueforti and Penicillium chrysogenum: cross talk of secondary metabolite pathways.

    PubMed

    Hidalgo, Pedro I; Ullán, Ricardo V; Albillos, Silvia M; Montero, Olimpio; Fernández-Bodega, María Ángeles; García-Estrada, Carlos; Fernández-Aguado, Marta; Martín, Juan-Francisco

    2014-01-01

    The PR-toxin is a potent mycotoxin produced by Penicillium roqueforti in moulded grains and grass silages and may contaminate blue-veined cheese. The PR-toxin derives from the 15-carbon sesquiterpene aristolochene formed by the aristolochene synthase (encoded by ari1). We have cloned and sequenced a four-gene cluster that includes the ari1 gene from P. roqueforti. Gene silencing of each of the four genes (named prx1 to prx4) resulted in a reduction of 65-75% in the production of PR-toxin, indicating that the four genes encode enzymes involved in PR-toxin biosynthesis. Interestingly, the four silenced mutants overproduce large amounts of mycophenolic acid, an antitumor compound formed by an unrelated pathway, suggesting cross-talk between PR-toxin and mycophenolic acid production. An eleven-gene cluster that includes the above-mentioned four prx genes and a 14-TMS drug/H(+) antiporter was found in the genome of Penicillium chrysogenum. This eleven-gene cluster has been reported to be very poorly expressed in a transcriptomic study of P. chrysogenum genes under conditions of penicillin production (strongly aerated cultures). We found that this apparently silent gene cluster is able to produce PR-toxin in P. chrysogenum under static culture conditions on hydrated rice medium. Notably, the production of PR-toxin was 2.6-fold higher in P. chrysogenum npe10, a strain deleted in the 56.8kb amplifiable region containing the pen gene cluster, than in the parental strain Wisconsin 54-1255, providing another example of cross-talk between secondary metabolite pathways in this fungus. A detailed PR-toxin biosynthesis pathway is proposed based on all available evidence. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder: association to overlapping traits in ADHD and autism.

    PubMed

    Naaijen, J; Bralten, J; Poelmans, G; Glennon, J C; Franke, B; Buitelaar, J K

    2017-01-10

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are the main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analyses for ADHD symptom severity (divided into inattention and hyperactivity/impulsivity symptoms), autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed a nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of the single-gene or single-variant associations was significant on its own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD.

  11. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits.

    PubMed

    Rudra, Pratyaydipta; Broadaway, K Alaine; Ware, Erin B; Jhun, Min A; Bielak, Lawrence F; Zhao, Wei; Smith, Jennifer A; Peyser, Patricia A; Kardia, Sharon L R; Epstein, Michael P; Ghosh, Debashis

    2018-06-01

    Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy. © 2018 WILEY PERIODICALS, INC.

  12. Transgenic Suppression of AGAMOUS Genes in Apple Reduces Fertility and Increases Floral Attractiveness

    PubMed Central

    Klocko, Amy L.; Borejsza-Wysocka, Ewa; Brunner, Amy M.; Shevchenko, Olga; Aldwinckle, Herb; Strauss, Steven H.

    2016-01-01

    We investigated the ability of RNA interference (RNAi) directed against two co-orthologs of AGAMOUS (AG) from Malus domestica (domestic apple, MdAG) to reduce the risks of invasiveness and provide genetic containment of transgenes, while also promoting the attractiveness of flowers for ornamental usage. Suppression of two MdAG-like genes, MdMADS15 and MdMADS22, led to the production of trees with highly showy, polypetalous flowers. These “double-flowers” had strongly reduced expression of both MdAG-like genes. Members of the two other clades within the MdAG subfamily showed mild to moderate differences in gene expression, or were unchanged, with the level of suppression approximately proportional to the level of sequence identity between the gene analyzed and the RNAi fragment. The double-flowers also exhibited reduced male and female fertility, had few viable pollen grains, a decreased number of stigmas, and produced few viable seeds after cross-pollination. Despite these floral alterations, RNAi-AG trees with double-flowers set full-sized fruit. Suppression or mutation of apple AG-like genes appears to be a promising method for combining genetic containment with improved floral attractiveness. PMID:27500731

  13. Gene expression in WAT from healthy humans and monkeys correlates with FGF21-induced browning of WAT in mice.

    PubMed

    Schlessinger, Karni; Li, Wenyu; Tan, Yejun; Liu, Franklin; Souza, Sandra C; Tozzo, Effie; Liu, Kevin; Thompson, John R; Wang, Liangsu; Muise, Eric S

    2015-09-01

    Identify a gene expression signature in white adipose tissue (WAT) that reports on WAT browning and is associated with a healthy phenotype. RNA from several different adipose depots across three species was analyzed by whole transcriptome profiling, including 1) mouse subcutaneous white fat, brown fat, and white fat after in vivo treatment with FGF21; 2) human subcutaneous and omental fat from insulin-sensitive and insulin-resistant patients; and 3) rhesus monkey subcutaneous fat from healthy and dysmetabolic individuals. A "browning" signature in mice was identified by cross-referencing the FGF21-induced signature in WAT with the brown adipose tissue (BAT) vs. WAT comparison. In addition, gene expression levels in WAT from insulin-sensitive/healthy vs. insulin-resistant/dysmetabolic humans and rhesus monkeys, respectively, correlated with the gene expression levels in mouse BAT vs. WAT. A subset of 49 genes was identified that was consistently regulated or differentially expressed in the mouse and human data sets and that could be used to monitor browning of WAT across species. Gene expression profiles of WATs from healthy insulin-sensitive individuals correlate with those of BAT and FGF21-induced browning of WAT. © 2015 The Obesity Society.

  14. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as alterations of numerous genes contribute jointly to many complex diseases. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray data set.

  15. GARNET--gene set analysis with exploration of annotation relations.

    PubMed

    Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

    2011-02-15

    Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
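
The abstract states that pairwise relationships between annotation gene sets were calculated using kappa statistics, without further detail. The following is a minimal sketch of one common reading, Cohen's kappa computed over gene membership in a shared universe of genes. The function name, the choice of universe, and the example genes are assumptions made for illustration and are not taken from GARNET.

```python
def gene_set_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement of two gene sets over a gene universe.

    Each gene in the universe is classified as in/out of each set; kappa
    compares observed agreement with the agreement expected by chance.
    """
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    n = len(universe)
    if n == 0:
        raise ValueError("universe must contain at least one gene")

    both = len(a & b)            # genes in both sets
    neither = n - len(a | b)     # genes in neither set
    p_observed = (both + neither) / n

    p_a, p_b = len(a) / n, len(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        # Kappa is undefined when chance agreement is perfect
        # (both sets empty or both equal to the universe); 0.0 is returned here.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical universe of ten genes and two overlapping annotation sets.
universe = [f"g{i}" for i in range(10)]
term_1 = {"g0", "g1", "g2", "g3"}
term_2 = {"g2", "g3", "g4", "g5"}

print(round(gene_set_kappa(term_1, term_2, universe), 4))
```

Kappa near 1 indicates two annotation terms that label almost the same genes, and values near 0 indicate overlap no greater than chance; the result depends on the universe chosen, so that choice should be stated explicitly in any real analysis.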

  16. Validation of reference genes aiming accurate normalization of qRT-PCR data in Dendrocalamus latiflorus Munro.

    PubMed

    Liu, Mingying; Jiang, Jing; Han, Xiaojiao; Qiao, Guirong; Zhuo, Renying

    2014-01-01

    Dendrocalamus latiflorus Munro is widely distributed in subtropical areas and is a valuable natural resource. Transcriptome sequencing of D. latiflorus Munro has been performed and revealed numerous genes, especially those predicted to be unique to D. latiflorus Munro. qRT-PCR has become a feasible approach for gene expression profiling, and the accuracy and reliability of the results obtained depend upon the proper selection of stable reference genes for accurate normalization. Therefore, a set of suitable internal controls should be validated for D. latiflorus Munro. In this report, twelve candidate reference genes were selected and the assessment of gene expression stability was performed in ten tissue samples and four leaf samples from seedlings and anther-regenerated plants of different ploidy. The PCR amplification efficiency was estimated, and the candidate genes were ranked according to their expression stability using three software packages: geNorm, NormFinder and BestKeeper. GAPDH and EF1α were found to be the most stable genes among different tissues or in all the sample pools, while CYP showed low expression stability. RPL3 had the optimal performance among four leaf samples. The application of verified reference genes was illustrated by analyzing ferritin and laccase expression profiles among different experimental sets. The analysis revealed the biological variation in ferritin and laccase transcript expression among the tissues studied and the individual plants. geNorm, NormFinder, and BestKeeper analyses recommended different suitable reference gene(s) for normalization according to the experimental sets. GAPDH and EF1α had the highest expression stability across different tissues and RPL3 for the other sample set. This study emphasizes the importance of validating superior reference genes for qRT-PCR analysis to accurately normalize gene expression of D. latiflorus Munro.
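
Ranking candidate reference genes by expression stability, as this abstract describes, can be illustrated with a minimal sketch of a geNorm-style stability measure M: for each gene, the average over all other candidates of the standard deviation of the log2 expression ratio across samples, with lower M meaning more stable. This is a simplified illustration, not the authors' pipeline or the geNorm software itself. The gene labels and relative quantities are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from math import log2
from statistics import mean, stdev


def genorm_m(quantities):
    """Return a geNorm-style stability value M for each candidate gene.

    `quantities` maps gene name -> list of relative expression quantities
    (linear scale, one value per sample, same sample order for all genes).
    """
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [log2(a / b) for a, b in zip(quantities[j], quantities[k])]
            # Pairwise variation: spread of that ratio across samples.
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three candidates in four samples.
# refA and refB keep a constant ratio; refC varies independently.
quantities = {
    "refA": [1.0, 2.0, 4.0, 8.0],
    "refB": [2.0, 4.0, 8.0, 16.0],
    "refC": [1.0, 8.0, 2.0, 4.0],
}

m = genorm_m(quantities)
for gene in sorted(m, key=m.get):
    print(gene, round(m[gene], 3))
```

In this toy input refA and refB obtain the lowest M because their ratio never changes, while refC is ranked least stable.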

  17. Suppression and restoration of male fertility using a transcription factor.

    PubMed

    Li, Song Feng; Iacuone, Sylvana; Parish, Roger W

    2007-03-01

    The Arabidopsis AtMYB103 gene codes for an R2R3 MYB domain protein whose expression is restricted to the tapetum of developing anthers and to trichomes. Down-regulation of expression using anti-sense leads to abnormal tapetum and pollen development, although seed setting still occurs (Higginson, T., Li, S.F. and Parish, R.W. (2003) AtMYB103 regulates tapetum and trichome development in Arabidopsis thaliana. Plant J. 35, 177-192). In this study, we show that blocking the function of the AtMYB103 gene, employing either an insertion mutant or an AtMYB103EAR chimeric repressor construct under the control of the AtMYB103 promoter, results in complete male sterility and failure to set seed. These plants exhibit similar abnormalities in tapetum and pollen development, with the tapetum becoming highly vacuolated at early stages and degenerating prematurely. No exine is deposited on to the pollen wall. The degeneration of pollen grains commences prior to pollen mitosis, the pollen collapsing and largely lacking cytoplasmic content. A restorer containing the AtMYB103 gene under the control of a stronger anther-specific promoter was introduced into pollen donor plants and crossed into the male sterile plants transgenic for the repressor. The male fertility of F1 plants was restored. The chimeric repressor and the restorer constitute a reversible male sterility system which could be adapted for hybrid seed production. This is the first reversible male sterility system targeting a transcription factor essential for pollen development. Strategies for generating inducible male sterility and maintainable male sterility for the production of hybrid crops are discussed.

  18. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

    PubMed

    Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

    2017-09-04

    Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.

  19. Barrier to gene flow between two ecologically divergent Populus species, P. alba (white poplar) and P. tremula (European aspen): the role of ecology and life history in gene introgression.

    PubMed

    Lexer, C; Fay, M F; Joseph, J A; Nica, M-S; Heinze, B

    2005-04-01

    The renewed interest in the use of hybrid zones for studying speciation calls for the identification and study of hybrid zones across a wide range of organisms, especially in long-lived taxa for which it is often difficult to generate interpopulation variation through controlled crosses. Here, we report on the extent and direction of introgression between two members of the "model tree" genus Populus: Populus alba (white poplar) and Populus tremula (European aspen), across a large zone of sympatry located in the Danube valley. We genotyped 93 hybrid morphotypes and samples from four parental reference populations from within and outside the zone of sympatry for a genome-wide set of 20 nuclear microsatellites and eight plastid DNA restriction site polymorphisms. Our results indicate that introgression occurs preferentially from P. tremula to P. alba via P. tremula pollen. This unidirectional pattern is facilitated by high levels of pollen vs. seed dispersal in P. tremula (pollen/seed flow = 23.9) and by great ecological opportunity in the lowland floodplain forest in proximity to P. alba seed parents, which maintains gene flow in the direction of P. alba despite smaller effective population sizes (N(e)) in this species (P. alba N(e) c. 500-550; P. tremula N(e) c. 550-700). Our results indicate that hybrid zones will be valuable tools for studying the genetic architecture of the barrier to gene flow between these two ecologically divergent Populus species.

  20. Downslope coarsening in aeolian grainflows of the Navajo Sandstone

    NASA Astrophysics Data System (ADS)

    Loope, David B.; Elder, James F.; Sweeney, Mark R.

    2012-07-01

    Downslope coarsening in grainflows has been observed on present-day dunes and generated in labs, but few previous studies have examined vertical sorting in ancient aeolian grainflows. We studied the grainflow strata of the Jurassic Navajo Sandstone in the southern Utah portion of its outcrop belt from Zion National Park (west) to Coyote Buttes and The Dive (east). At each study site, thick sets of grainflow-dominated cross-strata that were deposited by large transverse dunes comprise the bulk of the Navajo Sandstone. We studied three stratigraphic columns, one per site, composed almost exclusively of aeolian cross-strata. For each column, samples were obtained from one grainflow stratum in each consecutive set of the column, for a total of 139 samples from thirty-two sets of cross-strata. To investigate grading perpendicular to bedding within individual grainflows, we collected fourteen samples from four superimposed grainflow strata at The Dive. Samples were analyzed with a Malvern Mastersizer 2000 laser diffraction particle analyser. The median grain size of grainflow samples ranges from fine sand (164 μm) to coarse sand (617 μm). Using Folk and Ward criteria, samples are well-sorted to moderately-well-sorted. All but one of the twenty-eight sets showed at least slight downslope coarsening, but in general, downslope coarsening was not as well-developed or as consistent as that reported in laboratory subaqueous grainflows. Because coarse sand should be quickly sequestered within preserved cross-strata when bedforms climb, grain-size studies may help to test hypotheses for the stacking of sets of cross-strata.

  1. A Survey for Novel Imprinted Genes in the Mouse Placenta by mRNA-seq

    PubMed Central

    Wang, Xu; Soloway, Paul D.; Clark, Andrew G.

    2011-01-01

    Many questions about the regulation, functional specialization, computational prediction, and evolution of genomic imprinting would be better addressed by having an exhaustive genome-wide catalog of genes that display parent-of-origin differential expression. As a first-pass scan for novel imprinted genes, we performed mRNA-seq experiments on embryonic day 17.5 (E17.5) mouse placenta cDNA samples from reciprocal cross F1 progeny of AKR and PWD mouse strains and quantified the allele-specific expression and the degree of parent-of-origin allelic imbalance. We confirmed the imprinting status of 23 known imprinted genes in the placenta and found that 12 genes reported previously to be imprinted in other tissues are also imprinted in mouse placenta. Through a well-replicated design using an orthogonal allelic-expression technology, we verified 5 novel imprinted genes that were not previously known to be imprinted in mouse (Pde10, Phf17, Phactr2, Zfp64, and Htra3). Our data suggest that most of the strongly imprinted genes have already been identified, at least in the placenta, and that evidence supports perhaps 100 additional weakly imprinted genes. Despite previous appearance that the placenta tends to display an excess of maternally expressed imprinted genes, with the addition of our validated set of placenta-imprinted genes, this maternal bias has disappeared. PMID:21705755

  2. Estimation of gene induction enables a relevance-based ranking of gene sets.

    PubMed

    Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens

    2009-07-01

    In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.

  3. Nightlife Violence: A Gender-Specific View on Risk Factors for Violence in Nightlife Settings--A Cross-Sectional Study in Nine European Countries

    ERIC Educational Resources Information Center

    Schnitzer, Susanne; Bellis, Mark A.; Anderson, Zara; Hughes, Karen; Calafat, Amador; Juan, Montse; Kokkevi, Anna

    2010-01-01

    Within nightlife settings, youth violence places large burdens on both nightlife users and wider society. Internationally, research has identified risk factors for nightlife violence. However, few empirical studies have assessed differences in risk factors between genders. Here, a pan-European cross-sectional survey of 1,341 nightlife users aged…

  4. A Cross-Age Study of the Understanding of Three Genetic Concepts: How Do They Image the Gene, DNA and Chromosome?

    ERIC Educational Resources Information Center

    Saka, Arzu; Cerrah, Lale; Akdeniz, Ali Riza; Ayas, Alipasa

    2006-01-01

    The study was carried out with 175 Turkish students of different ages, using drawings to probe their understanding of gene, DNA and chromosome concepts. Students from 8th, 9th and 11th grades, and science and biology student teachers, were simply asked to draw the structure of gene, DNA and chromosome in a cell and also to give explanations about these three…

  5. The Gene Set Builder: collation, curation, and distribution of sets of genes

    PubMed Central

    Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W

    2005-01-01

    Background: In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another; or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description: The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations are stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. Conclusion: The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of genes in a user-friendly environment. The application can be accessed via . PMID:16371163

  6. Comparison of Conventional PCR, Multiplex PCR, and Loop-Mediated Isothermal Amplification Assays for Rapid Detection of Arcobacter Species

    PubMed Central

    Wang, Xiaoyu; Seo, Dong Joo; Lee, Min Hwa

    2014-01-01

    This study aimed to develop a loop-mediated isothermal amplification (LAMP) method for the rapid detection of Arcobacter species. Specific primers targeting the 23S ribosomal RNA gene were used to detect Arcobacter butzleri, Arcobacter cryaerophilus, and Arcobacter skirrowii. The specificity of the LAMP primer set was assessed using DNA samples from a panel of Arcobacter and Campylobacter species, and the sensitivity was determined using serial dilutions of Arcobacter species cultures. LAMP showed a 10- to 1,000-fold-higher sensitivity than multiplex PCR, with a detection limit of 2 to 20 CFU per reaction in vitro. Whereas multiplex PCR showed cross-reactivity with Campylobacter species, the LAMP method developed in this study was more sensitive and reliable than conventional PCR or multiplex PCR for the detection of Arcobacter species. PMID:24478488

  7. Long-distance gene flow and cross-Andean dispersal of lowland rainforest bees (Apidae: Euglossini) revealed by comparative mitochondrial DNA phylogeography.

    PubMed

    Dick, Christopher W; Roubik, David W; Gruber, Karl F; Bermingham, Eldredge

    2004-12-01

    Euglossine bees (Apidae; Euglossini) exclusively pollinate hundreds of orchid species and comprise up to 25% of bee species richness in neotropical rainforests. As one of the first studies of comparative phylogeography in a neotropical insect group, we performed a mitochondrial DNA (mtDNA)-based analysis of 14 euglossine species represented by populations sampled across the Andes and/or across the Amazon basin. The mtDNA divergences within species were consistently low; across the 12 monophyletic species the mean intraspecific divergence among haplotypes was 0.9% (range of means, 0-1.9%). The cytochrome oxidase 1 (CO1) divergence among populations separated by the Andes (N = 11 species) averaged 1.1% (range 0.0-2.0%). The mtDNA CO1 data set displayed homogeneous rates of nucleotide substitution, permitting us to infer dispersal across the cordillera long after the final Andean uplift based on arthropod molecular clocks of 1.2-1.5% divergence per million years. Gene flow across the 3000-km breadth of the Amazon basin was inferred from identical cross-Amazon haplotypes found in five species. Although mtDNA haplotypes for 12 of the 14 euglossine species were monophyletic, a reticulate CO1 phylogeny was recovered in Euglossa cognata and E. mixta, suggesting large ancestral populations and recent speciation. Reference to closely related outgroups suggested recent speciation for the majority of species. Phylogeographical structure across a broad spatial scale is weaker in euglossine bees than in any neotropical group previously examined, and may derive from a combination of Quaternary speciation, population expansion and/or long-distance gene flow.
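
    The dating argument in this abstract is simple arithmetic: elapsed time is divergence divided by clock rate. The sketch below applies the two figures quoted above (mean cross-Andean CO1 divergence of 1.1%, clock of 1.2-1.5% per million years). It assumes a strict clock and that the quoted rate applies directly to pairwise divergence; the function and constant names are illustrative and not taken from the paper.

```python
# Back-of-envelope molecular-clock dating: time = divergence / rate.
# Assumes a strict clock and a rate expressed as pairwise divergence per Myr.

def divergence_time_myr(divergence_pct, rate_pct_per_myr):
    """Return the elapsed time (million years) implied by a pairwise divergence."""
    if rate_pct_per_myr <= 0:
        raise ValueError("rate must be positive")
    return divergence_pct / rate_pct_per_myr

MEAN_CROSS_ANDEAN_CO1 = 1.1   # percent, as quoted in the abstract
CLOCK_RATES = (1.2, 1.5)      # percent per million years, as quoted

# The slowest clock gives the oldest estimate, the fastest the youngest.
oldest = divergence_time_myr(MEAN_CROSS_ANDEAN_CO1, min(CLOCK_RATES))
youngest = divergence_time_myr(MEAN_CROSS_ANDEAN_CO1, max(CLOCK_RATES))
print(f"{youngest:.2f}-{oldest:.2f} Myr")
```

    Under these assumptions the mean divergence corresponds to somewhat less than one million years, which is the sense in which the abstract places dispersal long after the final Andean uplift.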

  8. Sexual Polyploidization in Medicago sativa L.: Impact on the Phenotype, Gene Transcription, and Genome Methylation

    PubMed Central

    Rosellini, Daniele; Ferradini, Nicoletta; Allegrucci, Stefano; Capomaccio, Stefano; Zago, Elisa Debora; Leonetti, Paola; Balech, Bachir; Aversano, Riccardo; Carputo, Domenico; Reale, Lara; Veronesi, Fabio

    2016-01-01

    Polyploidization as the consequence of 2n gamete formation is a prominent mechanism in plant evolution. Studying its effects on the genome, and on genome expression, has both basic and applied interest. We crossed two diploid (2n = 2x = 16) Medicago sativa plants, a subsp. falcata seed parent, and a coerulea × falcata pollen parent that form a mixture of n and 2n eggs and pollen, respectively. Such a cross produced full-sib diploid and tetraploid (2n = 4x = 32) hybrids, the latter being the result of bilateral sexual polyploidization (BSP). These unique materials allowed us to investigate the effects of BSP, and to separate the effect of intraspecific hybridization from those of polyploidization by comparing 2x with 4x full sib progeny plants. Simple sequence repeat marker segregation demonstrated tetrasomic inheritance for all chromosomes but one, demonstrating that these neotetraploids are true autotetraploids. BSP brought about increased biomass, earlier flowering, higher seed set and weight, and larger leaves with larger cells. Microarray analyses with M. truncatula gene chips showed that several hundred genes, related to diverse metabolic functions, changed their expression level as a consequence of polyploidization. In addition, cytosine methylation increased in 2x, but not in 4x, hybrids. Our results indicate that sexual polyploidization induces significant transcriptional novelty, possibly mediated in part by DNA methylation, and phenotypic novelty that could underpin improved adaptation and reproductive success of tetraploid M. sativa with respect to its diploid progenitor. These polyploidy-induced changes may have promoted the adoption of tetraploid alfalfa in agriculture. PMID:26858330

  9. Sexual Polyploidization in Medicago sativa L.: Impact on the Phenotype, Gene Transcription, and Genome Methylation.

    PubMed

    Rosellini, Daniele; Ferradini, Nicoletta; Allegrucci, Stefano; Capomaccio, Stefano; Zago, Elisa Debora; Leonetti, Paola; Balech, Bachir; Aversano, Riccardo; Carputo, Domenico; Reale, Lara; Veronesi, Fabio

    2016-04-07

    Polyploidization as the consequence of 2n gamete formation is a prominent mechanism in plant evolution. Studying its effects on the genome, and on genome expression, has both basic and applied interest. We crossed two diploid (2n = 2x = 16) Medicago sativa plants, a subsp. falcata seed parent, and a coerulea × falcata pollen parent that form a mixture of n and 2n eggs and pollen, respectively. Such a cross produced full-sib diploid and tetraploid (2n = 4x = 32) hybrids, the latter being the result of bilateral sexual polyploidization (BSP). These unique materials allowed us to investigate the effects of BSP, and to separate the effect of intraspecific hybridization from those of polyploidization by comparing 2x with 4x full sib progeny plants. Simple sequence repeat marker segregation demonstrated tetrasomic inheritance for all chromosomes but one, demonstrating that these neotetraploids are true autotetraploids. BSP brought about increased biomass, earlier flowering, higher seed set and weight, and larger leaves with larger cells. Microarray analyses with M. truncatula gene chips showed that several hundred genes, related to diverse metabolic functions, changed their expression level as a consequence of polyploidization. In addition, cytosine methylation increased in 2x, but not in 4x, hybrids. Our results indicate that sexual polyploidization induces significant transcriptional novelty, possibly mediated in part by DNA methylation, and phenotypic novelty that could underpin improved adaptation and reproductive success of tetraploid M. sativa with respect to its diploid progenitor. These polyploidy-induced changes may have promoted the adoption of tetraploid alfalfa in agriculture.

  10. Modeling Host Genetic Regulation of Influenza Pathogenesis in the Collaborative Cross

    PubMed Central

    Ferris, Martin T.; Aylor, David L.; Bottomly, Daniel; Whitmore, Alan C.; Aicher, Lauri D.; Bell, Timothy A.; Bradel-Tretheway, Birgit; Bryan, Janine T.; Buus, Ryan J.; Gralinski, Lisa E.; Haagmans, Bart L.; McMillan, Leonard; Miller, Darla R.; Rosenzweig, Elizabeth; Valdar, William; Wang, Jeremy; Churchill, Gary A.; Threadgill, David W.; McWeeney, Shannon K.; Katze, Michael G.; Pardo-Manuel de Villena, Fernando; Baric, Ralph S.; Heise, Mark T.

    2013-01-01

    Genetic variation contributes to host responses and outcomes following infection by influenza A virus or other viral infections. Yet narrow windows of disease symptoms and confounding environmental factors have made it difficult to identify polymorphic genes that contribute to differential disease outcomes in human populations. Therefore, to control for these confounding environmental variables in a system that models the levels of genetic diversity found in outbred populations such as humans, we used incipient lines of the highly genetically diverse Collaborative Cross (CC) recombinant inbred (RI) panel (the pre-CC population) to study how genetic variation impacts influenza associated disease across a genetically diverse population. A wide range of variation in influenza disease related phenotypes including virus replication, virus-induced inflammation, and weight loss was observed. Many of the disease associated phenotypes were correlated, with viral replication and virus-induced inflammation being predictors of virus-induced weight loss. Despite these correlations, pre-CC mice with unique and novel disease phenotype combinations were observed. We also identified sets of transcripts (modules) that were correlated with aspects of disease. In order to identify how host genetic polymorphisms contribute to the observed variation in disease, we conducted quantitative trait loci (QTL) mapping. We identified several QTL contributing to specific aspects of the host response including virus-induced weight loss, titer, pulmonary edema, neutrophil recruitment to the airways, and transcriptional expression. Existing whole-genome sequence data was applied to identify high priority candidate genes within QTL regions. A key host response QTL was located at the site of the known anti-influenza Mx1 gene. 
We sequenced the coding regions of Mx1 in the eight CC founder strains, and identified a novel Mx1 allele that showed reduced ability to inhibit viral replication, while maintaining protection from weight loss. PMID:23468633

  11. Massive increase, spread, and exchange of extended spectrum β-lactamase-encoding genes among intestinal Enterobacteriaceae in hospitalized children with severe acute malnutrition in Niger.

    PubMed

    Woerther, Paul-Louis; Angebault, Cécile; Jacquier, Hervé; Hugede, Henri-Charles; Janssens, Ann-Carole; Sayadi, Sani; El Mniai, Assiya; Armand-Lefèvre, Laurence; Ruppé, Etienne; Barbier, François; Raskine, Laurent; Page, Anne-Laure; de Rekeneire, Nathalie; Andremont, Antoine

    2011-10-01

    From the time of CTX-M emergence, extended-spectrum β-lactamase-producing enterobacteria (ESBL-E) have spread worldwide in community settings as well as in hospitals, particularly in developing countries. Although their dissemination appears linked to Escherichia coli intestinal carriage, precise paths of this dynamic are largely unknown. Children from a pediatric renutrition center were prospectively enrolled in a fecal carriage study. Antibiotic exposure was recorded. ESBL-E strains were isolated using selective media from fecal samples obtained at admission and, when negative, also at discharge. ESBL-encoding genes were identified, their environments and plasmids were characterized, and clonality was assessed with polymerase chain reaction-based methods and pulsed-field gel electrophoresis for E. coli and Klebsiella pneumoniae. E. coli strains were subjected to multilocus sequence typing. The ESBL-E carriage rate was 31% at admission in the 55 children enrolled. All children enrolled received antibiotics during hospitalization. Among the ESBL-E-negative children, 16 were resampled at discharge, and the acquisition rate was 94%. The bla(CTX-M-15) gene was found in >90% of the carriers. Genetic environments and plasmid characterization evidenced the roles of a worldwide, previously described, multidrug-resistant region and of IncF plasmids in CTX-M-15 E. coli dissemination. Diversity of CTX-M-15-carrying genetic structures and clonality of acquired ESBL E. coli suggested horizontal genetic transfer and underlined the potential of some ST types for nosocomial cross-transmission. Cross-transmission and high selective pressure lead to very high acquisition of ESBL-E carriage, contributing to dissemination in the community. Strict hygiene measures as well as careful balancing of benefit-risk ratio of current antibiotic policies need to be reevaluated.

  12. EqualTDRL: illustrating equivalent tandem duplication random loss rearrangements.

    PubMed

    Hartmann, Tom; Bernt, Matthias; Middendorf, Martin

    2018-05-30

    To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement, it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that, for two given circular gene orders, there can exist different TDRL rearrangements that transform one of the gene orders into the other. Hence, a TDRL event cannot always be reconstructed from knowledge of the circular gene orders before and after the event alone. We present the program EqualTDRL, which computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples of the latter are sequences of genes that have to be conserved during a TDRL, or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL can determine the set of TDRLs that are minimal with respect to the number of duplicated genes. EqualTDRL helps scientists study the complete set of TDRLs that could possibly have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++ using the ggplot2 package of the open source programming language R and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl .
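
    To make the ambiguity described above concrete, here is a minimal sketch of a single TDRL on a gene order. It is not EqualTDRL's algorithm: it treats the genome as linear rather than circular, ignores the conservation and intergenic-region restrictions, and the function name `apply_tdrl` and its parameters are hypothetical. A contiguous segment is duplicated in tandem and one copy of each duplicated gene is lost, so the surviving first-copy genes end up in front of the surviving second-copy genes.

```python
def apply_tdrl(order, start, end, keep_first):
    """Apply one tandem duplication random loss (TDRL) to order[start:end].

    The segment is duplicated in tandem; genes in `keep_first` keep their
    first copy, all other genes of the segment keep their second copy.
    Simplification: the gene order is treated as linear, not circular.
    """
    segment = order[start:end]
    unknown = set(keep_first) - set(segment)
    if unknown:
        raise ValueError(f"genes not in duplicated segment: {sorted(unknown)}")
    first_copy = [g for g in segment if g in keep_first]
    second_copy = [g for g in segment if g not in keep_first]
    return order[:start] + first_copy + second_copy + order[end:]


genome = ["A", "B", "C", "D", "E"]

# Three different TDRLs (different duplicated segments) ...
via_bc = apply_tdrl(genome, 1, 3, {"C"})        # duplicates B C
via_abc = apply_tdrl(genome, 0, 3, {"A", "C"})  # duplicates A B C
via_bcd = apply_tdrl(genome, 1, 4, {"C"})       # duplicates B C D

# ... all produce the same rearranged order A C B D E.
print(via_bc, via_abc, via_bcd)
```

    The first event duplicates two genes and the other two duplicate three, which illustrates why one might ask for the TDRLs that are minimal in the number of duplicated genes.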

  13. Are duplicated genes responsible for anthracnose resistance in common bean?

    PubMed

    Costa, Larissa Carvalho; Nalin, Rafael Storto; Ramalho, Magno Antonio Patto; de Souza, Elaine Aparecida

    2017-01-01

    Race 65 of Colletotrichum lindemuthianum, the etiologic agent of anthracnose in common bean, is distributed worldwide and is of great importance in breeding programs for anthracnose resistance. Several alleles conferring resistance to this race have been identified. However, the variability detected within the race has made it difficult to obtain cultivars with durable resistance, because cultivars may react differently to each strain of race 65. Thus, this work aimed to study the inheritance of resistance of common bean lines to different strains of C. lindemuthianum race 65. We used six C. lindemuthianum strains previously characterized as belonging to race 65 through the international set of anthracnose differential cultivars, and nine commercial cultivars adapted to Brazilian growing conditions and with the potential to discriminate the variability within this race. To obtain information on the inheritance of resistance of the nine commercial cultivars to the six strains of race 65, the cultivars were crossed two by two in all possible combinations, resulting in 36 hybrids. Segregation in the F2 generations revealed that resistance to each strain is conditioned by two independent genes with the same function, suggesting that they are duplicated genes in which the dominant allele promotes resistance. These results indicate that the specificity between host resistance genes and pathogen avirulence genes is not limited to races; it also occurs among strains of the same race. Further research may be carried out to establish whether the alleles identified in these cultivars are different from those described in the literature.
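
    Two numbers in this abstract can be checked by enumeration. Nine cultivars crossed in all pairwise combinations give C(9, 2) = 36 hybrids, as stated. For the inheritance model, the sketch below assumes an F1 that is heterozygous at both loci (written AaBb here, a hypothetical labelling) and that a single dominant allele at either locus is sufficient for resistance. Under those assumptions the classical expectation in the F2 is 15 resistant to 1 susceptible; the abstract itself does not report the observed ratios.

```python
from itertools import product
from math import comb

# Nine cultivars crossed two by two in all possible combinations.
n_hybrids = comb(9, 2)

def f2_duplicate_gene_ratio():
    """Count resistant vs susceptible F2 classes for an AaBb x AaBb cross.

    Assumption: a dominant allele (A or B) at either of two independent
    loci confers resistance, i.e. duplicate dominant gene action.
    """
    gametes = list(product("Aa", "Bb"))        # AB, Ab, aB, ab (equally likely)
    resistant = susceptible = 0
    for egg, pollen in product(gametes, repeat=2):
        alleles = egg + pollen                 # four alleles of the zygote
        if "A" in alleles or "B" in alleles:
            resistant += 1
        else:
            susceptible += 1
    return resistant, susceptible

print(n_hybrids, f2_duplicate_gene_ratio())
```

    Only the double recessive class (aabb) is susceptible, so one of the 16 equally likely gamete combinations falls in that class.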

  14. Co-regulation analysis of co-expressed modules under cold and pathogen stress conditions in tomato.

    PubMed

    Abedini, Davar; Rashidi Monfared, Sajad

    2018-06-01

    A primary mechanism for controlling the development of multicellular organisms is transcriptional regulation, which is carried out by transcription factors (TFs) that recognize and bind to their binding sites in the promoter region. The distance from the translation start site and the order, orientation, and spacing of cis elements are key factors in the concentration of active nuclear TFs and the transcriptional regulation of target genes. In this study, overrepresented motifs in cold and pathogenesis responsive genes were scanned using the Gibbs sampling method, which detects overrepresented motifs by means of a stochastic optimization strategy that searches over all possible sets of short DNA segments. The identified motifs were then checked against the TRANSFAC, PLACE and Softberry databases in order to identify putative TFs that interact with the motifs. Several cis/trans regulatory elements were found using these databases. Moreover, cross-talk between cold and pathogenesis responsive genes was confirmed. Statistical analysis was used to determine the distribution of the identified motifs across the promoter region. In addition, the co-regulation analysis showed that genes in the pathogenesis responsive module are divided into two main groups. The promoter region was also divided into six subareas in order to describe the pattern of motif distribution among promoter subareas. The results showed that the majority of motifs are concentrated within 700 nucleotides upstream of the translational start site (ATG). In contrast, this was not the case in another group. In other words, there was no difference between total and compartmentalized regions in cold responsive genes.

  15. Expression of PAM50 Genes in Lung Cancer: Evidence that Interactions between Hormone Receptors and HER2/HER3 Contribute to Poor Outcome.

    PubMed

    Siegfried, Jill M; Lin, Yan; Diergaarde, Brenda; Lin, Hui-Min; Dacic, Sanja; Pennathur, Arjun; Weissfeld, Joel L; Romkes, Marjorie; Nukui, Tomoko; Stabile, Laura P

    2015-11-01

    Non-small cell lung cancers (NSCLCs) frequently express estrogen receptor (ER) β, and estrogen signaling is active in many lung tumors. We investigated the ability of genes contained in the prediction analysis of microarray 50 (PAM50) breast cancer risk predictor gene signature to provide prognostic information in NSCLC. Supervised principal component analysis of mRNA expression data was used to evaluate the ability of the PAM50 panel to provide prognostic information in a stage I NSCLC cohort, in an all-stage NSCLC cohort, and in The Cancer Genome Atlas data. Immunohistochemistry was used to determine status of ERβ and other proteins in lung tumor tissue. Associations with prognosis were observed in the stage I cohort. Cross-validation identified seven genes that, when analyzed together, consistently showed survival associations. In pathway analysis, the seven-gene panel described one network containing the ER and progesterone receptor, as well as human epidermal growth factor receptor (HER)2/HER3 and neuregulin-1. NSCLC cases also showed a significant association between ERβ and HER2 protein expression. Cases positive for HER2 expression were more likely to express HER3, and ERβ-positive cases were less likely to be both HER2 and HER3 negative. Prognostic ability of genes in the PAM50 panel was verified in an ERβ-positive cohort representing all NSCLC stages. In The Cancer Genome Atlas data sets, the PAM50 gene set was prognostic in both adenocarcinoma and squamous cell carcinoma, whereas the seven-gene panel was prognostic only in squamous cell carcinoma. Genes in the PAM50 panel, including those linking ER and HER2, identify lung cancer patients at risk for poor outcome, especially among ERβ-positive cases and squamous cell carcinoma.

  16. Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.

    PubMed

    Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi

    2015-04-22

    Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.

  17. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
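
    The combination step named in this abstract, the weighted Z-method, can be sketched in a few lines. The code below is a generic weighted Stouffer combination of one-sided p-values, not the SGSE implementation: the function name is hypothetical, and the weights are simply passed in, because the abstract says only that they are PC variances scaled by Tracy-Widom test p-values without giving the exact scaling.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z (Stouffer) method.

    Each p-value is mapped to a z-score, the z-scores are averaged with the
    given non-negative weights, and the result is mapped back to a p-value.
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need equally many p-values and weights")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores))
    combined_z /= sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(combined_z)

# A PC with zero weight (for example, one judged to be noise) has no
# influence on the combined p-value.
print(weighted_z_combine([0.01, 0.9], [1.0, 0.0]))
```

    Down-weighting PCs in this way is how a combination of this kind can ignore components that most likely represent noise, which is the property the abstract emphasizes.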

  18. Prediction of gene expression in embryonic structures of Drosophila melanogaster.

    PubMed

    Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

    2007-07-01

    Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.

  19. Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster

    PubMed Central

    Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

    2007-01-01

    Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945

  20. SiBIC: a web server for generating gene set networks based on biclusters obtained by maximal frequent itemset mining.

    PubMed

    Takahashi, Kei-ichiro; Takigawa, Ichigaku; Mamitsuka, Hiroshi

    2013-01-01

    Detecting biclusters from expression data is useful, since biclusters are sets of genes coexpressed under only part of all given experimental conditions. We present software called SiBIC which, from a given expression dataset, first exhaustively enumerates biclusters, then merges them into rather independent biclusters, and finally uses these to generate gene set networks, in which the gene set assigned to each node consists of coexpressed genes. We evaluated each step of this procedure: 1) the biological and statistical significance of the generated biclusters, 2) the biological quality of the merged biclusters, and 3) the biological significance of the gene set networks. We emphasize that gene set networks, in which nodes are not genes but gene sets, can be more compact than usual gene networks, meaning that gene set networks are more comprehensible. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/miami/faces/index.jsp.

  1. Use of homologous and heterologous gene expression profiling tools to characterize transcription dynamics during apple fruit maturation and ripening

    PubMed Central

    2010-01-01

    Background Fruit development, maturation and ripening consist of a complex series of biochemical and physiological changes that in climacteric fruits, including apple and tomato, are coordinated by the gaseous hormone ethylene. These changes lead to final fruit quality, and understanding of the functional machinery underlying these processes is of both biological and practical importance. To date many reports have been made on the analysis of gene expression in apple. In this study we focused our investigation on the role of ethylene during apple maturation, specifically comparing transcriptomics of normal ripening with changes resulting from application of the hormone receptor competitor 1-Methylcyclopropene. Results To gain insight into the molecular process regulating ripening in apple, and to compare to tomato (model species for ripening studies), we utilized both homologous and heterologous (tomato) microarrays to profile transcriptome dynamics of genes involved in fruit development and ripening, emphasizing those which are ethylene regulated. The use of both types of microarrays facilitated transcriptome comparison between apple and tomato (for the latter using data previously published and available at the TED: tomato expression database) and highlighted genes conserved during ripening of both species, which in turn represent a foundation for further comparative genomic studies. The cross-species analysis had the secondary aim of examining the efficiency of heterologous (specifically tomato) microarray hybridization for candidate gene identification as related to the ripening process. The resulting transcriptomics data revealed coordinated gene expression during fruit ripening of a subset of ripening-related and ethylene responsive genes, further facilitating the analysis of ethylene response during fruit maturation and ripening. 
Conclusion Our combined strategy based on microarray hybridization enabled transcriptome characterization during normal climacteric apple ripening, as well as definition of ethylene-dependent transcriptome changes. Comparison with tomato fruit maturation and ethylene responsive transcriptome activity facilitated identification of putative conserved orthologous ripening-related genes, which serve as an initial set of candidates for assessing conservation of gene activity across genomes of fruit bearing plant species. PMID:20973957

  2. Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions.

    PubMed

    Singh, Anuradha; Mantri, Shrikant; Sharma, Monica; Chaudhury, Ashok; Tuli, Rakesh; Roy, Joy

    2014-01-16

    The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality and can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand for better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expression changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing in chapatti quality. It is also important to understand the temporal and spatial distributions of their expression for designing tissue and growth specific functional genomics experiments. Gene-specific two-way ANOVA of the expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many key genes related to processing quality, such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, and proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). The majority of the probe sets showed differential expression at an early stage of seed development, i.e., temporal expression. Meta-analysis revealed that the majority of the genes were expressed in one or a few growth stages, indicating spatial distribution of their expression. The differential expression of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin was validated by RT-PCR. 
Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions. The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers.

  3. Genome-wide transcriptome study in wheat identified candidate genes related to processing quality, majority of them showing interaction (quality x development) and having temporal and spatial distributions

    PubMed Central

    2014-01-01

    Background The cultivated bread wheat (Triticum aestivum L.) possesses unique flour quality, which can be processed into many end-use food products such as bread, pasta, chapatti (unleavened flat bread), biscuit, etc. The present wheat varieties require improvement in processing quality to meet the increasing demand of better quality food products. However, processing quality is very complex and controlled by many genes, which have not been completely explored. To identify the candidate genes whose expressions changed due to variation in processing quality and interaction (quality x development), genome-wide transcriptome studies were performed in two sets of diverse Indian wheat varieties differing for chapatti quality. It is also important to understand the temporal and spatial distributions of their expressions for designing tissue and growth specific functional genomics experiments. Results Gene-specific two-way ANOVA analysis of expression of about 55 K transcripts in two diverse sets of Indian wheat varieties for chapatti quality at three seed developmental stages identified 236 differentially expressed probe sets (10-fold). Out of 236, 110 probe sets were identified for chapatti quality. Many processing quality related key genes such as glutenin and gliadins, puroindolines, grain softness protein, alpha and beta amylases, proteases, were identified, and many other candidate genes related to cellular and molecular functions were also identified. The ANOVA analysis revealed that the expression of 56 of 110 probe sets was involved in interaction (quality x development). Majority of the probe sets showed differential expression at early stage of seed development i.e. temporal expression. Meta-analysis revealed that the majority of the genes expressed in one or a few growth stages indicating spatial distribution of their expressions. 
The differential expressions of a few candidate genes such as pre-alpha/beta-gliadin and gamma gliadin were validated by RT-PCR. Therefore, this study identified several quality related key genes including many other genes, their interactions (quality x development) and temporal and spatial distributions. Conclusions The candidate genes identified for processing quality and information on temporal and spatial distributions of their expressions would be useful for designing wheat improvement programs for processing quality either by changing their expression or development of single nucleotide polymorphisms (SNPs) markers. PMID:24433256

  4. A Self-Assembled Coumarin-Anchored Dendrimer for Efficient Gene Delivery and Light-Responsive Drug Delivery.

    PubMed

    Wang, Hui; Miao, Wujun; Wang, Fei; Cheng, Yiyun

    2018-06-11

    The assembly of low molecular weight polymers into highly efficient and nontoxic nanostructures has broad applicability in gene delivery. In this study, we reported the assembly of coumarin-anchored low generation dendrimers in aqueous solution via hydrophobic interactions. The synthesized material showed significantly improved DNA binding and gene delivery, and minimal toxicity on the transfected cells. Moreover, the coumarin moieties in the assembled nanostructures endow the materials with light-responsive drug delivery behaviors. The coumarin substitutes in the assembled nanostructures were cross-linked with each other upon irradiation at 365 nm, and the cross-linked assemblies were degraded upon further irradiation at 254 nm. As a result, the drug-loaded nanoparticle showed a light-responsive drug release behavior and light-enhanced anticancer activity. The assembled nanoparticle also exhibited a complementary anticancer activity through the codelivery of 5-fluorouracil and a therapeutic gene encoding tumor necrosis factor-related apoptosis-inducing ligand (TRAIL). This study provided a facile strategy to develop light-responsive polymers for the codelivery of therapeutic genes and anticancer drugs.

  5. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    PubMed

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.

  6. Genetic and environmental factors affecting allergen-related gene expression in apple fruit (Malus domestica L. Borkh).

    PubMed

    Botton, Alessandro; Lezzer, Paolo; Dorigoni, Alberto; Barcaccia, Gianni; Ruperti, Benedetto; Ramina, Angelo

    2008-08-13

    Freshly consumed apples can cause allergic reactions because of the presence of four classes of allergens, namely, Mal d 1, Mal d 2, Mal d 3, and Mal d 4, and their cross-reactivity with sensitizing allergens of other species. Knowledge of environmental and endogenous factors affecting the allergenic potential of apples would provide important information to apple breeders, growers, and consumers for the selection of hypoallergenic genotypes, the adoption of agronomical practices decreasing the allergenic potential, and the consumption of fruits with reduced amount of allergens. In the present research, expression studies were performed by means of real-time PCR for all the known allergen-encoding genes in apple. Fruit samples were collected from 15 apple varieties and from fruits of three different trials, set up to assess the effect of shadowing, elevation, storage, and water stress on the expression of allergen genes. Principal components analysis (PCA) was performed for the classification of varieties according to gene expression values, pointing out that the cultivars Fuji and Brina were two good hypoallergenic candidates. Shadowing, elevation, and storage significantly affected the transcription of the allergen-encoding genes, whereas water stress slightly influenced the expression of only two genes, in spite of the dramatic effect on both fruit size and vegetative growth of the trees. In particular, shadowing may represent an important cultural practice aimed at reducing apple cortex allergenicity. Moreover, elevation and storage may be combined to reduce the allergenic potential of apple fruits. The possible implications of the results for breeders, growers, and consumers are discussed critically.

  7. Searching for the molecular benchmark of physiological intestinal anastomotic healing in rats: an experimental study.

    PubMed

    Seifert, Gabriel J; Seifert, Michael; Kulemann, Birte; Holzner, Philipp A; Glatz, Torben; Timme, Sylvia; Sick, Olivia; Höppner, Jens; Hopt, Ulrich T; Marjanovic, Goran

    2014-01-01

    This investigation focuses on the physiological characteristics of gene transcription of intestinal tissue following anastomosis formation. In eight rats, end-to-end ileo-ileal anastomoses were performed (n = 2/group). The healthy intestinal tissue resected for this operation was used as a control. On days 0, 2, 4 and 8, 10-mm perianastomotic segments were resected. Control and perianastomotic segments were examined with an Affymetrix microarray chip to assess changes in gene regulation. Microarray findings were validated using real-time PCR for selected genes. In addition to screening global gene expression, we identified genes intensely regulated during healing and also subjected our data sets to an overrepresentation analysis using the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Compared to the control group, we observed that the number of differentially regulated genes peaked on day 2 with a total of 2,238 genes, decreasing by day 4 to 1,687 genes and to 1,407 genes by day 8. PCR validation for matrix metalloproteinases-3 and -13 showed not only identical transcription patterns but also analogous regulation intensity. When setting the cutoff of upregulation at 10-fold to identify genes likely to be relevant, the total gene count was significantly lower with 55, 45 and 37 genes on days 2, 4 and 8, respectively. A total of 947 GO subcategories were significantly overrepresented during anastomotic healing. Furthermore, 23 overrepresented KEGG pathways were identified. This study is the first of its kind that focuses explicitly on gene transcription during intestinal anastomotic healing under standardized conditions. Our work sets a foundation for further studies toward a more profound understanding of the physiology of anastomotic healing.

  8. SZGR 2.0: a one-stop shop of schizophrenia candidate genes.

    PubMed

    Jia, Peilin; Han, Guangchun; Zhao, Junfei; Lu, Pinyi; Zhao, Zhongming

    2017-01-04

    SZGR 2.0 is a comprehensive resource of candidate variants and genes for schizophrenia, covering genetic, epigenetic, transcriptomic, translational and many other types of evidence. By systematic review and curation of multiple lines of evidence, we included almost all variants and genes that have ever been reported to be associated with schizophrenia. In particular, we collected ∼4200 common variants reported in genome-wide association studies, ∼1000 de novo mutations discovered by large-scale sequencing of family samples, 215 genes spanning rare and replication copy number variations, 99 genes overlapping with linkage regions, 240 differentially expressed genes, 4651 differentially methylated genes and 49 genes as antipsychotic drug targets. To facilitate interpretation, we included various functional annotation data, especially brain eQTL, methylation QTL, brain expression featured in deep categorization of brain areas and developmental stages and brain-specific promoter and enhancer annotations. Furthermore, we conducted cross-study, cross-data type and integrative analyses of the multidimensional data deposited in SZGR 2.0, and made the data and results available through a user-friendly interface. In summary, SZGR 2.0 provides a one-stop shop of schizophrenia variants and genes and their function and regulation, providing an important resource in the schizophrenia and other mental disease community. SZGR 2.0 is available at https://bioinfo.uth.edu/SZGR/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Molecular Mapping of Flowering Time Major Genes and QTLs in Chickpea (Cicer arietinum L.)

    PubMed Central

    Mallikarjuna, Bingi P.; Samineni, Srinivasan; Thudi, Mahendar; Sajja, Sobhan B.; Khan, Aamir W.; Patil, Ayyanagowda; Viswanatha, Kannalli P.; Varshney, Rajeev K.; Gaur, Pooran M.

    2017-01-01

    Flowering time is an important trait for adaptation and productivity of chickpea in arid and semi-arid environments. This study was conducted for molecular mapping of genes/quantitative trait loci (QTLs) controlling flowering time in chickpea using F2 populations derived from four crosses (ICCV 96029 × CDC Frontier, ICC 5810 × CDC Frontier, BGD 132 × CDC Frontier and ICC 16641 × CDC Frontier). Genetic studies revealed monogenic control of flowering time in the crosses ICCV 96029 × CDC Frontier, BGD 132 × CDC Frontier and ICC 16641 × CDC Frontier, and digenic control with complementary gene action in ICC 5810 × CDC Frontier. The intraspecific genetic maps developed from these crosses consisted of 75, 75, 68 and 67 markers spanning 248.8 cM, 331.4 cM, 311.1 cM and 385.1 cM, respectively. A consensus map spanning 363.8 cM with 109 loci was constructed by integrating the four genetic maps. Major QTLs corresponding to flowering time genes efl-1 from ICCV 96029, efl-3 from BGD 132 and efl-4 from ICC 16641 were mapped on CaLG04, CaLG08 and CaLG06, respectively. The QTLs and linked markers identified in this study can be used in marker-assisted breeding for developing early maturing chickpea. PMID:28729871

  10. Validating internal controls for quantitative plant gene expression studies.

    PubMed

    Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H

    2004-08-18

    Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments.

  11. Cross-ethnic meta-analysis identifies association of the GPX3-TNIP1 locus with amyotrophic lateral sclerosis.

    PubMed

    Benyamin, Beben; He, Ji; Zhao, Qiongyi; Gratten, Jacob; Garton, Fleur; Leo, Paul J; Liu, Zhijun; Mangelsdorf, Marie; Al-Chalabi, Ammar; Anderson, Lisa; Butler, Timothy J; Chen, Lu; Chen, Xiang-Ding; Cremin, Katie; Deng, Hong-Weng; Devine, Matthew; Edson, Janette; Fifita, Jennifer A; Furlong, Sarah; Han, Ying-Ying; Harris, Jessica; Henders, Anjali K; Jeffree, Rosalind L; Jin, Zi-Bing; Li, Zhongshan; Li, Ting; Li, Mengmeng; Lin, Yong; Liu, Xiaolu; Marshall, Mhairi; McCann, Emily P; Mowry, Bryan J; Ngo, Shyuan T; Pamphlett, Roger; Ran, Shu; Reutens, David C; Rowe, Dominic B; Sachdev, Perminder; Shah, Sonia; Song, Sharon; Tan, Li-Jun; Tang, Lu; van den Berg, Leonard H; van Rheenen, Wouter; Veldink, Jan H; Wallace, Robyn H; Wheeler, Lawrie; Williams, Kelly L; Wu, Jinyu; Wu, Xin; Yang, Jian; Yue, Weihua; Zhang, Zong-Hong; Zhang, Dai; Noakes, Peter G; Blair, Ian P; Henderson, Robert D; McCombe, Pamela A; Visscher, Peter M; Xu, Huji; Bartlett, Perry F; Brown, Matthew A; Wray, Naomi R; Fan, Dongsheng

    2017-09-20

    Cross-ethnic genetic studies can leverage power from differences in disease epidemiology and population-specific genetic architecture. In particular, the differences in linkage disequilibrium and allele frequency patterns across ethnic groups may increase gene-mapping resolution. Here we use cross-ethnic genetic data in sporadic amyotrophic lateral sclerosis (ALS), an adult-onset, rapidly progressing neurodegenerative disease. We report analyses of novel genome-wide association study data of 1,234 ALS cases and 2,850 controls. We find a significant association of rs10463311 spanning GPX3-TNIP1 with ALS (p = 1.3 × 10⁻⁸), with replication support from two independent Australian samples (combined 576 cases and 683 controls, p = 1.7 × 10⁻³). Both GPX3 and TNIP1 interact with other known ALS genes (SOD1 and OPTN, respectively). In addition, GGNBP2 was identified using gene-based analysis and summary statistics-based Mendelian randomization analysis, although further replication is needed to confirm this result. Our results increase our understanding of the genetic aetiology of ALS. Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease. Here, Wray and colleagues identify association of the GPX3-TNIP1 locus with ALS using cross-ethnic meta-analyses.

  12. Genome-Wide Association Study for Identification and Validation of Novel SNP Markers for Sr6 Stem Rust Resistance Gene in Bread Wheat.

    PubMed

    Mourad, Amira M I; Sallam, Ahmed; Belamkar, Vikas; Wegulo, Stephen; Bowden, Robert; Jin, Yue; Mahdy, Ezzat; Bakheit, Bahy; El-Wafaa, Atif A; Poland, Jesse; Baenziger, Peter S

    2018-01-01

    Stem rust (caused by Puccinia graminis f. sp. tritici Erikss. & E. Henn.) is a major disease in wheat (Triticum aestivum L.). However, in recent years it has occurred rarely in Nebraska due to weather and the effective selection and pyramiding of resistance genes. To understand the genetic basis of stem rust resistance in Nebraska winter wheat, we applied a genome-wide association study (GWAS) to a set of 270 winter wheat genotypes (A-set). Genotyping was carried out using genotyping-by-sequencing and ∼35,000 high-quality SNPs were identified. The tested genotypes were evaluated for their resistance to the common stem rust race in Nebraska (QFCSC) in two replications. Marker-trait association identified 32 SNP markers on chromosome 2D that were significantly (Bonferroni corrected P < 0.05) associated with the resistance. The chromosomal location of the significant SNPs (chromosome 2D) matched the location of the Sr6 gene, which was expected in these genotypes based on pedigree information. A highly significant linkage disequilibrium (LD, r²) was found between the significant SNPs and the specific SSR marker for the Sr6 gene (Xcfd43). This suggests the significant SNP markers are tagging the Sr6 gene. Out of the 32 significant SNPs, eight SNPs were in six genes that are annotated as being linked to disease resistance in the IWGSC RefSeq v1.0. The 32 significant SNP markers were located in nine haplotype blocks. All 32 significant SNPs were validated in a set of 60 different genotypes (V-set) using single marker analysis. SNP markers identified in this study can be used in marker-assisted selection and genomic selection, and to develop KASP (Kompetitive Allele Specific PCR) markers for the Sr6 gene. Novel SNPs for the Sr6 gene, an important stem rust resistance gene, were identified and validated in this study. These SNPs can be used to improve stem rust resistance in wheat.
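
    As a rough illustration of the Bonferroni criterion mentioned in this abstract: with about 35,000 SNPs tested and a family-wise level of 0.05, each SNP would need a raw P value below roughly 0.05 / 35,000 ≈ 1.4 × 10⁻⁶. The sketch below assumes this reading of "Bonferroni corrected P < 0.05"; the function names and the example P values are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Return marker names whose raw P value is below the Bonferroni threshold.

    The number of tests is taken to be the number of markers supplied.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical raw P values for three markers (illustrative only).
example = {"snp_a": 1e-9, "snp_b": 0.01, "snp_c": 0.2}
print(significant_markers(example))
```

    With many thousands of markers the per-test threshold becomes very strict, which is one reason a cluster of linked SNPs passing it on a single chromosome is a strong signal.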

  13. Pathway Distiller - multisource biological pathway consolidation

    PubMed Central

    2012-01-01

    Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments. PMID:23134636

  14. Pathway Distiller - multisource biological pathway consolidation.

    PubMed

    Doderer, Mark S; Anguiano, Zachry; Suresh, Uthra; Dashnamoorthy, Ravi; Bishop, Alexander J R; Chen, Yidong

    2012-01-01

    One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of the pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiments' resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity, that finds static pathway clusters independent of any given experiment. We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. 
Additionally a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users the ability to extract functional explanations of their genome wide experiments.
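
    The idea of merging pathways that share similar gene sets can be sketched in a few lines. The code below is not the Enrichment, Weighted, or de novo Consolidation method of Pathway Distiller; it is a generic greedy merge based on Jaccard similarity, offered only as an assumption-laden illustration. The function names, the 0.5 cutoff, and the toy pathways are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consolidate(pathways, min_similarity=0.5):
    """Greedily merge pathways whose gene sets overlap strongly.

    pathways: dict mapping pathway name -> set of gene symbols.
    Returns a list of (set of pathway names, merged gene set) tuples.
    """
    concepts = [({name}, set(genes)) for name, genes in pathways.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(concepts)):
            for j in range(i + 1, len(concepts)):
                if jaccard(concepts[i][1], concepts[j][1]) >= min_similarity:
                    # Fold concept j into concept i, then restart the scan.
                    concepts[i] = (
                        concepts[i][0] | concepts[j][0],
                        concepts[i][1] | concepts[j][1],
                    )
                    del concepts[j]
                    merged = True
                    break
            if merged:
                break
    return concepts


# Toy input: two overlapping pathways and one unrelated pathway.
toy = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g9"},
}
for names, genes in consolidate(toy):
    print(sorted(names), sorted(genes))
```

    A greedy merge like this depends on the similarity cutoff and on scan order, so it should be read as a teaching aid rather than a substitute for the published methods.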

  15. The limitations of simple gene set enrichment analysis assuming gene independence.

    PubMed

    Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

    2016-02-01

    Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce on the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
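
    The variance inflation referred to in this abstract can be illustrated with a standard identity for the mean of correlated variables. This is a textbook result rather than the paper's own derivation, and the symbols are assumptions introduced here: a gene set of n gene-level scores z_1, ..., z_n, each with variance σ², and an average pairwise correlation ρ̄.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} z_i\right)
  = \frac{1}{n^{2}}\left(n\sigma^{2} + n(n-1)\,\bar{\rho}\,\sigma^{2}\right)
  = \frac{\sigma^{2}}{n}\,\bigl[\,1 + (n-1)\,\bar{\rho}\,\bigr]
```

    Under independence (ρ̄ = 0) the variance is σ²/n. With, say, n = 100 genes and ρ̄ = 0.05, the factor 1 + (n − 1)ρ̄ is 5.95, so a test that assumes independence would understate the variance of the set-level score almost sixfold.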

  16. A gene expression signature associated with survival in metastatic melanoma

    PubMed Central

    Mandruzzato, Susanna; Callegaro, Andrea; Turcatel, Gianluca; Francescato, Samuela; Montesco, Maria C; Chiarion-Sileni, Vanna; Mocellin, Simone; Rossi, Carlo R; Bicciato, Silvio; Wang, Ena; Marincola, Francesco M; Zanovello, Paola

    2006-01-01

    Background Current clinical and histopathological criteria used to define the prognosis of melanoma patients are inadequate for accurate prediction of clinical outcome. We investigated whether genome screening by means of high-throughput gene microarray might provide clinically useful information on patient survival. Methods Forty-three tumor tissues from 38 patients with stage III and stage IV melanoma were profiled with a 17,500 element cDNA microarray. Expression data were analyzed using significance analysis of microarrays (SAM) to identify genes associated with patient survival, and supervised principal components (SPC) to determine survival prediction. Results SAM analysis revealed a set of 80 probes, corresponding to 70 genes, associated with survival, i.e. 45 probes characterizing longer and 35 shorter survival times, respectively. These transcripts were included in a survival prediction model designed using SPC and cross-validation which allowed identifying 30 predicting probes out of the 80 associated with survival. Conclusion The longer-survival group of genes included those expressed in immune cells, both innate and acquired, confirming the interplay between immunological mechanisms and the natural history of melanoma. Genes linked to immune cells were totally lacking in the poor-survival group, which was instead associated with a number of genes related to highly proliferative and invasive tumor cells. PMID:17129373

  17. Cross-talk of the biotrophic pathogen Claviceps purpurea and its host Secale cereale.

    PubMed

    Oeser, Birgitt; Kind, Sabine; Schurack, Selma; Schmutzer, Thomas; Tudzynski, Paul; Hinsch, Janine

    2017-04-04

    The economically important ergot fungus Claviceps purpurea is an interesting biotrophic model system because of its strict organ specificity (grass ovaries) and the lack of any detectable plant defense reactions. Though several virulence factors were identified, the exact infection mechanisms are unknown, e.g. how the fungus masks its attack and if the host detects the infection at all. We present a first dual transcriptome analysis using an RNA-Seq approach. We studied both fungal and plant gene expression in young ovaries infected by the wild-type and two virulence-attenuated mutants. We can show that the plant recognizes the fungus, since defense related genes are upregulated, especially several phytohormone genes. We present a survey of in planta expressed fungal genes, among them several confirmed virulence genes. Interestingly, the set of most highly expressed genes includes a high proportion of genes encoding putative effectors, small secreted proteins which might be involved in masking the fungal attack or interfering with host defense reactions. As known from several other phytopathogens, the C. purpurea genome contains more than 400 such genes, many of them clustered and probably highly redundant. Since the lack of effective defense reactions in spite of recognition of the fungus could very well be achieved by effectors, we started a functional analysis of some of the most highly expressed candidates. However, the redundancy of the system made the identification of a drastic effect of a single gene most unlikely. We can show that at least one candidate accumulates in the plant apoplast. Deletion of some candidates led to a reduced virulence of C. purpurea on rye, indicating a role of the respective proteins during the infection process. We show for the first time that, despite the absence of effective plant defense reactions, the biotrophic pathogen C. purpurea is detected by its host. This points to a role of effectors in modulation of the effective plant response. Indeed, several putative effector genes are among the highest expressed genes in planta.

  18. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex.

    PubMed

    Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne

    2004-06-01

    One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
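
    To make the threshold dependence of overrepresentation analysis concrete, the following sketch computes an ORA P value as an upper-tail hypergeometric probability. It is a generic illustration, not the implementation used in the study; the function names, the per-gene P values, and the gene ontology class below are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, class_size, selected_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of class members among `selected_size` genes drawn
    without replacement from a universe containing `class_size` members.
    """
    max_overlap = min(class_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(class_size, k) * comb(universe_size - class_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def ora_for_threshold(gene_p_values, go_class, threshold):
    """Select genes with P <= threshold, then test the class for enrichment."""
    universe = set(gene_p_values)
    selected = {g for g, p in gene_p_values.items() if p <= threshold}
    members = go_class & universe
    return ora_p_value(
        len(universe), len(members), len(selected), len(selected & members)
    )


# Hypothetical per-gene P values and a hypothetical class of two genes.
scores = {"a": 0.001, "b": 0.002, "c": 0.5, "d": 0.9}
go_class = {"a", "b"}
for cutoff in (0.0015, 0.01, 0.6):
    print(cutoff, ora_for_threshold(scores, go_class, cutoff))
```

    Changing the cutoff changes which genes count as "selected" and therefore the enrichment P value, which is the sensitivity the abstract describes. Functional class scoring avoids this step by using the score distribution of all genes in the class.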

  19. Transcriptomic analysis between self- and cross-pollinated pistils of tea plants (Camellia sinensis).

    PubMed

    Ma, Qingping; Chen, Changsong; Zeng, Zhongping; Zou, Zhongwei; Li, Huan; Zhou, Qiongqiong; Chen, Xuan; Sun, Kang; Li, Xinghui

    2018-04-25

    Self-incompatibility (SI) is a major barrier that obstructs the breeding process in most horticultural plants, including tea plants (Camellia sinensis). The aim of this study was to elucidate the molecular mechanism of SI in tea plants through a high-throughput transcriptome analysis. In this study, the transcriptomes of self- and cross-pollinated pistils of two tea cultivars, 'Fudingdabai' and 'Yulv', were compared to elucidate the SI mechanism of tea plants. In addition, the ion components and pollen tube growth in self- and cross-pollinated pistils were investigated. Our results revealed that both cultivars had similar pollen activities and that cross-pollination could promote pollen tube growth. In tea pistils, the most abundant ion was potassium (K+), followed by calcium (Ca2+), magnesium (Mg2+) and phosphorus (P5+). Ca2+ content increased after self-pollination but decreased after cross-pollination, while K+ showed the reverse trend. A total of 990 and 3 common differentially expressed genes (DEGs) were identified in un-pollinated vs. pollinated pistils and self- vs. cross-pollinated groups after 48 h, respectively. Function annotation indicated that three genes encoding UDP-glycosyltransferase 74B1 (UGT74B1), mitochondrial calcium uniporter protein 2 (MCU2) and G-type lectin S-receptor-like serine/threonine-protein kinase (G-type RLK) might play important roles during the SI process in tea plants. Ca2+ and K+ are important signals for SI in tea plants, and three genes, UGT74B1, MCU2 and G-type RLK, play essential roles during SI signal transduction.

  20. Glutamatergic and GABAergic gene sets in attention-deficit/hyperactivity disorder: association to overlapping traits in ADHD and autism

    PubMed Central

    Naaijen, J; Bralten, J; Poelmans, G; Faraone, Stephen; Asherson, Philip; Banaschewski, Tobias; Buitelaar, Jan; Franke, Barbara; P Ebstein, Richard; Gill, Michael; Miranda, Ana; D Oades, Robert; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph; Sonuga-Barke, Edmund; Anney, Richard; Mulas, Fernando; Steinhausen, Hans-Christoph; Glennon, J C; Franke, B; Buitelaar, J K

    2017-01-01

    Attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorders (ASD) often co-occur. Both are highly heritable; however, it has been difficult to discover genetic risk variants. Glutamate and GABA are the main excitatory and inhibitory neurotransmitters in the brain; their balance is essential for proper brain development and functioning. In this study we investigated the role of glutamate and GABA genetics in ADHD severity, autism symptom severity and inhibitory performance, based on gene set analysis, an approach to investigate multiple genetic variants simultaneously. Common variants within glutamatergic and GABAergic genes were investigated using the MAGMA software in an ADHD case-only sample (n=931), in which we assessed ASD symptoms and response inhibition on a Stop task. Gene set analyses for ADHD symptom severity (divided into inattention and hyperactivity/impulsivity symptoms), autism symptom severity and inhibition were performed using principal component regression analyses. Subsequently, gene-wide association analyses were performed. The glutamate gene set showed an association with severity of hyperactivity/impulsivity (P=0.009), which was robust to correcting for genome-wide association levels. The GABA gene set showed a nominally significant association with inhibition (P=0.04), but this did not survive correction for multiple comparisons. None of the single-gene or single-variant associations was significant on its own. By analyzing multiple genetic variants within candidate gene sets together, we were able to find genetic associations supporting the involvement of excitatory and inhibitory neurotransmitter systems in ADHD and ASD symptom severity in ADHD. PMID:28072412

  1. SABRE: a method for assessing the stability of gene modules in complex tissues and subject populations.

    PubMed

    Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T

    2016-11-14

    Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how the outputs of these computational methods are sensitive to the input sample set, or stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permutated gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.
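    The re-sampling idea behind this kind of stability assessment can be sketched as follows. This is a rough illustration, not the published SABRE procedure or its stability criterion: the use of Jaccard similarity, the best-match rule, the default of 50 bootstrap rounds, and the names `jaccard`, `module_stability` and `find_modules` are all assumptions introduced here. `find_modules` stands in for any module discovery algorithm that takes a list of samples and returns a list of gene sets.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two gene sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def module_stability(samples, find_modules, n_boot=50, seed=0):
    """Mean best-match similarity of each reference module across bootstraps.

    samples: list of per-subject expression records (any type);
    find_modules: hypothetical callable, list of samples -> list of gene sets.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reference = find_modules(samples)
    totals = [0.0] * len(reference)
    for _ in range(n_boot):
        # Bootstrap: draw as many samples as in the cohort, with replacement.
        resampled = [rng.choice(samples) for _ in samples]
        boot_modules = find_modules(resampled)
        for i, module in enumerate(reference):
            # Score each reference module by its most similar bootstrap module.
            totals[i] += max(
                (jaccard(module, m) for m in boot_modules), default=0.0
            )
    return [t / n_boot for t in totals]
```

    Running the same function on permuted expression data would give a baseline of the kind the abstract describes for random modules, against which the scores of real modules could be compared.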

  2. Evolutionary rewiring of bacterial regulatory networks

    PubMed Central

    Taylor, Tiffany B.; Mulley, Geraldine; McGuffin, Liam J.; Johnson, Louise J.; Brockhurst, Michael A.; Arseneault, Tanya; Silby, Mark W.; Jackson, Robert W.

    2015-01-01

    Bacteria have evolved complex regulatory networks that enable integration of multiple intracellular and extracellular signals to coordinate responses to environmental changes. However, our knowledge of how regulatory systems function and evolve is still relatively limited. There is often extensive homology between components of different networks, due to past cycles of gene duplication, divergence, and horizontal gene transfer, raising the possibility of cross-talk or redundancy. Consequently, evolutionary resilience is built into gene networks - homology between regulators can potentially allow rapid rescue of lost regulatory function across distant regions of the genome. In our recent study [Taylor, et al. Science (2015), 347(6225)] we find that mutations that facilitate cross-talk between pathways can contribute to gene network evolution, but that such mutations come with severe pleiotropic costs. Arising from this work are a number of questions surrounding how this phenomenon occurs. PMID:28357301

  3. The truth about mouse, human, worms and yeast

    PubMed Central

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takifugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543

  4. The truth about mouse, human, worms and yeast.

    PubMed

    Nelson, David R; Nebert, Daniel W

    2004-01-01

    Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takifugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years.

  5. Combining Shigella Tn-seq data with gold-standard E. coli gene deletion data suggests rare transitions between essential and non-essential gene functionality.

    PubMed

    Freed, Nikki E; Bumann, Dirk; Silander, Olin K

    2016-09-06

    Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.

  6. VirtualLeaf: an open-source framework for cell-based modeling of plant tissue growth and development.

    PubMed

    Merks, Roeland M H; Guravage, Michael; Inzé, Dirk; Beemster, Gerrit T S

    2011-02-01

    Plant organs, including leaves and roots, develop by means of a multilevel cross talk between gene regulation, patterned cell division and cell expansion, and tissue mechanics. The multilevel regulatory mechanisms complicate classic molecular genetics or functional genomics approaches to biological development, because these methodologies implicitly assume a direct relation between genes and traits at the level of the whole plant or organ. Instead, understanding gene function requires insight into the roles of gene products in regulatory networks, the conditions of gene expression, etc. This interplay is impossible to understand intuitively. Mathematical and computer modeling allows researchers to design new hypotheses and produce experimentally testable insights. However, the required mathematics and programming experience makes modeling poorly accessible to experimental biologists. Problem-solving environments provide biologically intuitive in silico objects ("cells", "regulation networks") required for setting up a simulation and present those to the user in terms of familiar, biological terminology. Here, we introduce the cell-based computer modeling framework VirtualLeaf for plant tissue morphogenesis. The current version defines a set of biologically intuitive C++ objects, including cells, cell walls, and diffusing and reacting chemicals, that provide useful abstractions for building biological simulations of developmental processes. We present a step-by-step introduction to building models with VirtualLeaf, providing basic example models of leaf venation and meristem development. VirtualLeaf-based models provide a means for plant researchers to analyze the function of developmental genes in the context of the biophysics of growth and patterning. VirtualLeaf is an ongoing open-source software project (http://virtualleaf.googlecode.com) that runs on Windows, Mac, and Linux.

  7. Examination of the Involvement of Cholinergic-Associated Genes in Nicotine Behaviors in European and African Americans.

    PubMed

    Melroy-Greif, Whitney E; Simonson, Matthew A; Corley, Robin P; Lutz, Sharon M; Hokanson, John E; Ehringer, Marissa A

    2017-04-01

    Cigarette smoking is a physiologically harmful habit. Nicotinic acetylcholine receptors (nAChRs) are bound by nicotine and upregulated in response to chronic exposure to nicotine. It is known that upregulation of these receptors is not due to a change in mRNA of these genes; however, more precise details on the process are still uncertain, with several plausible hypotheses describing how nAChRs are upregulated. We have manually curated a set of genes believed to play a role in nicotine-induced nAChR upregulation. Here, we test the hypothesis that these genes are associated with and contribute risk for nicotine dependence (ND) and the number of cigarettes smoked per day (CPD). Studies with genotypic data on European and African Americans (EAs and AAs, respectively) were collected and a gene-based test was run to test for an association between each gene and ND and CPD. Although several novel genes were associated with CPD and ND at P < 0.05 in EAs and AAs, these associations did not survive correction for multiple testing. Previous associations between CHRNA3, CHRNA5, CHRNB4 and CPD in EAs were replicated. Our hypothesis-driven approach avoided many of the limitations inherent in pathway analyses and provided nominal evidence for association between cholinergic-related genes and nicotine behaviors. We evaluated the evidence for association between a manually curated set of genes and nicotine behaviors in European and African Americans. Although no genes were associated after multiple testing correction, this study has several strengths: by manually curating a set of genes we circumvented the limitations inherent in many pathway analyses and tested several genes that had not yet been examined in a human genetic study; gene-based tests are a useful way to test for association with a set of genes; and these genes were collected based on literature review and conversations with experts, highlighting the importance of scientific collaboration. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.

  8. Evolutionary history of tall fescue morphotypes inferred from molecular phylogenetics of the Lolium-Festuca species complex

    PubMed Central

    2010-01-01

    Background: The agriculturally important pasture grass tall fescue (Festuca arundinacea Schreb. syn. Lolium arundinaceum (Schreb.) Darbysh.) is an outbreeding allohexaploid that may be more accurately described as a species complex consisting of three major (Continental, Mediterranean and rhizomatous) morphotypes. Observation of hybrid infertility in some crossing combinations between morphotypes suggests the possibility of independent origins from different diploid progenitors. This study aims to clarify the evolutionary relationships between each tall fescue morphotype through phylogenetic analysis using two low-copy nuclear genes (encoding plastid acetyl-CoA carboxylase [Acc1] and centroradialis [CEN]), the nuclear ribosomal DNA internal transcribed spacer (rDNA ITS) and the chloroplast DNA (cpDNA) genome-located matK gene. Other taxa within the closely related Lolium-Festuca species complex were also included in the study, to increase understanding of evolutionary processes in a taxonomic group characterised by multiple inter-specific hybridisation events. Results: Putative homoeologous sequences from both nuclear genes were obtained from each polyploid species and compared to counterparts from 15 diploid taxa. Phylogenetic reconstruction confirmed F. pratensis and F. arundinacea var. glaucescens as probable progenitors to Continental tall fescue, and these species are also likely to be ancestral to the rhizomatous morphotype. However, these two morphotypes are sufficiently distinct to be located in separate clades based on the ITS-derived data set. All four of the generated data sets suggest independent evolution of the Mediterranean and Continental morphotypes, with minimal affinity between cognate sequence haplotypes. No obvious candidate progenitor species for Mediterranean tall fescues were identified, and only two putative sub-genome-specific haplotypes were identified for this morphotype. Conclusions: This study describes the first phylogenetic analysis of the Festuca genus to include representatives of each tall fescue morphotype, and to use low-copy nuclear gene-derived sequences to identify putative progenitors of the polyploid species. The demonstration of distinct tall fescue lineages has implications for both taxonomy and molecular breeding strategies, and may facilitate the generation of morphotype and/or sub-genome-specific molecular markers. PMID:20937141

  9. In Vitro Selection of ramR and soxR Mutants Overexpressing Efflux Systems by Fluoroquinolones as Well as Cefoxitin in Klebsiella pneumoniae

    PubMed Central

    Bialek-Davenet, Suzanne; Marcon, Estelle; Leflon-Guibout, Véronique; Lavigne, Jean-Philippe; Bert, Frédéric; Moreau, Richard; Nicolas-Chanoine, Marie-Hélène

    2011-01-01

    The relationship between efflux system overexpression and cross-resistance to cefoxitin, quinolones, and chloramphenicol has recently been reported in Klebsiella pneumoniae. In 3 previously published clinical isolates and 17 in vitro mutants selected with cefoxitin or fluoroquinolones, mutations in the potential regulator genes of the AcrAB efflux pump (acrR, ramR, ramA, marR, marA, soxR, soxS, and rob) were searched, and their impacts on efflux-related antibiotic cross-resistance were assessed. All mutants but 1, and 2 clinical isolates, overexpressed acrB. No mutation was detected in the regulator genes studied among the clinical isolates and 8 of the mutants. For the 9 remaining mutants, a mutation was found in the ramR gene in 8 of them and in the soxR gene in the last one, resulting in overexpression of ramA and soxS, respectively. Transformation of the ramR mutants and the soxR mutant with the wild-type ramR and soxR genes, respectively, abolished overexpression of acrB and ramA in the ramR mutants and of soxS in the soxR mutant, as well as antibiotic cross-resistance. Resistance due to efflux system overexpression was demonstrated for 4 new antibiotics: cefuroxime, cefotaxime, ceftazidime, and ertapenem. This study shows that the ramR and soxR genes control the expression of efflux systems in K. pneumoniae and suggests the existence of efflux pumps other than AcrAB and of other loci involved in the regulation of AcrAB expression. PMID:21464248

  10. Cross-induction of detoxification genes by environmental xenobiotics and insecticides in the mosquito Aedes aegypti: impact on larval tolerance to chemical insecticides.

    PubMed

    Poupardin, Rodolphe; Reynaud, Stéphane; Strode, Clare; Ranson, Hilary; Vontas, John; David, Jean-Philippe

    2008-05-01

    The effect of exposure of Aedes aegypti larvae to sub-lethal doses of the pyrethroid insecticide permethrin, the organophosphate temephos, the herbicide atrazine, the polycyclic aromatic hydrocarbon fluoranthene and the heavy metal copper on their subsequent tolerance to insecticides, detoxification enzyme activities and expression of detoxification genes was investigated. Bioassays revealed a moderate increase in larval tolerance to permethrin following exposure to fluoranthene and copper while larval tolerance to temephos increased moderately after exposure to atrazine, copper and permethrin. Cytochrome P450 monooxygenases activities were induced in larvae exposed to permethrin, fluoranthene and copper while glutathione S-transferase activities were induced after exposure to fluoranthene and repressed after exposure to copper. Microarray screening of the expression patterns of all detoxification genes following exposure to each xenobiotic with the Aedes Detox Chip identified multiple genes induced by xenobiotics and insecticides. Further expression studies using real-time quantitative PCR confirmed the induction of multiple CYP genes and one carboxylesterase gene by insecticides and xenobiotics. Overall, this study reveals the potential of xenobiotics found in polluted mosquito breeding sites to affect their tolerance to insecticides, possibly through the cross-induction of particular detoxification genes. Molecular mechanisms involved and impact on mosquito control strategies are discussed.

  11. Identification of Genes Preferentially Expressed by Highly Virulent Piscine Streptococcus agalactiae upon Interaction with Macrophages

    PubMed Central

    Guo, Chang-Ming; Chen, Rong-Rong; Kalhoro, Dildar Hussain; Wang, Zhao-Fei; Liu, Guang-Jin; Lu, Cheng-Ping; Liu, Yong-Jie

    2014-01-01

    Streptococcus agalactiae, long recognized as a mammalian pathogen, is an emerging concern with regard to fish. In this study, we used a mouse model and in vitro cell infection to evaluate the pathogenetic characteristics of S. agalactiae GD201008-001, isolated from tilapia in China. This bacterium was found to be highly virulent and capable of inducing brain damage by migrating into the brain by crossing the blood–brain barrier (BBB). The phagocytosis assays indicated that this bacterium could be internalized by murine macrophages and survive intracellularly for more than 24 h, inducing injury to macrophages. Further, selective capture of transcribed sequences (SCOTS) was used to investigate microbial gene expression associated with intracellular survival. This positive cDNA selection technique identified 60 distinct genes that could be characterized into 6 functional categories. More than 50% of the differentially expressed genes were involved in metabolic adaptation. Some genes have previously been described as associated with virulence in other bacteria, and four showed no significant similarities to any other previously described genes. This study constitutes the first step in further gene expression analyses that will lead to a better understanding of the molecular mechanisms used by S. agalactiae to survive in macrophages and to cross the BBB. PMID:24498419

  12. TEMPORAL-SPATIAL ANALYSIS OF U.S.- MEXICO BORDER ENVIRONMENTAL FINE AND COARSE PM AIR SAMPLE EXTRACT ACTIVITY IN HUMAN BRONCHIAL EPITHELIAL CELLS

    PubMed Central

    Lauer, Fredine T.; Mitchell, Leah A.; Bedrick, Edward; McDonald, Jacob D.; Lee, Wen-Yee; Li, Wen-Whai; Olvera, Hector; Amaya, Maria A.; Berwick, Marianne; Gonzales, Melissa; Currey, Robert; Pingitore, Nicholas E.; Burchiel, Scott W.

    2009-01-01

    Particulate matter less than 10 μm (PM10) has been shown to be associated with aggravation of asthma and respiratory and cardiopulmonary morbidity. There is also great interest in the potential health effects of PM2.5. Particulate matter (PM) varies in composition both spatially and temporally depending on the source, location and seasonal condition. El Paso County, which lies in the Paso del Norte airshed, is a unique location to study ambient air pollution due to three major points: the geological land formation, the relatively large population and the various sources of PM. In this study, dichotomous filters were collected from various sites in El Paso County every seven days for a period of one year. The sampling sites were both distant from and near border crossings, which are near heavily populated areas with high traffic volume. Fine (PM2.5) and coarse (PM10-2.5) PM filter samples were extracted using dichloromethane and were assessed for biologic activity and polycyclic aromatic hydrocarbon (PAH) content. Three sets of marker genes in human BEAS2B bronchial epithelial cells were utilized to assess the effects of airborne PAHs on biologic activities associated with specific biological pathways associated with airway diseases. These pathways included inflammatory cytokine production (IL-6, IL-8), oxidative stress (HMOX-1, NQO-1, ALDH3A1, AKR1C1), and aryl hydrocarbon receptor (AhR)-dependent signaling (CYP1A1). Results demonstrated interesting temporal and spatial patterns of gene induction for all pathways, particularly those associated with oxidative stress, and significant differences in the PAHs detected in the PM10-2.5 and PM2.5 fractions. Temporally, the greatest effects on gene induction were observed in winter months, which appeared to correlate with inversions that are common in the air basin. Spatially, the greatest gene expression increases were seen in extracts collected from the most central areas of El Paso, which are also closest to highways and border crossings. PMID:19410595

  13. Misregulation of spermatogenesis genes in Drosophila hybrids is lineage-specific and driven by the combined effects of sterility and fast male regulatory divergence.

    PubMed

    Gomes, S; Civetta, A

    2014-09-01

    Hybrid male sterility is a common outcome of crosses between different species. Gene expression studies have found that a number of spermatogenesis genes are differentially expressed in sterile hybrid males, compared with parental species. Late-stage sperm development genes are particularly likely to be misexpressed, with fewer early-stage genes affected. Thus, a link has been posited between misexpression and sterility. A more recent alternative explanation for hybrid gene misexpression has been that it is independent of sterility and driven by divergent evolution of male-specific regulatory elements between species (faster male hypothesis). The faster male hypothesis predicts that misregulation of spermatogenesis genes should be independent of sterility and approximately the same in both hybrids, whereas sterility should only affect gene expression in sterile hybrids. To test the faster male hypothesis vs. the effect of sterility on gene misexpression, we analyse spermatogenesis gene expression in different species pairs of the Drosophila phylogeny, where hybrid male sterility occurs in only one direction of the interspecies cross (i.e. unidirectional sterility). We find significant differences among genes in misexpression with effects that are lineage-specific and caused by sterility or fast male regulatory divergence. © 2014 The Authors. Journal of Evolutionary Biology © 2014 European Society For Evolutionary Biology.

  14. SSR marker variations in Brassica species provide insight into the origin and evolution of Brassica amphidiploids.

    PubMed

    Thakur, Ajay Kumar; Singh, Kunwar Harendra; Singh, Lal; Nanjundan, Joghee; Khan, Yasin Jeshima; Singh, Dhiraj

    2018-01-01

    Oilseed Brassica represents an important group of oilseed crops with a long history of evolution and cultivation. To understand the origin and evolution of Brassica amphidiploids, simple sequence repeat (SSR) markers were used to unravel genetic variations in three diploid and three amphidiploid Brassica species of U's triangle along with Eruca sativa as an outlier. Of 124 Brassica-derived SSR loci assayed, 100% cross-transferability was obtained for B. juncea and three subspecies of B. rapa, while the lowest cross-transferability (91.93%) was obtained for Eruca sativa. The average percentage of cross-transferability across all seven species was 98.15%. The number of alleles detected at each locus ranged from one to six with an average of 3.41 alleles per primer pair. A Neighbor-Joining-based dendrogram divided all 40 accessions into two main groups composed of B. juncea/B. nigra/B. rapa and B. carinata/B. napus/B. oleracea. The C-genome of oilseed Brassica species remained relatively more conserved than the A- and B-genomes. The A-genomes present in B. juncea and B. napus seem distinct from each other and hence provide a great opportunity for generating diversity through synthesizing amphidiploids from different sources of A-genome. B. juncea had the least intra-specific distance, indicating a narrow genetic base. B. rapa appears to be a more primitive species from which the other two diploid species might have evolved. The SSR marker set developed in this study will assist in DNA fingerprinting of various Brassica species cultivars, evaluating the genetic diversity in Brassica germplasm, genome mapping and construction of linkage maps, gene tagging and various other genomics-related studies in Brassica species. Further, the evolutionary relationship established among various Brassica species would assist in formulating suitable breeding strategies for widening the genetic base of Brassica amphidiploids by exploiting the genetic diversity present in diploid progenitor gene pools.

  15. The Molecular Signatures Database (MSigDB) hallmark gene set collection.

    PubMed

    Liberzon, Arthur; Birger, Chet; Thorvaldsdóttir, Helga; Ghandi, Mahmoud; Mesirov, Jill P; Tamayo, Pablo

    2015-12-23

    The Molecular Signatures Database (MSigDB) is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis. Since its creation, MSigDB has grown beyond its roots in metabolic disease and cancer to include >10,000 gene sets. These better represent a wider range of biological processes and diseases, but the utility of the database is reduced by increased redundancy across, and heterogeneity within, gene sets. To address this challenge, here we use a combination of automated approaches and expert curation to develop a collection of "hallmark" gene sets as part of MSigDB. Each hallmark in this collection consists of a "refined" gene set, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression. The hallmarks effectively summarize most of the relevant information of the original founder sets and, by reducing both variation and redundancy, provide more refined and concise inputs for gene set enrichment analysis.

  16. Autogenous cross-regulation of Quaking mRNA processing and translation balances Quaking functions in splicing and translation.

    PubMed

    Fagg, W Samuel; Liu, Naiyou; Fair, Jeffrey Haskell; Shiue, Lily; Katzman, Sol; Donohue, John Paul; Ares, Manuel

    2017-09-15

    Quaking protein isoforms arise from a single Quaking gene and bind the same RNA motif to regulate splicing, translation, decay, and localization of a large set of RNAs. However, the mechanisms by which Quaking expression is controlled to ensure that appropriate amounts of each isoform are available for such disparate gene expression processes are unknown. Here we explore how levels of two isoforms, nuclear Quaking-5 (Qk5) and cytoplasmic Qk6, are regulated in mouse myoblasts. We found that Qk5 and Qk6 proteins have distinct functions in splicing and translation, respectively, enforced through differential subcellular localization. We show that Qk5 and Qk6 regulate distinct target mRNAs in the cell and act in distinct ways on their own and each other's transcripts to create a network of autoregulatory and cross-regulatory feedback controls. Morpholino-mediated inhibition of Qk translation confirms that Qk5 controls Qk RNA levels by promoting accumulation and alternative splicing of Qk RNA, whereas Qk6 promotes its own translation while repressing Qk5. This Qk isoform cross-regulatory network responds to additional cell type and developmental controls to generate a spectrum of Qk5/Qk6 ratios, where they likely contribute to the wide range of functions of Quaking in development and cancer. © 2017 Fagg et al.; Published by Cold Spring Harbor Laboratory Press.

  17. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    Risk stratification models for lung cancer based on gene expression profiles are of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression networks. Using multiple microarray data sets, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in two test sets and training set (p < 0.00001, p < 0.0001 and p = 0.02 for training and test sets 1 and 2, resp.). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  18. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic for obtaining a robust gene ranking. The sign-sum statistic gives a more reproducible ranking than the t-statistic. Using simulated data sets, we show that the sign-sum statistic excludes hetero-type genes well. For the real data sets as well, the sign-sum statistic performs well from the viewpoint of ranking reproducibility.
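
    The idea of summing the signs of t-statistics over subsets can be sketched as follows. This is an assumption-laden approximation, not the authors' definition: the abstract refers to all possible subsets, whereas this sketch draws random subsets, uses Welch's t-statistic, and relies on hypothetical names and defaults (sign_sum, n_subsets, frac).

```python
import numpy as np


def t_statistic(a, b):
    """Welch two-sample t-statistic (unequal variances assumed)."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))


def sign_sum(cases, controls, n_subsets=200, frac=0.5, seed=0):
    """Sum the signs of t-statistics computed on random subsets.

    Hypothetical Monte Carlo stand-in for enumerating all subsets.
    Values near +/- n_subsets mean the direction of the effect is stable;
    values near zero mean the sign flips between subsets.
    """
    rng = np.random.default_rng(seed)
    k_cases = max(2, int(frac * len(cases)))
    k_controls = max(2, int(frac * len(controls)))
    total = 0
    for _ in range(n_subsets):
        a = rng.choice(cases, size=k_cases, replace=False)
        b = rng.choice(controls, size=k_controls, replace=False)
        total += int(np.sign(t_statistic(a, b)))
    return total


# Toy expression values for one gene (synthetic, for illustration only).
stable_cases = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5])
controls = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
print(sign_sum(stable_cases, controls))
```

    A gene whose cases split into an up-regulated and a down-regulated subgroup would give mixed signs and a sum of smaller magnitude, which is the instability the abstract attributes to disease heterogeneity.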

  19. Microgravity and immunity: Changes in lymphocyte gene expression.

    NASA Astrophysics Data System (ADS)

    Risin, D.; Ward, N. E.; Risin, S. A.; Pellis, N. R.

    Earlier studies had shown that modeled and true microgravity (MG) cause multiple direct effects on human lymphocytes. MG inhibits lymphocyte locomotion, suppresses polyclonal and antigen-specific activation, and affects signal transduction mechanisms as well as activation-induced apoptosis. In this study we assessed changes in gene expression associated with lymphocyte exposure to microgravity in an attempt to identify microgravity-sensitive genes (MGSG) in general, and specifically those genes that might be responsible for the functional and structural changes observed earlier. Two sets of experiments targeting different goals were conducted. In the first set, T-lymphocytes from normal donors were activated with anti-CD3 and IL2 and then cultured in 1g (static) and modeled MG (MMG) conditions (Rotating Wall Vessel bioreactor) for 24 hours. This setting allowed searching for MGSG by comparison of gene expression patterns in zero and 1 g gravity. In the second set, activated T-cells, after culturing for 24 hours in 1g and MMG, were exposed three hours before harvesting to a secondary activation stimulus (PHA), thus triggering the apoptotic pathway. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes, were used for microarray analysis. The experiments were performed in triplicates with T-cells obtained from different blood donors to minimize the possible input of biological variation in gene expression and discriminate changes that are associated with the

  20. Lr41, Lr39, and a leaf rust resistance gene from Aegilops cylindrica may be allelic and are located on wheat chromosome 2DS.

    PubMed

    Singh, Sukhwinder; Franks, C D; Huang, L; Brown-Guedira, G L; Marshall, D S; Gill, B S; Fritz, A

    2004-02-01

    The leaf rust resistance gene Lr41 in wheat germplasm KS90WGRC10 and a resistance gene in wheat breeding line WX93D246-R-1 were transferred to Triticum aestivum from Aegilops tauschii and Ae. cylindrica, respectively. The leaf rust resistance gene in WX93D246-R-1 was located on wheat chromosome 2D by monosomic analysis. Molecular marker analysis of F(2) plants from non-critical crosses determined that this gene is 11.2 cM distal to marker Xgwm210 on the short arm of 2D. No susceptible plants were detected in a population of 300 F(2) plants from a cross between WX93D246-R-1 and TA 4186 (Lr39), suggesting that the gene in WX93D246-R-1 is the same as, or closely linked to, Lr39. In addition, no susceptible plants were detected in a population of 180 F(2) plants from the cross between KS90WGRC10 and WX93D246-R-1. The resistance gene in KS90WGRC10, Lr41, was previously reported to be located on wheat chromosome 1D. In this study, no genetic association was found between Lr41 and 51 markers located on chromosome 1D. A population of 110 F(3) lines from a cross between KS90WGRC10 and TAM 107 was evaluated with polymorphic SSR markers from chromosome 2D and marker Xgdm35 was found to be 1.9 cM proximal to Lr41. When evaluated with diverse isolates of Puccinia triticina, similar reactions were observed on WX93D246-R-1, KS90WGRC10, and TA 4186. The results of mapping, allelism, and race specificity test indicate that these germplasms likely have the same gene for resistance to leaf rust.

  1. Using RNA-Seq data to select reference genes for normalizing gene expression in apple roots

    USDA-ARS's Scientific Manuscript database

    Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes were evaluated for t...

  2. Locating a modifier gene of Ovum mutant through crosses between DDK and C57BL/6J inbred strains in mice.

    PubMed

    Tan, Jing; Song, Gen Di; Song, Jia Sheng; Ren, Shi Hao; Li, Chun Li; Zheng, Zhen Yu; Zhao, Wei Dong

    2016-06-01

    A striking infertile phenotype has been discovered in the DDK strain of mouse. The DDK females are usually infertile when crossed with males of other inbred strains, whereas DDK males exhibit normal fertility in reciprocal crosses. This phenomenon is caused by mutation in the ovum (Om) locus on chromosome 11 and known as the DDK syndrome. Previously, some research groups reported that the embryonic mortality deviated from the semilethal rate in backcrosses between heterozygous (Om/+) females and males of other strains. This embryonic mortality exhibited an aggravated trend with increasing background genes of other strains. These results indicated that some modifier genes of Om were present in other strains. In the present study, a population of N₂ (Om/+) females from the backcrosses between C57BL/6J (B6) and F₁ (B6♀ × DDK♂) was used to map potential modifier genes of Om. Quantitative trait locus analysis showed that a major locus, namely Amom1 (aggravate modifier gene of Om 1), was located at the middle part of chromosome 9 in mice. The Amom1 could increase the expressivity of Om gene, thereby aggravating embryonic lethality when heterozygous (Om/+) females were mated with males of B6 strain. Further, the 1.5 LOD-drop analysis indicated that the confidence interval was between 37.54 and 44.46 cM, ~6.92 cM. Amom1 is the first modifier gene of Om in the B6 background.
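
    A LOD-drop confidence interval of the kind mentioned above can be sketched as follows. This is a simplified illustration: it keeps the contiguous run of markers whose LOD score stays within the chosen drop of the peak, and the function name, marker positions, and LOD scores are hypothetical, not data from the study.

```python
def lod_drop_interval(positions, lods, drop=1.5):
    """Return (left, right) positions bounding the LOD-drop interval.

    Starting from the peak marker, extend in both directions while the
    LOD score remains at or above (peak LOD - drop).
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical marker positions (cM) and LOD scores along one chromosome.
positions_cm = [30.0, 35.0, 40.0, 45.0, 50.0]
lod_scores = [1.0, 3.0, 4.2, 3.1, 0.5]

left_cm, right_cm = lod_drop_interval(positions_cm, lod_scores)
print(f"1.5 LOD-drop interval: {left_cm} to {right_cm} cM")
```

    The width of the interval is simply the difference between its bounds, which for the values reported in the abstract is 44.46 - 37.54 = 6.92 cM.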

  3. Radiation Quality Effects on Transcriptome Profiles in 3-d Cultures After Particle Irradiation

    NASA Technical Reports Server (NTRS)

    Patel, Z. S.; Kidane, Y. H.; Huff, J. L.

    2014-01-01

    In this work, we evaluate the differential effects of low- and high-LET radiation on 3-D organotypic cultures in order to investigate radiation quality impacts on gene expression and cellular responses. Reducing uncertainties in current risk models requires new knowledge on the fundamental differences in biological responses (the so-called radiation quality effects) triggered by heavy ion particle radiation versus low-LET radiation associated with Earth-based exposures. We are utilizing novel 3-D organotypic human tissue models that provide a format for study of human cells within a realistic tissue framework, thereby bridging the gap between 2-D monolayer culture and animal models for risk extrapolation to humans. To identify biological pathway signatures unique to heavy ion particle exposure, functional gene set enrichment analysis (GSEA) was used with whole transcriptome profiling. GSEA has been used extensively as a method to garner biological information in a variety of model systems but has not been commonly used to analyze radiation effects. It is a powerful approach for assessing the functional significance of radiation quality-dependent changes from datasets where the changes are subtle but broad, and where single gene based analysis using rankings of fold-change may not reveal important biological information. We identified 45 statistically significant gene sets at 0.05 q-value cutoff, including 14 gene sets common to gamma and titanium irradiation, 19 gene sets specific to gamma irradiation, and 12 titanium-specific gene sets. Common gene sets largely align with DNA damage, cell cycle, early immune response, and inflammatory cytokine pathway activation. The top gene set enriched for the gamma- and titanium-irradiated samples involved KRAS pathway activation and genes activated in TNF-treated cells, respectively. Another difference noted for the high-LET samples was an apparent enrichment in gene sets involved in cell cycle/mitotic control. It is plausible that the enrichment in these particular pathways results from the complex DNA damage resulting from high-LET exposure where repair processes are not completed during the same time scale as the less complex damage resulting from low-LET radiation.

  4. Motivation in Cross-Cultural Settings: A Papua New Guinea Psychometric Study

    ERIC Educational Resources Information Center

    Nelson, Genevieve F.; O'Mara, Alison J.; McInerney, Dennis M.; Dowson, Martin

    2006-01-01

    There is a paucity of research on motivation and education in developing countries. Although psychological constructs relating to academic engagement and achievement have been identified and researched in a number of cross-cultural settings this body of research has rarely been extended to the developing world. The processes by which students from…

  5. A Discourse-Centered Approach: Repetition in Cross-Cultural Settings.

    ERIC Educational Resources Information Center

    Yemenici, Alev

    A study investigated how repetition was used in the telling of personal narratives to create emotional involvement on the part of listeners, to evaluate stories, to prevent listeners from asking questions and from losing the story's focus, and to justify narrating that particular story in a cross-cultural setting. It was assumed that narrators…

  6. Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens

    PubMed Central

    Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason

    2017-01-01

    The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provides an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737

  7. Cross-ancestry genome-wide association analysis of corneal thickness strengthens link between complex and Mendelian eye diseases.

    PubMed

    Iglesias, Adriana I; Mishra, Aniket; Vitart, Veronique; Bykhovskaya, Yelena; Höhn, René; Springelkamp, Henriët; Cuellar-Partida, Gabriel; Gharahkhani, Puya; Bailey, Jessica N Cooke; Willoughby, Colin E; Li, Xiaohui; Yazar, Seyhan; Nag, Abhishek; Khawaja, Anthony P; Polašek, Ozren; Siscovick, David; Mitchell, Paul; Tham, Yih Chung; Haines, Jonathan L; Kearns, Lisa S; Hayward, Caroline; Shi, Yuan; van Leeuwen, Elisabeth M; Taylor, Kent D; Bonnemaijer, Pieter; Rotter, Jerome I; Martin, Nicholas G; Zeller, Tanja; Mills, Richard A; Staffieri, Sandra E; Jonas, Jost B; Schmidtmann, Irene; Boutin, Thibaud; Kang, Jae H; Lucas, Sionne E M; Wong, Tien Yin; Beutel, Manfred E; Wilson, James F; Uitterlinden, André G; Vithana, Eranga N; Foster, Paul J; Hysi, Pirro G; Hewitt, Alex W; Khor, Chiea Chuen; Pasquale, Louis R; Montgomery, Grant W; Klaver, Caroline C W; Aung, Tin; Pfeiffer, Norbert; Mackey, David A; Hammond, Christopher J; Cheng, Ching-Yu; Craig, Jamie E; Rabinowitz, Yaron S; Wiggs, Janey L; Burdon, Kathryn P; van Duijn, Cornelia M; MacGregor, Stuart

    2018-05-14

    Central corneal thickness (CCT) is a highly heritable trait associated with complex eye diseases such as keratoconus and glaucoma. We perform a genome-wide association meta-analysis of CCT and identify 19 novel regions. In addition to adding support for known connective tissue-related pathways, pathway analyses uncover previously unreported gene sets. Remarkably, >20% of the CCT-loci are near or within Mendelian disorder genes. These included FBN1, ADAMTS2 and TGFB2 which associate with connective tissue disorders (Marfan, Ehlers-Danlos and Loeys-Dietz syndromes), and the LUM-DCN-KERA gene complex involved in myopia, corneal dystrophies and cornea plana. Using index CCT-increasing variants, we find a significant inverse correlation in effect sizes between CCT and keratoconus (r = -0.62, P = 5.30 × 10⁻⁵) but not between CCT and primary open-angle glaucoma (r = -0.17, P = 0.2). Our findings provide evidence for shared genetic influences between CCT and keratoconus, and implicate candidate genes acting in collagen and extracellular matrix regulation.

  8. Aligning a New Reference Genetic Map of Lupinus angustifolius with the Genome Sequence of the Model Legume, Lotus japonicus

    PubMed Central

    Nelson, Matthew N.; Moolhuijzen, Paula M.; Boersma, Jeffrey G.; Chudy, Magdalena; Lesniewska, Karolina; Bellgard, Matthew; Oliver, Richard P.; Święcicki, Wojciech; Wolko, Bogdan; Cowling, Wallace A.; Ellwood, Simon R.

    2010-01-01

    We have developed a dense reference genetic map of Lupinus angustifolius (2n = 40) based on a set of 106 publicly available recombinant inbred lines derived from a cross between domesticated and wild parental lines. The map comprised 1090 loci in 20 linkage groups and three small clusters, drawing together data from several previous mapping publications plus almost 200 new markers, of which 63 were gene-based markers. A total of 171 mainly gene-based, sequence-tagged site loci served as bridging points for comparing the Lu. angustifolius genome with the genome sequence of the model legume, Lotus japonicus via BLASTn homology searching. Comparative analysis indicated that the genomes of Lu. angustifolius and Lo. japonicus are highly diverged structurally but with significant regions of conserved synteny including the region of the Lu. angustifolius genome containing the pod-shatter resistance gene, lentus. We discuss the potential of synteny analysis for identifying candidate genes for domestication traits in Lu. angustifolius and in improving our understanding of Fabaceae genome evolution. PMID:20133394

  9. Identification and mapping of Sr46 from Aegilops tauschii accession CIae 25 conferring resistance to race TTKSK (Ug99) of wheat stem rust pathogen.

    PubMed

    Yu, Guotai; Zhang, Qijun; Friesen, Timothy L; Rouse, Matthew N; Jin, Yue; Zhong, Shaobin; Rasmussen, Jack B; Lagudah, Evans S; Xu, Steven S

    2015-03-01

    Mapping studies confirm that resistance to Ug99 race of stem rust pathogen in Aegilops tauschii accession CIae 25 is conditioned by Sr46 and markers linked to the gene were developed for marker-assisted selection. The race TTKSK (Ug99) of Puccinia graminis f. sp. tritici, the causal pathogen for wheat stem rust, is considered as a major threat to global wheat production. To address this threat, researchers across the world have been devoted to identifying TTKSK-resistant genes. Here, we report the identification and mapping of a stem rust resistance gene in Aegilops tauschii accession CIae 25 that confers resistance to TTKSK and the development of molecular markers for the gene. An F2 population of 710 plants from an Ae. tauschii cross CIae 25 × AL8/78 were first evaluated against race TPMKC. A set of 14 resistant and 116 susceptible F2:3 families from the F2 plants were then evaluated for their reactions to TTKSK. Based on the tests, 179 homozygous susceptible F2 plants were selected as the mapping population to identify the simple sequence repeat (SSR) and sequence tagged site (STS) markers linked to the gene by bulk segregant analysis. A dominant stem rust resistance gene was identified and mapped with 16 SSR and five new STS markers to the deletion bin 2DS5-0.47-1.00 of chromosome arm 2DS in which Sr46 was located. Molecular marker and stem rust tests on CIae 25 and two Ae. tauschii accessions carrying Sr46 confirmed that the gene in CIae 25 is Sr46. This study also demonstrated that Sr46 is temperature-sensitive being less effective at low temperatures. The marker validation indicated that two closely linked markers Xgwm210 and Xwmc111 can be used for marker-assisted selection of Sr46 in wheat breeding programs.

  10. Identification of Causal Genes, Networks, and Transcriptional Regulators of REM Sleep and Wake

    PubMed Central

    Millstein, Joshua; Winrow, Christopher J.; Kasarskis, Andrew; Owens, Joseph R.; Zhou, Lili; Summa, Keith C.; Fitzpatrick, Karrie; Zhang, Bin; Vitaterna, Martha H.; Schadt, Eric E.; Renger, John J.; Turek, Fred W.

    2011-01-01

    Study Objective: Sleep-wake traits are well-known to be under substantial genetic control, but the specific genes and gene networks underlying primary sleep-wake traits have largely eluded identification using conventional approaches, especially in mammals. Thus, the aim of this study was to use systems genetics and statistical approaches to uncover the genetic networks underlying 2 primary sleep traits in the mouse: 24-h duration of REM sleep and wake. Design: Genome-wide RNA expression data from 3 tissues (anterior cortex, hypothalamus, thalamus/midbrain) were used in conjunction with high-density genotyping to identify candidate causal genes and networks mediating the effects of 2 QTL regulating the 24-h duration of REM sleep and one regulating the 24-h duration of wake. Setting: Basic sleep research laboratory. Patients or Participants: Male [C57BL/6J × (BALB/cByJ × C57BL/6J*) F1] N2 mice (n = 283). Interventions: None. Measurements and Results: The genetic variation of a mouse N2 mapping cross was leveraged against sleep-state phenotypic variation as well as quantitative gene expression measurement in key brain regions using integrative genomics approaches to uncover multiple causal sleep-state regulatory genes, including several surprising novel candidates, which interact as components of networks that modulate REM sleep and wake. In particular, it was discovered that a core network module, consisting of 20 genes, involved in the regulation of REM sleep duration is conserved across the cortex, hypothalamus, and thalamus. A novel application of a formal causal inference test was also used to identify those genes directly regulating sleep via control of expression. Conclusion: Systems genetics approaches reveal novel candidate genes, complex networks and specific transcriptional regulators of REM sleep and wake duration in mammals. 
    Citation: Millstein J; Winrow CJ; Kasarskis A; Owens JR; Zhou L; Summa KC; Fitzpatrick K; Zhang B; Vitaterna MH; Schadt EE; Renger JJ; Turek FW. Identification of causal genes, networks, and transcriptional regulators of REM sleep and wake. SLEEP 2011;34(11):1469-1477. PMID:22043117

  11. Gastrointestinal microbiota and mucosal immune gene expression in neonatal pigs reared in a cross-fostering model.

    PubMed

    Maradiaga, Nidia; Aldridge, Brian; Zeineldin, Mohamed; Lowe, James

    2018-05-06

    Cross fostering is employed to equalize the number of piglets between litters, ensuring colostrum intake for their survival and growth. However, little is known about the impact of cross fostering on the intestinal microbiota and mucosal immune gene expression of the neonatal pig. The objective of this study was to determine the influence of maternal microbial communities on the gastrointestinal (GI) microbiota and mucosal immune gene expression in young pigs reared in a cross-fostering model. Piglets were given high quality colostrum from birth dam or foster dam upon birth. Twenty-four piglets were randomly assigned at birth to 1 of 3 treatments according to colostrum source and post-colostral milk feeding, as follows: treatment 1 (n = 8), received colostrum and post-colostral milk feeding from their own dam; treatment 2 (n = 8), received colostrum from foster dam and returned to their own dam for post-colostral milk feeding; and treatment 3 (n = 8), received colostrum and post-colostral milk feeding from foster dam. Genomic DNA was extracted, and the V1-V3 hypervariable region of the bacterial 16S rRNA gene was amplified and sequenced using the Illumina MiSeq platform. Quantitative real-time PCR analysis was also performed to quantify the expression of toll-like receptors (TLR) 2, TLR 4, TLR 10, tumor necrosis factor alpha (TNFα), interferon gamma (IFNγ), and interleukin (IL) 4 and IL 10. Data analysis revealed that microbial communities varied according to the GI biogeographical location, with colon being the most diverse section. Bacterial communities in both maternal colostrum and vaginal samples were significantly associated with those present in the fecal samples of piglets. Cross-fostering did not affect bacterial communities present in the piglet GI tract. However, the mRNA expression of TLR and inflammatory cytokines changed (P < 0.05) with biogeographical location in the GI tract. Higher mRNA expression of TLR and inflammatory cytokines was observed in ileum and ileum associated lymph tissues. This study suggests an impact of colostrum and maternal microbial communities on the microbiota development and mucosal immune gene expression in the newly born piglet. This study revealed novel information about the distribution and expression patterns of TLR and inflammatory cytokines in the GI tract of the young pig. Future studies are needed to determine the role and clinical importance of the mucosal microbiota and mucosal gene expression in health, productivity, and susceptibility to the development of GI disease, in piglets. Published by Elsevier Ltd.

  12. Global gene expression profile of peripheral blood mononuclear cells challenged with Theileria annulata in crossbred and indigenous cattle.

    PubMed

    Kumar, Amod; Gaur, Gyanendra Kumar; Gandham, Ravi Kumar; Panigrahi, Manjit; Ghosh, Shrikant; Saravanan, B C; Bhushan, Bharat; Tiwari, Ashok Kumar; Sulabh, Sourabh; Priya, Bhuvana; V N, Muhasin Asaf; Gupta, Jay Prakash; Wani, Sajad Ahmad; Sahu, Amit Ranjan; Sahoo, Aditya Prasad

    2017-01-01

    Bovine tropical theileriosis is an important haemoprotozoan disease associated with high rates of morbidity and mortality, particularly in exotic and crossbred cattle. It is one of the major constraints of the livestock development programmes in India and Southeast Asia. Indigenous cattle (Bos indicus) are reported to be comparatively less affected than exotic and crossbred cattle. However, the genetic basis of resistance to tropical theileriosis in indigenous cattle is not well documented. Recent studies suggested that differentially expressed genes in exotic and indigenous cattle play a significant role in breed-specific resistance to tropical theileriosis. The present study was designed to determine the global gene expression profile in peripheral blood mononuclear cells derived from indigenous (Tharparkar) and cross-bred cattle following in vitro infection with T. annulata (Parbhani strain). Two separate microarray experiments were carried out, one each for cross-bred and Tharparkar cattle. The cross-bred cattle showed 1082 differentially expressed genes (DEGs). Out of the total DEGs, 597 genes were down-regulated and 485 were up-regulated. Their fold change varied from 2283.93 to -4816.02. Tharparkar cattle showed 875 differentially expressed genes, including 451 down-regulated and 424 up-regulated. The fold change varied from 94.93 to -19.20. A subset of genes was validated by qRT-PCR, and the results correlated well with the microarray data, indicating that the microarray results provided an accurate report of transcript levels. Functional annotation study of DEGs confirmed their involvement in various pathways including response to oxidative stress, immune system regulation, cell proliferation, cytoskeletal changes, kinase activity and apoptosis. Gene network analysis of these DEGs plays an important role in understanding the interactions among genes. It is therefore hypothesized that the different susceptibility to tropical theileriosis exhibited by indigenous and crossbred cattle is due to breed-specific differences in the interaction of infected cells with other immune cells, which ultimately influence the immune response mounted against T. annulata infection. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Symbiosis and the origin of eukaryotic motility

    NASA Technical Reports Server (NTRS)

    Margulis, L.; Hinkle, G.

    1991-01-01

    Ongoing work to test the hypothesis of the origin of eukaryotic cell organelles by microbial symbioses is discussed. Because of the widespread acceptance of the serial endosymbiotic theory (SET) of the origin of plastids and mitochondria, the idea of the symbiotic origin of the centrioles and axonemes from spirochete bacteria motility symbiosis was tested. Intracellular microtubular systems are purported to derive from symbiotic associations between ancestral eukaryotic cells and motile bacteria. Four lines of approach to this problem are being pursued: (1) cloning the gene of a tubulin-like protein discovered in Spirochaeta bajacaliforniensis; (2) seeking axoneme proteins in spirochetes by antibody cross-reaction; (3) attempting to cultivate larger, free-living spirochetes; and (4) studying in detail spirochetes (e.g., Cristispira) symbiotic with marine animals. Other aspects of the investigation are presented.

  14. No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes.

    PubMed

    Johnson, Emma C; Border, Richard; Melroy-Greif, Whitney E; de Leeuw, Christiaan A; Ehringer, Marissa A; Keller, Matthew C

    2017-11-15

    A recent analysis of 25 historical candidate gene polymorphisms for schizophrenia in the largest genome-wide association study conducted to date suggested that these commonly studied variants were no more associated with the disorder than would be expected by chance. However, the same study identified other variants within those candidate genes that demonstrated genome-wide significant associations with schizophrenia. As such, it is possible that variants within historic schizophrenia candidate genes are associated with schizophrenia at levels above those expected by chance, even if the most-studied specific polymorphisms are not. The present study used association statistics from the largest schizophrenia genome-wide association study conducted to date as input to a gene set analysis to investigate whether variants within schizophrenia candidate genes are enriched for association with schizophrenia. As a group, variants in the most-studied candidate genes were no more associated with schizophrenia than were variants in control sets of noncandidate genes. While a small subset of candidate genes did appear to be significantly associated with schizophrenia, these genes were not particularly noteworthy given the large number of more strongly associated noncandidate genes. The history of schizophrenia research should serve as a cautionary tale to candidate gene investigators examining other phenotypes: our findings indicate that the most investigated candidate gene hypotheses of schizophrenia are not well supported by genome-wide association studies, and it is likely that this will be the case for other complex traits as well. Copyright © 2017 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  15. Hox genes and study of Hox genes in crustacean

    NASA Astrophysics Data System (ADS)

    Hou, Lin; Chen, Zhijuan; Xu, Mingyu; Lin, Shengguo; Wang, Lu

    2004-12-01

    Homeobox genes have been discovered in many species. These genes are known to play a major role in specifying regional identity along the anterior-posterior axis of animals from a wide range of phyla. The products of the homeotic genes are a set of evolutionarily conserved transcription factors that control elaborate developmental processes and specify cell fates in metazoans. Crustaceans, which present a variety of body plans not encountered in any other class or phylum of the Metazoa, have been shown to possess a single set of homologous Hox genes, as insects do. The ancestral crustacean Hox gene complex comprised ten genes: eight homologous to the homeotic Hox genes and two related to nonhomeotic genes present within the insect Hox complexes. Crustaceans in particular exhibit an abundant diversity of segment specialization and tagmosis. This morphological diversity relates to the Hox genes. In the crustacean body plan, different Hox genes control different segments and tagmosis.

  16. EnsMart: A Generic System for Fast and Flexible Access to Biological Data

    PubMed Central

    Kasprzyk, Arek; Keefe, Damian; Smedley, Damian; London, Darin; Spooner, William; Melsopp, Craig; Hammond, Martin; Rocca-Serra, Philippe; Cox, Tony; Birney, Ewan

    2004-01-01

    The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to `non-Ensembl' data sets. PMID:14707178

  17. A method for constructing single-copy lac fusions in Salmonella typhimurium and its application to the hemA-prfA operon.

    PubMed

    Elliott, T

    1992-01-01

    This report describes a set of Escherichia coli and Salmonella typhimurium strains that permits the reversible transfer of lac fusions between a plasmid and either bacterial chromosome. The system relies on homologous recombination in an E. coli recD host for transfer from plasmid to chromosome. This E. coli strain carries the S. typhimurium put operon inserted into trp, and the resulting fusions are of the form trp::put::[Kanr-X-lac], where X is the promoter or gene fragment under study. The put homology flanks the lac fusion segment, so that fusions can be transduced into S. typhimurium, replacing the resident put operon. Subsequent transduction into an S. typhimurium strain with a large chromosomal deletion covering put allows selection for recombinants that inherit the fusion on a plasmid. A transposable version of the put operon was constructed and used to direct lac fusions to novel locations, including the F plasmid and the ara locus. Transductional crosses between strains with fusions bearing different segments of the hemA-prfA operon were used to determine the contribution of the hemA promoter region to expression of the prfA gene and other genes downstream of hemA in S. typhimurium.

  18. Validating internal controls for quantitative plant gene expression studies

    PubMed Central

    Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H

    2004-01-01

    Background Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Results Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Conclusion Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments. PMID:15317655

  19. Genotypic and Phenotypic Detection of AmpC β-lactamases in Enterobacter spp. Isolated from a Teaching Hospital in Malaysia.

    PubMed

    Mohd Khari, Fatin Izzati; Karunakaran, Rina; Rosli, Roshalina; Tee Tay, Sun

    2016-01-01

    The objective of this study was to determine the occurrence of chromosomal and plasmid-mediated AmpC β-lactamase genes in a collection of Malaysian isolates of Enterobacter species. Several phenotypic tests for detection of AmpC production of Enterobacter spp. were evaluated and the agreements between tests were determined. Antimicrobial susceptibility profiles for 117 Enterobacter clinical isolates obtained from the Medical Microbiology Diagnostic Laboratory, University Malaya Medical Centre, Malaysia, from November 2012 to February 2014 were determined in accordance with CLSI guidelines. AmpC genes were detected using a multiplex PCR assay targeting the MIR/ACT gene (closely related to chromosomal EBC family gene) and other plasmid-mediated genes, including DHA, MOX, CMY, ACC, and FOX. The AmpC β-lactamase production of the isolates was assessed using the cefoxitin disk screening test, the D69C AmpC detection set, the cefoxitin-cloxacillin double disk synergy test (CC-DDS) and the AmpC induction test. Among the Enterobacter isolates in this study, 39.3% were resistant to cefotaxime and ceftriaxone and 23.9% were resistant to ceftazidime. Ten (8.5%) of the isolates were resistant to cefepime, and one isolate was resistant to meropenem. A chromosomal EBC family gene was amplified from 36 (47.4%) E. cloacae and three (25%) E. asburiae. A novel blaDHA type plasmid-mediated AmpC gene was identified for the first time from an E. cloacae isolate. AmpC β-lactamase production was detected in 99 (89.2%) of 111 potential AmpC β-lactamase producers (positive in cefoxitin disk screening) using the D69C AmpC detection set. The detection rates were lower with CC-DDS (80.2%) and AmpC induction tests (50.5%). There was low agreement between the D69C AmpC detection set and the other two phenotypic tests. Of the 40 isolates with AmpC genes detected in this study, 87.5%, 77.5% and 50.0% were positive by the D69C AmpC detection set, CC-DDS and AmpC induction tests, respectively. 
Besides the MIR/ACT gene, a novel plasmid-mediated AmpC gene belonging to the DHA-type was identified in this study. Low agreement was noted between the D69C AmpC detection set and two other phenotypic tests for detection of AmpC production in Enterobacter spp. As plasmid-mediated genes may serve as the reservoir for the emergence of antibiotic resistance in a clinical setting, surveillance and infection control measures are necessary to limit the spread of these genes in the hospital.

  20. Genotypic and Phenotypic Detection of AmpC β-lactamases in Enterobacter spp. Isolated from a Teaching Hospital in Malaysia

    PubMed Central

    Mohd Khari, Fatin Izzati; Karunakaran, Rina; Rosli, Roshalina; Tee Tay, Sun

    2016-01-01

    Objectives The objective of this study was to determine the occurrence of chromosomal and plasmid-mediated AmpC β-lactamase genes in a collection of Malaysian isolates of Enterobacter species. Several phenotypic tests for detection of AmpC production of Enterobacter spp. were evaluated and the agreements between tests were determined. Methods Antimicrobial susceptibility profiles for 117 Enterobacter clinical isolates obtained from the Medical Microbiology Diagnostic Laboratory, University Malaya Medical Centre, Malaysia, from November 2012 to February 2014 were determined in accordance with CLSI guidelines. AmpC genes were detected using a multiplex PCR assay targeting the MIR/ACT gene (closely related to chromosomal EBC family gene) and other plasmid-mediated genes, including DHA, MOX, CMY, ACC, and FOX. The AmpC β-lactamase production of the isolates was assessed using the cefoxitin disk screening test, the D69C AmpC detection set, the cefoxitin-cloxacillin double disk synergy test (CC-DDS) and the AmpC induction test. Results Among the Enterobacter isolates in this study, 39.3% were resistant to cefotaxime and ceftriaxone and 23.9% were resistant to ceftazidime. Ten (8.5%) of the isolates were resistant to cefepime, and one isolate was resistant to meropenem. A chromosomal EBC family gene was amplified from 36 (47.4%) E. cloacae and three (25%) E. asburiae. A novel blaDHA type plasmid-mediated AmpC gene was identified for the first time from an E. cloacae isolate. AmpC β-lactamase production was detected in 99 (89.2%) of 111 potential AmpC β-lactamase producers (positive in cefoxitin disk screening) using the D69C AmpC detection set. The detection rates were lower with CC-DDS (80.2%) and AmpC induction tests (50.5%). There was low agreement between the D69C AmpC detection set and the other two phenotypic tests. 
Of the 40 isolates with AmpC genes detected in this study, 87.5%, 77.5% and 50.0% were positive by the D69C AmpC detection set, CC-DDS and AmpC induction tests, respectively. Conclusions Besides the MIR/ACT gene, a novel plasmid-mediated AmpC gene belonging to the DHA-type was identified in this study. Low agreement was noted between the D69C AmpC detection set and two other phenotypic tests for detection of AmpC production in Enterobacter spp. As plasmid-mediated genes may serve as the reservoir for the emergence of antibiotic resistance in a clinical setting, surveillance and infection control measures are necessary to limit the spread of these genes in the hospital. PMID:26963619
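
    The record above reports low agreement between phenotypic tests but does not say which agreement statistic was used. As a minimal sketch, assuming Cohen's kappa (one common choice for two binary tests), agreement could be computed as follows. The function name and the eight isolate results are hypothetical and are not taken from the study.

```python
def cohen_kappa(test_a, test_b):
    """Cohen's kappa for two equal-length sequences of 0/1 test results."""
    if len(test_a) != len(test_b) or len(test_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(test_a)
    # Proportion of isolates on which the two tests give the same call.
    observed = sum(a == b for a, b in zip(test_a, test_b)) / n
    # Agreement expected by chance, from each test's positive rate.
    pos_a = sum(test_a) / n
    pos_b = sum(test_b) / n
    expected = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if expected == 1.0:
        # Both tests gave one identical constant call; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical results for eight isolates (1 = positive, 0 = negative).
detection_set = [1, 1, 1, 1, 0, 0, 0, 0]
induction_test = [1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohen_kappa(detection_set, induction_test)
```

    Kappa corrects raw percent agreement for the agreement two tests would reach by chance, which matters when most isolates are positive by both tests.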

  1. Resistance Against Basil Downy Mildew in Ocimum Species.

    PubMed

    Ben-Naim, Yariv; Falach, Lidan; Cohen, Yigal

    2015-06-01

    Downy mildew, caused by the oomycete Peronospora belbahrii, is a devastating disease of sweet basil. In this study, 113 accessions of Ocimum species (83 Plant Introduction entries and 30 commercial entries) were tested for resistance against downy mildew at the seedling stage in growth chambers, and during three seasons, in the field. Most entries belonging to O. basilicum were highly susceptible whereas most entries belonging to O. americanum, O. kilimandscharicum, O. gratissimum, O. campechianum, or O. tenuiflorum were highly resistant both at the seedling stage and in the field. Twenty-seven highly resistant individual plants were each crossed with the susceptible sweet basil 'Peri', and the F1 progeny plants were examined for disease resistance. The F1 plants of two crosses were highly resistant, F1 plants of 24 crosses were moderately resistant, and F1 plants of one cross were susceptible, suggesting full, partial, or no dominance of the resistance gene(s), respectively. These data confirm the feasibility of producing downy mildew-resistant cultivars of sweet basil by crossing with wild Ocimum species.

  2. A Cross-Sectional Study of Shared Attention by Children with Autism and Typically Developing Children in an Inclusive Preschool Setting

    ERIC Educational Resources Information Center

    Rice, Catherine E.; Adamson, Lauren B.; Winner, Ellen; McGee, Gail G.

    2016-01-01

    This study examined the ways in which young children with autism and typical children focus their engagement with objects and people (peers and adults) in an inclusive preschool setting. A cross-sectional analysis was conducted of 30 typical children and 30 children with autism, with 10 different children from each group at 3 different ages (2, 3,…

  3. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-08

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Establishing the role of rare coding variants in known Parkinson's disease risk loci.

    PubMed

    Jansen, Iris E; Gibbs, J Raphael; Nalls, Mike A; Price, T Ryan; Lubbe, Steven; van Rooij, Jeroen; Uitterlinden, André G; Kraaij, Robert; Williams, Nigel M; Brice, Alexis; Hardy, John; Wood, Nicholas W; Morris, Huw R; Gasser, Thomas; Singleton, Andrew B; Heutink, Peter; Sharma, Manu

    2017-11-01

    Many common genetic factors have been identified to contribute to Parkinson's disease (PD) susceptibility, improving our understanding of the related underlying biological mechanisms. The involvement of rarer variants in these loci has been poorly studied. Using International Parkinson's Disease Genomics Consortium data sets, we performed a comprehensive study to determine the impact of rare variants in 23 previously published genome-wide association studies (GWAS) loci in PD. We applied Prix fixe to select the putative causal genes underneath the GWAS peaks, which was based on underlying functional similarities. The Sequence Kernel Association Test was used to analyze the joint effect of rare, common, or both types of variants on PD susceptibility. All genes were tested simultaneously as a gene set and each gene individually. We observed a moderate association of common variants, confirming the involvement of the known PD risk loci within our genetic data sets. Focusing on rare variants, we identified additional association signals for LRRK2, STBD1, and SPATA19. Our study suggests an involvement of rare variants within several putatively causal genes underneath previously identified PD GWAS peaks. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  6. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.

    Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  7. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    DOE PAGES

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...

    2014-01-01

    Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  8. Mining the archives: a cross-platform analysis of gene expression profiles in archival formalin-fixed paraffin-embedded (FFPE) tissue.

    EPA Science Inventory

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation...

  9. Cross-talk between Msx/Dlx homeobox genes and vitamin D during tooth mineralization.

    PubMed

    Lézot, F; Descroix, V; Mesbah, M; Hotton, D; Blin, C; Papagerakis, P; Mauro, N; Kato, S; MacDougall, M; Sharpe, P; Berdal, A

    2002-01-01

    Rickets is associated with site-specific disorders of enamel and dentin formation, which may reflect the impact of vitamin D on a morphogenetic pathway. This study is devoted to potential cross-talk between vitamin D and Msx/Dlx transcription factors. We raised the question of a potential link between tooth defects seen in mice with rickets and Msx2 gene misexpression, using mutant mice lacking the nuclear vitamin D receptor as an animal model. Our data showed a modulation of Msx2 expression. In order to search for a functional impact of this Msx2 misexpression secondary to rickets, we focused our attention on osteocalcin as a target gene for both vitamin D and Msx2. Combining Msx2 overexpression and vitamin D addition in vitro, we showed an inhibitory effect on osteocalcin expression in immortalized MO6-G3 odontoblasts. Finally, in the same cells, such combinations appeared to modulate VDR expression, outlining the existence of complex cross-regulations between vitamin D and Msx/Dlx pathways.

  10. Estrogen-related receptor alpha is critical for the growth of estrogen receptor-negative breast cancer

    PubMed Central

    Stein, Rebecca A.; Chang, Ching-yi; Kazmin, Dmitri A.; Way, James; Schroeder, Thies; Wergin, Melanie; Dewhirst, Mark W.; McDonnell, Donald P.

    2009-01-01

    Expression of estrogen-related receptor alpha (ERRα) has recently been shown to carry negative prognostic significance in breast and ovarian cancers. The specific role of this orphan nuclear receptor in tumor growth and progression, however, is yet to be fully understood. The significant homology between estrogen receptor alpha (ERα) and ERRα initially suggested that these receptors may have similar transcriptional targets. Using the well-characterized ERα-positive MCF-7 breast cancer cell line, we sought to gain a genome-wide picture of ERα-ERRα cross-talk using an unbiased microarray approach. In addition to generating a host of novel ERRα target genes, this study yielded the surprising result that most ERRα-regulated genes are unrelated to estrogen-signaling. The relatively small number of genes regulated by both ERα and ERRα led us to expand our study to the more aggressive and less clinically treatable ERα-negative class of breast cancers. In this setting we found that ERRα expression is required for the basal level of expression of many known and novel ERRα target genes. Introduction of an siRNA directed to ERRα into the highly aggressive breast carcinoma MDA-MB-231 cell line dramatically reduced the migratory potential of these cells. Although stable knockdown of ERRα expression in MDA-MB-231 cells had no impact on in vitro cell proliferation, a significant reduction of tumor growth rate was observed when these cells were implanted as xenografts. Our results confirm a role for ERRα in breast cancer growth and highlight it as a potential therapeutic target for estrogen receptor-negative breast cancer. PMID:18974123

  11. A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa.

    PubMed

    Ficklin, Stephen P; Feltus, Frank Alex

    2013-01-01

    Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. 
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.

  12. A Systems-Genetics Approach and Data Mining Tool to Assist in the Discovery of Genes Underlying Complex Traits in Oryza sativa

    PubMed Central

    Ficklin, Stephen P.; Feltus, Frank Alex

    2013-01-01

    Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. 
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance. PMID:23874666

  13. Breeding maize for silage and biofuel production, an illustration of a step forward with the genome sequence.

    PubMed

    Barrière, Yves; Courtial, Audrey; Chateigner-Boutin, Anne-Laure; Denoue, Dominique; Grima-Pettenati, Jacqueline

    2016-01-01

    The knowledge of the gene families mostly impacting cell wall digestibility variations would significantly increase the efficiency of marker-assisted selection when breeding maize and grass varieties with improved silage feeding value and/or with better straw fermentability into alcohol or methane. The maize genome sequence of the B73 inbred line was released at the end of 2009, opening up new avenues to identify the genetic determinants of quantitative traits. Colocalizations between a large set of candidate genes putatively involved in secondary cell wall assembly and QTLs for cell wall digestibility (IVNDFD) were then investigated, considering physical positions of both genes and QTLs. Based on available data from six RIL progenies, 59 QTLs corresponding to 38 non-overlapping positions were matched up with a list of 442 genes distributed all over the genome. Altogether, 176 genes colocalized with IVNDFD QTLs and most often, several candidate genes colocalized at each QTL position. Frequent QTL colocalizations were found firstly with genes encoding ZmMYB and ZmNAC transcription factors, and secondly with genes encoding zinc finger, bHLH, and xylogen regulation factors. In contrast, close colocalizations were less frequent with genes involved in monolignol biosynthesis, and found only with the C4H2, CCoAOMT5, and CCR1 genes. Close colocalizations were also infrequent with genes involved in cell wall feruloylation and cross-linkages. Altogether, investigated colocalizations between candidate genes and cell wall digestibility QTLs suggested a prevalent role of regulation factors over constitutive cell wall genes on digestibility variations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
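
    The colocalization step described above compares physical positions of candidate genes and QTLs. A minimal sketch of that idea, assuming each gene and QTL is reduced to a chromosome plus a start and end coordinate, is an interval-overlap check. The record type, function name, identifiers, and coordinates below are hypothetical; the abstract does not give the authors' actual procedure or distance thresholds.

```python
from collections import namedtuple

# Hypothetical record: a named interval on a chromosome (coordinates in bp).
Interval = namedtuple("Interval", "name chrom start end")


def colocalized(genes, qtls):
    """Map each QTL name to the genes whose span overlaps the QTL interval."""
    hits = {q.name: [] for q in qtls}
    for q in qtls:
        for g in genes:
            # Two closed intervals overlap when each starts before the other ends.
            if g.chrom == q.chrom and g.start <= q.end and q.start <= g.end:
                hits[q.name].append(g.name)
    return hits


# Made-up coordinates, for illustration only.
genes = [
    Interval("geneA", "chr1", 100, 200),
    Interval("geneB", "chr1", 900, 1000),
    Interval("geneC", "chr2", 150, 250),
]
qtls = [
    Interval("qtl1", "chr1", 150, 500),
    Interval("qtl2", "chr2", 300, 400),
]
result = colocalized(genes, qtls)
```

    The nested loop is quadratic in the number of intervals, which is acceptable at the scale mentioned in the abstract (442 genes, 38 QTL positions); larger sets would usually call for sorted intervals or an interval tree.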

  14. Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV).

    PubMed

    Piette, Elizabeth R; Moore, Jason H

    2018-01-01

    Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly in the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole. We propose a new cross validation method, proportional instance cross validation (PICV), that preserves the original distribution of an independent variable when splitting the data set into training and testing partitions. We apply PICV to simulated GWAS data with epistatic interactions of varying minor allele frequencies and prevalences and compare performance to that of a traditional cross validation procedure in which individuals are randomly allocated to training and testing partitions. Sensitivity and positive predictive value are significantly improved across all tested scenarios for PICV compared to traditional cross validation. We also apply PICV to GWAS data from a study of primary open-angle glaucoma to investigate a previously reported interaction, which fails to significantly replicate; PICV, however, improves the consistency of testing and training results. Application of traditional machine learning procedures to biomedical data may require modifications to better suit intrinsic characteristics of the data, such as the potential for highly imbalanced genotype distributions in the case of epistasis detection. The reproducibility of genetic interaction findings can be improved by considering this variable imbalance in cross validation implementation, such as with PICV. This approach may be extended to problems in other domains in which imbalanced variable distributions are a concern.
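
    The idea of preserving the distribution of an independent variable across cross validation partitions can be sketched as below. This is not the authors' PICV implementation, whose details the abstract does not give. It is a minimal stand-in that groups samples by the value of one variable (for example, a combined interaction genotype) and deals each group evenly across folds. The function names and the example genotype labels are assumptions.

```python
import random
from collections import defaultdict


def proportional_folds(values, k, seed=0):
    """Assign sample indices to k folds so that each fold holds roughly
    the same share of every distinct value in `values`."""
    rng = random.Random(seed)
    by_value = defaultdict(list)
    for idx, v in enumerate(values):
        by_value[v].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so leftovers spread out
    for v in sorted(by_value):
        members = by_value[v]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def train_test_splits(values, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = proportional_folds(values, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical, imbalanced genotype labels: 50 / 40 / 10 samples.
genotypes = ["AA/BB"] * 50 + ["Aa/Bb"] * 40 + ["aa/bb"] * 10
for train, test in train_test_splits(genotypes, k=5):
    print(len(train), len(test))
```

    With purely random allocation, a rare genotype class can end up concentrated in one or two folds; dealing each class separately keeps the per-fold counts within one sample of each other.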

  15. Novel Gene Expression Profile of Women with Intrinsic Skin Youthfulness by Whole Transcriptome Sequencing

    PubMed Central

    Xu, Jin; Spitale, Robert C.; Guan, Linna; Flynn, Ryan A.; Torre, Eduardo A.; Li, Rui; Raber, Inbar; Qu, Kun; Kern, Dale; Knaggs, Helen E.; Chang, Howard Y.; Chang, Anne Lynn S.

    2016-01-01

    While much is known about genes that promote aging, little is known about genes that protect against or prevent aging, particularly in human skin. The main objective of this study was to perform an unbiased, whole transcriptome search for genes that associate with intrinsic skin youthfulness. To accomplish this, healthy women (n = 122) of European descent, ages 18–89 years with Fitzpatrick skin type I/II were examined for facial skin aging parameters and clinical covariates, including smoking and ultraviolet exposure. Skin youthfulness was defined as the top 10% of individuals whose assessed skin aging features were most discrepant with their chronological ages. Skin biopsies from sun-protected inner arm were subjected to 3'-end sequencing for expression quantification, with results verified by quantitative reverse transcriptase-polymerase chain reaction. Unbiased clustering revealed gene expression signatures characteristic of older women with skin youthfulness (n = 12) compared to older women without skin youthfulness (n = 33), after accounting for gene expression changes associated with chronological age alone. Gene set analysis was performed using Genomica open-access software. This study identified a novel set of candidate skin youthfulness genes demonstrating differences between the skin youthfulness (SY) and non-SY groups, including pleckstrin homology like domain family A member 1 (PHLDA1) (p = 2.4 × 10⁻⁵), a follicle stem cell marker, and hyaluronan synthase 2-anti-sense 1 (HAS2-AS1) (p = 0.00105), a non-coding RNA that is part of the hyaluronan synthesis pathway. We show that immunologic gene sets are the most significantly altered in skin youthfulness (with the most significant gene set p = 2.4 × 10⁻⁵), suggesting the immune system plays an important role in skin youthfulness, a finding that has not previously been recognized. These results are a valuable resource from which multiple future studies may be undertaken to better understand the mechanisms that promote skin youthfulness in humans. PMID:27829007

  16. Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma

    PubMed Central

    2013-01-01

    Background: Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype, then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes so that both types of information can be leveraged optimally. Results: In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent, while top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1-targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. Conclusions: In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432

  17. Identification, characterization and expression analysis of lineage-specific genes within sweet orange (Citrus sinensis).

    PubMed

    Xu, Yuantao; Wu, Guizhi; Hao, Baohai; Chen, Lingling; Deng, Xiuxin; Xu, Qiang

    2015-11-23

    With the availability of a rapidly increasing number of genome and transcriptome sequences, lineage-specific genes (LSGs) can be identified and characterized. Like other conserved functional genes, LSGs play important roles in biological evolution and functions. Two sets of citrus LSGs, 296 citrus-specific genes (CSGs) and 1039 orphan genes specific to sweet orange, were identified by comparative analysis between the sweet orange genome sequences and 41 genomes and 273 transcriptomes. With the two sets of genes, gene structure and gene expression pattern were investigated. On average, both the CSGs and orphan genes have fewer exons, shorter gene length and higher GC content when compared with evolutionarily conserved genes (ECs). Expression profiling indicated that most of the LSGs were expressed in various tissues of sweet orange and some of them exhibited distinct temporal and spatial expression patterns. In particular, the orphan genes were preferentially expressed in callus, which is an important pluripotent tissue of citrus. In addition, some of the CSGs and orphan genes were expressed in response to abiotic stress, indicating their potential functions during interaction with the environment. This study identified and characterized two sets of LSGs in citrus, dissected their sequence features and expression patterns, and provided valuable clues for future functional analysis of the LSGs in sweet orange.

  18. Identification and qualification of 500 nuclear, single-copy, orthologous genes for the Eupulmonata (Gastropoda) using transcriptome sequencing and exon capture.

    PubMed

    Teasdale, Luisa C; Köhler, Frank; Murray, Kevin D; O'Hara, Tim; Moussalli, Adnan

    2016-09-01

    The qualification of orthology is a significant challenge when developing large, multiloci phylogenetic data sets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts and mis-indexing, which pose problems to automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete phylogenetic data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon capture targeting 490 of the 500 genes (those with at least one exon >120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts resulting in lower data matrix completeness when considering the original 500 genes. This could potentially explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and 'Agalma-equivalent' data set (sharing 458 genes). Overall, our study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a range of evolutionary depths and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design. © 2016 John Wiley & Sons Ltd.

  19. Clinical interpretation of pathogenic ATM and CHEK2 variants on multigene panel tests: navigating moderate risk.

    PubMed

    West, Allison H; Blazer, Kathleen R; Stoll, Jessica; Jones, Matthew; Weipert, Caroline M; Nielsen, Sarah M; Kupfer, Sonia S; Weitzel, Jeffrey N; Olopade, Olufunmilayo I

    2018-02-14

    Comprehensive genomic cancer risk assessment (GCRA) helps patients, family members, and providers make informed choices about cancer screening, surgical and chemotherapeutic risk reduction, and genetically targeted cancer therapies. The increasing availability of multigene panel tests for clinical applications allows testing of well-defined high-risk genes, as well as moderate-risk genes, for which the penetrance and spectrum of cancer risk are less well characterized. Moderate-risk genes are defined as genes that, when altered by a pathogenic variant, confer a two- to fivefold relative risk of cancer. Two such genes included on many comprehensive cancer panels are the DNA repair genes ATM and CHEK2, best known for moderately increased risk of breast cancer development. However, the impact of screening and preventative interventions and the spectrum of cancer risk beyond breast cancer associated with ATM and/or CHEK2 variants remain less well characterized. We convened a large, multidisciplinary, cross-sectional panel of GCRA clinicians to review challenging, peer-submitted cases of patients identified with ATM or CHEK2 variants. This paper summarizes the inter-professional case discussion and recommendations generated during the session, the level of concordance with respect to recommendations between the academic and community clinician participants for each case, and potential barriers to implementing recommended care in various practice settings.

  20. Everything you always wanted to know about sex ... in flies.

    PubMed

    Arbeitman, M N; Kopp, Artyom; Siegal, M L; Van Doren, M

    2010-01-01

    'Everything you always wanted to know about sex' is a workshop organized as part of the annual Drosophila Research Conference of the Genetics Society of America. This workshop provides an intellectual venue for interaction among research groups that study sexual dimorphism from the molecular, evolutionary, genomic, and behavioral perspectives. The speakers summarize the key ideas behind their research for people working in other fields, outline unsolved questions, and offer their opinions about future directions. The 2010 workshop highlighted the power of the Drosophila model for understanding sexual dimorphism at levels ranging from cell biology and gene regulation to population genetics and genome evolution, and demonstrated the importance of cross-disciplinary interactions in the study of sex. In this respect, Drosophila sets a good example for research in other organisms, including humans and their mammalian relatives. Copyright © 2010 S. Karger AG, Basel.
