Sample records for multiple gene models

  1. Developing Pedagogical Tools to Improve Teaching Multiple Models of the Gene in High School

    ERIC Educational Resources Information Center

    Auckaraaree, Nantaya

    2013-01-01

    Multiple models of the gene are used to explore genetic phenomena in scientific practices and in the classroom. In genetics curricula, the classical and molecular models are presented in disconnected domains. Research demonstrates that, without explicit connections, students have difficulty developing an understanding of the gene that spans…

  2. Patterns of evolution at the gametophytic self-incompatibility Sorbus aucuparia (Pyrinae) S pollen genes support the non-self recognition by multiple factors model.

    PubMed

    Aguiar, Bruno; Vieira, Jorge; Cunha, Ana E; Fonseca, Nuno A; Reboiro-Jato, David; Reboiro-Jato, Miguel; Fdez-Riverola, Florentino; Raspé, Olivier; Vieira, Cristina P

    2013-05-01

    S-RNase-based gametophytic self-incompatibility evolved once before the split of the Asteridae and Rosidae. In Prunus (tribe Amygdaloideae of Rosaceae), the self-incompatibility S-pollen is a single F-box gene that presents the expected evolutionary signatures. In Malus and Pyrus (subtribe Pyrinae of Rosaceae), however, clusters of F-box genes (called SFBBs) have been described that are expressed in pollen only and are linked to the S-RNase gene. Although polymorphic, SFBB genes present levels of diversity lower than those of the S-RNase gene. They have been suggested as putative S-pollen genes, in a system of non-self recognition by multiple factors. Subsets of allelic products of the different SFBB genes interact with non-self S-RNases, marking them for degradation, and allowing compatible pollinations. This study performed a detailed characterization of SFBB genes in Sorbus aucuparia (Pyrinae) to address three predictions of the non-self recognition by multiple factors model. As predicted, the number of SFBB genes was large to account for the many S-RNase specificities. Secondly, like the S-RNase gene, the SFBB genes were old. Thirdly, amino acids under positive selection (those that could be involved in specificity determination) were identified when intra-haplotype SFBB genes were analysed using codon models. Overall, the findings reported here support the non-self recognition by multiple factors model.

  3. JRmGRN: Joint reconstruction of multiple gene regulatory networks with common hub genes using data from multiple tissues or conditions.

    PubMed

    Deng, Wenping; Zhang, Kui; Liu, Sanzhen; Zhao, Patrick; Xu, Shizhong; Wei, Hairong

    2018-04-30

    Joint reconstruction of multiple gene regulatory networks (GRNs) using gene expression data from multiple tissues/conditions is very important for understanding common and tissue/condition-specific regulation. However, there are currently no computational models and methods available for directly constructing such multiple GRNs that not only share some common hub genes but also possess tissue/condition-specific regulatory edges. In this paper, we proposed a new graphic Gaussian model for joint reconstruction of multiple gene regulatory networks (JRmGRN), which highlighted hub genes, using gene expression data from several tissues/conditions. Under the framework of the Gaussian graphical model, the JRmGRN method constructs the GRNs by maximizing a penalized log-likelihood function. We formulated it as a convex optimization problem, and then solved it with an alternating direction method of multipliers (ADMM) algorithm. The performance of JRmGRN was first evaluated with synthetic data and the results showed that JRmGRN outperformed several other methods for reconstruction of GRNs. We also applied our method to real Arabidopsis thaliana RNA-seq data from two light regime conditions in comparison with other methods, and both common hub genes and some condition-specific hub genes were identified with higher accuracy and precision. JRmGRN is available as an R program from: https://github.com/wenpingd. Contact: hairong@mtu.edu. Proof of theorem, derivation of algorithm and supplementary data are available at Bioinformatics online.

  4. Patterns of evolution at the gametophytic self-incompatibility Sorbus aucuparia (Pyrinae) S pollen genes support the non-self recognition by multiple factors model

    PubMed Central

    Aguiar, Bruno; Vieira, Jorge; Cunha, Ana E.; Fonseca, Nuno A.; Reboiro-Jato, David; Reboiro-Jato, Miguel; Fdez-Riverola, Florentino; Raspé, Olivier; Vieira, Cristina P.

    2013-01-01

    S-RNase-based gametophytic self-incompatibility evolved once before the split of the Asteridae and Rosidae. In Prunus (tribe Amygdaloideae of Rosaceae), the self-incompatibility S-pollen is a single F-box gene that presents the expected evolutionary signatures. In Malus and Pyrus (subtribe Pyrinae of Rosaceae), however, clusters of F-box genes (called SFBBs) have been described that are expressed in pollen only and are linked to the S-RNase gene. Although polymorphic, SFBB genes present levels of diversity lower than those of the S-RNase gene. They have been suggested as putative S-pollen genes, in a system of non-self recognition by multiple factors. Subsets of allelic products of the different SFBB genes interact with non-self S-RNases, marking them for degradation, and allowing compatible pollinations. This study performed a detailed characterization of SFBB genes in Sorbus aucuparia (Pyrinae) to address three predictions of the non-self recognition by multiple factors model. As predicted, the number of SFBB genes was large to account for the many S-RNase specificities. Secondly, like the S-RNase gene, the SFBB genes were old. Thirdly, amino acids under positive selection—those that could be involved in specificity determination—were identified when intra-haplotype SFBB genes were analysed using codon models. Overall, the findings reported here support the non-self recognition by multiple factors model. PMID:23606363

  5. Multiple abiotic stimuli are integrated in the regulation of rice gene expression under field conditions.

    PubMed

    Plessis, Anne; Hafemeister, Christoph; Wilkins, Olivia; Gonzaga, Zennia Jean; Meyer, Rachel Sarah; Pires, Inês; Müller, Christian; Septiningsih, Endang M; Bonneau, Richard; Purugganan, Michael

    2015-11-26

    Plants rely on transcriptional dynamics to respond to multiple climatic fluctuations and contexts in nature. We analyzed the genome-wide gene expression patterns of rice (Oryza sativa) growing in rainfed and irrigated fields during two distinct tropical seasons and determined simple linear models that relate transcriptomic variation to climatic fluctuations. These models combine multiple environmental parameters to account for patterns of expression in the field of co-expressed gene clusters. We examined the similarities of our environmental models between tropical and temperate field conditions, using previously published data. We found that field type and macroclimate had broad impacts on transcriptional responses to environmental fluctuations, especially for genes involved in photosynthesis and development. Nevertheless, variation in solar radiation and temperature at the timescale of hours had reproducible effects across environmental contexts. These results provide a basis for broad-based predictive modeling of plant gene expression in the field.
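
    The "simple linear models" that relate expression to several environmental parameters can be illustrated with an ordinary least-squares fit. The following is only a minimal sketch: the function name `fit_linear_model`, the choice of two predictors (temperature and solar radiation), and all numbers are assumptions made for illustration, not values or methods taken from the study.

```python
# Hypothetical sketch: ordinary least squares with an intercept and two
# environmental predictors. All data below are synthetic, not from the study.

def fit_linear_model(rows, y):
    """Solve the normal equations (X^T X) b = X^T y for the coefficients b."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Synthetic hourly observations: (temperature in deg C, solar radiation in kW/m^2).
environment = [(24.0, 0.0), (26.0, 0.2), (30.0, 0.6),
               (32.0, 0.9), (29.0, 0.4), (25.0, 0.05)]
true_coef = [2.0, 0.1, 1.5]  # assumed intercept and slopes, illustration only
# Noise-free mean expression of one hypothetical co-expressed gene cluster.
expression = [true_coef[0] + true_coef[1] * t + true_coef[2] * r
              for t, r in environment]

coef = fit_linear_model(environment, expression)
print([round(c, 6) for c in coef])
```

    Because the synthetic responses are noise-free, the fit recovers the assumed coefficients; with real field data the residuals would indicate how much expression variation the environmental parameters leave unexplained.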

  6. Multiple abiotic stimuli are integrated in the regulation of rice gene expression under field conditions

    PubMed Central

    Plessis, Anne; Hafemeister, Christoph; Wilkins, Olivia; Gonzaga, Zennia Jean; Meyer, Rachel Sarah; Pires, Inês; Müller, Christian; Septiningsih, Endang M; Bonneau, Richard; Purugganan, Michael

    2015-01-01

    Plants rely on transcriptional dynamics to respond to multiple climatic fluctuations and contexts in nature. We analyzed the genome-wide gene expression patterns of rice (Oryza sativa) growing in rainfed and irrigated fields during two distinct tropical seasons and determined simple linear models that relate transcriptomic variation to climatic fluctuations. These models combine multiple environmental parameters to account for patterns of expression in the field of co-expressed gene clusters. We examined the similarities of our environmental models between tropical and temperate field conditions, using previously published data. We found that field type and macroclimate had broad impacts on transcriptional responses to environmental fluctuations, especially for genes involved in photosynthesis and development. Nevertheless, variation in solar radiation and temperature at the timescale of hours had reproducible effects across environmental contexts. These results provide a basis for broad-based predictive modeling of plant gene expression in the field. DOI: http://dx.doi.org/10.7554/eLife.08411.001 PMID:26609814

  7. Trainable Gene Regulation Networks with Applications to Drosophila Pattern Formation

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric

    2000-01-01

    This chapter will very briefly introduce and review some computational experiments in using trainable gene regulation network models to simulate and understand selected episodes in the development of the fruit fly, Drosophila melanogaster. For details the reader is referred to the papers introduced below. It will then introduce a new gene regulation network model which can describe promoter-level substructure in gene regulation. As described in chapter 2, gene regulation may be thought of as a combination of cis-acting regulation by the extended promoter of a gene (including all regulatory sequences) by way of the transcription complex, and of trans-acting regulation by the transcription factor products of other genes. If we simplify the cis-action by using a phenomenological model which can be tuned to data, such as a unit or other small portion of an artificial neural network, then the full transacting interaction between multiple genes during development can be modelled as a larger network which can again be tuned or trained to data. The larger network will in general need to have recurrent (feedback) connections since at least some real gene regulation networks do. This is the basic modeling approach taken, which describes how a set of recurrent neural networks can be used as a modeling language for multiple developmental processes including gene regulation within a single cell, cell-cell communication, and cell division. Such network models have been called "gene circuits", "gene regulation networks", or "genetic regulatory networks", sometimes without distinguishing the models from the actual modeled systems.
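
    The idea of treating each gene as a small neural-network unit inside a recurrent network can be sketched as follows. This is a generic illustration, assuming the commonly used form dv_i/dt = g(sum_j T_ij v_j + h_i) - lambda_i v_i with a sigmoid g; the function names, the two-gene mutual-repression example, and every parameter value are hypothetical and are not taken from the chapter.

```python
import math

def sigmoid(u):
    """Saturating response of one gene to its summed regulatory input."""
    return 1.0 / (1.0 + math.exp(-u))

def simulate_gene_circuit(T, h, decay, v0, dt=0.01, steps=2000):
    """Euler-integrate dv_i/dt = sigmoid(sum_j T[i][j]*v[j] + h[i]) - decay[i]*v[i]."""
    v = list(v0)
    n = len(v)
    for _ in range(steps):
        u = [sum(T[i][j] * v[j] for j in range(n)) + h[i] for i in range(n)]
        v = [v[i] + dt * (sigmoid(u[i]) - decay[i] * v[i]) for i in range(n)]
    return v

# Hypothetical two-gene circuit in which each gene represses the other.
T = [[0.0, -8.0],
     [-8.0, 0.0]]          # T[i][j]: effect of gene j's product on gene i
h = [4.0, 4.0]             # basal input to each gene
decay = [1.0, 1.0]         # first-order decay rates
final = simulate_gene_circuit(T, h, decay, v0=[0.6, 0.4])
print([round(x, 3) for x in final])
```

    The feedback (recurrent) connections in T are what allow the small initial difference between the two genes to be amplified into a stable on/off pattern; training such a model would mean adjusting T, h and decay to fit expression data.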

  8. Action of multiple intra-QTL genes concerted around a co-localized transcription factor underpins a large effect QTL

    PubMed Central

    Dixit, Shalabh; Kumar Biswal, Akshaya; Min, Aye; Henry, Amelia; Oane, Rowena H.; Raorane, Manish L.; Longkumer, Toshisangba; Pabuayon, Isaiah M.; Mutte, Sumanth K.; Vardarajan, Adithi R.; Miro, Berta; Govindan, Ganesan; Albano-Enriquez, Blesilda; Pueffeld, Mandy; Sreenivasulu, Nese; Slamet-Loedin, Inez; Sundarvelpandian, Kalaipandian; Tsai, Yuan-Ching; Raghuvanshi, Saurabh; Hsing, Yue-Ie C.; Kumar, Arvind; Kohli, Ajay

    2015-01-01

    Sub-QTLs and multiple intra-QTL genes are hypothesized to underpin large-effect QTLs. Known QTLs over gene families, biosynthetic pathways or certain traits represent functional gene-clusters of genes of the same gene ontology (GO). Gene-clusters containing genes of different GO have not been elaborated, except in silico as coexpressed genes within QTLs. Here we demonstrate the requirement of multiple intra-QTL genes for the full impact of QTL qDTY12.1 on rice yield under drought. Multiple lines of evidence are presented for the need of the transcription factor ‘no apical meristem’ (OsNAM12.1) and its co-localized target genes of separate GO categories for qDTY12.1 function, raising a regulon-like model of genetic architecture. The molecular underpinnings of qDTY12.1 support its effectiveness in further improving a drought tolerant genotype and its validity in multiple genotypes/ecosystems/environments. Resolving the combinatorial value of OsNAM12.1 with individual intra-QTL genes notwithstanding, identification and analyses of qDTY12.1 have fast-tracked rice improvement towards food security. PMID:26507552

  9. Discovery of cancer common and specific driver gene sets

    PubMed Central

    2017-01-01

    Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  10. A latent variable approach to study gene-environment interactions in the presence of multiple correlated exposures.

    PubMed

    Sánchez, Brisa N; Kang, Shan; Mukherjee, Bhramar

    2012-06-01

    Many existing cohort studies initially designed to investigate disease risk as a function of environmental exposures have collected genomic data in recent years with the objective of testing for gene-environment interaction (G × E) effects. In environmental epidemiology, interest in G × E arises primarily after a significant effect of the environmental exposure has been documented. Cohort studies often collect rich exposure data; as a result, assessing G × E effects in the presence of multiple exposure markers further increases the burden of multiple testing, an issue already present in both genetic and environmental health studies. Latent variable (LV) models have been used in environmental epidemiology to reduce dimensionality of the exposure data, gain power by reducing multiplicity issues via condensing exposure data, and avoid collinearity problems due to the presence of multiple correlated exposures. We extend the LV framework to characterize gene-environment interaction in the presence of multiple correlated exposures and genotype categories. Further, similar to what has been done in case-control G × E studies, we use the assumption of gene-environment (G-E) independence to boost the power of tests for interaction. The consequences of making this assumption, and the issue of how to explicitly model G-E association, have not been previously investigated in LV models. We postulate a hierarchy of assumptions about the LV model regarding the different forms of G-E dependence and show that making such assumptions may influence inferential results on the G, E, and G × E parameters. We implement a class of shrinkage estimators to data-adaptively trade off between the most restrictive and the most flexible forms of the G-E dependence assumption and note that such a class of compromise estimators can serve as a benchmark of model adequacy in LV models. We demonstrate the methods with an example from the Early Life Exposures in Mexico City to Neuro-Toxicants Study of lead exposure, iron metabolism genes, and birth weight. © 2011, The International Biometric Society.

  11. Genotet: An Interactive Web-based Visual Exploration Framework to Support Validation of Gene Regulatory Networks.

    PubMed

    Yu, Bowen; Doraiswamy, Harish; Chen, Xi; Miraldi, Emily; Arrieta-Ortiz, Mario Luis; Hafemeister, Christoph; Madar, Aviv; Bonneau, Richard; Silva, Cláudio T

    2014-12-01

    Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and among the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies as well as advances in computational biology have led to multiple large regulatory network models (directed networks), each with a large corpus of supporting data and gene annotation. There are multiple possible biological motivations for exploring large regulatory network models, including: validating TF-target gene relationships, figuring out co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies for the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacterium (a small genome with high data-completeness).

  12. Statistical mechanical model of coupled transcription from multiple promoters due to transcription factor titration

    PubMed Central

    Rydenfelt, Mattias; Cox, Robert Sidney; Garcia, Hernan; Phillips, Rob

    2014-01-01

    Transcription factors (TFs) with regulatory action at multiple promoter targets are the rule rather than the exception, with examples ranging from the cAMP receptor protein (CRP) in E. coli that regulates hundreds of different genes simultaneously to situations involving multiple copies of the same gene, such as plasmids, retrotransposons, or highly replicated viral DNA. When the number of TFs heavily exceeds the number of binding sites, TF binding to each promoter can be regarded as independent. However, when the number of TF molecules is comparable to the number of binding sites, TF titration will result in correlation (“promoter entanglement”) between transcription of different genes. We develop a statistical mechanical model which takes the TF titration effect into account and use it to predict both the level of gene expression for a general set of promoters and the resulting correlation in transcription rates of different genes. Our results show that the TF titration effect could be important for understanding gene expression in many regulatory settings. PMID:24580252
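
    The contrast between the independent-binding regime and the titration regime can be illustrated with a toy partition function. The abstract does not spell out the model's form, so the sketch below is an assumption throughout: a fixed pool of TFs shares a set of identical binding sites, and a single hypothetical Boltzmann weight `x` per binding event stands in for the binding energy and the nonspecific background.

```python
from math import comb, factorial

def mean_occupancy(n_tf, n_sites, x):
    """Mean fraction of occupied sites when n_tf TFs compete for n_sites sites.

    x is an assumed Boltzmann weight for one TF-site binding event relative
    to the unbound state (toy parameter, not taken from the paper).
    """
    weights = []
    for k in range(min(n_tf, n_sites) + 1):
        # Choose k sites, choose k TFs, and pair them up in k! ways.
        weights.append(comb(n_sites, k) * comb(n_tf, k) * factorial(k) * x ** k)
    z = sum(weights)  # partition function over the number of bound sites
    return sum(k * w for k, w in enumerate(weights)) / (z * n_sites)

# TFs in large excess: the single site behaves as if it bound independently.
excess = mean_occupancy(n_tf=1000, n_sites=1, x=0.001)
# TFs scarce relative to sites: occupancy is capped by the TF supply.
titrated = mean_occupancy(n_tf=5, n_sites=10, x=1e6)
print(excess, titrated)
```

    In the scarce-TF case the occupancy cannot exceed n_tf / n_sites however strong the binding is, which is the sense in which sites sharing one TF pool stop behaving independently.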

  13. GeneNetFinder2: Improved Inference of Dynamic Gene Regulatory Relations with Multiple Regulators.

    PubMed

    Han, Kyungsook; Lee, Jeonghoon

    2016-01-01

    A gene involved in complex regulatory interactions may have multiple regulators since gene expression in such interactions is often controlled by more than one gene. Another thing that makes gene regulatory interactions complicated is that regulatory interactions are not static, but change over time during the cell cycle. Most research so far has focused on identifying gene regulatory relations between individual genes in a particular stage of the cell cycle. In this study we developed a method for identifying dynamic gene regulations of several types from the time-series gene expression data. The method can find gene regulations with multiple regulators that work in combination or individually as well as those with single regulators. The method has been implemented as the second version of GeneNetFinder (hereafter called GeneNetFinder2) and tested on several gene expression datasets. Experimental results with gene expression data revealed the existence of genes that are not regulated by individual genes but rather by a combination of several genes. Such gene regulatory relations cannot be found by conventional methods. Our method finds such regulatory relations as well as those with multiple, independent regulators or single regulators, and represents gene regulatory relations as a dynamic network in which different gene regulatory relations are shown in different stages of the cell cycle. GeneNetFinder2 is available at http://bclab.inha.ac.kr/GeneNetFinder and will be useful for modeling dynamic gene regulations with multiple regulators.

  14. Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index

    PubMed Central

    Tsai, Yu-Shuen; Aguan, Kripamoy; Pal, Nikhil R.; Chung, I-Fang

    2011-01-01

    Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identification of both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot identify multiple-class specific signature genes easily. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small as well as when the class sizes are significantly different. To compare the effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data. Our method gives a promising way to find genes that may be involved in pathways of multiple diseases and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases. PMID:21909426

  15. Conceptual Variation or Incoherence? Textbook Discourse on Genes in Six Countries

    NASA Astrophysics Data System (ADS)

    Gericke, Niklas M.; Hagberg, Mariana; dos Santos, Vanessa Carvalho; Joaquim, Leyla Mariane; El-Hani, Charbel N.

    2014-02-01

    The aim of this paper is to investigate in a systematic and comparative way previous results of independent studies on the treatment of genes and gene function in high school textbooks from six different countries. We analyze how the conceptual variation within the scientific domain of Genetics regarding gene function models and gene concepts is transformed via the didactic transposition into school science textbooks. The results indicate that a common textbook discourse on genes and their function exists in textbooks from the different countries. The structure of science as represented by conceptual variation and the use of multiple models was present in all the textbooks. However, the existence of conceptual variation and multiple models is implicit in these textbooks, i.e., the phenomenon of conceptual variation and multiple models is not addressed explicitly, nor are its consequences, and thus it ends up introducing conceptual incoherence about the gene concept and its function within the textbooks. We conclude that within the textbook discourse found, ontological aspects of the academic disciplines of genetics and molecular biology were retained, but without their epistemological underpinnings; these are lost in the didactic transposition. These results are of interest since students might have problems reconstructing the correct scientific understanding from the transformed school science knowledge as depicted within the high school textbooks. Implications for textbook writing as well as teaching are discussed in the paper.

  16. Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis.

    PubMed

    Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V

    2016-11-01

    There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.

  17. Assessment of the Toxicity of CuO Nanoparticles by Using Saccharomyces cerevisiae Mutants with Multiple Genes Deleted

    PubMed Central

    Bao, Shaopan; Lu, Qicong; Dai, Heping; Zhang, Chao

    2015-01-01

    To develop applicable and susceptible models to evaluate the toxicity of nanoparticles, the antimicrobial effects of CuO nanoparticles (CuO-NPs) on various Saccharomyces cerevisiae (S. cerevisiae) strains (wild type, single-gene-deleted mutants, and multiple-gene-deleted mutants) were determined and compared. Further experiments were also conducted to analyze the mechanisms associated with toxicity using copper salt, bulk CuO (bCuO), carbon-shelled copper nanoparticles (C/Cu-NPs), and carbon nanoparticles (C-NPs) for comparisons. The results indicated that the growth inhibition rates of CuO-NPs for the wild-type and the single-gene-deleted strains were comparable, while for the multiple-gene deletion mutant, significantly higher toxicity was observed (P < 0.05). When the toxicity of the CuO-NPs to yeast cells was compared with the toxicities of copper salt and bCuO, we concluded that the toxicity of CuO-NPs should be attributed to soluble copper rather than to the nanoparticles. The striking difference in adverse effects of C-NPs and C/Cu-NPs with equivalent surface areas also proved this. A toxicity assay revealed that the multiple-gene-deleted mutant was significantly more sensitive to CuO-NPs than the wild type. Specifically, compared with the wild-type strain, copper was readily taken up by mutant strains when cell permeability genes were knocked out, and the mutants with deletions of genes regulated under oxidative stress (OS) were likely producing more reactive oxygen species (ROS). Hence, as mechanism-based gene inactivation could increase the susceptibility of yeast, the multiple-gene-deleted mutants should be improved model organisms to investigate the toxicity of nanoparticles. PMID:26386067

  18. Clinical and multiple gene expression variables in survival analysis of breast cancer: Analysis with the hypertabastic survival model

    PubMed Central

    2012-01-01

    Background: We explore the benefits of applying a new proportional hazard model to analyze survival of breast cancer patients. As a parametric model, the hypertabastic survival model offers a closer fit to experimental data than Cox regression, and furthermore provides explicit survival and hazard functions which can be used as additional tools in the survival analysis. In addition, one of our main concerns is utilization of multiple gene expression variables. Our analysis treats the important issue of interaction of different gene signatures in the survival analysis. Methods: The hypertabastic proportional hazards model was applied in survival analysis of breast cancer patients. This model was compared, using statistical measures of goodness of fit, with models based on the semi-parametric Cox proportional hazards model and the parametric log-logistic and Weibull models. The explicit functions for hazard and survival were then used to analyze the dynamic behavior of hazard and survival functions. Results: The hypertabastic model provided the best fit among all the models considered. Use of multiple gene expression variables also provided a considerable improvement in the goodness of fit of the model, as compared to use of only one. By utilizing the explicit survival and hazard functions provided by the model, we were able to determine the magnitude of the maximum rate of increase in hazard, and the maximum rate of decrease in survival, as well as the times when these occurred. We explore the influence of each gene expression variable on these extrema. Furthermore, in the cases of continuous gene expression variables, represented by a measure of correlation, we were able to investigate the dynamics with respect to changes in gene expression. Conclusions: We observed that use of three different gene signatures in the model provided a greater combined effect and allowed us to assess the relative importance of each in determination of outcome in this data set. These results point to the potential to combine gene signatures to a greater effect in cases where each gene signature represents some distinct aspect of the cancer biology. Furthermore, we conclude that the hypertabastic survival models can be an effective survival analysis tool for breast cancer patients. PMID:23241496
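
    How explicit survival and hazard functions can be used to locate extrema such as the time of fastest decline in survival can be sketched numerically. The abstract does not give the hypertabastic functional form, so this sketch substitutes the log-logistic model, one of the parametric comparators it names; the parameter values and function names are hypothetical.

```python
def loglogistic_survival(t, alpha, beta):
    """S(t) = 1 / (1 + (t/alpha)^beta) for scale alpha and shape beta."""
    return 1.0 / (1.0 + (t / alpha) ** beta)

def loglogistic_hazard(t, alpha, beta):
    """h(t) = (beta/alpha) (t/alpha)^(beta-1) / (1 + (t/alpha)^beta)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + (t / alpha) ** beta)

def time_of_steepest_survival_drop(alpha, beta, t_max, n=10000):
    """Grid search for the time at which S(t) falls fastest.

    The rate of decrease of S(t) is -dS/dt = h(t) * S(t), so the search
    maximizes that product over an evenly spaced grid on (0, t_max].
    """
    grid = [t_max * (i + 1) / n for i in range(n)]
    return max(grid, key=lambda t: loglogistic_hazard(t, alpha, beta)
               * loglogistic_survival(t, alpha, beta))

# Hypothetical parameters (time in years); not estimated from any data set.
alpha, beta = 5.0, 2.0
t_star = time_of_steepest_survival_drop(alpha, beta, t_max=20.0)
print(round(t_star, 2), round(loglogistic_survival(t_star, alpha, beta), 3))
```

    For the log-logistic model with beta > 1 this time also has a closed form, alpha * ((beta - 1) / (beta + 1)) ** (1 / beta), which the grid search should approximate; the same numerical approach applies to any model whose hazard and survival functions are available explicitly.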

  19. Mixture models for detecting differentially expressed genes in microarrays.

    PubMed

    Jones, Liat Ben-Tovim; Bean, Richard; McLachlan, Geoffrey J; Zhu, Justin Xi

    2006-10-01

    An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
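
    The local FDR described above is the posterior probability that a gene belongs to the null (not differentially expressed) component of the mixture. The sketch below assumes a two-component normal mixture on a per-gene test score z with fixed, hypothetical parameters; in practice the prior probability and component densities would be estimated from the data, and the paper's own choice of densities may differ.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of a normal distribution at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0, null=(0.0, 1.0), alt=(3.0, 1.0)):
    """Posterior probability that a gene with score z is NOT differentially expressed.

    pi0 is the prior probability of the null component; `null` and `alt`
    are assumed (mean, sd) pairs for the two mixture components.
    """
    f0 = pi0 * normal_pdf(z, *null)
    f1 = (1.0 - pi0) * normal_pdf(z, *alt)
    return f0 / (f0 + f1)

def call_differentially_expressed(scores, pi0, threshold=0.2):
    """Decision rule: flag genes whose local FDR falls below the threshold."""
    return [local_fdr(z, pi0) < threshold for z in scores]

scores = [0.0, 1.0, 2.5, 5.0]  # hypothetical per-gene z-scores
print([round(local_fdr(z, pi0=0.9), 4) for z in scores])
print(call_differentially_expressed(scores, pi0=0.9))
```

    The threshold of 0.2 is an arbitrary illustration; lowering it trades more false negatives for fewer false discoveries.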

  20. Mitochondria, oligodendrocytes and inflammation in bipolar disorder: evidence from transcriptome studies points to intriguing parallels with multiple sclerosis

    PubMed Central

    Konradi, Christine; Sillivan, Stephanie E.; Clay, Hayley B.

    2011-01-01

    Gene expression studies of bipolar disorder (BPD) have shown changes in transcriptome profiles in multiple brain regions. Here we summarize the most consistent findings in the scientific literature, and compare them to data from schizophrenia (SZ) and major depressive disorder (MDD). The transcriptome profiles of all three disorders overlap, making the existence of a BPD-specific profile unlikely. Three groups of functionally related genes are consistently expressed at altered levels in BPD, SZ and MDD. Genes involved in energy metabolism and mitochondrial function are downregulated, genes involved in immune response and inflammation are upregulated, and genes expressed in oligodendrocytes are downregulated. Experimental paradigms for multiple sclerosis demonstrate a tight link between energy metabolism, inflammation and demyelination. These studies also show variabilities in the extent of oligodendrocyte stress, which can vary from a downregulation of oligodendrocyte genes, such as observed in psychiatric disorders, to cell death and brain lesions seen in multiple sclerosis. We conclude that experimental models of multiple sclerosis could be of interest for the research of BPD, SZ and MDD. PMID:21310238

  1. One-step generation of complete gene knockout mice and monkeys by CRISPR/Cas9-mediated gene editing with multiple sgRNAs.

    PubMed

    Zuo, Erwei; Cai, Yi-Jun; Li, Kui; Wei, Yu; Wang, Bang-An; Sun, Yidi; Liu, Zhen; Liu, Jiwei; Hu, Xinde; Wei, Wei; Huo, Xiaona; Shi, Linyu; Tang, Cheng; Liang, Dan; Wang, Yan; Nie, Yan-Hong; Zhang, Chen-Chen; Yao, Xuan; Wang, Xing; Zhou, Changyang; Ying, Wenqin; Wang, Qifang; Chen, Ren-Chao; Shen, Qi; Xu, Guo-Liang; Li, Jinsong; Sun, Qiang; Xiong, Zhi-Qi; Yang, Hui

    2017-07-01

    The CRISPR/Cas9 system is an efficient gene-editing method, but the majority of gene-edited animals showed mosaicism, with editing occurring only in a portion of cells. Here we show that a single gene or multiple genes can be completely knocked out in mouse and monkey embryos by zygotic injection of Cas9 mRNA and multiple adjacent single-guide RNAs (spaced 10-200 bp apart) that target only a single key exon of each gene. Phenotypic analysis of F0 mice following targeted deletion of eight genes on the Y chromosome individually demonstrated the robustness of this approach in generating knockout mice. Importantly, this approach delivers complete gene knockout at high efficiencies (100% on Arntl and 91% on Prrt2) in monkey embryos. Finally, we could generate a complete Prrt2 knockout monkey in a single step, demonstrating the usefulness of this approach in rapidly establishing gene-edited monkey models.

  2. Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling.

    PubMed

    Ando, Tatsuya; Suguro, Miyuki; Kobayashi, Takeshi; Seto, Masao; Honda, Hiroyuki

    2003-10-01

    A fuzzy neural network (FNN) using gene expression profile data can select combinations of genes from thousands of genes, and is applicable to predict outcome for cancer patients after chemotherapy. However, wide clinical heterogeneity reduces the accuracy of prediction. To overcome this problem, we have proposed an FNN system based on majoritarian decision using multiple noninferior models. We used transcriptional profiling data, which were obtained from "Lymphochip" DNA microarrays (http://llmpp.nih.gov/DLBCL), reported by Rosenwald (N Engl J Med 2002; 346: 1937-47). When the data were analyzed by our FNN system, accuracy (73.4%) of outcome prediction using only 1 FNN model with 4 genes was higher than that (68.5%) of the Cox model using 17 genes. Higher accuracy (91%) was obtained when an FNN system with 9 noninferior models, consisting of 35 independent genes, was used. The genes selected by the system included genes that are informative in the prognosis of Diffuse large B-cell lymphoma (DLBCL), such as genes showing an expression pattern similar to that of CD10 and BCL-6 or similar to that of IRF-4 and BCL-4. We classified 220 DLBCL patients into 5 groups using the prediction results of 9 FNN models. These groups may correspond to DLBCL subtypes. In group A containing half of the 220 patients, patients with poor outcome were found to satisfy 2 rules, i.e., high expression of MAX dimerization with high expression of unknown A (LC_26146), or high expression of MAX dimerization with low expression of unknown B (LC_33144). The present paper is the first to describe the multiple noninferior FNN modeling system. This system is a powerful tool for predicting outcome and classifying patients, and is applicable to other heterogeneous diseases.
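
    The "majoritarian decision" over several noninferior models can be pictured as a simple vote. The Python sketch below is only an illustration of that voting step: the three threshold rules stand in for trained FNN models, and the gene names, thresholds and patient values are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical stand-ins for trained models: each maps an expression
# profile (dict of gene -> value) to a "good" or "poor" outcome.
models = [
    lambda p: "poor" if p["gene_a"] > 1.0 else "good",
    lambda p: "poor" if p["gene_b"] < 0.5 else "good",
    lambda p: "poor" if p["gene_a"] + p["gene_c"] > 2.0 else "good",
]

patient = {"gene_a": 1.4, "gene_b": 0.9, "gene_c": 0.3}  # made-up values
votes = [model(patient) for model in models]
print(votes, "->", majority_vote(votes))
```

    With an odd number of models (such as the 9 used in the abstract) and two outcome classes, a tie cannot occur, so no tie-breaking rule is needed in this sketch.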

  3. ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

    PubMed Central

    2009-01-01

    Background Sequence identification of ESTs from non-model species offers distinct challenges particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST-unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
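
    The core idea of using expression correlation as evidence of identity can be sketched in a few lines of Python. This is a deliberate simplification: it assigns an unknown EST to the single best-correlated known gene, whereas the abstract describes a network of correlated clusters. The profiles, the names and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def suggest_identity(profile, known_profiles, min_r=0.9):
    """Return (gene, r) for the best-correlated known gene, or (None, r) if r < min_r."""
    gene, r = max(
        ((name, pearson(profile, ref)) for name, ref in known_profiles.items()),
        key=lambda pair: pair[1],
    )
    return (gene, r) if r >= min_r else (None, r)


# Made-up expression profiles across four arrays.
known = {"gene_A": [1.0, 2.0, 3.0, 4.0], "gene_B": [4.0, 1.0, 3.0, 2.0]}
unknown_est = [2.1, 3.9, 6.2, 8.0]

print(suggest_identity(unknown_est, known))
```

    Profiles with zero variance would cause a division by zero in this sketch and would need to be filtered out beforehand.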

  4. Integrative Analysis of Prognosis Data on Multiple Cancer Subtypes

    PubMed Central

    Liu, Jin; Huang, Jian; Zhang, Yawei; Lan, Qing; Rothman, Nathaniel; Zheng, Tongzhang; Ma, Shuangge

    2014-01-01

    Summary In cancer research, profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Cancer is diverse. Examining the similarity and difference in the genetic basis of multiple subtypes of the same cancer can lead to a better understanding of their connections and distinctions. Classic meta-analysis methods analyze each subtype separately and then compare analysis results across subtypes. Integrative analysis methods, in contrast, analyze the raw data on multiple subtypes simultaneously and can outperform meta-analysis methods. In this study, prognosis data on multiple subtypes of the same cancer are analyzed. An AFT (accelerated failure time) model is adopted to describe survival. The genetic basis of multiple subtypes is described using the heterogeneity model, which allows a gene/SNP to be associated with prognosis of some subtypes but not others. A compound penalization method is developed to identify genes that contain important SNPs associated with prognosis. The proposed method has an intuitive formulation and is realized using an iterative algorithm. Asymptotic properties are rigorously established. Simulation shows that the proposed method has satisfactory performance and outperforms a penalization-based meta-analysis method and a regularized thresholding method. An NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements is analyzed. Genes associated with the three major subtypes, namely DLBCL, FL, and CLL/SLL, are identified. The proposed method identifies genes that are different from alternatives and have important implications and satisfactory prediction performance. PMID:24766212

  5. Generation of gene-targeted mice using embryonic stem cells derived from a transgenic mouse model of Alzheimer's disease.

    PubMed

    Yamamoto, Satoshi; Ooshima, Yuki; Nakata, Mitsugu; Yano, Takashi; Matsuoka, Kunio; Watanabe, Sayuri; Maeda, Ryouta; Takahashi, Hideki; Takeyama, Michiyasu; Matsumoto, Yoshio; Hashimoto, Tadatoshi

    2013-06-01

    Gene-targeting technology using mouse embryonic stem (ES) cells has become the "gold standard" for analyzing gene functions and producing disease models. Recently, genetically modified mice with multiple mutations have increasingly been produced to study the interaction between proteins and polygenic diseases. However, introduction of an additional mutation into mice already harboring several mutations by conventional natural crossbreeding is an extremely time- and labor-intensive process. Moreover, to do so in mice with a complex genetic background, several years may be required if the genetic background is to be retained. Establishing ES cells from multiple-mutant mice, or disease-model mice with a complex genetic background, would offer a possible solution. Here, we report the establishment and characterization of novel ES cell lines from a mouse model of Alzheimer's disease (3xTg-AD mouse, Oddo et al. in Neuron 39:409-421, 2003) harboring 3 mutated genes (APPswe, TauP301L, and PS1M146V) and a complex genetic background. Thirty blastocysts were cultured and 15 stable ES cell lines (male: 11; female: 4) obtained. By injecting these ES cells into diploid or tetraploid blastocysts, we generated germline-competent chimeras. Subsequently, we confirmed that F1 mice derived from these animals showed similar biochemical and behavioral characteristics to the original 3xTg-AD mice. Furthermore, we introduced a gene-targeting vector into the ES cells and successfully obtained gene-targeted ES cells, which were then used to generate knockout mice for the targeted gene. These results suggest that the present methodology is effective for introducing an additional mutation into mice already harboring multiple mutated genes and/or a complex genetic background.

  6. The Role of Multiple Transcription Factors In Archaeal Gene Expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Charles J. Daniels

    2008-09-23

    Since the inception of this research program, the project has focused on two central questions: What is the relationship between the 'eukaryal-like' transcription machinery of archaeal cells and its counterparts in eukaryal cells? And, how does the archaeal cell control gene expression using its mosaic of eukaryal core transcription machinery and its bacterial-like transcription regulatory proteins? During the grant period we have addressed these questions using a variety of in vivo approaches and have sought to specifically define the roles of the multiple TATA binding protein (TBP) and TFIIB-like (TFB) proteins in controlling gene expression in Haloferax volcanii. H. volcanii was initially chosen as a model for the Archaea based on the availability of suitable genetic tools; however, later studies showed that all haloarchaea possessed multiple tbp and tfb genes, which led to the proposal that multiple TBP and TFB proteins may function in a manner similar to alternative sigma factors in bacterial cells. In vivo transcription and promoter analysis established a clear relationship between the promoter requirements of haloarchaeal genes and those of the eukaryal RNA polymerase II promoter. Studies on heat shock gene promoters, and the demonstration that specific tfb genes were induced by heat shock, provided the first indication that TFB proteins may direct expression of specific gene families. The construction of strains lacking tbp or tfb genes, coupled with the finding that many of these genes are differentially expressed under varying growth conditions, provided further support for this model. Genetic tools were also developed that led to the construction of insertion and deletion mutants, and a novel gene expression scheme was designed that allowed the controlled expression of these genes in vivo. More recent studies have used a whole genome array to examine the expression of these genes and we have established a linkage between the expression of specific tfb genes and the regulation of nitrogen metabolism and other global cellular responses.

  7. Origins of extrinsic variability in eukaryotic gene expression

    NASA Astrophysics Data System (ADS)

    Volfson, Dmitri; Marciniak, Jennifer; Blake, William J.; Ostroff, Natalie; Tsimring, Lev S.; Hasty, Jeff

    2006-02-01

    Variable gene expression within a clonal population of cells has been implicated in a number of important processes including mutation and evolution, determination of cell fates and the development of genetic disease. Recent studies have demonstrated that a significant component of expression variability arises from extrinsic factors thought to influence multiple genes simultaneously, yet the biological origins of this extrinsic variability have received little attention. Here we combine computational modelling with fluorescence data generated from multiple promoter-gene inserts in Saccharomyces cerevisiae to identify two major sources of extrinsic variability. One unavoidable source arising from the coupling of gene expression with population dynamics leads to a ubiquitous lower limit for expression variability. A second source, which is modelled as originating from a common upstream transcription factor, exemplifies how regulatory networks can convert noise in upstream regulator expression into extrinsic noise at the output of a target gene. Our results highlight the importance of the interplay of gene regulatory networks with population heterogeneity for understanding the origins of cellular diversity.

  8. Origins of extrinsic variability in eukaryotic gene expression

    NASA Astrophysics Data System (ADS)

    Volfson, Dmitri; Marciniak, Jennifer; Blake, William J.; Ostroff, Natalie; Tsimring, Lev S.; Hasty, Jeff

    2006-03-01

    Variable gene expression within a clonal population of cells has been implicated in a number of important processes including mutation and evolution, determination of cell fates and the development of genetic disease. Recent studies have demonstrated that a significant component of expression variability arises from extrinsic factors thought to influence multiple genes in concert, yet the biological origins of this extrinsic variability have received little attention. Here we combine computational modeling with fluorescence data generated from multiple promoter-gene inserts in Saccharomyces cerevisiae to identify two major sources of extrinsic variability. One unavoidable source arising from the coupling of gene expression with population dynamics leads to a ubiquitous noise floor in expression variability. A second source which is modeled as originating from a common upstream transcription factor exemplifies how regulatory networks can convert noise in upstream regulator expression into extrinsic noise at the output of a target gene. Our results highlight the importance of the interplay of gene regulatory networks with population heterogeneity for understanding the origins of cellular diversity.

  9. Rat Models of Cardiovascular Disease Demonstrate Distinctive Pulmonary Gene Expressions for Vascular Response Genes: Impact of Ozone Exposure

    EPA Science Inventory

    Comparative gene expression profiling of multiple tissues from rat strains with genetic predisposition to diverse cardiovascular diseases (CVD) can help decode the transcriptional program that governs organ-specific functions. We examined expressions of CVD genes in the lungs of ...

  10. A BAC-bacterial recombination method to generate physically linked multiple gene reporter DNA constructs.

    PubMed

    Maye, Peter; Stover, Mary Louise; Liu, Yaling; Rowe, David W; Gong, Shiaochin; Lichtler, Alexander C

    2009-03-13

    Reporter gene mice are valuable animal models for biological research providing a gene expression readout that can contribute to cellular characterization within the context of a developmental process. With the advancement of bacterial recombination techniques to engineer reporter gene constructs from BAC genomic clones and the generation of optically distinguishable fluorescent protein reporter genes, there is an unprecedented capability to engineer more informative transgenic reporter mouse models relative to what has been traditionally available. We demonstrate here our first effort on the development of a three stage bacterial recombination strategy to physically link multiple genes together with their respective fluorescent protein (FP) reporters in one DNA fragment. This strategy uses bacterial recombination techniques to: (1) subclone genes of interest into BAC linking vectors, (2) insert desired reporter genes into respective genes and (3) link different gene-reporters together. As proof of concept, we have generated a single DNA fragment containing the genes Trap, Dmp1, and Ibsp driving the expression of ECFP, mCherry, and Topaz FP reporter genes, respectively. Using this DNA construct, we have successfully generated transgenic reporter mice that retain two to three gene readouts. The three stage methodology to link multiple genes with their respective fluorescent protein reporter works with reasonable efficiency. Moreover, gene linkage allows for their common chromosomal integration into a single locus. However, the testing of this multi-reporter DNA construct by transgenesis does suggest that the linkage of two different genes together, despite their large size, can still create a positional effect. We believe that gene choice, genomic DNA fragment size and the presence of endogenous insulator elements are critical variables.

  11. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature.

    PubMed

    Ye, Ning; Yin, Hengfu; Liu, Jingjing; Dai, Xiaogang; Yin, Tongming

    2015-01-01

    The huge amount of gene expression data generated by microarray and next-generation sequencing technologies makes it challenging to extract biological meaning. When searching for coexpressed genes, the data mining process is largely affected by the selection of algorithms. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models, including the mean, the regression, the delegate, and the ensemble models, to identify the coexpressed genes, and enables the users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows for choosing multiple algorithms for analyzing the gene expression signatures.

  12. Interleukin 35 and Hepatocyte Growth Factor; as a novel combined immune gene therapy for Multiple Sclerosis disease.

    PubMed

    Moghadam, Samira; Erfanmanesh, Maryam; Esmaeilzadeh, Abdolreza

    2017-11-01

    Multiple Sclerosis, an autoimmune demyelinating disease of the Central Nervous System, is a chronic inflammatory condition that mostly affects young adults. Patients face functional loss and severe pain. Most current MS treatments focus on suppressing the immune response. Approved drugs suppress the inflammatory process, but there is no definitive cure for Multiple Sclerosis. Recent work has shown gene and cell therapy to be a hopeful approach to tissue regeneration. The authors propose a novel combined immune gene therapy for Multiple Sclerosis that uses the anti-inflammatory properties of Interleukin-35 and the remyelinating properties of Hepatocyte Growth Factor. In this hypothesis, the Interleukin-35 and Hepatocyte Growth Factor genes are introduced into Mesenchymal Stem Cells (MSCs) of an EAE mouse model via an adenovirus-based vector. It is expected that Interleukin-35 and Hepatocyte Growth Factor expressed from MSCs could be effective in the immunotherapy of Multiple Sclerosis. Copyright © 2017. Published by Elsevier Ltd.

  13. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.

  14. Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

    PubMed Central

    Miklós, István

    2009-01-01

    Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746

  15. Limit cycles in piecewise-affine gene network models with multiple interaction loops

    NASA Astrophysics Data System (ADS)

    Farcot, Etienne; Gouzé, Jean-Luc

    2010-01-01

    In this article, we consider piecewise affine differential equations modelling gene networks. We work with arbitrary decay rates, and under a local hypothesis expressed as an alignment condition of successive focal points. The interaction graph of the system may be rather complex (multiple intricate loops of any sign, multiple thresholds, etc.). Our main result is an alternative theorem showing that if a sequence of regions is periodically visited by trajectories, then under our hypotheses, there exists either a unique stable periodic solution, or the origin attracts all trajectories in this sequence of regions. This result greatly extends our previous work on a single negative feedback loop. We give several examples and simulations illustrating different cases.

  16. Circadian Enhancers Coordinate Multiple Phases of Rhythmic Gene Transcription In Vivo

    PubMed Central

    Fang, Bin; Everett, Logan J.; Jager, Jennifer; Briggs, Erika; Armour, Sean M.; Feng, Dan; Roy, Ankur; Gerhart-Hines, Zachary; Sun, Zheng; Lazar, Mitchell A.

    2014-01-01

    SUMMARY Mammalian transcriptomes display complex circadian rhythms with multiple phases of gene expression that cannot be accounted for by current models of the molecular clock. We have determined the underlying mechanisms by measuring nascent RNA transcription around the clock in mouse liver. Unbiased examination of eRNAs that cluster in specific circadian phases identified functional enhancers driven by distinct transcription factors (TFs). We further identify on a global scale the components of the TF cistromes that function to orchestrate circadian gene expression. Integrated genomic analyses also revealed novel mechanisms by which a single circadian factor controls opposing transcriptional phases. These findings shed new light on the diversity and specificity of TF function in the generation of multiple phases of circadian gene transcription in a mammalian organ. PMID:25416951

  17. Circadian enhancers coordinate multiple phases of rhythmic gene transcription in vivo.

    PubMed

    Fang, Bin; Everett, Logan J; Jager, Jennifer; Briggs, Erika; Armour, Sean M; Feng, Dan; Roy, Ankur; Gerhart-Hines, Zachary; Sun, Zheng; Lazar, Mitchell A

    2014-11-20

    Mammalian transcriptomes display complex circadian rhythms with multiple phases of gene expression that cannot be accounted for by current models of the molecular clock. We have determined the underlying mechanisms by measuring nascent RNA transcription around the clock in mouse liver. Unbiased examination of enhancer RNAs (eRNAs) that cluster in specific circadian phases identified functional enhancers driven by distinct transcription factors (TFs). We further identify on a global scale the components of the TF cistromes that function to orchestrate circadian gene expression. Integrated genomic analyses also revealed mechanisms by which a single circadian factor controls opposing transcriptional phases. These findings shed light on the diversity and specificity of TF function in the generation of multiple phases of circadian gene transcription in a mammalian organ.

  18. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their occurring probabilities in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Different existing meta-analysis methods were found to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way to deal with study heterogeneity.

  19. Endeavour update: a web resource for gene prioritization in multiple species

    PubMed Central

    Tranchevent, Léon-Charles; Barriot, Roland; Yu, Shi; Van Vooren, Steven; Van Loo, Peter; Coessens, Bert; De Moor, Bart; Aerts, Stein; Moreau, Yves

    2008-01-01

    Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and is now including numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis. PMID:18508807
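
    Steps (ii) and (iii) above, ranking candidates per data source and then merging the rankings, can be illustrated with a short Python sketch. The abstract does not say how the rankings are merged, so the sketch substitutes a plain mean-rank (Borda-style) aggregation, which should not be read as Endeavour's actual fusion procedure. The data-source names, gene names and scores are hypothetical.

```python
def rank_candidates(scores):
    """Turn similarity scores into ranks; rank 1 is the best candidate."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def merge_rankings(rankings):
    """Order genes by their mean rank across all per-source rankings."""
    genes = list(rankings[0])
    mean_rank = {g: sum(r[g] for r in rankings) / len(rankings) for g in genes}
    return sorted(genes, key=lambda g: mean_rank[g])


# Made-up similarity of each candidate to the training-set profile,
# one dictionary per (hypothetical) data source.
per_source_scores = {
    "expression":   {"g1": 0.9, "g2": 0.4, "g3": 0.7},
    "interactions": {"g1": 0.2, "g2": 0.8, "g3": 0.6},
    "text_mining":  {"g1": 0.5, "g2": 0.1, "g3": 0.9},
}

rankings = [rank_candidates(s) for s in per_source_scores.values()]
global_ranking = merge_rankings(rankings)
print(global_ranking)
```

    Mean-rank aggregation assumes every candidate is scored by every source; handling missing values would require an extra design decision that this sketch leaves out.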

  20. Evaluating aggregate effects of rare and common variants in the 1000 Genomes Project exon sequencing data using latent variable structural equation modeling.

    PubMed

    Nock, Nl; Zhang, Lx

    2011-11-29

    Methods that can evaluate aggregate effects of rare and common variants are limited. Therefore, we applied a two-stage approach to evaluate aggregate gene effects in the 1000 Genomes Project data, which contain 24,487 single-nucleotide polymorphisms (SNPs) in 697 unrelated individuals from 7 populations. In stage 1, we identified potentially interesting genes (PIGs) as those having at least one SNP meeting Bonferroni correction using univariate, multiple regression models. In stage 2, we evaluate aggregate PIG effects on trait, Q1, by modeling each gene as a latent construct, which is defined by multiple common and rare variants, using the multivariate statistical framework of structural equation modeling (SEM). In stage 1, we found that PIGs varied markedly between a randomly selected replicate (replicate 137) and 100 other replicates, with the exception of FLT1. In stage 1, collapsing rare variants decreased false positives but increased false negatives. In stage 2, we developed a good-fitting SEM model that included all nine genes simulated to affect Q1 (FLT1, KDR, ARNT, ELAV4, FLT4, HIF1A, HIF3A, VEGFA, VEGFC) and found that FLT1 had the largest effect on Q1 (βstd = 0.33 ± 0.05). Using replicate 137 estimates as population values, we found that the mean relative bias in the parameters (loadings, paths, residuals) and their standard errors across 100 replicates was on average, less than 5%. Our latent variable SEM approach provides a viable framework for modeling aggregate effects of rare and common variants in multiple genes, but more elegant methods are needed in stage 1 to minimize type I and type II error.
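
    The stage 1 screening rule can be made concrete with a small Python sketch. The number of SNPs (24,487) comes from the abstract; the family-wise significance level of 0.05, the gene names and the p-values are assumptions introduced only for illustration.

```python
N_SNPS = 24487          # number of SNPs reported in the abstract
ALPHA = 0.05            # assumed family-wise significance level
BONFERRONI_THRESHOLD = ALPHA / N_SNPS


def potentially_interesting_genes(snp_pvalues, threshold):
    """Return genes with at least one SNP whose p-value is below the threshold.

    snp_pvalues maps a gene name to the list of p-values of its SNPs.
    """
    return sorted(
        gene
        for gene, pvalues in snp_pvalues.items()
        if any(p < threshold for p in pvalues)
    )


# Made-up per-SNP p-values for three hypothetical genes.
example = {
    "GENE_A": [0.03, 1.2e-7, 0.4],
    "GENE_B": [0.001, 0.02],
    "GENE_C": [5.0e-6, 0.6],
}

print(f"threshold = {BONFERRONI_THRESHOLD:.3e}")
print(potentially_interesting_genes(example, BONFERRONI_THRESHOLD))
```

    A gene such as the hypothetical GENE_C, whose best SNP falls just short of the threshold, is excluded, which illustrates the false negatives in stage 1 that the abstract mentions.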

  1. Selection of higher order regression models in the analysis of multi-factorial transcription data.

    PubMed

    Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim

    2014-01-01

    Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.

  2. Spatial scaling and multi-model inference in landscape genetics: Martes americana in northern Idaho

    Treesearch

    Tzeidle N. Wasserman; Samuel A. Cushman; Michael K. Schwartz; David O. Wallin

    2010-01-01

    Individual-based analyses relating landscape structure to genetic distances across complex landscapes enable rigorous evaluation of multiple alternative hypotheses linking landscape structure to gene flow. We utilize two extensions to increase the rigor of the individual-based causal modeling approach to inferring relationships between landscape patterns and gene flow...

  3. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  4. Single and multiple phenotype QTL analyses of downy mildew resistance in interspecific grapevines.

    PubMed

    Divilov, Konstantin; Barba, Paola; Cadle-Davidson, Lance; Reisch, Bruce I

    2018-05-01

    Downy mildew resistance across days post-inoculation, experiments, and years in two interspecific grapevine F1 families was investigated using linear mixed models and Bayesian networks, and five new QTL were identified. Breeding grapevines for downy mildew disease resistance has traditionally relied on qualitative gene resistance, which can be overcome by pathogen evolution. Analyzing two interspecific F1 families, both having ancestry derived from Vitis vinifera and wild North American Vitis species, across 2 years and multiple experiments, we found multiple loci associated with downy mildew sporulation and hypersensitive response in both families using a single phenotype model. The loci explained between 7 and 17% of the variance for either phenotype, suggesting a complex genetic architecture for these traits in the two families studied. For two loci, we used RNA-Seq to detect differentially transcribed genes and found that the candidate genes at these loci were likely not NBS-LRR genes. Additionally, using a multiple phenotype Bayesian network analysis, we found effects between the leaf trichome density, hypersensitive response, and sporulation phenotypes. Moderate-high heritabilities were found for all three phenotypes, suggesting that selection for downy mildew resistance is an achievable goal by breeding for either physical- or non-physical-based resistance mechanisms, with the combination of the two possibly providing durable resistance.

  5. Tensor decomposition-based and principal-component-analysis-based unsupervised feature extraction applied to the gene expression and methylation profiles in the brains of social insects with multiple castes.

    PubMed

    Taguchi, Y-H

    2018-05-08

    Even though coexistence of multiple phenotypes sharing the same genomic background is interesting, it remains incompletely understood. Epigenomic profiles may represent key factors, with unknown contributions to the development of multiple phenotypes, and social-insect castes are a good model for elucidation of the underlying mechanisms. Nonetheless, previous studies have failed to identify genes associated with aberrant gene expression and methylation profiles because of the lack of suitable methodology that can address this problem properly. A recently proposed principal component analysis (PCA)-based and tensor decomposition (TD)-based unsupervised feature extraction (FE) can solve this problem because these two approaches can deal with gene expression and methylation profiles even when a small number of samples is available. PCA-based and TD-based unsupervised FE methods were applied to the analysis of gene expression and methylation profiles in the brains of two social insects, Polistes canadensis and Dinoponera quadriceps. Genes associated with differential expression and methylation between castes were identified, and analysis of enrichment of Gene Ontology terms confirmed reliability of the obtained sets of genes from the biological standpoint. Biologically relevant genes, shown to be associated with significant differential gene expression and methylation between castes, were identified here for the first time. The identification of these genes may help understand the mechanisms underlying epigenetic control of development of multiple phenotypes under the same genomic conditions.

  6. Latent variable models for gene-environment interactions in longitudinal studies with multiple correlated exposures.

    PubMed

    Tao, Yebin; Sánchez, Brisa N; Mukherjee, Bhramar

    2015-03-30

    Many existing cohort studies designed to investigate health effects of environmental exposures also collect data on genetic markers. The Early Life Exposures in Mexico to Environmental Toxicants project, for instance, has been genotyping single nucleotide polymorphisms on candidate genes involved in metal and nutrient metabolism and also in potentially shared metabolic pathways with the environmental exposures. Given the longitudinal nature of these cohort studies, rich exposure and outcome data are available to address novel questions regarding gene-environment interaction (G × E). Latent variable (LV) models have been effectively used for dimension reduction, helping with multiple testing and multicollinearity issues in the presence of correlated multivariate exposures and outcomes. In this paper, we first propose a modeling strategy, based on LV models, to examine the association between repeated outcome measures (e.g., child weight) and a set of correlated exposure biomarkers (e.g., prenatal lead exposure). We then construct novel tests for G × E effects within the LV framework to examine effect modification of outcome-exposure association by genetic factors (e.g., the hemochromatosis gene). We consider two scenarios: one allowing dependence of the LV models on genes and the other assuming independence between the LV models and genes. We combine the two sets of estimates by shrinkage estimation to trade off bias and efficiency in a data-adaptive way. Using simulations, we evaluate the properties of the shrinkage estimates, and in particular, we demonstrate the need for this data-adaptive shrinkage given repeated outcome measures, exposure measures possibly repeated and time-varying gene-environment association. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  8. Homology-integrated CRISPR-Cas (HI-CRISPR) system for one-step multigene disruption in Saccharomyces cerevisiae.

    PubMed

    Bao, Zehua; Xiao, Han; Liang, Jing; Zhang, Lu; Xiong, Xiong; Sun, Ning; Si, Tong; Zhao, Huimin

    2015-05-15

    One-step multiple gene disruption in the model organism Saccharomyces cerevisiae is a highly useful tool for both basic and applied research, but it remains a challenge. Here, we report a rapid, efficient, and potentially scalable strategy based on the type II Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR associated proteins (Cas) system to generate multiple gene disruptions simultaneously in S. cerevisiae. A 100 bp dsDNA mutagenizing homologous recombination donor is inserted between two direct repeats for each target gene in a CRISPR array consisting of multiple donor and guide sequence pairs. An ultrahigh copy number plasmid carrying iCas9, a variant of wild-type Cas9, trans-encoded RNA (tracrRNA), and a homology-integrated crRNA cassette is designed to greatly increase the gene disruption efficiency. As proof of concept, three genes, CAN1, ADE2, and LYP1, were simultaneously disrupted in 4 days with an efficiency ranging from 27 to 87%. Another three genes involved in an artificial hydrocortisone biosynthetic pathway, ATF2, GCY1, and YPR1, were simultaneously disrupted in 6 days with 100% efficiency. This homology-integrated CRISPR (HI-CRISPR) strategy represents a powerful tool for creating yeast strains with multiple gene knockouts.

  9. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
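
The idea of using a Bayesian FDR criterion to decide how many null hypotheses to reject can be sketched as follows. The abstract does not give the exact rule, so this uses a commonly described one as an assumption: reject hypotheses in order of increasing posterior null probability while the average of those probabilities over the rejected set stays at or below a target level. The posterior probabilities and the function name are made up for illustration.

```python
def bayesian_fdr_reject(post_null, alpha=0.05):
    """Return indices of rejected hypotheses.

    post_null[i] is the posterior probability that hypothesis i is null
    (assumed to come from some fitted model; hypothetical values below).
    Hypotheses are rejected in order of increasing post_null while the
    average post_null of the rejected set (the estimated FDR) stays <= alpha.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    rejected, running_sum = [], 0.0
    for count, i in enumerate(order, start=1):
        running_sum += post_null[i]
        # The running average never decreases, so the first violation is final.
        if running_sum / count > alpha:
            break
        rejected.append(i)
    return rejected


rejected = bayesian_fdr_reject([0.01, 0.60, 0.02, 0.20, 0.90], alpha=0.10)
print(rejected)
```

Because the probabilities are visited in ascending order, the loop can stop at the first point where the estimated FDR exceeds the target.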

  10. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

    PubMed

    Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

    2015-04-01

    Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype model. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
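
For orientation, the classic Simes combination of p-values, which the extended procedure in MGAS builds on, can be sketched in a few lines. This is only the textbook Simes test; it does not include the adjustments for correlated SNPs and multiple phenotypes that the abstract attributes to MGAS, and the input p-values are invented.

```python
def simes_pvalue(pvalues):
    """Classic Simes combined p-value: min over i of n * p_(i) / i,
    where p_(1) <= ... <= p_(n) are the sorted input p-values."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    ordered = sorted(pvalues)
    n = len(ordered)
    combined = min(n * p / i for i, p in enumerate(ordered, start=1))
    return min(1.0, combined)  # cap at 1 for safety


# Hypothetical SNP-level p-values for one gene.
gene_p = simes_pvalue([0.01, 0.04, 0.30, 0.80])
print(gene_p)
```

With these inputs the smallest p-value dominates, so the gene-level value is four times 0.01.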

  11. The transcription factor titration effect dictates level of gene expression.

    PubMed

    Brewster, Robert C; Weinert, Franz M; Garcia, Hernan G; Song, Dan; Rydenfelt, Mattias; Phillips, Rob

    2014-03-13

    Models of transcription are often built around a picture of RNA polymerase and transcription factors (TFs) acting on a single copy of a promoter. However, most TFs are shared between multiple genes with varying binding affinities. Beyond that, genes often exist at high copy number, in multiple identical copies on the chromosome or on plasmids or viral vectors with copy numbers in the hundreds. Using a thermodynamic model, we characterize the interplay between TF copy number and the demand for that TF. We demonstrate the parameter-free predictive power of this model as a function of the copy number of the TF and the number and affinities of the available specific binding sites; such predictive control is important for the understanding of transcription and the desire to quantitatively design the output of genetic circuits. Finally, we use these experiments to dynamically measure plasmid copy number through the cell cycle. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Disrupting the male germ line to find infertility and contraception targets.

    PubMed

    Archambeault, Denise R; Matzuk, Martin M

    2014-05-01

    Genetically-manipulated mouse models have become indispensable for broadening our understanding of genes and pathways related to male germ cell development. Until suitable in vitro systems for studying spermatogenesis are perfected, in vivo models will remain the gold standard for inquiry into testicular function. Here, we discuss exciting advances that are allowing researchers faster, easier, and more customizable access to their mouse models of interest. Specifically, the trans-NIH Knockout Mouse Project (KOMP) is working to generate knockout mouse models of every gene in the mouse genome. The related Knockout Mouse Phenotyping Program (KOMP2) is performing systematic phenotypic analysis of this genome-wide collection of knockout mice, including fertility screening. Together, these programs will not only uncover new genes involved in male germ cell development but also provide the research community with the mouse models necessary for further investigations. In addition to KOMP/KOMP2, another promising development in the field of mouse models is the advent of CRISPR (clustered regularly interspaced short palindromic repeat)-Cas technology. Utilizing 20 nucleotide guide sequences, CRISPR/Cas has the potential to introduce sequence-specific insertions, deletions, and point mutations to produce null, conditional, activated, or reporter-tagged alleles. CRISPR/Cas can also successfully target multiple genes in a single experimental step, forgoing the multiple generations of breeding traditionally required to produce mouse models with deletions, insertions, or mutations in multiple genes. In addition, CRISPR/Cas can be used to create mouse models carrying variants identical to those identified in infertile human patients, providing the opportunity to explore the effects of such mutations in an in vivo system. Both the KOMP/KOMP2 projects and the CRISPR/Cas system provide powerful, accessible genetic approaches to the study of male germ cell development in the mouse. A more complete understanding of male germ cell biology is critical for the identification of novel targets for potential non-hormonal contraceptive intervention. Copyright © 2014. Published by Elsevier Masson SAS.

  13. A critical examination of the numerology of antigen-binding cells: evidence for multiple receptor specificities on single cells.

    PubMed

    Miller, A

    1977-01-01

    The data available from other laboratories as well as our own on the frequency of cells recognizing major histocompatibility antigens or conventional protein and hapten antigens are critically evaluated. The frequency of specific binding for a large number of antigens is sufficiently high to support the idea that at least part of the antigen-binding cell population must have multiple specificities. Our results suggest that these multiple specific cells result from single cells synthesizing and displaying as many as 50-100 species of receptor, each at a frequency of 10^4 per cell. A model involving gene expansion of constant-region genes is suggested and some auxiliary evidence consistent with such C-gene expansion is presented.

  14. Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees.

    PubMed

    Zhu, Sha; Degnan, James H; Goldstien, Sharyn J; Eldon, Bjarki

    2015-09-15

    There has been increasing interest in coalescent models that admit multiple mergers of ancestral lineages, and in modeling hybridization and coalescence simultaneously. Hybrid-Lambda is a software package that simulates gene genealogies under multiple merger and Kingman's coalescent processes within species networks or species trees. Hybrid-Lambda allows different coalescent processes to be specified for different populations, and allows for time to be converted between generations and coalescent units, by specifying a population size for each population. In addition, Hybrid-Lambda can generate simulated datasets, assuming the infinitely many sites mutation model, and compute the FST statistic. As an illustration, we apply Hybrid-Lambda to infer the time of subdivision of certain marine invertebrates under different coalescent processes. Hybrid-Lambda makes it possible to investigate biogeographic concordance among high fecundity species exhibiting skewed offspring distribution.

  15. Reference gene selection for quantitative gene expression studies during biological invasions: A test on multiple genes and tissues in a model ascidian Ciona savignyi.

    PubMed

    Huang, Xuena; Gao, Yangchun; Jiang, Bei; Zhou, Zunchun; Zhan, Aibin

    2016-01-15

    As invasive species have successfully colonized a wide range of dramatically different local environments, they offer a good opportunity to study interactions between species and rapidly changing environments. Gene expression represents one of the primary and crucial mechanisms for rapid adaptation to local environments. Here, we aim to select reference genes for quantitative gene expression analysis based on quantitative Real-Time PCR (qRT-PCR) for a model invasive ascidian, Ciona savignyi. We analyzed the stability of ten candidate reference genes in three tissues (siphon, pharynx and intestine) under two key environmental stresses (temperature and salinity) in the marine realm based on three programs (geNorm, NormFinder and delta Ct method). Our results demonstrated only minor differences in stability rankings among the three methods. Using different single reference genes might influence data interpretation, while multiple reference genes could minimize possible errors. Therefore, reference gene combinations were recommended for different tissues: the optimal reference gene combination for siphon was RPS15 and RPL17 under temperature stress, and RPL17, UBQ and TubA under salinity treatment; for pharynx, TubB, TubA and RPL17 were the most stable genes under temperature stress, while TubB, TubA and UBQ were the best under salinity stress; for intestine, UBQ, RPS15 and RPL17 were the most reliable reference genes under both treatments. Our results suggest the necessity of selecting and testing reference genes for different tissues under varying environmental stresses. The results obtained here are expected to reveal mechanisms of gene expression-mediated invasion success using C. savignyi as a model species. Copyright © 2015 Elsevier B.V. All rights reserved.
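
One of the three stability measures named above, the delta Ct method, can be sketched roughly as follows: for each candidate, take the standard deviation of its Ct difference against every other candidate across samples, then average those standard deviations, with lower values meaning more stable. This is a simplified reading of the comparative delta Ct idea, not the authors' pipeline; geNorm and NormFinder use different measures, and the gene labels and Ct values here are invented.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Score each candidate reference gene; lower means more stable.

    ct maps a gene name to a list of Ct values, one per sample, with
    samples in the same order for every gene (hypothetical input layout).
    """
    scores = {}
    for gene, values in ct.items():
        sds = []
        for other, other_values in ct.items():
            if other == gene:
                continue
            # Per-sample Ct difference between this pair of genes.
            diffs = [a - b for a, b in zip(values, other_values)]
            sds.append(stdev(diffs))
        scores[gene] = mean(sds)
    return scores


# Invented Ct values for three candidates across four samples.
ct_values = {
    "REF_1": [20.0, 21.0, 20.0, 21.0],
    "REF_2": [22.0, 23.5, 22.0, 23.0],
    "REF_3": [18.0, 22.0, 17.0, 23.0],
}
scores = delta_ct_stability(ct_values)
least_stable = max(scores, key=scores.get)
print(scores, least_stable)
```

A candidate whose Ct tracks the others closely across samples gets small pairwise standard deviations, which is why combinations of such genes are preferred for normalization.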

  16. [Analysis of genetic models and gene effects on main agronomy characters in rapeseed].

    PubMed

    Li, J; Qiu, J; Tang, Z; Shen, L

    1992-01-01

    According to four different genetic models, the genetic patterns of 8 agronomic traits were analysed using data from 24 generations, which included the reciprocal crosses of 81008 x Tower; both varieties are of good quality. The results showed that none of the 8 characters fitted additive-dominance models. Epistasis was found in all of these characters, and it had a significant effect on generation means. Seed weight/plant and some other main yield characters are controlled by duplicate interaction genes. The interaction between triple genes or multiple genes needs to be utilized in yield heterosis.

  17. Dinucleotide controlled null models for comparative RNA gene prediction.

    PubMed

    Gesell, Tanja; Washietl, Stefan

    2008-05-27

    Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.

  18. Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique

    PubMed Central

    2012-01-01

    Background: Understanding gene interactions is a fundamental question in systems biology. Currently, modeling of gene regulations using the Bayesian Network (BN) formalism assumes that genes interact either instantaneously or with a certain amount of time delay. However, in reality, biological regulations, both instantaneous and time-delayed, occur simultaneously. A framework that can detect and model both types of interactions simultaneously would represent gene regulatory networks more accurately. Results: In this paper, we introduce a framework based on the Bayesian Network (BN) formalism that can represent both instantaneous and time-delayed interactions between genes simultaneously. A novel scoring metric having firm mathematical underpinnings is also proposed that, unlike other recent methods, can score both interactions concurrently and takes into account the reality that multiple regulators can regulate a gene jointly, rather than in an isolated pair-wise manner. Further, a gene regulatory network (GRN) inference method employing an evolutionary search that makes use of the framework and the scoring metric is also presented. Conclusion: By taking into consideration the biological fact that both instantaneous and time-delayed regulations can occur among genes, our approach models gene interactions with greater accuracy. The proposed framework is efficient and can be used to infer gene networks having multiple orders of instantaneous and time-delayed regulations simultaneously. Experiments are carried out using three different synthetic networks (with three different mechanisms for generating synthetic data) as well as real life networks of Saccharomyces cerevisiae, E. coli and cyanobacteria gene expression data. The results show the effectiveness of our approach. PMID:22691450

  19. Comparative genomics of duplicate γ-glutamyl transferase genes in teleosts: medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), green spotted pufferfish (Tetraodon nigroviridis), fugu (Takifugu rubripes), and zebrafish (Danio rerio).

    PubMed

    Law, Sheran Hiu Wan; Redelings, Benjamin David; Kullman, Seth William

    2012-01-15

    The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu, and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs. Duplicate (paralogous) GGT sequences for GGT1 (GGT1 a and b), GGTL1 (GGTL1 a and b), and GGTL3 (GGTL3 a and b) were identified for each species. Phylogenetic analysis suggests that GGTs are ancient proteins conserved across most metazoan phyla and that paralogous GGTs in teleosts likely arose from the serial 3R genome duplication events. A third GGTL1 gene (GGTL1c) was found in green spotted pufferfish; however, this gene is not present in medaka, stickleback, or fugu. Similarly, one or both paralogs of GGTL3 appear to have been lost in green spotted pufferfish, fugu, and zebrafish. Syntenic relationships were highly maintained between duplicated teleost chromosomes, among teleosts and across ray-finned (Actinopterygii) and lobe-finned (Sarcopterygii) species. To assess subfunction partitioning, six medaka GGT genes were cloned and assessed for developmental and tissue-specific expression. On the basis of these data, we propose a modification of the "duplication-degeneration-complementation" model of subfunction partitioning where quantitative differences rather than absolute differences in gene expression are observed between gene paralogs. Our results demonstrate that multiple GGT genes have been retained within teleost genomes. Questions remain, however, regarding the functional roles of multiple GGTs in these species. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.

  20. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    PubMed

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

    The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and related INO interaction type information could be automatically obtained via SPARQL queries, formatted in Excel format, and used in an INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations: 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated via running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations. 
The LLL dataset contained 34 gene regulation interaction types, each of which was associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.
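
    The multi-keyword idea in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical pattern table (the interaction type names and keywords below are invented for illustration and are not taken from INO) and checks only whether all keywords of a type co-occur in one sentence. The dependency-parse step described in the abstract is not reproduced here.

```python
import re

# Hypothetical pattern table: interaction type -> keywords that must co-occur.
# Type names and keywords are illustrative assumptions, not INO content.
PATTERNS = {
    "positive regulation of transcription": ("activates", "transcription"),
    "negative regulation of transcription": ("represses", "transcription"),
}

def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def match_interaction_types(sentence, patterns=PATTERNS):
    """Return the interaction types whose keywords all occur in the sentence."""
    tokens = set(tokenize(sentence))
    return sorted(
        name for name, keywords in patterns.items()
        if all(kw in tokens for kw in keywords)
    )

example = "SigK activates transcription of the cotA gene."
print(match_interaction_types(example))
```

    Plain co-occurrence will over-match when the keywords refer to different gene pairs in the same sentence, which is presumably why the authors also relied on dependency relations.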

  1. A Risk Stratification Model for Lung Cancer Based on Gene Coexpression Network and Deep Learning

    PubMed Central

    2018-01-01

    A risk stratification model for lung cancer based on gene expression profiles is of great interest. Instead of previous models based on individual prognostic genes, we aimed to develop a novel system-level risk stratification model for lung adenocarcinoma based on gene coexpression networks. Using multiple microarray datasets, gene coexpression network analysis was performed to identify survival-related networks. A deep learning based risk stratification model was constructed with representative genes of these networks. The model was validated in two test sets. Survival analysis was performed using the output of the model to evaluate whether it could predict patients' survival independent of clinicopathological variables. Five networks were significantly associated with patients' survival. Considering prognostic significance and representativeness, genes of the two survival-related networks were selected for input of the model. The output of the model was significantly associated with patients' survival in the training set and both test sets (p < 0.00001, p < 0.0001 and p = 0.02 for the training set and test sets 1 and 2, respectively). In multivariate analyses, the model was associated with patients' prognosis independent of other clinicopathological features. Our study presents a new perspective on incorporating gene coexpression networks into the gene expression signature and clinical application of deep learning in genomic data science for prognosis prediction. PMID:29581968

  2. Identification of radiation responsive genes and transcriptome profiling via complete RNA sequencing in a stable radioresistant U87 glioblastoma model.

    PubMed

    Doan, Ninh B; Nguyen, Ha S; Alhajala, Hisham S; Jaber, Basem; Al-Gizawiy, Mona M; Ahn, Eun-Young Erin; Mueller, Wade M; Chitambar, Christopher R; Mirza, Shama P; Schmainda, Kathleen M

    2018-05-04

    The absence of major progress in the treatment of glioblastoma (GBM) is partly attributable to our poor understanding of both GBM tumor biology and the acquirement of treatment resistance in recurrent GBMs. Recurrent GBMs are characterized by their resistance to radiation. In this study, we used an established stable U87 radioresistant GBM model and total RNA sequencing to shed light on global mRNA expression changes following irradiation. We identified many genes, the expressions of which were altered in our radioresistant GBM model, that have never before been reported to be associated with the development of radioresistant GBM and should be concertedly further investigated to understand their roles in radioresistance. These genes were enriched in various biological processes such as inflammatory response, cell migration, positive regulation of epithelial to mesenchymal transition, angiogenesis, apoptosis, positive regulation of T-cell migration, positive regulation of macrophage chemotaxis, T-cell antigen processing and presentation, and microglial cell activation involved in immune response genes. These findings furnish crucial information for elucidating the molecular mechanisms associated with radioresistance in GBM. Therapeutically, with the global alterations of multiple biological pathways observed in irradiated GBM cells, an effective GBM therapy may require a cocktail carrying multiple agents targeting multiple implicated pathways in order to have a chance at making a substantial impact on improving the overall GBM survival.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farahani, Poupak; Chiu, Sally; Bowlus, Christopher L.

    Obesity is a complex disease. To date, over 100 chromosomal loci for body weight, body fat, regional white adipose tissue weight, and other obesity-related traits have been identified in humans and in animal models. For most loci, the underlying genes are not yet identified; some of these chromosomal loci will be alleles of known obesity genes, whereas many will represent alleles of unknown genes. Microarray analysis allows simultaneous multiple gene and pathway discovery. cDNA and oligonucleotide arrays are commonly used to identify differentially expressed genes by surveys of large numbers of known and unnamed genes. Two papers previously identified genes differentially expressed in adipose tissue of mouse models of obesity and diabetes by analysis of hybridization to Affymetrix oligonucleotide chips.

  4. Expression of Multiple Stress Response Genes by Escherichia Coli Under Modeled Reduced Gravity

    NASA Astrophysics Data System (ADS)

    Vukanti, Raja; Leff, Laura G.

    2012-09-01

    Bacteria, in response to changes in their environment, quickly regulate gene expression; hence, transcriptional profiling has been widely used to characterize bacterial responses to various environmental conditions. In this study, we used clinorotation to grow bacteria under low-sedimentation, -shear, and -turbulence conditions (referred to as modeled reduced gravity, MRG, below), which profoundly impacts bacteria, including causing elevated resistance to multiple environmental stresses. To explore potential mechanisms behind the multiple stress resistance response to MRG, we used reverse transcription followed by real-time PCR to assess expression levels of E. coli genes involved in specific stress and general stress responses under MRG and normal gravity (NG) in nutritionally rich and minimal media, and during exponential and stationary phases of growth. In addition, growth rates as well as physico-chemical parameters of culture media were examined. Over-expression of stress response genes (csiD, cstA, katE, otsA, treA) occurred under MRG compared to NG controls, but only during the later stages of growth in rich medium, demonstrating that bacterial response to MRG varies with growth-medium and -phase. At stationary phase in rich medium under MRG and NG, E. coli had similar growth rates (based on rRNA-leader abundance) and yields (cell mass and numbers); this, coupled with observations of simultaneous induction of starvation response genes (csiD and cstA), suggests the multiple stress resistance phenotype under MRG could be attributable to microzones of nutrient unavailability around cells. Overall, in rich medium, the response resembled the general stress response (GSR) that E. coli develops during stationary phase of growth. Along these same lines, induction of genes coding for GSR was reversed by improving nutritional conditions under MRG. The reversal of GSR under MRG suggests that the multiple stress response exhibited is not specific to MRG but may result from nutrient limitation experienced by bacteria after incubation in nutrient-rich media under these conditions.

  5. ECOTOXICOGENOMICS: EXPOSURE INDICATORS USING ESTS AND SUBTRACTIVE LIBRARIES FOR MULTI-LIFE STAGES OF PIMEPHALES

    EPA Science Inventory

    Ecotoxicogenomics is research that identifies patterns of gene expression in wildlife and predicts effects of environmental stressors. We are developing a multiple stressor, multiple life stage exposure model using the fathead minnow (Pimephales promelas), initially studying fou...

  6. Molecular Evolution and Mosaicism of Leptospiral Outer Membrane Proteins Involves Horizontal DNA Transfer

    PubMed Central

    Haake, David A.; Suchard, Marc A.; Kelley, Melissa M.; Dundoo, Manjula; Alt, David P.; Zuerner, Richard L.

    2004-01-01

    Leptospires belong to a genus of parasitic bacterial spirochetes that have adapted to a broad range of mammalian hosts. Mechanisms of leptospiral molecular evolution were explored by sequence analysis of four genes shared by 38 strains belonging to the core group of pathogenic Leptospira species: L. interrogans, L. kirschneri, L. noguchii, L. borgpetersenii, L. santarosai, and L. weilii. The 16S rRNA and lipL32 genes were highly conserved, and the lipL41 and ompL1 genes were significantly more variable. Synonymous substitutions are distributed throughout the ompL1 gene, whereas nonsynonymous substitutions are clustered in four variable regions encoding surface loops. While phylogenetic trees for the 16S, lipL32, and lipL41 genes were relatively stable, 8 of 38 (20%) ompL1 sequences had mosaic compositions consistent with horizontal transfer of DNA between related bacterial species. A novel Bayesian multiple change point model was used to identify the most likely sites of recombination and to determine the phylogenetic relatedness of the segments of the mosaic ompL1 genes. Segments of the mosaic ompL1 genes encoding two of the surface-exposed loops were likely acquired by horizontal transfer from a peregrine allele of unknown ancestry. Identification of the most likely sites of recombination with the Bayesian multiple change point model, an approach which has not previously been applied to prokaryotic gene sequence analysis, serves as a model for future studies of recombination in molecular evolution of genes. PMID:15090524

  7. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  8. A vector space model approach to identify genetically related diseases.

    PubMed

    Sarkar, Indra Neil

    2012-01-01

    The relationship between diseases and their causative genes can be complex, especially in the case of polygenic diseases. Further exacerbating the challenges in their study is that many genes may be causally related to multiple diseases. This study explored the relationship between diseases through the adaptation of an approach pioneered in the context of information retrieval: vector space models. A vector space model approach was developed that bridges gene disease knowledge inferred across three knowledge bases: Online Mendelian Inheritance in Man, GenBank, and Medline. The approach was then used to identify potentially related diseases for two target diseases: Alzheimer disease and Prader-Willi Syndrome. In the case of both Alzheimer Disease and Prader-Willi Syndrome, a set of plausible diseases were identified that may warrant further exploration. This study furthers seminal work by Swanson, et al. that demonstrated the potential for mining literature for putative correlations. Using a vector space modeling approach, information from both biomedical literature and genomic resources (like GenBank) can be combined towards identification of putative correlations of interest. To this end, the relevance of the predicted diseases of interest in this study using the vector space modeling approach were validated based on supporting literature. The results of this study suggest that a vector space model approach may be a useful means to identify potential relationships between complex diseases, and thereby enable the coordination of gene-based findings across multiple complex diseases.
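
    As a rough illustration of the vector space idea in the abstract above, the sketch below represents each disease as a sparse vector of gene weights and ranks other diseases by cosine similarity. The disease names, gene names, and weights are placeholders, and the study's actual weighting across OMIM, GenBank, and Medline is not specified here, so this is a sketch under stated assumptions and not the published method.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical disease -> {gene: weight} vectors; all names are placeholders.
disease_vectors = {
    "disease_A": {"GENE1": 1.0, "GENE2": 1.0},
    "disease_B": {"GENE2": 1.0, "GENE3": 1.0},
    "disease_C": {"GENE4": 1.0},
}

def rank_related(target, vectors):
    """Rank every other disease by cosine similarity to the target disease."""
    scores = [
        (other, cosine_similarity(vectors[target], vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(rank_related("disease_A", disease_vectors))
```

    Diseases that share weighted genes with the target rise to the top of the ranking; diseases with no shared genes score zero.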

  9. Exploring Conceptual Change in Genetics Using a Multidimensional Interpretive Framework.

    ERIC Educational Resources Information Center

    Venville, Grady J.; Treagust, David F.

    1998-01-01

    Changes in grade 10 students' (n=79) conceptions of genes during genetics instruction were studied from multiple perspectives. Ontologically, most students moved from passive to active models of genes. Affectively, students were interested in genetics but unmotivated by microscopic mechanistic explanations; however, teaching approaches were…

  10. Identifying Loci Under Selection Against Gene Flow in Isolation-with-Migration Models

    PubMed Central

    Sousa, Vitor C.; Carneiro, Miguel; Ferrand, Nuno; Hey, Jody

    2013-01-01

    When divergence occurs in the presence of gene flow, there can arise an interesting dynamic in which selection against gene flow, at sites associated with population-specific adaptations or genetic incompatibilities, can cause net gene flow to vary across the genome. Loci linked to sites under selection may experience reduced gene flow and may experience genetic bottlenecks by the action of nearby selective sweeps. Data from histories such as these may be poorly fitted by conventional neutral model approaches to demographic inference, which treat all loci as equally subject to forces of genetic drift and gene flow. To allow for demographic inference in the face of such histories, as well as the identification of loci affected by selection, we developed an isolation-with-migration model that explicitly provides for variation among genomic regions in migration rates and/or rates of genetic drift. The method allows for loci to fall into any of multiple groups, each characterized by a different set of parameters, thus relaxing the assumption that all loci share the same demography. By grouping loci, the method can be applied to data with multiple loci and still have tractable dimensionality and statistical power. We studied the performance of the method using simulated data, and we applied the method to study the divergence of two subspecies of European rabbits (Oryctolagus cuniculus). PMID:23457232

  11. The extraction of simple relationships in growth factor-specific multiple-input and multiple-output systems in cell-fate decisions by backward elimination PLS regression.

    PubMed

    Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya

    2013-01-01

    Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.

  12. Small RNA biology is systems biology.

    PubMed

    Jost, Daniel; Nowojewski, Andrzej; Levine, Erel

    2011-01-01

    During the last decade, small regulatory RNAs (srRNAs) emerged as central players in the regulation of gene expression in all kingdoms of life. Multiple pathways for srRNA biogenesis and diverse mechanisms of gene regulation may indicate that srRNA regulation evolved independently multiple times. However, small RNA pathways share numerous properties, including the ability of a single srRNA to regulate multiple targets. Some of the mechanisms of gene regulation by srRNAs have a significant effect on the abundance of free srRNAs that are ready to interact with new targets. This results in indirect interactions among seemingly unrelated genes, as well as in crosstalk between different srRNA pathways. Here we briefly review and compare the major srRNA pathways, and argue that the impact of srRNA is always at the system level. We demonstrate how a simple mathematical model can ease the discussion of governing principles. To demonstrate these points, we review a few examples from bacteria and animals.

  13. Network Analysis of Rodent Transcriptomes in Spaceflight

    NASA Technical Reports Server (NTRS)

    Ramachandran, Maya; Fogle, Homer; Costes, Sylvain

    2017-01-01

    Network analysis methods leverage prior knowledge of cellular systems and the statistical and conceptual relationships between analyte measurements to determine gene connectivity. Correlation and conditional metrics are used to infer a network topology and provide a systems-level context for cellular responses. Integration across multiple experimental conditions and omics domains can reveal the regulatory mechanisms that underlie gene expression. GeneLab has assembled rich multi-omic (transcriptomics, proteomics, epigenomics, and epitranscriptomics) datasets for multiple murine tissues from the Rodent Research 1 (RR-1) experiment. RR-1 assesses the impact of 37 days of spaceflight on gene expression across a variety of tissue types, such as adrenal glands, quadriceps, gastrocnemius, tibialis anterior, extensor digitorum longus, soleus, eye, and kidney. Network analysis is particularly useful for RR-1 -omics datasets because it reinforces subtle relationships that may be overlooked in isolated analyses and subdues confounding factors. Our objective is to use network analysis to determine potential target nodes for therapeutic intervention and identify similarities with existing disease models. Multiple network algorithms are used for a higher confidence consensus.

  14. Applications and Limitations of Mouse Models for Understanding Human Atherosclerosis

    PubMed Central

    von Scheidt, Moritz; Zhao, Yuqi; Kurt, Zeyneb; Pan, Calvin; Zeng, Lingyao; Yang, Xia; Schunkert, Heribert; Lusis, Aldons J.

    2017-01-01

    Most of the biological understanding of mechanisms underlying coronary artery disease (CAD) derives from studies of mouse models. The identification of multiple CAD loci and strong candidate genes in large human genome-wide association studies (GWAS) presented an opportunity to examine the relevance of mouse models for the human disease. We comprehensively reviewed the mouse literature, including 827 literature-derived genes, and compared it to human data. First, we observed striking concordance of risk factors for atherosclerosis in mice and humans. Second, there was highly significant overlap of mouse genes with human genes identified by GWAS. In particular, of the 46 genes with strong association signals in CAD-GWAS that were studied in mouse models, all but one exhibited consistent effects on atherosclerosis-related phenotypes. Third, we compared 178 CAD-associated pathways derived from human GWAS with 263 from mouse studies and observed that over 50% were consistent between both species. PMID:27916529

  15. Markov State Models of gene regulatory networks.

    PubMed

    Chu, Brian K; Tse, Margaret J; Sato, Royce R; Read, Elizabeth L

    2017-02-06

    Gene regulatory networks with dynamics characterized by multiple stable states underlie cell fate-decisions. Quantitative models that can link molecular-level knowledge of gene regulation to a global understanding of network dynamics have the potential to guide cell-reprogramming strategies. Networks are often modeled by the stochastic Chemical Master Equation, but methods for systematic identification of key properties of the global dynamics are currently lacking. The method identifies the number, phenotypes, and lifetimes of long-lived states for a set of common gene regulatory network models. Application of transition path theory to the constructed Markov State Model decomposes global dynamics into a set of dominant transition paths and associated relative probabilities for stochastic state-switching. In this proof-of-concept study, we found that the Markov State Model provides a general framework for analyzing and visualizing stochastic multistability and state-transitions in gene networks. Our results suggest that this framework, adopted from the field of atomistic Molecular Dynamics, can be a useful tool for quantitative Systems Biology at the network scale.
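
    To make the notion of long-lived states concrete, here is a minimal sketch for a discrete-time Markov chain. It assumes a hypothetical two-state transition matrix (for example "gene off" and "gene on"; the numbers are illustrative) and computes the stationary distribution by power iteration plus a simple dwell-time estimate per state. This is a simplification for illustration and does not reproduce how the study constructs or analyzes its Markov State Model.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a row-stochastic matrix
    by repeatedly propagating a uniform start vector (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def mean_lifetimes(P):
    """Expected number of consecutive steps spent in each state.
    The dwell time is geometric, so its mean is 1 / (1 - P[i][i])."""
    return [
        float("inf") if P[i][i] == 1.0 else 1.0 / (1.0 - P[i][i])
        for i in range(len(P))
    ]

# Hypothetical two-state model; the transition probabilities are assumptions.
P = [
    [0.99, 0.01],
    [0.02, 0.98],
]

print(stationary_distribution(P))
print(mean_lifetimes(P))
```

    For this matrix the chain spends about two thirds of its time in the first state, which also has the longer mean dwell time (roughly 100 steps against roughly 50).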

  16. Cross-species multiple environmental stress responses: An integrated approach to identify candidate genes for multiple stress tolerance in sorghum (Sorghum bicolor (L.) Moench) and related model species

    PubMed Central

    Modise, David M.; Gemeildien, Junaid; Ndimba, Bongani K.; Christoffels, Alan

    2018-01-01

    Background: Crop response to the changing climate and unpredictable effects of global warming with adverse conditions such as drought stress has brought concerns about food security to the fore; crop yield loss is a major cause of concern in this regard. Identification of genes with multiple responses across environmental stresses is the genetic foundation that leads to crop adaptation to environmental perturbations. Methods: In this paper, we introduce an integrated approach to assess candidate genes for multiple stress responses across species. The approach combines ontology-based semantic data integration with expression profiling, comparative genomics, phylogenomics, functional gene enrichment and gene enrichment network analysis to identify genes associated with plant stress phenotypes. Five different ontologies, viz., Gene Ontology (GO), Trait Ontology (TO), Plant Ontology (PO), Growth Ontology (GRO) and Environment Ontology (EO), were used to semantically integrate drought-related information. Results: Target genes linked to Quantitative Trait Loci (QTLs) controlling yield and stress tolerance in sorghum (Sorghum bicolor (L.) Moench) and closely related species were identified. Based on the enriched GO terms of the biological processes, 1116 sorghum genes with potential responses to 5 different stresses, such as drought (18%), salt (32%), cold (20%), heat (8%) and oxidative stress (25%), were identified to be over-expressed. Out of 169 sorghum drought-responsive QTL-associated genes that were identified based on expression datasets, 56% were shown to have multiple stress responses. On the other hand, out of 168 additional genes that have been evaluated for orthologous pairs, 90% were conserved across species for drought tolerance. Over 50% of identified maize and rice genes were responsive to drought and salt stresses and were co-located within multifunctional QTLs. Among the total identified multi-stress responsive genes, 272 targets were shown to be co-localized within QTLs associated with different traits that are responsive to multiple stresses. Ontology mapping was used to validate the identified genes, while reconstruction of the phylogenetic tree was instrumental to infer the evolutionary relationship of the sorghum orthologs. The results also show specific genes responsible for various interrelated components of drought response mechanism such as drought tolerance, drought avoidance and drought escape. Conclusions: We submit that this approach is novel and, to our knowledge, has not been used previously in any other research; it enables us to perform cross-species queries for genes that are likely to be associated with multiple stress tolerance, as a means to identify novel targets for engineering stress resistance in sorghum and possibly in other crop species. PMID:29590108

  17. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatic power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e., edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain increased with the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE. When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.
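
    One way to read the model described in the abstract above is sketched below. The symbols are assumptions introduced for illustration (mean read count \mu_{gc} for gene g in condition c, main effects \alpha_g and \beta_c, interaction \gamma_{gc}); the paper's own parameterization, including any normalization terms, may differ.

```latex
% Log-linear model with gene and condition main effects plus an interaction
\log \mu_{gc} = \alpha_g + \beta_c + \gamma_{gc},
\qquad g = 1,\dots,G,\quad c = 1,\dots,C.

% First-order (rank-one) decomposition of the interaction term
\gamma_{gc} \approx a_g \, b_c .

% Testing gene g for differential expression then involves a single parameter
H_0 : a_g = 0 \quad \text{versus} \quad H_1 : a_g \neq 0 .
```

    Under this reading, an unrestricted interaction would need C - 1 free parameters per gene, whereas the rank-one form leaves one per gene, which is consistent with the abstract's statement that the gain appears when there are more than two conditions.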

  18. Isolation with Migration Models for More Than Two Populations

    PubMed Central

    Hey, Jody

    2010-01-01

    A method for studying the divergence of multiple closely related populations is described and assessed. The approach of Hey and Nielsen (2007, Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci USA. 104:2785–2790) for fitting an isolation-with-migration model was extended to the case of multiple populations with a known phylogeny. Analysis of simulated data sets reveals the kinds of history that are accessible with a multipopulation analysis. Necessarily, processes associated with older time periods in a phylogeny are more difficult to estimate; and histories with high levels of gene flow are particularly difficult with more than two populations. However, for histories with modest levels of gene flow, or for very large data sets, it is possible to study large complex divergence problems that involve multiple closely related populations or species. PMID:19955477

  19. Isolation with migration models for more than two populations.

    PubMed

    Hey, Jody

    2010-04-01

    A method for studying the divergence of multiple closely related populations is described and assessed. The approach of Hey and Nielsen (2007, Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci USA. 104:2785-2790) for fitting an isolation-with-migration model was extended to the case of multiple populations with a known phylogeny. Analysis of simulated data sets reveals the kinds of history that are accessible with a multipopulation analysis. Necessarily, processes associated with older time periods in a phylogeny are more difficult to estimate; and histories with high levels of gene flow are particularly difficult with more than two populations. However, for histories with modest levels of gene flow, or for very large data sets, it is possible to study large complex divergence problems that involve multiple closely related populations or species.

  20. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
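
    The weighting step described in this abstract can be illustrated with a minimal sketch. It assumes that "weighting by bin size" is realized by repeating each bin's re-calculated tree once per gene in that bin before the trees are passed to a summary method. The function name, the dictionary layout and the Newick strings are hypothetical and are not taken from the authors' software.

```python
from typing import Dict, List


def weight_bin_trees(bins: Dict[str, List[str]],
                     bin_trees: Dict[str, str]) -> List[str]:
    """Return one tree per gene by repeating each bin's tree len(bin) times.

    bins      maps a (hypothetical) bin id to the genes placed in that bin.
    bin_trees maps the same bin id to the tree re-estimated for that bin,
              here represented as a Newick string.
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        # A bin holding k genes contributes k copies of its tree.
        weighted.extend([bin_trees[bin_id]] * len(genes))
    return weighted


# Illustrative input only: two bins of sizes 3 and 1.
example_bins = {"bin1": ["g1", "g2", "g3"], "bin2": ["g4"]}
example_trees = {"bin1": "((A,B),(C,D));", "bin2": "((A,C),(B,D));"}
weighted_trees = weight_bin_trees(example_bins, example_trees)
```

    The resulting list has as many trees as there are genes, so a downstream summary method sees each bin in proportion to its size.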

  1. MYD88 and functionally related genes are associated with multiple infections in a model population of Kenyan village dogs.

    PubMed

    Necesankova, Michaela; Vychodilova, Leona; Albrechtova, Katerina; Kennedy, Lorna J; Hlavac, Jan; Sedlak, Kamil; Modry, David; Janova, Eva; Vyskocil, Mirko; Horin, Petr

    2016-12-01

    The purpose of this study was to seek associations between immunity-related molecular markers and endemic infections in a model population of African village dogs from Northern Kenya with no veterinary care and no selective breeding. A population of village dogs from Northern Kenya composed of three sub-populations from three different areas (84, 50 and 55 dogs) was studied. Canine distemper virus (CDV), Hepatozoon canis, Microfilariae (Acantocheilonema dracunculoides, Acantocheilonema reconditum) and Neospora caninum were the pathogens studied. Detection of antibodies (CDV, Neospora), light microscopy (Hepatozoon) and diagnostic PCR (Microfilariae) were the methods used for diagnosing infection. Genes involved in innate immune mechanisms, NOS3, IL6, TLR1, TLR2, TLR4, TLR7, TLR9, LY96, MYD88, and three major histocompatibility complex class II genes were selected as candidates. Single nucleotide polymorphism (SNP) markers were detected by Sanger sequencing, next generation sequencing and PCR-RFLP. Fisher's exact test for additive and non-additive models was used for association analyses. Three SNPs within the MYD88 gene and one TLR4 SNP marker were associated with more than one infection. Combined genotypes and further markers identified by next generation sequencing confirmed associations observed for individual genes. The genes associated with infection and their combinations in specific genotypes match well our knowledge of their biological role and of the role of the relevant biological pathways, respectively. Associations with multiple infections observed between the MYD88 and TLR4 genes suggest their involvement in the mechanisms of anti-infectious defenses in dogs.

  2. Analysis of 30 Genes (355 SNPS) Related to Energy Homeostasis for Association with Adiposity in European-American and Yup'ik Eskimo Populations

    PubMed Central

    Chung, Wendy K.; Patki, Amit; Matsuoka, Naoki; Boyer, Bert B.; Liu, Nianjun; Musani, Solomon K.; Goropashnaya, Anna V.; Tan, Perciliz L.; Katsanis, Nicholas; Johnson, Stephen B.; Gregersen, Peter K.; Allison, David B.; Leibel, Rudolph L.; Tiwari, Hemant K.

    2009-01-01

    Objective: Human adiposity is highly heritable, but few of the genes that predispose to obesity in most humans are known. We tested candidate genes in pathways related to food intake and energy expenditure for association with measures of adiposity. Methods: We studied 355 genetic variants in 30 candidate genes in 7 molecular pathways related to obesity in two groups of adult subjects: 1,982 unrelated European Americans living in the New York metropolitan area drawn from the extremes of their body mass index (BMI) distribution and 593 related Yup'ik Eskimos living in rural Alaska characterized for BMI, body composition, waist circumference, and skin fold thicknesses. Data were analyzed by using a mixed model in conjunction with a false discovery rate (FDR) procedure to correct for multiple testing. Results: After correcting for multiple testing, two single nucleotide polymorphisms (SNPs) in Ghrelin (GHRL) (rs35682 and rs35683) were associated with BMI in the New York European Americans. This association was not replicated in the Yup'ik participants. There was no evidence for gene × gene interactions among genes within the same molecular pathway after adjusting for multiple testing via FDR control procedure. Conclusion: Genetic variation in GHRL may have a modest impact on BMI in European Americans. PMID:19077438
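
    The abstract does not say which FDR procedure was used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the significance level and the example p-values are hypothetical and are not data from the study.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha.

    Assumed procedure (Benjamini-Hochberg): sort the p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Illustrative p-values only (not results from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

    With eight tests, only the two smallest example p-values fall below their rank-scaled thresholds (0.00625 and 0.0125), so only those two are rejected.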

  3. Analysis of 30 genes (355 SNPS) related to energy homeostasis for association with adiposity in European-American and Yup'ik Eskimo populations.

    PubMed

    Chung, Wendy K; Patki, Amit; Matsuoka, Naoki; Boyer, Bert B; Liu, Nianjun; Musani, Solomon K; Goropashnaya, Anna V; Tan, Perciliz L; Katsanis, Nicholas; Johnson, Stephen B; Gregersen, Peter K; Allison, David B; Leibel, Rudolph L; Tiwari, Hemant K

    2009-01-01

    Human adiposity is highly heritable, but few of the genes that predispose to obesity in most humans are known. We tested candidate genes in pathways related to food intake and energy expenditure for association with measures of adiposity. We studied 355 genetic variants in 30 candidate genes in 7 molecular pathways related to obesity in two groups of adult subjects: 1,982 unrelated European Americans living in the New York metropolitan area drawn from the extremes of their body mass index (BMI) distribution and 593 related Yup'ik Eskimos living in rural Alaska characterized for BMI, body composition, waist circumference, and skin fold thicknesses. Data were analyzed by using a mixed model in conjunction with a false discovery rate (FDR) procedure to correct for multiple testing. After correcting for multiple testing, two single nucleotide polymorphisms (SNPs) in Ghrelin (GHRL) (rs35682 and rs35683) were associated with BMI in the New York European Americans. This association was not replicated in the Yup'ik participants. There was no evidence for gene x gene interactions among genes within the same molecular pathway after adjusting for multiple testing via FDR control procedure. Genetic variation in GHRL may have a modest impact on BMI in European Americans.

  4. History of a prolific family: the Hes/Hey-related genes of the annelid Platynereis.

    PubMed

    Gazave, Eve; Guillou, Aurélien; Balavoine, Guillaume

    2014-01-01

    The Hes superfamily or Hes/Hey-related genes encompass a variety of metazoan-specific bHLH genes, with somewhat fuzzy phylogenetic relationships. Hes superfamily members are involved in a variety of major developmental mechanisms in metazoans, notably in neurogenesis and segmentation processes, in which they often act as direct effector genes of the Notch signaling pathway. We have investigated the molecular and functional evolution of the Hes superfamily in metazoans using the lophotrochozoan Platynereis dumerilii as model. Our phylogenetic analyses of more than 200 Metazoan Hes/Hey-related genes revealed the presence of five families, three of them (Hes, Hey and Helt) being pan-metazoan. Those families were likely composed of a unique representative in the last common metazoan ancestor. The evolution of the Hes family was shaped by many independent lineage specific tandem duplication events. The expression patterns of 13 of the 15 Hes/Hey-related genes in Platynereis indicate a broad functional diversification. Nevertheless, a majority of these genes are involved in two crucial developmental processes in annelids: neurogenesis and segmentation, resembling functions highlighted in other animal models. Combining phylogenetic and expression data, our study suggests an unusual evolutionary history for the Hes superfamily. An ancestral multifunctional annelid Hes gene may have undergone multiple rounds of duplication-degeneration-complementation processes in the lineage leading to Platynereis, each gene copy ensuring its maintenance in the genome by subfunctionalisation. Similar but independent waves of duplications are at the origin of the multiplicity of Hes genes in other metazoan lineages.

  5. MU OPIOID RECEPTORS IN PAIN MANAGEMENT

    PubMed Central

    Pasternak, Gavril; Pan, Ying-Xian

    2014-01-01

    Most of the potent analgesics currently in use act through the mu opioid receptor. Although they are classified as mu opioids, clinical experience suggests differences among them. The relative potencies of the agents can vary from patient to patient, as well as the side-effect profiles. These observations, coupled with pharmacological approaches in preclinical models, led to the suggestion of multiple subtypes of mu receptors. The explosion in molecular biology has led to the identification of a single gene encoding mu opioid receptors. It now appears that this gene undergoes extensive splicing, in which a single gene can generate multiple proteins. Evidence now suggests that these splice variants may help explain the clinical variability in responses among patients. PMID:21453899

  6. Normal uniform mixture differential gene expression detection for cDNA microarrays

    PubMed Central

    Dean, Nema; Raftery, Adrian E

    2005-01-01

    Background: One of the primary tasks in analysing gene expression data is finding genes that are differentially expressed in different samples. Multiple testing issues due to the thousands of tests run make some of the more popular methods for doing this problematic. Results: We propose a simple method, Normal Uniform Differential Gene Expression (NUDGE) detection for finding differentially expressed genes in cDNA microarrays. The method uses a simple univariate normal-uniform mixture model, in combination with new normalization methods for spread as well as mean that extend the lowess normalization of Dudoit, Yang, Callow and Speed (2002) [1]. It takes account of multiple testing, and gives probabilities of differential expression as part of its output. It can be applied to either single-slide or replicated experiments, and it is very fast. Three datasets are analyzed using NUDGE, and the results are compared to those given by other popular methods: unadjusted and Bonferroni-adjusted t tests, Significance Analysis of Microarrays (SAM), and Empirical Bayes for microarrays (EBarrays) with both Gamma-Gamma and Lognormal-Normal models. Conclusion: The method gives a high probability of differential expression to genes known/suspected a priori to be differentially expressed and a low probability to the others. In terms of known false positives and false negatives, the method outperforms all multiple-replicate methods except for the Gamma-Gamma EBarrays method to which it offers comparable results with the added advantages of greater simplicity, speed, fewer assumptions and applicability to the single replicate case. An R package called nudge to implement the methods in this paper will be made available soon at . PMID:16011807
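
    To make the idea of a normal-uniform mixture concrete, the sketch below computes a posterior probability of differential expression for one normalized log-ratio. It assumes that non-differentially expressed genes follow a normal component and differentially expressed genes follow a uniform component, with all parameters already known. The function name, the parameter names and the numeric values are hypothetical; the sketch does not reproduce the normalization or parameter estimation of the nudge package.

```python
import math


def prob_differential(x: float, pi_de: float, mu: float, sigma: float,
                      lower: float, upper: float) -> float:
    """Posterior probability that log-ratio x came from the uniform component.

    pi_de        assumed prior proportion of differentially expressed genes
    mu, sigma    mean and standard deviation of the normal (non-DE) component
    lower, upper support of the uniform (DE) component
    """
    # Uniform density is constant on [lower, upper] and zero elsewhere.
    uniform_density = 1.0 / (upper - lower) if lower <= x <= upper else 0.0
    normal_density = (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                      / (sigma * math.sqrt(2.0 * math.pi)))
    numerator = pi_de * uniform_density
    denominator = numerator + (1.0 - pi_de) * normal_density
    return numerator / denominator


# Illustrative parameters only: 10% DE genes, log-ratios in [-5, 5].
p_center = prob_differential(0.0, 0.1, 0.0, 0.5, -5.0, 5.0)
p_tail = prob_differential(3.0, 0.1, 0.0, 0.5, -5.0, 5.0)
```

    A log-ratio near the centre of the normal component receives a low probability of differential expression, while one far in the tail receives a probability close to 1, which is the qualitative behaviour the abstract describes.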

  7. Deciphering the associations between gene expression and copy number alteration using a sparse double Laplacian shrinkage approach

    PubMed Central

    Shi, Xingjie; Zhao, Qing; Huang, Jian; Xie, Yang; Ma, Shuangge

    2015-01-01

    Motivation: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. Results: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. Availability and implementation: R code is available at http://works.bepress.com/shuangge/49/ Contact: shuangge.ma@yale.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342102

  8. A high resolution atlas of gene expression in the domestic sheep (Ovis aries)

    PubMed Central

    Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.

    2017-01-01

    Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238

  9. A high resolution atlas of gene expression in the domestic sheep (Ovis aries).

    PubMed

    Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A

    2017-09-01

    Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.

  10. Regulation of epidermal cell fate in Arabidopsis roots: the importance of multiple feedback loops

    PubMed Central

    Schiefelbein, John; Huang, Ling; Zheng, Xiaohua

    2014-01-01

    The specification of distinct cell types in multicellular organisms is accomplished via establishment of differential gene expression. A major question is the nature of the mechanisms that establish this differential expression in time and space. In plants, the formation of the hair and non-hair cell types in the root epidermis has been used as a model to understand regulation of cell specification. Recent findings show surprising complexity in the number and the types of regulatory interactions between the multiple transcription factor genes/proteins influencing root epidermis cell fate. Here, we describe this regulatory network and the importance of the multiple feedback loops for its establishment and maintenance. PMID:24596575

  11. Animal models of pituitary neoplasia

    PubMed Central

    Lines, K.E.; Stevenson, M.; Thakker, R.V.

    2016-01-01

    Pituitary neoplasias can occur as part of a complex inherited disorder, or more commonly as sporadic (non-familial) disease. Studies of the molecular and genetic mechanisms causing such pituitary tumours have identified dysregulation of >35 genes, with many revealed by studies in mice, rats and zebrafish. Strategies used to generate these animal models have included gene knockout, gene knockin and transgenic over-expression, as well as chemical mutagenesis and drug induction. These animal models provide an important resource for investigation of tissue-specific tumourigenic mechanisms, and evaluations of novel therapies, illustrated by studies into multiple endocrine neoplasia type 1 (MEN1), a hereditary syndrome in which ∼30% of patients develop pituitary adenomas. This review describes animal models of pituitary neoplasia that have been generated, together with some recent advances in gene editing technologies, and an illustration of the use of the Men1 mouse as a preclinical model for evaluating novel therapies. PMID:26320859

  12. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    PubMed

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus, which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research shows that features obtained by homology mapping alone can achieve results as good as or even better than those of integrated features. Meanwhile, the work indicates that machine learning-based methods can assign more efficient weight coefficients than an empirical formula based on biological knowledge.
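
    The AUC values quoted in this abstract summarize how well predicted scores rank essential genes above non-essential ones. The sketch below shows the standard pairwise definition of AUC: the fraction of (positive, negative) pairs in which the positive example scores higher, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not data from the study.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 for positives (e.g. essential genes) and 0 for negatives.
    """
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

    The quadratic loop is chosen for clarity; for large gene sets a rank-based computation gives the same value more efficiently.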

  13. Integration of Immune Cell Populations, mRNA-Seq, and CpG Methylation to Better Predict Humoral Immunity to Influenza Vaccination: Dependence of mRNA-Seq/CpG Methylation on Immune Cell Populations

    PubMed Central

    Zimmermann, Michael T.; Kennedy, Richard B.; Grill, Diane E.; Oberg, Ann L.; Goergen, Krista M.; Ovsyannikova, Inna G.; Haralambieva, Iana H.; Poland, Gregory A.

    2017-01-01

    The development of a humoral immune response to influenza vaccines occurs on a multisystems level. Due to the orchestration required for robust immune responses when multiple genes and their regulatory components across multiple cell types are involved, we examined an influenza vaccination cohort using multiple high-throughput technologies. In this study, we sought a more thorough understanding of how immune cell composition and gene expression relate to each other and contribute to interindividual variation in response to influenza vaccination. We first hypothesized that many of the differentially expressed (DE) genes observed after influenza vaccination result from changes in the composition of participants’ peripheral blood mononuclear cells (PBMCs), which were assessed using flow cytometry. We demonstrated that DE genes in our study are correlated with changes in PBMC composition. We gathered DE genes from 128 other publicly available PBMC-based vaccine studies and identified that an average of 57% correlated with specific cell subset levels in our study (permutation used to control false discovery), suggesting that the associations we have identified are likely general features of PBMC-based transcriptomics. Second, we hypothesized that more robust models of vaccine response could be generated by accounting for the interplay between PBMC composition, gene expression, and gene regulation. We employed machine learning to generate predictive models of B-cell ELISPOT response outcomes and hemagglutination inhibition (HAI) antibody titers. The top HAI and B-cell ELISPOT model achieved an area under the receiver operating curve (AUC) of 0.64 and 0.79, respectively, with linear model coefficients of determination of 0.08 and 0.28. For the B-cell ELISPOT outcomes, CpG methylation had the greatest predictive ability, highlighting potentially novel regulatory features important for immune response. 
B-cell ELISPOT models using only PBMC composition had lower performance (AUC = 0.67), but highlighted well-known mechanisms. Our analysis demonstrated that each of the three data sets (cell composition, mRNA-Seq, and DNA methylation) may provide distinct information for the prediction of humoral immune response outcomes. We believe that these findings are important for the interpretation of current omics-based studies and set the stage for a more thorough understanding of interindividual immune responses to influenza vaccination. PMID:28484452

  14. Opsins have evolved under the permanent heterozygote model: insights from phylotranscriptomics of Odonata.

    PubMed

    Suvorov, Anton; Jensen, Nicholas O; Sharkey, Camilla R; Fujimoto, M Stanley; Bodily, Paul; Wightman, Haley M Cahill; Ogden, T Heath; Clement, Mark J; Bybee, Seth M

    2017-03-01

    Gene duplication plays a central role in adaptation to novel environments by providing new genetic material for functional divergence and evolution of biological complexity. Several evolutionary models have been proposed for gene duplication to explain how new gene copies are preserved by natural selection, but these models have rarely been tested using empirical data. Opsin proteins, when combined with a chromophore, form a photopigment that is responsible for the absorption of light, the first step in the phototransduction cascade. Adaptive gene duplications have occurred many times within the animal opsins' gene family, leading to novel wavelength sensitivities. Consequently, opsins are an attractive choice for the study of gene duplication evolutionary models. Odonata (dragonflies and damselflies) have the largest opsin repertoire of any insect currently known. Additionally, there is tremendous variation in opsin copy number between species, particularly in the long-wavelength-sensitive (LWS) class. Using comprehensive phylotranscriptomic and statistical approaches, we tested various evolutionary models of gene duplication. Our results suggest that both the blue-sensitive (BS) and LWS opsin classes were subjected to strong positive selection that greatly weakens after multiple duplication events, a pattern that is consistent with the permanent heterozygote model. Due to the immense interspecific variation and duplicability potential of opsin genes among odonates, they represent a unique model system to test hypotheses regarding opsin gene duplication and diversification at the molecular level. © 2016 John Wiley & Sons Ltd.

  15. Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model

    PubMed Central

    Hua, Jianping; Bittner, Michael L.; Dougherty, Edward R.

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance. PMID:24558298

  16. Use of Network Inference to Elucidate Common and Chemical-specific Effects on Steroidogenesis

    EPA Science Inventory

    Microarray data is a key source for modeling gene regulatory interactions. Regulatory network models based on multiple datasets are potentially more robust and can provide greater confidence. In this study, we used network modeling on microarray data generated by exposing the fat...

  17. Genome-wide transcriptomics of aging in the rotifer Brachionus manjavacas, an emerging model system.

    PubMed

    Gribble, Kristin E; Mark Welch, David B

    2017-03-01

    Understanding gene expression changes over lifespan in diverse animal species will lead to insights to conserved processes in the biology of aging and allow development of interventions to improve health. Rotifers are small aquatic invertebrates that have been used in aging studies for nearly 100 years and are now re-emerging as a modern model system. To provide a baseline to evaluate genetic responses to interventions that change health throughout lifespan and a framework for new hypotheses about the molecular genetic mechanisms of aging, we examined the transcriptome of an asexual female lineage of the rotifer Brachionus manjavacas at five life stages: eggs, neonates, and early-, late-, and post-reproductive adults. There are widespread shifts in gene expression over the lifespan of B. manjavacas; the largest change occurs between neonates and early reproductive adults and is characterized by down-regulation of developmental genes and up-regulation of genes involved in reproduction. The expression profile of post-reproductive adults was distinct from that of other life stages. While few genes were significantly differentially expressed in the late- to post-reproductive transition, gene set enrichment analysis revealed multiple down-regulated pathways in metabolism, maintenance and repair, and proteostasis, united by genes involved in mitochondrial function and oxidative phosphorylation. This study provides the first examination of changes in gene expression over lifespan in rotifers. We detected differential expression of many genes with human orthologs that are absent in Drosophila and C. elegans, highlighting the potential of the rotifer model in aging studies. Our findings suggest that small but coordinated changes in expression of many genes in pathways that integrate diverse functions drive the aging process. 
The observation of simultaneous declines in expression of genes in multiple pathways may have consequences for health and longevity not detected by single- or multi-gene knockdown in otherwise healthy animals. Investigation of subtle but genome-wide change in these pathways during aging is an important area for future study.

  18. The processive kinetics of gene conversion in bacteria

    PubMed Central

    Paulsson, Johan; El Karoui, Meriem; Lindell, Monica

    2017-01-01

    Summary: Gene conversion, non‐reciprocal transfer from one homologous sequence to another, is a major force in evolutionary dynamics, promoting co‐evolution in gene families and maintaining similarities between repeated genes. However, the properties of the transfer – where it initiates, how far it proceeds and how the resulting conversion tracts are affected by mismatch repair – are not well understood. Here, we use the duplicate tuf genes in Salmonella as a quantitatively tractable model system for gene conversion. We selected for conversion in multiple different positions of tuf, and examined the resulting distributions of conversion tracts in mismatch repair‐deficient and mismatch repair‐proficient strains. A simple stochastic model accounting for the essential steps of conversion showed excellent agreement with the data for all selection points using the same value of the conversion processivity, which is the only kinetic parameter of the model. The analysis suggests that gene conversion effectively initiates uniformly at any position within a tuf gene, and proceeds with an effectively uniform conversion processivity in either direction limited by the bounds of the gene. PMID:28256783

  19. Inherited genetic variants associated with occurrence of multiple primary melanoma.

    PubMed

    Gibbs, David C; Orlow, Irene; Kanetsky, Peter A; Luo, Li; Kricker, Anne; Armstrong, Bruce K; Anton-Culver, Hoda; Gruber, Stephen B; Marrett, Loraine D; Gallagher, Richard P; Zanetti, Roberto; Rosso, Stefano; Dwyer, Terence; Sharma, Ajay; La Pilla, Emily; From, Lynn; Busam, Klaus J; Cust, Anne E; Ollila, David W; Begg, Colin B; Berwick, Marianne; Thomas, Nancy E

    2015-06-01

    Recent studies, including genome-wide association studies, have identified several putative low-penetrance susceptibility loci for melanoma. We sought to determine their generalizability to genetic predisposition for multiple primary melanoma in the international population-based Genes, Environment, and Melanoma (GEM) Study. GEM is a case-control study of 1,206 incident cases of multiple primary melanoma and 2,469 incident first primary melanoma participants as the control group. We investigated the odds of developing multiple primary melanoma for 47 SNPs from 21 distinct genetic regions previously reported to be associated with melanoma. ORs and 95% confidence intervals were determined using logistic regression models adjusted for baseline features (age, sex, age by sex interaction, and study center). We investigated univariable models and built multivariable models to assess independent effects of SNPs. Eleven SNPs in 6 gene neighborhoods (TERT/CLPTM1L, TYRP1, MTAP, TYR, NCOA6, and MX2) and a PARP1 haplotype were associated with multiple primary melanoma. In a multivariable model that included only the most statistically significant findings from univariable modeling and adjusted for pigmentary phenotype, back nevi, and baseline features, we found TERT/CLPTM1L rs401681 (P = 0.004), TYRP1 rs2733832 (P = 0.006), MTAP rs1335510 (P = 0.0005), TYR rs10830253 (P = 0.003), and MX2 rs45430 (P = 0.008) to be significantly associated with multiple primary melanoma, while NCOA6 rs4911442 approached significance (P = 0.06). The GEM Study provides additional evidence for the relevance of these genetic regions to melanoma risk and estimates the magnitude of the observed genetic effect on development of subsequent primary melanoma. ©2015 American Association for Cancer Research.

  20. Inherited genetic variants associated with occurrence of multiple primary melanoma

    PubMed Central

    Gibbs, David C.; Orlow, Irene; Kanetsky, Peter A.; Luo, Li; Kricker, Anne; Armstrong, Bruce K.; Anton-Culver, Hoda; Gruber, Stephen B.; Marrett, Loraine D.; Gallagher, Richard P.; Zanetti, Roberto; Rosso, Stefano; Dwyer, Terence; Sharma, Ajay; La Pilla, Emily; From, Lynn; Busam, Klaus J.; Cust, Anne E.; Ollila, David W.; Begg, Colin B.; Berwick, Marianne; Thomas, Nancy E.

    2015-01-01

    Recent studies including genome-wide association studies have identified several putative low-penetrance susceptibility loci for melanoma. We sought to determine their generalizability to genetic predisposition for multiple primary melanoma in the international population-based Genes, Environment, and Melanoma (GEM) Study. GEM is a case-control study of 1,206 incident cases of multiple primary melanoma and 2,469 incident first primary melanoma participants as the control group. We investigated the odds of developing multiple primary melanoma for 47 single nucleotide polymorphisms (SNP) from 21 distinct genetic regions previously reported to be associated with melanoma. ORs and 95% CIs were determined using logistic regression models adjusted for baseline features (age, sex, age by sex interaction, and study center). We investigated univariable models and built multivariable models to assess independent effects of SNPs. Eleven SNPs in 6 gene neighborhoods (TERT/CLPTM1L, TYRP1, MTAP, TYR, NCOA6, and MX2) and a PARP1 haplotype were associated with multiple primary melanoma. In a multivariable model that included only the most statistically significant findings from univariable modeling and adjusted for pigmentary phenotype, back nevi, and baseline features, we found TERT/CLPTM1L rs401681 (P = 0.004), TYRP1 rs2733832 (P = 0.006), MTAP rs1335510 (P = 0.0005), TYR rs10830253 (P = 0.003), and MX2 rs45430 (P = 0.008) to be significantly associated with multiple primary melanoma while NCOA6 rs4911442 approached significance (P = 0.06). The GEM study provides additional evidence for the relevance of these genetic regions to melanoma risk and estimates the magnitude of the observed genetic effect on development of subsequent primary melanoma. PMID:25837821

  1. Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models

    PubMed Central

    Wang, Yifan; Liu, Aiyi; Mills, James L.; Boehnke, Michael; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Xiong, Momiao; Wu, Colin O.; Fan, Ruzong

    2015-01-01

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks’s Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. PMID:25809955
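
    The three multivariate test statistics named above have standard textbook definitions in terms of the eigenvalues of H E^{-1}, where H and E are the hypothesis and error sums-of-squares matrices. The sketch below computes them from eigenvalues supplied by the caller; it illustrates the general definitions only, not the functional linear models or the F-approximations of the paper, and the function name is an assumption.

```python
import math

def manova_statistics(eigenvalues):
    """Classical multivariate test statistics from the eigenvalues
    of H * E^{-1} (hypothesis and error SSCP matrices)."""
    wilks = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)
    hotelling_lawley = sum(eigenvalues)
    return {
        "wilks_lambda": wilks,
        "pillai_bartlett_trace": pillai,
        "hotelling_lawley_trace": hotelling_lawley,
    }

# Hypothetical eigenvalues for a two-trait analysis (illustrative only).
print(manova_statistics([1.0, 0.25]))
```

    Small values of Wilks's Lambda, and large values of the two traces, indicate stronger evidence of association.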

  2. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.

    PubMed

    Wang, Yifan; Liu, Aiyi; Mills, James L; Boehnke, Michael; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao; Wu, Colin O; Fan, Ruzong

    2015-05-01

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. © 2015 WILEY PERIODICALS, INC.

  3. A Single Multiplex crRNA Array for FnCpf1-Mediated Human Genome Editing.

    PubMed

    Sun, Huihui; Li, Fanfan; Liu, Jie; Yang, Fayu; Zeng, Zhenhai; Lv, Xiujuan; Tu, Mengjun; Liu, Yeqing; Ge, Xianglian; Liu, Changbao; Zhao, Junzhao; Zhang, Zongduan; Qu, Jia; Song, Zongming; Gu, Feng

    2018-06-15

    Cpf1 has been harnessed as a tool for genome manipulation in various species because of its simplicity and high efficiency. Our recent study demonstrated that FnCpf1 could be utilized for human genome editing with notable advantages for target sequence selection due to the flexibility of the protospacer adjacent motif (PAM) sequence. Multiplex genome editing provides a powerful tool for targeting members of multigene families, dissecting gene networks, modeling multigenic disorders in vivo, and applying gene therapy. However, there are no reports at present that show FnCpf1-mediated multiplex genome editing via a single customized CRISPR RNA (crRNA) array. In the present study, we utilize a single customized crRNA array to simultaneously target multiple genes in human cells. In addition, we also demonstrate that a single customized crRNA array to target multiple sites in one gene could be achieved. Collectively, FnCpf1, a powerful genome-editing tool for multiple genomic targets, can be harnessed for effective manipulation of the human genome. Copyright © 2018 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.

  4. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

    PubMed

    Allman, Elizabeth S; Degnan, James H; Rhodes, John A

    2011-06-01

    Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals-each with many genes-splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
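
    The four-species case can be made concrete with the standard quartet probabilities under the multispecies coalescent: the unrooted gene tree matching the species tree has probability 1 - (2/3)e^{-t}, and each of the other two has probability (1/3)e^{-t}, where t is the internal edge length of the unrooted species tree in coalescent units. The sketch below states this from general coalescent theory, not from the abstract itself; the function names and branch lengths are hypothetical.

```python
import math

def quartet_probs(internal_length):
    """Unrooted gene tree topology probabilities for four species
    (one gene each) when the unrooted species tree is ab|cd."""
    minor = math.exp(-internal_length) / 3.0
    return {"ab|cd": 1.0 - 2.0 * minor, "ac|bd": minor, "ad|bc": minor}

def caterpillar(x, y):
    """Rooted species tree (((a,b):x,c):y,d); only x affects the result."""
    return quartet_probs(x)

def balanced(x, y):
    """Rooted species tree ((a,b):x,(c,d):y); only the sum x + y matters."""
    return quartet_probs(x + y)

# Two differently rooted species trees give the same distribution,
# which is why the root is not identifiable with four species.
print(caterpillar(1.0, 0.5))
print(balanced(0.4, 0.6))
```

    Because only one combination of edge lengths enters the distribution, the topology probabilities recover some but not all of the branch-length information.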

  5. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    NASA Astrophysics Data System (ADS)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  6. History of a prolific family: the Hes/Hey-related genes of the annelid Platynereis

    PubMed Central

    2014-01-01

    Background The Hes superfamily or Hes/Hey-related genes encompass a variety of metazoan-specific bHLH genes, with somewhat fuzzy phylogenetic relationships. Hes superfamily members are involved in a variety of major developmental mechanisms in metazoans, notably in neurogenesis and segmentation processes, in which they often act as direct effector genes of the Notch signaling pathway. Results We have investigated the molecular and functional evolution of the Hes superfamily in metazoans using the lophotrochozoan Platynereis dumerilii as model. Our phylogenetic analyses of more than 200 Metazoan Hes/Hey-related genes revealed the presence of five families, three of them (Hes, Hey and Helt) being pan-metazoan. Those families were likely composed of a unique representative in the last common metazoan ancestor. The evolution of the Hes family was shaped by many independent lineage specific tandem duplication events. The expression patterns of 13 of the 15 Hes/Hey-related genes in Platynereis indicate a broad functional diversification. Nevertheless, a majority of these genes are involved in two crucial developmental processes in annelids: neurogenesis and segmentation, resembling functions highlighted in other animal models. Conclusions Combining phylogenetic and expression data, our study suggests an unusual evolutionary history for the Hes superfamily. An ancestral multifunctional annelid Hes gene may have undergone multiple rounds of duplication-degeneration-complementation processes in the lineage leading to Platynereis, each gene copy ensuring its maintenance in the genome by subfunctionalisation. Similar but independent waves of duplications are at the origin of the multiplicity of Hes genes in other metazoan lineages. PMID:25250171

  7. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

    PubMed Central

    2016-01-01

    Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Young June; Ahn, Kwang Sung; Kim, Minjeong

    Highlights: • ATM gene-targeted pigs were produced by somatic cell nuclear transfer. • A novel large animal model for ataxia telangiectasia was developed. • The new model may provide an alternative to the mouse model. - Abstract: Ataxia telangiectasia (A-T) is a recessive autosomal disorder associated with pleiotropic phenotypes, including progressive cerebellar degeneration, gonad atrophy, and growth retardation. Even though A-T is known to be caused by mutations in the Ataxia telangiectasia mutated (ATM) gene, the correlation between abnormal cellular physiology caused by ATM mutations and the multiple symptoms of A-T disease has not been clearly determined. None of the existing ATM mouse models properly reflects the extent to which neurological degeneration occurs in humans. In an attempt to provide a large animal model for A-T, we produced gene-targeted pigs with mutations in the ATM gene by somatic cell nuclear transfer. The disrupted allele in the ATM gene of cloned piglets was confirmed via PCR and Southern blot analysis. The ATM gene-targeted pigs generated in the present study may provide an alternative to the current mouse model for the study of mechanisms underlying A-T disorder and for the development of new therapies.

  9. Discovering time-lagged rules from microarray data using gene profile classifiers

    PubMed Central

    2011-01-01

    Background Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. Conclusions A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308

  10. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
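
    The instability of ordinary t-statistics with very few arrays can be illustrated with a small sketch. It compares a pooled-variance t-statistic with a version that adds a constant offset `s0` to the denominator. The offset is a deliberately simple stand-in for variance shrinkage and is not the mean-variance model of the paper; the function name, `s0`, and the expression values are all hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(x, y, s0=0.0):
    """Two-sample pooled-variance t-statistic; s0 > 0 pads the standard
    error so that tiny variance estimates cannot inflate the statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return (mean(x) - mean(y)) / (se + s0)

# Hypothetical log-expression values: two arrays per group, tiny spread.
group_a = [8.01, 8.02]
group_b = [7.99, 8.00]
print(t_statistic(group_a, group_b))          # driven by a near-zero variance
print(t_statistic(group_a, group_b, s0=0.1))  # damped by the offset
```

    With two arrays per group, a negligible mean difference can still produce a sizeable ordinary t-statistic, which is the behaviour that shrinkage of variance estimates is meant to guard against.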

  11. Patterns of Nucleotide Diversity at Photoperiod Related Genes in Norway Spruce [Picea abies (L.) Karst.]

    PubMed Central

    Källman, Thomas; De Mita, Stéphane; Larsson, Hanna; Gyllenstrand, Niclas; Heuertz, Myriam; Parducci, Laura; Suyama, Yoshihisa; Lagercrantz, Ulf; Lascoux, Martin

    2014-01-01

    The ability of plants to track seasonal changes is largely dependent on genes assigned to the photoperiod pathway, and variation in those genes is thereby important for adaptation to local day length conditions. Extensive physiological data in several temperate conifer species suggest that populations are adapted to local light conditions, but data on the genes underlying this adaptation are more limited. Here we present nucleotide diversity data from 19 genes putatively involved in photoperiodic response in Norway spruce (Picea abies). Based on similarity to model plants the genes were grouped into three categories according to their presumed position in the photoperiod pathway: photoreceptors, circadian clock genes, and downstream targets. An HKA (Hudson, Kreitman and Aguadé) test showed a significant excess of diversity at photoreceptor genes, but no departure from neutrality at circadian genes and downstream targets. Departures from neutrality were also tested with Tajima's D and Fay and Wu's H statistics under three demographic scenarios: the standard neutral model, a population expansion model, and a more complex population split model. Only one gene, the circadian clock gene PaPRR3 with a highly positive Tajima's D value, deviates significantly from all tested demographic scenarios. As the PaPRR3 gene harbours multiple non-synonymous variants it appears as an excellent candidate gene for control of photoperiod response in Norway spruce. PMID:24810273
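
    Tajima's D, used above as a neutrality statistic, compares the mean pairwise nucleotide difference with the number of segregating sites scaled by sample size. The sketch below follows the standard published formula (Tajima 1989); the function name and the inputs are illustrative and are not data from the Norway spruce study.

```python
import math

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites,
    and mean number of pairwise differences pi."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(var)

# Hypothetical sample: 10 sequences, 16 segregating sites.
print(tajimas_d(10, 16, 3.888))  # negative: excess of rare variants
print(tajimas_d(10, 16, 8.0))    # positive: excess of intermediate-frequency variants
```

    A strongly positive value, as reported for PaPRR3, reflects more intermediate-frequency variation than expected under the standard neutral model.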

  12. Patterns of nucleotide diversity at photoperiod related genes in Norway spruce [Picea abies (L.) Karst].

    PubMed

    Källman, Thomas; De Mita, Stéphane; Larsson, Hanna; Gyllenstrand, Niclas; Heuertz, Myriam; Parducci, Laura; Suyama, Yoshihisa; Lagercrantz, Ulf; Lascoux, Martin

    2014-01-01

    The ability of plants to track seasonal changes is largely dependent on genes assigned to the photoperiod pathway, and variation in those genes is thereby important for adaptation to local day length conditions. Extensive physiological data in several temperate conifer species suggest that populations are adapted to local light conditions, but data on the genes underlying this adaptation are more limited. Here we present nucleotide diversity data from 19 genes putatively involved in photoperiodic response in Norway spruce (Picea abies). Based on similarity to model plants the genes were grouped into three categories according to their presumed position in the photoperiod pathway: photoreceptors, circadian clock genes, and downstream targets. An HKA (Hudson, Kreitman and Aguadé) test showed a significant excess of diversity at photoreceptor genes, but no departure from neutrality at circadian genes and downstream targets. Departures from neutrality were also tested with Tajima's D and Fay and Wu's H statistics under three demographic scenarios: the standard neutral model, a population expansion model, and a more complex population split model. Only one gene, the circadian clock gene PaPRR3 with a highly positive Tajima's D value, deviates significantly from all tested demographic scenarios. As the PaPRR3 gene harbours multiple non-synonymous variants it appears as an excellent candidate gene for control of photoperiod response in Norway spruce.

  13. Changes in expression of genes involved in apoptosis in activated human T-cells in response to modeled microgravity

    NASA Astrophysics Data System (ADS)

    Ward, Nancy E.; Pellis, Neal R.; Risin, Diana; Risin, Semyon A.; Liu, Wenbin

    2006-09-01

    Space flights result in remarkable effects on various physiological systems, including a decline in cellular immune functions. Previous studies have shown that exposure to microgravity, both true and modeled, can cause significant changes in numerous lymphocyte functions. The purpose of this study was to search for microgravity-sensitive genes, and specifically for apoptotic genes influenced by the microgravity environment and other genes related to immune response. The experiments were performed on anti-CD3 and IL-2 activated human T cells. To model microgravity conditions we utilized the NASA rotating wall vessel bioreactor. Control lymphocytes were cultured in static 1g conditions. To assess gene expression we used DNA microarray chip technology. We have shown that multiple genes (approximately 3-8% of tested genes) respond to microgravity conditions with a 1.5-fold or greater change in expression. There is a significant variability in the response. However, a certain reproducible pattern in gene response could be identified. Among the genes showing reproducible changes in expression in modeled microgravity, several genes involved in apoptosis as well as in immune response were identified. These are IL-7 receptor, Granzyme B, Beta-3-endonexin, Apo2 ligand and STAT1. Possible functional consequences of these changes are discussed.

  14. Patterns of gene flow and selection across multiple species of Acrocephalus warblers: footprints of parallel selection on the Z chromosome.

    PubMed

    Reifová, Radka; Majerová, Veronika; Reif, Jiří; Ahola, Markus; Lindholm, Antero; Procházka, Petr

    2016-06-16

    Understanding the mechanisms and selective forces leading to adaptive radiations and origin of biodiversity is a major goal of evolutionary biology. Acrocephalus warblers are small passerines that underwent an adaptive radiation in the last approximately 10 million years that gave rise to 37 extant species, many of which still hybridize in nature. Acrocephalus warblers have served as model organisms for a wide variety of ecological and behavioral studies, yet our knowledge of mechanisms and selective forces driving their radiation is limited. Here we studied patterns of interspecific gene flow and selection across three European Acrocephalus warblers to get a first insight into mechanisms of radiation of this avian group. We analyzed nucleotide variation at eight nuclear loci in three hybridizing Acrocephalus species with overlapping breeding ranges in Europe. Using an isolation-with-migration model for multiple populations, we found evidence for unidirectional gene flow from A. scirpaceus to A. palustris and from A. palustris to A. dumetorum. Gene flow was higher between genetically more closely related A. scirpaceus and A. palustris than between ecologically more similar A. palustris and A. dumetorum, suggesting that gradual accumulation of intrinsic barriers rather than divergent ecological selection is more efficient in restricting interspecific gene flow in Acrocephalus warblers. Although levels of genetic differentiation between different species pairs were in general not correlated, we found signatures of apparently independent instances of positive selection at the same two Z-linked loci in multiple species. Our study brings the first evidence that gene flow occurred during Acrocephalus radiation and not only between sister species. Interspecific gene flow could thus be an important source of genetic variation in individual Acrocephalus species and could have accelerated adaptive evolution and speciation rate in this avian group by creating novel genetic combinations and new phenotypes. Independent instances of positive selection at the same loci in multiple species indicate an interesting possibility that the same loci might have contributed to reproductive isolation in several speciation events.

  15. Genotype-Based Association Mapping of Complex Diseases: Gene-Environment Interactions with Multiple Genetic Markers and Measurement Error in Environmental Exposures

    PubMed Central

    Lobach, Iryna; Fan, Ruzong; Carroll, Raymond J.

    2011-01-01

    With the advent of dense single nucleotide polymorphism genotyping, population-based association studies have become the major tools for identifying human disease genes and for fine gene mapping of complex traits. We develop a genotype-based approach for association analysis of case-control studies of gene-environment interactions in the case when environmental factors are measured with error and genotype data are available on multiple genetic markers. To directly use the observed genotype data, we propose two genotype-based models: genotype effect and additive effect models. Our approach offers several advantages. First, the proposed risk functions can directly incorporate the observed genotype data while modeling the linkage disequilibrium information in the regression coefficients, thus eliminating the need to infer haplotype phase. Compared with the haplotype-based approach, an estimating procedure based on the proposed methods can be much simpler and significantly faster. In addition, there is no potential risk due to haplotype phase estimation. Further, by fitting the proposed models, it is possible to analyze the risk alleles/variants of complex diseases, including their dominant or additive effects. To model measurement error, we adopt the pseudo-likelihood method by Lobach et al. [2008]. Performance of the proposed method is examined using simulation experiments. An application of our method is illustrated using a population-based case-control study of association between calcium intake and the risk of colorectal adenoma development. PMID:21031455

  16. Leveraging multiple gene networks to prioritize GWAS candidate genes via network representation learning.

    PubMed

    Wu, Mengmeng; Zeng, Wanwen; Liu, Wenqiang; Lv, Hairong; Chen, Ting; Jiang, Rui

    2018-06-03

    Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering genetic basis of human inherited diseases. However, it is still a challenging task to extract biological knowledge from the GWAS data, due to such issues as missing heritability and weak interpretability. Indeed, the fact that the majority of discovered loci fall into noncoding regions without clear links to genes has been preventing the characterization of their functions and appealing for a sophisticated approach to bridge genetic and genomic studies. Towards this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged a technique called network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and demonstrated the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. Omics analysis of mouse brain models of human diseases.

    PubMed

    Paban, Véronique; Loriod, Béatrice; Villard, Claude; Buee, Luc; Blum, David; Pietropaolo, Susanna; Cho, Yoon H; Gory-Faure, Sylvie; Mansour, Elodie; Gharbi, Ali; Alescio-Lautier, Béatrice

    2017-02-05

    The identification of common gene/protein profiles related to brain alterations, if they exist, may indicate the convergence of the pathogenic mechanisms driving brain disorders. Six genetically engineered mouse lines modelling neurodegenerative diseases and neuropsychiatric disorders were considered. Omics approaches, including transcriptomic and proteomic methods, were used. The gene/protein lists were used for inter-disease comparisons and further functional and network investigations. When the inter-disease comparison was performed using the gene symbol identifiers, the number of genes/proteins involved in multiple diseases decreased rapidly. Thus, no genes/proteins were shared by all 6 mouse models. Only one gene/protein (Gfap) was shared among 4 disorders, providing strong evidence that a common molecular signature does not exist among brain diseases. The inter-disease comparison of functional processes showed the involvement of a few major biological processes indicating that brain diseases of diverse aetiologies might utilize common biological pathways in the nervous system, without necessarily involving similar molecules. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Synchronous versus asynchronous modeling of gene regulatory networks.

    PubMed

    Garg, Abhishek; Di Cara, Alessandro; Xenarios, Ioannis; Mendoza, Luis; De Micheli, Giovanni

    2008-09-01

    In silico modeling of gene regulatory networks has gained some momentum recently due to increased interest in analyzing the dynamics of biological systems. This has been further facilitated by the increasing availability of experimental data on gene-gene, protein-protein and gene-protein interactions. The two dynamical properties that are often experimentally testable are perturbations and stable steady states. Although a lot of work has been done on the identification of steady states, not much work has been reported on in silico modeling of cellular differentiation processes. In this manuscript, we provide algorithms based on reduced ordered binary decision diagrams (ROBDDs) for Boolean modeling of gene regulatory networks. Algorithms for synchronous and asynchronous transition models have been proposed and their corresponding computational properties have been analyzed. These algorithms allow users to compute cyclic attractors of large networks that are currently not feasible using existing software. Hereby we provide a framework to analyze the effect of multiple gene perturbation protocols, and their effect on cell differentiation processes. These algorithms were validated on the T-helper model showing the correct steady state identification and Th1-Th2 cellular differentiation process. The software binaries for Windows and Linux platforms can be downloaded from http://si2.epfl.ch/~garg/genysis.html.
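
    The difference between synchronous and asynchronous updating can be illustrated on a toy network. The sketch below is only an illustration under stated assumptions: it enumerates states explicitly instead of using ROBDDs as the paper does, and the two-gene mutual-inhibition rules and all function names are hypothetical, not taken from the GenYsis software.

```python
from itertools import product

# Hypothetical two-gene toggle: each gene is ON only when the other is OFF.
RULES = (
    lambda s: not s[1],  # next value of gene 0
    lambda s: not s[0],  # next value of gene 1
)

def sync_step(state):
    """Synchronous update: every gene is updated at the same time."""
    return tuple(bool(rule(state)) for rule in RULES)

def async_successors(state):
    """Asynchronous update: exactly one gene is updated per step."""
    successors = set()
    for i, rule in enumerate(RULES):
        nxt = list(state)
        nxt[i] = bool(rule(state))
        successors.add(tuple(nxt))
    return successors

def sync_attractors(n_genes):
    """Follow each start state until a state repeats; collect the cycles found."""
    attractors = set()
    for start in product((False, True), repeat=n_genes):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = sync_step(state)
        # The states from the first repeated state onward form the attractor.
        attractors.add(frozenset(seen[seen.index(state):]))
    return attractors

def steady_states(n_genes):
    """Fixed points; these are the same under both update schemes."""
    return {s for s in product((False, True), repeat=n_genes) if sync_step(s) == s}
```

    In this toy network, the synchronous scheme has a two-state cyclic attractor (both genes OFF, then both ON) in addition to the two steady states. Under asynchronous updating the both-OFF state instead moves to one of the steady states, so the cycle does not persist.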

  19. Dissecting Embryonic Stem Cell Self-Renewal and Differentiation Commitment from Quantitative Models.

    PubMed

    Hu, Rong; Dai, Xianhua; Dai, Zhiming; Xiang, Qian; Cai, Yanning

    2016-10-01

    To quantitatively model embryonic stem cell (ESC) self-renewal and differentiation by computational approaches, we developed a unified mathematical model for gene expression involved in cell fate choices. Our quantitative model comprised ESC master regulators and lineage-specific pivotal genes. It took the factors of multiple pathways as input and computed expression as a function of intrinsic transcription factors, extrinsic cues, epigenetic modifications, and antagonism between ESC master regulators and lineage-specific pivotal genes. In the model, differential equations for the expression of genes involved in cell fate choices were established from their regulatory relationships, according to the transcription and degradation rates. We applied this model to murine ESC self-renewal and differentiation commitment and found that it modeled the expression patterns with good accuracy. Our model analysis revealed that the murine ESC state was an attractor state in culture and that differentiation was predominantly caused by antagonism between ESC master regulators and lineage-specific pivotal genes. Moreover, antagonism among lineages played a critical role in lineage reprogramming. Our results also revealed that the ordered alteration of ESC master regulator expression over time had a central role in ESC differentiation fates. Our computational framework was generally applicable to most cell-type maintenance and lineage reprogramming.
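
    To make the idea of antagonism-driven attractor states concrete, here is a minimal sketch of a generic two-gene mutual-repression model with a production term and a degradation term. It is not the authors' model: the Hill-type repression, the parameter values (alpha, delta, k, n), and the function name are all assumptions chosen for illustration.

```python
def simulate(esc0, lin0, alpha=2.0, delta=1.0, k=1.0, n=4, dt=0.01, steps=5000):
    """Forward-Euler integration of a two-gene mutual-repression system.

    d(esc)/dt = alpha / (1 + (lin / k)**n) - delta * esc
    d(lin)/dt = alpha / (1 + (esc / k)**n) - delta * lin

    esc: expression of a hypothetical ESC master regulator
    lin: expression of a hypothetical lineage-specific gene
    """
    esc, lin = esc0, lin0
    for _ in range(steps):
        # Production is repressed by the antagonist; degradation is first order.
        d_esc = alpha / (1.0 + (lin / k) ** n) - delta * esc
        d_lin = alpha / (1.0 + (esc / k) ** n) - delta * lin
        esc += dt * d_esc
        lin += dt * d_lin
    return esc, lin
```

    With these assumed parameters the system is bistable: starting from `simulate(1.5, 0.5)` the trajectory settles in a state with high `esc` and low `lin`, while `simulate(0.5, 1.5)` settles in the mirror-image state.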

  20. Meta-analysis identifies gene-by-environment interactions as demonstrated in a study of 4,965 mice.

    PubMed

    Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J; Shih, Diana; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar

    2014-01-01

    Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
    An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study, thus explaining the large number of loci discovered in the combined study.
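
    The statement that interactions can be interpreted as heterogeneity can be sketched with textbook meta-analysis quantities. The abstract does not specify the authors' test statistic, so the example below swaps in Cochran's Q and the DerSimonian-Laird between-study variance as stand-ins; the function name and the input values in the usage note are hypothetical.

```python
def heterogeneity(effects, std_errs):
    """Return (pooled effect, Cochran's Q, DerSimonian-Laird tau^2).

    effects:  per-study effect estimates for one locus
    std_errs: their standard errors (at least two studies are required)
    """
    weights = [1.0 / se ** 2 for se in std_errs]  # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / w_sum
    # Q measures how far the study effects scatter around the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    return pooled, q, tau2
```

    For example, three studies with effects `[0.0, 0.5, 1.0]` and standard errors of `0.1` each give a large Q and a positive tau^2, which under this reading would point to an effect that differs between environments. Identical effects across studies give Q near zero and tau^2 of zero.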

  1. Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice

    PubMed Central

    Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J.; Shih, Diana; Davis, Richard C.; Lusis, Aldons J.; Eskin, Eleazar

    2014-01-01

    Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. 
    An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study, thus explaining the large number of loci discovered in the combined study. PMID:24415945

  2. Analysis of Cytoskeletal and Motility Proteins in the Sea Urchin Genome Assembly

    PubMed Central

    Morris, RL; Hoffman, MP; Obar, RA; McCafferty, SS; Gibbons, IR; Leone, AD; Cool, J; Allgood, EL; Musante, AM; Judkins, KM; Rossetti, BJ; Rawson, AP; Burgess, DR

    2007-01-01

    The sea urchin embryo is a classical model system for studying the role of the cytoskeleton in such events as fertilization, mitosis, cleavage, cell migration and gastrulation. We have conducted an analysis of gene models derived from the Strongylocentrotus purpuratus genome assembly and have gathered strong evidence for the existence of multiple gene families encoding cytoskeletal proteins and their regulators in sea urchin. While many cytoskeletal genes have been cloned from sea urchin with sequences already existing in public databases, genome analysis reveals a significantly higher degree of diversity within certain gene families. Furthermore, genes are described corresponding to homologs of cytoskeletal proteins not previously documented in sea urchins. To illustrate the varying degree of sequence diversity that exists within cytoskeletal gene families, we conducted an analysis of genes encoding actins, specific actin-binding proteins, myosins, tubulins, kinesins, dyneins, specific microtubule-associated proteins, and intermediate filaments. We conducted ontological analysis of select genes to better understand the relatedness of urchin cytoskeletal genes to those of other deuterostomes. We analyzed developmental expression (EST) data to confirm the existence of select gene models and to understand their differential expression during various stages of early development. PMID:17027957

  3. Association of a novel point mutation in MSH2 gene with familial multiple primary cancers.

    PubMed

    Hu, Hai; Li, Hong; Jiao, Feng; Han, Ting; Zhuo, Meng; Cui, Jiujie; Li, Yixue; Wang, Liwei

    2017-10-03

    Multiple primary cancers (MPC) have been identified as two or more cancers without any subordinate relationship that occur either simultaneously or metachronously in the same or different organs of an individual. Lynch syndrome is an autosomal dominant genetic disorder that increases the risk of many types of cancers. Lynch syndrome patients who suffer more than two cancers can also be considered as MPC; patients of this kind provide unique resources to learn how genetic mutation causes MPC in different tissues. We performed whole genome sequencing on blood cells and two tumor samples of a Lynch syndrome patient who was diagnosed with five primary cancers. The mutational landscape of the tumors, including somatic point mutations and copy number alterations, was characterized. We also compared Lynch syndrome with sporadic cancers and proposed a model to illustrate the mutational process by which Lynch syndrome progresses to MPC. We revealed a novel pathologic mutation on the MSH2 gene (G504 splicing) that associates with Lynch syndrome. Systematic comparison of the mutation landscape revealed that multiple cancers in the proband were evolutionarily independent. Integrative analysis showed that truncating mutations of DNA mismatch repair (MMR) genes were significantly enriched in the patient. A mutation progress model that included germline mutations of MMR genes, double hits of the MMR system, mutations in tissue-specific driver genes, and rapid accumulation of additional passenger mutations was proposed to illustrate how MPC occurs in Lynch syndrome patients. Our findings demonstrate that both germline and somatic alterations are driving forces of carcinogenesis, which may resolve the carcinogenic theory of Lynch syndrome.

  4. A Partial Least Square Approach for Modeling Gene-gene and Gene-environment Interactions When Multiple Markers Are Genotyped

    PubMed Central

    Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C.

    2008-01-01

    Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense SNPs in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey’s 1-df model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women’s Health Initiative (WHI), this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with BMI. PMID:18615621

  5. A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotyped.

    PubMed

    Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C

    2009-01-01

    Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense single nucleotide polymorphisms (SNPs) in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey's one-degree-of-freedom model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women's Health Initiative, this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with body mass index.

  6. A robust prognostic signature for hormone-positive node-negative breast cancer.

    PubMed

    Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W

    2013-01-01

    Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    RFRS allows flexibility in both the number and identity of genes utilized, from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.
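
    The ROC AUC used to assess the classifier can be read as the probability that a patient who relapsed receives a higher score than one who did not. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name and the scores in the usage note are hypothetical.

```python
def roc_auc(scores, relapsed):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    scores:   predicted relapse probabilities, one per patient
    relapsed: truthy if the patient relapsed, falsy otherwise
    """
    pos = [s for s, y in zip(scores, relapsed) if y]
    neg = [s for s, y in zip(scores, relapsed) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    # Count the (relapsed, non-relapsed) pairs that are ranked correctly.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    For instance, `roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns `0.75`, because three of the four relapsed/non-relapsed pairs are ranked correctly. A value of 0.5 corresponds to random ranking.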

  7. A robust prognostic signature for hormone-positive node-negative breast cancer

    PubMed Central

    2013-01-01

    Background: Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods: We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results: Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. 
    Conclusions: RFRS allows flexibility in both the number and identity of genes utilized, from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773

  8. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    PubMed Central

    de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710

  9. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  10. In vivo endothelial siRNA delivery using polymeric nanoparticles with low molecular weight

    NASA Astrophysics Data System (ADS)

    Dahlman, James E.; Barnes, Carmen; Khan, Omar F.; Thiriot, Aude; Jhunjunwala, Siddharth; Shaw, Taylor E.; Xing, Yiping; Sager, Hendrik B.; Sahay, Gaurav; Speciner, Lauren; Bader, Andrew; Bogorad, Roman L.; Yin, Hao; Racie, Tim; Dong, Yizhou; Jiang, Shan; Seedorf, Danielle; Dave, Apeksha; Singh Sandhu, Kamaljeet; Webber, Matthew J.; Novobrantseva, Tatiana; Ruda, Vera M.; Lytton-Jean, Abigail K. R.; Levins, Christopher G.; Kalish, Brian; Mudge, Dayna K.; Perez, Mario; Abezgauz, Ludmila; Dutta, Partha; Smith, Lynelle; Charisse, Klaus; Kieran, Mark W.; Fitzgerald, Kevin; Nahrendorf, Matthias; Danino, Dganit; Tuder, Rubin M.; von Andrian, Ulrich H.; Akinc, Akin; Panigrahy, Dipak; Schroeder, Avi; Koteliansky, Victor; Langer, Robert; Anderson, Daniel G.

    2014-08-01

    Dysfunctional endothelium contributes to more diseases than any other tissue in the body. Small interfering RNAs (siRNAs) can help in the study and treatment of endothelial cells in vivo by durably silencing multiple genes simultaneously, but efficient siRNA delivery has so far remained challenging. Here, we show that polymeric nanoparticles made of low-molecular-weight polyamines and lipids can deliver siRNA to endothelial cells with high efficiency, thereby facilitating the simultaneous silencing of multiple endothelial genes in vivo. Unlike lipid or lipid-like nanoparticles, this formulation does not significantly reduce gene expression in hepatocytes or immune cells even at the dosage necessary for endothelial gene silencing. These nanoparticles mediate the most durable non-liver silencing reported so far and facilitate the delivery of siRNAs that modify endothelial function in mouse models of vascular permeability, emphysema, primary tumour growth and metastasis.

  11. Uptake, Results, and Outcomes of Germline Multiple-Gene Sequencing After Diagnosis of Breast Cancer.

    PubMed

    Kurian, Allison W; Ward, Kevin C; Hamilton, Ann S; Deapen, Dennis M; Abrahamse, Paul; Bondarenko, Irina; Li, Yun; Hawley, Sarah T; Morrow, Monica; Jagsi, Reshma; Katz, Steven J

    2018-05-10

    Low-cost sequencing of multiple genes is increasingly available for cancer risk assessment. Little is known about uptake or outcomes of multiple-gene sequencing after breast cancer diagnosis in community practice. To examine the effect of multiple-gene sequencing on the experience and treatment outcomes for patients with breast cancer. For this population-based retrospective cohort study, patients with breast cancer diagnosed from January 2013 to December 2015 and accrued from SEER registries across Georgia and in Los Angeles, California, were surveyed (n = 5080, response rate = 70%). Responses were merged with SEER data and results of clinical genetic tests, either BRCA1 and BRCA2 (BRCA1/2) sequencing only or including additional other genes (multiple-gene sequencing), provided by 4 laboratories. Type of testing (multiple-gene sequencing vs BRCA1/2-only sequencing), test results (negative, variant of unknown significance, or pathogenic variant), patient experiences with testing (timing of testing, who discussed results), and treatment (strength of patient consideration of, and surgeon recommendation for, prophylactic mastectomy), and prophylactic mastectomy receipt. We defined a patient subgroup with higher pretest risk of carrying a pathogenic variant according to practice guidelines. Among 5026 patients (mean [SD] age, 59.9 [10.7]), 1316 (26.2%) were linked to genetic results from any laboratory. Multiple-gene sequencing increasingly replaced BRCA1/2-only testing over time: in 2013, the rate of multiple-gene sequencing was 25.6% and BRCA1/2-only testing, 74.4%; in 2015 the rate of multiple-gene sequencing was 66.5% and BRCA1/2-only testing, 33.5%. Multiple-gene sequencing was more often ordered by genetic counselors (multiple-gene sequencing, 25.5% and BRCA1/2-only testing, 15.3%) and delayed until after surgery (multiple-gene sequencing, 32.5% and BRCA1/2-only testing, 19.9%). 
    Multiple-gene sequencing substantially increased the rate of detection of any pathogenic variant (multiple-gene sequencing: higher-risk patients, 12%; average-risk patients, 4.2% and BRCA1/2-only testing: higher-risk patients, 7.8%; average-risk patients, 2.2%) and variants of uncertain significance, especially in minorities (multiple-gene sequencing: white patients, 23.7%; black patients, 44.5%; and Asian patients, 50.9% and BRCA1/2-only testing: white patients, 2.2%; black patients, 5.6%; and Asian patients, 0%). Multiple-gene sequencing was not associated with an increase in the rate of prophylactic mastectomy use, which was highest with pathogenic variants in BRCA1/2 (BRCA1/2, 79.0%; other pathogenic variant, 37.6%; variant of uncertain significance, 30.2%; negative, 35.3%). Multiple-gene sequencing rapidly replaced BRCA1/2-only testing for patients with breast cancer in the community and enabled 2-fold higher detection of clinically relevant pathogenic variants without an associated increase in prophylactic mastectomy. However, important targets for improvement in the clinical utility of multiple-gene sequencing include postsurgical delay and racial/ethnic disparity in variants of uncertain significance.

  12. Circuit-wide Transcriptional Profiling Reveals Brain Region-Specific Gene Networks Regulating Depression Susceptibility.

    PubMed

    Bagot, Rosemary C; Cates, Hannah M; Purushothaman, Immanuel; Lorsch, Zachary S; Walker, Deena M; Wang, Junshi; Huang, Xiaojie; Schlüter, Oliver M; Maze, Ian; Peña, Catherine J; Heller, Elizabeth A; Issler, Orna; Wang, Minghui; Song, Won-Min; Stein, Jason L; Liu, Xiaochuan; Doyle, Marie A; Scobie, Kimberly N; Sun, Hao Sheng; Neve, Rachael L; Geschwind, Daniel; Dong, Yan; Shen, Li; Zhang, Bin; Nestler, Eric J

    2016-06-01

    Depression is a complex, heterogeneous disorder and a leading contributor to the global burden of disease. Most previous research has focused on individual brain regions and genes contributing to depression. However, emerging evidence in humans and animal models suggests that dysregulated circuit function and gene expression across multiple brain regions drive depressive phenotypes. Here, we performed RNA sequencing on four brain regions from control animals and those susceptible or resilient to chronic social defeat stress at multiple time points. We employed an integrative network biology approach to identify transcriptional networks and key driver genes that regulate susceptibility to depressive-like symptoms. Further, we validated in vivo several key drivers and their associated transcriptional networks that regulate depression susceptibility and confirmed their functional significance at the levels of gene transcription, synaptic regulation, and behavior. Our study reveals novel transcriptional networks that control stress susceptibility and offers fundamentally new leads for antidepressant drug discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Bayesian state space models for dynamic genetic network construction across multiple tissues.

    PubMed

    Liang, Yulan; Kelemen, Arpad

    2016-08-01

    Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases while estimating the dynamic changes of the temporal correlations and non-stationarity are the keys in this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models with Monte Carlo Markov Chain and Gibbs sampling algorithms are used to estimate the model parameters and the hidden state variables. We apply the proposed Hierarchical Bayesian state space models to multiple tissues (liver, skeletal muscle, and kidney) Affymetrix time course data sets following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic Hierarchical Bayesian state space modeling approaches could be expanded and applied to other large scale genomic data, such as next generation sequence (NGS) combined with real time and time varying electronic health record (EHR) for more comprehensive and robust systematic and network based analysis in order to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.

  14. Limited Agreement of Independent RNAi Screens for Virus-Required Host Genes Owes More to False-Negative than False-Positive Factors

    PubMed Central

    Wang, Zhishi; Craven, Mark; Newton, Michael A.; Ahlquist, Paul

    2013-01-01

    Systematic, genome-wide RNA interference (RNAi) analysis is a powerful approach to identify gene functions that support or modulate selected biological processes. An emerging challenge shared with some other genome-wide approaches is that independent RNAi studies often show limited agreement in their lists of implicated genes. To better understand this, we analyzed four genome-wide RNAi studies that identified host genes involved in influenza virus replication. These studies collectively identified and validated the roles of 614 cell genes, but pair-wise overlap among the four gene lists was only 3% to 15% (average 6.7%). However, a number of functional categories were overrepresented in multiple studies. The pair-wise overlap of these enriched-category lists was high, ∼19%, implying more agreement among studies than apparent at the gene level. Probing this further, we found that the gene lists implicated by independent studies were highly connected in interacting networks by independent functional measures such as protein-protein interactions, at rates significantly higher than predicted by chance. We also developed a general, model-based approach to gauge the effects of false-positive and false-negative factors and to estimate, from a limited number of studies, the total number of genes involved in a process. For influenza virus replication, this novel statistical approach estimates the total number of cell genes involved to be ∼2,800. This and multiple other aspects of our experimental and computational results imply that, when following good quality control practices, the low overlap between studies is primarily due to false negatives rather than false-positive gene identifications. These results and methods have implications for and applications to multiple forms of genome-wide analysis. PMID:24068911
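    The gene-level agreement figures quoted above (3% to 15% pair-wise overlap) can be illustrated with a small sketch. The abstract does not state how overlap was defined, so the Jaccard index used here (shared genes divided by the union of both lists) is an assumption, and the screen names and gene identifiers are made up for illustration.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Return {(name_a, name_b): Jaccard overlap} for every pair of gene lists."""
    overlaps = {}
    for (name_a, a), (name_b, b) in combinations(gene_lists.items(), 2):
        set_a, set_b = set(a), set(b)
        union = set_a | set_b
        # Jaccard index: shared genes / all genes seen in either list (assumed definition)
        overlaps[(name_a, name_b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlaps


# Hypothetical hit lists from three screens (not data from the study)
screens = {
    "screen_A": ["G1", "G2", "G3", "G4"],
    "screen_B": ["G3", "G4", "G5", "G6"],
    "screen_C": ["G1", "G7", "G8", "G9"],
}

overlaps = pairwise_overlap(screens)
mean_overlap = sum(overlaps.values()) / len(overlaps)

for pair, value in overlaps.items():
    print(pair, f"{value:.1%}")
print("mean pair-wise overlap:", f"{mean_overlap:.1%}")
```

    The same function can be applied to lists of enriched functional categories instead of genes, which is the comparison the abstract reports as showing higher agreement.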

  15. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.
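    The permutation step mentioned above can be sketched as follows. This is not TEGS itself: the statistic is a simple sum of squared per-gene scores, which corresponds to ignoring the correlation among genes (an identity working covariance), and the function name, arguments, and simulated data are all assumptions made for illustration.

```python
import numpy as np


def gene_set_permutation_test(expr, exposure, n_perm=2000, seed=0):
    """Permutation p-value for a simple gene-set statistic.

    expr:     (n_samples, n_genes) expression matrix for one gene set
    exposure: (n_samples,) exposure/status vector, e.g. 0/1
    """
    rng = np.random.default_rng(seed)
    expr_c = expr - expr.mean(axis=0)   # center each gene
    x_c = exposure - exposure.mean()    # center the exposure

    def statistic(x):
        scores = x @ expr_c             # one covariance-like score per gene
        return float(np.sum(scores ** 2))

    observed = statistic(x_c)
    # Shuffling the exposure labels breaks any exposure-expression association
    exceed = sum(statistic(rng.permutation(x_c)) >= observed for _ in range(n_perm))
    # Add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated example: 40 samples, 5 genes, all shifted by the exposure
sim_rng = np.random.default_rng(1)
exposure = np.repeat([0.0, 1.0], 20)
expr = sim_rng.normal(size=(40, 5)) + 2.0 * exposure[:, None]

observed, p_value = gene_set_permutation_test(expr, exposure)
print(f"statistic = {observed:.1f}, permutation p-value = {p_value:.4f}")
```

    A method that models correlation would replace the plain sum of squares with a quadratic form that uses a working covariance matrix; the permutation loop would stay the same.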

  16. The Human CHRNA7 and CHRFAM7A Genes: A Review of the Genetics, Regulation, and Function

    PubMed Central

    Sinkus, Melissa L.; Graw, Sharon; Freedman, Robert; Ross, Randal G.; Lester, Henry A.; Leonard, Sherry

    2015-01-01

    The human α7 neuronal nicotinic acetylcholine receptor gene (CHRNA7) is ubiquitously expressed in both the central nervous system and in the periphery. CHRNA7 is genetically linked to multiple disorders with cognitive deficits, including schizophrenia, bipolar disorder, ADHD, epilepsy, Alzheimer’s disease, and Rett syndrome. The regulation of CHRNA7 is complex; more than a dozen mechanisms are known, one of which is a partial duplication of the parent gene. Exons 5-10 of CHRNA7 on chromosome 15 were duplicated and inserted 1.6 Mb upstream of CHRNA7, interrupting an earlier partial duplication of two other genes. The chimeric CHRFAM7A gene product, dupα7, assembles with α7 subunits, resulting in a dominant negative regulation of function. The duplication is human specific, occurring neither in primates nor in rodents. The duplicated α7 sequence in exons 5-10 of CHRFAM7A is almost identical to CHRNA7, and thus is not completely queried in high throughput genetic studies (GWAS). Further, pre-clinical animal models of the α7nAChR utilized in drug development research do not have CHRFAM7A (dupα7) and cannot fully model human drug responses. The wide expression of CHRNA7, its multiple functions and modes of regulation present challenges for study of this gene in disease. PMID:25701707

  17. DOPA Decarboxylase Modulates Tau Toxicity.

    PubMed

    Kow, Rebecca L; Sikkema, Carl; Wheeler, Jeanna M; Wilkinson, Charles W; Kraemer, Brian C

    2018-03-01

    The microtubule-associated protein tau accumulates into toxic aggregates in multiple neurodegenerative diseases. We found previously that loss of D2-family dopamine receptors ameliorated tauopathy in multiple models including a Caenorhabditis elegans model of tauopathy. To better understand how loss of D2-family dopamine receptors can ameliorate tau toxicity, we screened a collection of C. elegans mutations in dopamine-related genes (n = 45) for changes in tau transgene-induced behavioral defects. These included many genes responsible for dopamine synthesis, metabolism, and signaling downstream of the D2 receptors. We identified one dopamine synthesis gene, DOPA decarboxylase (DDC), as a suppressor of tau toxicity in tau transgenic worms. Loss of the C. elegans DDC gene, bas-1, ameliorated the behavioral deficits of tau transgenic worms, reduced phosphorylated and detergent-insoluble tau accumulation, and reduced tau-mediated neuron loss. Loss of function in other genes in the dopamine and serotonin synthesis pathways did not alter tau-induced toxicity; however, their function is required for the suppression of tau toxicity by bas-1. Additional loss of D2-family dopamine receptors did not synergize with bas-1 suppression of tauopathy phenotypes. Loss of the DDC bas-1 reduced tau-induced toxicity in a C. elegans model of tauopathy, while loss of no other dopamine or serotonin synthesis genes tested had this effect. Because loss of activity upstream of DDC could reduce suppression of tau by DDC, this suggests the possibility that loss of DDC suppresses tau via the combined accumulation of dopamine precursor levodopa and serotonin precursor 5-hydroxytryptophan. Published by Elsevier Inc.

  18. Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data.

    PubMed

    Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu

    2013-01-01

    DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.

  19. Circadian abnormalities in mouse models of Smith-Magenis syndrome: evidence for involvement of RAI1.

    PubMed

    Lacaria, Melanie; Gu, Wenli; Lupski, James R

    2013-07-01

    Smith-Magenis syndrome (SMS; OMIM 182290) is a genomic disorder characterized by multiple congenital anomalies, intellectual disability, behavioral abnormalities, and disordered sleep resulting from an ~3.7 Mb deletion copy number variant (CNV) on chromosome 17p11.2 or from point mutations in the gene RAI1. The reciprocal duplication of this region results in another genomic disorder, Potocki-Lupski syndrome (PTLS; OMIM 610883), characterized by autism, intellectual disability, and congenital anomalies. We previously used chromosome-engineering and gene targeting to generate mouse models for PTLS (Dp(11)17/+), and SMS due to either deletion CNV or gene knock-out (Df(11)17-2/+ and Rai1(+/-) , respectively) and we observed phenotypes in these mouse models consistent with their associated human syndromes. To investigate the contribution of individual genes to the circadian phenotypes observed in SMS, we now report the analysis of free-running period lengths in Rai1(+/-) and Df(11)17-2/+ mice, as well as in mice deficient for another known circadian gene mapping within the commonly deleted/duplicated region, Dexras1, and we compare these results to those previously observed in Dp(11)17/+ mice. Reduced free-running period lengths were seen in Df(11)17-2/+, Rai1(+/-) , and Dexras1(-/-) , but not Dexras1(+/-) mice, suggesting that Rai1 may be the primary gene underlying the circadian defects in SMS. However, we cannot rule out the possibility that cis effects between multiple haploinsufficient genes in the SMS critical interval (e.g., RAI1 and DEXRAS1) either exacerbate the circadian phenotypes observed in SMS patients with deletions or increase their penetrance in certain environments. This study also confirms a previous report of abnormal circadian function in Dexras1(-/-) mice. Copyright © 2013 Wiley Periodicals, Inc.

  20. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions.

    PubMed

    Chatterjee, Nilanjan; Kalaylioglu, Zeynep; Moslehi, Roxana; Peters, Ulrike; Wacholder, Sholom

    2006-12-01

    In modern genetic epidemiology studies, the association between the disease and a genomic region, such as a candidate gene, is often investigated using multiple SNPs. We propose a multilocus test of genetic association that can account for genetic effects that might be modified by variants in other genes or by environmental factors. We consider use of the venerable and parsimonious Tukey's 1-degree-of-freedom model of interaction, which is natural when individual SNPs within a gene are associated with disease through a common biological mechanism; in contrast, many standard regression models are designed as if each SNP has unique functional significance. On the basis of Tukey's model, we propose a novel but computationally simple generalized test of association that can simultaneously capture both the main effects of the variants within a genomic region and their interactions with the variants in another region or with an environmental exposure. We compared performance of our method with that of two standard tests of association, one ignoring gene-gene/gene-environment interactions and the other based on a saturated model of interactions. We demonstrate major power advantages of our method both in analysis of data from a case-control study of the association between colorectal adenoma and DNA variants in the NAT2 genomic region, which are well known to be related to a common biological phenotype, and under different models of gene-gene interactions with use of simulated data.
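    For orientation, a Tukey-style 1-degree-of-freedom interaction model for two genomic regions can be written as below. The notation is assumed for illustration: $S_{1j}$ and $S_{2k}$ are genotype codes for SNPs in the first and second region (or an environmental exposure in place of the second), and the exact parameterization used by the authors may differ.

```latex
\[
\operatorname{logit} P(D = 1 \mid S_1, S_2)
  = \alpha
  + \sum_{j=1}^{J} \beta_j S_{1j}
  + \sum_{k=1}^{K} \gamma_k S_{2k}
  + \theta \sum_{j=1}^{J} \sum_{k=1}^{K} \beta_j \gamma_k \, S_{1j} S_{2k}
\]
```

    A single scalar $\theta$ scales every pairwise interaction term, so the interaction part costs one parameter instead of the $J \times K$ parameters of a saturated interaction model. This parsimony is what the abstract contrasts with the saturated alternative.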

  1. Genomic and Coexpression Analyses Predict Multiple Genes Involved in Triterpene Saponin Biosynthesis in Medicago truncatula

    PubMed Central

    Naoumkina, Marina A.; Modolo, Luzia V.; Huhman, David V.; Urbanczyk-Wochniak, Ewa; Tang, Yuhong; Sumner, Lloyd W.; Dixon, Richard A.

    2010-01-01

    Saponins, an important group of bioactive plant natural products, are glycosides of triterpenoid or steroidal aglycones (sapogenins). Saponins possess many biological activities, including conferring potential health benefits for humans. However, most of the steps specific for the biosynthesis of triterpene saponins remain uncharacterized at the molecular level. Here, we use comprehensive gene expression clustering analysis to identify candidate genes involved in the elaboration, hydroxylation, and glycosylation of the triterpene skeleton in the model legume Medicago truncatula. Four candidate uridine diphosphate glycosyltransferases were expressed in Escherichia coli, one of which (UGT73F3) showed specificity for multiple sapogenins and was confirmed to glucosylate hederagenin at the C28 position. Genetic loss-of-function studies in M. truncatula confirmed the in vivo function of UGT73F3 in saponin biosynthesis. This report provides a basis for future studies to define genetically the roles of multiple cytochromes P450 and glycosyltransferases in triterpene saponin biosynthesis in Medicago. PMID:20348429

  2. Three-Dimensional Transgenic Cell Models to Quantify Space Genotoxic Effects

    NASA Technical Reports Server (NTRS)

    Gonda, S.; Wu, H.; Pingerelli, P.; Glickman, B.

    2000-01-01

    In this paper we describe a three-dimensional, multicellular tissue-equivalent model, produced in NASA-designed, rotating wall bioreactors using mammalian cells engineered for genomic containment of multiple copies of defined target genes for genotoxic assessment. The Rat 2(lambda) fibroblasts (Stratagene, Inc.) were genetically engineered to contain high-density target genes for mutagenesis. Stable three-dimensional, multicellular spheroids were formed when human mammary epithelial cells and Rat 2(lambda) fibroblasts were cocultured on Cytodex 3 Beads in a rotating wall bioreactor. The utility of this spheroidal model for genotoxic assessment was indicated by a linear dose response curve and by results of gene sequence analysis of mutant clones from 400-micron diameter spheroids following low-dose, high-energy, neon radiation exposure.

  3. An in vivo and in silico approach to study cis-antisense: a short cut to higher order response

    NASA Astrophysics Data System (ADS)

    Courtney, Colleen; Varanasi, Usha; Chatterjee, Anushree

    2014-03-01

    Antisense interactions are present in all domains of life. Typically sense, antisense RNA pairs originate from overlapping genes with convergent face to face promoters, and are speculated to be involved in gene regulation. Recent studies indicate the role of transcriptional interference (TI) in regulating expression of genes in convergent orientation. Modeling antisense, TI gene regulation mechanisms allows us to understand how organisms control gene expression. We present a modeling and experimental framework to understand convergent transcription that combines the effects of transcriptional interference and cis-antisense regulation. Our model shows that combining transcriptional interference and antisense RNA interaction adds multiple-levels of regulation which affords a highly tunable biological output, ranging from first order response to complex higher-order response. To study this system we created a library of experimental constructs with engineered TI and antisense interaction by using face-to-face inducible promoters separated by carefully tailored overlapping DNA sequences to control expression of a set of fluorescent reporter proteins. Studying this gene expression mechanism allows for an understanding of higher order behavior of gene expression networks.

  4. Integration of multi-omics data for integrative gene regulatory network inference.

    PubMed

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon

    2017-01-01

    Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN's capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.

  5. Integration of multi-omics data for integrative gene regulatory network inference

    PubMed Central

    Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun

    2017-01-01

    Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. The molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data in most research. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called ‘multi-omics data’, that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. The intensive experiments were carried out with simulation data, where iGRN’s capability that infers the integrative gene regulatory network is assessed. Through the experiments, iGRN shows its better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189

  6. Sampling strategies for improving tree accuracy and phylogenetic analyses: a case study in ciliate protists, with notes on the genus Paramecium.

    PubMed

    Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo

    2014-02-01

    In order to assess how dataset-selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported. Copyright © 2013 Elsevier Inc. All rights reserved.
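    The concatenated (multi-gene) analyses discussed above start from a supermatrix in which each taxon's per-gene alignments are joined end to end. The following is a minimal sketch of that step only, assuming each gene is already aligned; the function name, the "?" missing-data symbol, and the toy sequences are illustrative choices, not details from the study.

```python
def concatenate_alignments(alignments, missing="?"):
    """Join per-gene alignments into one supermatrix.

    alignments: {gene_name: {taxon: aligned_sequence}}
    Taxa absent from a gene are padded with the `missing` symbol.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad with missing-data characters when this gene lacks the taxon
            supermatrix[taxon] += aln.get(taxon, missing * length)
    return supermatrix


# Toy alignments for two hypothetical genes and three taxa
toy = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "GG", "sp3": "GC"},
}
matrix = concatenate_alignments(toy)
for taxon, seq in matrix.items():
    print(taxon, seq)
```

    The padded positions are the "missing data" whose effect on accuracy the abstract describes as ambiguous.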

  7. Multiple interval QTL mapping and searching for PSTOL1 homologs associated with root morphology, biomass accumulation and phosphorus content in maize seedlings under low-P.

    PubMed

    Azevedo, Gabriel C; Cheavegatti-Gianotto, Adriana; Negri, Bárbara F; Hufnagel, Bárbara; E Silva, Luciano da Costa; Magalhaes, Jurandir V; Garcia, Antonio Augusto F; Lana, Ubiraci G P; de Sousa, Sylvia M; Guimaraes, Claudia T

    2015-07-07

    Modifications in root morphology are important strategies to maximize soil exploitation under phosphorus starvation in plants. Here, we used two multiple interval models to map QTLs related to root traits, biomass accumulation and P content in a maize RIL population cultivated in nutrient solution. In addition, we searched for putative maize homologs to PSTOL1, a gene responsible to enhance early root growth, P uptake and grain yield in rice and sorghum. Based on path analysis, root surface area was the root morphology component that most strongly contributed to total dry weight and to P content in maize seedling under low-P availability. Multiple interval mapping models for single (MIM) and multiple traits (MT-MIM) were combined and revealed 13 genomic regions significantly associated with the target traits in a complementary way. The phenotypic variances explained by all QTLs and their epistatic interactions using MT-MIM (23.4 to 35.5 %) were higher than in previous studies, and presented superior statistical power. Some of these QTLs were coincident with QTLs for root morphology traits and grain yield previously mapped, whereas others harbored ZmPSTOL candidate genes, which shared more than 55 % of amino acid sequence identity and a conserved serine/threonine kinase domain with OsPSTOL1. Additionally, four ZmPSTOL candidate genes co-localized with QTLs for root morphology, biomass accumulation and/or P content were preferentially expressed in roots of the parental lines that contributed the alleles enhancing the respective phenotypes. QTL mapping strategies adopted in this study revealed complementary results for single and multiple traits with high accuracy. Some QTLs, mainly the ones that were also associated with yield performance in other studies, can be good targets for marker-assisted selection to improve P-use efficiency in maize. 
    Based on the co-localization with QTLs, the protein domain conservation and the coincidence of gene expression, we selected novel maize genes as putative homologs to PSTOL1 that will require further validation studies.

  8. DLRS: gene tree evolution in light of a species tree.

    PubMed

    Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens

    2012-11-15

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

  9. An Unbiased Assessment of the Role of Imprinted Genes in an Intergenerational Model of Developmental Programming

    PubMed Central

    Radford, Elizabeth J.; Isganaitis, Elvira; Jimenez-Chillaron, Josep; Schroeder, Joshua; Molla, Michael; Andrews, Simon; Didier, Nathalie; Charalambous, Marika; McEwen, Kirsten; Marazzi, Giovanna; Sassoon, David; Patti, Mary-Elizabeth; Ferguson-Smith, Anne C.

    2012-01-01

    Environmental factors during early life are critical for the later metabolic health of the individual and of future progeny. In our obesogenic environment, it is of great socioeconomic importance to investigate the mechanisms that contribute to the risk of metabolic ill health. Imprinted genes, a class of functionally mono-allelic genes critical for early growth and metabolic axis development, have been proposed to be uniquely susceptible to environmental change. Furthermore, it has also been suggested that perturbation of the epigenetic reprogramming of imprinting control regions (ICRs) may play a role in phenotypic heritability following early life insults. Alternatively, the presence of multiple layers of epigenetic regulation may in fact protect imprinted genes from such perturbation. Unbiased investigation of these alternative hypotheses requires assessment of imprinted gene expression in the context of the response of the whole transcriptome to environmental assault. We therefore analyse the role of imprinted genes in multiple tissues in two affected generations of an established murine model of the developmental origins of health and disease using microarrays and quantitative RT–PCR. We demonstrate that, despite the functional mono-allelicism of imprinted genes and their unique mechanisms of epigenetic dosage control, imprinted genes as a class are neither more susceptible nor protected from expression perturbation induced by maternal undernutrition in either the F1 or the F2 generation compared to other genes. Nor do we find any evidence that the epigenetic reprogramming of ICRs in the germline is susceptible to nutritional restriction. However, we propose that those imprinted genes that are affected may play important roles in the foetal response to undernutrition and potentially its long-term sequelae. We suggest that recently described instances of dosage regulation by relaxation of imprinting are rare and likely to be highly regulated. PMID:22511876

  10. Visualization and dissemination of multidimensional proteomics data comparing protein abundance during Caenorhabditis elegans development.

    PubMed

    Riffle, Michael; Merrihew, Gennifer E; Jaschob, Daniel; Sharma, Vagisha; Davis, Trisha N; Noble, William S; MacCoss, Michael J

    2015-11-01

    Regulation of protein abundance is a critical aspect of cellular function, organism development, and aging. Alternative splicing may give rise to multiple possible proteoforms of gene products where the abundance of each proteoform is independently regulated. Understanding how the abundances of these distinct gene products change is essential to understanding the underlying mechanisms of many biological processes. Bottom-up proteomics mass spectrometry techniques may be used to estimate protein abundance indirectly by sequencing and quantifying peptides that are later mapped to proteins based on sequence. However, quantifying the abundance of distinct gene products is routinely confounded by peptides that map to multiple possible proteoforms. In this work, we describe a technique that may be used to help mitigate the effects of confounding ambiguous peptides and multiple proteoforms when quantifying proteins. We have applied this technique to visualize the distribution of distinct gene products for the whole proteome across 11 developmental stages of the model organism Caenorhabditis elegans. The result is a large multidimensional dataset for which web-based tools were developed for visualizing how translated gene products change during development and identifying possible proteoforms. The underlying instrument raw files and tandem mass spectra may also be downloaded. The data resource is freely available on the web at http://www.yeastrc.org/wormpes/.
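    The ambiguity problem described above (peptides that map to more than one proteoform) can be made concrete with a short sketch. It only separates uniquely mapping peptides from shared ones by exact substring matching; it is not the technique the authors describe, and the proteoform names and sequences are invented for illustration.

```python
from collections import defaultdict


def classify_peptides(proteoforms, peptides):
    """Split peptides into those mapping to one proteoform and those shared by several.

    proteoforms: {proteoform_name: protein_sequence}
    peptides:    iterable of peptide sequences
    """
    mapping = defaultdict(list)
    for peptide in peptides:
        for name, sequence in proteoforms.items():
            if peptide in sequence:  # exact substring match (a simplification)
                mapping[peptide].append(name)
    unique = {p: names[0] for p, names in mapping.items() if len(names) == 1}
    shared = {p: names for p, names in mapping.items() if len(names) > 1}
    return unique, shared


# Two hypothetical isoforms that differ in an internal region
proteoforms = {
    "iso1": "MKTAYIAKQRLSDE",
    "iso2": "MKTAYIAKGGHWLSDE",
}
peptides = ["TAYIAK", "QRLSDE", "GGHW"]

unique, shared = classify_peptides(proteoforms, peptides)
print("unique:", unique)
print("shared:", shared)
```

    Only the uniquely mapping peptides distinguish the two isoforms directly; the shared peptide's signal would have to be apportioned between them, which is where the confounding arises.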

  11. Visualization and Dissemination of Multidimensional Proteomics Data Comparing Protein Abundance During Caenorhabditis elegans Development

    NASA Astrophysics Data System (ADS)

    Riffle, Michael; Merrihew, Gennifer E.; Jaschob, Daniel; Sharma, Vagisha; Davis, Trisha N.; Noble, William S.; MacCoss, Michael J.

    2015-11-01

    Regulation of protein abundance is a critical aspect of cellular function, organism development, and aging. Alternative splicing may give rise to multiple possible proteoforms of gene products where the abundance of each proteoform is independently regulated. Understanding how the abundances of these distinct gene products change is essential to understanding the underlying mechanisms of many biological processes. Bottom-up proteomics mass spectrometry techniques may be used to estimate protein abundance indirectly by sequencing and quantifying peptides that are later mapped to proteins based on sequence. However, quantifying the abundance of distinct gene products is routinely confounded by peptides that map to multiple possible proteoforms. In this work, we describe a technique that may be used to help mitigate the effects of confounding ambiguous peptides and multiple proteoforms when quantifying proteins. We have applied this technique to visualize the distribution of distinct gene products for the whole proteome across 11 developmental stages of the model organism Caenorhabditis elegans. The result is a large multidimensional dataset for which web-based tools were developed for visualizing how translated gene products change during development and identifying possible proteoforms. The underlying instrument raw files and tandem mass spectra may also be downloaded. The data resource is freely available on the web at http://www.yeastrc.org/wormpes/.

  12. Identification of two novel critical mutations in PCNT gene resulting in microcephalic osteodysplastic primordial dwarfism type II associated with multiple intracranial aneurysms.

    PubMed

    Li, Fei-Feng; Wang, Xu-Dong; Zhu, Min-Wei; Lou, Zhi-Hong; Zhang, Qiong; Zhu, Chun-Yu; Feng, Hong-Lin; Lin, Zhi-Guo; Liu, Shu-Lin

    2015-12-01

    Microcephalic osteodysplastic primordial dwarfism type II (MOPD II) is a highly detrimental human autosomal inherited recessive disorder. The hallmark characteristics of this disease are intrauterine and postnatal growth restrictions, with some patients also having cerebrovascular problems such as cerebral aneurysms. The genomic basis behind most clinical features of MOPD II remains largely unclear. The aim of this work was to identify the genetic defects in a Chinese family with MOPD II associated with multiple intracranial aneurysms. The patient had typical MOPD II syndrome, with subarachnoid hemorrhage and multiple intracranial aneurysms. We identified three novel mutations in the PCNT gene, including one single base alteration (9842A>C in exon 45) and two deletions (Del-C in exon 30 and Del-16 in exon 41). The deletions were co-segregated with the affected individual in the family and were not present in the control population. Computer modeling demonstrated that the deletions may cause drastic changes on the secondary and tertiary structures, affecting the hydrophilicity and hydrophobicity of the mutant proteins. In conclusion, we identified two novel mutations in the PCNT gene associated with MOPD II and intracranial aneurysms, and the mutations were expected to alter the stability and functioning of the protein by computer modeling.

  13. Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

    PubMed

    Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li

    2015-10-16

    The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. 
Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.
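
    The "direct RPKM calculation" mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the standard RPKM formula only, which assumes reads are sampled uniformly along each transcript; it is not the NLDMseq model. The function name, gene labels, and counts are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical toy data: gene -> (mapped reads, length in bp).
counts = {"geneA": (500, 2000), "geneB": (500, 1000)}
total_mapped = 10_000_000  # hypothetical library size

values = {gene: rpkm(n, length, total_mapped) for gene, (n, length) in counts.items()}
for gene, value in values.items():
    print(gene, value)
```

    With equal read counts, the shorter gene receives the higher RPKM because the formula normalizes by length. Positional or sequence-specific biases of the kind the abstract describes would violate the uniformity assumption behind this calculation.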

  14. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy

    PubMed Central

    Lohr, Jens G.; Stojanov, Petar; Carter, Scott L.; Cruz-Gordillo, Peter; Lawrence, Michael S.; Auclair, Daniel; Sougnez, Carrie; Knoechel, Birgit; Gould, Joshua; Saksena, Gordon; Cibulskis, Kristian; McKenna, Aaron; Chapman, Michael A.; Straussman, Ravid; Levy, Joan; Perkins, Louise M.; Keats, Jonathan J.; Schumacher, Steven E.; Rosenberg, Mara; Getz, Gad

    2014-01-01

    SUMMARY We performed massively parallel sequencing of paired tumor/normal samples from 203 multiple myeloma (MM) patients and identified significantly mutated genes and copy number alterations, and discovered putative tumor suppressor genes by determining homozygous deletions and loss-of-heterozygosity. We observed frequent mutations in KRAS (particularly in previously treated patients), NRAS, BRAF, FAM46C, TP53 and DIS3 (particularly in non-hyperdiploid MM). Mutations were often present in subclonal populations, and multiple mutations within the same pathway (e.g. KRAS, NRAS and BRAF) were observed in the same patient. In vitro modeling predicts only partial treatment efficacy of targeting subclonal mutations, and even growth promotion of non-mutated subclones in some cases. These results emphasize the importance of heterogeneity analysis for treatment decisions. PMID:24434212

  15. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy.

    PubMed

    Lohr, Jens G; Stojanov, Petar; Carter, Scott L; Cruz-Gordillo, Peter; Lawrence, Michael S; Auclair, Daniel; Sougnez, Carrie; Knoechel, Birgit; Gould, Joshua; Saksena, Gordon; Cibulskis, Kristian; McKenna, Aaron; Chapman, Michael A; Straussman, Ravid; Levy, Joan; Perkins, Louise M; Keats, Jonathan J; Schumacher, Steven E; Rosenberg, Mara; Getz, Gad; Golub, Todd R

    2014-01-13

    We performed massively parallel sequencing of paired tumor/normal samples from 203 multiple myeloma (MM) patients and identified significantly mutated genes and copy number alterations and discovered putative tumor suppressor genes by determining homozygous deletions and loss of heterozygosity. We observed frequent mutations in KRAS (particularly in previously treated patients), NRAS, BRAF, FAM46C, TP53, and DIS3 (particularly in nonhyperdiploid MM). Mutations were often present in subclonal populations, and multiple mutations within the same pathway (e.g., KRAS, NRAS, and BRAF) were observed in the same patient. In vitro modeling predicts only partial treatment efficacy of targeting subclonal mutations, and even growth promotion of nonmutated subclones in some cases. These results emphasize the importance of heterogeneity analysis for treatment decisions.

  16. A fast and high performance multiple data integration algorithm for identifying human disease genes

    PubMed Central

    2015-01-01

    Background Integrating multiple data sources is indispensable in improving disease gene identification. It is not only due to the fact that disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also due to the fact that gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need further improvement. Results In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. Conclusions The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 by using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment takes only about 12.54 seconds. It is better than many existing algorithms. PMID:26399620

  17. The Natural History of Class I Primate Alcohol Dehydrogenases Includes Gene Duplication, Gene Loss, and Gene Conversion

    PubMed Central

    Carrigan, Matthew A.; Uryasev, Oleg; Davis, Ross P.; Zhai, LanMin; Hurley, Thomas D.; Benner, Steven A.

    2012-01-01

    Background Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and life styles. This may be so for Class I of alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids. Methodology/Principal Findings To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences. Conclusions/Significance We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to more correctly represent the natural history of ADH1s, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion. 
These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage. PMID:22859968

  18. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

    PubMed

    Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

    2014-01-01

    Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

  19. Acetylcholinesterases of Rhipicephalus (Boophilus) microplus – Multiple gene expression presents an opportune model system for elucidation of multiple functions of AChEs.

    USDA-ARS's Scientific Manuscript database

    Acetylcholinesterase (AChE) is a key neural enzyme of both vertebrates and invertebrates, and is the biochemical target of organophosphate and carbamate pesticides for invertebrates, as well as vertebrate nerve agents, e.g., soman, tabun, VX, and others. AChE inhibitors are also key drugs among thos...

  20. Genome-wide association analysis of seedling root development in maize (Zea mays L.).

    PubMed

    Pace, Jordon; Gardner, Candice; Romay, Cinta; Ganapathysubramanian, Baskar; Lübberstedt, Thomas

    2015-02-05

    Plants rely on the root system for anchorage to the ground and the acquisition and absorption of nutrients critical to sustaining productivity. A genome-wide association analysis enables one to analyze allelic diversity of complex traits and identify superior alleles. 384 inbred lines from the Ames panel were genotyped with 681,257 single nucleotide polymorphism markers using Genotyping-by-Sequencing technology and 22 seedling root architecture traits were phenotyped. Utilizing both a general linear model and mixed linear model, a GWAS was conducted identifying 268 marker trait associations (p ≤ 5.3×10^-7). Analysis of significant SNP markers for multiple traits showed that several were located within gene models with some SNP markers localized within regions of previously identified root quantitative trait loci. Gene model GRMZM2G153722 located on chromosome 4 contained nine significant markers. This predicted gene is expressed in roots and shoots. This study identifies SNP markers putatively associated with root traits at the seedling stage. Some SNPs were located within or near (<1 kb) gene models. These gene models identify possible candidate genes involved in root development at the seedling stage. These and respective linked or functional markers could be targets for breeders for marker assisted selection of seedling root traits.

  1. Meta-analysis of gene expression patterns in animal models of prenatal alcohol exposure suggests role for protein synthesis inhibition and chromatin remodeling

    PubMed Central

    Rogic, Sanja; Wong, Albertina; Pavlidis, Paul

    2017-01-01

    Background Prenatal alcohol exposure (PAE) can result in an array of morphological, behavioural and neurobiological deficits that can range in their severity. Despite extensive research in the field and significant progress made, especially in understanding the range of possible malformations and neurobehavioral abnormalities, the molecular mechanisms of alcohol responses in development are still not well understood. There have been multiple transcriptomic studies looking at the changes in gene expression after PAE in animal models; however, there is limited apparent consensus among the reported findings. In an effort to address this issue, we performed a comprehensive re-analysis and meta-analysis of all suitable, publicly available expression data sets. Methods We assembled ten microarray data sets of gene expression after PAE in mouse and rat models consisting of samples from a total of 63 ethanol-exposed and 80 control animals. We re-analyzed each data set for differential expression and then used the results to perform meta-analyses considering all data sets together or grouping them by time or duration of exposure (pre- and post-natal, acute and chronic, respectively). We performed network and Gene Ontology enrichment analysis to further characterize the identified signatures. Results For each sub-analysis we identified signatures of differentially expressed genes that show support from multiple studies. Overall, the changes in gene expression were more extensive after acute ethanol treatment during prenatal development than in other models. Considering the analysis of all the data together, we identified a robust core signature of 104 genes down-regulated after PAE, with no up-regulated genes. Functional analysis reveals over-representation of genes involved in protein synthesis, mRNA splicing and chromatin organization. 
Conclusions Our meta-analysis shows that existing studies, despite superficial dissimilarity in findings, share features that allow us to identify a common core signature set of transcriptome changes in PAE. This is an important step to identifying the biological processes that underlie the etiology of FASD. PMID:26996386

  2. Stochastic models for inferring genetic regulation from microarray gene expression data.

    PubMed

    Tian, Tianhai

    2010-03-01

    Microarray expression profiles are inherently noisy and many different sources of variation exist in microarray experiments. It is still a significant challenge to develop stochastic models to realize noise in microarray expression profiles, which has profound influence on the reverse engineering of genetic regulation. Using the target genes of the tumour suppressor gene p53 as the test problem, we developed stochastic differential equation models and established the relationship between the noise strength of stochastic models and parameters of an error model for describing the distribution of the microarray measurements. Numerical results indicate that the simulated variance from stochastic models with a stochastic degradation process can be represented by a monomial in terms of the hybridization intensity and the order of the monomial depends on the type of stochastic process. The developed stochastic models with multiple stochastic processes generated simulations whose variance is consistent with the prediction of the error model. This work also established a general method to develop stochastic models from experimental information.
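
    To illustrate what a stochastic differential equation model with a stochastic degradation process can look like, the sketch below simulates one expression level with the Euler-Maruyama scheme. The equation dx = (k - d*x) dt + sigma*x dW, the parameter values, and the function name are assumptions chosen for illustration; they are not the models or parameters of the study above.

```python
import math
import random


def simulate_expression(k, d, sigma, x0, dt, steps, seed=0):
    """Euler-Maruyama path for dx = (k - d*x) dt + sigma*x dW.

    k: synthesis rate, d: degradation rate, sigma: noise strength
    (all hypothetical). Noise enters proportionally to x, mimicking
    random fluctuation in the degradation term.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + (k - d * x) * dt + sigma * x * dw
        path.append(x)
    return path


# With sigma = 0 the scheme reduces to the deterministic ODE,
# which relaxes to the steady state k / d.
noise_free = simulate_expression(k=2.0, d=0.5, sigma=0.0, x0=0.0, dt=0.01, steps=5000)
noisy = simulate_expression(k=2.0, d=0.5, sigma=0.2, x0=4.0, dt=0.01, steps=5000, seed=1)
print(noise_free[-1], noisy[-1])
```

    Repeating the noisy run over many seeds and several values of k / d would give one way to examine how simulated variance scales with the mean expression level, under these assumed dynamics.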

  3. Gene therapies that restore dystrophin expression for the treatment of Duchenne muscular dystrophy

    PubMed Central

    Robinson-Hamm, Jacqueline N.; Gersbach, Charles A.

    2016-01-01

    Duchenne muscular dystrophy is one of the most common inherited genetic diseases and is caused by mutations to the DMD gene that encodes the dystrophin protein. Recent advances in genome editing and gene therapy offer hope for the development of potential therapeutics. Truncated versions of the DMD gene can be delivered to the affected tissues with viral vectors and show promising results in a variety of animal models. Genome editing with the CRISPR/Cas9 system has recently been used to restore dystrophin expression by deleting one or more exons of the DMD gene in patient cells and in a mouse model that led to functional improvement of muscle strength. Exon skipping with oligonucleotides has been successful in several animal models and evaluated in multiple clinical trials. Next-generation oligonucleotide formulations offer significant promise to build on these results. All these approaches to restoring dystrophin expression are encouraging, but many hurdles remain. This review summarizes the current state of these technologies and summarizes considerations for their future development. PMID:27542949

  4. A stochastic model for optimizing composite predictors based on gene expression profiles.

    PubMed

    Ramanathan, Murali

    2003-07-01

    This project was done to develop a mathematical model for optimizing composite predictors based on gene expression profiles from DNA arrays and proteomics. The problem was amenable to a formulation and solution analogous to the portfolio optimization problem in mathematical finance: it requires the optimization of a quadratic function subject to linear constraints. The performance of the approach was compared to that of neighborhood analysis using a data set containing cDNA array-derived gene expression profiles from 14 multiple sclerosis patients receiving intramuscular interferon-beta1a. The Markowitz portfolio model predicts that the covariance between genes can be exploited to construct an efficient composite. The model predicts that a composite is not needed for maximizing the mean value of a treatment effect: only a single gene is needed, but the usefulness of the effect measure may be compromised by high variability. The model optimized the composite to yield the highest mean for a given level of variability or the least variability for a given mean level. The choices that meet these optimization criteria lie on a curve of composite mean vs. composite variability plot referred to as the "efficient frontier." When a composite is constructed using the model, it outperforms the composite constructed using the neighborhood analysis method. The Markowitz portfolio model may find potential applications in constructing composite biomarkers and in the pharmacogenomic modeling of treatment effects derived from gene expression endpoints.
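
    The quadratic optimization described above can be sketched for the smallest possible case, a composite of two genes whose weights sum to one. The closed-form minimum-variance weight used here is the standard two-asset Markowitz result; the means and covariances are hypothetical numbers, not values from the study.

```python
def composite_stats(w1, mean, cov):
    """Mean and variance of the composite w1*gene1 + (1 - w1)*gene2."""
    w2 = 1.0 - w1
    m = w1 * mean[0] + w2 * mean[1]
    v = w1 * w1 * cov[0][0] + 2.0 * w1 * w2 * cov[0][1] + w2 * w2 * cov[1][1]
    return m, v


def min_variance_weight(cov):
    """Weight on gene 1 that minimizes composite variance (weights sum to 1)."""
    denom = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]
    if denom <= 0:
        raise ValueError("genes are perfectly correlated with equal variance")
    return (cov[1][1] - cov[0][1]) / denom


# Hypothetical treatment-effect means and covariance matrix for two genes.
mean = [2.0, 1.0]
cov = [[4.0, -1.0], [-1.0, 1.0]]

w1 = min_variance_weight(cov)
best_mean, best_var = composite_stats(w1, mean, cov)
print(w1, best_mean, best_var)

# Sweeping w1 from 0 to 1 traces the mean/variance trade-off curve.
frontier = [composite_stats(i / 10.0, mean, cov) for i in range(11)]
```

    In this toy case the composite has lower variance than either gene alone because the two genes are negatively correlated, while the highest mean is reached by putting all weight on gene 1, consistent with the abstract's remark that a single gene suffices for maximizing the mean.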

  5. Multiple Translocation of the AVR-Pita Effector Gene among Chromosomes of the Rice Blast Fungus Magnaporthe oryzae and Related Species

    PubMed Central

    Chuma, Izumi; Isobe, Chihiro; Hotta, Yuma; Ibaragi, Kana; Futamata, Natsuru; Kusaba, Motoaki; Yoshida, Kentaro; Terauchi, Ryohei; Fujita, Yoshikatsu; Nakayashiki, Hitoshi; Valent, Barbara; Tosa, Yukio

    2011-01-01

    Magnaporthe oryzae is the causal agent of rice blast disease, a devastating problem worldwide. This fungus has caused breakdown of resistance conferred by newly developed commercial cultivars. To address how the rice blast fungus adapts itself to new resistance genes so quickly, we examined chromosomal locations of AVR-Pita, a subtelomeric gene family corresponding to the Pita resistance gene, in various isolates of M. oryzae (including wheat and millet pathogens) and its related species. We found that AVR-Pita (AVR-Pita1 and AVR-Pita2) is highly variable in its genome location, occurring in chromosomes 1, 3, 4, 5, 6, 7, and supernumerary chromosomes, particularly in rice-infecting isolates. When expressed in M. oryzae, most of the AVR-Pita homologs could elicit Pita-mediated resistance, even those from non-rice isolates. AVR-Pita was flanked by a retrotransposon, which presumably contributed to its multiple translocation across the genome. On the other hand, family member AVR-Pita3, which lacks avirulence activity, was stably located on chromosome 7 in a vast majority of isolates. These results suggest that the diversification in genome location of AVR-Pita in the rice isolates is a consequence of recognition by Pita in rice. We propose a model that the multiple translocation of AVR-Pita may be associated with its frequent loss and recovery mediated by its transfer among individuals in asexual populations. This model implies that the high mobility of AVR-Pita is a key mechanism accounting for the rapid adaptation toward Pita. Dynamic adaptation of some fungal plant pathogens may be achieved by deletion and recovery of avirulence genes using a population as a unit of adaptation. PMID:21829350

  6. Multiple PAR and E4BP4 bZIP transcription factors in zebrafish: diverse spatial and temporal expression patterns.

    PubMed

    Ben-Moshe, Zohar; Vatine, Gad; Alon, Shahar; Tovin, Adi; Mracek, Philipp; Foulkes, Nicholas S; Gothilf, Yoav

    2010-09-01

    Circadian rhythms of physiology and behavior are generated by an autonomous circadian oscillator that is synchronized daily with the environment, mainly by light input. The PAR subfamily of transcriptional activators and the related E4BP4 repressor belonging to the basic leucine zipper (bZIP) family are clock-controlled genes that are suggested to mediate downstream circadian clock processes and to feedback onto the core oscillator. Here, the authors report the characterization of these genes in the zebrafish, an increasingly important model in the field of chronobiology. Five novel PAR and six novel e4bp4 zebrafish homolog genes were identified using bioinformatic tools and their coding sequences were cloned. Based on their evolutionary relationships, these genes were annotated as ztef2, zhlf1 and zhlf2, zdbp1 and zdbp2, and ze4bp4-1 to -6. The spatial and temporal mRNA expression pattern of each of these factors was characterized in zebrafish embryos in the context of a functional circadian clock and regulation by light. Nine of the factors exhibited augmented and rhythmic expression in the pineal gland, a central clock organ in zebrafish. Moreover, these genes were found to be regulated, to variable extents, by the circadian clock and/or by light. Differential expression patterns of multiple paralogs in zebrafish suggest multiple roles for these factors within the vertebrate circadian clock. This study, in the genetically accessible zebrafish model, lays the foundation for further research regarding the involvement and specific roles of PAR and E4BP4 transcription factors in the vertebrate circadian clock mechanism.

  7. Yeast Phenomics: An Experimental Approach for Modeling Gene Interaction Networks that Buffer Disease

    PubMed Central

    Hartman, John L.; Stisher, Chandler; Outlaw, Darryl A.; Guo, Jingyu; Shah, Najaf A.; Tian, Dehua; Santos, Sean M.; Rodgers, John W.; White, Richard A.

    2015-01-01

    The genome project increased appreciation of genetic complexity underlying disease phenotypes: many genes contribute to each phenotype and each gene contributes to multiple phenotypes. The aspiration of predicting common disease in individuals has evolved from seeking primary loci to marginal risk assignments based on many genes. Genetic interaction, defined as contributions to a phenotype that are dependent upon particular digenic allele combinations, could improve prediction of phenotype from complex genotype, but it is difficult to study in human populations. High throughput, systematic analysis of S. cerevisiae gene knockouts or knockdowns in the context of disease-relevant phenotypic perturbations provides a tractable experimental approach to derive gene interaction networks, in order to deduce by cross-species gene homology how phenotype is buffered against disease-risk genotypes. Yeast gene interaction network analysis to date has revealed biology more complex than previously imagined. This has motivated the development of more powerful yeast cell array phenotyping methods to globally model the role of gene interaction networks in modulating phenotypes (which we call yeast phenomic analysis). The article illustrates yeast phenomic technology, which is applied here to quantify gene X media interaction at higher resolution and supports use of a human-like media for future applications of yeast phenomics for modeling human disease. PMID:25668739

  8. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

    PubMed Central

    Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.

    2014-01-01

    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC. PMID:24895601

  9. Dynamic regulation of genetic pathways and targets during aging in Caenorhabditis elegans.

    PubMed

    He, Kan; Zhou, Tao; Shao, Jiaofang; Ren, Xiaoliang; Zhao, Zhongying; Liu, Dahai

    2014-03-01

    Numerous genetic targets and some individual pathways associated with aging have been identified using the worm model. However, less is known about the genome-wide genetic mechanisms of aging, particularly at the level of multiple pathways and the regulatory networks active during aging. Here, we used gene expression datasets from three time points during aging in Caenorhabditis elegans (C. elegans) and performed gene set enrichment analysis (GSEA) on each dataset between adjacent stages. As a result, multiple genetic pathways and targets were identified as significantly down- or up-regulated. Among them, 5 truly aging-dependent signaling pathways, including the MAPK, mTOR, Wnt, TGF-beta and ErbB signaling pathways, as well as 12 significantly associated genes, were identified with dynamic expression patterns during aging. On the other hand, the continued declines in the regulation of several metabolic pathways have been demonstrated to display age-related changes. Furthermore, the regulatory networks reconstructed from three aging-related chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets and the expression matrices of 154 genes involved in the above signaling pathways provide new insights into aging at the level of multiple pathways. The combination of multiple genetic pathways and targets needs to be taken into consideration in future studies of aging, in which the dynamic regulation would be uncovered.
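
    The core statistic behind GSEA can be sketched as follows. This is a simplified, unweighted running-sum enrichment score, not the authors' analysis and not the full GSEA procedure (no expression-based weighting, no permutation test). The gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated.
    gene_set:     genes belonging to the pathway of interest.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    gene_set = set(gene_set)
    n_hits = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for g in ranked_genes:
        # Step up at pathway members, step down otherwise.
        running += 1.0 / n_hits if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]      # hypothetical ranking
es_top = enrichment_score(ranked, {"g1", "g2"})     # set at the top of the list
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set at the bottom of the list
print(es_top, es_bottom)
```

    A strongly positive score indicates a gene set concentrated among up-regulated genes, and a strongly negative score one concentrated among down-regulated genes, which is how a pathway would be called up- or down-regulated between adjacent stages.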

  10. Comparative description of ten transcriptomes of newly sequenced invertebrates and efficiency estimation of genomic sampling in non-model taxa

    PubMed Central

    2012-01-01

    Introduction. Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa. Results. cDNA libraries of ten species belonging to five animal phyla (2 Annelida [including Sipuncula], 2 Arthropoda, 2 Mollusca, 2 Nemertea, and 2 Porifera) were sequenced in different batches with an Illumina Genome Analyzer II (read length 100 or 150 bp), rendering between ca. 25 and 52 million reads per species. Read thinning, trimming, and de novo assembly were performed under different parameters to optimize output. Between 67,423 and 207,559 contigs were obtained across the ten species, post-optimization. Of those, 9,069 to 25,681 contigs retrieved blast hits against the NCBI non-redundant database, and approximately 50% of these were assigned with Gene Ontology terms, covering all major categories, and with similar percentages in all species. Local blasts against our datasets, using selected genes from major signaling pathways and housekeeping genes, revealed high efficiency in gene recovery compared to available genomes of closely related species. Intriguingly, our transcriptomic datasets detected multiple paralogues in all phyla and in nearly all gene pathways, including housekeeping genes that are traditionally used in phylogenetic applications for their purported single-copy nature. Conclusions. We generated the first study of comparative transcriptomics across multiple animal phyla (comparing two species per phylum in most cases), established the first Illumina-based transcriptomic datasets for sponge, nemertean, and sipunculan species, and generated a tractable catalogue of annotated genes (or gene fragments) and protein families for ten newly sequenced non-model organisms, some of commercial importance (e.g., Octopus vulgaris). These comprehensive sets of genes can be readily used for phylogenetic analysis, gene expression profiling, developmental analysis, and can also be a powerful resource for gene discovery. The characterization of the transcriptomes of such a diverse array of animal species permitted the comparison of sequencing depth, functional annotation, and efficiency of genomic sampling using the same pipelines, which proved to be similar for all considered species. In addition, the datasets revealed their potential as a resource for paralogue detection, a recurrent concern in various aspects of biological inquiry, including phylogenetics, molecular evolution, development, and cellular biochemistry. PMID:23190771

  11. A Comparison Study of Multivariate Fixed Models and Gene Association with Multiple Traits (GAMuT) for Next-Generation Sequencing

    PubMed Central

    Chiu, Chi-yang; Jung, Jeesun; Wang, Yifan; Weeks, Daniel E.; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Amos, Christopher I.; Mills, James L.; Boehnke, Michael; Xiong, Momiao; Fan, Ruzong

    2016-01-01

    In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for: (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models which perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods. PMID:27917525

  12. Chronic exposure to water pollutant trichloroethylene increased epigenetic drift in CD4(+) T cells.

    PubMed

    Gilbert, Kathleen M; Blossom, Sarah J; Erickson, Stephen W; Reisfeld, Brad; Zurlinden, Todd J; Broadfoot, Brannon; West, Kirk; Bai, Shasha; Cooney, Craig A

    2016-05-01

    Autoimmune disease and CD4(+) T-cell alterations are induced in mice exposed to the water pollutant trichloroethylene (TCE). We examined here whether TCE altered gene-specific DNA methylation in CD4(+) T cells as a possible mechanism of immunotoxicity. Naive and effector/memory CD4(+) T cells from mice exposed to TCE (0.5 mg/ml in drinking water) for 40 weeks were examined by bisulfite next-generation DNA sequencing. A probabilistic model calculated from multiple genes showed that TCE decreased methylation control in CD4(+) T cells. Data from individual genes fitted to a quadratic regression model showed that TCE increased gene-specific methylation variance in both CD4 subsets. TCE increased epigenetic drift of specific CpG sites in CD4(+) T cells.

  13. Systematic reconstruction of autism biology from massive genetic mutation profiles

    PubMed Central

    Zhang, Chaolin; Jiang, Yong-hui

    2018-01-01

    Autism spectrum disorder (ASD) affects 1% of world population and has become a pressing medical and social problem worldwide. As a paradigmatic complex genetic disease, ASD has been intensively studied and thousands of gene mutations have been reported. Because these mutations rarely recur, it is difficult to (i) pinpoint the fewer disease-causing versus majority random events and (ii) replicate or verify independent studies. A coherent and systematic understanding of autism biology has not been achieved. We analyzed 3392 and 4792 autism-related mutations from two large-scale whole-exome studies across multiple resolution levels, that is, variants (single-nucleotide), genes (protein-coding unit), and pathways (molecular module). These mutations do not recur or replicate at the variant level, but significantly and increasingly do so at gene and pathway levels. Genetic association reveals a novel gene + pathway dual-hit model, where the mutation burden becomes less relevant. In multiple independent analyses, hundreds of variants or genes repeatedly converge to several canonical pathways, either novel or literature-supported. These pathways define recurrent and systematic ASD biology, distinct from previously reported gene groups or networks. They also present a catalog of novel ASD risk factors including 118 variants and 72 genes. At a subpathway level, most variants disrupt the pathway-related gene functions, and in the same gene, they tend to hit residues extremely close to each other and in the same domain. Multiple interacting variants spotlight key modules, including the cAMP (adenosine 3′,5′-monophosphate) second-messenger system and mGluR (metabotropic glutamate receptor) signaling regulation by GRKs (G protein–coupled receptor kinases). At a superpathway level, distinct pathways further interconnect and converge to three biology themes: synaptic function, morphology, and plasticity. PMID:29651456

  14. Systematic reconstruction of autism biology from massive genetic mutation profiles.

    PubMed

    Luo, Weijun; Zhang, Chaolin; Jiang, Yong-Hui; Brouwer, Cory R

    2018-04-01

    Autism spectrum disorder (ASD) affects 1% of world population and has become a pressing medical and social problem worldwide. As a paradigmatic complex genetic disease, ASD has been intensively studied and thousands of gene mutations have been reported. Because these mutations rarely recur, it is difficult to (i) pinpoint the fewer disease-causing versus majority random events and (ii) replicate or verify independent studies. A coherent and systematic understanding of autism biology has not been achieved. We analyzed 3392 and 4792 autism-related mutations from two large-scale whole-exome studies across multiple resolution levels, that is, variants (single-nucleotide), genes (protein-coding unit), and pathways (molecular module). These mutations do not recur or replicate at the variant level, but significantly and increasingly do so at gene and pathway levels. Genetic association reveals a novel gene + pathway dual-hit model, where the mutation burden becomes less relevant. In multiple independent analyses, hundreds of variants or genes repeatedly converge to several canonical pathways, either novel or literature-supported. These pathways define recurrent and systematic ASD biology, distinct from previously reported gene groups or networks. They also present a catalog of novel ASD risk factors including 118 variants and 72 genes. At a subpathway level, most variants disrupt the pathway-related gene functions, and in the same gene, they tend to hit residues extremely close to each other and in the same domain. Multiple interacting variants spotlight key modules, including the cAMP (adenosine 3',5'-monophosphate) second-messenger system and mGluR (metabotropic glutamate receptor) signaling regulation by GRKs (G protein-coupled receptor kinases). At a superpathway level, distinct pathways further interconnect and converge to three biology themes: synaptic function, morphology, and plasticity.

  15. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe.

    PubMed

    Gertz, E Michael; Chowdhury, Salim Akhter; Lee, Woei-Jyh; Wangsa, Darawalee; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schwartz, Russell; Schäffer, Alejandro A

    2016-01-01

    Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we further implemented in FISHtrees 3.0 a new method to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data and on FISH data from paired cases of cervical primary and metastatic tumors and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty in finding biomarkers that predict which DCIS cases are at most risk to progress to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees.

  16. FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe

    PubMed Central

    Chowdhury, Salim Akhter; Lee, Woei-Jyh; Wangsa, Darawalee; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schwartz, Russell; Schäffer, Alejandro A.

    2016-01-01

    Advances in fluorescence in situ hybridization (FISH) make it feasible to detect multiple copy-number changes in hundreds of cells of solid tumors. Studies using FISH, sequencing, and other technologies have revealed substantial intra-tumor heterogeneity. The evolution of subclones in tumors may be modeled by phylogenies. Tumors often harbor aneuploid or polyploid cell populations. Using a FISH probe to estimate changes in ploidy can guide the creation of trees that model changes in ploidy and individual gene copy-number variations. We present FISHtrees 3.0, which implements a ploidy-based tree building method based on mixed integer linear programming (MILP). The ploidy-based modeling in FISHtrees includes a new formulation of the problem of merging trees for changes of a single gene into trees modeling changes in multiple genes and the ploidy. When multiple samples are collected from each patient, varying over time or tumor regions, it is useful to evaluate similarities in tumor progression among the samples. Therefore, we further implemented in FISHtrees 3.0 a new method to build consensus graphs for multiple samples. We validate FISHtrees 3.0 on simulated data and on FISH data from paired cases of cervical primary and metastatic tumors and on paired breast ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Tests on simulated data show improved accuracy of the ploidy-based approach relative to prior ploidyless methods. Tests on real data further demonstrate novel insights these methods offer into tumor progression processes. Trees for DCIS samples are significantly less complex than trees for paired IDC samples. Consensus graphs show substantial divergence among most paired samples from both sets. Low consensus between DCIS and IDC trees may help explain the difficulty in finding biomarkers that predict which DCIS cases are at most risk to progress to IDC. The FISHtrees software is available at ftp://ftp.ncbi.nih.gov/pub/FISHtrees. PMID:27362268

  17. The genetic basis of alcoholism: multiple phenotypes, many genes, complex networks.

    PubMed

    Morozova, Tatiana V; Goldman, David; Mackay, Trudy F C; Anholt, Robert R H

    2012-02-20

    Alcoholism is a significant public health problem. A picture of the genetic architecture underlying alcohol-related phenotypes is emerging from genome-wide association studies and work on genetically tractable model organisms.

  18. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    PubMed

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  19. Punctual Transcriptional Regulation by the Rice Circadian Clock under Fluctuating Field Conditions

    PubMed Central

    Matsuzaki, Jun; Kawahara, Yoshihiro; Izawa, Takeshi

    2015-01-01

    Plant circadian clocks that oscillate autonomously with a roughly 24-h period are entrained by fluctuating light and temperature and globally regulate downstream genes in the field. However, it remains unknown how punctual internal time produced by the circadian clock in the field is and how it is affected by environmental fluctuations due to weather or daylength. Using hundreds of samples of field-grown rice (Oryza sativa) leaves, we developed a statistical model for the expression of circadian clock-related genes integrating diurnally entrained circadian clock with phase setting by light, both responses to light and temperature gated by the circadian clock. We show that expression of individual genes was strongly affected by temperature. However, internal time estimated from expression of multiple genes, which may reflect transcriptional regulation of downstream genes, is punctual to 22 min and not affected by weather, daylength, or plant developmental age in the field. We also revealed perturbed progression of internal time under controlled environment or in a mutant of the circadian clock gene GIGANTEA. Thus, we demonstrated that the circadian clock is a regulatory network of multiple genes that retains accurate physical time of day by integrating the perturbations on individual genes under fluctuating environments in the field. PMID:25757473

  20. Novel method to load multiple genes onto a mammalian artificial chromosome.

    PubMed

    Tóth, Anna; Fodor, Katalin; Praznovszky, Tünde; Tubak, Vilmos; Udvardy, Andor; Hadlaczky, Gyula; Katona, Robert L

    2014-01-01

    Mammalian artificial chromosomes are natural chromosome-based vectors that may carry a vast amount of genetic material in terms of both size and number. They are reasonably stable and segregate well in both mitosis and meiosis. A platform artificial chromosome expression system (ACEs) was earlier described with multiple loading sites for a modified lambda-integrase enzyme. It has been shown that this ACEs is suitable for high-level industrial protein production and the treatment of a mouse model for a devastating human disorder, Krabbe's disease. ACEs-treated mutant mice carrying a therapeutic gene lived more than four times longer than untreated counterparts. This novel gene therapy method is called combined mammalian artificial chromosome-stem cell therapy. At present, this method suffers from the limitation that a new selection marker gene should be present for each therapeutic gene loaded onto the ACEs. Complex diseases require the cooperative action of several genes for treatment, but only a limited number of selection marker genes are available and there is also a risk of serious side-effects caused by the unwanted expression of these marker genes in mammalian cells, organs and organisms. We describe here a novel method to load multiple genes onto the ACEs by using only two selectable marker genes. These markers may be removed from the ACEs before therapeutic application. This novel technology could revolutionize gene therapeutic applications targeting the treatment of complex disorders and cancers. It could also speed up cell therapy by allowing researchers to engineer a chromosome with a predetermined set of genetic factors to differentiate adult stem cells, embryonic stem cells and induced pluripotent stem (iPS) cells into cell types of therapeutic value. It is also a suitable tool for the investigation of complex biochemical pathways in basic science by producing an ACEs with several genes from a signal transduction pathway of interest.

  1. Assessing interactions between HLA-DRB1*15 and infectious mononucleosis on the risk of multiple sclerosis.

    PubMed

    Disanto, Giulio; Hall, Carolina; Lucas, Robyn; Ponsonby, Anne-Louise; Berlanga-Taylor, Antonio J; Giovannoni, Gavin; Ramagopalan, Sreeram V

    2013-09-01

    Gene-environment interactions may shed light on the mechanisms underlying multiple sclerosis (MS). We pooled data from two case-control studies on incident demyelination and used different methods to assess interaction between HLA-DRB1*15 (DRB1-15) and history of infectious mononucleosis (IM). Individuals exposed to both factors were at substantially increased risk of disease (OR=7.32, 95% CI=4.92-10.90). In logistic regression models, DRB1-15 and IM status were independent predictors of disease while their interaction term was not (DRB1-15*IM: OR=1.35, 95% CI=0.79-2.23). However, interaction on an additive scale was evident (Synergy index=2.09, 95% CI=1.59-2.59; excess risk due to interaction=3.30, 95% CI=0.47-6.12; attributable proportion due to interaction=45%, 95% CI=22-68%). This suggests that, if the additive model is appropriate, DRB1-15 and IM may be involved in the same causal process leading to MS, and it highlights the benefit of reporting gene-environment interactions on both a multiplicative and an additive scale.
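
    The three additive-scale measures named above have standard closed forms when odds ratios are used as approximations to relative risks. The sketch below computes them. The input odds ratios in the example are hypothetical, because the abstract reports only the joint odds ratio and not the two single-exposure odds ratios; confidence intervals are not computed.

```python
def additive_interaction(or_both, or_gene_only, or_env_only):
    """Additive-scale interaction measures from three odds ratios.

    All odds ratios are relative to the group exposed to neither factor.
    Returns (excess risk due to interaction, attributable proportion,
    synergy index).
    """
    # Excess risk due to interaction (often abbreviated RERI).
    reri = or_both - or_gene_only - or_env_only + 1.0
    # Share of the joint effect attributable to the interaction.
    ap = reri / or_both
    denominator = (or_gene_only - 1.0) + (or_env_only - 1.0)
    if denominator == 0:
        raise ValueError("synergy index undefined when single effects sum to zero")
    synergy = (or_both - 1.0) / denominator
    return reri, ap, synergy

# Hypothetical odds ratios, chosen only to illustrate the arithmetic.
reri, ap, synergy = additive_interaction(6.0, 3.0, 2.0)
print(reri, ap, synergy)
```

    The figures reported in the abstract are consistent with the second formula: 3.30 / 7.32 is approximately 0.45, the stated attributable proportion. A synergy index above 1 and an excess risk above 0 both indicate more-than-additive joint effects.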

  2. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease

    PubMed Central

    McDonald, Jacqueline U.; Kaforou, Myrsini; Clare, Simon; Hale, Christine; Ivanova, Maria; Huntley, Derek; Dorner, Marcus; Wright, Victoria J.; Levin, Michael; Martinon-Torres, Federico; Herberg, Jethro A.

    2016-01-01

    ABSTRACT Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of “big data” is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases. PMID:27822537
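
    The prioritization idea, ranking genes by how many independent data sets report them, can be sketched in a few lines. This is an illustrative reduction of the approach and not the authors' database or scoring scheme. The study names, gene placeholders, and the `min_datasets` threshold are assumptions made for the example.

```python
from collections import Counter

def rank_genes_by_recurrence(datasets, min_datasets=2):
    """Rank genes by the number of data sets in which they appear.

    datasets:     mapping of data set name -> iterable of gene symbols.
    min_datasets: keep only genes reported in at least this many data sets.
    Returns a list of (gene, count), most frequently reported first.
    """
    counts = Counter()
    for genes in datasets.values():
        # Use a set so a gene listed twice in one data set counts once.
        counts.update(set(genes))
    ranked = [(g, n) for g, n in counts.items() if n >= min_datasets]
    # Sort by descending count, then alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Hypothetical data sets with placeholder gene symbols.
datasets = {
    "study_1": ["GENE_A", "GENE_B"],
    "study_2": ["GENE_A", "GENE_C", "GENE_A"],
    "study_3": ["GENE_A", "GENE_B"],
}
candidates = rank_genes_by_recurrence(datasets)
print(candidates)  # [('GENE_A', 3), ('GENE_B', 2)]
```

    Real data sets would first need the standardization step the abstract mentions, for instance mapping protein and transcript identifiers to a common gene symbol, before counting is meaningful.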

  3. Two FGFRL-Wnt circuits organize the planarian anteroposterior axis.

    PubMed

    Scimone, M Lucila; Cote, Lauren E; Rogers, Travis; Reddien, Peter W

    2016-04-11

    How positional information instructs adult tissue maintenance is poorly understood. Planarians undergo whole-body regeneration and tissue turnover, providing a model for adult positional information studies. Genes encoding secreted and transmembrane components of multiple developmental pathways are predominantly expressed in planarian muscle cells. Several of these genes regulate regional identity, consistent with muscle harboring positional information. Here, single-cell RNA-sequencing of 115 muscle cells from distinct anterior-posterior regions identified 44 regionally expressed genes, including multiple Wnt and ndk/FGF receptor-like (ndl/FGFRL) genes. Two distinct FGFRL-Wnt circuits, involving juxtaposed anterior FGFRL and posterior Wnt expression domains, controlled planarian head and trunk patterning. ndl-3 and wntP-2 inhibition expanded the trunk, forming ectopic mouths and secondary pharynges, which independently extended and ingested food. fz5/8-4 inhibition, like that of ndk and wntA, caused posterior brain expansion and ectopic eye formation. Our results suggest that FGFRL-Wnt circuits operate within a body-wide coordinate system to control adult axial positioning.

  4. Fast and robust group-wise eQTL mapping using sparse graphical models.

    PubMed

    Cheng, Wei; Shi, Yu; Zhang, Xiang; Wang, Wei

    2015-01-16

    Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. The traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways. We introduce a new approach to identify novel group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. Our model is a linear-Gaussian model and uses two types of hidden variables. One captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure which makes this approach suitable for large scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that cannot be identified by the state-of-the-art eQTL mapping methods. Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the successful multi-layer regression model opens a new approach to understand how multiple SNPs interact with each other to jointly affect the expression level of a group of genes.

  5. Multi-environment QTL analysis of grain morphology traits and fine mapping of a kernel-width QTL in Zheng58 × SK maize population.

    PubMed

    Raihan, Mohammad Sharif; Liu, Jie; Huang, Juan; Guo, Huan; Pan, Qingchun; Yan, Jianbing

    2016-08-01

    Sixteen major QTLs regulating maize kernel traits were mapped in multiple environments and one of them, qKW-9.2, was restricted to 630 kb, harboring 28 putative gene models. To elucidate the genetic basis of kernel traits, a quantitative trait locus (QTL) analysis was conducted in a maize recombinant inbred line population derived from a cross between two diverse parents, Zheng58 and SK, evaluated across eight environments. Construction of a high-density linkage map was based on 13,703 single-nucleotide polymorphism markers, covering 1860.9 cM of the whole genome. In total, 18, 26, 23, and 19 QTLs for kernel length, width, thickness, and 100-kernel weight, respectively, were detected on the basis of a single-environment analysis, and each QTL explained 3.2-23.7% of the phenotypic variance. Sixteen major QTLs, each of which could explain more than 10% of the phenotypic variation, were mapped in multiple environments, implying that kernel traits might be controlled by many minor and multiple major QTLs. The major QTL qKW-9.2, with a physical confidence interval of 1.68 Mbp and affecting kernel width, was then selected for fine mapping using heterogeneous inbred families. Finally, the location of the underlying gene was narrowed down to 630 kb, harboring 28 putative candidate-gene models. This information will enhance molecular breeding for kernel traits and simultaneously assist the cloning of the gene underlying this QTL, helping to reveal the genetic basis of kernel development in maize.

  6. Multiple homologous genes knockout (KO) by CRISPR/Cas9 system in rabbit.

    PubMed

    Liu, Huan; Sui, Tingting; Liu, Di; Liu, Tingjun; Chen, Mao; Deng, Jichao; Xu, Yuanyuan; Li, Zhanjun

    2018-03-20

    The CRISPR/Cas9 system is a highly efficient and convenient genome editing tool, which has been widely used for single or multiple gene mutation in a variety of organisms. Disruption of multiple homologous genes, which have similar DNA sequences and gene function, is required for the study of the desired phenotype. In this study, to test whether the CRISPR/Cas9 system works on the mutation of multiple homologous genes, a single guide RNA (sgRNA) targeting three fucosyltransferase-encoding genes (FUT1, FUT2 and SEC1) was designed. As expected, triple gene mutation of FUT1, FUT2 and SEC1 could be achieved simultaneously via a sgRNA-mediated CRISPR/Cas9 system. In addition, significantly reduced serum fucosyltransferase enzyme activity was observed in the triple-gene-mutation rabbits. Thus, we provide the first evidence that knockout (KO) of multiple homologous genes can be achieved efficiently by a sgRNA-mediated CRISPR/Cas9 system in mammals, which could facilitate genotype-to-phenotype studies of homologous genes in the future.

  7. The genetic basis of alcoholism: multiple phenotypes, many genes, complex networks

    PubMed Central

    2012-01-01

    Alcoholism is a significant public health problem. A picture of the genetic architecture underlying alcohol-related phenotypes is emerging from genome-wide association studies and work on genetically tractable model organisms. PMID:22348705

  8. A Mechanistic Model for Cooperative Behavior of Co-transcribing RNA Polymerases

    PubMed Central

    Heberling, Tamra; Davis, Lisa; Gedeon, Jakub; Morgan, Charles; Gedeon, Tomáš

    2016-01-01

    In fast-transcribing prokaryotic genes, such as an rrn gene in Escherichia coli, many RNA polymerases (RNAPs) transcribe the DNA simultaneously. Active elongation of RNAPs is often interrupted by pauses, which has been observed to cause RNAP traffic jams; yet some studies indicate that elongation seems to be faster in the presence of multiple RNAPs than elongation by a single RNAP. We propose that an interaction between RNAPs via the torque produced by RNAP motion on helically twisted DNA can explain this apparent paradox. We have incorporated the torque mechanism into a stochastic model and simulated transcription both with and without torque. Simulation results illustrate that the torque causes shorter pause durations and fewer collisions between polymerases. Our results suggest that the torsional interaction of RNAPs is an important mechanism in maintaining fast transcription times, and that transcription should be viewed as a cooperative group effort by multiple polymerases. PMID:27517607

  9. FOXP3 Orchestrates H4K16 Acetylation and H3K4 Tri-Methylation for Activation of Multiple Genes through Recruiting MOF and Causing Displacement of PLU-1

    PubMed Central

    Katoh, Hiroto; Qin, Zhaohui S.; Liu, Runhua; Wang, Lizhong; Li, Weiquan; Li, Xiangzhi; Wu, Lipeng; Du, Zhanwen; Lyons, Robert; Liu, Chang-Gong; Liu, Xiuping; Dou, Yali; Zheng, Pan; Liu, Yang

    2011-01-01

    SUMMARY Both H4K16 acetylation and H3K4 tri-methylation are required for gene activation. However, it is still largely unclear how these modifications are orchestrated by transcriptional factors. Here we analyzed the mechanism of the transcriptional activation by FOXP3, an X-linked suppressor of autoimmune diseases and cancers. FOXP3 binds near transcriptional start sites of its target genes. By recruiting MOF and displacing histone H3K4 demethylase PLU-1, FOXP3 increases both H4K16 acetylation and H3K4 tri-methylation at the FOXP3-associated chromatins of multiple FOXP3-activated genes. RNAi-mediated silencing of MOF reduced both gene activation and tumor suppression by FOXP3, while both somatic mutations in clinical cancer samples and targeted mutation of FOXP3 in mouse prostate epithelial disrupted nuclear localization of MOF. Our data demonstrate a pull-push model in which a single transcription factor orchestrates two epigenetic alterations necessary for gene activation and provide a mechanism for somatic inactivation of the FOXP3 protein function in cancer cells. PMID:22152480

  10. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  11. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

    PubMed

    Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

    2003-01-01

    The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.

  12. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

    2014-05-12

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the 'ecotype model' of diversification, but not previously observed in natural populations.

  13. Dissecting DNA repair in adult high grade gliomas for patient stratification in the post-genomic era

    PubMed Central

    Perry, Christina; Agarwal, Devika; Abdel-Fatah, Tarek M.A.; Lourdusamy, Anbarasu; Grundy, Richard; Auer, Dorothee T.; Walker, David; Lakhani, Ravi; Scott, Ian S.; Chan, Stephen; Ball, Graham; Madhusudan, Srinivasan

    2014-01-01

    Deregulation of multiple DNA repair pathways may contribute to aggressive biology and therapy resistance in gliomas. We evaluated transcript levels of 157 genes involved in DNA repair in an adult glioblastoma Test set (n=191) and validated in ‘The Cancer Genome Atlas’ (TCGA) cohort (n=508). A DNA repair prognostic index model was generated. Artificial neural network analysis (ANN) was conducted to investigate global gene interactions. Protein expression by immunohistochemistry was conducted in 61 tumours. A fourteen DNA repair gene expression panel was associated with poor survival in Test and TCGA cohorts. A Cox multivariate model revealed APE1, NBN, PMS2, MGMT and PTEN as independently associated with poor prognosis. A DNA repair prognostic index incorporating APE1, NBN, PMS2, MGMT and PTEN stratified patients into three prognostic sub-groups with worsening survival. APE1, NBN, PMS2, MGMT and PTEN also have predictive significance in patients who received chemotherapy and/or radiotherapy. ANN analysis of APE1, NBN, PMS2, MGMT and PTEN revealed interactions with genes involved in transcription, hypoxia and metabolic regulation. At the protein level, low APE1 and low PTEN remain associated with poor prognosis. In conclusion, multiple DNA repair pathways operate to influence biology and clinical outcomes in adult high grade gliomas. PMID:25026297

  14. Integrative prescreening in analysis of multiple cancer genomic studies

    PubMed Central

    2012-01-01

    Background In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost. Results An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach. Conclusions The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers. PMID:22799431
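    The prescreening idea in the record above (fit a marginal model per gene in each dataset, pool the evidence across datasets, keep only the strongest genes) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the marginal statistic, so the sum of squared Pearson correlations used here, along with the function names and the toy data, are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def integrative_prescreen(datasets, keep):
    """Rank genes by marginal association pooled over all datasets.

    datasets: list of (expression, outcome) pairs, where expression maps
              a gene name to its per-sample values and outcome is the
              per-sample response in that dataset.
    keep:     number of genes to retain.
    The pooled score (sum of squared correlations) is an assumed choice.
    """
    # Only genes measured in every dataset can be scored jointly.
    genes = set.intersection(*(set(expr) for expr, _ in datasets))
    score = {
        g: sum(pearson(expr[g], outcome) ** 2 for expr, outcome in datasets)
        for g in genes
    }
    # Sort by descending score; break ties by gene name for determinism.
    return sorted(score, key=lambda g: (-score[g], g))[:keep]


# Hypothetical toy studies: three genes, four samples each.
study_a = ({"g1": [1, 2, 3, 4], "g2": [2, 1, 2, 1], "g3": [4, 3, 2, 1]}, [1, 2, 3, 4])
study_b = ({"g1": [2, 4, 5, 9], "g2": [1, 1, 2, 1], "g3": [3, 1, 4, 2]}, [1, 2, 3, 4])

print(integrative_prescreen([study_a, study_b], keep=2))
```

    Because each gene is scored independently, the inner loop could be run in parallel, which is the property the abstract cites as keeping the computational cost low.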

  15. Restoring Dystrophin Expression in Duchenne Muscular Dystrophy Muscle

    PubMed Central

    Hoffman, Eric P.; Bronson, Abby; Levin, Arthur A.; Takeda, Shin'ichi; Yokota, Toshifumi; Baudy, Andreas R.; Connor, Edward M.

    2011-01-01

    The identification of the Duchenne muscular dystrophy gene and protein in the late 1980s led to high hopes of rapid translation to molecular therapeutics. These hopes were fueled by early reports of delivering new functional genes to dystrophic muscle in mouse models using gene therapy and stem cell transplantation. However, significant barriers have thwarted translation of these approaches to true therapies, including insufficient therapeutic material (eg, cells and viral vectors), challenges in systemic delivery, and immunological hurdles. An alternative approach is to repair the patient's own gene. Two innovative small-molecule approaches have emerged as front-line molecular therapeutics: exon skipping and stop codon read through. Both approaches are in human clinical trials and aim to coax dystrophin protein production from otherwise inactive mutant genes. In the clinically severe dog model of Duchenne muscular dystrophy, the exon-skipping approach recently improved multiple functional outcomes. We discuss the status of these two methods aimed at inducing de novo dystrophin production from mutant genes and review implications for other disorders. PMID:21703390

  16. Petunia, Your Next Supermodel?

    PubMed Central

    Vandenbussche, Michiel; Chambrier, Pierre; Rodrigues Bento, Suzanne; Morel, Patrice

    2016-01-01

    Plant biology in general, and plant evo–devo in particular, would strongly benefit from a broader range of available model systems. In recent years, technological advances have facilitated the analysis and comparison of individual gene functions in multiple species, representing now a fairly wide taxonomic range of the plant kingdom. Because genes are embedded in gene networks, studying evolution of gene function ultimately should be put in the context of studying the evolution of entire gene networks, since changes in the function of a single gene will normally go together with further changes in its network environment. For this reason, plant comparative biology/evo–devo will require the availability of a defined set of ‘super’ models occupying key taxonomic positions, in which performing gene functional analysis and testing genetic interactions ideally is as straightforward as, e.g., in Arabidopsis. Here we review why petunia has the potential to become one of these future supermodels, as a representative of the Asterid clade. We will first detail its intrinsic qualities as a model system. Next, we highlight how the revolution in sequencing technologies will now finally allow exploitation of the petunia system to its full potential, even though petunia already has a long history as a model in plant molecular biology and genetics. We conclude with a series of arguments in favor of a more diversified multi-model approach in plant biology, and we point out where the petunia model system may further play a role, based on its biological features and molecular toolkit. PMID:26870078

  17. Identifying metabolic enzymes with multiple types of association evidence

    PubMed Central

    Kharchenko, Peter; Chen, Lifeng; Freund, Yoav; Vitkup, Dennis; Church, George M

    2006-01-01

    Background Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as a top candidate within 43% of the cases. Conclusion We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities. PMID:16571130

  18. Chronic exposure to water pollutant trichloroethylene increased epigenetic drift in CD4+ T cells

    PubMed Central

    Gilbert, Kathleen M; Blossom, Sarah J; Erickson, Stephen W; Reisfeld, Brad; Zurlinden, Todd J; Broadfoot, Brannon; West, Kirk; Bai, Shasha; Cooney, Craig A

    2016-01-01

    Aim: Autoimmune disease and CD4+ T-cell alterations are induced in mice exposed to the water pollutant trichloroethylene (TCE). We examined here whether TCE altered gene-specific DNA methylation in CD4+ T cells as a possible mechanism of immunotoxicity. Materials & methods: Naive and effector/memory CD4+ T cells from mice exposed to TCE (0.5 mg/ml in drinking water) for 40 weeks were examined by bisulfite next-generation DNA sequencing. Results: A probabilistic model calculated from multiple genes showed that TCE decreased methylation control in CD4+ T cells. Data from individual genes fitted to a quadratic regression model showed that TCE increased gene-specific methylation variance in both CD4 subsets. Conclusion: TCE increased epigenetic drift of specific CpG sites in CD4+ T cells. PMID:27092578

  19. Inferring Gene Family Histories in Yeast Identifies Lineage Specific Expansions

    PubMed Central

    Ames, Ryan M.; Money, Daniel; Lovell, Simon C.

    2014-01-01

    The complement of genes found in the genome is a balance between gene gain and gene loss. Knowledge of the specific genes that are gained and lost over evolutionary time allows an understanding of the evolution of biological functions. Here we use new evolutionary models to infer gene family histories across complete yeast genomes; these models allow us to estimate the relative genome-wide rates of gene birth, death, innovation and extinction (loss of an entire family) for the first time. We show that the rates of gene family evolution vary both between gene families and between species. We are also able to identify those families that have experienced rapid lineage specific expansion/contraction and show that these families are enriched for specific functions. Moreover, we find that families with specific functions are repeatedly expanded in multiple species, suggesting the presence of common adaptations and that these family expansions/contractions are not random. Additionally, we identify potential specialisations, unique to specific species, in the functions of lineage specific expanded families. These results suggest that an important mechanism in the evolution of genome content is the presence of lineage-specific gene family changes. PMID:24921666

  20. Identification of additive, dominant, and epistatic variation conferred by key genes in cellulose biosynthesis pathway in Populus tomentosa

    PubMed Central

    Du, Qingzhang; Tian, Jiaxing; Yang, Xiaohui; Pan, Wei; Xu, Baohua; Li, Bailian; Ingvarsson, Pär K.; Zhang, Deqiang

    2015-01-01

    Economically important traits in many species generally show polygenic, quantitative inheritance. The components of genetic variation (additive, dominant and epistatic effects) of these traits conferred by multiple genes in shared biological pathways remain to be defined. Here, we investigated 11 full-length genes in cellulose biosynthesis, on 10 growth and wood-property traits, within a population of 460 unrelated Populus tomentosa individuals, via multi-gene association. To validate positive associations, we conducted single-marker analysis in a linkage population of 1,200 individuals. We identified 118, 121, and 43 associations (P < 0.01) corresponding to additive, dominant, and epistatic effects, respectively, with low to moderate proportions of phenotypic variance (R2). Epistatic interaction models uncovered a combination of three non-synonymous sites from three unique genes, representing a significant epistasis for diameter at breast height and stem volume. Single-marker analysis validated 61 associations (false discovery rate, Q ≤ 0.10), representing 38 SNPs from nine genes, with an average effect (R2 = 3.8%) nearly 2-fold higher than that identified with multi-gene association, suggesting that multi-gene association can capture smaller individual variants. Moreover, a structural gene–gene network based on tissue-specific transcript abundances provides a better understanding of the multi-gene pathway affecting tree growth and lignocellulose biosynthesis. Our study highlights the importance of pathway-based multiple gene associations to uncover the nature of genetic variance for quantitative traits and may drive novel progress in molecular breeding. PMID:25428896

  1. Developing and applying a gene functional association network for anti-angiogenic kinase inhibitor activity assessment in an angiogenesis co-culture model

    PubMed Central

    Chen, Yuefeng; Wei, Tao; Yan, Lei; Lawrence, Frank; Qian, Hui-Rong; Burkholder, Timothy P; Starling, James J; Yingling, Jonathan M; Shou, Jianyong

    2008-01-01

    Background Tumor angiogenesis is a highly regulated process involving intercellular communication as well as the interactions of multiple downstream signal transduction pathways. Disrupting one or even a few angiogenesis pathways is often insufficient to achieve sustained therapeutic benefits due to the complexity of angiogenesis. Targeting multiple angiogenic pathways has been increasingly recognized as a viable strategy. However, translation of the polypharmacology of a given compound to its antiangiogenic efficacy remains a major technical challenge. Developing a global functional association network among angiogenesis-related genes is much needed to facilitate holistic understanding of angiogenesis and to aid the development of more effective anti-angiogenesis therapeutics. Results We constructed a comprehensive gene functional association network or interactome by transcript profiling an in vitro angiogenesis model, in which human umbilical vein endothelial cells (HUVECs) formed capillary structures when co-cultured with normal human dermal fibroblasts (NHDFs). HUVEC competence and NHDF supportiveness of cord formation were found to be highly cell-passage dependent. An enrichment test of Biological Processes (BP) of differentially expressed genes (DEG) revealed that angiogenesis related BP categories significantly changed with cell passages. Built upon 2012 DEGs identified from two microarray studies, the resulting interactome captured 17226 functional gene associations and displayed characteristics of a scale-free network. The interactome includes the involvement of oncogenes and tumor suppressor genes in angiogenesis. We developed a network walking algorithm to extract connectivity information from the interactome and applied it to simulate the level of network perturbation by three multi-targeted anti-angiogenic kinase inhibitors. Simulated network perturbation correlated with observed anti-angiogenesis activity in a cord formation bioassay. Conclusion We established a comprehensive gene functional association network to model in vitro angiogenesis regulation. The present study provided a proof-of-concept pilot of applying network perturbation analysis to drug phenotypic activity assessment. PMID:18518970

  2. Candidate Gene Study of TRAIL and TRAIL Receptors: Association with Response to Interferon Beta Therapy in Multiple Sclerosis Patients

    PubMed Central

    Órpez-Zafra, Teresa; Pinto-Medel, María Jesús; Oliver-Martos, Begoña; Ortega-Pinazo, Jesús; Arnáiz, Carlos; Guijarro-Castro, Cristina; Varadé, Jezabel; Álvarez-Lafuente, Roberto; Urcelay, Elena; Sánchez-Jiménez, Francisca

    2013-01-01

    TRAIL and TRAIL Receptor genes have been implicated in Multiple Sclerosis pathology as well as in the response to IFN beta therapy. The objective of our study was to evaluate the association of these genes in relation to the age at disease onset (AAO) and to the clinical response upon IFN beta treatment in Spanish MS patients. We carried out a candidate gene study of TRAIL, TRAILR-1, TRAILR-2, TRAILR-3 and TRAILR-4 genes. A total of 54 SNPs were analysed in 509 MS patients under IFN beta treatment, and an additional cohort of 226 MS patients was used to validate the results. Associations of rs1047275 in TRAILR-2 and rs7011559 in TRAILR-4 genes with AAO under an additive model did not withstand Bonferroni correction. In contrast, patients with the TRAILR-1 rs20576-CC genotype showed a better clinical response to IFN beta therapy compared with patients carrying the A-allele (recessive model: p = 8.88×10−4, pc = 0.048, OR = 0.30). This SNP resulted in a non-synonymous substitution of Glutamic acid to Alanine in position 228 (E228A), a change previously associated with susceptibility to different cancer types and risk of metastases, suggesting a lack of functionality of TRAILR-1. In order to unravel how this amino acid change in TRAILR-1 would affect the death signal, we performed molecular modelling with both alleles. Neither TRAIL binding sites in the receptor nor the expression levels of TRAILR-1 in peripheral blood mononuclear cell subsets (monocytes, CD4+ and CD8+ T cells) were modified, suggesting that this SNP may be altering the death signal by some other mechanism. These findings show a role for TRAILR-1 gene variations in the clinical outcome of IFN beta therapy that might have relevance as a biomarker to predict the response to IFN beta in MS. PMID:23658636
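    The relationship between the raw and corrected p-values quoted in the record above can be checked with a short Bonferroni calculation. The abstract does not state how many tests were corrected for; taking the 54 analysed SNPs as the number of tests is an assumption, although it does reproduce the reported corrected value to three decimals. The function name is illustrative.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value: raw p times number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


p_raw = 8.88e-4  # recessive-model p-value quoted in the abstract
n_snps = 54      # assumed number of tests (SNPs analysed in the study)

p_corrected = bonferroni(p_raw, n_snps)
print(round(p_corrected, 3))  # 8.88e-4 * 54 = 0.047952, i.e. 0.048 after rounding
```

    An association survives the correction at the conventional 0.05 level only if the corrected value stays below 0.05, which is why this result is reported as significant while the AAO associations are not.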

  3. Bayesian correlated clustering to integrate multiple datasets

    PubMed Central

    Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.

    2012-01-01

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047558

  4. Understanding the Origin of Species with Genome-Scale Data: the Role of Gene Flow

    PubMed Central

    Sousa, Vitor; Hey, Jody

    2017-01-01

    As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population-genomic data sets. Such data hold the potential to resolve evolutionary biology’s long-standing questions about the role of gene exchange in species formation. In principle the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here. PMID:23657479

  5. Predicting functional divergence in protein evolution by site-specific rate shifts

    NASA Technical Reports Server (NTRS)

    Gaucher, Eric A.; Gu, Xun; Miyamoto, Michael M.; Benner, Steven A.

    2002-01-01

    Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the "non-homogeneous gamma model", which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.

  6. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  7. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  8. Selection of sporophytic and gametophytic self-incompatibility in the absence of a superlocus.

    PubMed

    Schoen, Daniel J; Roda, Megan J

    2016-06-01

    Self-incompatibility (SI) is a complex trait that enforces outcrossing in plant populations. SI generally involves tight linkage of genes coding for the proteins that underlie self-pollen detection and pollen identity specification. Here, we develop two-locus genetic models to address the question of whether sporophytic SI (SSI) and gametophytic SI (GSI) can invade populations of self-compatible plants when there is no linkage or weak linkage of the underlying pollen detection and identity genes (i.e., no S-locus supergene). The models assume that SI evolves as a result of exaptation of genes formerly involved in functions other than SI. Model analysis reveals that SSI and GSI can invade populations even when the underlying genes are loosely linked, provided that inbreeding depression and selfing rate are sufficiently high. Reducing recombination between these genes makes conditions for invasion more lenient. These results can help account for multiple, independent evolution of SI systems as seems to have occurred in the angiosperms. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  9. Gene regulation and noise reduction by coupling of stochastic processes

    NASA Astrophysics Data System (ADS)

    Ramos, Alexandre F.; Hornos, José Eduardo M.; Reinitz, John

    2015-02-01

    Here we characterize the low-noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: There exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of the genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction.
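
    The two-variable model described above (protein number plus a binary gene state whose switching depends on protein number) can be explored numerically. The following is a minimal Gillespie-style sketch, not the authors' analytical treatment; the function name, the parameter names (k_on, k_off, rho, f, h) and the choice of an on-to-off switching rate proportional to protein number are assumptions made for illustration.

```python
import random


def simulate_binary_gene(k_on, k_off, rho, f, h, t_end, seed=0):
    """Gillespie simulation of a self-repressing two-state gene (illustrative).

    k_on, k_off : protein production rates in the on / off gene state
    rho         : per-protein degradation rate
    f           : off -> on switching rate
    h           : on -> off switching rate per protein (assumed form of the
                  protein-dependent coupling)
    Returns the time-averaged mean and Fano factor (variance / mean) of the
    protein number.
    """
    rng = random.Random(seed)
    t, n, gene_on = 0.0, 0, True
    n_sum = n2_sum = 0.0
    while True:
        production = k_on if gene_on else k_off
        degradation = rho * n
        switching = h * n if gene_on else f
        total = production + degradation + switching
        # Waiting time to the next reaction (infinite if nothing can happen).
        dt = rng.expovariate(total) if total > 0 else float("inf")
        last = t + dt >= t_end
        if last:
            dt = t_end - t
        # Accumulate time-weighted moments of the state held during dt.
        n_sum += n * dt
        n2_sum += n * n * dt
        if last:
            break
        t += dt
        # Choose which reaction fires, in proportion to its rate.
        u = rng.random() * total
        if u < production:
            n += 1
        elif u < production + degradation:
            n -= 1
        else:
            gene_on = not gene_on
    mean = n_sum / t_end
    variance = n2_sum / t_end - mean * mean
    fano = variance / mean if mean > 0 else float("nan")
    return mean, fano
```

    With h = 0 the gene never leaves the on state and the sketch reduces to a plain birth-death process, for which a Fano factor near 1 (Poissonian noise) is expected. Comparing that baseline with runs in which f and h are large relative to the synthesis and degradation rates is one way to probe the fast-switching regime the abstract associates with sub-Poissonian noise.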

  10. Gene regulation and noise reduction by coupling of stochastic processes

    PubMed Central

    Ramos, Alexandre F.; Hornos, José Eduardo M.; Reinitz, John

    2015-01-01

    Here we characterize the low noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: there exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction. PMID:25768447

  11. Gene regulation and noise reduction by coupling of stochastic processes.

    PubMed

    Ramos, Alexandre F; Hornos, José Eduardo M; Reinitz, John

    2015-02-01

    Here we characterize the low-noise regime of a stochastic model for a negative self-regulating binary gene. The model has two stochastic variables, the protein number and the state of the gene. Each state of the gene behaves as a protein source governed by a Poisson process. The coupling between the two gene states depends on protein number. This fact has a very important implication: There exist protein production regimes characterized by sub-Poissonian noise because of negative covariance between the two stochastic variables of the model. Hence the protein numbers obey a probability distribution that has a peak that is sharper than those of the two coupled Poisson processes that are combined to produce it. Biochemically, the noise reduction in protein number occurs when the switching of the genetic state is more rapid than protein synthesis or degradation. We consider the chemical reaction rates necessary for Poisson and sub-Poisson processes in prokaryotes and eucaryotes. Our results suggest that the coupling of multiple stochastic processes in a negative covariance regime might be a widespread mechanism for noise reduction.

  12. Got black swimming dots in your cell culture? Identification of Achromobacter as a novel cell culture contaminant

    PubMed Central

    Gray, Jennifer Sue; Birmingham, Janette Marie; Fenton, Jenifer Imig

    2009-01-01

    Cell culture model systems are utilized for their ease of use, relative inexpensiveness, and potentially limitless sample size. Reliable results cannot be obtained, however, when cultures contain contamination. This report discusses the observation and identification of mobile black specks observed in multiple cell lines. Cultures of the contamination were grown, and DNA was purified from isolated colonies. The 16S rDNA gene was PCR amplified using primers that will amplify the gene from many genera, and then sequenced. Sequencing results matched the members of the genus Achromobacter, bacteria common in the environment. Achromobacter species have been shown to be resistant to multiple antibiotics. Attempts to decontaminate the eukaryotic cell culture used multiple antibiotics at different concentrations. The contaminating Achromobacter was eventually eliminated, without permanently harming the eukaryotic cells, using a combination of the antibiotics ciprofloxacin and piperacillin. PMID:19926304

  13. Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease

    PubMed Central

    Carbonetto, Peter; Stephens, Matthew

    2013-01-01

    Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study. PMID:24098138

  14. UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets.

    PubMed

    Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K

    2015-06-04

    Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.

  15. Detailed assessment of gene activation levels by multiple hypoxia-responsive elements under various hypoxic conditions.

    PubMed

    Takeuchi, Yasuto; Inubushi, Masayuki; Jin, Yong-Nan; Murai, Chika; Tsuji, Atsushi B; Hata, Hironobu; Kitagawa, Yoshimasa; Saga, Tsuneo

    2014-12-01

    The HIF-1/HRE pathway is a promising target for the imaging and treatment of intractable malignancy (HIF-1, hypoxia-inducible factor 1; HRE, hypoxia-responsive element). The purposes of our study are: (1) to assess the gene activation levels resulting from various numbers of HREs under various hypoxic conditions, (2) to evaluate the bidirectional activity of multiple HREs, and (3) to confirm whether multiple HREs can induce gene expression in vivo. Human colon carcinoma HCT116 cells were transiently transfected with constructs containing a firefly luciferase reporter gene and various numbers (2, 4, 6, 8, 10, and 12) of HREs (nHRE+, nHRE-). The relative luciferase activities were measured under various durations of hypoxia (6, 12, 18, and 24 h), O2 concentrations (1, 2, 4, 8, and 16 %), and various concentrations of deferoxamine mesylate (20, 40, 80, 160, and 320 µg/mL growth medium). The bidirectional gene activation levels by HREs were examined in the constructs (dual-luc-nHREs) containing firefly and Renilla luciferase reporter genes at each side of nHREs. Finally, to test whether the construct containing 12HRE and the NIS reporter gene (12HRE-NIS) can induce gene expression in vivo, SPECT imaging was performed in a mouse xenograft model. The results were as follows: (1) gene activation levels by HREs tended to increase with increasing HRE copy number, but a saturation effect was observed in constructs with more than 6 or 8 copies of an HRE, (2) gene activation levels by HREs increased remarkably during 6-12 h of hypoxia, but not beyond 12 h, (3) gene activation levels by HREs decreased with increasing O2 concentrations, but could be detected even under mild hypoxia at 16 % O2, (4) the bidirectionally proportional activity of the HRE was confirmed regardless of the hypoxic severity, and (5) NIS expression driven by 12 tandem copies of an HRE in response to hypoxia could be visualized on in vivo SPECT imaging. The results of this study will help in the understanding and assessment of the activity of multiple HREs under hypoxia and become the basis for hypoxia-targeted imaging and therapy in the future.

  16. Identification and Analyses of AUX-IAA target genes controlling multiple pathways in developing fiber cells of Gossypium hirsutum L

    PubMed Central

    Nigam, Deepti; Sawant, Samir V

    2013-01-01

    Technological development led to an increased interest in systems biological approaches in plants to characterize developmental mechanisms and candidate genes relevant to specific tissue or cell morphology. AUX-IAA proteins are important plant-specific putative transcription factors. There are several reports on the physiological response of this family in Arabidopsis, but in cotton fiber the transcriptional network through which AUX-IAA regulates its target genes is still unknown. In silico modelling of cotton fiber development-specific gene expression data (108 microarrays and 22,737 genes) using the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) reveals 3690 putative AUX-IAA target genes, of which 139 genes were known to be AUX-IAA co-regulated within Arabidopsis. Further, the AUX-IAA targeted gene regulatory network (GRN) had substantial impact on the transcriptional dynamics of cotton fiber, as shown by altered TF networks and by the Gene Ontology (GO) biological processes and metabolic pathways associated with its target genes. Analysis of the AUX-IAA-correlated gene network reveals multiple functions for AUX-IAA target genes such as unidimensional cell growth, cellular nitrogen compound metabolic process, nucleosome organization, DNA-protein complex and processes related to the cell wall. These candidate networks/pathways have a variety of profound impacts on such cellular functions as stress response, cell proliferation, and cell differentiation. While these functions are fairly broad, their underlying TF networks may provide a global view of AUX-IAA regulated gene expression and a GRN that guides future studies in understanding the role of AUX-IAA box protein and its targets in regulating fiber development. PMID:24497725

  17. Pyrethroid Resistance in Malaysian Populations of Dengue Vector Aedes aegypti Is Mediated by CYP9 Family of Cytochrome P450 Genes.

    PubMed

    Ishak, Intan H; Kamgang, Basile; Ibrahim, Sulaiman S; Riveron, Jacob M; Irving, Helen; Wondji, Charles S

    2017-01-01

    Dengue control and prevention rely heavily on insecticide-based interventions. However, insecticide resistance in the dengue vector Aedes aegypti threatens the continued effectiveness of these tools. The molecular basis of the resistance remains uncharacterised in many endemic countries including Malaysia, preventing the design of evidence-based resistance management. Here, we investigated the underlying molecular basis of multiple insecticide resistance in Ae. aegypti populations across Malaysia, detecting the major genes driving the metabolic resistance. Genome-wide microarray-based transcription analysis was carried out to detect the genes associated with metabolic resistance in these populations. Comparisons of the susceptible New Orleans strain to three non-exposed multiple insecticide resistant field strains (Penang, Kuala Lumpur and Kota Bharu) detected 2605, 1480 and 425 differentially expressed transcripts respectively (fold-change > 2 and p-value ≤ 0.05). 204 genes were commonly over-expressed with monooxygenase P450 genes (CYP9J27, CYP6CB1, CYP9J26 and CYP9M4) consistently the most up-regulated detoxification genes in all populations, indicating that they possibly play an important role in the resistance. In addition, glutathione S-transferases, carboxylesterases and other gene families commonly associated with insecticide resistance were also over-expressed. Gene Ontology (GO) enrichment analysis indicated an over-representation of GO terms linked to resistance such as monooxygenases, carboxylesterases, glutathione S-transferases and heme-binding. Polymorphism analysis of CYP9J27 sequences revealed a high level of polymorphism (except in Joho Bharu), suggesting a limited directional selection on this gene. In silico analysis of CYP9J27 activity through modelling and docking simulations suggested that this gene is involved in the multiple resistance in Malaysian populations as it is predicted to metabolise pyrethroids, DDT and bendiocarb. The predominant over-expression of cytochrome P450s suggests that synergist-based (PBO) control tools could be utilised to improve control of this major dengue vector across Malaysia.
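
    The selection step described above (keep transcripts with fold-change > 2 and p-value ≤ 0.05 in each resistant strain, then find those shared by all strains) can be sketched as follows. The transcript identifiers and numbers are made-up toy values, and the function name is hypothetical; nothing here reproduces the study's data or pipeline.

```python
def over_expressed(results, fc_min=2.0, p_max=0.05):
    """Return the IDs whose fold-change exceeds fc_min with p-value <= p_max.

    `results` maps a transcript ID to a (fold_change, p_value) pair.
    """
    return {
        transcript
        for transcript, (fold_change, p_value) in results.items()
        if fold_change > fc_min and p_value <= p_max
    }


# Toy comparisons of three resistant strains against a susceptible strain.
penang = {"t1": (5.0, 0.001), "t2": (2.5, 0.04), "t3": (1.5, 0.001)}
kuala_lumpur = {"t1": (3.0, 0.01), "t2": (2.1, 0.20), "t4": (4.0, 0.03)}
kota_bharu = {"t1": (2.2, 0.05), "t4": (8.0, 0.002)}

per_strain = [over_expressed(r) for r in (penang, kuala_lumpur, kota_bharu)]

# Transcripts over-expressed in every strain: the intersection of the sets.
common = set.intersection(*per_strain)
```

    Taking the intersection rather than the union mirrors the abstract's emphasis on genes that are over-expressed in all populations.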

  18. Identification of the Gene for Scleroderma in the Tsk/2 Mouse Strain: Implications for Human Scleroderma Pathogenesis and Subset Distinctions

    DTIC Science & Technology

    2012-07-01

    Annual report, 1 Jul 2011 - 30 Jun 2012. This project is focused on an animal model of the human disease, systemic sclerosis ... earliest indicator of tight-skin in the tissue. Subject terms: animal model, systemic sclerosis, scleroderma, Tsk2/+, fibrosis, gene, genetics, TGFβ ... the multiple clinical parameters of fibrotic disease from birth onward. Milestones were assigned to this proposal, with tasks to be

  19. Animal and in silico models for the study of sarcomeric cardiomyopathies

    PubMed Central

    Duncker, Dirk J.; Bakkers, Jeroen; Brundel, Bianca J.; Robbins, Jeff; Tardiff, Jil C.; Carrier, Lucie

    2015-01-01

    Over the past decade, our understanding of cardiomyopathies has improved dramatically, due to improvements in screening and detection of gene defects in the human genome as well as a variety of novel animal models (mouse, zebrafish, and drosophila) and in silico computational models. These novel experimental tools have created a platform that is highly complementary to the naturally occurring cardiomyopathies in cats and dogs that had been available for some time. A fully integrative approach, which incorporates all these modalities, is likely required for significant steps forward in understanding the molecular underpinnings and pathogenesis of cardiomyopathies. Finally, novel technologies, including CRISPR/Cas9, which have already been proved to work in zebrafish, are currently being employed to engineer sarcomeric cardiomyopathy in larger animals, including pigs and non-human primates. In the mouse, the increased speed with which these techniques can be employed to engineer precise ‘knock-in’ models that previously took years to make via multiple rounds of homologous recombination-based gene targeting promises multiple and precise models of human cardiac disease for future study. Such novel genetically engineered animal models recapitulating human sarcomeric protein defects will help bridge the gap in translating therapeutic targets from small animal and in silico models to the human patient with sarcomeric cardiomyopathy. PMID:25600962

  20. Rofecoxib modulates multiple gene expression pathways in a clinical model of acute inflammatory pain

    PubMed Central

    Wang, Xiao-Min; Wu, Tian-Xia; Hamza, May; Ramsay, Edward S.; Wahl, Sharon M.; Dionne, Raymond A.

    2007-01-01

    New insights into the biological properties of cyclooxygenase-2 (COX-2) and its response pathway challenge the hypothesis that COX-2 is simply pro-inflammatory and inhibition of COX-2 solely prevents the development of inflammation and ameliorates inflammatory pain. The present study performed a comprehensive analysis of gene/protein expression induced by a selective inhibitor of COX-2, rofecoxib, compared with a non-selective COX inhibitor, ibuprofen, and placebo in a clinical model of acute inflammatory pain (the surgical extraction of impacted third molars) using microarray analysis followed by quantitative RT-PCR verification and Western blotting. Inhibition of COX-2 modulated gene expression related to inflammation and pain, the arachidonic acid pathway, apoptosis/angiogenesis, cell adhesion and signal transduction. Compared to placebo, rofecoxib treatment increased the gene expression of ANXA3 (annexin 3), SOD2 (superoxide dismutase 2), SOCS3 (suppressor of cytokine signaling 3) and IL1RN (IL1 receptor antagonist) which are associated with inhibition of phospholipase A2 and suppression of cytokine signaling cascades, respectively. Both rofecoxib and ibuprofen treatment increased the gene expression of the pro-inflammatory mediators, IL6 and CCL2 (chemokine C-C motif ligand 2), following tissue injury compared to the placebo treatment. These results indicate a complex role for COX-2 in the inflammatory cascade in addition to the well-characterized COX-dependent pathway, as multiple pathways are also involved in rofecoxib-induced anti-inflammatory and analgesic effects at the gene expression level. These findings may also suggest an alternative hypothesis for the adverse effects attributed to selective inhibition of COX-2. PMID:17070997

  1. Effect of promoter architecture on the cell-to-cell variability in gene expression.

    PubMed

    Sanchez, Alvaro; Garcia, Hernan G; Jones, Daniel; Phillips, Rob; Kondev, Jané

    2011-03-01

    According to recent experimental evidence, promoter architecture, defined by the number, strength and regulatory role of the operators that control transcription, plays a major role in determining the level of cell-to-cell variability in gene expression. These quantitative experiments call for a corresponding modeling effort that addresses the question of how changes in promoter architecture affect variability in gene expression in a systematic rather than case-by-case fashion. In this article we make such a systematic investigation, based on a microscopic model of gene regulation that incorporates stochastic effects. In particular, we show how operator strength and operator multiplicity affect this variability. We examine different modes of transcription factor binding to complex promoters (cooperative, independent, simultaneous) and how each of these affects the level of variability in transcriptional output from cell-to-cell. We propose that direct comparison between in vivo single-cell experiments and theoretical predictions for the moments of the probability distribution of mRNA number per cell can be used to test kinetic models of gene regulation. The emphasis of the discussion is on prokaryotic gene regulation, but our analysis can be extended to eukaryotic cells as well.
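
    To make the link between promoter kinetics and variability concrete, the sketch below evaluates the well-known closed-form mean and Fano factor of the two-state (telegraph) promoter, the simplest architecture of the kind discussed above. It is a textbook special case rather than the article's general framework, and the function and parameter names are chosen here for illustration.

```python
def two_state_mrna_moments(k_on, k_off, r, gamma):
    """Steady-state mean and Fano factor of mRNA for a two-state promoter.

    k_on  : rate at which the promoter becomes active
    k_off : rate at which the promoter becomes inactive
    r     : transcription rate while active
    gamma : mRNA degradation rate
    """
    switching = k_on + k_off
    p_active = k_on / switching
    mean = (r / gamma) * p_active
    # Fano factor = variance / mean; equals 1 for a Poisson distribution.
    fano = 1.0 + (r * k_off) / (switching * (switching + gamma))
    return mean, fano


# A promoter that is rarely active (small k_on, e.g. a strongly bound
# repressor) is noisier than one that re-activates quickly.
_, fano_weak = two_state_mrna_moments(k_on=1.0, k_off=1.0, r=10.0, gamma=1.0)
_, fano_strong = two_state_mrna_moments(k_on=0.1, k_off=1.0, r=10.0, gamma=1.0)
```

    Because the Fano factor is a ratio of the first two moments of the mRNA distribution, it is the kind of quantity that can be compared directly with single-cell mRNA counts, in the spirit of the comparison the abstract proposes.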

  2. Effect of Promoter Architecture on the Cell-to-Cell Variability in Gene Expression

    PubMed Central

    Sanchez, Alvaro; Garcia, Hernan G.; Jones, Daniel; Phillips, Rob; Kondev, Jané

    2011-01-01

    According to recent experimental evidence, promoter architecture, defined by the number, strength and regulatory role of the operators that control transcription, plays a major role in determining the level of cell-to-cell variability in gene expression. These quantitative experiments call for a corresponding modeling effort that addresses the question of how changes in promoter architecture affect variability in gene expression in a systematic rather than case-by-case fashion. In this article we make such a systematic investigation, based on a microscopic model of gene regulation that incorporates stochastic effects. In particular, we show how operator strength and operator multiplicity affect this variability. We examine different modes of transcription factor binding to complex promoters (cooperative, independent, simultaneous) and how each of these affects the level of variability in transcriptional output from cell-to-cell. We propose that direct comparison between in vivo single-cell experiments and theoretical predictions for the moments of the probability distribution of mRNA number per cell can be used to test kinetic models of gene regulation. The emphasis of the discussion is on prokaryotic gene regulation, but our analysis can be extended to eukaryotic cells as well. PMID:21390269

  3. The origin of parasitism gene in nematodes: evolutionary analysis through the construction of domain trees.

    PubMed

    Yang, Yizi; Luo, Damin

    2013-01-01

    Inferring evolutionary history of parasitism genes is important to understand how evolutionary mechanisms affect the occurrences of parasitism genes. In this study, we constructed multiple domain trees for parasitism genes and genes under free-living conditions. Further analyses of horizontal gene transfer (HGT)-like phylogenetic incongruences, duplications, and speciations were performed based on these trees. By comparing these analyses, the contributions of pre-adaptations were found to be more important to the evolution of parasitism genes than those of duplications, and pre-adaptations are as crucial as previously reported HGTs to parasitism. Furthermore, speciation may also affect the evolution of parasitism genes. In addition, Pristionchus pacificus was suggested to be a common model organism for studies of parasitic nematodes, including root-knot species. These analyses provided information regarding mechanisms that may have contributed to the evolution of parasitism genes.

  4. A Comprehensive Analysis of Nuclear-Encoded Mitochondrial Genes in Schizophrenia.

    PubMed

    Gonçalves, Vanessa F; Cappi, Carolina; Hagen, Christian M; Sequeira, Adolfo; Vawter, Marquis P; Derkach, Andriy; Zai, Clement C; Hedley, Paula L; Bybjerg-Grauholm, Jonas; Pouget, Jennie G; Cuperfain, Ari B; Sullivan, Patrick F; Christiansen, Michael; Kennedy, James L; Sun, Lei

    2018-05-01

    The genetic risk factors of schizophrenia (SCZ), a severe psychiatric disorder, are not yet fully understood. Multiple lines of evidence suggest that mitochondrial dysfunction may play a role in SCZ, but comprehensive association studies are lacking. We hypothesized that variants in nuclear-encoded mitochondrial genes influence susceptibility to SCZ. We conducted gene-based and gene-set analyses using summary association results from the Psychiatric Genomics Consortium Schizophrenia Phase 2 (PGC-SCZ2) genome-wide association study comprising 35,476 cases and 46,839 control subjects. We applied the MAGMA method to three sets of nuclear-encoded mitochondrial genes: oxidative phosphorylation genes, other nuclear-encoded mitochondrial genes, and genes involved in nucleus-mitochondria crosstalk. Furthermore, we conducted a replication study using the iPSYCH SCZ sample of 2290 cases and 21,621 control subjects. In the PGC-SCZ2 sample, 1186 mitochondrial genes were analyzed, among which 159 had p values < .05 and 19 remained significant after multiple testing correction. A meta-analysis of 818 genes combining the PGC-SCZ2 and iPSYCH samples resulted in 104 nominally significant and nine significant genes, suggesting a polygenic model for the nuclear-encoded mitochondrial genes. Gene-set analysis, however, did not show significant results. In an in silico protein-protein interaction network analysis, 14 mitochondrial genes interacted directly with 158 SCZ risk genes identified in PGC-SCZ2 (permutation p = .02), and aldosterone signaling in epithelial cells and mitochondrial dysfunction pathways appeared to be overrepresented in this network of mitochondrial and SCZ risk genes. This study provides evidence that specific aspects of mitochondrial function may play a role in SCZ, but we did not observe its broad involvement even using a large sample. Copyright © 2018 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  5. SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees

    PubMed Central

    Mallo, Diego; De Oliveira Martins, Leonardo; Posada, David

    2016-01-01

    We present a fast and flexible software package—SimPhy—for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases. PMID:26526427

  6. NATRIURETIC PEPTIDE SYSTEM GENE VARIANTS ARE ASSOCIATED WITH VENTRICULAR DYSFUNCTION AFTER CORONARY ARTERY BYPASS GRAFTING

    PubMed Central

    Fox, Amanda A.; Collard, Charles D.; Shernan, Stanton K.; Seidman, Christine E.; Seidman, Jonathan G.; Liu, Kuang-Yu; Muehlschlegel, Jochen D.; Perry, Tjorvi E.; Aranki, Sary F.; Lange, Christoph; Herman, Daniel S.; Meitinger, Thomas; Lichtner, Peter; Body, Simon C.

    2009-01-01

    Background: Ventricular dysfunction (VnD) after primary coronary artery bypass grafting is associated with increased hospital stay and mortality. Natriuretic peptides have compensatory vasodilatory, natriuretic and paracrine influences on myocardial failure and ischemia. We hypothesized that natriuretic peptide system gene variants independently predict risk of VnD after primary coronary artery bypass grafting. Methods: 1164 patients undergoing primary coronary artery bypass grafting with cardiopulmonary bypass at two institutions were prospectively enrolled. After prospectively defined exclusions, 697 Caucasian patients (76 with VnD) were analyzed. VnD was defined as need for ≥ 2 new inotropes and/or new mechanical ventricular support after coronary artery bypass grafting. 139 haplotype-tagging SNPs within 7 genes (NPPA; NPPB; NPPC; NPR1; NPR2; NPR3; CORIN) were genotyped. SNPs univariately associated with VnD were entered into logistic regression models adjusting for clinical covariates predictive of VnD. To control for multiple comparisons, permutation analyses were conducted for all SNP associations. Results: After adjusting for clinical covariates and multiple comparisons within each gene, seven NPPA/NPPB SNPs (rs632793, rs6668352, rs549596, rs198388, rs198389, rs6676300, rs1009592) were associated with decreased risk of postoperative VnD (additive model; odds ratios 0.44–0.55; P = 0.010–0.036), and four NPR3 SNPs (rs700923, rs16890196, rs765199, rs700926) were associated with increased risk of postoperative VnD (recessive model; odds ratios 3.89–4.28; P = 0.007–0.034). Conclusions: Genetic variation within the NPPA/NPPB and NPR3 genes is associated with risk of VnD after primary coronary artery bypass grafting. Knowledge of such genotypic predictors may result in better understanding of the molecular mechanisms underlying postoperative VnD. PMID:19326473
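
    The abstract reports odds ratios under additive and recessive genetic models without spelling out the coding. The sketch below shows the conventional way a SNP genotype is turned into a regression predictor under each model and how a per-unit odds ratio then scales the odds; the function names are hypothetical and the coding is the standard convention, assumed rather than taken from the study.

```python
def encode_genotype(minor_allele_count, model):
    """Code a genotype (0, 1 or 2 minor alleles) as a regression predictor."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        # Each extra copy of the minor allele adds one unit.
        return minor_allele_count
    if model == "recessive":
        # Only carriers of two copies are coded as exposed.
        return 1 if minor_allele_count == 2 else 0
    if model == "dominant":
        # Carriers of at least one copy are coded as exposed.
        return 1 if minor_allele_count >= 1 else 0
    raise ValueError(f"unknown model: {model}")


def odds_multiplier(odds_ratio_per_unit, predictor_value):
    """Factor by which logistic-regression odds change for a predictor value."""
    return odds_ratio_per_unit ** predictor_value
```

    For example, under the additive coding a per-allele odds ratio of 0.5 would imply that the odds are multiplied by 0.25 for a carrier of two copies, whereas under the recessive coding heterozygotes are treated the same as non-carriers.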

  7. Radiogenomics to characterize regional genetic heterogeneity in glioblastoma

    PubMed Central

    Hu, Leland S.; Ning, Shuluo; Eschbacher, Jennifer M.; Baxter, Leslie C.; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C.; Peng, Sen; Smith, Kris A.; Nakaji, Peter; Karis, John P.; Quarles, C. Chad; Wu, Teresa; Loftus, Joseph C.; Jenkins, Robert B.; Sicotte, Hugues; Kollmeyer, Thomas M.; O'Neill, Brian P.; Elmquist, William; Hoxworth, Joseph M.; Frakes, David; Sarkaria, Jann; Swanson, Kristin R.; Tran, Nhan L.; Li, Jing; Mitchell, J. Ross

    2017-01-01

    Background Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. Methods We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so called brain-around-tumor, [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out-cross-validation (LOOCV). Results We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. Highest accuracies were observed for PDGFRA (77.1%), EGFR (75%), CDKN2A (87.5%), and RB1 (87.5%), while lowest accuracy was observed in TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). Conclusion MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. PMID:27502248
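
    The leave-one-out cross-validation (LOOCV) step mentioned above can be sketched as follows. This is a minimal illustration, assuming a single imaging feature and a depth-1 decision tree (a stump) in place of the multivariate decision-tree models used in the study. The function names and toy values are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (imaging feature value, binary copy-number status)


def fit_stump(train: List[Sample]) -> Callable[[float], int]:
    """Fit a depth-1 decision tree: pick the threshold/polarity with fewest training errors."""
    values = sorted({x for x, _ in train})
    # Candidate thresholds are midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in candidates:
        for polarity in (1, 0):  # label predicted when x > t
            errors = sum((polarity if x > t else 1 - polarity) != y for x, y in train)
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x > threshold else 1 - polarity


def loocv_accuracy(samples: List[Sample], fit: Callable) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    if len(samples) < 2:
        raise ValueError("LOOCV needs at least two samples")
    correct = 0
    for i, (x, y) in enumerate(samples):
        model = fit(samples[:i] + samples[i + 1:])
        correct += int(model(x) == y)
    return correct / len(samples)


# Hypothetical toy data: feature value per biopsy and whether the gene is altered.
toy_biopsies = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
                (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
print(loocv_accuracy(toy_biopsies, fit_stump))
```

    With only 48 biopsies, holding out one sample at a time uses nearly all the data for training in every fold, which is presumably why LOOCV was preferred over a fixed train/test split.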

  8. The molecular pathogenesis of schwannomatosis, a paradigm for the co-involvement of multiple tumour suppressor genes in tumorigenesis.

    PubMed

    Kehrer-Sawatzki, Hildegard; Farschtschi, Said; Mautner, Victor-Felix; Cooper, David N

    2017-02-01

    Schwannomatosis is characterized by the predisposition to develop multiple schwannomas and, less commonly, meningiomas. Despite the clinical overlap with neurofibromatosis type 2 (NF2), schwannomatosis is not caused by germline NF2 gene mutations. Instead, germline mutations of either the SMARCB1 or LZTR1 tumour suppressor genes have been identified in 86% of familial and 40% of sporadic schwannomatosis patients. In contrast to patients with rhabdoid tumours, which are due to complete loss-of-function SMARCB1 mutations, individuals with schwannomatosis harbour predominantly hypomorphic SMARCB1 mutations which give rise to the synthesis of mutant proteins with residual function that do not cause rhabdoid tumours. Although biallelic mutations of SMARCB1 or LZTR1 have been detected in the tumours of patients with schwannomatosis, the classical two-hit model of tumorigenesis is insufficient to account for schwannoma growth, since NF2 is also frequently inactivated in these tumours. Consequently, tumorigenesis in schwannomatosis must involve the mutation of at least two different tumour suppressor genes, an occurrence frequently mediated by loss of heterozygosity of large parts of chromosome 22q harbouring not only SMARCB1 and LZTR1 but also NF2. Thus, schwannomatosis is paradigmatic for a tumour predisposition syndrome caused by the concomitant mutational inactivation of two or more tumour suppressor genes. This review provides an overview of current models of tumorigenesis and mutational patterns underlying schwannomatosis that will ultimately help to explain the complex clinical presentation of this rare disease.

  9. Gene genealogies for genetic association mapping, with application to Crohn's disease

    PubMed Central

    Burkett, Kelly M.; Greenwood, Celia M. T.; McNeney, Brad; Graham, Jinko

    2013-01-01

    A gene genealogy describes relationships among haplotypes sampled from a population. Knowledge of the gene genealogy for a set of haplotypes is useful for estimation of population genetic parameters and it also has potential application in finding disease-predisposing genetic variants. As the true gene genealogy is unknown, Markov chain Monte Carlo (MCMC) approaches have been used to sample genealogies conditional on data at multiple genetic markers. We previously implemented an MCMC algorithm to sample from an approximation to the distribution of the gene genealogy conditional on haplotype data. Our approach samples ancestral trees, recombination and mutation rates at a genomic focal point. In this work, we describe how our sampler can be used to find disease-predisposing genetic variants in samples of cases and controls. We use a tree-based association statistic that quantifies the degree to which case haplotypes are more closely related to each other around the focal point than control haplotypes, without relying on a disease model. As the ancestral tree is a latent variable, so is the tree-based association statistic. We show how the sampler can be used to estimate the posterior distribution of the latent test statistic and corresponding latent p-values, which together comprise a fuzzy p-value. We illustrate the approach on a publicly-available dataset from a study of Crohn's disease that consists of genotypes at multiple SNP markers in a small genomic region. We estimate the posterior distribution of the tree-based association statistic and the recombination rate at multiple focal points in the region. Reassuringly, the posterior mean recombination rates estimated at the different focal points are consistent with previously published estimates. The tree-based association approach finds multiple sub-regions where the case haplotypes are more genetically related than the control haplotypes, suggesting that there may be one or multiple disease-predisposing loci. PMID:24348515

  10. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge has become a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem, and a shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

  11. In Vivo Imaging of Transgenic Gene Expression in Individual Retinal Progenitors in Chimeric Zebrafish Embryos to Study Cell Nonautonomous Influences.

    PubMed

    Dudczig, Stefanie; Currie, Peter D; Poggi, Lucia; Jusuf, Patricia R

    2017-03-22

    The genetic and technical strengths have made the zebrafish vertebrate a key model organism in which the consequences of gene manipulations can be traced in vivo throughout the rapid developmental period. Multiple processes can be studied including cell proliferation, gene expression, cell migration and morphogenesis. Importantly, the generation of chimeras through transplantations can be easily performed, allowing mosaic labeling and tracking of individual cells under the influence of the host environment. For example, by combining functional gene manipulations of the host embryo (e.g., through morpholino microinjection) and live imaging, the effects of extrinsic, cell nonautonomous signals (provided by the genetically modified environment) on individual transplanted donor cells can be assessed. Here we demonstrate how this approach is used to compare the onset of fluorescent transgene expression as a proxy for the timing of cell fate determination in different genetic host environments. In this article, we provide the protocol for microinjecting zebrafish embryos to mark donor cells and to cause gene knockdown in host embryos, a description of the transplantation technique used to generate chimeric embryos, and the protocol for preparing and running in vivo time-lapse confocal imaging of multiple embryos. In particular, performing multiposition imaging is crucial when comparing timing of events such as the onset of gene expression. This requires data collection from multiple control and experimental embryos processed simultaneously. Such an approach can easily be extended for studies of extrinsic influences in any organ or tissue of choice accessible to live imaging, provided that transplantations can be targeted easily according to established embryonic fate maps.

  12. Gene Expression Profiling Predicts the Development of Oral Cancer

    PubMed Central

    Saintigny, Pierre; Zhang, Li; Fan, You-Hong; El-Naggar, Adel K.; Papadimitrakopoulou, Vali; Feng, Lei; Lee, J. Jack; Kim, Edward S.; Hong, Waun Ki; Mao, Li

    2011-01-01

    Patients with oral preneoplastic lesions (OPL) have a high risk of developing oral cancer. Although certain risk factors such as smoking status and histology are known, our ability to predict oral cancer risk remains poor. The study objective was to determine the value of gene expression profiling in predicting oral cancer development. Gene expression profiles were measured in 86 of 162 OPL patients who were enrolled in a clinical chemoprevention trial that used the incidence of oral cancer development as a prespecified endpoint. The median follow-up time was 6.08 years and 35 of the 86 patients developed oral cancer over the course. Gene expression profiles were associated with oral cancer-free survival and used to develop multivariate predictive models for oral cancer prediction. We developed a 29-transcript predictive model which showed marked improvement in terms of prediction accuracy (with 8% predicting error rate) over the models using previously known clinico-pathological risk factors. Based on the gene expression profile data, we also identified 2182 transcripts significantly associated with oral cancer risk associated genes (P-value<0.01, single variate Cox proportional hazards model). Functional pathway analysis revealed proteasome machinery, MYC, and ribosome components as the top gene sets associated with oral cancer risk. In multiple independent datasets, the expression profiles of the genes can differentiate head and neck cancer from normal mucosa. Our results show that gene expression profiles may improve the prediction of oral cancer risk in OPL patients and the significant genes identified may serve as potential targets for oral cancer chemoprevention. PMID:21292635

  13. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    PubMed

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.
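
    To make the idea of modelling a five-level response directly more concrete, the sketch below computes category probabilities under a cumulative-logit (proportional-odds) ordinal model. It covers only the likelihood side. The heavy-tailed Cauchy prior and the quasi-Newton fitting described in the abstract are not implemented, and the coefficient, cutpoints, and function names are made-up values for illustration.

```python
from math import exp
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + exp(-z))


def ordinal_probs(x: Sequence[float], beta: Sequence[float],
                  cutpoints: Sequence[float]) -> List[float]:
    """Category probabilities under P(Y <= k) = sigmoid(cutpoint_k - x . beta).

    `cutpoints` must be strictly increasing; K - 1 cutpoints give K ordered levels.
    """
    eta = sum(b * v for b, v in zip(beta, x))                 # linear predictor
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give per-level probabilities.
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]


# Hypothetical one-gene model with four cutpoints, i.e. five response levels.
beta = [1.5]
cutpoints = [-2.0, -1.0, 1.0, 2.0]
low_expression = ordinal_probs([0.0], beta, cutpoints)
high_expression = ordinal_probs([2.0], beta, cutpoints)
print(low_expression)
print(high_expression)
```

    Under these assumed values, raising the predictor shifts probability mass toward the higher response levels while preserving the ordering of the categories, which is the information lost when the response is collapsed to two groups.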

  14. The endogenous and reactive depression subtypes revisited: integrative animal and human studies implicate multiple distinct molecular mechanisms underlying major depressive disorder.

    PubMed

    Malki, Karim; Keers, Robert; Tosto, Maria Grazia; Lourdusamy, Anbarasu; Carboni, Lucia; Domenici, Enrico; Uher, Rudolf; McGuffin, Peter; Schalkwyk, Leonard C

    2014-05-07

    Traditional diagnoses of major depressive disorder (MDD) suggested that the presence or absence of stress prior to onset results in either 'reactive' or 'endogenous' subtypes of the disorder, respectively. Several lines of research suggest that the biological underpinnings of 'reactive' or 'endogenous' subtypes may also differ, resulting in differential response to treatment. We investigated this hypothesis by comparing the gene-expression profiles of three animal models of 'reactive' and 'endogenous' depression. We then translated these findings to clinical samples using a human post-mortem mRNA study. Affymetrix mouse whole-genome oligonucleotide arrays were used to measure gene expression from hippocampal tissues of 144 mice from the Genome-based Therapeutic Drugs for Depression (GENDEP) project. The study used four inbred mouse strains and two depressogenic 'stress' protocols (maternal separation and Unpredictable Chronic Mild Stress) to model 'reactive' depression. Stress-related mRNA differences in mouse were compared with a parallel mRNA study using Flinders Sensitive and Resistant rat lines as a model of 'endogenous' depression. Convergent genes differentially expressed across the animal studies were used to inform candidate gene selection in a human mRNA post-mortem case control study from the Stanley Brain Consortium. In the mouse 'reactive' model, the expression of 350 genes changed in response to early stresses and 370 in response to late stresses. A minimal genetic overlap (less than 8.8%) was detected in response to both stress protocols, but 30% of these genes (21) were also differentially regulated in the 'endogenous' rat study. This overlap is significantly greater than expected by chance. The VAMP-2 gene, differentially expressed across the rodent studies, was also significantly altered in the human study after correcting for multiple testing. 
    Our results suggest that 'endogenous' and 'reactive' subtypes of depression are associated with largely distinct changes in gene expression. However, they also suggest that the molecular signature of 'reactive' depression caused by early stressors differs considerably from that of 'reactive' depression caused by late stressors. A small set of genes was consistently dysregulated across each paradigm and in post-mortem brain tissue of depressed patients, suggesting a final common pathway to the disorder. These genes included the VAMP-2 gene, which has previously been associated with Axis-I disorders including MDD, bipolar depression, schizophrenia and with antidepressant treatment response. We also discuss the implications of our findings for disease classification, personalized medicine and case-control studies of MDD.

  15. Theory of microbial genome evolution

    NASA Astrophysics Data System (ADS)

    Koonin, Eugene

    Bacteria and archaea have small genomes tightly packed with protein-coding genes. This compactness is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. By fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. Thus, the number of genes in prokaryotic genomes seems to reflect the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias. New genes acquired by microbial genomes, on average, appear to be adaptive. Evolution of bacterial and archaeal genomes involves extensive horizontal gene transfer and gene loss. Many microbes have open pangenomes, where each newly sequenced genome contains more than 10% 'ORFans', genes without detectable homologues in other species. A simple, steady-state evolutionary model reveals two sharply distinct classes of microbial genes, one of which (ORFans) is characterized by effectively instantaneous gene replacement, whereas the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of at least a billion distinct genes in the prokaryotic genomic universe.

  16. Dynamic evolution of the GnRH receptor gene family in vertebrates.

    PubMed

    Williams, Barry L; Akazome, Yasuhisa; Oka, Yoshitaka; Eisthen, Heather L

    2014-10-25

    Elucidating the mechanisms underlying coevolution of ligands and receptors is an important challenge in molecular evolutionary biology. Peptide hormones and their receptors are excellent models for such efforts, given the relative ease of examining evolutionary changes in genes encoding for both molecules. Most vertebrates possess multiple genes for both the decapeptide gonadotropin releasing hormone (GnRH) and for the GnRH receptor. The evolutionary history of the receptor family, including ancestral copy number and timing of duplications and deletions, has been the subject of controversy. We report here for the first time sequences of three distinct GnRH receptor genes in salamanders (axolotls, Ambystoma mexicanum), which are orthologous to three GnRH receptors from ranid frogs. To understand the origin of these genes within the larger evolutionary context of the gene family, we performed phylogenetic analyses and probabilistic protein homology searches of GnRH receptor genes in vertebrates and their near relatives. Our analyses revealed four points that alter previous views about the evolution of the GnRH receptor gene family. First, the "mammalian" pituitary type GnRH receptor, which is the sole GnRH receptor in humans and previously presumed to be highly derived because it lacks the cytoplasmic C-terminal domain typical of most G-protein coupled receptors, is actually an ancient gene that originated in the common ancestor of jawed vertebrates (Gnathostomata). Second, unlike previous studies, we classify vertebrate GnRH receptors into five subfamilies. Third, the order of subfamily origins is the inverse of previous proposed models. Fourth, the number of GnRH receptor genes has been dynamic in vertebrates and their ancestors, with multiple duplications and losses. Our results provide a novel evolutionary framework for generating hypotheses concerning the functional importance of structural characteristics of vertebrate GnRH receptors. 
    We show that five subfamilies of vertebrate GnRH receptors evolved early in the vertebrate phylogeny, followed by several independent instances of gene loss. Chief among cases of gene loss are humans, best described as degenerate with respect to GnRH receptors because we retain only a single, ancient gene.

  17. Proteasome, transporter associated with antigen processing, and class I genes in the nurse shark Ginglymostoma cirratum: evidence for a stable class I region and MHC haplotype lineages.

    PubMed

    Ohta, Yuko; McKinney, E Churchill; Criscitiello, Michael F; Flajnik, Martin F

    2002-01-15

    Cartilaginous fish (e.g., sharks) are derived from the oldest vertebrate ancestor having an adaptive immune system, and thus are key models for examining MHC evolution. Previously, family studies in two shark species showed that classical class I (UAA) and class II genes are genetically linked. In this study, we show that proteasome genes LMP2 and LMP7, shark-specific LMP7-like, and the TAP1/2 genes are linked to class I/II. Functional LMP7 and LMP7-like genes, as well as multiple LMP2 genes or gene fragments, are found only in some sharks, suggesting that different sets of peptides might be generated depending upon inherited MHC haplotypes. Cosmid clones bearing the MHC-linked classical class I genes were isolated and shown to contain proteasome gene fragments. A non-MHC-linked LMP7 gene also was identified on another cosmid, but only two exons of this gene were detected, closely linked to a class I pseudogene (UAA-NC2); this region probably resulted from a recent duplication and translocation from the functional MHC. Tight linkage of proteasome and class I genes, in comparison with gene organizations of other vertebrates, suggests a primordial MHC organization. Another nonclassical class I gene (UAA-NC1) was detected that is linked neither to MHC nor to UAA-NC2; its high level of sequence similarity to UAA suggests that UAA-NC1 also was recently derived from UAA and translocated from MHC. These data further support the principle of a primordial class I region with few class I genes. Finally, multiple paternities in one family were demonstrated, with potential segregation distortions.

  18. Simultaneous and Sequential Integration by Cre/loxP Site-Specific Recombination in Saccharomyces cerevisiae.

    PubMed

    Choi, Ho-Jung; Kim, Yeon-Hee

    2018-05-28

    A Cre/loxP-δ-integration system was developed to allow sequential and simultaneous integration of a multiple gene expression cassette in Saccharomyces cerevisiae. To allow repeated integrations, the reusable Candida glabrata MARKER (CgMARKER) carrying loxP sequences was used, and the integrated CgMARKER was efficiently removed by inducing Cre recombinase. The XYLP and XYLB genes encoding endoxylanase and β-xylosidase, respectively, were used as model genes for xylan metabolism in this system, and the copy number of these genes was increased to 15.8 and 16.9 copies/cell, respectively, by repeated integration. This integration system is a promising approach for the easy construction of yeast strains with enhanced metabolic pathways through multicopy gene expression.

  19. Evolutionary Origins of Cancer Driver Genes and Implications for Cancer Prognosis

    PubMed Central

    Chu, Xin-Yi; Zhou, Xiong-Hui; Cui, Ze-Jia; Zhang, Hong-Yu

    2017-01-01

    The cancer atavistic theory suggests that carcinogenesis is a reverse evolution process. It is thus of great interest to explore the evolutionary origins of cancer driver genes and the relevant mechanisms underlying the carcinogenesis. Moreover, the evolutionary features of cancer driver genes could be helpful in selecting cancer biomarkers from high-throughput data. In this study, through analyzing the cancer endogenous molecular networks, we revealed that the subnetwork originating from eukaryota could control the unlimited proliferation of cancer cells, and the subnetwork originating from eumetazoa could recapitulate the other hallmarks of cancer. In addition, investigations based on multiple datasets revealed that cancer driver genes were enriched in genes originating from eukaryota, opisthokonta, and eumetazoa. These results have important implications for enhancing the robustness of cancer prognosis models through selecting the gene signatures by the gene age information. PMID:28708071

  20. Evolutionary Origins of Cancer Driver Genes and Implications for Cancer Prognosis.

    PubMed

    Chu, Xin-Yi; Jiang, Ling-Han; Zhou, Xiong-Hui; Cui, Ze-Jia; Zhang, Hong-Yu

    2017-07-14

    The cancer atavistic theory suggests that carcinogenesis is a reverse evolution process. It is thus of great interest to explore the evolutionary origins of cancer driver genes and the relevant mechanisms underlying the carcinogenesis. Moreover, the evolutionary features of cancer driver genes could be helpful in selecting cancer biomarkers from high-throughput data. In this study, through analyzing the cancer endogenous molecular networks, we revealed that the subnetwork originating from eukaryota could control the unlimited proliferation of cancer cells, and the subnetwork originating from eumetazoa could recapitulate the other hallmarks of cancer. In addition, investigations based on multiple datasets revealed that cancer driver genes were enriched in genes originating from eukaryota, opisthokonta, and eumetazoa. These results have important implications for enhancing the robustness of cancer prognosis models through selecting the gene signatures by the gene age information.

  1. Population structuring of multi-copy, antigen-encoding genes in Plasmodium falciparum

    PubMed Central

    Artzy-Randrup, Yael; Rorick, Mary M; Day, Karen; Chen, Donald; Dobson, Andrew P; Pascual, Mercedes

    2012-01-01

    The coexistence of multiple independently circulating strains in pathogen populations that undergo sexual recombination is a central question of epidemiology with profound implications for control. An agent-based model is developed that extends earlier ‘strain theory’ by addressing the var gene family of Plasmodium falciparum. The model explicitly considers the extensive diversity of multi-copy genes that undergo antigenic variation via sequential, mutually exclusive expression. It tracks the dynamics of all unique var repertoires in a population of hosts, and shows that even under high levels of sexual recombination, strain competition mediated through cross-immunity structures the parasite population into a subset of coexisting dominant repertoires of var genes whose degree of antigenic overlap depends on transmission intensity. Empirical comparison of patterns of genetic variation at antigenic and neutral sites supports this role for immune selection in structuring parasite diversity. DOI: http://dx.doi.org/10.7554/eLife.00093.001 PMID:23251784

  2. Biological data warehousing system for identifying transcriptional regulatory sites from gene expressions of microarray data.

    PubMed

    Tsou, Ann-Ping; Sun, Yi-Ming; Liu, Chia-Lin; Huang, Hsien-Da; Horng, Jorng-Tzong; Tsai, Meng-Feng; Liu, Baw-Juine

    2006-07-01

    Identification of transcriptional regulatory sites plays an important role in the investigation of gene regulation. For this purpose, we designed and implemented a data warehouse to integrate multiple heterogeneous biological data sources with data types such as text-file, XML, image, MySQL database model, and Oracle database model. The utility of the biological data warehouse in predicting transcriptional regulatory sites of coregulated genes was explored using a synexpression group derived from a microarray study. Both the binding sites of known transcription factors and predicted over-represented (OR) oligonucleotides were demonstrated for the gene group. The potential biological roles of both known nucleotides and one OR nucleotide were demonstrated using bioassays. Therefore, the results from the wet-lab experiments reinforce the power and utility of the data warehouse as an approach to the genome-wide search for important transcription regulatory elements that are the key to many complex biological systems.

  3. Disentangling the many layers of eukaryotic transcriptional regulation.

    PubMed

    Lelli, Katherine M; Slattery, Matthew; Mann, Richard S

    2012-01-01

    Regulation of gene expression in eukaryotes is an extremely complex process. In this review, we break down several critical steps, emphasizing new data and techniques that have expanded current gene regulatory models. We begin at the level of DNA sequence where cis-regulatory modules (CRMs) provide important regulatory information in the form of transcription factor (TF) binding sites. In this respect, CRMs function as instructional platforms for the assembly of gene regulatory complexes. We discuss multiple mechanisms controlling complex assembly, including cooperative DNA binding, combinatorial codes, and CRM architecture. The second section of this review places CRM assembly in the context of nucleosomes and condensed chromatin. We discuss how DNA accessibility and histone modifications contribute to TF function. Lastly, new advances in chromosomal mapping techniques have provided increased understanding of intra- and interchromosomal interactions. We discuss how these topological maps influence gene regulatory models.

  4. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation

    PubMed Central

    O'Connor, Timothy R.; Bailey, Timothy L.

    2014-01-01

    Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps. PMID:25200088
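
    The linking step described above (cross-tissue correlation between a histone modification at a CRM and expression of genes within 1 Mbp) might look roughly like the following. This is a simplified sketch, not the authors' implementation: the Pearson coefficient, the data layout, the function names, and all numeric values are assumptions chosen for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def link_crm_to_genes(
    crm_position: int,
    crm_signal: Sequence[float],
    genes: Dict[str, Tuple[int, Sequence[float]]],
    window: int = 1_000_000,
) -> List[Tuple[str, float]]:
    """Rank genes within `window` bp of the CRM by cross-tissue correlation.

    `crm_signal` holds one histone-mark value per tissue (e.g. H3K27ac);
    `genes` maps gene name -> (TSS position, expression per tissue, same tissue order).
    """
    scores = {
        name: pearson(crm_signal, expression)
        for name, (tss, expression) in genes.items()
        if abs(tss - crm_position) <= window
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for five tissues on one chromosome.
crm_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "geneA": (1_200_000, [2.0, 4.0, 6.0, 8.0, 10.0]),   # tracks the CRM signal
    "geneB": (1_050_000, [5.0, 3.0, 4.0, 1.0, 2.0]),    # nearest gene, anti-correlated
    "geneC": (3_500_000, [1.0, 2.0, 3.0, 4.0, 5.0]),    # correlated but beyond 1 Mbp
}
ranked = link_crm_to_genes(1_000_000, crm_signal, genes)
print(ranked)
```

    In this toy case the nearest gene (geneB) is not the best-correlated one, which illustrates how a correlation-based map can differ from a 'nearest neighbor' heuristic.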

  5. CFH Variants Affect Structural and Functional Brain Changes and Genetic Risk of Alzheimer's Disease.

    PubMed

    Zhang, Deng-Feng; Li, Jin; Wu, Huan; Cui, Yue; Bi, Rui; Zhou, He-Jiang; Wang, Hui-Zhen; Zhang, Chen; Wang, Dong; Kong, Qing-Peng; Li, Tao; Fang, Yiru; Jiang, Tianzi; Yao, Yong-Gang

    2016-03-01

    The immune response is highly active in Alzheimer's disease (AD). Identification of genetic risk contributed by immune genes to AD may provide essential insight for the prognosis, diagnosis, and treatment of this neurodegenerative disease. In this study, we performed a genetic screening for AD-related top immune genes identified in Europeans in a Chinese cohort, followed by a multiple-stage study focusing on Complement Factor H (CFH) gene. Effects of the risk SNPs on AD-related neuroimaging endophenotypes were evaluated through magnetic resonance imaging scan, and the effects on AD cerebrospinal fluid biomarkers (CSF) and CFH expression changes were measured in aged and AD brain tissues and AD cellular models. Our results showed that the AD-associated top immune genes reported in Europeans (CR1, CD33, CLU, and TREML2) have weak effects in Chinese, whereas CFH showed strong effects. In particular, rs1061170 (P(meta)=5.0 × 10⁻⁴) and rs800292 (P(meta)=1.3 × 10⁻⁵) showed robust associations with AD, which were confirmed in multiple world-wide sample sets (4317 cases and 16 795 controls). Rs1061170 (P=2.5 × 10⁻³) and rs800292 (P=4.7 × 10⁻⁴) risk-allele carriers have an increased entorhinal thickness in their young age and a higher atrophy rate as the disease progresses. Rs800292 risk-allele carriers have higher CSF tau and Aβ levels and severe cognitive decline. CFH expression level, which was affected by the risk-alleles, was increased in AD brains and cellular models. These comprehensive analyses suggested that CFH is an important immune factor in AD and affects multiple pathological changes in early life and during disease progress.

  6. Gene prioritization and clustering by multi-view text mining

    PubMed Central

    2010-01-01

    Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. Conclusions In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336

  7. Culture adaptation of malaria parasites selects for convergent loss-of-function mutants.

    PubMed

    Claessens, Antoine; Affara, Muna; Assefa, Samuel A; Kwiatkowski, Dominic P; Conway, David J

    2017-01-24

    Cultured human pathogens may differ significantly from source populations. To investigate the genetic basis of laboratory adaptation in malaria parasites, clinical Plasmodium falciparum isolates were sampled from patients and cultured in vitro for up to three months. Genome sequence analysis was performed on multiple culture time point samples from six monoclonal isolates, and single nucleotide polymorphism (SNP) variants emerging over time were detected. Out of a total of five positively selected SNPs, four represented nonsense mutations resulting in stop codons, three of these in a single ApiAP2 transcription factor gene, and one in SRPK1. To survey further for nonsense mutants associated with culture, genome sequences of eleven long-term laboratory-adapted parasite strains were examined, revealing four independently acquired nonsense mutations in two other ApiAP2 genes, and five in Epac. No mutants of these genes exist in a large database of parasite sequences from uncultured clinical samples. This implicates putative master regulator genes in which multiple independent stop codon mutations have convergently led to culture adaptation, affecting most laboratory lines of P. falciparum. Understanding the adaptive processes should guide development of experimental models, which could include targeted gene disruption to adapt fastidious malaria parasite species to culture.

  8. Punctuated Evolution of Prostate Cancer Genomes

    PubMed Central

    Baca, Sylvan C.; Prandi, Davide; Lawrence, Michael S.; Mosquera, Juan Miguel; Romanel, Alessandro; Drier, Yotam; Park, Kyung; Kitabayashi, Naoki; MacDonald, Theresa Y.; Ghandi, Mahmoud; Van Allen, Eliezer; Kryukov, Gregory V.; Sboner, Andrea; Theurillat, Jean-Philippe; Soong, T. David; Nickerson, Elizabeth; Auclair, Daniel; Tewari, Ashutosh; Beltran, Himisha; Onofrio, Robert C.; Boysen, Gunther; Guiducci, Candace; Barbieri, Christopher E.; Cibulskis, Kristian; Sivachenko, Andrey; Carter, Scott L.; Saksena, Gordon; Voet, Douglas; Ramos, Alex H; Winckler, Wendy; Cipicchio, Michelle; Ardlie, Kristin; Kantoff, Philip W.; Berger, Michael F.; Gabriel, Stacey B.; Golub, Todd R.; Meyerson, Matthew; Lander, Eric S.; Elemento, Olivier; Getz, Gad; Demichelis, Francesca; Rubin, Mark A.; Garraway, Levi A.

    2013-01-01

    SUMMARY The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term “chromoplexy”, frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis. PMID:23622249

  9. Punctuated evolution of prostate cancer genomes.

    PubMed

    Baca, Sylvan C; Prandi, Davide; Lawrence, Michael S; Mosquera, Juan Miguel; Romanel, Alessandro; Drier, Yotam; Park, Kyung; Kitabayashi, Naoki; MacDonald, Theresa Y; Ghandi, Mahmoud; Van Allen, Eliezer; Kryukov, Gregory V; Sboner, Andrea; Theurillat, Jean-Philippe; Soong, T David; Nickerson, Elizabeth; Auclair, Daniel; Tewari, Ashutosh; Beltran, Himisha; Onofrio, Robert C; Boysen, Gunther; Guiducci, Candace; Barbieri, Christopher E; Cibulskis, Kristian; Sivachenko, Andrey; Carter, Scott L; Saksena, Gordon; Voet, Douglas; Ramos, Alex H; Winckler, Wendy; Cipicchio, Michelle; Ardlie, Kristin; Kantoff, Philip W; Berger, Michael F; Gabriel, Stacey B; Golub, Todd R; Meyerson, Matthew; Lander, Eric S; Elemento, Olivier; Getz, Gad; Demichelis, Francesca; Rubin, Mark A; Garraway, Levi A

    2013-04-25

    The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term "chromoplexy," frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. Advances and Challenges in Genomic Selection for Disease Resistance.

    PubMed

    Poland, Jesse; Rutkoski, Jessica

    2016-08-04

    Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffused architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. In this context, the scope of implementing genomic selection for disease resistance, and specifically for quantitative resistance and quarantined pathogens, becomes a tractable and powerful approach in breeding programs.

  11. Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations

    DOE PAGES

    Bendall, Matthew L.; Stevens, Sarah L.R.; Chan, Leong-Keat; ...

    2016-01-08

    Multiple models describe the formation and evolution of distinct microbial phylogenetic groups. These evolutionary models make different predictions regarding how adaptive alleles spread through populations and how genetic diversity is maintained. Processes predicted by competing evolutionary models, for example, genome-wide selective sweeps vs gene-specific sweeps, could be captured in natural populations using time-series metagenomics if the approach were applied over a sufficiently long time frame. Direct observations of either process would help resolve how distinct microbial groups evolve. Using a 9-year metagenomic study of a freshwater lake (2005–2013), we explore changes in single-nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in 30 bacterial populations. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied by >1000-fold among populations. SNP allele frequencies also changed dramatically over time within some populations. Interestingly, nearly all SNP variants were slowly purged over several years from one population of green sulfur bacteria, while at the same time multiple genes either swept through or were lost from this population. Furthermore, these patterns were consistent with a genome-wide selective sweep in progress, a process predicted by the ‘ecotype model’ of speciation but not previously observed in nature. In contrast, other populations contained large, SNP-free genomic regions that appear to have swept independently through the populations prior to the study without purging diversity elsewhere in the genome. Finally, evidence for both genome-wide and gene-specific sweeps suggests that different models of bacterial speciation may apply to different populations coexisting in the same environment.

  12. An integrative model of evolutionary covariance: a symposium on body shape in fishes.

    PubMed

    Walker, Jeffrey A

    2010-12-01

    A major direction of current and future biological research is to understand how multiple, interacting functional systems coordinate in producing a body that works. This understanding is complicated by the fact that organisms need to work well in multiple environments, with both predictable and unpredictable environmental perturbations. Furthermore, organismal design reflects a history of past environments and not a plan for future environments. How complex, interacting functional systems evolve, then, is a truly grand challenge. In accepting the challenge, an integrative model of evolutionary covariance is developed. The model combines quantitative genetics, functional morphology/physiology, and functional ecology. The model is used to convene scientists ranging from geneticists, to physiologists, to ecologists, to engineers to facilitate the emergence of body shape in fishes as a model system for understanding how complex, interacting functional systems develop and evolve. Body shape of fish is a complex morphology that (1) results from many developmental paths and (2) functions in many different behaviors. Understanding the coordination and evolution of the many paths from genes to body shape, body shape to function, and function to a working fish body in a dynamic environment is now possible given new technologies from genetics to engineering and new theoretical models that integrate the different levels of biological organization (from genes to ecology).

  13. Gene flow from single and stacked herbicide-resistant rice (Oryza sativa): modeling occurrence of multiple herbicide-resistant weedy rice.

    PubMed

    Dauer, Joseph; Hulting, Andrew; Carlson, Dale; Mankin, Luke; Harden, John; Mallory-Smith, Carol

    2018-02-01

    Provisia™ rice (PV), a non-genetically engineered (GE) quizalofop-resistant rice, will provide growers with an additional option for weed management to use in conjunction with Clearfield® rice (CL) production. Modeling compared the impact of stacking resistance traits versus single traits in rice on introgression of the resistance trait to weedy rice (also called red rice). Common weed management practices were applied to 2-, 3- and 4-year crop rotations, and resistant and multiple-resistant weedy rice seeds, seedlings and mature plants were tracked for 15 years. Two-year crop rotations resulted in resistant weedy rice after 2 years with abundant populations (exceeding 0.4 weedy rice plants m⁻²) occurring after 7 years. When stacked trait rice was rotated with soybeans in a 3-year rotation and with soybeans and CL in a 4-year rotation, multiple-resistance occurred after 2-5 years with abundant populations present in 4-9 years. When CL rice, PV rice, and soybeans were used in 3- and 4-year rotations, the median time of first appearance of multiple-resistance was 7-11 years and reached abundant levels in 10-15 years. Maintaining separate CL and PV rice systems, in rotation with other crops and herbicides, minimized the evolution of multiple herbicide-resistant weedy rice through gene flow compared to stacking herbicide resistance traits. © 2017 Society of Chemical Industry.

  14. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    PubMed

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research.

  15. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach

    PubMed Central

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Introduction Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. Aim The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Methods Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Results Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. Conclusion To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research. PMID:26080057

  16. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network

    PubMed Central

    Kozlov, Konstantin N.; Kulakovskiy, Ivan V.; Zubair, Asif; Marjoram, Paul; Lawrie, David S.; Nuzhdin, Sergey V.; Samsonova, Maria G.

    2017-01-01

    Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects. PMID:28898266

  17. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of a complexity unprecedented to date, indeed that could not be distinguished as to its exact dosage. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication in the cotton lineage of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced gene copy numbers of some homologous groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, both of fundamental value and serving as an important platform to support further studies in and beyond cotton and genomics communities. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.

  18. Comparing GWAS Results of Complex Traits Using Full Genetic Model and Additive Models for Revealing Genetic Architecture

    PubMed Central

    Monir, Md. Mamun; Zhu, Jun

    2017-01-01

    Most of the genome-wide association studies (GWASs) for human complex diseases have ignored dominance, epistasis and ethnic interactions. We conducted comparative GWASs for total cholesterol using a full model and additive models, which illustrate the impact of ignoring such genetic variants on analysis results and demonstrate how genetic effects of multiple loci could differ across ethnic groups. The full model identified 15 quantitative trait loci with 13 individual loci and 3 pairs of epistasis loci, whereas the multi-loci additive model identified only 14 loci (9 common loci and 5 different loci). In addition, 4 loci detected by the full model were not detected by the multi-loci additive model. PLINK analysis identified two loci and GCTA analysis detected only one locus with genome-wide significance. The full model identified three previously reported genes as well as several new genes. Bioinformatics analysis showed that some of the new genes are related to cholesterol-related chemicals and/or diseases. Analyses of cholesterol data and simulation studies revealed that the full model performed better than the additive models in terms of detection power and unbiased estimation of genetic variants of complex traits. PMID:28079101

  19. Cluster Analysis of Campylobacter jejuni Genotypes Isolated from Small and Medium-Sized Mammalian Wildlife and Bovine Livestock from Ontario Farms.

    PubMed

    Viswanathan, M; Pearl, D L; Taboada, E N; Parmley, E J; Mutschall, S K; Jardine, C M

    2017-05-01

    Using data collected from a cross-sectional study of 25 farms (eight beef, eight swine and nine dairy) in 2010, we assessed clustering of molecular subtypes of C. jejuni based on a Campylobacter-specific 40 gene comparative genomic fingerprinting assay (CGF40) subtypes, using unweighted pair-group method with arithmetic mean (UPGMA) analysis, and multiple correspondence analysis. Exact logistic regression was used to determine which genes differentiate wildlife and livestock subtypes in our study population. A total of 33 bovine livestock (17 beef and 16 dairy), 26 wildlife (20 raccoon (Procyon lotor), five skunk (Mephitis mephitis) and one mouse (Peromyscus spp.)) C. jejuni isolates were subtyped using CGF40. Dendrogram analysis, based on UPGMA, showed distinct branches separating bovine livestock and mammalian wildlife isolates. Furthermore, two-dimensional multiple correspondence analysis was highly concordant with dendrogram analysis showing clear differentiation between livestock and wildlife CGF40 subtypes. Based on multilevel logistic regression models with a random intercept for farm of origin, we found that isolates in general, and raccoons more specifically, were significantly more likely to be part of the wildlife branch. Exact logistic regression conducted gene by gene revealed 15 genes that were predictive of whether an isolate was of wildlife or bovine livestock isolate origin. Both multiple correspondence analysis and exact logistic regression revealed that in most cases, the presence of a particular gene (13 of 15) was associated with an isolate being of livestock rather than wildlife origin. In conclusion, the evidence gained from dendrogram analysis, multiple correspondence analysis and exact logistic regression indicates that mammalian wildlife carry CGF40 subtypes of C. jejuni distinct from those carried by bovine livestock. Future studies focused on source attribution of C. jejuni in human infections will help determine whether wildlife transmit Campylobacter jejuni directly to humans. © 2016 Blackwell Verlag GmbH.

  20. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION DATA.

    PubMed

    Wu, Zheyang; Zhao, Hongyu

    2012-01-01

    For more fruitful discoveries of genetic variants associated with diseases in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. This article provides a statistical framework to rigorously address this question through analytical power calculations for common model search strategies to detect binary trait loci: marginal search, exhaustive search, forward search, and two-stage screening search. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among score test statistics of logistic regressions. We derive analytical results under two power definitions: the power of finding all the associated markers and the power of finding at least one associated marker. We also consider two types of error controls: the discovery number control and the Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to consider a broad genetic model space to investigate the relative performances of different model search strategies. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models including gene-gene interactions. Even though we focus on genetic association analysis, our results on the power of model selection procedures are clearly very general and applicable to other studies.

  1. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION DATA

    PubMed Central

    Wu, Zheyang; Zhao, Hongyu

    2013-01-01

    For more fruitful discoveries of genetic variants associated with diseases in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. This article provides a statistical framework to rigorously address this question through analytical power calculations for common model search strategies to detect binary trait loci: marginal search, exhaustive search, forward search, and two-stage screening search. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among score test statistics of logistic regressions. We derive analytical results under two power definitions: the power of finding all the associated markers and the power of finding at least one associated marker. We also consider two types of error controls: the discovery number control and the Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to consider a broad genetic model space to investigate the relative performances of different model search strategies. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models including gene-gene interactions. Even though we focus on genetic association analysis, our results on the power of model selection procedures are clearly very general and applicable to other studies. PMID:23956610

  2. Combinations of chromosome transfer and genome editing for the development of cell/animal models of human disease and humanized animal models.

    PubMed

    Uno, Narumi; Abe, Satoshi; Oshimura, Mitsuo; Kazuki, Yasuhiro

    2018-02-01

    Chromosome transfer technology, including chromosome modification, enables the introduction of Mb-sized or multiple genes to desired cells or animals. This technology has allowed innovative developments to be made for models of human disease and humanized animals, including Down syndrome model mice and humanized transchromosomic (Tc) immunoglobulin mice. Genome editing techniques are developing rapidly, and permit modifications such as gene knockout and knockin to be performed in various cell lines and animals. This review summarizes chromosome transfer-related technologies and the combined technologies of chromosome transfer and genome editing mainly for the production of cell/animal models of human disease and humanized animal models. Specifically, these include: (1) chromosome modification with genome editing in Chinese hamster ovary cells and mouse A9 cells for efficient transfer to desired cell types; (2) single-nucleotide polymorphism modification in humanized Tc mice with genome editing; and (3) generation of a disease model of Down syndrome-associated hematopoiesis abnormalities by the transfer of human chromosome 21 to normal human embryonic stem cells and the induction of mutation(s) in the endogenous gene(s) with genome editing. These combinations of chromosome transfer and genome editing open up new avenues for drug development and therapy as well as for basic research.

  3. Two FGFRL-Wnt circuits organize the planarian anteroposterior axis

    PubMed Central

    Scimone, M Lucila; Cote, Lauren E; Rogers, Travis; Reddien, Peter W

    2016-01-01

    How positional information instructs adult tissue maintenance is poorly understood. Planarians undergo whole-body regeneration and tissue turnover, providing a model for adult positional information studies. Genes encoding secreted and transmembrane components of multiple developmental pathways are predominantly expressed in planarian muscle cells. Several of these genes regulate regional identity, consistent with muscle harboring positional information. Here, single-cell RNA-sequencing of 115 muscle cells from distinct anterior-posterior regions identified 44 regionally expressed genes, including multiple Wnt and ndk/FGF receptor-like (ndl/FGFRL) genes. Two distinct FGFRL-Wnt circuits, involving juxtaposed anterior FGFRL and posterior Wnt expression domains, controlled planarian head and trunk patterning. ndl-3 and wntP-2 inhibition expanded the trunk, forming ectopic mouths and secondary pharynges, which independently extended and ingested food. fz5/8-4 inhibition, like that of ndk and wntA, caused posterior brain expansion and ectopic eye formation. Our results suggest that FGFRL-Wnt circuits operate within a body-wide coordinate system to control adult axial positioning. DOI: http://dx.doi.org/10.7554/eLife.12845.001 PMID:27063937

  4. TimeXNet Web: Identifying cellular response networks from diverse omics time-course data.

    PubMed

    Tan, Phit Ling; López, Yosvany; Nakai, Kenta; Patil, Ashwini

    2018-05-14

    Condition-specific time-course omics profiles are frequently used to study cellular response to stimuli and identify associated signaling pathways. However, few online tools allow users to analyze multiple types of high-throughput time-course data. TimeXNet Web is a web server that extracts a time-dependent gene/protein response network from time-course transcriptomic, proteomic or phospho-proteomic data, and an input interaction network. It classifies the given genes/proteins into time-dependent groups based on the time of their highest activity and identifies the most probable paths connecting genes/proteins in consecutive groups. The response sub-network is enriched in activated genes/proteins and contains novel regulators that do not show any observable change in the input data. Users can view the resultant response network and analyze it for functional enrichment. TimeXNet Web supports the analysis of high-throughput data from multiple species by providing high quality, weighted protein-protein interaction networks for 12 model organisms. Availability: http://txnet.hgc.jp/. Contact: ashwini@hgc.jp. Supplementary data are available at Bioinformatics online.
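
    The grouping step described in this abstract (assigning each gene/protein to a group by the time of its highest activity) might look like the sketch below. This is not the TimeXNet Web implementation; the function name, the gene labels, and the activity values are hypothetical, and ties are resolved in favor of the earliest time point.

```python
def group_by_peak_time(profiles: dict[str, list[float]],
                       time_points: list[int]) -> dict[int, list[str]]:
    """Group genes by the time point at which their activity is highest."""
    groups: dict[int, list[str]] = {}
    for gene, values in profiles.items():
        # Index of the maximum value; max() returns the first index on ties.
        peak_index = max(range(len(values)), key=lambda i: values[i])
        groups.setdefault(time_points[peak_index], []).append(gene)
    return groups


# Hypothetical activity profiles measured at 1, 2 and 4 hours.
profiles = {
    "geneA": [2.5, 1.0, 0.3],
    "geneB": [0.1, 0.4, 1.9],
    "geneC": [0.2, 0.9, 1.1],
}
print(group_by_peak_time(profiles, [1, 2, 4]))
```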

  5. Multiple schwannomatosis caused by the recently described INI1 gene--molecular pathology, and implications for prognosis.

    PubMed

    Brennan, Paul M; Barlow, Antonio; Geraghty, Alistair; Summers, David; Fitzpatrick, Michael M

    2011-06-01

    The most common genetic predisposition to multiple schwannoma growth is mutation of the neurofibromatosis type 2 gene. We describe a patient with multiple schwannomas and mutation in the recently described INI1 gene, which also predisposes to the disease. We explore the implications for prognosis and outcome.

  6. Cloning and analysis of the positively acting regulatory gene amdR from Aspergillus nidulans.

    PubMed Central

    Andrianopoulos, A; Hynes, M J

    1988-01-01

    The positively acting regulatory gene amdR of Aspergillus nidulans coordinately regulates the expression of four unlinked structural genes involved in acetamide (amdS), omega amino acid (gatA and gabA), and lactam (lamA) catabolism. By the use of DNA-mediated transformation of A. nidulans, the amdR regulatory gene was cloned from a genomic cosmid library. Southern blot analysis of DNA from various loss-of-function amdR mutants revealed the presence of four detectable DNA rearrangements, including a deletion, an insertion, and a translocation. No detectable DNA rearrangements were found in several constitutive amdRc mutants. Analysis of the fate of amdR-bearing plasmids in transformants showed that 10 to 20% of the transformation events were homologous integrations or gene conversions, and this phenomenon was exploited in developing a strategy by which amdRc and amdR- alleles can be readily cloned and analyzed. Examination of the transcription of amdR by Northern blot (RNA blot) analysis revealed the presence of two mRNAs (2.7 and 1.8 kilobases) which were constitutively synthesized at a very low level. In addition, amdR transcription did not appear to depend on the presence of a functional amdR product nor was it altered in amdRc mutants. The dosage effects of multiple copies of amdR in transformants were examined, and it was shown that such transformants exhibited stronger growth than did the wild type on acetamide and pyrrolidinone media, indicating increased expression of the amdS and lamA genes, respectively. These results were used to formulate a model for amdR-mediated regulation of gene expression in which the low constitutive level of amdR product sets the upper limits of basal and induced transcription of the structural genes. Multiple copies of 5' sequences from the amdS gene can result in reduced growth on substrates whose utilization is dependent on amdR-controlled genes. This has been attributed to titration of limiting amdR gene product. 
Strong support for this proposal was obtained by showing that multiple copies of the amdR gene can reverse this phenomenon (antititration). PMID:3062382

  7. Natural genetic variation profoundly regulates gene expression in immune cells and dictates susceptibility to CNS autoimmunity

    PubMed Central

    Bearoff, Frank; del Rio, Roxana; Case, Laure K.; Dragon, Julie A.; Nguyen-Vu, Trang; Lin, Chin-Yo; Blankenhorn, Elizabeth P.; Teuscher, Cory; Krementsov, Dimitry N.

    2016-01-01

    Regulation of gene expression in immune cells is known to be under genetic control, and likely contributes to susceptibility to autoimmune diseases, such as multiple sclerosis (MS). How this occurs in concert across multiple immune cell types is poorly understood. Using a mouse model that harnesses the genetic diversity of wild-derived mice, more accurately reflecting genetically diverse human populations, we provide an extensive characterization of the genetic regulation of gene expression in five different naïve immune cell types relevant to MS. The immune cell transcriptome is shown to be under profound genetic control, exhibiting diverse patterns: global, cell-specific, and sex-specific. Bioinformatic analysis of the genetically-controlled transcript networks reveals reduced cell type-specificity and inflammatory activity in wild-derived PWD/PhJ mice, compared with the conventional laboratory strain C57BL/6J. Additionally, candidate MS-GWAS genes were significantly enriched among transcripts overrepresented in C57BL/6J cells compared to PWD. These expression level differences correlate with robust differences in susceptibility to experimental autoimmune encephalomyelitis, the principal model of MS, and skewing of the encephalitogenic T cell responses. Taken together, our results provide functional insights into the genetic regulation of the immune transcriptome, and shed light on how this in turn contributes to susceptibility to autoimmune disease. PMID:27653816

  8. EVALUATING VIRULENCE OF WATERBORNE AND CLINICAL AEROMONAS ISOLATES USING GENE EXPRESSION AND MORTALITY IN NEONATAL MICE FOLLOWED BY ASSESSING CELL CULTURE'S ABILITY TO PREDICT VIRULENCE BASED ON TRANSCRIPTIONAL RESPONSE

    EPA Science Inventory

    The virulence of multiple Aeromonas spp. was assessed using two models, a neonatal mouse assay and a mouse intestinal cell culture. Transcriptional responses to both infection models were assessed using microarrays. After artificial infection with a variety of Aeromonas spp., ...

  9. The genetic interacting landscape of 63 candidate genes in Major Depressive Disorder: an explorative study.

    PubMed

    Lekman, Magnus; Hössjer, Ola; Andrews, Peter; Källberg, Henrik; Uvehag, Daniel; Charney, Dennis; Manji, Husseini; Rush, John A; McMahon, Francis J; Moore, Jason H; Kockum, Ingrid

    2014-01-01

    Genetic contributions to major depressive disorder (MDD) are thought to result from multiple genes interacting with each other. Different procedures have been proposed to detect such interactions. Which approach is best for explaining the risk of developing disease is unclear. This study sought to elucidate the genetic interaction landscape in candidate genes for MDD by conducting a SNP-SNP interaction analysis using an exhaustive search through 3,704 SNP-markers in 1,732 cases and 1,783 controls provided from the GAIN MDD study. We used three different methods to detect interactions, two logistic regression models (multiplicative and additive) and one data mining and machine learning (MDR) approach. Although none of the interactions survived correction for multiple comparisons, the results provide important information for future genetic interaction studies in complex disorders. Among the 0.5% most significant observations, none had been reported previously for risk to MDD. Within this group of interactions, less than 0.03% would have been detectable based on a main effect approach or an a priori algorithm. We evaluated correlations among the three different models and conclude that all three algorithms detected the same interactions to a low degree. Although the top interactions had a surprisingly large effect size for MDD (e.g. additive dominant model Puncorrected = 9.10E-9 with attributable proportion (AP) value = 0.58 and multiplicative recessive model with Puncorrected = 6.95E-5 with odds ratio (OR estimated from β3) value = 4.99) the area under the curve (AUC) estimates were low (< 0.54). Moreover, the population attributable fraction (PAF) estimates were also low (< 0.15). We conclude that the top interactions on their own did not explain much of the genetic variance of MDD. The different statistical interaction methods we used in the present study did not identify the same pairs of interacting markers. 
Genetic interaction studies may uncover previously unsuspected effects that could provide novel insights into MDD risk, but much larger sample sizes are needed before this strategy can be powerfully applied.
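
    The two interaction measures quoted in this abstract can be illustrated with their textbook definitions: the attributable proportion due to interaction on the additive scale, AP = RERI / OR11 with RERI = OR11 - OR10 - OR01 + 1, and the multiplicative interaction odds ratio exp(β3). The sketch below is an assumption about the general form of these measures; the abstract does not state how the study computed them, and the input odds ratios are hypothetical.

```python
from math import exp


def attributable_proportion(or_10: float, or_01: float, or_11: float) -> float:
    """Attributable proportion due to interaction (additive scale).

    or_10 and or_01 are the odds ratios for carrying only the first or only
    the second risk factor; or_11 is the odds ratio for carrying both.
    """
    reri = or_11 - or_10 - or_01 + 1.0  # relative excess risk due to interaction
    return reri / or_11


def multiplicative_interaction_or(beta3: float) -> float:
    """Odds ratio for the interaction term of a logistic regression."""
    return exp(beta3)


# Hypothetical odds ratios, not values from the study.
print(attributable_proportion(or_10=1.2, or_01=1.5, or_11=4.0))
print(multiplicative_interaction_or(0.0))  # no interaction on the multiplicative scale
```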

  10. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common as results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on the same synthetic datasets. MFMW-outlier gave better average precision and recall values on three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. 
Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.

  11. Molecular study on some antibiotic resistant genes in Salmonella spp. isolates

    NASA Astrophysics Data System (ADS)

    Nabi, Ari Q.

    2017-09-01

    Studying the genes related to antimicrobial resistance in Salmonella spp. is a crucial step toward correct and faster treatment of infections caused by the pathogen. In this work, the integron-mediated antibiotic resistance gene IntI1 (Class I Integrase IntI1) and some plasmid-mediated antibiotic resistance genes (Qnr) were scanned among the isolated non-typhoid Salmonella strains with known resistance to some important antimicrobial drugs using SYBR Green real-time PCR. The aim of the study was to correlate the multiple antibiotic and antimicrobial resistance of Salmonella spp. with the presence of the integrase (IntI1) gene and plasmid-mediated quinolone resistance genes. Results revealed the presence of the Class I Integrase gene in 76% of the isolates with confirmed multiple antibiotic resistance. Moreover, about 32% of the multiple antibiotic resistant serotypes showed a positive R-PCR for the plasmid-mediated qnrA gene encoding nalidixic acid and ciprofloxacin resistance. No positive results were obtained from R-PCRs targeting qnrB or qnrS. In light of these results, we conclude that the presence of at least one of the qnr genes and/or the presence of the Integrase Class I gene was responsible for the multiple antibiotic resistance to nalidixic acid and ciprofloxacin in the studied Salmonella spp., and further studies are required to identify the genes related to multiple antibiotic resistance of the pathogen.

  12. Genetic analysis of Ikaros target genes and tumor suppressor function in BCR-ABL1+ pre–B ALL

    PubMed Central

    Aghajanirefah, Ali; McLaughlin, Jami; Cheng, Donghui; Geng, Huimin; Eggesbø, Linn M.; Smale, Stephen T.; Müschen, Markus

    2017-01-01

    Inactivation of the tumor suppressor gene encoding the transcriptional regulator Ikaros (IKZF1) is a hallmark of BCR-ABL1+ precursor B cell acute lymphoblastic leukemia (pre–B ALL). However, the mechanisms by which Ikaros functions as a tumor suppressor in pre–B ALL remain poorly understood. Here, we analyzed a mouse model of BCR-ABL1+ pre–B ALL together with a new model of inducible expression of wild-type Ikaros in IKZF1 mutant human BCR-ABL1+ pre–B ALL. We performed integrated genome-wide chromatin and expression analyses and identified Ikaros target genes in mouse and human BCR-ABL1+ pre–B ALL, revealing novel conserved gene pathways associated with Ikaros tumor suppressor function. Notably, genetic depletion of different Ikaros targets, including CTNND1 and the early hematopoietic cell surface marker CD34, resulted in reduced leukemic growth. Our results suggest that Ikaros mediates tumor suppressor function by enforcing proper developmental stage–specific expression of multiple genes through chromatin compaction at its target genes. PMID:28190001

  13. Grains of connectivity: analysis at multiple spatial scales in landscape genetics.

    PubMed

    Galpern, Paul; Manseau, Micheline; Wilson, Paul

    2012-08-01

    Landscape genetic analyses are typically conducted at one spatial scale. Considering multiple scales may be essential for identifying landscape features influencing gene flow. We examined landscape connectivity for woodland caribou (Rangifer tarandus caribou) at multiple spatial scales using a new approach based on landscape graphs that creates a Voronoi tessellation of the landscape. To illustrate the potential of the method, we generated five resistance surfaces to explain how landscape pattern may influence gene flow across the range of this population. We tested each resistance surface using a raster at the spatial grain of available landscape data (200 m grid squares). We then used our method to produce up to 127 additional grains for each resistance surface. We applied a causal modelling framework with partial Mantel tests, where evidence of landscape resistance is tested against an alternative hypothesis of isolation-by-distance, and found statistically significant support for landscape resistance to gene flow in 89 of the 507 spatial grains examined. We found evidence that major roads as well as the cumulative effects of natural and anthropogenic disturbance may be contributing to the genetic structure. Using only the original grid surface yielded no evidence for landscape resistance to gene flow. Our results show that using multiple spatial grains can reveal landscape influences on genetic structure that may be overlooked with a single grain, and suggest that coarsening the grain of landcover data may be appropriate for highly mobile species. We discuss how grains of connectivity and related analyses have potential landscape genetic applications in a broad range of systems. © 2012 Blackwell Publishing Ltd.

  14. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits.

    PubMed

    Rudra, Pratyaydipta; Broadaway, K Alaine; Ware, Erin B; Jhun, Min A; Bielak, Lawrence F; Zhao, Wei; Smith, Jennifer A; Peyser, Patricia A; Kardia, Sharon L R; Epstein, Michael P; Ghosh, Debashis

    2018-06-01

    Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy. © 2018 WILEY PERIODICALS, INC.

  15. Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data.

    PubMed

    Bao, Le; Gu, Hong; Dunn, Katherine A; Bielawski, Joseph P

    2007-02-08

    Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that all codon sites are assigned to one of several partitions which are permitted to have independent parameters for selection pressure, evolutionary rate, transition to transversion ratio or codon frequencies. For single gene analysis, partitions might be defined according to protein tertiary structure, and for multiple gene analysis partitions might be defined according to a gene's functional category. Given a set of related fixed-effect models, the task of selecting the model that best fits the data is not trivial. In this study, we implement a set of fixed-effect codon models which allow for different levels of heterogeneity among partitions in the substitution process. We describe strategies for selecting among these models by a backward elimination procedure, Akaike information criterion (AIC) or a corrected Akaike information criterion (AICc). We evaluate the performance of these model selection methods via a simulation study, and make several recommendations for real data analysis. Our simulation study indicates that the backward elimination procedure can provide a reliable method for model selection in this setting. We also demonstrate the utility of these models by application to a single-gene dataset partitioned according to tertiary structure (abalone sperm lysin), and a multi-gene dataset partitioned according to the functional category of the gene (flagellar-related proteins of Listeria). Fixed-effect models have advantages and disadvantages. Fixed-effect models are desirable when data partitions are known to exhibit significant heterogeneity or when a statistical test of such heterogeneity is desired. 
They have the disadvantage of requiring a priori knowledge for partitioning sites. We recommend: (i) selection of models by using backward elimination rather than AIC or AICc, (ii) use a stringent cut-off, e.g., p = 0.0001, and (iii) conduct sensitivity analysis of results. With thoughtful application, fixed-effect codon models should provide a useful tool for large scale multi-gene analyses.
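
    The two information criteria named in this abstract have standard definitions: AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of free parameters, ln L the maximized log-likelihood, and n the sample size. The sketch below applies them to hypothetical models; the model names, log-likelihoods, and the choice of n as the number of codon sites are assumptions for illustration, and the authors themselves recommend backward elimination over either criterion.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def aicc(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Small-sample corrected AIC; requires n_samples > n_params + 1."""
    correction = 2.0 * n_params * (n_params + 1) / (n_samples - n_params - 1)
    return aic(log_likelihood, n_params) + correction


# Hypothetical fitted models: name -> (maximized log-likelihood, free parameters).
models = {"M0": (-1250.0, 3), "M1": (-1240.0, 6), "M2": (-1239.5, 12)}
n_sites = 150  # assumed sample size (number of codon sites)

best = min(models, key=lambda name: aicc(*models[name], n_sites))
print(best)
```

    The model with the lowest criterion value is preferred; the correction term penalizes parameter-rich models more heavily when n is small relative to k.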

  16. The Evolution of Mobile DNAs: When Will Transposons Create Phylogenies That Look As If There Is a Master Gene?

    PubMed Central

    Brookfield, John F. Y.; Johnson, Louise J.

    2006-01-01

    Some families of mammalian interspersed repetitive DNA, such as the Alu SINE sequence, appear to have evolved by the serial replacement of one active sequence with another, consistent with there being a single source of transposition: the “master gene.” Alternative models, in which multiple source sequences are simultaneously active, have been called “transposon models.” Transposon models differ in the proportion of elements that are active and in whether inactivation occurs at the moment of transposition or later. Here we examine the predictions of various types of transposon model regarding the patterns of sequence variation expected at an equilibrium between transposition, inactivation, and deletion. Under the master gene model, all bifurcations in the true tree of elements occur in a single lineage. We show that this property will also hold approximately for transposon models in which most elements are inactive and where at least some of the inactivation events occur after transposition. Such tree shapes are therefore not conclusive evidence for a single source of transposition. PMID:16790583
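
    The tree-shape property stated in this abstract (all bifurcations occurring in a single lineage) can be checked mechanically. The sketch below represents a rooted tree as nested tuples with string leaves, which is an assumed encoding rather than anything from the article, and reports whether at most one child of every internal node is itself internal.

```python
def is_single_lineage_tree(tree) -> bool:
    """Return True if every bifurcation lies on one lineage (a comb-shaped tree).

    A tree is a string (leaf) or a tuple of subtrees (internal node).
    """
    node = tree
    while isinstance(node, tuple):
        internal_children = [child for child in node if isinstance(child, tuple)]
        if len(internal_children) > 1:
            return False  # two lineages both continue to branch
        if not internal_children:
            return True   # reached the last bifurcation
        node = internal_children[0]
    return True  # a single leaf has no bifurcations


# Hypothetical element labels.
print(is_single_lineage_tree(("a", ("b", ("c", "d")))))    # comb-shaped
print(is_single_lineage_tree((("a", "b"), ("c", "d"))))    # balanced
```

    As the abstract notes, a tree that passes such a check is not conclusive evidence for a master gene, since some transposon models produce approximately the same shape.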

  17. 151. Bromocriptine Challenge Affects Working Memory Processing in Humans Depending on DRD2-Related Genes

    PubMed Central

    Pergola, Giulio; Selvaggi, Pierluigi; Gelao, Barbara; Di Carlo, Pasquale; Nettis, Maria Antonietta; Amico, Graziella; Felici, Valentina; Fazio, Leonardo; Rampino, Antonio; Sambataro, Fabio; Blasi, Giuseppe; Bertolino, Alessandro

    2017-01-01

    Abstract Background: Dopamine D2 receptors (D2R) contribute to the inverted-U shaped relationship between dopamine dorsolateral prefrontal cortex (DLPFC) and working memory (WM). Genetic variation in DRD2 coding for D2Rs modulates D2 signaling, but other genes in its pathway may be involved. In a previous work, using gene co-expression networks we identified 84 partner genes coregulated with DRD2 and eight single nucleotide polymorphisms (SNPs) predicting coexpression of the whole gene set in the human DLPFC [1]. These SNPs combined into a polygenic coexpression index (PCI) predicted WM performance and DLPFC activity in two independent samples of living healthy humans [1]. Here, we asked whether response to D2R targeting drugs is associated with this PCI. Thus, we investigated the interaction between WM behavioral/brain response to the D2R agonist Bromocriptine (BRO) and the PCI. [1] Pergola G, Di Carlo P, et al. (In press). Translational Psychiatry. Methods: Fifty healthy volunteers entered a double-blind, crossover, randomized, placebo-controlled fMRI study with BRO 1.25 mg and performed the N-Back WM task during the fMRI scanning session. We computed the PCI for all participants and investigated its association with WM-related behavior and brain activity using general linear models. Results: A PCI by drug interaction was significant on both DLPFC signal (right BA46, 242 voxels, F(1, 48) = 24; right BA9, 177 voxels, F(1, 48) = 19; P < .05 cluster-level FWE corrected) and behavioral scores, F(1, 46) = 4.6, P = .045, using a U-shaped quadratic model. The U-shaped relationship between the PCI and WM processing found on placebo was reversed on BRO. Furthermore, the increase in behavioral performance on BRO correlated with a decrease in BA46 activity, t(48) = −2.0, P = .049. 
Conclusion: The combined effect of multiple alleles on DRD2 coexpression covaried with drug response such that different allelic patterns were associated with similar responses, as in the inverted U-shaped model of WM. Thus, multiple genes and multiple allelic patterns are implicated in the inverted U-shaped dopamine/WM relationship. This relationship is reversed when individuals are administered BRO, suggesting that brain and behavioral response to this pharmacological challenge depends on a pleiotropic individual genetic background. Hence, pharmacogenomics in schizophrenia should take into account allelic patterns associated with molecular phenomena such as gene expression to predict drug response.

  18. Development of a Knowledgebase (MetRxn) of Metabolites, Reactions and Atom Mappings to Accelerate Discovery and Redesign

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maranas, Costas D.

    With advances in DNA sequencing and genome annotation techniques, the breadth of metabolic knowledge across all kingdoms of life is increasing. The construction of genome-scale models (GSMs) facilitates this distillation of knowledge by systematically accounting for reaction stoichiometry and directionality, gene to protein to reaction relationships, reaction localization among cellular organelles, metabolite transport costs and routes, transcriptional regulation, and biomass composition. Genome-scale reconstructions available now span across all kingdoms of life, from microbes to whole-plant models, and have become indispensable for driving informed metabolic designs and interventions. A key barrier to the pace of this development is our inability to utilize metabolite/reaction information from databases such as BRENDA [1], KEGG [2], MetaCyc [3], etc. due to incompatibilities of representation, duplications, and errors. Duplicate entries constitute a major impediment, where the same metabolite is found with multiple names across databases and models, which significantly slows down the collating of information from multiple data sources. This can also lead to serious modeling errors such as charge/mass imbalances [4,5] which can thwart model predictive abilities such as identifying synthetic lethal gene pairs and quantifying metabolic flows. Hence, we created the MetRxn database [6] that takes the next step in integrating data from multiple sources and formats to automatically create a standardized knowledgebase. We subsequently deployed this resource to bring about new paradigms in genome-scale metabolic model reconstruction, metabolic flux elucidation through MFA, modeling of microbial communities, and pathway prospecting. This research has enabled the PI’s group to continue building upon research milestones and reach new ones (see list of MetRxn-related publications below).

  19. Imaging Effects of Neurotrophic Factor Genes on Brain Plasticity and Repair in Multiple Sclerosis

    DTIC Science & Technology

    2010-07-01

    cortical thickness and subcortical volume measures, lesion volumetry, and voxel-based morphometry and diffusion imaging. We are continuing to... thickness and subcortical volume measures, lesion volumetry, and voxel-based morphometry and diffusion imaging. Regression and symbolic modeling

  20. IRAK1 variant is protective for orthodontic-induced external apical root resorption.

    PubMed

    Pereira, S; Nogueira, L; Canova, F; Lopez, M; Silva, H C

    2016-10-01

    Interleukin-1 beta (IL1B) pathway is a key player in orthodontic-induced external apical root resorption (EARR). The aim of this work was to identify the genes related to the IL1 pathway as possible candidate genes for EARR, which might be included in an integrative predictive model of this complex phenotype. Using a stepwise multiple linear regression model, 195 patients who had undergone orthodontic treatment were assessed for clinical and genetic factors associated with %EARRmax (maximum %EARR value obtained for each patient). The four maxillary incisors and the two maxillary canines were assessed. Three functional single nucleotide polymorphisms (SNPs) were genotyped: rs1143634 in IL1B gene, rs315952 in IL1RN gene, and rs1059703 in X-linked IRAK1 gene. The model showed that four of the nine clinical variables and one SNP explained 30% of the %EARRmax variability. The most significant unique contributions to the model were gender (P = 0.001), treatment duration (P < 0.001), premolar extractions (P = 0.003), Hyrax appliance (P < 0.001), and homozygosity/hemizygosity for variant C from IRAK1 gene (P = 0.018), which proved to be a protective factor. IRAK1 polymorphism is proposed as a protective variant for EARR. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. Injection of Aβ1-40 into hippocampus induced cognitive lesion associated with neuronal apoptosis and multiple gene expressions in the tree shrew.

    PubMed

    Lin, Na; Xiong, Liu-Lin; Zhang, Rong-Ping; Zheng, Hong; Wang, Lei; Qian, Zhong-Yi; Zhang, Piao; Chen, Zhi-Wei; Gao, Fa-Bao; Wang, Ting-Hua

    2016-05-01

    Alzheimer's disease (AD) can incur significant health care costs to the patient, their families, and society; furthermore, effective treatments are limited, as the mechanisms of AD are not fully understood. This study utilized twelve adult male tree shrews (TS), which were randomly divided into PBS and amyloid beta peptide 1-40 (Aβ1-40) groups. The AD model was established via an intracerebroventricular (icv) injection of Aβ1-40 after being incubated for 4 days at 37 °C. Behavioral, pathophysiological and molecular changes were evaluated by hippocampal-dependent tasks, magnetic resonance imaging (MRI), silver staining, hematoxylin-eosin (HE) staining, TUNEL assay and gene sequencing, respectively. At 4 weeks post-injection, as compared with the PBS group, in Aβ1-40 injected animals: cognitive impairments occurred, and the hippocampus had atrophied, as indicated by MRI findings; meanwhile, HE staining showed the cells of the CA3 and DG were significantly thinner and smaller. The average number of cells in the DG, but not the CA3, was also significantly reduced; furthermore, silver staining revealed neuritic plaques and neurofibrillary tangles (NFTs) in the hippocampi; TUNEL assay showed many cells exhibited apoptosis, which was associated with downregulated BCL-2/BCL-XL-associated death promoter (Bad), inhibitor of apoptosis protein (IAP), Cytochrome c (CytC) and upregulated tumor necrosis factor receptor 1 (TNF-R1); lastly, gene sequencing reported a total of 924 mobilized genes, among which 13 of the downregulated and 19 of the upregulated genes were common to the AD pathway. The present study not only established AD models in TS, but also reported on the underlying mechanism involved in neuronal apoptosis associated with multiple gene expression.

  2. Causal relationship between the AHSG gene and BMD through fetuin-A and BMI: multiple mediation analysis.

    PubMed

    Sritara, C; Thakkinstian, A; Ongphiphadhanakul, B; Chailurkit, L; Chanprasertyothin, S; Ratanachaiwong, W; Vathesatogkit, P; Sritara, P

    2014-05-01

    Using mediation analysis, a causal relationship between the AHSG gene and bone mineral density (BMD) through fetuin-A and body mass index (BMI) mediators was suggested. Fetuin-A, a multifunctional protein of hepatic origin, is associated with bone mineral density. It is unclear if this association is causal. This study aimed at clarification of this issue. A cross-sectional study was conducted among 1,741 healthy workers from the Electricity Generating Authority of Thailand (EGAT) cohort. The alpha-2-Heremans-Schmid glycoprotein (AHSG) rs2248690 gene was genotyped. Three mediation models were constructed using seemingly unrelated regression analysis. First, the ln[fetuin-A] group was regressed on the AHSG gene. Second, the BMI group was regressed on the AHSG gene and the ln[fetuin-A] group. Finally, the BMD model was constructed by fitting BMD on two mediators (ln[fetuin-A] and BMI) and the independent AHSG variable. All three analyses were adjusted for confounders. The prevalence of the minor T allele for the AHSG locus was 15.2%. The AHSG locus was highly related to serum fetuin-A levels (P < 0.001). Multiple mediation analyses showed that AHSG was significantly associated with BMD through the ln[fetuin-A] and BMI pathway, with beta coefficients of 0.0060 (95% CI 0.0038, 0.0083) and 0.0030 (95% CI 0.0020, 0.0045) at the total hip and lumbar spine, respectively. About 27.3 and 26.0% of total genetic effects on hip and spine BMD, respectively, were explained by the mediation effects of fetuin-A and BMI. Our study suggested evidence of a causal relationship between the AHSG gene and BMD through fetuin-A and BMI mediators.
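
    The mediation logic in this abstract can be illustrated with a minimal sketch. It assumes the common product-of-coefficients definition, in which the indirect effect along a pathway is the product of the regression coefficients on that pathway and the proportion mediated is the indirect effect divided by the total effect. Whether the study computed its 27.3% and 26.0% figures exactly this way is an assumption, and the coefficients below are hypothetical, not values from the study.

```python
from math import prod


def indirect_effect(path_coefficients):
    """Product of the regression coefficients along one mediation pathway."""
    return prod(path_coefficients)


def proportion_mediated(indirect, direct):
    """Share of the total effect (indirect + direct) carried by the mediators."""
    total = indirect + direct
    return indirect / total


# Hypothetical coefficients, not taken from the study:
# gene -> mediator 1, mediator 1 -> mediator 2, mediator 2 -> outcome.
example_paths = [0.5, 0.4, 0.5]
example_indirect = indirect_effect(example_paths)
example_share = proportion_mediated(example_indirect, direct=0.3)
```

    In a real analysis the coefficients would come from the fitted, confounder-adjusted regressions, and the confidence interval of the indirect effect would need its own estimation step, which this sketch omits.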

  3. Dandelion root extract affects colorectal cancer proliferation and survival through the activation of multiple death signalling pathways

    PubMed Central

    Ovadje, Pamela; Ammar, Saleem; Guerrero, Jose-Antonio; Arnason, John Thor; Pandey, Siyaram

    2016-01-01

    Dandelion extracts have been studied extensively in recent years for their anti-depressant and anti-inflammatory activity. Recent work from our lab, with in-vitro systems, shows the anti-cancer potential of an aqueous dandelion root extract (DRE) in several cancer cell models, with no toxicity to non-cancer cells. In this study, we examined the cancer cell-killing effectiveness of an aqueous DRE in colon cancer cell models. Aqueous DRE induced programmed cell death (PCD) selectively in > 95% of colon cancer cells, irrespective of their p53 status, by 48 hours of treatment. The anti-cancer efficacy of this extract was confirmed in in-vivo studies, as the oral administration of DRE retarded the growth of human colon xenograft models by more than 90%. We found the activation of multiple death pathways in cancer cells by DRE treatment, as revealed by gene expression analyses showing the expression of genes implicated in programmed cell death. Phytochemical analyses of the extract showed complex multi-component composition of the DRE, including some known bioactive phytochemicals such as α-amyrin, β-amyrin, lupeol and taraxasterol. This suggested that this natural extract could engage and effectively target multiple vulnerabilities of cancer cells. Therefore, DRE could be a non-toxic and effective anti-cancer alternative, instrumental for reducing the occurrence of drug resistance in cancer cells. PMID:27564258

  4. Short and long-term genome stability analysis of prokaryotic genomes.

    PubMed

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often helps to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable with respect to non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also allows the identification of conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.

  5. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

    PubMed

    Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel

    2016-03-01

    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
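
    The iterative loop described in this abstract (fit a rule-based model, take the power set of the genes it uses, remove those combinations, refit until a stopping criterion is met) can be sketched roughly as follows. This is not the CAMUR implementation: the one-gene threshold rule used as the classifier, the accuracy-based stopping criterion, and every function and variable name are assumptions made for illustration, and the data are a toy example.

```python
from itertools import combinations


def best_single_gene_rule(samples, labels, genes):
    """Stand-in rule learner: best rule of the form 'gene > t' (either direction)."""
    best = (0.0, None, None)
    for gene in sorted(genes):
        values = sorted({s[gene] for s in samples})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            hits = sum((s[gene] > t) == bool(y) for s, y in zip(samples, labels))
            acc = max(hits, len(samples) - hits) / len(samples)
            if acc > best[0]:
                best = (acc, gene, t)
    return best


def nonempty_subsets(genes):
    """Power set of the genes used by a model, without the empty set."""
    genes = sorted(genes)
    return [frozenset(c) for r in range(1, len(genes) + 1)
            for c in combinations(genes, r)]


def equivalent_models(samples, labels, min_accuracy=0.9, max_models=10):
    """Collect alternative models by excluding gene combinations and refitting."""
    all_genes = set(samples[0])
    pending = [frozenset()]   # gene sets to exclude before the next fit
    tried = {frozenset()}
    models = []
    while pending and len(models) < max_models:
        excluded = pending.pop(0)
        acc, gene, t = best_single_gene_rule(samples, labels, all_genes - excluded)
        if gene is None or acc < min_accuracy:
            continue          # assumed stopping criterion for this branch
        if (gene, t) not in [(g, th) for g, th, _ in models]:
            models.append((gene, t, acc))
        for subset in nonempty_subsets({gene}):
            candidate = excluded | subset
            if candidate not in tried:
                tried.add(candidate)
                pending.append(candidate)
    return models


# Toy expression values: gA and gB each separate the classes, gC does not.
toy_samples = [
    {"gA": 1, "gB": 9, "gC": 5},
    {"gA": 2, "gB": 8, "gC": 1},
    {"gA": 8, "gB": 2, "gC": 4},
    {"gA": 9, "gB": 1, "gC": 2},
]
toy_labels = [0, 0, 1, 1]
toy_models = equivalent_models(toy_samples, toy_labels)
```

    With a one-gene rule the power set contains a single subset, so the search is a simple chain; a learner whose rules use several genes would branch over every subset, which is where the power set step matters.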

  6. [The value of 5-HTT gene polymorphism for the assessment and prediction of male adolescence violence].

    PubMed

    Yu, Yue; Liu, Xiang; Yang, Zhen-xing; Qiu, Chang-jian; Ma, Xiao-hong

    2012-08-01

    To establish an adolescent violent crime prediction model, and to assess the value of serotonin transporter (5-HTT) gene polymorphism for the assessment and prediction of violent crime. Investigative tools were used to analyze the difference in personality dimensions, social support, coping styles, aggressiveness, impulsivity, and family condition scale between 223 adolescents with violent behavior and 148 adolescents without violent behavior. The distribution of 5-HTT gene polymorphisms (5-HTTLPR and 5-HTTVNTR) was compared between the two groups. The role of 5-HTT gene polymorphism on adolescent personality, impulsivity and aggression scales was also analyzed. Stepwise logistic regression was used to establish a predictive model for adolescent violent crime. Significant differences were found between the violence group and the control group on multiple dimensions of psychology and environment scales. However, no statistical difference was found with regard to the 5-HTT genotypes and alleles between adolescents with violent behaviors and normal controls. The rate of prediction accuracy was not significantly improved when 5-HTT gene polymorphism was taken into the model. The violent crime of adolescents was closely related with social and environmental factors. No association was found between 5-HTT polymorphisms and adolescent violent criminal behavior.

  7. Modeling the functional genomics of autism using human neurons.

    PubMed

    Konopka, G; Wexler, E; Rosen, E; Mukamel, Z; Osborn, G E; Chen, L; Lu, D; Gao, F; Gao, K; Lowe, J K; Geschwind, D H

    2012-02-01

    Human neural progenitors from a variety of sources present new opportunities to model aspects of human neuropsychiatric disease in vitro. Such in vitro models provide the advantages of a human genetic background combined with rapid and easy manipulation, making them highly useful adjuncts to animal models. Here, we examined whether a human neuronal culture system could be utilized to assess the transcriptional program involved in human neural differentiation and to model some of the molecular features of a neurodevelopmental disorder, such as autism. Primary normal human neuronal progenitors (NHNPs) were differentiated into a post-mitotic neuronal state through addition of specific growth factors and whole-genome gene expression was examined throughout a time course of neuronal differentiation. After 4 weeks of differentiation, a significant number of genes associated with autism spectrum disorders (ASDs) are either induced or repressed. This includes the ASD susceptibility gene neurexin 1, which showed a distinct pattern from neurexin 3 in vitro, and which we validated in vivo in fetal human brain. Using weighted gene co-expression network analysis, we visualized the network structure of transcriptional regulation, demonstrating via this unbiased analysis that a significant number of ASD candidate genes are coordinately regulated during the differentiation process. As NHNPs are genetically tractable and manipulable, they can be used to study both the effects of mutations in multiple ASD candidate genes on neuronal differentiation and gene expression in combination with the effects of potential therapeutic molecules. These data also provide a step towards better understanding of the signaling pathways disrupted in ASD.

  8. Isolation and characterization of multiple F-box genes linked to the S9- and S10-RNase in apple (Malus × domestica Borkh.).

    PubMed

    Okada, Kazuma; Moriya, Shigeki; Haji, Takashi; Abe, Kazuyuki

    2013-06-01

    Using 11 consensus primer pairs designed from S-linked F-box genes of apple and Japanese pear, 10 new F-box genes (MdFBX21 to 30) were isolated from the apple cultivar 'Spartan' (S(9)S(10)). MdFBX21 to 23 and MdFBX24 to 30 were completely linked to the S(9)-RNase and S(10)-RNase, respectively, and showed pollen-specific expression and S-haplotype-specific polymorphisms. Therefore, these 10 F-box genes are good candidates for the pollen determinant of self-incompatibility in apple. Phylogenetic analysis and comparison of deduced amino acid sequences of MdFBX21 to 30 with those of 25 S-linked F-box genes previously isolated from apple showed that a deduced amino acid identity of greater than 88.0% can be used as the tentative criterion to classify F-box genes into one type. Using this criterion, 31 of 35 F-box genes of apple were classified into 11 types (SFBB1-11). All types included F-box genes derived from S(3)- and S(9)-haplotypes, and seven types included F-box genes derived from S(3)-, S(9)-, and S(10)-haplotypes. Moreover, comparison of nucleotide sequences of S-RNases and multiple F-box genes among S(3)-, S(9)-, and S(10)-haplotypes suggested that F-box genes within each type showed high nucleotide identity regardless of the identity of the S-RNase. The large number of F-box genes as candidates for the pollen determinant and the high degree of conservation within each type are consistent with the collaborative non-self-recognition model reported for Petunia. These findings support the existence of the collaborative non-self-recognition system in apple.
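
    The identity criterion above (greater than 88.0% deduced amino acid identity places two F-box genes in one type) can be illustrated with a short sketch. The abstract does not say how groups were assembled, so the single-linkage grouping below is an assumption, as are the function names; the sequences are invented, pre-aligned toy strings, not real SFBB sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def group_by_identity(sequences, threshold=88.0):
    """Single-linkage grouping: join sequences whose identity exceeds the threshold."""
    parent = list(range(len(sequences)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if percent_identity(sequences[i], sequences[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(sequences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented toy sequences: the first two differ at one of ten positions.
toy_sequences = ["MKTLLVAGSA", "MKTLLVAGSG", "MRTIIVCGSA"]
toy_types = group_by_identity(toy_sequences)
```

    Real use would start from a proper alignment of the deduced amino acid sequences and would need a decision on how gap positions count toward identity, which this sketch leaves out.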

  9. Genetic variation in cell death genes and risk of non-Hodgkin lymphoma.

    PubMed

    Schuetz, Johanna M; Daley, Denise; Graham, Jinko; Berry, Brian R; Gallagher, Richard P; Connors, Joseph M; Gascoyne, Randy D; Spinelli, John J; Brooks-Wilson, Angela R

    2012-01-01

    Non-Hodgkin lymphomas are a heterogeneous group of solid tumours that constitute the 5th highest cause of cancer mortality in the United States and Canada. Poor control of cell death in lymphocytes can lead to autoimmune disease or cancer, making genes involved in programmed cell death of lymphocytes logical candidate genes for lymphoma susceptibility. We tested for genetic association with NHL and NHL subtypes, of SNPs in lymphocyte cell death genes using an established population-based study. 17 candidate genes were chosen based on biological function, with 123 SNPs tested. These included tagSNPs from HapMap and novel SNPs discovered by re-sequencing 47 cases in genes for which SNP representation was judged to be low. The main analysis, which estimated odds ratios by fitting data to an additive logistic regression model, used European ancestry samples that passed quality control measures (569 cases and 547 controls). A two-tiered approach for multiple testing correction was used: correction for number of tests within each gene by permutation-based methodology, followed by correction for the number of genes tested using the false discovery rate. Variant rs928883, near miR-155, showed an association (OR per A-allele: 2.80 [95% CI: 1.63-4.82]; p(F) = 0.027) with marginal zone lymphoma that is significant after correction for multiple testing. This is the first reported association between a germline polymorphism at a miRNA locus and lymphoma.
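
    The second tier of the correction described above, controlling the false discovery rate across genes, can be sketched with the Benjamini-Hochberg adjustment. The abstract does not name the FDR procedure used, so Benjamini-Hochberg is an assumption here; the inputs are assumed to be one gene-level p-value per gene, already corrected within the gene by permutation (that first tier is not shown). The p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented gene-level p-values, one per candidate gene.
toy_gene_p = [0.01, 0.04, 0.03, 0.005]
toy_gene_q = benjamini_hochberg(toy_gene_p)
```

    A gene would then be declared significant when its adjusted value falls below the chosen FDR level.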

  10. Production of red-flowered plants by genetic engineering of multiple flavonoid biosynthetic genes.

    PubMed

    Nakatsuka, Takashi; Abe, Yoshiko; Kakizaki, Yuko; Yamamura, Saburo; Nishihara, Masahiro

    2007-11-01

    Orange- to red-colored flowers are difficult to produce by conventional breeding techniques in some floricultural plants. This is due to the deficiency in the formation of pelargonidin, which confers orange to red colors, in their flowers. Previous researchers have reported that brick-red colored flowers can be produced by introducing a foreign dihydroflavonol 4-reductase (DFR) with different substrate specificity in Petunia hybrida, which does not accumulate pelargonidin pigments naturally. However, because these experiments used dihydrokaempferol (DHK)-accumulated mutants as transformation hosts, this strategy cannot be applied directly to other floricultural plants. Thus in this study, we attempted to produce red-flowered plants by suppressing two endogenous genes and expressing one foreign gene using tobacco as a model plant. We used a chimeric RNAi construct for suppression of two genes (flavonol synthase [FLS] and flavonoid 3'-hydroxylase [F3'H]) and expression of the gerbera DFR gene in order to accumulate pelargonidin pigments in tobacco flowers. We successfully produced red-flowered tobacco plants containing high amounts of additional pelargonidin as confirmed by HPLC analysis. The flavonol content was reduced in the transgenic plants as expected, although complete inhibition was not achieved. Expression analysis also showed that reduction of the two-targeted genes and expression of the foreign gene occurred simultaneously. These results demonstrate that flower color modification can be achieved by multiple gene regulation without use of mutants if the vector constructs are designed resourcefully.

  11. A diffusion model for the fate of tandem gene duplicates in diploids.

    PubMed

    O'Hely, Martin

    2007-06-01

    Suppose one chromosome in one member of a population somehow acquires a duplicate copy of the gene, fully linked to the original gene's locus. Preservation is the event that eventually every chromosome in the population is a descendant of the one which initially carried the duplicate. For a haploid population in which the absence of all copies of the gene is lethal, the probability of preservation has recently been estimated via a diffusion approximation. That approximation is shown to carry over to the case of diploids and arbitrary strong selection against the absence of the gene. The techniques used lead to some new results. In the large population limit, it is shown that the relative probability that descendants of a small number of individuals carrying multiple copies of the gene fix in the population is proportional to the number of copies carried. The probability of preservation is approximated when chromosomes carrying two copies of the gene are subject to additional, fully non-functionalizing mutations, thereby modelling either an additional cost of replicating a longer genome, or a partial duplication of the gene. In the latter case the preservation probability depends only on the mutation rate to null for the duplicated portion of the gene.

  12. Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence

    PubMed Central

    Bao, Yun-Juan; Liang, Zhong; Mayfield, Jeffrey A.; McShan, William M.; Lee, Shaun W.; Ploplis, Victoria A.; Castellino, Francis J.

    2016-01-01

    Symmetric genomic rearrangements around replication axes in genomes are commonly observed in prokaryotic genomes, including Group A Streptococcus (GAS). However, asymmetric rearrangements are rare. Our previous studies showed that the hypervirulent invasive GAS strain, M23ND, containing an inactivated transcriptional regulator system, covRS, exhibits unique extensive asymmetric rearrangements, which reconstructed a genomic structure distinct from other GAS genomes. In the current investigation, we identified the rearrangement events and examined the genetic consequences and evolutionary implications underlying the rearrangements. By comparison with a close phylogenetic relative, M18-MGAS8232, we propose a molecular model wherein a series of asymmetric rearrangements have occurred in M23ND, involving translocations, inversions and integrations mediated by multiple factors, viz., rRNA-comX (factor for late competence), transposons and phage-encoded gene segments. Assessments of the cumulative gene orientations and GC skews reveal that the asymmetric genomic rearrangements did not affect the general genomic integrity of the organism. However, functional distributions reveal re-clustering of a broad set of CovRS-regulated actively transcribed genes, including virulence factors and metabolic genes, to the same leading strand, with high confidence (p-value ~10^-10). The re-clustering of the genes suggests a potential selection advantage for the spatial proximity to the transcription complexes, which may contain the global transcriptional regulator, CovRS, and other RNA polymerases. Their proximities allow for efficient transcription of the genes required for growth, virulence and persistence. A new paradigm of survival strategies of GAS strains is provided through multiple genomic rearrangements, while, at the same time, maintaining genomic integrity. PMID:27329479

  13. Transcriptional Changes in Canine Distemper Virus-Induced Demyelinating Leukoencephalitis Favor a Biphasic Mode of Demyelination

    PubMed Central

    Ulrich, Reiner; Puff, Christina; Wewetzer, Konstantin; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang

    2014-01-01

    Canine distemper virus (CDV)-induced demyelinating leukoencephalitis in dogs (Canis familiaris) is suggested to represent a naturally occurring translational model for subacute sclerosing panencephalitis and multiple sclerosis in humans. The aim of this study was a hypothesis-free microarray analysis of the transcriptional changes within cerebellar specimens of five cases of acute, six cases of subacute demyelinating, and three cases of chronic demyelinating and inflammatory CDV leukoencephalitis as compared to twelve non-infected control dogs. Frozen cerebellar specimens were used for analysis of histopathological changes including demyelination, transcriptional changes employing microarrays, and presence of CDV nucleoprotein RNA and protein using microarrays, RT-qPCR and immunohistochemistry. Microarray analysis revealed 780 differentially expressed probe sets. The dominating change was an up-regulation of genes related to the innate and the humoral immune response, and less distinct the cytotoxic T-cell-mediated immune response in all subtypes of CDV leukoencephalitis as compared to controls. Multiple myelin genes including myelin basic protein and proteolipid protein displayed a selective down-regulation in subacute CDV leukoencephalitis, suggestive of an oligodendrocyte dystrophy. In contrast, a marked up-regulation of multiple immunoglobulin-like expressed sequence tags and the delta polypeptide of the CD3 antigen was observed in chronic CDV leukoencephalitis, in agreement with the hypothesis of an immune-mediated demyelination in the late inflammatory phase of the disease. Analysis of pathways intimately linked to demyelination as determined by morphometry employing correlation-based Gene Set Enrichment Analysis highlighted the pathomechanistic importance of up-regulated genes comprised by the gene ontology terms “viral replication” and “humoral immune response” as well as down-regulated genes functionally related to “metabolite and energy generation”. 
PMID:24755553

  14. Transcriptional changes in canine distemper virus-induced demyelinating leukoencephalitis favor a biphasic mode of demyelination.

    PubMed

    Ulrich, Reiner; Puff, Christina; Wewetzer, Konstantin; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang

    2014-01-01

    Canine distemper virus (CDV)-induced demyelinating leukoencephalitis in dogs (Canis familiaris) is suggested to represent a naturally occurring translational model for subacute sclerosing panencephalitis and multiple sclerosis in humans. The aim of this study was a hypothesis-free microarray analysis of the transcriptional changes within cerebellar specimens of five cases of acute, six cases of subacute demyelinating, and three cases of chronic demyelinating and inflammatory CDV leukoencephalitis as compared to twelve non-infected control dogs. Frozen cerebellar specimens were used for analysis of histopathological changes including demyelination, transcriptional changes employing microarrays, and presence of CDV nucleoprotein RNA and protein using microarrays, RT-qPCR and immunohistochemistry. Microarray analysis revealed 780 differentially expressed probe sets. The dominating change was an up-regulation of genes related to the innate and the humoral immune response, and less distinct the cytotoxic T-cell-mediated immune response in all subtypes of CDV leukoencephalitis as compared to controls. Multiple myelin genes including myelin basic protein and proteolipid protein displayed a selective down-regulation in subacute CDV leukoencephalitis, suggestive of an oligodendrocyte dystrophy. In contrast, a marked up-regulation of multiple immunoglobulin-like expressed sequence tags and the delta polypeptide of the CD3 antigen was observed in chronic CDV leukoencephalitis, in agreement with the hypothesis of an immune-mediated demyelination in the late inflammatory phase of the disease. Analysis of pathways intimately linked to demyelination as determined by morphometry employing correlation-based Gene Set Enrichment Analysis highlighted the pathomechanistic importance of up-regulated genes comprised by the gene ontology terms "viral replication" and "humoral immune response" as well as down-regulated genes functionally related to "metabolite and energy generation".

  15. Associations and interactions between SNPs in the alcohol metabolizing genes and alcoholism phenotypes in European Americans.

    PubMed

    Sherva, Richard; Rice, John P; Neuman, Rosalind J; Rochberg, Nanette; Saccone, Nancy L; Bierut, Laura J

    2009-05-01

    Alcohol dependence is a major cause of morbidity and mortality worldwide and has a strong familial component. Several linkage and association studies have identified chromosomal regions and/or genes that affect alcohol consumption, notably in genes involved in the 2-stage pathway of alcohol metabolism. Here, we use multiple regression models to test for associations and interactions between 2 alcohol-related phenotypes and SNPs in 17 genes involved in alcohol metabolism in a sample of 1,588 European American subjects. The strongest evidence for association after correcting for multiple testing was between rs1229984, a nonsynonymous coding SNP in ADH1B, and DSM-IV symptom count (p = 0.0003). This SNP was also associated with maximum number of drinks in 24 hours (p = 0.0004). Each minor allele at this SNP predicts 45% fewer DSM-IV symptoms and 18% fewer max drinks. Another SNP in a splice site in ALDH1A1 (rs8187974) showed evidence for association with both phenotypes as well (p = 0.02 and 0.004, respectively), but neither association was significant after accounting for multiple testing. Minor alleles at this SNP predict greater alcohol consumption. In addition, pairwise interactions were observed between SNPs in several genes (p = 0.00002). We replicated the large effect of rs1229984 on alcohol behavior, and although not common (MAF = 4%), this polymorphism may be highly relevant from a public health perspective in European Americans. Another SNP, rs8187974, may also affect alcohol behavior but requires replication. Also, interactions between polymorphisms in genes involved in alcohol metabolism are likely determinants of the parameters that ultimately affect alcohol consumption.

  16. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    PubMed Central

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion: Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any new identified gene set. PMID:21423775
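
    The penalized Cox fit mentioned in this abstract minimizes the negative log partial likelihood plus an L2 penalty on the coefficients. The sketch below only evaluates that objective for a given coefficient vector; it is not the authors' code. It assumes the standard (Breslow-style) partial likelihood, omits the optimizer and the cross-validation used to choose the penalty weight, and all names and the toy data are hypothetical.

```python
from math import exp, log


def penalized_cox_objective(beta, covariates, times, events, penalty):
    """Negative log partial likelihood plus an L2 (ridge) penalty on beta."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    log_lik = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        log_lik += scores[i] - log(risk_sum)
    return -log_lik + penalty * sum(b * b for b in beta)


# Toy data: one expression covariate, three subjects, all with observed events.
toy_x = [[1.0], [0.0], [2.0]]
toy_times = [1, 2, 3]
toy_events = [1, 1, 1]
toy_value = penalized_cox_objective([0.0], toy_x, toy_times, toy_events, penalty=5.0)
```

    At beta = 0 every subject has the same hazard, so the objective reduces to the sum of the log risk-set sizes (log 3 + log 2 + log 1 here), which is a convenient sanity check before handing the function to a numerical optimizer.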

  17. Thrombomodulin gene variants are associated with increased mortality after coronary artery bypass surgery in replicated analyses.

    PubMed

    Lobato, Robert L; White, William D; Mathew, Joseph P; Newman, Mark F; Smith, Peter K; McCants, Charles B; Alexander, John H; Podgoreanu, Mihai V

    2011-09-13

    We tested the hypothesis that genetic variation in thrombotic and inflammatory pathways is independently associated with long-term mortality after coronary artery bypass graft (CABG) surgery. Two separate cohorts of patients undergoing CABG surgery at a single institution were examined, and all-cause mortality between 30 days and 5 years after the index CABG was ascertained from the National Death Index. In a discovery cohort of 1018 patients, a panel of 90 single-nucleotide polymorphisms (SNPs) in 49 candidate genes was tested with Cox proportional hazard models to identify clinical and genomic multivariate predictors of incident death. After adjustment for multiple comparisons and clinical predictors of mortality, the homozygote minor allele of a common variant in the thrombomodulin (THBD) gene (rs1042579) was independently associated with significantly increased risk of all-cause mortality (hazard ratio, 2.26; 95% CI, 1.31 to 3.92; P=0.003). Six tag SNPs in the THBD gene, 1 of which (rs3176123) is in complete linkage disequilibrium with rs1042579, were then assessed in an independent validation cohort of 930 patients. After multivariate adjustment for the clinical predictors identified in the discovery cohort and multiple testing, the homozygote minor allele of rs3176123 independently predicted all-cause mortality (hazard ratio, 3.6; 95% CI, 1.67 to 7.78; P=0.001). In 2 independent cardiac surgery cohorts, linked common allelic variants in the THBD gene are independently associated with increased long-term mortality risk after CABG and significantly improve the classification ability of traditional postoperative mortality prediction models.

  18. Candidate genes, pathways and mechanisms for alcoholism: an expanded convergent functional genomics approach.

    PubMed

    Rodd, Z A; Bertsch, B A; Strother, W N; Le-Niculescu, H; Balaraman, Y; Hayden, E; Jerome, R E; Lumeng, L; Nurnberger, J I; Edenberg, H J; McBride, W J; Niculescu, A B

    2007-08-01

    We describe a comprehensive translational approach for identifying candidate genes for alcoholism. The approach relies on the cross-matching of animal model brain gene expression data with human genetic linkage data, as well as human tissue data and biological roles data, an approach termed convergent functional genomics. An analysis of three animal model paradigms, based on inbred alcohol-preferring (iP) and alcohol-non-preferring (iNP) rats, and their response to treatments with alcohol, was used. A comprehensive analysis of microarray gene expression data from five key brain regions (frontal cortex, amygdala, caudate-putamen, nucleus accumbens and hippocampus) was carried out. The Bayesian-like integration of multiple independent lines of evidence, each by itself lacking sufficient discriminatory power, led to the identification of high probability candidate genes, pathways and mechanisms for alcoholism. These data reveal that alcohol has pleiotropic effects on multiple systems, which may explain the diverse neuropsychiatric and medical pathology in alcoholism. Some of the pathways identified suggest avenues for pharmacotherapy of alcoholism with existing agents, such as angiotensin-converting enzyme (ACE) inhibitors. Experiments we carried out in alcohol-preferring rats with an ACE inhibitor show a marked modulation of alcohol intake. Other pathways are new potential targets for drug development. The emergent overall picture is that physical and physiological robustness may permit alcohol-preferring individuals to withstand the aversive effects of alcohol. In conjunction with a higher reactivity to its rewarding effects, they may be able to ingest enough of this nonspecific drug for a strong hedonic and addictive effect to occur.

  19. Serotonin transporter gene and childhood trauma--a G × E effect on anxiety sensitivity.

    PubMed

    Klauke, Benedikt; Deckert, Jürgen; Reif, Andreas; Pauli, Paul; Zwanzger, Peter; Baumann, Christian; Arolt, Volker; Glöckner-Rist, Angelika; Domschke, Katharina

    2011-12-21

    Genetic factors and environmental factors are assumed to interactively influence the pathogenesis of anxiety disorders. Thus, a gene-environment interaction (G × E) study was conducted with respect to anxiety sensitivity (AS) as a promising intermediate phenotype of anxiety disorders. Healthy subjects (N = 363) were assessed for AS, childhood maltreatment (Childhood Trauma Questionnaire), and genotyped for functional serotonin transporter gene variants (5-HTTLPR/5-HTT rs25531). The influence of genetic and environmental variables on AS and its subdimensions was determined by a step-wise hierarchical regression and a multiple indicator multiple cause (MIMIC) model. A significant G × E effect of the more active 5-HTT genotypes and childhood maltreatment on AS was observed. Furthermore, genotype (LL)-childhood trauma interaction particularly influenced somatic AS subdimensions, whereas cognitive subdimensions were affected by childhood maltreatment only. Results indicate a G × E effect of the more active 5-HTT genotypes and childhood maltreatment on AS, with particular impact on its somatic subcomponent. © 2011 Wiley Periodicals, Inc.

  20. Multiple Site-Directed and Saturation Mutagenesis by the Patch Cloning Method.

    PubMed

    Taniguchi, Naohiro; Murakami, Hiroshi

    2017-01-01

    Constructing protein-coding genes with desired mutations is a basic step for protein engineering. Herein, we describe a multiple site-directed and saturation mutagenesis method, termed MUPAC. This method has been used to introduce multiple site-directed mutations in the green fluorescent protein gene and in the Moloney murine leukemia virus reverse transcriptase gene. Moreover, this method was also successfully used to introduce randomized codons at five desired positions in the green fluorescent protein gene, and for simple DNA assembly for cloning.

  1. Three Approaches to Modeling Gene-Environment Interactions in Longitudinal Family Data: Gene-Smoking Interactions in Blood Pressure.

    PubMed

    Basson, Jacob; Sung, Yun Ju; de Las Fuentes, Lisa; Schwander, Karen L; Vazquez, Ana; Rao, Dabeeru C

    2016-01-01

    Blood pressure (BP) has been shown to be substantially heritable, yet identified genetic variants explain only a small fraction of the heritability. Gene-smoking interactions have detected novel BP loci in cross-sectional family data. Longitudinal family data are available and have additional promise to identify BP loci. However, this type of data presents unique analysis challenges. Although several methods for analyzing longitudinal family data are available, which method is the most appropriate and under what conditions has not been fully studied. Using data from three clinic visits from the Framingham Heart Study, we performed association analysis accounting for gene-smoking interactions in BP at 31,203 markers on chromosome 22. We evaluated three different modeling frameworks: generalized estimating equations (GEE), hierarchical linear modeling, and pedigree-based mixed modeling. The three models performed somewhat comparably, with multiple overlaps in the most strongly associated loci from each model. Loci with the greatest significance were more strongly supported in the longitudinal analyses than in any of the component single-visit analyses. The pedigree-based mixed model was more conservative, with less inflation in the variant main effect and greater deflation in the gene-smoking interactions. The GEE, but not the other two models, resulted in substantial inflation in the tail of the distribution when variants with minor allele frequency <1% were included in the analysis. The choice of analysis method should depend on the model and the structure and complexity of the familial and longitudinal data. © 2015 WILEY PERIODICALS, INC.
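
    As a rough illustration of what "accounting for gene-smoking interactions" means, the mean model shared by all three frameworks can be written as below. The notation is an assumption, not taken from the paper: $i$ indexes families, $j$ individuals, $t$ clinic visits, $G$ is the coded genotype at one marker, $S$ is smoking status, and $\mathbf{z}$ collects any adjustment covariates.

```latex
\mathbb{E}\left[\mathrm{BP}_{ijt}\right]
  = \beta_0
  + \beta_G \, G_{ij}
  + \beta_S \, S_{ijt}
  + \beta_{GS} \, G_{ij} S_{ijt}
  + \boldsymbol{\gamma}^{\top} \mathbf{z}_{ijt}
```

    In this sketch $\beta_G$ is the variant main effect and $\beta_{GS}$ the gene-smoking interaction. The frameworks differ mainly in how they treat correlation among repeated visits and relatives: GEE through a working correlation structure, hierarchical linear and pedigree-based mixed models through random effects added to this fixed-effect part.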

  2. A double-strand break can trigger immunoglobulin gene conversion

    PubMed Central

    Bastianello, Giulia; Arakawa, Hiroshi

    2017-01-01

    All three B cell-specific activities of the immunoglobulin (Ig) gene re-modeling system—gene conversion, somatic hypermutation and class switch recombination—require activation-induced deaminase (AID). AID-induced DNA lesions must be further processed and dissected into different DNA recombination pathways. In order to characterize potential intermediates for Ig gene conversion, we inserted an I-SceI recognition site into the complementarity determining region 1 (CDR1) of the Ig light chain locus of the AID knockout DT40 cell line, and conditionally expressed I-SceI endonuclease. Here, we show that a double-strand break (DSB) in CDR1 is sufficient to trigger Ig gene conversion in the absence of AID. The pattern and pseudogene usage of DSB-induced gene conversion were comparable to those of AID-induced gene conversion; surprisingly, sometimes a single DSB induced multiple gene conversion events. These constitute direct evidence that a DSB in the V region can be an intermediate for gene conversion. The fate of the DNA lesion downstream of a DSB had more flexibility than that of AID, suggesting two alternative models: (i) DSBs during the physiological gene conversion are in the minority compared to single-strand breaks (SSBs), which are frequently generated following DNA deamination, or (ii) the physiological gene conversion is mediated by a tightly regulated DSB that is locally protected from non-homologous end joining (NHEJ) or other non-homologous DNA recombination machineries. PMID:27701075

  3. Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction

    NASA Astrophysics Data System (ADS)

    Hansen, Matthew; Everett, Logan; Singh, Larry; Hannenhalli, Sridhar

    Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue-types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach where a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in an (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.
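
    The mixture idea can be sketched in a few lines. This is not the Mimosa implementation: it assumes standardized expression values, a two-component mixture (correlated with coefficient `rho` in a fraction `pi` of samples, uncorrelated otherwise), and a coarse grid search in place of whatever optimizer the authors used. All names and the synthetic data are illustrative.

```python
import math
import random

def log_bvn(x, y, rho):
    """Log density of a bivariate normal with zero means, unit variances
    and correlation rho."""
    one_minus = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / one_minus
    return -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus) - 0.5 * quad

def fit_mixture(pairs, step=0.05):
    """Grid-search maximum likelihood estimate of (pi, rho).

    pi  : fraction of samples in which the TF-gene pair is correlated
    rho : correlation coefficient inside that subset
    Returns (log_likelihood, pi, rho) for the best grid point.
    """
    # Density of every sample under the uncorrelated component.
    base = [math.exp(log_bvn(x, y, 0.0)) for x, y in pairs]
    best = (-math.inf, None, None)
    n_rho = int(round(0.95 / step))
    n_pi = int(round(1.0 / step))
    for k in range(-n_rho, n_rho + 1):
        if k == 0:
            continue  # rho = 0 would make the two components identical
        rho = k * step
        corr = [math.exp(log_bvn(x, y, rho)) for x, y in pairs]
        for j in range(1, n_pi):
            pi = j * step
            ll = sum(math.log(pi * c + (1.0 - pi) * b)
                     for c, b in zip(corr, base))
            if ll > best[0]:
                best = (ll, pi, rho)
    return best

def simulate_pairs(n, pi, rho, seed=0):
    """Synthetic TF/gene expression pairs; correlated with probability pi."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        if rng.random() < pi:
            y = rho * x + math.sqrt(1.0 - rho * rho) * z
        else:
            y = z
        pairs.append((x, y))
    return pairs

pairs = simulate_pairs(600, pi=0.5, rho=0.9, seed=1)
ll, pi_hat, rho_hat = fit_mixture(pairs)
print(f"estimated pi={pi_hat:.2f}, rho={rho_hat:.2f}")
```

    Once `pi` and `rho` are fitted, each sample's posterior probability of belonging to the correlated component can be computed from the same two densities, which is the quantity one would then relate to candidate modifier genes.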

  4. Gene Editing and Human Pluripotent Stem Cells: Tools for Advancing Diabetes Disease Modeling and Beta-Cell Development.

    PubMed

    Millette, Katelyn; Georgia, Senta

    2017-10-05

    This review will focus on the multiple approaches to gene editing and address the potential use of genetically modified human pluripotent stem cell-derived beta cells (SC-β) as a tool to study human beta-cell development and model their function in diabetes. We will explore how new variations of CRISPR/Cas9 gene editing may accelerate our understanding of beta-cell developmental biology, elucidate novel mechanisms that establish and regulate beta-cell function, and assist in pioneering new therapeutic modalities for treating diabetes. Improvements in CRISPR/Cas9 target specificity and homology-directed recombination continue to advance its use in engineering stem cells to model and potentially treat disease. We will review how CRISPR/Cas9 gene editing is informing our understanding of beta-cell development and expanding the therapeutic possibilities for treating diabetes and other diseases. Here we focus on the emerging use of gene editing technology, specifically CRISPR/Cas9, as a means of manipulating human gene expression to gain novel insights into the roles of key factors in beta-cell development and function. Taken together, the combined use of SC-β cells and CRISPR/Cas9 gene editing will shed new light on human beta-cell development and function and accelerate our progress towards developing new therapies for patients with diabetes.

  5. Noninvasive optical monitoring multiple physiological parameters response to cytokine storm

    NASA Astrophysics Data System (ADS)

    Li, Zebin; Li, Ting

    2018-02-01

    Cancer and other diseases originating from immune or genetic problems have become a leading cause of death. Gene/cell therapy is a promising method for treating these diseases. However, treatment frequently causes cytokine storm, which can trigger acute respiratory distress syndrome and multiple organ failure. Here we developed a point-of-care device for simultaneous, noninvasive monitoring of multiple physiological parameters affected by cytokine storm. Oxy-hemoglobin, deoxy-hemoglobin, water concentration and deep-tissue/tumor temperature variations were simultaneously measured by extended near infrared spectroscopy. Detection algorithms for symptoms such as shock, edema, deep-tissue fever and tissue fibrosis were developed and included. Based on these measurements, patient tolerance and cytokine storm intensity were modeled. This custom device was tested on patients experiencing cytokine storm in an intensive care unit. The preliminary data indicated the potential of our device in popular, milestone gene/cell therapies, especially chimeric antigen receptor T-cell immunotherapy (CAR-T).

  6. Plasmacytomagenesis in Eμ-v-abl transgenic mice is accelerated when apoptosis is restrained

    PubMed Central

    Vandenberg, Cassandra J.; Waring, Paul; Strasser, Andreas

    2014-01-01

    Mice susceptible to plasma cell tumors provide a useful model for human multiple myeloma. We previously showed that mice expressing an Eµ-v-abl oncogene solely develop plasmacytomas. Here we show that loss of the proapoptotic BH3-only protein Bim or, to a lesser extent, overexpression of antiapoptotic Bcl-2 or Mcl-1, significantly accelerated the development of plasmacytomas and increased their incidence. Disease was preceded by an increased abundance of plasma cells, presumably reflecting their enhanced survival capacity in vivo. Plasmacytomas of each genotype expressed high levels of v-abl and frequently harbored a rearranged c-myc gene, probably as a result of chromosome translocation. As in human multiple myelomas, elevated expression of cyclin D genes was common, and p53 deregulation was rare. Our results for plasmacytomas highlight the significance of antiapoptotic changes in multiple myeloma, which include elevated expression of Mcl-1 and, less frequently, Bcl-2, and suggest that closer attention to defects in Bim expression is warranted. PMID:24986687

  7. A Systems' Biology Approach to Study MicroRNA-Mediated Gene Regulatory Networks

    PubMed Central

    Kunz, Manfred; Vera, Julio; Wolkenhauer, Olaf

    2013-01-01

    MicroRNAs (miRNAs) are potent effectors in gene regulatory networks where aberrant miRNA expression can contribute to human diseases such as cancer. For a better understanding of the regulatory role of miRNAs in coordinating gene expression, we here present a systems biology approach combining data-driven modeling and model-driven experiments. Such an approach is characterized by an iterative process, including biological data acquisition and integration, network construction, mathematical modeling and experimental validation. To demonstrate the application of this approach, we adopt it to investigate mechanisms of collective repression on p21 by multiple miRNAs. We first construct a p21 regulatory network based on data from the literature and further expand it using algorithms that predict molecular interactions. Based on the network structure, a detailed mechanistic model is established and its parameter values are determined using data. Finally, the calibrated model is used to study the effect of different miRNA expression profiles and cooperative target regulation on p21 expression levels in different biological contexts. PMID:24350286
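
    To make "collective repression by multiple miRNAs" concrete, the following toy model treats each miRNA as adding an independent degradation term for the target mRNA. It is a minimal sketch under invented rate constants, not the calibrated p21 model from the paper; every parameter value and function name is hypothetical.

```python
def steady_state(k_syn, k_deg, mirnas):
    """Analytical steady state of dm/dt = k_syn - (k_deg + sum(k_i * M_i)) * m.

    mirnas is a list of (repression_rate, miRNA_level) tuples.
    """
    return k_syn / (k_deg + sum(k * level for k, level in mirnas))

def simulate_mrna(k_syn, k_deg, mirnas, t_end=200.0, dt=0.01, m0=0.0):
    """Integrate the same equation with the explicit Euler method."""
    loss = k_deg + sum(k * level for k, level in mirnas)
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += dt * (k_syn - loss * m)
    return m

# Hypothetical parameters: synthesis rate 1.0, basal decay rate 0.1.
profiles = {
    "no miRNA": [],
    "one miRNA": [(0.05, 2.0)],
    "two miRNAs": [(0.05, 2.0), (0.1, 1.0)],
}
for name, mirnas in profiles.items():
    analytic = steady_state(1.0, 0.1, mirnas)
    numeric = simulate_mrna(1.0, 0.1, mirnas)
    print(f"{name}: steady state {analytic:.3f} (Euler: {numeric:.3f})")
```

    Comparing expression profiles this way mirrors the last step described in the abstract, where a calibrated model is evaluated under different miRNA levels; cooperative effects between miRNAs would need an extra interaction term that this sketch omits.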

  8. An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci

    PubMed Central

    Ju, Jin Hyun; Crystal, Ronald G.

    2017-01-01

    Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. 
In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL. PMID:28505156

  9. An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci.

    PubMed

    Ju, Jin Hyun; Shenoy, Sushila A; Crystal, Ronald G; Mezey, Jason G

    2017-05-01

    Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. 
In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.

  10. Three gene expression vector sets for concurrently expressing multiple genes in Saccharomyces cerevisiae.

    PubMed

    Ishii, Jun; Kondo, Takashi; Makino, Harumi; Ogura, Akira; Matsuda, Fumio; Kondo, Akihiko

    2014-05-01

    Yeast has the potential to be used in bulk-scale fermentative production of fuels and chemicals due to its tolerance for low pH and robustness for autolysis. However, expression of multiple external genes in one host yeast strain is considerably labor-intensive due to the lack of polycistronic transcription. To promote the metabolic engineering of yeast, we generated systematic and convenient genetic engineering tools to express multiple genes in Saccharomyces cerevisiae. We constructed a series of multi-copy and integration vector sets for concurrently expressing two or three genes in S. cerevisiae by embedding three classical promoters. The comparative expression capabilities of the constructed vectors were monitored with green fluorescent protein, and the concurrent expression of genes was monitored with three different fluorescent proteins. Our multiple gene expression tool will be helpful to the advanced construction of genetically engineered yeast strains in a variety of research fields other than metabolic engineering. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  11. Positive Selection of Plasmodium falciparum Parasites With Multiple var2csa-Type PfEMP1 Genes During the Course of Infection in Pregnant Women

    PubMed Central

    Salanti, Ali; Lavstsen, Thomas; Nielsen, Morten A.; Theander, Thor G.; Leke, Rose G. F.; Lo, Yeung Y.; Bobbili, Naveen; Arnot, David E.; Taylor, Diane W.

    2011-01-01

    Placental malaria infections are caused by Plasmodium falciparum–infected red blood cells sequestering in the placenta by binding to chondroitin sulfate A, mediated by VAR2CSA, a variant of the PfEMP1 family of adhesion antigens. Recent studies have shown that many P. falciparum genomes have multiple genes coding for different VAR2CSA proteins, and parasites with >1 var2csa gene appear to be more common in pregnant women with placental malaria than in nonpregnant individuals. We present evidence that, in pregnant women, parasites containing multiple var2csa-type genes possess a selective advantage over parasites with a single var2csa gene. Accumulation of parasites with multiple copies of the var2csa gene during the course of pregnancy was also correlated with the development of antibodies involved in blocking VAR2CSA adhesion. The data suggest that multiplicity of var2csa-type genes enables P. falciparum parasites to persist for a longer period of time during placental infections, probably because of their greater capacity for antigenic variation and evasion of variant-specific immune responses. PMID:21592998

  12. Genomics of Natural Populations: How Differentially Expressed Genes Shape the Evolution of Chromosomal Inversions in Drosophila pseudoobscura

    PubMed Central

    Fuller, Zachary L.; Haynes, Gwilym D.; Richards, Stephen; Schaeffer, Stephen W.

    2016-01-01

    Chromosomal rearrangements can shape the structure of genetic variation in the genome directly through alteration of genes at breakpoints or indirectly by holding combinations of genetic variants together due to reduced recombination. The third chromosome of Drosophila pseudoobscura is a model system to test hypotheses about how rearrangements are established in populations because its third chromosome is polymorphic for >30 gene arrangements that were generated by a series of overlapping inversion mutations. Circumstantial evidence has suggested that these gene arrangements are selected. Despite the expected homogenizing effects of extensive gene flow, the frequencies of arrangements form gradients or clines in nature, which have been stable since the system was first described >80 years ago. Furthermore, multiple arrangements exist at appreciable frequencies across several ecological niches providing the opportunity for heterokaryotypes to form. In this study, we tested whether genes are differentially expressed among chromosome arrangements in first instar larvae, adult females and males. In addition, we asked whether transcriptional patterns in heterokaryotypes are dominant, semidominant, overdominant, or underdominant. We find evidence for a significant abundance of differentially expressed genes across the inverted regions of the third chromosome, including an enrichment of genes involved in sensory perception for males. We find the majority of loci show additivity in heterokaryotypes. Our results suggest that multiple genes have expression differences among arrangements that were either captured by the original inversion mutation or accumulated after it reached polymorphic frequencies, providing a potential source of genetic variation for selection to act upon. 
These data suggest that the inversions are favored because of their indirect effect of recombination suppression that has held different combinations of differentially expressed genes together in the various gene arrangement backgrounds. PMID:27401754

  13. System-level insights into the cellular interactome of a non-model organism: inferring, modelling and analysing functional gene network of soybean (Glycine max).

    PubMed

    Xu, Yungang; Guo, Maozu; Zou, Quan; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang

    2014-01-01

    The cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory-, protein interaction-, metabolic-, signal transduction networks, etc., has attracted decades of research attention. However, any one specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. Functional gene networks (FGNs), consolidated interaction networks that model a fuzzy and more generalized notion of gene-gene relations, have been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are as yet no successful precedents of FGNs on sparsely studied non-model organisms, such as soybean (Glycine max), due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs), in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis.
The efforts of the study are the basis of our further comprehensive studies on the soybean functional interactome at the genome and microRNome levels. Additionally, a web tool for information retrieval and analysis of SoyFGNs can be accessed at SoyFN: http://nclab.hit.edu.cn/SoyFN.

  14. System-Level Insights into the Cellular Interactome of a Non-Model Organism: Inferring, Modelling and Analysing Functional Gene Network of Soybean (Glycine max)

    PubMed Central

    Xu, Yungang; Guo, Maozu; Zou, Quan; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang

    2014-01-01

    The cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory-, protein interaction-, metabolic-, signal transduction networks, etc., has attracted decades of research attention. However, any one specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. Functional gene networks (FGNs), consolidated interaction networks that model a fuzzy and more generalized notion of gene-gene relations, have been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are as yet no successful precedents of FGNs on sparsely studied non-model organisms, such as soybean (Glycine max), due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs), in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis.
The efforts of the study are the basis of our further comprehensive studies on the soybean functional interactome at the genome and microRNome levels. Additionally, a web tool for information retrieval and analysis of SoyFGNs can be accessed at SoyFN: http://nclab.hit.edu.cn/SoyFN. PMID:25423109

  15. Pyrethroid Resistance in Malaysian Populations of Dengue Vector Aedes aegypti Is Mediated by CYP9 Family of Cytochrome P450 Genes

    PubMed Central

    Ishak, Intan H.; Kamgang, Basile; Ibrahim, Sulaiman S.; Riveron, Jacob M.; Irving, Helen

    2017-01-01

    Background Dengue control and prevention rely heavily on insecticide-based interventions. However, insecticide resistance in the dengue vector Aedes aegypti, threatens the continued effectiveness of these tools. The molecular basis of the resistance remains uncharacterised in many endemic countries including Malaysia, preventing the design of evidence-based resistance management. Here, we investigated the underlying molecular basis of multiple insecticide resistance in Ae. aegypti populations across Malaysia detecting the major genes driving the metabolic resistance. Methodology/Principal Findings Genome-wide microarray-based transcription analysis was carried out to detect the genes associated with metabolic resistance in these populations. Comparisons of the susceptible New Orleans strain to three non-exposed multiple insecticide resistant field strains; Penang, Kuala Lumpur and Kota Bharu detected 2605, 1480 and 425 differentially expressed transcripts respectively (fold-change>2 and p-value ≤ 0.05). 204 genes were commonly over-expressed with monooxygenase P450 genes (CYP9J27, CYP6CB1, CYP9J26 and CYP9M4) consistently the most up-regulated detoxification genes in all populations, indicating that they possibly play an important role in the resistance. In addition, glutathione S-transferases, carboxylesterases and other gene families commonly associated with insecticide resistance were also over-expressed. Gene Ontology (GO) enrichment analysis indicated an over-representation of GO terms linked to resistance such as monooxygenases, carboxylesterases, glutathione S-transferases and heme-binding. Polymorphism analysis of CYP9J27 sequences revealed a high level of polymorphism (except in Joho Bharu), suggesting a limited directional selection on this gene. 
In silico analysis of CYP9J27 activity through modelling and docking simulations suggested that this gene is involved in the multiple resistance in Malaysian populations as it is predicted to metabolise pyrethroids, DDT and bendiocarb. Conclusion/significance The predominant over-expression of cytochrome P450s suggests that synergist-based (PBO) control tools could be utilised to improve control of this major dengue vector across Malaysia. PMID:28114328
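
    The transcript filter quoted above (fold-change > 2 and p-value ≤ 0.05) can be illustrated with a minimal sketch. The transcript names and values below are hypothetical, and treating the fold change as a symmetric ratio (so that a 4-fold decrease also passes) is an assumption; the abstract does not say how down-regulated transcripts were handled.

```python
def passes_filter(ratio, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Apply the two thresholds quoted in the abstract to one transcript.

    `ratio` is the resistant/susceptible expression ratio (must be > 0).
    Assumption: fold change is symmetric, so 0.25 counts as a 4-fold change.
    """
    fold = ratio if ratio >= 1.0 else 1.0 / ratio
    return fold > fc_cutoff and p_value <= p_cutoff


# Hypothetical records: (name, expression ratio, p-value).
records = [
    ("transcript_A", 5.0, 0.001),   # passes both thresholds
    ("transcript_B", 1.5, 0.001),   # fold change too small
    ("transcript_C", 0.25, 0.05),   # 4-fold decrease, p exactly at the cutoff
    ("transcript_D", 8.0, 0.20),    # p-value too large
]

selected = [name for name, ratio, p in records if passes_filter(ratio, p)]
print(selected)
```

    Note that the fold-change cutoff is strict while the p-value cutoff is inclusive, matching the inequality signs in the abstract.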

  16. Matrix metalloproteinases and educational attainment in refractive error: evidence of gene-environment interactions in the AREDS study

    PubMed Central

    Wojciechowski, Robert; Yee, Stephanie S.; Simpson, Claire L.; Bailey-Wilson, Joan E.; Stambolian, Dwight

    2012-01-01

    Purpose A previous study of Old Order Amish families has shown association of ocular refraction with markers proximal to matrix metalloproteinase (MMP) genes MMP1 and MMP10 and intragenic to MMP2. We conducted a candidate gene replication study of association between refraction and single nucleotide polymorphisms (SNPs) within these genomic regions. Design Candidate gene genetic association study. Participants 2,000 participants drawn from the Age Related Eye Disease Study (AREDS) were chosen for genotyping. After quality control filtering, 1912 individuals were available for analysis. Methods Microarray genotyping was performed using the HumanOmni 2.5 bead array. SNPs originally typed in the previous Amish association study were extracted for analysis. In addition, haplotype tagging SNPs were genotyped using TaqMan assays. Quantitative trait association analyses of mean spherical equivalent refraction (MSE) were performed on 30 markers using linear regression models and an additive genetic risk model, while adjusting for age, sex, education, and population substructure. Post-hoc analyses were performed after stratifying on a dichotomous education variable. Pointwise (P-emp) and multiple-test study-wise (P-multi) significance levels were calculated empirically through permutation. Main outcome measures MSE was used as a quantitative measure of ocular refraction. Results The mean age and ocular refraction were 68 years (SD=4.7) and +0.55 D (SD=2.14), respectively. Pointwise statistical significance was obtained for rs1939008 (P-emp=0.0326). No SNP attained statistical significance after correcting for multiple testing. In stratified analyses, multiple SNPs reached pointwise significance in the lower-education group: 2 of these were statistically significant after multiple testing correction. The two highest-ranking SNPs in Amish families (rs1939008 and rs9928731) showed pointwise P-emp<0.01 in the lower-education stratum of AREDS participants. 
Conclusions We show suggestive evidence of replication of an association signal for ocular refraction to a marker between MMP1 and MMP10. We also provide evidence of a gene-environment interaction between previously-reported markers and education on refractive error. Variants in MMP1-MMP10 and MMP2 regions appear to affect population variation in ocular refraction in environmental conditions less favorable for myopia development. PMID:23098370
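
    The pointwise empirical significance level (P-emp) described above comes from permutation. A minimal sketch of that idea, assuming an additive coding of genotypes (0, 1 or 2 copies of an allele) and a simple regression slope as the test statistic, might look as follows. The data are invented, and covariate adjustment (age, sex, education, population substructure) is deliberately omitted, so this is not the study's actual pipeline.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def empirical_p(genotypes, trait, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the additive-model slope."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-trait link
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: allele counts and a trait that rises with them.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
trait = [-1.0, -0.8, -1.2, -0.9, 0.1, 0.0, -0.1, 0.2, 1.0, 1.1, 0.9, 1.2]

p_emp = empirical_p(genotypes, trait)
print(p_emp)
```

    A study-wise level (P-multi) would typically compare each permutation's best statistic across all markers against the observed statistics, which this single-marker sketch does not attempt.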

  17. Genetics and Genomics of Social Behavior in a Chicken Model.

    PubMed

    Johnsson, Martin; Henriksen, Rie; Fogelholm, Jesper; Höglund, Andrey; Jensen, Per; Wright, Dominic

    2018-05-01

    The identification of genes affecting sociality can give insights into the maintenance and development of sociality and personality. In this study, we used the combination of an advanced intercross between wild and domestic chickens with a combined QTL and eQTL genetical genomics approach to identify genes for social reinstatement, a social and anxiety-related behavior. A total of 24 social reinstatement QTL were identified and overlaid with over 600 eQTL obtained from the same birds using hypothalamic tissue. Correlations between overlapping QTL and eQTL indicated five strong candidate genes, with the gene TTRAP being strongly significantly correlated with multiple aspects of social reinstatement behavior, as well as possessing a highly significant eQTL. Copyright © 2018 by the Genetics Society of America.
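
    Overlaying QTL with eQTL, as described above, amounts to an interval-overlap check at its simplest. The sketch below uses hypothetical chromosome names, coordinates and gene labels; the study's actual interval definitions and the subsequent correlation step are not reproduced here.

```python
def overlaps(a, b):
    """True when two (chromosome, start, end) intervals share any position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


# Hypothetical behavioral QTL intervals: (chromosome, start, end).
qtl = [("chr1", 100, 500), ("chr2", 50, 80)]

# Hypothetical eQTL intervals with the gene whose expression they affect.
eqtl = [
    ("chr1", 450, 900, "gene_x"),
    ("chr2", 200, 300, "gene_y"),
    ("chr3", 100, 200, "gene_z"),
]

# Candidate genes are those whose eQTL overlaps a behavioral QTL.
candidates = [(q, e[3]) for q in qtl for e in eqtl if overlaps(q, e[:3])]
print(candidates)
```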

  18. Multiscale Modeling of Gene-Behavior Associations in an Artificial Neural Network Model of Cognitive Development.

    PubMed

    Thomas, Michael S C; Forrester, Neil A; Ronald, Angelica

    2016-01-01

    In the multidisciplinary field of developmental cognitive neuroscience, statistical associations between levels of description play an increasingly important role. One example of such associations is the observation of correlations between relatively common gene variants and individual differences in behavior. It is perhaps surprising that such associations can be detected despite the remoteness of these levels of description, and the fact that behavior is the outcome of an extended developmental process involving interaction of the whole organism with a variable environment. Given that they have been detected, how do such associations inform cognitive-level theories? To investigate this question, we employed a multiscale computational model of development, using a sample domain drawn from the field of language acquisition. The model comprised an artificial neural network model of past-tense acquisition trained using the backpropagation learning algorithm, extended to incorporate population modeling and genetic algorithms. It included five levels of description: four internal (genetic, network, neurocomputation, behavior) and one external (environment). Since the mechanistic assumptions of the model were known and its operation was relatively transparent, we could evaluate whether cross-level associations gave an accurate picture of causal processes. We established that associations could be detected between artificial genes and behavioral variation, even under polygenic assumptions of a many-to-one relationship between genes and neurocomputational parameters, and when an experience-dependent developmental process interceded between the action of genes and the emergence of behavior. We evaluated these associations with respect to their specificity (to different behaviors, to function vs. structure), to their developmental stability, and to their replicability, as well as considering issues of missing heritability and gene-environment interactions. 
We argue that gene-behavior associations can inform cognitive theory with respect to effect size, specificity, and timing. The model demonstrates a means by which researchers can undertake multiscale modeling with respect to cognition and develop highly specific and complex hypotheses across multiple levels of description. Copyright © 2015 Cognitive Science Society, Inc.

  19. The endogenous and reactive depression subtypes revisited: integrative animal and human studies implicate multiple distinct molecular mechanisms underlying major depressive disorder

    PubMed Central

    2014-01-01

    Background Traditional diagnoses of major depressive disorder (MDD) suggested that the presence or absence of stress prior to onset results in either ‘reactive’ or ‘endogenous’ subtypes of the disorder, respectively. Several lines of research suggest that the biological underpinnings of ‘reactive’ or ‘endogenous’ subtypes may also differ, resulting in differential response to treatment. We investigated this hypothesis by comparing the gene-expression profiles of three animal models of ‘reactive’ and ‘endogenous’ depression. We then translated these findings to clinical samples using a human post-mortem mRNA study. Methods Affymetrix mouse whole-genome oligonucleotide arrays were used to measure gene expression from hippocampal tissues of 144 mice from the Genome-based Therapeutic Drugs for Depression (GENDEP) project. The study used four inbred mouse strains and two depressogenic ‘stress’ protocols (maternal separation and Unpredictable Chronic Mild Stress) to model ‘reactive’ depression. Stress-related mRNA differences in mouse were compared with a parallel mRNA study using Flinders Sensitive and Resistant rat lines as a model of ‘endogenous’ depression. Convergent genes differentially expressed across the animal studies were used to inform candidate gene selection in a human mRNA post-mortem case control study from the Stanley Brain Consortium. Results In the mouse ‘reactive’ model, the expression of 350 genes changed in response to early stresses and 370 in response to late stresses. A minimal genetic overlap (less than 8.8%) was detected in response to both stress protocols, but 30% of these genes (21) were also differentially regulated in the ‘endogenous’ rat study. This overlap is significantly greater than expected by chance. The VAMP-2 gene, differentially expressed across the rodent studies, was also significantly altered in the human study after correcting for multiple testing. 
Conclusions Our results suggest that ‘endogenous’ and ‘reactive’ subtypes of depression are associated with largely distinct changes in gene-expression. However, they also suggest that the molecular signature of ‘reactive’ depression caused by early stressors differs considerably from that of ‘reactive’ depression caused by late stressors. A small set of genes was consistently dysregulated across each paradigm and in post-mortem brain tissue of depressed patients suggesting a final common pathway to the disorder. These genes included the VAMP-2 gene, which has previously been associated with Axis-I disorders including MDD, bipolar depression, schizophrenia and with antidepressant treatment response. We also discuss the implications of our findings for disease classification, personalized medicine and case-control studies of MDD. PMID:24886127

  20. A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction

    PubMed Central

    De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David

    2016-01-01

    Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. PMID:25281847

  1. Concept mapping One-Carbon Metabolism to model future ontologies for nutrient-gene-phenotype interactions.

    PubMed

    Joslin, A C; Green, R; German, J B; Lange, M C

    2014-09-01

    Advances in the development of bioinformatic tools continue to improve investigators' ability to interrogate, organize, and derive knowledge from large amounts of heterogeneous information. These tools often require advanced technical skills not possessed by life scientists. User-friendly, low-barrier-to-entry methods of visualizing nutrigenomics information are yet to be developed. We utilized concept mapping software from the Institute for Human and Machine Cognition to create a conceptual model of diet and health-related data that provides a foundation for future nutrigenomics ontologies describing published nutrient-gene/polymorphism-phenotype data. In this model, maps containing phenotype, nutrient, gene product, and genetic polymorphism interactions are visualized as triples of two concepts linked together by a linking phrase. These triples, or "knowledge propositions," contextualize aggregated data and information into easy-to-read knowledge maps. Maps of these triples enable visualization of genes spanning the One-Carbon Metabolism (OCM) pathway, their sequence variants, and multiple literature-mined associations including concepts relevant to nutrition, phenotypes, and health. The concept map development process documents the incongruity of information derived from pathway databases versus literature resources. This conceptual model highlights the importance of incorporating information about genes in upstream pathways that provide substrates, as well as downstream pathways that utilize products of the pathway under investigation, in this case OCM. Other genes and their polymorphisms, such as TCN2 and FUT2, although not directly involved in OCM, potentially alter OCM pathway functionality. These upstream gene products regulate substrates such as B12. Constellations of polymorphisms affecting the functionality of genes along OCM, together with substrate and cofactor availability, may impact resultant phenotypes. 
These conceptual maps provide a foundational framework for development of nutrient-gene/polymorphism-phenotype ontologies and systems visualization.
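
    The "knowledge propositions" described above, two concepts joined by a linking phrase, map naturally onto a small triple data structure. The sketch below is an illustration only: the field names are assumptions, and the linking phrases paraphrase statements in the abstract rather than reproducing the authors' maps.

```python
from collections import defaultdict, namedtuple

# One knowledge proposition: concept, linking phrase, concept.
Proposition = namedtuple("Proposition", ["subject", "linking_phrase", "object"])

# Illustrative triples paraphrased from the abstract (not the authors' data).
propositions = [
    Proposition("TCN2", "potentially alters", "OCM pathway functionality"),
    Proposition("FUT2", "potentially alters", "OCM pathway functionality"),
    Proposition("upstream gene products", "regulate", "B12"),
]


def index_by_subject(props):
    """Group (linking phrase, object) pairs under each subject concept."""
    index = defaultdict(list)
    for p in props:
        index[p.subject].append((p.linking_phrase, p.object))
    return dict(index)


concept_map = index_by_subject(propositions)
for concept, links in concept_map.items():
    for phrase, target in links:
        print(f"{concept} --[{phrase}]--> {target}")
```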

  2. Multiconstrained gene clustering based on generalized projections

    PubMed Central

    2010-01-01

    Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple pieces of constraints for an optimal clustering solution still remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints from different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints from different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions. PMID:20356386
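
    The core mechanism of projection onto convex sets (POCS), repeatedly projecting a candidate solution onto each constraint set until it lies in their intersection, can be shown on a toy problem. The two sets below (a unit box and a line in the plane) are stand-ins chosen for illustration; they are not the expression, GO or network constraints used in the paper, and the paper's generalized projector is not reproduced.

```python
def project_onto_box(point, low=0.0, high=1.0):
    """Nearest point in the box [low, high]^n (clip each coordinate)."""
    return [min(max(v, low), high) for v in point]


def project_onto_hyperplane(point, normal, offset):
    """Nearest point satisfying normal . x = offset."""
    dot = sum(n * v for n, v in zip(normal, point))
    norm_sq = sum(n * n for n in normal)
    step = (dot - offset) / norm_sq
    return [v - step * n for v, n in zip(point, normal)]


def pocs(start, normal, offset, iterations=100):
    """Alternate the two projections; each pass ends inside the box."""
    x = list(start)
    for _ in range(iterations):
        x = project_onto_hyperplane(x, normal, offset)
        x = project_onto_box(x)
    return x


# Toy constraints: x + y = 1.5 and 0 <= x, y <= 1.
solution = pocs([3.0, -2.0], normal=[1.0, 1.0], offset=1.5)
print(solution)
```

    Each projection leaves the other constraint's definition untouched, which is the property the abstract highlights when it says constraints are combined without distorting them.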

  3. Amplification of a Gene Related to Mammalian mdr Genes in Drug-Resistant Plasmodium falciparum

    NASA Astrophysics Data System (ADS)

    Wilson, Craig M.; Serrano, Adelfa E.; Wasley, Annemarie; Bogenschutz, Michael P.; Shankar, Anuraj H.; Wirth, Dyann F.

    1989-06-01

    The malaria parasite Plasmodium falciparum contains at least two genes related to the mammalian multiple drug resistance genes, and at least one of the P. falciparum genes is expressed at a higher level and is present in higher copy number in a strain that is resistant to multiple drugs than in a strain that is sensitive to the drugs.

  4. Association Study of 60 Candidate Genes with Antipsychotic-induced Weight Gain in Schizophrenia Patients.

    PubMed

    Ryu, S; Huh, I-S; Cho, E-Y; Cho, Y; Park, T; Yoon, S C; Joo, Y H; Hong, K S

    2016-03-01

    This study aimed to investigate the association of multiple candidate genes with weight gain and appetite change during antipsychotic treatment. A total of 233 single nucleotide polymorphisms (SNPs) within 60 candidate genes were genotyped. BMI changes for up to 8 weeks in 84 schizophrenia patients receiving antipsychotic medication were analyzed using a linear mixed model. In addition, we assessed appetite change during antipsychotic treatment in a different group of 46 schizophrenia patients using the Drug-Related Eating Behavior Questionnaire. No SNP showed a statistically significant association with BMI or appetite change after correction for multiple testing. We observed trends of association (P<0.05) between 19 SNPs of 11 genes and weight gain, and between 7 SNPs of 5 genes and appetite change. In particular, rs696217 in GHRL showed suggestive evidence of association with not only weight gain (P=0.001) but also appetite change (P=0.042). Patients carrying the GG genotype of rs696217 exhibited higher increase in both BMI and appetite compared to patients carrying the GT/TT genotype. Our findings suggested the involvement of a GHRL polymorphism in weight gain, which was specifically mediated by appetite change, during antipsychotic treatment in schizophrenia patients. © Georg Thieme Verlag KG Stuttgart · New York.
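
    To see why the nominal P values above do not survive correction for multiple testing, consider a worked example with a Bonferroni threshold. The abstract does not state which correction was applied, so Bonferroni over all 233 SNPs is an assumption made purely for illustration.

```python
# Assumption: Bonferroni correction across all genotyped SNPs.
n_tests = 233
alpha = 0.05
threshold = alpha / n_tests  # per-test significance level

# Nominal P values reported in the abstract for rs696217 in GHRL.
reported = {
    "rs696217, weight gain": 0.001,
    "rs696217, appetite change": 0.042,
}

significant = {label: p <= threshold for label, p in reported.items()}

print(f"Bonferroni threshold: {threshold:.6f}")
for label, passed in significant.items():
    print(f"{label}: P={reported[label]} significant={passed}")
```

    Under this assumption neither value reaches the corrected threshold, consistent with the abstract's description of these results as trends or suggestive evidence.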

  5. Dosage changes of a segment at 17p13.1 lead to intellectual disability and microcephaly as a result of complex genetic interaction of multiple genes.

    PubMed

    Carvalho, Claudia M B; Vasanth, Shivakumar; Shinawi, Marwan; Russell, Chad; Ramocki, Melissa B; Brown, Chester W; Graakjaer, Jesper; Skytte, Anne-Bine; Vianna-Morgante, Angela M; Krepischi, Ana C V; Patel, Gayle S; Immken, LaDonna; Aleck, Kyrieckos; Lim, Cynthia; Cheung, Sau Wai; Rosenberg, Carla; Katsanis, Nicholas; Lupski, James R

    2014-11-06

    The 17p13.1 microdeletion syndrome is a recently described genomic disorder with a core clinical phenotype of intellectual disability, poor to absent speech, dysmorphic features, and a constellation of more variable clinical features, most prominently microcephaly. We identified five subjects with copy-number variants (CNVs) on 17p13.1 for whom we performed detailed clinical and molecular studies. Breakpoint mapping and retrospective analysis of published cases refined the smallest region of overlap (SRO) for microcephaly to a genomic interval containing nine genes. Dissection of this phenotype in zebrafish embryos revealed a complex genetic architecture: dosage perturbation of four genes (ASGR1, ACADVL, DVL2, and GABARAP) impeded neurodevelopment and decreased dosage of the same loci caused a reduced mitotic index in vitro. Moreover, epistatic analyses in vivo showed that dosage perturbations of discrete gene pairings induce microcephaly. Taken together, these studies support a model in which concomitant dosage perturbation of multiple genes within the CNV drive the microcephaly and possibly other neurodevelopmental phenotypes associated with rearrangements in the 17p13.1 SRO. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  6. Cell of origin associated classification of B-cell malignancies by gene signatures of the normal B-cell hierarchy.

    PubMed

    Johnsen, Hans Erik; Bergkvist, Kim Steve; Schmitz, Alexander; Kjeldsen, Malene Krag; Hansen, Steen Møller; Gaihede, Michael; Nørgaard, Martin Agge; Bæch, John; Grønholdt, Marie-Louise; Jensen, Frank Svendsen; Johansen, Preben; Bødker, Julie Støve; Bøgsted, Martin; Dybkær, Karen

    2014-06-01

    Recent findings have suggested biological classification of B-cell malignancies as exemplified by the "activated B-cell-like" (ABC), the "germinal-center B-cell-like" (GCB) and primary mediastinal B-cell lymphoma (PMBL) subtypes of diffuse large B-cell lymphoma and "recurrent translocation and cyclin D" (TC) classification of multiple myeloma. Biological classification of B-cell derived cancers may be refined by a direct and systematic strategy where identification and characterization of normal B-cell differentiation subsets are used to define the cancer cell of origin phenotype. Here we propose a strategy combining multiparametric flow cytometry, global gene expression profiling and biostatistical modeling to generate B-cell subset specific gene signatures from sorted normal human immature, naive, germinal centrocytes and centroblasts, post-germinal memory B-cells, plasmablasts and plasma cells from available lymphoid tissues including lymph nodes, tonsils, thymus, peripheral blood and bone marrow. This strategy will provide an accurate image of the stage of differentiation, which prospectively can be used to classify any B-cell malignancy and eventually purify tumor cells. This report briefly describes the current models of the normal B-cell subset differentiation in multiple tissues and the pathogenesis of malignancies originating from the normal germinal B-cell hierarchy.

  7. Accurate Encoding and Decoding by Single Cells: Amplitude Versus Frequency Modulation

    PubMed Central

    Micali, Gabriele; Aquino, Gerardo; Richards, David M.; Endres, Robert G.

    2015-01-01

    Cells sense external concentrations and, via biochemical signaling, respond by regulating the expression of target proteins. Both in signaling networks and gene regulation there are two main mechanisms by which the concentration can be encoded internally: amplitude modulation (AM), where the absolute concentration of an internal signaling molecule encodes the stimulus, and frequency modulation (FM), where the period between successive bursts represents the stimulus. Although both mechanisms have been observed in biological systems, the question of when it is beneficial for cells to use either AM or FM is largely unanswered. Here, we first consider a simple model for a single receptor (or ion channel), which can either signal continuously whenever a ligand is bound, or produce a burst in signaling molecule upon receptor binding. We find that bursty signaling is more accurate than continuous signaling only for sufficiently fast dynamics. This suggests that modulation based on bursts may be more common in signaling networks than in gene regulation. We then extend our model to multiple receptors, where continuous and bursty signaling are equivalent to AM and FM respectively, finding that AM is always more accurate. This implies that the reason some cells use FM is related to factors other than accuracy, such as the ability to coordinate expression of multiple genes or to implement threshold crossing mechanisms. PMID:26030820

  8. LocExpress: a web server for efficiently estimating expression of novel transcripts.

    PubMed

    Hou, Mei; Tian, Feng; Jiang, Shuai; Kong, Lei; Yang, Dechang; Gao, Ge

    2016-12-22

    The temporal and spatial-specific expression pattern of a transcript in multiple tissues and cell types can indicate key clues about its function. While several gene atlases are available online as pre-computed databases for known gene models, it is still challenging to obtain expression profiles for previously uncharacterized (i.e. novel) transcripts efficiently. Here we developed LocExpress, a web server for efficiently estimating expression of novel transcripts across multiple tissues and cell types in human (20 normal tissues/cell types and 14 cell lines) as well as in mouse (24 normal tissues/cell types and nine cell lines). As a wrapper to an RNA-Seq quantification algorithm, LocExpress efficiently reduces the time cost by making abundance estimation calls increasingly within the minimum spanning bundle region of input transcripts. For a given novel gene model, this local context-oriented strategy allows LocExpress to estimate its FPKMs in hundreds of samples within minutes on a standard Linux box, making an online web server possible. To the best of our knowledge, LocExpress is the only web server to provide nearly real-time expression estimation for novel transcripts in common tissues and cell types. The server is publicly available at http://loc-express.cbi.pku.edu.cn .
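
    For readers unfamiliar with the FPKM unit mentioned above (fragments per kilobase of transcript per million mapped fragments), the standard definition can be written as a one-line calculation. This is the textbook formula with invented input numbers, not LocExpress's implementation, which wraps a full quantification algorithm.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments on a 2,000 bp transcript,
# in a library of 20 million mapped fragments.
value = fpkm(500, 2000, 20_000_000)
print(value)
```

    The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.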

  9. Arabidopsis thaliana gonidialess A/Zuotin related factors (GlsA/ZRF) are essential for maintenance of meristem integrity.

    PubMed

    Guzmán-López, José Alfredo; Abraham-Juárez, María Jazmín; Lozano-Sotomayor, Paulina; de Folter, Stefan; Simpson, June

    2016-05-01

    Observation of a differential expression pattern, including strong expression in meristematic tissue, of an Agave tequilana GlsA/ZRF ortholog suggested an important role for this gene during bulbil formation and developmental changes in this species. In order to better understand this role, the two GlsA/ZRF orthologs present in the genome of Arabidopsis thaliana were functionally characterized by analyzing expression patterns, double mutant phenotypes, promoter-GUS fusions and expression of hormone related or meristem marker genes. Patterns of expression for A. thaliana show that GlsA/ZRF genes are strongly expressed in SAMs and RAMs in mature plants and developing embryos, and double mutants showed multiple changes in morphology related to both SAM and RAM tissues. Typical double mutants showed stunted growth of aerial and root tissue, formation of multiple ectopic meristems and effects on cotyledons, leaves and flowers. The KNOX genes STM and BP were overexpressed in double mutants whereas CLV3, WUSCHEL and AS1 were repressed, and lack of AtGlsA expression was also associated with changes in localization of auxin and cytokinin. These results suggest that GlsA/ZRF is an essential component of the machinery that maintains the integrity of SAM and RAM tissue and underline the potential to identify new genes or gene functions based on observations in non-model plants.

  10. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
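
    The LASSO and sparse group LASSO penalties mentioned above both act through shrinkage operators. The sketch below shows the standard soft-thresholding operator and its composition with a group-level shrinkage step, which is the usual proximal operator for a sparse group LASSO penalty. The abstract does not describe the authors' optimizer, so this is a generic illustration with hypothetical exon-level coefficients grouped by gene.

```python
import math


def soft_threshold(z, lam):
    """LASSO shrinkage: move z toward zero by lam, stopping at zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def sparse_group_prox(group, lam_l1, lam_group):
    """Shrink one gene's exon coefficients elementwise, then as a group.

    The elementwise step can zero out individual exons; the group step
    can zero out the whole gene when its remaining signal is weak.
    """
    shrunk = [soft_threshold(v, lam_l1) for v in group]
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm <= lam_group:
        return [0.0] * len(group)
    scale = 1.0 - lam_group / norm
    return [scale * v for v in shrunk]


# Hypothetical exon-level coefficients for two genes.
strong_gene = sparse_group_prox([3.0, 4.0], lam_l1=0.0, lam_group=2.5)
weak_gene = sparse_group_prox([0.1, -0.1], lam_l1=0.05, lam_group=1.0)
print(strong_gene, weak_gene)
```

    Using gene membership as the grouping variable is what lets the group step remove an entire gene while the elementwise step still singles out specific exons within retained genes.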

  11. The mechanisms of sirtuin 2-mediated exacerbation of alpha-synuclein toxicity in models of Parkinson disease

    USDA-ARS's Scientific Manuscript database

    Sirtuin genes have been associated with aging and are known to affect multiple cellular pathways. Sirtuin 2 was previously shown to modulate proteotoxicity associated with age-associated neurodegenerative disorders such as Alzheimer and Parkinson disease (PD). However, the precise molecular mechanis...

  12. Large-scale atlas of microarray data reveals biological landscape of gene expression in Arabidopsis

    USDA-ARS's Scientific Manuscript database

    Transcriptome datasets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by the lack of metad...

  13. A Convenient Cas9-based Conditional Knockout Strategy for Simultaneously Targeting Multiple Genes in Mouse.

    PubMed

    Chen, Jiang; Du, Yinan; He, Xueyan; Huang, Xingxu; Shi, Yun S

    2017-03-31

    The most powerful way to probe protein function is to characterize the consequence of its deletion. Compared to conventional gene knockout (KO), conditional knockout (cKO) provides an advanced gene targeting strategy with which gene deletion can be performed in a spatially and temporally restricted manner. However, for most species that are amphiploid, the widely used Cre-flox cKO system would need targeting loci in both alleles to be loxP flanked, which in practice requires time- and labor-consuming breeding. This burden is considerable when one is dealing with multiple genes. The CRISPR/Cas9 genome modulation system has the advantage of being able to target multiple sites simultaneously. Here we propose a strategy that could achieve conditional KO of multiple genes in mouse with Cre recombinase dependent Cas9 expression. By transgenic construction of loxP-stop-loxP (LSL) controlled Cas9 (LSL-Cas9) together with sgRNAs targeting EGFP, we showed that the fluorescence molecule could be eliminated in a Cre-dependent manner. We further verified the efficacy of this novel strategy to target multiple sites by deleting c-Maf and MafB simultaneously and specifically in macrophages. Compared to the traditional Cre-flox cKO strategy, this sgRNAs-LSL-Cas9 cKO system is simpler and faster, and would make conditional manipulation of multiple genes feasible.

  14. Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes.

    PubMed

    Lv, Jie; Havlak, Paul; Putnam, Nicholas H

    2011-10-05

    Many metazoan genomes conserve chromosome-scale gene linkage relationships ("macro-synteny") from the common ancestor of multicellular animal life [1-4], but the biological explanation for this conservation is still unknown. Double cut and join (DCJ) is a simple, well-studied model of neutral genome evolution amenable to both simulation and mathematical analysis [5], but as we show here, it is not sufficient to explain long-term macro-synteny conservation. We examine a family of simple (one-parameter) extensions of DCJ to identify models and choices of parameters consistent with the levels of macro- and micro-synteny conservation observed among animal genomes. Our software implements a flexible strategy for incorporating various types of genomic context into the DCJ model ("DCJ-[C]"), and is available as open source software from http://github.com/putnamlab/dcj-c. A simple model of genome evolution, in which DCJ moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macro-synteny conservation and for correlated conservation among multiple pairs of species. Simulations under this model indicate that a constraint on approximately 7% of metazoan genes is sufficient to constrain genome rearrangement to an average rate of 25 inversions and 1.7 translocations per million years.

  15. Identification of landscape features influencing gene flow: How useful are habitat selection models?

    USGS Publications Warehouse

    Roffler, Gretchen H.; Schwartz, Michael K.; Pilgrim, Kristy L.; Talbot, Sandra L.; Sage, Kevin; Adams, Layne G.; Luikart, Gordon

    2016-01-01

    Understanding how dispersal patterns are influenced by landscape heterogeneity is critical for modeling species connectivity. Resource selection function (RSF) models are increasingly used in landscape genetics approaches. However, because the ecological factors that drive habitat selection may be different from those influencing dispersal and gene flow, it is important to consider explicit assumptions and spatial scales of measurement. We calculated pairwise genetic distance among 301 Dall's sheep (Ovis dalli dalli) in southcentral Alaska using an intensive noninvasive sampling effort and 15 microsatellite loci. We used multiple regression of distance matrices to assess the correlation of pairwise genetic distance and landscape resistance derived from an RSF, and combinations of landscape features hypothesized to influence dispersal. Dall's sheep gene flow was positively correlated with steep slopes, moderate peak normalized difference vegetation indices (NDVI), and open land cover. Whereas RSF covariates were significant in predicting genetic distance, the RSF model itself was not significantly correlated with Dall's sheep gene flow, suggesting that certain habitat features important during summer (rugged terrain, mid-range elevation) were not influential to effective dispersal. This work underscores that consideration of both habitat selection and landscape genetics models may be useful in developing management strategies to both meet the immediate survival of a species and allow for long-term genetic connectivity.

  16. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  17. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  18. Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.

    PubMed

    Tong, Dong Ling; Schierz, Amanda C

    2011-09-01

    Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed to a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data. The GANN is a hybrid model where the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated by using two benchmark microarray datasets with different array platforms and differing number of classes (a 2-class oligonucleotide microarray data for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for SRBCTs (small round blue cell tumours)). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time. The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method not only can detect genes that are exclusively associated with a single cancer type but can also explore the genes that are differentially expressed in multiple cancer types. 
    The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data as well as extracting known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading and though the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality. Copyright © 2011 Elsevier B.V. All rights reserved.

  19. Modeling Anterior Development in Mice: Diet as Modulator of Risk for Neural Tube Defects

    PubMed Central

    Kappen, Claudia

    2014-01-01

    Head morphogenesis is a complex process that is controlled by multiple signaling centers. The most common defects of cranial development are craniofacial defects, such as cleft lip and cleft palate, and neural tube defects, such as anencephaly and encephalocoele in humans. More than 400 genes that contribute to proper neural tube closure have been identified in experimental animals, but only very few causative gene mutations have been identified in humans, supporting the notion that environmental influences are critical. The intrauterine environment is influenced by maternal nutrition, and hence, maternal diet can modulate the risk for cranial and neural tube defects. This article reviews recent progress toward a better understanding of nutrients during pregnancy, with particular focus on mouse models for defective neural tube closure. At least four major patterns of nutrient responses are apparent, suggesting that multiple pathways are involved in the response, and likely in the underlying pathogenesis of the defects. Folic acid has been the most widely studied nutrient, and the diverse responses of the mouse models to folic acid supplementation indicate that folic acid is not universally beneficial, but that the effect is dependent on genetic configuration. If this is the case for other nutrients as well, efforts to prevent neural tube defects with nutritional supplementation may need to become more specifically targeted than previously appreciated. Mouse models are indispensable for a better understanding of nutrient–gene interactions in normal pregnancies, as well as in those affected by metabolic diseases, such as diabetes and obesity. PMID:24124024

  20. Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data

    PubMed Central

    Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian

    2015-01-01

    In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213

  1. Human genetics of infectious diseases: a unified theory

    PubMed Central

    Casanova, Jean-Laurent; Abel, Laurent

    2007-01-01

    Since the early 1950s, the dominant paradigm in the human genetics of infectious diseases postulates that rare monogenic immunodeficiencies confer vulnerability to multiple infectious diseases (one gene, multiple infections), whereas common infections are associated with the polygenic inheritance of multiple susceptibility genes (one infection, multiple genes). Recent studies, since 1996 in particular, have challenged this view. A newly recognised group of primary immunodeficiencies predisposing the individual to a principal or single type of infection is emerging. In parallel, several common infections have been shown to reflect the inheritance of one major susceptibility gene, at least in some populations. This novel causal relationship (one gene, one infection) blurs the distinction between patient-based Mendelian genetics and population-based complex genetics, and provides a unified conceptual frame for exploring the molecular genetic basis of infectious diseases in humans. PMID:17255931

  2. Integrating machine learning techniques into robust data enrichment approach and its application to gene expression data.

    PubMed

    Erdoğdu, Utku; Tan, Mehmet; Alhajj, Reda; Polat, Faruk; Rokne, Jon; Demetrick, Douglas

    2013-01-01

    The availability of enough samples for effective analysis and knowledge discovery has been a challenge in the research community, especially in the area of gene expression data analysis. Thus, the approaches being developed for data analysis have mostly suffered from the lack of enough data to train and test the constructed models. We argue that the process of sample generation could be successfully automated by employing some sophisticated machine learning techniques. An automated sample generation framework could successfully complement the actual sample generation from real cases. This argument is validated in this paper by describing a framework that integrates multiple models (perspectives) for sample generation. We illustrate its applicability for producing new gene expression data samples, a highly demanding area that has not received attention. The three perspectives employed in the process are based on models that are not closely related. The independence eliminates the bias of having the produced approach covering only certain characteristics of the domain and leading to samples skewed towards one direction. The first model is based on the Probabilistic Boolean Network (PBN) representation of the gene regulatory network underlying the given gene expression data. The second model integrates Hierarchical Markov Model (HIMM) and the third model employs a genetic algorithm in the process. Each model learns as much as possible characteristics of the domain being analysed and tries to incorporate the learned characteristics in generating new samples. In other words, the models base their analysis on domain knowledge implicitly present in the data itself. The developed framework has been extensively tested by checking how the new samples complement the original samples. The produced results are very promising in showing the effectiveness, usefulness and applicability of the proposed multi-model framework.

  3. Anti-inflammatory genes associated with multiple sclerosis: a gene expression study.

    PubMed

    Perga, S; Montarolo, F; Martire, S; Berchialla, P; Malucchi, S; Bertolotto, A

    2015-02-15

    Multiple sclerosis (MS) is an autoimmune inflammatory disease of the central nervous system caused by a complex interaction between multiple genes and environmental factors. HLA region is the strongest susceptibility locus, but recent huge genome-wide association studies identified new susceptibility genes. Among these, BACH2, PTGER4, RGS1 and ZFP36L1 were highlighted. Here, a gene expression analysis revealed that three of them, namely BACH2, PTGER4 and ZFP36L1, are down-regulated in MS patients' blood cells compared to healthy subjects. Interestingly, all these genes are involved in the immune system regulation with predominant anti-inflammatory role and their reduction could predispose to MS development. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Estimating differential expression from multiple indicators

    PubMed Central

    Ilmjärv, Sten; Hundahl, Christian Ansgar; Reimets, Riin; Niitsoo, Margus; Kolde, Raivo; Vilo, Jaak; Vasar, Eero; Luuk, Hendrik

    2014-01-01

    Regardless of the advent of high-throughput sequencing, microarrays remain central in current biomedical research. Conventional microarray analysis pipelines apply data reduction before the estimation of differential expression, which is likely to render the estimates susceptible to noise from signal summarization and reduce statistical power. We present a probe-level framework, which capitalizes on the high number of concurrent measurements to provide more robust differential expression estimates. The framework naturally extends to various experimental designs and target categories (e.g. transcripts, genes, genomic regions) as well as small sample sizes. Benchmarking in relation to popular microarray and RNA-sequencing data-analysis pipelines indicated high and stable performance on the Microarray Quality Control dataset and in a cell-culture model of hypoxia. Experimental data exhibiting long-range epigenetic silencing of gene expression was used to demonstrate the efficacy of detecting differential expression of genomic regions, a level of analysis not embraced by conventional workflows. Finally, we designed and conducted an experiment to identify hypothermia-responsive genes in terms of monotonic time-response. As a novel insight, hypothermia-dependent up-regulation of multiple genes of two major antioxidant pathways was identified and verified by quantitative real-time PCR. PMID:24586062

  5. Virulence gene regulation by CvfA, a putative RNase: the CvfA-enolase complex in Streptococcus pyogenes links nutritional stress, growth-phase control, and virulence gene expression.

    PubMed

    Kang, Song Ok; Caparon, Michael G; Cho, Kyu Hong

    2010-06-01

    Streptococcus pyogenes, a multiple-auxotrophic human pathogen, regulates virulence gene expression according to nutritional availability during various stages in the infection process or in different infection sites. We discovered that CvfA influenced the expression of virulence genes according to growth phase and nutritional status. The influence of CvfA in C medium, rich in peptides and poor in carbohydrates, was most pronounced at the stationary phase. Under these conditions, up to 30% of the transcriptome exhibited altered expression; the levels of expression of multiple virulence genes were altered, including the genes encoding streptokinase, CAMP factor, streptolysin O, M protein (more abundant in the CvfA(-) mutant), SpeB, mitogenic factor, and streptolysin S (less abundant). The increase of carbohydrates or peptides in media restored the levels of expression of the virulence genes in the CvfA(-) mutant to wild-type levels (emm, ska, and cfa by carbohydrates; speB by peptides). Even though the regulation of gene expression dependent on nutritional stress is commonly linked to the stringent response, the levels of ppGpp were not altered by deletion of cvfA. Instead, CvfA interacted with enolase, implying that CvfA, a putative RNase, controls the transcript decay rates of virulence factors or their regulators according to nutritional status. The virulence of CvfA(-) mutants was highly attenuated in murine models, indicating that CvfA-mediated gene regulation is necessary for the pathogenesis of S. pyogenes. Taken together, the CvfA-enolase complex in S. pyogenes is involved in the regulation of virulence gene expression by controlling RNA degradation according to nutritional stress.

  6. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    NASA Technical Reports Server (NTRS)

    Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
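
    As a rough illustration of the discrete Boolean network models mentioned in this abstract, the sketch below updates a toy three-gene network synchronously and follows a trajectory until it settles into a repeating cycle (an attractor). The network wiring and the function names are hypothetical and are not taken from the review.

```python
def step(state, rules):
    """Synchronously compute every gene's next value from the current state."""
    return tuple(rule(state) for rule in rules)

def attractor(state, rules):
    """Iterate from `state` until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return seen[seen.index(state):]

# Hypothetical network over genes (A, B, C):
# C represses A, A activates B, B activates C.
rules = [
    lambda s: not s[2],  # A(t+1) = NOT C(t)
    lambda s: s[0],      # B(t+1) = A(t)
    lambda s: s[1],      # C(t+1) = B(t)
]

long_cycle = attractor((True, False, False), rules)
short_cycle = attractor((True, False, True), rules)
```

    Reverse engineering in this setting amounts to the inverse problem: given observed state transitions, find rules such as the three lambdas above that reproduce them.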

  7. A new approach to estimate parameters of speciation models with application to apes.

    PubMed

    Becquet, Celine; Przeworski, Molly

    2007-10-01

    How populations diverge and give rise to distinct species remains a fundamental question in evolutionary biology, with important implications for a wide range of fields, from conservation genetics to human evolution. A promising approach is to estimate parameters of simple speciation models using polymorphism data from multiple loci. Existing methods, however, make a number of assumptions that severely limit their applicability, notably, no gene flow after the populations split and no intralocus recombination. To overcome these limitations, we developed a new Markov chain Monte Carlo method to estimate parameters of an isolation-migration model. The approach uses summaries of polymorphism data at multiple loci surveyed in a pair of diverging populations or closely related species and, importantly, allows for intralocus recombination. To illustrate its potential, we applied it to extensive polymorphism data from populations and species of apes, whose demographic histories are largely unknown. The isolation-migration model appears to provide a reasonable fit to the data. It suggests that the two chimpanzee species became reproductively isolated in allopatry approximately 850 Kya, while Western and Central chimpanzee populations split approximately 440 Kya but continued to exchange migrants. Similarly, Eastern and Western gorillas and Sumatran and Bornean orangutans appear to have experienced gene flow since their splits approximately 90 and over 250 Kya, respectively.

  8. Integrative Analysis of High-throughput Cancer Studies with Contrasted Penalization

    PubMed Central

    Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Shia, BenChang; Ma, Shuangge

    2015-01-01

    In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms “classic” meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by introducing the contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance. PMID:24395534

  9. ePlant and the 3D data display initiative: integrative systems biology on the world wide web.

    PubMed

    Fucile, Geoffrey; Di Biase, David; Nahal, Hardeep; La, Garon; Khodabandeh, Shokoufeh; Chen, Yani; Easley, Kante; Christendat, Dinesh; Kelley, Lawrence; Provart, Nicholas J

    2011-01-10

    Visualization tools for biological data are often limited in their ability to interactively integrate data at multiple scales. These computational tools are also typically limited by two-dimensional displays and programmatic implementations that require separate configurations for each of the user's computing devices and recompilation for functional expansion. Towards overcoming these limitations we have developed "ePlant" (http://bar.utoronto.ca/eplant) - a suite of open-source world wide web-based tools for the visualization of large-scale data sets from the model organism Arabidopsis thaliana. These tools display data spanning multiple biological scales on interactive three-dimensional models. Currently, ePlant consists of the following modules: a sequence conservation explorer that includes homology relationships and single nucleotide polymorphism data, a protein structure model explorer, a molecular interaction network explorer, a gene product subcellular localization explorer, and a gene expression pattern explorer. The ePlant's protein structure explorer module represents experimentally determined and theoretical structures covering >70% of the Arabidopsis proteome. The ePlant framework is accessed entirely through a web browser, and is therefore platform-independent. It can be applied to any model organism. To facilitate the development of three-dimensional displays of biological data on the world wide web we have established the "3D Data Display Initiative" (http://3ddi.org).

  10. MINER: exploratory analysis of gene interaction networks by machine learning from expression data.

    PubMed

    Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A

    2009-12-03

    The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.

  11. Predicting features of breast cancer with gene expression patterns.

    PubMed

    Lu, Xuesong; Lu, Xin; Wang, Zhigang C; Iglehart, J Dirk; Zhang, Xuegong; Richardson, Andrea L

    2008-03-01

    Data from gene expression arrays hold an enormous amount of biological information. We sought to determine if global gene expression in primary breast cancers contained information about biologic, histologic, and anatomic features of the disease in individual patients. Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used and the prediction accuracy was determined by cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study. Models built from gene expression data accurately predict ER and HER2 status, and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data is inaccurate at predicting tumor size, lymph node status or LVI. The best model for prediction of nodal status included tumor size, LVI status and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); the addition of microarray-based prediction to this model failed to improve the prediction accuracy. Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. Results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.

  12. Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes.

    PubMed

    Liu, Ruifeng; AbdulHameed, Mohamed Diwan M; Wallqvist, Anders

    2017-09-25

    The quantitative structure-activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very large number of models. To address this issue, we examined the performance of a variable nearest neighbor (v-NN) method that uses information on near neighbors conforming to the principle that similar structures have similar activities. Using a data set of gene expression signatures of 13,150 compounds derived from cell-based measurements in the NIH Library of Integrated Network-based Cellular Signatures program, we were able to make predictions for 62% of the compounds in a 10-fold cross validation test, with a correlation coefficient of 0.61 between the predicted and experimentally derived signatures, a reproducibility rivaling that of high-throughput gene expression measurements. To evaluate the utility of the predicted gene expression signatures, we compared the predicted and experimentally derived signatures in their ability to identify drugs known to cause specific liver, kidney, and heart injuries. Overall, the predicted and experimentally derived signatures had similar receiver operating characteristics, whose areas under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively, across the three organ injury models. However, detailed analyses of enrichment curves indicate that signatures predicted from multiple near neighbors outperformed those derived from experiments, suggesting that averaging information from near neighbors may help improve the signal from gene expression measurements. Our results demonstrate that the v-NN method can serve as a practical approach for modeling large-scale, genomewide, chemical-induced, gene expression changes.
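
    The near-neighbor averaging described in this abstract can be sketched as follows. The abstract does not give the weighting scheme, so this sketch assumes a Tanimoto distance on structural feature sets, a distance cutoff d0 beyond which compounds are ignored, and Gaussian-style weights exp(-(d/h)^2); the function names, parameter values and toy data are all hypothetical.

```python
import math

def tanimoto_distance(a, b):
    """1 minus the Tanimoto (Jaccard) similarity of two feature sets."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

def vnn_predict(query, training, d0=0.6, h=0.3):
    """Distance-weighted average over all training compounds within d0.

    `training` is a list of (feature_set, value) pairs. Returns None when no
    training compound is close enough, i.e. the query is outside the domain
    in which a prediction is attempted.
    """
    num = den = 0.0
    for feats, value in training:
        d = tanimoto_distance(query, feats)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # assumed weighting, see lead-in
            num += w * value
            den += w
    return num / den if den else None

# Toy data: feature sets stand in for structural fingerprints, and each value
# stands in for the expression change of one gene.
training = [({1, 2, 3}, 1.0), ({1, 2, 4}, 3.0), ({7, 8, 9}, 10.0)]
near_first = vnn_predict({1, 2, 3}, training)
between = vnn_predict({1, 2}, training)
no_neighbors = vnn_predict({20, 21}, training)
```

    Because the number of contributing neighbors varies with the query, some compounds receive no prediction at all, which is consistent with the abstract reporting predictions for only a fraction of the compounds. A full signature would apply the same average to each gene in turn.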

  13. The dynamics of gene expression changes in a mouse model of oral tumorigenesis may help refine prevention and treatment strategies in patients with oral cancer.

    PubMed

    Foy, Jean-Philippe; Tortereau, Antonin; Caulin, Carlos; Le Texier, Vincent; Lavergne, Emilie; Thomas, Emilie; Chabaud, Sylvie; Perol, David; Lachuer, Joël; Lang, Wenhua; Hong, Waun Ki; Goudot, Patrick; Lippman, Scott M; Bertolus, Chloé; Saintigny, Pierre

    2016-06-14

    A better understanding of the dynamics of molecular changes occurring during the early stages of oral tumorigenesis may help refine prevention and treatment strategies. We generated genome-wide expression profiles of microdissected normal mucosa, hyperplasia, dysplasia and tumors derived from the 4-NQO mouse model of oral tumorigenesis. Genes differentially expressed between tumor and normal mucosa defined the "tumor gene set" (TGS), including 4 non-overlapping gene subsets that characterize the dynamics of gene expression changes through different stages of disease progression. The majority of gene expression changes occurred early or progressively. The relevance of these mouse gene sets to human disease was tested in multiple datasets including the TCGA and the Genomics of Drug Sensitivity in Cancer project. The TGS was able to discriminate oral squamous cell carcinoma (OSCC) from normal oral mucosa in 3 independent datasets. The OSCC samples enriched in the mouse TGS displayed high frequency of CASP8 mutations, 11q13.3 amplifications and low frequency of PIK3CA mutations. Early changes observed in the 4-NQO model were associated with a trend toward a shorter oral cancer-free survival in patients with oral preneoplasia that was not seen in multivariate analysis. Progressive changes observed in the 4-NQO model were associated with an increased sensitivity to 4 different MEK inhibitors in a panel of 51 squamous cell carcinoma cell lines of the aerodigestive tract. In conclusion, the dynamics of molecular changes in the 4-NQO model reveal that MEK inhibition may be relevant to prevention and treatment of a specific molecularly-defined subgroup of OSCC.

  14. A Multiomics Approach to Identify Genes Associated with Childhood Asthma Risk and Morbidity.

    PubMed

    Forno, Erick; Wang, Ting; Yan, Qi; Brehm, John; Acosta-Perez, Edna; Colon-Semidey, Angel; Alvarez, Maria; Boutaoui, Nadia; Cloutier, Michelle M; Alcorn, John F; Canino, Glorisa; Chen, Wei; Celedón, Juan C

    2017-10-01

    Childhood asthma is a complex disease. In this study, we aim to identify genes associated with childhood asthma through a multiomics "vertical" approach that integrates multiple analytical steps using linear and logistic regression models. In a case-control study of childhood asthma in Puerto Ricans (n = 1,127), we used adjusted linear or logistic regression models to evaluate associations between several analytical steps of omics data, including genome-wide (GW) genotype data, GW methylation, GW expression profiling, cytokine levels, asthma-intermediate phenotypes, and asthma status. At each point, only the top genes/single-nucleotide polymorphisms/probes/cytokines were carried forward for subsequent analysis. In step 1, asthma modified the gene expression-protein level association for 1,645 genes; pathway analysis showed an enrichment of these genes in the cytokine signaling system (n = 269 genes). In steps 2-3, expression levels of 40 genes were associated with intermediate phenotypes (asthma onset age, forced expiratory volume in 1 second, exacerbations, eosinophil counts, and skin test reactivity); of those, methylation of seven genes was also associated with asthma. Of these seven candidate genes, IL5RA was also significant in analytical steps 4-8. We then measured plasma IL-5 receptor α levels, which were associated with asthma age of onset and moderate-severe exacerbations. In addition, in silico database analysis showed that several of our identified IL5RA single-nucleotide polymorphisms are associated with transcription factors related to asthma and atopy. This approach integrates several analytical steps and is able to identify biologically relevant asthma-related genes, such as IL5RA. It differs from other methods that rely on complex statistical models with various assumptions.

  15. Phylogenetic analysis of pectin-related gene families in Physcomitrella patens and nine other plant species yields evolutionary insights into cell walls

    PubMed Central

    2014-01-01

    Background: Pectins are acidic sugar-containing polysaccharides that are universally conserved components of the primary cell walls of plants and modulate both tip and diffuse cell growth. However, many of their specific functions and the evolution of the genes responsible for producing and modifying them are incompletely understood. The moss Physcomitrella patens is emerging as a powerful model system for the study of plant cell walls. To identify deeply conserved pectin-related genes in Physcomitrella, we generated phylogenetic trees for 16 pectin-related gene families using sequences from ten plant genomes and analyzed the evolutionary relationships within these families. Results: Contrary to our initial hypothesis that a single ancestral gene was present for each pectin-related gene family in the common ancestor of land plants, five of the 16 gene families, including homogalacturonan galacturonosyltransferases, polygalacturonases, pectin methylesterases, homogalacturonan methyltransferases, and pectate lyase-like proteins, show evidence of multiple members in the early land plant that gave rise to the mosses and vascular plants. Seven of the gene families, the UDP-rhamnose synthases, UDP-glucuronic acid epimerases, homogalacturonan galacturonosyltransferase-like proteins, β-1,4-galactan β-1,4-galactosyltransferases, rhamnogalacturonan II xylosyltransferases, and pectin acetylesterases appear to have had a single member in the common ancestor of land plants. We detected no Physcomitrella members in the xylogalacturonan xylosyltransferase, rhamnogalacturonan I arabinosyltransferase, pectin methylesterase inhibitor, or polygalacturonase inhibitor protein families. Conclusions: Several gene families related to the production and modification of pectins in plants appear to have multiple members that are conserved as far back as the common ancestor of mosses and vascular plants. The presence of multiple members of these families even before the divergence of other important cell wall-related genes, such as cellulose synthases, suggests a more complex role than previously suspected for pectins in the evolution of land plants. The presence of relatively small pectin-related gene families in Physcomitrella as compared to Arabidopsis makes it an attractive target for analysis of the functions of pectins in cell walls. In contrast, the absence of genes in Physcomitrella for some families suggests that certain pectin modifications, such as homogalacturonan xylosylation, arose later during land plant evolution. PMID:24666997

  16. Prevalent Role of Gene Features in Determining Evolutionary Fates of Whole-Genome Duplication Duplicated Genes in Flowering Plants

    PubMed Central

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-01-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs. PMID:23396833

  17. Three-dimensional transgenic cell model to quantify genotoxic effects of space environment

    NASA Astrophysics Data System (ADS)

    Gonda, S. R.; Wu, H.; Pingerelli, P. L.; Glickman, B. W.

    In this paper we describe a three-dimensional, multicellular tissue-equivalent model, produced in NASA-designed, rotating wall bioreactors using mammalian cells engineered for genomic containment of multiple copies of defined target genes for genotoxic assessment. Rat 2λ fibroblasts, genetically engineered to contain high-density target genes for mutagenesis (Stratagene, Inc., Austin, TX), were cocultured with human epithelial cells on Cytodex beads in the High Aspect Ratio Bioreactor (Synthecon, Inc., Houston, TX). Multi-bead aggregates were formed by day 5 following the complete covering of the beads by fibroblasts. Cellular retraction occurred 8-14 days after coculture initiation culminating in spheroids retaining few or no beads. Analysis of the resulting tissue assemblies revealed that multicellular spheroids had formed, that fibroblasts synthesized collagen, and that cell viability was retained for the 30-day test period after removal from the bioreactor. Quantification of mutation at the LacI gene in Rat 2λ fibroblasts in spheroids exposed to 0-2 Gy neon using the Big Blue color assay (Stratagene, Inc.) revealed a linear dose-response for mutation induction. Limited sequencing analysis of mutant clones from 0.25 or 1 Gy exposures revealed a higher frequency of deletions and multiple base sequencing changes with increasing dose. These results suggest that the three-dimensional, multicellular tissue assembly model produced in NASA bioreactors is applicable to a wide variety of studies involving the quantification and identification of genotoxicity, including measurement of the inherent damage incurred in space.

  18. A Genome-Wide Analysis Reveals No Nuclear Dobzhansky-Muller Pairs of Determinants of Speciation between S. cerevisiae and S. paradoxus, but Suggests More Complex Incompatibilities

    PubMed Central

    Kao, Katy C.; Schwartz, Katja; Sherlock, Gavin

    2010-01-01

    The Dobzhansky-Muller (D-M) model of speciation by genic incompatibility is widely accepted as the primary cause of interspecific postzygotic isolation. Since the introduction of this model, there have been theoretical and experimental data supporting the existence of such incompatibilities. However, speciation genes have been largely elusive, with only a handful of candidate genes identified in a few organisms. The Saccharomyces sensu stricto yeasts, which have small genomes and can mate interspecifically to produce sterile hybrids, are thus an ideal model for studying postzygotic isolation. Among them, only a single D-M pair, comprising a mitochondrially targeted product of a nuclear gene and a mitochondrially encoded locus, has been found. Thus far, no D-M pair of nuclear genes has been identified between any sensu stricto yeasts. We report here the first detailed genome-wide analysis of rare meiotic products from an otherwise sterile hybrid and show that no classic D-M pairs of speciation genes exist between the nuclear genomes of the closely related yeasts S. cerevisiae and S. paradoxus. Instead, our analyses suggest that more complex interactions, likely involving multiple loci having weak effects, may be responsible for their post-zygotic separation. The lack of a nuclear encoded classic D-M pair between these two yeasts, yet the existence of multiple loci that may each exert a small effect through complex interactions suggests that initial speciation events might not always be mediated by D-M pairs. An alternative explanation may be that the accumulation of polymorphisms leads to gamete inviability due to the activities of anti-recombination mechanisms and/or incompatibilities between the species' transcriptional and metabolic networks, with no single pair at least initially being responsible for the incompatibility. After such a speciation event, it is possible that one or more D-M pairs might subsequently arise following isolation. PMID:20686707

  19. Radiogenomics to characterize regional genetic heterogeneity in glioblastoma.

    PubMed

    Hu, Leland S; Ning, Shuluo; Eschbacher, Jennifer M; Baxter, Leslie C; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C; Peng, Sen; Smith, Kris A; Nakaji, Peter; Karis, John P; Quarles, C Chad; Wu, Teresa; Loftus, Joseph C; Jenkins, Robert B; Sicotte, Hugues; Kollmeyer, Thomas M; O'Neill, Brian P; Elmquist, William; Hoxworth, Joseph M; Frakes, David; Sarkaria, Jann; Swanson, Kristin R; Tran, Nhan L; Li, Jing; Mitchell, J Ross

    2017-01-01

    Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so called brain-around-tumor, [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out-cross-validation (LOOCV). We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. Highest accuracies were observed for PDGFRA (77.1%), EGFR (75%), CDKN2A (87.5%), and RB1 (87.5%), while lowest accuracy was observed in TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology.
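
    The leave-one-out cross-validation (LOOCV) step mentioned in this abstract can be sketched as follows: each sample is held out in turn, a model is fit on the rest, and accuracy is the fraction of held-out samples predicted correctly. The one-split "stump" on a single feature is a deliberately small stand-in for the multivariate decision trees in the study, and the feature values and labels are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-split classifier on a single numeric feature.

    Tries every observed value as a threshold, with both label
    orientations, and keeps the split with the fewest training errors.
    Returns a function mapping a feature value to a 0/1 label.
    """
    best = None
    for t in sorted(set(xs)):
        for polarity in (0, 1):
            errors = sum(
                (polarity if x >= t else 1 - polarity) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: polarity if x >= t else 1 - polarity


def loocv_accuracy(xs, ys):
    """Hold out each sample once, refit on the rest, and score it."""
    correct = 0
    for i in range(len(xs)):
        model = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += int(model(xs[i]) == ys[i])
    return correct / len(xs)


# Hypothetical imaging feature and binary copy-number status per biopsy.
feature = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
status = [0, 0, 0, 0, 1, 1, 1, 1]
print(loocv_accuracy(feature, status))
```

    With 48 biopsies, an accuracy of this kind moves in steps of 1/48. The reported 77.1%, 75%, 87.5%, and 37.5% are consistent with 37, 36, 42, and 18 correct held-out predictions, although the abstract does not state the counts.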

  20. A gene expression inflammatory signature specifically predicts multiple myeloma evolution and patients survival.

    PubMed

    Botta, C; Di Martino, M T; Ciliberto, D; Cucè, M; Correale, P; Rossi, M; Tagliaferri, P; Tassone, P

    2016-12-16

    Multiple myeloma (MM) is closely dependent on cross-talk between malignant plasma cells and cellular components of the inflammatory/immunosuppressive bone marrow milieu, which promotes disease progression, drug resistance, neo-angiogenesis, bone destruction and immune-impairment. We investigated the relevance of inflammatory genes in predicting disease evolution and patient survival. A bioinformatics study by Ingenuity Pathway Analysis on gene expression profiling dataset of monoclonal gammopathy of undetermined significance, smoldering and symptomatic-MM, identified inflammatory and cytokine/chemokine pathways as the most progressively affected during disease evolution. We then selected 20 candidate genes involved in B-cell inflammation and we investigated their role in predicting clinical outcome, through univariate and multivariate analyses (log-rank test, logistic regression and Cox-regression model). We defined an 8-genes signature (IL8, IL10, IL17A, CCL3, CCL5, VEGFA, EBI3 and NOS2) identifying each condition (MGUS/smoldering/symptomatic-MM) with 84% accuracy. Moreover, six genes (IFNG, IL2, LTA, CCL2, VEGFA, CCL3) were found independently correlated with patients' survival. Patients whose MM cells expressed high levels of Th1 cytokines (IFNG/LTA/IL2/CCL2) and low levels of CCL3 and VEGFA, experienced the longest survival. On these six genes, we built a prognostic risk score that was validated in three additional independent datasets. In this study, we provide proof-of-concept that inflammation has a critical role in MM patient progression and survival. The inflammatory-gene prognostic signature validated in different datasets clearly indicates novel opportunities for personalized anti-MM treatment.

  1. Identification and characterisation of the angiotensin converting enzyme-3 (ACE3) gene: a novel mammalian homologue of ACE

    PubMed Central

    Rella, Monika; Elliot, Joann L; Revett, Timothy J; Lanfear, Jerry; Phelan, Anne; Jackson, Richard M; Turner, Anthony J; Hooper, Nigel M

    2007-01-01

    Background: Mammalian angiotensin converting enzyme (ACE) plays a key role in blood pressure regulation. Although multiple ACE-like proteins exist in non-mammalian organisms, to date only one other ACE homologue, ACE2, has been identified in mammals. Results: Here we report the identification and characterisation of the gene encoding a third homologue of ACE, termed ACE3, in several mammalian genomes. The ACE3 gene is located on the same chromosome downstream of the ACE gene. Multiple sequence alignment and molecular modelling have been employed to characterise the predicted ACE3 protein. In mouse, rat, cow and dog, the predicted protein has mutations in some of the critical residues involved in catalysis, including the catalytic Glu in the HEXXH zinc binding motif which is Gln, and ESTs or reverse-transcription PCR indicate that the gene is expressed. In humans, the predicted ACE3 protein has an intact HEXXH motif, but there are other deletions and insertions in the gene and no ESTs have been identified. Conclusion: In the genomes of several mammalian species there is a gene that encodes a novel, single domain ACE-like protein, ACE3. In mouse, rat, cow and dog ACE3, the catalytic Glu is replaced by Gln in the putative zinc binding motif, indicating that in these species ACE3 would lack catalytic activity as a zinc metalloprotease. In humans, no evidence was found that the ACE3 gene is expressed and the presence of deletions and insertions in the sequence indicate that ACE3 is a pseudogene. PMID:17597519

  2. Alternate approaches to repress endogenous microRNA activity in Arabidopsis thaliana

    PubMed Central

    Wang, Ming-Bo

    2011-01-01

    MicroRNAs (miRNAs) are an endogenous class of regulatory small RNA (sRNA). In plants, miRNAs are processed from short non-protein-coding messenger RNAs (mRNAs) transcribed from small miRNA genes (MIR genes). Traditionally in the model plant Arabidopsis thaliana (Arabidopsis), the functional analysis of a gene product has relied on the identification of a corresponding T-DNA insertion knockout mutant from a large, randomly-mutagenized population. However, because of the small size of MIR genes and presence of multiple, highly conserved members in most plant miRNA families, it has been extremely laborious and time consuming to obtain a corresponding single or multiple, null mutant plant line. Our recent study published in Molecular Plant [1] outlines an alternate method for the functional characterization of miRNA action in Arabidopsis, termed anti-miRNA technology. Using this approach we demonstrated that the expression of individual miRNAs or entire miRNA families can be readily and efficiently knocked down. Our approach is in addition to two previously reported methodologies that also allow for the targeted suppression of either individual miRNAs or all members of a MIR gene family; these include miRNA target mimicry [2,3] and transcriptional gene silencing (TGS) of MIR gene promoters [4]. All three methodologies rely on endogenous gene regulatory machinery, and in this article we provide an overview of these technologies and discuss their strengths and weaknesses in inhibiting the activity of their targeted miRNA(s). PMID:21358288

  3. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
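
    To make the notion of a proximity measure on a weighted graph concrete, the sketch below scores a pair of nodes by the most probable path between them, treating each edge weight as a probability in (0, 1] and multiplying weights along the path. This particular measure, the function name, and the node labels are assumptions chosen for illustration; the abstract says several measures were tested but does not name them.

```python
import heapq
import math


def best_path_probability(graph, source, target):
    """Highest product of edge weights over any path from source to target.

    graph: dict mapping node -> list of (neighbor, weight), 0 < weight <= 1.
    Runs Dijkstra on -log(weight), so the shortest path in that metric is
    the path with the largest weight product. Returns 0.0 if unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            nd = d - math.log(weight)
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return 0.0


# Hypothetical heterogeneous graph: a gene linked to a disease through
# either a protein interaction edge or a shared annotation edge.
graph = {
    "geneA": [("proteinA", 0.9), ("termT", 0.5)],
    "proteinA": [("geneA", 0.9), ("diseaseX", 0.6)],
    "termT": [("geneA", 0.5), ("diseaseX", 0.9)],
    "diseaseX": [("proteinA", 0.6), ("termT", 0.9)],
}
print(best_path_probability(graph, "geneA", "diseaseX"))
```

    Ranking unlinked node pairs by such a score is one way to turn a proximity measure into link predictions; reweighting edge types changes which paths dominate.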

  4. Screening of differentially expressed genes between multiple trauma patients with and without sepsis.

    PubMed

    Ji, S C; Pan, Y T; Lu, Q Y; Sun, Z Y; Liu, Y Z

    2014-03-17

    The purpose of this study was to identify critical genes associated with septic multiple trauma by comparing peripheral whole blood samples from multiple trauma patients with and without sepsis. A microarray data set was downloaded from the Gene Expression Omnibus (GEO) database. This data set included 70 samples, 36 from multiple trauma patients with sepsis and 34 from multiple trauma patients without sepsis (as a control set). The data were preprocessed, and differentially expressed genes (DEGs) were then screened for using packages of the R language. Functional analysis of DEGs was performed with DAVID. Interaction networks were then established for the most up- and down-regulated genes using HitPredict. Pathway-enrichment analysis was conducted for genes in the networks using WebGestalt. Fifty-eight DEGs were identified. The expression levels of PLAU (down-regulated) and MMP8 (up-regulated) presented the largest fold-changes, and interaction networks were established for these genes. Further analysis revealed that PLAT (plasminogen activator, tissue) and SERPINF2 (serpin peptidase inhibitor, clade F, member 2), which interact with PLAU, play important roles in the pathway of the component and coagulation cascade. We hypothesize that PLAU is a major regulator of the component and coagulation cascade, and down-regulation of PLAU results in dysfunction of the pathway, causing sepsis.

  5. Reconstruction of the regulatory network for Bacillus subtilis and reconciliation with gene expression data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faria, Jose P.; Overbeek, Ross; Taylor, Ronald C.

    Here, we introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of B. subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches and small regulatory RNAs. Overall, regulatory information is included in the model for approximately 2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same “ON” and “OFF” gene expression profiles across multiple samples of experimental data. We show how atomic regulons for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how atomic regulons can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.
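
    The definition of an atomic regulon given in this abstract (genes sharing the same ON/OFF profile across samples) suggests a simple grouping step, sketched below. This is only an illustration of that definition under the assumption that ON/OFF calls are already available; it is not the reconstruction procedure used by the authors, which the abstract does not describe. Gene names are hypothetical.

```python
from collections import defaultdict


def group_by_on_off_profile(calls):
    """Group genes whose ON/OFF calls match in every sample.

    calls: dict mapping gene -> sequence of booleans (True = ON),
    one entry per sample, in the same sample order for every gene.
    Returns sorted gene lists, keeping only groups of two or more genes.
    """
    groups = defaultdict(list)
    for gene, profile in calls.items():
        groups[tuple(profile)].append(gene)
    return sorted(sorted(genes) for genes in groups.values() if len(genes) > 1)


# Hypothetical ON/OFF calls for five genes across three samples.
calls = {
    "g1": [True, False, True],
    "g2": [True, False, True],
    "g3": [False, False, True],
    "g4": [False, False, True],
    "g5": [True, True, True],
}
print(group_by_on_off_profile(calls))
```

    Groups found this way can then be compared against curated operons, in the spirit of the comparison the abstract describes.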

  6. Reconstruction of the regulatory network for Bacillus subtilis and reconciliation with gene expression data

    DOE PAGES

    Faria, Jose P.; Overbeek, Ross; Taylor, Ronald C.; ...

    2016-03-18

    Here, we introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of B. subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches and small regulatory RNAs. Overall, regulatory information is included in the model for approximately 2500 of the ~4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same “ON” and “OFF” gene expression profiles across multiple samples of experimental data. We show how atomic regulons for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how atomic regulons can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.

  7. A multicolor panel of TALE-KRAB based transcriptional repressor vectors enabling knockdown of multiple gene targets

    PubMed Central

    Zhang, Zhonghui; Wu, Elise; Qian, Zhijian; Wu, Wen-Shu

    2014-01-01

    Stable and efficient knockdown of multiple gene targets is highly desirable for dissection of molecular pathways. Because it allows sequence-specific DNA binding, transcription activator-like effector (TALE) offers a new genetic perturbation technique that allows for gene-specific repression. Here, we constructed a multicolor lentiviral TALE-Kruppel-associated box (KRAB) expression vector platform that enables knockdown of multiple gene targets. This platform is fully compatible with the Golden Gate TALEN and TAL Effector Kit 2.0, a widely used and efficient method for TALE assembly. We showed that this multicolor TALE-KRAB vector system when combined together with bone marrow transplantation could quickly knock down c-kit and PU.1 genes in hematopoietic stem and progenitor cells of recipient mice. Furthermore, our data demonstrated that this platform simultaneously knocked down both c-Kit and PU.1 genes in the same primary cell populations. Together, our results suggest that this multicolor TALE-KRAB vector platform is a promising and versatile tool for knockdown of multiple gene targets and could greatly facilitate dissection of molecular pathways. PMID:25475013

  8. A multicolor panel of TALE-KRAB based transcriptional repressor vectors enabling knockdown of multiple gene targets.

    PubMed

    Zhang, Zhonghui; Wu, Elise; Qian, Zhijian; Wu, Wen-Shu

    2014-12-05

    Stable and efficient knockdown of multiple gene targets is highly desirable for dissection of molecular pathways. Because it allows sequence-specific DNA binding, transcription activator-like effector (TALE) offers a new genetic perturbation technique that allows for gene-specific repression. Here, we constructed a multicolor lentiviral TALE-Kruppel-associated box (KRAB) expression vector platform that enables knockdown of multiple gene targets. This platform is fully compatible with the Golden Gate TALEN and TAL Effector Kit 2.0, a widely used and efficient method for TALE assembly. We showed that this multicolor TALE-KRAB vector system when combined together with bone marrow transplantation could quickly knock down c-kit and PU.1 genes in hematopoietic stem and progenitor cells of recipient mice. Furthermore, our data demonstrated that this platform simultaneously knocked down both c-Kit and PU.1 genes in the same primary cell populations. Together, our results suggest that this multicolor TALE-KRAB vector platform is a promising and versatile tool for knockdown of multiple gene targets and could greatly facilitate dissection of molecular pathways.

  9. Mechanisms of rapid sympatric speciation by sex reversal and sexual selection in cichlid fish.

    PubMed

    Lande, R; Seehausen, O; van Alphen, J J

    2001-01-01

    Mechanisms of speciation in cichlid fish were investigated by analyzing population genetic models of sexual selection on sex-determining genes associated with color polymorphisms. The models are based on a combination of laboratory experiments and field observations on the ecology, male and female mating behavior, and inheritance of sex-determination and color polymorphisms. The models explain why sex-reversal genes that change males into females tend to be X-linked and associated with novel colors, using the hypothesis of restricted recombination on the sex chromosomes, as suggested by previous theory on the evolution of recombination. The models reveal multiple pathways for rapid sympatric speciation through the origin of novel color morphs with strong assortative mating that incorporate both sex-reversal and suppressor genes. Despite the lack of geographic isolation or ecological differentiation, the new species coexists with the ancestral species either temporarily or indefinitely. These results may help to explain different patterns and rates of speciation among groups of cichlids, in particular the explosive diversification of rock-dwelling haplochromine cichlids.

  10. Enhanced hexose fermentation by Saccharomyces cerevisiae through integration of stoichiometric modeling and genetic screening.

    PubMed

    Quarterman, Josh; Kim, Soo Rin; Kim, Pan-Jun; Jin, Yong-Su

    2015-01-20

    In order to determine beneficial gene deletions for ethanol production by the yeast Saccharomyces cerevisiae, we performed an in silico gene deletion experiment based on a genome-scale metabolic model. Genes coding for two oxidative phosphorylation reactions (cytochrome c oxidase and ubiquinol cytochrome c reductase) were identified by the model-based simulation as potential deletion targets for enhancing ethanol production and maintaining acceptable overall growth rate in oxygen-limited conditions. Since the two target enzymes are composed of multiple subunits, we conducted a genetic screening study to evaluate the in silico results and compare the effect of deleting various portions of the respiratory enzyme complexes. Over two-thirds of the knockout mutants identified by the in silico study did exhibit experimental behavior in qualitative agreement with model predictions, but the exceptions illustrate the limitation of using a purely stoichiometric model-based approach. Furthermore, there was a substantial quantitative variation in phenotype among the various respiration-deficient mutants that were screened in this study, and three genes encoding respiratory enzyme subunits were identified as the best knockout targets for improving hexose fermentation in microaerobic conditions. Specifically, deletion of either COX9 or QCR9 resulted in higher ethanol production rates than the parental strain by 37% and 27%, respectively, with slight growth disadvantages. Also, deletion of QCR6 led to improved ethanol production rate by 24% with no growth disadvantage. The beneficial effects of these gene deletions were consistently demonstrated in different strain backgrounds and with four common hexoses. The combination of stoichiometric modeling and genetic screening using a systematic knockout collection was useful for narrowing a large set of gene targets and identifying targets of interest. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Theory of prokaryotic genome evolution.

    PubMed

    Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V

    2016-10-11

    Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.

  12. Circuit-Host Coupling Induces Multifaceted Behavioral Modulations of a Gene Switch.

    PubMed

    Blanchard, Andrew E; Liao, Chen; Lu, Ting

    2018-02-06

    Quantitative modeling of gene circuits is fundamentally important to synthetic biology, as it offers the potential to transform circuit engineering from trial-and-error construction to rational design and, hence, facilitates the advance of the field. Currently, typical models regard gene circuits as isolated entities and focus only on the biochemical processes within the circuits. However, such a standard paradigm is getting challenged by increasing experimental evidence suggesting that circuits and their host are intimately connected, and their interactions can potentially impact circuit behaviors. Here we systematically examined the roles of circuit-host coupling in shaping circuit dynamics by using a self-activating gene switch as a model circuit. Through a combination of deterministic modeling, stochastic simulation, and Fokker-Planck equation formalism, we found that circuit-host coupling alters switch behaviors across multiple scales. At the single-cell level, it slows the switch dynamics in the high protein production regime and enlarges the difference between stable steady-state values. At the population level, it favors cells with low protein production through differential growth amplification. Together, the two-level coupling effects induce both quantitative and qualitative modulations of the switch, with the primary component of the effects determined by the circuit's architectural parameters. This study illustrates the complexity and importance of circuit-host coupling in modulating circuit behaviors, demonstrating the need for a new paradigm-integrated modeling of the circuit-host system-for quantitative understanding of engineered gene networks. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
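
    The bistability that a self-activating switch of this kind relies on can be illustrated with a minimal deterministic sketch. The Hill-type production term, every parameter value, and the function names below are illustrative assumptions, not the model used by the authors, and circuit-host coupling is deliberately left out.

```python
def production(x, basal=0.05, vmax=1.0, K=0.5, n=4):
    """Self-activated production rate; the Hill form is an assumption."""
    return basal + vmax * x**n / (K**n + x**n)


def dxdt(x, decay=1.0):
    """Net rate of change of protein level x: production minus first-order loss."""
    return production(x) - decay * x


def integrate(x0, dt=0.01, steps=5000):
    """Forward-Euler integration from the initial protein level x0."""
    x = x0
    for _ in range(steps):
        x += dt * dxdt(x)
    return x


# Two different initial conditions settle into two different stable states.
low = integrate(0.0)
high = integrate(2.0)
print(low, high)
```

    With these assumed parameters the same equation has a low and a high stable steady state, so the final protein level depends on where the cell starts.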

  13. NvERTx: a gene expression database to compare embryogenesis and regeneration in the sea anemone Nematostella vectensis.

    PubMed

    Warner, Jacob F; Guerlais, Vincent; Amiel, Aldine R; Johnston, Hereroa; Nedoncelle, Karine; Röttinger, Eric

    2018-05-17

    For over a century, researchers have been comparing embryogenesis and regeneration hoping that lessons learned from embryonic development will unlock hidden regenerative potential. This problem has historically been a difficult one to investigate because the best regenerative model systems are poor embryonic models and vice versa. Recently, however, there has been renewed interest in this question, as emerging models have allowed researchers to investigate these processes in the same organism. This interest has been further fueled by the advent of high-throughput transcriptomic analyses that provide virtual mountains of data. Here, we present Nematostella vectensis Embryogenesis and Regeneration Transcriptomics (NvERTx), a platform for comparing gene expression during embryogenesis and regeneration. NvERTx consists of close to 50 transcriptomic data sets spanning embryogenesis and regeneration in Nematostella. These data were used to perform a robust de novo transcriptome assembly, with which users can search, conduct BLAST analyses, and plot the expression of multiple genes during these two developmental processes. The site is also home to the results of gene clustering analyses, to further mine the data and identify groups of co-expressed genes. The site can be accessed at http://nvertx.kahikai.org. © 2018. Published by The Company of Biologists Ltd.

  14. Evaluation of the reversal of multidrug resistance by MDR1 ribonucleic acid interference in a human colon cancer model using a Renilla luciferase reporter gene and coelenterazine.

    PubMed

    Jeon, Yong Hyun; Bae, Seon-ae; Lee, Yong Jin; Lee, You La; Lee, Sang-Woo; Yoon, Ghil-Suk; Ahn, Byeong-Cheol; Ha, Jeoung-Hee; Lee, Jaetae

    2010-12-01

    The reversal effect of multidrug resistance (MDR1) gene expression by adenoviral vector-mediated MDR1 ribonucleic acid interference was assessed in a human colon cancer animal model using bioluminescent imaging with Renilla luciferase (Rluc) gene and coelenterazine, a substrate for Rluc or MDR1 gene expression. A fluorescent microscopic examination demonstrated an increased green fluorescent protein signal in Ad-shMDR1- (recombinant adenovirus that coexpressed MDR1 small hairpin ribonucleic acid [shRNA] and green fluorescent protein) infected HCT-15/Rluc cells in a virus dose-dependent manner. Concurrently, with an increasing administered virus dose (0, 15, 30, 60, and 120 multiplicity of infection), Rluc activity was significantly increased in Ad-shMDR1-infected HCT-15/Rluc cells in a virus dose-dependent manner. In vivo bioluminescent imaging showed about 7.5-fold higher signal intensity in Ad-shMDR1-infected tumors than in control tumors (p < .05). Immunohistologic analysis demonstrated marked reduction of P-glycoprotein expression in infected tumor but not in control tumor. In conclusion, the reversal of MDR1 gene expression by MDR1 shRNA was successfully evaluated by bioluminescence imaging with Rluc activity using an in vivo animal model with a multidrug resistance cancer xenograft.

  15. Plasticity of genetic interactions in metabolic networks of yeast.

    PubMed

    Harrison, Richard; Papp, Balázs; Pál, Csaba; Oliver, Stephen G; Delneri, Daniela

    2007-02-13

    Why are most genes dispensable? The impact of gene deletions may depend on the environment (plasticity), the presence of compensatory mechanisms (mutational robustness), or both. Here, we analyze the interaction between these two forces by exploring the condition-dependence of synthetic genetic interactions that define redundant functions and alternative pathways. We performed systems-level flux balance analysis of the yeast (Saccharomyces cerevisiae) metabolic network to identify genetic interactions and then tested the model's predictions with in vivo gene-deletion studies. We found that the majority of synthetic genetic interactions are restricted to certain environmental conditions, partly because of the lack of compensation under some (but not all) nutrient conditions. Moreover, the phylogenetic cooccurrence of synthetically interacting pairs is not significantly different from random expectation. These findings suggest that these gene pairs have at least partially independent functions, and, hence, compensation is only a byproduct of their evolutionary history. Experimental analyses that used multiple gene deletion strains not only confirmed predictions of the model but also showed that investigation of false predictions may both improve functional annotation within the model and also lead to the discovery of higher-order genetic interactions. Our work supports the view that functional redundancy may be more apparent than real, and it offers a unified framework for the evolution of environmental adaptation and mutational robustness.

  16. Genetic analysis of tachyzoite to bradyzoite differentiation mutants in Toxoplasma gondii reveals a hierarchy of gene induction.

    PubMed

    Singh, Upinder; Brewer, Jeremy L; Boothroyd, John C

    2002-05-01

    Developmental switching in Toxoplasma gondii, from the virulent tachyzoite to the relatively quiescent bradyzoite stage, is responsible for disease propagation and reactivation. We have generated tachyzoite to bradyzoite differentiation (Tbd-) mutants in T. gondii and used these in combination with a cDNA microarray to identify developmental pathways in bradyzoite formation. Four independently generated Tbd- mutants were analysed and had defects in bradyzoite development in response to multiple bradyzoite-inducing conditions, a stable phenotype after in vivo passages and a markedly reduced brain cyst burden in a murine model of chronic infection. Transcriptional profiles of mutant and wild-type parasites, growing under bradyzoite conditions, revealed a hierarchy of developmentally regulated genes, including many bradyzoite-induced genes whose transcripts were reduced in all mutants. A set of non-developmentally regulated genes whose transcripts were less abundant in Tbd- mutants were also identified. These may represent genes that mediate downstream effects and/or whose expression is dependent on the same transcription factors as the bradyzoite-induced set. Using these data, we have generated a model of transcription regulation during bradyzoite development in T. gondii. Our approach shows the utility of this system as a model to study developmental biology in single-celled eukaryotes including protozoa and fungi.

  17. In vivo simultaneous transcriptional activation of multiple genes in the brain using CRISPR-dCas9-activator transgenic mice.

    PubMed

    Zhou, Haibo; Liu, Junlai; Zhou, Changyang; Gao, Ni; Rao, Zhiping; Li, He; Hu, Xinde; Li, Changlin; Yao, Xuan; Shen, Xiaowen; Sun, Yidi; Wei, Yu; Liu, Fei; Ying, Wenqin; Zhang, Junming; Tang, Cheng; Zhang, Xu; Xu, Huatai; Shi, Linyu; Cheng, Leping; Huang, Pengyu; Yang, Hui

    2018-03-01

    Despite rapid progress in the genome-editing field, in vivo simultaneous overexpression of multiple genes remains challenging. We generated a transgenic mouse using an improved dCas9 system that enables simultaneous and precise in vivo transcriptional activation of multiple genes and long noncoding RNAs in the nervous system. As proof of concept, we were able to use targeted activation of endogenous neurogenic genes in these transgenic mice to directly and efficiently convert astrocytes into functional neurons in vivo. This system provides a flexible and rapid screening platform for studying complex gene networks and gain-of-function phenotypes in the mammalian brain.

  18. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, a gene regulatory network is important to both biological research and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while also being robust to large errors or outliers. To solve the optimization problem involved in the proposed method, an efficient algorithm that combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. The results show that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
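
    The two ingredients named in the method, a Huber loss and a group penalty, can be sketched with their textbook definitions. This is not the authors' algorithm (the auxiliary function minimization and block descent steps are not reproduced); the function names and the default threshold are assumptions, and treating the coefficients of one candidate edge across datasets as a group is our reading of the abstract.

```python
import math


def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it,
    so a single large residual is penalised less than under squared error."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2.

    The whole group v (assumed here: one edge's coefficients across all
    datasets) is shrunk together and set to zero as a block if its norm
    does not exceed lam."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


print(huber(0.5), huber(3.0))
print(group_soft_threshold([3.0, 4.0], 1.0))
print(group_soft_threshold([0.3, 0.4], 1.0))
```

    Shrinking a group as a block is what makes an edge either present in every dataset or absent from all of them, which matches the stated goal of recovering one shared topology.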

  19. Genes with a spike expression are clustered in chromosome (sub)bands and spike (sub)bands have a powerful prognostic value in patients with multiple myeloma

    PubMed Central

    Kassambara, Alboukadel; Hose, Dirk; Moreaux, Jérôme; Walker, Brian A.; Protopopov, Alexei; Reme, Thierry; Pellestor, Franck; Pantesco, Véronique; Jauch, Anna; Morgan, Gareth; Goldschmidt, Hartmut; Klein, Bernard

    2012-01-01

    Background: Genetic abnormalities are common in patients with multiple myeloma, and may deregulate gene products involved in tumor survival, proliferation, metabolism and drug resistance. In particular, translocations may result in a high expression of targeted genes (termed spike expression) in tumor cells. We identified spike genes in multiple myeloma cells of patients with newly-diagnosed myeloma and investigated their prognostic value. Design and Methods: Genes with a spike expression in multiple myeloma cells were picked up using box plot probe set signal distribution and two selection filters. Results: In a cohort of 206 newly diagnosed patients with multiple myeloma, 2587 genes/expressed sequence tags with a spike expression were identified. Some spike genes were associated with some transcription factors such as MAF or MMSET and with known recurrent translocations as expected. Spike genes were not associated with increased DNA copy number and for a majority of them, involved unknown mechanisms. Of spiked genes, 36.7% clustered significantly in 149 out of 862 documented chromosome (sub)bands, of which 53 had prognostic value (35 bad, 18 good). Their prognostic value was summarized with a spike band score that delineated 23.8% of patients with a poor median overall survival (27.4 months versus not reached, P<0.001) using the training cohort of 206 patients. The spike band score was independent of other gene expression profiling-based risk scores, t(4;14), or del17p in an independent validation cohort of 345 patients. Conclusions: We present a new approach to identify spike genes and their relationship to patients’ survival. PMID:22102711
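
    A box-plot criterion of the general kind mentioned in the methods can be sketched as follows. The abstract does not give the fence multiplier or the two selection filters, so the rule used here (signal above Q3 + 3 x IQR), the function name, and the toy data are all assumptions for illustration.

```python
from statistics import quantiles


def spike_samples(signal, k=3.0):
    """Return indices of samples whose signal lies above Q3 + k * IQR.

    k = 3.0 is an assumed box-plot fence, not a value taken from the study."""
    q1, _, q3 = quantiles(signal, n=4)
    fence = q3 + k * (q3 - q1)
    return [i for i, x in enumerate(signal) if x > fence]


# Hypothetical probe-set signal across 11 patients; the last one is a spike.
signal = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 200]
spikes = spike_samples(signal)
print(spikes)
```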

  20. Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs.

    PubMed

    Ricaño-Ponce, Isis; Zhernakova, Daria V; Deelen, Patrick; Luo, Oscar; Li, Xingwang; Isaacs, Aaron; Karjalainen, Juha; Di Tommaso, Jennifer; Borek, Zuzanna Agnieszka; Zorro, Maria M; Gutierrez-Achury, Javier; Uitterlinden, Andre G; Hofman, Albert; van Meurs, Joyce; Netea, Mihai G; Jonkers, Iris H; Withoff, Sebo; van Duijn, Cornelia M; Li, Yang; Ruan, Yijun; Franke, Lude; Wijmenga, Cisca; Kumar, Vinod

    2016-04-01

    Genome-wide association and fine-mapping studies in 14 autoimmune diseases (AID) have implicated more than 250 loci in one or more of these diseases. As more than 90% of AID-associated SNPs are intergenic or intronic, pinpointing the causal genes is challenging. We performed a systematic analysis to link 460 SNPs that are associated with 14 AID to causal genes using transcriptomic data from 629 blood samples. We were able to link 71 (39%) of the AID-SNPs to two or more nearby genes, providing evidence that for part of the AID loci multiple causal genes exist. While 54 of the AID loci are shared by one or more AID, 17% of them do not share candidate causal genes. In addition to finding novel genes such as ULK3, we also implicate novel disease mechanisms and pathways like autophagy in celiac disease pathogenesis. Furthermore, 42 of the AID SNPs specifically affected the expression of 53 non-coding RNA genes. To further understand how the non-coding genome contributes to AID, the SNPs were linked to functional regulatory elements, which suggest a model where AID genes are regulated by a network of chromatin looping/non-coding RNA interactions. The looping model also explains how a causal candidate gene is not necessarily the gene closest to the AID SNP, which was the case in nearly 50% of cases. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

    PubMed Central

    Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

    2013-01-01

    Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129

  2. Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants.

    PubMed

    van Baren, Marijke J; Bachy, Charles; Reistetter, Emily Nahas; Purvine, Samuel O; Grimwood, Jane; Sudek, Sebastian; Yu, Hang; Poirier, Camille; Deerinck, Thomas J; Kuo, Alan; Grigoriev, Igor V; Wong, Chee-Hong; Smith, Richard D; Callister, Stephen J; Wei, Chia-Lin; Schmutz, Jeremy; Worden, Alexandra Z

    2016-03-31

    Prasinophytes are widespread marine green algae that are related to plants. Cellular abundance of the prasinophyte Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these unicellular eukaryotes are important for marine ecology and for understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb genome of Micromonas commoda (RCC299; named herein) shows they share ≤8,141 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percent (26 %) GC splice donors. Micromonas has more genus-specific protein families (19 %) than other genome sequenced prasinophytes (11 %). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Surprisingly, multiple vascular plants also have the PG pathway, except the Penicillin-Binding Protein, and share a unique bi-domain protein potentially associated with the pathway. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. 
    Extensive differences in gene loss and architecture between related prasinophytes underscore their divergence. PG biosynthesis genes from the cyanobacterial endosymbiont that became the plastid have been selectively retained in multiple plants and algae, implying a biological function. Our studies provide robust genomic resources for emerging model algae, advancing knowledge of marine phytoplankton and plant evolution.

  3. Using the gene ontology to scan multilevel gene sets for associations in genome wide association studies.

    PubMed

    Schaid, Daniel J; Sinnwell, Jason P; Jenkins, Gregory D; McDonnell, Shannon K; Ingle, James N; Kubo, Michiaki; Goss, Paul E; Costantino, Joseph P; Wickerham, D Lawrence; Weinshilboum, Richard M

    2012-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc "fixes." To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted P-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. © 2011 Wiley Periodicals, Inc.
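
    Two ideas in this abstract lend themselves to a short sketch: combining effects so that opposite signs do not cancel, and step-down adjustment of P-values. The quadratic combination and the Holm procedure below are standard stand-ins chosen for illustration; the authors' score statistics and their LD-aware step-down strategy are not reproduced, and all function names are hypothetical.

```python
def combine_linear(scores):
    """Plain sum: effects of opposite sign cancel."""
    return sum(scores)


def combine_quadratic(scores):
    """Sum of squares: effects of opposite sign reinforce instead of cancelling."""
    return sum(z * z for z in scores)


def holm_adjust(pvals):
    """Holm step-down adjusted P-values (a simple stand-in that ignores LD)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest P-value by the number of hypotheses
        # still in play, then enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


gene_scores = [2.0, -2.0]  # hypothetical opposite-direction gene effects
print(combine_linear(gene_scores), combine_quadratic(gene_scores))
print(holm_adjust([0.01, 0.04, 0.03]))
```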

  4. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  5. CMCpy: Genetic Code-Message Coevolution Models in Python

    PubMed Central

    Becich, Peter J.; Stark, Brian P.; Bhat, Harish S.; Ardell, David H.

    2013-01-01

    Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes (“messages”). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC models are not currently available. To meet this need we present CMCpy, an object-oriented Python API and command-line executable front-end that can reproduce all published results of CMC models. CMCpy implements multiple solvers for leading eigenpairs of quasispecies models. We also present novel analytical results that extend and generalize applications of perturbation theory to quasispecies models and pioneer the application of a homotopy method for quasispecies with non-unique maximally fit genotypes. Our results therefore facilitate the computational and analytical study of a variety of evolutionary systems. CMCpy is free open-source software available from http://pypi.python.org/pypi/CMCpy/. PMID:23532367
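
    The leading eigenpair of a quasispecies model can be illustrated with plain power iteration. This sketch does not use or imitate the CMCpy API; the function name, the two-genotype example, and the convention W[i][j] = (probability that genotype j mutates to i) x (fitness of j) are assumptions made for the example.

```python
def leading_eigenpair(W, iters=1000, tol=1e-12):
    """Power iteration for a non-negative matrix W.

    Returns (eigenvalue, eigenvector) with the eigenvector normalised to
    sum to 1, so it can be read as equilibrium genotype frequencies."""
    n = len(W)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(w)  # valid eigenvalue estimate because v sums to 1
        v = [x / new_lam for x in w]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, v


# Hypothetical two-genotype model: fitnesses 2.0 and 1.0, mutation rate 0.1.
fitness = [2.0, 1.0]
Q = [[0.9, 0.1], [0.1, 0.9]]
W = [[Q[i][j] * fitness[j] for j in range(2)] for i in range(2)]
lam, freqs = leading_eigenpair(W)
print(lam, freqs)
```

    In this reading the eigenvalue is the mean fitness at mutation-selection balance and the eigenvector gives the genotype frequencies; the fitter genotype dominates but does not fix, because mutation keeps regenerating the other one.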

  6. Comparative genomics in the Asteraceae reveals little evidence for parallel evolutionary change in invasive taxa.

    PubMed

    Hodgins, Kathryn A; Bock, Dan G; Hahn, Min A; Heredia, Sylvia M; Turner, Kathryn G; Rieseberg, Loren H

    2015-05-01

    Asteraceae, the largest family of flowering plants, has given rise to many notorious invasive species. Using publicly available transcriptome assemblies from 35 Asteraceae, including six major invasive species, we examined evidence for micro- and macro-evolutionary genomic changes associated with invasion. To detect episodes of positive selection repeated across multiple introductions, we conducted comparisons between native and introduced genotypes from six focal species and identified genes with elevated rates of amino acid change (dN/dS). We then looked for evidence of positive selection at a broader phylogenetic scale across all taxa. As invasive species may experience founder events during colonization and spread, we also looked for evidence of increased genetic load in introduced genotypes. We rarely found evidence for parallel changes in orthologous genes in the intraspecific comparisons, but in some cases we identified changes in members of the same gene family. Using among-species comparisons, we detected positive selection in 0.003-0.69% and 2.4-7.8% of the genes using site and stochastic branch-site models, respectively. These genes had diverse putative functions, including defence response, stress response and herbicide resistance, although there was no clear pattern in the GO terms. There was no indication that introduced genotypes have a higher proportion of deleterious alleles than native genotypes in the six focal species, suggesting multiple introductions and admixture mitigated the impact of drift. Our findings provide little evidence for common genomic responses in invasive taxa of the Asteraceae and hence suggest that multiple evolutionary pathways may lead to adaptation during introduction and spread in these species. © 2014 John Wiley & Sons Ltd.

  7. Identification of five novel modifier loci of ApcMin harbored in the BXH14 recombinant inbred strain

    PubMed Central

    Siracusa, Linda D.

    2012-01-01

    Every year thousands of people in the USA are diagnosed with small intestine and colorectal cancers (CRC). Although environmental factors affect disease etiology, uncovering underlying genetic factors is imperative for risk assessment and developing preventative therapies. Familial adenomatous polyposis is a heritable genetic disorder in which individuals carry germ-line mutations in the adenomatous polyposis coli (APC) gene that predisposes them to CRC. The Apc Min mouse model carries a point mutation in the Apc gene and develops polyps along the intestinal tract. Inbred strain background influences polyp phenotypes in Apc Min mice. Several Modifier of Min (Mom) loci that alter tumor phenotypes associated with the Apc Min mutation have been identified to date. We screened BXH recombinant inbred (RI) strains by crossing BXH RI females with C57BL/6J (B6) Apc Min males and quantitating tumor phenotypes in backcross progeny. We found that the BXH14 RI strain harbors five modifier loci that decrease polyp multiplicity. Furthermore, we show that resistance is determined by varying combinations of these modifier loci. Gene interaction network analysis shows that there are multiple networks with proven gene–gene interactions, which contain genes from all five modifier loci. We discuss the implications of this result for studies that define susceptibility loci, namely that multiple networks may be acting concurrently to alter tumor phenotypes. Thus, the significance of this work resides not only with the modifier loci we identified but also with the combinations of loci needed to get maximal protection against polyposis and the impact of this finding on human disease studies. Abbreviations: APC, adenomatous polyposis coli; GWAS, genome-wide association studies; QTL, quantitative trait loci; SNP, single-nucleotide polymorphism. PMID:22637734

  8. Thermodynamics-Based Models of Transcriptional Regulation by Enhancers: The Roles of Synergistic Activation, Cooperative Binding and Short-Range Repression

    PubMed Central

    He, Xin; Samee, Md. Abul Hassan; Blatti, Charles; Sinha, Saurabh

    2010-01-01

    Quantitative models of cis-regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled, or heuristic approximations of the underlying regulatory mechanisms. We have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence, as a function of transcription factor concentrations and their DNA-binding specificities. It uses statistical thermodynamics theory to model not only protein-DNA interaction, but also the effect of DNA-bound activators and repressors on gene expression. In addition, the model incorporates mechanistic features such as synergistic effect of multiple activators, short range repression, and cooperativity in transcription factor-DNA binding, allowing us to systematically evaluate the significance of these features in the context of available expression data. Using this model on segmentation-related enhancers in Drosophila, we find that transcriptional synergy due to simultaneous action of multiple activators helps explain the data beyond what can be explained by cooperative DNA-binding alone. We find clear support for the phenomenon of short-range repression, where repressors do not directly interact with the basal transcriptional machinery. We also find that the binding sites contributing to an enhancer's function may not be conserved during evolution, and a noticeable fraction of these undergo lineage-specific changes. Our implementation of the model, called GEMSTAT, is the first publicly available program for simultaneously modeling the regulatory activities of a given set of sequences. PMID:20862354
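    The statistical-thermodynamics idea behind such models can be illustrated with a toy calculation. The sketch below is not GEMSTAT and does not reproduce its equations; it assumes a generic equilibrium picture in which each binding-site configuration gets a Boltzmann weight, adjacent bound sites gain a cooperativity factor, and site occupancy is a weighted average over all configurations. The function name `site_occupancy` and the parameters `q` and `omega` are hypothetical.

```python
from itertools import product


def site_occupancy(q, omega=1.0):
    """Equilibrium occupancy of each binding site in a toy enhancer.

    q     -- list of relative binding weights, one per site (assumed to stand
             for association constant times factor concentration)
    omega -- cooperativity factor applied when two adjacent sites are both
             bound (omega = 1 means independent binding)
    """
    partition = 0.0
    bound_weight = [0.0] * len(q)
    # Enumerate every bound/unbound configuration of the sites.
    for config in product((0, 1), repeat=len(q)):
        weight = 1.0
        for i, bound in enumerate(config):
            if bound:
                weight *= q[i]
                if i > 0 and config[i - 1]:
                    weight *= omega
        partition += weight
        for i, bound in enumerate(config):
            if bound:
                bound_weight[i] += weight
    # Occupancy = summed weight of configurations with the site bound / Z.
    return [w / partition for w in bound_weight]


if __name__ == "__main__":
    print(site_occupancy([1.0, 1.0]))              # independent binding
    print(site_occupancy([1.0, 1.0], omega=10.0))  # cooperative binding
```

    With two equal sites, raising `omega` above 1 raises the occupancy of both sites, which is the qualitative effect of cooperative binding that the abstract contrasts with transcriptional synergy.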

  9. Gene flow in complex landscapes: Testing multiple hypotheses with causal modeling

    Treesearch

    Samuel A. Cushman; Kevin S. McKelvey; Jim Hayden; Michael K. Schwartz

    2006-01-01

    Predicting population-level effects of landscape change depends on identifying factors that influence population connectivity in complex landscapes. However, most putative movement corridors and barriers have not been based on empirical data. In this study, we identify factors that influence connectivity by comparing patterns of genetic similarity among 146 black bears...

  10. Multiple β-defensin genes are upregulated by the vitamin D pathway in cattle

    USDA-ARS's Scientific Manuscript database

    Experimental models of bacterial and viral infections in cattle have suggested vitamin D has a role in innate immunity of cattle. The intracrine vitamin D pathway of bovine macrophages, however, has only been shown to activate a nitric oxide-mediated defense mechanism, as opposed to cathelicidin and...

  11. A pathway-based network analysis of hypertension-related genes

    NASA Astrophysics Data System (ADS)

    Wang, Huan; Hu, Jing-Bo; Xu, Chuan-Yun; Zhang, De-Hai; Yan, Qian; Xu, Ming; Cao, Ke-Fei; Zhang, Xu-Sheng

    2016-02-01

    The complex network approach has become an effective way to describe interrelationships among large amounts of biological data, which is especially useful in finding core functions and global behavior of biological systems. Hypertension is a complex disease caused by many factors, including genetic, physiological, psychological and even social ones. In this paper, based on the information of biological pathways, we construct a network model of hypertension-related genes of the salt-sensitive rat to explore the interrelationship between genes. Statistical and topological characteristics show that the network has the small-world but not scale-free property, and exhibits a modular structure, revealing compact and complex connections among these genes. By the threshold of integrated centrality larger than 0.71, seven key hub genes are found: Jun, Rps6kb1, Cycs, Creb3l2, Cdk4, Actg1 and RT1-Da. These genes should play an important role in hypertension, suggesting that the treatment of hypertension should focus on the combination of drugs on multiple genes.
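    The hub-selection step described above (rank nodes by a centrality score, keep those above a cutoff) can be sketched as follows. The abstract does not define its "integrated centrality", so this example substitutes plain normalized degree centrality as a stand-in; the node labels and edges are hypothetical toy data, not the hypertension network, and the 0.71 cutoff is reused only for illustration.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as edge pairs."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    # Degree divided by the maximum possible degree (n - 1).
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def find_hubs(edges, threshold):
    """Return the nodes whose centrality exceeds the threshold, sorted by name."""
    scores = degree_centrality(edges)
    return sorted(node for node, score in scores.items() if score > threshold)


# Hypothetical toy network: g1 is connected to every other node.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g1", "g5"), ("g2", "g3")]

if __name__ == "__main__":
    print(degree_centrality(toy_edges))
    print(find_hubs(toy_edges, threshold=0.71))
```

    In the toy network only `g1` exceeds the cutoff. A combined score built from several centrality measures would slot into `degree_centrality` without changing the thresholding step.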

  12. A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.

    PubMed

    Soneson, Charlotte; Fontes, Magnus

    2012-01-01

    Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.

  13. Formononetin-induced oxidative stress abrogates the activation of STAT3/5 signaling axis and suppresses the tumor growth in multiple myeloma preclinical model.

    PubMed

    Kim, Chulwon; Lee, Seok-Geun; Yang, Woong Mo; Arfuso, Frank; Um, Jae-Young; Kumar, Alan Prem; Bian, Jinsong; Sethi, Gautam; Ahn, Kwang Seok

    2018-05-29

    Aberrant reactions of signal transducer and transcriptional activator (STAT) are frequently detected in multiple myeloma (MM) cancers and can upregulate the expression of multiple genes related to cell proliferation, survival, metastasis, and angiogenesis. Therefore, agents capable of inhibiting STAT activation can form the basis of novel therapies for MM patients. In the present study, we investigated whether the potential anti-cancer effects of Formononetin (FT), a naturally occurring isoflavone derived from Astragalus membranaceus, Trifolium pratense, Glycyrrhiza glabra, and Pueraria lobata, against MM cell lines and human multiple myeloma xenograft tumors in athymic nu/nu mice model are mediated through the negative regulation of STAT3 and STAT5 pathways. Data from the in vitro studies indicated that FT could significantly inhibit cell viability, and induce apoptosis. Interestingly, FT also suppressed constitutive STAT3 (tyrosine residue 705 and serine residue 727) and STAT5 (tyrosine residue 694/699) activation, which correlated with the suppression of the upstream kinases (JAK1, JAK2, and c-Src) in MM cells, and this effect was found to be mediated via an increased production of reactive oxygen species (ROS) due to GSH/GSSG imbalance. Also, FT abrogated STAT3 and STAT5 DNA binding capacity and nuclear translocation. FT induced cell cycle arrest, downregulated the expression of STAT3-regulated anti-apoptotic, angiogenetic, and proliferative gene products; and this correlated with induction of caspase-3 activation and cleavage of PARP. Intraperitoneal administration of FT significantly suppressed the tumor growth in the multiple myeloma xenograft mouse model without exhibiting any significant adverse effects. Overall, our findings indicate that FT exhibits significant anti-cancer effects in MM that may be primarily mediated through the ROS-regulated inhibition of the STAT3 and STAT5 signaling cascade. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Multiple OPR genes influence personality traits in substance dependent and healthy subjects in two American populations

    PubMed Central

    Luo, Xingguang; Zuo, Lingjun; Kranzler, Henry; Zhang, Huiping; Wang, Shuang; Gelernter, Joel

    2011-01-01

    Background: Personality traits are among the most complex quantitative traits. Certain personality traits are associated with substance dependence (SD); genetic factors may influence both. Associations between opioid receptor (OPR) genes and SD have been reported. This study investigated the relationship between OPR genes and personality traits in a case-control sample. Methods: We assessed dimensions of the five-factor model of personality in 556 subjects: 250 with SD [181 European-Americans (EAs) and 69 African-Americans (AAs)] and 306 healthy subjects (266 EAs and 40 AAs). We genotyped 20 OPRM1 markers, 8 OPRD1 markers, and 7 OPRK1 markers, and 38 unlinked ancestry-informative markers in these subjects. The relationships between OPR genes and personality traits were examined using MANCOVA, controlling for gene-gene interaction effects and potential confounders. Associations were decomposed by Roy-Bargmann Stepdown ANCOVA. Results: Personality traits were associated as main or interaction effects with the haplotypes, diplotypes, alleles and genotypes at the three OPR genes (0.002

  15. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

    PubMed Central

    Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

    2013-01-01

    The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284

  16. Coregulation of terpenoid pathway genes and prediction of isoprene production in Bacillus subtilis using transcriptomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hess, Becky M.; Xue, Junfeng; Markillie, Lye Meng

    2013-06-19

    The isoprenoid pathway converts pyruvate to isoprene and related isoprenoid compounds in plants and some bacteria. Currently, this pathway is of great interest because of the critical role that isoprenoids play in basic cellular processes as well as the industrial value of metabolites such as isoprene. Although the regulation of several pathway genes has been described, there is a paucity of information regarding the system level regulation and control of the pathway. To address this limitation, we examined Bacillus subtilis grown under multiple conditions and then determined the relationship between altered isoprene production and the pattern of gene expression. We found that terpenoid genes appeared to fall into two distinct subsets with opposing correlations with respect to the amount of isoprene produced. The group whose expression levels positively correlated with isoprene production included dxs, the gene responsible for the commitment step in the pathway, as well as ispD, and two genes that participate in the mevalonate pathway, yhfS and pksG. The subset of terpenoid genes that inversely correlated with isoprene production included ispH, ispF, hepS, uppS, ispE, and dxr. A genome wide partial least squares regression model was created to identify other genes or pathways that contribute to isoprene production. This analysis showed that a subset of 213 regulated genes was sufficient to create a predictive model of isoprene production under different conditions and showed correlations at the transcriptional level. We conclude that gene expression levels alone are sufficiently informative about the metabolic state of a cell that produces increased isoprene and can be used to build a model which accurately predicts production of this secondary metabolite across many simulated environmental conditions.

  17. Interrogating the topological robustness of gene regulatory circuits by randomization

    PubMed Central

    Levine, Herbert; Onuchic, Jose N.

    2017-01-01

    One of the most important roles of cells is performing their cellular tasks properly for survival. Cells usually achieve robust functionality, for example, cell-fate decision-making and signal transduction, through multiple layers of regulation involving many genes. Despite the combinatorial complexity of gene regulation, its quantitative behavior has been typically studied on the basis of experimentally verified core gene regulatory circuitry, composed of a small set of important elements. It is still unclear how such a core circuit operates in the presence of many other regulatory molecules and in a crowded and noisy cellular environment. Here we report a new computational method, named random circuit perturbation (RACIPE), for interrogating the robust dynamical behavior of a gene regulatory circuit even without accurate measurements of circuit kinetic parameters. RACIPE generates an ensemble of random kinetic models corresponding to a fixed circuit topology, and utilizes statistical tools to identify generic properties of the circuit. By applying RACIPE to simple toggle-switch-like motifs, we observed that the stable states of all models converge to experimentally observed gene state clusters even when the parameters are strongly perturbed. RACIPE was further applied to a proposed 22-gene network of the Epithelial-to-Mesenchymal Transition (EMT), from which we identified four experimentally observed gene states, including the states that are associated with two different types of hybrid Epithelial/Mesenchymal phenotypes. Our results suggest that dynamics of a gene circuit is mainly determined by its topology, not by detailed circuit parameters. Our work provides a theoretical foundation for circuit-based systems biology modeling. We anticipate RACIPE to be a powerful tool to predict and decode circuit design principles in an unbiased manner, and to quantitatively evaluate the robustness and heterogeneity of gene expression. PMID:28362798
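    The ensemble idea described above (fix the circuit topology, randomize the kinetic parameters, then look at which steady states recur) can be sketched for a two-gene toggle switch. This is a minimal illustration and not the RACIPE software: the parameter ranges, the shifted Hill form, the forward-Euler integrator and all function names are assumptions chosen to keep the example short.

```python
import random


def shifted_hill(x, threshold, n, fold):
    """Equals 1 at x = 0 and falls towards `fold` (< 1, inhibition) as x grows."""
    return fold + (1.0 - fold) / (1.0 + (x / threshold) ** n)


def random_parameters(rng):
    """Draw one kinetic parameter set; the ranges are illustrative assumptions."""
    return {
        "g_a": rng.uniform(10.0, 100.0),   # maximal production rates
        "g_b": rng.uniform(10.0, 100.0),
        "k_a": rng.uniform(0.1, 1.0),      # degradation rates
        "k_b": rng.uniform(0.1, 1.0),
        "thr_a": rng.uniform(5.0, 50.0),   # Hill thresholds
        "thr_b": rng.uniform(5.0, 50.0),
        "n": rng.randint(1, 4),            # Hill coefficient
        "fold": rng.uniform(0.01, 0.1),    # residual production when inhibited
    }


def integrate(p, a, b, dt=0.02, steps=5000):
    """Forward-Euler integration of the toggle switch (A and B inhibit each other)."""
    for _ in range(steps):
        da = p["g_a"] * shifted_hill(b, p["thr_b"], p["n"], p["fold"]) - p["k_a"] * a
        db = p["g_b"] * shifted_hill(a, p["thr_a"], p["n"], p["fold"]) - p["k_b"] * b
        a, b = a + dt * da, b + dt * db
    return a, b


def count_states(p, rng, n_inits=4, tol=0.01):
    """Count distinct end states reached from several random initial conditions."""
    states = []
    for _ in range(n_inits):
        end = integrate(p, rng.uniform(0.0, 200.0), rng.uniform(0.0, 200.0))
        is_new = all(
            max(abs(end[0] - s[0]), abs(end[1] - s[1])) > tol * (1.0 + max(end))
            for s in states
        )
        if is_new:
            states.append(end)
    return len(states)


def run_ensemble(n_models=20, seed=0):
    """Number of distinct end states for each randomly parameterized model."""
    rng = random.Random(seed)
    return [count_states(random_parameters(rng), rng) for _ in range(n_models)]


if __name__ == "__main__":
    print(run_ensemble())
```

    Each entry of the returned list is the number of distinct end states one random model reached; a value of 2 would suggest bistability for that parameter set. A fuller analysis would pool the end states across models and cluster them, which is the step the abstract describes as identifying gene state clusters.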

  18. Spontaneous and evolutionary changes in the antibiotic resistance of Burkholderia cenocepacia observed by global gene expression analysis.

    PubMed

    Sass, Andrea; Marchbank, Angela; Tullis, Elizabeth; Lipuma, John J; Mahenthiralingam, Eshwar

    2011-07-22

    Burkholderia cenocepacia is a member of the Burkholderia cepacia complex group of bacteria that cause infections in individuals with cystic fibrosis. B. cenocepacia isolate J2315 has been genome sequenced and is representative of a virulent, epidemic CF strain (ET12). Its genome encodes multiple antimicrobial resistance pathways and it is not known which of these is important for intrinsic or spontaneous resistance. To map these pathways, transcriptomic analysis was performed on: (i) strain J2315 exposed to sub-inhibitory concentrations of antibiotics and the antibiotic potentiator chlorpromazine, and (ii) on spontaneous mutants derived from J2315 and with increased resistance to the antibiotics amikacin, meropenem and trimethoprim-sulfamethoxazole. Two pan-resistant ET12 outbreak isolates recovered two decades after J2315 were also compared to identify naturally evolved gene expression changes. Spontaneous resistance in B. cenocepacia involved more gene expression changes and different subsets of genes than those provoked by exposure to sub-inhibitory concentrations of each antibiotic. The phenotype and altered gene expression in the resistant mutants was also stable irrespective of the presence of the priming antibiotic. Both known and novel genes involved in efflux, antibiotic degradation/modification, membrane function, regulation and unknown functions were mapped. A novel role for the phenylacetic acid (PA) degradation pathway genes was identified in relation to spontaneous resistance to meropenem and glucose was found to repress their expression. Subsequently, 20 mM glucose was found to produce greater than 2-fold reductions in the MIC of multiple antibiotics against B. cenocepacia J2315. Mutation of an RND multidrug efflux pump locus (BCAM0925-27) and squalene-hopene cyclase gene (BCAS0167), both upregulated after chlorpromazine exposure, confirmed their role in resistance. The recently isolated outbreak isolates had altered the expression of multiple genes which mirrored changes seen in the antibiotic resistant mutants, corroborating the strategy used to model resistance. Mutation of an ABC transporter gene (BCAS0081) upregulated in both outbreak strains, confirmed its role in B. cenocepacia resistance. Global mapping of the genetic pathways which mediate antibiotic resistance in B. cenocepacia has revealed that they are multifactorial, identified potential therapeutic targets and also demonstrated that putative catabolite repression of genes by glucose can improve antibiotic efficacy.

  19. CRISPR-Cas9: a promising tool for gene editing on induced pluripotent stem cells

    PubMed Central

    Kim, Eun Ji; Kang, Ki Ho; Ju, Ji Hyeon

    2017-01-01

    Recent advances in genome editing with programmable nucleases have opened up new avenues for multiple applications, from basic research to clinical therapy. The ease of use of the technology—and particularly clustered regularly interspaced short palindromic repeats (CRISPR)—will allow us to improve our understanding of genomic variation in disease processes via cellular and animal models. Here, we highlight the progress made in correcting gene mutations in monogenic hereditary disorders and discuss various CRISPR-associated applications, such as cancer research, synthetic biology, and gene therapy using induced pluripotent stem cells. The challenges, ethical issues, and future prospects of CRISPR-based systems for human research are also discussed. PMID:28049282

  20. CRISPR-Cas9: a promising tool for gene editing on induced pluripotent stem cells.

    PubMed

    Kim, Eun Ji; Kang, Ki Ho; Ju, Ji Hyeon

    2017-01-01

    Recent advances in genome editing with programmable nucleases have opened up new avenues for multiple applications, from basic research to clinical therapy. The ease of use of the technology, and particularly of clustered regularly interspaced short palindromic repeats (CRISPR), will allow us to improve our understanding of genomic variation in disease processes via cellular and animal models. Here, we highlight the progress made in correcting gene mutations in monogenic hereditary disorders and discuss various CRISPR-associated applications, such as cancer research, synthetic biology, and gene therapy using induced pluripotent stem cells. The challenges, ethical issues, and future prospects of CRISPR-based systems for human research are also discussed.

  1. Evidence for a large expansion and subfunctionalisation of globin genes in sea anemones.

    PubMed

    Smith, Hayden L; Pavasovic, Ana; Surm, Joachim M; Phillips, Matthew J; Prentis, Peter J

    2018-06-27

    The globin gene superfamily has been well-characterised in vertebrates, however, there has been limited research in early-diverging lineages, such as phylum Cnidaria. This study aimed to identify globin genes in multiple cnidarian lineages, and use bioinformatic approaches to characterise the evolution, structure and expression of these genes. Phylogenetic analyses and in silico protein predictions showed that all cnidarians have undergone an expansion of globin genes, which likely have a hexacoordinate protein structure. Our protein modelling has also revealed the possibility of a single pentacoordinate globin lineage in anthozoan species. Some cnidarian globin genes displayed tissue and development specific expression with very few orthologous genes similarly expressed across species. Our phylogenetic analyses also revealed that eumetazoan globin genes form a polyphyletic relationship with vertebrate globin genes. Overall, our analyses suggest that a Ngb-like and GbX-like gene were most likely present in the globin gene repertoire for the last common ancestor of eumetazoans. The identification of a large-scale expansion and subfunctionalisation of globin genes in actiniarians provides an excellent starting point to further our understanding of the evolution and function of the globin gene superfamily in early-diverging lineages.

  2. SigmaS controls multiple pathways associated with intracellular multiplication of Legionella pneumophila.

    PubMed

    Hovel-Miner, Galadriel; Pampou, Sergey; Faucher, Sebastien P; Clarke, Margaret; Morozova, Irina; Morozov, Pavel; Russo, James J; Shuman, Howard A; Kalachikov, Sergey

    2009-04-01

    Legionella pneumophila is the causative agent of the severe and potentially fatal pneumonia Legionnaires' disease. L. pneumophila is able to replicate within macrophages and protozoa by establishing a replicative compartment in a process that requires the Icm/Dot type IVB secretion system. The signals and regulatory pathways required for Legionella infection and intracellular replication are poorly understood. Mutation of the rpoS gene, which encodes sigma(S), does not affect growth in rich medium but severely decreases L. pneumophila intracellular multiplication within protozoan hosts. To gain insight into the intracellular multiplication defect of an rpoS mutant, we examined its pattern of gene expression during exponential and postexponential growth. We found that sigma(S) affects distinct groups of genes that contribute to Legionella intracellular multiplication. We demonstrate that rpoS mutants have a functional Icm/Dot system yet are defective for the expression of many genes encoding Icm/Dot-translocated substrates. We also show that sigma(S) affects the transcription of the cpxR and pmrA genes, which encode two-component response regulators that directly affect the transcription of Icm/Dot substrates. Our characterization of the L. pneumophila small RNA csrB homologs, rsmY and rsmZ, introduces a link between sigma(S) and the posttranscriptional regulator CsrA. We analyzed the network of sigma(S)-controlled genes by mutational analysis of transcriptional regulators affected by sigma(S). One of these, encoding the L. pneumophila arginine repressor homolog gene, argR, is required for maximal intracellular growth in amoebae. These data show that sigma(S) is a key regulator of multiple pathways required for L. pneumophila intracellular multiplication.

  3. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation

    PubMed Central

    Roger, Andrew J; Hug, Laura A

    2006-01-01

    Determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual. Recent phylogenetic analyses of multiple genes and the discovery of important molecular and ultrastructural phylogenetic characters have resolved eukaryotic diversity into six major hypothetical groups. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artefacts and endosymbiotic or lateral gene transfer among eukaryotes. Estimating the divergence dates between the major eukaryote lineages using molecular analyses is even more difficult than phylogenetic estimation. Error in such analyses comes from a myriad of sources including: (i) calibration fossil dates, (ii) the assumed phylogenetic tree, (iii) the nucleotide or amino acid substitution model, (iv) substitution number (branch length) estimates, (v) the model of how rates of evolution change over the tree, (vi) error inherent in the time estimates for a given model and (vii) how multiple gene data are treated. By reanalysing datasets from recently published molecular clock studies, we show that when errors from these various sources are properly accounted for, the confidence intervals on inferred dates can be very large. Furthermore, estimated dates of divergence vary hugely depending on the methods used and their assumptions. Accurate dating of divergence times among the major eukaryote lineages will require a robust tree of eukaryotes, a much richer Proterozoic fossil record of microbial eukaryotes assignable to extant groups for calibration, more sophisticated relaxed molecular clock methods and many more genes sampled from the full diversity of microbial eukaryotes. PMID:16754613

  4. Computational discovery and in vivo validation of hnf4 as a regulatory gene in planarian regeneration.

    PubMed

    Lobo, Daniel; Morokuma, Junji; Levin, Michael

    2016-09-01

    Automated computational methods can infer dynamic regulatory network models directly from temporal and spatial experimental data, such as genetic perturbations and their resultant morphologies. Recently, a computational method was able to reverse-engineer the first mechanistic model of planarian regeneration that can recapitulate the main anterior-posterior patterning experiments published in the literature. Validating this comprehensive regulatory model via novel experiments that had not yet been performed would add to our understanding of the remarkable regeneration capacity of planarian worms and demonstrate the power of this automated methodology. Using the Michigan Molecular Interactions and STRING databases and the MoCha software tool, we characterized as hnf4 an unknown regulatory gene predicted to exist by the reverse-engineered dynamic model of planarian regeneration. Then, we used the dynamic model to predict the morphological outcomes under different single and multiple knock-downs (RNA interference) of hnf4 and its predicted gene pathway interactors β-catenin and hh. Interestingly, the model predicted that RNAi of hnf4 would rescue the abnormal regenerated phenotype (tailless) of RNAi of hh in amputated trunk fragments. Finally, we validated these predictions in vivo by performing the same surgical and genetic experiments with planarian worms, obtaining the same phenotypic outcomes predicted by the reverse-engineered model. These results suggest that hnf4 is a regulatory gene in planarian regeneration, validate the computational predictions of the reverse-engineered dynamic model, and demonstrate the automated methodology for the discovery of novel genes, pathways and experimental phenotypes. © The Author 2016. Published by Oxford University Press. All rights reserved.

  5. Brassinosteroid and gibberellin control of seedling traits in maize (Zea mays L.).

    PubMed

    Hu, Songlin; Sanchez, Darlene L; Wang, Cuiling; Lipka, Alexander E; Yin, Yanhai; Gardner, Candice A C; Lübberstedt, Thomas

    2017-10-01

    In this study, we established two doubled haploid (DH) libraries with a total of 207 DH lines. We applied BR and GA inhibitors to all DH lines at seedling stage and measured seedling BR and GA inhibitor responses. Moreover, we evaluated field traits for each DH line (untreated). We conducted genome-wide association studies (GWAS) with 62,049 genome-wide SNPs to explore the genetic control of seedling traits by BR and GA. In addition, we correlated seedling stage hormone inhibitor response with field traits. Large variation for BR and GA inhibitor response and field traits was observed across these DH lines. Seedling stage BR and GA inhibitor response was significantly correlated with yield and flowering time. Using three different GWAS approaches to balance false positives and false negatives, multiple SNPs were discovered to be significantly associated with BR/GA inhibitor responses, with some localized within gene models. SNPs from gene model GRMZM2G013391 were associated with GA inhibitor response across all three GWAS models. This gene is expressed in roots and shoots and was shown to regulate GA signaling. These results show that BRs and GAs have a great impact on controlling seedling growth. Gene models from GWAS results could be targets for seedling trait improvement. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Otitis Media in a New Mouse Model for CHARGE Syndrome with a Deletion in the Chd7 Gene

    PubMed Central

    Tian, Cong; Yu, Heping; Yang, Bin; Han, Fengchan; Zheng, Ye; Bartels, Cynthia F.; Schelling, Deborah; Arnold, James E.; Scacheri, Peter C.; Zheng, Qing Yin

    2012-01-01

    Otitis media is a middle ear disease common in children under three years old. Otitis media can occur in normal individuals with no other symptoms or syndromes, but it is often seen in individuals clinically diagnosed with genetic diseases such as CHARGE syndrome, a complex genetic disease caused by mutation in the Chd7 gene and characterized by multiple birth defects. Although otitis media is common in human CHARGE syndrome patients, it has not been reported in mouse models of CHARGE syndrome. In this study, we report a mouse model with a spontaneous deletion mutation in the Chd7 gene and with chronic otitis media of early onset age accompanied by hearing loss. These mice also exhibit morphological alteration in the Eustachian tubes, dysregulation of epithelial proliferation, and decreased density of middle ear cilia. Gene expression profiling revealed up-regulation of Muc5ac, Muc5b and Tgf-β1 transcripts, the products of which are involved in mucin production and TGF pathway regulation. This is the first mouse model of CHARGE syndrome reported to show otitis media with effusion and it will be valuable for studying the etiology of otitis media and other symptoms in CHARGE syndrome. PMID:22539951

  7. Targeted and efficient transfer of multiple value-added genes into wheat varieties

    USDA-ARS's Scientific Manuscript database

    With an objective to optimize an approach to transfer multiple value added genes to a wheat variety while maintaining and improving agronomic performance, two alleles with mutations in the acetolactate synthase (ALS) gene located on wheat chromosomes 6B and 6D providing tolerance to imidazolinone (I...

  8. Next-generation analysis of cataracts: determining knowledge driven gene-gene interactions using Biofilter, and gene-environment interactions using the PhenX Toolkit.

    PubMed

    Pendergrass, Sarah A; Verma, Shefali S; Holzinger, Emily R; Moore, Carrie B; Wallace, John; Dudek, Scott M; Huggins, Wayne; Kitchner, Terrie; Waudby, Carol; Berg, Richard; McCarty, Catherine A; Ritchie, Marylyn D

    2013-01-01

    Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10^-4 associated with cataract status. Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
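The multiple testing burden described in this abstract can be made concrete with a little arithmetic. The sketch below compares the number of exhaustive SNP-SNP pairs with the 57,376 Biofilter models, using only counts quoted in the abstract. The Bonferroni correction is used purely as an illustrative assumption; the abstract does not state which correction, if any, the authors applied, and the function names are hypothetical.

```python
from math import comb

N_SNPS = 529_431             # SNPs with minor allele frequency > 1% (from the abstract)
N_BIOFILTER_MODELS = 57_376  # prior-knowledge SNP-SNP models (from the abstract)
ALPHA = 0.05                 # nominal significance level (from the abstract)


def exhaustive_pair_count(n_snps: int) -> int:
    """Number of distinct unordered SNP-SNP pairs among n_snps SNPs."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold under a Bonferroni correction (illustrative assumption)."""
    return alpha / n_tests


all_pairs = exhaustive_pair_count(N_SNPS)
reduction = all_pairs / N_BIOFILTER_MODELS

print(f"exhaustive pairwise models: {all_pairs:,}")
print(f"Biofilter models:           {N_BIOFILTER_MODELS:,}")
print(f"reduction factor:           {reduction:,.0f}x")
print(f"Bonferroni threshold, exhaustive: {bonferroni_threshold(ALPHA, all_pairs):.2e}")
print(f"Bonferroni threshold, Biofilter:  {bonferroni_threshold(ALPHA, N_BIOFILTER_MODELS):.2e}")
```

Under these assumptions, filtering before testing shrinks the model space by several orders of magnitude, which is the motivation the abstract gives for using prior knowledge.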

  9. Fragmentation of the large subunit ribosomal RNA gene in oyster mitochondrial genomes.

    PubMed

    Milbury, Coren A; Lee, Jung C; Cannone, Jamie J; Gaffney, Patrick M; Gutell, Robin R

    2010-09-02

    Discontinuous genes have been observed in bacteria, archaea, and eukaryotic nuclei, mitochondria and chloroplasts. Gene discontinuity occurs in multiple forms: the two most frequent forms result from introns that are spliced out of the RNA and the resulting exons are spliced together to form a single transcript, and fragmented gene transcripts that are not covalently attached post-transcriptionally. Within the past few years, fragmented ribosomal RNA (rRNA) genes have been discovered in bilateral metazoan mitochondria, all within a group of related oysters. In this study, we have characterized this fragmentation with comparative analysis and experimentation. We present secondary structures, modeled using comparative sequence analysis of the discontinuous mitochondrial large subunit rRNA genes of the cupped oysters C. virginica, C. gigas, and C. hongkongensis. Comparative structure models for the large subunit rRNA in each of the three oyster species are generally similar to those for other bilateral metazoans. We also used RT-PCR and analyzed ESTs to determine if the two fragmented LSU rRNAs are spliced together. The two segments are transcribed separately, and not spliced together although they still form functional rRNAs and ribosomes. Although many examples of discontinuous ribosomal genes have been documented in bacteria and archaea, as well as the nuclei, chloroplasts, and mitochondria of eukaryotes, oysters are some of the first characterized examples of fragmented bilateral animal mitochondrial rRNA genes. The secondary structures of the oyster LSU rRNA fragments have been predicted on the basis of previous comparative metazoan mitochondrial LSU rRNA structure models.

  10. Linkage analysis of schizophrenia with five dopamine receptor genes in nine pedigrees

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coon, H.; Byerley, W.; Holik, J.

    Alterations in dopamine neurotransmission have been strongly implicated in the pathogenesis of schizophrenia for nearly 2 decades. Recently, the genes for five dopamine receptors have been cloned and characterized, and genetic and physical map information has become available. Using these five loci as candidate genes, the authors have tested for genetic linkage to schizophrenia in nine multigenerational families which include multiple affected individuals. In addition to testing conservative disease models, they have used a neurophysiological indicator variable, the P50 auditory evoked response. Deficits in gating of the P50 response have been shown to segregate with schizophrenia in this sample and may identify carriers of gene(s) predisposing for schizophrenia. Linkage results were consistently negative, indicating that a defect at any of the actual receptor sites is unlikely to be a major contributor to schizophrenia in the nine families studied. 47 refs., 1 fig., 4 tabs.

  11. Mining microarrays for metabolic meaning: nutritional regulation of hypothalamic gene expression.

    PubMed

    Mobbs, Charles V; Yen, Kelvin; Mastaitis, Jason; Nguyen, Ha; Watson, Elizabeth; Wurmbach, Elisa; Sealfon, Stuart C; Brooks, Andrew; Salton, Stephen R J

    2004-06-01

    DNA microarray analysis has been used to investigate relative changes in the level of gene expression in the CNS, including changes that are associated with disease, injury, psychiatric disorders, drug exposure or withdrawal, and memory formation. We have used oligonucleotide microarrays to identify hypothalamic genes that respond to nutritional manipulation. In addition to commonly used microarray analysis based on criteria such as fold-regulation, we have also found that simply carrying out multiple t tests then sorting by P value constitutes a highly reliable method to detect true regulation, as assessed by real-time polymerase chain reaction (PCR), even for relatively low abundance genes or relatively low magnitude of regulation. Such analyses directly suggested novel mechanisms that mediate effects of nutritional state on neuroendocrine function and are being used to identify regulated gene products that may elucidate the metabolic pathology of obese ob/ob, lean Vgf-/Vgf-, and other models with profound metabolic impairments.
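The "multiple t tests, then sort" approach mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene names and expression values are invented, and the code ranks by the absolute pooled-variance t statistic rather than computing P values. When every gene has the same group sizes (an assumption made here), the degrees of freedom are identical, so ranking by |t| gives the same order as sorting by P value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def rank_genes(expression):
    """Rank genes from strongest to weakest evidence of regulation.

    expression maps a gene name to (control_values, treated_values).
    """
    scores = {gene: abs(pooled_t(ctrl, trt)) for gene, (ctrl, trt) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical log-expression values, four replicates per condition.
toy = {
    "geneA": ([5.0, 5.1, 4.9, 5.0], [5.0, 5.2, 4.8, 5.1]),   # essentially unchanged
    "geneB": ([2.0, 2.1, 1.9, 2.0], [2.6, 2.7, 2.5, 2.6]),   # small but consistent change
    "geneC": ([8.0, 9.0, 7.0, 8.5], [9.0, 8.0, 10.0, 9.5]),  # larger change, noisy
}

ranking = rank_genes(toy)
print(ranking)
```

In the toy data the low-magnitude but tightly replicated change ranks first, which matches the abstract's point that the method can detect regulation of low magnitude.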

  12. Inactivation of the mouse Magel2 gene results in growth abnormalities similar to Prader-Willi syndrome.

    PubMed

    Bischof, Jocelyn M; Stewart, Colin L; Wevrick, Rachel

    2007-11-15

    Prader-Willi syndrome (PWS) is an imprinted genetic obesity disorder characterized by abnormalities of growth and metabolism. Multiple mouse models with deficiency of one or more PWS candidate genes have partially correlated individual genes with aspects of the PWS phenotype, although the genetic origin of defects in growth and metabolism has not been elucidated. Gene-targeted mutation of the PWS candidate gene Magel2 in mice causes altered circadian rhythm output and reduced motor activity. We now report that Magel2-null mice exhibit neonatal growth retardation, excessive weight gain after weaning, and increased adiposity with altered metabolism in adulthood, recapitulating fundamental aspects of the PWS phenotype. Magel2-null mice provide an important opportunity to examine the physiological basis for PWS neonatal failure to thrive and post-weaning weight gain and for the relationships among circadian rhythm, feeding behavior, and metabolism.

  13. Lessons learned: Optimization of a murine small bowel resection model

    PubMed Central

    Taylor, Janice A.; Martin, Colin A.; Nair, Rajalakshmi; Guo, Jun; Erwin, Christopher R.; Warner, Brad W.

    2008-01-01

    Background/Purpose: Central to the use of murine models of disease is the ability to derive reproducible data. The purpose of this study was to determine factors contributing to variability in our murine model of small bowel resection (SBR). Methods: Male C57Bl/6 mice were randomized to sham or 50% SBR. The effect of housing type (pathogen-free versus standard housing), nutrition (reconstituted powder versus tube feeding formulation), and correlates of intestinal morphology with gene expression changes were investigated. Multiple linear regression modeling or one-way ANOVA was used for data analysis. Results: Pathogen-free mice had significantly shorter ileal villi at baseline and demonstrated greater villus growth after SBR compared to mice housed in standard rooms. Food type did not affect adaptation. Gene expression changes were more consistent and significant in isolated crypt cells that demonstrated adaptive growth when compared with crypts that did not deepen after SBR. Conclusion: Maintenance of mice in pathogen-free conditions and restricting gene expression analysis to individual animals exhibiting morphologic adaptation enhances sensitivity and specificity of data derived from this model. These refinements will minimize experimental variability and lead to improved understanding of the complex process of intestinal adaptation. PMID:18558176

  14. Gene-environment studies: any advantage over environmental studies?

    PubMed

    Bermejo, Justo Lorenzo; Hemminki, Kari

    2007-07-01

    Gene-environment studies have been motivated by the likely existence of prevalent low-risk genes that interact with common environmental exposures. The present study assessed the statistical advantage of the simultaneous consideration of genes and environment to investigate the effect of environmental risk factors on disease. In particular, we contemplated the possibility that several genes modulate the environmental effect. Environmental exposures, genotypes and phenotypes were simulated according to a wide range of parameter settings. Different models of gene-gene-environment interaction were considered. For each parameter combination, we estimated the probability of detecting the main environmental effect, the power to identify the gene-environment interaction and the frequency of environmentally affected individuals at which environmental and gene-environment studies show the same statistical power. The proportion of cases in the population attributable to the modeled risk factors was also calculated. Our data indicate that environmental exposures with weak effects may account for a significant proportion of the population prevalence of the disease. A general result was that, if the environmental effect was restricted to rare genotypes, the power to detect the gene-environment interaction was higher than the power to identify the main environmental effect. In other words, when few individuals contribute to the overall environmental effect, individual contributions are large and result in easily identifiable gene-environment interactions. Moreover, when multiple genes interacted with the environment, the statistical benefit of gene-environment studies was limited to those studies that included major contributors to the gene-environment interaction. The advantage of gene-environment over plain environmental studies also depends on the inheritance mode of the involved genes, on the study design and, to some extent, on the disease prevalence.
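The statement that "when few individuals contribute to the overall environmental effect, individual contributions are large" can be illustrated with a small calculation. The sketch assumes a simple additive risk-difference scale in which only carriers of a susceptible genotype respond to the exposure; this model, the function name, and the numbers are assumptions for illustration and are not taken from the simulations in the study.

```python
def carrier_effect(marginal_effect: float, carrier_freq: float) -> float:
    """Risk difference among carriers needed to produce a given population-average effect.

    Assumes non-carriers are unaffected by the exposure, so that
    marginal_effect = carrier_freq * carrier_effect.
    """
    if not 0 < carrier_freq <= 1:
        raise ValueError("carrier_freq must be in (0, 1]")
    return marginal_effect / carrier_freq


MARGINAL_EFFECT = 0.005  # hypothetical population-average risk difference

for freq in (1.0, 0.5, 0.1, 0.05):
    effect = carrier_effect(MARGINAL_EFFECT, freq)
    print(f"carrier frequency {freq:>4}: risk difference in carriers = {effect:.3f}")
```

Holding the population-average effect fixed, the effect within carriers grows as the susceptible genotype becomes rarer, which is why the interaction can be easier to detect than the diluted main effect.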

  15. Draft Genome of the Scarab Beetle Oryctes borbonicus on La Réunion Island

    PubMed Central

    Meyer, Jan M.; Markov, Gabriel V.; Baskaran, Praveen; Herrmann, Matthias; Sommer, Ralf J.; Rödelsperger, Christian

    2016-01-01

    Beetles represent the largest insect order and they display extreme morphological, ecological and behavioral diversity, which makes them ideal models for evolutionary studies. Here, we present the draft genome of the scarab beetle Oryctes borbonicus, which has a more basal phylogenetic position than the two previously sequenced pest species Tribolium castaneum and Dendroctonus ponderosae, providing the potential for sequence polarization. Oryctes borbonicus is endemic to La Réunion, an island located in the Indian Ocean, and is the host of the nematode Pristionchus pacificus, a well-established model organism for integrative evolutionary biology. At 518 Mb, the O. borbonicus genome is substantially larger and encodes more genes than T. castaneum and D. ponderosae. We found that only 25% of the predicted genes of O. borbonicus are conserved as single copy genes across the nine investigated insect genomes, suggesting substantial gene turnover within insects. Even within beetles, up to 21% of genes are restricted to only one species, whereas most other genes have undergone lineage-specific duplications and losses. We illustrate lineage-specific duplications using detailed phylogenetic analysis of two gene families. This study serves as a reference point for insect/coleopteran genomics, although its original motivation was to find evidence for potential horizontal gene transfer (HGT) between O. borbonicus and P. pacificus. The latter was previously shown to be the recipient of multiple horizontally transferred genes including some genes from insect donors. However, our study failed to provide any clear evidence for additional HGTs between the two species. PMID:27289092

  16. Long genes and genes with multiple splice variants are enriched in pathways linked to cancer and other multigenic diseases.

    PubMed

    Sahakyan, Aleksandr B; Balasubramanian, Shankar

    2016-03-12

    The role of random mutations and genetic errors in defining the etiology of cancer and other multigenic diseases has recently received much attention. With the view that complex genes should be particularly vulnerable to such events, here we explore the link between the simple properties of the human genes, such as transcript length, number of splice variants, exon/intron composition, and their involvement in the pathways linked to cancer and other multigenic diseases. We reveal a substantial enrichment of cancer pathways with long genes and genes that have multiple splice variants. Although the latter two factors are interdependent, we show that the overall gene length and splicing complexity increase in cancer pathways in a partially decoupled manner. Our systematic survey for the pathways enriched with top lengthy genes and with genes that have multiple splice variants reveals, along with cancer pathways, the pathways involved in various neuronal processes, cardiomyopathies and type II diabetes. We outline a correlation between the gene length and the number of somatic mutations. Our work is a step forward in the assessment of the role of simple gene characteristics in cancer and a wider range of multigenic diseases. We demonstrate a significant accumulation of long genes and genes with multiple splice variants in pathways of multigenic diseases that have already been associated with de novo mutations. Unlike the cancer pathways, we note that the pathways of neuronal processes, cardiomyopathies and type II diabetes contain genes long enough for topoisomerase-dependent gene expression to also be a potential contributing factor in the emergence of pathologies, should topoisomerases become impaired.

  17. Genetic variation in the oxytocin receptor (OXTR) gene is associated with Asperger Syndrome.

    PubMed

    Di Napoli, Agnese; Warrier, Varun; Baron-Cohen, Simon; Chakrabarti, Bhismadev

    2014-01-01

    Autism Spectrum Conditions (ASC) are a group of neurodevelopmental conditions characterized by impairments in communication and social interaction, alongside unusually repetitive behaviors and narrow interests. ASC are highly heritable and have complex patterns of inheritance where multiple genes are involved, alongside environmental and epigenetic factors. Asperger Syndrome (AS) is a subgroup of these conditions, where there is no history of language or cognitive delay. Animal models suggest a role for oxytocin (OXT) and oxytocin receptor (OXTR) genes in social-emotional behaviors, and several studies indicate that the oxytocin/oxytocin receptor system is altered in individuals with ASC. Previous studies have reported associations between genetic variations in the OXTR gene and ASC. The present study tested for an association between nine single nucleotide polymorphisms (SNPs) in the OXTR gene and AS in 530 individuals of Caucasian origin, using SNP association test and haplotype analysis. There was a significant association between rs2268493 in OXTR and AS. Multiple haplotypes that include this SNP (rs2268493-rs2254298, rs2268490-rs2268493-rs2254298, rs2268493-rs2254298-rs53576, rs237885-rs2268490-rs2268493-rs2254298, rs2268490-rs2268493-rs2254298-rs53576) were also associated with AS. rs2268493 has been previously associated with ASC and putatively alters several transcription factor-binding sites and regulates chromatin states, either directly or through other variants in linkage disequilibrium (LD). This study reports a significant association of the sequence variant rs2268493 in the OXTR gene and associated haplotypes with AS.

  18. High-Throughput Analysis of Promoter Occupancy Reveals New Targets for Arx, a Gene Mutated in Mental Retardation and Interneuronopathies

    PubMed Central

    Quillé, Marie-Lise; Hirchaud, Edouard; Baron, Daniel; Benech, Caroline; Guihot, Jeanne; Placet, Morgane; Mignen, Olivier; Férec, Claude; Houlgatte, Rémi; Friocourt, Gaëlle

    2011-01-01

    Genetic investigations of X-linked intellectual disabilities have implicated the ARX (Aristaless-related homeobox) gene in a wide spectrum of disorders extending from phenotypes characterised by severe neuronal migration defects such as lissencephaly, to mild or moderate forms of mental retardation without apparent brain abnormalities but with associated features of dystonia and epilepsy. Analysis of Arx spatio-temporal localisation profile in mouse revealed expression in telencephalic structures, mainly restricted to populations of GABAergic neurons at all stages of development. Furthermore, studies of the effects of ARX loss of function in humans and animal models revealed varying defects, suggesting multiple roles of this gene during brain development. However, to date, little is known about how ARX functions as a transcription factor and the nature of its targets. To better understand its role, we combined chromatin immunoprecipitation and mRNA expression with microarray analysis and identified a total of 1006 gene promoters bound by Arx in transfected neuroblastoma (N2a) cells and in mouse embryonic brain. Approximately 24% of Arx-bound genes were found to show expression changes following Arx overexpression or knock-down. Several of the Arx target genes we identified are known to be important for a variety of functions in brain development and some of them suggest new functions for Arx. Overall, these results identified multiple new candidate targets for Arx and should help to better understand the pathophysiological mechanisms of intellectual disability and epilepsy associated with ARX mutations. PMID:21966449

  19. Recent Advances in Utilizing Transcription Factors to Improve Plant Abiotic Stress Tolerance by Transgenic Technology

    PubMed Central

    Wang, Hongyan; Wang, Honglei; Shao, Hongbo; Tang, Xiaoli

    2016-01-01

    Agricultural production and quality are adversely affected by various abiotic stresses worldwide and this will be exacerbated by the deterioration of global climate. To feed a growing world population, it is urgent to breed stress-tolerant crops with higher yields and improved qualities against multiple environmental stresses. Since conventional breeding approaches have had marginal success due to the complexity of stress tolerance traits, the transgenic approach is now widely used to breed stress-tolerant crops. Identifying and characterizing the critical genes involved in plant stress responses is therefore an essential prerequisite for engineering stress-tolerant crops. Far beyond the manipulation of a single functional gene, engineering certain regulatory genes has now emerged as an effective strategy for controlling the expression of many stress-responsive genes. Transcription factors (TFs) are good candidates for genetic engineering to breed stress-tolerant crops because of their role as master regulators of many stress-responsive genes. Many TFs belonging to the families AP2/EREBP, MYB, WRKY, NAC and bZIP have been found to be involved in various abiotic stresses, and some TF genes have also been engineered to improve stress tolerance in model and crop plants. In this review, we take five large families of TFs as examples and review the recent progress of TFs involved in plant abiotic stress responses and their potential utilization to improve multiple stress tolerance of crops under field conditions. PMID:26904044

  20. Identification of SNPs associated with variola virus virulence.

    PubMed

    Hoen, Anne Gatewood; Gardner, Shea N; Moore, Jason H

    2013-02-14

    Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity.
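The core pooling step of multifactor dimensionality reduction (MDR) for a two-way interaction can be sketched as below. This is a simplified illustration rather than the analysis performed in the study: it assumes a binary outcome (for example, outbreaks dichotomized into high and low CFR, which the abstract does not state), uses invented allele data, and omits the cross-validation and exhaustive search over SNP combinations that a full MDR analysis would include. The function name is hypothetical.

```python
from collections import defaultdict


def mdr_two_locus(alleles_a, alleles_b, is_case):
    """Pool two-locus genotype cells into high/low risk groups.

    Returns (high_risk_cells, training_accuracy). A cell is labelled high risk
    when its case:control ratio is at least the overall case:control ratio.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls

    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for a, b, case in zip(alleles_a, alleles_b, is_case):
        counts[(a, b)][0 if case else 1] += 1

    high_risk = {cell for cell, (ca, co) in counts.items() if ca >= threshold * co}

    correct = sum(
        ((a, b) in high_risk) == bool(case)
        for a, b, case in zip(alleles_a, alleles_b, is_case)
    )
    return high_risk, correct / len(is_case)


# Hypothetical data: outcome depends on the combination of two SNPs (XOR pattern),
# so neither SNP is informative on its own.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
high_cfr = [0, 1, 1, 0, 0, 1, 1, 0]

cells, accuracy = mdr_two_locus(snp1, snp2, high_cfr)
print(sorted(cells), accuracy)
```

Reducing the two-locus table to a single high/low risk variable is what lets MDR pick up interactions that single-SNP models miss, as in the toy data above.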

  1. Identification of SNPs associated with variola virus virulence

    PubMed Central

    2013-01-01

    Background Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. Conclusions We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity. PMID:23410064

  2. Module-based construction of plasmids for chromosomal integration of the fission yeast Schizosaccharomyces pombe

    PubMed Central

    Kakui, Yasutaka; Sunaga, Tomonari; Arai, Kunio; Dodgson, James; Ji, Liang; Csikász-Nagy, Attila; Carazo-Salas, Rafael; Sato, Masamitsu

    2015-01-01

    Integration of an external gene into a fission yeast chromosome is useful to investigate the effect of the gene product. An easy way to knock in a gene construct is to use an integration plasmid, which can be targeted and inserted into a chromosome through homologous recombination. Despite the advantage of integration, construction of integration plasmids is energy- and time-consuming, because there is no systematic library of integration plasmids with various promoters, fluorescent protein tags, terminators and selection markers; therefore, researchers are often forced to make appropriate ones through multiple rounds of cloning procedures. Here, we establish materials and methods to easily construct integration plasmids. We introduce a convenient cloning system based on Golden Gate DNA shuffling, which enables the connection of multiple DNA fragments at once: any kind of promoters and terminators, the gene of interest, in combination with any fluorescent protein tag genes and any selection markers. Each of those DNA fragments, called a ‘module’, can be tandemly ligated in the order we desire in a single reaction, which yields a circular plasmid in a one-step manner. The resulting plasmids can be integrated through standard methods for transformation. Thus, these materials and methods facilitate the construction of knock-in strains, and this will further increase the value of fission yeast as a model organism. PMID:26108218
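The idea that modules ligate "in the order we desire in a single reaction" rests on each module carrying distinct overhangs that match only its intended neighbours. The sketch below models that ordering logic in the abstract's terms. The module names and 4-nt overhang sequences are hypothetical and are not the ones used by the authors; overhangs are written as the top-strand junction sequence, so two ends are compatible when the strings are identical.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    left: str   # overhang at the upstream end (hypothetical sequence)
    right: str  # overhang at the downstream end (hypothetical sequence)


def assembly_order(modules, start):
    """Chain modules by matching each right overhang to the next left overhang.

    Raises ValueError if the overhangs do not define a single closed circle.
    """
    by_left = {m.left: m for m in modules}
    if len(by_left) != len(modules):
        raise ValueError("left overhangs must be unique")
    if start not in by_left:
        raise ValueError("no module begins with the starting overhang")

    order = [by_left[start]]
    while len(order) < len(modules):
        nxt = by_left.get(order[-1].right)
        if nxt is None or nxt in order:
            raise ValueError("overhangs do not chain through every module")
        order.append(nxt)

    if order[-1].right != start:
        raise ValueError("assembly does not close into a circular plasmid")
    return [m.name for m in order]


# Modules listed in arbitrary order, as they would be when mixed in one tube.
parts = [
    Module("terminator", "GCTT", "CGCT"),
    Module("promoter", "GGAG", "AATG"),
    Module("marker_backbone", "CGCT", "GGAG"),
    Module("gene_of_interest", "AATG", "TTCG"),
    Module("fluorescent_tag", "TTCG", "GCTT"),
]

print(assembly_order(parts, start="GGAG"))
```

Swapping one module for another with the same pair of overhangs (for example, a different promoter) leaves the order unchanged, which is what makes a library of interchangeable modules practical.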

  3. Stressing Escherichia coli to educate students about research: A CURE to investigate multiple levels of gene regulation.

    PubMed

    McDonough, Janet; Goudsouzian, Lara K; Papaj, Agllai; Maceli, Ashley R; Klepac-Ceraj, Vanja; Peterson, Celeste N

    2017-09-01

    Course-based undergraduate research experiences (CUREs) have been shown to increase student retention and learning in the biological sciences. Most CUREs cover only one aspect of gene regulation, such as transcriptional control. Here we present a new inquiry-based lab that engages understanding of gene expression from multiple perspectives. Students carry out a forward genetic screen to identify regulators of the stationary phase master regulator RpoS in the model organism Escherichia coli and then use a series of reporter fusions to determine if the regulation is at the level of transcription or the post-transcription level. This easy-to-implement course has been run both as a 9-week long project and a condensed 5-6 week version in three different schools and types of courses. A majority of the genes found in the screen are novel, thus giving students the opportunity to contribute original findings to the field. Assessments of this CURE show student gains in learning in many knowledge areas. In addition, attitudinal surveys suggest the students are enthusiastic about the screen and their learning about gene regulation. In summary, this lab would be an appropriate addition to an intermediate or advanced level Molecular Biology, Genetics, or Microbiology curriculum. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(5):449-458, 2017. © 2017 The International Union of Biochemistry and Molecular Biology.

  4. Lateral Gene Transfer in a Heavy Metal-Contaminated-Groundwater Microbial Community

    PubMed Central

    Hemme, Christopher L.; Green, Stefan J.; Rishishwar, Lavanya; Prakash, Om; Pettenato, Angelica; Chakraborty, Romy; Deutschbauer, Adam M.; Van Nostrand, Joy D.; Wu, Liyou; He, Zhili; Jordan, I. King; Arkin, Adam P.; Kostka, Joel E.

    2016-01-01

    ABSTRACT Unraveling the drivers controlling the response and adaptation of biological communities to environmental change, especially anthropogenic activities, is a central but poorly understood issue in ecology and evolution. Comparative genomics studies suggest that lateral gene transfer (LGT) is a major force driving microbial genome evolution, but its role in the evolution of microbial communities remains elusive. To delineate the importance of LGT in mediating the response of a groundwater microbial community to heavy metal contamination, representative Rhodanobacter reference genomes were sequenced and compared to shotgun metagenome sequences. 16S rRNA gene-based amplicon sequence analysis indicated that Rhodanobacter populations were highly abundant in contaminated wells with low pHs and high levels of nitrate and heavy metals but remained rare in the uncontaminated wells. Sequence comparisons revealed that multiple geochemically important genes, including genes encoding Fe2+/Pb2+ permeases, most denitrification enzymes, and cytochrome c553, were native to Rhodanobacter and not subjected to LGT. In contrast, the Rhodanobacter pangenome contained a recombinational hot spot in which numerous metal resistance genes were subjected to LGT and/or duplication. In particular, Co2+/Zn2+/Cd2+ efflux and mercuric resistance operon genes appeared to be highly mobile within Rhodanobacter populations. Evidence of multiple duplications of a mercuric resistance operon common to most Rhodanobacter strains was also observed. Collectively, our analyses indicated the importance of LGT during the evolution of groundwater microbial communities in response to heavy metal contamination, and a conceptual model was developed to display such adaptive evolutionary processes for explaining the extreme dominance of Rhodanobacter populations in the contaminated groundwater microbiome. PMID:27048805

  5. A combined analysis of genome-wide expression profiling of bipolar disorder in human prefrontal cortex.

    PubMed

    Wang, Jinglu; Qu, Susu; Wang, Weixiao; Guo, Liyuan; Zhang, Kunlin; Chang, Suhua; Wang, Jing

    2016-11-01

    A number of gene expression profiling studies of bipolar disorder have been published. Besides the use of different array chips and tissues, the variety of data-processing procedures across cohorts has aggravated the inconsistency of results among these genome-wide gene expression profiling studies. By searching the gene expression databases, we obtained six data sets for prefrontal cortex (PFC) of bipolar disorder with raw data and combinable platforms. We used standardized pre-processing and quality control procedures to analyze each data set separately and then combined them into a large gene expression matrix with 101 bipolar disorder subjects and 106 controls. A standard linear mixed-effects model was used to calculate the differentially expressed genes (DEGs). Multiple levels of sensitivity analyses and cross validation with genetic data were conducted. Functional and network analyses were carried out on the basis of the DEGs. As a result, we identified 198 unique genes differentially expressed between bipolar disorder and control PFC. Among them, 115 DEGs were robust to at least three leave-one-out tests or different pre-processing methods; 51 DEGs were validated with genetic association signals. Pathway enrichment analysis showed these DEGs were related to regulation of neurological system, cell death and apoptosis, and several basic binding processes. Protein-protein interaction network analysis further identified one key hub gene. We have contributed the most comprehensive integrated analysis of bipolar disorder expression profiling studies in PFC to date. The DEGs, especially those with multiple validations, may denote a common signature of bipolar disorder and contribute to the pathogenesis of disease. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Shared molecular pathways and gene networks for cardiovascular disease and type 2 diabetes mellitus in women across diverse ethnicities.

    PubMed

    Chan, Kei Hang K; Huang, Yen-Tsung; Meng, Qingying; Wu, Chunyuan; Reiner, Alexander; Sobel, Eric M; Tinker, Lesley; Lusis, Aldons J; Yang, Xia; Liu, Simin

    2014-12-01

    Although cardiovascular disease (CVD) and type 2 diabetes mellitus (T2D) share many common risk factors, potential molecular mechanisms that may also be shared for these 2 disorders remain unknown. Using an integrative pathway and network analysis, we performed genome-wide association studies in 8155 blacks, 3494 Hispanic American, and 3697 Caucasian American women who participated in the national Women's Health Initiative single-nucleotide polymorphism (SNP) Health Association Resource and the Genomics and Randomized Trials Network. Eight top pathways and gene networks related to cardiomyopathy, calcium signaling, axon guidance, cell adhesion, and extracellular matrix seemed to be commonly shared between CVD and T2D across all 3 ethnic groups. We also identified ethnicity-specific pathways, such as cell cycle (specific for Hispanic American and Caucasian American) and tight junction (CVD and combined CVD and T2D in Hispanic American). In network analysis of gene-gene or protein-protein interactions, we identified key drivers that included COL1A1, COL3A1, and ELN in the shared pathways for both CVD and T2D. These key driver genes were cross-validated in multiple mouse models of diabetes mellitus and atherosclerosis. Our integrative analysis of American women of 3 ethnicities identified multiple shared biological pathways and key regulatory genes for the development of CVD and T2D. These prospective findings also support the notion that ethnicity-specific susceptibility genes and process are involved in the pathogenesis of CVD and T2D. © 2014 American Heart Association, Inc.

  7. Structural and Functional Analysis of the GRAS Gene Family in Grapevine Indicates a Role of GRAS Proteins in the Control of Development and Stress Responses

    PubMed Central

    Grimplet, Jérôme; Agudelo-Romero, Patricia; Teixeira, Rita T.; Martinez-Zapater, Jose M.; Fortes, Ana M.

    2016-01-01

    GRAS transcription factors are involved in many processes of plant growth and development (e.g., axillary shoot meristem formation, root radial patterning, nodule morphogenesis, arbuscular development) as well as in plant disease resistance and abiotic stress responses. However, little information is available concerning this gene family in grapevine (Vitis vinifera L.), an economically important woody crop. We performed a model curation of GRAS genes identified in the latest genome annotation leading to the identification of 52 genes. Gene models were improved and three new genes were identified that could be grapevine- or woody-plant specific. Phylogenetic analysis showed that GRAS genes could be classified into 13 groups that mapped on the 19 V. vinifera chromosomes. Five new subfamilies, previously not characterized in other species, were identified. Multiple sequence alignment showed typical GRAS domain in the proteins and new motifs were also described. As observed in other species, both segmental and tandem duplications contributed significantly to the expansion and evolution of the GRAS gene family in grapevine. Expression patterns across a variety of tissues and upon abiotic and biotic conditions revealed possible divergent functions of GRAS genes in grapevine development and stress responses. By comparing the information available for tomato and grapevine GRAS genes, we identified candidate genes that might constitute conserved transcriptional regulators of both climacteric and non-climacteric fruit ripening. Altogether this study provides valuable information and robust candidate genes for future functional analysis aiming at improving the quality of fleshy fruits. PMID:27065316

  8. Gene finding in metatranscriptomic sequences.

    PubMed

    Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

    2014-01-01

    Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.

  9. DNA repair variants and breast cancer risk.

    PubMed

    Grundy, Anne; Richardson, Harriet; Schuetz, Johanna M; Burstyn, Igor; Spinelli, John J; Brooks-Wilson, Angela; Aronson, Kristan J

    2016-05-01

    A functional DNA repair system has been identified as important in the prevention of tumour development. Previous studies have hypothesized that common polymorphisms in DNA repair genes could play a role in breast cancer risk and also identified the potential for interactions between these polymorphisms and established breast cancer risk factors such as physical activity. Associations with breast cancer risk for 99 single nucleotide polymorphisms (SNPs) from genes in ten DNA repair pathways were examined in a case-control study including both Europeans (644 cases, 809 controls) and East Asians (299 cases, 160 controls). Odds ratios in both additive and dominant genetic models were calculated separately for participants of European and East Asian ancestry using multivariate logistic regression. The impact of multiple comparisons was assessed by correcting for the false discovery rate within each DNA repair pathway. Interactions between several breast cancer risk factors and DNA repair SNPs were also evaluated. One SNP (rs3213282) in the gene XRCC1 was associated with an increased risk of breast cancer in the dominant model of inheritance following adjustment for the false discovery rate (P < 0.05), although no associations were observed for other DNA repair SNPs. Interactions of six SNPs in multiple DNA repair pathways with physical activity were evident prior to correction for FDR, following which there was support for only one of the interaction terms (P < 0.05). No consistent associations between variants in DNA repair genes and breast cancer risk or their modification by breast cancer risk factors were observed. © 2016 Wiley Periodicals, Inc.

  10. Nrf2 and Nrf2-Related Proteins in Development and Developmental Toxicity: Insights from studies in Zebrafish (Danio rerio)

    PubMed Central

    Hahn, Mark E.; Timme-Laragy, Alicia R.; Karchner, Sibel I.; Stegeman, John J.

    2015-01-01

    Oxidative stress is an important mechanism of chemical toxicity, contributing to developmental toxicity and teratogenesis as well as to cardiovascular and neurodegenerative diseases and diabetic embryopathy. Developing animals are especially sensitive to effects of chemicals that disrupt the balance of processes generating reactive species and oxidative stress, and those anti-oxidant defenses that protect against oxidative stress. The expression and inducibility of anti-oxidant defenses through activation of NFE2-related factor 2 (Nrf2) and related proteins is an essential process affecting the susceptibility to oxidants, but the complex interactions of Nrf2 in determining embryonic response to oxidants and oxidative stress are only beginning to be understood. The zebrafish (Danio rerio) is an established model in developmental biology and now also in developmental toxicology and redox signaling. Here we review the regulation of genes involved in protection against oxidative stress in developing vertebrates, with a focus on Nrf2 and related cap’n’collar (CNC)-basic-leucine zipper (bZIP) transcription factors. Vertebrate animals including zebrafish share Nfe2, Nrf1, Nrf2, and Nrf3 as well as a core set of genes that respond to oxidative stress, contributing to the value of zebrafish as a model system with which to investigate the mechanisms involved in regulation of redox signaling and the response to oxidative stress during embryolarval development. Moreover, studies in zebrafish have revealed nrf and keap1 gene duplications that provide an opportunity to dissect multiple functions of vertebrate NRF genes, including multiple sensing mechanisms involved in chemical-specific effects. PMID:26130508

  11. Effect of p27 gene combined with Pientzehuang ([characters: see text]) on tumor growth in osteosarcoma-bearing nude mice.

    PubMed

    Ren, Shou-song; Yuan, Fang; Liu, Ying-hong; Zhou, Le-tian; Li, Jun

    2015-11-01

    To observe the effect of p27 gene recombinant adenovirus combined with Chinese medicine Pientzehuang ([characters: see text]) on the growth of xenografted human osteosarcoma in nude mice. Tissue transplantation was used to construct the orthotopic model of human osteosarcoma Saos-2 cell in nude mice. Thirty tumor-bearing nude mice were randomly divided into 5 groups with 6 mice in each group: blank control group (model of osteosarcoma), empty vector group (recombinant adeno-associated virus-multiple cloning site), Pientzehuang group, p27 gene group and combined treatment group (p27 gene combined with Pientzehuang). The effect of combined treatment on human osteosarcoma was analyzed through the tumor formation, tumor volume and inhibition rate of tumor growth. The expression of p27 was measured by immunohistochemical staining and Western blot. The orthotopic model of osteosarcoma in nude mice was successfully constructed. The general appearance of tumor-bearing nude mice in Pientzehuang and p27 gene groups was markedly improved compared with the blank control group; and in the combined treatment group it was significantly improved compared with the Pientzehuang and p27 gene groups. The tumor growth in the Pientzehuang and p27 gene groups was significantly inhibited compared with the blank control group (P<0.05); while in the combined treatment group it was markedly inhibited compared with the Pientzehuang and p27 gene groups (P<0.05). The rates of tumor growth inhibition were 34.1%, 56.5% and 63.8% in the Pientzehuang, p27 gene and combined treatment groups, respectively. Meanwhile, the protein expression of p27 gene in the p27 gene group was significantly increased compared with the blank control group (P<0.05); and it was significantly increased in the combined treatment group compared with the p27 gene and Pientzehuang groups (P<0.05). p27 gene introduced by adenovirus combined with Pientzehuang can inhibit the growth of human osteosarcoma cell Saos-2 in nude mice.

  12. A Tol2 Gateway-Compatible Toolbox for the Study of the Nervous System and Neurodegenerative Disease.

    PubMed

    Don, Emily K; Formella, Isabel; Badrock, Andrew P; Hall, Thomas E; Morsch, Marco; Hortle, Elinor; Hogan, Alison; Chow, Sharron; Gwee, Serene S L; Stoddart, Jack J; Nicholson, Garth; Chung, Roger; Cole, Nicholas J

    2017-02-01

    Currently there is a lack in fundamental understanding of disease progression of most neurodegenerative diseases, and, therefore, treatments and preventative measures are limited. Consequently, there is a great need for adaptable, yet robust model systems to both investigate elementary disease mechanisms and discover effective therapeutics. We have generated a Tol2 Gateway-compatible toolbox to study neurodegenerative disorders in zebrafish, which includes promoters for astrocytes, microglia and motor neurons, multiple fluorophores, and compatibility for the introduction of genes of interest or disease-linked genes. This toolbox will advance the rapid and flexible generation of zebrafish models to discover the biology of the nervous system and the disease processes that lead to neurodegeneration.

  13. Modeling Human Cancers in Drosophila.

    PubMed

    Sonoshita, M; Cagan, R L

    2017-01-01

    Cancer is a complex disease that affects multiple organs. Whole-body animal models provide important insights into oncology that can lead to clinical impact. Here, we review novel concepts that Drosophila studies have established for cancer biology, drug discovery, and patient therapy. Genetic studies using Drosophila have explored the roles of oncogenes and tumor-suppressor genes that when dysregulated promote cancer formation, making Drosophila a useful model to study multiple aspects of transformation. Not limited to mechanism analyses, Drosophila has recently been showing its value in facilitating drug development. Flies offer rapid, efficient platforms by which novel classes of drugs can be identified as candidate anticancer leads. Further, we discuss the use of Drosophila as a platform to develop therapies for individual patients by modeling the tumor's genetic complexity. Drosophila provides both a classical and a novel tool to identify new therapeutics, complementing other more traditional cancer tools. © 2017 Elsevier Inc. All rights reserved.

  14. The future: genetics advances in MEN1 therapeutic approaches and management strategies.

    PubMed

    Agarwal, Sunita K

    2017-10-01

    The identification of the multiple endocrine neoplasia type 1 (MEN1) gene in 1997 has shown that germline heterozygous mutations in the MEN1 gene located on chromosome 11q13 predispose to the development of tumors in the MEN1 syndrome. Tumor development occurs upon loss of the remaining normal copy of the MEN1 gene in MEN1-target tissues. Therefore, MEN1 is a classic tumor suppressor gene in the context of MEN1. This tumor suppressor role of the protein encoded by the MEN1 gene, menin, holds true in mouse models with germline heterozygous Men1 loss, wherein MEN1-associated tumors develop in adult mice after spontaneous loss of the remaining non-targeted copy of the Men1 gene. The availability of genetic testing for mutations in the MEN1 gene has become an essential part of the diagnosis and management of MEN1. Genetic testing is also helping to exclude mutation-negative cases in MEN1 families from the burden of lifelong clinical screening. In the past 20 years, efforts of various groups world-wide have been directed at mutation analysis, molecular genetic studies, mouse models, gene expression studies, epigenetic regulation analysis, biochemical studies and anti-tumor effects of candidate therapies in mouse models. This review will focus on the findings and advances from these studies to identify MEN1 germline and somatic mutations, the genetics of MEN1-related states, several protein partners of menin, the three-dimensional structure of menin and menin-dependent target genes. The ongoing impact of all these studies on disease prediction, management and outcomes will continue in the years to come. © 2017 Society for Endocrinology.

  15. Coregulation of Terpenoid Pathway Genes and Prediction of Isoprene Production in Bacillus subtilis Using Transcriptomics.

    PubMed

    Hess, Becky M; Xue, Junfeng; Markillie, Lye Meng; Taylor, Ronald C; Wiley, H Steven; Ahring, Birgitte K; Linggi, Bryan

    2013-01-01

    The isoprenoid pathway converts pyruvate to isoprene and related isoprenoid compounds in plants and some bacteria. Currently, this pathway is of great interest because of the critical role that isoprenoids play in basic cellular processes, as well as the industrial value of metabolites such as isoprene. Although the regulation of several pathway genes has been described, there is a paucity of information regarding system level regulation and control of the pathway. To address these limitations, we examined Bacillus subtilis grown under multiple conditions and determined the relationship between altered isoprene production and gene expression patterns. We found that with respect to the amount of isoprene produced, terpenoid genes fall into two distinct subsets with opposing correlations. The group whose expression levels positively correlated with isoprene production included dxs, which is responsible for the commitment step in the pathway, ispD, and two genes that participate in the mevalonate pathway, yhfS and pksG. The subset of terpenoid genes that inversely correlated with isoprene production included ispH, ispF, hepS, uppS, ispE, and dxr. A genome-wide partial least squares regression model was created to identify other genes or pathways that contribute to isoprene production. These analyses showed that a subset of 213 regulated genes was sufficient to create a predictive model of isoprene production under different conditions and showed correlations at the transcriptional level. We conclude that gene expression levels alone are sufficiently informative about the metabolic state of a cell that produces increased isoprene and can be used to build a model that accurately predicts production of this secondary metabolite across many simulated environmental conditions.

  16. Coregulation of Terpenoid Pathway Genes and Prediction of Isoprene Production in Bacillus subtilis Using Transcriptomics

    PubMed Central

    Hess, Becky M.; Xue, Junfeng; Markillie, Lye Meng; Taylor, Ronald C.; Wiley, H. Steven; Ahring, Birgitte K.; Linggi, Bryan

    2013-01-01

    The isoprenoid pathway converts pyruvate to isoprene and related isoprenoid compounds in plants and some bacteria. Currently, this pathway is of great interest because of the critical role that isoprenoids play in basic cellular processes, as well as the industrial value of metabolites such as isoprene. Although the regulation of several pathway genes has been described, there is a paucity of information regarding system level regulation and control of the pathway. To address these limitations, we examined Bacillus subtilis grown under multiple conditions and determined the relationship between altered isoprene production and gene expression patterns. We found that with respect to the amount of isoprene produced, terpenoid genes fall into two distinct subsets with opposing correlations. The group whose expression levels positively correlated with isoprene production included dxs, which is responsible for the commitment step in the pathway, ispD, and two genes that participate in the mevalonate pathway, yhfS and pksG. The subset of terpenoid genes that inversely correlated with isoprene production included ispH, ispF, hepS, uppS, ispE, and dxr. A genome-wide partial least squares regression model was created to identify other genes or pathways that contribute to isoprene production. These analyses showed that a subset of 213 regulated genes was sufficient to create a predictive model of isoprene production under different conditions and showed correlations at the transcriptional level. We conclude that gene expression levels alone are sufficiently informative about the metabolic state of a cell that produces increased isoprene and can be used to build a model that accurately predicts production of this secondary metabolite across many simulated environmental conditions. PMID:23840410

  17. BubbleGUM: automatic extraction of phenotype molecular signatures and comprehensive visualization of multiple Gene Set Enrichment Analyses.

    PubMed

    Spinelli, Lionel; Carpentier, Sabrina; Montañana Sanchis, Frédéric; Dalod, Marc; Vu Manh, Thien-Phong

    2015-10-19

    Recent advances in the analysis of high-throughput expression data have led to the development of tools that scaled-up their focus from single-gene to gene set level. For example, the popular Gene Set Enrichment Analysis (GSEA) algorithm can detect moderate but coordinated expression changes of groups of presumably related genes between pairs of experimental conditions. This considerably improves extraction of information from high-throughput gene expression data. However, although many gene sets covering a large panel of biological fields are available in public databases, the ability to generate home-made gene sets relevant to one's biological question is crucial but remains a substantial challenge to most biologists lacking statistic or bioinformatic expertise. This is all the more the case when attempting to define a gene set specific of one condition compared to many other ones. Thus, there is a crucial need for an easy-to-use software for generation of relevant home-made gene sets from complex datasets, their use in GSEA, and the correction of the results when applied to multiple comparisons of many experimental conditions. We developed BubbleGUM (GSEA Unlimited Map), a tool that allows to automatically extract molecular signatures from transcriptomic data and perform exhaustive GSEA with multiple testing correction. One original feature of BubbleGUM notably resides in its capacity to integrate and compare numerous GSEA results into an easy-to-grasp graphical representation. We applied our method to generate transcriptomic fingerprints for murine cell types and to assess their enrichments in human cell types. This analysis allowed us to confirm homologies between mouse and human immunocytes. BubbleGUM is an open-source software that allows to automatically generate molecular signatures out of complex expression datasets and to assess directly their enrichment by GSEA on independent datasets. 
    Enrichments are displayed in a graphical output that helps interpreting the results. This innovative methodology has recently been used to answer important questions in functional genomics, such as the degree of similarities between microarray datasets from different laboratories or with different experimental models or clinical cohorts. BubbleGUM is executable through an intuitive interface so that both bioinformaticians and biologists can use it. It is available at http://www.ciml.univ-mrs.fr/applications/BubbleGUM/index.html .

  18. Mouse Models of Genomic Syndromes as Tools for Understanding the Basis of Complex Traits: An Example with the Smith-Magenis and the Potocki-Lupski Syndromes

    PubMed Central

    Carmona-Mora, P; Molina, J; Encina, C.A; Walz, K

    2009-01-01

    Each human's genome is distinguished by extra and missing DNA that can be “benign” or powerfully impact everything from development to disease. In the case of genomic disorders DNA rearrangements, such as deletions or duplications, correlate with a clinical specific phenotype. The clinical presentations of genomic disorders were thought to result from altered gene copy number of physically linked dosage sensitive genes. Genomic disorders are frequent diseases (~1 per 1,000 births). Smith-Magenis syndrome (SMS) and Potocki-Lupski syndrome (PTLS) are genomic disorders, associated with a deletion and a duplication, of 3.7 Mb respectively, within chromosome 17 band p11.2. This region includes 23 genes. Both syndromes have complex and distinctive phenotypes including multiple congenital and neurobehavioral abnormalities. Human chromosome 17p11.2 is syntenic to the 32-34 cM region of murine chromosome 11. The number and order of the genes are highly conserved. In this review, we will exemplify how genomic disorders can be modeled in mice and the advantages that such models can give in the study of genomic disorders in particular and gene copy number variation (CNV) in general. The contributions of the SMS and PTLS animal models in several aspects ranging from more specific ones, as the definition of the clinical aspects of the human clinical spectrum, the identification of dosage sensitive genes related to the human syndromes, to the more general contributions as the definition of genetic locus impacting obesity and behavior and the elucidation of general mechanisms related to the pathogenesis of gene CNV are discussed. PMID:19949547

  19. Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.

    PubMed

    Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg

    2013-12-01

    Large data sets from gene expression array studies are publicly available offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies persists. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of a beta 1 adrenergic receptor and further indicate novel research targets. Although existing software allows for complex data analyses, the LabVIEW based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need of advanced programming knowledge.

  20. Evidence for major gene inheritance of Alzheimer disease in families of patients with and without Apolipoprotein E ε4

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, V.S.; Auerbach, S.A.; Farrer, L.A.

    1996-09-01

    Apolipoprotein E (APOE) genotype is the single most important determinant to the common form of Alzheimer disease (AD) yet identified. Several studies show that family history of AD is not entirely accounted for by APOE genotype. Also, there is evidence for an interaction between APOE genotype and gender. We carried out a complex segregation analysis in 636 nuclear families of consecutively ascertained and rigorously diagnosed probands in the Multi-Institutional Research in Alzheimer Genetic Epidemiology study in order to derive models of disease transmission which account for the influences of APOE genotype of the proband and gender. In the total group of families, models postulating sporadic occurrence, no major gene effect, random environmental transmission, and Mendelian inheritance were rejected. Transmission of AD in families of probands with at least one ε4 allele best fit a dominant model. Moreover, single gene inheritance best explained clustering of the disorder in families of probands lacking ε4, but a more complex genetic model or multiple genetic models may ultimately account for risk in this group of families. Our results also suggest that susceptibility to AD differs between men and women regardless of the proband's APOE status. Assuming a dominant model, AD appears to be completely penetrant in women, whereas only 62%-65% of men with predisposing genotypes develop AD. However, parameter estimates from the arbitrary major gene model suggest that AD is expressed dominantly in women and additively in men. These observations, taken together with epidemiologic data, are consistent with the hypothesis of an interaction between genes and other biological factors affecting disease susceptibility. 76 refs., 4 tabs.

  1. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  2. Selective sweep on human amylase genes postdates the split with Neanderthals

    PubMed Central

    Inchley, Charlotte E.; Larbey, Cynthia D. A.; Shwan, Nzar A. A.; Pagani, Luca; Saag, Lauri; Antão, Tiago; Jacobs, Guy; Hudjashov, Georgi; Metspalu, Ene; Mitt, Mario; Eichstaedt, Christina A.; Malyarchuk, Boris; Derenko, Miroslava; Wee, Joseph; Abdullah, Syafiq; Ricaut, François-Xavier; Mormina, Maru; Mägi, Reedik; Villems, Richard; Metspalu, Mait; Jones, Martin K.; Armour, John A. L.; Kivisild, Toomas

    2016-01-01

    Humans have more copies of amylase genes than other primates. It is still poorly understood, however, when the copy number expansion occurred and whether its spread was enhanced by selection. Here we assess amylase copy numbers in a global sample of 480 high coverage genomes and find that regions flanking the amylase locus show notable depression of genetic diversity both in African and non-African populations. Analysis of genetic variation in these regions supports the model of an early selective sweep in the human lineage after the split of humans from Neanderthals which led to the fixation of multiple copies of AMY1 in place of a single copy. We find evidence of multiple secondary losses of copy number with the highest frequency (52%) of a deletion of AMY2A and associated low copy number of AMY1 in Northeast Siberian populations whose diet has been low in starch content. PMID:27853181

  3. Selective sweep on human amylase genes postdates the split with Neanderthals.

    PubMed

    Inchley, Charlotte E; Larbey, Cynthia D A; Shwan, Nzar A A; Pagani, Luca; Saag, Lauri; Antão, Tiago; Jacobs, Guy; Hudjashov, Georgi; Metspalu, Ene; Mitt, Mario; Eichstaedt, Christina A; Malyarchuk, Boris; Derenko, Miroslava; Wee, Joseph; Abdullah, Syafiq; Ricaut, François-Xavier; Mormina, Maru; Mägi, Reedik; Villems, Richard; Metspalu, Mait; Jones, Martin K; Armour, John A L; Kivisild, Toomas

    2016-11-17

    Humans have more copies of amylase genes than other primates. It is still poorly understood, however, when the copy number expansion occurred and whether its spread was enhanced by selection. Here we assess amylase copy numbers in a global sample of 480 high coverage genomes and find that regions flanking the amylase locus show notable depression of genetic diversity both in African and non-African populations. Analysis of genetic variation in these regions supports the model of an early selective sweep in the human lineage after the split of humans from Neanderthals which led to the fixation of multiple copies of AMY1 in place of a single copy. We find evidence of multiple secondary losses of copy number with the highest frequency (52%) of a deletion of AMY2A and associated low copy number of AMY1 in Northeast Siberian populations whose diet has been low in starch content.

  4. Genome-wide network analysis of Wnt signaling in three pediatric cancers

    NASA Astrophysics Data System (ADS)

    Bao, Ju; Lee, Ho-Jin; Zheng, Jie J.

    2013-10-01

    Genomic structural alteration is common in pediatric cancers, and analysis of data generated by the Pediatric Cancer Genome Project reveals such tumor-related alterations in many Wnt signaling-associated genes. Most pediatric cancers are thought to arise within developing tissues that undergo substantial expansion during early organ formation, growth and maturation, and Wnt signaling plays an important role in this development. We examined three pediatric tumors--medulloblastoma, early T-cell precursor acute lymphoblastic leukemia, and retinoblastoma--that show multiple genomic structural variations within Wnt signaling pathways. We mathematically modeled this pathway to investigate the effects of cancer-related structural variations on Wnt signaling. Surprisingly, we found that an outcome measure of canonical Wnt signaling was consistently similar in matched cancer cells and normal cells, even in the context of different cancers, different mutations, and different Wnt-related genes. Our results suggest that the cancer cells maintain a normal level of Wnt signaling by developing multiple mutations.

  5. The catechol-O-methyltransferase gene (COMT) and cognitive function from childhood through adolescence

    PubMed Central

    Gaysina, Darya; Xu, Man K.; Barnett, Jennifer H.; Croudace, Tim J.; Wong, Andrew; Richards, Marcus; Jones, Peter B.

    2013-01-01

    Genetic variation in the catechol-O-methyltransferase gene (COMT) can influence cognitive function, and this effect may depend on developmental stage. Using a large representative British birth cohort, we investigated the effect of COMT on cognitive function (verbal and non-verbal) at ages 8 and 15 years, taking into account the possible modifying effect of pubertal stage. Five functional COMT polymorphisms, rs6269, rs4818, rs4680, rs737865 and rs165599, were analysed. Associations between COMT polymorphisms and cognition were tested using regression and latent variable structural equation modelling (SEM). Before correction for multiple testing, COMT rs737865 showed association with reading comprehension, verbal ability and global cognition at age 15 years in pubescent boys only. Although there was some evidence for age- and sex-specific effects of COMT rs737865, none remained significant after correction for multiple testing. Further studies are necessary in order to draw firmer conclusions. PMID:23178897

  6. Multiple Testing in the Context of Gene Discovery in Sickle Cell Disease Using Genome-Wide Association Studies.

    PubMed

    Kuo, Kevin H M

    2017-01-01

    The issue of multiple testing, also termed multiplicity, is ubiquitous in studies where multiple hypotheses are tested simultaneously. Genome-wide association study (GWAS), a type of genetic association study that has gained popularity in the past decade, is most susceptible to the issue of multiple testing. Different methodologies have been employed to address the issue of multiple testing in GWAS. The purpose of the review is to examine the methodologies employed in dealing with multiple testing in the context of gene discovery using GWAS in sickle cell disease complications.
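
    As an illustration of the kind of correction such reviews discuss, the following is a minimal sketch of two widely used procedures, Bonferroni and Benjamini-Hochberg. It is not taken from the review itself; the function names and the example p-values are assumptions made for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five markers (illustrative only).
p_values = [0.001, 0.012, 0.025, 0.038, 0.20]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

    With these made-up inputs the Bonferroni rule keeps one marker while the false discovery rate rule keeps four, which shows why the choice of procedure matters when many hypotheses are tested at once.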

  7. Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells

    PubMed Central

    Borel, Christelle; Mudge, Jonathan M.; Howald, Cédric; Foissac, Sylvain; Ucla, Catherine; Chrast, Jacqueline; Ribeca, Paolo; Martin, David; Murray, Ryan R.; Yang, Xinping; Ghamsari, Lila; Lin, Chenwei; Bell, Ian; Dumais, Erica; Drenkow, Jorg; Tress, Michael L.; Gelpí, Josep Lluís; Orozco, Modesto; Valencia, Alfonso; van Berkum, Nynke L.; Lajoie, Bryan R.; Vidal, Marc; Stamatoyannopoulos, John; Batut, Philippe; Dobin, Alex; Harrow, Jennifer; Hubbard, Tim; Dekker, Job; Frankish, Adam; Salehi-Ashtiani, Kourosh; Reymond, Alexandre; Antonarakis, Stylianos E.; Guigó, Roderic; Gingeras, Thomas R.

    2012-01-01

    The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network. PMID:22238572

  8. Gene: a gene-centered information resource at NCBI.

    PubMed

    Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D

    2015-01-01

    The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
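
    As a sketch of the programmatic access mentioned above, the snippet below builds an E-utilities `esearch` request URL against the `gene` database without sending it. The helper name `gene_esearch_url` and the example query are assumptions for illustration; the field tags in the query should be checked against NCBI's current documentation before use.

```python
from urllib.parse import urlencode

# Base address of NCBI's Entrez programming utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_esearch_url(term, retmax=20):
    """Build (but do not send) an esearch URL for the Gene database.

    `gene_esearch_url` is a hypothetical helper, not part of any NCBI library.
    """
    params = {
        "db": "gene",       # search the Gene database
        "term": term,       # Entrez query string
        "retmode": "json",  # ask for a JSON response
        "retmax": retmax,   # maximum number of Gene IDs to return
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query; the returned identifiers would be the integer Gene IDs.
url = gene_esearch_url("APOE[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

    Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.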

  9. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    PubMed

    Wei, Pi-Jing; Zhang, Di; Xia, Junfeng; Zheng, Chun-Hou

    2016-12-23

    Cancer is a complex disease which is characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data, can be measured from each individual. Correspondingly, one of the key challenges is to pinpoint functional driver mutations or pathways, which contribute to tumorigenesis, from millions of functionally neutral passenger mutations. In this paper, in order to identify driver genes effectively, we applied a generalized additive model to mutation profiles to filter genes with long length and constructed a new gene-gene interaction network. Then we integrated the mutation data and expression data into the gene-gene interaction network. Lastly, a greedy algorithm was used to prioritize candidate driver genes from the integrated data. We named the proposed method Length-Net-Driver (LNDriver). Experiments on three TCGA datasets, i.e., head and neck squamous cell carcinoma, kidney renal clear cell carcinoma and thyroid carcinoma, demonstrated that the proposed method was effective. Also, it can identify not only frequently mutated drivers, but also rare candidate driver genes.

  10. Transcription Profiles Reveal Sugar and Hormone Signaling Pathways Mediating Flower Induction in Apple (Malus domestica Borkh.).

    PubMed

    Xing, Li-Bo; Zhang, Dong; Li, You-Mei; Shen, Ya-Wen; Zhao, Cai-Ping; Ma, Juan-Juan; An, Na; Han, Ming-Yu

    2015-10-01

    Flower induction in apple (Malus domestica Borkh.) is regulated by complex gene networks that involve multiple signal pathways to ensure flower bud formation in the next year, but the molecular determinants of apple flower induction are still unknown. In this research, transcriptomic profiles from differentiating buds allowed us to identify genes potentially involved in signaling pathways that mediate the regulatory mechanisms of flower induction. A hypothetical model for this regulatory mechanism was obtained by analysis of the available transcriptomic data, suggesting that sugar-, hormone- and flowering-related genes, as well as those involved in cell-cycle induction, participated in the apple flower induction process. Sugar levels and metabolism-related gene expression profiles revealed that sucrose is the initiation signal in flower induction. Complex hormone regulatory networks involved in cytokinin (CK), abscisic acid (ABA) and gibberellic acid pathways also induce apple flower formation. CK plays a key role in the regulation of cell formation and differentiation, and in affecting flowering-related gene expression levels during these processes. Meanwhile, ABA levels and ABA-related gene expression levels gradually increased, as did those of sugar metabolism-related genes, in developing buds, indicating that ABA signals regulate apple flower induction by participating in the sugar-mediated flowering pathway. Furthermore, changes in sugar and starch deposition levels in buds can be affected by ABA content and the expression of the genes involved in the ABA signaling pathway. Thus, multiple pathways, which are mainly mediated by crosstalk between sugar and hormone signals, regulate the molecular network involved in bud growth and flower induction in apple trees. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  11. Nipbl and mediator cooperatively regulate gene expression to control limb development.

    PubMed

    Muto, Akihiko; Ikeda, Shingo; Lopez-Burks, Martha E; Kikuchi, Yutaka; Calof, Anne L; Lander, Arthur D; Schilling, Thomas F

    2014-09-01

    Haploinsufficiency for Nipbl, a cohesin loading protein, causes Cornelia de Lange Syndrome (CdLS), the most common "cohesinopathy". It has been proposed that the effects of Nipbl-haploinsufficiency result from disruption of long-range communication between DNA elements. Here we use zebrafish and mouse models of CdLS to examine how transcriptional changes caused by Nipbl deficiency give rise to limb defects, a common condition in individuals with CdLS. In the zebrafish pectoral fin (forelimb), knockdown of Nipbl expression led to size reductions and patterning defects that were preceded by dysregulated expression of key early limb development genes, including fgfs, shha, hand2 and multiple hox genes. In limb buds of Nipbl-haploinsufficient mice, transcriptome analysis revealed many similar gene expression changes, as well as altered expression of additional classes of genes that play roles in limb development. In both species, the pattern of dysregulation of hox-gene expression depended on genomic location within the Hox clusters. In view of studies suggesting that Nipbl colocalizes with the mediator complex, which facilitates enhancer-promoter communication, we also examined zebrafish deficient for the Med12 Mediator subunit, and found they resembled Nipbl-deficient fish in both morphology and gene expression. Moreover, combined partial reduction of both Nipbl and Med12 had a strongly synergistic effect, consistent with both molecules acting in a common pathway. In addition, three-dimensional fluorescent in situ hybridization revealed that Nipbl and Med12 are required to bring regions containing long-range enhancers into close proximity with the zebrafish hoxda cluster. These data demonstrate a crucial role for Nipbl in limb development, and support the view that its actions on multiple gene pathways result from its influence, together with Mediator, on regulation of long-range chromosomal interactions.

  12. Identification of type 2 diabetes-associated combination of SNPs using support vector machine.

    PubMed

    Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon

    2010-04-23

    Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.

  13. Identification of a reference gene for the quantification of mRNA and miRNA expression during skin wound healing.

    PubMed

    Etich, Julia; Bergmeier, Vera; Pitzler, Lena; Brachvogel, Bent

    2017-03-01

    Wound healing is a coordinated process to restore tissue homeostasis and reestablish the protective barrier of the skin. miRNAs may modulate the expression of target genes to contribute to repair processes, but due to the complexity of the tissue it is challenging to quantify gene expression during the distinct phases of wound repair. Here, we aimed to identify a common reference gene to quantify changes in miRNA and mRNA expression during skin wound healing. Quantitative real-time PCR and bioinformatic analysis tools were used to identify suitable reference genes during skin repair and their reliability was tested by studying the expression of mRNAs and miRNAs. Morphological assessment of wounds showed that the injury model recapitulates the distinct phases of skin repair. Non-degraded RNA could be isolated from skin and wounds and used to study the expression of non-coding small nuclear RNAs during wound healing. Among those, RNU6B was most constantly expressed during skin repair. Using this reference gene we could confirm the transient upregulation of IL-1β and PTPRC/CD45 during the early phase as well as the increased expression of collagen type I at later stages of repair and validate the differential expression of miR-204, miR-205, and miR-31 in skin wounds. In contrast to Gapdh the normalization to multiple reference genes gave a similar outcome. RNU6B is an accurate alternative normalizer to quantify mRNA and miRNA expression during the distinct phases of skin wound healing when analysis of multiple reference genes is not feasible.

  14. Genes-environment interactions in obesity- and diabetes-associated pancreatic cancer: a GWAS data analysis.

    PubMed

    Tang, Hongwei; Wei, Peng; Duell, Eric J; Risch, Harvey A; Olson, Sara H; Bueno-de-Mesquita, H Bas; Gallinger, Steven; Holly, Elizabeth A; Petersen, Gloria M; Bracci, Paige M; McWilliams, Robert R; Jenab, Mazda; Riboli, Elio; Tjønneland, Anne; Boutron-Ruault, Marie Christine; Kaaks, Rudolf; Trichopoulos, Dimitrios; Panico, Salvatore; Sund, Malin; Peeters, Petra H M; Khaw, Kay-Tee; Amos, Christopher I; Li, Donghui

    2014-01-01

    Obesity and diabetes are potentially alterable risk factors for pancreatic cancer. Genetic factors that modify the associations of obesity and diabetes with pancreatic cancer have previously not been examined at the genome-wide level. Using genome-wide association studies (GWAS) genotype and risk factor data from the Pancreatic Cancer Case Control Consortium, we conducted a discovery study of 2,028 cases and 2,109 controls to examine gene-obesity and gene-diabetes interactions in relation to pancreatic cancer risk by using the likelihood-ratio test nested in logistic regression models and Ingenuity Pathway Analysis (IPA). After adjusting for multiple comparisons, a significant interaction of the chemokine signaling pathway with obesity (P = 3.29 × 10(-6)) and a near significant interaction of calcium signaling pathway with diabetes (P = 1.57 × 10(-4)) in modifying the risk of pancreatic cancer were observed. These findings were supported by results from IPA analysis of the top genes with nominal interactions. The major contributing genes to the two top pathways include GNGT2, RELA, TIAM1, and GNAS. None of the individual genes or single-nucleotide polymorphism (SNP) except one SNP remained significant after adjusting for multiple testing. Notably, SNP rs10818684 of the PTGS1 gene showed an interaction with diabetes (P = 7.91 × 10(-7)) at a false discovery rate of 6%. Genetic variations in inflammatory response and insulin resistance may affect the risk of obesity- and diabetes-related pancreatic cancer. These observations should be replicated in additional large datasets. A gene-environment interaction analysis may provide new insights into the genetic susceptibility and molecular mechanisms of obesity- and diabetes-related pancreatic cancer.
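
    The likelihood-ratio test for nested models mentioned above can be sketched as follows. This is a generic illustration, not the authors' pipeline: it assumes the full model adds a single interaction term to the reduced model (one degree of freedom), and the log-likelihood values are made up.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns the test statistic and its chi-square (1 df) p-value.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods: model without and with a gene-by-exposure term.
stat, p_value = likelihood_ratio_test_1df(loglik_reduced=-1000.0, loglik_full=-998.0)
print(stat, p_value)
```

    In a genome-wide setting the resulting p-values would then still need adjustment for multiple comparisons, as the abstract notes.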

  15. Gene expression changes consistent with neuroAIDS and impaired working memory in HIV-1 transgenic rats

    PubMed Central

    2014-01-01

    Background: A thorough investigation of the neurobiology of HIV-induced neuronal dysfunction and its evolving phenotype in the setting of viral suppression has been limited by the lack of validated small animal models to probe the effects of concomitant low level expression of multiple HIV-1 products in disease-relevant cells in the CNS. Results: We report the results of gene expression profiling of the hippocampus of HIV-1 Tg rats, a rodent model of HIV infection in which multiple HIV-1 proteins are expressed under the control of the viral LTR promoter in disease-relevant cells including microglia and astrocytes. The Gene Set Enrichment Analysis (GSEA) algorithm was used for pathway analysis. Gene expression changes observed are consistent with astrogliosis and microgliosis and include evidence of inflammation and cell proliferation. Among the genes with increased expression in HIV-1 Tg rats was the interferon stimulated gene 15 (ISG-15), which was previously shown to be increased in the cerebrospinal fluid (CSF) of HIV patients and to correlate with neuropsychological impairment and neuropathology, and prostaglandin D2 (PGD2) synthase (Ptgds), which has been associated with immune activation and the induction of astrogliosis and microgliosis. GSEA-based pathway analysis highlighted a broad dysregulation of genes involved in neuronal trophism and neurodegenerative disorders. Among the latter are genesets associated with Huntington’s disease, Parkinson’s disease, mitochondrial, peroxisome function, and synaptic trophism and plasticity, such as IGF, ErbB and netrin signaling and the PI3K signal transduction pathway, a mediator of neural plasticity and of a vast array of trophic signals. Additionally, gene expression analyses also show altered lipid metabolism and peroxisomes dysfunction. Supporting the functional significance of these gene expression alterations, HIV-1 Tg rats showed working memory impairments in spontaneous alternation behavior in the T-Maze, a paradigm sensitive to prefrontal cortex and hippocampal function. Conclusions: Altogether, differentially regulated genes and pathway analysis identify specific pathways that can be targeted therapeutically to increase trophic support, e.g. IGF, ErbB and netrin signaling, and reduce neuroinflammation, e.g. PGD2 synthesis, which may be beneficial in the treatment of chronic forms of HIV-associated neurocognitive disorders in the setting of viral suppression. PMID:24980976

  16. Novel application of multi-stimuli network inference to synovial fibroblasts of rheumatoid arthritis patients

    PubMed Central

    2014-01-01

    Background: Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods: A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1β, TNF-α, TGF-β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected applying ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results: Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feed-back loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli). Conclusion: A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs. The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D. PMID:24989895

  17. Multiple Functional Domains of Enterococcus faecalis Aggregation Substance Asc10 Contribute to Endocarditis Virulence ▿ †

    PubMed Central

    Chuang, Olivia N.; Schlievert, Patrick M.; Wells, Carol L.; Manias, Dawn A.; Tripp, Timothy J.; Dunny, Gary M.

    2009-01-01

    Aggregation substance proteins encoded by sex pheromone plasmids increase the virulence of Enterococcus faecalis in experimental pathogenesis models, including infectious endocarditis models. These large surface proteins may contain multiple functional domains involved in various interactions with other bacterial cells and with the mammalian host. Aggregation substance Asc10, encoded by plasmid pCF10, is induced during growth in the mammalian bloodstream, and pCF10 carriage gives E. faecalis a significant selective advantage in this environment. We employed a rabbit model to investigate the role of various functional domains of Asc10 in endocarditis. The data suggested that the bacterial load of the infected tissue was the best indicator of virulence. Isogenic strains carrying either no plasmid, wild-type pCF10, a pCF10 derivative with an in-frame deletion of the prgB gene encoding Asc10, or pCF10 derivatives expressing other alleles of prgB were examined in this model. Previously identified aggregation domains contributed to the virulence associated with the wild-type protein, and a strain expressing an Asc10 derivative in which glycine residues in two RGD motifs were changed to alanine residues showed the greatest reduction in virulence. Remarkably, this strain and the strain carrying the pCF10 derivative with the in-frame deletion of prgB were both significantly less virulent than an isogenic plasmid-free strain. The data demonstrate that multiple functional domains are important in Asc10-mediated interactions with the host during the course of experimental endocarditis and that in the absence of a functional prgB gene, pCF10 carriage is actually disadvantageous in vivo. PMID:18955479

  18. DEIsoM: a hierarchical Bayesian model for identifying differentially expressed isoforms using biological replicates

    PubMed Central

    Peng, Hao; Yang, Yifan; Zhe, Shandian; Wang, Jian; Gribskov, Michael; Qi, Yuan

    2017-01-01

    Motivation: High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy. Results: We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples an efficient variational inference and a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature. Availability and implementation: The software is available at https://github.com/hao-peng/DEIsoM Contact: pengh@alumni.purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28595376
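
    The generative assumption described in this abstract, that each replicate's isoform proportions are a draw from a shared condition-level Dirichlet distribution, can be illustrated with a small simulation. This is only a sketch of that one assumption and is not the DEIsoM implementation; the concentration values, the number of isoforms, and the helper name are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical condition-level concentration parameters for three isoforms of one gene.
condition_alpha = [8.0, 3.0, 1.0]

# Each biological replicate gets its own isoform proportions drawn from the
# shared prior: a common pattern with individual differences.
replicates = [sample_dirichlet(condition_alpha, rng) for _ in range(4)]
for proportions in replicates:
    print([round(p, 3) for p in proportions])
```

    Larger concentration values would make the replicates resemble each other more closely; smaller values allow more replicate-to-replicate variation.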

  19. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    PubMed

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available by using Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and makes the ground for providing data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Adaptive Horizontal Gene Transfers between Multiple Cheese-Associated Fungi.

    PubMed

    Ropars, Jeanne; Rodríguez de la Vega, Ricardo C; López-Villavicencio, Manuela; Gouzy, Jérôme; Sallet, Erika; Dumas, Émilie; Lacoste, Sandrine; Debuchy, Robert; Dupont, Joëlle; Branca, Antoine; Giraud, Tatiana

    2015-10-05

    Domestication is an excellent model for studies of adaptation because it involves recent and strong selection on a few, identified traits [1-5]. Few studies have focused on the domestication of fungi, with notable exceptions [6-11], despite their importance to bioindustry [12] and to a general understanding of adaptation in eukaryotes [5]. Penicillium fungi are ubiquitous molds among which two distantly related species have been independently selected for cheese making: P. roqueforti for blue cheeses like Roquefort and P. camemberti for soft cheeses like Camembert. The selected traits include morphology, aromatic profile, lipolytic and proteolytic activities, and ability to grow at low temperatures, in a matrix containing bacterial and fungal competitors [13-15]. By comparing the genomes of ten Penicillium species, we show that adaptation to cheese was associated with multiple recent horizontal transfers of large genomic regions carrying crucial metabolic genes. We identified seven horizontally transferred regions (HTRs) spanning more than 10 kb each, flanked by specific transposable elements, and displaying nearly 100% identity between distant Penicillium species. Two HTRs carried genes with functions involved in the utilization of cheese nutrients or competition and were found nearly identical in multiple strains and species of cheese-associated Penicillium fungi, indicating recent selective sweeps; they were experimentally associated with faster growth and greater competitiveness on cheese and contained genes highly expressed in the early stage of cheese maturation. These findings have industrial and food safety implications and improve our understanding of the processes of adaptation to rapid environmental changes. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  1. Adaptive Horizontal Gene Transfers between Multiple Cheese-Associated Fungi

    PubMed Central

    Ropars, Jeanne; Rodríguez de la Vega, Ricardo C.; López-Villavicencio, Manuela; Gouzy, Jérôme; Sallet, Erika; Dumas, Émilie; Lacoste, Sandrine; Debuchy, Robert; Dupont, Joëlle; Branca, Antoine; Giraud, Tatiana

    2015-01-01

    Summary Domestication is an excellent model for studies of adaptation because it involves recent and strong selection on a few, identified traits [1–5]. Few studies have focused on the domestication of fungi, with notable exceptions [6–11], despite their importance to bioindustry [12] and to a general understanding of adaptation in eukaryotes [5]. Penicillium fungi are ubiquitous molds among which two distantly related species have been independently selected for cheese making—P. roqueforti for blue cheeses like Roquefort and P. camemberti for soft cheeses like Camembert. The selected traits include morphology, aromatic profile, lipolytic and proteolytic activities, and ability to grow at low temperatures, in a matrix containing bacterial and fungal competitors [13–15]. By comparing the genomes of ten Penicillium species, we show that adaptation to cheese was associated with multiple recent horizontal transfers of large genomic regions carrying crucial metabolic genes. We identified seven horizontally transferred regions (HTRs) spanning more than 10 kb each, flanked by specific transposable elements, and displaying nearly 100% identity between distant Penicillium species. Two HTRs carried genes with functions involved in the utilization of cheese nutrients or competition and were found nearly identical in multiple strains and species of cheese-associated Penicillium fungi, indicating recent selective sweeps; they were experimentally associated with faster growth and greater competitiveness on cheese and contained genes highly expressed in the early stage of cheese maturation. These findings have industrial and food safety implications and improve our understanding of the processes of adaptation to rapid environmental changes. PMID:26412136

  2. Interplay between cardiac transcription factors and non-coding RNAs in predisposing to atrial fibrillation.

    PubMed

    Mikhailov, Alexander T; Torrado, Mario

    2018-05-12

    There is growing evidence that putative gene regulatory networks including cardio-enriched transcription factors, such as PITX2, TBX5, ZFHX3, and SHOX2, and their effector/target genes along with downstream non-coding RNAs can play a potentially important role in the process of adaptive and maladaptive atrial rhythm remodeling. In turn, expression of atrial fibrillation-associated transcription factors is under the control of upstream regulatory non-coding RNAs. This review broadly explores gene regulatory mechanisms associated with susceptibility to atrial fibrillation, with key examples from both animal models and patients, within the context of both cardiac transcription factors and non-coding RNAs. These two systems appear to have multiple levels of cross-regulation and act coordinately to achieve effective control of atrial rhythm effector gene expression. Perturbations of a dynamic expression balance between transcription factors and corresponding non-coding RNAs can provoke the development or promote the progression of atrial fibrillation. We also outline deficiencies in current models and discuss ongoing studies to clarify remaining mechanistic questions. An understanding of the function of transcription factors and non-coding RNAs in gene regulatory networks associated with atrial fibrillation risk will enable the development of innovative therapeutic strategies.

  3. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine.

    PubMed

    Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida

    2016-03-15

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.

  4. Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening.

    PubMed

    Mazoure, Bogdan; Caraus, Iurie; Nadon, Robert; Makarenkov, Vladimir

    2018-06-01

    Data generated by high-throughput screening (HTS) technologies are prone to spatial bias. Traditionally, bias correction methods used in HTS assume either a simple additive or, more recently, a simple multiplicative spatial bias model. These models do not, however, always provide an accurate correction of measurements in wells located at the intersection of rows and columns affected by spatial bias. The measurements in these wells depend on the nature of interaction between the involved biases. Here, we propose two novel additive and two novel multiplicative spatial bias models accounting for different types of bias interactions. We describe a statistical procedure that allows for detecting and removing different types of additive and multiplicative spatial biases from multiwell plates. We show how this procedure can be applied by analyzing data generated by the four HTS technologies (homogeneous, microorganism, cell-based, and gene expression HTS), the three high-content screening (HCS) technologies (area, intensity, and cell-count HCS), and the only small-molecule microarray technology available in the ChemBank small-molecule screening database. The proposed methods are included in the AssayCorrector program, implemented in R, and available on CRAN.
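
    To make the distinction between the simple additive and simple multiplicative bias models concrete, here is a minimal sketch that removes row and column effects by median polish. It is not the procedure proposed in this abstract and does not use the AssayCorrector API. The function names are hypothetical, and the multiplicative variant assumes strictly positive measurements so that logarithms are defined.

```python
import math
from statistics import median


def remove_additive_bias(plate, n_iter=10):
    """Median-polish a plate (list of rows) under x_ij = mu + r_i + c_j + e_ij."""
    resid = [[float(v) for v in row] for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep out row medians, accumulating them as row effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Sweep out column medians, accumulating them as column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
    # Restore the overall plate level so corrected values stay on the original scale.
    level = median(row_eff) + median(col_eff)
    return [[v + level for v in row] for row in resid]


def remove_multiplicative_bias(plate, n_iter=10):
    """Same procedure on log-values, for x_ij = mu * r_i * c_j * e_ij (x > 0)."""
    logged = [[math.log(v) for v in row] for row in plate]
    polished = remove_additive_bias(logged, n_iter)
    return [[math.exp(v) for v in row] for row in polished]
```

    Under these simple models a well at the intersection of a biased row and a biased column carries the sum (or product) of both effects. The interaction-aware models proposed in the abstract address cases where that assumption does not hold.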

  5. Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules.

    PubMed

    Curtis, Ross E; Kim, Seyoung; Woolford, John L; Xu, Wenjie; Xing, Eric P

    2013-03-21

    Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant. While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, but in the case of the Ribi group we also provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso. Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
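
    The abstract above gives no formula. For orientation only, the graph-guided fused lasso objective is commonly written as below. All symbols are assumptions introduced here: X is the genotype matrix, y_k the expression of trait k, beta_jk the effect of marker j on trait k, E the edge set of the trait network, r_ml the correlation between traits m and l, f a monotone edge-weight function, and lambda and gamma the tuning parameters.

```latex
\hat{B} = \arg\min_{B}\;
  \sum_{k} (y_k - X\beta_k)^{\top} (y_k - X\beta_k)
  \;+\; \lambda \sum_{k} \sum_{j} \lvert \beta_{jk} \rvert
  \;+\; \gamma \sum_{(m,l) \in E} f(r_{ml}) \sum_{j}
        \bigl\lvert \beta_{jm} - \operatorname{sign}(r_{ml})\, \beta_{jl} \bigr\rvert
```

    The second term is the usual lasso sparsity penalty. The third term pulls the effects of a marker on two network-linked traits toward each other, which is how marginally weak but pleiotropic signals can gain support.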

  6. Proposed model for the high rate of rearrangement and rapid migration observed in some IncA/C plasmid lineages

    USDA-ARS's Scientific Manuscript database

    IncA/C plasmids are a class of plasmids from Enterobacteriaceae that are relatively large (49 to >180 kbp), are readily transferred by conjugation, and carry multiple antimicrobial resistance genes. Reconstruction of the phylogeny of these plasmids has been difficult because of the high rate of remo...

  7. Commentary: Gene by Environment Interplay and Psychopathology--In Search of a Paradigm

    ERIC Educational Resources Information Center

    Nigg, Joel T.

    2013-01-01

    The articles in this Special Issue (SI) extend research on G×E in multiple ways, showing the growing importance of specifying kinds of G×E models (e.g., bioecological, susceptibility, stress-diathesis), incorporation of sophisticated ways of measuring types of G×E correlations (rGE), checking effects of statistical artifact, exemplifying an…

  8. Transcription of the herpes simplex virus 1 genome during productive and quiescent infection of neuronal and nonneuronal cells.

    PubMed

    Harkness, Justine M; Kader, Muhamuda; DeLuca, Neal A

    2014-06-01

    Herpes simplex virus 1 (HSV-1) can undergo a productive infection in nonneuronal and neuronal cells such that the genes of the virus are transcribed in an ordered cascade. HSV-1 can also establish a more quiescent or latent infection in peripheral neurons, where gene expression is substantially reduced relative to that in productive infection. HSV mutants defective in multiple immediate early (IE) gene functions are highly defective for later gene expression and model some aspects of latency in vivo. We compared the expression of wild-type (wt) virus and IE gene mutants in nonneuronal cells (MRC5) and adult murine trigeminal ganglion (TG) neurons using the Illumina platform for cDNA sequencing (RNA-seq). RNA-seq analysis of wild-type virus revealed that expression of the genome mostly followed the previously established kinetics, validating the method, while highlighting variations in gene expression within individual kinetic classes. The accumulation of immediate early transcripts differed between MRC5 cells and neurons, with a greater abundance in neurons. Analysis of a mutant defective in all five IE genes (d109) showed dysregulated genome-wide low-level transcription that was more highly attenuated in MRC5 cells than in TG neurons. Furthermore, a subset of genes in d109 was more abundantly expressed over time in neurons. While the majority of the viral genome became relatively quiescent, the latency-associated transcript was specifically upregulated. Unexpectedly, other genes within repeat regions of the genome, as well as the unique genes just adjacent the repeat regions, also remained relatively active in neurons. The relative permissiveness of TG neurons to viral gene expression near the joint region is likely significant during the establishment and reactivation of latency. During productive infection, the genes of HSV-1 are transcribed in an ordered cascade. HSV can also establish a more quiescent or latent infection in peripheral neurons. HSV mutants defective in multiple immediate early (IE) genes establish a quiescent infection that models aspects of latency in vivo. We simultaneously quantified the expression of all the HSV genes in nonneuronal and neuronal cells by RNA-seq analysis. The results for productive infection shed further light on the nature of genes and promoters of different kinetic classes. In quiescent infection, there was greater transcription across the genome in neurons than in nonneuronal cells. In particular, the transcription of the latency-associated transcript (LAT), IE genes, and genes in the unique regions adjacent to the repeats persisted in neurons. The relative activity of this region of the genome in the absence of viral activators suggests a more dynamic state for quiescent genomes persisting in neurons. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  9. Transduction of a Foreign Histocompatibility Gene into the Arterial Wall Induces Vasculitis

    NASA Astrophysics Data System (ADS)

    Nabel, Elizabeth G.; Plautz, Gregory; Nabel, Gary J.

    1992-06-01

    Autoimmune vasculitis represents a disease characterized by focal inflammation within arteries at multiple sites in the vasculature. Therapeutic interventions in this disease are empirical and often unsuccessful, and the mechanisms of immune injury are not well-defined. The direct transfer of recombinant genes and their expression in the arterial wall provides an opportunity to explore the pathogenesis and treatment of vascular disease. In this report, an animal model for vasculitis has been developed. Inflammation has been elicited by direct gene transfer of a foreign class I major histocompatibility complex gene, HLA-B7, to specific sites in porcine arteries. Transfer and expression of this recombinant gene was confirmed by a polymerase chain reaction and immunohistochemistry, and cytolytic T cells specific for HLA-B7 were detected. These findings demonstrate that expression of a recombinant gene in the vessel wall can induce a focal immune response and suggest that vessel damage induced by cell-mediated immune injury can initiate vasculitis.

  10. The evolutionary landscape of intergenic trans-splicing events in insects

    PubMed Central

    Kong, Yimeng; Zhou, Hongxia; Yu, Yao; Chen, Longxian; Hao, Pei; Li, Xuan

    2015-01-01

    To explore the landscape of intergenic trans-splicing events and characterize their functions and evolutionary dynamics, we conduct a mega-data study of a phylogeny containing eight species across five orders of class Insecta, a model system spanning 400 million years of evolution. A total of 1,627 trans-splicing events involving 2,199 genes are identified, accounting for 1.58% of the total genes. Homology analysis reveals that mod(mdg4)-like trans-splicing is the only conserved event that is consistently observed in multiple species across two orders, which represents a unique case of functional diversification involving trans-splicing. Thus, evolutionarily its potential for generating proteins with novel function is not broadly utilized by insects. Furthermore, 146 non-mod trans-spliced transcripts are found to resemble canonical genes from different species. Trans-splicing preserving the function of ‘breakup' genes may serve as a general mechanism for relaxing the constraints on gene structure, with profound implications for the evolution of genes and genomes. PMID:26521696

  11. Spatially coordinated dynamic gene transcription in living pituitary tissue

    PubMed Central

    Featherstone, Karen; Hey, Kirsty; Momiji, Hiroshi; McNamara, Anne V; Patist, Amanda L; Woodburn, Joanna; Spiller, David G; Christian, Helen C; McNeilly, Alan S; Mullins, John J; Finkenstädt, Bärbel F; Rand, David A; White, Michael RH; Davis, Julian RE

    2016-01-01

    Transcription at individual genes in single cells is often pulsatile and stochastic. A key question emerges regarding how this behaviour contributes to tissue phenotype, but it has been a challenge to quantitatively analyse this in living cells over time, as opposed to studying snap-shots of gene expression state. We have used imaging of reporter gene expression to track transcription in living pituitary tissue. We integrated live-cell imaging data with statistical modelling for quantitative real-time estimation of the timing of switching between transcriptional states across a whole tissue. Multiple levels of transcription rate were identified, indicating that gene expression is not a simple binary ‘on-off’ process. Immature tissue displayed shorter durations of high-expressing states than the adult. In adult pituitary tissue, direct cell contacts involving gap junctions allowed local spatial coordination of prolactin gene expression. Our findings identify how heterogeneous transcriptional dynamics of single cells may contribute to overall tissue behaviour. DOI: http://dx.doi.org/10.7554/eLife.08494.001 PMID:26828110

  12. Outbred genome sequencing and CRISPR/Cas9 gene editing in butterflies

    PubMed Central

    Li, Xueyan; Fan, Dingding; Zhang, Wei; Liu, Guichun; Zhang, Lu; Zhao, Li; Fang, Xiaodong; Chen, Lei; Dong, Yang; Chen, Yuan; Ding, Yun; Zhao, Ruoping; Feng, Mingji; Zhu, Yabing; Feng, Yue; Jiang, Xuanting; Zhu, Deying; Xiang, Hui; Feng, Xikan; Li, Shuaicheng; Wang, Jun; Zhang, Guojie; Kronforst, Marcus R.; Wang, Wen

    2015-01-01

    Butterflies are exceptionally diverse but their potential as an experimental system has been limited by the difficulty of deciphering heterozygous genomes and a lack of genetic manipulation technology. Here we use a hybrid assembly approach to construct high-quality reference genomes for Papilio xuthus (contig and scaffold N50: 492 kb, 3.4 Mb) and Papilio machaon (contig and scaffold N50: 81 kb, 1.15 Mb), highly heterozygous species that differ in host plant affiliations, and adult and larval colour patterns. Integrating comparative genomics and analyses of gene expression yields multiple insights into butterfly evolution, including potential roles of specific genes in recent diversification. To functionally test gene function, we develop an efficient (up to 92.5%) CRISPR/Cas9 gene editing method that yields obvious phenotypes with three genes, Abdominal-B, ebony and frizzled. Our results provide valuable genomic and technological resources for butterflies and unlock their potential as a genetic model system. PMID:26354079

  13. JAK signaling globally counteracts heterochromatic gene silencing.

    PubMed

    Shi, Song; Calhoun, Healani C; Xia, Fan; Li, Jinghong; Le, Long; Li, Willis X

    2006-09-01

    The JAK/STAT pathway has pleiotropic roles in animal development, and its aberrant activation is implicated in multiple human cancers. JAK/STAT signaling effects have been attributed largely to direct transcriptional regulation by STAT of specific target genes that promote tumor cell proliferation or survival. We show here in a Drosophila melanogaster hematopoietic tumor model, however, that JAK overactivation globally disrupts heterochromatic gene silencing, an epigenetic tumor suppressive mechanism. This disruption allows derepression of genes that are not direct targets of STAT, as evidenced by suppression of heterochromatin-mediated position effect variegation. Moreover, mutations in the genes encoding heterochromatin components heterochromatin protein 1 (HP1) and Su(var)3-9 enhance tumorigenesis induced by an oncogenic JAK kinase without affecting JAK/STAT signaling. Consistently, JAK loss of function enhances heterochromatic gene silencing, whereas overexpressing HP1 suppresses oncogenic JAK-induced tumors. These results demonstrate that the JAK/STAT pathway regulates cellular epigenetic status and that globally disrupting heterochromatin-mediated tumor suppression is essential for tumorigenesis induced by JAK overactivation.

  14. JAK signaling globally counteracts heterochromatic gene silencing

    PubMed Central

    Shi, Song; Calhoun, Healani C; Xia, Fan; Li, Jinghong; Le, Long; Li, Willis X

    2011-01-01

    The JAK/STAT pathway has pleiotropic roles in animal development, and its aberrant activation is implicated in multiple human cancers [1–3]. JAK/STAT signaling effects have been attributed largely to direct transcriptional regulation by STAT of specific target genes that promote tumor cell proliferation or survival. We show here in a Drosophila melanogaster hematopoietic tumor model, however, that JAK overactivation globally disrupts heterochromatic gene silencing, an epigenetic tumor suppressive mechanism [4]. This disruption allows derepression of genes that are not direct targets of STAT, as evidenced by suppression of heterochromatin-mediated position effect variegation. Moreover, mutations in the genes encoding heterochromatin components heterochromatin protein 1 (HP1) and Su(var)3-9 enhance tumorigenesis induced by an oncogenic JAK kinase without affecting JAK/STAT signaling. Consistently, JAK loss of function enhances heterochromatic gene silencing, whereas overexpressing HP1 suppresses oncogenic JAK-induced tumors. These results demonstrate that the JAK/STAT pathway regulates cellular epigenetic status and that globally disrupting heterochromatin-mediated tumor suppression is essential for tumorigenesis induced by JAK overactivation. PMID:16892059

  15. Stable carbon isotope fractionation of chlorinated ethenes by a microbial consortium containing multiple dechlorinating genes.

    PubMed

    Liu, Na; Ding, Longzhen; Li, Haijun; Zhang, Pengpeng; Zheng, Jixing; Weng, Chih-Huang

    2018-08-01

    The study aimed to determine the possible contribution of specific growth conditions and community structures to variable carbon enrichment factor (ε-carbon) values for the degradation of chlorinated ethenes (CEs) by a bacterial consortium with multiple dechlorinating genes. ε-carbon values for trichloroethylene, cis-1,2-dichloroethylene, and vinyl chloride were -7.24% ± 0.59%, -14.6% ± 1.71%, and -21.1% ± 1.14%, respectively, during their degradation by a microbial consortium containing multiple dechlorinating genes including tceA and vcrA. The ε-carbon values of all CEs were not greatly affected by changes in growth conditions and community structures, which directly or indirectly affected reductive dechlorination of CEs by this consortium. Stability analysis provided evidence that the presence of multiple dechlorinating genes within a microbial consortium had little effect on carbon isotope fractionation, as long as the genes have definite, non-overlapping functions. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Aberrant gene promoter methylation associated with sporadic multiple colorectal cancer.

    PubMed

    Gonzalo, Victoria; Lozano, Juan José; Muñoz, Jenifer; Balaguer, Francesc; Pellisé, Maria; Rodríguez de Miguel, Cristina; Andreu, Montserrat; Jover, Rodrigo; Llor, Xavier; Giráldez, M Dolores; Ocaña, Teresa; Serradesanferm, Anna; Alonso-Espinaco, Virginia; Jimeno, Mireya; Cuatrecasas, Miriam; Sendino, Oriol; Castellví-Bel, Sergi; Castells, Antoni

    2010-01-19

    Colorectal cancer (CRC) multiplicity has been mainly related to polyposis and non-polyposis hereditary syndromes. In sporadic CRC, aberrant gene promoter methylation has been shown to play a key role in carcinogenesis, although little is known about its involvement in multiplicity. To assess the effect of methylation in tumor multiplicity in sporadic CRC, hypermethylation of key tumor suppressor genes was evaluated in patients with both multiple and solitary tumors, as a proof-of-concept of an underlying epigenetic defect. We examined a total of 47 synchronous/metachronous primary CRC from 41 patients, and 41 gender, age (5-year intervals) and tumor location-paired patients with solitary tumors. Exclusion criteria were polyposis syndromes, Lynch syndrome and inflammatory bowel disease. DNA methylation at the promoter region of the MGMT, CDKN2A, SFRP1, TMEFF2, HS3ST2 (3OST2), RASSF1A and GATA4 genes was evaluated by quantitative methylation specific PCR in both tumor and corresponding normal appearing colorectal mucosa samples. Overall, patients with multiple lesions exhibited a higher degree of methylation in tumor samples than those with solitary tumors regarding all evaluated genes. After adjusting for age and gender, binomial logistic regression analysis identified methylation of MGMT2 (OR, 1.48; 95% CI, 1.10 to 1.97; p = 0.008) and RASSF1A (OR, 2.04; 95% CI, 1.01 to 4.13; p = 0.047) as variables independently associated with tumor multiplicity; the risk associated with methylation of either of these two genes was 4.57 (95% CI, 1.53 to 13.61; p = 0.006). Moreover, in six patients in whom both tumors were available, we found a correlation in the methylation levels of MGMT2 (r = 0.64, p = 0.17), SFRP1 (r = 0.83, p = 0.06), HPP1 (r = 0.64, p = 0.17), 3OST2 (r = 0.83, p = 0.06) and GATA4 (r = 0.6, p = 0.24). Methylation in normal appearing colorectal mucosa from patients with multiple and solitary CRC showed no relevant difference in any evaluated gene. These results provide a proof-of-concept that gene promoter methylation is associated with tumor multiplicity. This underlying epigenetic defect may have noteworthy implications for prevention in patients with sporadic CRC.

  17. Evaluation of helper-dependent canine adenovirus vectors in a 3D human CNS model

    PubMed Central

    Simão, Daniel; Pinto, Catarina; Fernandes, Paulo; Peddie, Christopher J.; Piersanti, Stefania; Collinson, Lucy M.; Salinas, Sara; Saggio, Isabella; Schiavo, Giampietro; Kremer, Eric J.; Brito, Catarina; Alves, Paula M.

    2017-01-01

    Gene therapy is a promising approach with enormous potential for treatment of neurodegenerative disorders. Viral vectors derived from canine adenovirus type 2 (CAV-2) present attractive features for gene delivery strategies in the human brain, by preferentially transducing neurons, are capable of efficient axonal transport to afferent brain structures, have a 30-kb cloning capacity and have low innate and induced immunogenicity in pre-clinical tests. For clinical translation, in-depth pre-clinical evaluation of efficacy and safety in a human setting is primordial. Stem cell-derived human neural cells have a great potential as complementary tools by bridging the gap between animal models, which often diverge considerably from human phenotype, and clinical trials. Herein, we explore helper-dependent CAV-2 (hd-CAV-2) efficacy and safety for gene delivery in a human stem cell-derived 3D neural in vitro model. Assessment of hd-CAV-2 vector efficacy was performed at different multiplicities of infection, by evaluating transgene expression and impact on cell viability, ultrastructural cellular organization and neuronal gene expression. Under optimized conditions, hd-CAV-2 transduction led to stable long-term transgene expression with minimal toxicity. hd-CAV-2 preferentially transduced neurons, while human adenovirus type 5 (HAdV5) showed increased tropism towards glial cells. This work demonstrates, in a physiologically relevant 3D model, that hd-CAV-2 vectors are efficient tools for gene delivery to human neurons, with stable long-term transgene expression and minimal cytotoxicity. PMID:26181626

  18. Evaluation of helper-dependent canine adenovirus vectors in a 3D human CNS model.

    PubMed

    Simão, D; Pinto, C; Fernandes, P; Peddie, C J; Piersanti, S; Collinson, L M; Salinas, S; Saggio, I; Schiavo, G; Kremer, E J; Brito, C; Alves, P M

    2016-01-01

    Gene therapy is a promising approach with enormous potential for treatment of neurodegenerative disorders. Viral vectors derived from canine adenovirus type 2 (CAV-2) present attractive features for gene delivery strategies in the human brain, by preferentially transducing neurons, are capable of efficient axonal transport to afferent brain structures, have a 30-kb cloning capacity and have low innate and induced immunogenicity in preclinical tests. For clinical translation, in-depth preclinical evaluation of efficacy and safety in a human setting is primordial. Stem cell-derived human neural cells have a great potential as complementary tools by bridging the gap between animal models, which often diverge considerably from human phenotype, and clinical trials. Herein, we explore helper-dependent CAV-2 (hd-CAV-2) efficacy and safety for gene delivery in a human stem cell-derived 3D neural in vitro model. Assessment of hd-CAV-2 vector efficacy was performed at different multiplicities of infection, by evaluating transgene expression and impact on cell viability, ultrastructural cellular organization and neuronal gene expression. Under optimized conditions, hd-CAV-2 transduction led to stable long-term transgene expression with minimal toxicity. hd-CAV-2 preferentially transduced neurons, whereas human adenovirus type 5 (HAdV5) showed increased tropism toward glial cells. This work demonstrates, in a physiologically relevant 3D model, that hd-CAV-2 vectors are efficient tools for gene delivery to human neurons, with stable long-term transgene expression and minimal cytotoxicity.

  19. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

    2014-06-18

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.

  20. Single-Nucleotide Polymorphisms Associated with Skin Naphthyl–Keratin Adduct Levels in Workers Exposed to Naphthalene

    PubMed Central

    Jiang, Rong; French, John E.; Stober, Vandy P.; Kang-Sickel, Juei-Chuan C.; Zou, Fei

    2012-01-01

    Background: Individual genetic variation that results in differences in systemic response to xenobiotic exposure is not accounted for as a predictor of outcome in current exposure assessment models. Objective: We developed a strategy to investigate individual differences in single-nucleotide polymorphisms (SNPs) as genetic markers associated with naphthyl–keratin adduct (NKA) levels measured in the skin of workers exposed to naphthalene. Methods: The SNP-association analysis was conducted in PLINK using candidate-gene analysis and genome-wide analysis. We identified significant SNP–NKA associations and investigated the potential impact of these SNPs along with personal and workplace factors on NKA levels using a multiple linear regression model and the Pratt index. Results: In candidate-gene analysis, a SNP (rs4852279) located near the CYP26B1 gene contributed to the 2-naphthyl–keratin adduct (2NKA) level. In the multiple linear regression model, the SNP rs4852279, dermal exposure, exposure time, task replacing foam, age, and ethnicity all were significant predictors of 2NKA level. In genome-wide analysis, no single SNP reached genome-wide significance for NKA levels (all p ≥ 1.05 × 10⁻⁵). Pathway and network analyses of SNPs associated with NKA levels were predicted to be involved in the regulation of cellular processes and homeostasis. Conclusions: These results provide evidence that a quantitative biomarker can be used as an intermediate phenotype when investigating the association between genetic markers and exposure–dose relationship in a small, well-characterized exposed worker population. PMID:22391508

  1. Pan- and core- network analysis of co-expression genes in a model plant

    DOE PAGES

    He, Fei; Maslov, Sergei

    2016-12-16

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fraction of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.
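
    The pan-network and core-network idea in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' method: the abstract does not say how "a sizeable fraction" is defined, so the two thresholds (`pan_min_networks`, `core_min_networks`), the helper names, and the toy gene labels are all assumptions.

```python
from collections import Counter
from itertools import chain


def edge(gene_a, gene_b):
    """Undirected coexpression edge (hypothetical helper)."""
    return frozenset((gene_a, gene_b))


def pan_and_core(networks, pan_min_networks, core_min_networks):
    """Split edges by how many individual networks contain them.

    networks: list of sets of edges, one set per coexpression network.
    The two thresholds are assumed parameters; the abstract gives no values.
    """
    # Each network is a set, so every edge is counted once per network.
    support = Counter(chain.from_iterable(networks))
    pan = {e for e, n in support.items() if n >= pan_min_networks}
    core = {e for e, n in support.items() if n >= core_min_networks}
    return pan, core


# Toy data: three networks over made-up gene labels.
networks = [
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C")},
]

# Union-like pan-network (edge seen at least once) and
# intersection-like core-network (edge seen in all three).
pan, core = pan_and_core(networks, pan_min_networks=1, core_min_networks=3)
```

    With real data, the thresholds would be chosen relative to the number of networks in the collection rather than fixed at the extremes used in this toy case.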

  2. Pan- and core- network analysis of co-expression genes in a model plant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Fei; Maslov, Sergei

    Genome-wide gene expression experiments have been performed using the model plant Arabidopsis during the last decade. Some studies involved construction of coexpression networks, a popular technique used to identify groups of co-regulated genes, to infer unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks based on expression datasets reported in individual publications. To this end we reanalyzed public expression data. To describe this collection of networks we introduced concepts of ‘pan-network’ and ‘core-network’ representing union and intersection between a sizeable fraction of individual networks, respectively. Here, we showed that these two types of networks are different both in terms of their topology and biological function of interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and photosynthetic machinery. Our analysis is aimed to help the plant research community to better explore the information contained within the existing vast collection of gene expression data in Arabidopsis.

  3. Post-transcriptional regulation of Pabpn1 by the RNA binding protein HuR.

    PubMed

    Phillips, Brittany L; Banerjee, Ayan; Sanchez, Brenda J; Di Marco, Sergio; Gallouzi, Imed-Eddine; Pavlath, Grace K; Corbett, Anita H

    2018-06-25

    RNA processing is critical for proper spatial and temporal control of gene expression. The ubiquitous nuclear polyadenosine RNA binding protein, PABPN1, post-transcriptionally regulates multiple steps of gene expression. Mutations in the PABPN1 gene expanding an N-terminal alanine tract in the PABPN1 protein from 10 alanines to 11-18 alanines cause the muscle-specific disease oculopharyngeal muscular dystrophy (OPMD), which affects eyelid, pharynx, and proximal limb muscles. Previous work revealed that the Pabpn1 transcript is unstable, contributing to low steady-state Pabpn1 mRNA and protein levels in vivo, specifically in skeletal muscle, with even lower levels in muscles affected in OPMD. Thus, low levels of PABPN1 protein could predispose specific tissues to pathology in OPMD. However, no studies have defined the mechanisms that regulate Pabpn1 expression. Here, we define multiple cis-regulatory elements and a trans-acting factor, HuR, which regulate Pabpn1 expression specifically in mature muscle in vitro and in vivo. We exploit multiple models including C2C12 myotubes, primary muscle cells, and mice to determine that HuR decreases Pabpn1 expression. Overall, we have uncovered a mechanism in mature muscle that negatively regulates Pabpn1 expression in vitro and in vivo, which could provide insight to future studies investigating therapeutic strategies for OPMD treatment.

  4. Recent progress in the genetics of spontaneously hypertensive rats.

    PubMed

    Pravenec, M; Křen, V; Landa, V; Mlejnek, P; Musilová, A; Šilhavý, J; Šimáková, M; Zídek, V

    2014-01-01

    The spontaneously hypertensive rat (SHR) is the most widely used animal model of essential hypertension and accompanying metabolic disturbances. Recent advances in sequencing of genomes of BN-Lx and SHR progenitors of the BXH/HXB recombinant inbred (RI) strains as well as accumulation of multiple data sets of intermediary phenotypes in the RI strains, including mRNA and microRNA abundance, quantitative metabolomics, proteomics, methylomics or histone modifications, will make it possible to systematically search for genetic variants involved in regulation of gene expression and in the etiology of complex pathophysiological traits. New advances in manipulation of the rat genome, including efficient transgenesis and gene targeting, will enable in vivo functional analyses of selected candidate genes to identify QTL at the molecular level or to provide insight into mechanisms whereby targeted genes affect pathophysiological traits in the SHR.

  5. Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.

  6. Genome-wide histone acetylation is altered in a transgenic mouse model of Huntington's disease.

    PubMed

    McFarland, Karen N; Das, Sudeshna; Sun, Ting Ting; Leyfer, Dmitri; Xia, Eva; Sangrey, Gavin R; Kuhn, Alexandre; Luthi-Carter, Ruth; Clark, Timothy W; Sadri-Vakili, Ghazaleh; Cha, Jang-Ho J

    2012-01-01

    In Huntington's disease (HD; MIM ID #143100), a fatal neurodegenerative disorder, transcriptional dysregulation is a key pathogenic feature. Histone modifications are altered in multiple cellular and animal models of HD suggesting a potential mechanism for the observed changes in transcriptional levels. In particular, previous work has suggested an important link between decreased histone acetylation, particularly acetylated histone H3 (AcH3; H3K9K14ac), and downregulated gene expression. However, the question remains whether changes in histone modifications correlate with transcriptional abnormalities across the entire transcriptome. Using chromatin immunoprecipitation paired with microarray hybridization (ChIP-chip), we interrogated AcH3-gene interactions genome-wide in striata of 12-week old wild-type (WT) and transgenic (TG) R6/2 mice, an HD mouse model, and correlated these interactions with gene expression levels. At the level of the individual gene, we found decreases in the number of sites occupied by AcH3 in the TG striatum. In addition, the total number of genes bound by AcH3 was decreased. Surprisingly, the loss of AcH3 binding sites occurred within the coding regions of the genes rather than at the promoter region. We also found that the presence of AcH3 at any location within a gene strongly correlated with the presence of its transcript in both WT and TG striatum. In the TG striatum, treatment with histone deacetylase (HDAC) inhibitors increased global AcH3 levels with concomitant increases in transcript levels; however, AcH3 binding at select gene loci increased only slightly. This study demonstrates that histone H3 acetylation at lysine residues 9 and 14 and active gene expression are intimately tied in the rodent brain, and that this fundamental relationship remains unchanged in an HD mouse model despite genome-wide decreases in histone H3 acetylation.

  7. A whole blood gene expression-based signature for smoking status

    PubMed Central

    2012-01-01

    Background Smoking is the leading cause of preventable death worldwide and has been shown to increase the risk of multiple diseases including coronary artery disease (CAD). We sought to identify genes whose levels of expression in whole blood correlate with self-reported smoking status. Methods Microarrays were used to identify gene expression changes in whole blood which correlated with self-reported smoking status; a set of significant genes from the microarray analysis were validated by qRT-PCR in an independent set of subjects. Stepwise forward logistic regression was performed using the qRT-PCR data to create a predictive model whose performance was validated in an independent set of subjects and compared to cotinine, a nicotine metabolite. Results Microarray analysis of whole blood RNA from 209 PREDICT subjects (41 current smokers, 4 quit ≤ 2 months, 64 quit > 2 months, 100 never smoked; NCT00500617) identified 4214 genes significantly correlated with self-reported smoking status. qRT-PCR was performed on 1,071 PREDICT subjects across 256 microarray genes significantly correlated with smoking or CAD. A five gene (CLDND1, LRRN3, MUC1, GOPC, LEF1) predictive model, derived from the qRT-PCR data using stepwise forward logistic regression, had a cross-validated mean AUC of 0.93 (sensitivity=0.78; specificity=0.95), and was validated using 180 independent PREDICT subjects (AUC=0.82, CI 0.69-0.94; sensitivity=0.63; specificity=0.94). Plasma from the 180 validation subjects was used to assess levels of cotinine; a model using a threshold of 10 ng/ml cotinine resulted in an AUC of 0.89 (CI 0.81-0.97; sensitivity=0.81; specificity=0.97; kappa with expression model = 0.53). Conclusion We have constructed and validated a whole blood gene expression score for the evaluation of smoking status, demonstrating that clinical and environmental factors contributing to cardiovascular disease risk can be assessed by gene expression. PMID:23210427
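
    The performance figures quoted in the abstract above (AUC, sensitivity, specificity) can be computed from labels and model scores as in the following sketch. The data, the 0.5 decision threshold, and the function names are made up for illustration; nothing here reproduces the study's model or results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = current smoker, scores are model outputs.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

    Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why abstracts of this kind usually report all three.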

  8. RNA degradation and models for post-transcriptional gene-silencing.

    PubMed

    Meins, F

    2000-06-01

    Post-transcriptional gene silencing (PTGS) is a form of stable but potentially reversible epigenetic modification, which frequently occurs in transgenic plants. The interaction in trans of genes with similar transcribed sequences results in sequence-specific degradation of RNAs derived from the genes involved. Highly expressed single-copy loci, transcribed inverted repeats, and poorly transcribed complex loci can act as sources of signals that trigger PTGS. In some cases, mobile, sequence-specific silencing signals can move from cell to cell or even over long distances in the plant. Several current models hold that silencing signals are 'aberrant' RNAs (aRNA), which differ in some way from normal mRNAs. The most likely candidates are small antisense RNAs (asRNA) and double-stranded RNAs (dsRNA). Direct evidence that these or other aRNAs found in silent tissues can induce PTGS is still lacking. Most current models assume that silencing signals interact with target RNAs in a sequence-specific fashion. This results in degradation, usually in the cytoplasm, by exonucleolytic as well as endonucleolytic pathways, which are not necessarily PTGS-specific. Biochemical-switch models hold that the silent state is maintained by a positive auto-regulatory loop. One possibility is that concentrations of hypothetical silencing signals above a critical threshold trigger their own production by self-replication, by degradation of target RNAs, or by a combination of both mechanisms. These models can account for the stability, reversibility and multiplicity of silent states; the strong influence of transcription rate of target genes on the incidence and stability of silencing, and the amplification and systemic propagation of motile silencing signals.

  9. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  10. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  11. Inference of developmental gene regulatory networks beyond classical model systems: new approaches in the post-genomic era.

    PubMed

    Fernandez-Valverde, Selene L; Aguilera, Felipe; Ramos-Díaz, René Alexander

    2018-06-18

    The advent of high-throughput sequencing technologies has revolutionized the way we understand the transformation of genetic information into morphological traits. Elucidating the network of interactions between genes that govern cell differentiation through development is one of the core challenges in genome research. These networks are known as developmental gene regulatory networks (dGRNs) and consist largely of the functional linkage between developmental control genes, cis-regulatory modules and differentiation genes, which generate spatially and temporally refined patterns of gene expression. Over the last 20 years, great advances have been made in determining these gene interactions mainly in classical model systems, including human, mouse, sea urchin, fruit fly, and worm. This has brought about a radical transformation in the fields of developmental biology and evolutionary biology, allowing the generation of high-resolution gene regulatory maps to analyse cell differentiation during animal development. Such maps have enabled the identification of gene regulatory circuits and have led to the development of network inference methods that can recapitulate the differentiation of specific cell-types or developmental stages. In contrast, dGRN research in non-classical model systems has been limited to the identification of developmental control genes via the candidate gene approach and the characterization of their spatiotemporal expression patterns, as well as to the discovery of cis-regulatory modules via patterns of sequence conservation and/or predicted transcription-factor binding sites. However, thanks to the continuous advances in high-throughput sequencing technologies, this scenario is rapidly changing. Here, we give a historical overview on the architecture and elucidation of the dGRNs. 
Subsequently, we summarize the approaches available to unravel these regulatory networks, highlighting the vast range of possibilities of integrating multiple technical advances and theoretical approaches to expand our understanding of global gene regulation during animal development in non-classical model systems. Such new knowledge will not only lead to greater insights into the evolution of molecular mechanisms underlying cell identity and animal body plans, but also into the evolution of key morphological innovations in animals.

  12. Immune Response in Microgravity: Genetic Basis and Countermeasure Development Implications

    NASA Technical Reports Server (NTRS)

    Risin, Diana; Ward, Nancy E.; Risin, Semyon A.; Pellis, Neal R.

    2006-01-01

    Impairment of immunity in astronauts and cosmonauts, even in short-term flights, is a recognized risk. Long-term orbital space missions and anticipated interplanetary flights increase the concern for more pronounced effects on the immune system with potential clinical consequences. Studies in true and modeled microgravity (MG) have demonstrated that MG directly affects numerous lymphocyte functions. The purpose of this study was to screen for genes involved in the lymphocyte response to modeled microgravity (MMG) that could explain the functional and structural changes observed earlier. The microgravity-induced changes in gene expression were analyzed by microarray DNA chip technology. CD3- and IL2-activated T cells were cultured in 1g (static) and modeled microgravity (NASA Rotating Wall Vessel bioreactor) conditions for 24 hours. Total RNA was extracted using the RNeasy isolation kit (Qiagen, Valencia, CA). Microarray experiments were performed utilizing Affymetrix Gene Chips (U133A), allowing testing for 18,400 human genes. To decrease the biological variation and aid in detecting microgravity-associated changes, experiments were performed in triplicate using cells obtained from three different donors. Exposure to modeled microgravity resulted in alteration of 89 genes, 10 of which were upregulated and 79 downregulated. Altered genes were categorized by their function, structural role and by association with metabolic and regulatory pathways. A large proportion was found to be involved in fundamental cellular processes: signal transduction, DNA repair, apoptosis, and multiple metabolic pathways. There was a group of genes directly related to immune and inflammatory responses (IL7R, granulysin, proteasome activator subunit 2, peroxiredoxin 4, HLA-DRA, lymphocyte antigen 75, IL18R and DOCK2 genes). Among these genes only one (IL7R) was upregulated; the rest were downregulated. The upregulation of the IL7 receptor gene was confirmed by RT-PCR. 
Three genes with altered expression were identified in the apoptosis-related group (Granzyme B, APO2 ligand and Beta3-endonexin). All of them were downregulated. Gene expression changes in MG might appear pivotal in identifying potential molecular targets for countermeasure development. (Supported by NRA OLMSA02 and NSCORT NAG54072 grants).

  13. An evidence-based knowledgebase of metastasis suppressors to identify key pathways relevant to cancer metastasis

    PubMed Central

    Zhao, Min; Li, Zhe; Qu, Hong

    2015-01-01

    Metastasis suppressor genes (MS genes) are genes that play important roles in inhibiting the process of cancer metastasis without preventing growth of the primary tumor. Identification of these genes and understanding their functions are critical for investigation of cancer metastasis. Recent studies on cancer metastasis have identified many new susceptibility MS genes. However, a comprehensive illustration of the diverse cellular processes regulated by metastasis suppressors during the metastasis cascade is lacking. Thus, the relationship between MS genes and cancer risk is still unclear. To unveil the cellular complexity of MS genes, we have constructed MSGene (http://MSGene.bioinfo-minzhao.org/), the first literature-based gene resource for exploring human MS genes. In total, we manually curated 194 experimentally verified MS genes and mapped them to 1448 homologous genes from 17 model species. Follow-up functional analyses associated 194 human MS genes with epithelium/tissue morphogenesis and epithelial cell proliferation. In addition, pathway analysis highlights the prominent role of MS genes in activation of platelets and the coagulation system in the tumor metastatic cascade. Moreover, global mutation pattern of MS genes across multiple cancers may reveal common cancer metastasis mechanisms. All these results illustrate the importance of MSGene to our understanding of cell development and cancer metastasis. PMID:26486520

  14. Usefulness of Housekeeping Genes for the Diagnosis of Helicobacter pylori Infection, Strain Discrimination and Detection of Multiple Infection.

    PubMed

    Palau, Montserrat; Kulmann, Marcos; Ramírez-Lázaro, María José; Lario, Sergio; Quilez, María Elisa; Campo, Rafael; Piqué, Núria; Calvet, Xavier; Miñana-Galbis, David

    2016-12-01

    Helicobacter pylori infects the stomachs of over half the world's population, evades the immune response and establishes a chronic infection. Although most people remain asymptomatic, duodenal and gastric ulcers, MALT lymphoma and progression to gastric cancer may develop. Several virulence factors such as flagella, lipopolysaccharide, adhesins and especially the vacuolating cytotoxin VacA and the oncoprotein CagA have been described for H. pylori. Despite the extensive published data on H. pylori, more research is needed to determine new virulence markers, the exact mode of transmission or the role of multiple infection. Amplification and sequencing of six housekeeping genes (amiA, cgt, cpn60, cpn70, dnaJ, and luxS) related to H. pylori pathogenesis have been performed in order to evaluate their usefulness for the specific detection of H. pylori, the genetic discrimination at strain level and the detection of multiple infection. A total of 52 H. pylori clones, isolated from 14 gastric biopsies from 11 patients, were analyzed for this purpose. All genes were specifically amplified for H. pylori and all clones isolated from different patients were discriminated, with gene distances ranging from 0.9 to 7.8%. Although most clones isolated from the same patient showed identical gene sequences, an event of multiple infection was detected in all the genes and microevolution events were shown for the amiA and cpn60 genes. These results suggested that housekeeping genes could be useful for H. pylori detection and to elucidate the mode of transmission and the relevance of multiple infection. © 2016 John Wiley & Sons Ltd.

  15. The ABC Model and its Applicability to Basal Angiosperms

    PubMed Central

    Soltis, Douglas E.; Chanderbali, André S.; Kim, Sangtae; Buzgo, Matyas; Soltis, Pamela S.

    2007-01-01

    Background Although the flower is the central feature of the angiosperms, little is known of its origin and subsequent diversification. The ABC model has long been the unifying paradigm for floral developmental genetics, but it is based on phylogenetically derived eudicot models. Synergistic research involving phylogenetics, classical developmental studies, genomics and developmental genetics has afforded valuable new insights into floral evolution in general, and the early flower in particular. Scope and Conclusions Genomic studies indicate that basal angiosperms, and by inference the earliest angiosperms, had a rich tool kit of floral genes. Homologues of the ABCE floral organ identity genes are also present in basal angiosperm lineages; however, C-, E- and particularly B-function genes are more broadly expressed in basal lineages. There is no single model of floral organ identity that applies to all angiosperms; there are multiple models that apply depending on the phylogenetic position and floral structure of the group in question. The classic ABC (or ABCE) model may work well for most eudicots. However, modifications are needed for basal eudicots and, the focus of this paper, basal angiosperms. We offer ‘fading borders’ as a testable hypothesis for the basal-most angiosperms and, by inference, perhaps some of the earliest (now extinct) angiosperms. PMID:17616563

  16. Stressing "Escherichia coli" to Educate Students about Research: A CURE to Investigate Multiple Levels of Gene Regulation

    ERIC Educational Resources Information Center

    McDonough, Janet; Goudsouzian, Lara K.; Papaj, Agllai; Maceli, Ashley R.; Klepac-Ceraj, Vanja; Peterson, Celeste N.

    2017-01-01

    Course-based undergraduate research experiences (CUREs) have been shown to increase student retention and learning in the biological sciences. Most CUREs cover only one aspect of gene regulation, such as transcriptional control. Here we present a new inquiry-based lab that engages understanding of gene expression from multiple perspectives.…

  17. ErbB4 in Laminated Brain Structures: A Neurodevelopmental Approach to Schizophrenia

    PubMed Central

    Perez-Garcia, Carlos G.

    2015-01-01

    The susceptibility genes for schizophrenia Neuregulin-1 (NRG1) and ErbB4 have critical functions during brain development and in the adult. Alterations in the ErbB4 signaling pathway cause a variety of neurodevelopmental defects including deficiencies in neuronal migration, synaptic plasticity, and myelination. I have used the ErbB4-/- HER4heart KO mice to study the neurodevelopmental insults associated with deficiencies in the NRG1-ErbB4 signaling pathway and their potential implication in brain disorders such as schizophrenia, a chronic psychiatric disease affecting 1% of the population worldwide. ErbB4 deletion results in an array of neurodevelopmental deficits that are consistent with a schizophrenic model. First, similar defects appear in multiple brain structures, from the cortex to the cerebellum. Second, these defects affect multiple aspects of brain development, from deficits in neuronal migration to impairments in excitatory/inhibitory systems, including reductions in brain volume, cortical and cerebellar heterotopias, alterations in number and distribution of specific subpopulations of interneurons, deficiencies in the astrocytic and oligodendrocytic lineages, and additional insults in major brain structures. This suggests that alterations in specific neurodevelopmental genes that play similar functions in multiple neuroanatomical structures might account for some of the symptomatology observed in schizophrenic patients, such as defects in cognition. ErbB4 mutation uncovers flaws in brain development that are compatible with a neurodevelopmental model of schizophrenia, and it establishes a comprehensive model to study the basis of the disorder before symptoms are detected in the adult. PMID:26733804

  18. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed Central

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-01-01

    Background: Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Results: Expression of sex-chromosome genes was used as a true internal biological control to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. Conclusion: In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects. PMID:12962547

  19. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-09-08

    Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Expression of sex-chromosome genes was used as a true internal biological control to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects.

  20. Germline mutations in candidate predisposition genes in individuals with cutaneous melanoma and at least two independent additional primary cancers.

    PubMed

    Pritchard, Antonia L; Johansson, Peter A; Nathan, Vaishnavi; Howlie, Madeleine; Symmons, Judith; Palmer, Jane M; Hayward, Nicholas K

    2018-01-01

    While a number of autosomal dominant and autosomal recessive cancer syndromes have an associated spectrum of cancers, the prevalence and variety of cancer predisposition mutations in patients with multiple primary cancers have not been extensively investigated. An understanding of the variants predisposing to more than one cancer type could improve patient care, including screening and genetic counselling, as well as advancing the understanding of tumour development. A cohort of 57 patients ascertained due to their cutaneous melanoma (CM) diagnosis and with a history of two or more additional non-cutaneous independent primary cancer types were recruited for this study. Patient blood samples were assessed by whole exome or whole genome sequencing. We focussed on variants in 525 pre-selected genes, including 65 autosomal dominant and 31 autosomal recessive cancer predisposition genes, 116 genes involved in the DNA repair pathway, and 313 commonly somatically mutated in cancer. The same genes were analysed in exome sequence data from 1358 control individuals collected as part of non-cancer studies (UK10K). The identified variants were classified for pathogenicity using online databases, literature and in silico prediction tools. No known pathogenic autosomal dominant or previously described compound heterozygous mutations in autosomal recessive genes were observed in the multiple cancer cohort. Variants typically found somatically in haematological malignancies (in JAK1, JAK2, SF3B1, SRSF2, TET2 and TYK2) were present in lymphocyte DNA of patients with multiple primary cancers, all of whom had a history of haematological malignancy and cutaneous melanoma, as well as colorectal cancer and/or prostate cancer. Other potentially pathogenic variants were discovered in BUB1B, POLE2, ROS1 and DNMT3A. Compared to controls, multiple cancer cases had significantly more likely damaging mutations (nonsense, frameshift ins/del) in tumour suppressor and tyrosine kinase genes and higher overall burden of mutations in all cancer genes. We identified several pathogenic variants that likely predispose to at least one of the tumours in patients with multiple cancers. We additionally present evidence that there may be a higher burden of variants of unknown significance in 'cancer genes' in patients with multiple cancer types. Further screens of this nature need to be carried out to build evidence to show if the cancers observed in these patients form part of a cancer spectrum associated with single germline variants in these genes, whether multiple layers of susceptibility exist (oligogenic or polygenic), or if the occurrence of multiple different cancers is due to random chance.
