Sample records for estimating gene expression

  1. A gene expression estimator of intramuscular fat percentage for use in both cattle and sheep

    PubMed Central

    2014-01-01

    Background The expression of genes encoding proteins involved in triacylglyceride and fatty acid synthesis and storage in cattle muscle is correlated with intramuscular fat (IMF)%. Are the same genes also correlated with IMF% in sheep muscle, and can the same set of genes be used to estimate IMF% in both species? Results The correlation between gene expression (microarray) and IMF% in the longissimus muscle (LM) of twenty sheep was calculated. An integrated analysis of this dataset with an equivalent cattle correlation dataset and a cattle differential expression dataset was undertaken. A total of 30 genes were identified to be strongly correlated with IMF% in both cattle and sheep. The overlap of genes was highly significant: 8 of the 13 genes in the TAG gene set and 8 of the 13 genes in the FA gene set were in the top 100 and 500 genes respectively most correlated with IMF% in sheep, P-value = 0. Of the 30 genes, CIDEA, THRSP, ACSM1, DGAT2 and FABP4 had the highest average rank in both species. Using the data from two small groups of Brahman cattle (control and hormone growth promotant-treated [known to decrease IMF% in muscle]) and 22 animals in total, the utility of a direct measure and different estimators of IMF% (ultrasound and gene expression) to differentiate between the two groups was examined. Directly measured IMF% and IMF% estimated from ultrasound scanning could not discriminate between the two groups. However, using gene expression to estimate IMF% discriminated between the two groups. Increasing the number of genes used to estimate IMF% from one to five significantly increased the discrimination power; but increasing the number of genes to 15 resulted in little further improvement. Conclusion We have demonstrated the utility of a comparative approach to identify robust estimators of IMF% in the LM in cattle and sheep.
We have also demonstrated a number of approaches (potentially applicable to much smaller groups of animals than conventional methods) to using gene expression to rank animals for IMF% within a single farm/treatment, or to estimate differences in IMF% between two farms/treatments. PMID:25028604

  2. Shrinkage estimation of effect sizes as an alternative to hypothesis testing followed by estimation in high-dimensional biology: applications to differential gene expression.

    PubMed

    Montazeri, Zahra; Yanofsky, Corey M; Bickel, David R

    2010-01-01

    Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, the sum of estimator variance and squared estimator bias. On the basis of two distinct simulation studies and data from different microarray studies, we systematically compared the performance of several estimators representing both current practice and shrinkage. We find that the threshold-based estimators usually perform worse than the maximum-likelihood estimator (MLE) and they often perform far worse as quantified by estimated mean-squared risk. By contrast, the shrinkage estimators tend to perform as well as or better than the MLE and never much worse than the MLE, as expected from what is known about shrinkage. However, a Bayesian measure of performance based on the prior information that few genes are differentially expressed indicates that hard-threshold estimators perform about as well as the local false discovery rate (FDR), the best of the shrinkage estimators studied. 
Based on the ability of the latter to leverage information across genes, we conclude that the use of the local-FDR estimator of the fold change instead of informal or threshold-based combinations of statistical tests and non-shrinkage estimators can be expected to substantially improve the reliability of gene prioritization at very little risk of doing so less reliably. Since the proposed replacement of post-selection estimates with shrunken estimates applies as well to other types of high-dimensional data, it could also improve the analysis of SNP data from genome-wide association studies.
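
    The contrast drawn in this abstract between hard-threshold estimation and shrinkage can be illustrated with a small simulation. The sketch below is only a hedged illustration: it uses a simple linear shrinkage factor rather than the local-FDR estimator studied by the authors, it assumes true log fold changes drawn from a normal distribution, and every function and parameter name is hypothetical.

```python
import random
import statistics


def hard_threshold(estimate, std_err, z_cutoff=1.96):
    """Report the raw estimate only if it passes a significance cutoff, else 0."""
    return estimate if abs(estimate / std_err) >= z_cutoff else 0.0


def linear_shrinkage(estimate, std_err, prior_var):
    """Shrink the estimate toward 0 by prior_var / (prior_var + std_err**2).

    Assumption: a normal prior with variance prior_var, not the local-FDR method.
    """
    return estimate * prior_var / (prior_var + std_err ** 2)


def mean_squared_error(estimates, truths):
    """Average squared difference between estimates and true values."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(estimates, truths))


def simulate(n_genes=5000, prior_sd=0.5, std_err=0.5, seed=0):
    """Compare the raw (maximum-likelihood) estimate with the two alternatives."""
    rng = random.Random(seed)
    truths = [rng.gauss(0.0, prior_sd) for _ in range(n_genes)]
    mle = [t + rng.gauss(0.0, std_err) for t in truths]
    thresholded = [hard_threshold(m, std_err) for m in mle]
    shrunk = [linear_shrinkage(m, std_err, prior_sd ** 2) for m in mle]
    return {
        "mle": mean_squared_error(mle, truths),
        "hard_threshold": mean_squared_error(thresholded, truths),
        "shrinkage": mean_squared_error(shrunk, truths),
    }


if __name__ == "__main__":
    for name, mse in simulate().items():
        print(f"{name}: {mse:.4f}")
```

    Under these assumed settings the shrunken estimates should show a lower mean-squared error than the raw estimates; the ranking of the hard-threshold estimator depends on the simulated effect sizes and is left for the reader to inspect.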

  3. Neighboring Genes Show Correlated Evolution in Gene Expression

    PubMed Central

    Ghanbarian, Avazeh T.; Hurst, Laurence D.

    2015-01-01

    When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. PMID:25743543

  4. Gene expression during blow fly development: improving the precision of age estimates in forensic entomology.

    PubMed

    Tarone, Aaron M; Foran, David R

    2011-01-01

    Forensic entomologists use size and developmental stage to estimate blow fly age, and from those, a postmortem interval. Since such estimates are generally accurate but often lack precision, particularly in the older developmental stages, alternative aging methods would be advantageous. Presented here is a means of incorporating developmentally regulated gene expression levels into traditional stage and size data, with a goal of more precisely estimating developmental age of immature Lucilia sericata. Generalized additive models of development showed improved statistical support compared to models that did not include gene expression data, resulting in an increase in estimate precision, especially for postfeeding third instars and pupae. The models were then used to make blind estimates of development for 86 immature L. sericata raised on rat carcasses. Overall, inclusion of gene expression data resulted in increased precision in aging blow flies. © 2010 American Academy of Forensic Sciences.

  5. Neighboring Genes Show Correlated Evolution in Gene Expression.

    PubMed

    Ghanbarian, Avazeh T; Hurst, Laurence D

    2015-07-01

    When considering the evolution of a gene's expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Evaluation of reference genes for reverse transcription quantitative real-time PCR (RT-qPCR) studies in Silene vulgaris considering the method of cDNA preparation

    PubMed Central

    Koloušková, Pavla; Stone, James D.

    2017-01-01

    Accurate gene expression measurements are essential in studies of both crop and wild plants. Reverse transcription quantitative real-time PCR (RT-qPCR) has become a preferred tool for gene expression estimation. A selection of suitable reference genes for the normalization of transcript levels is an essential prerequisite of accurate RT-qPCR results. We evaluated the expression stability of eight candidate reference genes across roots, leaves, flower buds and pollen of Silene vulgaris (bladder campion), a model plant for the study of gynodioecy. As random priming of cDNA is recommended for the study of organellar transcripts and poly(A) selection is indicated for nuclear transcripts, we estimated gene expression with both random-primed and oligo(dT)-primed cDNA. Accordingly, we determined reference genes that perform well with oligo(dT)- and random-primed cDNA, making it possible to estimate levels of nucleus-derived transcripts in the same cDNA samples as used for organellar transcripts, a key benefit in studies of cyto-nuclear interactions. Gene expression variance was estimated by RefFinder, which integrates four different analytical tools. The SvACT and SvGAPDH genes were the most stable candidates across various organs of S. vulgaris, regardless of whether pollen was included or not. PMID:28817728

  7. PanGEA: identification of allele specific gene expression using the 454 technology.

    PubMed

    Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian

    2009-05-14

    Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occurring in the 454 technology. To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: http://www.kofler.or.at/bioinformatics/PanGEA

  8. PanGEA: Identification of allele specific gene expression using the 454 technology

    PubMed Central

    Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian

    2009-01-01

    Background Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. Results We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occurring in the 454 technology. Conclusion To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: PMID:19442283

  9. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  10. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
The software also provides for a choice between three different initial rough imputation methods.
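
    The role of Dynamic Time Warping in this abstract can be sketched as follows. This is a hedged, simplified illustration of a position-wise imputation only, not the authors' two-pass DTWimpute algorithm: it assumes an absolute-difference local cost, a row-mean initial rough imputation, and hypothetical function names.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def impute_position_wise(target, candidates, k=2):
    """Fill None entries of target from the k complete profiles closest in DTW distance.

    Assumption: missing values are first replaced by the mean of the observed
    values (a rough initial imputation) so that a distance can be computed.
    """
    observed = [v for v in target if v is not None]
    rough_fill = sum(observed) / len(observed)
    filled = [rough_fill if v is None else v for v in target]
    nearest = sorted(candidates, key=lambda c: dtw_distance(filled, c))[:k]
    return [
        sum(c[i] for c in nearest) / len(nearest) if v is None else v
        for i, v in enumerate(target)
    ]


if __name__ == "__main__":
    profile = [1.0, None, 3.0]
    complete = [[1.0, 2.0, 3.0], [1.0, 2.2, 3.0], [10.0, 10.0, 10.0]]
    print(impute_position_wise(profile, complete, k=2))
```

    Because DTW allows one time point to align with several, two profiles that differ only by a local stretch in time have distance zero, which is the property that makes it attractive for time series expression data.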

  11. LocExpress: a web server for efficiently estimating expression of novel transcripts.

    PubMed

    Hou, Mei; Tian, Feng; Jiang, Shuai; Kong, Lei; Yang, Dechang; Gao, Ge

    2016-12-22

    The temporal and spatial-specific expression pattern of a transcript in multiple tissues and cell types can indicate key clues about its function. While several gene atlases are available online as pre-computed databases for known gene models, it is still challenging to obtain expression profiles for previously uncharacterized (i.e. novel) transcripts efficiently. Here we developed LocExpress, a web server for efficiently estimating expression of novel transcripts across multiple tissues and cell types in human (20 normal tissues/cell types and 14 cell lines) as well as in mouse (24 normal tissues/cell types and nine cell lines). As a wrapper around an RNA-Seq quantification algorithm, LocExpress efficiently reduces the time cost by making abundance estimation calls increasingly within the minimum spanning bundle region of input transcripts. For a given novel gene model, this local context-oriented strategy allows LocExpress to estimate its FPKMs in hundreds of samples within minutes on a standard Linux box, making an online web server possible. To the best of our knowledge, LocExpress is the only web server to provide nearly real-time expression estimation for novel transcripts in common tissues and cell types. The server is publicly available at http://loc-express.cbi.pku.edu.cn .
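
    For readers unfamiliar with the FPKM unit mentioned in this abstract, the standard definition (fragments per kilobase of transcript per million mapped fragments) can be written as a one-line calculation. The sketch below shows only that definition; it does not reproduce the local quantification strategy of LocExpress, and the function and argument names are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million).
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript in a library of 10 million fragments.
    print(fpkm(500, 2000, 10_000_000))
```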

  12. Estimating intrinsic and extrinsic noise from single-cell gene expression measurements

    PubMed Central

    Fu, Audrey Qiuyan; Pachter, Lior

    2017-01-01

    Gene expression is stochastic and displays variation (“noise”) both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identically regulated gene pairs in single cells. We examine established formulas [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): “Stochastic gene expression in a single cell,” Science, 297, 1183–1186.] for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model. This allows us to derive alternative estimators that minimize bias or mean squared error. We provide a geometric interpretation of these results that clarifies the interpretation in [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): “Stochastic gene expression in a single cell,” Science, 297, 1183–1186.]. We also demonstrate through simulation and re-analysis of published data that the distribution assumptions underlying the hierarchical model have to be satisfied for the estimators to produce sensible results, which highlights the importance of normalization. PMID:27875323
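
    The established two-reporter formulas that this abstract examines can be sketched in a few lines. The version below follows the estimators as they are usually stated from Elowitz et al. (2002), with c and y denoting the two reporter intensities measured in the same cells; it is a hedged illustration, not the alternative estimators derived by the authors, and the function name is hypothetical.

```python
from statistics import fmean


def elowitz_noise(c, y):
    """Return (intrinsic, extrinsic, total) squared noise from paired reporter data.

    intrinsic^2 = <(c - y)^2> / (2 <c><y>)
    extrinsic^2 = (<c y> - <c><y>) / (<c><y>)
    total^2     = intrinsic^2 + extrinsic^2
    """
    if len(c) != len(y) or not c:
        raise ValueError("c and y must be non-empty and of equal length")
    norm = fmean(c) * fmean(y)
    intrinsic = fmean((ci - yi) ** 2 for ci, yi in zip(c, y)) / (2 * norm)
    extrinsic = (fmean(ci * yi for ci, yi in zip(c, y)) - norm) / norm
    return intrinsic, extrinsic, intrinsic + extrinsic


if __name__ == "__main__":
    # Perfectly correlated reporters: all variation is extrinsic.
    print(elowitz_noise([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

    Note that the extrinsic term is a covariance estimate and can come out negative in small samples, which is one reason the choice of estimator and the normalization of the data matter.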

  13. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.

  14. NURD: an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data

    PubMed Central

    2013-01-01

    Background RNA-Seq technology has been used widely in transcriptome study, and one of the most important applications is to estimate the expression level of genes and their alternative splicing isoforms. There have been several algorithms published to estimate the expression based on different models. Recently Wu et al. published a method that can accurately estimate isoform level expression by considering position-related sequencing biases using nonparametric models. The method has advantages in handling different read distributions, but there hasn’t been an efficient program to implement this algorithm. Results We developed an efficient implementation of the algorithm in the program NURD. It uses a binary interval search algorithm. The program can correct both the global tendency of sequencing bias in the data and local sequencing bias specific to each gene. The correction makes the isoform expression estimation more reliable under various read distributions. And the implementation is computationally efficient in both the memory cost and running time and can be readily scaled up for huge datasets. Conclusion NURD is an efficient and reliable tool for estimating the isoform expression level. Given the reads mapping result and gene annotation file, NURD will output the expression estimation result. The package is freely available for academic use at http://bioinfo.au.tsinghua.edu.cn/software/NURD/. PMID:23837734

  15. Estimation of the proteomic cancer co-expression sub networks by using association estimators.

    PubMed

    Erdoğan, Cihat; Kurt, Zeyneb; Diri, Banu

    2017-01-01

    In this study, the association estimators, which have significant influences on the gene network inference methods and are used for determining the molecular interactions, were examined within the co-expression network inference concept. By using the proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networks were identified. Proteomic data from various cancer types were collected from The Cancer Proteome Atlas (TCPA). Nine correlation- and mutual information (MI)-based association estimators that are commonly used in the literature were compared in this study. As the gold standard to measure the association estimators' performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) was used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. It was observed that the MI based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used in the estimation of biological networks in the weighted correlation network analysis (WGCNA) package. In correlation-based methods, the best average success rate for five cancer types was 60%, while in MI-based methods the average success ratio was 71% for James-Stein Shrinkage (Shrink) and 64% for Schurmann-Grassberger (SG) association estimator, respectively. Moreover, the hub genes and the inferred sub networks are presented for the consideration of researchers and experimentalists.
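
    The difference between correlation-based and MI-based association estimators can be illustrated on a toy example. The sketch below is hedged: it implements the Pearson coefficient and the plain empirical (plug-in) MI estimate on equal-width bins, not the James-Stein shrinkage or Schurmann-Grassberger estimators named in the abstract, and the function names and binning scheme are assumptions.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Plug-in mutual information estimate (in nats) on discretized data."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (count / n) * math.log(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    y = [v * v for v in x]  # non-monotonic dependence
    print(pearson(x, y), mutual_information(x, y))
```

    In this toy case the Pearson coefficient is zero although y is a deterministic function of x, while the MI estimate is positive; the plug-in estimate is known to be biased for small samples, which is the motivation for shrinkage-type MI estimators.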

  16. At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana

    PubMed Central

    Laubinger, Sascha; Zeller, Georg; Henz, Stefan R; Sachsenberg, Timo; Widmer, Christian K; Naouar, Naïra; Vuylsteke, Marnik; Schölkopf, Bernhard; Rätsch, Gunnar; Weigel, Detlef

    2008-01-01

    Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identified more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage. PMID:18613972

  17. Estimating differential expression from multiple indicators

    PubMed Central

    Ilmjärv, Sten; Hundahl, Christian Ansgar; Reimets, Riin; Niitsoo, Margus; Kolde, Raivo; Vilo, Jaak; Vasar, Eero; Luuk, Hendrik

    2014-01-01

    Regardless of the advent of high-throughput sequencing, microarrays remain central in current biomedical research. Conventional microarray analysis pipelines apply data reduction before the estimation of differential expression, which is likely to render the estimates susceptible to noise from signal summarization and reduce statistical power. We present a probe-level framework, which capitalizes on the high number of concurrent measurements to provide more robust differential expression estimates. The framework naturally extends to various experimental designs and target categories (e.g. transcripts, genes, genomic regions) as well as small sample sizes. Benchmarking in relation to popular microarray and RNA-sequencing data-analysis pipelines indicated high and stable performance on the Microarray Quality Control dataset and in a cell-culture model of hypoxia. Experimental data exhibiting long-range epigenetic silencing of gene expression were used to demonstrate the efficacy of detecting differential expression of genomic regions, a level of analysis not embraced by conventional workflows. Finally, we designed and conducted an experiment to identify hypothermia-responsive genes in terms of monotonic time-response. As a novel insight, hypothermia-dependent up-regulation of multiple genes of two major antioxidant pathways was identified and verified by quantitative real-time PCR. PMID:24586062

  18. Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data.

    PubMed

    Ezer, Daphne; Moignard, Victoria; Göttgens, Berthold; Adryan, Boris

    2016-08-01

    Many genes are expressed in bursts, which can contribute to cell-to-cell heterogeneity. It is now possible to measure this heterogeneity with high throughput single cell gene expression assays (single cell qPCR and RNA-seq). These experimental approaches generate gene expression distributions which can be used to estimate the kinetic parameters of gene expression bursting, namely the rate that genes turn on, the rate that genes turn off, and the rate of transcription. We construct a complete pipeline for the analysis of single cell qPCR data that uses the mathematics behind bursty expression to develop more accurate and robust algorithms for analyzing the origin of heterogeneity in experimental samples, specifically an algorithm for clustering cells by their bursting behavior (Simulated Annealing for Bursty Expression Clustering, SABEC) and a statistical tool for comparing the kinetic parameters of bursty expression across populations of cells (Estimation of Parameter changes in Kinetics, EPiK). We applied these methods to hematopoiesis, including a new single cell dataset in which transcription factors (TFs) involved in the earliest branchpoint of blood differentiation were individually up- and down-regulated. We could identify two unique sub-populations within a seemingly homogenous group of hematopoietic stem cells. In addition, we could predict regulatory mechanisms controlling the expression levels of eighteen key hematopoietic transcription factors throughout differentiation. Detailed information about gene regulatory mechanisms can therefore be obtained simply from high throughput single cell gene expression data, which should be widely applicable given the rapid expansion of single cell genomics.

  19. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
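
    The idea of re-using estimated values in successive rounds of K-nearest-neighbour imputation can be sketched as below. This is a hedged, simplified illustration rather than the published IKNNimpute procedure: it assumes a row-mean initial fill, Euclidean distance over the currently filled rows, inverse-distance weights, and one common definition of NRMSE (root of the mean squared error divided by the variance of the true values); all names are hypothetical.

```python
import math


def row_mean_fill(matrix):
    """Replace None entries with the mean of the observed values in the same row."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed)
        filled.append([mean if v is None else v for v in row])
    return filled


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def iterative_knn_impute(matrix, k=2, n_iter=3):
    """Re-estimate missing entries from k nearest rows, reusing earlier estimates."""
    current = row_mean_fill(matrix)
    for _ in range(n_iter):
        updated = [row[:] for row in current]
        for g, row in enumerate(matrix):
            missing = [j for j, v in enumerate(row) if v is None]
            if not missing:
                continue
            others = [i for i in range(len(matrix)) if i != g]
            # Distances use the current estimates, including previously imputed values.
            nearest = sorted(others, key=lambda i: euclidean(current[g], current[i]))[:k]
            weights = [1.0 / (euclidean(current[g], current[i]) + 1e-9) for i in nearest]
            for j in missing:
                weighted = sum(w * current[i][j] for w, i in zip(weights, nearest))
                updated[g][j] = weighted / sum(weights)
        current = updated
    return current


def nrmse(estimated, true):
    """sqrt(mean squared error / variance of the true values)."""
    mean_true = sum(true) / len(true)
    var_true = sum((t - mean_true) ** 2 for t in true) / len(true)
    mse = sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)
    return math.sqrt(mse / var_true)


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [1.0, None, 3.0], [9.0, 9.0, 9.0]]
    print(iterative_knn_impute(data, k=1))
```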

  20. Immune gene expression for diverse haemocytes derived from pacific white shrimp, Litopenaeus vannamei.

    PubMed

    Yang, Chih-Chiu; Lu, Chung-Lun; Chen, Sherwin; Liao, Wen-Liang; Chen, Shiu-Nan

    2015-05-01

    In this study, diverse haemocytes from Pacific white shrimp Litopenaeus vannamei were separated by a flow cytometer sorting system. Using the two common flow cytometric parameters FSC and SSC, the haemocytes could be divided into three populations. Microscopic observation of L. vannamei haemocytes in anticoagulant buffer revealed three morphologically distinct cell types, designated granular, hyaline and semigranular cells. Immune genes, including prophenoloxidase (proPO), lipopolysaccharide-β-glucan binding protein (LGBP), peroxinectin, crustin, lysozyme, penaeid-3a and transglutaminase (TGase), expressed in the different haemocyte types were analysed by quantitative real time PCR (qPCR). mRNA expression was estimated as the level of each gene relative to the β-actin gene. Finally, the seven genes could be grouped by their dominant expression sites. ProPO, LGBP and peroxinectin were highly expressed in granular cells, while LGBP, crustin, lysozyme and P-3a were highly expressed in semigranular cells and TGase was highly expressed in hyaline cells. In this study, L. vannamei haemocytes were grouped into three different types for the first time and the expression of immune-related genes in the grouped haemocytes was estimated. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Estimation of the proteomic cancer co-expression sub networks by using association estimators

    PubMed Central

    Kurt, Zeyneb; Diri, Banu

    2017-01-01

    In this study, association estimators, which strongly influence gene network inference methods and are used to determine molecular interactions, were examined in the context of co-expression network inference. Using proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub networks were identified. Proteomic data for the cancer types were collected from The Cancer Proteome Atlas (TCPA). Nine correlation- and mutual information (MI)-based association estimators that are commonly used in the literature were compared. As the gold standard for measuring the association estimators' performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) were used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. The MI-based estimators gave more successful results than the Pearson and Spearman correlation approaches, which are used to estimate biological networks in the weighted correlation network analysis (WGCNA) package. Among correlation-based methods, the best average success rate over the five cancer types was 60%, while among MI-based methods the average success rate was 71% for the James-Stein Shrinkage (Shrink) estimator and 64% for the Schurmann-Grassberger (SG) estimator. Moreover, the hub genes and the inferred sub networks are presented for the consideration of researchers and experimentalists. PMID:29145449
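
    The comparison of an inferred network with disease-associated pathways by Fisher's exact test can be illustrated with a one-sided test on a 2x2 table. This is a sketch under assumptions: the layout of the table (genes in or out of the inferred sub network versus in or out of the pathway) and the choice of the one-sided "greater" alternative are not stated in the abstract, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: genes in the inferred sub network and in the pathway
    b: genes in the sub network but not in the pathway
    c: genes in the pathway but not in the sub network
    d: genes in neither
    Returns P(overlap >= a) under the hypergeometric null of no association.
    """
    total = a + b + c + d
    in_pathway = a + c
    in_network = a + b
    denom = comb(total, in_network)
    upper = min(in_pathway, in_network)
    tail = sum(
        comb(in_pathway, x) * comb(total - in_pathway, in_network - x)
        for x in range(a, upper + 1)
    )
    return tail / denom

# Hypothetical counts: 8 of 10 network genes fall in a 20-gene pathway out of 200 genes.
print(fisher_exact_greater(8, 2, 12, 178))
```

    A small p-value indicates that the overlap between the inferred sub network and the pathway is larger than expected by chance.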

  2. An Exercise to Estimate Differential Gene Expression in Human Cells

    ERIC Educational Resources Information Center

    Chaudhry, M. Ahmad

    2006-01-01

    The expression of genes in cells of various tissue types varies considerably and is correlated with the function of a particular organ. The pattern of gene expression changes in diseased tissues, in response to therapy or infection and exposure to environmental mutagens, chemicals, ultraviolet light, and ionizing radiation. To better understand…

  3. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits.

    PubMed

    Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé; Kichaev, Gleb; Gusev, Alexander; Pasaniuc, Bogdan

    2017-03-02

    Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  4. Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

    PubMed

    Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li

    2015-10-16

    The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. 
    Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well on both simulated and real datasets, and has competitive performance compared to popular methods. The method has been implemented as freely available software, which can be found at https://github.com/PUGEA/NLDMseq.
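
    For reference, the direct RPKM calculation mentioned in the abstract normalizes a read count by transcript length and sequencing depth, which implicitly assumes that reads are distributed uniformly along the transcript. The sketch below shows only this baseline, not NLDMseq itself; the counts are hypothetical.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

# Hypothetical example: 1,000 reads on a 2,000 bp transcript in a 10-million-read library.
print(rpkm(1000, 2000, 10_000_000))
```

    Because the formula divides by length, any positional or sequence-specific bias in read coverage translates directly into a biased expression estimate, which is the problem the abstract sets out to address.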

  5. Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction

    NASA Astrophysics Data System (ADS)

    Hansen, Matthew; Everett, Logan; Singh, Larry; Hannenhalli, Sridhar

    Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue-types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach where a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in a (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.

  6. Simulated maximum likelihood method for estimating kinetic rates in gene expression.

    PubMed

    Tian, Tianhai; Xu, Songlin; Gao, Junbin; Burrage, Kevin

    2007-01-01

    Kinetic rate in gene expression is a key measurement of the stability of gene products and gives important information for the reconstruction of genetic regulatory networks. Recent developments in experimental technologies have made it possible to measure the numbers of transcripts and protein molecules in single cells. Although estimation methods based on deterministic models have been proposed aimed at evaluating kinetic rates from experimental observations, these methods cannot tackle noise in gene expression that may arise from discrete processes of gene expression, small numbers of mRNA transcript, fluctuations in the activity of transcriptional factors and variability in the experimental environment. In this paper, we develop effective methods for estimating kinetic rates in genetic regulatory networks. The simulated maximum likelihood method is used to evaluate parameters in stochastic models described by either stochastic differential equations or discrete biochemical reactions. Different types of non-parametric density functions are used to measure the transitional probability of experimental observations. For stochastic models described by biochemical reactions, we propose to use the simulated frequency distribution to evaluate the transitional density based on the discrete nature of stochastic simulations. The genetic optimization algorithm is used as an efficient tool to search for optimal reaction rates. Numerical results indicate that the proposed methods can give robust estimations of kinetic rates with good accuracy.
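
    The idea of using a simulated frequency distribution as the transition density can be sketched for a simple birth-death model of mRNA (synthesis at a constant rate, first-order degradation). This is an illustrative sketch only: the model, the Gillespie-type simulation, the floor applied to zero frequencies, and all names and parameter values are assumptions, and the genetic optimization step described in the abstract is not included.

```python
import math
import random
from collections import Counter

def simulate_birth_death(n0, k_syn, k_deg, dt, rng):
    """Stochastic simulation of 0 -> mRNA (rate k_syn) and mRNA -> 0 (rate k_deg * n).

    Returns the molecule count after time dt, starting from n0.
    """
    t, n = 0.0, n0
    while True:
        total = k_syn + k_deg * n
        if total <= 0:
            return n
        t += rng.expovariate(total)
        if t > dt:
            return n
        if rng.random() * total < k_syn:
            n += 1
        else:
            n -= 1

def transition_frequency(n0, k_syn, k_deg, dt, n_sim, rng):
    """Simulated frequency distribution of the count after dt, given n0."""
    counts = Counter(simulate_birth_death(n0, k_syn, k_deg, dt, rng) for _ in range(n_sim))
    return {n: c / n_sim for n, c in counts.items()}

def simulated_log_likelihood(observations, k_syn, k_deg, dt, n_sim=500, seed=0, floor=1e-6):
    """Sum of log transition probabilities, each estimated by simulation.

    The floor keeps the logarithm finite when an observed transition never
    occurs in the simulations (an assumption of this sketch).
    """
    rng = random.Random(seed)
    loglik = 0.0
    for prev, nxt in zip(observations, observations[1:]):
        freq = transition_frequency(prev, k_syn, k_deg, dt, n_sim, rng)
        loglik += math.log(max(freq.get(nxt, 0.0), floor))
    return loglik
```

    Candidate rate pairs can then be ranked by `simulated_log_likelihood`; the abstract uses a genetic algorithm for that search, whereas a plain grid or random search would be the simplest stand-in.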

  7. Modeling Bi-modality Improves Characterization of Cell Cycle on Gene Expression in Single Cells

    PubMed Central

    Danaher, Patrick; Finak, Greg; Krouse, Michael; Wang, Alice; Webster, Philippa; Beechem, Joseph; Gottardo, Raphael

    2014-01-01

    Advances in high-throughput, single cell gene expression are allowing interrogation of cell heterogeneity. However, there is concern that the cell cycle phase of a cell might bias characterizations of gene expression at the single-cell level. We assess the effect of cell cycle phase on gene expression in single cells by measuring 333 genes in 930 cells across three phases and three cell lines. We determine each cell's phase non-invasively without chemical arrest and use it as a covariate in tests of differential expression. We observe bi-modal gene expression, a previously-described phenomenon, wherein the expression of otherwise abundant genes is either strongly positive, or undetectable within individual cells. This bi-modality is likely both biologically and technically driven. Irrespective of its source, we show that it should be modeled to draw accurate inferences from single cell expression experiments. To this end, we propose a semi-continuous modeling framework based on the generalized linear model, and use it to characterize genes with consistent cell cycle effects across three cell lines. Our new computational framework improves the detection of previously characterized cell-cycle genes compared to approaches that do not account for the bi-modality of single-cell data. We use our semi-continuous modelling framework to estimate single cell gene co-expression networks. These networks suggest that in addition to having phase-dependent shifts in expression (when averaged over many cells), some, but not all, canonical cell cycle genes tend to be co-expressed in groups in single cells. We estimate the amount of single cell expression variability attributable to the cell cycle. We find that the cell cycle explains only 5%–17% of expression variability, suggesting that the cell cycle will not tend to be a large nuisance factor in analysis of the single cell transcriptome. PMID:25032992

  8. Gene expression distribution deconvolution in single-cell RNA sequencing.

    PubMed

    Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R

    2018-06-26

    Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene's expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND's noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers. Copyright © 2018 the Author(s). Published by PNAS.

  9. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray

    PubMed Central

    Carter, Mark G; Sharov, Alexei A; VanBuren, Vincent; Dudekula, Dawood B; Carmack, Condie E; Nelson, Charlie; Ko, Minoru SH

    2005-01-01

    The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, as well as a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction), which allow quantitative estimation of absolute endogenous transcript abundance. PMID:15998450

  10. Estimating the age of Lucilia illustris during the intrapuparial period using two approaches: Morphological changes and differential gene expression.

    PubMed

    Wang, Yu; Gu, Zhi-Ya; Xia, Shui-Xiu; Wang, Jiang-Feng; Zhang, Ying-Na; Tao, Lu-Yang

    2018-06-01

    Lucilia illustris (Meigen, 1826) (Diptera: Calliphoridae) is a cosmopolitan species of fly that has forensic and medical significance. However, there is no relevant study regarding the determination of the age of this species during the intrapuparial period. In this study, we investigated the changes in both morphology and differential gene expression during intrapuparial development, with an aim to estimate the age of L. illustris during the intrapuparial stage. The overall intrapuparial morphological changes of L. illustris were divided into 12 substages. Structures such as the compound eyes, mouthparts, antennae, thorax, legs, wings, and abdomen, each capable of indicating age during the intrapuparial stage, were observed in detail, and the developmental progression of each of these structures was divided into six to eight stages. We recorded the time range over which each substage or structure appeared. The differential expression of the three genes 15_2, actin, and tbp previously identified for predicting the timing of intrapuparial development was measured during L. illustris metamorphosis. The expression of these genes was quantified by real-time PCR, and the results revealed that these genes can be used to estimate the age of L. illustris during the intrapuparial period, as they exhibit regular changes and temperature dependence. This study provides an important basis for estimating the minimum postmortem interval (PMImin) in forensic entomology according to changes in intrapuparial development and differential gene expression. Furthermore, combination of the two approaches can generate a more precise PMImin than either approach alone. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. Discovering graphical Granger causality using the truncating lasso penalty

    PubMed Central

    Shojaie, Ali; Michailidis, George

    2010-01-01

    Motivation: Components of biological systems interact with each other in order to carry out vital cell functions. Such information can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Discovering regulatory interactions among genes is therefore an important problem in systems biology. Whole-genome expression data over time provides an opportunity to determine how the expression levels of genes are affected by changes in transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes. Results: In this article, we propose a novel penalization method, called truncating lasso, for estimation of causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series, and improves the performance of the lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimation of model parameters, and show that the proposed method can consistently discover causal relationships in the large p, small n setting. The performance of the proposed model is evaluated favorably in simulated, as well as real, data examples. Availability: The proposed truncating lasso method is implemented in the R-package ‘grangerTlasso’ and is freely available at http://www.stat.lsa.umich.edu/~shojaie/ Contact: shojaie@umich.edu PMID:20823316

  12. Codon usage and amino acid usage influence genes expression level.

    PubMed

    Paul, Prosenjit; Malakar, Arup Kumar; Chakraborty, Supriyo

    2018-02-01

    Highly expressed genes in any species differ in the usage frequency of synonymous codons. The relative frequency with which the favored codon pair (amino acid pair) occurs varies between genes and genomes owing to varying gene expression and different base composition. Here we propose a new measure for predicting the gene expression level, the codon plus amino bias index (CABI). Our approach is based on the relative bias of the favored codon pair among the genes, illustrated by analyzing the CABI scores of the Medicago truncatula genes. CABI showed strong correlation with the other widely used measures for gene expression analysis (CAI, RCBS, SCUO). Surprisingly, CABI outperformed all of these measures, showing better correlation with the wet-lab data. This emphasizes the importance of the codons neighboring the favored codon in a synonymous group when estimating the expression level of a gene.

  13. Estimation of Dynamic Systems for Gene Regulatory Networks from Dependent Time-Course Data.

    PubMed

    Kim, Yoonji; Kim, Jaejik

    2018-06-15

    A dynamic system consisting of ordinary differential equations (ODEs) is a well-known tool for describing the dynamic nature of gene regulatory networks (GRNs), and the dynamic features of GRNs are usually captured through time-course gene expression data. Owing to high-throughput technologies, time-course gene expression data have complex structures such as heteroscedasticity, correlations between genes, and time dependence. Since gene experiments typically yield highly noisy data with small sample sizes, these complex structures should be taken into account in ODE models to predict the dynamics more accurately. Hence, this study proposes an ODE model that accounts for such data structures, together with a fast and stable estimation method for the ODE parameters based on the generalized profiling approach with data smoothing techniques. The proposed method also provides statistical inference for the ODE estimator, and it is applied to a zebrafish retina cell network.

  14. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.

  15. Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications.

    PubMed

    Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S

    2015-08-07

    Recently, the Bayesian method has become more popular for analyzing high dimensional gene expression data as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and call them the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal-model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to that of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology.
    In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.

  16. Adjusting for background mutation frequency biases improves the identification of cancer driver genes.

    PubMed

    Evans, Perry; Avey, Stefan; Kong, Yong; Krauthammer, Michael

    2013-09-01

    A common goal of tumor sequencing projects is finding genes whose mutations are selected for during tumor development. This is accomplished by choosing genes that have more non-synonymous mutations than expected from an estimated background mutation frequency. While this background frequency is unknown, it can be estimated using both the observed synonymous mutation frequency and the non-synonymous to synonymous mutation ratio. The synonymous mutation frequency can be determined across all genes or in a gene-specific manner. This choice introduces an interesting trade-off. A gene-specific frequency adjusts for an underlying mutation bias, but is difficult to estimate given missing synonymous mutation counts. Using a genome-wide synonymous frequency is more robust, but is less suited for adjusting biases. Studying four evaluation criteria for identifying genes with high non-synonymous mutation burden (reflecting preferential selection of expressed genes, genes with mutations in conserved bases, genes with many protein interactions, and genes that show loss of heterozygosity), we find that the gene-specific synonymous frequency is superior in the gene expression and protein interaction tests. In conclusion, the use of the gene-specific synonymous mutation frequency is well suited for assessing a gene's non-synonymous mutation burden.
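
    The trade-off between a gene-specific and a genome-wide synonymous frequency can be made concrete with a small calculation. In this sketch the expected non-synonymous count is taken as the synonymous frequency times the non-synonymous to synonymous ratio times the gene length, and the excess is scored with a Poisson upper tail. The Poisson test, the per-base definition of frequency, and all numbers are assumptions made for illustration; the abstract does not specify them.

```python
import math

def expected_nonsynonymous(syn_frequency, ns_to_s_ratio, gene_length):
    """Expected non-synonymous mutations in a gene under the background model."""
    return syn_frequency * ns_to_s_ratio * gene_length

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** x / math.factorial(x) for x in range(observed)
    )
    return max(0.0, 1.0 - below)

# Hypothetical gene: 3,000 bases, 3 synonymous and 12 non-synonymous mutations.
gene_length, syn_count, nonsyn_count = 3000, 3, 12
ns_to_s_ratio = 2.5                      # assumed genome-wide ratio
genome_wide_syn_frequency = 2e-4         # assumed synonymous mutations per base

gene_specific = expected_nonsynonymous(syn_count / gene_length, ns_to_s_ratio, gene_length)
genome_wide = expected_nonsynonymous(genome_wide_syn_frequency, ns_to_s_ratio, gene_length)

print(poisson_upper_tail(nonsyn_count, gene_specific))  # adjusts for this gene's mutation bias
print(poisson_upper_tail(nonsyn_count, genome_wide))    # more stable, ignores local bias
```

    With only a handful of synonymous mutations per gene, the gene-specific frequency is noisy, which is the estimation difficulty the abstract refers to.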

  17. Maternal residential air pollution and placental imprinted gene expression.

    PubMed

    Kingsley, Samantha L; Deyssenroth, Maya A; Kelsey, Karl T; Awad, Yara Abu; Kloog, Itai; Schwartz, Joel D; Lambertini, Luca; Chen, Jia; Marsit, Carmen J; Wellenius, Gregory A

    2017-11-01

    Maternal exposure to air pollution is associated with reduced fetal growth, but its relationship with expression of placental imprinted genes (important regulators of fetal growth) has not yet been studied. To examine relationships between maternal residential air pollution and expression of placental imprinted genes in the Rhode Island Child Health Study (RICHS). Women-infant pairs were enrolled following delivery between 2009 and 2013. We geocoded maternal residential addresses at delivery, estimated daily levels of fine particulate matter (PM2.5; n=355) and black carbon (BC; n=336) using spatial-temporal models, and estimated residential distance to nearest major roadway (n=355). Using linear regression models we investigated the associations between each exposure metric and expression of nine candidate genes previously associated with infant birthweight in RICHS, with secondary analyses of a panel of 108 imprinted genes expressed in the placenta. We also explored effect measure modification by infant sex. PM2.5 and BC were associated with altered expression for seven and one candidate genes, respectively, previously linked with birthweight in this cohort. Adjusting for multiple comparisons, we found that PM2.5 and BC were associated with changes in expression of 41 and 12 of 108 placental imprinted genes, respectively. Infant sex modified the association between PM2.5 and expression of CHD7 and between proximity to major roadways and expression of ZDBF2. We found that maternal exposure to residential PM2.5 and BC was associated with changes in placental imprinted gene expression, which suggests a plausible line of investigation of how air pollution affects fetal growth and development. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Cytosolic T3-binding protein modulates dynamic alteration of T3-mediated gene expression in cells.

    PubMed

    Takeshige, Keiko; Sekido, Takashi; Kitahara, Jun-ichirou; Ohkubo, Yousuke; Hiwatashi, Dai; Ishii, Hiroaki; Nishio, Shin-ichi; Takeda, Teiji; Komatsu, Mitsuhisa; Suzuki, Satoru

    2014-01-01

    μ-Crystallin (CRYM) is also known as NADPH-dependent cytosolic T3-binding protein. A study using CRYM-null mice suggested that CRYM stores triiodothyronine (T3) in tissues. We previously established CRYM-expressing cells derived from parental GH3 cells. To examine the precise regulation of T3-responsive genes in the presence of CRYM, we evaluated serial alterations of T3-responsive gene expression by changing pericellular T3 concentrations in the media. We estimated the constitutive expression of three T3-responsive genes, growth hormone (GH), deiodinase 1 (Dio1), and deiodinase 2 (Dio2), in two cell lines. Subsequently, we measured the responsiveness of these three genes at 4, 8, 16, and 24 h after adding various concentrations of T3. We also estimated the levels of these mRNAs 24 and 48 h after removing T3. The levels of constitutive expression of GH and Dio1 were low and high in C8 cells, respectively, while Dio2 expression was not significantly different between GH3 and C8 cells. When treated with T3, Dio2 expression was significantly enhanced in C8 cells, while there were no differences in GH or Dio1 expression between GH3 and C8 cell lines. In contrast, removal of T3 retained the mRNA expression of GH and Dio2 in C8 cells. These results suggest that CRYM expression increases and sustains the T3 responsiveness of genes in cells, especially with alteration of the pericellular T3 concentration. The heterogeneity of T3-related gene expression is dependent on cellular CRYM expression in cases of dynamic changes in pericellular T3 concentration.

  19. Reverse engineering gene regulatory networks from measurement with missing values.

    PubMed

    Ogundijo, Oyetunji E; Elmas, Abdulkadir; Wang, Xiaodong

    2016-12-01

    Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: either the expression values of some genes at some time points, or the entire set of expression values for a single time point or for some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take the complete matrix of gene expression measurements as input. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurements. Yet, to date, few algorithms have been proposed for the inference of gene regulatory networks from gene expression data with missing values. We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear nature of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule, for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and a real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of the yeast Saccharomyces cerevisiae. PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence a more accurate prediction of the underlying GRN than the conventional Gaussian approximation (GA) filters, which ignore the missing data points.
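
    As a rough illustration of how a sequential filter can keep running through missing time points, the sketch below uses a scalar linear Kalman filter, not the nonlinear PBGA filters or the measurement model described in the abstract. The random-walk state model, the noise variances `q` and `r`, and the function name `kalman_with_missing` are all assumptions made for this example. A missing measurement is encoded as `None` and is handled in the simplest possible way, by running the prediction step only.

```python
def kalman_with_missing(observations, q=0.1, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state (illustrative assumption).

    observations: list of floats, with None marking a missing time point.
    q, r: assumed process and measurement noise variances.
    Returns a list of (state_estimate, state_variance) per time point.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction step: the random-walk model keeps the mean, inflates variance.
        p = p + q
        if y is not None:
            # Update step, only when a measurement is actually available.
            gain = p / (p + r)
            x = x + gain * (y - x)
            p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    for step, (x, p) in enumerate(kalman_with_missing([1.0, None, 1.0])):
        print(step, round(x, 4), round(p, 4))
```

    At the missing time point the estimate is carried forward unchanged and its variance grows, which is the loss of information that a dedicated missing-data measurement model tries to account for more carefully.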

  20. Detection of Diurnal Variation of Tomato Transcriptome through the Molecular Timetable Method in a Sunlight-Type Plant Factory.

    PubMed

    Higashi, Takanobu; Tanigaki, Yusuke; Takayama, Kotaro; Nagano, Atsushi J; Honjo, Mie N; Fukuda, Hirokazu

    2016-01-01

    The timing of measurement during plant growth is important because many genes are expressed periodically and orchestrate physiological events. Their periodicity is generated by environmental fluctuations as external factors and the circadian clock as the internal factor. The circadian clock orchestrates physiological events such as photosynthesis or flowering and it enables enhanced growth and herbivory resistance. These characteristics have possible applications for agriculture. In this study, we demonstrated the diurnal variation of the transcriptome in tomato (Solanum lycopersicum) leaves through the molecular timetable method in a sunlight-type plant factory. Molecular timetable methods have been developed to detect periodic genes and estimate individual internal body time from these expression profiles in mammals. We sampled tomato leaves every 2 h for 2 days and acquired time-course transcriptome data by RNA-Seq. Many genes were expressed periodically and these expression patterns were stable across the 1st and 2nd days of measurement. We selected 143 time-indicating genes whose expression varied periodically, and estimated internal time in the plant from these expression profiles. The estimated internal time was generally the same as the external environment time; however, there was a difference of more than 1 h between the two for some sampling points. Furthermore, the stress-responsive genes also showed weakly periodic expression, implying that they were usually expressed periodically, regulated by light-dark cycles as an external factor or the circadian clock as the internal factor, and could be particularly expressed when the plant experiences some specific stress under agricultural situations. This study suggests that the circadian clock mediates optimization for fluctuating environments in the field, and that controlling the circadian clock through supplemental lighting and temperature control could enhance stress resistance and floral induction.
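
    The idea behind estimating internal time from time-indicating genes can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each time-indicating gene has a known peak hour, that expression values have already been normalized per gene, and that a 24 h cosine template is adequate. The function name `estimate_internal_time` and the grid-search step are hypothetical choices for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def estimate_internal_time(peak_hours, expression, period=24.0, step=0.1):
    """Grid-search the phase whose cosine template best matches one sample.

    peak_hours: assumed known peak time (hours) of each time-indicating gene.
    expression: normalized expression of the same genes in a single sample.
    Returns (estimated_time_in_hours, best_correlation).
    """
    best_time, best_r = 0.0, -2.0
    for k in range(int(round(period / step))):
        t = k * step
        # Genes whose peak time is close to t should be near their maximum.
        template = [math.cos(2 * math.pi * (p - t) / period) for p in peak_hours]
        r = pearson(expression, template)
        if r > best_r:
            best_time, best_r = t, r
    return best_time, best_r


if __name__ == "__main__":
    peaks = [2.0 * i for i in range(12)]  # hypothetical peak times: 0, 2, ..., 22 h
    sample = [math.cos(2 * math.pi * (p - 9.0) / 24.0) for p in peaks]
    print(estimate_internal_time(peaks, sample))
```

    Comparing the estimated time with the clock time at sampling gives the kind of internal-versus-external time difference discussed in the abstract.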

  1. Detection of Diurnal Variation of Tomato Transcriptome through the Molecular Timetable Method in a Sunlight-Type Plant Factory

    PubMed Central

    Higashi, Takanobu; Tanigaki, Yusuke; Takayama, Kotaro; Nagano, Atsushi J.; Honjo, Mie N.; Fukuda, Hirokazu

    2016-01-01

    The timing of measurement during plant growth is important because many genes are expressed periodically and orchestrate physiological events. Their periodicity is generated by environmental fluctuations as external factors and the circadian clock as the internal factor. The circadian clock orchestrates physiological events such as photosynthesis or flowering and it enables enhanced growth and herbivory resistance. These characteristics have possible applications for agriculture. In this study, we demonstrated the diurnal variation of the transcriptome in tomato (Solanum lycopersicum) leaves through the molecular timetable method in a sunlight-type plant factory. Molecular timetable methods have been developed to detect periodic genes and estimate individual internal body time from these expression profiles in mammals. We sampled tomato leaves every 2 h for 2 days and acquired time-course transcriptome data by RNA-Seq. Many genes were expressed periodically and these expression patterns were stable across the 1st and 2nd days of measurement. We selected 143 time-indicating genes whose expression varied periodically, and estimated internal time in the plant from these expression profiles. The estimated internal time was generally the same as the external environment time; however, there was a difference of more than 1 h between the two for some sampling points. Furthermore, the stress-responsive genes also showed weakly periodic expression, implying that they were usually expressed periodically, regulated by light–dark cycles as an external factor or the circadian clock as the internal factor, and could be particularly expressed when the plant experiences some specific stress under agricultural situations. This study suggests that the circadian clock mediates optimization for fluctuating environments in the field, and that controlling the circadian clock through supplemental lighting and temperature control could enhance stress resistance and floral induction. PMID:26904059

  2. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    PubMed Central

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of the transcriptome. Many different human tissue/cell transcriptome datasets generated by RNA-Seq technology are available in public data resources. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is more closely related to physiological condition than to tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in terms of expression level and variation, we first apply an improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (genes that qualify as HK genes in the cancer group but not in the normal group) are expressed at higher and more variable levels in the cancer condition than in the normal condition. Cancer-associated HK genes tend to be AT-rich, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in the cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among cell types, particularly in cancer. PMID:23382867

  3. NETWORK ASSISTED ANALYSIS TO REVEAL THE GENETIC BASIS OF AUTISM

    PubMed Central

    Liu, Li; Lei, Jing; Roeder, Kathryn

    2016-01-01

    While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find more potential autism risk genes by combining the genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, where node specific properties are incorporated into network estimation for improved statistical and computational efficiency. Then we adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk. PMID:27134692

  4. General statistics of stochastic process of gene expression in eukaryotic cells.

    PubMed Central

    Kuznetsov, V A; Knott, G D; Bonner, R F

    2002-01-01

    Thousands of genes are expressed at such very low levels (≤1 copy per cell) that global gene expression analysis of rarer transcripts remains problematic. Ambiguity in identification of rarer transcripts creates considerable uncertainty in fundamental questions such as the total number of genes expressed in an organism and the biological significance of rarer transcripts. Knowing the distribution of the true number of genes expressed at each level and the corresponding gene expression level probability function (GELPF) could help resolve these uncertainties. We found that all observed large-scale gene expression data sets in yeast, mouse, and human cells follow a Pareto-like distribution model skewed by many low-abundance transcripts. A novel stochastic model of the gene expression process predicts the universality of the GELPF both across different cell types within a multicellular organism and across different organisms. This model allows us to predict the frequency distribution of all gene expression levels within a single cell and to estimate the number of expressed genes in a single cell and in a population of cells. A random "basal" transcription mechanism for protein-coding genes in all or almost all eukaryotic cell types is predicted. This fundamental mechanism might enhance the expression of rarely expressed genes and, thus, provide a basic level of phenotypic diversity, adaptability, and random monoallelic expression in cell populations. PMID:12136033

  5. A Hybrid One-Way ANOVA Approach for the Robust and Efficient Estimation of Differential Gene Expression with Multiple Patterns

    PubMed Central

    Mollah, Mohammad Manir Hossain; Jamal, Rahman; Mokhtar, Norfilza Mohd; Harun, Roslan; Mollah, Md. Nurul Haque

    2015-01-01

    Background Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression. Results The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA. Conclusion Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. 
In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, a setting to which the BetaEB method has not been extended. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression. PMID:26413858
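
    To make the role of a β-weight concrete, the sketch below assumes the Gaussian-kernel form exp(-β/2 · ((x - μ)/σ)²), which is commonly paired with minimum β-divergence estimation; the abstract itself does not state the formula. Using the median and a scaled MAD as plug-in location and scale is a further assumption of this example, as is the function name `beta_weights`. The abstract's data-driven cut-off is not reproduced here.

```python
import math
import statistics


def beta_weights(values, beta=0.2):
    """Weights in (0, 1]: near 1 for typical values, near 0 for outliers.

    Assumed form: w = exp(-beta / 2 * ((x - center) / scale) ** 2), with
    center = median and scale = 1.4826 * MAD (illustrative plug-in choices).
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("scale estimate is zero; weights are undefined")
    scale = 1.4826 * mad
    return [math.exp(-0.5 * beta * ((v - center) / scale) ** 2) for v in values]


if __name__ == "__main__":
    expression = [5.0, 5.1, 4.9, 5.2, 4.8, 12.0]  # last value is an outlier
    for value, weight in zip(expression, beta_weights(expression)):
        print(value, round(weight, 3))
```

    Values close to the bulk of the data keep weights near 1, while the outlying value is driven towards 0, so a weighted mean or variance built from these weights is barely affected by it.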

  6. Characterization of Changes in Gene Expression and Biochemical Pathways at Low Levels of Benzene Exposure

    PubMed Central

    Thomas, Reuben; Hubbard, Alan E.; McHale, Cliona M.; Zhang, Luoping; Rappaport, Stephen M.; Lan, Qing; Rothman, Nathaniel; Vermeulen, Roel; Guyton, Kathryn Z.; Jinot, Jennifer; Sonawane, Babasaheb R.; Smith, Martyn T.

    2014-01-01

    Benzene, a ubiquitous environmental pollutant, causes acute myeloid leukemia (AML). Recently, through transcriptome profiling of peripheral blood mononuclear cells (PBMC), we reported dose-dependent effects of benzene exposure on gene expression and biochemical pathways in 83 workers exposed across four airborne concentration ranges (from <1 ppm to >10 ppm) compared with 42 subjects with non-workplace ambient exposure levels. Here, we further characterize these dose-dependent effects with continuous benzene exposure in all 125 study subjects. We estimated air benzene exposure levels in the 42 environmentally-exposed subjects from their unmetabolized urinary benzene levels. We used a novel non-parametric, data-adaptive model selection method to estimate the change with dose in the expression of each gene. We describe non-parametric approaches to model pathway responses and used these to estimate the dose responses of the AML pathway and 4 other pathways of interest. The response patterns of majority of genes as captured by mean estimates of the first and second principal components of the dose-response for the five pathways and the profiles of 6 AML pathway response-representative genes (identified by clustering) exhibited similar apparent supra-linear responses. Responses at or below 0.1 ppm benzene were observed for altered expression of AML pathway genes and CYP2E1. Together, these data show that benzene alters disease-relevant pathways and genes in a dose-dependent manner, with effects apparent at doses as low as 100 ppb in air. Studies with extensive exposure assessment of subjects exposed in the low-dose range between 10 ppb and 1 ppm are needed to confirm these findings. PMID:24786086

  7. Robust Gaussian Graphical Modeling via l1 Penalization

    PubMed Central

    Sun, Hokeun; Li, Hongzhe

    2012-01-01

    Summary Gaussian graphical models have been widely used as an effective method for studying the conditional independency structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured based on its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified-likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso. PMID:23020775

  8. Evaluating intra- and inter-individual variation in the human placental transcriptome.

    PubMed

    Hughes, David A; Kircher, Martin; He, Zhisong; Guo, Song; Fairbrother, Genevieve L; Moreno, Carlos S; Khaitovich, Philipp; Stoneking, Mark

    2015-03-19

    Gene expression variation is a phenotypic trait of particular interest as it represents the initial link between genotype and other phenotypes. Analyzing how such variation apportions among and within groups allows for the evaluation of how genetic and environmental factors influence such traits. It also provides opportunities to identify genes and pathways that may have been influenced by non-neutral processes. Here we use a population genetics framework and next generation sequencing to evaluate how gene expression variation is apportioned among four human groups in a natural biological tissue, the placenta. We estimate that on average, 33.2%, 58.9%, and 7.8% of the placental transcriptome is explained by variation within individuals, among individuals, and among human groups, respectively. Additionally, when technical and biological traits are included in models of gene expression they each account for roughly 2% of total gene expression variation. Notably, the variation that is significantly different among groups is enriched in biological pathways associated with immune response, cell signaling, and metabolism. Many biological traits demonstrate correlated changes in expression in numerous pathways of potential interest to clinicians and evolutionary biologists. Finally, we estimate that the majority of the human placental transcriptome exhibits expression profiles consistent with neutrality; the remainder are consistent with stabilizing selection, directional selection, or diversifying selection. We apportion placental gene expression variation into individual, population, and biological trait factors and identify how each influence the transcriptome. Additionally, we advance methods to associate expression profiles with different forms of selection.

  9. Quantifying whole transcriptome size, a prerequisite for understanding transcriptome evolution across species: an example from a plant allopolyploid.

    PubMed

    Coate, Jeremy E; Doyle, Jeff J

    2010-01-01

    Evolutionary biologists are increasingly comparing gene expression patterns across species. Due to the way in which expression assays are normalized, such studies provide no direct information about expression per gene copy (dosage responses) or per cell and can give a misleading picture of genes that are differentially expressed. We describe an assay for estimating relative expression per cell. When used in conjunction with transcript profiling data, it is possible to compare the sizes of whole transcriptomes, which in turn makes it possible to compare expression per cell for each gene in the transcript profiling data set. We applied this approach, using quantitative reverse transcriptase-polymerase chain reaction and high throughput RNA sequencing, to a recently formed allopolyploid and showed that its leaf transcriptome was approximately 1.4-fold larger than either progenitor transcriptome (70% of the sum of the progenitor transcriptomes). In contrast, the allopolyploid genome is 94.3% as large as the sum of its progenitor genomes and retains ≥93.5% of the sum of its progenitor gene complements. Thus, "transcriptome downsizing" is greater than genome downsizing. Using this transcriptome size estimate, we inferred dosage responses for several thousand genes and showed that the majority exhibit partial dosage compensation. Homoeologue silencing is nonrandomly distributed across dosage responses, with genes showing extreme responses in either direction significantly more likely to have a silent homoeologue. This experimental approach will add value to transcript profiling experiments involving interspecies and interploidy comparisons by converting expression per transcriptome to expression per genome, eliminating the need for assumptions about transcriptome size.
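
    The relation between the "1.4-fold" and "70%" figures, and the comparison with genome downsizing, can be checked with a short calculation. The symbols below are introduced only for this illustration, and the calculation assumes the two progenitor transcriptomes are of roughly equal size T, as implied by "1.4-fold larger than either progenitor".

```latex
% Assumption: progenitor transcriptomes T_A and T_B are both approximately T.
\[
\frac{T_{\text{allo}}}{T_A + T_B} \approx \frac{1.4\,T}{T + T} = 0.70
\]
% Downsizing relative to the additive (sum-of-progenitors) expectation:
\[
\text{transcriptome: } 1 - 0.70 = 30\%,
\qquad
\text{genome: } 1 - 0.943 = 5.7\%
\]
```

    On these figures the transcriptome falls about 30% short of the additive expectation, against about 5.7% for the genome, which is the sense in which transcriptome downsizing exceeds genome downsizing.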

  10. Reconstructing Dynamic Promoter Activity Profiles from Reporter Gene Data.

    PubMed

    Kannan, Soumya; Sams, Thomas; Maury, Jérôme; Workman, Christopher T

    2018-03-16

    Accurate characterization of promoter activity is important when designing expression systems for systems biology and metabolic engineering applications. Promoters that respond to changes in the environment enable the dynamic control of gene expression without the necessity of inducer compounds, for example. However, the dynamic nature of these processes poses challenges for estimating promoter activity. Most experimental approaches utilize reporter gene expression to estimate promoter activity. Typically the reporter gene encodes a fluorescent protein that is used to infer a constant promoter activity despite the fact that the observed output may be dynamic and is a number of steps away from the transcription process. In fact, some promoters that are often thought of as constitutive can show changes in activity when growth conditions change. For these reasons, we have developed a system of ordinary differential equations for estimating dynamic promoter activity for promoters that change their activity in response to the environment that is robust to noise and changes in growth rate. Our approach, inference of dynamic promoter activity (PromAct), improves on existing methods by more accurately inferring known promoter activity profiles. This method is also capable of estimating the correct scale of promoter activity and can be applied to quantitative data sets to estimate quantitative rates.
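
    The general idea of recovering a time-varying promoter activity from reporter data can be sketched with a deliberately simple one-equation model. This is not the PromAct system of equations, which the abstract does not give. The example assumes dF/dt = a(t) - μ·F, where F is reporter signal per cell, μ is a constant growth (dilution) rate and a(t) is promoter activity, so that a(t) = dF/dt + μ·F. The function name `promoter_activity` is hypothetical.

```python
def promoter_activity(times, fluorescence, growth_rate):
    """Estimate a(t) = dF/dt + mu * F by finite differences.

    Assumed model (illustrative only): dF/dt = a(t) - mu * F, with a constant
    growth rate mu. Requires at least two time points.
    """
    n = len(times)
    activity = []
    for i in range(n):
        # Central difference in the interior, one-sided at the two ends.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dfdt = (fluorescence[hi] - fluorescence[lo]) / (times[hi] - times[lo])
        activity.append(dfdt + growth_rate * fluorescence[i])
    return activity


if __name__ == "__main__":
    # Constant signal under dilution implies constant production a = mu * F.
    print(promoter_activity([0, 1, 2, 3], [10.0, 10.0, 10.0, 10.0], 0.5))
```

    Plain finite differences amplify measurement noise and this sketch ignores reporter maturation and changing growth rates, which are among the issues a fuller ODE-based method such as the one described above is designed to handle.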

  11. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction.

    PubMed

    Yu, Yao; Tu, Kang; Zheng, Siyuan; Li, Yun; Ding, Guohui; Ping, Jie; Hao, Pei; Li, Yixue

    2009-08-25

    In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges the traditional pipelines for data processing and analysis in scientific research. In our work, we integrated gene expression information from Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH) and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis - GEOGLE. GEOGLE offers a rapid and convenient way of searching for relevant experimental datasets, pathways and biological terms according to multiple types of queries, including biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature lists. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. This approach of searching expression data globally may extend the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that provides researchers with a quantitative way to understand the correlation between gene expression and phenotypic distinction through meta-analysis of gene expression datasets from different experiments, as well as the biological meaning behind it. The web site and user guide of GEOGLE are available at: http://omics.biosino.org:14000/kweb/workflow.jsp?id=00020.

  12. Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.

    PubMed

    Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver

    2018-02-15

    Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns, across different samples, can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified based on cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task which is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are: sampling variations, the presence of outlying sample units, and the fact that in most cases the number of units is much smaller than the number of genes. In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require hand tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performance. Our correlation metric is more robust to outliers compared with the existing alternatives in two gene expression datasets. It is also shown how the regularization allows spurious correlations to be detected and filtered automatically. The same regularization is also extended to other less robust correlation measures. Finally, we apply the ARACNE algorithm on the SyNTreN gene expression data. Sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input. The R software is available at https://github.com/angy89/RobustSparseCorrelation. Supplementary data are available at Bioinformatics online.
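
    The two ingredients named above, a correlation measure that resists outliers and a threshold that zeroes out weak entries, can be illustrated with substitutes that are easy to write down. The sketch below uses Spearman rank correlation and a fixed hard threshold. It is not the estimator or the adaptive thresholding rule from the paper, and the function names and the threshold value 0.5 are assumptions for this example. Ties in the data are not handled.

```python
import math


def ranks(values):
    """Rank values from 1..n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def thresholded_correlation(x, y, threshold=0.5):
    """Keep the correlation only if its magnitude reaches the threshold."""
    rho = spearman(x, y)
    return rho if abs(rho) >= threshold else 0.0


if __name__ == "__main__":
    gene_a = [1, 2, 3, 4, 5]
    gene_b = [2, 4, 6, 8, 100]  # monotone with one extreme value
    gene_c = [3, 1, 4, 5, 2]    # weakly related to gene_a
    print(thresholded_correlation(gene_a, gene_b))
    print(thresholded_correlation(gene_a, gene_c))
```

    The extreme value in `gene_b` does not disturb the rank-based correlation, and the weak association with `gene_c` is set to zero, which is the sparsifying effect that thresholding is meant to provide.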

  13. Benchmark Dose Modeling Estimates of the Concentrations of Inorganic Arsenic That Induce Changes to the Neonatal Transcriptome, Proteome, and Epigenome in a Pregnancy Cohort.

    PubMed

    Rager, Julia E; Auerbach, Scott S; Chappell, Grace A; Martin, Elizabeth; Thompson, Chad M; Fry, Rebecca C

    2017-10-16

    Prenatal inorganic arsenic (iAs) exposure influences the expression of critical genes and proteins associated with adverse outcomes in newborns, in part through epigenetic mediators. The doses at which these genomic and epigenomic changes occur have yet to be evaluated in the context of dose-response modeling. The goal of the present study was to estimate iAs doses that correspond to changes in transcriptomic, proteomic, epigenomic, and integrated multi-omic signatures in human cord blood through benchmark dose (BMD) modeling. Genome-wide DNA methylation, microRNA expression, mRNA expression, and protein expression levels in cord blood were modeled against total urinary arsenic (U-tAs) levels from pregnant women exposed to varying levels of iAs. Dose-response relationships were modeled in BMDExpress, and BMDs representing 10% response levels were estimated. Overall, DNA methylation changes were estimated to occur at lower exposure concentrations in comparison to other molecular endpoints. Multi-omic module eigengenes were derived through weighted gene co-expression network analysis, representing co-modulated signatures across transcriptomic, proteomic, and epigenomic profiles. One module eigengene was associated with decreased gestational age occurring alongside increased iAs exposure. Genes/proteins within this module eigengene showed enrichment for organismal development, including potassium voltage-gated channel subfamily Q member 1 (KCNQ1), an imprinted gene showing differential methylation and expression in response to iAs. Modeling of this prioritized multi-omic module eigengene resulted in a BMD(BMDL) of 58(45) μg/L U-tAs, which was estimated to correspond to drinking water arsenic concentrations of 51(40) μg/L. Results are in line with epidemiological evidence supporting effects of prenatal iAs occurring at levels <100 μg As/L urine. 
Together, findings present a variety of BMD measures to estimate doses at which prenatal iAs exposure influences neonatal outcome-relevant transcriptomic, proteomic, and epigenomic profiles.

  14. Digital sorting of complex tissues for cell type-specific gene expression profiles.

    PubMed

    Zhong, Yi; Wan, Ying-Wooi; Pang, Kaifang; Chow, Lionel M L; Liu, Zhandong

    2013-03-07

    Cellular heterogeneity is present in almost all gene expression profiles. However, transcriptome analysis of tissue specimens often ignores the cellular heterogeneity present in these samples. Standard deconvolution algorithms require prior knowledge of the cell type frequencies within a tissue or their in vitro expression profiles. Furthermore, these algorithms tend to report biased estimations. Here, we describe a Digital Sorting Algorithm (DSA) for extracting cell-type specific gene expression profiles from mixed tissue samples that is unbiased and does not require prior knowledge of cell type frequencies. The results suggest that DSA is a specific and sensitive algorithm for gene expression profile deconvolution and will be useful in studying individual cell types of complex tissues.

  15. Defining suitable reference genes for RT-qPCR analysis on human sertoli cells after 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) exposure.

    PubMed

    Ribeiro, Mariana Antunes; dos Reis, Mariana Bisarro; de Moraes, Leonardo Nazário; Briton-Jones, Christine; Rainho, Cláudia Aparecida; Scarano, Wellerson Rodrigo

    2014-11-01

    Quantitative real-time RT-PCR (qPCR) has proven to be a valuable molecular technique to quantify gene expression. There are few studies in the literature that describe suitable reference genes to normalize gene expression data. Studies of transcriptionally disruptive toxins, like tetrachlorodibenzo-p-dioxin (TCDD), require careful consideration of reference genes. The present study was designed to validate potential reference genes in human Sertoli cells after exposure to TCDD. Thirty-two candidate reference genes were analyzed to determine their applicability. The geNorm and NormFinder software packages were used to obtain an estimation of the expression stability of the 32 genes and to identify the most suitable genes for qPCR data normalization.
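
    For readers unfamiliar with how expression stability is scored, the sketch below computes a geNorm-style stability value M: for each candidate gene, the average over all other candidates of the standard deviation, across samples, of the log2 expression ratio. Lower M means more stable. This is a minimal reimplementation of the published definition, not the geNorm software itself. The gene names and quantities are made up, and the function name `genorm_m` is hypothetical.

```python
import math
import statistics


def genorm_m(expression):
    """Return {gene: M}, the mean pairwise variation against all other genes.

    expression: dict mapping gene name to relative quantities, one per sample,
    with the same sample order for every gene (at least two samples).
    """
    genes = list(expression)
    stability = {}
    for gene in genes:
        variations = []
        for other in genes:
            if other == gene:
                continue
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(expression[gene], expression[other])
            ]
            # Pairwise variation: spread of the log-ratio across samples.
            variations.append(statistics.stdev(log_ratios))
        stability[gene] = sum(variations) / len(variations)
    return stability


if __name__ == "__main__":
    quantities = {
        "geneA": [100.0, 200.0, 400.0],
        "geneB": [50.0, 100.0, 200.0],   # constant ratio to geneA
        "geneC": [10.0, 80.0, 10.0],     # varies independently
    }
    for name, m in genorm_m(quantities).items():
        print(name, round(m, 3))
```

    In this toy data the two genes that keep a constant ratio to each other score the same, lower M, while the gene that fluctuates independently scores highest and would be the first candidate to drop.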

  16. Integrating Genomic Analysis with the Genetic Basis of Gene Expression: Preliminary Evidence of the Identification of Causal Genes for Cardiovascular and Metabolic Traits Related to Nutrition in Mexicans

    PubMed Central

    Bastarrachea, Raúl A.; Gallegos-Cabriales, Esther C.; Nava-González, Edna J.; Haack, Karin; Voruganti, V. Saroja; Charlesworth, Jac; Laviada-Molina, Hugo A.; Veloz-Garza, Rosa A.; Cardenas-Villarreal, Velia Margarita; Valdovinos-Chavez, Salvador B.; Gomez-Aguilar, Patricia; Meléndez, Guillermo; López-Alvarenga, Juan Carlos; Göring, Harald H. H.; Cole, Shelley A.; Blangero, John; Comuzzie, Anthony G.; Kent, Jack W.

    2012-01-01

    Whole-transcriptome expression profiling provides novel phenotypes for analysis of complex traits. Gene expression measurements reflect quantitative variation in transcript-specific messenger RNA levels and represent phenotypes lying close to the action of genes. Understanding the genetic basis of gene expression will provide insight into the processes that connect genotype to clinically significant traits representing a central tenet of system biology. Synchronous in vivo expression profiles of lymphocytes, muscle, and subcutaneous fat were obtained from healthy Mexican men. Most genes were expressed at detectable levels in multiple tissues, and RNA levels were correlated between tissue types. A subset of transcripts with high reliability of expression across tissues (estimated by intraclass correlation coefficients) was enriched for cis-regulated genes, suggesting that proximal sequence variants may influence expression similarly in different cellular environments. This integrative global gene expression profiling approach is proving extremely useful for identifying genes and pathways that contribute to complex clinical traits. Clearly, the coincidence of clinical trait quantitative trait loci and expression quantitative trait loci can help in the prioritization of positional candidate genes. Such data will be crucial for the formal integration of positional and transcriptomic information characterized as genetical genomics. PMID:22797999

  17. Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression.

    PubMed

    Lemieux, Sébastien

    2006-08-25

    The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method implements a linear model applied on the probe-level data to directly estimate the treatment effect. A finite mixture of Gaussian components is then used to identify DEGs using the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication. On a wholly defined dataset, the PL-LM method was able to identify 75% of the differentially expressed genes within 10% of false positives. This accuracy was achieved both using the three replicates per condition available in the dataset and using only one replicate per condition. The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition.

  18. SiGN-SSM: open source parallel software for estimating gene networks with state space models.

    PubMed

    Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru

    2011-04-15

    SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), that is a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint effective to stabilize the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM are available on our web site. Contact: tamada@ims.u-tokyo.ac.jp.

  19. Robust Learning of High-dimensional Biological Networks with Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Nägele, Andreas; Dejori, Mathäus; Stetter, Martin

    Structure learning of Bayesian networks applied to gene expression data has become a potentially useful method to estimate interactions between genes. However, the NP-hardness of Bayesian network structure learning renders the reconstruction of the full genetic network with thousands of genes unfeasible. Consequently, the maximal network size is usually restricted dramatically to a small set of genes (corresponding with variables in the Bayesian network). Although this feature reduction step makes structure learning computationally tractable, on the downside, the learned structure might be adversely affected due to the introduction of missing genes. Additionally, gene expression data are usually very sparse with respect to the number of samples, i.e., the number of genes is much greater than the number of different observations. Given these problems, learning robust network features from microarray data is a challenging task. This chapter presents several approaches tackling the robustness issue in order to obtain a more reliable estimation of learned network features.

  20. AUCTSP: an improved biomarker gene pair class predictor.

    PubMed

    Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T

    2018-06-26

    The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
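
    The contrast between the two scoring ideas in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (tsp_score, auc, auc_pair_score) are hypothetical, and the exact form of the AUC-based score (absolute difference of the within-class AUCs) is an assumption made for the example.

```python
def _class_indices(labels, cls):
    return [k for k, lab in enumerate(labels) if lab == cls]


def tsp_score(gene_i, gene_j, labels):
    """TSP idea: compare the two genes within each individual subject."""
    freqs = []
    for cls in (0, 1):
        idx = _class_indices(labels, cls)
        freqs.append(sum(gene_i[k] < gene_j[k] for k in idx) / len(idx))
    return abs(freqs[0] - freqs[1])


def auc(xs, ys):
    """Estimate P(x < y) over all (x, y) pairs; ties count as 0.5."""
    wins = sum((x < y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


def auc_pair_score(gene_i, gene_j, labels):
    """Assumed AUC-based variant: compare the genes across all subjects of a class."""
    aucs = []
    for cls in (0, 1):
        idx = _class_indices(labels, cls)
        aucs.append(auc([gene_i[k] for k in idx], [gene_j[k] for k in idx]))
    return abs(aucs[0] - aucs[1])


# Toy data (hypothetical): the ordering of the two genes flips between classes.
labels = [0, 0, 0, 1, 1, 1]
gene_i = [1, 2, 3, 4, 5, 6]
gene_j = [4, 5, 6, 1, 2, 3]
```

    The within-subject score only uses one comparison per subject, whereas the AUC-style score uses every cross-subject pair inside a class, which is one way to read "the separation of the entire distribution".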

  1. LEAP: constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering.

    PubMed

    Specht, Alicia T; Li, Jun

    2017-03-01

    To construct gene co-expression networks based on single-cell RNA-Sequencing data, we present an algorithm called LEAP, which utilizes the estimated pseudotime of the cells to find gene co-expression that involves time delay. The R package LEAP is available on CRAN. Contact: jun.li@nd.edu. Supplementary data are available at Bioinformatics online.
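
    The idea of co-expression with a time delay can be sketched as a lagged correlation over cells sorted by pseudotime. The snippet below is an illustrative sketch only and does not reproduce the LEAP package: the function names and the simple "shift one series and take the best Pearson correlation" scheme are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def max_lagged_correlation(x, y, max_lag):
    """x and y are expression values already ordered by pseudotime.

    Returns (best_lag, best_corr), where y is assumed to follow x by best_lag cells.
    """
    best_lag, best_corr = 0, pearson(x, y)
    for lag in range(1, max_lag + 1):
        r = pearson(x[: len(x) - lag], y[lag:])
        if r > best_corr:
            best_lag, best_corr = lag, r
    return best_lag, best_corr


# Toy data (hypothetical): gene_y repeats gene_x two cells later.
signal = [0, 1, 3, 2, 5, 4, 7, 9, 6, 8, 10, 3]
gene_x = signal[2:]
gene_y = signal[:-2]
```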

  2. A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

    PubMed

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.

  3. A Nonparametric Mean-Variance Smoothing Method to Assess Arabidopsis Cold Stress Transcriptional Regulator CBF2 Overexpression Microarray Data

    PubMed Central

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181

  4. Directed Shotgun Proteomics Guided by Saturated RNA-seq Identifies a Complete Expressed Prokaryotic Proteome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Omasits, U.; Quebatte, Maxime; Stekhoven, Daniel J.

    2013-11-01

    Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ~90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.

  5. Directed shotgun proteomics guided by saturated RNA-seq identifies a complete expressed prokaryotic proteome

    PubMed Central

    Omasits, Ulrich; Quebatte, Maxime; Stekhoven, Daniel J.; Fortes, Claudia; Roschitzki, Bernd; Robinson, Mark D.; Dehio, Christoph; Ahrens, Christian H.

    2013-01-01

    Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor. PMID:23878158

  6. A framework for analyzing the relationship between gene expression and morphological, topological, and dynamical patterns in neuronal networks.

    PubMed

    de Arruda, Henrique Ferraz; Comin, Cesar Henrique; Miazaki, Mauro; Viana, Matheus Palhares; Costa, Luciano da Fontoura

    2015-04-30

    A key point in developmental biology is to understand how gene expression influences the morphological and dynamical patterns that are observed in living beings. In this work we propose a methodology capable of addressing this problem that is based on estimating the mutual information and Pearson correlation between the intensity of gene expression and measurements of several morphological properties of the cells. A similar approach is applied in order to identify effects of gene expression over the system dynamics. Neuronal networks were artificially grown over a lattice by considering a reference model used to generate artificial neurons. The input parameters of the artificial neurons were determined according to two distinct patterns of gene expression and the dynamical response was assessed by considering the integrate-and-fire model. As far as single gene dependence is concerned, we found that the interaction between the gene expression and the network topology, as well as between the former and the dynamics response, is strongly affected by the gene expression pattern. In addition, we observed a high correlation between the gene expression and some topological measurements of the neuronal network for particular patterns of gene expression. To our best understanding, there are no similar analyses to compare with. A proper understanding of gene expression influence requires jointly studying the morphology, topology, and dynamics of neurons. The proposed framework represents a first step towards predicting gene expression patterns from morphology and connectivity.
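
    Estimating mutual information between expression intensity and a morphological measurement, as mentioned in this abstract, could look roughly like the following. It is a hedged sketch: the equal-width binning, the plug-in estimator and the function names are assumptions, since the abstract does not say which estimator was used.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in mutual information estimate (in nats) for two label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Toy data (hypothetical): expression levels and a morphological measurement.
expression = discretize([0.0, 1.0, 2.0, 3.0], 2)
branch_count = [0, 1, 0, 1]
```

    Identical sequences give the entropy of the labels, while sequences whose joint counts factorise give zero, which is the behaviour that makes the measure useful alongside the Pearson correlation.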

  7. Identification of Primary Transcriptional Regulation of Cell Cycle-Regulated Genes upon DNA Damage

    PubMed Central

    Zhou, Tong; Chou, Jeff; Mullen, Thomas E.; Elkon, Rani; Zhou, Yingchun; Simpson, Dennis A.; Bushel, Pierre R.; Paules, Richard S.; Lobenhofer, Edward K.; Hurban, Patrick; Kaufmann, William K.

    2007-01-01

    The changes in global gene expression in response to DNA damage may derive from either direct induction or repression by transcriptional regulation or indirectly by synchronization of cells to specific cell cycle phases, such as G1 or G2. We developed a model that successfully estimated the expression levels of >400 cell cycle-regulated genes in normal human fibroblasts based on the proportions of cells in each phase of the cell cycle. By isolating effects on the gene expression associated with the cell cycle phase redistribution after genotoxin treatment, the direct transcriptional target genes were distinguished from genes for which expression changed secondary to cell synchronization. Application of this model to ionizing radiation (IR)-treated normal human fibroblasts identified 150 of 406 cycle-regulated genes as putative direct transcriptional targets of IR-induced DNA damage. Changes in expression of these genes after IR treatment derived from both direct transcriptional regulation and cell cycle synchronization. PMID:17404513
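
    One way to picture a model that estimates expression from cell cycle phase proportions is a linear mixture, sketched below. The abstract does not state the functional form, so the weighted-sum model, the phase names and both function names are assumptions made purely for illustration.

```python
def predicted_expression(phase_fractions, phase_levels):
    """Expected bulk expression if it only reflects the mixture of cell cycle phases."""
    return sum(phase_fractions[p] * phase_levels[p] for p in phase_fractions)


def residual_change(observed, phase_fractions, phase_levels):
    """Part of the observed level not explained by phase redistribution."""
    return observed - predicted_expression(phase_fractions, phase_levels)


# Toy numbers (hypothetical): fractions of cells per phase after treatment
# and phase-specific expression levels of one gene.
fractions = {"G1": 0.5, "S": 0.25, "G2M": 0.25}
levels = {"G1": 10.0, "S": 20.0, "G2M": 40.0}
```

    Under this reading, a gene whose residual stays near zero changes only because cells were synchronized, while a large residual would point to a possible direct transcriptional response.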

  8. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference

    PubMed Central

    Campbell, Kieran R.

    2016-01-01

    Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. PMID:27870852

  9. The transcriptomic and evolutionary signature of social interactions regulating honey bee caste development.

    PubMed

    Vojvodic, Svjetlana; Johnson, Brian R; Harpur, Brock A; Kent, Clement F; Zayed, Amro; Anderson, Kirk E; Linksvayer, Timothy A

    2015-11-01

    The caste fate of developing female honey bee larvae is strictly socially regulated by adult nurse workers. As a result of this social regulation, nurse-expressed genes as well as larval-expressed genes may affect caste expression and evolution. We used a novel transcriptomic approach to identify genes with putative direct and indirect effects on honey bee caste development, and we subsequently studied the relative rates of molecular evolution at these caste-associated genes. We experimentally induced the production of new queens by removing the current colony queen, and we used RNA sequencing to study the gene expression profiles of both developing larvae and their caregiving nurses before and after queen removal. By comparing the gene expression profiles of queen-destined versus worker-destined larvae as well as nurses observed feeding these two types of larvae, we identified larval and nurse genes associated with caste development. Of 950 differentially expressed genes associated with caste, 82% were expressed in larvae with putative direct effects on larval caste, and 18% were expressed in nurses with putative indirect effects on caste. Estimated selection coefficients suggest that both nurse and larval genes putatively associated with caste are rapidly evolving, especially those genes associated with worker development. Altogether, our results suggest that indirect effect genes play important roles in both the expression and evolution of socially influenced traits such as caste.

  10. A random variance model for detection of differential gene expression in small microarray experiments.

    PubMed

    Wright, George W; Simon, Richard M

    2003-12-12

    Microarray techniques provide a valuable way of characterizing the molecular nature of disease. Unfortunately expense and limited specimen availability often lead to studies with small sample sizes. This makes accurate estimation of variability difficult, since variance estimates made on a gene by gene basis will have few degrees of freedom, and the assumption that all genes share equal variance is unlikely to be true. We propose a model by which the within gene variances are drawn from an inverse gamma distribution, whose parameters are estimated across all genes. This results in a test statistic that is a minor variation of those used in standard linear models. We demonstrate that the model assumptions are valid on experimental data, and that the model has more power than standard tests to pick up large changes in expression, while not increasing the rate of false positives. This method is incorporated into BRB-ArrayTools version 3.0 (http://linus.nci.nih.gov/BRB-ArrayTools.html). ftp://linus.nci.nih.gov/pub/techreport/RVM_supplement.pdf
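
    With an inverse gamma prior on the per-gene variance, the usual conjugate result is that each sample variance is shrunk toward a prior value, weighted by degrees of freedom. The sketch below illustrates that shrinkage and the resulting t-type statistic. The parameterization by prior_var and prior_df (equivalent to an inverse gamma with shape prior_df/2 and scale prior_df*prior_var/2) and the function names are assumptions for readability; estimating the prior from all genes is not shown.

```python
import math


def moderated_variance(s_sq, df, prior_var, prior_df):
    """Shrink a gene's sample variance toward the prior variance."""
    return (prior_df * prior_var + df * s_sq) / (prior_df + df)


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Two-group t-type statistic using the shrunken variance.

    Returns (t, total_degrees_of_freedom).
    """
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    ss = sum((v - mean_a) ** 2 for v in group_a)
    ss += sum((v - mean_b) ** 2 for v in group_b)
    df = na + nb - 2
    var = moderated_variance(ss / df, df, prior_var, prior_df)
    t = (mean_a - mean_b) / math.sqrt(var * (1 / na + 1 / nb))
    return t, df + prior_df
```

    For example, moderated_variance(4.0, 2, 1.0, 2) averages the sample variance 4.0 and the prior variance 1.0 with equal weight, giving 2.5; the extra prior degrees of freedom are what gives the test more power with few samples.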

  11. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints

    PubMed Central

    Shahdoust, Maryam; Mahjub, Hossein; Sadeghi, Mehdi

    2017-01-01

    The common topological features of the gene regulatory networks of related species suggest reconstructing the network of one species using additional information from the gene expression profiles of related species. We present an algorithm, named F-MAP, to reconstruct the gene regulatory network by applying knowledge about gene interactions from related species. Our algorithm sets up a Bayesian framework to estimate the precision matrix of one species' microarray gene expression dataset in order to infer the Gaussian graphical model of the network. The conjugate Wishart prior is used, and the information from related species is applied to estimate the hyperparameters of the prior distribution by factor analysis. Applying the proposed algorithm to six related Drosophila species shows that the precision of the reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches. PMID:28938012

  12. Estimation of gene induction enables a relevance-based ranking of gene sets.

    PubMed

    Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens

    2009-07-01

    In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.

  13. Mapping cis- and trans-regulatory effects across multiple tissues in twins

    PubMed Central

    Grundberg, Elin; Small, Kerrin S.; Hedman, Åsa K.; Nica, Alexandra C.; Buil, Alfonso; Keildson, Sarah; Bell, Jordana T.; Yang, Tsun-Po; Meduri, Eshwar; Barrett, Amy; Nisbett, James; Sekowska, Magdalena; Wilk, Alicja; Shin, So-Youn; Glass, Daniel; Travers, Mary; Min, Josine L.; Ring, Sue; Ho, Karen; Thorleifsson, Gudmar; Kong, Augustine; Thorsteindottir, Unnur; Ainali, Chrysanthi; Dimas, Antigone S.; Hassanali, Neelam; Ingle, Catherine; Knowles, David; Krestyaninova, Maria; Lowe, Christopher E.; Di Meglio, Paola; Montgomery, Stephen B.; Parts, Leopold; Potter, Simon; Surdulescu, Gabriela; Tsaprouni, Loukia; Tsoka, Sophia; Bataille, Veronique; Durbin, Richard; Nestle, Frank O.; O’Rahilly, Stephen; Soranzo, Nicole; Lindgren, Cecilia M.; Zondervan, Krina T.; Ahmadi, Kourosh R.; Schadt, Eric E.; Stefansson, Kari; Smith, George Davey; McCarthy, Mark I.; Deloukas, Panos; Dermitzakis, Emmanouil T.; Spector, Tim D.

    2013-01-01

    Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many eQTL studies typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis-effect on expression cannot be accounted for by common cis-variants, a finding which exposes the contribution of low frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene and identify several replicating trans-variants which act predominantly in a tissue-restricted manner and may regulate the transcription of many genes. PMID:22941192

  14. Effects of CASP5 gene overexpression on angiogenesis of HMEC-1 cells.

    PubMed

    Li, Haiyan; Li, Yuzhen; Cai, Limin; Bai, Bingxue; Wang, Yanhua

    2015-01-01

    The efficacy of gene overexpression of CASP5, a caspase family member, in angiogenesis in vitro and its mechanisms were clarified. Human full-length CASP5 gene was delivered into human microvascular endothelial HMEC-1 cells by recombinant lentivirus. The infection was estimated by green fluorescent protein. MTT method was used to analyze the efficacy of gene overexpression in cell proliferation ability, and Matrigel was used to estimate its effects in angiogenesis ability of cells. Meanwhile, Western blot was used to analyze the effects of CASP5 gene overexpression on the expression levels of angpt-1, angpt-2, Tie2 and VEGF-1 in the cells, which were signaling pathway factors related to angiogenesis. Recombinant lentivirus containing human full-length CASP5 gene was packed and purified successfully, with virus titer of 1×10(8) TU/ml. The recombinant lentivirus was used to infect HMEC-1 cells with MOI of 1, leading to a cell infection rate of 100%. There were no significant effects of CASP5 gene overexpression on both cell proliferation ability and the expression level of angpt-1. Meanwhile, expressions of angpt-2 and VEGF-1 were both enhanced, while Tie2 expression was inhibited. Results indicated that CASP5 gene overexpression promoted angiogenesis of HMEC-1 cells. CASP5 gene overexpression significantly promoted angiogenesis ability of HMEC-1 cells, which was probably achieved by inhibiting angpt-1/Tie2 and promoting VEGF-1 signal pathway.

  15. Selection and Validation of Reference Genes for Quantitative Real-Time PCR in Buckwheat (Fagopyrum esculentum) Based on Transcriptome Sequence Data

    PubMed Central

    Demidenko, Natalia V.; Logacheva, Maria D.; Penin, Aleksey A.

    2011-01-01

    Quantitative reverse transcription PCR (qRT-PCR) is one of the most precise and widely used methods of gene expression analysis. A necessary prerequisite of exact and reliable data is the accurate choice of reference genes. We studied the expression stability of potential reference genes in common buckwheat (Fagopyrum esculentum) in order to find the optimal reference for gene expression analysis in this economically important crop. The recently sequenced buckwheat floral transcriptome was used as the source of sequence information. Expression stability of eight candidate reference genes was assessed in different plant structures (leaves and inflorescences at two stages of development and fruits). These genes are the orthologs of Arabidopsis genes identified as stable in a genome-wide survey of gene expression stability and a traditionally used housekeeping gene, GAPDH. Three software applications (geNorm, NormFinder and BestKeeper) were used to estimate expression stability and provided congruent results. The orthologs of AT4G33380 (expressed protein of unknown function, Expressed1), AT2G28390 (SAND family protein, SAND) and AT5G46630 (clathrin adapter complex subunit family protein, CACS) are revealed as the most stable. We recommend using the combination of Expressed1, SAND and CACS for the normalization of gene expression data in studies on buckwheat using qRT-PCR. These genes are listed among the five most stably expressed in Arabidopsis, which emphasizes the utility of studies on model plants as a framework for other species. PMID:21589908
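
    As an illustration of what an expression stability estimate can look like, the sketch below computes a geNorm-style stability value M: for each candidate gene, the average over all other candidates of the standard deviation of their log2 expression ratios across samples (lower M means more stable). It is a simplified sketch, not the geNorm software; the function name and the toy gene names are hypothetical, and the iterative exclusion of the least stable gene is omitted.

```python
import math
import statistics


def genorm_m(expression):
    """expression maps gene name -> list of relative quantities, one per sample.

    Returns a dict of stability values M (lower is more stable).
    """
    genes = list(expression)
    m_values = {}
    for g in genes:
        pairwise = []
        for h in genes:
            if h == g:
                continue
            log_ratios = [
                math.log2(a / b) for a, b in zip(expression[g], expression[h])
            ]
            # Pairwise variation: spread of the log ratio across samples.
            pairwise.append(statistics.stdev(log_ratios))
        m_values[g] = sum(pairwise) / len(pairwise)
    return m_values


# Toy data (hypothetical): A and B keep a constant ratio, C fluctuates.
toy = {"A": [1, 2, 4], "B": [2, 4, 8], "C": [1, 4, 1]}
stability = genorm_m(toy)
```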

  16. The Global Error Assessment (GEA) model for the selection of differentially expressed genes in microarray data.

    PubMed

    Mansourian, Robert; Mutch, David M; Antille, Nicolas; Aubert, Jerome; Fogel, Paul; Le Goff, Jean-Marc; Moulin, Julie; Petrov, Anton; Rytz, Andreas; Voegel, Johannes J; Roberts, Matthew-Alan

    2004-11-01

    Microarray technology has become a powerful research tool in many fields of study; however, the cost of microarrays often results in the use of a low number of replicates (k). Under circumstances where k is low, it becomes difficult to perform standard statistical tests to extract the most biologically significant experimental results. Other more advanced statistical tests have been developed; however, their use and interpretation often remain difficult to implement in routine biological research. The present work outlines a method that achieves sufficient statistical power for selecting differentially expressed genes under conditions of low k, while remaining an intuitive and computationally efficient procedure. The present study describes a Global Error Assessment (GEA) methodology to select differentially expressed genes in microarray datasets, and was developed using an in vitro experiment that compared control and interferon-gamma treated skin cells. In this experiment, up to nine replicates were used to confidently estimate error, thereby enabling methods of different statistical power to be compared. Gene expression results of a similar absolute expression are binned, so as to enable a highly accurate local estimate of the mean squared error within conditions. The model then relates variability of gene expression in each bin to absolute expression levels and uses this in a test derived from the classical ANOVA. The GEA selection method is compared with both the classical and permutational ANOVA tests, and demonstrates an increased stability, robustness and confidence in gene selection. A subset of the selected genes was validated by real-time reverse transcription-polymerase chain reaction (RT-PCR). All these results suggest that GEA methodology is (i) suitable for selection of differentially expressed genes in microarray data, (ii) intuitive and computationally efficient and (iii) especially advantageous under conditions of low k. The GEA code for R software is freely available from the authors upon request.
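
    The binning idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' R code, which is available only on request. The function names `binned_error_estimate` and `f_statistic`, the equal-sized rank bins, and the plain bin average used as the local error estimate are all assumptions made for illustration.

```python
import numpy as np

def binned_error_estimate(expr, groups, n_bins=10):
    """Pool within-condition error over genes of similar mean expression.

    expr   : genes x samples array of expression values
    groups : condition label for each sample (column)
    Returns (per-gene MSE, bin-averaged local MSE), one value per gene.
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    ss_within = np.zeros(expr.shape[0])
    df_within = 0
    for g in np.unique(groups):
        block = expr[:, groups == g]
        # squared deviations from the condition mean, summed per gene
        ss_within += ((block - block.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
        df_within += block.shape[1] - 1
    mse_gene = ss_within / df_within

    # Assumption: bins hold (nearly) equal numbers of genes, ranked by mean expression.
    order = np.argsort(expr.mean(axis=1))
    mse_local = np.empty_like(mse_gene)
    for idx in np.array_split(order, n_bins):
        if idx.size:
            mse_local[idx] = mse_gene[idx].mean()
    return mse_gene, mse_local

def f_statistic(expr, groups, mse):
    """ANOVA-style ratio: between-condition mean square over a supplied error term."""
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand = expr.mean(axis=1)
    ss_between = np.zeros(expr.shape[0])
    for g in labels:
        block = expr[:, groups == g]
        ss_between += block.shape[1] * (block.mean(axis=1) - grand) ** 2
    return (ss_between / (len(labels) - 1)) / mse
```

    Passing the bin-averaged error to `f_statistic` instead of the per-gene error is what stabilises the ratio when only a few replicates per condition are available, since a single gene's variance estimate from two or three replicates is very noisy.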

  17. Effects of RNA integrity on transcript quantification by total RNA sequencing of clinically collected human placental samples.

    PubMed

    Reiman, Mario; Laan, Maris; Rull, Kristiina; Sõber, Siim

    2017-08-01

    RNA degradation is a ubiquitous process that occurs in living and dead cells, as well as during handling and storage of extracted RNA. Reduced RNA quality caused by degradation is an established source of uncertainty for all RNA-based gene expression quantification techniques. RNA sequencing is an increasingly preferred method for transcriptome analyses, and dependence of its results on input RNA integrity is of significant practical importance. This study aimed to characterize the effects of varying input RNA integrity [estimated as RNA integrity number (RIN)] on transcript level estimates and delineate the characteristic differences between transcripts that differ in degradation rate. The study used ribodepleted total RNA sequencing data from a real-life clinically collected set (n = 32) of human solid tissue (placenta) samples. RIN-dependent alterations in gene expression profiles were quantified by using DESeq2 software. Our results indicate that small differences in RNA integrity affect gene expression quantification by introducing a moderate and pervasive bias in expression level estimates that significantly affected 8.1% of studied genes. The rapidly degrading transcript pool was enriched in pseudogenes, short noncoding RNAs, and transcripts with extended 3' untranslated regions. Typical slowly degrading transcripts (median length, 2389 nt) represented protein coding genes with 4-10 exons and high guanine-cytosine content.

  18. A transcriptional dynamic network during Arabidopsis thaliana pollen development.

    PubMed

    Wang, Jigang; Qiu, Xiaojie; Li, Yuhua; Deng, Youping; Shi, Tieliu

    2011-01-01

    To understand transcriptional regulatory networks (TRNs), especially the coordinated dynamic regulation between transcription factors (TFs) and their corresponding target genes during development, computational approaches would represent significant advances in genome-wide expression analysis. The major experimental challenges include monitoring the time-specific activities of TFs and identifying the dynamic regulatory relationships between TFs and their target genes, neither of which is yet possible at large scale. However, various methods have been proposed to computationally estimate those activities and regulations. During the past decade, significant progress has been made towards understanding pollen development at each developmental stage at the molecular level, yet the regulatory mechanisms that control the dynamic pollen development processes remain largely unknown. Here, we adopt Network Component Analysis (NCA) to identify TF activities over a time course, and infer their regulatory relationships based on the coexpression of TFs and their target genes during pollen development. We carried out a meta-analysis by integrating several sets of gene expression data related to Arabidopsis thaliana pollen development (stages range from UNM, BCP, TCP, HP to 0.5 hr pollen tube and 4 hr pollen tube). We constructed a regulatory network, including 19 TFs, 101 target genes and 319 regulatory interactions. The computationally estimated TF activities were well correlated with the expression of their coordinated genes during the development process. We clustered the expression of their target genes in the context of regulatory influences, and inferred new regulatory relationships between those TFs and their target genes, such as transcription factor WRKY34, which was identified as specifically expressed in pollen and as regulating several new target genes. Our findings facilitate the interpretation of the expression patterns with more biological relevance, since the clusters corresponding to the activity of a specific TF or a combination of TFs suggest the coordinated regulation of target genes by TFs. Through integrating different resources, we constructed a dynamic regulatory network of Arabidopsis thaliana during pollen development with gene coexpression and NCA. The network illustrated the relationships between the TFs' activities and their target genes' expression, as well as the interactions between TFs, which provide new insight into the molecular mechanisms that control pollen development.

  19. Faster-X Evolution of Gene Expression in Drosophila

    PubMed Central

    Meisel, Richard P.; Malone, John H.; Clark, Andrew G.

    2012-01-01

    DNA sequences on X chromosomes often have a faster rate of evolution when compared to similar loci on the autosomes, and well articulated models provide reasons why the X-linked mode of inheritance may be responsible for the faster evolution of X-linked genes. We analyzed microarray and RNA–seq data collected from females and males of six Drosophila species and found that the expression levels of X-linked genes also diverge faster than autosomal gene expression, similar to the “faster-X” effect often observed in DNA sequence evolution. Faster-X evolution of gene expression was recently described in mammals, but it was limited to the evolutionary lineages shortly following the creation of the therian X chromosome. In contrast, we detect a faster-X effect along both deep lineages and those on the tips of the Drosophila phylogeny. In Drosophila males, the dosage compensation complex (DCC) binds the X chromosome, creating a unique chromatin environment that promotes the hyper-expression of X-linked genes. We find that DCC binding, chromatin environment, and breadth of expression are all predictive of the rate of gene expression evolution. In addition, estimates of the intraspecific genetic polymorphism underlying gene expression variation suggest that X-linked expression levels are not under relaxed selective constraints. We therefore hypothesize that the faster-X evolution of gene expression is the result of the adaptive fixation of beneficial mutations at X-linked loci that change expression level in cis. This adaptive faster-X evolution of gene expression is limited to genes that are narrowly expressed in a single tissue, suggesting that relaxed pleiotropic constraints permit a faster response to selection. Finally, we present a conceptional framework to explain faster-X expression evolution, and we use this framework to examine differences in the faster-X effect between Drosophila and mammals. PMID:23071459

  20. Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection

    PubMed Central

    2012-01-01

    Background Water stress limits plant survival and production in many parts of the world. Identification of genes and alleles responding to water stress conditions is important in breeding plants better adapted to drought. Currently there are no studies examining the transcriptome-wide gene and allelic expression patterns under water stress conditions. We used RNA sequencing (RNA-seq) to identify the candidate genes and alleles and to explore the evolutionary signatures of selection. Results We studied the effect of water stress on gene expression in Eucalyptus camaldulensis seedlings derived from three natural populations. We used reference-guided transcriptome mapping to study gene expression. Several genes showed differential expression between control and stress conditions. Gene ontology (GO) enrichment tests revealed up-regulation of 140 stress-related gene categories and down-regulation of 35 metabolic and cell wall organisation gene categories. More than 190,000 single nucleotide polymorphisms (SNPs) were detected and 2737 of these showed differential allelic expression. Allelic expression of 52% of these variants was correlated with differential gene expression. Signatures of selection were studied by estimating the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks). The average Ka/Ks ratio among the 13,719 genes was 0.39, indicating that most of the genes are under purifying selection. Among the positively selected genes (Ka/Ks > 1.5) apoptosis and cell death categories were enriched. Of the 287 positively selected genes, 90 genes showed differential expression and 27 SNPs from 17 positively selected genes showed differential allelic expression between treatments. Conclusions Correlation of allelic expression of several SNPs with total gene expression indicates that these variants may be the cis-acting variants or in linkage disequilibrium with such variants. Enrichment of apoptosis and cell death gene categories among the positively selected genes reveals the past selection pressures experienced by the populations used in this study. PMID:22853646

  1. Cell-type specific features of circular RNA expression.

    PubMed

    Salzman, Julia; Chen, Raymond E; Olsen, Mari N; Wang, Peter L; Brown, Patrick O

    2013-01-01

    Thousands of loci in the human and mouse genomes give rise to circular RNA transcripts; at many of these loci, the predominant RNA isoform is a circle. Using an improved computational approach for circular RNA identification, we found widespread circular RNA expression in Drosophila melanogaster and estimate that in humans, circular RNA may account for 1% as many molecules as poly(A) RNA. Analysis of data from the ENCODE consortium revealed that the repertoire of genes expressing circular RNA, the ratio of circular to linear transcripts for each gene, and even the pattern of splice isoforms of circular RNAs from each gene were cell-type specific. These results suggest that biogenesis of circular RNA is an integral, conserved, and regulated feature of the gene expression program.

  2. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary acquisition of certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
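
    The comparison of neighboring-gene correlations against random gene pairs described above can be sketched as follows. The abstract does not specify the statistical method beyond naming FDR, so the empirical FDR estimate here (tail fraction of random pairs divided by tail fraction of neighboring pairs) is an assumed stand-in, and the function names are hypothetical.

```python
import numpy as np

def neighbour_correlations(expr):
    """Pearson r between each gene and the next gene along the chromosome.

    expr : genes x datasets array, rows ordered by chromosomal position.
    Rows with zero variance would divide by zero and must be removed beforehand.
    """
    z = np.asarray(expr, dtype=float)
    z = z - z.mean(axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    # dot product of unit-length centred rows equals the Pearson coefficient
    return (z[:-1] * z[1:]).sum(axis=1)

def empirical_fdr(neighbour_r, random_r, threshold):
    """Assumed estimator: share of random pairs at or above the threshold,
    relative to the share of neighbouring pairs at or above it."""
    observed = np.mean(np.asarray(neighbour_r) >= threshold)
    null = np.mean(np.asarray(random_r) >= threshold)
    if observed == 0:
        return float("nan")
    return min(1.0, null / observed)
```

    In use, one would scan thresholds on r and keep the lowest one whose estimated FDR stays under the chosen limit (0.1 in the abstract), then merge runs of adjacent passing pairs into candidate clusters.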

  3. CHEMICALLY ACTIVATED LUCIFERASE GENE EXPRESSION (CALUX) CELL BIOASSAY ANALYSIS FOR THE ESTIMATION OF DIOXIN-LIKE ACTIVITY: CRITICAL PARAMETERS OF THE CALUX PROCEDURE THAT IMPACT ASSAY RESULTS

    EPA Science Inventory

    The Chemically Activated Luciferase gene expression (CALUX) in vitro cell bioassay is an emerging bioanalytical tool that is increasingly being used for the screening and relative quantification of dioxins and dioxin-like compounds. Since CALUX analyses provide a biological respo...

  4. Spatially coordinated dynamic gene transcription in living pituitary tissue

    PubMed Central

    Featherstone, Karen; Hey, Kirsty; Momiji, Hiroshi; McNamara, Anne V; Patist, Amanda L; Woodburn, Joanna; Spiller, David G; Christian, Helen C; McNeilly, Alan S; Mullins, John J; Finkenstädt, Bärbel F; Rand, David A; White, Michael RH; Davis, Julian RE

    2016-01-01

    Transcription at individual genes in single cells is often pulsatile and stochastic. A key question emerges regarding how this behaviour contributes to tissue phenotype, but it has been a challenge to quantitatively analyse this in living cells over time, as opposed to studying snap-shots of gene expression state. We have used imaging of reporter gene expression to track transcription in living pituitary tissue. We integrated live-cell imaging data with statistical modelling for quantitative real-time estimation of the timing of switching between transcriptional states across a whole tissue. Multiple levels of transcription rate were identified, indicating that gene expression is not a simple binary ‘on-off’ process. Immature tissue displayed shorter durations of high-expressing states than the adult. In adult pituitary tissue, direct cell contacts involving gap junctions allowed local spatial coordination of prolactin gene expression. Our findings identify how heterogeneous transcriptional dynamics of single cells may contribute to overall tissue behaviour. DOI: http://dx.doi.org/10.7554/eLife.08494.001 PMID:26828110

  5. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    PubMed Central

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  6. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    PubMed

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  7. The Choice of the Filtering Method in Microarrays Affects the Inference Regarding Dosage Compensation of the Active X-Chromosome

    PubMed Central

    Zeller, Tanja; Wild, Philipp S.; Truong, Vinh; Trégouët, David-Alexandre; Munzel, Thomas; Ziegler, Andreas; Cambien, François; Blankenberg, Stefan; Tiret, Laurence

    2011-01-01

    Background The hypothesis of dosage compensation of genes of the X chromosome, supported by previous microarray studies, was recently challenged by RNA-sequencing data. It was suggested that microarray studies were biased toward an over-estimation of X-linked expression levels as a consequence of the filtering of genes below the detection threshold of microarrays. Methodology/Principal Findings To investigate this hypothesis, we used microarray expression data from circulating monocytes in 1,467 individuals. In total, 25,349 and 1,156 probes were unambiguously assigned to autosomes and the X chromosome, respectively. Globally, there was a clear shift of X-linked expressions toward lower levels than autosomes. We compared the ratio of expression levels of X-linked to autosomal transcripts (X∶AA) using two different filtering methods: 1. gene expressions were filtered out using a detection threshold irrespective of gene chromosomal location (the standard method in microarrays); 2. equal proportions of genes were filtered out separately on the X and on autosomes. For a wide range of filtering proportions, the X∶AA ratio estimated with the first method was not significantly different from 1, the value expected if dosage compensation was achieved, whereas it was significantly lower than 1 with the second method, leading to the rejection of the hypothesis of dosage compensation. We further showed in simulated data that the choice of the most appropriate method was dependent on biological assumptions regarding the proportion of actively expressed genes on the X chromosome comparative to the autosomes and the extent of dosage compensation. Conclusion/Significance This study shows that the method used for filtering out lowly expressed genes in microarrays may have a major impact according to the hypothesis investigated. The hypothesis of dosage compensation of X-linked genes cannot be firmly accepted or rejected using microarray-based data. PMID:21912656
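
    The two filtering methods contrasted in the abstract above can be sketched in a few lines of Python. The abstract does not say how the X:AA ratio itself is summarised, so the ratio of medians on a linear expression scale used here is an assumption, as are the function names.

```python
import numpy as np

def xa_ratio_global_threshold(expr, is_x, threshold):
    """Method 1: one detection threshold applied to all probes,
    irrespective of chromosomal location."""
    expr = np.asarray(expr, dtype=float)
    is_x = np.asarray(is_x, dtype=bool)
    keep = expr >= threshold
    return np.median(expr[keep & is_x]) / np.median(expr[keep & ~is_x])

def xa_ratio_equal_proportion(expr, is_x, proportion):
    """Method 2: drop the same proportion of the lowest-expressed probes
    separately on the X chromosome and on the autosomes."""
    expr = np.asarray(expr, dtype=float)
    is_x = np.asarray(is_x, dtype=bool)
    x, a = expr[is_x], expr[~is_x]
    x_kept = x[x >= np.quantile(x, proportion)]
    a_kept = a[a >= np.quantile(a, proportion)]
    return np.median(x_kept) / np.median(a_kept)
```

    When X-linked expression is shifted toward lower levels, a single global threshold removes a larger share of X-linked probes than autosomal ones, which pushes the surviving X-linked median upward. That is the mechanism by which the first method can yield a ratio closer to 1 than the second.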

  8. Modeling gene expression measurement error: a quasi-likelihood approach

    PubMed Central

    Strimmer, Korbinian

    2003-01-01

    Background Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale). Results Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In the case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. Conclusions The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression. PMID:12659637

  9. Rate of Amino Acid Substitution Is Influenced by the Degree and Conservation of Male-Biased Transcription Over 50 Myr of Drosophila Evolution

    PubMed Central

    Grath, Sonja; Parsch, John

    2012-01-01

    Sex-biased gene expression (i.e., the differential expression of genes between males and females) is common among sexually reproducing species. However, genes often differ in their sex-bias classification or degree of sex bias between species. There is also an unequal distribution of sex-biased genes (especially male-biased genes) between the X chromosome and the autosomes. We used whole-genome expression data and evolutionary rate estimates for two different Drosophilid lineages, melanogaster and obscura, spanning an evolutionary time scale of around 50 Myr to investigate the influence of sex-biased gene expression and chromosomal location on the rate of molecular evolution. In both lineages, the rate of protein evolution correlated positively with the male/female expression ratio. Genes with highly male-biased expression, genes expressed specifically in male reproductive tissues, and genes with conserved male-biased expression over long evolutionary time scales showed the fastest rates of evolution. An analysis of sex-biased gene evolution in both lineages revealed evidence for a “fast-X” effect in which the rate of evolution was greater for X-linked than for autosomal genes. This pattern was particularly pronounced for male-biased genes. Genes located on the obscura “neo-X” chromosome, which originated from a recent X-autosome fusion, showed rates of evolution that were intermediate between genes located on the ancestral X-chromosome and the autosomes. This suggests that the shift to X-linkage led to an increase in the rate of molecular evolution. PMID:22321769

  10. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    PubMed Central

    2012-01-01

    Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10^-9 and 1.1 × 10^-9) is an order of magnitude smaller than values reported for angiosperm herbs. However, if one takes generation time into account, most of this difference disappears. The estimates of the dN/dS ratio (non-synonymous over synonymous divergence) reported here are in general much lower than 1 and only a few genes showed a ratio larger than 1. PMID:23122049

  11. Gene expression information improves reliability of receptor status in breast cancer patients

    PubMed Central

    Kenn, Michael; Schlangen, Karin; Castillo-Tong, Dan Cacsire; Singer, Christian F.; Cibena, Michael; Koelbl, Heinz; Schreiner, Wolfgang

    2017-01-01

    Immunohistochemical (IHC) determination of receptor status in breast cancer patients is frequently inaccurate. Since it directs the choice of systemic therapy, it is essential to increase its reliability. We increase the validity of IHC receptor expression by additionally considering gene expression (GE) measurements. Crisp therapeutic decisions are based on IHC estimates, even if they are borderline reliable. We further improve decision quality by a responsibility function, defining a critical domain for gene expression. Refined normalization is devised to file any newly diagnosed patient into existing databases. Our approach renders receptor estimates more reliable by identifying patients with questionable receptor status. The approach is also more efficient since the rate of conclusive samples is increased. We have curated and evaluated gene expression data, together with clinical information, from 2880 breast cancer patients. Combining IHC with gene expression information yields a method that is more reliable and also more efficient than current common practice. Several types of possibly suboptimal treatment allocations, based on IHC receptor status alone, are enumerated. A ‘therapy allocation check’ identifies patients who are possibly misclassified. Estrogen: false negative 8%, false positive 6%. Progesterone: false negative 14%, false positive 11%. HER2: false negative 2%, false positive 50%. Possible implications are discussed. We propose an ‘expression look-up-plot’, which offers significant potential to improve the quality of precision medicine. Methods are developed and exemplified here for breast cancer patients, but they may readily be transferred to diagnostic data relevant for therapeutic decisions in other fields of oncology. PMID:29100391

  12. MALDI-TOF mass spectrometry for quantitative gene expression analysis of acid responses in Staphylococcus aureus.

    PubMed

    Rode, Tone Mari; Berget, Ingunn; Langsrud, Solveig; Møretrø, Trond; Holck, Askild

    2009-07-01

    Microorganisms are constantly exposed to new and altered growth conditions, and respond by changing gene expression patterns. Several methods for studying gene expression exist. During the last decade, the analysis of microarrays has been one of the most common approaches applied for large scale gene expression studies. A relatively new method for gene expression analysis is MassARRAY, which combines real competitive-PCR and MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. In contrast to microarray methods, MassARRAY technology is suitable for analysing a larger number of samples, though for a smaller set of genes. In this study we compare the results from MassARRAY with microarrays on gene expression responses of Staphylococcus aureus exposed to acid stress at pH 4.5. RNA isolated from the same stress experiments was analysed using both the MassARRAY and the microarray methods. The MassARRAY and microarray methods showed good correlation. Both MassARRAY and microarray estimated somewhat lower fold changes compared with quantitative real-time PCR (qRT-PCR). The results confirmed the up-regulation of the urease genes in acidic environments, and also indicated the importance of metal ion regulation. This study shows that the MassARRAY technology is suitable for gene expression analysis in prokaryotes, and has advantages when a set of genes is being analysed for an organism exposed to many different environmental conditions.

  13. Comparison of gene expression response to neutron and x-ray irradiation using mouse blood.

    PubMed

    Broustas, Constantinos G; Xu, Yanping; Harken, Andrew D; Garty, Guy; Amundson, Sally A

    2017-01-03

    In the event of an improvised nuclear device detonation, the prompt radiation exposure would consist of photons plus a neutron component that would contribute to the total dose. As neutrons cause more complex and difficult to repair damage to cells that would result in a more severe health burden to affected individuals, it is paramount to be able to estimate the contribution of neutrons to an estimated dose, to provide information for those making treatment decisions. Mice exposed to either 0.25 or 1 Gy of neutron or 1 or 4 Gy x-ray radiation were sacrificed at 1 or 7 days after exposure. Whole genome microarray analysis identified 7285 and 5045 differentially expressed genes in the blood of mice exposed to neutron or x-ray radiation, respectively. Neutron exposure resulted in mostly downregulated genes, whereas x-rays showed both down- and up-regulated genes. A total of 34 differentially expressed genes were regulated in response to all ≥1 Gy exposures at both times. Of these, 25 genes were consistently downregulated at days 1 and 7, whereas 9 genes, including the transcription factor E2f2, showed bi-directional regulation; being downregulated at day 1, while upregulated at day 7. Gene ontology analysis revealed that genes involved in nucleic acid metabolism processes were persistently downregulated in neutron irradiated mice, whereas genes involved in lipid metabolism were upregulated in x-ray irradiated animals. Most biological processes significantly enriched at both timepoints were consistently represented by either under- or over-expressed genes. In contrast, cell cycle processes were significant among down-regulated genes at day 1, but among up-regulated genes at day 7 after exposure to either neutron or x-rays. Cell cycle genes downregulated at day 1 were mostly distinct from the cell cycle genes upregulated at day 7. However, five cell cycle genes, Fzr1, Ube2c, Ccna2, Nusap1, and Cdc25b, were both downregulated at day 1 and upregulated at day 7. We describe, for the first time, the gene expression profile of mouse blood cells following exposure to neutrons. We have found that neutron radiation results in both distinct and common gene expression patterns compared with x-ray radiation.

  14. [Gene expression analyses of kidney biopsies: the European renal cDNA bank--Kröner-Fresenius biopsy bank].

    PubMed

    Cohen, C D; Kretzler, M

    2009-03-01

    Histological analysis of kidney biopsies is an essential part of our current diagnostic workup of patients with renal disease. Besides the already established diagnostic tools, new methods allow extensive analysis of the sample tissue's gene expression. Using results from a European multicenter study on gene expression analysis of renal biopsies, in this review we demonstrate that this novel approach not only expands the scope of so-called basic research but also might supplement future biopsy diagnostics. The goals are improved diagnosis and more specific therapy choice and prognosis estimates.

  15. Clustering change patterns using Fourier transformation with time-course gene expression data.

    PubMed

    Kim, Jaehee

    2011-01-01

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.

  16. Cell-Type Specific Features of Circular RNA Expression

    PubMed Central

    Salzman, Julia; Chen, Raymond E.; Olsen, Mari N.; Wang, Peter L.; Brown, Patrick O.

    2013-01-01

    Thousands of loci in the human and mouse genomes give rise to circular RNA transcripts; at many of these loci, the predominant RNA isoform is a circle. Using an improved computational approach for circular RNA identification, we found widespread circular RNA expression in Drosophila melanogaster and estimate that in humans, circular RNA may account for 1% as many molecules as poly(A) RNA. Analysis of data from the ENCODE consortium revealed that the repertoire of genes expressing circular RNA, the ratio of circular to linear transcripts for each gene, and even the pattern of splice isoforms of circular RNAs from each gene were cell-type specific. These results suggest that biogenesis of circular RNA is an integral, conserved, and regulated feature of the gene expression program. PMID:24039610

  17. EGRINs (Environmental Gene Regulatory Influence Networks) in Rice That Function in the Response to Water Deficit, High Temperature, and Agricultural Environments

    PubMed Central

    Hafemeister, Christoph; Nicotra, Adrienne B.; Jagadish, S.V. Krishna; Bonneau, Richard; Purugganan, Michael

    2016-01-01

    Environmental gene regulatory influence networks (EGRINs) coordinate the timing and rate of gene expression in response to environmental signals. EGRINs encompass many layers of regulation, which culminate in changes in accumulated transcript levels. Here, we inferred EGRINs for the response of five tropical Asian rice (Oryza sativa) cultivars to high temperatures, water deficit, and agricultural field conditions by systematically integrating time-series transcriptome data, patterns of nucleosome-free chromatin, and the occurrence of known cis-regulatory elements. First, we identified 5447 putative target genes for 445 transcription factors (TFs) by connecting TFs with genes harboring known cis-regulatory motifs in nucleosome-free regions proximal to their transcriptional start sites. We then used network component analysis to estimate the regulatory activity for each TF based on the expression of its putative target genes. Finally, we inferred an EGRIN using the estimated transcription factor activity (TFA) as the regulator. The EGRINs include regulatory interactions between 4052 target genes regulated by 113 TFs. We resolved distinct regulatory roles for members of the heat shock factor family, including a putative regulatory connection between abiotic stress and the circadian clock. TFA estimation using network component analysis is an effective way of incorporating multiple genome-scale measurements into network inference. PMID:27655842

  18. Development and application of a modified dynamic time warping algorithm (DTW-S) to analyses of primate brain expression time series

    PubMed Central

    2011-01-01

    Background Comparing biological time series data across different conditions, or different specimens, is a common but still challenging task. Algorithms aligning two time series represent a valuable tool for such comparisons. While many powerful computation tools for time series alignment have been developed, they do not provide significance estimates for time shift measurements. Results Here, we present an extended version of the original DTW algorithm that allows us to determine the significance of time shift estimates in time series alignments, the DTW-Significance (DTW-S) algorithm. The DTW-S combines important properties of the original algorithm and other published time series alignment tools: DTW-S calculates the optimal alignment for each time point of each gene, it uses interpolated time points for time shift estimation, and it does not require alignment of the time-series end points. As a new feature, we implement a simulation procedure based on parameters estimated from real time series data, on a series-by-series basis, allowing us to determine the false positive rate (FPR) and the significance of the estimated time shift values. We assess the performance of our method using simulation data and real expression time series from two published primate brain expression datasets. Our results show that this method can provide accurate and robust time shift estimates for each time point on a gene-by-gene basis. Using these estimates, we are able to uncover novel features of the biological processes underlying human brain development and maturation. Conclusions The DTW-S provides a convenient tool for calculating accurate and robust time shift estimates at each time point for each gene, based on time series data. The estimates can be used to uncover novel biological features of the system being studied. The DTW-S is freely available as an R package TimeShift at http://www.picb.ac.cn/Comparative/data.html. PMID:21851598
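
    For readers unfamiliar with the starting point of this record, the following is a minimal sketch of the classic dynamic time warping (DTW) recurrence that the abstract refers to as "the original DTW algorithm". It is not DTW-S: it has no interpolation, no significance simulation, and it does force the end points of the two series to align, which the abstract says DTW-S does not require. The function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (illustrative only).

    Uses |a_i - b_j| as the local cost (an assumption) and the standard
    three-way recurrence over a cumulative cost table.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

    Two series that differ only by a repeated time point align at zero cost under this recurrence, which is why a separate significance estimate for time shifts, as proposed in the record above, is useful.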

  19. Development and application of a modified dynamic time warping algorithm (DTW-S) to analyses of primate brain expression time series.

    PubMed

    Yuan, Yuan; Chen, Yi-Ping Phoebe; Ni, Shengyu; Xu, Augix Guohua; Tang, Lin; Vingron, Martin; Somel, Mehmet; Khaitovich, Philipp

    2011-08-18

    Comparing biological time series data across different conditions, or different specimens, is a common but still challenging task. Algorithms aligning two time series represent a valuable tool for such comparisons. While many powerful computation tools for time series alignment have been developed, they do not provide significance estimates for time shift measurements. Here, we present an extended version of the original DTW algorithm that allows us to determine the significance of time shift estimates in time series alignments, the DTW-Significance (DTW-S) algorithm. The DTW-S combines important properties of the original algorithm and other published time series alignment tools: DTW-S calculates the optimal alignment for each time point of each gene, it uses interpolated time points for time shift estimation, and it does not require alignment of the time-series end points. As a new feature, we implement a simulation procedure based on parameters estimated from real time series data, on a series-by-series basis, allowing us to determine the false positive rate (FPR) and the significance of the estimated time shift values. We assess the performance of our method using simulation data and real expression time series from two published primate brain expression datasets. Our results show that this method can provide accurate and robust time shift estimates for each time point on a gene-by-gene basis. Using these estimates, we are able to uncover novel features of the biological processes underlying human brain development and maturation. The DTW-S provides a convenient tool for calculating accurate and robust time shift estimates at each time point for each gene, based on time series data. The estimates can be used to uncover novel biological features of the system being studied. The DTW-S is freely available as an R package TimeShift at http://www.picb.ac.cn/Comparative/data.html.

  20. Expression stability of two housekeeping genes (18S rRNA and G3PDH) during in vitro maturation of follicular oocytes in buffalo (Bubalus bubalis).

    PubMed

    Aswal, Ajay Pal Singh; Raghav, Sarvesh; De, Sachinandan; Thakur, Manish; Goswami, Surender Lal; Datta, Tirtha Kumar

    2008-01-15

    The present study was undertaken to evaluate the expression stability of two housekeeping genes (HKGs), 18S rRNA and G3PDH, during in vitro maturation (IVM) of oocytes in buffalo, which qualifies their use as internal controls for valid qRT-PCR estimation of other oocyte transcripts. A semi-quantitative RT-PCR system was used with optimised qRT-PCR parameters at exponential PCR cycle for evaluation of the temporal expression pattern of these genes over 24 h of IVM. 18S rRNA was found to be more stable in its expression pattern than G3PDH.

  1. Regulation of gene expression in the mammalian eye and its relevance to eye disease.

    PubMed

    Scheetz, Todd E; Kim, Kwang-Youn A; Swiderski, Ruth E; Philp, Alisdair R; Braun, Terry A; Knudtson, Kevin L; Dorrance, Anne M; DiBona, Gerald F; Huang, Jian; Casavant, Thomas L; Sheffield, Val C; Stone, Edwin M

    2006-09-26

    We used expression quantitative trait locus mapping in the laboratory rat (Rattus norvegicus) to gain a broad perspective of gene regulation in the mammalian eye and to identify genetic variation relevant to human eye disease. Of >31,000 gene probes represented on an Affymetrix expression microarray, 18,976 exhibited sufficient signal for reliable analysis and at least 2-fold variation in expression among 120 F2 rats generated from an SR/JrHsd x SHRSP intercross. Genome-wide linkage analysis with 399 genetic markers revealed significant linkage with at least one marker for 1,300 probes (alpha = 0.001; estimated empirical false discovery rate = 2%). Both contiguous and noncontiguous loci were found to be important in regulating mammalian eye gene expression. We investigated one locus of each type in greater detail and identified putative transcription-altering variations in both cases. We found an inserted cREL binding sequence in the 5' flanking sequence of the Abca4 gene associated with an increased expression level of that gene, and we found a mutation of the gene encoding thyroid hormone receptor beta2 associated with a decreased expression level of the gene encoding short-wavelength sensitive opsin (Opn1sw). In addition to these positional studies, we performed a pairwise analysis of gene expression to identify genes that are regulated in a coordinated manner and used this approach to validate two previously undescribed genes involved in the human disease Bardet-Biedl syndrome. These data and analytical approaches can be used to facilitate the discovery of additional genes and regulatory elements involved in human eye disease.

  2. Bayesian estimation of the discrete coefficient of determination.

    PubMed

    Chen, Ting; Braga-Neto, Ulisses M

    2016-12-01

    The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
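
    As background to this record, the discrete CoD is commonly defined as CoD = (e0 - e) / e0, where e0 is the error of the best predictor of the target that ignores the predictors and e is the error of the best predictor that uses them. The sketch below computes the classical plug-in (resubstitution) estimate from counts. It is not the Bayesian MMSE or OBP estimator introduced in the paper; the function name and the convention of returning 0.0 when e0 is zero are assumptions.

```python
from collections import Counter


def plugin_cod(xs, ys):
    """Plug-in (resubstitution) estimate of the discrete CoD.

    xs: hashable predictor values (e.g. tuples of discretized expression)
    ys: discrete target values, same length as xs
    """
    n = len(ys)
    # Errors made by the best constant predictor (always guess the majority y).
    err0 = n - max(Counter(ys).values())
    if err0 == 0:
        return 0.0  # assumed convention: target is constant, nothing to explain
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    # For each predictor value, the best rule guesses the majority target.
    best = {}
    for (x, _y), c in joint.items():
        best[x] = max(best.get(x, 0), c)
    err = sum(x_counts[x] - best[x] for x in x_counts)
    return (err0 - err) / err0
```

    With small samples this resubstitution estimate tends to be optimistic, which is one motivation for alternative estimators such as the Bayesian ones described in the record above.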

  3. Assessment of the reliability of protein-protein interactions and protein function prediction.

    PubMed

    Deng, Minghua; Sun, Fengzhu; Chen, Ting

    2003-01-01

    As more and more high-throughput protein-protein interaction data are collected, the task of estimating the reliability of different data sets becomes increasingly important. In this paper, we present our study of two groups of protein-protein interaction data, the physical interaction data and the protein complex data, and estimate the reliability of these data sets using three different measurements: (1) the distribution of gene expression correlation coefficients, (2) the reliability based on gene expression correlation coefficients, and (3) the accuracy of protein function predictions. We develop a maximum likelihood method to estimate the reliability of protein interaction data sets according to the distribution of correlation coefficients of gene expression profiles of putative interacting protein pairs. The results of the three measurements are consistent with each other. The MIPS protein complex data have the highest mean gene expression correlation coefficients (0.256) and the highest accuracy in predicting protein functions (70% sensitivity and specificity), while Ito's Yeast two-hybrid data have the lowest mean (0.041) and the lowest accuracy (15% sensitivity and specificity). Uetz's data are more reliable than Ito's data in all three measurements, and the TAP protein complex data are more reliable than the HMS-PCI data in all three measurements as well. The complex data sets generally perform better in function predictions than do the physical interaction data sets. Proteins in complexes are shown to be more highly correlated in gene expression. The results confirm that the components of a protein complex can be assigned to functions that the complex carries out within a cell. There are three interaction data sets different from the above two groups: the genetic interaction data, the in-silico data and the syn-express data. Their capability of predicting protein functions generally falls between that of the Y2H data and that of the MIPS protein complex data. The supplementary information is available at the following Web site: http://www-hto.usc.edu/-msms/AssessInteraction/.
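
    The first measurement in this record, the mean gene expression correlation coefficient over putative interacting pairs, can be sketched as follows. This only illustrates the summary statistic; it does not reproduce the maximum likelihood reliability estimate. The function names, the use of Pearson correlation, and the toy data layout (a dict from protein name to expression profile) are assumptions.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def mean_pair_correlation(profiles, pairs):
    """Average correlation over a list of (protein_a, protein_b) pairs.

    profiles: dict mapping a protein/gene name to its expression profile
    pairs:    putative interacting pairs taken from one interaction data set
    """
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
```

    Computing this mean separately for each interaction data set gives one number per data set that can be compared, in the spirit of the 0.256 versus 0.041 contrast reported above.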

  4. Male- and Female-Biased Gene Expression of Olfactory-Related Genes in the Antennae of Asian Corn Borer, Ostrinia furnacalis (Guenée) (Lepidoptera: Crambidae)

    PubMed Central

    Zhang, Tiantao; Coates, Brad S.; Ge, Xing; Bai, Shuxiong; He, Kanglai; Wang, Zhenying

    2015-01-01

    The Asian corn borer (ACB), Ostrinia furnacalis (Guenée), is a destructive pest insect of cultivated corn crops, for which antennal-expressed receptors are important to detect olfactory cues for mate attraction and oviposition. Few olfactory related genes were reported in ACB, so we sequenced and characterized the transcriptome of male and female O. furnacalis antennae. Non-normalized male and female O. furnacalis antennal cDNA libraries were sequenced on the Illumina HiSeq 2000 and assembled into a reference transcriptome. Functional gene annotations identified putative olfactory-related genes; 56 odorant receptors (ORs), 23 odorant binding proteins (OBPs), and 10 CSPs. RNA-seq estimates of gene expression respectively showed up- and down-regulation of 79 and 30 genes in female compared to male antennae, which included up-regulation of 8 ORs and 1 PBP gene in male antennae as well as 3 ORs in female antennae. Quantitative real-time RT-PCR analyses validated strong male antennal-biased expression of OfurOR3, 4, 6, 7, 8, 11, 12, 13 and 14 transcripts, whereas OfurOR17 and 18 were specially expressed in female antennae. The sex-biased gene expression described here provides important insight into gene functionalization, and provides candidate genes putatively involved in environmental perception, host plant attraction, and mate recognition. PMID:26062030

  5. Bayesian median regression for temporal gene expression data

    NASA Astrophysics Data System (ADS)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T²-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

  6. Identification of General Patterns of Sex-Biased Expression in Daphnia, a Genus with Environmental Sex Determination

    PubMed Central

    Molinier, Cécile; Reisser, Céline M.O.; Fields, Peter; Ségard, Adeline; Galimov, Yan; Haag, Christoph R.

    2018-01-01

    Daphnia reproduce by cyclic-parthenogenesis, where phases of asexual reproduction are intermitted by sexual production of diapause stages. This life cycle, together with environmental sex determination, allows the comparison of gene expression between genetically identical males and females. We investigated gene expression differences between males and females in four genotypes of Daphnia magna and compared the results with published data on sex-biased gene expression in two other Daphnia species, each representing one of the major phylogenetic clades within the genus. We found that 42% of all annotated genes showed sex-biased expression in D. magna. This proportion is similar both to estimates from other Daphnia species as well as from species with genetic sex determination, suggesting that sex-biased expression is not reduced under environmental sex determination. Among 7453 single copy, one-to-one orthologs in the three Daphnia species, 707 consistently showed sex-biased expression and 675 were biased in the same direction in all three species. Hence these genes represent a core-set of genes with consistent sex-differential expression in the genus. A functional analysis identified that several of them are involved in known sex determination pathways. Moreover, 75% were overexpressed in females rather than males, a pattern that appears to be a general feature of sex-biased gene expression in Daphnia. PMID:29535148

  7. Identification of General Patterns of Sex-Biased Expression in Daphnia, a Genus with Environmental Sex Determination.

    PubMed

    Molinier, Cécile; Reisser, Céline M O; Fields, Peter; Ségard, Adeline; Galimov, Yan; Haag, Christoph R

    2018-05-04

    Daphnia reproduce by cyclic-parthenogenesis, where phases of asexual reproduction are intermitted by sexual production of diapause stages. This life cycle, together with environmental sex determination, allows the comparison of gene expression between genetically identical males and females. We investigated gene expression differences between males and females in four genotypes of Daphnia magna and compared the results with published data on sex-biased gene expression in two other Daphnia species, each representing one of the major phylogenetic clades within the genus. We found that 42% of all annotated genes showed sex-biased expression in D. magna. This proportion is similar both to estimates from other Daphnia species as well as from species with genetic sex determination, suggesting that sex-biased expression is not reduced under environmental sex determination. Among 7453 single copy, one-to-one orthologs in the three Daphnia species, 707 consistently showed sex-biased expression and 675 were biased in the same direction in all three species. Hence these genes represent a core-set of genes with consistent sex-differential expression in the genus. A functional analysis identified that several of them are involved in known sex determination pathways. Moreover, 75% were overexpressed in females rather than males, a pattern that appears to be a general feature of sex-biased gene expression in Daphnia. Copyright © 2018 Molinier et al.

  8. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).

  9. Examining Radiation-Induced In Vivo and In Vitro Gene Expression Changes of the Peripheral Blood in Different Laboratories for Biodosimetry Purposes: First RENEB Gene Expression Study.

    PubMed

    Abend, M; Badie, C; Quintens, R; Kriehuber, R; Manning, G; Macaeva, E; Njima, M; Oskamp, D; Strunz, S; Moertl, S; Doucha-Senf, S; Dahlke, S; Menzel, J; Port, M

    2016-02-01

    The risk of a large-scale event leading to acute radiation exposure necessitates the development of high-throughput methods for providing rapid individual dose estimates. Our work addresses three goals, which align with the directive of the European Union's Realizing the European Network of Biodosimetry project (EU-RENEB): 1. To examine the suitability of different gene expression platforms for biodosimetry purposes; 2. To perform this examination using blood samples collected from prostate cancer patients (in vivo) and from healthy donors (in vitro); and 3. To compare radiation-induced gene expression changes of the in vivo with in vitro blood samples. For the in vitro part of this study, EDTA-treated whole blood was irradiated immediately after venipuncture using single X-ray doses (1 Gy/min dose rate, 100 keV). Blood samples used to generate calibration curves as well as 10 coded (blinded) samples (0-4 Gy dose range) were incubated for 24 h in vitro, lysed and shipped on wet ice. For the in vivo part of the study PAXgene tubes were used and peripheral blood (2.5 ml) was collected from prostate cancer patients before and 24 h after the first fractionated 2 Gy dose of localized radiotherapy to the pelvis [linear accelerator (LINAC), 580 MU/min, exposure 1-1.5 min]. Assays were run in each laboratory according to locally established protocols using either microarray platforms (2 laboratories) or qRT-PCR (2 laboratories). Report times on dose estimates were documented. The mean absolute differences of estimated doses relative to the true doses (Gy) were calculated. Doses were also merged into binary categories reflecting aspects of clinical/diagnostic relevance. For the in vitro part of the study, the earliest report time on dose estimates was 7 h for qRT-PCR and 35 h for microarrays. Methodological variance of gene expression measurements (CV ≤10% for technical replicates) and interindividual variance (≤twofold for all genes) were low. Dose estimates based on one gene, ferredoxin reductase (FDXR), using qRT-PCR were as precise as dose estimates based on multiple genes using microarrays, but the precision decreased at doses ≥2 Gy. Binary dose categories comprising, for example, unexposed compared with exposed samples, could be completely discriminated with most of our methods. Exposed prostate cancer blood samples (n = 4) could be completely discriminated from unexposed blood samples (n = 4, P < 0.03, two-sided Fisher's exact test) without individual controls. This could be performed by introducing an in vitro-to-in vivo correction factor of FDXR, which varied among the laboratories. After that the in vitro-constructed calibration curves could be used for dose estimation of the in vivo exposed prostate cancer blood samples within an accuracy window of ±0.5 Gy in both contributing qRT-PCR laboratories. In conclusion, early and precise dose estimates can be performed, in particular at doses ≤2 Gy in vitro. Blood samples of prostate cancer patients exposed to 0.09-0.017 Gy could be completely discriminated from pre-exposure blood samples with the doses successfully estimated using adjusted in vitro-constructed calibration curves.
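
    The calibration-curve workflow described in this record (fit a curve on samples of known dose, invert it for blinded samples, then report the mean absolute difference from the true doses) can be sketched as below. The abstract does not state the functional form of the calibration curves or how the in vitro-to-in vivo correction factor is applied, so the straight-line fit is an assumption and no correction factor is modelled. All function names and numbers in the usage note are hypothetical.

```python
def fit_line(doses, signals):
    """Least-squares straight line: signal = intercept + slope * dose."""
    n = len(doses)
    mx = sum(doses) / n
    my = sum(signals) / n
    sxx = sum((x - mx) ** 2 for x in doses)
    sxy = sum((x - mx) * (y - my) for x, y in zip(doses, signals))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def estimate_dose(signal, intercept, slope):
    """Invert the fitted line to turn an expression signal into a dose."""
    return (signal - intercept) / slope


def mean_absolute_difference(estimated, true):
    """Mean absolute difference between estimated and true doses (Gy)."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)
```

    With hypothetical calibration points at 0, 1, 2 and 4 Gy and signals 1, 3, 5 and 9, `fit_line` recovers an intercept of 1 and a slope of 2, and a blinded sample with signal 7 is assigned a dose of 3 Gy.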

  10. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    PubMed

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing real data from an experiment aimed at linking the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press.

  11. BATS: a Bayesian user-friendly software for analyzing time series microarray experiments.

    PubMed

    Angelini, Claudia; Cutillo, Luisa; De Canditiis, Daniela; Mutarelli, Margherita; Pensky, Marianna

    2008-10-06

    Gene expression levels in a given cell can be influenced by different factors, namely pharmacological or medical treatments. The response to a given stimulus is usually different for different genes and may depend on time. One of the goals of modern molecular biology is the high-throughput identification of genes associated with a particular treatment or a biological process of interest. From a methodological and computational point of view, analyzing high-dimensional time course microarray data requires a very specific set of tools which are usually not included in standard software packages. Recently, the authors of this paper developed a fully Bayesian approach which allows one to identify differentially expressed genes in a 'one-sample' time-course microarray experiment, to rank them and to estimate their expression profiles. The method is based on explicit expressions for calculations and, hence, is very computationally efficient. The software package BATS (Bayesian Analysis of Time Series) presented here implements the methodology described above. It allows a user to automatically identify and rank differentially expressed genes and to estimate their expression profiles when at least 5-6 time points are available. The package has a user-friendly interface. BATS successfully manages various technical difficulties which arise in time-course microarray experiments, such as a small number of observations, non-uniform sampling intervals and replicated or missing data. BATS is free, user-friendly software for the analysis of both simulated and real microarray time course experiments. The software, the user manual and a brief illustrative example are freely available online at the BATS website: http://www.na.iac.cnr.it/bats.

  12. Evolution and Variation of Renin Genes in Mice

    PubMed Central

    Dickinson, Douglas P.; Gross, Kenneth W.; Piccini, Nina; Wilson, Carol M.

    1984-01-01

    Inbred strains of mice carry Ren-1, a gene encoding the thermostable Renin-1 isozyme. Ren-1 is expressed at relatively low levels in mouse submandibular gland and kidney. Some strains also carry Ren-2, a gene encoding the thermolabile Renin-2 isozyme. Ren-2 is expressed at high levels in the mouse submandibular gland and at very low levels, if at all, in the kidney. Ren-1 and Ren-2 are closely linked on mouse chromosome 1, show extensive homology in coding and noncoding regions and provide a model for studying the regulation of gene expression. An investigation of renin genes and enzymatic activity in wild-derived mice identified several restriction site polymorphisms as well as putative variants in renin gene expression and protein structure. The number of renin genes carried by different subpopulations of wild-derived mice is consistent with the occurrence of a gene duplication event prior to the divergence of M. spretus (2.75–5.5 million yr ago). This conclusion is in agreement with a prior estimate based upon comparative sequence analysis of Ren-1 and Ren-2 from inbred laboratory mice. PMID:6389258

  13. Validation of reference genes aiming accurate normalization of qRT-PCR data in Dendrocalamus latiflorus Munro.

    PubMed

    Liu, Mingying; Jiang, Jing; Han, Xiaojiao; Qiao, Guirong; Zhuo, Renying

    2014-01-01

    Dendrocalamus latiflorus Munro is widely distributed in subtropical areas and is a valuable natural resource. Transcriptome sequencing of D. latiflorus Munro has been performed, revealing numerous genes, especially those predicted to be unique to D. latiflorus Munro. qRT-PCR has become a feasible approach for gene expression profiling, and the accuracy and reliability of the results depend upon the proper selection of stable reference genes for normalization. Therefore, a set of suitable internal controls should be validated for D. latiflorus Munro. In this report, twelve candidate reference genes were selected and the assessment of gene expression stability was performed in ten tissue samples and four leaf samples from seedlings and anther-regenerated plants of different ploidy. The PCR amplification efficiency was estimated, and the candidate genes were ranked according to their expression stability using three software packages: geNorm, NormFinder and BestKeeper. GAPDH and EF1α were characterized to be the most stable genes among different tissues or in all the sample pools, while CYP showed low expression stability. RPL3 had the optimal performance among four leaf samples. The application of verified reference genes was illustrated by analyzing ferritin and laccase expression profiles among different experimental sets. The analysis revealed the biological variation in ferritin and laccase transcript expression among the tissues studied and the individual plants. geNorm, NormFinder, and BestKeeper analyses recommended different suitable reference gene(s) for normalization according to the experimental sets. GAPDH and EF1α had the highest expression stability across different tissues and RPL3 for the other sample set. This study emphasizes the importance of validating superior reference genes for qRT-PCR analysis to accurately normalize gene expression of D. latiflorus Munro.
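
    To make the ranking step concrete, the following sketch computes a geNorm-style stability value M for each candidate gene: the average, over all other candidates, of the standard deviation of the log2 expression ratio across samples. A lower M indicates more stable expression. The input is assumed to be relative expression quantities already converted from Cq values, and the function name `genorm_m` is hypothetical. This is not the geNorm program itself, and it omits the stepwise exclusion of the least stable gene.

```python
import math
import statistics


def genorm_m(expr):
    """Return {gene: M} for a dict of gene -> relative quantities per sample.

    All genes must list their samples in the same order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: SD of the log2 ratio of gene j to gene k.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values
```

    Two genes whose ratio is constant across samples contribute zero pairwise variation to each other, which is why co-regulated candidates can look misleadingly stable under this measure.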

  14. Fully moderated T-statistic for small sample size gene expression arrays.

    PubMed

    Yu, Lianbo; Gulati, Parul; Fernandez, Soledad; Pennell, Michael; Kirschner, Lawrence; Jarjoura, David

    2011-09-15

    Gene expression microarray experiments with few replications lead to great variability in estimates of gene variances. Several Bayesian methods have been developed to reduce this variability and to increase power. Thus far, moderated t methods assumed a constant coefficient of variation (CV) for the gene variances. We provide evidence against this assumption, and extend the method by allowing the CV to vary with gene expression. Our CV varying method, which we refer to as the fully moderated t-statistic, was compared to three other methods (ordinary t, and two moderated t predecessors). A simulation study and a familiar spike-in data set were used to assess the performance of the testing methods. The results showed that our CV varying method had higher power than the other three methods, identified a greater number of true positives in spike-in data, fit simulated data under varying assumptions very well, and in a real data set better identified higher expressing genes that were consistent with functional pathways associated with the experiments.
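
    As a rough illustration of what a moderated t-statistic does, the sketch below shrinks a gene's pooled variance toward a prior variance before forming the t-statistic. It shows the earlier, constant-prior form that the abstract refers to as a predecessor, not the authors' fully moderated version, in which the prior would vary with expression level. The prior variance and prior degrees of freedom are normally estimated from all genes by empirical Bayes; here they are simply passed in as assumed values, and the function name is hypothetical.

```python
import math
import statistics


def moderated_t(group1, group2, prior_var, prior_df):
    """Two-group moderated t-statistic for one gene.

    Returns (t, degrees of freedom). prior_df = 0 gives the ordinary
    pooled-variance t-statistic.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / df
    # Posterior variance: weighted average of the prior and observed variances.
    post_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    diff = statistics.mean(group1) - statistics.mean(group2)
    t = diff / math.sqrt(post_var * (1 / n1 + 1 / n2))
    return t, prior_df + df
```

    With few replicates the observed variance is noisy, so borrowing strength from the prior stabilizes the denominator and adds degrees of freedom, which is the source of the power gain these methods aim for.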

  15. BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis.

    PubMed

    Wang, Duolin; Wang, Juexin; Jiang, Yuexu; Liang, Yanchun; Xu, Dong

    2017-02-03

    Comparing the gene-expression profiles between biological conditions is useful for understanding gene regulation underlying complex phenotypes. Along this line, analysis of differential co-expression (DC), where genes under one condition have different co-expression patterns compared with another, has gained attention in recent years. We developed an R package, Bayes Factor approach for Differential Co-expression Analysis (BFDCA), for DC analysis. BFDCA is unique in integrating various aspects of DC patterns (including Shift, Cross, and Re-wiring) into one uniform Bayes factor. We tested BFDCA using simulation data and experimental data. Simulation results indicate that BFDCA outperforms existing methods in accuracy and robustness of detecting DC pairs and DC modules. Results of using experimental data suggest that BFDCA can cluster disease-related genes into functional DC subunits and estimate the regulatory impact of disease-related genes well. BFDCA also achieves high accuracy in predicting case-control phenotypes by using significant DC gene pairs as markers. BFDCA is publicly available at http://dx.doi.org/10.17632/jdz4vtvnm3.1. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Quantitative analysis of a deeply sequenced marine microbial metatranscriptome.

    PubMed

    Gifford, Scott M; Sharma, Shalabh; Rinta-Kanto, Johanna M; Moran, Mary Ann

    2011-03-01

    The potential of metatranscriptomic sequencing to provide insights into the environmental factors that regulate microbial activities depends on how fully the sequence libraries capture community expression (that is, sample-sequencing depth and coverage depth), and the sensitivity with which expression differences between communities can be detected (that is, statistical power for hypothesis testing). In this study, we use an internal standard approach to make absolute (per liter) estimates of transcript numbers, a significant advantage over proportional estimates that can be biased by expression changes in unrelated genes. Coastal waters of the southeastern United States contain 1 × 10^12 bacterioplankton mRNA molecules per liter of seawater (~200 mRNA molecules per bacterial cell). Even for the large bacterioplankton libraries obtained in this study (~500,000 possible protein-encoding sequences in each of two libraries after discarding rRNAs and small RNAs from >1 million 454 FLX pyrosequencing reads), sample-sequencing depth was only 0.00001%. Expression levels of 82 genes diagnostic for transformations in the marine nitrogen, phosphorus and sulfur cycles ranged from below detection (<1 × 10^6 transcripts per liter) for 36 genes (for example, phosphonate metabolism gene phnH, dissimilatory nitrate reductase subunit napA) to >2.7 × 10^9 transcripts per liter (ammonia transporter amt and ammonia monooxygenase subunit amoC). Half of the categories for which expression was detected, however, had too few copy numbers for robust statistical resolution, as would be required for comparative (experimental or time-series) expression studies. By representing whole community gene abundance and expression in absolute units (per volume or mass of environment), 'omics' data can be better leveraged to improve understanding of microbially mediated processes in the ocean.
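
    The arithmetic behind an internal-standard estimate can be sketched as follows. The sketch assumes that a known number of standard molecules is added to the sample before processing and that standard and native transcripts are recovered and sequenced with equal efficiency; the function and parameter names are hypothetical, and the study's actual protocol may include further corrections.

```python
def transcripts_per_liter(gene_reads, standard_reads,
                          standard_molecules_added, liters_filtered):
    """Convert a read count into an absolute transcript abundance per liter."""
    # Each recovered read stands for this many molecules in the original sample.
    molecules_per_read = standard_molecules_added / standard_reads
    return gene_reads * molecules_per_read / liters_filtered
```

    Because the scaling factor comes from the standard alone, a change in the expression of one gene does not alter the estimate for another, which is the advantage over proportional estimates noted in the abstract.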

  17. Comparative Study of Regulatory Circuits in Two Sea Urchin Species Reveals Tight Control of Timing and High Conservation of Expression Dynamics

    PubMed Central

    Gildor, Tsvia; Ben-Tabou de-Leon, Smadar

    2015-01-01

    Accurate temporal control of gene expression is essential for normal development and must be robust to natural genetic and environmental variation. Studying gene expression variation within and between related species can delineate the level of expression variability that development can tolerate. Here we exploit the comprehensive model of sea urchin gene regulatory networks and generate high-density expression profiles of key regulatory genes of the Mediterranean sea urchin, Paracentrotus lividus (Pl). The high resolution of our studies reveals highly reproducible gene initiation times that have lower variation than those of maximal mRNA levels between different individuals of the same species. This observation supports a threshold behavior of gene activation that is less sensitive to input concentrations. We then compare Mediterranean sea urchin gene expression profiles to those of its Pacific Ocean relative, Strongylocentrotus purpuratus (Sp). These species shared a common ancestor about 40 million years ago and show highly similar embryonic morphologies. Our comparative analyses of five regulatory circuits operating in different embryonic territories reveal a high conservation of the temporal order of gene activation but also some cases of divergence. A linear ratio of 1.3-fold between gene initiation times in Pl and Sp is partially explained by scaling of the developmental rates with temperature. Scaling the developmental rates according to the estimated Sp-Pl ratio and normalizing the expression levels reveals a striking conservation of relative dynamics of gene expression between the species. Overall, our findings demonstrate the ability of biological developmental systems to tightly control the timing of gene activation and relative dynamics and overcome expression noise induced by genetic variation and growth conditions. PMID:26230518
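
    One simple way to obtain a single scaling ratio between two sets of matched initiation times is a least-squares line through the origin, sketched below. The abstract does not state how the 1.3-fold ratio was fitted, so this procedure, the direction of the ratio, and the function names are all assumptions made for illustration.

```python
def ratio_through_origin(times_a, times_b):
    """Least-squares slope of times_b = slope * times_a, with no intercept."""
    numerator = sum(a * b for a, b in zip(times_a, times_b))
    denominator = sum(a * a for a in times_a)
    return numerator / denominator


def rescale_times(times, ratio):
    """Map times from the slower-developing species onto the faster one's clock."""
    return [t / ratio for t in times]
```

    After rescaling, expression profiles from the two species share a common time axis and can be compared point by point.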

  18. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data.

    PubMed

    Racle, Julien; de Jonge, Kaat; Baumgaertner, Petra; Speiser, Daniel E; Gfeller, David

    2017-11-13

    Immune cells infiltrating tumors can have important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).

  19. Quantifying Extrinsic Noise in Gene Expression Using the Maximum Entropy Framework

    PubMed Central

    Dixit, Purushottam D.

    2013-01-01

    We present a maximum entropy framework to separate intrinsic and extrinsic contributions to noisy gene expression solely from the profile of expression. We express the experimentally accessible probability distribution of the copy number of the gene product (mRNA or protein) by accounting for possible variations in extrinsic factors. The distribution of extrinsic factors is estimated using the maximum entropy principle. Our results show that extrinsic factors qualitatively and quantitatively affect the probability distribution of the gene product. We work out, in detail, the transcription of mRNA from a constitutively expressed promoter in Escherichia coli. We suggest that the variation in extrinsic factors may account for the observed wider-than-Poisson distribution of mRNA copy numbers. We successfully test our framework on a numerical simulation of a simple gene expression scheme that accounts for the variation in extrinsic factors. We also make falsifiable predictions, some of which are tested on previous experiments in E. coli whereas others need verification. Application of the presented framework to more complex situations is also discussed. PMID:23790383
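
    Why extrinsic variation widens the copy-number distribution can be checked with a short calculation. If the copy number is Poisson for a fixed production rate, and the rate itself varies from cell to cell, the law of total variance gives a variance equal to the mean rate plus the variance of the rate, so the Fano factor (variance divided by mean) exceeds 1. The sketch below evaluates this for a discrete rate distribution. It illustrates that general property only and does not implement the maximum entropy estimation described in the abstract; the function name is hypothetical.

```python
def fano_factor_mixed_poisson(rates, probs):
    """Fano factor of a Poisson whose mean is drawn from a discrete distribution.

    rates: possible Poisson means; probs: their probabilities (summing to 1).
    """
    mean_rate = sum(p * r for r, p in zip(rates, probs))
    var_rate = sum(p * (r - mean_rate) ** 2 for r, p in zip(rates, probs))
    # Total variance = intrinsic (Poisson) part + extrinsic (rate) part.
    return (mean_rate + var_rate) / mean_rate
```

    A single fixed rate returns exactly 1, the pure Poisson case; any spread in the rate pushes the value above 1.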

  20. Quantifying extrinsic noise in gene expression using the maximum entropy framework.

    PubMed

    Dixit, Purushottam D

    2013-06-18

    We present a maximum entropy framework to separate intrinsic and extrinsic contributions to noisy gene expression solely from the profile of expression. We express the experimentally accessible probability distribution of the copy number of the gene product (mRNA or protein) by accounting for possible variations in extrinsic factors. The distribution of extrinsic factors is estimated using the maximum entropy principle. Our results show that extrinsic factors qualitatively and quantitatively affect the probability distribution of the gene product. We work out, in detail, the transcription of mRNA from a constitutively expressed promoter in Escherichia coli. We suggest that the variation in extrinsic factors may account for the observed wider-than-Poisson distribution of mRNA copy numbers. We successfully test our framework on a numerical simulation of a simple gene expression scheme that accounts for the variation in extrinsic factors. We also make falsifiable predictions, some of which are tested on previous experiments in E. coli whereas others need verification. Application of the presented framework to more complex situations is also discussed. Copyright © 2013 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  1. Parameter estimation methods for gene circuit modeling from time-series mRNA data: a comparative study.

    PubMed

    Fan, Ming; Kuwahara, Hiroyuki; Wang, Xiaolei; Wang, Suojin; Gao, Xin

    2015-11-01

    Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model-decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to a level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These results suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the parameter search space very large. © The Author 2015. Published by Oxford University Press.

  2. Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations.

    PubMed

    Wu, Shuang; Liu, Zhi-Ping; Qiu, Xing; Wu, Hulin

    2014-01-01

    The immune response to viral infection is regulated by an intricate network of many genes and their products. The reverse engineering of gene regulatory networks (GRNs) using mathematical models from time course gene expression data collected after influenza infection is key to our understanding of the mechanisms involved in controlling influenza infection within a host. A five-step pipeline: detection of temporally differentially expressed genes, clustering genes into co-expressed modules, identification of network structure, parameter estimate refinement, and functional enrichment analysis, is developed for reconstructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. Applying the pipeline to the time course gene expression data from influenza-infected mouse lungs, we have identified 20 distinct temporal expression patterns in the differentially expressed genes and constructed a module-based dynamic network using a linear ODE model. Both intra-module and inter-module annotations and regulatory relationships of our inferred network show some interesting findings and are highly consistent with existing knowledge about the immune response in mice after influenza infection. The proposed method is a computationally efficient, data-driven pipeline bridging experimental data, mathematical modeling, and statistical analysis. The application to the influenza infection data elucidates the potentials of our pipeline in providing valuable insights into systematic modeling of complicated biological processes.

  3. Bayesian estimation of differential transcript usage from RNA-seq data.

    PubMed

    Papastamoulis, Panagiotis; Rattray, Magnus

    2017-11-27

    Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend to the DTU context the use of cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the False Discovery Rate.
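
    The distinction between overall differential expression and differential transcript usage is easy to see numerically. The sketch below computes the within-gene proportion of each transcript from its counts; the function name and the transcript labels in the usage note are hypothetical, and neither cjBitSeq nor BayesDRIMSeq is being reproduced here.

```python
def transcript_usage(counts):
    """Return each transcript's share of its gene's total expression.

    counts: dict of transcript id -> count for a single gene in one condition.
    """
    total = sum(counts.values())
    return {tx: c / total for tx, c in counts.items()}
```

    For example, counts of 10 and 30 in one condition and 20 and 60 in another give identical usage (0.25 and 0.75) even though overall expression doubles, so the gene is differentially expressed but shows no DTU. Counts of 10 and 30 versus 30 and 10 show the opposite: the same total with reversed usage.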

  4. Multiparametric Determination of Radiation Risk

    NASA Technical Reports Server (NTRS)

    Richmond, Robert C.

    2003-01-01

    Predicting the risk of human cancer following exposure to ionizing space radiation is challenging, in part because uncertainties about low-dose distribution amongst cells, about unknown and potentially synergistic effects of microgravity upon cellular protein expression, and about the processing of dose-related damage within cells to produce rare and late-appearing malignant transformation degrade the confidence of cancer risk estimates. The NASA-specific responsibility to estimate the risks of radiogenic cancer in a limited number of astronauts is not amenable to epidemiologic study, thereby increasing this challenge. Developing adequately sensitive cellular biodosimeters that simultaneously report 1) the quantity of absorbed dose after exposure to ionizing radiation, 2) the quality of radiation delivering that dose, and 3) the risk of developing malignant transformation by the cells absorbing that dose could be useful for resolving these challenges. Use of a multiparametric cellular biodosimeter is suggested using analyses of gene expression and protein expression whereby large datasets of cellular response to radiation-induced damage are obtained and analyzed for expression profiles correlated with established end points and molecular markers predictive for cancer risk. Analytical techniques of genomics and proteomics may be used to establish dose-dependency of multiple gene and protein expressions resulting from radiation-induced cellular damage. Furthermore, gene and protein expression from cells in microgravity are known to be altered relative to cells grown on the ground at 1g. Therefore, hypotheses are proposed that 1) macromolecular expression caused by radiation-induced damage in cells in microgravity may be different than on the ground, and 2) different patterns of macromolecular expression in microgravity may alter human radiogenic cancer risk relative to radiation exposure on Earth. A new paradigm is accordingly suggested as a national database wherein genomic and proteomic datasets are registered and interrogated in order to provide statistically significant dose-dependent risk estimation of radiogenic cancer in astronauts.

  5. GC-Content Normalization for RNA-Seq Data

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
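
    One plausible form of a within-lane, gene-level GC-content correction is sketched below: genes are sorted by GC content, split into equally sized bins, and the counts in each bin are rescaled so that the bin median matches the overall median. The abstract does not spell out its three approaches, so this is an illustrative assumption and not the EDASeq implementation; the function name and the choice of ten bins are hypothetical.

```python
import statistics


def gc_median_normalize(counts, gc, n_bins=10):
    """Rescale gene counts from one lane so GC bins share a common median.

    counts: read count per gene; gc: GC fraction per gene (same order).
    """
    order = sorted(range(len(counts)), key=lambda i: gc[i])
    overall_median = statistics.median(counts)
    bin_size = -(-len(counts) // n_bins)  # ceiling division
    normalized = [0.0] * len(counts)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        bin_median = statistics.median([counts[i] for i in idx])
        # Leave a bin unscaled if its median is zero, to avoid dividing by zero.
        scale = overall_median / bin_median if bin_median > 0 else 1.0
        for i in idx:
            normalized[i] = counts[i] * scale
    return normalized
```

    Because the correction is computed per lane, a sample-specific GC effect is removed before lanes are compared, which matches the within-lane then between-lane order described in the conclusions.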

  6. Inference for High-dimensional Differential Correlation Matrices.

    PubMed

    Cai, T Tony; Zhang, Anru

    2016-01-01

    Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

  7. IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing

    PubMed Central

    Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai

    2017-01-01

    Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. PMID:27899656

  8. Use of gene-expression programming to estimate Manning’s roughness coefficient for high gradient streams

    USGS Publications Warehouse

    Azamathulla, H. Md.; Jarrett, Robert D.

    2013-01-01

    Manning’s roughness coefficient (n) has been widely used in the estimation of flood discharges or depths of flow in natural channels. Therefore, the selection of appropriate Manning’s n values is of paramount importance for hydraulic engineers and hydrologists and requires considerable experience, although extensive guidelines are available. Generally, the largest source of error in post-flood estimates (termed indirect measurements) is due to estimates of Manning’s n values, particularly when there has been minimal field verification of flow resistance. This emphasizes the need to improve methods for estimating n values. The objective of this study was to develop a soft computing model for the estimation of Manning’s n values using 75 discharge measurements on 21 high gradient streams in Colorado, USA. The data are from high gradient (S > 0.002 m/m), cobble- and boulder-bed streams for within bank flows. This study presents Gene-Expression Programming (GEP), an extension of Genetic Programming (GP), as an improved approach to estimate Manning’s roughness coefficient for high gradient streams. This study used field data to assess the potential of GEP to estimate Manning’s n values. GEP is a search technique that automatically simplifies genetic programs during an evolutionary process (or evolves) to obtain the most robust computer program (e.g., simplify mathematical expressions, decision trees, polynomial constructs, and logical expressions). Field measurements collected by Jarrett (J Hydraulic Eng ASCE 110: 1519–1539, 1984) were used to train the GEP network and evolve programs. The developed network and evolved programs were validated by using observations that were not involved in training. GEP and ANN-RBF (artificial neural network-radial basis function) models were found to be substantially more effective (e.g., R^2 for testing/validation of GEP and ANN-RBF is 0.745 and 0.65, respectively) than Jarrett’s (J Hydraulic Eng ASCE 110: 1519–1539, 1984) equation (R^2 for testing/validation equals 0.58) in predicting Manning’s n.
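
    For context, the quantity being predicted comes from Manning's equation, which in SI units relates discharge Q to flow area A, hydraulic radius R and slope S as Q = (1/n) A R^(2/3) S^(1/2). The sketch below rearranges it to back-calculate n from a single discharge measurement. The function and parameter names are hypothetical, and this is the standard equation only, not the GEP model evolved in the study.

```python
def manning_n(discharge_m3s, area_m2, hydraulic_radius_m, slope):
    """Back-calculate Manning's n from a measured discharge (SI units)."""
    return (area_m2 * hydraulic_radius_m ** (2 / 3) * slope ** 0.5
            / discharge_m3s)
```

    Values obtained this way from field measurements serve as the targets against which an empirical predictor of n, such as the GEP or ANN-RBF models above, can be trained and validated.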

  9. Regulation of gene expression in the mammalian eye and its relevance to eye disease

    PubMed Central

    Scheetz, Todd E.; Kim, Kwang-Youn A.; Swiderski, Ruth E.; Philp, Alisdair R.; Braun, Terry A.; Knudtson, Kevin L.; Dorrance, Anne M.; DiBona, Gerald F.; Huang, Jian; Casavant, Thomas L.; Sheffield, Val C.; Stone, Edwin M.

    2006-01-01

    We used expression quantitative trait locus mapping in the laboratory rat (Rattus norvegicus) to gain a broad perspective of gene regulation in the mammalian eye and to identify genetic variation relevant to human eye disease. Of >31,000 gene probes represented on an Affymetrix expression microarray, 18,976 exhibited sufficient signal for reliable analysis and at least 2-fold variation in expression among 120 F2 rats generated from an SR/JrHsd × SHRSP intercross. Genome-wide linkage analysis with 399 genetic markers revealed significant linkage with at least one marker for 1,300 probes (α = 0.001; estimated empirical false discovery rate = 2%). Both contiguous and noncontiguous loci were found to be important in regulating mammalian eye gene expression. We investigated one locus of each type in greater detail and identified putative transcription-altering variations in both cases. We found an inserted cREL binding sequence in the 5′ flanking sequence of the Abca4 gene associated with an increased expression level of that gene, and we found a mutation of the gene encoding thyroid hormone receptor β2 associated with a decreased expression level of the gene encoding short-wavelength sensitive opsin (Opn1sw). In addition to these positional studies, we performed a pairwise analysis of gene expression to identify genes that are regulated in a coordinated manner and used this approach to validate two previously undescribed genes involved in the human disease Bardet–Biedl syndrome. These data and analytical approaches can be used to facilitate the discovery of additional genes and regulatory elements involved in human eye disease. PMID:16983098

  10. Transcriptional profiles of bovine in vivo pre-implantation development.

    PubMed

    Jiang, Zongliang; Sun, Jiangwen; Dong, Hong; Luo, Oscar; Zheng, Xinbao; Obergfell, Craig; Tang, Yong; Bi, Jinbo; O'Neill, Rachel; Ruan, Yijun; Chen, Jingbo; Tian, Xiuchun Cindy

    2014-09-04

    During mammalian pre-implantation embryonic development, dramatic and orchestrated changes occur in gene transcription. Identifying the complete set of changes was not possible until the development of next-generation sequencing technology. Here we report comprehensive transcriptome dynamics of single matured bovine oocytes and pre-implantation embryos developed in vivo. Surprisingly, more than half of the estimated 22,000 bovine genes (11,488 to 12,729, involved in more than 100 pathways) are expressed in oocytes and early embryos. Despite the similarity in the total numbers of genes expressed across stages, the nature of the expressed genes is dramatically different. A total of 2,845 genes were differentially expressed among different stages, of which the largest change was observed between the 4- and 8-cell stages, demonstrating that the bovine embryonic genome is activated at this transition. Additionally, 774 genes were identified as only expressed/highly enriched in particular stages of development, suggesting their stage-specific roles in embryogenesis. Using weighted gene co-expression network analysis, we found 12 stage-specific modules of co-expressed genes that can be used to represent the corresponding stage of development. Furthermore, we identified conserved key members (or hub genes) of the bovine expressed gene networks. Their vast association with other embryonic genes suggests that they may have important regulatory roles in embryo development; yet, the majority of the hub genes are relatively unknown/under-studied in embryos. We also conducted the first comparison of embryonic expression profiles across three mammalian species, human, mouse and bovine, for which RNA-seq data are available. We found that the three species share more maternally deposited genes than embryonic genome activated genes. 
More importantly, there are more similarities in embryonic transcriptomes between bovine and humans than between humans and mice, demonstrating that bovine embryos are better models for human embryonic development. This study provides a comprehensive examination of gene activities in bovine embryos and identified little-known potential master regulators of pre-implantation development.
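    The weighted gene co-expression network analysis mentioned in this record builds on a soft-thresholded correlation between gene profiles. The following is a minimal sketch of that one step only, assuming an unsigned network and a power of 6 (a commonly used default, not a value taken from the abstract); the gene names and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes both profiles have non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(i, j)| ** beta for each gene pair."""
    genes = sorted(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes for h in genes if g < h
    }

# Hypothetical expression values across five developmental stages.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to geneA
}
adjacency = soft_threshold_adjacency(profiles)
```

    Raising the absolute correlation to a power keeps strongly co-expressed pairs close to 1 and pushes weak correlations toward 0. Module detection, which the full method performs on top of this adjacency, is not shown.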

  11. In Vivo Regulation of Human Skeletal Muscle Gene Expression by Thyroid Hormone

    PubMed Central

    Clément, Karine; Viguerie, Nathalie; Diehn, Maximilian; Alizadeh, Ash; Barbe, Pierre; Thalamas, Claire; Storey, John D.; Brown, Patrick O.; Barsh, Greg S.; Langin, Dominique

    2002-01-01

    Thyroid hormones are key regulators of metabolism that modulate transcription via nuclear receptors. Hyperthyroidism is associated with increased metabolic rate, protein breakdown, and weight loss. Although the molecular actions of thyroid hormones have been studied thoroughly, their pleiotropic effects are mediated by complex changes in expression of an unknown number of target genes. Here, we measured patterns of skeletal muscle gene expression in five healthy men treated for 14 days with 75 μg of triiodothyronine, using 24,000 cDNA element microarrays. To analyze the data, we used a new statistical method that identifies significant changes in expression and estimates the false discovery rate. The 381 up-regulated genes were involved in a wide range of cellular functions including transcriptional control, mRNA maturation, protein turnover, signal transduction, cellular trafficking, and energy metabolism. Only two genes were down-regulated. Most of the genes are novel targets of thyroid hormone. Cluster analysis of triiodothyronine-regulated gene expression among 19 different human tissues or cell lines revealed sets of coregulated genes that serve similar biologic functions. These results define molecular signatures that help to understand the physiology and pathophysiology of thyroid hormone action. [The list of transcripts corresponding to up-regulated and down-regulated genes is available as a web supplement at http://www.genome.org.] PMID:11827947

  12. Evidence of sex-bias in gene expression in the brain transcriptome of two populations of rainbow trout (Oncorhynchus mykiss) with divergent life histories.

    PubMed

    Hale, Matthew C; McKinney, Garrett J; Thrower, Frank P; Nichols, Krista M

    2018-01-01

    Sex-bias in gene expression is a mechanism that can generate phenotypic variance between the sexes; however, relatively little is known about how patterns of sex-bias vary during development, and how variable sex-bias is between different populations. To that end, we measured sex-bias in gene expression in the brain transcriptome of rainbow trout (Oncorhynchus mykiss) during the first two years of development. Our sampling ranged from the fry stage through to when O. mykiss either migrate to the ocean or remain resident and undergo sexual maturation. Samples came from two F1 lines: one from migratory steelhead trout and one from resident rainbow trout. All samples were reared in a common garden environment and RNA sequencing (RNA-seq) was used to estimate patterns of gene expression. A total of 1,716 (4.6% of total) genes showed evidence of sex-bias in gene expression in at least one time point. The majority (96.7%) of sex-biased genes were differentially expressed during the second year of development, indicating that patterns of sex-bias in expression are tied to key developmental events, such as migration and sexual maturation. Mapping of differentially expressed genes to the O. mykiss genome revealed that the X chromosome is enriched for female upregulated genes, and this may indicate a lack of dosage compensation in rainbow trout. There were many more sex-biased genes in the migratory line than the resident line, suggesting differences in patterns of gene expression in the brain between populations subjected to different forces of selection. Overall, our results suggest that there is considerable variation in the extent and identity of genes exhibiting sex-bias during the first two years of life. These differentially expressed genes may be connected to developmental differences between the sexes, and/or between adopting a resident or migratory life history.

  13. Changes in gene expression in PBMCs profiles of PPARα target genes in obese and non-obese individuals during fasting.

    PubMed

    Felicidade, Ingrid; Marcarini, Juliana Cristina; Carreira, Clísia Mara; Amarante, Marla Karine; Afman, Lydia A; Mantovani, Mário Sérgio; Ribeiro, Lúcia Regina

    2015-01-01

    The prevalence of obesity has risen dramatically and the World Health Organization estimates that 700 million people will be obese worldwide by 2015. Approximately 50% of the Brazilian population above 20 years of age is overweight, and 16% is obese. This study aimed to evaluate the differences in the expression of PPARα target genes in human peripheral blood mononuclear cells (PBMCs) and free fatty acids (FFA) in obese and non-obese individuals after 24 h of fasting. We first presented evidence that Brazilian people exhibit expression changes in PPARα target genes in PBMCs under fasting conditions. Q-PCR was utilized to assess the mRNA expression levels of target genes. In both groups, the FFA concentrations increased significantly after 24 h of fasting. The basal FFA mean concentration was two-fold higher in the obese group compared with the non-obese group. After fasting, all genes evaluated in this study showed increased expression levels compared with basal expression in both groups. However, our results reveal no differences in gene expression between the obese and non-obese groups. More studies are necessary to precisely delineate the associated mechanisms, particularly those that include groups with different degrees of obesity and patients with diabetes mellitus type 2, because the expression of the main genes that are involved in β-oxidation and glucose level maintenance is affected by these factors. © 2014 S. Karger AG, Basel.

  14. Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma

    PubMed Central

    Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

    2007-01-01

    Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825

  15. Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma.

    PubMed

    Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

    2007-12-30

    Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis.

  16. Short communication: expression and alternative splicing of POU1F1 pathway genes in preimplantation bovine embryos.

    PubMed

    Laporta, J; Driver, A; Khatib, H

    2011-08-01

    Early embryo loss is a major contributing factor to cow infertility, and 70 to 80% of this loss occurs between d 8 and 16 postfertilization. However, little is known about the molecular mechanisms and the nature of genes involved in normal and abnormal embryonic development. Moreover, information is limited on the contributions of the genomes of dams and of embryos to the development and survival of preimplantation embryos. We hypothesized that proper gene expression level in the developing embryo is essential for embryo survival and pregnancy success. As such, the characterization of expression profiles in early embryos could lead to a better understanding of the mechanisms involved in normal and abnormal embryo development. To test this hypothesis, 2 d-8 embryo populations (degenerate embryos and blastocysts) that differed in morphology and developmental status were investigated. Expression levels of POU1F1 pathway genes were estimated in 4 sets of biological replicate pools of degenerate embryos and blastocysts. The OPN and STAT5A genes were found to be upregulated in degenerate embryos compared with blastocysts, whereas STAT5B showed similar expression levels in both embryo groups. Analysis of splice variants of OPN and STAT5A revealed expression patterns different from the total expression values of these genes. As such, measuring expression of individual transcripts should be considered in gene expression studies. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  17. Comparative study of joint analysis of microarray gene expression data in survival prediction and risk assessment of breast cancer patients

    PubMed Central

    2016-01-01

    Microarray gene expression data sets are jointly analyzed to increase statistical power. They could either be merged together or analyzed by meta-analysis. For a given ensemble of data sets, it cannot be foreseen which of these paradigms, merging or meta-analysis, works better. In this article, three joint analysis methods, Z-score normalization, ComBat and the inverse normal method (meta-analysis) were selected for survival prognosis and risk assessment of breast cancer patients. The methods were applied to eight microarray gene expression data sets, totaling 1324 patients with two clinical endpoints, overall survival and relapse-free survival. The performance derived from the joint analysis methods was evaluated using Cox regression for survival analysis and independent validation used as bias estimation. Overall, Z-score normalization had a better performance than ComBat and meta-analysis. Higher Area Under the Receiver Operating Characteristic curve and hazard ratio were also obtained when independent validation was used as bias estimation. With a lower time and memory complexity, Z-score normalization is a simple method for joint analysis of microarray gene expression data sets. The derived findings suggest further assessment of this method in future survival prediction and cancer classification applications. PMID:26504096
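    Two of the joint-analysis strategies named in this record can be illustrated compactly. The sketch below is a generic textbook version, not the authors' pipeline: `zscore` standardizes one gene within a single data set before merging, and `inverse_normal` combines one-sided p-values from separate data sets (the Stouffer form of the inverse normal method). The function names, the equal default weights and the example numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

def zscore(values):
    """Standardize one gene's expression values within a single data set."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant profile")
    return [(v - m) / s for v in values]

def inverse_normal(p_values, weights=None):
    """Combine one-sided p-values (each strictly between 0 and 1) across data sets."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weight per data set
    z = [nd.inv_cdf(1.0 - p) for p in p_values]
    combined = sum(w * zi for w, zi in zip(weights, z)) / sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(combined)

# Hypothetical values: one gene in one data set, and p-values from two data sets.
standardized = zscore([1.0, 2.0, 3.0])
combined_p = inverse_normal([0.05, 0.05])
```

    Merging after Z-score normalization puts every data set on a common scale, whereas the inverse normal method leaves the data sets separate and pools only their test results.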

  18. Duplication and diversification of the LEAFY HULL STERILE1 and Oryza sativa MADS5 SEPALLATA lineages in graminoid Poales

    PubMed Central

    2012-01-01

    Background Gene duplication and the subsequent divergence in function of the resulting paralogs via subfunctionalization and/or neofunctionalization is hypothesized to have played a major role in the evolution of plant form. The LEAFY HULL STERILE1 (LHS1) SEPALLATA (SEP) genes have been linked with the origin and diversification of the grass spikelet, but it is uncertain 1) when the duplication event that produced the LHS1 clade and its paralogous lineage Oryza sativa MADS5 (OSM5) occurred, and 2) how changes in gene structure and/or expression might have contributed to subfunctionalization and/or neofunctionalization in the two lineages. Methods Phylogenetic relationships among 84 SEP genes were estimated using Bayesian methods. RNA expression patterns were inferred using in situ hybridization. The patterns of protein sequence and RNA expression evolution were reconstructed using maximum parsimony (MP) and maximum likelihood (ML) methods, respectively. Results Phylogenetic analyses mapped the LHS1/OSM5 duplication event to the base of the grass family. MP character reconstructions estimated a change from cytosine to thymine in the first codon position of the first amino acid after the Zea mays MADS3 (ZMM3) domain converted a glutamine to a stop codon in the OSM5 ancestor following the LHS1/OSM5 duplication event. RNA expression analyses of OSM5 co-orthologs in Avena sativa, Chasmanthium latifolium, Hordeum vulgare, Pennisetum glaucum, and Sorghum bicolor followed by ML reconstructions of these data and previously published analyses estimated a complex pattern of gain and loss of LHS1 and OSM5 expression in different floral organs and different flowers within the spikelet or inflorescence. Conclusions Previous authors have reported that rice OSM5 and LHS1 proteins have different interaction partners indicating that the truncation of OSM5 following the LHS1/OSM5 duplication event has resulted in both partitioned and potentially novel gene functions. 
The complex pattern of OSM5 and LHS1 expression evolution is not consistent with a simple subfunctionalization model following the gene duplication event, but there is evidence of recent partitioning of OSM5 and LHS1 expression within different floral organs of A. sativa, C. latifolium, P. glaucum and S. bicolor, and between the upper and lower florets of the two-flowered maize spikelet. PMID:22340849

  19. In Utero Fine Particle Air Pollution and Placental Expression of Genes in the Brain-Derived Neurotrophic Factor Signaling Pathway: An ENVIRONAGE Birth Cohort Study.

    PubMed

    Saenen, Nelly D; Plusquin, Michelle; Bijnens, Esmée; Janssen, Bram G; Gyselaers, Wilfried; Cox, Bianca; Fierens, Frans; Molenberghs, Geert; Penders, Joris; Vrijens, Karen; De Boever, Patrick; Nawrot, Tim S

    2015-08-01

    Developmental processes in the placenta and the fetal brain are shaped by the same biological signals. Recent evidence suggests that adaptive responses of the placenta to the maternal environment may influence central nervous system development. We studied the association between in utero exposure to fine particle air pollution with a diameter ≤ 2.5 μm (PM2.5) and placental expression of genes implicated in neural development. Expression of 10 target genes in the brain-derived neurotrophic factor (BDNF) signaling pathway was quantified in placental tissue of 90 mother-infant pairs from the ENVIRONAGE birth cohort using quantitative real-time polymerase chain reaction. Trimester-specific PM2.5 exposure levels were estimated for each mother's home address using a spatiotemporal model. Mixed-effects models were used to evaluate the association between the target genes and PM2.5 exposure measured in different time windows of pregnancy. A 5-μg/m3 increase in residential PM2.5 exposure during the first trimester of pregnancy was associated with a 15.9% decrease [95% confidence interval (CI): -28.7, -3.2%, p = 0.015] in expression of placental BDNF at birth. The corresponding estimate for synapsin 1 (SYN1) was a 24.3% decrease (95% CI: -42.8, -5.8%, p = 0.011). Placental expression of BDNF and SYN1, two genes implicated in normal neurodevelopmental trajectories, decreased with increasing in utero exposure to PM2.5. Future studies are needed to confirm our findings and evaluate the potential relevance of associations between PM2.5 and placental expression of BDNF and SYN1 on neurodevelopment. We provide the first molecular epidemiological evidence concerning associations between in utero fine particle air pollution exposure and the expression of genes that may influence neurodevelopmental processes.

  20. Quantification of differential gene expression by multiplexed targeted resequencing of cDNA

    PubMed Central

    Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.

    2017-01-01

    Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
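    The idea of counting individual molecules and turning the counts into a differential-expression estimate can be sketched as follows. This is an illustration under stated assumptions rather than the published smMIP pipeline: each read is assumed to arrive as a (probe, molecular tag) pair, reads sharing a tag are treated as copies of one molecule, and expression is normalized to a hypothetical reference probe named `REF`.

```python
from collections import defaultdict
from math import log2

def count_molecules(reads):
    """Count distinct molecular tags per probe; repeated tags collapse to one molecule."""
    tags = defaultdict(set)
    for probe, tag in reads:
        tags[probe].add(tag)
    return {probe: len(seen) for probe, seen in tags.items()}

def log2_fold_change(case, control, target, reference, pseudocount=0.5):
    """log2 of the target/reference molecule ratio in case relative to control.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    def ratio(counts):
        return (counts.get(target, 0) + pseudocount) / (counts.get(reference, 0) + pseudocount)
    return log2(ratio(case) / ratio(control))

# Hypothetical reads; the duplicated ("GENE1", "AAT") read is a PCR copy.
case_reads = [("GENE1", "AAT"), ("GENE1", "AAT"), ("GENE1", "CGT"),
              ("GENE1", "GGA"), ("GENE1", "TTC"), ("REF", "ACG"), ("REF", "TTA")]
control_reads = [("GENE1", "AAT"), ("GENE1", "CGG"), ("REF", "ACG"), ("REF", "GTA")]

case_counts = count_molecules(case_reads)
control_counts = count_molecules(control_reads)
fold_change = log2_fold_change(case_counts, control_counts, "GENE1", "REF")
```

    Collapsing reads by tag before counting is what removes amplification duplicates from the estimate; raw read counts would overstate expression of heavily amplified targets.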

  1. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. 
Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines. Conclusions Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer. PMID:21261984

  2. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

    PubMed Central

    Racle, Julien; de Jonge, Kaat; Baumgaertner, Petra; Speiser, Daniel E

    2017-01-01

    Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org). PMID:29130882
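    The renormalization step mentioned in this record can be shown with a small worked example. The sketch assumes that a deconvolution step has already produced the share of mRNA contributed by each cell type, and that the relative mRNA content per cell is known; the function name and all numbers are hypothetical, and the handling of uncharacterized cell types is not modeled.

```python
def mrna_to_cell_fractions(mrna_proportions, mrna_per_cell):
    """Convert mRNA proportions into cell fractions.

    Each proportion is divided by that cell type's mRNA content per cell,
    then the results are rescaled to sum to 1.
    """
    scaled = {cell: mrna_proportions[cell] / mrna_per_cell[cell] for cell in mrna_proportions}
    total = sum(scaled.values())
    return {cell: value / total for cell, value in scaled.items()}

# Hypothetical input: cancer cells carry twice as much mRNA per cell as T cells.
fractions = mrna_to_cell_fractions(
    {"T_cell": 0.5, "cancer": 0.5},
    {"T_cell": 1.0, "cancer": 2.0},
)
```

    In this example an even split of mRNA corresponds to twice as many T cells as cancer cells, which is why skipping the renormalization would bias the estimated cell fractions toward mRNA-rich cell types.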

  3. Intraindividual dynamics of transcriptome and genome-wide stability of DNA methylation

    PubMed Central

    Furukawa, Ryohei; Hachiya, Tsuyoshi; Ohmomo, Hideki; Shiwa, Yuh; Ono, Kanako; Suzuki, Sadafumi; Satoh, Mamoru; Hitomi, Jiro; Sobue, Kenji; Shimizu, Atsushi

    2016-01-01

    Cytosine methylation at CpG dinucleotides is an epigenetic mechanism that affects the gene expression profiles responsible for the functional differences in various cells and tissues. Although gene expression patterns are dynamically altered in response to various stimuli, the intraindividual dynamics of DNA methylation in human cells are yet to be fully understood. Here, we investigated the extent to which DNA methylation contributes to the dynamics of gene expression by collecting 24 blood samples from two individuals over a period of 3 months. Transcriptome and methylome association analyses revealed that only ~2% of dynamic changes in gene expression could be explained by the intraindividual variation of DNA methylation levels in peripheral blood mononuclear cells and purified monocytes. These results showed that DNA methylation levels remain stable for at least several months, suggesting that disease-associated DNA methylation markers are useful for estimating the risk of disease manifestation. PMID:27192970

  4. Prediction of the contact sensitizing potential of chemicals using analysis of gene expression changes in human THP-1 monocytes.

    PubMed

    Arkusz, Joanna; Stępnik, Maciej; Sobala, Wojciech; Dastych, Jarosław

    2010-11-10

    The aim of this study was to find differentially regulated genes in THP-1 monocytic cells exposed to sensitizers and nonsensitizers and to investigate if such genes could be reliable markers for an in vitro predictive method for the identification of skin sensitizing chemicals. Changes in expression of 35 genes in the THP-1 cell line following treatment with chemicals of different sensitizing potential (from nonsensitizers to extreme sensitizers) were assessed using real-time PCR. Verification of 13 candidate genes by testing a large number of chemicals (an additional 22 sensitizers and 8 nonsensitizers) revealed that prediction of contact sensitization potential was possible based on evaluation of changes in three genes: IL8, HMOX1 and PAIMP1. In total, changes in expression of these genes allowed correct detection of sensitization potential of 21 out of 27 (78%) test sensitizers. The gene expression levels inside potency groups varied and did not allow estimation of sensitization potency of test chemicals. Results of this study indicate that evaluation of changes in expression of proposed biomarkers in THP-1 cells could be a valuable model for preliminary screening of chemicals to discriminate an appreciable majority of sensitizers from nonsensitizers. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  5. Concordance of transcriptional and apical benchmark dose levels for conazole-induced liver effects in mice.

    PubMed

    Bhat, Virunya S; Hester, Susan D; Nesnow, Stephen; Eastmond, David A

    2013-11-01

    The ability to anchor chemical class-based gene expression changes to phenotypic lesions and to describe these changes as a function of dose and time informs mode-of-action determinations and improves quantitative risk assessments. Previous global expression profiling identified a 330-probe cluster differentially expressed and commonly responsive to 3 hepatotumorigenic conazoles (cyproconazole, epoxiconazole, and propiconazole) at 30 days. Extended to 2 more conazoles (triadimefon and myclobutanil), the present assessment encompasses 4 tumorigenic and 1 nontumorigenic conazole. Transcriptional benchmark dose levels (BMDL(T)) were estimated for a subset of the cluster with dose-responsive behavior and a ≥ 5-fold increase or decrease in signal intensity at the highest dose. These genes primarily encompassed CAR/RXR activation, P450 metabolism, liver hypertrophy- glutathione depletion, LPS/IL-1-mediated inhibition of RXR, and NRF2-mediated oxidative stress pathways. Median BMDL(T) estimates from the subset were concordant (within a factor of 2.4) with apical benchmark doses (BMDL(A)) for increased liver weight at 30 days for the 5 conazoles. The 30-day median BMDL(T) estimates were within one-half order of magnitude of the chronic BMDL(A) for hepatocellular tumors. Potency differences seen in the dose-responsive transcription of certain phase II metabolism, bile acid detoxification, and lipid oxidation genes mirrored each conazole's tumorigenic potency. The 30-day BMDL(T) corresponded to tumorigenic potency on a milligram per kilogram day basis with cyproconazole > epoxiconazole > propiconazole > triadimefon > myclobutanil (nontumorigenic). These results support the utility of measuring short-term gene expression changes to inform quantitative risk assessments from long-term exposures.

  6. DNetDB: The human disease network database based on dysfunctional regulation mechanism.

    PubMed

    Yang, Jing; Wu, Su-Juan; Yang, Shao-You; Peng, Jia-Wei; Wang, Shi-Nuo; Wang, Fu-Yan; Song, Yu-Xing; Qi, Ting; Li, Yi-Xue; Li, Yuan-Yuan

    2016-05-21

    Disease similarity studies provide new insights into disease taxonomy and pathogenesis, which play a guiding role in diagnosis and treatment. Early studies were limited to estimating disease similarities based on clinical manifestations, disease-related genes, medical vocabulary concepts or registry data, which were inevitably biased toward well-studied diseases and offered little chance of discovering novel disease relationships. By contrast, genome-scale expression data give us another angle to address this problem, since simultaneous measurement of the expression of thousands of genes allows for the exploration of gene transcriptional regulation, which is believed to be crucial to biological functions. Although methods based on differential expression analysis have the potential to explore new disease relationships, it is difficult for them to unravel the upstream dysregulation mechanisms of diseases. We therefore estimated disease similarities based on gene expression data by using differential coexpression analysis, a recently emerging method that has been shown to have greater potential to capture dysfunctional regulation mechanisms than differential expression analysis. A total of 1,326 disease relationships among 108 diseases were identified, and the relevant information constituted the human disease network database (DNetDB). Benefiting from the use of differential coexpression analysis, the potential common dysfunctional regulation mechanisms shared by disease pairs (i.e. disease relationships) were extracted and presented. Statistical indicators, common disease-related genes and drugs shared by disease pairs were also included in DNetDB. In total, 1,326 disease relationships among 108 diseases, 5,598 pathways, 7,357 disease-related genes and 342 disease drugs are recorded in DNetDB, among which 3,762 genes and 148 drugs are shared by at least two diseases. 
DNetDB is the first database focusing on disease similarity from the viewpoint of gene regulation mechanism. It provides an easy-to-use web interface to search and browse the disease relationships and thus helps to systematically investigate etiology and pathogenesis, perform drug repositioning, and design novel therapeutic interventions. Database URL: http://app.scbit.org/DNetDB/

  7. Budget impact analysis of gene expression tests to aid therapy decisions for breast cancer patients in Germany.

    PubMed

    Lux, M P; Nabieva, N; Hildebrandt, T; Rebscher, H; Kümmel, S; Blohmer, J-U; Schrauder, M G

    2018-02-01

    Many women with early-stage, hormone receptor-positive breast cancer may not benefit from adjuvant chemotherapy. Gene expression tests can reduce chemotherapy over- and undertreatment by providing prognostic information on the likelihood of recurrence and, with Oncotype DX, predictive information on chemotherapy benefit. These tests are currently not reimbursed by German healthcare payers. An analysis was conducted to evaluate the budget impact of gene expression tests in Germany. Costs of gene expression tests and medical and non-medical costs associated with treatment were assessed from healthcare payer and societal perspectives. Costs were estimated from data collected at a university hospital and were combined with decision impact data for Oncotype DX, MammaPrint, Prosigna and EndoPredict (EPclin). Changes in chemotherapy use and budget impact were evaluated over 1 year for 20,000 women. Chemotherapy was associated with substantial annual costs of EUR 19,003 and EUR 84,412 per therapy from the healthcare payer and societal perspective, respectively. Compared with standard care, only Oncotype DX was associated with cost savings to healthcare payers and society (EUR 5.9 million and EUR 253 million, respectively). Scenario analysis showed that both women at high clinical but low genomic risk and low clinical but high genomic risk were important contributors to costs. Oncotype DX was the only gene expression test that was estimated to reduce costs versus standard care in Germany. The reimbursement of Oncotype DX testing in standard clinical practice in Germany should be considered. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays.

    PubMed

    McLachlan, G J; Bean, R W; Jones, L Ben-Tovim

    2006-07-01

    An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or with more specific assumptions are computationally intensive. By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.
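The two-component idea in this abstract can be illustrated with a minimal sketch. It assumes a fixed theoretical N(0,1) null, a single normal alternative component, and a plain EM fit with arbitrary starting values; these choices and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posterior_null(z, pi0, mu1, sigma1):
    """Posterior probability that a gene with z-score z belongs to the null component."""
    null = pi0 * normal_pdf(z)
    alt = (1 - pi0) * normal_pdf(z, mu1, sigma1)
    return null / (null + alt)


def fit_two_component_mixture(z_scores, n_iter=200):
    """EM for pi0*N(0,1) + (1-pi0)*N(mu1, sigma1^2); returns (pi0, mu1, sigma1)."""
    pi0, mu1, sigma1 = 0.9, 2.0, 1.0  # arbitrary starting values (assumption)
    n = len(z_scores)
    for _ in range(n_iter):
        # E-step: posterior null probability of each gene under current parameters
        tau0 = [posterior_null(z, pi0, mu1, sigma1) for z in z_scores]
        # M-step: update the null proportion and the alternative component
        w = [1 - t for t in tau0]
        w_sum = sum(w)
        pi0 = sum(tau0) / n
        mu1 = sum(wi * z for wi, z in zip(w, z_scores)) / w_sum
        var1 = sum(wi * (z - mu1) ** 2 for wi, z in zip(w, z_scores)) / w_sum
        sigma1 = max(math.sqrt(var1), 1e-3)  # guard against a collapsing component
    return pi0, mu1, sigma1


def simulate_z_scores(n_null=900, n_alt=100, shift=3.0, seed=0):
    """Hypothetical test data: null genes ~ N(0,1), non-null genes ~ N(shift,1)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n_null)] + [rng.gauss(shift, 1) for _ in range(n_alt)]
```

Calling `fit_two_component_mixture(simulate_z_scores())` and then `posterior_null(z, pi0, mu1, sigma1)` gives a per-gene posterior null probability; genes with a small value would be the candidates for differential expression.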

  9. Mapping eQTL Networks with Mixed Graphical Markov Models

    PubMed Central

    Tur, Inma; Roverato, Alberto; Castelo, Robert

    2014-01-01

    Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular, and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this article we approach this challenge with mixed graphical Markov models, higher-order conditional independences, and q-order correlation graphs. These models show that additive genetic effects propagate through the network as a function of gene–gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes. PMID:25271303

  10. An effective fuzzy kernel clustering analysis approach for gene expression data.

    PubMed

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters such as the cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than that of the other related clustering algorithms.

  11. Transcriptional dynamics of the developing sweet cherry (Prunus avium L.) fruit: sequencing, annotation and expression profiling of exocarp-associated genes

    PubMed Central

    Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz

    2014-01-01

    The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed dispersing fruit eaters, and has large economical relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., ‘Regina’), a fruit crop species with little public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop. PMID:26504533

  12. The combined expression patterns of Ikaros isoforms characterize different hematological tumor subtypes.

    PubMed

    Orozco, Carlos A; Acevedo, Andrés; Cortina, Lazaro; Cuellar, Gina E; Duarte, Mónica; Martín, Liliana; Mesa, Néstor M; Muñoz, Javier; Portilla, Carlos A; Quijano, Sandra M; Quintero, Guillermo; Rodriguez, Miriam; Saavedra, Carlos E; Groot, Helena; Torres, María M; López-Segura, Valeriano

    2013-01-01

    A variety of genetic alterations are considered hallmarks of cancer development and progression. The Ikaros gene family, encoding for key transcription factors in hematopoietic development, provides several examples as genetic defects in these genes are associated with the development of different types of leukemia. However, the complex patterns of expression of isoforms in Ikaros family genes have prevented their use as clinical markers. In this study, we propose the use of the expression profiles of the Ikaros isoforms to classify various hematological tumor diseases. We have standardized a quantitative PCR protocol to estimate the expression levels of the Ikaros gene exons. Our analysis reveals that these levels are associated with specific types of leukemia and we have found differences in the levels of expression relative to five interexonic Ikaros regions for all diseases studied. In conclusion, our method has allowed us to precisely discriminate between B-ALL, CLL and MM cases. Differences between the groups of lymphoid and myeloid pathologies were also identified in the same way.

  13. Differential retention of metabolic genes following whole-genome duplication.

    PubMed

    Gout, Jean-François; Duret, Laurent; Kahn, Daniel

    2009-05-01

    Classical studies in Metabolic Control Theory have shown that metabolic fluxes usually exhibit little sensitivity to changes in individual enzyme activity, yet remain sensitive to global changes of all enzymes in a pathway. Therefore, little selective pressure is expected on the dosage or expression of individual metabolic genes, yet entire pathways should still be constrained. However, a direct estimate of this selective pressure had not been evaluated. Whole-genome duplications (WGDs) offer a good opportunity to address this question by analyzing the fates of metabolic genes during the massive gene losses that follow. Here, we take advantage of the successive rounds of WGD that occurred in the Paramecium lineage. We show that metabolic genes exhibit different gene retention patterns than nonmetabolic genes. Contrary to what was expected for individual genes, metabolic genes appeared more retained than other genes after the recent WGD, which was best explained by selection for gene expression operating on entire pathways. Metabolic genes also tend to be less retained when present at high copy number before WGD, contrary to other genes that show a positive correlation between gene retention and preduplication copy number. This is rationalized on the basis of the classical concave relationship relating metabolic fluxes with enzyme expression.

  14. Web application for automatic prediction of gene translation elongation efficiency.

    PubMed

    Sokolov, Vladimir; Zuraev, Bulat; Lashin, Sergei; Matushkin, Yury

    2015-09-03

    Expression efficiency is one of the major characteristics describing genes in various modern investigations. Expression efficiency of genes is regulated at various stages: transcription, translation, posttranslational protein modification and others. In this study, a special EloE (Elongation Efficiency) web application is described. EloE sorts an organism's genes in descending order of their theoretical rate of the elongation stage of translation, based on the analysis of their nucleotide sequences. The theoretical data obtained correlate significantly with available experimental data on gene expression in various organisms. In addition, the program identifies preferential codons in the organism's genes and defines the distribution of potential secondary structure energy in the 5´ and 3´ regions of mRNA. EloE can be useful in preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Some results can be used, for instance, in other programs modeling artificial genetic structures in genetic engineering experiments.

  15. Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses.

    PubMed

    Xiong, Jie; Zhou, Tong

    2012-01-01

    An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers some types of experimental data, such as knockout data, knockdown data, time series data, etc. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among [Formula: see text] genes is decomposed into [Formula: see text] different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. Then these weights are normalized to reflect effort differences of regulating distinct genes. By appropriately choosing the parameters of the power law, we construct a 0-1 integer programming problem. By solving this problem, direct regulation genes for an arbitrary gene can be estimated. And, the normalized weight of a gene is modified, on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used in queuing the possibility of the existence of a corresponding direct regulation. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that estimation performances of the suggested algorithm can even outperform the best team. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances can be ranked third. Furthermore, the high precision of the obtained most reliable predictions shows the suggested algorithm may be helpful in guiding biological experiment designs.
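The two per-pair quantities named in this abstract, the sum of squared residuals and the Pearson correlation coefficient, can be computed with a short sketch. The abstract does not state how the two are combined into a weight, so the sketch stops at the raw quantities; the function names and the dictionary layout are illustrative assumptions.

```python
import math


def simple_regression_stats(x, y):
    """Regress target y on a single regulator x (with intercept).

    Returns (ssr, r): the sum of squared residuals and the Pearson correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return ssr, r


def pairwise_table(expr):
    """expr maps gene name -> list of expression values across experiments.

    Returns {(regulator, target): (ssr, r)} for every ordered pair of distinct genes.
    """
    return {
        (reg, tgt): simple_regression_stats(expr[reg], expr[tgt])
        for reg in expr
        for tgt in expr
        if reg != tgt
    }
```

For a one-regulator regression with intercept, the two quantities are linked by SSR = (1 - r^2) * SST, where SST is the total sum of squares of the target, so they carry the same ranking information up to the target's scale.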

  16. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
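The contrast between random and clustering-based fold construction can be sketched as follows. The cluster labels are assumed to come from some earlier clustering of the samples, which is not shown, and the greedy balancing rule is an illustrative choice rather than the procedure used in the study.

```python
import random


def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def clustered_folds(cluster_labels, k):
    """Clustering-based CV: whole clusters are assigned to folds, so a test fold
    never shares a cluster with the samples it is evaluated against."""
    clusters = {}
    for i, label in enumerate(cluster_labels):
        clusters.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest remaining cluster into the smallest fold.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds
```

With random folds, samples from the same experimental condition typically end up on both sides of the split; with clustered folds, each held-out fold is made of conditions the model has not seen.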

  17. Optimization of cDNA microarrays procedures using criteria that do not rely on external standards.

    PubMed

    Bruland, Torunn; Anderssen, Endre; Doseth, Berit; Bergum, Hallgeir; Beisvag, Vidar; Laegreid, Astrid

    2007-10-18

    The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Due to the lack of standard calibration samples such as are used in traditional chemical analysis it may be a problem to evaluate whether changes done to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process both in laboratory practices and in data processing using criteria that do not rely on external standards. We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression termed "high contrasts" (rat cell lines AR42J and NRK52E) compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background corrections methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards. Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish.
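Estimating an FDR from an empirical null, as described for the self-self hybridization above, can be sketched in a few lines. The estimator below (expected false calls under the null divided by observed calls) is a generic empirical-null estimate and is an assumption here; the abstract does not specify the exact formula used in HCSSM, and the function name is hypothetical.

```python
def estimate_fdr(contrast_stats, null_stats, threshold):
    """Empirical-null FDR estimate at a given absolute-statistic threshold.

    contrast_stats: per-gene statistics from the high-contrast comparison.
    null_stats: per-gene statistics from the self-self comparison (no true signal).
    """
    called = sum(1 for s in contrast_stats if abs(s) >= threshold)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries to count
    # Fraction of null statistics that would pass the same threshold.
    null_rate = sum(1 for s in null_stats if abs(s) >= threshold) / len(null_stats)
    expected_false = null_rate * len(contrast_stats)
    return min(1.0, expected_false / called)
```

A pipeline variant could then be scored by the number of genes called at a fixed estimated FDR, which is the kind of criterion the abstract describes maximizing.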

  18. Optimization of cDNA microarrays procedures using criteria that do not rely on external standards

    PubMed Central

    Bruland, Torunn; Anderssen, Endre; Doseth, Berit; Bergum, Hallgeir; Beisvag, Vidar; Lægreid, Astrid

    2007-01-01

    Background The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Due to the lack of standard calibration samples such as are used in traditional chemical analysis it may be a problem to evaluate whether changes done to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process both in laboratory practices and in data processing using criteria that do not rely on external standards. Results We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression termed "high contrasts" (rat cell lines AR42J and NRK52E) compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background corrections methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. Conclusion The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards. Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish. PMID:17949480

  19. A High-Throughput Data Mining of Single Nucleotide Polymorphisms in Coffea Species Expressed Sequence Tags Suggests Differential Homeologous Gene Expression in the Allotetraploid Coffea arabica

    PubMed Central

    Vidal, Ramon Oliveira; Mondego, Jorge Maurício Costa; Pot, David; Ambrósio, Alinne Batista; Andrade, Alan Carvalho; Pereira, Luiz Filipe Protasio; Colombo, Carlos Augusto; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães

    2010-01-01

    Polyploidization constitutes a common mode of evolution in flowering plants. This event provides the raw material for the divergence of function in homeologous genes, leading to phenotypic novelty that can contribute to the success of polyploids in nature or their selection for use in agriculture. Mounting evidence underlined the existence of homeologous expression biases in polyploid genomes; however, strategies to analyze such transcriptome regulation remained scarce. Important factors regarding homeologous expression biases remain to be explored, such as whether this phenomenon influences specific genes, how paralogs are affected by genome doubling, and what is the importance of the variability of homeologous expression bias to genotype differences. This study reports the expressed sequence tag assembly of the allopolyploid Coffea arabica and one of its direct ancestors, Coffea canephora. The assembly was used for the discovery of single nucleotide polymorphisms through the identification of high-quality discrepancies in overlapped expressed sequence tags and for gene expression information indirectly estimated by the transcript redundancy. Sequence diversity profiles were evaluated within C. arabica (Ca) and C. canephora (Cc) and used to deduce the transcript contribution of the Coffea eugenioides (Ce) ancestor. The assignment of the C. arabica haplotypes to the C. canephora (CaCc) or C. eugenioides (CaCe) ancestral genomes allowed us to analyze gene expression contributions of each subgenome in C. arabica. In silico data were validated by the quantitative polymerase chain reaction and allele-specific combination TaqMAMA-based method. The presence of differential expression of C. arabica homeologous genes and its implications in coffee gene expression, ontology, and physiology are discussed. PMID:20864545

  20. Inference for High-dimensional Differential Correlation Matrices

    PubMed Central

    Cai, T. Tony; Zhang, Anru

    2015-01-01

    Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380

  1. SiGN: large-scale gene network estimation environment for high performance computing.

    PubMed

    Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .

  2. A Computational Framework for Analyzing Stochasticity in Gene Expression

    PubMed Central

    Sherman, Marc S.; Cohen, Barak A.

    2014-01-01

    Stochastic fluctuations in gene expression give rise to distributions of protein levels across cell populations. Despite a mounting number of theoretical models explaining stochasticity in protein expression, we lack a robust, efficient, assumption-free approach for inferring the molecular mechanisms that underlie the shape of protein distributions. Here we propose a method for inferring sets of biochemical rate constants that govern chromatin modification, transcription, translation, and RNA and protein degradation from stochasticity in protein expression. We asked whether the rates of these underlying processes can be estimated accurately from protein expression distributions, in the absence of any limiting assumptions. To do this, we (1) derived analytical solutions for the first four moments of the protein distribution, (2) found that these four moments completely capture the shape of protein distributions, and (3) developed an efficient algorithm for inferring gene expression rate constants from the moments of protein distributions. Using this algorithm we find that most protein distributions are consistent with a large number of different biochemical rate constant sets. Despite this degeneracy, the solution space of rate constants almost always informs on underlying mechanism. For example, we distinguish between regimes where transcriptional bursting occurs from regimes reflecting constitutive transcript production. Our method agrees with the current standard approach, and in the restrictive regime where the standard method operates, also identifies rate constants not previously obtainable. Even without making any assumptions we obtain estimates of individual biochemical rate constants, or meaningful ratios of rate constants, in 91% of tested cases. In some cases our method identified all of the underlying rate constants. The framework developed here will be a powerful tool for deducing the contributions of particular molecular mechanisms to specific patterns of gene expression. PMID:24811315

  3. Expression profiles and associations of muscle regulatory factor (MRF) genes with growth traits in Tibetan chickens.

    PubMed

    Zhang, R; Li, R; Zhi, L; Xu, Y; Lin, Y; Chen, L

    2018-02-01

    1. Muscle regulatory factors (MRFs), including Myf5, Myf6 (MRF4/herculin), MyoD and MyoG (myogenin), play pivotal roles in muscle growth and development. Therefore, they are considered as candidate genes for meat production traits in livestock and poultry. 2. The objective of this study was to investigate the expression profiles of these genes in skeletal muscles (breast muscle and thigh muscle) at 5 developmental stages (0, 81, 119, 154 and 210 d old) of Tibetan chickens. Relationships between expressions of these genes and growth and carcass traits in these chickens were also estimated. 3. The expression profiles showed that in the breast muscle of both genders the mRNA levels of MRF genes were highest on the day of hatching, then declined significantly from d 0 to d 81, and fluctuated in a certain range from d 81 to d 210. However, the expression of Myf5, Myf6 and MyoG reached peaks in the thigh muscle in 118-d-old females and for MyoD in 154-d-old females, whereas the mRNA amounts of MRF genes in the male thigh muscle were in a narrow range from d 0 to d 210. 4. Correlation analysis suggested that gender had an influence on the relationships of MRF gene expression with growth traits. The RNA levels of MyoD, Myf5 genes in male breast muscle were positively related with several growth traits of Tibetan chickens (P < 0.05). No correlation was found between expressions of MRF genes and carcass traits of the chickens. 5. These results will provide a base for functional studies of MRF genes on growth and development of Tibetan chickens, as well as selective breeding and resource exploration.

  4. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data

    PubMed Central

    Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia

    2015-01-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach. PMID:26107944

  5. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.

    PubMed

    Vallejos, Catalina A; Marioni, John C; Richardson, Sylvia

    2015-06-01

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.

  6. Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

    PubMed Central

    Daub, Carsten O; Steuer, Ralf; Selbig, Joachim; Kloska, Sebastian

    2004-01-01

    Background The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. Results In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request. Conclusion The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended. PMID:15339346
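For reference, the plain binning estimator that this abstract sets out to improve on can be sketched as below. This is the naive equal-width histogram version, not the B-spline method proposed by the authors; the bin count and function name are illustrative assumptions.

```python
import math


def binned_mutual_information(x, y, n_bins=4):
    """Naive mutual information estimate (in bits) from equal-width binning."""

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # constant input: put everything in bin 0
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Each data point falls into exactly one bin here, which is the source of the small-sample errors mentioned in the abstract; the B-spline approach instead lets a point contribute to several neighbouring bins.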

  7. Mixture models for detecting differentially expressed genes in microarrays.

    PubMed

    Jones, Liat Ben-Tovim; Bean, Richard; McLachlan, Geoffrey J; Zhu, Justin Xi

    2006-10-01

    An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
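
    The local FDR described here can be illustrated with a two-component normal mixture. The sketch below assumes the mixture parameters are already known; in practice they would be estimated from the data, and the values of `pi0`, `alt_mean` and `alt_sd`, the z-scores and the 0.2 cut-off are all hypothetical choices for the example rather than values from the paper.

```python
import numpy as np

def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def local_fdr(z, pi0, alt_mean, alt_sd):
    """Posterior probability that a gene with score z is NOT differentially expressed.

    Assumes f(z) = pi0 * N(0, 1) + (1 - pi0) * N(alt_mean, alt_sd^2).
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)

z_scores = np.array([0.0, 2.0, 4.0])           # hypothetical gene-level test statistics
fdr = local_fdr(z_scores, pi0=0.9, alt_mean=3.0, alt_sd=1.0)
selected = fdr < 0.2                           # simple decision rule on the local FDR
print(fdr, selected)
```

    Here `pi0` plays the role of the prior probability that a gene is not differentially expressed, which is the quantity the abstract says the mixture framework allows one to estimate.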

  8. On construction of stochastic genetic networks based on gene expression sequences.

    PubMed

    Ching, Wai-Ki; Ng, Michael M; Fung, Eric S; Akutsu, Tatsuya

    2005-08-01

    Reconstruction of genetic regulatory networks from time series data of gene expression patterns is an important research topic in bioinformatics. Probabilistic Boolean Networks (PBNs) have been proposed as an effective model for gene regulatory networks. PBNs are able to cope with uncertainty, incorporate rule-based dependencies between genes and discover the sensitivity of genes in their interactions with other genes. However, PBNs are unlikely to be used directly in practice because of the huge computational cost of obtaining predictors and their corresponding probabilities. In this paper, we propose a multivariate Markov model for approximating PBNs and describing the dynamics of a genetic network for gene expression sequences. The main contribution of the new model is to preserve the strength of PBNs and reduce the complexity of the networks. The number of parameters of our proposed model is O(n^2), where n is the number of genes involved. We also develop efficient estimation methods for solving the model parameters. Numerical examples on synthetic data sets and practical yeast data sequences are given to demonstrate the effectiveness of the proposed model.

  9. Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce

    PubMed Central

    Namroud, Marie-Claire; Beaulieu, Jean; Juge, Nicolas; Laroche, Jérôme; Bousquet, Jean

    2008-01-01

    Conifers are characterized by a large genome size and a rapid decay of linkage disequilibrium, most often within gene limits. Genome scans based on noncoding markers are less likely to detect molecular adaptation linked to genes in these species. In this study, we assessed the effectiveness of a genome-wide single nucleotide polymorphism (SNP) scan focused on expressed genes in detecting local adaptation in a conifer species. Samples were collected from six natural populations of white spruce (Picea glauca) moderately differentiated for several quantitative characters. A total of 534 SNPs representing 345 expressed genes were analysed. Genes potentially under natural selection were identified by estimating the differentiation in SNP frequencies among populations (FST) and identifying outliers, and by estimating local differentiation using a Bayesian approach. Both average expected heterozygosity and population differentiation estimates (HE = 0.270 and FST = 0.006) were comparable to those obtained with other genetic markers. Of all genes, 5.5% were identified as outliers with FST at the 95% confidence level, while 14% were identified as candidates for local adaptation with the Bayesian method. There was some overlap between the two gene sets. More than half of the candidate genes for local adaptation were specific to the warmest population, about 20% to the most arid population, and 15% to the coldest and most humid higher altitude population. These adaptive trends were consistent with the genes’ putative functions and the divergence in quantitative traits noted among the populations. The results suggest that an approach separating the locus and population effects is useful to identify genes potentially under selection. These candidates are worth exploring in more detail at the physiological and ecological levels. PMID:18662225
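
    As a rough illustration of the two summary statistics quoted above, expected heterozygosity and FST for a single biallelic SNP can be computed from population allele frequencies. The sketch below uses the textbook heterozygosity-based definition with equally weighted populations; it is not necessarily the estimator used in the study, and the function name and allele frequencies are hypothetical.

```python
import numpy as np

def fst_biallelic(allele_freqs):
    """FST = (H_T - H_S) / H_T for one biallelic SNP, populations weighted equally.

    allele_freqs: frequency of one allele in each population.
    Undefined (division by zero) if the SNP is monomorphic overall.
    """
    p = np.asarray(allele_freqs, dtype=float)
    h_s = np.mean(2.0 * p * (1.0 - p))         # mean within-population expected heterozygosity
    p_bar = p.mean()                           # pooled allele frequency
    h_t = 2.0 * p_bar * (1.0 - p_bar)          # total expected heterozygosity
    return (h_t - h_s) / h_t

fst_example = fst_biallelic([0.4, 0.6])        # two hypothetical populations
print(fst_example)                             # approximately 0.04
```

    An outlier scan of the kind described in the abstract would then compare each SNP's value against the distribution expected under neutrality, a step this sketch does not attempt.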

  10. [Correlation of codon biases and potential secondary structures with mRNA translation efficiency in unicellular organisms].

    PubMed

    Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G

    2007-01-01

    Gene expression is known to correlate with the degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within the coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimating the potential expression of a coding DNA sequence in a defined unicellular organism using its genome sequence. The program computes an elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among the studied organisms. A considerable difference in preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between the elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method makes it possible to estimate numerically the translation efficiency of a coding DNA sequence and to optimize the nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.

  11. Selection of reference genes for gene expression studies in virus-infected monocots using quantitative real-time PCR.

    PubMed

    Zhang, Kun; Niu, Shaofang; Di, Dianping; Shi, Lindan; Liu, Deshui; Cao, Xiuling; Miao, Hongqin; Wang, Xianbing; Han, Chenggui; Yu, Jialin; Li, Dawei; Zhang, Yongliang

    2013-10-10

    Both genome-wide transcriptomic surveys of mRNA expression profiles and virus-induced gene silencing-based molecular studies of target genes during virus-plant interactions involve the precise estimation of transcript abundance. Quantitative real-time PCR (qPCR) is the most widely adopted technique for mRNA quantification. In order to obtain reliable quantification of transcripts, identification of the best reference genes forms the basis of the preliminary work. Nevertheless, the stability of internal controls in virus-infected monocots needs to be fully explored. In this work, the suitability of ten housekeeping genes (ACT, EF1α, FBOX, GAPDH, GTPB, PP2A, SAND, TUBβ, UBC18 and UK) for potential use as reference genes in qPCR was investigated in five different monocot plants (Brachypodium, barley, sorghum, wheat and maize) under infection with different viruses including Barley stripe mosaic virus (BSMV), Brome mosaic virus (BMV), Rice black-streaked dwarf virus (RBSDV) and Sugarcane mosaic virus (SCMV). By using three different algorithms, the most appropriate reference genes or their combinations were identified for different experimental sets, and their effectiveness for the normalisation of expression studies was further validated by quantitative analysis of a well-studied PR-1 gene. These results facilitate the selection of desirable reference genes for more accurate gene expression studies in virus-infected monocots. Copyright © 2013 Elsevier B.V. All rights reserved.

  12. Molecular diversity and population structure at the Cytochrome P450 3A5 gene in Africa

    PubMed Central

    2013-01-01

    Background Cytochrome P450 3A5 (CYP3A5) is an enzyme involved in the metabolism of many therapeutic drugs. CYP3A5 expression levels vary between individuals and populations, and this contributes to adverse clinical outcomes. Variable expression is largely attributed to four alleles, CYP3A5*1 (expresser allele); CYP3A5*3 (rs776746), CYP3A5*6 (rs10264272) and CYP3A5*7 (rs41303343) (low/non-expresser alleles). Little is known about CYP3A5 variability in Africa, a region with considerable genetic diversity. Here we used a multi-disciplinary approach to characterize CYP3A5 variation in geographically and ethnically diverse populations from in and around Africa, and infer the evolutionary processes that have shaped patterns of diversity in this gene. We genotyped 2538 individuals from 36 diverse populations in and around Africa for common low/non-expresser CYP3A5 alleles, and re-sequenced the CYP3A5 gene in five Ethiopian ethnic groups. We estimated the ages of low/non-expresser CYP3A5 alleles using a linked microsatellite and assuming a step-wise mutation model of evolution. Finally, we examined a hypothesis that CYP3A5 is important in salt retention adaptation by performing correlations with ecological data relating to aridity for the present day, 10,000 and 50,000 years ago. Results We estimate that ~43% of individuals within our African dataset express CYP3A5, which is lower than previous independent estimates for the region. We found significant intra-African variability in CYP3A5 expression phenotypes. Within Africa the highest frequencies of high-activity alleles were observed in equatorial and Niger-Congo speaking populations. Ethiopian allele frequencies were intermediate between those of other sub-Saharan African and non-African groups. Re-sequencing of CYP3A5 identified few additional variants likely to affect CYP3A5 expression. We estimate the ages of CYP3A5*3 as ~76,400 years and CYP3A5*6 as ~218,400 years. 
Finally, we report that global CYP3A5 expression levels correlated significantly with aridity measures for 10,000 [Spearman’s rho = −0.465, p = 0.004] and 50,000 years ago [Spearman’s rho = −0.379, p = 0.02]. Conclusions Significant intra-African diversity at the CYP3A5 gene is likely to contribute to multiple pharmacogenetic profiles across the continent. Significant correlations between CYP3A5 expression phenotypes and aridity data are consistent with a hypothesis that the enzyme is important in salt-retention adaptation. PMID:23641907

  13. Expression Divergence Is Correlated with Sequence Evolution but Not Positive Selection in Conifers.

    PubMed

    Hodgins, Kathryn A; Yeaman, Sam; Nurkowski, Kristin A; Rieseberg, Loren H; Aitken, Sally N

    2016-06-01

    The evolutionary and genomic determinants of sequence evolution in conifers are poorly understood, and previous studies have found only limited evidence for positive selection. Using RNAseq data, we compared gene expression profiles to patterns of divergence and polymorphism in 44 seedlings of lodgepole pine (Pinus contorta) and 39 seedlings of interior spruce (Picea glauca × engelmannii) to elucidate the evolutionary forces that shape their genomes and their plastic responses to abiotic stress. We found that rapidly diverging genes tend to have greater expression divergence, lower expression levels, reduced levels of synonymous site diversity, and longer proteins than slowly diverging genes. Similar patterns were identified for the untranslated regions, but with some exceptions. We found evidence that genes with low expression levels had a larger fraction of nearly neutral sites, suggesting a primary role for negative selection in determining the association between evolutionary rate and expression level. There was limited evidence for differences in the rate of positive selection among genes with divergent versus conserved expression profiles and some evidence supporting relaxed selection in genes diverging in expression between the species. Finally, we identified a small number of genes that showed evidence of site-specific positive selection using divergence data alone. However, estimates of the proportion of sites fixed by positive selection (α) were in the range of other plant species with large effective population sizes suggesting relatively high rates of adaptive divergence among conifers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  14. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

    PubMed

    Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho

    2015-10-28

    Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation, but for RNA-Seq of a 76-nucleotide sequence it showed lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than with RSEM, which was better than without using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without applying abundance estimation methods, and were much better than those obtained with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results. 
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
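
    Of the methods compared above, RPKM (reads per kilobase of transcript per million mapped reads) has a simple closed form that can be sketched directly. The example below is a minimal illustration with hypothetical read counts and gene lengths; it assumes the total number of mapped reads equals the sum of the counts supplied, which will not hold for a real library in which only a subset of genes is tabulated.

```python
import numpy as np

def rpkm(read_counts, gene_lengths_bp):
    """RPKM = reads * 1e9 / (total mapped reads * gene length in bp)."""
    counts = np.asarray(read_counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    total_mapped = counts.sum()                # assumption: all mapped reads are in `counts`
    return counts * 1e9 / (total_mapped * lengths)

# Hypothetical library with three genes.
values = rpkm([100, 400, 500], [1000, 2000, 5000])
print(values)                                  # 1e5, 2e5 and 1e5 respectively
```

    The third gene has the most reads but the same RPKM as the first, because the raw count (RC) is divided by a gene length five times larger.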

  15. Gene Circuit Analysis of the Terminal Gap Gene huckebein

    PubMed Central

    Ashyraliyev, Maksat; Siggens, Ken; Janssens, Hilde; Blom, Joke; Akam, Michael; Jaeger, Johannes

    2009-01-01

    The early embryo of Drosophila melanogaster provides a powerful model system to study the role of genes in pattern formation. The gap gene network constitutes the first zygotic regulatory tier in the hierarchy of the segmentation genes involved in specifying the position of body segments. Here, we use an integrative, systems-level approach to investigate the regulatory effect of the terminal gap gene huckebein (hkb) on gap gene expression. We present quantitative expression data for the Hkb protein, which enable us to include hkb in gap gene circuit models. Gap gene circuits are mathematical models of gene networks used as computational tools to extract regulatory information from spatial expression data. This is achieved by fitting the model to gap gene expression patterns, in order to obtain estimates for regulatory parameters which predict a specific network topology. We show how considering variability in the data combined with analysis of parameter determinability significantly improves the biological relevance and consistency of the approach. Our models are in agreement with earlier results, which they extend in two important respects: First, we show that Hkb is involved in the regulation of the posterior hunchback (hb) domain, but does not have any other essential function. Specifically, Hkb is required for the anterior shift in the posterior border of this domain, which is now reproduced correctly in our models. Second, gap gene circuits presented here are able to reproduce mutants of terminal gap genes, while previously published models were unable to reproduce any null mutants correctly. As a consequence, our models now capture the expression dynamics of all posterior gap genes and some variational properties of the system correctly. This is an important step towards a better, quantitative understanding of the developmental and evolutionary dynamics of the gap gene network. PMID:19876378

  16. Gene circuit analysis of the terminal gap gene huckebein.

    PubMed

    Ashyraliyev, Maksat; Siggens, Ken; Janssens, Hilde; Blom, Joke; Akam, Michael; Jaeger, Johannes

    2009-10-01

    The early embryo of Drosophila melanogaster provides a powerful model system to study the role of genes in pattern formation. The gap gene network constitutes the first zygotic regulatory tier in the hierarchy of the segmentation genes involved in specifying the position of body segments. Here, we use an integrative, systems-level approach to investigate the regulatory effect of the terminal gap gene huckebein (hkb) on gap gene expression. We present quantitative expression data for the Hkb protein, which enable us to include hkb in gap gene circuit models. Gap gene circuits are mathematical models of gene networks used as computational tools to extract regulatory information from spatial expression data. This is achieved by fitting the model to gap gene expression patterns, in order to obtain estimates for regulatory parameters which predict a specific network topology. We show how considering variability in the data combined with analysis of parameter determinability significantly improves the biological relevance and consistency of the approach. Our models are in agreement with earlier results, which they extend in two important respects: First, we show that Hkb is involved in the regulation of the posterior hunchback (hb) domain, but does not have any other essential function. Specifically, Hkb is required for the anterior shift in the posterior border of this domain, which is now reproduced correctly in our models. Second, gap gene circuits presented here are able to reproduce mutants of terminal gap genes, while previously published models were unable to reproduce any null mutants correctly. As a consequence, our models now capture the expression dynamics of all posterior gap genes and some variational properties of the system correctly. This is an important step towards a better, quantitative understanding of the developmental and evolutionary dynamics of the gap gene network.

  17. Identification of Reference Genes for Quantitative Gene Expression Studies in a Non-Model Tree Pistachio (Pistacia vera L.)

    PubMed Central

    Moazzam Jazi, Maryam; Ghadirzadeh Khorzoghi, Effat; Botanga, Christopher; Seyedi, Seyed Mahdi

    2016-01-01

    The tree species, Pistacia vera (P. vera) is an important commercial product that is salt-tolerant and long-lived, with a possible lifespan of over one thousand years. Gene expression analysis is an efficient method to explore the possible regulatory mechanisms underlying these characteristics. Therefore, having the most suitable set of reference genes is required for transcript level normalization under different conditions in P. vera. In the present study, we selected eight widely used reference genes, ACT, EF1α, α-TUB, β-TUB, GAPDH, CYP2, UBQ10, and 18S rRNA. Using qRT-PCR their expression was assessed in 54 different samples of three cultivars of P. vera. The samples were collected from different organs under various abiotic treatments (cold, drought, and salt) across three time points. Several statistical programs (geNorm, NormFinder, and BestKeeper) were applied to estimate the expression stability of candidate reference genes. Results obtained from the statistical analysis were then exposed to Rank aggregation package to generate a consensus gene rank. Based on our results, EF1α was found to be the superior reference gene in all samples under all abiotic treatments. In addition to EF1α, ACT and β-TUB were the second best reference genes for gene expression analysis in leaf and root. We recommended β-TUB as the second most stable gene for samples under the cold and drought treatments, while ACT holds the same position in samples analyzed under salt treatment. This report will benefit future research on the expression profiling of P. vera and other members of the Anacardiaceae family. PMID:27308855
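
    The expression stability measure reported by geNorm is commonly described as the average pairwise variation of a candidate gene with all other candidates, where pairwise variation is the standard deviation of log2 expression ratios across samples. The sketch below follows that description; it is an illustration under that assumption, not the geNorm software itself, and the gene quantities are hypothetical.

```python
import numpy as np

def stability_m(expression):
    """Stability measure M per gene (lower means more stable).

    expression: array of shape (samples, genes) holding positive relative quantities.
    """
    log_expr = np.log2(np.asarray(expression, dtype=float))
    n_genes = log_expr.shape[1]
    m_values = np.empty(n_genes)
    for j in range(n_genes):
        # Standard deviation of the log2 ratio between gene j and every other gene.
        pairwise_sd = [
            np.std(log_expr[:, j] - log_expr[:, k], ddof=1)
            for k in range(n_genes)
            if k != j
        ]
        m_values[j] = np.mean(pairwise_sd)
    return m_values

# Four hypothetical samples, three candidate genes (columns A, B, C).
# A and B keep a constant ratio across samples; C varies independently.
quantities = np.array([
    [1.0, 2.0, 8.0],
    [2.0, 4.0, 1.0],
    [4.0, 8.0, 4.0],
    [8.0, 16.0, 2.0],
])
m = stability_m(quantities)
print(m)
```

    Genes A and B receive the same, lower M value, while C receives the highest and would be the first candidate to be excluded in a stepwise ranking.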

  18. Identification of Reference Genes for Quantitative Gene Expression Studies in a Non-Model Tree Pistachio (Pistacia vera L.).

    PubMed

    Moazzam Jazi, Maryam; Ghadirzadeh Khorzoghi, Effat; Botanga, Christopher; Seyedi, Seyed Mahdi

    2016-01-01

    The tree species, Pistacia vera (P. vera) is an important commercial product that is salt-tolerant and long-lived, with a possible lifespan of over one thousand years. Gene expression analysis is an efficient method to explore the possible regulatory mechanisms underlying these characteristics. Therefore, having the most suitable set of reference genes is required for transcript level normalization under different conditions in P. vera. In the present study, we selected eight widely used reference genes, ACT, EF1α, α-TUB, β-TUB, GAPDH, CYP2, UBQ10, and 18S rRNA. Using qRT-PCR their expression was assessed in 54 different samples of three cultivars of P. vera. The samples were collected from different organs under various abiotic treatments (cold, drought, and salt) across three time points. Several statistical programs (geNorm, NormFinder, and BestKeeper) were applied to estimate the expression stability of candidate reference genes. Results obtained from the statistical analysis were then exposed to Rank aggregation package to generate a consensus gene rank. Based on our results, EF1α was found to be the superior reference gene in all samples under all abiotic treatments. In addition to EF1α, ACT and β-TUB were the second best reference genes for gene expression analysis in leaf and root. We recommended β-TUB as the second most stable gene for samples under the cold and drought treatments, while ACT holds the same position in samples analyzed under salt treatment. This report will benefit future research on the expression profiling of P. vera and other members of the Anacardiaceae family.

  19. Peripheral blood gene expression signature differentiates children with autism from unaffected siblings

    PubMed Central

    Kong, SW; Shimizu-Motohashi, Y; Campbell, MG; Lee, IH; Collins, CD; Brewster, SJ; Holm, IA; Rappaport, L

    2013-01-01

    Autism spectrum disorder (ASD) is one of the most prevalent neurodevelopmental disorders with high heritability, yet a majority of genetic contribution to pathophysiology is not known. Siblings of individuals with ASD are at increased risk for ASD and autistic traits, but the genetic contribution for simplex families is estimated to be less when compared to multiplex families. To explore the genomic (dis-) similarity between proband and unaffected sibling in simplex families, we used genome-wide gene expression profiles of blood from 20 proband-unaffected sibling pairs and 18 unrelated control individuals. The global gene expression profiles of unaffected siblings were more similar to those from probands as they shared genetic and environmental background. One hundred eighty nine genes were significantly differentially expressed between proband-sib pairs (nominal p-value < 0.01) after controlling for age, sex, and family effects. Probands and siblings were distinguished into two groups by cluster analysis with these genes. Overall, unaffected siblings were equally distant from the centroid of probands and from that of unrelated controls with the differentially expressed genes. Interestingly, 5 of 20 siblings had gene expression profiles that were more similar to unrelated controls than to their matched probands. In summary, we found a set of genes that distinguished probands from the unaffected siblings, and a subgroup of unaffected siblings who were more similar to probands. The pathways that characterized probands compared to siblings using peripheral blood gene expression profiles were the up-regulation of ribosomal, spliceosomal, and mitochondrial pathways, and the down-regulation of neuroreceptor-ligand, immune response and calcium signaling pathways. 
Further integrative study with structural genetic variations such as de novo mutations, rare variants, and copy number variations would clarify whether these transcriptomic changes are structural or environmental in origin. PMID:23625158

  20. Molecular Analysis of the In Situ Growth Rates of Subsurface Geobacter Species

    PubMed Central

    Giloteaux, Ludovic; Barlett, Melissa; Chavan, Milind A.; Smith, Jessica A.; Williams, Kenneth H.; Wilkins, Michael; Long, Philip; Lovley, Derek R.

    2013-01-01

    Molecular tools that can provide an estimate of the in situ growth rate of Geobacter species could improve understanding of dissimilatory metal reduction in a diversity of environments. Whole-genome microarray analyses of a subsurface isolate of Geobacter uraniireducens, grown under a variety of conditions, identified a number of genes that are differentially expressed at different specific growth rates. Expression of two genes encoding ribosomal proteins, rpsC and rplL, was further evaluated with quantitative reverse transcription-PCR (qRT-PCR) in cells with doubling times ranging from 6.56 h to 89.28 h. Transcript abundance of rpsC correlated best (r^2 = 0.90) with specific growth rates. Therefore, expression patterns of rpsC were used to estimate specific growth rates of Geobacter species during an in situ uranium bioremediation field experiment in which acetate was added to the groundwater to promote dissimilatory metal reduction. Initially, increased availability of acetate in the groundwater resulted in higher expression of Geobacter rpsC, and the increase in the number of Geobacter cells estimated with fluorescent in situ hybridization compared well with specific growth rates estimated from levels of in situ rpsC expression. However, in later phases, cell number increases were substantially lower than predicted from rpsC transcript abundance. This change coincided with a bloom of protozoa and increased attachment of Geobacter species to solid phases. These results suggest that monitoring rpsC expression may better reflect the actual rate that Geobacter species are metabolizing and growing during in situ uranium bioremediation than changes in cell abundance. PMID:23275510
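
    The doubling times quoted above can be converted to specific growth rates with the standard relation for exponential growth, mu = ln(2) / t_d. The short sketch below applies it to the two extremes given in the abstract; the assumption of exponential growth and the function name are the only additions.

```python
import math

def specific_growth_rate(doubling_time_h):
    """Specific growth rate (per hour) for exponential growth: mu = ln(2) / t_d."""
    return math.log(2) / doubling_time_h

fastest = specific_growth_rate(6.56)    # shortest doubling time reported
slowest = specific_growth_rate(89.28)   # longest doubling time reported
print(round(fastest, 4), round(slowest, 5))
```

    The two doubling times correspond to roughly 0.106 per hour and 0.0078 per hour, so the calibration range spans more than a tenfold difference in specific growth rate.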

  1. Sequence and Expression Analyses of Ethylene Response Factors Highly Expressed in Latex Cells from Hevea brasiliensis

    PubMed Central

    Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

    2014-01-01

    The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors. PMID:24971876

  2. Sequence and expression analyses of ethylene response factors highly expressed in latex cells from Hevea brasiliensis.

    PubMed

    Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

    2014-01-01

    The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors.

  3. Liposomal lipid and plasmid DNA delivery to B16/BL6 tumors after intraperitoneal administration of cationic liposome DNA aggregates.

    PubMed

    Reimer, D L; Kong, S; Monck, M; Wyles, J; Tam, P; Wasan, E K; Bally, M B

    1999-05-01

    The transfer of plasmid expression vectors to cells is essential for transfection after administration of lipid-based DNA formulations (lipoplexes). A murine i.p. B16/BL6 tumor model was used to characterize DNA delivery, liposomal lipid delivery, and gene transfer after regional (i.p.) administration of free plasmid DNA and DNA lipoplexes. DNA lipoplexes were prepared using cationic dioleoyldimethylammonium chloride/dioleoylphosphatidylethanolamine (50:50 mol ratio) liposomes mixed with plasmid DNA (1 microgram DNA/10 nmol lipid). The plasmid used contained the chloramphenicol acetyltransferase gene and chloramphenicol acetyltransferase expression (mU/g tumor) was measured to estimate transfection efficiency. Tumor-associated DNA and liposomal lipid levels were measured to estimate the efficiency of lipid-mediated DNA delivery to tumors. Plasmid DNA delivery was estimated using [3H]-labeled plasmid as a tracer, dot blot analysis, and/or Southern analysis. Liposomal lipid delivery was estimated using [14C]-dioleoylphosphatidylethanolamine as a liposomal lipid marker. Gene expression in the B16/BL6 tumors was highly variable, with values ranging from greater than 2,000 mU/g tumor to less than 100 mU/g tumor. There was a tendency to observe enhanced transfection in small (<250 mg) tumors. Approximately 18% of the injected dose of DNA was associated with these small tumors 2 h after i.p. administration. Southern analysis of extracted tumor DNA indicated that plasmid DNA associated with tumors was intact 24 h after administration. DNA and associated liposomal lipid are efficiently bound to tumors after regional administration; however, it is unclear whether delivery is sufficient to abet internalization and appropriate subcellular localization of the expression vector.

  4. IAOseq: inferring abundance of overlapping genes using RNA-seq data.

    PubMed

    Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue

    2015-01-01

    Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data. We developed a new tool (IAOseq) that is based on reads distributions along the transcribed regions to identify the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq showed better performance in the estimation accuracy of overlapping transcription levels. For the same strand overlapping transcription, currently existing high-throughput methods are rarely available to distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods gave an average of 1.6 fold overestimation of the expression levels of same strand overlapping genes. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

  5. Applications of lentiviral vectors in molecular imaging.

    PubMed

    Chatterjee, Sushmita; De, Abhijit

    2014-06-01

    Molecular imaging enables simultaneous visual and quantitative estimation of long-term gene expression directly in living organisms. Revealing the kinetics of gene expression by imaging often requires sustained expression of the transgene. Lentiviral vectors have been used extensively over the last fifteen years to deliver transgenes to a wide variety of cell types. Their well-known advantages include sustained transgene delivery through stable integration into the host genome, the capability of infecting both non-dividing and dividing cells, broad tissue tropism, and a reasonably large carrying capacity for delivering therapeutic and reporter gene combinations. Additionally, they do not express viral proteins during transduction, have a potentially safe integration site profile, and offer a relatively easy system for vector manipulation and infective viral particle production. As a result, lentiviral vector-mediated delivery of therapeutic and imaging reporter genes to various target organs holds promise for future treatment. In this review, we briefly survey important lentiviral vector developments in diverse biomedical fields, including reproductive biology.

  6. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data

    PubMed Central

    Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe

    2015-01-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while handling unbalanced data due to measurements missing at random (MAR). TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative dataset, TcGSA identified 4 gene sets that were also linked with the influenza vaccine, although previous analyses had found them to be associated with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package. PMID:26111374

  7. Peak flood estimation using gene expression programming

    NASA Astrophysics Data System (ADS)

    Zorn, Conrad R.; Shamseldin, Asaad Y.

    2015-12-01

    As a case study for the Auckland Region of New Zealand, this paper investigates the potential use of gene-expression programming (GEP) in predicting specific return period events in comparison to the established and widely used Regional Flood Estimation (RFE) method. Initially calibrated to 14 gauged sites, the GEP-derived model was further validated against 10- and 100-year flood events, with relative errors of 29% and 18%, respectively. By comparison, the RFE method gave errors of 48% and 44% for the same flood events. While the effectiveness of GEP in predicting specific return period events is made apparent, it is argued that the derived equations should be used in conjunction with existing methodologies rather than as a replacement.

  8. Heritable variation in heat shock gene expression: a potential mechanism for adaptation to thermal stress in embryos of sea turtles.

    PubMed

    Tedeschi, J N; Kennington, W J; Tomkins, J L; Berry, O; Whiting, S; Meekan, M G; Mitchell, N J

    2016-01-13

    The capacity of species to respond adaptively to warming temperatures will be key to their survival in the Anthropocene. The embryos of egg-laying species such as sea turtles have limited behavioural means for avoiding high nest temperatures, and responses at the physiological level may be critical to coping with predicted global temperature increases. Using the loggerhead sea turtle (Caretta caretta) as a model, we used quantitative PCR to characterise variation in the expression response of heat-shock genes (hsp60, hsp70 and hsp90; molecular chaperones involved in cellular stress response) to an acute non-lethal heat shock. We show significant variation in gene expression at the clutch and population levels for some, but not all hsp genes. Using pedigree information, we estimated heritabilities of the expression response of hsp genes to heat shock and demonstrated both maternal and additive genetic effects. This is the first evidence that the heat-shock response is heritable in sea turtles and operates at the embryonic stage in any reptile. The presence of heritable variation in the expression of key thermotolerance genes is necessary for sea turtles to adapt at a molecular level to warming incubation environments. © 2016 The Author(s).

  9. Identification and handling of artifactual gene expression profiles emerging in microarray hybridization experiments

    PubMed Central

    Brodsky, Leonid; Leontovich, Andrei; Shtutman, Michael; Feinstein, Elena

    2004-01-01

    Mathematical methods of analysis of microarray hybridizations deal with gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response, but rather stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identification of such artifactual profiles. Our diagnostics are based on detection of deviations from uniformity, which is assumed as the main underlying principle of microarray design. Method 1 is based on detection of non-uniformity of microarray distribution of printed genes that are clustered based on the similarity of their expression profiles. Method 2 is based on evaluation of the presence of gene-specific microarray spots within the slides’ areas characterized by an abnormal concentration of low/high differential expression values, which we define as ‘patterns of differentials’. Applying two novel algorithms, for nested clustering (method 1) and for pattern detection (method 2), we can make a dual estimation of the profile’s quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of patterns that cover them, thus diminishing their input in any further data analysis. PMID:14999086

  10. Heritable variation in heat shock gene expression: a potential mechanism for adaptation to thermal stress in embryos of sea turtles

    PubMed Central

    Kennington, W. J.; Tomkins, J. L.; Berry, O.; Whiting, S.; Meekan, M. G.; Mitchell, N. J.

    2016-01-01

    The capacity of species to respond adaptively to warming temperatures will be key to their survival in the Anthropocene. The embryos of egg-laying species such as sea turtles have limited behavioural means for avoiding high nest temperatures, and responses at the physiological level may be critical to coping with predicted global temperature increases. Using the loggerhead sea turtle (Caretta caretta) as a model, we used quantitative PCR to characterise variation in the expression response of heat-shock genes (hsp60, hsp70 and hsp90; molecular chaperones involved in cellular stress response) to an acute non-lethal heat shock. We show significant variation in gene expression at the clutch and population levels for some, but not all hsp genes. Using pedigree information, we estimated heritabilities of the expression response of hsp genes to heat shock and demonstrated both maternal and additive genetic effects. This is the first evidence that the heat-shock response is heritable in sea turtles and operates at the embryonic stage in any reptile. The presence of heritable variation in the expression of key thermotolerance genes is necessary for sea turtles to adapt at a molecular level to warming incubation environments. PMID:26763709

  11. CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.

    PubMed

    Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T

    2017-12-28

    In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
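
    The statement that many different true counts can produce the same observed count can be illustrated with a toy model. The sketch below is not the CORNAS implementation. It assumes, purely for illustration, that each true transcript is observed independently with probability `coverage` (binomial sampling) and that the prior on the true count is flat. The function names and the truncation parameter `max_extra` are hypothetical.

```python
import math


def posterior_true_count(observed, coverage, max_extra=2000):
    """Posterior over the true count N given an observed count.

    Assumed toy model (not taken from CORNAS):
    observed ~ Binomial(N, coverage), flat prior on N,
    truncated at N = observed + max_extra.
    Returns a dict {N: probability}.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    log_weights = {}
    for n in range(observed, observed + max_extra + 1):
        # log of C(n, observed) * coverage^observed
        log_w = (math.lgamma(n + 1) - math.lgamma(observed + 1)
                 - math.lgamma(n - observed + 1)
                 + observed * math.log(coverage))
        if n > observed:
            if coverage == 1.0:
                continue  # full coverage: only N == observed is possible
            # log of (1 - coverage)^(n - observed)
            log_w += (n - observed) * math.log(1.0 - coverage)
        log_weights[n] = log_w
    # Normalise in a numerically stable way.
    peak = max(log_weights.values())
    weights = {n: math.exp(lw - peak) for n, lw in log_weights.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def posterior_mean(posterior):
    """Expected true count under the posterior."""
    return sum(n * p for n, p in posterior.items())
```

    Under these assumptions, an observed count of 10 at a coverage of 0.5 gives a posterior mean of about 21, and lower coverage spreads the posterior over a wider range of true counts. For very low coverage, `max_extra` would need to be increased so that the truncation does not cut off the tail.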

  12. Construction of novel shuttle expression vectors for gene expression in Bacillus subtilis and Bacillus pumilus.

    PubMed

    Shao, Huanhuan; Cao, Qinghua; Zhao, Hongyan; Tan, Xuemei; Feng, Hong

    2015-01-01

    A native plasmid (pSU01) was detected by genome sequencing of Bacillus subtilis strain S1-4. Two pSU01-based shuttle expression vectors, pSU02-AP and pSU03-AP, were constructed, enabling stable replication in B. subtilis WB600. These vectors contained the reporter gene aprE, encoding an alkaline protease from Bacillus pumilus BA06. The expression vector pSU03-AP possessed only the minimal replication elements (rep, SSO, DSO) and exhibited greater structural stability, suggesting that the remaining genes in pSU01 (ORF1, ORF2, mob, hsp) were not essential for the structural stability of the plasmid in B. subtilis. In addition, recombinant production of the alkaline protease was achieved more efficiently with pSU03-AP, whose copy number was estimated to be more than 100 per chromosome. Furthermore, pSU03-AP could also be used to transform and replicate in B. pumilus BA06 under selective pressure. In conclusion, pSU03-AP is expected to be a useful tool for gene expression in Bacillus subtilis and B. pumilus.

  13. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

    PubMed

    Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

    2010-12-15

    Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  14. Sexually dimorphic gene expressions in eels: useful markers for early sex assessment in a conservation context

    PubMed Central

    Geffroy, Benjamin; Guilbaud, Florian; Amilhat, Elsa; Beaulaton, Laurent; Vignon, Matthias; Huchet, Emmanuel; Rives, Jacques; Bobe, Julien; Fostier, Alexis; Guiguen, Yann; Bardonnet, Agnès

    2016-01-01

    Environmental sex determination (ESD) has been detected in a range of vertebrate reptile and fish species. Eels are characterized by an ESD that occurs relatively late, since sex cannot be histologically determined before individuals reach 28 cm. Because several eel species are at risk of extinction, assessing sex at the earliest stage is a crucial management issue. Based on preliminary results of RNA sequencing, we targeted genes susceptible to be differentially expressed between ovaries and testis at different stages of development. Using qPCR, we detected testis-specific expressions of dmrt1, amh, gsdf and pre-miR202 and ovary-specific expressions were obtained for zar1, zp3 and foxn5. We showed that gene expressions in the gonad of intersexual eels were quite similar to those of males, supporting the idea that intersexual eels represent a transitional stage towards testicular differentiation. To assess whether these genes would be effective early molecular markers, we sampled juvenile eels in two locations with highly skewed sex ratios. The combined expression of six of these genes allowed the discrimination of groups according to their potential future sex and thus this appears to be a useful tool to estimate sex ratios of undifferentiated juvenile eels. PMID:27658729

  15. Transient Gene Expression in Maize, Rice, and Wheat Cells Using an Airgun Apparatus

    PubMed Central

    Oard, James H.; Paige, David F.; Simmonds, John A.; Gradziel, Thomas M.

    1990-01-01

    An airgun apparatus has been constructed for transient gene expression studies of monocots. This device utilizes compressed air from a commercial airgun to propel a macroprojectile and DNA-coated tungsten particles. The β-glucuronidase (GUS) reporter gene was used to monitor transient expression in three distinct cell types of maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum). The highest level of GUS activity in cultured maize cells was observed when the distance between stopping plate and target cells was adjusted to 4.3 centimeters. Efficiency of transformation was estimated to be 4.4 × 10⁻³. In a partial vacuum of 700 millimeters Hg, the velocity of the macroprojectile was measured at 520 meters per second, with a 6% reduction in velocity at atmospheric pressure. A polyethylene film placed in the breech before firing contributed to a 12% increase in muzzle velocity. A vacuum level of 700 millimeters Hg was necessary for the maximum number of transformants. GUS expression was also detected in wheat leaf base tissue of microdissected shoot apices. High levels of transient gene expression were also observed in hard, compact embryogenic callus of rice. These results show that the airgun apparatus is a convenient, safe, and low-cost device for rapid transient gene expression studies in cereals. PMID:16667278

  16. Sexually dimorphic gene expressions in eels: useful markers for early sex assessment in a conservation context

    NASA Astrophysics Data System (ADS)

    Geffroy, Benjamin; Guilbaud, Florian; Amilhat, Elsa; Beaulaton, Laurent; Vignon, Matthias; Huchet, Emmanuel; Rives, Jacques; Bobe, Julien; Fostier, Alexis; Guiguen, Yann; Bardonnet, Agnès

    2016-09-01

    Environmental sex determination (ESD) has been detected in a range of vertebrate reptile and fish species. Eels are characterized by an ESD that occurs relatively late, since sex cannot be histologically determined before individuals reach 28 cm. Because several eel species are at risk of extinction, assessing sex at the earliest stage is a crucial management issue. Based on preliminary results of RNA sequencing, we targeted genes susceptible to be differentially expressed between ovaries and testis at different stages of development. Using qPCR, we detected testis-specific expressions of dmrt1, amh, gsdf and pre-miR202 and ovary-specific expressions were obtained for zar1, zp3 and foxn5. We showed that gene expressions in the gonad of intersexual eels were quite similar to those of males, supporting the idea that intersexual eels represent a transitional stage towards testicular differentiation. To assess whether these genes would be effective early molecular markers, we sampled juvenile eels in two locations with highly skewed sex ratios. The combined expression of six of these genes allowed the discrimination of groups according to their potential future sex and thus this appears to be a useful tool to estimate sex ratios of undifferentiated juvenile eels.

  17. A web application for automatic prediction of gene translation elongation efficiency.

    PubMed

    Sokolov, Vladimir S; Zuraev, Bulat S; Lashin, Sergei A; Matushkin, Yury G

    2015-03-01

    Expression efficiency is one of the major characteristics describing genes in various modern investigations. Expression efficiency of genes is regulated at various stages: transcription, translation, posttranslational protein modification and others. In this study, a special EloE (Elongation Efficiency) web application is described. EloE sorts an organism's genes in descending order of their theoretical rate of the elongation stage of translation, based on the analysis of their nucleotide sequences. The theoretical data obtained correlate significantly with available experimental gene expression data in various organisms. In addition, the program identifies preferential codons in the organism's genes and determines the distribution of the energy of potential secondary structures in the 5´ and 3´ regions of mRNA. EloE can be useful for preliminary estimation of translation elongation efficiency for genes for which experimental data are not yet available. Some results can be used, for instance, in other programs modeling artificial genetic structures in genetic engineering experiments. The EloE web application is available at http://www-bionet.sscc.ru:7780/EloE.
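
    As a rough illustration of what identifying preferential codons can involve, the sketch below counts in-frame codons in a coding sequence and picks the most frequent codon among a set of synonymous codons. This is a simplified assumption about the idea, not the EloE algorithm, and the function names are hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from its first base."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def preferred_codon(counts, synonymous):
    """Return the most frequent codon among an ordered sequence of synonymous codons.

    Ties are resolved in favour of the codon listed first.
    """
    return max(synonymous, key=lambda codon: counts.get(codon, 0))


# The six leucine codons of the standard genetic code.
LEUCINE = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")
```

    For example, `preferred_codon(codon_counts("ATGCTGCTGCTACTGTAA"), LEUCINE)` returns `"CTG"`, because that codon occurs three times while `"CTA"` occurs once. A realistic analysis would pool counts over many genes rather than a single short sequence.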

  18. A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.

    PubMed

    Choi, Yoonha; Coram, Marc; Peng, Jie; Tang, Hua

    2017-07-01

    Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.

  19. Classification of early-stage non-small cell lung cancer by weighing gene expression profiles with connectivity information.

    PubMed

    Zhang, Ao; Tian, Suyan

    2018-05-01

    Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different weights for each gene based on gene connectivity information and then conducted feature selection on the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of the specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only when this issue is resolved will wide utilization of the weighting methods be possible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
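
    The weighting idea can be sketched in a few lines. The abstract does not specify the three weights used, so the example below assumes one plausible choice for illustration only: each gene is weighted by its degree in an undirected gene network, and its expression profile is multiplied by that weight before feature selection. All names are hypothetical.

```python
def degree_weights(edges, genes):
    """Weight each gene by its degree in an undirected network, scaled to (0, 1].

    `edges` is an iterable of (gene_a, gene_b) pairs; each pair is assumed
    to be listed once. Edges touching unknown genes and self-loops are skipped.
    """
    degree = {g: 0 for g in genes}
    for a, b in edges:
        if a in degree and b in degree and a != b:
            degree[a] += 1
            degree[b] += 1
    top = max(degree.values(), default=0)
    # Add 1 so that isolated genes keep a small non-zero weight.
    return {g: (d + 1) / (top + 1) for g, d in degree.items()}


def weight_profiles(expression, weights):
    """Multiply each gene's expression values by that gene's weight."""
    return {g: [v * weights[g] for v in values]
            for g, values in expression.items()}
```

    With genes A, B, C and edges A-B and A-C, gene A has the highest degree and keeps a weight of 1, while B and C are scaled by 2/3. The weighted profiles would then be passed to whatever feature selection method is in use.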

  20. DNA methylation patterns and gene expression associated with litter size in Berkshire pig placenta

    PubMed Central

    Kwon, Seulgi; Park, Da Hye; Kim, Tae Wan; Kang, Deok Gyeong; Yu, Go Eun; Kim, Il-Suk; Park, Hwa Chun; Ha, Jeongim; Kim, Chul Wook

    2017-01-01

    Increasing litter size is of great interest to the pig industry. DNA methylation is an important epigenetic modification that regulates gene expression, resulting in livestock phenotypes such as disease resistance, milk production, and reproduction. We classified Berkshire pigs into two groups according to litter size and estimated breeding value: smaller (SLG) and larger (LLG) litter size groups. Genome-wide DNA methylation and gene expression were analyzed using placenta genomic DNA and RNA to identify differentially methylated regions (DMRs) and differentially expressed genes (DEGs) associated with litter size. The methylation levels of CpG dinucleotides in different genomic regions were noticeably different between the groups, while global methylation pattern was similar, and excluding intergenic regions they were found the most frequently in gene body regions. Next, we analyzed RNA-Seq data to identify DEGs between the SLG and LLG groups. A total of 1591 DEGs were identified: 567 were downregulated and 1024 were upregulated in LLG compared to SLG. To identify genes that simultaneously exhibited changes in DNA methylation and mRNA expression, we integrated and analyzed the data from bisulfite-Seq and RNA-Seq. Nine DEGs positioned in DMRs were found. The expression of only three of these genes (PRKG2, CLCA4, and PCK1) was verified by RT-qPCR. Furthermore, we observed the same methylation patterns in blood samples as in the placental tissues by PCR-based methylation analysis. Together, these results provide useful data regarding potential epigenetic markers for selecting hyperprolific sows. PMID:28880934

  1. Identification and expression analysis of glucosinolate biosynthetic genes and estimation of glucosinolate contents in edible organs of Brassica oleracea subspecies.

    PubMed

    Yi, Go-Eun; Robin, Arif Hasan Khan; Yang, Kiwoung; Park, Jong-In; Kang, Jong-Goo; Yang, Tae-Jin; Nou, Ill-Sup

    2015-07-20

    Glucosinolates are anti-carcinogenic, anti-oxidative biochemical compounds that defend plants from insect and microbial attack. Glucosinolates are abundant in all cruciferous crops, including all vegetable and oilseed Brassica species. Here, we studied the expression of glucosinolate biosynthesis genes and determined glucosinolate contents in the edible organs of a total of 12 genotypes of Brassica oleracea: three genotypes each from cabbage, kale, kohlrabi and cauliflower subspecies. Among the 81 genes analyzed by RT-PCR, 19 are transcription factor-related, two different sets of 25 genes are involved in aliphatic and indolic biosynthesis pathways and the rest are breakdown-related. The expression of glucosinolate-related genes in the stems of kohlrabi was remarkably different compared to leaves of cabbage and kale and florets of cauliflower, as only eight genes out of 81 were expressed in the stem tissues of kohlrabi. In the stem tissue of kohlrabi, only one aliphatic transcription factor-related gene, Bol036286 (MYB28), and one indolic transcription factor-related gene, Bol030761 (MYB51), were expressed. The results indicated that expression of all genes is not essential for glucosinolate biosynthesis. Using HPLC analysis, a total of 16 different types of glucosinolates were identified in four subspecies; nine of them were aliphatic, four of them were indolic and one was aromatic. Cauliflower florets contained the highest number of glucosinolates (14). Among the aliphatic glucosinolates, only gluconapin was found in the florets of cauliflower. Glucoiberverin and glucobrassicanapin contents were the highest in the stems of kohlrabi. The indolic methoxyglucobrassicin and aromatic gluconasturtiin accounted for the highest content in the florets of cauliflower. Further detailed investigation and analysis are required to discern the precise role of each gene in aliphatic and indolic glucosinolate biosynthesis in the edible organs.

  2. Modularity and evolutionary constraints in a baculovirus gene regulatory network

    PubMed Central

    2013-01-01

    Background The structure of regulatory networks remains an open question in our understanding of complex biological systems. Interactions during complete viral life cycles present unique opportunities to understand how host-parasite networks take shape and behave. The Anticarsia gemmatalis multiple nucleopolyhedrovirus (AgMNPV) is a large double-stranded DNA virus, whose genome may encode 152 open reading frames (ORFs). Here we present the analysis of the ordered cascade of AgMNPV gene expression. Results We observed an earlier onset of expression than previously reported for other baculoviruses, especially for genes involved in DNA replication. Most ORFs were expressed at higher levels in a more permissive host cell line. Genes with more than one copy in the genome had distinct expression profiles, which could indicate the acquisition of new functionalities. The transcription gene regulatory network (GRN) for 149 ORFs had a modular topology comprising five communities of highly interconnected nodes that separated functionally related key genes into different communities, possibly maximizing redundancy and GRN robustness by compartmentalization of important functions. Core conserved functions showed expression synchronicity, distinct GRN features and significantly less genetic diversity, consistent with evolutionary constraints imposed on key elements of biological systems. This reduced genetic diversity also had a positive correlation with the importance of the gene in our estimated GRN, supporting a relationship between phylogenetic data of baculovirus genes and network features inferred from expression data. We also observed that gene arrangement in overlapping transcripts was conserved among related baculoviruses, suggesting a principle of genome organization. Conclusions Albeit with a reduced number of nodes (149), the AgMNPV GRN had a topology and key characteristics similar to those observed in complex cellular organisms, which indicates that modularity may be a general feature of biological gene regulatory networks. PMID:24006890

  3. Hepatic CD36 downregulation parallels steatosis improvement in morbidly obese undergoing bariatric surgery.

    PubMed

    Pardina, E; Ferrer, R; Rossell, J; Ricart-Jané, D; Méndez-Lara, K A; Baena-Fustegueras, J A; Lecube, A; Julve, J; Peinado-Onsurbe, J

    2017-09-01

    The notion that hepatic expression of genes involved in lipid metabolism is altered in obese patients is relatively new and its relationship with hepatic steatosis and cardiometabolic alterations remains unclear. We assessed the impact of Roux-en-Y gastric bypass surgery (RYGB) on the expression profile of genes related to metabolic syndrome in liver biopsies from morbidly obese individuals using a custom-made, focused cDNA microarray, and assessed the relationship between the expression profile and hepatic steatosis regression. Plasma and liver samples were obtained from patients at baseline and 12 months after surgery. Samples were assayed for chemical and gene expression analyses, as appropriate. Gene expression profiles were assessed using custom-made, focused TaqMan low-density array cards. RYGB-induced weight loss produced a favorable reduction in fat deposits, insulin resistance (estimated by homeostasis model assessment of insulin resistance (HOMA-IR)), and plasma and hepatic lipid levels. Compared with the baseline values, the gene expression levels of key targets of lipid metabolism were significantly altered: CD36 was significantly downregulated (-40%; P=0.001), whereas APOB (+27%; P=0.032) and SCARB1 (+37%; P=0.040) were upregulated in response to surgery-induced weight reduction. We also observed a favorable reduction in the expression of the PAI1 gene (-80%; P=0.007) and a significant increase in the expression of the PPARA (+60%; P=0.014) and PPARGC1 genes (+36%; P=0.015). Notably, the relative fold decrease in the expression of the CD36 gene was directly associated with a concomitant reduction in the cholesterol (Spearman's r=0.92; P=0.001) and phospholipid (Spearman's r=0.76; P=0.04) contents in this tissue. 
    For the first time, RYGB-induced weight loss was shown to promote a favorable downregulation of CD36 expression, which was proportional to a favorable reduction in the hepatic cholesterol and phospholipid contents in our morbidly obese subjects following surgery.

  4. Computational deconvolution of genome wide expression data from Parkinson's and Huntington's disease brain tissues using population-specific expression analysis

    PubMed Central

    Capurro, Alberto; Bodea, Liviu-Gabriel; Schaefer, Patrick; Luthi-Carter, Ruth; Perreau, Victoria M.

    2015-01-01

    The characterization of molecular changes in diseased tissues gives insight into pathophysiological mechanisms and is important for therapeutic development. Genome-wide gene expression analysis has proven valuable for identifying biological processes in neurodegenerative diseases using post mortem human brain tissue and numerous datasets are publicly available. However, many studies utilize heterogeneous tissue samples consisting of multiple cell types, all of which contribute to global gene expression values, confounding biological interpretation of the data. In particular, changes in numbers of neuronal and glial cells occurring in neurodegeneration confound transcriptomic analyses, particularly in human brain tissues where sample availability and controls are limited. To identify cell specific gene expression changes in neurodegenerative disease, we have applied our recently published computational deconvolution method, population specific expression analysis (PSEA). PSEA estimates cell-type-specific expression values using reference expression measures, which in the case of brain tissue comprise mRNAs with cell-type-specific expression in neurons, astrocytes, oligodendrocytes and microglia. As an exercise in PSEA implementation and hypothesis development regarding neurodegenerative diseases, we applied PSEA to Parkinson's and Huntington's disease (PD, HD) datasets. Genes identified as differentially expressed in substantia nigra pars compacta neurons by PSEA were validated using external laser capture microdissection data. Network analysis and Annotation Clustering (DAVID) identified molecular processes implicated by differential gene expression in specific cell types. The results of these analyses provided new insights into the implementation of PSEA in brain tissues and additional refinement of molecular signatures in human HD and PD. PMID:25620908

  5. Minimising Immunohistochemical False Negative ER Classification Using a Complementary 23 Gene Expression Signature of ER Status

    PubMed Central

    Li, Qiyuan; Eklund, Aron C.; Juul, Nicolai; Haibe-Kains, Benjamin; Workman, Christopher T.; Richardson, Andrea L.; Szallasi, Zoltan; Swanton, Charles

    2010-01-01

    Background Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold[1], [2]. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance with clinical outcome. Methodology/Principal Findings Firstly, ER status was discriminated by fitting the bimodal expression of ESR1 to a mixed Gaussian model. The discriminative power of ESR1 suggested bimodal expression as an efficient way to stratify breast cancer; therefore we identified a set of genes whose expression was both strongly bimodal, mimicking ESR expression status, and highly expressed in breast epithelial cell lines, to derive a 23-gene ER expression signature-based classifier. We assessed our classifiers in seven published breast cancer cohorts by comparing the gene expression-based ER status to IHC-based ER status as a predictor of clinical outcome in both untreated and tamoxifen treated cohorts. In untreated breast cancer cohorts, the 23 gene signature-based ER status provided significantly improved prognostic power compared to IHC-based ER status (P = 0.006). In tamoxifen-treated cohorts, the 23 gene ER expression signature predicted clinical outcome (HR = 2.20, P = 0.00035). 
    These complementary ER signature-based strategies estimated that between 15.1% and 21.8% of patients with IHC-based negative ER status would be classified as having ER positive breast cancer. Conclusion/Significance Expression-based ER status classification may complement IHC to minimise false negative ER status classification and optimise patient stratification for endocrine therapies. PMID:21152022
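
    The first step described above, fitting a bimodal expression distribution with a mixture of two Gaussians, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' classifier: the function names (normal_pdf, fit_two_gaussians, prob_high), the EM initialisation and the synthetic expression values are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weight_high, (mean_low, sd_low), (mean_high, sd_high)).
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the data.
    mean = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sd = [spread, spread]
    w_high = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to the high mode.
        resp = []
        for x in xs:
            p_low = (1.0 - w_high) * normal_pdf(x, mean[0], sd[0])
            p_high = w_high * normal_pdf(x, mean[1], sd[1])
            total = p_low + p_high
            resp.append(p_high / total if total > 0 else 0.5)
        # M-step: re-estimate weight, means and standard deviations.
        n_high = max(sum(resp), 1e-9)
        n_low = max(len(xs) - sum(resp), 1e-9)
        mean[1] = sum(r * x for r, x in zip(resp, xs)) / n_high
        mean[0] = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n_low
        var_high = sum(r * (x - mean[1]) ** 2 for r, x in zip(resp, xs)) / n_high
        var_low = sum((1.0 - r) * (x - mean[0]) ** 2 for r, x in zip(resp, xs)) / n_low
        sd = [max(math.sqrt(var_low), 1e-6), max(math.sqrt(var_high), 1e-6)]
        w_high = n_high / len(xs)
    return w_high, (mean[0], sd[0]), (mean[1], sd[1])


def prob_high(x, fit):
    """Posterior probability that expression value x comes from the high mode."""
    w_high, (m_lo, s_lo), (m_hi, s_hi) = fit
    p_lo = (1.0 - w_high) * normal_pdf(x, m_lo, s_lo)
    p_hi = w_high * normal_pdf(x, m_hi, s_hi)
    total = p_lo + p_hi
    return p_hi / total if total > 0 else 0.5


# Synthetic log-scale expression values (assumed, not data from the study):
# 150 samples in a low mode and 350 samples in a high mode.
rng = random.Random(0)
values = [rng.gauss(6.0, 0.5) for _ in range(150)] + [rng.gauss(11.0, 0.8) for _ in range(350)]
fit = fit_two_gaussians(values)
```

    A sample would then be called positive when prob_high exceeds a chosen cut-off such as 0.5; the cut-off is itself an assumption of this sketch.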

  6. [Influence of physiologic 17 beta-estradiol concentrations on gene E6 expression in HPV type 18 in vitro].

    PubMed

    Dziubińska-Parol, Izabella; Gasowska, Urszula; Rzymowska, Jolanta; Kwaśniewska, Anna

    2003-09-01

    Many recent studies indicate that long-term use of contraceptives is a strong risk factor in the development of cervical cancer. In persistent papillomavirus infection, steroid hormones act on various levels, one of which is enhancing the transforming activity of the virus. The aim of the study was to estimate whether physiological concentrations of 17 beta-estradiol could influence expression of viral transforming genes. HeLa cell lines were incubated with three different physiological concentrations, and on the third day of incubation the level of E6 gene expression was determined. Results show no differences in expression between the control culture and cultures incubated with physiological concentrations. This indicates that normal levels of 17 beta-estradiol do not play a role in the transforming process, but it also shows the need to analyse higher hormone levels by quantitative analyses in prospective studies.

  7. Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues.

    PubMed

    Wheeler, Heather E; Shah, Kaanan P; Brenner, Jonathon; Garcia, Tzintzuni; Aquino-Michaels, Keston; Cox, Nancy J; Nicolae, Dan L; Im, Hae Kyung

    2016-11-01

    Understanding the genetic architecture of gene expression traits is key to elucidating the underlying mechanisms of complex traits. Here, for the first time, we perform a systematic survey of the heritability and the distribution of effect sizes across all representative tissues in the human body. We find that local h2 can be relatively well characterized with 59% of expressed genes showing significant h2 (FDR < 0.1) in the DGN whole blood cohort. However, current sample sizes (n ≤ 922) do not allow us to compute distal h2. Bayesian Sparse Linear Mixed Model (BSLMM) analysis provides strong evidence that the genetic contribution to local expression traits is dominated by a handful of genetic variants rather than by the collective contribution of a large number of variants each of modest size. In other words, the local architecture of gene expression traits is sparse rather than polygenic across all 40 tissues (from DGN and GTEx) examined. This result is confirmed by the sparsity of optimal performing gene expression predictors via elastic net modeling. To further explore the tissue context specificity, we decompose the expression traits into cross-tissue and tissue-specific components using a novel Orthogonal Tissue Decomposition (OTD) approach. Through a series of simulations we show that the cross-tissue and tissue-specific components are identifiable via OTD. Heritability and sparsity estimates of these derived expression phenotypes show similar characteristics to the original traits. Consistent properties relative to prior GTEx multi-tissue analysis results suggest that these traits reflect the expected biology. Finally, we apply this knowledge to develop prediction models of gene expression traits for all tissues. The prediction models, heritability, and prediction performance R2 for original and decomposed expression phenotypes are made publicly available (https://github.com/hakyimlab/PrediXcan).

  8. Whole Blood Gene Expression Profile Associated with Spontaneous Preterm Birth in Women with Threatened Preterm Labor

    PubMed Central

    Heng, Yujing Jan; Pennell, Craig Edward; Chua, Hon Nian; Perkins, Jonathan Edward; Lye, Stephen James

    2014-01-01

    Threatened preterm labor (TPTL) is defined as persistent premature uterine contractions between 20 and 37 weeks of gestation and is the most common condition that requires hospitalization during pregnancy. Most of these TPTL women continue their pregnancies to term while only an estimated 5% will deliver a premature baby within ten days. The aim of this work was to study differential whole blood gene expression associated with spontaneous preterm birth (sPTB) within 48 hours of hospital admission. Peripheral blood was collected at point of hospital admission from 154 women with TPTL before any medical treatment. Microarrays were utilized to investigate differential whole blood gene expression between TPTL women who did (n = 48) or did not have a sPTB (n = 106) within 48 hours of admission. Total leukocyte and neutrophil counts were significantly higher (35% and 41% respectively) in women who had sPTB than women who did not deliver within 48 hours (p<0.001). Fetal fibronectin (fFN) test was performed on 62 women. There was no difference in the urine, vaginal and placental microbiology and histopathology reports between the two groups of women. There were 469 significant differentially expressed genes (FDR<0.05); 28 differentially expressed genes were chosen for microarray validation using qRT-PCR and 20 out of 28 genes were successfully validated (p<0.05). An optimal random forest classifier model to predict sPTB was achieved using the top nine differentially expressed genes coupled with peripheral clinical blood data (sensitivity 70.8%, specificity 75.5%). These differentially expressed genes may further elucidate the underlying mechanisms of sPTB and pave the way for future systems biology studies to predict sPTB. PMID:24828675

  9. Single sea urchin phagocytes express messages of a single sequence from the diverse Sp185/333 gene family in response to bacterial challenge.

    PubMed

    Majeske, Audrey J; Oren, Matan; Sacchi, Sandro; Smith, L Courtney

    2014-12-01

    Immune systems in animals rely on fast and efficient responses to a wide variety of pathogens. The Sp185/333 gene family in the purple sea urchin, Strongylocentrotus purpuratus, consists of an estimated 50 (±10) members per genome that share a basic gene structure but show high sequence diversity, primarily due to the mosaic appearance of short blocks of sequence called elements. The genes show significantly elevated expression in three subpopulations of phagocytes responding to marine bacteria. The encoded Sp185/333 proteins are highly diverse and have central effector functions in the immune system. In this study we report the Sp185/333 gene expression in single sea urchin phagocytes. Challenging sea urchins with heat-killed marine bacteria resulted in a typical increase in coelomocyte concentration within 24 h, which included an increased proportion of phagocytes expressing Sp185/333 proteins. Phagocyte fractions enriched from coelomocytes were used in limiting dilutions to obtain samples of single cells that were evaluated for Sp185/333 gene expression by nested RT-PCR. Amplicon sequences showed identical or nearly identical Sp185/333 amplicon sequences in single phagocytes with matches to six known Sp185/333 element patterns, including both common and rare element patterns. This suggested that single phagocytes show restricted expression from the Sp185/333 gene family and implies a diverse, flexible, and efficient response to pathogens. This type of expression pattern from a family of immune response genes in single cells has not been identified previously in other invertebrates. Copyright © 2014 by The American Association of Immunologists, Inc.

  10. Reference gene selection for qRT-PCR assays in Stellera chamaejasme subjected to abiotic stresses and hormone treatments based on transcriptome datasets.

    PubMed

    Liu, Xin; Guan, Huirui; Song, Min; Fu, Yanping; Han, Xiaomin; Lei, Meng; Ren, Jingyu; Guo, Bin; He, Wei; Wei, Yahui

    2018-01-01

    Stellera chamaejasme Linn, an important poisonous plant of the China grassland, is toxic to humans and livestock. The rapid expansion of S. chamaejasme has greatly damaged the grassland ecology and, consequently, seriously endangered the development of animal husbandry. To draft efficient prevention and control measures, it has become more urgent to carry out research on its adaptive and expansion mechanisms in different unfavorable habitats at the genetic level. Quantitative real-time polymerase chain reaction (qRT-PCR) is a widely used technique for studying gene expression at the transcript level; however, qRT-PCR requires reference genes (RGs) as endogenous controls for data normalization and only through appropriate RG selection and qRT-PCR can we guarantee the reliability and robustness of expression studies and RNA-seq data analysis. Unfortunately, little research on the selection of RGs for gene expression data normalization in S. chamaejasme has been reported. In this study, 10 candidate RGs namely, 18S, 60S, CYP, GAPCP1, GAPDH2, EF1B, MDH, SAND, TUA1, and TUA6, were singled out from the transcriptome database of S. chamaejasme, and their expression stability under three abiotic stresses (drought, cold, and salt) and three hormone treatments (abscisic acid, ABA; gibberellin, GA; ethephon, ETH) were estimated with the programs geNorm, NormFinder, and BestKeeper. Our results showed that GAPCP1 and EF1B were the best combination for the three abiotic stresses, whereas TUA6 and SAND, TUA1 and CYP, GAPDH2 and 60S were the best choices for ABA, GA, and ETH treatment, respectively. Moreover, GAPCP1 and 60S were assessed to be the best combination for all samples, and 18S was the least stable RG for use as an internal control in all of the experimental subsets. The expression patterns of two target genes (P5CS2 and GI) further verified that the RGs that we selected were suitable for gene expression normalization. This work is the first attempt to comprehensively estimate the stability of RGs in S. chamaejasme. Our results provide suitable RGs for high-precision normalization in qRT-PCR analysis, thereby making it more convenient to analyze gene expression under these experimental conditions.
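
    As an illustration of how a reference-gene stability score can be computed, the sketch below follows the commonly described geNorm measure M: for each candidate gene, the average over all other candidates of the standard deviation of the pairwise log2 expression ratios across samples. It is a simplified stand-in, not the geNorm program itself; the function name genorm_m, the gene labels and the toy quantities are hypothetical.

```python
import math
import statistics


def genorm_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    expr maps a gene label to a list of relative quantities, one per sample
    (all values must be > 0 and all lists the same length).
    A lower M means a more stable candidate.
    """
    genes = list(expr)
    m_values = {}
    for gene in genes:
        pairwise_sds = []
        for other in genes:
            if other == gene:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[gene], expr[other])]
            # Pairwise variation: sample standard deviation of those ratios.
            pairwise_sds.append(statistics.stdev(log_ratios))
        m_values[gene] = sum(pairwise_sds) / len(pairwise_sds)
    return m_values


# Toy relative quantities for three hypothetical candidates over four samples.
toy = {
    "RG_A": [1.0, 2.0, 4.0, 8.0],
    "RG_B": [2.0, 4.0, 8.0, 16.0],  # constant ratio to RG_A in every sample
    "RG_C": [1.0, 1.0, 1.0, 1.0],   # drifts relative to both others
}
m_scores = genorm_m(toy)
```

    In this toy input RG_A and RG_B track each other exactly, so they share the lowest M, while RG_C scores worse. This shows that the measure rewards a constant ratio between candidates rather than a constant raw level.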

  11. Gene expression-based detection of radiation exposure in mice after treatment with granulocyte colony-stimulating factor and lipopolysaccharide.

    PubMed

    Tucker, James D; Grever, William E; Joiner, Michael C; Konski, Andre A; Thomas, Robert A; Smolinski, Joseph M; Divine, George W; Auner, Gregory W

    2012-02-01

    In a large-scale nuclear incident, many thousands of people may be exposed to a wide range of radiation doses. Rapid biological dosimetry will be required on an individualized basis to estimate the exposures and to make treatment decisions. To ameliorate the adverse effects of exposure, victims may be treated with one or more cytokine growth factors, including granulocyte colony-stimulating factor (G-CSF), which has therapeutic efficacy for treating radiation-induced bone marrow ablation by stimulating granulopoiesis. The existence of infections and the administration of G-CSF each may confound the ability to achieve reliable dosimetry by gene expression analysis. In this study, C57BL/6 mice were used to determine the extent to which G-CSF and lipopolysaccharide (LPS, which simulates infection by gram-negative bacteria) alter the expression of genes that are either radiation-responsive or non-responsive, i.e., show potential for use as endogenous controls. Mice were acutely exposed to ⁶⁰Co γ rays at either 0 Gy or 6 Gy. Two hours later the animals were injected with either 0.1 mg/kg of G-CSF or 0.3 mg/kg of LPS. Expression levels of 96 different gene targets were evaluated in peripheral blood after an additional 4 or 24 h using real-time quantitative PCR. The results indicate that the expression levels of some genes are altered by LPS, but altered expression after G-CSF treatment was generally not observed. The expression levels of many genes therefore retain utility for biological dosimetry or as endogenous controls. These data suggest that PCR-based quantitative gene expression analyses may have utility in radiation biodosimetry in humans even in the presence of an infection or after treatment with G-CSF.

  12. Network Reconstruction From High-Dimensional Ordinary Differential Equations.

    PubMed

    Chen, Shizhe; Shojaie, Ali; Witten, Daniela M

    2017-01-01

    We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.

  13. Genetic localization and phenotypic expression of X-linked cataract (Xcat) in Mus musculus.

    PubMed

    Favor, J; Pretsch, W

    1990-01-01

    Linkage data relative to the markers tabby and glucose-6-phosphate dehydrogenase are presented to locate X-linked cataract (Xcat) in the distal portion of the mouse X-chromosome between jimpy and hypophosphatemia. The human X-linked cataract-dental syndrome, Nance-Horan Syndrome, also maps closely to human hypophosphatemia and would suggest homology between mouse Xcat and human Nance-Horan Syndrome genes. In hemizygous males and homozygous females penetrance is complete with only slight variation in the degree of expression. Phenotypic expression in Xcat heterozygous females ranges from totally clear to totally opaque lenses. The phenotypic expression between the two lenses of a heterozygous individual could also vary between totally clear and totally opaque lenses. However, a correlation in the degree of expression between the eyes of an individual was observed. A variegated pattern of lens opacity was evident in female heterozygotes. Based on these observations, the site of gene action for the Xcat locus is suggested to be endogenous to the lens cells and the precursor cell population of the lens is concluded to be small. The identification of an X-linked cataract locus is an important contribution to the estimate of the number of mutable loci resulting in cataract, an estimate required so that dominant cataract mutagenesis results may be expressed on a per locus basis. The Xcat mutation may be a useful marker for a distal region of the mouse X-chromosome which is relatively sparsely marked and the X-linked cataract mutation may be employed in gene expression and lens development studies.

  14. Classification based upon gene expression data: bias and precision of error rates.

    PubMed

    Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L

    2007-06-01

    Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
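
    The two-level (nested) cross-validation idea described above can be sketched as follows: an inner loop chooses a tuning parameter using only the outer-training samples, and the outer loop scores that choice on samples never used for tuning. This is not the authors' R/PAMR code. It assumes a plain nearest-centroid classifier with top-k feature selection as a stand-in, and all function names, the grid of k values and the synthetic data are hypothetical.

```python
import random


def fit_centroids(X, y, k):
    """Pick the k features with the largest class-mean gap (training data only)
    and return (feature indices, class-0 centroid, class-1 centroid).
    Assumes both classes are present in the training data."""
    n_feat = len(X[0])
    means = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    ranked = sorted(range(n_feat), key=lambda j: -abs(means[0][j] - means[1][j]))
    keep = ranked[:k]
    return keep, [means[0][j] for j in keep], [means[1][j] for j in keep]


def predict(model, x):
    """Assign x to the class with the nearer centroid."""
    keep, c0, c1 = model
    d0 = sum((x[j] - m) ** 2 for j, m in zip(keep, c0))
    d1 = sum((x[j] - m) ** 2 for j, m in zip(keep, c1))
    return 0 if d0 <= d1 else 1


def folds(n, n_folds, rng):
    """Random partition of range(n) into n_folds index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_error(X, y, k, n_folds, rng):
    """Single-level cross-validated error rate for a fixed k."""
    errors = 0
    for test_idx in folds(len(X), n_folds, rng):
        held = set(test_idx)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train], k)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)


def nested_cv_error(X, y, k_grid, n_outer=5, n_inner=4, seed=0):
    """Two-level cross-validation: tune k inside each outer training set."""
    rng = random.Random(seed)
    errors = 0
    for test_idx in folds(len(X), n_outer, rng):
        held = set(test_idx)
        train = [i for i in range(len(X)) if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        # Inner loop: the outer test samples play no part in choosing k.
        best_k = min(k_grid, key=lambda k: cv_error(Xtr, ytr, k, n_inner, rng))
        model = fit_centroids(Xtr, ytr, best_k)
        errors += sum(predict(model, X[i]) != y[i] for i in test_idx)
    return errors / len(X)


# Synthetic data (assumed): 40 samples, 10 features, only feature 0 informative.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
samples = [[lab * 5.0 + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(9)]
           for lab in labels]
outer_error = nested_cv_error(samples, labels, k_grid=[1, 3, 10])
```

    Reporting the smallest cv_error over the grid instead would reuse the same samples for tuning and for scoring, which is the optimization bias the abstract warns about. Running nested_cv_error on randomly permuted labels is one way to probe for such bias, in the spirit of the label-permutation check mentioned above.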

  15. Effect of NET-1 siRNA conjugated sub-micron bubble complex combined with low-frequency ultrasound exposure in gene transfection

    PubMed Central

    Wu, Bolin; Liang, Xitian; Jing, Hui; Han, Xue; Sun, Yixin; Guo, Cunli; Liu, Ying; Cheng, Wen

    2018-01-01

    The present study evaluated the effect of NET-1 siRNA-conjugated sub-micron bubble (SMB) complexes combined with low-frequency ultrasound exposure in gene transfection. The NET-1 gene was highly expressed in the SMMC-7721 human hepatocellular carcinoma cell line. The cells were divided into seven groups and treated with different conditions. The groups with or without low-frequency ultrasound exposure, groups of adherent cells, and suspension cells were separated. The NET-1 siRNA-conjugated SMB complexes were made in the laboratory and tested by Zetasizer Nano ZS90 analyzer. Flow cytometry was used to estimate the transfection efficiency and cellular apoptosis. Western blot and quantitative real-time polymerase chain reaction (qPCR) were used for the estimation of the protein and mRNA expressions, respectively. Transwell analysis determined the migration and invasion capacities of the tumor cells. The results did not show any difference in the transfection efficiency between adherent and suspension cells. However, the NET-1 siRNA-SMB complexes combined with low-frequency ultrasound exposure could enhance the gene transfection effectively. In summary, the NET-1 siRNA-SMB complexes appeared to be a promising gene vehicle. PMID:29423111

  16. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
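
    To make the "line of constant dispersion" concrete: under the usual negative binomial parametrisation the variance is mean + dispersion × mean², so a dispersion of about 0.01 means the coefficient of variation approaches √0.01 = 10% for highly expressed genes. The sketch below is only an illustration of that relation; the function names and the replicate counts are hypothetical, and the simple moment estimator is not the estimator used by edgeR, DESeq or cuffdiff.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: Poisson term plus overdispersion."""
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts of one gene.

    Solves var = mean + dispersion * mean**2 for the dispersion and
    truncates at zero when the counts are no more variable than Poisson.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return max((var - mean) / mean ** 2, 0.0)


# Hypothetical replicate read counts for a single gene.
replicates = [90, 110, 100, 120, 80]
estimated = moment_dispersion(replicates)
```

    With only a handful of replicates such per-gene estimates are very noisy, which is one reason a high-replicate experiment is useful for checking the model.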

  17. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  18. De novo characterization of the gene-rich transcriptomes of two color-polymorphic spiders, Theridion grallator and T. californicum (Araneae: Theridiidae), with special reference to pigment genes.

    PubMed

    Croucher, Peter J P; Brewer, Michael S; Winchell, Christopher J; Oxford, Geoff S; Gillespie, Rosemary G

    2013-12-08

    A number of spider species within the family Theridiidae exhibit a dramatic abdominal (opisthosomal) color polymorphism. The polymorphism is inherited in a broadly Mendelian fashion and in some species consists of dozens of discrete morphs that are convergent across taxa and populations. Few genomic resources exist for spiders. Here, as a first necessary step towards identifying the genetic basis for this trait we present the near complete transcriptomes of two species: the Hawaiian happy-face spider Theridion grallator and Theridion californicum. We mined the gene complement for pigment-pathway genes and examined differential expression (DE) between morphs that are unpatterned (plain yellow) and patterned (yellow with superimposed patches of red, white or very dark brown). By deep sequencing both RNA-seq and normalized cDNA libraries from pooled specimens of each species we were able to assemble a comprehensive gene set for both species that we estimate to be 98-99% complete. It is likely that these species express more than 20,000 protein-coding genes, perhaps 4.5% (ca. 870) of which might be unique to spiders. Mining for pigment-associated Drosophila melanogaster genes indicated the presence of all ommochrome pathway genes and most pteridine pathway genes and DE analyses further indicate a possible role for the pteridine pathway in theridiid color patterning. Based upon our estimates, T. grallator and T. californicum express a large inventory of protein-coding genes. Our comprehensive assembly illustrates the continuing value of sequencing normalized cDNA libraries in addition to RNA-seq in order to generate a reference transcriptome for non-model species. The identification of pteridine-related genes and their possible involvement in color patterning is a novel finding in spiders and one that suggests a biochemical link between guanine deposits and the pigments exhibited by these species.

  19. Genetics of Mitochondrial Disease.

    PubMed

    Saneto, Russell P

    2017-01-01

    Mitochondria are intracellular organelles responsible for adenosine triphosphate production. The strict control of intracellular energy needs requires proper mitochondrial functioning. The mitochondria are under dual controls of mitochondrial DNA (mtDNA) and nuclear DNA (nDNA). Mitochondrial dysfunction can arise from changes in either mtDNA or nDNA genes regulating function. There are an estimated ∼1500 proteins in the mitoproteome, whereas the mtDNA genome has 37 proteins. There are, to date, ∼275 genes shown to give rise to disease. The unique physiology of mitochondrial functioning contributes to diverse gene expression. The onset and range of phenotypic expression of disease is diverse, with onset from neonatal to seventh decade of life. The range of dysfunction is heterogeneous, ranging from single organ to multisystem involvement. The complexity of disease expression has severely limited gene discovery. Combining phenotypes with improvements in gene sequencing strategies is improving the diagnosis process. This chapter focuses on the interplay of the unique physiology and gene discovery in the current knowledge of genetically derived mitochondrial disease. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Evaluation of RNA extraction methods and identification of putative reference genes for real-time quantitative polymerase chain reaction expression studies on olive (Olea europaea L.) fruits.

    PubMed

    Nonis, Alberto; Vezzaro, Alice; Ruperti, Benedetto

    2012-07-11

    Genome wide transcriptomic surveys together with targeted molecular studies are uncovering an ever increasing number of differentially expressed genes in relation to agriculturally relevant processes in olive (Olea europaea L). These data need to be supported by quantitative approaches enabling the precise estimation of transcript abundance. qPCR being the most widely adopted technique for mRNA quantification, preliminary work needs to be done to set up robust methods for extraction of fully functional RNA and for the identification of the best reference genes to obtain reliable quantification of transcripts. In this work, we have assessed different methods for their suitability for RNA extraction from olive fruits and leaves and we have evaluated thirteen potential candidate reference genes on 21 RNA samples belonging to fruit developmental/ripening series and to leaves subjected to wounding. By using two different algorithms, GAPDH2 and PP2A1 were identified as the best reference genes for olive fruit development and ripening, and their effectiveness for normalization of expression of two ripening marker genes was demonstrated.

  1. Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

    PubMed Central

    Michailidis, George

    2014-01-01

    Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g., wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. PMID:24586224

  2. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

    PubMed

    Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu

    2018-05-30

    One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistical tests and the widely distributed read counts and dispersions of different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-Seq data. Datasets from previous, similar experiments such as the Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in R and can be installed from the Bioconductor website. A user-friendly web graphical interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-Seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.
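
    One of the issues named in this abstract, controlling the false discovery rate when thousands of genes are tested at once, can be illustrated with the Benjamini-Hochberg step-up procedure. The Python sketch below is a generic illustration with made-up p-values; it is not code from RnaSeqSampleSize and does not use that package's interface.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_passing_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[index] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-gene p-values from a differential expression test.
    print(benjamini_hochberg([0.039, 0.001, 0.6, 0.008]))
    # [False, True, False, True]
```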

  3. Penalized differential pathway analysis of integrative oncogenomics studies.

    PubMed

    van Wieringen, Wessel N; van de Wiel, Mark A

    2014-04-01

    Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group case. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that corroborate with existing literature.
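
    The lasso-penalized least squares step mentioned above can be sketched in a few lines of Python. This is a generic coordinate-descent lasso for a single regression, shown only to make the role of the L1 penalty concrete; it is not the authors' simultaneous-equations estimator, and the data, the penalty value, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k]
                                    for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
    return beta


if __name__ == "__main__":
    # Hypothetical orthogonal design: the weak second effect is set to zero.
    print(lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], 1.0))
    # [2.0, 0.0]
```

    The exact zero produced for the weak effect is what yields a sparse topology when each coefficient stands for a candidate edge.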

  4. Temporal variations in the gene expression levels of cyanobacterial anti-oxidant enzymes through geological history: implications for biological evolution during the Great Oxidation Event

    NASA Astrophysics Data System (ADS)

    Harada, M.; Furukawa, R.; Yokobori, S. I.; Tajika, E.; Yamagishi, A.

    2016-12-01

    A significant rise in atmospheric O2 levels during the GOE (Great Oxidation Event), ca. 2.45-2.0 Ga, must have placed great stress on the biosphere, forcing life to adapt to oxic conditions. Cyanobacteria, the oxygenic photosynthetic bacteria responsible for the GOE, are at the same time among the organisms that would have been greatly affected by the rise of O2 levels in surface environments. Knowledge of the evolution of cyanobacteria is not only important for elucidating the cause of the GOE, but also helps us to better understand the adaptive evolution of life in response to the GOE. Here we performed phylogenetic analysis of an anti-oxidant enzyme, Fe-SOD (iron superoxide dismutase), of cyanobacteria, to assess the adaptive evolution of life under the GOE. The rise of O2 levels must have increased the level of toxic reactive oxygen species in cyanobacterial cells, thus forcing them to change the activities or the gene expression levels of Fe-SOD. In the present study, we focus on the change in the gene expression levels of the enzyme, which can be estimated from the promoter sequences of the gene. Promoters are DNA sequences found upstream of protein-encoding regions, where RNA polymerase binds and initiates transcription. "Strong" promoters that efficiently interact with RNA polymerase induce high rates of transcription, leading to high levels of gene expression. Thus, from the temporal changes in the promoter sequences, we can estimate the variations in gene expression levels over geological time. Promoter sequences of Fe-SOD at each ancestral node of cyanobacteria were predicted from phylogenetic analysis, and the ancestral promoter sequences were compared to the promoters of known highly expressed genes. The similarity was low at the time of the emergence of cyanobacteria; however, it increased at the branching nodes that diverged 2.4 billion years ago. This roughly coincided with the onset of the GOE, implying that the transition from low to high gene expression levels of Fe-SOD occurred in response to the GOE. We propose that this is the first direct evidence of the evolution of cyanobacteria related to the rise of O2, and that the methodology of ancestral promoter analysis used in this study can be a novel tool to reveal biological adaptation to such a significant geologic event.

  5. DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks

    PubMed Central

    Gerstein, Mark

    2016-01-01

    Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem’s gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally–e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the “state” and “control” in the model refer to its own (internal) and another subsystem’s (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model’s parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. 
This is particularly true for genes with evolutionarily ancient functions (e.g. the ribosomal proteins), in contrast to those with more recently evolved functions (e.g., cell-cell communication). This suggests that despite striking morphological differences, some fundamental embryonic-developmental processes are still controlled by ancient regulatory systems. PMID:27760135
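
    The statement that the state at a given time is determined by the state and control at a previous time can be made concrete with a small simulation. The Python sketch below assumes, purely for illustration, a linear update x[t+1] = A x[t] + B u[t]; the matrices, values, and function name are hypothetical, and the dimensionality reduction and parameter estimation performed by DREISS are not shown.

```python
def simulate_state_space(A, B, x0, controls):
    """Iterate x[t+1] = A x[t] + B u[t] and return all states.

    A: internal (state-to-state) effects, B: external (control-to-state)
    effects, x0: initial state, controls: one control vector per step.
    """
    def matvec(matrix, vector):
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    states = [list(x0)]
    for u in controls:
        internal = matvec(A, states[-1])   # contribution of the subsystem
        external = matvec(B, u)            # contribution of the other one
        states.append([a + b for a, b in zip(internal, external)])
    return states


if __name__ == "__main__":
    # One internal gene decaying by half per step, driven by one control.
    print(simulate_state_space([[0.5]], [[1.0]], [2.0], [[1.0], [0.0]]))
    # [[2.0], [2.0], [1.0]]
```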

  6. DREISS: Using State-Space Models to Infer the Dynamics of Gene Expression Driven by External and Internal Regulatory Networks.

    PubMed

    Wang, Daifeng; He, Fei; Maslov, Sergei; Gerstein, Mark

    2016-10-01

    Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem's gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally-e.g., how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal based on State Space models. Given a subsystem, the "state" and "control" in the model refer to its own (internal) and another subsystem's (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model's parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. 
This is particularly true for genes with evolutionarily ancient functions (e.g. the ribosomal proteins), in contrast to those with more recently evolved functions (e.g., cell-cell communication). This suggests that despite striking morphological differences, some fundamental embryonic-developmental processes are still controlled by ancient regulatory systems.

  7. Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

    PubMed

    Rennick, Linda J; Duprex, W Paul; Rima, Bert K

    2007-10-01

    Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.

  8. Non-Viral Transfection Methods Optimized for Gene Delivery to a Lung Cancer Cell Line

    PubMed Central

    Salimzadeh, Loghman; Jaberipour, Mansooreh; Hosseini, Ahmad; Ghaderi, Abbas

    2013-01-01

    Background Mehr-80 is a newly established adherent human large cell lung cancer cell line that has not been transfected until now. This study aims to define the optimal transfection conditions and the effects of some critical elements for enhancing gene delivery to this cell line by utilizing different non-viral transfection procedures. Methods In the current study, calcium phosphate (CaP), DEAE-dextran, superfect, electroporation and lipofection transfection methods were used to optimize delivery of a plasmid construct that expressed Green Fluorescent Protein (GFP). Transgene expression was detected by fluorescence microscopy and flow cytometry. Toxicities of the methods were estimated by trypan blue staining. In order to evaluate the density of the transfected gene, we used a plasmid construct that expressed the Stromal cell-Derived Factor-1 (SDF-1) gene and measured its expression by real-time PCR. Results Mean levels of GFP-expressing cells 48 hr after transfection were 8.4% (CaP), 8.2% (DEAE-dextran), 4.9% (superfect), 34.1% (electroporation), and 40.1% (lipofection). Lipofection gave the most intense SDF-1 expression of the analyzed methods. Conclusion This study has shown that the lipofection and electroporation methods were more efficient at gene delivery to Mehr-80 cells. The quantity of DNA per transfection, reagent concentration, and incubation time were identified as essential factors for successful transfection in all of the studied methods. PMID:23799175

  9. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    PubMed

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
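
    The ambiguity of multi-mapped reads described above is often handled with an expectation-maximization (EM) loop over groups of reads that share the same set of candidate transcripts. The Python sketch below is a deliberately simplified, generic EM that ignores transcript length and mappability; it is not EMSAR's joint Poisson model over segments, and the read counts and function name are hypothetical.

```python
def em_abundance(read_classes, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances from grouped reads.

    read_classes: list of (transcript_indices, read_count) pairs, where
    every read in a class maps equally well to each listed transcript.
    Assumption: transcript lengths are equal, so they are ignored here.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    total_reads = sum(count for _, count in read_classes)
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each class among its transcripts in proportion
        # to the current abundance estimates.
        for transcripts, count in read_classes:
            denom = sum(theta[t] for t in transcripts)
            if denom == 0.0:
                continue
            for t in transcripts:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize expected read counts into abundances.
        theta = [e / total_reads for e in expected]
    return theta


if __name__ == "__main__":
    # 30 reads unique to transcript 0, 10 unique to transcript 1,
    # and 40 reads that map to both.
    classes = [((0,), 30), ((1,), 10), ((0, 1), 40)]
    print([round(t, 6) for t in em_abundance(classes, 2)])  # [0.75, 0.25]
```

    Grouping reads by their set of candidate transcripts, as the abstract describes, means the loop runs over a few classes rather than over millions of individual reads.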

  10. Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data.

    PubMed

    Johnson, Brent A

    2009-10-01

    We consider estimation and variable selection in the partial linear model for censored data. The partial linear model for censored data is a direct extension of the accelerated failure time model, the latter of which is a very important alternative model to the proportional hazards model. We extend rank-based lasso-type estimators to a model that may contain nonlinear effects. Variable selection in such partial linear model has direct application to high-dimensional survival analyses that attempt to adjust for clinical predictors. In the microarray setting, previous methods can adjust for other clinical predictors by assuming that clinical and gene expression data enter the model linearly in the same fashion. Here, we select important variables after adjusting for prognostic clinical variables but the clinical effects are assumed nonlinear. Our estimator is based on stratification and can be extended naturally to account for multiple nonlinear effects. We illustrate the utility of our method through simulation studies and application to the Wisconsin prognostic breast cancer data set.

  11. Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH.

    PubMed

    Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Gospocic, Janko; Gupte, Rohit; Bonasio, Roberto; Kim, Junhyong; Murray, John; Raj, Arjun

    2018-02-28

    Although single-cell RNA sequencing can reliably detect large-scale transcriptional programs, it is unclear whether it accurately captures the behavior of individual genes, especially those that express only in rare cells. Here, we use single-molecule RNA fluorescence in situ hybridization as a gold standard to assess trade-offs in single-cell RNA-sequencing data for detecting rare cell expression variability. We quantified the gene expression distribution for 26 genes that range from ubiquitous to rarely expressed and found that the correspondence between estimates across platforms improved with both transcriptome coverage and increased number of cells analyzed. Further, by characterizing the trade-off between transcriptome coverage and number of cells analyzed, we show that when the number of genes required to answer a given biological question is small, then greater transcriptome coverage is more important than analyzing large numbers of cells. More generally, our report provides guidelines for selecting quality thresholds for single-cell RNA-sequencing experiments aimed at rare cell analyses.

  12. Gene expression profiling of prostate tissue identifies chromatin regulation as a potential link between obesity and lethal prostate cancer.

    PubMed

    Ebot, Ericka M; Gerke, Travis; Labbé, David P; Sinnott, Jennifer A; Zadra, Giorgia; Rider, Jennifer R; Tyekucheva, Svitlana; Wilson, Kathryn M; Kelly, Rachel S; Shui, Irene M; Loda, Massimo; Kantoff, Philip W; Finn, Stephen; Vander Heiden, Matthew G; Brown, Myles; Giovannucci, Edward L; Mucci, Lorelei A

    2017-11-01

    Obese men are at higher risk of advanced prostate cancer and cancer-specific mortality; however, the biology underlying this association remains unclear. This study examined gene expression profiles of prostate tissue to identify biological processes differentially expressed by obesity status and lethal prostate cancer. Gene expression profiling was performed on tumor (n = 402) and adjacent normal (n = 200) prostate tissue from participants in 2 prospective cohorts who had been diagnosed with prostate cancer from 1982 to 2005. Body mass index (BMI) was calculated from the questionnaire immediately preceding cancer diagnosis. Men were followed for metastases or prostate cancer-specific death (lethal disease) through 2011. Gene Ontology biological processes differentially expressed by BMI were identified using gene set enrichment analysis. Pathway scores were computed by averaging the signal intensities of member genes. Odds ratios (ORs) for lethal prostate cancer were estimated with logistic regression. Among 402 men, 48% were healthy weight, 31% were overweight, and 21% were very overweight/obese. Fifteen gene sets were enriched in tumor tissue, but not normal tissue, of very overweight/obese men versus healthy-weight men; 5 of these were related to chromatin modification and remodeling (false-discovery rate < 0.25). Patients with high tumor expression of chromatin-related genes had worse clinical characteristics (Gleason grade > 7, 41% vs 17%; P = 2 × 10^-4) and an increased risk of lethal disease that was independent of grade and stage (OR, 5.26; 95% confidence interval, 2.37-12.25). This study improves our understanding of the biology of aggressive prostate cancer and identifies a potential mechanistic link between obesity and prostate cancer death that warrants further study. Cancer 2017;123:4130-4138. © 2017 American Cancer Society.
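
    Two of the quantities in this abstract are simple enough to sketch in Python: a pathway score formed by averaging member-gene intensities, and an odds ratio. The odds ratio below is the crude, unadjusted one from a 2x2 table, a simpler stand-in for the logistic regression the study used; the gene names, intensities, and counts are hypothetical and are not taken from the study.

```python
from statistics import mean


def pathway_score(expression, member_genes):
    """Average the signal intensities of the pathway's member genes.

    expression: mapping of gene name -> intensity for one sample.
    Genes missing from the sample are skipped.
    """
    return mean(expression[g] for g in member_genes if g in expression)


def crude_odds_ratio(exposed_cases, exposed_controls,
                     unexposed_cases, unexposed_controls):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    return ((exposed_cases * unexposed_controls)
            / (exposed_controls * unexposed_cases))


if __name__ == "__main__":
    sample = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 9.0}  # made-up values
    print(pathway_score(sample, ["GENE_A", "GENE_B"]))      # 3.0
    print(crude_odds_ratio(10, 5, 4, 8))                    # 4.0
```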

  13. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    PubMed Central

    Su, Chang; Wang, Chao; He, Lin; Yang, Chuanping; Wang, Yucheng

    2014-01-01

    DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single base pair resolution for Betula platyphylla (white birch) by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function of DNA methylation sequence composition and distribution in the genome of B. platyphylla. There are 34,460 genes in the whole genome of birch, and 31,297 genes are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for 48.86%, and is the largest proportion. Combined transcriptome and methylation analysis showed that the genes with moderate methylation levels had higher expression levels than genes with high and low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees. PMID:25514241

  14. Analysis of barosensitive mechanisms in yeast for Pressure Regulated Fermentation

    NASA Astrophysics Data System (ADS)

    Nomura, Kazuki; Iwahashi, Hitoshi; Iguchi, Akinori; Shigematsu, Toru

    2013-06-01

    Introduction: We intend to develop a novel food processing technology, Pressure Regulated Fermentation (PReF), using pressure-sensitive (barosensitive) fermentation microorganisms. The objective of our study is to clarify barosensitive mechanisms for application to PReF technology. We isolated the Saccharomyces cerevisiae barosensitive mutant a924E1, which was derived from the parent strain KA31a. Methods: Gene expression levels were analyzed by DNA microarray. Genes with altered expression levels were classified according to gene function. Mutated genes were estimated by mating and producing diploid strains and confirmed by PCR of mitochondrial DNA (mtDNA). Results and Discussion: Gene expression profiles showed that genes of the 'Energy' function and genes encoding proteins localized in 'Mitochondria' were significantly down-regulated in the mutant. These results suggest a respiratory deficiency and a relationship between barosensitivity and respiratory deficiency. Since the respiratory functions of the diploids showed non-Mendelian inheritance, the respiratory deficiency was indicated to be due to an mtDNA mutation. PCR analysis showed that the region of the COX1 locus was deleted. The COX1 gene encodes subunit 1 of cytochrome c oxidase. For this reason, barosensitivity is strongly correlated with mitochondrial functions.

  15. Structure and vascular tissue expression of duplicated TERMINAL EAR1-like paralogues in poplar.

    PubMed

    Charon, Céline; Vivancos, Julien; Mazubert, Christelle; Paquet, Nicolas; Pilate, Gilles; Dron, Michel

    2010-02-01

    TERMINAL EAR1-like (TEL) genes encode putative RNA-binding proteins only found in land plants. Previous studies suggested that they may regulate tissue and organ initiation in Poaceae. Two TEL genes were identified in both Populus trichocarpa and the hybrid aspen Populus tremula x P. alba, named, respectively, PoptrTEL1-2 and PtaTEL1-2. The analysis of the organisation around the PoptrTEL genes in the P. trichocarpa genome and the estimation of the synonymous substitution rate for PtaTEL1-2 genes indicate that the paralogous link between these two Populus TEL genes probably results from the Salicoid large-scale gene-duplication event. Phylogenetic analyses confirmed their orthology link with the other TEL genes. The expression pattern of both PtaTEL genes appeared to be restricted to the mother cells of the plant body: leaf founder cells, leaf primordia, axillary buds and root differentiating tissues, as well as to mother cells of vascular tissues. Most interestingly, PtaTEL1-2 transcripts were found in differentiating cells of secondary xylem and phloem, but probably not in the cambium itself. Taken together, these results indicate specific expression of the TEL genes in differentiating cells controlling tissue and organ development in Populus (and other Angiosperm species).

  16. Evaluation of Reference Genes for RT-qPCR Studies in the Seagrass Zostera muelleri Exposed to Light Limitation

    PubMed Central

    Schliep, M.; Pernice, M.; Sinutok, S.; Bryant, C. V.; York, P. H.; Rasheed, M. A.; Ralph, P. J.

    2015-01-01

    Seagrass meadows are threatened by coastal development and global change. In the face of these pressures, molecular techniques such as reverse transcription quantitative real-time PCR (RT-qPCR) have great potential to improve management of these ecosystems by allowing early detection of chronic stress. In RT-qPCR, the expression levels of target genes are estimated on the basis of reference genes, in order to control for RNA variations. Although determination of suitable reference genes is critical for RT-qPCR studies, reports on the evaluation of reference genes are still absent for the major Australian species Zostera muelleri subsp. capricorni (Z. muelleri). Here, we used three different software packages (geNorm, NormFinder and BestKeeper) to evaluate ten widely used reference genes according to their expression stability in Z. muelleri exposed to light limitation. We then combined the results from the different packages and used a consensus rank of the four best reference genes to validate regulation in Photosystem I reaction center subunit IV B and Heat Stress Transcription factor A- gene expression in Z. muelleri under light limitation. This study provides the first comprehensive list of reference genes in Z. muelleri and demonstrates RT-qPCR as an effective tool to identify early responses to light limitation in seagrass. PMID:26592440
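
    Expression stability of candidate reference genes can be scored in several ways. The Python sketch below follows the stability measure M as it is commonly described for geNorm: for each gene, the average over all other candidates of the standard deviation of their pairwise log2 expression ratios across samples. It is an independent sketch, not the geNorm software, and the gene names and relative quantities are hypothetical.

```python
import math
from statistics import stdev


def genorm_m_values(expression):
    """Return {gene: M}, where a lower M means more stable expression.

    expression: mapping of gene name -> list of relative quantities
    (not raw Ct values), one per sample, in the same sample order.
    """
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[gene], expression[other])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    # REF_A and REF_B keep a constant ratio across samples; REF_C does not.
    data = {"REF_A": [1.0, 2.0, 4.0],
            "REF_B": [2.0, 4.0, 8.0],
            "REF_C": [1.0, 1.0, 1.0]}
    print(genorm_m_values(data))
    # {'REF_A': 0.5, 'REF_B': 0.5, 'REF_C': 1.0}
```

    Because M is built from ratios between candidates, it rewards genes that vary together, which is one reason to compare it against a differently constructed score before settling on a consensus rank.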

  17. Estimating replicate time shifts using Gaussian process regression

    PubMed Central

    Liu, Qiang; Andersen, Bogi; Smyth, Padhraic; Ihler, Alexander

    2010-01-01

    Motivation: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate. Results: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study. Availability: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/ Contact: ihler@ics.uci.edu PMID:20147305
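
    For readers unfamiliar with Gaussian process regression, the Python sketch below computes the standard GP posterior mean for one expression profile with a squared-exponential kernel and fixed, known sample times. It does not implement the paper's inference of per-replicate time shifts, and it is unrelated to the authors' Matlab code; the kernel choice, hyperparameter values, and function names are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two time points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_posterior_mean(times, values, query_times,
                      noise_var=0.1, length_scale=1.0):
    """Posterior mean k*^T (K + noise_var * I)^-1 y at each query time."""
    K = [[rbf_kernel(a, b, length_scale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(times)]
         for i, a in enumerate(times)]
    alpha = solve_linear(K, values)
    return [sum(rbf_kernel(q, t, length_scale) * w
                for t, w in zip(times, alpha))
            for q in query_times]


if __name__ == "__main__":
    # Hypothetical measurements of one gene at three time points.
    print(gp_posterior_mean([0.0, 1.0, 2.0], [1.0, 2.0, 1.5], [0.5, 1.5]))
```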

  18. Medicago truncatula contains a second gene encoding a plastid located glutamine synthetase exclusively expressed in developing seeds.

    PubMed

    Seabra, Ana R; Vieira, Cristina P; Cullimore, Julie V; Carvalho, Helena G

    2010-08-19

    Nitrogen is a crucial nutrient that is both essential and rate limiting for plant growth and seed production. Glutamine synthetase (GS) occupies a central position in nitrogen assimilation and recycling, justifying the extensive number of studies that have been dedicated to this enzyme from several plant sources. All plant species studied to date have been reported as containing a single nuclear gene encoding a plastid located GS isoenzyme per haploid genome. This study reports the existence of a second nuclear gene encoding a plastid located GS in Medicago truncatula. This study characterizes a new, second gene encoding a plastid located glutamine synthetase (GS2) in M. truncatula. The gene encodes a functional GS isoenzyme with unique kinetic properties, which is exclusively expressed in developing seeds. Based on molecular data and the assumption of a molecular clock, it is estimated that the gene arose from a duplication event that occurred about 10 My ago, after legume speciation, and that duplicated sequences are also present in closely related species of the Vicioide subclade. Expression analysis by RT-PCR and western blot indicates that the gene is exclusively expressed in developing seeds and that its expression is related to seed filling, suggesting a specific function of the enzyme associated with legume seed metabolism. Interestingly, the gene was found to be subjected to alternative splicing over the first intron, leading to the formation of two transcripts with similar open reading frames but varying 5' UTR lengths, due to retention of the first intron. To our knowledge, this is the first report of alternative splicing of a plant GS gene. This study shows that Medicago truncatula contains an additional GS gene encoding a plastid located isoenzyme, which is functional and exclusively expressed during seed development. Because legumes produce protein-rich seeds requiring high amounts of nitrogen, we postulate that this gene duplication represents a functional innovation of plastid located GS related to storage protein accumulation exclusive to legume seed metabolism.

  19. A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

    PubMed

    Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang

    2017-08-23

    Large-scale accumulation of omics data poses a pressing challenge for the integrative analysis of multiple data sets in bioinformatics. An open question in such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their probabilities of occurrence in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were found to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way of dealing with study heterogeneity.

  20. Preprocessing of gene expression data by optimally robust estimators

    PubMed Central

    2010-01-01

    Background The preprocessing of gene expression data obtained from several platforms routinely includes the aggregation of multiple raw signal intensities to one expression value. Examples are the computation of a single expression measure based on the perfect match (PM) and mismatch (MM) probes for the Affymetrix technology, the summarization of bead level values to bead summary values for the Illumina technology or the aggregation of replicated measurements in the case of other technologies including real-time quantitative polymerase chain reaction (RT-qPCR) platforms. The summarization of technical replicates is also performed in other "-omics" disciplines like proteomics or metabolomics. Preprocessing methods like MAS 5.0, Illumina's default summarization method, RMA, or VSN show that the use of robust estimators is widely accepted in gene expression analysis. However, the selection of robust methods seems to be mainly driven by their high breakdown point and not by efficiency. Results We describe how optimally robust radius-minimax (rmx) estimators, i.e. estimators that minimize an asymptotic maximum risk on shrinking neighborhoods about an ideal model, can be used for the aggregation of multiple raw signal intensities to one expression value for Affymetrix and Illumina data. With regard to the Affymetrix data, we have implemented an algorithm which is a variant of MAS 5.0. Using datasets from the literature and Monte-Carlo simulations we provide some reasoning for assuming approximate log-normal distributions of the raw signal intensities by means of the Kolmogorov distance, at least for the discussed datasets, and compare the results of our preprocessing algorithms with the results of Affymetrix's MAS 5.0 and Illumina's default method. The numerical results indicate that when using rmx estimators an accuracy improvement of about 10-20% is obtained compared to Affymetrix's MAS 5.0 and about 1-5% compared to Illumina's default method. 
The improvement is also visible in the analysis of technical replicates where the reproducibility of the values (in terms of Pearson and Spearman correlation) is increased for all Affymetrix and almost all Illumina examples considered. Our algorithms are implemented in the R package named RobLoxBioC which is publicly available via CRAN, The Comprehensive R Archive Network (http://cran.r-project.org/web/packages/RobLoxBioC/). Conclusions Optimally robust rmx estimators have a high breakdown point and are computationally feasible. They can lead to a considerable gain in efficiency for well-established bioinformatics procedures and thus, can increase the reproducibility and power of subsequent statistical analysis. PMID:21118506
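
    The idea of robustly aggregating several raw intensities into one expression value can be sketched as follows. This is not the rmx estimator from RobLoxBioC; it is a one-step Tukey biweight of the kind associated with MAS 5.0-style summarization, and the function name, the tuning constants `c=5` and `eps=1e-4`, and the sample values are assumptions made for illustration only.

```python
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate (illustrative sketch).

    Values far from the median, measured in units of the median absolute
    deviation (MAD), receive weight zero and so cannot drag the estimate.
    """
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical log-scale probe intensities with one gross outlier.
probes = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
robust_value = tukey_biweight(probes)
plain_mean = sum(probes) / len(probes)
```

    In this toy input the outlier receives zero weight, so the robust value stays within the range of the five consistent probes while the plain mean is pulled towards the outlier.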

  1. Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification.

    PubMed

    Oberthuer, André; Berthold, Frank; Warnat, Patrick; Hero, Barbara; Kahlert, Yvonne; Spitz, Rüdiger; Ernestus, Karen; König, Rainer; Haas, Stefan; Eils, Roland; Schwab, Manfred; Brors, Benedikt; Westermann, Frank; Fischer, Matthias

    2006-11-01

    To develop a gene expression-based classifier for neuroblastoma patients that reliably predicts courses of the disease. Two hundred fifty-one neuroblastoma specimens were analyzed using a customized oligonucleotide microarray comprising 10,163 probes for transcripts with differential expression in clinical subgroups of the disease. Subsequently, the prediction analysis for microarrays (PAM) was applied to a first set of patients with maximally divergent clinical courses (n = 77). The classification accuracy was estimated by a complete 10-times-repeated 10-fold cross validation, and a 144-gene predictor was constructed from this set. This classifier's predictive power was evaluated in an independent second set (n = 174) by comparing results of the gene expression-based classification with those of risk stratification systems of current trials from Germany, Japan, and the United States. The first set of patients was accurately predicted by PAM (cross-validated accuracy, 99%). Within the second set, the PAM classifier significantly separated cohorts with distinct courses (3-year event-free survival [EFS] 0.86 +/- 0.03 [favorable; n = 115] v 0.52 +/- 0.07 [unfavorable; n = 59] and 3-year overall survival 0.99 +/- 0.01 v 0.84 +/- 0.05; both P < .0001) and separated risk groups of current neuroblastoma trials into subgroups with divergent outcome (NB2004: low-risk 3-year EFS 0.86 +/- 0.04 v 0.25 +/- 0.15, P < .0001; intermediate-risk 1.00 v 0.57 +/- 0.19, P = .018; high-risk 0.81 +/- 0.10 v 0.56 +/- 0.08, P = .06). In a multivariate Cox regression model, the PAM predictor classified patients of the second set more accurately than risk stratification of current trials from Germany, Japan, and the United States (P < .001; hazard ratio, 4.756 [95% CI, 2.544 to 8.893]). Integration of gene expression-based class prediction of neuroblastoma patients may improve risk estimation of current neuroblastoma trials.
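
    The resampling scheme named in the abstract, a 10-times-repeated 10-fold cross validation, can be sketched as below. The sketch only generates the train/test index splits; it does not implement PAM, it is not stratified by class, and the function name and the fixed seed are assumptions, since the abstract does not give those details.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # new random partition for every repeat
        folds = [idx[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx


# 77 samples, matching the size of the first patient set in the abstract.
splits = list(repeated_kfold(77))
```

    Each sample is used for testing exactly once per repeat, so averaging the per-fold accuracies over all 100 splits gives the cross-validated estimate.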

  2. paraGSEA: a scalable approach for large-scale gene expression profiling

    PubMed Central

    Peng, Shaoliang; Yang, Shunyun

    2017-01-01

    More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the significance-level estimation step and the multiple hypothesis testing step, its computational scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale transcriptome data analysis. By optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which contributes to a more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters in an efficient manner with high scalability and performance on large-scale datasets. The analysis time of the whole LINCS phase I dataset (GSE92742) was reduced to about half an hour on a 1000-node cluster on Tianhe-2, or to within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. PMID:28973463
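
    As a rough illustration of why an enrichment score can be computed in time linear in the sizes of the gene set and the profile, the sketch below computes the classic unweighted running-sum enrichment score with a hash-set lookup. It is not the paraGSEA implementation; the function name and the toy gene identifiers are assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    Building the set costs O(m) and the single pass over the ranked
    profile costs O(n), so this sketch runs in O(m + n) on average.
    """
    members = set(gene_set)
    n = len(ranked_genes)
    n_hits = sum(1 for g in ranked_genes if g in members)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_step = 1.0 / n_hits
    miss_step = 1.0 / (n - n_hits)
    running = 0.0
    best = 0.0
    for g in ranked_genes:
        running += hit_step if g in members else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical ranked profile
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
```

    A set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one; significance would still require permutation testing, which is the costly step the abstract refers to.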

  3. Karyotype Stability and Unbiased Fractionation in the Paleo-Allotetraploid Cucurbita Genomes.

    PubMed

    Sun, Honghe; Wu, Shan; Zhang, Guoyu; Jiao, Chen; Guo, Shaogui; Ren, Yi; Zhang, Jie; Zhang, Haiying; Gong, Guoyi; Jia, Zhangcai; Zhang, Fan; Tian, Jiaxing; Lucas, William J; Doyle, Jeff J; Li, Haizhen; Fei, Zhangjun; Xu, Yong

    2017-10-09

    The Cucurbita genus contains several economically important species in the Cucurbitaceae family. Here, we report high-quality genome sequences of C. maxima and C. moschata and provide evidence supporting an allotetraploidization event in Cucurbita. We are able to partition the genome into two homoeologous subgenomes based on different genetic distances to melon, cucumber, and watermelon in the Benincaseae tribe. We estimate that the two diploid progenitors successively diverged from Benincaseae around 31 and 26 million years ago (Mya), respectively, and the allotetraploidization happened at some point between 26 Mya and 3 Mya, the estimated date when C. maxima and C. moschata diverged. The subgenomes have largely maintained the chromosome structures of their diploid progenitors. Such long-term karyotype stability after polyploidization has not been commonly observed in plant polyploids. The two subgenomes have retained similar numbers of genes, and neither subgenome is globally dominant in gene expression. Allele-specific expression analysis in the C. maxima × C. moschata interspecific F1 hybrid and their two parents indicates the predominance of trans-regulatory effects underlying expression divergence of the parents, and detects transgressive gene expression changes in the hybrid correlated with heterosis in important agronomic traits. Our study provides insights into polyploid genome evolution and valuable resources for genetic improvement of cucurbit crops. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

  4. Cloning, expression and purification of d-tagatose 3-epimerase gene from Escherichia coli JM109.

    PubMed

    He, Xiaoliang; Zhou, Xiaohui; Yang, Zi; Xu, Le; Yu, Yuxiu; Jia, Lingling; Li, Guoqing

    2015-10-01

    An unknown d-tagatose 3-epimerase (DTE) containing an IoIE domain was identified and cloned from Escherichia coli. This gene was subcloned into the prokaryotic expression vector pET-15b, and its expression was induced by IPTG in the E. coli BL21 expression system. Through His-select gel column purification and fast-protein liquid chromatography, highly purified and stable DTE protein was produced. The molecular weight of the DTE protein was estimated to be 29.8 kDa. The latest 83 DTE sequences from public databases were selected and analyzed by molecular clustering and multi-sequence alignment. DTEs were roughly divided into five categories. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress.

    PubMed

    Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun

    2014-06-01

    Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expression at multiple time points. However, there are several technical difficulties in analyzing such whole genome expression data. In addition, gene expression is now often measured by RNA-sequencing rather than microarray technologies, and analysis of the expression data is then much more complicated, since the analysis process should start with mapping short reads and produce differentially activated pathways and possibly also interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable to RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimate statistical values for the combined dataset using an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods on a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress.
The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.
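
    For orientation, the basic over-representation analysis (ORA) that TRAP extends can be sketched as a hypergeometric tail probability for a single time point. This is the textbook single-list test, not TRAP's time-series extension or its combined metric; the function name and the toy counts are assumptions.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from a universe of n_universe genes, n_pathway of which are in the pathway."""
    total = comb(n_universe, n_degs)
    upper = min(n_pathway, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes measured, 4 in the pathway, 3 DEGs, 2 of them in the pathway.
p = ora_p_value(n_universe=10, n_pathway=4, n_degs=3, n_overlap=2)
```

    A small p-value suggests that the pathway contains more DEGs than expected by chance; with many pathways tested, a multiple-testing correction would normally follow.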

  6. Validation of Reference Genes for Quantitative Expression Analysis by Real-Time RT-PCR in Four Lepidopteran Insects

    PubMed Central

    Teng, Xiaolu; Zhang, Zan; He, Guiling; Yang, Liwen; Li, Fei

    2012-01-01

    Quantitative real-time polymerase chain reaction (qPCR) is an efficient and widely used technique to monitor gene expression. Housekeeping genes (HKGs) are often empirically selected as the reference genes for data normalization. However, the suitability of HKGs used as the reference genes has seldom been validated. Here, six HKGs were chosen (actin A3, actin A1, GAPDH, G3PDH, E2F, rp49) in four lepidopteran insects Bombyx mori L. (Lepidoptera: Bombycidae), Plutella xylostella L. (Plutellidae), Chilo suppressalis Walker (Crambidae), and Spodoptera exigua Hübner (Noctuidae) to study their expression stability. The algorithms of geNorm, NormFinder, stability index, and ΔCt analysis were used to evaluate these HKGs. Across different developmental stages, actin A1 was the most stable in P. xylostella and C. suppressalis, but it was the least stable in B. mori and S. exigua. Rp49 and GAPDH were the most stable in B. mori and S. exigua, respectively. In different tissues, GAPDH, E2F, and Rp49 were the most stable in B. mori, S. exigua, and C. suppressalis, respectively. The relative abundances of Siwi genes estimated by the 2^(-ΔΔCt) method were tested with different HKGs as the reference gene, proving the importance of internal controls in qPCR data analysis. The results not only presented a list of suitable reference genes in four lepidopteran insects, but also proved that the expression stabilities of HKGs were different among evolutionarily close species. There was no single universal reference gene that could be used in all situations. It is indispensable to validate the expression of HKGs before using them as the internal control in qPCR. PMID:22938136
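
    The dependence of the 2^(-ΔΔCt) estimate on the chosen reference gene can be made concrete with a short sketch. It assumes roughly 100% amplification efficiency for both genes, and the function name and all Ct values are hypothetical rather than taken from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control
    return 2.0 ** -(delta_sample - delta_control)


# Same target Ct values, two hypothetical reference genes.
# Reference A is stable between conditions (Ct 18 in both).
fold_with_stable_ref = relative_expression(22.0, 18.0, 24.0, 18.0)
# Reference B shifts by one cycle in the sample (Ct 19 vs 18).
fold_with_shifted_ref = relative_expression(22.0, 19.0, 24.0, 18.0)
```

    A one-cycle shift in the reference gene doubles the reported fold change here, which is why an unstable housekeeping gene can distort the result.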

  7. Validation of reference genes for quantitative expression analysis by real-time rt-PCR in four lepidopteran insects.

    PubMed

    Teng, Xiaolu; Zhang, Zan; He, Guiling; Yang, Liwen; Li, Fei

    2012-01-01

    Quantitative real-time polymerase chain reaction (qPCR) is an efficient and widely used technique to monitor gene expression. Housekeeping genes (HKGs) are often empirically selected as the reference genes for data normalization. However, the suitability of HKGs used as the reference genes has seldom been validated. Here, six HKGs were chosen (actin A3, actin A1, GAPDH, G3PDH, E2F, rp49) in four lepidopteran insects Bombyx mori L. (Lepidoptera: Bombycidae), Plutella xylostella L. (Plutellidae), Chilo suppressalis Walker (Crambidae), and Spodoptera exigua Hübner (Noctuidae) to study their expression stability. The algorithms of geNorm, NormFinder, stability index, and ΔCt analysis were used to evaluate these HKGs. Across different developmental stages, actin A1 was the most stable in P. xylostella and C. suppressalis, but it was the least stable in B. mori and S. exigua. Rp49 and GAPDH were the most stable in B. mori and S. exigua, respectively. In different tissues, GAPDH, E2F, and Rp49 were the most stable in B. mori, S. exigua, and C. suppressalis, respectively. The relative abundances of Siwi genes estimated by the 2^(-ΔΔCt) method were tested with different HKGs as the reference gene, proving the importance of internal controls in qPCR data analysis. The results not only presented a list of suitable reference genes in four lepidopteran insects, but also proved that the expression stabilities of HKGs were different among evolutionarily close species. There was no single universal reference gene that could be used in all situations. It is indispensable to validate the expression of HKGs before using them as the internal control in qPCR.

  8. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

    PubMed Central

    Lin, Wei; Feng, Rui; Li, Hongzhe

    2014-01-01

    In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionalities of covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
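
    The classical two-stage least squares estimator that the method builds on can be sketched for the simplest case of one instrument and one covariate. The sketch is unpenalized and low-dimensional, so it shows only the starting point and not the regularized procedure of the paper; the function names and the toy data are assumptions.

```python
def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_least_squares(z, x, y):
    """Stage 1: regress covariate x on instrument z and keep the fitted values.
    Stage 2: regress outcome y on those fitted values."""
    mz = sum(z) / len(z)
    mx = sum(x) / len(x)
    b1 = slope(z, x)
    x_hat = [mx + b1 * (zi - mz) for zi in z]
    return slope(x_hat, y)


# Toy data: u is an unobserved confounder, uncorrelated with the instrument z.
z = [-1.0, 1.0, -1.0, 1.0]
u = [-1.0, -1.0, 1.0, 1.0]
x = [2.0 * zi + ui for zi, ui in zip(z, u)]   # covariate driven by z and u
y = [3.0 * xi + 5.0 * ui for xi, ui in zip(x, u)]  # true effect of x on y is 3

naive_estimate = slope(x, y)                  # biased by the confounder
iv_estimate = two_stage_least_squares(z, x, y)
```

    In this constructed example the naive regression gives 4 while the two-stage estimate recovers the true effect of 3, because the fitted values from stage 1 carry only the variation in x that comes from the instrument.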

  9. Punctual Transcriptional Regulation by the Rice Circadian Clock under Fluctuating Field Conditions

    PubMed Central

    Matsuzaki, Jun; Kawahara, Yoshihiro; Izawa, Takeshi

    2015-01-01

    Plant circadian clocks that oscillate autonomously with a roughly 24-h period are entrained by fluctuating light and temperature and globally regulate downstream genes in the field. However, it remains unknown how punctual internal time produced by the circadian clock in the field is and how it is affected by environmental fluctuations due to weather or daylength. Using hundreds of samples of field-grown rice (Oryza sativa) leaves, we developed a statistical model for the expression of circadian clock-related genes integrating diurnally entrained circadian clock with phase setting by light, both responses to light and temperature gated by the circadian clock. We show that expression of individual genes was strongly affected by temperature. However, internal time estimated from expression of multiple genes, which may reflect transcriptional regulation of downstream genes, is punctual to 22 min and not affected by weather, daylength, or plant developmental age in the field. We also revealed perturbed progression of internal time under controlled environment or in a mutant of the circadian clock gene GIGANTEA. Thus, we demonstrated that the circadian clock is a regulatory network of multiple genes that retains accurate physical time of day by integrating the perturbations on individual genes under fluctuating environments in the field. PMID:25757473

  10. Chemopreventive glucosinolate accumulation in various broccoli and collard tissues: Microfluidic-based targeted transcriptomics for by-product valorization

    PubMed Central

    Becker, Talon M.; Juvik, John A.

    2017-01-01

    Floret, leaf, and root tissues were harvested from broccoli and collard cultivars and extracted to determine their glucosinolate and hydrolysis product profiles using high performance liquid chromatography and gas chromatography. Quinone reductase inducing bioactivity, an estimate of anti-cancer chemopreventive potential, of the extracts was measured using a hepa1c1c7 murine cell line. Extracts from root tissues were significantly different from other tissues and contained high levels of gluconasturtiin and glucoerucin. Targeted gene expression analysis on glucosinolate biosynthesis revealed that broccoli root tissue has elevated gene expression of AOP2 and low expression of FMOGS-OX homologs, essentially the opposite of what was observed in broccoli florets, which accumulated high levels of glucoraphanin. Broccoli floret tissue has significantly higher nitrile formation (%) and epithionitrile specifier protein gene expression than other tissues. This study provides basic information on the glucosinolate metabolome and transcriptome for various tissues of Brassica oleracea that may be utilized as potential byproducts for the nutraceutical market. PMID:28945821

  11. Chemopreventive glucosinolate accumulation in various broccoli and collard tissues: Microfluidic-based targeted transcriptomics for by-product valorization.

    PubMed

    Lee, Young-Sang; Ku, Kang-Mo; Becker, Talon M; Juvik, John A

    2017-01-01

    Floret, leaf, and root tissues were harvested from broccoli and collard cultivars and extracted to determine their glucosinolate and hydrolysis product profiles using high performance liquid chromatography and gas chromatography. Quinone reductase inducing bioactivity, an estimate of anti-cancer chemopreventive potential, of the extracts was measured using a hepa1c1c7 murine cell line. Extracts from root tissues were significantly different from other tissues and contained high levels of gluconasturtiin and glucoerucin. Targeted gene expression analysis on glucosinolate biosynthesis revealed that broccoli root tissue has elevated gene expression of AOP2 and low expression of FMOGS-OX homologs, essentially the opposite of what was observed in broccoli florets, which accumulated high levels of glucoraphanin. Broccoli floret tissue has significantly higher nitrile formation (%) and epithionitrile specifier protein gene expression than other tissues. This study provides basic information on the glucosinolate metabolome and transcriptome for various tissues of Brassica oleracea that may be utilized as potential byproducts for the nutraceutical market.

  12. Analysis of gene expression levels in individual bacterial cells without image segmentation.

    PubMed

    Kwak, In Hae; Son, Minjun; Hagen, Stephen J

    2012-05-11

    Studies of stochasticity in gene expression typically make use of fluorescent protein reporters, which permit the measurement of expression levels within individual cells by fluorescence microscopy. Analysis of such microscopy images is almost invariably based on a segmentation algorithm, where the image of a cell or cluster is analyzed mathematically to delineate individual cell boundaries. However, segmentation can be ineffective for studying bacterial cells or clusters, especially at lower magnification, where outlines of individual cells are poorly resolved. Here we demonstrate an alternative method for analyzing such images without segmentation. The method employs a comparison between the pixel brightness in phase contrast and fluorescence microscopy images. By fitting the correlation between phase contrast and fluorescence intensity to a physical model, we obtain well-defined estimates for the different levels of gene expression that are present in the cell or cluster. The method reveals the boundaries of the individual cells, even if the source images lack the resolution to show these boundaries clearly. Copyright © 2012 Elsevier Inc. All rights reserved.

  13. SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies

    PubMed Central

    2013-01-01

    Background Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis. Results Here we have developed a web application called SVAw (Surrogate variable analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software has been developed based on the open source Bioconductor SVA package. In our software, we have extended the SVA program functionality in three aspects: (i) SVAw performs a fully automated and user-friendly analysis workflow; (ii) it calculates probe/gene statistics for both pre- and post-SVA analysis and provides a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file, including a graphical comparison of the outcome for the user. Conclusions SVAw is a freely accessible web server solution for the surrogate variable analysis of high-throughput datasets and facilitates removing unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. The executable packages for both the web and standalone applications and the installation instructions can be downloaded from our web site. PMID:23497726

  14. Identification of transcript regulatory patterns in cell differentiation.

    PubMed

    Gusnanto, Arief; Gosling, John Paul; Pope, Christopher

    2017-10-15

    Studying transcript regulatory patterns in cell differentiation is critical in understanding the complex nature of the formation and function of different cell types. This is usually done by measuring gene expression at different stages of the cell differentiation. However, if the gene expression data available are only from the mature cells, we face some challenges in identifying transcript regulatory patterns that govern the cell differentiation. We propose to exploit the information of the lineage of cell differentiation in terms of correlation structure between cell types. We assume that two different cell types that are close in the lineage will exhibit many common genes that are co-expressed relative to those that are far in the lineage. Current analysis methods tend to ignore this correlation by testing for differential expression assuming some sort of independence between cell types. We employ a Bayesian approach to estimate the posterior distribution of the mean of expression in each cell type, by taking into account the cell formation path in the lineage. This enables us to infer genes that are specific to each cell type, indicating that the genes are involved in directing the cell differentiation to that particular cell type. We illustrate the method using gene expression data from a study of haematopoiesis. R code to perform the analysis is available at http://www1.maths.leeds.ac.uk/~arief/R/CellDiff/. Contact: a.gusnanto@leeds.ac.uk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  15. Effects of intense magnetic fields on sedimentation pattern and gene expression profile in budding yeast

    NASA Astrophysics Data System (ADS)

    Ikehata, Masateru; Iwasaka, Masakazu; Miyakoshi, Junji; Ueno, Shoogo; Koana, Takao

    2003-05-01

    Effects of magnetic fields (MFs) on biological systems are usually investigated using biological indices such as gene expression profiles. However, to precisely evaluate the biological effects of MFs, the effects of intense MFs on systematic material transport processes, including the experimental environment, must be seriously taken into consideration. In this study, a culture of the budding yeast, Saccharomyces cerevisiae, was used as a model for an in vitro biological test system. After exposure to a 5 T static vertical MF, we found a difference in the sedimentation pattern of cells depending on the location of the dish in the magnet bore. Sedimented cells were localized in the center of the dish when the dishes were placed in the lower part of the magnet bore, while the sedimentation of the cells was uniform in dishes placed in the upper part of the bore because of the diamagnetic force. The genome-wide gene expression profile of the yeast cells after exposure to a 5 T static MF for 2 h suggested that the MF did not affect the expression level of any gene in yeast cells although the sedimentation pattern was altered. In addition, exposure to 10 T for 1 h and 5 T for 24 h also did not affect gene expression. On the other hand, a slight change in the expression of several genes related to respiration was observed after exposure to a 14 T static MF for 24 h. The necessity of estimating the indirect effects of MFs when studying their biological effects in vitro will be discussed.

  16. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

    PubMed Central

    Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol; Scalcinati, Gionata; Fagerberg, Linn; Uhlén, Matthias; Nielsen, Jens

    2012-01-01

    RNA-seq has recently become an attractive method of choice in the study of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on the S288c genome and the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq), and we explore the consistency between RNA-seq analysis using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The results from differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data. PMID:22965124

  17. Correlated gene expression and anatomical communication support synchronized brain activity in the mouse functional connectome.

    PubMed

    Mills, Brian D; Grayson, David S; Shunmugavel, Anandakumar; Miranda-Dominguez, Oscar; Feczko, Eric; Earl, Eric; Neve, Kim; Fair, Damien A

    2018-05-22

    Cognition and behavior depend on synchronized intrinsic brain activity that is organized into functional networks across the brain. Research has investigated how anatomical connectivity both shapes and is shaped by these networks, but not how anatomical connectivity interacts with intra-areal molecular properties to drive functional connectivity. Here, we present a novel linear model to explain functional connectivity by integrating systematically obtained measurements of axonal connectivity, gene expression, and resting state functional connectivity MRI in the mouse brain. The model suggests that functional connectivity arises from both anatomical links and inter-areal similarities in gene expression. By estimating these effects, we identify anatomical modules in which correlated gene expression and anatomical connectivity support functional connectivity. Along with providing evidence that not all genes equally contribute to functional connectivity, this research establishes new insights regarding the biological underpinnings of coordinated brain activity measured by BOLD fMRI. SIGNIFICANCE STATEMENT Efforts at characterizing the functional connectome with fMRI have risen exponentially over the last decade. Yet despite this rise, the biological underpinnings of these functional measurements are still largely unknown. The current report begins to fill this void by investigating the molecular underpinnings of the functional connectome through an integration of systematically obtained structural information and gene expression data throughout the rodent brain. We find that both white matter connectivity and similarity in regional gene expression relate to resting state functional connectivity. The current report furthers our understanding of the biological underpinnings of the functional connectome and provides a linear model that can be utilized to streamline preclinical animal studies of disease. Copyright © 2018 the authors.

  18. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  19. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.

    PubMed

    Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R

    2017-01-01

    Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is the model-based study to examine the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.

  20. Endometrial gene expression profile of pregnant sows with extreme phenotypes for reproductive efficiency.

    PubMed

    Córdoba, S; Balcells, I; Castelló, A; Ovilo, C; Noguera, J L; Timoneda, O; Sánchez, A

    2015-10-05

    Prolificacy can directly impact porcine profitability, but large genetic variation and low heritability have been found regarding litter size among porcine breeds. To identify key differences in gene expression associated with swine reproductive efficiency, we performed a transcriptome analysis of sows' endometrium from an Iberian x Meishan F2 population at day 30-32 of gestation, classified according to their estimated breeding value (EBV) as high (H, EBV > 0) and low (L, EBV < 0) prolificacy phenotypes. For each sample, mRNA and small RNA libraries were RNA-sequenced, identifying 141 genes and 10 miRNAs differentially expressed between H and L groups. We selected four miRNAs based on their role in reproduction, and five genes displaying the highest differences and a positive mapping into known reproductive QTLs for RT-qPCR validation on the whole extreme population. Significant differences were validated for genes: PTGS2 (p = 0.03; H/L ratio = 3.50), PTHLH (p = 0.03; H/L ratio = 3.69), MMP8 (p = 0.01; H/L ratio = 4.41) and SCNN1G (p = 0.04; H/L ratio = 3.42). Although selected miRNAs showed similar expression levels between H and L groups, significant correlation was found between the expression level of ssc-miR-133a (p < 0.01) and ssc-miR-92a (p < 0.01) and validated genes. These results provide a better understanding of the genetic architecture of prolificacy-related traits and embryo implantation failure in pigs.

  1. puma: a Bioconductor package for propagating uncertainty in microarray analysis.

    PubMed

    Pearson, Richard D; Liu, Xuejun; Sanguinetti, Guido; Milo, Marta; Lawrence, Neil D; Rattray, Magnus

    2009-07-09

    Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied. puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.

  2. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing.

    PubMed

    Marinov, Georgi K; Williams, Brian A; McCue, Ken; Schroth, Gary P; Gertz, Jason; Myers, Richard M; Wold, Barbara J

    2014-03-01

    Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.

  3. Scoring clustering solutions by their biological relevance.

    PubMed

    Gat-Viks, I; Sharan, R; Shamir, R

    2003-12-12

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
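
    The score described above rests on the ratio of between-group to within-group variance estimators of the projected attribute values. The sketch below computes only that ratio for values that are already projected onto the real line, using the usual one-way ANOVA estimators as an assumption; it does not perform the projection step or the non-parametric test, and the function name and example data are hypothetical.

```python
from collections import defaultdict


def variance_ratio(values, labels):
    """Between-group over within-group variance estimator for 1-D values.

    values: projected attribute value of each clustered element
    labels: cluster label of each element, in the same order
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n = len(values)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more elements than clusters")

    grand_mean = sum(values) / n
    means = {label: sum(members) / len(members) for label, members in groups.items()}

    # Between-group estimator: spread of cluster means around the grand mean.
    between = sum(
        len(members) * (means[label] - grand_mean) ** 2
        for label, members in groups.items()
    ) / (k - 1)

    # Within-group estimator: spread of elements around their own cluster mean.
    within = sum(
        (value - means[label]) ** 2
        for label, members in groups.items()
        for value in members
    ) / (n - k)

    if within == 0:
        raise ValueError("within-group variance is zero; ratio is undefined")
    return between / within


# Hypothetical projected values scored under two candidate clusterings.
projected = [1, 2, 3, 7, 8, 9]
coherent = ["a", "a", "a", "b", "b", "b"]
mixed = ["a", "b", "a", "b", "a", "b"]

print(variance_ratio(projected, coherent))
print(variance_ratio(projected, mixed))
```

    A clustering that groups elements with similar attribute values yields a larger ratio than one that mixes them, which is what allows the ratio to rank alternative solutions.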

  4. Co-expression networks reveal the tissue-specific regulation of transcription and splicing

    PubMed Central

    Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D.H.; Jo, Brian; Gao, Chuan; McDowell, Ian C.; Engelhardt, Barbara E.

    2017-01-01

    Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues. PMID:29021288

  5. Dynamics of cellular level function and regulation derived from murine expression array data.

    PubMed

    de Bivort, Benjamin; Huang, Sui; Bar-Yam, Yaneer

    2004-12-21

    A major open question of systems biology is how genetic and molecular components interact to create phenotypes at the cellular level. Although much recent effort has been dedicated to inferring effective regulatory influences within small networks of genes, the power of microarray bioinformatics has yet to be used to determine functional influences at the cellular level. In all cases of data-driven parameter estimation, the number of model parameters estimable from a set of data is strictly limited by the size of that set. Rather than infer parameters describing the detailed interactions of just a few genes, we chose a larger-scale investigation so that the cumulative effects of all gene interactions could be analyzed to identify the dynamics of cellular-level function. By aggregating genes into large groups with related behaviors (megamodules), we were able to determine the effective aggregate regulatory influences among 12 major gene groups in murine B lymphocytes over a variety of time steps. Intriguing observations about the behavior of cells at this high level of abstraction include: (i) a medium-term critical global transcriptional dependence on ATP-generating genes in the mitochondria, (ii) a longer-term dependence on glycolytic genes, (iii) the dual role of chromatin-reorganizing genes in transcriptional activation and repression, (iv) homeostasis-favoring influences, (v) the indication that, as a group, G protein-mediated signals are not concentration-dependent in their influence on target gene expression, and (vi) short-term-activating/long-term-repressing behavior of the cell-cycle system that reflects its oscillatory behavior.

  6. Factors affecting expression of the recF gene of Escherichia coli K-12.

    PubMed

    Sandler, S J; Clark, A J

    1990-01-31

    This report describes four factors which affect expression of the recF gene from strong upstream lambda promoters under temperature-sensitive cIAt2-encoded repressor control. The first factor was the long mRNA leader sequence consisting of the Escherichia coli dnaN gene and 95% of the dnaA gene and lambda bet, N (double amber) and 40% of the exo gene. When most of this DNA was deleted, RecF became detectable in maxicells. The second factor was the vector, pBEU28, a runaway replication plasmid. When we substituted pUC118 for pBEU28, RecF became detectable in whole cells by the Coomassie blue staining technique. The third factor was the efficiency of initiation of translation. We used site-directed mutagenesis to change the mRNA leader, ribosome-binding site and the 3 bp before and after the translational start codon. Monitoring the effect of these mutational changes by translational fusion to lacZ, we discovered that the efficiency of initiation of translation was increased 30-fold. Only an estimated two- or threefold increase in accumulated levels of RecF occurred, however. This led us to discover the fourth factor, namely sequences in the recF gene itself. These sequences reduce expression of the recF-lacZ fusion genes 100-fold. The sequences responsible for this decrease in expression occur in four regions in the N-terminal half of recF. Expression is reduced by some sequences at the transcriptional level and by others at the translational level.

  7. An intersection network based on combining SNP co-association and RNA co-expression networks for feed utilization traits in Japanese Black cattle.

    PubMed

    Okada, D; Endo, S; Matsuda, H; Ogawa, S; Taniguchi, Y; Katsuta, T; Watanabe, T; Iwaisaki, H

    2018-05-12

    Genome-wide association studies (GWAS) of quantitative traits have detected numerous genetic associations, but they encounter difficulties in pinpointing prominent candidate genes and inferring gene networks. The present study used a systems genetics approach integrating GWAS results with external RNA-expression data to detect candidate gene networks in feed utilization and growth traits of Japanese Black cattle, which are matters of concern. A SNP co-association network was derived from significant correlations between SNPs with effects estimated by GWAS across seven phenotypic traits. The resulting network genes contained significant numbers of annotations related to the traits. Using bovine transcriptome data from a public database, an RNA co-expression network was inferred based on the similarity of expression patterns across different tissues. An intersection network was then generated by superimposing the SNP and RNA networks and extracting shared interactions. This intersection network contained four tissue-specific modules: nervous system, reproductive system, muscular system, and glands. To characterize the structure (topographical properties) of the three networks, their scale-free properties were evaluated, which revealed that the intersection network was the most scale-free. In the sub-network containing the most connected transcription factors (URI1, ROCK2 and ETV6), most genes were widely expressed across tissues, and genes previously shown to be involved in the traits were found. Results indicated that the current approach might be used to construct a gene network that better reflects biological information, providing encouragement for the genetic dissection of economically important quantitative traits.
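
    The intersection step described above, superimposing two networks and keeping the interactions they share, can be sketched with plain set operations. This is an illustration only, assuming undirected edges between gene identifiers; the function names and the placeholder genes G1 to G5 are hypothetical and do not come from the study.

```python
def normalize_edges(pairs):
    """Represent undirected edges as frozensets so (a, b) equals (b, a).

    Self-loops are dropped because they carry no interaction information here.
    """
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}


def intersection_network(edges_a, edges_b):
    """Edges present in both networks."""
    return normalize_edges(edges_a) & normalize_edges(edges_b)


def degree(edges):
    """Number of edges touching each node in an edge set."""
    counts = {}
    for edge in edges:
        for node in edge:
            counts[node] = counts.get(node, 0) + 1
    return counts


# Placeholder edge lists standing in for a SNP co-association network
# and an RNA co-expression network.
snp_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
rna_edges = [("G2", "G1"), ("G3", "G4"), ("G4", "G5")]

shared = intersection_network(snp_edges, rna_edges)
print(sorted(sorted(edge) for edge in shared))
print(degree(shared))
```

    The degree counts are the starting point for examining how connected each gene is in the resulting network.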

  8. A Penalized Robust Method for Identifying Gene-Environment Interactions

    PubMed Central

    Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Xie, Yang; Ma, Shuangge

    2015-01-01

    In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt the rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computation feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications. PMID:24616063

  9. Discovery of error-tolerant biclusters from noisy gene expression data.

    PubMed

    Gupta, Rohit; Rao, Navneet; Kumar, Vipin

    2011-11-24

    An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as they only work with binary or boolean attributes, their application on gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from the ET-bicluster approach not only recover a larger set of genes as compared to those obtained from the RAP approach but also have higher functional coherence as evaluated using the GO-based functional enrichment analysis. The statistical significance of the discovered error-tolerant biclusters, as estimated using two randomization tests, reveals that they are indeed biologically meaningful and statistically significant. For the second problem of biomarker discovery, we used four real-valued Breast Cancer microarray gene-expression data sets and evaluate the biomarkers obtained using MSigDB gene sets. The results obtained for both problems, functional module discovery and biomarker discovery, clearly signify the usefulness of the proposed ET-bicluster approach and illustrate the importance of explicitly incorporating noise/errors in discovering coherent groups of genes from gene-expression data.

  10. Expression of the filaggrin gene in umbilical cord blood predicts eczema risk in infancy: A birth cohort study.

    PubMed

    Ziyab, A H; Ewart, S; Lockett, G A; Zhang, H; Arshad, H; Holloway, J W; Karmaus, W

    2017-09-01

    Filaggrin gene (FLG) expression, particularly in the skin, has been linked to the development of the skin barrier and is associated with eczema risk. However, knowledge as to whether FLG expression in umbilical cord blood (UCB) is associated with eczema development and prediction is lacking. This study sought to assess whether FLG expression in UCB associates with and predicts the development of eczema in infancy. Infants enrolled in a birth cohort study (n=94) were assessed for eczema at ages 3, 6, and 12 months. Five probes measuring FLG transcripts expression in UCB were available from genomewide gene expression profiling. FLG genetic variants R501X, 2282del4, and S3247X were genotyped. Associations were assessed using Poisson regression with robust variance estimation. Area under the curve (AUC), describing the discriminatory/predictive performance of fitted models, was estimated from logistic regression. Increased level of FLG expression measured by probe A_24_P51322 was associated with reduced risk of eczema during the first year of life (RR=0.60, 95% CI: 0.38-0.95). In contrast, increased level of FLG antisense transcripts measured by probe A_21_P0014075 was associated with increased risk of eczema (RR=2.02, 95% CI: 1.10-3.72). In prediction models including FLG expression, FLG genetic variants, and sex, discrimination between children who will and will not develop eczema at 3 months of age was high (AUC: 0.91, 95% CI: 0.84-0.98). This study demonstrated, for the first time, that FLG expression in UCB is associated with eczema development in infancy. Moreover, our analysis provided prediction models that were capable of discriminating, to a great extent, between those who will and will not develop eczema in infancy. Therefore, early identification of infants at increased risk of developing eczema is possible and such high-risk newborns may benefit from early stratification and intervention. © 2017 John Wiley & Sons Ltd.
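
    The discrimination figures above are areas under the ROC curve. As a rough illustration of what that number means, the sketch below computes the AUC as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, counting ties as one half. It assumes the predicted scores are already available (for example from a fitted logistic regression, which is not reproduced here); the function name and the example scores are hypothetical.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    scores:   predicted risk score for each subject
    outcomes: 1 if the subject developed the condition, 0 otherwise
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(cases) * len(controls))


# Hypothetical predicted scores for two cases and two non-cases.
example_scores = [0.9, 0.4, 0.6, 0.2]
example_outcomes = [1, 1, 0, 0]
print(auc(example_scores, example_outcomes))
```

    An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.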

  11. Genome-wide characterization of phenylalanine ammonia-lyase gene family in watermelon (Citrullus lanatus).

    PubMed

    Dong, Chun-Juan; Shang, Qing-Mao

    2013-07-01

    Phenylalanine ammonia-lyase (PAL), the first enzyme in the phenylpropanoid pathway, plays a critical role in plant growth, development, and adaptation. PAL enzymes are encoded by a gene family in plants. Here, we report a genome-wide search for PAL genes in watermelon. A total of 12 PAL genes, designated ClPAL1-12, are identified. Nine are arranged in tandem in two duplication blocks located on chromosomes 4 and 7, and the other three ClPAL genes are distributed as single copies on chromosomes 2, 3, and 8. Both the cDNA and protein sequences of ClPALs share an overall high identity with each other. A phylogenetic analysis places 11 of the ClPALs into a separate cucurbit subclade, whereas ClPAL2, which belongs to neither monocots nor dicots, may serve as an ancestral PAL in plants. In the cucurbit subclade, seven ClPALs form homologous pairs with their counterparts from cucumber. Expression profiling reveals that 11 of the ClPAL genes are expressed and show preferential expression in the stems and male and female flowers. Six of the 12 ClPALs are moderately or strongly expressed in the fruits, particularly in the pulp, suggesting the potential roles of PAL in the development of fruit color and flavor. A promoter motif analysis of the ClPAL genes implies redundant but distinctive cis-regulatory structures for stress responsiveness. Finally, duplication events during the evolution and expansion of the ClPAL gene family are discussed, and the relationships between the ClPAL genes and their cucumber orthologs are estimated.

  12. Global changes in gene expression, assayed by microarray hybridization and quantitative RT-PCR, during acclimation of three Arabidopsis thaliana accessions to sub-zero temperatures after cold acclimation.

    PubMed

    Le, Mai Q; Pagter, Majken; Hincha, Dirk K

    2015-01-01

    During cold acclimation plants increase in freezing tolerance in response to low non-freezing temperatures. This is accompanied by many physiological, biochemical and molecular changes that have been extensively investigated. In addition, plants of many species, including Arabidopsis thaliana, become more freezing tolerant during exposure to mild, non-damaging sub-zero temperatures after cold acclimation. There is hardly any information available about the molecular basis of this adaptation. Here, we have used microarrays and a qRT-PCR primer platform covering 1,880 genes encoding transcription factors (TFs) to monitor changes in gene expression in the Arabidopsis accessions Columbia-0, Rschew and Tenela during the first 3 days of sub-zero acclimation at -3 °C. The results indicate that gene expression during sub-zero acclimation follows a tightly controlled time-course. Especially AP2/EREBP and WRKY TFs may be important regulators of sub-zero acclimation, although the CBF signal transduction pathway seems to be less important during sub-zero than during cold acclimation. Globally, we estimate that approximately 5% of all Arabidopsis genes are regulated during sub-zero acclimation. Particularly photosynthesis-related genes are down-regulated and genes belonging to the functional classes of cell wall biosynthesis, hormone metabolism and RNA regulation of transcription are up-regulated. Collectively, these data provide the first global analysis of gene expression during sub-zero acclimation and allow the identification of candidate genes for forward and reverse genetic studies into the molecular mechanisms of sub-zero acclimation.

  13. Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

    PubMed Central

    2011-01-01

    Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. 
PMID:21834981
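
    The abstract ranks enrichment-estimation methods by the Generalized Cross-Validation (GCV) score of the fitted MARS model. A minimal sketch of how such a score can be computed follows. The effective-parameter formula and the default penalty of 3 follow a common MARS convention and are assumptions here, since the abstract does not state which variant was used; the function name gcv_score is hypothetical.

```python
def gcv_score(observed, predicted, n_terms, penalty=3.0):
    """Generalized cross-validation score for a fitted regression model.

    Assumption: effective parameters = n_terms + penalty * (n_terms - 1) / 2,
    a convention used by some MARS implementations (not taken from the abstract).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    # Residual sum of squares between gene expression and model prediction.
    rss = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Complexity charge: each basis term plus a penalty per (approximate) knot.
    effective = n_terms + penalty * (n_terms - 1) / 2.0
    if effective >= n:
        raise ValueError("model is too complex for this sample size")
    # Mean squared error inflated by the complexity term; lower is better.
    return (rss / n) / (1.0 - effective / n) ** 2
```

    With identical residuals, a model with more basis terms receives a larger (worse) score, which is what makes GCV usable for comparing models built from different enrichment estimates.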

  14. Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploid salt marsh species Spartina maritima and Spartina alterniflora (Poaceae)

    PubMed Central

    Ferreira de Carvalho, J; Poulain, J; Da Silva, C; Wincker, P; Michon-Coudouel, S; Dheilly, A; Naquin, D; Boutte, J; Salmon, A; Ainouche, M

    2013-01-01

    Spartina species have a critical ecological role in salt marshes and represent an excellent system to investigate recurrent polyploid speciation. Using the 454 GS-FLX pyrosequencer, we assembled and annotated the first reference transcriptome (from roots and leaves) for two related hexaploid Spartina species that hybridize in Western Europe, the East American invasive Spartina alterniflora and the Euro-African S. maritima. The de novo read assembly generated 38 478 consensus sequences and 99% found an annotation using Poaceae databases, representing a total of 16 753 non-redundant genes. Spartina expressed sequence tags were mapped onto the Sorghum bicolor genome, where they were distributed among the subtelomeric arms of the 10 S. bicolor chromosomes, with high gene density correlation. Normalization of the complementary DNA library improved the number of annotated genes. Ecologically relevant genes were identified among GO biological function categories in salt and heavy metal stress response, C4 photosynthesis and in lignin and cellulose metabolism. Expression of some of these genes had been found to be altered by hybridization and genome duplication in a previous microarray-based study in Spartina. As these species are hexaploid, up to three duplicated homoeologs may be expected per locus. When analyzing sequence polymorphism at four different loci in S. maritima and S. alterniflora, we found up to four haplotypes per locus, suggesting the presence of two expressed homoeologous sequences with one or two allelic variants each. This reference transcriptome will allow analysis of specific Spartina genes of ecological or evolutionary interest, estimation of homoeologous gene expression variation using RNA-seq and further gene expression evolution analyses in natural populations. PMID:23149455

  15. Predicting human genetic interactions from cancer genome evolution.

    PubMed

    Lu, Xiaowen; Megchelenbrink, Wout; Notebaart, Richard A; Huynen, Martijn A

    2015-01-01

    Synthetic Lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e. the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an under-representation of cases where the genes in an SL pair are both under-expressed, and an over-representation of cases where one gene of an SL pair is under-expressed while the other one is over-expressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.
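
    The reported prediction power is an area under the ROC curve (AUC). As an illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen known SL pair receives a higher model score than a randomly chosen non-SL pair. The scores and the function name auc are hypothetical; this is not the authors' model, only the evaluation metric.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney formulation).
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model scores for known SL pairs and for non-SL pairs.
known_sl = [0.9, 0.8, 0.4]
non_sl = [0.5, 0.3]
example_auc = auc(known_sl, non_sl)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.75 means a known SL pair outranks a non-SL pair three times out of four.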

  16. Prediction of Bacillus weihenstephanensis acid resistance: the use of gene expression patterns to select potential biomarkers.

    PubMed

    Desriac, N; Postollec, F; Coroller, L; Sohier, D; Abee, T; den Besten, H M W

    2013-10-01

    Exposure to mild stress conditions can activate stress adaptation mechanisms and provide cross-resistance towards otherwise lethal stresses. In this study, an approach was followed to select molecular biomarkers (quantitative gene expressions) to predict induced acid resistance after exposure to various mild stresses, i.e. exposure to sublethal concentrations of salt, acid and hydrogen peroxide during 5 min to 60 min. Gene expression patterns of unstressed and mildly stressed cells of Bacillus weihenstephanensis were correlated to their acid resistance (3D value) which was estimated after exposure to lethal acid conditions. Among the twenty-nine candidate biomarkers, 12 genes showed expression patterns that were correlated either linearly or non-linearly to acid resistance, while for the 17 other genes the correlation remains to be determined. The selected genes represented two types of biomarkers, (i) four direct biomarker genes (lexA, spxA, narL, bkdR) for which expression patterns upon mild stress treatment were linearly correlated to induced acid resistance; and (ii) nine long-acting biomarker genes (spxA, BcerKBAB4_0325, katA, trxB, codY, lacI, BcerKBAB4_1716, BcerKBAB4_2108, relA) which were transiently up-regulated during mild stress exposure and correlated to increased acid resistance over time. Our results highlight that mild stress induced transcripts can be linearly or non-linearly correlated to induced acid resistance and both approaches can be used to find relevant biomarkers. This quantitative and systematic approach opens avenues to select cellular biomarkers that could be incremented in mathematical models to predict microbial behaviour. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms.

    PubMed

    Milnthorpe, Andrew T; Soloviev, Mikhail

    2011-04-15

    The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, so that libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance and corrected the library cut-off filter. Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.

  18. Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms

    PubMed Central

    2011-01-01

    Background The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. Results We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, so that libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance and corrected the library cut-off filter. Conclusion Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used. PMID:21496233

  19. A Comparison of the Costs and Benefits of Bacterial Gene Expression

    DOE PAGES

    Price, Morgan N.; Wetmore, Kelly M.; Deutschbauer, Adam M.; ...

    2016-10-06

    In order to study how a bacterium allocates its resources, we compared the costs and benefits of most (86%) of the proteins in Escherichia coli K-12 during growth in minimal glucose medium. The cost or investment in each protein was estimated from ribosomal profiling data, and the benefit of each protein was measured by assaying a library of transposon mutants. We found that proteins that are important for fitness are usually highly expressed, and 95% of these proteins are expressed at above 13 parts per million (ppm). Conversely, proteins that do not measurably benefit the host (with a benefit of less than 5% per generation) tend to be weakly expressed, with a median expression of 13 ppm. In aggregate, genes with no detectable benefit account for 31% of protein production, or about 22% if we correct for genetic redundancy. Though some of the apparently unnecessary expression could have subtle benefits in minimal glucose medium, the majority of the burden is due to genes that are important in other conditions. We propose that at least 13% of the cell's protein is on standby in case conditions change.

  20. A Comparison of the Costs and Benefits of Bacterial Gene Expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Price, Morgan N.; Wetmore, Kelly M.; Deutschbauer, Adam M.

    In order to study how a bacterium allocates its resources, we compared the costs and benefits of most (86%) of the proteins in Escherichia coli K-12 during growth in minimal glucose medium. The cost or investment in each protein was estimated from ribosomal profiling data, and the benefit of each protein was measured by assaying a library of transposon mutants. We found that proteins that are important for fitness are usually highly expressed, and 95% of these proteins are expressed at above 13 parts per million (ppm). Conversely, proteins that do not measurably benefit the host (with a benefit of less than 5% per generation) tend to be weakly expressed, with a median expression of 13 ppm. In aggregate, genes with no detectable benefit account for 31% of protein production, or about 22% if we correct for genetic redundancy. Though some of the apparently unnecessary expression could have subtle benefits in minimal glucose medium, the majority of the burden is due to genes that are important in other conditions. We propose that at least 13% of the cell's protein is on standby in case conditions change.

  1. Heritability in the genomics era--concepts and misconceptions.

    PubMed

    Visscher, Peter M; Hill, William G; Wray, Naomi R

    2008-04-01

    Heritability allows a comparison of the relative importance of genes and environment to the variation of traits within and across populations. The concept of heritability and its definition as an estimable, dimensionless population parameter was introduced by Sewall Wright and Ronald Fisher nearly a century ago. Despite continuous misunderstandings and controversies over its use and application, heritability remains key to the response to selection in evolutionary biology and agriculture, and to the prediction of disease risk in medicine. Recent reports of substantial heritability for gene expression and new estimation methods using marker data highlight the relevance of heritability in the genomics era.

  2. High throughput estimation of functional cell activities reveals disease mechanisms and predicts relevant clinical outcomes

    PubMed Central

    Hidalgo, Marta R.; Cubuk, Cankut; Amadoz, Alicia; Salavert, Francisco; Carbonell-Caballero, José; Dopazo, Joaquin

    2017-01-01

    Understanding the aspects of the cell functionality that account for disease or drug action mechanisms is a main challenge for precision medicine. Here we propose a new method that models cell signaling using biological knowledge on signal transduction. The method recodes individual gene expression values (and/or gene mutations) into accurate measurements of changes in the activity of signaling circuits, which ultimately constitute high-throughput estimations of cell functionalities caused by gene activity within the pathway. Moreover, such estimations can be obtained either at cohort-level, in case/control comparisons, or personalized for individual patients. The accuracy of the method is demonstrated in an extensive analysis involving 5640 patients from 12 different cancer types. Circuit activity measurements not only have a high diagnostic value but also can be related to relevant disease outcomes such as survival, and can be used to assess therapeutic interventions. PMID:28042959

  3. RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

    PubMed

    Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

    2017-01-01

    Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
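
    To make the idea of a quadratic normalization on an invariant-gene subset concrete, here is a minimal sketch. It is not the moose2 implementation (which is written in C++ and selects invariant genes by dynamic programming); it assumes the invariant genes are already known, that expression values are on a log scale, and that normalization means fitting a second-degree polynomial mapping a sample's values onto a reference sample. All function names are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 3 distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - tail) / a[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]          # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve3(normal, t)


def apply_quadratic(coeffs, values):
    """Map every gene's value in the sample onto the reference scale."""
    c0, c1, c2 = coeffs
    return [c0 + c1 * v + c2 * v * v for v in values]
```

    In use, fit_quadratic would be called with the invariant genes' values in the sample (xs) and in the reference (ys), and apply_quadratic would then be applied to all genes of that sample. A quadratic term lets the correction bend, which is one way to absorb the non-linear artefacts the abstract mentions.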

  4. Activation of Wnt/β-Catenin Pathway in Monocytes Derived from Chronic Kidney Disease Patients

    PubMed Central

    Al-Chaqmaqchi, Heevy Abdulkareem Musa; Moshfegh, Ali; Dadfar, Elham; Paulsson, Josefin; Hassan, Moustapha; Jacobson, Stefan H.; Lundahl, Joachim

    2013-01-01

    Patients with chronic kidney disease (CKD) have significantly increased morbidity and mortality resulting from infections and cardiovascular diseases. Since monocytes play an essential role in host immunity, this study was directed to explore the gene expression profile in order to identify differences in activated pathways in monocytes relevant to the pathophysiology of atherosclerosis and increased susceptibility to infections. Monocytes from CKD patients (stages 4 and 5, estimated GFR <20 ml/min/1.73 m2) and healthy donors were collected from peripheral blood. Microarray gene expression profile was performed and data were interpreted by GeneSpring software and by PANTHER tool. Western blot was done to validate the pathway members. The results demonstrated that 600 and 272 genes were differentially up- and down regulated respectively in the patient group. Pathways involved in the inflammatory response were highly expressed and the Wnt/β-catenin signaling pathway was the most significant pathway expressed in the patient group. Since this pathway has been attributed to a variety of inflammatory manifestations, the current findings may contribute to dysfunctional monocytes in CKD patients. Strategies to interfere with this pathway may improve host immunity and prevent cardiovascular complications in CKD patients. PMID:23935909

  5. DNA polymerase β variant Ile260Met generates global gene expression changes related to cellular transformation

    PubMed Central

    Sweasy, Joann B.

    2012-01-01

    Maintenance of genomic stability is essential for cellular survival. The base excision repair (BER) pathway is critical for resolution of abasic sites and damaged bases, estimated to occur 20,000 times in cells daily. DNA polymerase β (Pol β) participates in BER by filling DNA gaps that result from excision of damaged bases. Approximately 30% of human tumours express Pol β variants, many of which have altered fidelity and activity in vitro and when expressed, induce cellular transformation. The prostate tumour variant Ile260Met transforms cells and is a sequence-context-dependent mutator. To test the hypothesis that mutations induced in vivo by Ile260Met lead to cellular transformation, we characterized the genome-wide expression profile of a clone expressing Ile260Met as compared with its non-induced counterpart. Using a 1.5-fold minimum cut-off with a false discovery rate (FDR) of <0.05, 912 genes exhibit altered expression. Microarray results were confirmed by quantitative real-time polymerase chain reaction (qRT-PCR) and revealed unique expression profiles in other clones. Gene Ontology (GO) clusters were analyzed using Ingenuity Pathways Analysis to identify altered gene networks and associated nodes. We determined three nodes of interest that exhibited dysfunctional regulation of downstream gene products without themselves having altered expression. One node, peroxisome proliferator-activated protein γ (PPARG), was sequenced and found to contain a coding region mutation in PPARG2 only in transformed cells. Further analysis suggests that this mutation leads to dominant negative activity of PPARG2. PPARG is a transcription factor implicated to have tumour suppressor function. This suggests that the PPARG2 mutant may have played a role in driving cellular transformation. We conclude that PPARG induces cellular transformation by a mutational mechanism. PMID:22914675
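
    The gene list in this abstract comes from combining a minimum fold change of 1.5 with a false discovery rate below 0.05. The sketch below shows one way such a filter can be applied. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the function names and input values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(fold_changes, pvalues, min_fold=1.5, max_fdr=0.05):
    """Indices of genes changed at least min_fold (up or down) with q < max_fdr.

    fold_changes are ratios (induced / non-induced), not log values.
    """
    qvalues = bh_adjust(pvalues)
    selected = []
    for i, (fc, q) in enumerate(zip(fold_changes, qvalues)):
        changed = fc >= min_fold or fc <= 1.0 / min_fold
        if changed and q < max_fdr:
            selected.append(i)
    return selected


# Hypothetical ratios and raw p-values for four genes.
example = select_genes([2.0, 1.2, 0.5, 1.6], [0.01, 0.04, 0.03, 0.005])
```

    Requiring both criteria guards against two different failure modes: statistically significant but tiny changes, and large changes that are not reproducible.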

  6. Congruence of Additive and Non-Additive Effects on Gene Expression Estimated from Pedigree and SNP Data

    PubMed Central

    Powell, Joseph E.; Henders, Anjali K.; McRae, Allan F.; Kim, Jinhee; Hemani, Gibran; Martin, Nicholas G.; Dermitzakis, Emmanouil T.; Gibson, Greg

    2013-01-01

    There is increasing evidence that heritable variation in gene expression underlies genetic variation in susceptibility to disease. Therefore, a comprehensive understanding of the similarity between relatives for transcript variation is warranted—in particular, dissection of phenotypic variation into additive and non-additive genetic factors and shared environmental effects. We conducted a gene expression study in blood samples of 862 individuals from 312 nuclear families containing MZ or DZ twin pairs using both pedigree and genotype information. From a pedigree analysis we show that the vast majority of genetic variation across 17,994 probes is additive, although non-additive genetic variation is identified for 960 transcripts. For 180 of the 960 transcripts with non-additive genetic variation, we identify expression quantitative trait loci (eQTL) with dominance effects in a sample of 339 unrelated individuals and replicate 31% of these associations in an independent sample of 139 unrelated individuals. Over-dominance was detected and replicated for a trans association between rs12313805 and ETV6, located 4MB apart on chromosome 12. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood. Consistent with the genetic architecture of common diseases, gene expression is predominantly additive, but a minority of transcripts display non-additive effects. PMID:23696747

  7. Congruence of additive and non-additive effects on gene expression estimated from pedigree and SNP data.

    PubMed

    Powell, Joseph E; Henders, Anjali K; McRae, Allan F; Kim, Jinhee; Hemani, Gibran; Martin, Nicholas G; Dermitzakis, Emmanouil T; Gibson, Greg; Montgomery, Grant W; Visscher, Peter M

    2013-05-01

    There is increasing evidence that heritable variation in gene expression underlies genetic variation in susceptibility to disease. Therefore, a comprehensive understanding of the similarity between relatives for transcript variation is warranted--in particular, dissection of phenotypic variation into additive and non-additive genetic factors and shared environmental effects. We conducted a gene expression study in blood samples of 862 individuals from 312 nuclear families containing MZ or DZ twin pairs using both pedigree and genotype information. From a pedigree analysis we show that the vast majority of genetic variation across 17,994 probes is additive, although non-additive genetic variation is identified for 960 transcripts. For 180 of the 960 transcripts with non-additive genetic variation, we identify expression quantitative trait loci (eQTL) with dominance effects in a sample of 339 unrelated individuals and replicate 31% of these associations in an independent sample of 139 unrelated individuals. Over-dominance was detected and replicated for a trans association between rs12313805 and ETV6, located 4MB apart on chromosome 12. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood. Consistent with the genetic architecture of common diseases, gene expression is predominantly additive, but a minority of transcripts display non-additive effects.
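
    The study partitions expression variance into additive, non-additive and shared-environment components using pedigree and SNP data. As a much simpler illustration of the same partitioning idea, the sketch below uses the classical twin-correlation (Falconer-style) estimators, which are swapped in here for clarity and are not the variance-component method the authors used. Inputs are the trait correlations in MZ and DZ twin pairs; the function names are hypothetical.

```python
def ace_estimates(r_mz, r_dz):
    """Additive (a2), common-environment (c2) and unique-environment (e2) shares.

    Derived from r_mz = a2 + c2 and r_dz = 0.5 * a2 + c2.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


def ade_estimates(r_mz, r_dz):
    """Additive (a2), dominance (d2) and unique-environment (e2) shares.

    Derived from r_mz = a2 + d2 and r_dz = 0.5 * a2 + 0.25 * d2.
    """
    a2 = 4.0 * r_dz - r_mz
    d2 = 2.0 * r_mz - 4.0 * r_dz
    e2 = 1.0 - r_mz
    return a2, d2, e2


def suggests_dominance(r_mz, r_dz):
    """An MZ correlation above twice the DZ correlation hints at non-additivity."""
    return r_mz > 2.0 * r_dz
```

    For a transcript with r_mz = 0.6 and r_dz = 0.2, the MZ correlation is three times the DZ correlation, so the ADE decomposition (a2 = 0.2, d2 = 0.4) is the more natural reading than the ACE one, which would give a negative c2.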

  8. Gene expression programming approach for the estimation of moisture ratio in herbal plants drying with vacuum heat pump dryer

    NASA Astrophysics Data System (ADS)

    Dikmen, Erkan; Ayaz, Mahir; Gül, Doğan; Şahin, Arzu Şencan

    2017-07-01

    The determination of drying behavior of herbal plants is a complex process. In this study, a gene expression programming (GEP) model was used to determine the drying behavior of herbal plants such as fresh sweet basil, parsley and dill leaves. Time and drying temperature are the input parameters for the estimation of the moisture ratio of herbal plants. The results of the GEP model are compared with experimental drying data. Statistical values such as mean absolute percentage error, root-mean-squared error and R-square are used to quantify the difference between the values predicted by the GEP model and the values actually observed in the experimental study. It was found that the results of the GEP model and the experimental study are in moderately good agreement. The results have shown that the GEP model can be considered an efficient modelling technique for the prediction of the moisture ratio of herbal plants.
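
    The three agreement statistics named in the abstract can be computed as in the following sketch. The moisture-ratio values are made up for illustration and the function names are hypothetical; only the standard definitions of the metrics are assumed.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-squared error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observed and model-predicted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.0, 5.0, 6.0, 7.0]
metrics = (mape(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

    MAPE is scale-free but becomes unstable as observed values approach zero, which matters for moisture ratios near the end of drying; RMSE and R-square do not have that problem.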

  9. A study of the role of the FOXP2 and CNTNAP2 genes in persistent developmental stuttering.

    PubMed

    Han, Tae-Un; Park, John; Domingues, Carlos F; Moretti-Ferreira, Danilo; Paris, Emily; Sainz, Eduardo; Gutierrez, Joanne; Drayna, Dennis

    2014-09-01

    A number of speech disorders including stuttering have been shown to have important genetic contributions, as indicated by high heritability estimates from twin and other studies. We studied the potential contribution to stuttering from variants in the FOXP2 gene, which have previously been associated with developmental verbal dyspraxia, and from variants in the CNTNAP2 gene, which have been associated with specific language impairment (SLI). DNA sequence analysis of these two genes in a group of 602 unrelated cases, all with familial persistent developmental stuttering, revealed no excess of potentially deleterious coding sequence variants in the cases compared to a matched group of 487 well characterized neurologically normal controls. This was compared to the distribution of variants in the GNPTAB, GNPTG, and NAGPA genes which have previously been associated with persistent stuttering. Using an expanded subject data set, we again found that NAGPA showed significantly different mutation frequencies in North Americans of European descent (p=0.0091) and a significant difference existed in the mutation frequency of GNPTAB in Brazilians (p=0.00050). No significant differences in mutation frequency in the FOXP2 and CNTNAP2 genes were observed between cases and controls. To examine the pattern of expression of these five genes in the human brain, real time quantitative reverse transcription PCR was performed on RNA purified from 27 different human brain regions. The expression patterns of FOXP2 and CNTNAP2 were generally different from those of GNPTAB, GNPTG and NAGPA in terms of relatively lower expression in the cerebellum. This study provides an improved estimate of the contribution of mutations in GNPTAB, GNPTG and NAGPA to persistent stuttering, and suggests that variants in FOXP2 and CNTNAP2 are not involved in the genesis of familial persistent stuttering. This, together with the different brain expression patterns of GNPTAB, GNPTG, and NAGPA compared to that of FOXP2 and CNTNAP2, suggests that the genetic neuropathological origins of stuttering differ from those of verbal dyspraxia and SLI. Published by Elsevier Inc.

  10. CRISPR/Cas9-mediated gene knockout screens and target identification via whole-genome sequencing uncover host genes required for picornavirus infection.

    PubMed

    Kim, Heon Seok; Lee, Kyungjin; Bae, Sangsu; Park, Jeongbin; Lee, Chong-Kyo; Kim, Meehyein; Kim, Eunji; Kim, Minju; Kim, Seokjoong; Kim, Chonsaeng; Kim, Jin-Soo

    2017-06-23

    Several groups have used genome-wide libraries of lentiviruses encoding small guide RNAs (sgRNAs) for genetic screens. In most cases, sgRNA expression cassettes are integrated into cells by using lentiviruses, and target genes are statistically estimated by the readout of sgRNA sequences after targeted sequencing. We present a new virus-free method for human gene knockout screens using a genome-wide library of CRISPR/Cas9 sgRNAs based on plasmids and target gene identification via whole-genome sequencing (WGS) confirmation of authentic mutations rather than statistical estimation through targeted amplicon sequencing. We used 30,840 pairs of individually synthesized oligonucleotides to construct the genome-scale sgRNA library, collectively targeting 10,280 human genes (i.e. three sgRNAs per gene). These plasmid libraries were co-transfected with a Cas9-expression plasmid into human cells, which were then treated with cytotoxic drugs or viruses. Only cells lacking key factors essential for cytotoxic drug metabolism or viral infection were able to survive. Genomic DNA isolated from cells that survived these challenges was subjected to WGS to directly identify CRISPR/Cas9-mediated causal mutations essential for cell survival. With this approach, we were able to identify known and novel genes essential for viral infection in human cells. We propose that genome-wide sgRNA screens based on plasmids coupled with WGS are powerful tools for forward genetics studies and drug target discovery. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  11. miR-24-2 controls H2AFX expression regardless of gene copy number alteration and induces apoptosis by targeting antiapoptotic gene BCL-2: a potential for therapeutic intervention.

    PubMed

    Srivastava, Niloo; Manvati, Siddharth; Srivastava, Archita; Pal, Ranjana; Kalaiarasan, Ponnusamy; Chattopadhyay, Shilpi; Gochhait, Sailesh; Dua, Raina; Bamezai, Rameshwar N K

    2011-04-04

    New levels of gene regulation with microRNA (miR) and gene copy number alterations (CNAs) have been identified as playing a role in various cancers. We have previously reported that sporadic breast cancer tissues exhibit significant alteration in H2AX gene copy number. However, how CNA affects gene expression and what is the role of miR, miR-24-2, known to regulate H2AX expression, in the background of the change in copy number, are not known. Further, many miRs, including miR-24-2, are implicated as playing a role in cell proliferation and apoptosis, but their specific target genes and the pathways contributing to them remain unexplored. Changes in gene copy number and mRNA/miR expression were estimated using real-time polymerase chain reaction assays in two mammalian cell lines, MCF-7 and HeLa, and in a set of sporadic breast cancer tissues. In silico analysis was performed to find the putative target for miR-24-2. MCF-7 cells were transfected with precursor miR-24-2 oligonucleotides, and the gene expression levels of BRCA1, BRCA2, ATM, MDM2, TP53, CHEK2, CYT-C, BCL-2, H2AFX and P21 were examined using TaqMan gene expression assays. Apoptosis was measured by flow cytometric detection using annexin V dye. A luciferase assay was performed to confirm BCL-2 as a valid cellular target of miR-24-2. It was observed that H2AX gene expression was negatively correlated with miR-24-2 expression and not in accordance with the gene copy number status, both in cell lines and in sporadic breast tumor tissues. Further, the cells overexpressing miR-24-2 were observed to be hypersensitive to DNA damaging drugs, undergoing apoptotic cell death, suggesting the potentiating effect of mir-24-2-mediated apoptotic induction in human cancer cell lines treated with anticancer drugs. BCL-2 was identified as a novel cellular target of miR-24-2. mir-24-2 is capable of inducing apoptosis by modulating different apoptotic pathways and targeting BCL-2, an antiapoptotic gene. 
The study suggests that miR-24-2 is more effective in controlling H2AX gene expression, regardless of the change in gene copy number. Further, the study indicates that combination therapy with miR-24-2 along with an anticancer drug such as cisplatin could provide a new avenue in cancer therapy for patients with tumors otherwise resistant to drugs.

  12. Expression of acid-sensing ion channels and selection of reference genes in mouse and naked mole rat.

    PubMed

    Schuhmacher, Laura-Nadine; Smith, Ewan St John

    2016-12-13

    Acid-sensing ion channels (ASICs) are a family of ion channels comprised of six subunits encoded by four genes and they are expressed throughout the peripheral and central nervous systems. ASICs have been implicated in a wide range of physiological and pathophysiological processes: pain, breathing, synaptic plasticity and excitotoxicity. Unlike mice and humans, naked mole-rats do not perceive acid as a noxious stimulus, even though their sensory neurons express functional ASICs, likely an adaptation to living in a hypercapnic subterranean environment. Previous studies of ASIC expression in the mammalian nervous system have often not examined all subunits, or have failed to adequately quantify expression between tissues; to date there has been no attempt to determine ASIC expression in the central nervous system of the naked mole-rat. Here we perform a geNorm study to identify reliable housekeeping genes in both mouse and naked mole-rat and then use quantitative real-time PCR to estimate the relative amounts of ASIC transcripts in different tissues of both species. We identify RPL13A (ribosomal protein L13A) and CANX (calnexin), and β-ACTIN and EIF4A (eukaryotic initiation factor 4a) as being the most stably expressed housekeeping genes in mouse and naked mole-rat, respectively. In both species, ASIC3 was most highly expressed in dorsal root ganglia (DRG), and ASIC1a, ASIC2b and ASIC3 were more highly expressed across all brain regions compared to the other subunits. We also show that ASIC4, a proton-insensitive subunit of relatively unknown function, was highly expressed in all mouse tissues apart from DRG and hippocampus, but was by contrast the lowest expressed ASIC in all naked mole-rat tissues.
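
The normalization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes perfect amplification efficiency (a doubling per cycle), uses the geometric mean of the reference-gene quantities as the normalization factor, and all Ct values and function names below are hypothetical.

```python
import math

def normalization_factor(ref_cts):
    """Geometric mean of reference-gene quantities, each taken as 2**(-Ct).

    Assumes 100% PCR efficiency; real assays may need efficiency correction.
    """
    quantities = [2.0 ** (-ct) for ct in ref_cts]
    return math.exp(sum(math.log(q) for q in quantities) / len(quantities))

def relative_expression(target_ct, ref_cts):
    """Target quantity divided by the reference normalization factor."""
    return 2.0 ** (-target_ct) / normalization_factor(ref_cts)

# Hypothetical Ct values: one target transcript and two reference genes.
example = relative_expression(24.0, [20.0, 22.0])
print(round(example, 4))
```

With two reference genes, the geometric mean of the quantities is equivalent to averaging their Ct values, so the target here sits three cycles above the averaged reference.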

  13. Transcriptome analysis of Schistosoma mansoni larval development using serial analysis of gene expression (SAGE).

    PubMed

    Taft, A S; Vermeire, J J; Bernier, J; Birkeland, S R; Cipriano, M J; Papa, A R; McArthur, A G; Yoshino, T P

    2009-04-01

    Infection of the snail, Biomphalaria glabrata, by the free-swimming miracidial stage of the human blood fluke, Schistosoma mansoni, and its subsequent development to the parasitic sporocyst stage is critical to establishment of viable infections and continued human transmission. We performed a genome-wide expression analysis of the S. mansoni miracidia and developing sporocyst using Long Serial Analysis of Gene Expression (LongSAGE). Five cDNA libraries were constructed from miracidia and in vitro cultured 6- and 20-day-old sporocysts maintained in sporocyst medium (SM) or in SM conditioned by previous cultivation with cells of the B. glabrata embryonic (Bge) cell line. We generated 21 440 SAGE tags and mapped 13 381 to the S. mansoni gene predictions (v4.0e) either by estimating theoretical 3' UTR lengths or using existing 3' EST sequence data. Overall, 432 transcripts were found to be differentially expressed amongst all 5 libraries. In total, 172 tags were differentially expressed between miracidia and 6-day conditioned sporocysts and 152 were differentially expressed between miracidia and 6-day unconditioned sporocysts. In addition, 53 and 45 tags, respectively, were differentially expressed in 6-day and 20-day cultured sporocysts, due to the effects of exposure to Bge cell-conditioned medium.

  14. Lateralized Feeding Behavior is Associated with Asymmetrical Neuroanatomy and Lateralized Gene Expressions in the Brain in Scale-Eating Cichlid Fish

    PubMed Central

    Lee, Hyuk Je; Schneider, Ralf F; Manousaki, Tereza; Kang, Ji Hyoun; Lein, Etienne; Franchini, Paolo

    2017-01-01

    Abstract Lateralized behavior (“handedness”) is unusual, but consistently found across diverse animal lineages, including humans. It is thought to reflect brain anatomical and/or functional asymmetries, but its neuro-molecular mechanisms remain largely unknown. Lake Tanganyika scale-eating cichlid fish, Perissodus microlepis show pronounced asymmetry in their jaw morphology as well as handedness in feeding behavior—biting scales preferentially only from one or the other side of their victims. This makes them an ideal model in which to investigate potential laterality in neuroanatomy and transcription in the brain in relation to behavioral handedness. After determining behavioral handedness in P. microlepis (preferred attack side), we estimated the volume of the hemispheres of brain regions and captured their gene expression profiles. Our analyses revealed that the degree of behavioral handedness is mirrored at the level of neuroanatomical asymmetry, particularly in the tectum opticum. Transcriptome analyses showed that different brain regions (tectum opticum, telencephalon, hypothalamus, and cerebellum) display distinct expression patterns, potentially reflecting their developmental interrelationships. For numerous genes in each brain region, their extent of expression differences between hemispheres was found to be correlated with the degree of behavioral lateralization. Interestingly, the tectum opticum and telencephalon showed divergent biases on the direction of up- or down-regulation of the laterality candidate genes (e.g., grm2) in the hemispheres, highlighting the connection of handedness with gene expression profiles and the different roles of these brain regions. Hence, handedness in predation behavior may be caused by asymmetric size of brain hemispheres and also by lateralized gene expressions in the brain. PMID:29069363

  15. Lateralized Feeding Behavior is Associated with Asymmetrical Neuroanatomy and Lateralized Gene Expressions in the Brain in Scale-Eating Cichlid Fish.

    PubMed

    Lee, Hyuk Je; Schneider, Ralf F; Manousaki, Tereza; Kang, Ji Hyoun; Lein, Etienne; Franchini, Paolo; Meyer, Axel

    2017-11-01

    Lateralized behavior ("handedness") is unusual, but consistently found across diverse animal lineages, including humans. It is thought to reflect brain anatomical and/or functional asymmetries, but its neuro-molecular mechanisms remain largely unknown. Lake Tanganyika scale-eating cichlid fish, Perissodus microlepis show pronounced asymmetry in their jaw morphology as well as handedness in feeding behavior, biting scales preferentially only from one or the other side of their victims. This makes them an ideal model in which to investigate potential laterality in neuroanatomy and transcription in the brain in relation to behavioral handedness. After determining behavioral handedness in P. microlepis (preferred attack side), we estimated the volume of the hemispheres of brain regions and captured their gene expression profiles. Our analyses revealed that the degree of behavioral handedness is mirrored at the level of neuroanatomical asymmetry, particularly in the tectum opticum. Transcriptome analyses showed that different brain regions (tectum opticum, telencephalon, hypothalamus, and cerebellum) display distinct expression patterns, potentially reflecting their developmental interrelationships. For numerous genes in each brain region, their extent of expression differences between hemispheres was found to be correlated with the degree of behavioral lateralization. Interestingly, the tectum opticum and telencephalon showed divergent biases on the direction of up- or down-regulation of the laterality candidate genes (e.g., grm2) in the hemispheres, highlighting the connection of handedness with gene expression profiles and the different roles of these brain regions. Hence, handedness in predation behavior may be caused by asymmetric size of brain hemispheres and also by lateralized gene expressions in the brain. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Expression of SLCO transport genes in castration resistant prostate cancer and impact of genetic variation in SLCO1B3 and SLCO2B1 on prostate cancer outcomes

    PubMed Central

    Wright, Jonathan L; Kwon, Erika M; Ostrander, Elaine A; Montgomery, R Bruce; Lin, Daniel W; Vessella, Robert; Stanford, Janet L; Mostaghel, Elahe A

    2011-01-01

    Background Metastases from men with castration resistant prostate cancer (CRPC) harbor increased tumoral androgens vs. untreated prostate cancers (PCa). This may reflect steroid uptake by OATP/SLCO transporters. We evaluated SLCO gene expression in CRPC metastases and determined whether PCa outcomes are associated with single nucleotide polymorphisms (SNPs) in SLCO2B1 and SLCO1B3, transporters previously demonstrated to mediate androgen uptake. Methods Transcripts encoding 11 SLCO genes were analyzed in untreated PCa, and in metastatic CRPC tumors obtained by rapid autopsy. SNPs in SLCO2B1 and SLCO1B3 were genotyped in a population-based cohort of 1,309 Caucasian PCa patients. Median survival follow-up was 7.0 years (0.77–16.4). The risk of PCa recurrence/progression and PCa-specific mortality (PCSM) was estimated with Cox proportional hazards analysis. Results Six SLCO genes were highly expressed in CRPC metastases vs. untreated PCa, including SLCO1B3 (3.6 fold, p=0.0517) and SLCO2B1 (5.5 fold, p=0.0034). Carriers of the variant alleles SLCO2B1 SNP rs12422149 (HR 1.99, 95% CI 1.11 – 3.55) or SLCO1B3 SNP rs4149117 (HR 1.76, 95% CI 1.00 – 3.08) had an increased risk of PCSM. Conclusions CRPC metastases demonstrate increased expression of SLCO genes vs. primary PCa. Genetic variants of SLCO1B3 and SLCO2B1 are associated with PCSM. Expression and genetic variation of SLCO genes which alter androgen uptake may be important in PCa outcomes. Impact OATP/SLCO genes may be potential biomarkers for assessing risk of prostate cancer-specific mortality. Expression and genetic variation in these genes may allow stratification of patients to more aggressive hormonal therapy or earlier incorporation of non-hormonal based treatment strategies. PMID:21266523

  17. The alternative oxidase family of Vitis vinifera reveals an attractive model to study the importance of genomic design.

    PubMed

    Costa, José Hélio; de Melo, Dirce Fernandes; Gouveia, Zélia; Cardoso, Hélia Guerra; Peixe, Augusto; Arnholdt-Schmitt, Birgit

    2009-12-01

    'Genomic design' refers to the structural organization of gene sequences. Recently, the role of intron sequences in gene regulation has become better understood. Further, introns possess high rates of polymorphism that are considered as the major source for speciation. In molecular breeding, the length of gene-specific introns is recognized as a tool to discriminate genotypes with diverse traits of agronomic interest. 'Economy selection' and 'time-economy selection' have been proposed as models for explaining why highly expressed genes typically contain small introns. However, in contrast to these theories, plant-specific selection reveals that highly expressed genes contain introns that are large. In the presented research, 'wet' Aox gene identification from grapevine is advanced by a bioinformatics approach to study the species-specific organization of Aox gene structures in relation to available expressed sequence tag (EST) data. Two Aox1 and one Aox2 gene sequences have been identified in Vitis vinifera using grapevine cultivars from Portugal and Germany. Searching the complete genome sequence data of two grapevine cultivars confirmed that V. vinifera alternative oxidase (Aox) is encoded by a small multigene family composed of Aox1a, Aox1b and Aox2. An analysis of EST distribution revealed high expression of the VvAox2 gene. A relationship between the atypical long primary transcript of VvAox2 (in comparison to other plant Aox genes) and its expression level is suggested. V. vinifera Aox genes contain four exons interrupted by three introns except for Aox1a which contains an additional intron in the 3'-UTR. The lengths of primary Aox transcripts were estimated for each gene in two V. vinifera varieties: PN40024 and Pinot Noir. In both varieties, Aox1a and Aox1b contained small introns that corresponded to primary transcript lengths ranging from 1501 to 1810 bp. 
The Aox2 of PN40024 (12 329 bp) was longer than that from Pinot Noir (7279 bp) because of selection against a transposable-element insertion that is 5028 bp in size. An EST database basic local alignment search tool (BLAST) search of GenBank revealed the following ESTs percentages for each gene: Aox1a (26.2%), Aox1b (11.9%) and Aox2 (61.9%). Aox1a was expressed in fruits and roots, Aox1b expression was confined to flowers and Aox2 was ubiquitously expressed. These data for V. vinifera show that atypically long Aox intron lengths are related to high levels of gene expression. Furthermore, it is shown for the first time that two grapevine cultivars can be distinguished by Aox intron length polymorphism.

  18. Developmental Transcriptomic Features of the Carcinogenic Liver Fluke, Clonorchis sinensis

    PubMed Central

    Cho, Pyo Yun; Kim, Tae Im; Cho, Shin-Hyeong; Choi, Sang-Haeng; Park, Hong-Seog; Kim, Tong-Soo; Hong, Sung-Jong

    2011-01-01

    Clonorchis sinensis is the causative agent of the life-threatening disease endemic to China, Korea, and Vietnam. It is estimated that about 15 million people are infected with this fluke. C. sinensis provokes inflammation, epithelial hyperplasia, and periductal fibrosis in bile ducts, and may cause cholangiocarcinoma in chronically infected individuals. Accumulation of a large amount of biological information about the adult stage of this liver fluke in recent years has advanced our understanding of the pathological interplay between this parasite and its hosts. However, no developmental gene expression profiles of C. sinensis have been published. In this study, we generated gene expression profiles of three developmental stages of C. sinensis by analyzing expressed sequence tags (ESTs). Complementary DNA libraries were constructed from the adult, metacercaria, and egg developmental stages of C. sinensis. A total of 52,745 ESTs were generated and assembled into 12,830 C. sinensis assembled EST sequences, and then these assemblies were further categorized into groups according to biological functions and developmental stages. Most of the genes that were differentially expressed in the different stages were consistent with the biological and physical features of the particular developmental stage; high energy metabolism, motility and reproduction genes were differentially expressed in adults, minimal metabolism and final host adaptation genes were differentially expressed in metacercariae, and embryonic genes were differentially expressed in eggs. The higher expression of glucose transporters, proteases, and antioxidant enzymes in the adults accounts for active uptake of nutrients and defense against host immune attacks. The types of ion channels present in C. sinensis are consistent with its parasitic nature and phylogenetic placement in the tree of life. 
We anticipate that the transcriptomic information on essential regulators of development, bile chemotaxis, and physico-metabolic pathways in C. sinensis that is presented in this study will guide further studies to identify novel drug targets and diagnostic antigens. PMID:21738807

  19. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
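
The B-spline basis expansion mentioned in this abstract can be sketched as follows. This is only an illustration of how a nonlinear relationship becomes a linear problem in basis coefficients. It does not implement the authors' Bayesian group lasso or spike and slab priors; an ordinary least-squares fit is swapped in as a stand-in, and the knot vector, degree, and toy response are assumptions.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at 1-D points x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 functions are indicators of the half-open knot intervals.
    basis = np.array(
        [(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float
    ).T
    for k in range(1, degree + 1):
        n_funcs = len(t) - k - 1
        new = np.zeros((len(x), n_funcs))
        for i in range(n_funcs):
            left_den = t[i + k] - t[i]
            right_den = t[i + k + 1] - t[i + 1]
            # Terms with a zero-width knot span are dropped by convention.
            if left_den > 0:
                new[:, i] += (x - t[i]) / left_den * basis[:, i]
            if right_den > 0:
                new[:, i] += (t[i + k + 1] - x) / right_den * basis[:, i + 1]
        basis = new
    return basis

# Hypothetical setup: cubic splines on [0, 1) with one interior knot.
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
x = np.linspace(0.0, 0.99, 50)
B = bspline_basis(x, knots, degree=3)

# Stand-in fit (plain least squares), NOT the Bayesian method of the abstract.
y = np.sin(2 * np.pi * x)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
```

Each regulator gene would contribute one such block of basis columns, which is what makes a group-wise penalty or prior a natural way to select whole genes rather than single coefficients.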

  20. MANTIS: a phylogenetic framework for multi-species genome comparisons.

    PubMed

    Tzika, Athanasia C; Helaers, Raphaël; Van de Peer, Yves; Milinkovitch, Michel C

    2008-01-15

    Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow users to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.

  1. Transcriptional changes induced by candidate malaria vaccines and correlation with protection against malaria in a human challenge model

    PubMed Central

    Dunachie, Susanna; Berthoud, Tamara; Hill, Adrian V.S.; Fletcher, Helen A.

    2015-01-01

    Introduction The complexity of immunity to malaria is well known, and clear correlates of protection against malaria have not been established. A better understanding of immune markers induced by candidate malaria vaccines would greatly enhance vaccine development, immunogenicity monitoring and estimation of vaccine efficacy in the field. We have previously reported complete or partial efficacy against experimental sporozoite challenge by several vaccine regimens in healthy malaria-naïve subjects in Oxford. These include a prime-boost regimen with RTS,S/AS02A and modified vaccinia virus Ankara (MVA) expressing the CSP antigen, and a DNA-prime, MVA-boost regimen expressing the ME TRAP antigens. Using samples from these trials we performed transcriptional profiling, allowing a global assessment of responses to vaccination. Methods We used Human RefSeq8 Bead Chips from Illumina to examine gene expression using PBMC (peripheral blood mononuclear cells) from 16 human volunteers. To focus on antigen-specific changes, comparisons were made between PBMC stimulated with CSP or TRAP peptide pools and unstimulated PBMC post vaccination. We then correlated gene expression with protection against malaria in a human Plasmodium falciparum malaria challenge model. Results Differentially expressed genes induced by both vaccine regimens were predominantly in the IFN-γ pathway. Gene set enrichment analysis revealed antigen-specific effects on genes associated with IFN induction and proteasome modules after vaccination. Genes associated with IFN induction and antigen presentation modules were positively enriched in subjects with complete protection from malaria challenge, while genes associated with haemopoietic stem cells, regulatory monocytes and the myeloid lineage modules were negatively enriched in protected subjects. Conclusions These results represent novel insights into the immune repertoires involved in malaria vaccination. PMID:26256523

  2. Transcriptional changes induced by candidate malaria vaccines and correlation with protection against malaria in a human challenge model.

    PubMed

    Dunachie, Susanna; Berthoud, Tamara; Hill, Adrian V S; Fletcher, Helen A

    2015-09-29

    The complexity of immunity to malaria is well known, and clear correlates of protection against malaria have not been established. A better understanding of immune markers induced by candidate malaria vaccines would greatly enhance vaccine development, immunogenicity monitoring and estimation of vaccine efficacy in the field. We have previously reported complete or partial efficacy against experimental sporozoite challenge by several vaccine regimens in healthy malaria-naïve subjects in Oxford. These include a prime-boost regimen with RTS,S/AS02A and modified vaccinia virus Ankara (MVA) expressing the CSP antigen, and a DNA-prime, MVA-boost regimen expressing the ME TRAP antigens. Using samples from these trials we performed transcriptional profiling, allowing a global assessment of responses to vaccination. We used Human RefSeq8 Bead Chips from Illumina to examine gene expression using PBMC (peripheral blood mononuclear cells) from 16 human volunteers. To focus on antigen-specific changes, comparisons were made between PBMC stimulated with CSP or TRAP peptide pools and unstimulated PBMC post vaccination. We then correlated gene expression with protection against malaria in a human Plasmodium falciparum malaria challenge model. Differentially expressed genes induced by both vaccine regimens were predominantly in the IFN-γ pathway. Gene set enrichment analysis revealed antigen-specific effects on genes associated with IFN induction and proteasome modules after vaccination. Genes associated with IFN induction and antigen presentation modules were positively enriched in subjects with complete protection from malaria challenge, while genes associated with haemopoietic stem cells, regulatory monocytes and the myeloid lineage modules were negatively enriched in protected subjects. These results represent novel insights into the immune repertoires involved in malaria vaccination. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  3. A tetO Toolkit To Alter Expression of Genes in Saccharomyces cerevisiae.

    PubMed

    Cuperus, Josh T; Lo, Russell S; Shumaker, Lucia; Proctor, Julia; Fields, Stanley

    2015-07-17

    Strategies to optimize a metabolic pathway often involve building a large collection of strains, each containing different versions of sequences that regulate the expression of pathway genes. Here, we develop reagents and methods to carry out this process at high efficiency in the yeast Saccharomyces cerevisiae. We identify variants of the Escherichia coli tet operator (tetO) sequence that bind a TetR-VP16 activator with differential affinity and therefore result in different TetR-VP16 activator-driven expression. By recombining these variants upstream of the genes of a pathway, we generate unique combinations of expression levels. Here, we built a tetO toolkit, which includes the I-OnuI homing endonuclease to create double-strand breaks, which increases homologous recombination by 10^5; a plasmid carrying six variant tetO sequences flanked by I-OnuI sites, uncoupling transformation and recombination steps; an S. cerevisiae-optimized TetR-VP16 activator; and a vector to integrate constructs into the yeast genome. We introduce into the S. cerevisiae genome the three crt genes from Erwinia herbicola required for yeast to synthesize lycopene and carry out the recombination process to produce a population of cells with permutations of tetO variants regulating the three genes. We identify 0.7% of this population as making detectable lycopene, of which the vast majority have undergone recombination at all three crt genes. We estimate a rate of ∼20% recombination per targeted site, much higher than that obtained in other studies. Application of this toolkit to medically or industrially important end products could reduce the time and labor required to optimize the expression of a set of metabolic genes.
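
The combinatorics behind the figures in this abstract can be checked with a short back-of-the-envelope sketch. It assumes, purely for illustration, that the three sites recombine independently at the stated per-site rate and that each recombined site receives one of the six tetO variants. The abstract does not say the authors derived their estimate this way.

```python
from math import comb

n_variants = 6   # variant tetO sequences on the plasmid (from the abstract)
n_genes = 3      # crt genes targeted (from the abstract)
p_site = 0.20    # approximate per-site recombination rate (from the abstract)

# Number of distinct promoter-variant combinations across the three genes.
n_combinations = n_variants ** n_genes

# Probability that all three sites recombine, ASSUMING independence.
p_all_sites = p_site ** n_genes

# Binomial distribution of the number of recombined sites per cell.
site_distribution = [
    comb(n_genes, k) * p_site**k * (1 - p_site) ** (n_genes - k)
    for k in range(n_genes + 1)
]

print(n_combinations, round(p_all_sites, 4), [round(p, 3) for p in site_distribution])
```

Under these assumptions, the fraction of cells recombined at all three sites is of the same order as the 0.7% of the population reported as making detectable lycopene.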

  4. Physiologically Shrinking the Solution Space of a Saccharomyces cerevisiae Genome-Scale Model Suggests the Role of the Metabolic Network in Shaping Gene Expression Noise.

    PubMed

    Chi, Baofang; Tao, Shiheng; Liu, Yanlin

    2015-01-01

    Sampling the solution space of genome-scale models is generally conducted to determine the feasible region for metabolic flux distribution. Because the region for actual metabolic states resides only in a small fraction of the entire space, it is necessary to shrink the solution space to improve the predictive power of a model. A common strategy is to constrain models by integrating extra datasets such as high-throughput datasets and C13-labeled flux datasets. However, studies refining these approaches by performing a meta-analysis of massive experimental metabolic flux measurements, which are closely linked to cellular phenotypes, are limited. In the present study, experimentally identified metabolic flux data from 96 published reports were systematically reviewed. Several strong associations among metabolic flux phenotypes were observed. These phenotype-phenotype associations at the flux level were quantified and integrated into a Saccharomyces cerevisiae genome-scale model as extra physiological constraints. By sampling the shrunken solution space of the model, the metabolic flux fluctuation level, which is an intrinsic trait of metabolic reactions determined by the network, was estimated and utilized to explore its relationship to gene expression noise. Although no correlation was observed in all enzyme-coding genes, a relationship between metabolic flux fluctuation and expression noise of genes associated with enzyme-dosage sensitive reactions was detected, suggesting that the metabolic network plays a role in shaping gene expression noise. Such correlation was mainly attributed to the genes corresponding to non-essential reactions, rather than essential ones. This was, at least partially, due to regulations underlying the flux phenotype-phenotype associations. Altogether, this study proposes a new approach in shrinking the solution space of a genome-scale model, of which sampling provides new insights into gene expression noise.

  5. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease.

    PubMed

    Marigorta, Urko M; Denson, Lee A; Hyams, Jeffrey S; Mondal, Kajari; Prince, Jarod; Walters, Thomas D; Griffiths, Anne; Noe, Joshua D; Crandall, Wallace V; Rosh, Joel R; Mack, David R; Kellermayer, Richard; Heyman, Melvin B; Baker, Susan S; Stephens, Michael C; Baldassano, Robert N; Markowitz, James F; Kim, Mi-Ok; Dubinsky, Marla C; Cho, Judy; Aronow, Bruce J; Kugathasan, Subra; Gibson, Greg

    2017-10-01

    Gene expression profiling can be used to uncover the mechanisms by which loci identified through genome-wide association studies (GWAS) contribute to pathology. Given that most GWAS hits are in putative regulatory regions and transcript abundance is physiologically closer to the phenotype of interest, we hypothesized that summation of risk-allele-associated gene expression, namely a transcriptional risk score (TRS), should provide accurate estimates of disease risk. We integrate summary-level GWAS and expression quantitative trait locus (eQTL) data with RNA-seq data from the RISK study, an inception cohort of pediatric Crohn's disease. We show that TRSs based on genes regulated by variants linked to inflammatory bowel disease (IBD) not only outperform genetic risk scores (GRSs) in distinguishing Crohn's disease from healthy samples, but also serve to identify patients who in time will progress to complicated disease. Our dissection of eQTL effects may be used to distinguish genes whose association with disease is through promotion versus protection, thereby linking statistical association to biological mechanism. The TRS approach constitutes a potential strategy for personalized medicine that enhances inference from static genotypic risk assessment.
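
The idea of summing risk-allele-associated expression into a single score can be sketched as below. This is a simplified reading of the abstract, not the published TRS procedure: it assumes expression is standardized per gene across samples and that each gene carries a sign of +1 or -1 depending on whether the risk allele is associated with higher or lower expression. The function name and toy data are hypothetical.

```python
import numpy as np

def transcriptional_risk_score(expr, risk_direction):
    """Sum of per-gene z-scores, each signed by the risk-allele direction.

    expr: samples x genes expression matrix (every gene must vary across samples).
    risk_direction: +1 if the risk allele raises expression, -1 if it lowers it.
    """
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return z @ np.asarray(risk_direction, dtype=float)

# Toy data: 3 samples, 2 genes. Gene 0 is risk-increasing, gene 1 risk-decreasing.
toy_expr = [[1.0, 5.0],
            [2.0, 4.0],
            [3.0, 3.0]]
scores = transcriptional_risk_score(toy_expr, [1, -1])
print(np.round(scores, 3))
```

In the toy data the third sample has high expression of the risk-increasing gene and low expression of the risk-decreasing gene, so it receives the highest score.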

  6. Functional modules by relating protein interaction networks and gene expression.

    PubMed

    Tornow, Sabine; Mewes, H W

    2003-11-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships.
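
The baseline that this abstract compares against, the mean Pearson correlation of a gene group together with the probability of reaching that value by chance, can be sketched as follows. This does not implement the superparamagnetic approach the authors propose. The random-group permutation scheme, the function names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def mean_pairwise_pearson(expr):
    """Mean off-diagonal Pearson correlation; expr is genes x conditions."""
    corr = np.corrcoef(expr)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper].mean()

def chance_probability(expr_all, module_idx, n_perm=1000, seed=0):
    """Empirical probability that a random gene group of the same size
    reaches at least the module's mean correlation."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_pearson(expr_all[module_idx])
    size = len(module_idx)
    hits = 0
    for _ in range(n_perm):
        random_idx = rng.choice(expr_all.shape[0], size=size, replace=False)
        if mean_pairwise_pearson(expr_all[random_idx]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic example: 50 genes x 20 conditions, with the first 5 genes co-expressed.
rng = np.random.default_rng(1)
expr_all = rng.normal(size=(50, 20))
shared = rng.normal(size=20)
module_idx = [0, 1, 2, 3, 4]
expr_all[module_idx] = shared + 0.3 * rng.normal(size=(5, 20))

observed, p_value = chance_probability(expr_all, module_idx)
```

Here `module_idx` stands for a group defined in a different network (for example, a protein interaction module), and `expr_all` for the expression data in which its coherence is tested.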

  7. Functional modules by relating protein interaction networks and gene expression

    PubMed Central

    Tornow, Sabine; Mewes, H. W.

    2003-01-01

    Genes and proteins are organized on the basis of their particular mutual relations or according to their interactions in cellular and genetic networks. These include metabolic or signaling pathways and protein interaction, regulatory or co-expression networks. Integrating the information from the different types of networks may lead to the notion of a functional network and functional modules. To find these modules, we propose a new technique which is based on collective, multi-body correlations in a genetic network. We calculated the correlation strength of a group of genes (e.g. in the co-expression network) which were identified as members of a module in a different network (e.g. in the protein interaction network) and estimated the probability that this correlation strength was found by chance. Groups of genes with a significant correlation strength in different networks have a high probability that they perform the same function. Here, we propose evaluating the multi-body correlations by applying the superparamagnetic approach. We compare our method to the presently applied mean Pearson correlations and show that our method is more sensitive in revealing functional relationships. PMID:14576317

  8. Low copy number of the salivary amylase gene predisposes to obesity.

    PubMed

    Falchi, Mario; El-Sayed Moustafa, Julia Sarah; Takousis, Petros; Pesce, Francesco; Bonnefond, Amélie; Andersson-Assarsson, Johanna C; Sudmant, Peter H; Dorajoo, Rajkumar; Al-Shafai, Mashael Nedham; Bottolo, Leonardo; Ozdemir, Erdal; So, Hon-Cheong; Davies, Robert W; Patrice, Alexandre; Dent, Robert; Mangino, Massimo; Hysi, Pirro G; Dechaume, Aurélie; Huyvaert, Marlène; Skinner, Jane; Pigeyre, Marie; Caiazzo, Robert; Raverdy, Violeta; Vaillant, Emmanuel; Field, Sarah; Balkau, Beverley; Marre, Michel; Visvikis-Siest, Sophie; Weill, Jacques; Poulain-Godefroy, Odile; Jacobson, Peter; Sjostrom, Lars; Hammond, Christopher J; Deloukas, Panos; Sham, Pak Chung; McPherson, Ruth; Lee, Jeannette; Tai, E Shyong; Sladek, Robert; Carlsson, Lena M S; Walley, Andrew; Eichler, Evan E; Pattou, Francois; Spector, Timothy D; Froguel, Philippe

    2014-05-01

    Common multi-allelic copy number variants (CNVs) appear enriched for phenotypic associations compared to their biallelic counterparts. Here we investigated the influence of gene dosage effects on adiposity through a CNV association study of gene expression levels in adipose tissue. We identified significant association of a multi-allelic CNV encompassing the salivary amylase gene (AMY1) with body mass index (BMI) and obesity, and we replicated this finding in 6,200 subjects. Increased AMY1 copy number was positively associated with both amylase gene expression (P = 2.31 × 10^-14) and serum enzyme levels (P < 2.20 × 10^-16), whereas reduced AMY1 copy number was associated with increased BMI (change in BMI per estimated copy = -0.15 (0.02) kg/m^2; P = 6.93 × 10^-10) and obesity risk (odds ratio (OR) per estimated copy = 1.19, 95% confidence interval (CI) = 1.13-1.26; P = 1.46 × 10^-10). The OR value of 1.19 per copy of AMY1 translates into about an eightfold difference in risk of obesity between subjects in the top (copy number > 9) and bottom (copy number < 4) 10% of the copy number distribution. Our study provides a first genetic link between carbohydrate metabolism and BMI and demonstrates the power of integrated genomic approaches beyond genome-wide association studies.
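
    The step from a per-copy OR of 1.19 to an eightfold difference can be checked with a short calculation. The sketch below assumes the per-copy effect is multiplicative on the odds scale (as in a logistic model); the abstract does not state the average copy-number gap between the top and bottom groups, so the function names and the 12-copy figure are illustrative only.

```python
import math

OR_PER_COPY = 1.19  # odds ratio per estimated AMY1 copy, as quoted in the abstract

def odds_fold_change(copy_difference, or_per_copy=OR_PER_COPY):
    """Odds ratio implied by a copy-number difference under a multiplicative model."""
    return or_per_copy ** copy_difference

def copies_for_fold(fold, or_per_copy=OR_PER_COPY):
    """Copy-number difference needed to reach a given fold difference in odds."""
    return math.log(fold) / math.log(or_per_copy)

print(round(copies_for_fold(8), 1))    # copy-number gap that gives an eightfold difference
print(round(odds_fold_change(12), 2))  # odds ratio implied by a 12-copy gap
```

    Under this assumption an eightfold difference corresponds to a gap of roughly 12 copies between the two extreme groups.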

  9. Co-expression networks reveal the tissue-specific regulation of transcription and splicing.

    PubMed

    Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D H; Jo, Brian; Gao, Chuan; McDowell, Ian C; Engelhardt, Barbara E; Battle, Alexis

    2017-11-01

    Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues. © 2017 Saha et al.; Published by Cold Spring Harbor Laboratory Press.

  10. A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

    PubMed

    Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

    2017-08-30

    Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.

  11. Effects of Particulate Matter on Genomic DNA Methylation Content and iNOS Promoter Methylation

    PubMed Central

    Tarantini, Letizia; Bonzini, Matteo; Apostoli, Pietro; Pegoraro, Valeria; Bollati, Valentina; Marinelli, Barbara; Cantone, Laura; Rizzo, Giovanna; Hou, Lifang; Schwartz, Joel; Bertazzi, Pier Alberto; Baccarelli, Andrea

    2009-01-01

    Background Altered patterns of gene expression mediate the effects of particulate matter (PM) on human health, but mechanisms through which PM modifies gene expression are largely undetermined. Objectives We aimed at identifying short- and long-term effects of PM exposure on DNA methylation, a major genomic mechanism of gene expression control, in workers in an electric furnace steel plant with well-characterized exposure to PM with aerodynamic diameters < 10 μm (PM10). Methods We measured global genomic DNA methylation content estimated in Alu and long interspersed nuclear element-1 (LINE-1) repeated elements, and promoter DNA methylation of iNOS (inducible nitric oxide synthase), a gene suppressed by DNA methylation and induced by PM exposure in blood leukocytes. Quantitative DNA methylation analysis was performed through bisulfite PCR pyrosequencing on blood DNA obtained from 63 workers on the first day of a work week (baseline, after 2 days off work) and after 3 days of work (postexposure). Individual PM10 exposure was between 73.4 and 1,220 μg/m3. Results Global methylation content estimated in Alu and LINE-1 repeated elements did not show changes in postexposure measures compared with baseline. PM10 exposure levels were negatively associated with methylation in both Alu [β = −0.19 %5-methylcytosine (%5mC); p = 0.04] and LINE-1 [β = −0.34 %5mC; p = 0.04], likely reflecting long-term PM10 effects. iNOS promoter DNA methylation was significantly lower in postexposure blood samples compared with baseline (difference = −0.61 %5mC; p = 0.02). Conclusions We observed changes in global and gene specific methylation that should be further characterized in future investigations on the effects of PM. PMID:19270791

  12. Transcriptome analysis of genes and gene networks involved in aggressive behavior in mouse and zebrafish.

    PubMed

    Malki, Karim; Du Rietz, Ebba; Crusio, Wim E; Pain, Oliver; Paya-Cano, Jose; Karadaghi, Rezhaw L; Sluyter, Frans; de Boer, Sietse F; Sandnabba, Kenneth; Schalkwyk, Leonard C; Asherson, Philip; Tosto, Maria Grazia

    2016-09-01

    Despite moderate heritability estimates, the molecular architecture of aggressive behavior remains poorly characterized. This study compared gene expression profiles from a genetic mouse model of aggression with zebrafish, an animal model traditionally used to study aggression. A meta-analytic, cross-species approach was used to identify genomic variants associated with aggressive behavior. The RankProd algorithm was used to evaluate mRNA differences from prefrontal cortex tissues of three sets of mouse lines (N = 18) selectively bred for low and high aggressive behavior (SAL/LAL, TA/TNA, and NC900/NC100). The same approach was used to evaluate mRNA differences in zebrafish (N = 12) exposed to aggressive or non-aggressive social encounters. Results were compared to uncover genes consistently implicated in aggression across both studies. Seventy-six genes were differentially expressed (PFP < 0.05) in aggressive compared to non-aggressive mice. Seventy genes were differentially expressed in zebrafish exposed to a fight encounter compared to isolated zebrafish. Seven genes (Fos, Dusp1, Hdac4, Ier2, Bdnf, Btg2, and Nr4a1) were differentially expressed across both species, five of which belong to a gene network centred on the c-Fos gene hub. Network analysis revealed an association with the MAPK signaling cascade. In human studies HDAC4 haploinsufficiency is a key genetic mechanism associated with brachydactyly mental retardation syndrome (BDMR), which is associated with aggressive behaviors. Moreover, the HDAC4 receptor is a drug target for valproic acid, which is being employed as an effective pharmacological treatment for aggressive behavior in geriatric, psychiatric, and brain-injury patients. © 2016 Wiley Periodicals, Inc.
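
    The core statistic behind a rank-product analysis can be sketched in a few lines: each gene is ranked within every comparison, and the geometric mean of its ranks is taken across comparisons. This is a minimal illustration rather than the RankProd package itself; the input layout and function name are assumptions, ties are not handled, and the permutation step that yields the PFP values quoted above is omitted.

```python
import math

def rank_product(scores):
    """Geometric mean of per-comparison ranks for each gene.

    scores: list with one entry per gene, each a list of fold-change-like
            values (one per comparison); higher means more up-regulated.
    Rank 1 is the most up-regulated gene in a comparison, so a small
    rank product marks genes that are consistently near the top.
    """
    n_genes = len(scores)
    n_comparisons = len(scores[0])
    log_rank_sums = [0.0] * n_genes
    for j in range(n_comparisons):
        # Order gene indices from highest to lowest score in comparison j.
        order = sorted(range(n_genes), key=lambda g: scores[g][j], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sums[gene] += math.log(rank)
    return [math.exp(total / n_comparisons) for total in log_rank_sums]
```

    Working with sums of log ranks avoids overflow when many comparisons are multiplied together.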

  13. Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data.

    PubMed

    Chakraborty, Sutirtha

    2018-05-26

    RNA-Seq technology has revolutionized the face of gene expression profiling by generating read count data measuring the transcript abundances for each queried gene on multiple experimental subjects. But on the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may potentially distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (built as an R package 'SVAPLSseq') to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS based method is presented along with some of the other popularly used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques. Copyright © 2017. Published by Elsevier Inc.

  14. Changes in ABA and gene expression in cold-acclimated sugar maple.

    PubMed

    Bertrand, A; Robitaille, G; Castonguay, Y; Nadeau, P; Boutin, R

    1997-01-01

    To determine if cold acclimation of sugar maple (Acer saccharum Marsh.) is associated with specific changes in gene expression under natural hardening conditions, we compared bud and root translatable mRNAs of potted maple seedlings after cold acclimation under natural conditions and following spring dehardening. Cold-hardened roots and buds were sampled in January when tissues reached their maximum hardiness. Freezing tolerance, expressed as the lethal temperature for 50% of the tissues (LT50), was estimated at -17 °C for roots, and at lower than -36 °C for buds. Approximately ten transcripts were specifically synthesized in cold-acclimated buds, or were more abundant in cold-acclimated buds than in unhardened buds. Cold hardening was also associated with changes in translation. At least five translation products were more abundant in cold-acclimated buds and roots compared with unhardened tissues. Abscisic acid (ABA) concentration increased approximately tenfold in the xylem sap following winter acclimation, and the maximum concentration was reached just before maximal acclimation. We discuss the potential involvement of ABA in the observed modification of gene expression during cold hardening.

  15. Improving wood properties for wood utilization through multi-omics integration in lignin biosynthesis

    DOE PAGES

    Wang, Jack P.; Matthews, Megan L.; Williams, Cranos M.; ...

    2018-04-20

    A multi-omics quantitative integrative analysis of lignin biosynthesis can advance the strategic engineering of wood for timber, pulp, and biofuels. Lignin is polymerized from three monomers (monolignols) produced by a grid-like pathway. The pathway in wood formation of Populus trichocarpa has at least 21 genes, encoding enzymes that mediate 37 reactions on 24 metabolites, leading to lignin and affecting wood properties. We perturb these 21 pathway genes and integrate transcriptomic, proteomic, fluxomic and phenomic data from 221 lines selected from ~2000 transgenics (6-month-old). The integrative analysis estimates how changing expression of pathway gene or gene combination affects protein abundance, metabolic-flux, metabolite concentrations, and 25 wood traits, including lignin, tree-growth, density, strength, and saccharification. The analysis then predicts improvements in any of these 25 traits individually or in combinations, through engineering expression of specific monolignol genes. The analysis may lead to greater understanding of other pathways for improved growth and adaptation.

  16. Improving wood properties for wood utilization through multi-omics integration in lignin biosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jack P.; Matthews, Megan L.; Williams, Cranos M.

    A multi-omics quantitative integrative analysis of lignin biosynthesis can advance the strategic engineering of wood for timber, pulp, and biofuels. Lignin is polymerized from three monomers (monolignols) produced by a grid-like pathway. The pathway in wood formation of Populus trichocarpa has at least 21 genes, encoding enzymes that mediate 37 reactions on 24 metabolites, leading to lignin and affecting wood properties. We perturb these 21 pathway genes and integrate transcriptomic, proteomic, fluxomic and phenomic data from 221 lines selected from ~2000 transgenics (6-month-old). The integrative analysis estimates how changing expression of pathway gene or gene combination affects protein abundance, metabolic-flux, metabolite concentrations, and 25 wood traits, including lignin, tree-growth, density, strength, and saccharification. The analysis then predicts improvements in any of these 25 traits individually or in combinations, through engineering expression of specific monolignol genes. The analysis may lead to greater understanding of other pathways for improved growth and adaptation.

  17. PCR-free Quantification of Multiple Splice Variants in Cancer Gene by Surface Enhanced Raman Spectroscopy

    PubMed Central

    Sun, Lan; Irudayaraj, Joseph

    2009-01-01

    We demonstrate a surface enhanced Raman spectroscopy (SERS) based array platform to monitor gene expression in cancer cells in a multiplex and quantitative format without amplification steps. A strategy comprising DNA/RNA hybridization, S1 nuclease digestion, and alkaline hydrolysis was adopted to obtain DNA targets specific to two splice junction variants Δ(9, 10) and Δ(5) of the breast cancer susceptibility gene 1 (BRCA1) from MCF-7 and MDA-MB-231 breast cancer cell lines. These two targets were identified simultaneously and their absolute quantities were estimated by a SERS strategy utilizing the inherent plasmon-phonon Raman mode of gold nanoparticle probes as a self-referencing standard to correct for variability in surface enhancement. Results were then validated by reverse transcription PCR (RT-PCR). Our proposed methodology could be expanded to a higher level of multiplexing for quantitative gene expression analysis of any gene without any amplification steps. PMID:19780515

  18. Improving wood properties for wood utilization through multi-omics integration in lignin biosynthesis.

    PubMed

    Wang, Jack P; Matthews, Megan L; Williams, Cranos M; Shi, Rui; Yang, Chenmin; Tunlaya-Anukit, Sermsawat; Chen, Hsi-Chuan; Li, Quanzi; Liu, Jie; Lin, Chien-Yuan; Naik, Punith; Sun, Ying-Hsuan; Loziuk, Philip L; Yeh, Ting-Feng; Kim, Hoon; Gjersing, Erica; Shollenberger, Todd; Shuford, Christopher M; Song, Jina; Miller, Zachary; Huang, Yung-Yun; Edmunds, Charles W; Liu, Baoguang; Sun, Yi; Lin, Ying-Chung Jimmy; Li, Wei; Chen, Hao; Peszlen, Ilona; Ducoste, Joel J; Ralph, John; Chang, Hou-Min; Muddiman, David C; Davis, Mark F; Smith, Chris; Isik, Fikret; Sederoff, Ronald; Chiang, Vincent L

    2018-04-20

    A multi-omics quantitative integrative analysis of lignin biosynthesis can advance the strategic engineering of wood for timber, pulp, and biofuels. Lignin is polymerized from three monomers (monolignols) produced by a grid-like pathway. The pathway in wood formation of Populus trichocarpa has at least 21 genes, encoding enzymes that mediate 37 reactions on 24 metabolites, leading to lignin and affecting wood properties. We perturb these 21 pathway genes and integrate transcriptomic, proteomic, fluxomic and phenomic data from 221 lines selected from ~2000 transgenics (6-month-old). The integrative analysis estimates how changing expression of pathway gene or gene combination affects protein abundance, metabolic-flux, metabolite concentrations, and 25 wood traits, including lignin, tree-growth, density, strength, and saccharification. The analysis then predicts improvements in any of these 25 traits individually or in combinations, through engineering expression of specific monolignol genes. The analysis may lead to greater understanding of other pathways for improved growth and adaptation.

  19. Genes and gene networks implicated in aggression related behaviour.

    PubMed

    Malki, Karim; Pain, Oliver; Du Rietz, Ebba; Tosto, Maria Grazia; Paya-Cano, Jose; Sandnabba, Kenneth N; de Boer, Sietse; Schalkwyk, Leonard C; Sluyter, Frans

    2014-10-01

    Aggressive behaviour is a major cause of mortality and morbidity. Despite moderate heritability estimates, progress in identifying the genetic factors underlying aggressive behaviour has been limited. There are currently three genetic mouse models of high and low aggression created using selective breeding. This is the first study to offer a global transcriptomic characterization of the prefrontal cortex across all three genetic mouse models of aggression. A systems biology approach has been applied to transcriptomic data across the three pairs of selected inbred mouse strains (Turku Aggressive (TA) and Turku Non-Aggressive (TNA), Short Attack Latency (SAL) and Long Attack Latency (LAL) mice and North Carolina Aggressive (NC900) and North Carolina Non-Aggressive (NC100)), providing novel insight into the neurobiological mechanisms and genetics underlying aggression. First, weighted gene co-expression network analysis (WGCNA) was performed to identify modules of highly correlated genes associated with aggression. Probe sets belonging to gene modules uncovered by WGCNA were carried forward for network analysis using ingenuity pathway analysis (IPA). The RankProd non-parametric algorithm was then used to statistically evaluate expression differences across the genes belonging to modules significantly associated with aggression. IPA uncovered two pathways, involving NF-κB and MAPKs. The secondary RankProd analysis yielded 14 differentially expressed genes, some of which have previously been implicated in pathways associated with aggressive behaviour, such as Adrbk2. The results highlighted plausible candidate genes and gene networks implicated in aggression-related behaviour.

  20. Estimating the potential refolding yield of recombinant proteins expressed as inclusion bodies.

    PubMed

    Ho, Jason G S; Middelberg, Anton P J

    2004-09-05

    Recombinant protein production in bacteria is efficient except that insoluble inclusion bodies form when some gene sequences are expressed. Such proteins must undergo renaturation, which is an inefficient process due to protein aggregation on dilution from concentrated denaturant. In this study, the protein-protein interactions of eight distinct inclusion-body proteins are quantified, in different solution conditions, by measurement of protein second virial coefficients (SVCs). Protein solubility is shown to decrease as the SVC is reduced (i.e., as protein interactions become more attractive). Plots of SVC versus denaturant concentration demonstrate two clear groupings of proteins: a more aggregative group and a group having higher SVC and better solubility. A correlation of the measured SVC with protein molecular weight and hydropathicity, that is able to predict which group each of the eight proteins falls into, is presented. The inclusion of additives known to inhibit aggregation during renaturation improves solubility and increases the SVC of both protein groups. Furthermore, an estimate of maximum refolding yield (or solubility) using high-performance liquid chromatography was obtained for each protein tested, under different environmental conditions, enabling a relationship between "yield" and SVC to be demonstrated. Combined, the results enable an approximate estimation of the maximum refolding yield that is attainable for each of the eight proteins examined, under a selected chemical environment. Although the correlations must be tested with a far larger set of protein sequences, this work represents a significant move beyond empirical approaches for optimizing renaturation conditions. The approach moves toward the ideal of predicting maximum refolding yield using simple bioinformatic metrics that can be estimated from the gene sequence. Such a capability could potentially "screen," in silico, those sequences suitable for expression in bacteria from those that must be expressed in more complex hosts.

  1. Systematic correlation of environmental exposure and physiological and self-reported behaviour factors with leukocyte telomere length.

    PubMed

    Patel, Chirag J; Manrai, Arjun K; Corona, Erik; Kohane, Isaac S

    2017-02-01

    It is hypothesized that environmental exposures and behaviour influence telomere length, an indicator of cellular ageing. We systematically associated 461 indicators of environmental exposures, physiology and self-reported behaviour with telomere length in data from the US National Health and Nutrition Examination Survey (NHANES) in 1999-2002. Further, we tested whether factors identified in the NHANES participants are also correlated with gene expression of telomere length modifying genes. We correlated 461 environmental exposures, behaviours and clinical variables with telomere length, using survey-weighted linear regression, adjusting for sex, age, age squared, race/ethnicity, poverty level, education and born outside the USA, and estimated the false discovery rate to adjust for multiple hypotheses. We conducted a secondary analysis to investigate the correlation between identified environmental variables and gene expression levels of telomere-associated genes in publicly available gene expression samples. After correlating 461 variables with telomere length, we found 22 variables significantly associated with telomere length after adjustment for multiple hypotheses. Of these variables, 14 were associated with longer telomeres, including biomarkers of polychlorinated biphenyls [PCBs; 0.1 to 0.2 standard deviation (SD) increase for 1 SD increase in PCB level, P < 0.002] and a form of vitamin A, retinyl stearate. Eight variables were associated with shorter telomeres, including biomarkers of cadmium, C-reactive protein and lack of physical activity. We could not conclude that PCBs are correlated with gene expression of telomere-associated genes. Both environmental exposures and chronic disease-related risk factors may play a role in telomere length. Our secondary analysis found no evidence of association between PCBs/smoking and gene expression of telomere-associated genes. All correlations between exposures, behaviours and clinical factors and changes in telomere length will require further investigation regarding biological influence of exposure. © The Author 2016. Published by Oxford University Press on behalf of the International Epidemiological Association
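
    Screening hundreds of variables against one outcome requires a multiple-testing adjustment such as the false discovery rate mentioned in this abstract. The sketch below uses the Benjamini-Hochberg procedure as one common way to do this; the abstract does not say which FDR estimator the authors used, so the choice of procedure and the function name are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    In a screen like the one described, the 461 regression p-values would be passed in together, and variables whose adjusted value falls below the chosen FDR threshold would be reported.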

  2. Particle Radiation signals the Expression of Genes in stress-associated Pathways

    NASA Astrophysics Data System (ADS)

    Blakely, E.; Chang, P.; Bjornstad, K.; Dosanjh, M.; Cherbonnel, C.; Rosen, C.

    The explosive development of microarray screening methods has propelled genome research in a variety of biological systems, allowing investigators to examine large-scale alterations in gene expression for research in toxicology, pathology and therapy. The radiation environment in space is complex and encompasses a variety of highly energetic and charged particles. Estimation of biological responses after exposure to these types of radiation is important for NASA in their plans for long-term manned space missions. Instead of using the 10,000-gene arrays that are in the marketplace, we have chosen to examine particle radiation-induced changes in gene expression using a focused DNA microarray system to study the expression of about 100 genes specifically associated with both the upstream and downstream aspects of the TP53 stress-responsive pathway. Genes that are regulated by TP53 include functional clusters that are implicated in cell cycle arrest, apoptosis and DNA repair. A cultured human lens epithelial cell model (Blakely et al., IOVS 41:3808, 2000) was used for these studies. Additional human normal and radiosensitive fibroblast cell lines have also been examined. Lens cells were grown on matrix-coated substrate and exposed to 55 MeV/u protons at the 88-inch cyclotron at LBNL or 1 GeV/u iron ions at the NASA Space Radiation Laboratory. The other cell lines were grown on conventional tissue culture plasticware. RNA and proteins were harvested at different times after irradiation. RNA was isolated from sham-treated or select irradiated populations.

  3. Immune modulation through RNA interference-mediated silencing of CD40 in dendritic cells.

    PubMed

    Karimi, Mohammad Hossein; Ebadi, Padideh; Pourfathollah, Ali Akbar; Soheili, Zahra Soheila; Samiee, Shahram; Ataee, Zahra; Tabei, Seyyed Ziyaoddin; Moazzeni, Seyed Mohammad

    2009-01-01

    RNA interference (RNAi) is an exciting mechanism for knocking down any target gene at the transcriptional level. It is now clear that small interfering RNA (siRNA), a 19-21 nt long dsRNA, can trigger a degradation process (RNAi) that specifically silences the expression of a cognate mRNA. Our findings in this study showed that downregulation of CD40 gene expression in dendritic cells (DCs) by RNAi culminated in immune modulation. Effective delivery of siRNA into DCs would be a reasonable method for blocking CD40 gene expression at the cell surface without any effect on other genes or cell cytotoxicity. The effects of siRNA against CD40 mRNA on the function and phenotype of DCs were investigated. The DCs were separated from mouse spleen and then cultured in vitro. siRNA was delivered to the cells by means of Lipofectamine 2000, and the efficacy of transfection was estimated by flow cytometry. The viability of the transfected cells was evaluated by Annexin V and propidium iodide staining. mRNA expression and protein synthesis were assessed by real-time PCR and flow cytometry, respectively. Knocking down the CD40 gene in the DCs caused an increase in IL-4 production and a decrease in IL-12 production and allostimulation activity. Altogether, these effects would stimulate Th2 cytokine production from allogeneic T-cells in vitro.

  4. Along the Central Dogma-Controlling Gene Expression with Small Molecules.

    PubMed

    Schneider-Poetsch, Tilman; Yoshida, Minoru

    2018-05-04

    The central dogma of molecular biology, that DNA is transcribed into RNA and RNA translated into protein, was coined in the early days of modern biology. Back in the 1950s and 1960s, bacterial genetics first opened the way toward understanding life as the genetically encoded interaction of macromolecules. As molecular biology progressed and our knowledge of gene control deepened, it became increasingly clear that expression relied on many more levels of regulation. In the process of dissecting mechanisms of gene expression, specific small-molecule inhibitors played an important role and became valuable tools of investigation. Small molecules offer significant advantages over genetic tools, as they allow inhibiting a process at any desired time point, whereas mutating or altering the gene of an important regulator would likely result in a dead organism. With the advent of modern sequencing technology, it has become possible to monitor global cellular effects of small-molecule treatment and thereby overcome the limitations of classical biochemistry, which usually looks at a biological system in isolation. This review focuses on several molecules, especially natural products, that have played an important role in dissecting gene expression and have opened up new fields of investigation as well as clinical venues for disease treatment. Expected final online publication date for the Annual Review of Biochemistry Volume 87 is June 20, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

  5. Tissue and serum expression of TGM-3 may be prognostic marker in patients of oral squamous cell carcinoma undergoing chemo-radiotherapy.

    PubMed

    Nayak, Seema; Bhatt, M L B; Goel, Madhu Mati; Gupta, Seema; Mahdi, Abbas Ali; Mishra, Anupam; Mehrotra, Divya

    2018-01-01

    Radioresistance is one of the main determinants of treatment outcome in oral squamous cell carcinoma (OSCC), but its prediction is difficult. Several authors aimed to establish radioresistant OSCC cell lines to identify genes with altered expression in response to radioresistance. The development of OSCC is a multistep carcinogenic process that includes activation of several oncogenes and inactivation of tumour suppressor genes. TGM-3 is a tumour suppressor gene and contributes to the carcinogenesis process. The aim of this study was to estimate serum and tissue expression of TGM-3 and its correlation with clinico-pathological factors and overall survival in patients with OSCC undergoing chemo-radiotherapy. Tissue expression was assessed by immunohistochemistry (IHC) with an antibody against TGM-3 in formalin-fixed tissue biopsies from 96 cases of OSCC and 32 healthy controls, and the serum level was estimated by ELISA. mRNA expression was determined by real-time PCR. Patients were followed for 2 years for chemo-radiotherapy response. TGM-3 IHC expression was positive in 76.70% of OSCC cases and 90.62% of controls. TGM-3 expression showed cytoplasmic and nuclear staining in the keratinized layer, stratum granulosum and stratum spinosum in controls and in tumour cells. Mean serum TGM-3 was 1304.83±573.55 in pre-chemo-radiotherapy OSCC cases, 1530.64±669.33 in post-chemo-radiotherapy samples and 1869.16±1377.36 in controls; the difference was significant for pre-chemo-radiotherapy samples compared with controls (p<0.018). This finding was also confirmed by real-time PCR analysis, which showed downregulation (-7.92 fold change) of TGM-3 in OSCC as compared to controls. TGM-3 expression was significantly associated with response to chemo-radiotherapy treatment (p<0.007) and overall survival (p<0.015). Patients with higher levels of TGM-3 expression have a good response to chemo-radiotherapy and also have better overall survival. TGM-3 may serve as a candidate biomarker for responsiveness to chemo-radiotherapy treatment in OSCC patients.

  6. Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli.

    PubMed

    Takai, Kazuyuki

    2017-01-21

    Codon adaptation index (CAI) has been widely used for prediction of expression of recombinant genes in Escherichia coli and other organisms. However, CAI has no mechanistic basis that rationalizes its application to estimation of translational efficiency. Here, I propose a model based on which we could consider how codon usage is related to the level of expression during exponential growth of bacteria. In this model, translation of a gene is considered as an analog of electric current, and an analog of electric resistance corresponding to each gene is considered. "Translational resistance" is dependent on the steady-state concentration and the sequence of the mRNA species, and "translational resistivity" is dependent only on the mRNA sequence. The latter is the sum of two parts: one is the resistivity for the elongation reaction (coding sequence resistivity), and the other comes from all of the other steps of the decoding reaction. This electric circuit model clearly shows that some conditions should be met for codon composition of a coding sequence to correlate well with its expression level. On the other hand, I calculated relative frequency of each of the 61 sense codon triplets translated during exponential growth of E. coli from a proteomic dataset covering over 2600 proteins. A tentative method for estimating relative coding sequence resistivity based on the data is presented. Copyright © 2016. Published by Elsevier Ltd.
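
    As a point of reference for the codon adaptation index (CAI) that the abstract contrasts its model with, here is a minimal Python sketch of the usual CAI calculation: the geometric mean of per-codon relative adaptiveness weights. It does not implement the translational resistivity model itself. The reference codon counts and the names REFERENCE_COUNTS, relative_adaptiveness and cai are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy reference: codon counts in highly expressed genes,
# grouped by amino acid. A real table would cover all 61 sense codons.
REFERENCE_COUNTS = {
    "K": {"AAA": 30, "AAG": 10},
    "E": {"GAA": 40, "GAG": 10},
    "L": {"CTG": 50, "CTT": 5, "TTA": 5},
}


def relative_adaptiveness(reference):
    """w(codon) = count(codon) / count of the most used synonymous codon."""
    weights = {}
    for codons in reference.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons of `sequence` that have a weight."""
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons present in the weight table")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
optimal = cai("AAAGAACTG", WEIGHTS)  # every codon is the preferred synonym
mixed = cai("AAGGAGCTT", WEIGHTS)    # every codon is a rarer synonym
```

    In this toy table a sequence built only from preferred codons scores 1, and rarer synonyms pull the score toward 0. Whether such a score tracks expression is exactly the question the abstract raises.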

  7. Parameter estimation in tree graph metabolic networks.

    PubMed

    Astola, Laura; Stigter, Hans; Gomez Roldan, Maria Victoria; van Eeuwijk, Fred; Hall, Robert D; Groenenboom, Marian; Molenaar, Jaap J

    2016-01-01

    We study the glycosylation processes that convert initially toxic substrates to nutritionally valuable metabolites in the flavonoid biosynthesis pathway of tomato (Solanum lycopersicum) seedlings. To estimate the reaction rates we use ordinary differential equations (ODEs) to model the enzyme kinetics. A popular choice is to use a system of linear ODEs with constant kinetic rates or to use Michaelis-Menten kinetics. In reality, the catalytic rates, which are affected among other factors by kinetic constants and enzyme concentrations, are changing in time and with the approaches just mentioned, this phenomenon cannot be described. Another problem is that, in general these kinetic coefficients are not always identifiable. A third problem is that, it is not precisely known which enzymes are catalyzing the observed glycosylation processes. With several hundred potential gene candidates, experimental validation using purified target proteins is expensive and time consuming. We aim at reducing this task via mathematical modeling to allow for the pre-selection of most potential gene candidates. In this article we discuss a fast and relatively simple approach to estimate time varying kinetic rates, with three favorable properties: firstly, it allows for identifiable estimation of time dependent parameters in networks with a tree-like structure. Secondly, it is relatively fast compared to usually applied methods that estimate the model derivatives together with the network parameters. Thirdly, by combining the metabolite concentration data with a corresponding microarray data, it can help in detecting the genes related to the enzymatic processes. By comparing the estimated time dynamics of the catalytic rates with time series gene expression data we may assess potential candidate genes behind enzymatic reactions. As an example, we show how to apply this method to select prominent glycosyltransferase genes in tomato seedlings.

  8. Gene-Expression Signature Predicts Postoperative Recurrence in Stage I Non-Small Cell Lung Cancer Patients

    PubMed Central

    Lu, Yan; Wang, Liang; Liu, Pengyuan; Yang, Ping; You, Ming

    2012-01-01

    About 30% of stage I non-small cell lung cancer (NSCLC) patients undergoing resection will have a recurrence. Robust prognostic markers are required to better manage therapy options. The purpose of this study is to develop and validate a novel gene-expression signature that can predict tumor recurrence in stage I NSCLC patients. Cox proportional hazards regression analysis was performed to identify recurrence-related genes, and a partial Cox regression model was used to generate a gene signature of recurrence in the training dataset: 142 stage I lung adenocarcinomas without adjunctive therapy from the Director's Challenge Consortium. Four independent validation datasets, including GSE5843, GSE8894, and two other datasets provided by Mayo Clinic and Washington University, were used to assess the prediction accuracy by calculating the correlation between the risk score estimated from gene expression and actual recurrence-free survival time, and the AUC of time-dependent ROC analysis. Pathway-based survival analyses were also performed. 104 probesets correlated with recurrence in the training dataset. They are enriched in cell adhesion, apoptosis and regulation of cell proliferation. A 51-gene expression signature was identified to distinguish patients likely to develop tumor recurrence (Dxy = −0.83, P<1e-16) and this signature was validated in four independent datasets with AUC >85%. Multiple pathways including leukocyte transendothelial migration and cell adhesion were highly correlated with recurrence-free survival. The gene signature is highly predictive of recurrence in stage I NSCLC patients, which has important prognostic and therapeutic implications for the future management of these patients. PMID:22292069
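
    The abstract summarizes agreement between risk score and recurrence-free survival with Somers' Dxy. As a rough illustration of what that statistic measures, the Python sketch below computes a rank correlation between a score and right-censored follow-up times. The function name somers_dxy and all numbers are hypothetical, and the handling of ties is one simple convention rather than the authors' exact procedure.

```python
def somers_dxy(risk, time, event):
    """Rank correlation between a risk score and censored survival time.

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and actually had the event (event[i] is truthy). The pair is
    concordant when the score orders the two subjects the same way as
    time does. Tied scores count as comparable but neither concordant
    nor discordant.
    """
    concordant = discordant = comparable = 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] < risk[j]:
                    concordant += 1
                elif risk[i] > risk[j]:
                    discordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant - discordant) / comparable


# Hypothetical cohort: higher score, earlier recurrence; last patient censored.
RISK = [0.9, 0.7, 0.4, 0.1]
TIME = [1.0, 2.0, 3.0, 4.0]   # years of recurrence-free follow-up
EVENT = [1, 1, 1, 0]          # 1 = recurrence observed, 0 = censored

dxy = somers_dxy(RISK, TIME, EVENT)
```

    Under this convention a strongly negative value, like the one reported in the abstract, means that high scores go with short recurrence-free survival.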

  9. RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.

    PubMed

    Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K

    2014-10-01

    RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.

  10. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?

    PubMed

    Veturi, Yogasudha; Ritchie, Marylyn D

    2018-01-01

    Transcriptome-wide association studies (TWAS) have recently been employed as an approach that can draw upon the advantages of genome-wide association studies (GWAS) and gene expression studies to identify genes associated with complex traits. Unlike standard GWAS, summary-level data suffice for TWAS and offer improved statistical power. Two popular TWAS methods include either (a) imputing the cis genetic component of gene expression from smaller sized studies (using multi-SNP prediction or MP) into much larger effective sample sizes afforded by GWAS - TWAS-MP or (b) using summary-based Mendelian randomization - TWAS-SMR. Although these methods have been effective at detecting functional variants, it remains unclear how extensive variability in the genetic architecture of complex traits and diseases impacts TWAS results. Our goal was to investigate the different scenarios under which these methods yielded enough power to detect significant expression-trait associations. In this study, we conducted extensive simulations based on 6000 randomly chosen, unrelated Caucasian males from Geisinger's MyCode population to compare the power to detect cis expression-trait associations (within 500 kb of a gene) using the above-described approaches. To test TWAS across varying genetic backgrounds we simulated gene expression and phenotype using different quantitative trait loci per gene and cis-expression/trait heritability under genetic models that differentiate the effect of causality from that of pleiotropy. For each gene, on a training set ranging from 100 to 1000 individuals, we either (a) estimated regression coefficients with gene expression as the response using five different methods: LASSO, elastic net, Bayesian LASSO, Bayesian spike-slab, and Bayesian ridge regression or (b) performed eQTL analysis. We then sampled with replacement 50,000, 150,000, and 300,000 individuals respectively from the testing set of the remaining 5000 individuals and conducted GWAS on each set. Subsequently, we integrated the GWAS summary statistics derived from the testing set with the weights (or eQTLs) derived from the training set to identify expression-trait associations using (a) TWAS-MP (b) TWAS-SMR (c) eQTL-based GWAS, or (d) standalone GWAS. Finally, we examined the power to detect functionally relevant genes using the different approaches under the considered simulation scenarios. In general, we observed great similarities among TWAS-MP methods although the Bayesian methods resulted in improved power in comparison to LASSO and elastic net as the trait architecture grew more complex while training sample sizes and expression heritability remained small. Finally, we observed high power under causality but very low to moderate power under pleiotropy.

  11. ExpressionDB: An open source platform for distributing genome-scale datasets.

    PubMed

    Hughes, Laura D; Lewis, Scott A; Hughes, Michael E

    2017-01-01

    RNA-sequencing (RNA-seq) and microarrays are methods for measuring gene expression across the entire transcriptome. Recent advances have made these techniques practical and affordable for essentially any laboratory with experience in molecular biology. A variety of computational methods have been developed to decrease the amount of bioinformatics expertise necessary to analyze these data. Nevertheless, many barriers persist which discourage new labs from using functional genomics approaches. Since high-quality gene expression studies have enduring value as resources to the entire research community, it is of particular importance that small labs have the capacity to share their analyzed datasets with the research community. Here we introduce ExpressionDB, an open source platform for visualizing RNA-seq and microarray data accommodating virtually any number of different samples. ExpressionDB is based on Shiny, a customizable web application which allows data sharing locally and online with customizable code written in R. ExpressionDB allows intuitive searches based on gene symbols, descriptions, or gene ontology terms, and it includes tools for dynamically filtering results based on expression level, fold change, and false-discovery rates. Built-in visualization tools include heatmaps, volcano plots, and principal component analysis, ensuring streamlined and consistent visualization to all users. All of the scripts for building an ExpressionDB with user-supplied data are freely available on GitHub, and the Creative Commons license allows fully open customization by end-users. We estimate that a demo database can be created in under one hour with minimal programming experience, and that a new database with user-supplied expression data can be completed and online in less than one day.

  12. Gene Expression and Correlation of Pten and Fabp4 in Liver, Muscle, and Adipose Tissues of Type 2 Diabetes Rats.

    PubMed

    Su, Di; Zhang, Chuan-Ling; Gao, Ying-Chun; Liu, Xiao-Ying; Li, Cai-Ping; Huangfu, Jian; Xiao, Rui

    2015-11-22

    The aim of this work was to study the Fabp4 and Pten gene expression and correlation in the liver, muscle, and adipose tissues of type 2 diabetes mellitus (T2DM) rats. Male Wistar rats (8 weeks old) were randomly divided into 2 groups (n=12/group): a control group fed a normal diet for 8 weeks and an experimental group fed a high-fat, high-sugar diet for 8 weeks and that received 25 mg/kg streptozotocin by intraperitoneal injection to induce T2DM. The random blood glucose, fasting blood glucose, and fasting insulin levels were measured. The expression of Pten and Fabp4 in the liver, muscle, and epididymal adipose tissues was estimated by real-time quantitative PCR. Pearson correlation coefficient analysis was used to investigate the expression correlation between Pten and Fabp4 in T2DM rats. The gene expressions of Pten and Fabp4 in the liver, muscle, and adipose tissues of T2DM rats were all significantly higher than those in the control group (P<0.05). Pten was highly expressed in the muscles and Fabp4 was highly expressed in muscle and adipose tissues. Furthermore, expressions of Fabp4 and Pten in the muscle and adipose tissues of T2DM rats were positively correlated (P<0.05), but not in the liver. The increased expression of PTEN and FABP4 in the adipose and muscles of T2DM rats may play an important role in the insulin resistance of T2DM. However, the mechanism by which these 2 genes function in T2DM needs further study.

  13. Analysis of gene expression levels in individual bacterial cells without image segmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwak, In Hae; Son, Minjun; Hagen, Stephen J., E-mail: sjhagen@ufl.edu

    2012-05-11

    Highlights: We present a method for extracting gene expression data from images of bacterial cells. The method does not employ cell segmentation and does not require high magnification. Fluorescence and phase contrast images of the cells are correlated through the physics of phase contrast. We demonstrate the method by characterizing noisy expression of comX in Streptococcus mutans. Abstract: Studies of stochasticity in gene expression typically make use of fluorescent protein reporters, which permit the measurement of expression levels within individual cells by fluorescence microscopy. Analysis of such microscopy images is almost invariably based on a segmentation algorithm, where the image of a cell or cluster is analyzed mathematically to delineate individual cell boundaries. However, segmentation can be ineffective for studying bacterial cells or clusters, especially at lower magnification, where outlines of individual cells are poorly resolved. Here we demonstrate an alternative method for analyzing such images without segmentation. The method employs a comparison between the pixel brightness in phase contrast vs fluorescence microscopy images. By fitting the correlation between phase contrast and fluorescence intensity to a physical model, we obtain well-defined estimates for the different levels of gene expression that are present in the cell or cluster. The method reveals the boundaries of the individual cells, even if the source images lack the resolution to show these boundaries clearly.

  14. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  15. Papain-like cysteine proteases in Carica papaya: lineage-specific gene duplication and expansion.

    PubMed

    Liu, Juan; Sharma, Anupma; Niewiara, Marie Jamille; Singh, Ratnesh; Ming, Ray; Yu, Qingyi

    2018-01-06

    Papain-like cysteine proteases (PLCPs), a large group of cysteine proteases structurally related to papain, play important roles in plant development, senescence, and defense responses. Papain, the first cysteine protease whose structure was determined by X-ray crystallography, plays a crucial role in protecting papaya from herbivorous insects. Except for the four major PLCPs purified and characterized in papaya latex, the rest of the PLCPs in the papaya genome are largely unknown. We identified 33 PLCP genes in the papaya genome. Phylogenetic analysis clearly separated plant PLCP genes into nine subfamilies. PLCP genes are not equally distributed among the nine subfamilies and the number of PLCPs in each subfamily does not increase or decrease proportionally among the seven selected plant species. Papaya showed clear lineage-specific gene expansion in the subfamily III. Interestingly, all four major PLCPs purified from papaya latex, including papain, chymopapain, glycyl endopeptidase and caricain, were grouped into the lineage-specific expansion branch in the subfamily III. Mapping PLCP genes on chromosomes of five plant species revealed that lineage-specific expansions of PLCP genes were mostly derived from tandem duplications. We estimated divergence time of papaya PLCP genes of subfamily III. The major duplication events leading to lineage-specific expansion of papaya PLCP genes in subfamily III were estimated at 48 MYA, 34 MYA, and 16 MYA. The gene expression patterns of the papaya PLCP genes in different tissues were assessed by transcriptome sequencing and qRT-PCR. Most of the papaya PLCP genes of subfamily III were expressed at high levels in leaf and green fruit tissues. Tandem duplications played the dominant role in affecting copy number of PLCPs in plants. Significant variations in size of the PLCP subfamilies among species may reflect genetic adaptation of plant species to different environments. The lineage-specific expansion of papaya PLCPs of subfamily III might have been promoted by the continuous reciprocal selective effects of herbivore attack and plant defense.

  16. Updated Rice Kinase Database RKD 2.0: enabling transcriptome and functional analysis of rice kinase genes.

    PubMed

    Chandran, Anil Kumar Nalini; Yoo, Yo-Han; Cao, Peijian; Sharma, Rita; Sharma, Manoj; Dardick, Christopher; Ronald, Pamela C; Jung, Ki-Hong

    2016-12-01

    Protein kinases catalyze the transfer of a phosphate moiety from a phosphate donor to the substrate molecule, thus playing critical roles in cell signaling and metabolism. Although plant genomes contain more than 1000 genes that encode kinases, knowledge is limited about the function of each of these kinases. A major obstacle that hinders progress towards kinase characterization is functional redundancy. To address this challenge, we previously developed the rice kinase database (RKD) that integrated omics-scale data within a phylogenetics context. An updated version of rice kinase database (RKD) that contains metadata derived from NCBI GEO expression datasets has been developed. RKD 2.0 facilitates in-depth transcriptomic analyses of kinase-encoding genes in diverse rice tissues and in response to biotic and abiotic stresses and hormone treatments. We identified 261 kinases specifically expressed in particular tissues, 130 that are significantly up-regulated in response to biotic stress, 296 in response to abiotic stress, and 260 in response to hormones. Based on this update and Pearson correlation coefficient (PCC) analysis, we estimated that 19 out of 26 genes characterized through loss-of-function studies confer dominant functions. These were selected because they either had paralogous members with PCC values of <0.5 or had no paralog. Compared with the previous version of RKD, RKD 2.0 enables more effective estimations of functional redundancy or dominance because it uses comprehensive expression profiles rather than individual profiles. The integrated analysis of RKD with PCC establishes a single platform for researchers to select rice kinases for functional analyses.

  17. Astrocyte elevated gene-1: a novel independent prognostic biomarker for metastatic ovarian tumors.

    PubMed

    Li, Cong; Chen, Kexin; Cai, Jianping; Shi, Qing-Tao; Li, Yinghong; Li, Lejing; Song, Hongtao; Qiu, Huilei; Qin, Yu; Geng, Jing-Shu

    2014-04-01

    Astrocyte elevated gene-1 (AEG-1), a novel tumor-associated gene, has been found to be overexpressed in many tumors. Our purpose was therefore to estimate whether AEG-1 overexpression is a novel prognostic marker in metastatic ovarian tumors. Immunohistochemistry was used to estimate AEG-1 overexpression in 102 samples of metastatic ovarian tumors. The association between AEG-1 expression and prognosis was estimated by univariate and multivariate survival analyses with Cox regression. The log-rank test was used to identify any differences in prognosis between the two groups. The median overall and progression-free survival of patients with tumors of gastrointestinal tract origin were 0.97 and 0.51 years, respectively. The corresponding values for patients with tumors of breast origin were 2.68 and 1.96 years (P < 0.0001). Of 102 patients, 77 had high expression, and AEG-1 overexpression was significantly associated with prognosis in patients with metastatic ovarian tumors (P < 0.01). Median overall survival and progression-free survival of patients with tumors of gastrointestinal tract origin were significantly lower than those of patients with tumors of breast origin (P < 0.0001). Patients with metastatic ovarian tumors of breast origin had a significantly better prognosis than those with tumors from gastrointestinal tract primary malignancies. These results suggest that AEG-1 overexpression might be an independent prognostic marker of metastatic ovarian tumors.

  18. The role of alternative splicing coupled to nonsense-mediated mRNA decay in human disease.

    PubMed

    da Costa, Paulo J; Menezes, Juliane; Romão, Luísa

    2017-10-01

    Alternative pre-mRNA splicing (AS) affects gene expression as it generates proteome diversity. Nonsense-mediated mRNA decay (NMD) is a surveillance pathway that recognizes and selectively degrades mRNAs carrying premature translation-termination codons (PTCs), preventing the production of truncated proteins that could result in disease. Several studies have also implicated NMD in the regulation of steady-state levels of physiological mRNAs. In addition, it is known that several regulated AS events do not lead to generation of protein products, as they lead to transcripts that carry PTCs and are thus committed to NMD. Indeed, an estimated one-third of naturally occurring, alternatively spliced mRNAs is targeted for NMD, which makes AS coupled to NMD (AS-NMD) an efficient strategy to regulate gene expression. In this review, we will focus on how the AS mechanism operates and how it can be coupled to NMD to fine-tune gene expression levels. Furthermore, we will demonstrate the physiological significance of the interplay between AS and NMD in human disease, such as cancer and neurological disorders. Understanding how AS-NMD orchestrates expression of vital genes is of utmost importance for advances in the diagnosis, prognosis and treatment of many human disorders. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Combined DNA methylation and gene expression profiling in gastrointestinal stromal tumors reveals hypomethylation of SPP1 as an independent prognostic factor.

    PubMed

    Haller, Florian; Zhang, Jitao David; Moskalev, Evgeny A; Braun, Alexander; Otto, Claudia; Geddert, Helene; Riazalhosseini, Yasser; Ward, Aoife; Balwierz, Aleksandra; Schaefer, Inga-Marie; Cameron, Silke; Ghadimi, B Michael; Agaimy, Abbas; Fletcher, Jonathan A; Hoheisel, Jörg; Hartmann, Arndt; Werner, Martin; Wiemann, Stefan; Sahin, Ozgür

    2015-03-01

    Gastrointestinal stromal tumors (GISTs) have distinct gene expression patterns according to localization, genotype and aggressiveness. DNA methylation at CpG dinucleotides is an important mechanism for regulation of gene expression. We performed targeted DNA methylation analysis of 1,505 CpG loci in 807 cancer-related genes in a cohort of 76 GISTs, combined with genome-wide mRNA expression analysis in 22 GISTs, to identify signatures associated with clinicopathological parameters and prognosis. Principal component analysis revealed distinct DNA methylation patterns associated with anatomical localization, genotype, mitotic counts and clinical follow-up. Methylation of a single CpG dinucleotide in the non-CpG island promoter of SPP1 was significantly correlated with shorter disease-free survival. Hypomethylation of this CpG was an independent prognostic parameter in a multivariate analysis compared to anatomical localization, genotype, tumor size and mitotic counts in a cohort of 141 GISTs with clinical follow-up. The epigenetic regulation of SPP1 was confirmed in vitro, and the functional impact of SPP1 protein on tumorigenesis-related signaling pathways was demonstrated. In summary, SPP1 promoter methylation is a novel and independent prognostic parameter in GISTs, and might be helpful in estimating the aggressiveness of GISTs from the intermediate-risk category. © 2014 UICC.

  20. Partial ROC Reveals Superiority of Mutual Rank of Pearson's Correlation Coefficient as a Coexpression Measure to Elucidate Functional Association of Genes

    NASA Astrophysics Data System (ADS)

    Obayashi, Takeshi; Kinoshita, Kengo

    2013-01-01

    Gene coexpression analysis is a powerful approach to elucidate gene function. We have established and developed this approach using the vast amount of publicly available gene expression data measured by microarray techniques. Coexpressed genes are used to estimate the function of a guide gene or to construct gene coexpression networks. When constructing gene networks, researchers must introduce an arbitrary threshold on gene coexpression, because the coexpression value is continuous. With a view to introducing a common threshold, we previously reported that the rank of Pearson's correlation coefficient (PCC) is more useful than the original PCC value. In this manuscript, we re-assessed measures of gene coexpression for constructing gene coexpression networks, and found that the mutual rank (MR) of PCC performed better than the rank of PCC and the original PCC at low false positive rates.
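
    The abstract does not spell out how MR is computed. The Python sketch below assumes the commonly used definition, the geometric mean of the two directional PCC ranks of a gene pair, and should be read as an illustration and not as the authors' implementation. The expression profiles and the names pearson, directional_ranks and mutual_rank are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def directional_ranks(expr):
    """ranks[a][b] = rank of gene b among all other genes by PCC with gene a.

    Rank 1 is the most strongly positively correlated partner of a.
    """
    ranks = {}
    for a in expr:
        partners = sorted(
            (g for g in expr if g != a),
            key=lambda g: pearson(expr[a], expr[g]),
            reverse=True,
        )
        ranks[a] = {g: position + 1 for position, g in enumerate(partners)}
    return ranks


def mutual_rank(ranks, a, b):
    """Geometric mean of the rank of b for a and the rank of a for b."""
    return math.sqrt(ranks[a][b] * ranks[b][a])


# Hypothetical expression profiles across five samples.
EXPR = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "g4": [2.0, 2.5, 2.0, 3.5, 3.0],
}

RANKS = directional_ranks(EXPR)
mr_g1_g2 = mutual_rank(RANKS, "g1", "g2")
mr_g1_g4 = mutual_rank(RANKS, "g1", "g4")
```

    Unlike a directional rank, MR is symmetric in the two genes, so a single threshold on it can be applied to every pair when building a network.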

  1. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  2. Quantifying Intrinsic and Extrinsic Variability in Stochastic Gene Expression Models

    PubMed Central

    Singh, Abhyudai; Soltani, Mohammad

    2013-01-01

    Genetically identical cell populations exhibit considerable intercellular variation in the level of a given protein or mRNA. Both intrinsic and extrinsic sources of noise drive this variability in gene expression. More specifically, extrinsic noise is the expression variability that arises from cell-to-cell differences in cell-specific factors such as enzyme levels, cell size and cell cycle stage. In contrast, intrinsic noise is the expression variability that is not accounted for by extrinsic noise, and typically arises from the inherent stochastic nature of biochemical processes. Two-color reporter experiments are employed to decompose expression variability into its intrinsic and extrinsic noise components. Analytical formulas for intrinsic and extrinsic noise are derived for a class of stochastic gene expression models, where variations in cell-specific factors cause fluctuations in model parameters, in particular, transcription and/or translation rate fluctuations. Assuming mRNA production occurs in random bursts, transcription rate is represented by either the burst frequency (how often the bursts occur) or the burst size (number of mRNAs produced in each burst). Our analysis shows that fluctuations in the transcription burst frequency enhance extrinsic noise but do not affect the intrinsic noise. On the contrary, fluctuations in the transcription burst size or mRNA translation rate dramatically increase both intrinsic and extrinsic noise components. Interestingly, simultaneous fluctuations in transcription and translation rates arising from randomness in ATP abundance can decrease intrinsic noise measured in a two-color reporter assay. Finally, we discuss how these formulas can be combined with single-cell gene expression data from two-color reporter experiments for estimating model parameters. PMID:24391934
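
    For orientation, a two-color reporter decomposition is often computed with the standard dual-reporter estimators sketched below in Python. These are the widely used textbook formulas, assumed here for illustration. They are not the model-specific analytical formulas derived in the paper. The function name dual_reporter_noise and the fluorescence values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def dual_reporter_noise(c, y):
    """Return (intrinsic, extrinsic, total) squared noise.

    `c` and `y` hold the levels of the two reporters measured in the same
    cells, one pair per cell. Intrinsic noise reflects how much the two
    reporters differ within a cell; extrinsic noise reflects how much
    they vary together across cells.
    """
    if len(c) != len(y) or not c:
        raise ValueError("need one non-empty, paired measurement per cell")
    norm = mean(c) * mean(y)
    intrinsic = mean([(a - b) ** 2 for a, b in zip(c, y)]) / (2 * norm)
    extrinsic = (mean([a * b for a, b in zip(c, y)]) - norm) / norm
    return intrinsic, extrinsic, intrinsic + extrinsic


# Hypothetical per-cell fluorescence of the two reporters.
REPORTER_1 = [12.0, 18.0, 33.0, 27.0]
REPORTER_2 = [10.0, 22.0, 30.0, 26.0]

intrinsic_sq, extrinsic_sq, total_sq = dual_reporter_noise(REPORTER_1, REPORTER_2)
```

    When the two reporters are identical in every cell the intrinsic term is zero and all variability is assigned to the extrinsic term, which matches the definitions given in the abstract.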

  3. Quantifying intrinsic and extrinsic variability in stochastic gene expression models.

    PubMed

    Singh, Abhyudai; Soltani, Mohammad

    2013-01-01

    Genetically identical cell populations exhibit considerable intercellular variation in the level of a given protein or mRNA. Both intrinsic and extrinsic sources of noise drive this variability in gene expression. More specifically, extrinsic noise is the expression variability that arises from cell-to-cell differences in cell-specific factors such as enzyme levels, cell size and cell cycle stage. In contrast, intrinsic noise is the expression variability that is not accounted for by extrinsic noise, and typically arises from the inherent stochastic nature of biochemical processes. Two-color reporter experiments are employed to decompose expression variability into its intrinsic and extrinsic noise components. Analytical formulas for intrinsic and extrinsic noise are derived for a class of stochastic gene expression models, where variations in cell-specific factors cause fluctuations in model parameters, in particular, transcription and/or translation rate fluctuations. Assuming mRNA production occurs in random bursts, transcription rate is represented by either the burst frequency (how often the bursts occur) or the burst size (number of mRNAs produced in each burst). Our analysis shows that fluctuations in the transcription burst frequency enhance extrinsic noise but do not affect the intrinsic noise. On the contrary, fluctuations in the transcription burst size or mRNA translation rate dramatically increase both intrinsic and extrinsic noise components. Interestingly, simultaneous fluctuations in transcription and translation rates arising from randomness in ATP abundance can decrease intrinsic noise measured in a two-color reporter assay. Finally, we discuss how these formulas can be combined with single-cell gene expression data from two-color reporter experiments for estimating model parameters.

  4. Up-Regulation of Angiotensin-Converting Enzyme (ACE) Enhances Cell Proliferation and Predicts Poor Prognosis in Laryngeal Cancer.

    PubMed

    Han, Chao-Dong; Ge, Wen-Sheng

    2016-11-01

    BACKGROUND The angiotensin-converting enzyme (ACE, CD143) gene plays a crucial role in the pathology of many cancers. Previous studies mostly focused on the gene polymorphism, but the other functions of ACE have rarely been reported. The purpose of this study was to investigate the expression of ACE and its biological function, as well as its prognostic value, in laryngeal cancer. MATERIAL AND METHODS The expression of ACE was detected by quantitative real-time polymerase chain reaction (qRT-PCR) analysis in 106 patients with laryngeal cancer and 85 healthy people. Cell proliferation was then estimated after Hep-2 cells were transfected with pGL3-ACE or empty vector. In addition, the relationship between ACE expression and clinicopathologic characteristics was analyzed. Finally, Kaplan-Meier analysis was used to evaluate the overall survival of patients with different ACE expression, while Cox regression analysis was conducted to reveal the prognostic value of ACE in laryngeal cancer. RESULTS Our results demonstrate that ACE is over-expressed in laryngeal cancer and thus promotes cell proliferation. The up-regulation of ACE was significantly influenced by tumor stage and lymph node metastasis. Patients with high ACE expression had a shorter overall survival compared with those with low ACE expression according to Kaplan-Meier analysis. The ACE gene was also found to be an important factor in the prognosis of laryngeal cancer. CONCLUSIONS Our study shows that the ACE gene was up-regulated, which promoted cell proliferation, and that it could be an independent prognostic marker in laryngeal cancer.

  5. The age dependency of gene expression for plasma lipids, lipoproteins, and apolipoproteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snieder, H.; Doornen, L.J.P. van; Boomsma, D.I.

    The aim of this study was to investigate and disentangle the genetic and nongenetic causes of stability and change in lipids and (apo)lipoproteins that occur during the lifespan. Total cholesterol, low-density lipoprotein (LDL), high-density lipoprotein (HDL), triglycerides, apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), and lipoprotein(a) (Lp[a]) were measured in a group of 160 middle-aged parents and their twin offspring (first project) and in a group of 203 middle-aged twin pairs (second project). Combining the data of both projects enabled the estimation of the extent to which measured lipid parameters are influenced by different genes in adolescence and adulthood. To that end, an extended quantitative genetic model was specified, which allowed the estimation of heritabilities for each sex and generation separately. Heritabilities were similar for both sexes and both generations. Larger variances in the parental generation could be ascribed to proportional increases in both unique environmental and additive genetic variance from childhood to adulthood, which led to similar heritability estimates in adolescent and middle-aged twins. Although the magnitudes of heritabilities were similar across generations, results showed that, for total cholesterol, triglycerides, HDL, and LDL, partly different genes are expressed in adolescence compared to adulthood. For triglycerides, only 46% of the genetic variance was common to both age groups; for total cholesterol this was 80%. Intermediate values were found for HDL (66%) and LDL (76%). For ApoA1, ApoB, and Lp(a), the same genes seem to act in both generations. 56 refs., 2 figs., 5 tabs.

  6. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain

    PubMed Central

    Krienen, Fenna M.; Yeo, B. T. Thomas; Ge, Tian; Buckner, Randy L.; Sherwood, Chet C.

    2016-01-01

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute’s human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections. PMID:26739559

  7. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain.

    PubMed

    Krienen, Fenna M; Yeo, B T Thomas; Ge, Tian; Buckner, Randy L; Sherwood, Chet C

    2016-01-26

    The human brain is patterned with disproportionately large, distributed cerebral networks that connect multiple association zones in the frontal, temporal, and parietal lobes. The expansion of the cortical surface, along with the emergence of long-range connectivity networks, may be reflected in changes to the underlying molecular architecture. Using the Allen Institute's human brain transcriptional atlas, we demonstrate that genes particularly enriched in supragranular layers of the human cerebral cortex relative to mouse distinguish major cortical classes. The topography of transcriptional expression reflects large-scale brain network organization consistent with estimates from functional connectivity MRI and anatomical tracing in nonhuman primates. Microarray expression data for genes preferentially expressed in human upper layers (II/III), but enriched only in lower layers (V/VI) of mouse, were cross-correlated to identify molecular profiles across the cerebral cortex of postmortem human brains (n = 6). Unimodal sensory and motor zones have similar molecular profiles, despite being distributed across the cortical mantle. Sensory/motor profiles were anticorrelated with paralimbic and certain distributed association network profiles. Tests of alternative gene sets did not consistently distinguish sensory and motor regions from paralimbic and association regions: (i) genes enriched in supragranular layers in both humans and mice, (ii) genes cortically enriched in humans relative to nonhuman primates, (iii) genes related to connectivity in rodents, (iv) genes associated with human and mouse connectivity, and (v) 1,454 gene sets curated from known gene ontologies. Molecular innovations of upper cortical layers may be an important component in the evolution of long-range corticocortical projections.

  8. Isolation and expression analysis of FTZ-F1 encoding gene of black rock fish (Sebastes schlegelii)

    NASA Astrophysics Data System (ADS)

    Shafi, Muhammad; Wang, Yanan; Zhou, Xiaosu; Ma, Liman; Muhammad, Faiz; Qi, Jie; Zhang, Quanqi

    2013-03-01

    Sex related FTZ-F1 is a transcriptional factor regulating the expression of fushi tarazu (a member of the orphan nuclear receptors) gene. In this study, FTZ-F1 gene (FTZ-F1) was isolated from the testis of black rockfish (Sebastes schlegelii) by homology cloning. The full-length cDNA of S. schlegelii FTZ-F1 (ssFTZ-F1) contained a 232bp 5' UTR, a 1449bp ORF encoding FTZ-F1 (482 amino acid residues in length) with an estimated molecular weight of 5.4kD and a 105bp 3' UTR. Sequence, tissue distribution and phylogenetic analysis showed that ssFTZ-F1 belonged to FTZ group, holding highly conserved regions including I, II and III FTZ-F1 boxes and an AF-2 hexamer. Relatively high expression was observed at different larva stages. In juveniles (105 days old), the transcript of ssFTZ-F1 can be detected in all tissues and the abundance of the gene transcript in testis, ovary, spleen and brain was higher than that in other tissues. In mature fish, the abundance of gene transcript was higher in testis, ovary, spleen and brain than that in liver (trace amount), and the gene was not transcribed in other tissues. The highest abundance of gene transcript was always observed in gonads of both juvenile and mature fish. In addition, the abundance of gene transcript in male tissues was higher than that in female tissue counterparts (P<0.05).

  9. Identification of Genes Whose Expression Profile Is Associated with Non-Progression towards AIDS Using eQTLs

    PubMed Central

    Le Clerc, Sigrid; van Manen, Daniëlle; Coulonges, Cédric; Ulveling, Damien; Laville, Vincent; Labib, Taoufik; Taing, Lieng; Delaneau, Olivier; Montes, Matthieu; Schuitemaker, Hanneke; Zagury, Jean-François

    2015-01-01

    Background Many genome-wide association studies have been performed on progression towards the acquired immune deficiency syndrome (AIDS) and they mainly identified associations within the HLA loci. In this study, we demonstrate that the integration of biological information, namely gene expression data, can enhance the sensitivity of genetic studies to unravel new genetic associations relevant to AIDS. Methods We collated the biological information compiled from three databases of expression quantitative trait loci (eQTLs) involved in cells of the immune system. We derived a list of single nucleotide polymorphisms (SNPs) that are functional in that they correlate with differential expression of genes in at least two of the databases. We tested the association of those SNPs with AIDS progression in two cohorts, GRIV and ACS. Tests on permuted phenotypes of the GRIV and ACS cohorts or on randomised sets of equivalent SNPs allowed us to assess the statistical robustness of this method and to estimate the true positive rate. Results Eight genes were identified with high confidence (p = 0.001, rate of true positives 75%). Some of those genes had previously been linked with HIV infection. Notably, ENTPD4 belongs to the same family as CD39, whose expression has already been associated with AIDS progression; while DNAJB12 is part of the HSP90 pathway, which is involved in the control of HIV latency. Our study also drew our attention to lesser-known functions such as mitochondrial ribosomal proteins and a zinc finger protein, ZFP57, which could be central to the effectiveness of HIV infection. Interestingly, for six out of those eight genes, down-regulation is associated with non-progression, which makes them appealing targets to develop drugs against HIV. PMID:26367535

  10. A distinct adipose tissue gene expression response to caloric restriction predicts 6-mo weight maintenance in obese subjects.

    PubMed

    Mutch, David M; Pers, Tune H; Temanni, M Ramzi; Pelloux, Veronique; Marquez-Quiñones, Adriana; Holst, Claus; Martinez, J Alfredo; Babalis, Dimitris; van Baak, Marleen A; Handjieva-Darlenska, Teodora; Walker, Celia G; Astrup, Arne; Saris, Wim H M; Langin, Dominique; Viguerie, Nathalie; Zucker, Jean-Daniel; Clément, Karine

    2011-12-01

    Weight loss has been shown to reduce risk factors associated with cardiovascular disease and diabetes; however, successful maintenance of weight loss continues to pose a challenge. The present study was designed to assess whether changes in subcutaneous adipose tissue (scAT) gene expression during a low-calorie diet (LCD) could be used to differentiate and predict subjects who experience successful short-term weight maintenance from subjects who experience weight regain. Forty white women followed a dietary protocol consisting of an 8-wk LCD phase followed by a 6-mo weight-maintenance phase. Participants were classified as weight maintainers (WMs; 0-10% weight regain) and weight regainers (WRs; 50-100% weight regain) by considering changes in body weight during the 2 phases. Anthropometric measurements, bioclinical variables, and scAT gene expression were studied in all individuals before and after the LCD. Energy intake was estimated by using 3-d dietary records. No differences in body weight and fasting insulin were observed between WMs and WRs at baseline or after the LCD period. The LCD resulted in significant decreases in body weight and in several plasma variables in both groups. WMs experienced a significant reduction in insulin secretion in response to an oral-glucose-tolerance test after the LCD; in contrast, no changes in insulin secretion were observed in WRs after the LCD. An ANOVA of scAT gene expression showed that genes regulating fatty acid metabolism, citric acid cycle, oxidative phosphorylation, and apoptosis were regulated differently by the LCD in WM and WR subjects. This study suggests that LCD-induced changes in insulin secretion and scAT gene expression may have the potential to predict successful short-term weight maintenance. This trial was registered at clinicaltrials.gov as NCT00390637.

  11. RNA-sequencing quantification of hepatic ontogeny of phase-I enzymes in mice.

    PubMed

    Peng, Lai; Cui, Julia Y; Yoo, Byunggil; Gunewardena, Sumedha S; Lu, Hong; Klaassen, Curtis D; Zhong, Xiao-Bo

    2013-12-01

    Phase-I drug metabolizing enzymes catalyze reactions of hydrolysis, reduction, and oxidation of drugs and play a critical role in drug metabolism. However, the functions of most phase-I enzymes are not mature at birth, which markedly affects drug metabolism in newborns. Therefore, characterization of the expression profiles of phase-I enzymes and the underlying regulatory mechanisms during liver maturation is needed for better estimation of using drugs in pediatric patients. The mouse is an animal model widely used for studying the mechanisms in the regulation of developmental expression of phase-I genes. Therefore, we applied RNA sequencing to provide a "true quantification" of the mRNA expression of phase-I genes in the mouse liver during development. Liver samples of male C57BL/6 mice at 12 different ages from prenatal to adulthood were used for defining the ontogenic mRNA profiles of phase-I families, including hydrolysis: carboxylesterase (Ces), paraoxonase (Pon), and epoxide hydrolase (Ephx); reduction: aldo-keto reductase (Akr), quinone oxidoreductase (Nqo), and dihydropyrimidine dehydrogenase (Dpyd); and oxidation: alcohol dehydrogenase (Adh), aldehyde dehydrogenase (Aldh), flavin monooxygenases (Fmo), molybdenum hydroxylase (Aox and Xdh), cytochrome P450 (P450), and cytochrome P450 oxidoreductase (Por). Two rapidly increasing stages of total phase-I gene expression after birth reflect functional transition of the liver during development. Diverse expression patterns were identified, and some large gene families contained the mRNA of genes that are enriched at different stages of development. Our study reveals the mRNA abundance of phase-I genes in the mouse liver during development and provides a valuable foundation for mechanistic studies in the future.

  12. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    PubMed

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its costs; additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, which is based on a statistical test covering only the IP samples with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in this way the input control samples are also properly taken care of. In addition, unlike the DRME approach, which relies only on the input control sample for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.

  13. Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.

    PubMed

    Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi

    2013-10-07

    The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data was employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting a potentially important regulatory role for AS in these genes. Further, we observed that intron retention (IR) is predominant in the AS events and seems to occur preferentially in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.

  14. Comparison of Established and Emerging Biodosimetry Assays

    PubMed Central

    Rothkamm, K.; Beinke, C.; Romm, H.; Badie, C.; Balagurunathan, Y.; Barnard, S.; Bernard, N.; Boulay-Greene, H.; Brengues, M.; De Amicis, A.; De Sanctis, S.; Greither, R.; Herodin, F.; Jones, A.; Kabacik, S.; Knie, T.; Kulka, U.; Lista, F.; Martigne, P.; Missel, A.; Moquet, J.; Oestreicher, U.; Peinnequin, A.; Poyot, T.; Roessler, U.; Scherthan, H.; Terbrueggen, B.; Thierens, H.; Valente, M.; Vral, A.; Zenhausern, F.; Meineke, V.; Braselmann, H.; Abend, M.

    2014-01-01

    Rapid biodosimetry tools are required to assist with triage in the case of a large-scale radiation incident. Here, we aimed to determine the dose-assessment accuracy of the well-established dicentric chromosome assay (DCA) and cytokinesis-block micronucleus assay (CBMN) in comparison to the emerging γ-H2AX foci and gene expression assays for triage mode biodosimetry and radiation injury assessment. Coded blood samples exposed to 10 X-ray doses (240 kVp, 1 Gy/min) of up to 6.4 Gy were sent to participants for dose estimation. Report times were documented for each laboratory and assay. The mean absolute difference (MAD) of estimated doses relative to the true doses was calculated. We also merged doses into binary dose categories of clinical relevance and examined accuracy, sensitivity and specificity of the assays. Dose estimates were reported by the first laboratories within 0.3–0.4 days of receipt of samples for the γ-H2AX and gene expression assays compared to 2.4 and 4 days for the DCA and CBMN assays, respectively. Irrespective of the assay we found a 2.5–4-fold variation of interlaboratory accuracy per assay and lowest MAD values for the DCA assay (0.16 Gy) followed by CBMN (0.34 Gy), gene expression (0.34 Gy) and γ-H2AX (0.45 Gy) foci assay. Binary categories of dose estimates could be discriminated with equal efficiency for all assays, but at doses ≥1.5 Gy a 10% decrease in efficiency was observed for the foci assay, which was still comparable to the CBMN assay. In conclusion, the DCA has been confirmed as the gold standard biodosimetry method, but in situations where speed and throughput are more important than ultimate accuracy, the emerging rapid molecular assays have the potential to become useful triage tools. PMID:23862692
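    The accuracy measures named in this abstract (mean absolute difference of estimated versus true dose, and accuracy, sensitivity and specificity after merging doses into binary categories) can be sketched as follows. The dose values, the function names and the use of 1.5 Gy as the category cutoff in the example are illustrative assumptions, not data or code from the study.

```python
def mean_absolute_difference(estimated, true):
    """Average absolute error of dose estimates, in the units of the inputs."""
    if len(estimated) != len(true) or not true:
        raise ValueError("need two equally long, non-empty dose lists")
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


def binary_metrics(estimated, true, threshold):
    """Accuracy, sensitivity and specificity for the category dose >= threshold."""
    tp = fp = tn = fn = 0
    for e, t in zip(estimated, true):
        predicted, actual = e >= threshold, t >= threshold
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical true and estimated doses in Gy for five coded samples.
true_doses = [0.0, 0.5, 1.0, 2.0, 4.0]
estimated_doses = [0.1, 0.4, 1.6, 1.8, 4.5]
print(mean_absolute_difference(estimated_doses, true_doses))
print(binary_metrics(estimated_doses, true_doses, threshold=1.5))
```

    Reporting both kinds of measure is useful because an assay with a larger mean absolute difference can still sort samples into clinically relevant categories well.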

  15. The SGBS cell strain as a model for the in vitro study of obesity and cancer.

    PubMed

    Allott, Emma H; Oliver, Elizabeth; Lysaght, Joanne; Gray, Steven G; Reynolds, John V; Roche, Helen M; Pidgeon, Graham P

    2012-10-01

    The murine adipocyte cell line 3T3-L1 is well characterised and used widely, while the human pre-adipocyte cell strain, Simpson-Golabi-Behmel Syndrome (SGBS), requires validation for use in human studies. Obesity is currently estimated to account for up to 41% of the worldwide cancer burden. A human in vitro model system is required to elucidate the molecular mechanisms for this poorly understood association. This work investigates the relevance of the SGBS cell strain for obesity and cancer research in humans. Pre-adipocyte 3T3-L1 and SGBS were differentiated according to standard protocols. Morphology was assessed by Oil Red O staining. Adipocyte-specific gene expression was measured by qPCR and biochemical function was assessed by glycerol-3-phosphate dehydrogenase (GPDH) enzyme activity. Differential gene expression in oesophageal adenocarcinoma cell line OE33 following co-culture with SGBS or primary omental human adipocytes was investigated using Human Cancer Profiler qPCR arrays. During the process of differentiation, SGBS expressed higher levels of adipocyte-specific transcripts, and fully differentiated SGBS showed morphology, transcript levels and biochemical function more similar to those of primary omental adipocytes than 3T3-L1 did. Co-culture with SGBS or primary omental adipocytes induced differential expression of genes involved in adhesion (ITGB3), angiogenesis (IGF1, TEK, TNF, VEGFA), apoptosis (GZMA, TERT) and invasion and metastasis (MMP9, TIMP3) in OE33 tumour cells. Comparable adipocyte-specific gene expression, biochemical function and a shared induced gene signature in co-cultured OE33 cells indicate that SGBS is a relevant in vitro model for obesity and cancer research in humans.

  16. Characterization and SNP variation analysis of a HSP70 gene from miiuy croaker and its expression as related to bacterial challenge and heat shock.

    PubMed

    Wei, Tao; Sun, Yuena; Shi, Ge; Wang, Rixin; Xu, Tianjun

    2012-09-01

    Heat shock proteins (HSPs) play crucial roles in the immune response of vertebrates. In order to study the immune defense mechanism of the heat shock protein gene in miiuy croaker (Miichthys miiuy), a cDNA encoding heat shock protein 70 (designated Mimi-HSP70) was cloned from miiuy croaker. The cDNA was 2195 bp in length, consisting of an open reading frame (ORF) of 1917 bp encoding a polypeptide of 638 amino acids with an estimated molecular mass of 70.3 kDa and a theoretical isoelectric point of 5.55. Genomic DNA structure analysis revealed that the Mimi-HSP70 gene contains no introns in the coding region, and four SNPs (373 C/T, 789 G/A, 1005 C/T, and 1185 G/A) were detected by direct sequencing of 20 samples from six different populations. BLAST analysis, structure comparison and phylogenetic analysis indicated that Mimi-HSP70 should be an inducible cytosolic member of the HSP70 family. The deduced amino acid sequence of Mimi-HSP70 had 82.4%-92.2% identity with those of vertebrates. A real-time quantitative RT-PCR demonstrated that the HSP70 gene was ubiquitously expressed in ten normal tissues. Under different temperature shock stress, the expression of Mimi-HSP70 gene in miiuy croaker increased at first and then decreased with the rise of temperature, finally, reached a maximum level in liver, spleen and kidney tissues. Infection of miiuy croaker with Vibrio anguillarum resulted in significant changes in expression of the Mimi-HSP70 gene in the immune-related tissues. These results indicated that expression analysis of the Mimi-HSP70 gene provides a theoretical basis to further study the mechanism of anti-adverseness in the miiuy croaker. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. Microarray image analysis: background estimation using quantile and morphological filters.

    PubMed

    Bengtsson, Anders; Bengtsson, Henrik

    2006-02-28

    In a microarray experiment the difference in expression between genes on the same slide is up to 10^3 fold or more. At low expression, even a small error in the estimate will have great influence on the final test and reference ratios. In addition to the true spot intensity the scanned signal consists of different kinds of noise referred to as background. In order to assess the true spot intensity background must be subtracted. The standard approach to estimate background intensities is to assume they are equal to the intensity levels between spots. In the literature, morphological opening is suggested to be one of the best methods for estimating background this way. This paper examines fundamental properties of rank and quantile filters, which include morphological filters at the extremes, with focus on their ability to estimate between-spot intensity levels. The bias and variance of these filter estimates are driven by the number of background pixels used and their distributions. A new rank-filter algorithm is implemented and compared to methods available in Spot by CSIRO and GenePix Pro by Axon Instruments. Spot's morphological opening has a mean bias between -47 and -248 compared to a bias between 2 and -2 for the rank filter and the variability of the morphological opening estimate is 3 times higher than for the rank filter. The mean bias of Spot's second method, morph.close.open, is between -5 and -16 and the variability is approximately the same as for morphological opening. The variability of GenePix Pro's region-based estimate is more than ten times higher than the variability of the rank-filter estimate and with slightly more bias. The large variability is because the size of the background window changes with spot size. To overcome this, a non-adaptive region-based method is implemented. Its bias and variability are comparable to that of the rank filter. The performance of more advanced rank filters is equal to the best region-based methods. However, in order to get unbiased estimates these filters have to be implemented with great care. The performance of morphological opening is in general poor with a substantial spatial-dependent bias.
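    The relationship between rank filters and morphological filters mentioned in this abstract can be shown with a minimal one-dimensional sketch: a sliding-window quantile filter becomes an erosion at quantile 0 and a dilation at quantile 1, and an opening is an erosion followed by a dilation. This is not the authors' algorithm; the function names, window handling at the edges, and pixel values are assumptions chosen for illustration, and real microarray images are two-dimensional.

```python
def rank_filter(signal, half_width, quantile):
    """Sliding-window quantile filter.

    quantile=0.0 gives a minimum filter (erosion), 0.5 a median filter,
    and 1.0 a maximum filter (dilation). Windows are truncated at the edges.
    """
    filtered = []
    for i in range(len(signal)):
        window = sorted(signal[max(0, i - half_width): i + half_width + 1])
        rank = round(quantile * (len(window) - 1))
        filtered.append(window[rank])
    return filtered


def morphological_opening(signal, half_width):
    """Erosion followed by dilation with the same window."""
    eroded = rank_filter(signal, half_width, 0.0)
    return rank_filter(eroded, half_width, 1.0)


# Hypothetical scan line: noisy background near 5 with one bright pixel.
scan_line = [5, 4, 6, 5, 100, 5, 6, 4, 5]
print(rank_filter(scan_line, 1, 0.5))
print(morphological_opening(scan_line, 1))
```

    Because the opening starts from local minima, its output never exceeds the input, which is one way to see why it tends to underestimate a noisy background, whereas a mid-range quantile tracks the background level more closely.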

  18. A comparative study of covariance selection models for the inference of gene regulatory networks.

    PubMed

    Stifanelli, Patrizia F; Creanza, Teresa M; Anglani, Roberto; Liuzzi, Vania C; Mukherjee, Sayan; Schena, Francesco P; Ancona, Nicola

    2013-10-01

    The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method performs correlation between regression residuals and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, having the most predictive partial correlation estimates and the highest sensitivity for inferring conditional dependencies between genes even when only a few observations were available. The application of this method for inferring gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
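    The idea of a partial correlation obtained from regression residuals, which underlies the 'RCM' approach named in this abstract, can be sketched in the simplest three-variable case. This is only an unregularized toy illustration, not the authors' Matlab implementation; the function names and expression values are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of the least-squares line y ~ intercept + slope * z."""
    mz, my = mean(z), mean(y)
    slope = sum((c - mz) * (d - my) for c, d in zip(z, y)) / sum((c - mz) ** 2 for c in z)
    return [d - (my + slope * (c - mz)) for c, d in zip(z, y)]


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residuals(x, z), residuals(y, z))


# Hypothetical expression levels of three genes across six samples.
gene_z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_x = [1.2, 1.9, 3.1, 4.3, 4.8, 6.1]
gene_y = [2.1, 4.2, 5.8, 8.1, 10.3, 11.9]
print(pearson(gene_x, gene_y), partial_correlation(gene_x, gene_y, gene_z))
```

    Two genes driven by a common regulator can show a high ordinary correlation while their partial correlation given that regulator is much weaker, which is why partial correlations are preferred for inferring direct dependencies.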

  19. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

    PubMed

    Shimoni, Yishai

    2018-02-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.

  20. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification

    PubMed Central

    2018-01-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in The Cancer Genome Atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes. PMID:29470520

  1. Expression of ORF2 partial gene of hepatitis E virus in tomatoes and immunoactivity of expression products.

    PubMed

    Ma, Ying; Lin, Shun-Quan; Gao, Yi; Li, Mei; Luo, Wen-Xin; Zhang, Jun; Xia, Ning-Shao

    2003-10-01

    To transfer hepatitis E virus (HEV) ORF2 partial gene to tomato plants, to investigate its expression in transformants and the immunoactivity of expression products, and to explore the feasibility of developing a new type of plant-derived HEV oral vaccine. Plant binary expression vector p1301E2, carrying a fragment of HEV open reading frame-2 (named HEV-E2), was constructed by linking the fragment to a constitutive CaMV 35S promoter and nos terminator, then directly introduced into Agrobacterium tumefaciens EHA105. With the leaf-disc method, tomato plants were transformed via EHA105, and hygromycin-resistant plantlets were obtained in selective medium containing hygromycin. The presence and integration of foreign DNA in the transgenic tomato genome were confirmed by Gus gene expression, PCR amplification and Southern dot blotting. The immunoactivity of recombinant protein extracted from transformed plants was examined by enzyme-linked immunosorbent assay (ELISA) using a monoclonal antibody specifically against HEV. ELISA was also used to estimate the recombinant protein content in leaves and fruits of the transformants. Seven positive lines of HEV-E2-transgenic tomato plants confirmed by PCR and Southern blotting were obtained and the immunoactivity of recombinant protein could be detected in extracts of transformants. The expression levels of recombinant protein were 61.22 ng/g fresh weight in fruits and 6.37-47.9 ng/g fresh weight in leaves of the transformants. The HEV-E2 gene was correctly expressed in transgenic tomatoes and the recombinant antigen derived from them has normal immunoactivity. Transgenic tomatoes may hold good promise for producing a new type of low-cost oral vaccine for hepatitis E virus.

  2. Using nonlinear least squares to assess relative expression and its uncertainty in real-time qPCR studies.

    PubMed

    Tellinghuisen, Joel

    2016-03-01

    Relative expression ratios are commonly estimated in real-time qPCR studies by comparing the quantification cycle for the target gene with that for a reference gene in the treatment samples, normalized to the same quantities determined for a control sample. For the "standard curve" design, where data are obtained for all four of these at several dilutions, nonlinear least squares can be used to assess the amplification efficiencies (AE) and the adjusted ΔΔCq and its uncertainty, with automatic inclusion of the effect of uncertainty in the AEs. An algorithm is illustrated for the KaleidaGraph program. Copyright © 2015 Elsevier Inc. All rights reserved.
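
    The relative expression ratio described in this abstract can be illustrated with the common efficiency-adjusted point estimate. The Python sketch below computes only that ratio; it does not reproduce the nonlinear least-squares fit or the uncertainty estimate from the paper, and the function name, argument names and Cq values are assumptions made for the example.

```python
def relative_expression(cq_target_treat, cq_ref_treat,
                        cq_target_ctrl, cq_ref_ctrl,
                        eff_target=2.0, eff_ref=2.0):
    """Efficiency-adjusted expression ratio (treatment vs. control).

    eff_* are per-cycle amplification factors (2.0 means perfect
    doubling). With both equal to 2 this reduces to 2 ** (-ddCq).
    """
    target_change = eff_target ** (cq_target_ctrl - cq_target_treat)
    ref_change = eff_ref ** (cq_ref_ctrl - cq_ref_treat)
    return target_change / ref_change


# Hypothetical Cq values: the target crosses threshold 2 cycles earlier
# in the treatment sample, while the reference gene is unchanged.
ratio = relative_expression(22.0, 18.0, 24.0, 18.0)
```

    With perfect efficiencies the example gives a four-fold increase; passing measured efficiencies (for instance eff_target=1.9) shows how sensitive the ratio is to that input.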

  3. Mutual information estimation reveals global associations between stimuli and biological processes

    PubMed Central

    Suzuki, Taiji; Sugiyama, Masashi; Kanamori, Takafumi; Sese, Jun

    2009-01-01

    Background Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering of genes and associating each group with biological functions are commonly used methods. However, such methods only detect partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. Results To elucidate the association, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation, and therefore LSMI can detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas others show no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to the responses of various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. Conclusion We proposed a novel feature selection method called LSMI, and applied LSMI to mining the association between conditions of yeast and biological processes through microarray datasets. In fact, LSMI allows us to elucidate the global organization of cellular process control. PMID:19208155
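
    To make the quantity being estimated concrete, the Python sketch below computes mutual information between two discrete variables with the simple plug-in (frequency-count) estimator. This is deliberately not LSMI, which avoids density estimation and handles continuous data; the function name and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (assumption): a stimulus label paired with a functional category.
dependent = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
independent = mutual_information([0, 0, 1, 1], ["a", "b", "a", "b"])
```

    The first call yields log 2 because each label fully determines the other; the second yields zero because the labels are unrelated.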

  4. Lignin, mitochondrial family, and photorespiratory transporter classification as case studies in using co-expression, co-response, and protein locations to aid in identifying transport functions

    PubMed Central

    Tohge, Takayuki; Fernie, Alisdair R.

    2014-01-01

    Whole genome sequencing and the relative ease of transcript profiling have facilitated the collection and data warehousing of immense quantities of expression data. However, a substantial proportion of genes are not yet functionally annotated, a problem that is particularly acute for transport proteins. In Arabidopsis, for example, only a minor fraction of the estimated 700 intracellular transporters have been identified at the molecular genetic level. Furthermore, it is only within the last couple of years that critical genes such as those encoding the final transport step required for the long-distance transport of sucrose and the first transporter of the core photorespiratory pathway have been identified. Here we describe how transcriptional coordination between genes of known function and non-annotated genes allows the identification of putative transporters on the premise that such co-expressed genes tend to be functionally related. We additionally extend this approach to include phenotypic information from other levels of cellular organization, such as proteomic and metabolomic data, and provide case studies wherein this approach has successfully been used to fill knowledge gaps in important metabolic pathways and physiological processes. PMID:24672529

  5. SCOUP: a probabilistic model based on the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation.

    PubMed

    Matsumoto, Hirotaka; Kiryu, Hisanori

    2016-06-08

    Single-cell technologies make it possible to quantify the comprehensive states of individual cells, and have the power to shed light on cellular differentiation in particular. Although several methods have been developed to fully analyze single-cell expression data, there is still room for improvement in the analysis of differentiation. In this paper, we propose a novel method, SCOUP, to elucidate the differentiation process. Unlike previous dimension reduction-based approaches, SCOUP describes the dynamics of gene expression throughout differentiation directly, including the degree of differentiation of a cell (in pseudo-time) and cell fate. SCOUP is superior to previous methods with respect to pseudo-time estimation, especially for single-cell RNA-seq. SCOUP also estimates cell lineage more accurately than previous methods, especially for cells at an early stage of bifurcation. In addition, SCOUP can be applied to various downstream analyses. As an example, we propose a novel correlation calculation method for elucidating regulatory relationships among genes. We apply this method to single-cell RNA-seq data and detect a candidate key regulator for differentiation and clusters in a correlation network which are not detected with conventional correlation analysis. We develop a stochastic process-based method SCOUP to analyze single-cell expression data throughout differentiation. SCOUP can estimate pseudo-time and cell lineage more accurately than previous methods. We also propose a novel correlation calculation method based on SCOUP. SCOUP is a promising approach for further single-cell analysis and available at https://github.com/hmatsu1226/SCOUP.
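
    The Ornstein-Uhlenbeck (OU) process named in this abstract follows dX = -alpha (X - theta) dt + sigma dW, so expression relaxes toward an attractor theta while being perturbed by noise. The Python sketch below simulates one trajectory using the exact OU transition; it is not the SCOUP implementation, and the function name, parameter names and values are assumptions made for the example.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, n_steps, seed=0):
    """Simulate an OU path using its exact Gaussian transition."""
    rng = random.Random(seed)
    decay = math.exp(-alpha * dt)
    # Standard deviation of the noise accumulated over one step of length dt.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    path = [x0]
    for _ in range(n_steps):
        mean = theta + (path[-1] - theta) * decay
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path


# Hypothetical gene: starts at 0 and is attracted to expression level 5.
trajectory = simulate_ou(x0=0.0, theta=5.0, alpha=1.0, sigma=0.5,
                         dt=0.1, n_steps=200)
noiseless = simulate_ou(x0=0.0, theta=5.0, alpha=1.0, sigma=0.0,
                        dt=0.1, n_steps=200)
```

    Setting sigma to zero isolates the deterministic relaxation toward theta, which is a convenient sanity check before adding noise.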

  6. Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

    PubMed

    Adriaens, M E; Bezzina, C R

    2018-06-22

    Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait-associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
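
    One simple way to check for the allelic imbalance mentioned in this abstract is an exact two-sided binomial test of allele-specific read counts against the expected 1:1 ratio. The review does not prescribe a specific test, so the Python sketch below is only an assumed illustration; real analyses usually also correct for overdispersion and reference-mapping bias, and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 1:1 allelic ratio."""
    n = ref_reads + alt_reads
    lower = min(ref_reads, alt_reads)
    # Probability of a count at least as extreme in the lower tail.
    tail = sum(comb(n, k) for k in range(lower + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail (capped at 1).
    return min(1.0, 2.0 * tail)


# Hypothetical heterozygous site: 10 reference reads and 0 alternate reads.
p_skewed = allelic_imbalance_pvalue(10, 0)
p_balanced = allelic_imbalance_pvalue(5, 5)
```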

  7. Expression level, cellular compartment and metabolic network position all influence the average selective constraint on mammalian enzymes

    PubMed Central

    2011-01-01

    Background A gene's position in regulatory, protein interaction or metabolic networks can be predictive of the strength of purifying selection acting on it, but these relationships are neither universal nor invariably strong. Following work in bacteria, fungi and invertebrate animals, we explore the relationship between selective constraint and metabolic function in mammals. Results We measure the association between selective constraint, estimated by the ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions, and several, primarily metabolic, measures of gene function. We find significant differences between the selective constraints acting on enzyme-coding genes from different cellular compartments, with genes from the nucleus showing higher constraint than genes from either the cytoplasm or the mitochondria. Among metabolic genes, the centrality of an enzyme in the metabolic network is significantly correlated with Ka/Ks. In contrast to yeasts, gene expression magnitude does not appear to be the primary predictor of selective constraint in these organisms. Conclusions Our results imply that the relationship between selective constraint and enzyme centrality is complex: the strength of selective constraint acting on mammalian genes is quite variable and does not appear to exclusively follow patterns seen in other organisms. PMID:21470417

  8. The transcriptome of the bowhead whale Balaena mysticetus reveals adaptations of the longest-lived mammal

    PubMed Central

    Seim, Inge; Ma, Siming; Zhou, Xuming; Gerashchenko, Maxim V.; Lee, Sang-Goo; Suydam, Robert; George, John C.; Bickham, John W.; Gladyshev, Vadim N.

    2014-01-01

    Mammals vary dramatically in lifespan, by at least two orders of magnitude, but the molecular basis for this difference remains largely unknown. The bowhead whale Balaena mysticetus is the longest-lived mammal known, with an estimated maximal lifespan in excess of two hundred years. It is also one of the two largest animals and the most cold-adapted baleen whale species. Here, we report the first genome-wide gene expression analyses of the bowhead whale, based on the de novo assembly of its transcriptome. Bowhead whale or cetacean-specific changes in gene expression were identified in the liver, kidney and heart, and complemented with analyses of positively selected genes. Changes associated with altered insulin signaling and other gene expression patterns could help explain the remarkable longevity of bowhead whales as well as their adaptation to a lipid-rich diet. The data also reveal parallels in candidate longevity adaptations of the bowhead whale, naked mole rat and Brandt's bat. The bowhead whale transcriptome is a valuable resource for the study of this remarkable animal, including the evolution of longevity and its important correlates such as resistance to cancer and other diseases. PMID:25411232

  9. No control genes required: Bayesian analysis of qRT-PCR data.

    PubMed

    Matz, Mikhail V; Wright, Rachel M; Scott, James G

    2013-01-01

    Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the "classic" analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.

  10. Oxycodone Self-Administration Induces Alterations in Expression of Integrin, Semaphorin and Ephrin Genes in the Mouse Striatum.

    PubMed

    Yuferov, Vadim; Zhang, Yong; Liang, Yupu; Zhao, Connie; Randesi, Matthew; Kreek, Mary J

    2018-01-01

    Oxycodone is a commonly used medication for pain, and is also a widely abused prescription opioid, like other short-acting MOPr agonists. Neurochemical and structural adaptations in brain following chronic MOPr-agonist administration are thought to underlie pathogenesis and persistence of opiate addiction. Many axon guidance molecules, such as integrins, semaphorins, and ephrins, may contribute to oxycodone-induced neuroadaptations through alterations in axon-target connections and synaptogenesis that may be implicated in the behaviors associated with opiate addiction. However, little is known about this important area. The aim of this study is to investigate alterations in expression of selected integrin, semaphorin, ephrin, netrin, and slit genes in the nucleus accumbens (NAc) and caudate putamen (CPu) of mice following extended 14-day oxycodone self-administration (SA), using RNA-Seq. Methods: Total RNA from the NAc and CPu was isolated from adult male C57BL/6J mice within 1 h after the last session of oxycodone in a 14-day self-administration paradigm (4 h/day, 0.25 mg/kg/infusion, FR1) or from yoked saline controls. Gene expression was examined using RNA sequencing (RNA-Seq) technology. RNA-Seq libraries were prepared using Illumina's TruSeq® Stranded Total RNA LT kit. The reads were aligned to the mouse reference genome (version mm10) using STAR. DESeq2 was applied to the counts of protein coding genes to estimate the fold change between the treatment groups. A False Discovery Rate (FDR) threshold of q < 0.1 was used to select genes that have a significant expression change. For selection of a subset of genes related to the axon guidance pathway, REACTOME was used. Results: Among 38 known genes of the integrin, semaphorin, and ephrin gene families, RNA-seq data revealed up-regulation of six genes in the NAc: heterodimer receptor, integrins Itgal, Itgb2, and Itgam, and its ligand semaphorin Sema7a, two semaphorin receptors, plexins Plxnd1 and Plxdc1. There was down-regulation of eight genes in this region: two integrin genes Itga3 and Itgb8, semaphorins Sema3c, Sema4g, Sema6a, Sema6d, semaphorin receptor neuropilin Nrp2, and ephrin receptor Epha3. In the CPu, there were five differentially expressed axon guidance genes: up-regulation of three integrin genes, Itgal, Itgb2, Itga1, and down-regulation of Itga9 and ephrin Efna3 were observed. No significant alterations in expression of Netrin-1 or Slit were observed. Conclusion: We provide evidence for alterations in the expression of selected axon guidance genes in adult mouse brain following chronic self-administration of oxycodone. Further examination of oxycodone-induced changes in the expression of these specific axon guidance molecules and integrin genes in relation to behavior may provide new insights into development of addiction to oxycodone.
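
    The FDR q < 0.1 selection step in this abstract can be illustrated with the Benjamini-Hochberg adjustment. The abstract does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the Python sketch below is not the DESeq2 code; the function name and p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-gene p-values from a differential expression test.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.1]
```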

  11. Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacterial bloom.

    PubMed

    Penn, Kevin; Wang, Jia; Fernando, Samodha C; Thompson, Janelle R

    2014-09-01

    Cyanobacterial harmful algal blooms (cyanoHABs) appear to be increasing in frequency on a global scale. The Cyanobacteria in blooms can produce toxic secondary metabolites that make freshwater dangerous for drinking and recreation. To characterize microbial activities in a cyanoHAB, transcripts from a eutrophic freshwater reservoir in Singapore were sequenced for six samples collected over one day-night period. Transcripts from the Cyanobacterium Microcystis dominated all samples and were accompanied by at least 533 genera primarily from the Cyanobacteria, Proteobacteria, Bacteroidetes and Actinobacteria. Within the Microcystis population, abundant transcripts were from genes for buoyancy, photosynthesis and synthesis of the toxin microviridin, suggesting that these are necessary for competitive dominance in the Reservoir. During the day, Microcystis transcripts were enriched in photosynthesis and energy metabolism while at night enriched pathways included DNA replication and repair and toxin biosynthesis. Microcystis was the dominant source of transcripts from polyketide and non-ribosomal peptide synthase (PKS and NRPS, respectively) gene clusters. Unexpectedly, expression of all PKS/NRPS gene clusters, including for the toxins microcystin and aeruginosin, occurred throughout the day-night cycle. The most highly expressed PKS/NRPS gene cluster from Microcystis is not associated with any known product. The four most abundant phyla in the reservoir were enriched in different functions, including photosynthesis (Cyanobacteria), breakdown of complex organic molecules (Proteobacteria), glycan metabolism (Bacteroidetes) and breakdown of plant carbohydrates, such as cellobiose (Actinobacteria). These results provide the first estimate of secondary metabolite gene expression, functional partitioning and functional interplay in a freshwater cyanoHAB.

  12. Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba).

    PubMed

    Rahi, Md Lifat; Amin, Shorash; Mather, Peter B; Hurwood, David A

    2017-01-01

    The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 227,564,643 high-quality 75 bp paired-end Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200-18,050 bp) with an N50 value of 1597. In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation. The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly.
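
    Two summary statistics quoted in this abstract, the assembly N50 and the TPM expression threshold, can be sketched in a few lines of Python. The TPM function shows only the final normalization from per-transcript counts, not the expectation-maximization step that RSEM performs; the function names and toy numbers are assumptions made for the example.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def tpm(counts, lengths):
    """Transcripts per million from read counts and transcript lengths."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Toy assembly (assumption): four contigs and their mapped read counts.
contig_lengths = [10, 20, 30, 40]
read_counts = [5, 0, 30, 40]
assembly_n50 = n50(contig_lengths)
abundances = tpm(read_counts, contig_lengths)
expressed = [i for i, value in enumerate(abundances) if value >= 10]
```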

  13. Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba)

    PubMed Central

    Amin, Shorash; Mather, Peter B.; Hurwood, David A.

    2017-01-01

    Background The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. Methods The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 227,564,643 high-quality 75 bp paired-end Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200–18,050 bp) with an N50 value of 1597. Results In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation. Discussion The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly. PMID:28194319

  14. Identification of genes and gene pathways associated with major depressive disorder by integrative brain analysis of rat and human prefrontal cortex transcriptomes

    PubMed Central

    Malki, K; Pain, O; Tosto, M G; Du Rietz, E; Carboni, L; Schalkwyk, L C

    2015-01-01

    Despite moderate heritability estimates, progress in uncovering the molecular substrate underpinning major depressive disorder (MDD) has been slow. In this study, we used prefrontal cortex (PFC) gene expression from a genetic rat model of MDD to inform probe set prioritization in PFC in a human post-mortem study to uncover genes and gene pathways associated with MDD. Gene expression differences between Flinders sensitive (FSL) and Flinders resistant (FRL) rat lines were statistically evaluated using the non-parametric RankProd algorithm. Top ranking probe sets in the rat study were subsequently used to prioritize orthologous selection in a human PFC in a case–control post-mortem study on MDD from the Stanley Brain Consortium. Candidate genes in the human post-mortem study were then tested against a matched control sample using the RankProd method. A total of 1767 probe sets were differentially expressed in the PFC between FSL and FRL rat lines at q⩽0.001. A total of 898 orthologous probe sets were found on Affymetrix's HG-U95A chip used in the human study. Correcting for the number of multiple, non-independent tests, 20 probe sets were found to be significantly dysregulated between human cases and controls at q⩽0.05. These probe sets tagged the expression profile of 18 human genes (11 upregulated and seven downregulated). Using an integrative rat–human study, a number of convergent genes that may have a role in pathogenesis of MDD were uncovered. Eighty percent of these genes were functionally associated with a key stress response signalling cascade, involving NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells), AP-1 (activator protein 1) and ERK/MAPK, which has been systematically associated with MDD, neuroplasticity and neurogenesis. PMID:25734512
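
    The rank-product statistic behind the RankProd method used in this abstract can be sketched as the geometric mean of a gene's rank across replicate comparisons. The Python sketch below computes only that statistic for up-regulation; it omits the permutation-based significance estimation that the RankProd package provides, and the function name and fold-change values are assumptions made for the example.

```python
import math


def rank_products(fold_changes):
    """Geometric mean rank per gene across replicate comparisons.

    fold_changes: list of replicates, each a list with one log fold
    change per gene. Rank 1 is the most up-regulated gene.
    """
    n_genes = len(fold_changes[0])
    log_rank_sum = [0.0] * n_genes
    for replicate in fold_changes:
        order = sorted(range(n_genes), key=lambda g: replicate[g],
                       reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sum[gene] += math.log(rank)
    k = len(fold_changes)
    return [math.exp(total / k) for total in log_rank_sum]


# Hypothetical log fold changes for three genes in two replicates.
replicates = [[3.0, 1.0, 2.0],
              [2.5, 0.5, 3.0]]
rp = rank_products(replicates)
```

    Genes with small rank products are consistently near the top of every replicate, which is what makes the statistic robust to noisy individual comparisons.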

  15. Gene trapping in differentiating cell lines: regulation of the lysosomal protease cathepsin B in skeletal myoblast growth and fusion.

    PubMed

    Gogos, J A; Thompson, R; Lowry, W; Sloane, B F; Weintraub, H; Horwitz, M

    1996-08-01

    To identify genes regulated during skeletal muscle differentiation, we have infected mouse C2C12 myoblasts with retroviral gene trap vectors, containing a promoterless marker gene with a 5' splice acceptor signal. Integration of the vector adjacent to an actively transcribed gene places the marker under the transcriptional control of the endogenous gene, while the adjacent vector sequences facilitate cloning. The vector insertionally mutates the trapped locus and may also form fusion proteins with the endogenous gene product. We have screened several hundred clones, each containing a trapping vector integrated into a different endogenous gene. In agreement with previous estimates based on hybridization kinetics, we find that a large proportion of all genes expressed in myoblasts are regulated during differentiation. Many of these genes undergo unique temporal patterns of activation or repression during cell growth and myotube formation, and some show specific patterns of subcellular localization. The first gene we have identified with this strategy is the lysosomal cysteine protease cathepsin B. Expression from the trapped allele is upregulated during early myoblast fusion and downregulated in myotubes. A direct role for cathepsin B in myoblast growth and fusion is suggested by the observation that the trapped cells deficient in cathepsin B activity have an unusual morphology and reduced survival in low-serum media and undergo differentiation with impaired cellular fusion. The phenotype is reproduced by antisense cathepsin B expression in parental C2C12 myoblasts. The cellular phenotype is similar to that observed in cultured myoblasts from patients with I cell disease, in which there is diminished accumulation of lysosomal enzymes. This suggests that a specific deficiency of cathepsin B could contribute to the myopathic component of this illness.

  16. Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development

    PubMed Central

    Silva, Francisco Goes da; Iandolino, Alberto; Al-Kayal, Fadi; Bohlmann, Marlene C.; Cushman, Mary Ann; Lim, Hyunju; Ergul, Ali; Figueroa, Rubi; Kabuloglu, Elif K.; Osborne, Craig; Rowe, Joan; Tattersall, Elizabeth; Leslie, Anna; Xu, Jane; Baek, JongMin; Cramer, Grant R.; Cushman, John C.; Cook, Douglas R.

    2005-01-01

    We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments. 
    A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species. PMID:16219919

  17. A gene profiling deconvolution approach to estimating immune cell composition from complex tissues.

    PubMed

    Chen, Shu-Hwa; Kuo, Wen-Yu; Su, Sheng-Yao; Chung, Wei-Chun; Ho, Jen-Ming; Lu, Henry Horng-Shing; Lin, Chung-Yen

    2018-05-08

    A newly emerged cancer treatment utilizes the intrinsic immune surveillance mechanism that is silenced by malignant cells. Hence, studies of tumor infiltrating lymphocyte populations (TILs) are key to the success of advanced treatments. In addition to laboratory methods such as immunohistochemistry and flow cytometry, in silico gene expression deconvolution methods are available for analyses of relative proportions of immune cell types. Herein, we used microarray data from the public domain to profile the gene expression patterns of twenty-two immune cell types. Initially, outliers were detected based on the consistency of gene profiling clustering results and the original cell phenotype notation. Subsequently, we filtered out genes that are expressed in non-hematopoietic normal tissues and cancer cells. For every pair of immune cell types, we ran t-tests for each gene, and defined differentially expressed genes (DEGs) from this comparison. Equal numbers of DEGs were then collected as candidate lists and numbers of conditions and minimal values for building signature matrices were calculated. We then used ν-Support Vector Regression to construct a deconvolution model. The performance of our system was evaluated using blood biopsies from 20 adults, in which 9 immune cell types were identified using flow cytometry. The present computations performed better than current state-of-the-art deconvolution methods. Finally, we implemented the proposed method in R and tested extensibility and usability on Windows, macOS, and Linux operating systems. The method, MySort, is wrapped as a pluggable tool for the Galaxy platform and usage details are available at https://testtoolshed.g2.bx.psu.edu/view/moneycat/mysort/e3afe097e80a .
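
    The deconvolution step in this record can be pictured as solving mixture ≈ signature × fractions for non-negative cell-type fractions. The sketch below is not the MySort implementation and does not use support vector regression: to stay within the Python standard library it swaps in a plain projected-gradient non-negative least squares solver. The function name `estimate_fractions`, the toy signature values, and the step size are all assumptions made for illustration.

```python
def estimate_fractions(signature, mixture, steps=2000, lr=0.001):
    """Estimate non-negative cell-type fractions f with signature @ f ~= mixture.

    signature: list of rows (one per gene), one value per cell type.
    mixture:   list with one expression value per gene.
    lr must be small relative to the scale of the signature values.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    f = [1.0 / n_types] * n_types  # start from equal fractions
    for _ in range(steps):
        # Residual r = S f - m for the current estimate.
        residual = [
            sum(signature[g][t] * f[t] for t in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||r||^2 with respect to f is S^T r.
        grad = [
            sum(signature[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[t] - lr * grad[t]) for t in range(n_types)]
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Hypothetical signature: 4 marker genes (rows) x 2 cell types (columns).
toy_signature = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
# Mixture built from true fractions 0.3 and 0.7, without noise.
toy_mixture = [3.7, 3.8, 6.6, 5.5]
fractions = estimate_fractions(toy_signature, toy_mixture)
```

    On noise-free toy data the solver recovers the fractions used to build the mixture; real expression data would need noise-robust regression such as the ν-SVR named in the abstract.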

  18. Gene-Specific DNA Methylation Changes Predict Remission in Patients with ANCA-Associated Vasculitis

    PubMed Central

    Jones, Britta E.; Yang, Jiajin; Muthigi, Akhil; Hogan, Susan L.; Hu, Yichun; Starmer, Joshua; Henderson, Candace D.; Poulton, Caroline J.; Brant, Elizabeth J.; Pendergraft, William F.; Jennette, J. Charles; Falk, Ronald J.

    2017-01-01

    ANCA-associated vasculitis is an autoimmune condition characterized by vascular inflammation and organ damage. Pharmacologically induced remission of this condition is complicated by relapses. Potential triggers of relapse are immunologic challenges and environmental insults, both of which associate with changes in epigenetic silencing modifications. Altered histone modifications implicated in gene silencing associate with aberrant autoantigen expression. To establish a link between DNA methylation, a model epigenetic gene silencing modification, and autoantigen gene expression and disease status in ANCA-associated vasculitis, we measured gene-specific DNA methylation of the autoantigen genes myeloperoxidase (MPO) and proteinase 3 (PRTN3) in leukocytes of patients with ANCA-associated vasculitis observed longitudinally (n=82) and of healthy controls (n=32). Patients with active disease demonstrated hypomethylation of MPO and PRTN3 and increased expression of the autoantigens; in remission, DNA methylation generally increased. Longitudinal analysis revealed that patients with ANCA-associated vasculitis could be divided into two groups, on the basis of whether DNA methylation increased or decreased from active disease to remission. In patients with increased DNA methylation, MPO and PRTN3 expression correlated with DNA methylation. Kaplan–Meier estimate of relapse revealed patients with increased DNA methylation at the PRTN3 promoter had a significantly greater probability of a relapse-free period (P<0.001), independent of ANCA serotype. Patients with decreased DNA methylation at the PRTN3 promoter had a greater risk of relapse (hazard ratio, 4.55; 95% confidence interval, 2.09 to 9.91). Thus, changes in the DNA methylation status of the PRTN3 promoter may predict the likelihood of stable remission and explain autoantigen gene regulation. PMID:27821628
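
    The Kaplan–Meier estimate of relapse-free probability mentioned in this record multiplies, at each observed relapse time, the fraction of at-risk patients who did not relapse. The following is a minimal sketch of that product-limit calculation; the function name `kaplan_meier` and the follow-up data are invented for illustration and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one relapse.

    times:  follow-up time for each patient.
    events: 1 if relapse was observed at that time, 0 if the patient was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        relapses = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # relapsed or censored at t
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and relapse indicators.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

    Censored patients reduce the at-risk count without lowering the curve, which is why the estimate only steps down at times where a relapse was observed.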

  19. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer

    PubMed Central

    2015-01-01

    Background: microRNA (miRNA) expression plays an influential role in cancer classification and malignancy, and miRNAs are feasible as alternative diagnostic markers for pancreatic cancer, a highly aggressive neoplasm with silent early symptoms, high metastatic potential, and resistance to conventional therapies. Methods: In this study, we evaluated the benefits of multi-omics data analysis by integrating miRNA and mRNA expression data in pancreatic cancer. Using support vector machine (SVM) modelling and leave-one-out cross validation (LOOCV), we evaluated the diagnostic performance of single- or multi-markers based on miRNA and mRNA expression profiles from 104 PDAC tissues and 17 benign pancreatic tissues. To select more reliable and robust markers, we performed validation using independent datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) data repositories. For validation, miRNA activity was estimated from miRNA-target gene interaction and mRNA expression datasets in pancreatic cancer. Results: Using a comprehensive identification approach, we successfully identified 705 multi-markers having powerful diagnostic performance for PDAC. In addition, these marker candidates were annotated with cancer pathways using gene ontology analysis. Conclusions: Our prediction models have strong potential for the diagnosis of pancreatic cancer. PMID:26328610
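
    Leave-one-out cross validation, as used in this record, holds out each sample once, trains on the remaining samples, and scores the held-out prediction. The sketch below illustrates only that loop. It does not use a support vector machine: a nearest-centroid classifier is swapped in so the example runs with the standard library, and the expression values, labels, and function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`."""
    best_label, best_dist = None, float("inf")
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical two-marker profiles (for example one miRNA and one mRNA).
toy_x = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 2.0], [3.2, 1.9], [2.9, 2.2]]
toy_y = ["benign", "benign", "benign", "PDAC", "PDAC", "PDAC"]
accuracy = loocv_accuracy(toy_x, toy_y)
```

    Passing a different `predict` function would let the same loop evaluate another classifier, which is how a marker panel could be compared across models.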

  20. Elucidation of the transcription network governing mammalian sex determination by exploiting strain-specific susceptibility to sex reversal

    PubMed Central

    Munger, Steven C.; Aylor, David L.; Syed, Haider Ali; Magwene, Paul M.; Threadgill, David W.; Capel, Blanche

    2009-01-01

    Despite the identification of some key genes that regulate sex determination, most cases of disorders of sexual development remain unexplained. Evidence suggests that the sexual fate decision in the developing gonad depends on a complex network of interacting factors that converge on a critical threshold. To elucidate the transcriptional network underlying sex determination, we took the first expression quantitative trait loci (eQTL) approach in a developing organ. We identified reproducible differences in the transcriptome of the embryonic day 11.5 (E11.5) XY gonad between C57BL/6J (B6) and 129S1/SvImJ (129S1), indicating that the reported sensitivity of B6 to sex reversal is consistent with a higher expression of a female-like transcriptome in B6. Gene expression is highly variable in F2 XY gonads from B6 and 129S1 intercrosses, yet strong correlations emerged. We estimated the F2 coexpression network and predicted roles for genes of unknown function based on their connectivity and position within the network. A genetic analysis of the F2 population detected autosomal regions that control the expression of many sex-related genes, including Sry (sex-determining region of the Y chromosome) and Sox9 (Sry-box containing gene 9), the key regulators of male sex determination. Our results reveal the complex transcription architecture underlying sex determination, and provide a mechanism by which individuals may be sensitized for sex reversal. PMID:19884258
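
    A coexpression network of the kind estimated in this record can be approximated by correlating every pair of genes across individuals and keeping the strongly correlated pairs as edges. The sketch below is one simple way to do this under stated assumptions: Pearson correlation with a fixed cutoff, which may differ from the authors' method. Sry and Sox9 are named in the abstract, while GeneX, the expression values, and the 0.8 threshold are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene1, gene2, r) for gene pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values across five F2 XY gonads.
toy_expression = {
    "Sry": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Sox9": [2.0, 4.0, 6.0, 8.0, 11.0],
    "GeneX": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(toy_expression)
```

    Genes of unknown function can then be ranked by how many edges they share with known regulators, which is the intuition behind predicting roles from network position.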

  1. Meiotic drive impacts expression and evolution of x-linked genes in stalk-eyed flies.

    PubMed

    Reinhardt, Josephine A; Brand, Cara L; Paczolt, Kimberly A; Johns, Philip M; Baker, Richard H; Wilkinson, Gerald S

    2014-01-01

    Although sex chromosome meiotic drive has been observed in a variety of species for over 50 years, the genes causing drive are only known in a few cases, and none of these cases cause distorted sex-ratios in nature. In stalk-eyed flies (Teleopsis dalmanni), driving X chromosomes are commonly found at frequencies approaching 30% in the wild, but the genetic basis of drive has remained elusive due to reduced recombination between driving and non-driving X chromosomes. Here, we used RNAseq to identify transcripts that are differentially expressed between males carrying either a driving X (XSR) or a standard X chromosome (XST), and found hundreds of these, the majority of which are X-linked. Drive-associated transcripts show increased levels of sequence divergence (dN/dS) compared to a control set, and are predominantly expressed either in testes or in the gonads of both sexes. Finally, we confirmed that XSR and XST are highly divergent by estimating sequence differentiation between the RNAseq pools. We found that X-linked transcripts were often strongly differentiated (whereas most autosomal transcripts were not), supporting the presence of a relatively large region of recombination suppression on XSR presumably caused by one or more inversions. We have identified a group of genes that are good candidates for further study into the causes and consequences of sex-chromosome drive, and demonstrated that meiotic drive has had a profound effect on sequence evolution and gene expression of X-linked genes in this species.

  2. The Dysregulation of Polyamine Metabolism in Colorectal Cancer Is Associated with Overexpression of c-Myc and C/EBPβ rather than Enterotoxigenic Bacteroides fragilis Infection.

    PubMed

    Snezhkina, Anastasiya V; Krasnov, George S; Lipatova, Anastasiya V; Sadritdinova, Asiya F; Kardymon, Olga L; Fedorova, Maria S; Melnikova, Nataliya V; Stepanov, Oleg A; Zaretsky, Andrew R; Kaprin, Andrey D; Alekseev, Boris Y; Dmitriev, Alexey A; Kudryavtseva, Anna V

    2016-01-01

    Colorectal cancer is one of the most common cancers in the world. It is well known that chronic inflammation can promote the progression of colorectal cancer (CRC). Recently, a number of studies revealed a potential association between colorectal inflammation, cancer progression, and infection caused by enterotoxigenic Bacteroides fragilis (ETBF). Bacterial enterotoxin activates spermine oxidase (SMO), which produces spermidine and H2O2 as byproducts of polyamine catabolism, which, in turn, enhances inflammation and tissue injury. Using qPCR analysis, we estimated the expression of the SMOX gene and ETBF colonization in CRC patients. We found no statistically significant associations between them. Then we selected genes involved in polyamine metabolism, metabolic reprogramming, and inflammation regulation and estimated their expression in CRC. We observed overexpression of SMOX, ODC1, SRM, SMS, MTAP, c-Myc, C/EBPβ (CREBP), and other genes. We found that two mediators of metabolic reprogramming, inflammation, and cell proliferation, c-Myc and C/EBPβ, may serve as regulators of polyamine metabolism genes (SMOX, AZIN1, MTAP, SRM, ODC1, AMD1, and AGMAT), as they are overexpressed in tumors, have binding sites according to ENCODE ChIP-Seq data, and demonstrate strong coexpression with their targets. Thus, increased polyamine metabolism in CRC could be driven by c-Myc and C/EBPβ rather than ETBF infection.

  3. The Dysregulation of Polyamine Metabolism in Colorectal Cancer Is Associated with Overexpression of c-Myc and C/EBPβ rather than Enterotoxigenic Bacteroides fragilis Infection

    PubMed Central

    Snezhkina, Anastasiya V.; Lipatova, Anastasiya V.; Sadritdinova, Asiya F.; Kardymon, Olga L.; Fedorova, Maria S.; Kaprin, Andrey D.

    2016-01-01

    Colorectal cancer is one of the most common cancers in the world. It is well known that chronic inflammation can promote the progression of colorectal cancer (CRC). Recently, a number of studies revealed a potential association between colorectal inflammation, cancer progression, and infection caused by enterotoxigenic Bacteroides fragilis (ETBF). Bacterial enterotoxin activates spermine oxidase (SMO), which produces spermidine and H2O2 as byproducts of polyamine catabolism, which, in turn, enhances inflammation and tissue injury. Using qPCR analysis, we estimated the expression of the SMOX gene and ETBF colonization in CRC patients. We found no statistically significant associations between them. Then we selected genes involved in polyamine metabolism, metabolic reprogramming, and inflammation regulation and estimated their expression in CRC. We observed overexpression of SMOX, ODC1, SRM, SMS, MTAP, c-Myc, C/EBPβ (CREBP), and other genes. We found that two mediators of metabolic reprogramming, inflammation, and cell proliferation, c-Myc and C/EBPβ, may serve as regulators of polyamine metabolism genes (SMOX, AZIN1, MTAP, SRM, ODC1, AMD1, and AGMAT), as they are overexpressed in tumors, have binding sites according to ENCODE ChIP-Seq data, and demonstrate strong coexpression with their targets. Thus, increased polyamine metabolism in CRC could be driven by c-Myc and C/EBPβ rather than ETBF infection. PMID:27433286

  4. Cardiomyocyte cell cycle control and growth estimation in vivo--an analysis based on cardiomyocyte nuclei.

    PubMed

    Walsh, Stuart; Pontén, Annica; Fleischmann, Bernd K; Jovinge, Stefan

    2010-06-01

    Adult mammalian cardiomyocytes are traditionally viewed as being permanently withdrawn from the cell cycle. Whereas some groups have reported none, others have reported extensive mitosis in adult myocardium under steady-state conditions. Recently, a highly specific assay of 14C dating in humans has suggested a continuous generation of cardiomyocytes in the adult, albeit at a very low rate. Mice represent the most commonly used animal model for these studies, but their short lifespan makes them unsuitable for 14C studies. Herein, we investigate the growth pattern of murine cardiomyocytes under steady-state conditions using new analytical and technical strategies, and relate this pattern to gene expression. The observed levels of DNA synthesis in early life were associated with cardiomyocyte proliferation. Mitosis was prolonged into early life, longer than the most conservative previous estimates. DNA synthesis in neonatal life was attributable to bi-nucleation, therefore suggesting that cardiomyocytes withdraw from the cell cycle shortly after birth. No cell cycle activity was observed in adult cardiomyocytes and significant polyploidy was observed in cardiomyocyte nuclei. Gene analyses identified 32 genes whose expression was predicted to be particular to day 3-4 neonatal myocytes, compared with embryonic or adult cells. These cell cycle-associated genes are crucial to the understanding of the mechanisms of bi-nucleation and physiological cellular growth in the neonatal period.

  5. kappa-Opioid receptor in humans: cDNA and genomic cloning, chromosomal assignment, functional expression, pharmacology, and expression pattern in the central nervous system.

    PubMed Central

    Simonin, F; Gavériaux-Ruff, C; Befort, K; Matthes, H; Lannes, B; Micheletti, G; Mattéi, M G; Charron, G; Bloch, B; Kieffer, B

    1995-01-01

    Using the mouse delta-opioid receptor cDNA as a probe, we have isolated genomic clones encoding the human mu- and kappa-opioid receptor genes. Their organization appears similar to that of the human delta receptor gene, with exon-intron boundaries located after putative transmembrane domains 1 and 4. The kappa gene was mapped at position q11-12 in human chromosome 8. A full-length cDNA encoding the human kappa-opioid receptor has been isolated. The cloned receptor expressed in COS cells presents a typical kappa 1 pharmacological profile and is negatively coupled to adenylate cyclase. The expression of kappa-opioid receptor mRNA in human brain, as estimated by reverse transcription-polymerase chain reaction, is consistent with the involvement of kappa-opioid receptors in pain perception, neuroendocrine physiology, affective behavior, and cognition. In situ hybridization studies performed on human fetal spinal cord demonstrate the presence of the transcript specifically in lamina II of the dorsal horn. Some divergences in structural, pharmacological, and anatomical properties are noted between the cloned human and rodent receptors. PMID:7624359

  6. High natural gene expression variation in the reef-building coral Acropora millepora: potential for acclimative and adaptive plasticity.

    PubMed

    Granados-Cifuentes, Camila; Bellantuono, Anthony J; Ridgway, Tyrone; Hoegh-Guldberg, Ove; Rodriguez-Lanetty, Mauricio

    2013-04-08

    Ecosystems worldwide are suffering the consequences of anthropogenic impact. The diverse ecosystems of coral reefs, for example, are globally threatened by increases in sea surface temperatures due to global warming. Studies to date have focused on determining genetic diversity, the sequence variability of genes in a species, as a proxy to estimate and predict the potential adaptive response of coral populations to environmental changes linked to climate change. However, the examination of natural gene expression variation has received less attention. This variation has been implicated as an important factor in evolutionary processes, upon which natural selection can act. We acclimatized coral nubbins from six colonies of the reef-building coral Acropora millepora to a common garden in Heron Island (Great Barrier Reef, GBR) for a period of four weeks to remove any site-specific environmental effects on the physiology of the coral nubbins. By using a cDNA microarray platform, we detected a high level of gene expression variation, with 17% (488) of the unigenes differentially expressed across coral nubbins of the six colonies (jsFDR-corrected, p < 0.01). Among the main categories of biological processes found differentially expressed were transport, translation, response to stimulus, oxidation-reduction processes, and apoptosis. We found that the transcriptional profiles did not correspond to the genotype of the colony characterized using either an intron of the carbonic anhydrase gene or microsatellite loci markers. Our results provide evidence of high inter-colony variation at the transcriptomic level in A. millepora grown in a common garden, without correspondence to genotypic identity. This finding brings to our attention the importance of taking into account natural variation between reef corals when assessing experimental gene expression differences. The high transcriptional variation detected in this study is interpreted and discussed within the context of adaptive potential and phenotypic plasticity of reef corals. Whether this variation will allow coral reefs to survive current challenges remains unknown.

  7. Hapten-derivatized nanoparticle targeting and imaging of gene expression by multimodality imaging systems.

    PubMed

    Cheng, C-M; Chu, P-Y; Chuang, K-H; Roffler, S R; Kao, C-H; Tseng, W-L; Shiea, J; Chang, W-D; Su, Y-C; Chen, B-M; Wang, Y-M; Cheng, T-L

    2009-01-01

    Non-invasive gene monitoring is important for most gene therapy applications to ensure selective gene transfer to specific cells or tissues. We developed a non-invasive imaging system to assess the location and persistence of gene expression by anchoring an anti-dansyl (DNS) single-chain antibody (DNS receptor) on the cell surface to trap DNS-derivatized imaging probes. DNS hapten was covalently attached to cross-linked iron oxide (CLIO) to form a 39 ± 0.5 nm DNS-CLIO nanoparticle imaging probe. DNS-CLIO specifically bound to DNS receptors but not to a control single-chain antibody receptor. DNS-CLIO (100 μM Fe) was non-toxic to both B16/DNS (DNS receptor positive) and B16/phOx (control receptor positive) cells. Magnetic resonance (MR) imaging could detect as few as 10% B16/DNS cells in a mixture in vitro. Importantly, DNS-CLIO specifically bound to a B16/DNS tumor, which markedly reduced signal intensity. Similar results were also shown with DNS quantum dots, which specifically targeted CT26/DNS cells but not control CT26/phOx cells both in vitro and in vivo. These results demonstrate that DNS nanoparticles can systemically monitor the expression of DNS receptor in vivo by feasible imaging systems. This targeting strategy may provide a valuable tool to estimate the efficacy and specificity of different gene delivery systems and optimize gene therapy protocols in the clinic.

  8. Hypoxia-inducible vascular endothelial growth factor gene therapy using the oxygen-dependent degradation domain in myocardial ischemia.

    PubMed

    Kim, Hyun Ah; Lim, Soyeon; Moon, Hyung-Ho; Kim, Sung Wan; Hwang, Ki-Chul; Lee, Minhyung; Kim, Sun Hwa; Choi, Donghoon

    2010-10-01

    A hypoxia-inducible VEGF expression system with the oxygen-dependent degradation (ODD) domain was constructed and tested to be used in gene therapy for ischemic myocardial disease. Luciferase and VEGF expression vector systems were constructed with or without the ODD domain: pEpo-SV-Luc (or pEpo-SV-VEGF) and pEpo-SV-Luc-ODD (or pEpo-SV-VEGF-ODD). In vitro gene expression efficiency of each vector type was evaluated in HEK 293 cells under both hypoxic and normoxic conditions. The amount of VEGF protein was estimated by ELISA. The VEGF expression vectors with or without the ODD domain were injected into ischemic rat myocardium. Fibrosis, neovascularization, and cardiomyocyte apoptosis were assessed using Masson's trichrome staining, α-smooth muscle actin (α-SMA) immunostaining, and the TUNEL assay, respectively. The plasmid vectors containing ODD significantly improved the expression level of VEGF protein in hypoxic conditions. The enhancement of VEGF protein production was attributed to increased protein stability due to oxygen deficiency. In a rat model of myocardial ischemia, the pEpo-SV-VEGF-ODD group exhibited less myocardial fibrosis, higher microvessel density, and less cardiomyocyte apoptosis compared to the control groups (saline and pEpo-SV-VEGF treatments). An ODD-mediated VEGF expression system that facilitates VEGF-production under hypoxia may be useful in the treatment of ischemic heart disease.

  9. Accelerated rates of protein evolution in barley grain and pistil biased genes might be legacy of domestication.

    PubMed

    Shi, Tao; Dimitrov, Ivan; Zhang, Yinling; Tax, Frans E; Yi, Jing; Gou, Xiaoping; Li, Jia

    2015-10-01

    Traits related to grain and reproductive organs in grass crops have been under continuous directional selection during domestication. Barley is one of the oldest domesticated crops in human history. Thus genes associated with the grain and reproductive organs in barley may show evidence of dramatic evolutionary change. To understand how artificial selection contributes to protein evolution of biased genes in different barley organs, we used Digital Gene Expression analysis of six barley organs (grain, pistil, anther, leaf, stem and root) to identify genes with biased expression in specific organs. Pairwise comparisons of orthologs between barley and Brachypodium distachyon, as well as between highland and lowland barley cultivars, both indicated that grain and pistil biased genes show relatively higher protein evolutionary rates compared with the median of all orthologs and other organ biased genes. Lineage-specific estimation of protein evolutionary rates showed similar patterns, with elevated protein evolution in barley grain and pistil biased genes, yet protein sequences generally evolve much faster in the lowland barley cultivar. Further functional annotations revealed that some of these grain and pistil biased genes with rapid protein evolution are related to nutrient biosynthesis and cell cycle/division. Our analyses provide insights into how domestication differentially shaped the evolution of genes specific to different organs of a crop species, and implications for future functional studies of domestication genes.

  10. Changes in the human transcriptome upon vitamin D supplementation.

    PubMed

    Pasing, Yvonne; Fenton, Christopher Graham; Jorde, Rolf; Paulssen, Ruth Hracky

    2017-10-01

    Vitamin D is hydroxylated in the liver and kidneys to its active form, which can bind to the vitamin D receptor (VDR). The VDR is present in a wide variety of different cell types and tissues and acts as a transcription factor. Although activation of the VDR is estimated to regulate expression of up to 5% of the human genome, our study is the first to analyse gene expression after supplementation in more than 10 subjects. Subjects of a randomized controlled trial (RCT) received either vitamin D3 (n=47) in a weekly dose of 20,000 IU or placebo (n=47) for a period of three to five years. For this study, blood samples for preparation of RNA were drawn from the subjects and mRNA gene expression in blood was determined using microarray analysis. The two study groups were similar regarding gender, age, BMI and duration of supplementation, whereas the mean serum 25-hydroxyvitamin D (25(OH)D) level, as expected, was significantly higher in the vitamin D group (119 versus 63 nmol/L). When analysing all subjects, almost no significant differences in gene expression between the two groups were found. However, when analysing men and women separately, significant effects on gene expression were observed for women. Furthermore, when only including subjects with the highest and lowest serum 25(OH)D levels, additional vitamin D regulated genes were disclosed. Thus, a total of 99 genes (p ≤ 0.05, |log2 fold change| ≥ 0.2) were found to be regulated, of which 72 have not been published before as influenced by vitamin D. These genes were particularly involved in the interleukin signaling pathway, oxidative stress response, apoptosis signaling pathway and gonadotropin releasing hormone receptor pathway. Thus, our results open the possibility for many future studies.
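
    The selection rule quoted in this record, p ≤ 0.05 together with an absolute log2 fold change of at least 0.2, can be written as a two-condition filter. This is a minimal sketch: the function name `regulated_genes`, the gene labels, and the statistics are hypothetical and are not values from the study.

```python
def regulated_genes(results, p_max=0.05, min_abs_log2fc=0.2):
    """Keep genes with p <= p_max and |log2 fold change| >= min_abs_log2fc.

    results: dict mapping gene name -> (p_value, log2_fold_change).
    """
    return [
        gene
        for gene, (p_value, log2fc) in results.items()
        if p_value <= p_max and abs(log2fc) >= min_abs_log2fc
    ]


# Hypothetical per-gene statistics (vitamin D group versus placebo).
toy_results = {
    "geneA": (0.01, 0.35),   # significant, up-regulated
    "geneB": (0.04, -0.25),  # significant, down-regulated
    "geneC": (0.03, 0.10),   # significant but effect too small
    "geneD": (0.20, 0.90),   # large effect but not significant
}
selected = regulated_genes(toy_results)
```

    Taking the absolute value is what lets the same threshold capture both up- and down-regulated genes.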

  11. Bisphenol A Exposure Is Associated with in Vivo Estrogenic Gene Expression in Adults

    PubMed Central

    Melzer, David; Harries, Lorna; Cipelli, Riccardo; Henley, William; Money, Cathryn; McCormack, Paul; Young, Anita; Guralnik, Jack; Ferrucci, Luigi; Bandinelli, Stefania; Corsi, Anna Maria

    2011-01-01

    Background: Bisphenol A (BPA) is a synthetic estrogen commonly used in polycarbonate plastic and resin-lined food and beverage containers. Exposure of animal and cell models to doses of BPA below the recommended tolerable daily intake (TDI) of 50 μg/kg/day has been shown to alter specific estrogen-responsive gene expression, but this has not previously been shown in humans. Objective: We investigated associations between BPA exposure and in vivo estrogenic gene expression in humans. Methods: We studied 96 adult men from the InCHIANTI population study and examined in vivo expression of six estrogen receptor, estrogen-related receptor, and androgen receptor genes in peripheral blood leukocytes. Results: The geometric mean urinary BPA concentration was 3.65 ng/mL [95% confidence interval (CI): 3.13, 4.28], giving an estimated mean excretion of 5.84 μg/day (95% CI: 5.00, 6.85), significantly below the current TDI. In age-adjusted models, there were positive associations between higher BPA concentrations and higher ESR2 [estrogen receptor 2 (ER beta)] expression (unstandardized linear regression coefficient = 0.1804; 95% CI: 0.0388, 0.3221; p = 0.013) and ESRRA (estrogen related receptor alpha) expression (coefficient = 0.1718; 95% CI: 0.0213, 0.3223; p = 0.026). These associations were little changed after adjusting for potential confounders, including obesity, serum lipid concentrations, and white cell subtype percentages. Upper-tertile BPA excretors (urinary BPA > 4.6 ng/mL) had 65% higher mean ESR2 expression than did lower-tertile BPA excretors (0–2.4 ng/mL). Conclusions: Because activation of nuclear-receptor–mediated pathways by BPA is consistently found in laboratory studies, such activation in humans provides evidence that BPA is likely to function as a xenoestrogen in this sample of adults. PMID:21831745
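
    The step from a urinary concentration in ng/mL to a daily excretion in μg/day is a unit conversion multiplied by a daily urine volume. The abstract does not state the volume used; the ratio of the two reported means (5.84 / 3.65) is consistent with roughly 1.6 L/day, so that figure is treated below as an inferred assumption, and the function name `daily_excretion_ug` is hypothetical.

```python
def daily_excretion_ug(concentration_ng_per_ml, urine_volume_l_per_day=1.6):
    """Convert a urinary concentration (ng/mL) to daily excretion (ug/day).

    The default volume of 1.6 L/day is an assumption inferred from the
    reported means, not a value given in the abstract.
    """
    ml_per_day = urine_volume_l_per_day * 1000.0
    ng_per_day = concentration_ng_per_ml * ml_per_day
    return ng_per_day / 1000.0  # ng -> ug


# Geometric mean concentration reported in the abstract.
mean_excretion = round(daily_excretion_ug(3.65), 2)
```

    Applying the same conversion to the confidence limits gives values close to the reported interval; small differences are expected because the published inputs are already rounded.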

  12. Urotensin-II System in Genetic Control of Blood Pressure and Renal Function

    PubMed Central

    Debiec, Radoslaw; Christofidou, Paraskevi; Denniff, Matthew; Bloomer, Lisa D.; Bogdanski, Pawel; Wojnar, Lukasz; Musialik, Katarzyna; Charchar, Fadi J.; Thompson, John R.; Waterworth, Dawn; Song, Kijoung; Vollenweider, Peter; Waeber, Gerard; Zukowska-Szczechowska, Ewa; Samani, Nilesh J.; Lambert, David; Tomaszewski, Maciej

    2013-01-01

    Urotensin-II controls ion/water homeostasis in fish and vascular tone in rodents. We hypothesised that common genetic variants in urotensin-II pathway genes are associated with human blood pressure or renal function. We performed family-based analysis of association between blood pressure, glomerular filtration and genes of the urotensin-II pathway (urotensin-II, urotensin-II related peptide, urotensin-II receptor) saturated with 28 tagging single nucleotide polymorphisms in 2024 individuals from 520 families; followed by an independent replication in 420 families and 7545 unrelated subjects. The expression studies of the urotensin-II pathway were carried out in 97 human kidneys. Phylogenetic evolutionary analysis was conducted in 17 vertebrate species. One single nucleotide polymorphism (rs531485 in urotensin-II gene) was associated with adjusted estimated glomerular filtration rate in the discovery cohort (p = 0.0005). It showed no association with estimated glomerular filtration rate in the combined replication resource of 8724 subjects from 6 populations. Expression of urotensin-II and its receptor showed strong linear correlation (r = 0.86, p<0.0001). There was no difference in renal expression of urotensin-II system between hypertensive and normotensive subjects. Evolutionary analysis revealed accumulation of mutations in urotensin-II since the divergence of primates and weaker conservation of urotensin-II receptor in primates than in lower vertebrates. Our data suggest that urotensin-II system genes are unlikely to play a major role in genetic control of human blood pressure or renal function. The signatures of evolutionary forces acting on urotensin-II system indicate that it may have evolved towards loss of function since the divergence of primates. PMID:24391740

  13. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. 
It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.
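
    The feature-extraction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' R program: the function names are hypothetical, time points are assumed to be equally spaced on [0, 1), the profile is assumed periodic, and the derivative is approximated by forward differences rather than estimated through the smoothed Fourier series the abstract mentions. The model-based clustering step is not shown.

```python
import math

def fourier_coefficients(values, n_harmonics):
    """Sample cosine/sine coefficients of a series sampled at t_j = j/n on [0, 1)."""
    n = len(values)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * j / n) for j, v in enumerate(values))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * j / n) for j, v in enumerate(values))
        coeffs.extend([a_k, b_k])
    return coeffs

def change_pattern_features(expression, n_harmonics=2):
    """Fourier coefficients of a finite-difference derivative (hypothetical helper)."""
    n = len(expression)
    # Forward differences with wrap-around (periodicity is an assumption), scaled by 1/dt = n.
    derivative = [(expression[(j + 1) % n] - expression[j]) * n for j in range(n)]
    return fourier_coefficients(derivative, n_harmonics)
```

    Because the features are built from differences, two profiles that differ only by a constant baseline map to the same feature vector, which is the sense in which such features describe the change pattern rather than the expression level.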

  14. Comparative analysis of gene regulatory networks: from network reconstruction to evolution.

    PubMed

    Thompson, Dawn; Regev, Aviv; Roy, Sushmita

    2015-01-01

    Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.

  15. Molecular Screening Tools to Study Arabidopsis Transcription Factors

    PubMed Central

    Wehner, Nora; Weiste, Christoph; Dröge-Laser, Wolfgang

    2011-01-01

    In the model plant Arabidopsis thaliana, more than 2000 genes are estimated to encode transcription factors (TFs), which clearly emphasizes the importance of transcriptional control. Although genomic approaches have generated large TF open reading frame (ORF) collections, only a limited number of these genes have been functionally characterized so far. This review evaluates strategies and methods to identify TF functions. In particular, we focus on two recently developed TF screening platforms, which make use of publicly available GATEWAY®-compatible ORF collections. (1) The Arabidopsis thaliana TF ORF over-Expression (AtTORF-Ex) library provides pooled collections of transgenic lines over-expressing HA-tagged TF genes, which are suited for screening approaches to define TF functions in stress defense and development. (2) A high-throughput microtiter plate based protoplast transactivation (PTA) system has been established to screen for TFs that regulate a given promoter:Luciferase construct in planta. PMID:22645547

  16. Impact of missing data imputation methods on gene expression clustering and classification.

    PubMed

    de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G

    2015-02-26

    Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by mean or the median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/ .
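
    The simple strategies mentioned at the end of this abstract (replacing missing values by the mean or the median) can be sketched as follows. This is an illustration only: the function name is hypothetical, missing values are assumed to be encoded as None, and imputing per gene (row) rather than per sample is an assumption not stated in the abstract.

```python
from statistics import mean, median

def impute_rows(matrix, strategy="mean"):
    """Replace None entries in each row by that row's mean or median (hypothetical helper)."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    fill = mean if strategy == "mean" else median
    imputed = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        if not observed:
            raise ValueError("cannot impute a row with no observed values")
        value = fill(observed)
        imputed.append([value if x is None else x for x in row])
    return imputed
```

    For example, `impute_rows([[1.0, None, 3.0]])` fills the gap with 2.0, the mean of the two observed values in that row.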

  17. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures.

    PubMed

    Munro, Sarah A; Lund, Steven P; Pine, P Scott; Binder, Hans; Clevert, Djork-Arné; Conesa, Ana; Dopazo, Joaquin; Fasold, Mario; Hochreiter, Sepp; Hong, Huixiao; Jafari, Nadereh; Kreil, David P; Łabaj, Paweł P; Li, Sheng; Liao, Yang; Lin, Simon M; Meehan, Joseph; Mason, Christopher E; Santoyo-Lopez, Javier; Setterquist, Robert A; Shi, Leming; Shi, Wei; Smyth, Gordon K; Stralis-Pavese, Nancy; Su, Zhenqiang; Tong, Weida; Wang, Charles; Wang, Jian; Xu, Joshua; Ye, Zhan; Yang, Yong; Yu, Ying; Salit, Marc

    2014-09-25

    There is a critical need for standard approaches to assess, report and compare the technical performance of genome-scale differential gene expression experiments. Here we assess technical performance with a proposed standard 'dashboard' of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared among 12 laboratories with three different measurement processes demonstrates generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias are also comparable among laboratories for the same measurement process. We observe different biases for measurement processes using different mRNA-enrichment protocols.

  18. Gene expression complex networks: synthesis, identification, and analysis.

    PubMed

    Lopes, Fabrício M; Cesar, Roberto M; Costa, Luciano Da F

    2011-10-01

    Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to average degree variation, with its network recovery rate decreasing as the average degree increased. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
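
    Steps (1) and (3) of the validation framework described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the network is treated as undirected, only the Erdős-Rényi model is shown, and the comparison is summarized by precision and recall of edges. The simulation of expression data and the feature-selection inference of step (2) are not shown.

```python
import random

def erdos_renyi_edges(n_genes, avg_degree, seed=0):
    """Undirected ER graph: each pair is linked with probability avg_degree / (n_genes - 1)."""
    rng = random.Random(seed)
    p = avg_degree / (n_genes - 1)
    return {(i, j) for i in range(n_genes) for j in range(i + 1, n_genes) if rng.random() < p}

def recovery_scores(true_edges, inferred_edges):
    """Precision and recall of inferred edges against the original network."""
    hits = len(true_edges & inferred_edges)
    precision = hits / len(inferred_edges) if inferred_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall
```

    Repeating the comparison over networks generated with increasing `avg_degree` is one way to examine how recovery changes with average degree, which is the kind of sensitivity the abstract reports.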

  19. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array

    PubMed Central

    Gardina, Paul J; Clark, Tyson A; Shimada, Brian; Staples, Michelle K; Yang, Qing; Veitch, James; Schweitzer, Anthony; Awad, Tarif; Sugnet, Charles; Dee, Suzanne; Davies, Christopher; Williams, Alan; Turpaz, Yaron

    2006-01-01

    Background Alternative splicing is a mechanism for increasing protein diversity by excluding or including exons during post-transcriptional processing. Alternatively spliced proteins are particularly relevant in oncology since they may contribute to the etiology of cancer, provide selective drug targets, or serve as a marker set for cancer diagnosis. While conventional identification of splice variants generally targets individual genes, we present here a new exon-centric array (GeneChip Human Exon 1.0 ST) that allows genome-wide identification of differential splice variation, and concurrently provides a flexible and inclusive analysis of gene expression. Results We analyzed 20 paired tumor-normal colon cancer samples using a microarray designed to detect over one million putative exons that can be virtually assembled into potential gene-level transcripts according to various levels of prior supporting evidence. Analysis of high confidence (empirically supported) transcripts identified 160 differentially expressed genes, with 42 genes occupying a network impacting cell proliferation and another twenty nine genes with unknown functions. A more speculative analysis, including transcripts based solely on computational prediction, produced another 160 differentially expressed genes, three-fourths of which have no previous annotation. We also present a comparison of gene signal estimations from the Exon 1.0 ST and the U133 Plus 2.0 arrays. Novel splicing events were predicted by experimental algorithms that compare the relative contribution of each exon to the cognate transcript intensity in each tissue. The resulting candidate splice variants were validated with RT-PCR. We found nine genes that were differentially spliced between colon tumors and normal colon tissues, several of which have not been previously implicated in cancer. 
Top scoring candidates from our analysis were also found to substantially overlap with EST-based bioinformatic predictions of alternative splicing in cancer. Conclusion Differential expression of high confidence transcripts correlated extremely well with known cancer genes and pathways, suggesting that the more speculative transcripts, largely based solely on computational prediction and mostly with no previous annotation, might be novel targets in colon cancer. Five of the identified splicing events affect mediators of cytoskeletal organization (ACTN1, VCL, CALD1, CTTN, TPM1), two affect extracellular matrix proteins (FN1, COL6A3) and another participates in integrin signaling (SLC3A2). Altogether they form a pattern of colon-cancer specific alterations that may particularly impact cell motility. PMID:17192196

  20. The contribution of de novo coding mutations to autism spectrum disorder

    PubMed Central

    Iossifov, Ivan; O’Roak, Brian J.; Sanders, Stephan J.; Ronemus, Michael; Krumm, Niklas; Levy, Dan; Stessman, Holly A.; Witherspoon, Kali; Vives, Laura; Patterson, Karynne E.; Smith, Joshua D.; Paeper, Bryan; Nickerson, Deborah A.; Dea, Jeanselle; Dong, Shan; Gonzalez, Luis E.; Mandell, Jefferey D.; Mane, Shrikant M.; Murtha, Michael T.; Sullivan, Catherine A.; Walker, Michael F.; Waqar, Zainulabedin; Wei, Liping; Willsey, A. Jeremy; Yamrom, Boris; Lee, Yoon-ha; Grabowska, Ewa; Dalkic, Ertugrul; Wang, Zihua; Marks, Steven; Andrews, Peter; Leotta, Anthony; Kendall, Jude; Hakker, Inessa; Rosenbaum, Julie; Ma, Beicong; Rodgers, Linda; Troge, Jennifer; Narzisi, Giuseppe; Yoon, Seungtai; Schatz, Michael C.; Ye, Kenny; McCombie, W. Richard; Shendure, Jay; Eichler, Evan E.; State, Matthew W.; Wigler, Michael

    2015-01-01

    We sequenced exomes from more than 2,500 simplex families each having a child with an autistic spectrum disorder (ASD). By comparing affected to unaffected siblings, we estimate that 13% of de novo (DN) missense mutations and 42% of DN likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding DN mutations contribute to about 30% of all simplex and 45% of female diagnoses. Virtually all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower IQ, but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to causative missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Virtually all significance for the latter comes from affected females. PMID:25363768

  1. Extracted Cookstove Emissions Differentially Alter Pro-inflammatory and Adaptive Gene Expression in Lung Epithelial Cells

    EPA Science Inventory

    Current estimates attribute over 4 million deaths annually to exposure to cookstove emissions (CE). While the development of several new cookstove (CS) designs has led efforts to reduce CE with relative success, the data supporting potential health benefits from the use of new CS...

  2. Comparative analysis of microarray data in Arabidopsis transcriptome during compatible interactions with plant viruses

    USDA-ARS's Scientific Manuscript database

    To analyze transcriptome response to virus infection, we have assembled currently available microarray data on changes in gene expression levels in compatible Arabidopsis-virus interactions. We used the mean r (Pearson’s correlation coefficient) for neighboring pairs to estimate pairwise local simil...

  3. Increased lipoprotein lipase activity in non-small cell lung cancer tissue predicts shorter patient survival.

    PubMed

    Trost, Zoran; Sok, Miha; Marc, Janja; Cerne, Darko

    2009-07-01

    Cumulative evidence suggests the involvement of lipoprotein lipase (LPL) in tumor progression. We tested the hypothesis that increased LPL activity in resectable non-small cell lung cancer (NSCLC) tissue and the increased LPL gene expression in the surrounding non-cancer lung tissue found in our previous study are predictors of patient survival. Forty two consecutive patients with resected NSCLC were enrolled in the study. Paired samples of lung cancer tissue and adjacent non-cancer lung tissue were collected from resected specimens for baseline LPL activity and gene expression estimation. During a 4-year follow-up, 21 patients died due to tumor progression. One patient died due to a non-cancer reason and was not included in Cox regression analysis. High LPL activity in cancer tissue (relative to the adjacent non-cancer lung tissue) predicted shorter survival, independently of standard prognostic factors (p=0.003). High gene expression in the non-cancer lung tissue surrounding the tumor had no predictive value. Our study further underlines the involvement of cancer tissue LPL activity in tumor progression.

  4. Transcriptomics Reveal Several Gene Expression Patterns in the Piezophile Desulfovibrio hydrothermalis in Response to Hydrostatic Pressure

    PubMed Central

    Amrani, Amira; Bergon, Aurélie; Holota, Hélène; Tamburini, Christian; Garel, Marc; Ollivier, Bernard; Imbert, Jean; Dolla, Alain; Pradel, Nathalie

    2014-01-01

    RNA-seq was used to study the response of Desulfovibrio hydrothermalis, isolated from a deep-sea hydrothermal chimney on the East-Pacific Rise at a depth of 2,600 m, to various hydrostatic pressure growth conditions. The transcriptomic datasets obtained after growth at 26, 10 and 0.1 MPa identified only 65 differentially expressed genes that were distributed among four main categories: aromatic amino acid and glutamate metabolisms, energy metabolism, signal transduction, and unknown function. The gene expression patterns suggest that D. hydrothermalis uses at least three different adaptation mechanisms, according to a hydrostatic pressure threshold (HPt) that was estimated to be above 10 MPa. Both glutamate and energy metabolism were found to play crucial roles in these mechanisms. Quantitation of the glutamate levels in cells revealed its accumulation at high hydrostatic pressure, suggesting its role as a piezolyte. ATP measurements showed that the energy metabolism of this bacterium is optimized for deep-sea life conditions. This study provides new insights into the molecular mechanisms linked to hydrostatic pressure adaptation in sulfate-reducing bacteria. PMID:25215865

  5. Transcript and protein environmental biomarkers in fish--a review.

    PubMed

    Tom, Moshe; Auslander, Meirav

    2005-04-01

    The levels of contaminant-affected gene products (transcripts and proteins) are increasingly utilized as environmental biomarkers, and their appropriate implementation as diagnostic tools is discussed. The required characteristics of a gene product biomarker are accurate evaluation using properly normalized absolute units, aiming at long-term comparability of biomarker levels over a wide geographical range and among many laboratories. Quantitative RT-PCR and competitive ELISA are suggested as preferred evaluation methods for transcript and protein, respectively. Constitutively expressed RNAs or proteins which are part of the examined homogenate are suggested as normalizing agents, compensating for variable processing efficiency. Essential characterization of expression patterns is suggested, providing reference values to be compared to the monitored levels. This comparison would enable estimation of the intensity of biological effects of contaminants. Contaminant-independent reference expression patterns should include natural fluctuations of the biomarker level. Contaminant-dependent patterns should include dose response to model contaminants chronically administered in two environmentally-realistic routes, reaching extreme sub-lethal affected levels. Recent studies using fish as environmental sentinel species, applying gene products as environmental biomarkers, and implementing at least part of the depicted methodologies are reviewed.

  6. Suppressive effects of retinoids, carotenoids and antioxidant vitamins on heterocyclic amine-induced umu C gene expression in Salmonella typhimurium (TA 1535/pSK 1002).

    PubMed

    Okai, Y; Higashi-Okai, K; Nakamura, S; Yano, Y; Otani, S

    1996-06-12

    The effects of retinoids, carotenoids and antioxidant vitamins were studied using the mutagen-induced umu C gene expression system in Salmonella typhimurium (TA 1535/pSK 1002). Retinol (vitamin A), retinol acetate and retinoic acid showed remarkable inhibitory activities, whereas retinol palmitate exhibited significant but weak activity for umu C gene expression in tester bacteria induced by 3-amino-1,4-dimethyl-5H-pyrido[4,3-b]indole (Trp-P-1) in the presence of hepatic metabolizing enzymes (S9 mixture). Carotenoids having provitamin A activity (beta-carotene and canthaxanthin) exhibited moderate suppressive effects on the same experimental system. The ranks of suppressive activities were retinol > retinol acetate > retinoic acid > canthaxanthin > beta-carotene > retinol palmitate and their doses for inhibition by 50% (ID50) were estimated to be 1.2 x 10^-7, 3.0 x 10^-7, 5.4 x 10^-7, 1.5 x 10^-6, 4.0 x 10^-5 and 6.0 x 10^-5 M, respectively. However, they did not cause significant inhibition of umu C gene expression induced by a direct-acting mutagen (adriamycin or mitomycin C) in the absence of S9 mixture. Inhibition of umu gene expression appears to be due to inhibition of P450-mediated metabolic activation of the heterocyclic amine Trp-P-1. Ascorbic acid (vitamin C) showed weak but significant suppressive activity at high-dose concentrations (3 x 10^-6 to 10^-4 M). However, alpha-tocopherol did not exhibit significant suppression at any dose concentration. The significance of the experimental results is discussed from the viewpoint of chemoprevention against genotoxicity associated with carcinogenesis.

  7. Circular RNAs Are the Predominant Transcript Isoform from Hundreds of Human Genes in Diverse Cell Types

    PubMed Central

    Wang, Peter Lincoln; Lacayo, Norman; Brown, Patrick O.

    2012-01-01

    Most human pre-mRNAs are spliced into linear molecules that retain the exon order defined by the genomic sequence. By deep sequencing of RNA from a variety of normal and malignant human cells, we found RNA transcripts from many human genes in which the exons were arranged in a non-canonical order. Statistical estimates and biochemical assays provided strong evidence that a substantial fraction of the spliced transcripts from hundreds of genes are circular RNAs. Our results suggest that a non-canonical mode of RNA splicing, resulting in a circular RNA isoform, is a general feature of the gene expression program in human cells. PMID:22319583

  8. The Interrelationship between Promoter Strength, Gene Expression, and Growth Rate

    PubMed Central

    Klesmith, Justin R.; Detwiler, Emily E.; Tomek, Kyle J.; Whitehead, Timothy A.

    2014-01-01

    In exponentially growing bacteria, expression of heterologous protein impedes cellular growth rates. Quantitative understanding of the relationship between expression and growth rate will advance our ability to forward engineer bacteria, important for metabolic engineering and synthetic biology applications. Recently, a work described a scaling model based on optimal allocation of ribosomes for protein translation. This model quantitatively predicts a linear relationship between microbial growth rate and heterologous protein expression with no free parameters. With the aim of validating this model, we have rigorously quantified the fitness cost of gene expression by using a library of synthetic constitutive promoters to drive expression of two separate proteins (eGFP and amiE) in E. coli in different strains and growth media. In all cases, we demonstrate that the fitness cost is consistent with the previous findings. We expand upon the previous theory by introducing a simple promoter activity model to quantitatively predict how basal promoter strength relates to growth rate and protein expression. We then estimate the amount of protein expression needed to support high flux through a heterologous metabolic pathway and predict the sizable fitness cost associated with enzyme production. This work has broad implications across applied biological sciences because it allows for prediction of the interplay between promoter strength, protein expression, and the resulting cost to microbial growth rates. PMID:25286161

  9. Use of a Novel Embryonic Mammary Stem Cell Gene Signature to Improve Human Breast Cancer Diagnostics and Therapeutic Decision Making

    DTIC Science & Technology

    2013-10-01

    dilution transplantation functional assays, we estimated the fMaSC population to be 10-20% pure. Therefore, we inferred that its gene expression...measured by the gold standard of in vivo transplantation . This approach will not only enable us to identify biomarkers useful for prospectively...image of two of the 96 Fluidigm-C1 capture wells containing candidate fMaSC cells. green=live (Calcein-AM), red= dead (Ethidium Bromide). (B) RT-PCR

  10. DEsingle for detecting three types of differential expression in single-cell RNA-seq data.

    PubMed

    Miao, Zhun; Deng, Ke; Wang, Xiaowo; Zhang, Xuegong

    2018-04-24

    The excess zeros in single-cell RNA-seq data include "real" zeros due to the on-off nature of gene transcription in single cells and "dropout" zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor's consideration now. Contact: zhangxg@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
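
    The zero-inflated negative binomial (ZINB) distribution named in this abstract can be written down in a few lines. The sketch below is a generic ZINB probability mass function, not DEsingle's code or API: the function names and the (theta, size, mu) parameterization are assumptions, and it does not assert which mixture component the package labels as "real" or "dropout" zeros.

```python
import math

def nb_pmf(y, size, mu):
    """Negative binomial pmf with dispersion `size` > 0 and mean `mu` > 0."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)

def zinb_pmf(y, theta, size, mu):
    """ZINB pmf: a point mass at zero with weight theta mixed with a negative binomial."""
    base = (1.0 - theta) * nb_pmf(y, size, mu)
    return theta + base if y == 0 else base

def inflation_share_of_zeros(theta, size, mu):
    """Share of expected zeros that comes from the zero-inflation component."""
    return theta / zinb_pmf(0, theta, size, mu)
```

    Under this model a zero count can arise from either component, so the two kinds of zeros are separated only in expectation through the fitted parameters, not observation by observation.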

  11. Repeated Measurements on Distinct Scales With Censoring—A Bayesian Approach Applied to Microarray Analysis of Maize

    PubMed Central

    Love, Tanzy; Carriquiry, Alicia

    2009-01-01

    We analyze data collected in a somatic embryogenesis experiment carried out on Zea mays at Iowa State University. The main objective of the study was to identify the set of genes in maize that actively participate in embryo development. Embryo tissue was sampled and analyzed at various time periods and under different media and light conditions. As is the case in many microarray experiments, the operator scanned each slide multiple times to find the slide-specific ‘optimal’ laser and sensor settings. The multiple readings of each slide are repeated measurements on different scales with differing censoring; they cannot be considered to be replicate measurements in the traditional sense. Yet it has been shown that the choice of reading can have an impact on genetic inference. We propose a hierarchical modeling approach to estimating gene expression that combines all available readings on each spot and accounts for censoring in the observed values. We assess the statistical properties of the proposed expression estimates using a simulation experiment. As expected, combining all available scans using an approach with good statistical properties results in expression estimates with noticeably lower bias and root mean squared error relative to other approaches that have been proposed in the literature. Inferences drawn from the somatic embryogenesis experiment, which motivated this work, changed drastically when data were analyzed using the standard approaches or using the methodology we propose. PMID:19960120

  12. Unsupervised Bayesian linear unmixing of gene expression microarrays.

    PubMed

    Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O

    2013-03-19

    This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). 
The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
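
    The constrained mixture described in this abstract can be written compactly. The notation below is an assumption made for illustration (the abstract gives no symbols): y_i is the expression vector of sample i, m_r is the r-th factor (gene signature), a_{r,i} is the factor score of factor r in sample i, R is the number of factors, and n_i is a noise term whose presence and form are assumed.

```latex
\mathbf{y}_i = \sum_{r=1}^{R} a_{r,i}\,\mathbf{m}_r + \mathbf{n}_i,
\qquad m_{g,r} \ge 0,
\qquad a_{r,i} \ge 0,
\qquad \sum_{r=1}^{R} a_{r,i} = 1 .
```

    The last two constraints are what make each sample's factor scores a probability distribution over the factors, and the first makes the loadings non-negative, as stated in the abstract.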

  13. Genomic data assimilation for estimating hybrid functional Petri net from time-course gene expression data.

    PubMed

    Nagasaki, Masao; Yamaguchi, Rui; Yoshida, Ryo; Imoto, Seiya; Doi, Atsushi; Tamada, Yoshinori; Matsuno, Hiroshi; Miyano, Satoru; Higuchi, Tomoyuki

    2006-01-01

    We propose an automatic construction method of the hybrid functional Petri net as a simulation model of biological pathways. The problems we consider are how we choose the values of parameters and how we set the network structure. Usually, we tune these unknown factors empirically so that the simulation results are consistent with biological knowledge. Obviously, this approach limits the size of the network that can be studied. To extend the capability of the simulation model, we propose the use of a data assimilation approach that was originally established in the field of geophysical simulation science. We provide a genomic data assimilation framework that establishes a link between our simulation model and observed data like microarray gene expression data by using a nonlinear state space model. A key idea of our genomic data assimilation is that the unknown parameters in the simulation model are treated as parameters of the state space model and the estimates are obtained as the maximum a posteriori estimators. In the parameter estimation process, the simulation model is used to generate the system model in the state space model. Such a formulation enables us to handle both the model construction and the parameter tuning within a framework of Bayesian statistical inference. In particular, the Bayesian approach provides us with a way of controlling overfitting during parameter estimation, which is essential for constructing a reliable biological pathway. We demonstrate the effectiveness of our approach using synthetic data. As a result, parameter estimation using genomic data assimilation works very well and the network structure is suitably selected.

  14. Global parameter estimation for thermodynamic models of transcriptional regulation.

    PubMed

    Suleimenov, Yerzhan; Ay, Ahmet; Samee, Md Abul Hassan; Dresch, Jacqueline M; Sinha, Saurabh; Arnosti, David N

    2013-07-15

    Deciphering the mechanisms involved in gene regulation holds the key to understanding the control of central biological processes, including human disease, population variation, and the evolution of morphological innovations. New experimental techniques including whole genome sequencing and transcriptome analysis have enabled comprehensive modeling approaches to study gene regulation. In many cases, it is useful to be able to assign biological significance to the inferred model parameters, but such interpretation should take into account features that affect these parameters, including model construction and sensitivity, the type of fitness calculation, and the effectiveness of parameter estimation. This last point is often neglected, as estimation methods are often selected for historical reasons or for computational ease. Here, we compare the performance of two parameter estimation techniques broadly representative of local and global approaches, namely, a quasi-Newton/Nelder-Mead simplex (QN/NMS) method and a covariance matrix adaptation-evolutionary strategy (CMA-ES) method. The estimation methods were applied to a set of thermodynamic models of gene transcription applied to regulatory elements active in the Drosophila embryo. Measuring overall fit, the global CMA-ES method performed significantly better than the local QN/NMS method on high quality data sets, but this difference was negligible on lower quality data sets with increased noise or on data sets simplified by stringent thresholding. Our results suggest that the choice of parameter estimation technique for evaluation of gene expression models depends both on quality of data, the nature of the models and the aims of the modeling effort. Copyright © 2013 Elsevier Inc. All rights reserved.
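
    The local-versus-global distinction in this abstract can be illustrated with a toy sketch. It uses neither QN/NMS nor CMA-ES: a step-halving local search stands in for the local method and a multi-start wrapper stands in for a global strategy. The objective function, starting points and function names are all hypothetical.

```python
def objective(x):
    """Toy double-well cost: local minimum near x = 1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def local_search(f, x0, step=0.5, tol=1e-6):
    """Derivative-free local search: move to a better neighbour, else halve the step."""
    x, fx = x0, f(x0)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            # Neither neighbour improved the cost, so refine the step size.
            step /= 2.0
    return x, fx


def multi_start(f, starts):
    """Run the local search from several starting points and keep the best result."""
    return min((local_search(f, x0) for x0 in starts), key=lambda result: result[1])


# A single local run started in the right-hand basin stays in that basin.
x_local, f_local = local_search(objective, 1.5)

# Restarting from a grid of points also explores the left-hand (global) basin.
x_global, f_global = multi_start(objective, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

    In this toy case the single local run settles in the shallower basin while the multi-start run finds the deeper one. The abstract's own finding is that such a gap may shrink on noisier or heavily thresholded data.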

  15. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    PubMed Central

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836

  16. Minnelide: A Novel Therapeutic That Promotes Apoptosis in Non-Small Cell Lung Carcinoma In Vivo

    PubMed Central

    Rousalova, Ilona; Banerjee, Sulagna; Sangwan, Veena; Evenson, Kristen; McCauley, Joel A.; Kratzke, Robert; Vickers, Selwyn M.; Saluja, Ashok; D’Cunha, Jonathan

    2013-01-01

    Background Minnelide, a pro-drug of triptolide, has recently emerged as a potent anticancer agent. The precise mechanisms of its cytotoxic effects remain unclear. Methods Cell viability was studied using CCK8 assay. Cell proliferation was measured real-time on cultured cells using Electric Cell Substrate Impedance Sensing (ECIS). Apoptosis was assayed by Caspase activity on cultured lung cancer cells and TUNEL staining on tissue sections. Expression of pro-survival and anti-apoptotic genes (HSP70, BIRC5, BIRC4, BIRC2, UACA, APAF-1) was estimated by qRT-PCR. Effect of Minnelide on proliferative cells in the tissue was estimated by Ki-67 staining of animal tissue sections. Results In this study, we investigated in vitro and in vivo antitumor effects of triptolide/Minnelide in non-small cell lung carcinoma (NSCLC). Triptolide/Minnelide exhibited anti-proliferative effects and induced apoptosis in NSCLC cell lines and NSCLC mouse models. Triptolide/Minnelide significantly down-regulated the expression of pro-survival and anti-apoptotic genes (HSP70, BIRC5, BIRC4, BIRC2, UACA) and up-regulated pro-apoptotic APAF-1 gene, in part, via attenuating the NF-κB signaling activity. Conclusion In conclusion, our results provide supporting mechanistic evidence for Minnelide as a potential therapeutic in NSCLC. PMID:24143232

  17. Minnelide: a novel therapeutic that promotes apoptosis in non-small cell lung carcinoma in vivo.

    PubMed

    Rousalova, Ilona; Banerjee, Sulagna; Sangwan, Veena; Evenson, Kristen; McCauley, Joel A; Kratzke, Robert; Vickers, Selwyn M; Saluja, Ashok; D'Cunha, Jonathan

    2013-01-01

    Minnelide, a pro-drug of triptolide, has recently emerged as a potent anticancer agent. The precise mechanisms of its cytotoxic effects remain unclear. Cell viability was studied using CCK8 assay. Cell proliferation was measured real-time on cultured cells using Electric Cell Substrate Impedance Sensing (ECIS). Apoptosis was assayed by Caspase activity on cultured lung cancer cells and TUNEL staining on tissue sections. Expression of pro-survival and anti-apoptotic genes (HSP70, BIRC5, BIRC4, BIRC2, UACA, APAF-1) was estimated by qRT-PCR. Effect of Minnelide on proliferative cells in the tissue was estimated by Ki-67 staining of animal tissue sections. In this study, we investigated in vitro and in vivo antitumor effects of triptolide/Minnelide in non-small cell lung carcinoma (NSCLC). Triptolide/Minnelide exhibited anti-proliferative effects and induced apoptosis in NSCLC cell lines and NSCLC mouse models. Triptolide/Minnelide significantly down-regulated the expression of pro-survival and anti-apoptotic genes (HSP70, BIRC5, BIRC4, BIRC2, UACA) and up-regulated pro-apoptotic APAF-1 gene, in part, via attenuating the NF-κB signaling activity. In conclusion, our results provide supporting mechanistic evidence for Minnelide as a potential therapeutic in NSCLC.

  18. BoolFilter: an R package for estimation and identification of partially-observed Boolean dynamical systems.

    PubMed

    Mcclenny, Levi D; Imani, Mahdi; Braga-Neto, Ulisses M

    2017-11-25

    Gene regulatory networks govern the function of key cellular processes, such as control of the cell cycle, response to stress, DNA repair mechanisms, and more. Boolean networks have been used successfully in modeling gene regulatory networks. In the Boolean network model, the transcriptional state of each gene is represented by 0 (inactive) or 1 (active), and the relationship among genes is represented by logical gates updated at discrete time points. However, the Boolean gene states are never observed directly, but only indirectly and incompletely through noisy measurements based on expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays. The Partially-Observed Boolean Dynamical System (POBDS) signal model is distinct from other deterministic and stochastic Boolean network models in removing the requirement of a directly observable Boolean state vector and allowing uncertainty in the measurement process, addressing the scenario encountered in practice in transcriptomic analysis. BoolFilter is an R package that implements the POBDS model and associated algorithms for state and parameter estimation. It allows the user to estimate the Boolean states, network topology, and measurement parameters from time series of transcriptomic data using exact and approximated (particle) filters, as well as simulate the transcriptomic data for a given Boolean network model. Some of its infrastructure, such as the network interface, is the same as in the previously published R package for Boolean Networks BoolNet, which enhances compatibility and user accessibility to the new package. We introduce the R package BoolFilter for Partially-Observed Boolean Dynamical Systems (POBDS). 
The BoolFilter package provides a useful toolbox for the bioinformatics community, with state-of-the-art algorithms for simulation of time series transcriptomic data as well as the inverse process of system identification from data obtained with various expression technologies such as cDNA microarrays, RNA-Seq, and cell imaging-based assays.
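
    The signal model described in this abstract (hidden Boolean gene states updated by logical rules and seen only through noisy measurements) can be sketched in a few lines. This is not the BoolFilter API, which is an R package. The three-gene network, the bit-flip noise model and all function names are assumptions made for illustration.

```python
import random


def step(state, rules):
    """Advance the Boolean network one time step by applying each gene's rule."""
    return tuple(int(rule(state)) for rule in rules)


def observe(state, flip_prob, rng):
    """Noisy measurement: each bit is flipped independently with probability flip_prob."""
    return tuple(bit ^ (rng.random() < flip_prob) for bit in state)


def simulate(initial, rules, n_steps, flip_prob=0.1, seed=0):
    """Return the hidden state trajectory and the corresponding noisy readings."""
    rng = random.Random(seed)
    states, readings = [], []
    state = tuple(initial)
    for _ in range(n_steps):
        states.append(state)
        readings.append(observe(state, flip_prob, rng))
        state = step(state, rules)
    return states, readings


# Hypothetical three-gene network, used only to exercise the functions above.
RULES = [
    lambda s: not s[2],       # gene 0 is repressed by gene 2
    lambda s: s[0],           # gene 1 follows gene 0
    lambda s: s[0] and s[1],  # gene 2 requires both gene 0 and gene 1
]

hidden, noisy = simulate((1, 0, 0), RULES, n_steps=6, flip_prob=0.2)
```

    State estimation in this setting means recovering `hidden` from `noisy` alone, which is the filtering problem the package addresses.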

  19. In Silico Estimation of Translation Efficiency in Human Cell Lines: Potential Evidence for Widespread Translational Control

    PubMed Central

    Stevens, Stewart G.; Brown, Chris M

    2013-01-01

    Recently large scale transcriptome and proteome datasets for human cells have become available. A striking finding from these studies is that the level of an mRNA typically predicts no more than 40% of the abundance of protein. This correlation represents the overall figure for all genes. We present here a bioinformatic analysis of translation efficiency – the rate at which mRNA is translated into protein. We have analysed those human datasets that include genome wide mRNA and protein levels determined in the same study. The analysis comprises five distinct human cell lines that together provide comparable data for 8,170 genes. For each gene we have used levels of mRNA and protein combined with protein stability data from the HeLa cell line to estimate translation efficiency. This was possible for 3,990 genes in one or more cell lines and 1,807 genes in all five cell lines. Interestingly, our analysis and modelling shows that for many genes this estimated translation efficiency has considerable consistency between cell lines. Some deviations from this consistency likely result from the regulation of protein degradation. Others are likely due to known translational control mechanisms. These findings suggest it will be possible to build improved models for the interpretation of mRNA expression data. The results we present here provide a view of translation efficiency for many genes. We provide an online resource allowing the exploration of translation efficiency in genes of interest within different cell lines (http://bioanalysis.otago.ac.nz/TranslationEfficiency). PMID:23460887
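
    The abstract states that mRNA level, protein level and protein stability were combined to estimate translation efficiency, but it does not give the model. A minimal sketch under one common assumption (steady state, first-order protein degradation) follows. The formula, units and function names are assumptions, not the authors' published method.

```python
import math


def degradation_rate(half_life_hours):
    """First-order decay constant (per hour) from a protein half-life in hours."""
    return math.log(2) / half_life_hours


def translation_efficiency(protein, mrna, half_life_hours):
    """Steady-state estimate: synthesis balances decay, so k_tl = protein * k_deg / mrna.

    Units are arbitrary here: proteins made per mRNA per hour if the inputs
    are copy numbers per cell.
    """
    return protein * degradation_rate(half_life_hours) / mrna


# Hypothetical gene: 50,000 protein copies, 20 mRNA copies, 24 h protein half-life.
example = translation_efficiency(50_000, 20, 24.0)
```

    Under this assumption, two genes with equal mRNA and protein levels but different half-lives get different efficiency estimates, which is why the stability data matter.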

  20. Competition between the sperm of a single male can increase the evolutionary rate of haploid expressed genes.

    PubMed

    Ezawa, Kiyoshi; Innan, Hideki

    2013-07-01

    The population genetic behavior of mutations in sperm genes is theoretically investigated. We modeled the processes at two levels. One is the standard population genetic process, in which the population allele frequencies change generation by generation, depending on the difference in selective advantages. The other is the sperm competition during each genetic transmission from one generation to the next generation. For the sperm competition process, we formulate the situation where a huge number of sperm with alleles A and B, produced by a single heterozygous male, compete to fertilize a single egg. This "minimal model" demonstrates that a very slight difference in sperm performance amounts to quite a large difference between the alleles' winning probabilities. By incorporating this effect of paternity-sharing sperm competition into the standard population genetic process, we show that fierce sperm competition can enhance the fixation probability of a mutation with a very small phenotypic effect at the single-sperm level, suggesting a contribution of sperm competition to rapid amino acid substitutions in haploid-expressed sperm genes. Considering recent genome-wide demonstrations that a substantial fraction of the mammalian sperm genes are haploid expressed, our model could provide a potential explanation of rapid evolution of sperm genes with a wide variety of functions (as long as they are expressed in the haploid phase). Another advantage of our model is that it is applicable to a wide range of species, irrespective of whether the species is externally fertilizing, polygamous, or monogamous. The theoretical result was applied to mammalian data to estimate the selection intensity on nonsynonymous mutations in sperm genes.

  1. Competition Between the Sperm of a Single Male Can Increase the Evolutionary Rate of Haploid Expressed Genes

    PubMed Central

    Ezawa, Kiyoshi; Innan, Hideki

    2013-01-01

    The population genetic behavior of mutations in sperm genes is theoretically investigated. We modeled the processes at two levels. One is the standard population genetic process, in which the population allele frequencies change generation by generation, depending on the difference in selective advantages. The other is the sperm competition during each genetic transmission from one generation to the next generation. For the sperm competition process, we formulate the situation where a huge number of sperm with alleles A and B, produced by a single heterozygous male, compete to fertilize a single egg. This “minimal model” demonstrates that a very slight difference in sperm performance amounts to quite a large difference between the alleles’ winning probabilities. By incorporating this effect of paternity-sharing sperm competition into the standard population genetic process, we show that fierce sperm competition can enhance the fixation probability of a mutation with a very small phenotypic effect at the single-sperm level, suggesting a contribution of sperm competition to rapid amino acid substitutions in haploid-expressed sperm genes. Considering recent genome-wide demonstrations that a substantial fraction of the mammalian sperm genes are haploid expressed, our model could provide a potential explanation of rapid evolution of sperm genes with a wide variety of functions (as long as they are expressed in the haploid phase). Another advantage of our model is that it is applicable to a wide range of species, irrespective of whether the species is externally fertilizing, polygamous, or monogamous. The theoretical result was applied to mammalian data to estimate the selection intensity on nonsynonymous mutations in sperm genes. PMID:23666936

  2. Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

    PubMed Central

    2011-01-01

    Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. 
Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389

  3. Root Transcriptomic Analysis Revealing the Importance of Energy Metabolism to the Development of Deep Roots in Rice (Oryza sativa L.).

    PubMed

    Lou, Qiaojun; Chen, Liang; Mei, Hanwei; Xu, Kai; Wei, Haibin; Feng, Fangjun; Li, Tiemei; Pang, Xiaomeng; Shi, Caiping; Luo, Lijun; Zhong, Yang

    2017-01-01

    Drought is the most serious abiotic stress limiting rice production, and deep root is the key contributor to drought avoidance. However, the genetic mechanism regulating the development of deep roots is largely unknown. In this study, the transcriptomes of 74 root samples from 37 rice varieties, representing the extreme genotypes of shallow or deep rooting, were surveyed by RNA-seq. The 13,242 differentially expressed genes (DEGs) between deep rooting and shallow rooting varieties (H vs. L) were enriched in the pathway of genetic information processing and metabolism, while the 1,052 DEGs between the deep roots and shallow roots from each of the plants (D vs. S) were significantly enriched in metabolic pathways especially energy metabolism. Ten quantitative trait transcripts (QTTs) were identified and some were involved in energy metabolism. Forty-nine candidate DEGs were confirmed by qRT-PCR and microarray. Through weighted gene co-expression network analysis (WGCNA), we found 18 hub genes. Surprisingly, all these hub genes expressed higher in deep roots than in shallow roots, furthermore half of them functioned in energy metabolism. We also estimated that the ATP production in the deep roots was faster than shallow roots. Our results provided a lot of reliable candidate genes to improve deep rooting, and firstly highlight the importance of energy metabolism to the development of deep roots.

  4. Root Transcriptomic Analysis Revealing the Importance of Energy Metabolism to the Development of Deep Roots in Rice (Oryza sativa L.)

    PubMed Central

    Lou, Qiaojun; Chen, Liang; Mei, Hanwei; Xu, Kai; Wei, Haibin; Feng, Fangjun; Li, Tiemei; Pang, Xiaomeng; Shi, Caiping; Luo, Lijun; Zhong, Yang

    2017-01-01

    Drought is the most serious abiotic stress limiting rice production, and deep root is the key contributor to drought avoidance. However, the genetic mechanism regulating the development of deep roots is largely unknown. In this study, the transcriptomes of 74 root samples from 37 rice varieties, representing the extreme genotypes of shallow or deep rooting, were surveyed by RNA-seq. The 13,242 differentially expressed genes (DEGs) between deep rooting and shallow rooting varieties (H vs. L) were enriched in the pathway of genetic information processing and metabolism, while the 1,052 DEGs between the deep roots and shallow roots from each of the plants (D vs. S) were significantly enriched in metabolic pathways especially energy metabolism. Ten quantitative trait transcripts (QTTs) were identified and some were involved in energy metabolism. Forty-nine candidate DEGs were confirmed by qRT-PCR and microarray. Through weighted gene co-expression network analysis (WGCNA), we found 18 hub genes. Surprisingly, all these hub genes expressed higher in deep roots than in shallow roots, furthermore half of them functioned in energy metabolism. We also estimated that the ATP production in the deep roots was faster than shallow roots. Our results provided a lot of reliable candidate genes to improve deep rooting, and firstly highlight the importance of energy metabolism to the development of deep roots. PMID:28798764

  5. Deregulation of HIF1-alpha and hypoxia-regulated pathways in hepatocellular carcinoma and corresponding non-malignant liver tissue--influence of a modulated host stroma on the prognosis of HCC.

    PubMed

    Simon, Frank; Bockhorn, Maximilian; Praha, Christian; Baba, Hideo A; Broelsch, Christoph E; Frilling, Andrea; Weber, Frank

    2010-04-01

    The aim of this study was to elucidate the role of HIF1A expression in hepatocellular carcinoma (HCC) and the corresponding non-malignant liver tissue and to correlate it with the clinical outcome of HCC patients after curative liver resection. HIF1A expression was determined by quantitative RT-PCR in HCC and corresponding non-malignant liver tissue of 53 patients surgically treated for HCC. High-density gene expression analysis and pathway analysis was performed on a selected subset of patients with high and low HIF1A expression in the non-malignant liver tissue. HIF1A over-expression in the apparently non-malignant liver tissue was a predictor of tumor recurrence and survival. The estimated 1-year and 5-year disease-free survival was significantly better in patients with low HIF1A expression in the non-malignant liver tissue when compared to those patients with high HIF1 expression (88.9% vs. 67.9% and 61.0% vs. 22.6%, respectively, p = 0.008). Based on molecular pathway analysis utilizing high-density gene-expression profiling, HIF1A related molecular networks were identified that contained genes involved in cell migration, cell homing, and cell-cell interaction. Our study identified a potential novel mechanism contributing to prognosis of HCC. The deregulation of HIF1A and its related pathways in the apparently non-malignant liver tissue provides for a modulated environment that potentially enhances or allows for HCC recurrence after curative resection.

  6. Comparative whole genome transcriptome and metabolome analyses of five Klebsiella pneumonia strains.

    PubMed

    Lee, Soojin; Kim, Borim; Yang, Jeongmo; Jeong, Daun; Park, Soohyun; Shin, Sang Heum; Kook, Jun Ho; Yang, Kap-Seok; Lee, Jinwon

    2015-11-01

    The integration of transcriptomics and metabolomics can provide precise information on gene-to-metabolite networks for identifying the function of novel genes. The goal of this study was to identify novel gene functions involved in 2,3-butanediol (2,3-BDO) biosynthesis by a comprehensive analysis of the transcriptome and metabolome of five mutated Klebsiella pneumonia strains (∆wabG = SGSB100, ∆wabG∆budA = SGSB106, ∆wabG∆budB = SGSB107, ∆wabG∆budC = SGSB108, ∆wabG∆budABC = SGSB109). First, the transcriptomes of all five mutants were analyzed and the genes exhibiting reproducible changes in expression were determined. The transcriptome was well conserved among the five strains, and differences in gene expression occurred mainly in genes coding for 2,3-BDO biosynthesis (budA, budB, and budC) and the genes involved in the degradation of reactive oxygen, biosynthesis and transport of arginine, cysteine biosynthesis, sulfur metabolism, oxidoreductase reaction, and formate dehydrogenase reaction. Second, differences in the metabolome (estimated by carbon distribution, CO2 emission, and redox balance) among the five mutant strains due to gene alteration of the 2,3-BDO operon were detected. The functional genomics approach integrating metabolomics and transcriptomics in K. pneumonia presented here provides an innovative means of identifying novel gene functions involved in 2,3-BDO biosynthesis metabolism and whole cell metabolism.

  7. Comparative transcriptomics indicate changes in cell wall organization and stress response in seedlings during spaceflight.

    PubMed

    Johnson, Christina M; Subramanian, Aswati; Pattathil, Sivakumar; Correll, Melanie J; Kiss, John Z

    2017-08-21

    Plants will play an important role in the future of space exploration as part of bioregenerative life support. Thus, it is important to understand the effects of microgravity and spaceflight on gene expression in plant development. We analyzed the transcriptome of Arabidopsis thaliana using the Biological Research in Canisters (BRIC) hardware during Space Shuttle mission STS-131. The bioinformatics methods used included RMA (robust multi-array average), MAS5 (Microarray Suite 5.0), and PLIER (probe logarithmic intensity error estimation). Glycome profiling was used to analyze cell wall composition in the samples. In addition, our results were compared to those of two other groups using the same hardware on the same mission (BRIC-16). In our BRIC-16 experiments, we noted expression changes in genes involved in hypoxia and heat shock responses, DNA repair, and cell wall structure between spaceflight samples compared to the ground controls. In addition, glycome profiling supported our expression analyses in that there was a difference in cell wall components between ground control and spaceflight-grown plants. Comparing our studies to those of the other BRIC-16 experiments demonstrated that, even with the same hardware and similar biological materials, differences in results in gene expression were found among these spaceflight experiments. A common theme from our BRIC-16 space experiments and those of the other two groups was the downregulation of water stress response genes in spaceflight. In addition, all three studies found differential regulation of genes associated with cell wall remodeling and stress responses between spaceflight-grown and ground control plants. © 2017 Botanical Society of America.

  8. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

    PubMed Central

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-01-01

    Background DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. Results GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network. PMID:19728865
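
    Matching a gene-expression signature against a profile by rank can be sketched as follows. This is a deliberately simplified score, not the Kolmogorov-Smirnov-based statistic of Lamb et al. and not GEM-TREND's implementation. The function names and the example profile are hypothetical.

```python
def rank_positions(profile):
    """Map each gene to its rank scaled to [0, 1]; 0 is the most up-regulated gene."""
    ordered = sorted(profile, key=profile.get, reverse=True)
    denom = max(len(ordered) - 1, 1)
    return {gene: index / denom for index, gene in enumerate(ordered)}


def signature_score(profile, up_genes, down_genes):
    """Mean rank of the 'down' genes minus mean rank of the 'up' genes.

    The score lies in [-1, 1]: positive when the profile agrees with the
    signature, negative when it is reversed, 0.0 when either set is absent.
    """
    positions = rank_positions(profile)
    up = [positions[g] for g in up_genes if g in positions]
    down = [positions[g] for g in down_genes if g in positions]
    if not up or not down:
        return 0.0
    return sum(down) / len(down) - sum(up) / len(up)


# Hypothetical expression ratios (log scale) for four genes in one data set.
profile = {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0}
score = signature_score(profile, up_genes=["A", "B"], down_genes=["C", "D"])
```

    Because only ranks enter the score, it is unaffected by any monotone rescaling of the expression values, which is the usual reason for choosing a nonparametric comparison across heterogeneous data sets.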

  9. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.

    PubMed

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-09-03

    DNA microarray technology provides us with a first step toward the goal of uncovering gene functions on a genomic scale. In recent years, vast amounts of gene expression data have been collected, much of which are available in public databases, such as the Gene Expression Omnibus (GEO). To date, most researchers have been manually retrieving data from databases through web browsers using accession numbers (IDs) or keywords, but gene-expression patterns are not considered when retrieving such data. The Connectivity Map was recently introduced to compare gene expression data by introducing gene-expression signatures (represented by a set of genes with up- or down-regulated labels according to their biological states) and is available as a web tool for detecting similar gene-expression signatures from a limited data set (approximately 7,000 expression profiles representing 1,309 compounds). In order to support researchers to utilize the public gene expression data more effectively, we developed a web tool for finding similar gene expression data and generating its co-expression networks from a publicly available database. GEM-TREND, a web tool for searching gene expression data, allows users to search data from GEO using gene-expression signatures or gene expression ratio data as a query and retrieve gene expression data by comparing gene-expression pattern between the query and GEO gene expression data. The comparison methods are based on the nonparametric, rank-based pattern matching approach of Lamb et al. (Science 2006) with the additional calculation of statistical significance. The web tool was tested using gene expression ratio data randomly extracted from the GEO and with in-house microarray data, respectively. The results validated the ability of GEM-TREND to retrieve gene expression entries biologically related to a query from GEO. 
For further analysis, a network visualization interface is also provided, whereby genes and gene annotations are dynamically linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.

  10. Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning.

    PubMed

    Zhao, Jonathan Z L; Mucaki, Eliseos J; Rogan, Peter K

    2018-01-01

    Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. 
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
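
    Forward sequential feature selection scored by k-fold validation, as described in the Methods above, can be sketched as follows. This is a simplified illustration under stated assumptions: a nearest-centroid classifier stands in for the Support Vector Machine used in the study, folds are assigned by sample index, and all function names are hypothetical.

```python
def centroid_predict(train_X, train_y, x, features):
    """Nearest-centroid prediction using only the chosen feature indices.

    Stand-in for the SVM of the study (an assumption of this sketch).
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(r[f] for r in rows) / len(rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def kfold_accuracy(X, y, features, k=5):
    """Fraction of samples classified correctly when each is held out once.

    Sample i is assigned to fold i % k.
    """
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            centroid_predict(train_X, train_y, X[i], features) == y[i]
            for i in held_out
        )
    return correct / len(X)


def forward_selection(X, y, max_features, k=5):
    """Greedily add the feature that most improves k-fold accuracy."""
    selected, best_acc = [], 0.0
    candidates = list(range(len(X[0])))
    while candidates and len(selected) < max_features:
        scored = [(kfold_accuracy(X, y, selected + [f], k), -f) for f in candidates]
        acc, neg_f = max(scored)  # ties go to the lowest feature index
        if acc <= best_acc:
            break  # no candidate improves the signature
        selected.append(-neg_f)
        candidates.remove(-neg_f)
        best_acc = acc
    return selected, best_acc
```

    Here `X` is a list of per-sample expression vectors and `y` the matching class labels; the function returns the indices of the selected genes together with the cross-validated accuracy of that signature. Backward selection would start from all features and remove one at a time instead.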

  11. MiR-203 involves in neuropathic pain development and represses Rap1a expression in nerve growth factor differentiated neuronal PC12 cells.

    PubMed

    Li, Haixia; Huang, Yuguang; Ma, Chao; Yu, Xuerong; Zhang, Zhiyong; Shen, Le

    2015-01-01

    Although microRNAs (miRNAs) have been shown to play a role in numerous biological processes, their function in neuropathic pain is not clear. The rat bilateral sciatic nerve chronic constriction injury (bCCI) is an established model of neuropathic pain, so we examined miRNA expression and function in the spinal dorsal horn in bCCI rats. Microarray and real-time polymerase chain reaction were used to examine the expression of miRNA in the nervous system of bCCI rats, and the targets of miRNA were predicted by bioinformatic approaches. The function of specific miRNA was estimated using genetic engineering methods. This study revealed substantially (∼10-fold) decreased miR-203 expression in the spinal dorsal horns but not the dorsal root ganglions, hippocampus, or anterior cingulate cortexes of bCCI rats. Rap1a protein expression was upregulated in bCCI rat spinal dorsal horns. We further verified that miR-203 directly targeted the 3'-untranslated region of the rap1a gene, thereby decreasing rap1a protein expression in neuron-like cells. Rap1a has diverse neuronal functions, and their perturbation is responsible for several mental disorders. For example, Rap1a/MEK/ERK is involved in peripheral sensitization. These data suggest a potential role for miR-203 in regulating neuropathic pain development, and Rap1a is a validated target gene in vitro. Results from our study and others indicate the possibility that Rap1a may be involved in pain. We hope that these results can provide support for future research into miR-203 in gene therapy for neuropathic pain.

  12. A chain reaction approach to modelling gene pathways.

    PubMed

    Cheng, Gary C; Chen, Dung-Tsa; Chen, James J; Soong, Seng-Jaw; Lamartiniere, Coral; Barnes, Stephen

    2012-08-01

    BACKGROUND: Of great interest in cancer prevention is how nutrient components affect gene pathways associated with the physiological events of puberty. Nutrient-gene interactions may cause changes in breast or prostate cells and, therefore, may result in cancer risk later in life. Analysis of gene pathways can lead to insights about nutrient-gene interactions and the development of more effective prevention approaches to reduce cancer risk. To date, researchers have relied heavily upon experimental assays (such as microarray analysis, etc.) to identify genes and their associated pathways that are affected by nutrient and diets. However, the vast number of genes and combinations of gene pathways, coupled with the expense of the experimental analyses, has delayed the progress of gene-pathway research. The development of an analytical approach based on available test data could greatly benefit the evaluation of gene pathways, and thus advance the study of nutrient-gene interactions in cancer prevention. In the present study, we have proposed a chain reaction model to simulate gene pathways, in which the gene expression changes through the pathway are represented by the species undergoing a set of chemical reactions. We have also developed a numerical tool to solve for the species changes due to the chain reactions over time. Through this approach we can examine the impact of nutrient-containing diets on the gene pathway; moreover, transformation of genes over time with a nutrient treatment can be observed numerically, which is very difficult to achieve experimentally. We apply this approach to microarray analysis data from an experiment which involved the effects of three polyphenols (nutrient treatments), epigallo-catechin-3-O-gallate (EGCG), genistein, and resveratrol, in a study of nutrient-gene interaction in the estrogen synthesis pathway during puberty. 
RESULTS: In this preliminary study, the estrogen synthesis pathway was simulated by a chain reaction model. By applying it to microarray data, the chain reaction model computed a set of reaction rates to examine the effects of three polyphenols (EGCG, genistein, and resveratrol) on gene expression in this pathway during puberty. We first performed statistical analysis to test the effect of the time factor on the estrogen synthesis pathway. Global tests were used to evaluate an overall gene expression change during puberty for each experimental group. Then, a chain reaction model was employed to simulate the estrogen synthesis pathway. Specifically, the model computed the reaction rates in a set of ordinary differential equations to describe interactions between genes in the pathway (a reaction rate K of A to B means that gene A induces gene B, per unit, at a rate of K; details are given in the "method" section). Since disparate changes of gene expression may cause numerical errors in solving these differential equations, we used an implicit scheme to address this issue. We first applied the chain reaction model to obtain the reaction rates for the control group. A sensitivity study was conducted to evaluate how well the model fits the control group data at Day 50. Results showed a small bias and mean square error. These observations indicated the model is robust to low levels of random noise and has a good fit for the control group. Then the chain reaction model derived from the control group data was used to predict gene expression at Day 50 for the three polyphenol groups. If these nutrients affect the estrogen synthesis pathways during puberty, we expect a discrepancy between observed and expected expressions. Results indicated some genes had large differences in the EGCG (e.g., Hsd3b and Sts) and the resveratrol (e.g., Hsd3b and Hrmt12) groups. 
CONCLUSIONS: In the present study, we have presented (I) experimental studies of the effect of nutrient diets on the gene expression changes in a selected estrogen synthesis pathway. This experiment is valuable because it allows us to examine how the nutrient-containing diets regulate gene expression in the estrogen synthesis pathway during puberty; (II) global tests to assess an overall association of this particular pathway with time factor by utilizing generalized linear models to analyze microarray data; and (III) a chain reaction model to simulate the pathway. This is a novel application because we are able to translate the gene pathway into the chemical reactions in which each reaction channel describes gene-gene relationship in the pathway. In the chain reaction model, the implicit scheme is employed to efficiently solve the differential equations. Data analysis results show the proposed model is capable of predicting gene expression changes and demonstrating the effect of nutrient-containing diets on gene expression changes in the pathway. One of the objectives of this study is to explore and develop a numerical approach for simulating the gene expression change so that it can be applied and calibrated when the data of more time slices are available, and thus can be used to interpolate the expression change at a desired time point without conducting expensive experiments for a large amount of time points. Hence, we are not claiming this is either essential or the most efficient way for simulating this problem, rather a mathematical/numerical approach that can model the expression change of a large set of genes of a complex pathway. In addition, we understand the limitation of this experiment and realize that it is still far from being a complete model of predicting nutrient-gene interactions. 
The reason is that in the present model, the reaction rates were estimated based on available data at two time points; hence, the gene expression change is dependent upon the reaction rates and a linear function of the gene expressions. More data sets containing gene expression at various time slices are needed in order to improve the present model so that a non-linear variation of gene expression changes at different time can be predicted.
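
    The idea of solving chain-reaction equations with an implicit scheme can be illustrated with a minimal backward Euler sketch. The model below is an assumption made for illustration only: a linear chain in which species i is converted to species i+1 at a first-order rate, with hypothetical rate values. It is not the authors' pathway model, and it does not include their estimation of reaction rates from data.

```python
def implicit_euler_chain(x0, rates, dt, steps):
    """Backward Euler for the linear chain x_0 -> x_1 -> ... -> x_n.

    Assumed model: dx_i/dt = rates[i-1] * x_{i-1} - rates[i] * x_i,
    where the last species is not consumed. rates has len(x0) - 1 entries.
    """
    x = list(x0)
    out_rates = list(rates) + [0.0]  # the final species has no outflow
    for _ in range(steps):
        new = []
        for i, xi in enumerate(x):
            # The implicit system is lower triangular, so each new value
            # depends only on the already updated upstream species.
            inflow = out_rates[i - 1] * new[i - 1] if i > 0 else 0.0
            new.append((xi + dt * inflow) / (1.0 + dt * out_rates[i]))
        x = new
    return x


# Hypothetical stiff example: one very fast and one slow reaction.
final_state = implicit_euler_chain([1.0, 0.0, 0.0], [1000.0, 1.0], dt=0.1, steps=50)
```

    With rates that differ by several orders of magnitude, an explicit scheme would need a much smaller time step to remain stable, whereas this implicit update stays non-negative and conserves the total amount at any step size.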

  13. Potential for quantifying expression of the Geobacteraceae citrate synthase gene to assess the activity of Geobacteraceae in the subsurface and on current-harvesting electrodes

    USGS Publications Warehouse

    Holmes, Dawn E.; Nevin, Kelly P.; O'Neil, Regina A.; Ward, Joy E.; Adams, Lorrie A.; Woodard, Trevor L.; Vrionis, Helen A.; Lovley, Derek R.

    2005-01-01

    The Geobacteraceae citrate synthase is phylogenetically distinct from those of other prokaryotes and is a key enzyme in the central metabolism of Geobacteraceae. Therefore, the potential for using levels of citrate synthase mRNA to estimate rates of Geobacter metabolism was evaluated in pure culture studies and in four different Geobacteraceae-dominated environments. Quantitative reverse transcription-PCR studies with mRNA extracted from cultures of Geobacter sulfurreducens grown in chemostats with Fe(III) as the electron acceptor or in batch with electrodes as the electron acceptor indicated that transcript levels of the citrate synthase gene, gltA, increased with increased rates of growth/Fe(III) reduction or current production, whereas the expression of the constitutively expressed housekeeping genes recA, rpoD, and proC remained relatively constant. Analysis of mRNA extracted from groundwater collected from a U(VI)-contaminated site undergoing in situ uranium bioremediation revealed a remarkable correspondence between acetate levels in the groundwater and levels of transcripts of gltA. The expression of gltA was also significantly greater in RNA extracted from groundwater beneath a highway runoff recharge pool that was exposed to calcium magnesium acetate in June, when acetate concentrations were high, than in October, when the levels had significantly decreased. It was also possible to detect gltA transcripts on current-harvesting anodes deployed in freshwater sediments. These results suggest that it is possible to monitor the in situ metabolic rate of Geobacteraceae by tracking the expression of the citrate synthase gene.

  14. Expression of CdDHN4, a Novel YSK2-Type Dehydrin Gene from Bermudagrass, Responses to Drought Stress through the ABA-Dependent Signal Pathway

    PubMed Central

    Lv, Aimin; Fan, Nana; Xie, Jianping; Yuan, Shili; An, Yuan; Zhou, Peng

    2017-01-01

    Dehydrin improves plant resistance to many abiotic stresses. In this study, the expression profiles of a dehydrin gene, CdDHN4, were estimated under various stresses and abscisic acid (ABA) treatments in two bermudagrasses (Cynodon dactylon L.): Tifway (drought-tolerant) and C299 (drought-sensitive). The expression of CdDHN4 was up-regulated by high temperatures, low temperatures, drought, salt and ABA. The sensitivity of CdDHN4 to ABA and the expression of CdDHN4 under drought conditions were higher in Tifway than in C299. A 1239-bp fragment, CdDHN4-P, the partial upstream sequence of the CdDHN4 gene, was cloned by genomic walking from Tifway. Bioinformatic analysis showed that the CdDHN4-P sequence possessed features typical of a plant promoter and contained many typical cis elements, including a transcription initiation site, a TATA-box, an ABRE, an MBS, a MYC, an LTRE, a TATC-box and a GT1-motif. Transient expression in tobacco leaves demonstrated that the promoter CdDHN4-P can be activated by ABA, drought and cold. These results indicate that CdDHN4 is regulated by an ABA-dependent signal pathway and that the high sensitivity of CdDHN4 to ABA might be an important mechanism enhancing the drought tolerance of bermudagrass. PMID:28559903

  15. Expression of CdDHN4, a Novel YSK2-Type Dehydrin Gene from Bermudagrass, Responses to Drought Stress through the ABA-Dependent Signal Pathway.

    PubMed

    Lv, Aimin; Fan, Nana; Xie, Jianping; Yuan, Shili; An, Yuan; Zhou, Peng

    2017-01-01

    Dehydrin improves plant resistance to many abiotic stresses. In this study, the expression profiles of a dehydrin gene, CdDHN4, were estimated under various stresses and abscisic acid (ABA) treatments in two bermudagrasses (Cynodon dactylon L.): Tifway (drought-tolerant) and C299 (drought-sensitive). The expression of CdDHN4 was up-regulated by high temperatures, low temperatures, drought, salt and ABA. The sensitivity of CdDHN4 to ABA and the expression of CdDHN4 under drought conditions were higher in Tifway than in C299. A 1239-bp fragment, CdDHN4-P, the partial upstream sequence of the CdDHN4 gene, was cloned by genomic walking from Tifway. Bioinformatic analysis showed that the CdDHN4-P sequence possessed features typical of a plant promoter and contained many typical cis elements, including a transcription initiation site, a TATA-box, an ABRE, an MBS, a MYC, an LTRE, a TATC-box and a GT1-motif. Transient expression in tobacco leaves demonstrated that the promoter CdDHN4-P can be activated by ABA, drought and cold. These results indicate that CdDHN4 is regulated by an ABA-dependent signal pathway and that the high sensitivity of CdDHN4 to ABA might be an important mechanism enhancing the drought tolerance of bermudagrass.

  16. A genome-wide 20 K citrus microarray for gene expression analysis

    PubMed Central

    Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose

    2008-01-01

    Background Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results We have designed and constructed a publicly available genome-wide cDNA microarray that includes 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database [1] was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, such as general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. Conclusion This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. 
Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global studies in citrus by using it to catalogue genes expressed in citrus globular embryos. PMID:18598343

  17. Immunogenicity and protective role of antigenic regions from five outer membrane proteins of Flavobacterium columnare in grass carp Ctenopharyngodon idella

    NASA Astrophysics Data System (ADS)

    Luo, Zhang; Liu, Zhixin; Fu, Jianping; Zhang, Qiusheng; Huang, Bei; Nie, Pin

    2016-11-01

    Flavobacterium columnare causes columnaris disease in freshwater fish. In the present study, the antigenic regions of five outer membrane proteins (OMPs), including zinc metalloprotease, prolyl oligopeptidase, thermolysin, collagenase and chondroitin AC lyase, were bioinformatically analyzed, fused together, and then expressed as a recombinant fusion protein in Escherichia coli. The expressed protein of 95.6 kDa, as estimated by 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis, was consistent with the molecular weight deduced from the amino acid sequence. The purified recombinant protein was used to vaccinate the grass carp, Ctenopharyngodon idella. Following vaccination of the fish their IgM antibody levels were examined, as was the expression of IgM, IgD and IgZ immunoglobulin genes and other genes such as MHC Iα and MHC IIβ, which are also involved in adaptive immunity. Interleukin genes (IL), including IL-1β, IL-8 and IL-10, and type I and type II interferon (IFN) genes were also examined. At 3 and 4 weeks post-vaccination (wpv), significant increases in IgM antibody levels were observed in the fish vaccinated with the recombinant fusion protein, and an increase in the expression levels of IgM, IgD and IgZ genes was also detected following the vaccinations, thus indicating that an adaptive immune response was induced by the vaccinations. Early increases in the expression levels of IL and IFN genes were also observed in the vaccinated fish. At four wpv, the fish were challenged with F. columnare, and the vaccinated fish showed a good level of protection against this pathogen, with 39% relative percent survival (RPS) compared with the control group. It can be concluded, therefore, that the five OMPs, in the form of a recombinant fusion protein vaccine, induced an immune response in fish and protection against F. columnare.
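
    Relative percent survival is conventionally computed from the cumulative mortality of the vaccinated and control groups as RPS = (1 - vaccinated mortality / control mortality) x 100. The short sketch below applies that standard formula; the function name is hypothetical and the mortality values in the example are invented for illustration, since the abstract reports only the resulting 39% figure.

```python
def relative_percent_survival(vaccinated_mortality, control_mortality):
    """RPS = (1 - vaccinated mortality / control mortality) * 100.

    Both arguments are cumulative mortalities expressed as fractions (0 to 1).
    """
    if control_mortality <= 0:
        raise ValueError("control mortality must be positive")
    return (1.0 - vaccinated_mortality / control_mortality) * 100.0


# Hypothetical mortalities (not taken from the study) that give an RPS near 39%.
example_rps = relative_percent_survival(0.488, 0.80)
```

    An RPS of 0 means the vaccine gave no benefit over the control group, and an RPS of 100 means no vaccinated fish died.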

  18. Linkage disequilibrium interval mapping of quantitative trait loci.

    PubMed

    Boitard, Simon; Abdallah, Jihad; de Rochambeau, Hubert; Cierco-Ayrolles, Christine; Mangin, Brigitte

    2006-03-16

    For many years gene mapping studies have been performed through linkage analyses based on pedigree data. Recently, linkage disequilibrium methods based on unrelated individuals have been advocated as powerful tools to refine estimates of gene location. Many strategies have been proposed to deal with simply inherited disease traits. However, locating quantitative trait loci is statistically more challenging and considerable research is needed to provide robust and computationally efficient methods. Under a three-locus Wright-Fisher model, we derived approximate expressions for the expected haplotype frequencies in a population. We considered haplotypes comprising one trait locus and two flanking markers. Using these theoretical expressions, we built a likelihood-maximization method, called HAPim, for estimating the location of a quantitative trait locus. For each postulated position, the method only requires information from the two flanking markers. Over a wide range of simulation scenarios it was found to be more accurate than a two-marker composite likelihood method. It also performed as well as identity by descent methods, whilst being valuable in a wider range of populations. Our method makes efficient use of marker information, and can be valuable for fine mapping purposes. Its performance is increased if multiallelic markers are available. Several improvements can be developed to account for more complex evolution scenarios or provide robust confidence intervals for the location estimates.

  19. Release of cell-free ice nuclei from Halomonas elongata expressing the ice nucleation gene inaZ of Pseudomonas syringae.

    PubMed

    Tegos, G; Vargas, C; Perysinakis, A; Koukkou, A I; Christogianni, A; Nieto, J J; Ventosa, A; Drainas, C

    2000-11-01

    Release of ice nuclei in the growth medium of recombinant Halomonas elongata cells expressing the inaZ gene of Pseudomonas syringae was studied in an attempt to produce cell-free active ice nuclei for biotechnological applications. Cell-free ice nuclei were not retained by cellulose acetate filters of 0.2 µm pore size. Highest activity of cell-free ice nuclei was obtained when cells were grown in low salinity (0.5-5% NaCl, w/v). Freezing temperature threshold, estimated to be below -7 °C, indicating class C nuclei, was not affected by medium salinity. Their density, as estimated by Percoll density centrifugation, was 1.018 ± 0.002 g ml⁻¹ and they were found to be free of lipids. Ice nuclei are released in the growth medium of recombinant H. elongata cells probably because of inefficient anchoring of the ice-nucleation protein aggregates in the outer membrane. The ice+ recombinant H. elongata cells could be useful for future use as a source of active cell-free ice nucleation protein.

  20. Expression profile of osteoprotegerin, RANK and RANKL genes in the femoral head of patients with avascular necrosis.

    PubMed

    Samara, Stavroula; Dailiana, Zoe; Chassanidis, Christos; Koromila, Theodora; Papatheodorou, Loukia; Malizos, Konstantinos N; Kollia, Panagoula

    2014-02-01

    Femoral head avascular necrosis (AVN) is a recalcitrant disease of the hip that leads to joint destruction. Osteoprotegerin (OPG), Receptor Activator of Nuclear Factor kappa-B (RANK) and RANK ligand (RANKL) regulate the balance between osteoclasts and osteoblasts. The expression of these genes affects the maturation and function of osteoblasts and osteoclasts and bone remodeling. In this study, we investigated the molecular pathways leading to AVN by studying the expression profile of OPG, RANK and RANKL genes. Quantitative Real Time-PCR was performed for evaluation of OPG, RANK and RANKL expression. Analysis was based on parallel evaluation of mRNA and protein levels in normal/necrotic sites of 42 osteonecrotic femoral heads (FHs). OPG and RANKL protein levels were estimated by western blotting. The OPG mRNA levels were higher (insignificantly) in the necrotic than the normal site (p > 0.05). Although the expression of RANK and RANKL was significantly lower than OPG in both sites, RANK and RANKL mRNA levels were higher in the necrotic part than the normal (p < 0.05). Protein levels of OPG and RANKL showed no remarkable divergence. Our results indicate differential expression mechanisms for OPG, RANK and RANKL that could play an important role in the progress of bone remodeling in the necrotic area, disturbing bone homeostasis. This finding may have an effect on the resulting bone destruction and the subsequent collapse of the hip joint. Copyright © 2013. Published by Elsevier Inc.

  1. Increased phosphatidylethanolamine N-methyltransferase gene expression in non-small-cell lung cancer tissue predicts shorter patient survival

    PubMed Central

    Zinrajh, David; Hörl, Gerd; Jürgens, Günther; Marc, Janja; Sok, Miha; Cerne, Darko

    2014-01-01

    Lipid mobilization is of great importance for tumor growth and studies have suggested that cancer cells exhibit abnormal choline phospholipid metabolism. In the present study, we hypothesized that phosphatidylethanolamine N-methyltransferase (PEMT) gene expression is increased in non-small-cell lung cancer (NSCLC) tissues and that increased gene expression acts as a predictor of shorter patient survival. Forty-two consecutive patients with resected NSCLC were enrolled in this study. Paired samples of lung cancer tissues and adjacent non-cancer lung tissues were collected from resected specimens for the estimation of PEMT expression. SYBR Green-based real-time polymerase chain reaction was used for quantification of PEMT mRNA in lung cancer tissues. Lipoprotein lipase (LPL) and fatty acid synthase (FASN) activities had already been measured in the same tissues. During a four-year follow-up, 21 patients succumbed to tumor progression. One patient did not survive due to non-cancer reasons and was not included in the analysis. Cox regression analysis was used to assess the prognostic value of PEMT expression. Our findings show that elevated PEMT expression in the cancer tissue, relative to that in the adjacent non-cancer lung tissue, predicts shorter patient survival independently of standard prognostic factors and also independently of increased LPL or FASN activity, the two other lipid-related predictors of shorter patient survival. These findings suggest that active phosphatidylcholine and/or choline metabolism are essential for tumor growth and progression. PMID:24932311

  2. Increased phosphatidylethanolamine N-methyltransferase gene expression in non-small-cell lung cancer tissue predicts shorter patient survival.

    PubMed

    Zinrajh, David; Hörl, Gerd; Jürgens, Günther; Marc, Janja; Sok, Miha; Cerne, Darko

    2014-06-01

    Lipid mobilization is of great importance for tumor growth and studies have suggested that cancer cells exhibit abnormal choline phospholipid metabolism. In the present study, we hypothesized that phosphatidylethanolamine N-methyltransferase (PEMT) gene expression is increased in non-small-cell lung cancer (NSCLC) tissues and that increased gene expression acts as a predictor of shorter patient survival. Forty-two consecutive patients with resected NSCLC were enrolled in this study. Paired samples of lung cancer tissues and adjacent non-cancer lung tissues were collected from resected specimens for the estimation of PEMT expression. SYBR Green-based real-time polymerase chain reaction was used for quantification of PEMT mRNA in lung cancer tissues. Lipoprotein lipase (LPL) and fatty acid synthase (FASN) activities had already been measured in the same tissues. During a four-year follow-up, 21 patients succumbed to tumor progression. One patient did not survive due to non-cancer reasons and was not included in the analysis. Cox regression analysis was used to assess the prognostic value of PEMT expression. Our findings show that elevated PEMT expression in the cancer tissue, relative to that in the adjacent non-cancer lung tissue, predicts shorter patient survival independently of standard prognostic factors and also independently of increased LPL or FASN activity, the two other lipid-related predictors of shorter patient survival. These findings suggest that active phosphatidylcholine and/or choline metabolism are essential for tumor growth and progression.

  3. Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks.

    PubMed

    Chen, Chi-Kan

    2017-07-26

    The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRN. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RE_RNN/RE_RMLP), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. The particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RE_RNN-RNN and RE_RMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RE_RMLP, using the RMLP with a suitable number of latent nodes to reduce the parameter dimension, often results in more accurate edge ranks than the RE_RNN using the regularized RNN on short simulated time series. When the networks derived by the RE_RMLP-RNN using different numbers of latent nodes in step one are combined by a weighted majority voting rule to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. 
The framework of two-step algorithms can potentially incorporate different nonlinear differential equation models to reconstruct the GRN.
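
    Combining several candidate networks by a weighted majority voting rule can be sketched as below. The abstract does not state how the weights or the decision threshold are chosen, so the weights, the 0.5 threshold, the function name and the toy edges are all assumptions made for illustration.

```python
def weighted_majority_vote(networks, weights, threshold=0.5):
    """Keep an edge if the networks containing it hold more than
    `threshold` of the total weight.

    networks: list of sets of directed edges (regulator, target).
    weights:  one non-negative weight per network (assumed given).
    """
    total = sum(weights)
    support = {}
    for edges, weight in zip(networks, weights):
        for edge in edges:
            support[edge] = support.get(edge, 0.0) + weight
    return {edge for edge, s in support.items() if s > threshold * total}


# Hypothetical candidate networks, e.g. from runs with different latent-node counts.
candidates = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("B", "C"), ("C", "A")},
]
consensus = weighted_majority_vote(candidates, [0.5, 0.3, 0.2])
```

    In this toy example the edges A to B and B to C are retained because the networks containing them carry most of the weight, while C to A is dropped.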

  4. A Robust Unified Approach to Analyzing Methylation and Gene Expression Data

    PubMed Central

    Khalili, Abbas; Huang, Tim; Lin, Shili

    2009-01-01

    Microarray technology has made it possible to investigate expression levels, and more recently methylation signatures, of thousands of genes simultaneously, in a biological sample. Since more and more data from different biological systems or technological platforms are being generated at an incredible rate, there is an increasing need to develop statistical methods that are applicable to multiple data types and platforms. Motivated by such a need, a flexible finite mixture model that is applicable to methylation, gene expression, and potentially data from other biological systems, is proposed. Two major thrusts of this approach are to allow for a variable number of components in the mixture to capture non-biological variation and small biases, and to use a robust procedure for parameter estimation and probe classification. The method was applied to the analysis of methylation signatures of three breast cancer cell lines. It was also tested on three sets of expression microarray data to study its power and type I error rates. Comparison with a number of existing methods in the literature yielded very encouraging results; lower type I error rates and comparable/better power were achieved based on the limited study. Furthermore, the method also leads to more biologically interpretable results for the three breast cancer cell lines. PMID:20161265

  5. RNA sequencing provides exquisite insight into the manipulation of the alveolar macrophage by tubercle bacilli.

    PubMed

    Nalpas, Nicolas C; Magee, David A; Conlon, Kevin M; Browne, John A; Healy, Claire; McLoughlin, Kirsten E; Rue-Albrecht, Kévin; McGettigan, Paul A; Killick, Kate E; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E

    2015-09-08

    Mycobacterium bovis, the agent of bovine tuberculosis, causes an estimated $3 billion annual losses to global agriculture due, in part, to the limitations of current diagnostics. Development of next-generation diagnostics requires a greater understanding of the interaction between the pathogen and the bovine host. Therefore, to explore the early response of the alveolar macrophage to infection, we report the first application of RNA-sequencing to define, in exquisite detail, the transcriptomes of M. bovis-infected and non-infected alveolar macrophages from ten calves at 2, 6, 24 and 48 hours post-infection. Differentially expressed sense genes were detected at these time points that revealed enrichment of innate immune signalling functions, and transcriptional suppression of host defence mechanisms (e.g., lysosome maturation). We also detected differentially expressed natural antisense transcripts, which may play a role in subverting innate immune mechanisms following infection. Furthermore, we report differential expression of novel bovine genes, some of which have immune-related functions based on orthology with human proteins. This is the first in-depth transcriptomics investigation of the alveolar macrophage response to the early stages of M. bovis infection and reveals complex patterns of gene expression and regulation that underlie the immunomodulatory mechanisms used by M. bovis to evade host defence mechanisms.

  6. The human orphan nuclear receptor PXR is activated by compounds that regulate CYP3A4 gene expression and cause drug interactions.

    PubMed Central

    Lehmann, J M; McKee, D D; Watson, M A; Willson, T M; Moore, J T; Kliewer, S A

    1998-01-01

    The cytochrome P-450 monooxygenase 3A4 (CYP3A4) is responsible for the oxidative metabolism of a wide variety of xenobiotics including an estimated 60% of all clinically used drugs. Although expression of the CYP3A4 gene is known to be induced in response to a variety of compounds, the mechanism underlying this induction, which represents a basis for drug interactions in patients, has remained unclear. We report the identification of a human (h) orphan nuclear receptor, termed the pregnane X receptor (PXR), that binds to a response element in the CYP3A4 promoter and is activated by a range of drugs known to induce CYP3A4 expression. Comparison of hPXR with the recently cloned mouse PXR reveals marked differences in their activation by certain drugs, which may account in part for the species-specific effects of compounds on CYP3A gene expression. These findings provide a molecular explanation for the ability of disparate chemicals to induce CYP3A4 levels and, furthermore, provide a basis for developing in vitro assays to aid in predicting whether drugs will interact in humans. PMID:9727070

  7. Breast Cancer and Early Onset Childhood Obesity: Cell Specific Gene Expression in Mammary Epithelia and Adipocytes

    DTIC Science & Technology

    2006-07-01


  8. Campylobacter jejuni dsb gene expression is regulated by iron in a Fur-dependent manner and by a translational coupling mechanism

    PubMed Central

    2011-01-01

    Background Many bacterial extracytoplasmic proteins are stabilized by intramolecular disulfide bridges that are formed post-translationally between their cysteine residues. This protein modification plays an important role in bacterial pathogenesis, and is facilitated by the Dsb (disulfide bond) family of the redox proteins. These proteins function in two parallel pathways in the periplasmic space: an oxidation pathway and an isomerization pathway. The Dsb oxidative pathway in Campylobacter jejuni is more complex than the one in the laboratory E. coli K-12 strain. Results In the C. jejuni 81-176 genome, the dsb genes of the oxidative pathway are arranged in three transcriptional units: dsbA2-dsbB-astA, dsbA1 and dba-dsbI. Their transcription responds to an environmental stimulus - iron availability - and is regulated in a Fur-dependent manner. Fur involvement in dsb gene regulation was proven by a reporter gene study in a C. jejuni wild type strain and its isogenic fur mutant. An electrophoretic mobility shift assay (EMSA) confirmed that analyzed genes are members of the Fur regulon but each of them is regulated by a disparate mechanism, and both the iron-free and the iron-complexed Fur are able to bind in vitro to the C. jejuni promoter regions. This study led to identification of a new iron- and Fur-regulated promoter that drives dsbA1 gene expression in an indirect way. Moreover, the present work documents that synthesis of DsbI oxidoreductase is controlled by the mechanism of translational coupling. The importance of a secondary dba-dsbI mRNA structure for dsbI mRNA translation was verified by estimating individual dsbI gene expression from its own promoter. Conclusions The present work shows that iron concentration is a significant factor in dsb gene transcription. 
These results support the concept that iron concentration - also through its influence on dsb gene expression - might control the abundance of extracytoplasmic proteins during different stages of infection. Our work further shows that synthesis of the DsbI membrane oxidoreductase is controlled by a translational coupling mechanism. Expression of dba is not only essential for translation of the downstream dsbI gene; the Dba protein produced might also regulate the activity and/or stability of DsbI. PMID:21787430

  9. Campylobacter jejuni dsb gene expression is regulated by iron in a Fur-dependent manner and by a translational coupling mechanism.

    PubMed

    Grabowska, Anna D; Wandel, Michał P; Łasica, Anna M; Nesteruk, Monika; Roszczenko, Paula; Wyszyńska, Agnieszka; Godlewska, Renata; Jagusztyn-Krynicka, Elzbieta K

    2011-07-25

    Many bacterial extracytoplasmic proteins are stabilized by intramolecular disulfide bridges that are formed post-translationally between their cysteine residues. This protein modification plays an important role in bacterial pathogenesis, and is facilitated by the Dsb (disulfide bond) family of the redox proteins. These proteins function in two parallel pathways in the periplasmic space: an oxidation pathway and an isomerization pathway. The Dsb oxidative pathway in Campylobacter jejuni is more complex than the one in the laboratory E. coli K-12 strain. In the C. jejuni 81-176 genome, the dsb genes of the oxidative pathway are arranged in three transcriptional units: dsbA2-dsbB-astA, dsbA1 and dba-dsbI. Their transcription responds to an environmental stimulus - iron availability - and is regulated in a Fur-dependent manner. Fur involvement in dsb gene regulation was proven by a reporter gene study in a C. jejuni wild type strain and its isogenic fur mutant. An electrophoretic mobility shift assay (EMSA) confirmed that analyzed genes are members of the Fur regulon but each of them is regulated by a disparate mechanism, and both the iron-free and the iron-complexed Fur are able to bind in vitro to the C. jejuni promoter regions. This study led to identification of a new iron- and Fur-regulated promoter that drives dsbA1 gene expression in an indirect way. Moreover, the present work documents that synthesis of DsbI oxidoreductase is controlled by the mechanism of translational coupling. The importance of a secondary dba-dsbI mRNA structure for dsbI mRNA translation was verified by estimating individual dsbI gene expression from its own promoter. The present work shows that iron concentration is a significant factor in dsb gene transcription. These results support the concept that iron concentration - also through its influence on dsb gene expression - might control the abundance of extracytoplasmic proteins during different stages of infection. 
Our work further shows that synthesis of the DsbI membrane oxidoreductase is controlled by a translational coupling mechanism. Expression of dba is not only essential for translation of the downstream dsbI gene; the Dba protein produced might also regulate the activity and/or stability of DsbI.

  10. Screening of potential genes contributing to the macrocycle drug resistance of C. albicans via microarray analysis

    PubMed Central

    Yang, Jing; Zhang, Wei; Sun, Jian; Xi, Zhiqin; Qiao, Zusha; Zhang, Jinyu; Wang, Yan; Ji, Ying; Feng, Wenli

    2017-01-01

    The aim of the present study was to investigate the potential genes involved in drug resistance of Candida albicans (C. albicans) by performing microarray analysis. The gene expression profile of GSE65396 was downloaded from the Gene Expression Omnibus, including a control, 15-min and 45-min macrocyclic compound RF59-treated group with three repeats for each. Following preprocessing using RAM, the differentially expressed genes (DEGs) were screened using the Limma package. Subsequently, the Kyoto Encyclopedia of Genes and Genomes pathways of these genes were analyzed using the Database for Annotation, Visualization and Integrated Discovery. Based on interactions estimated by the Search Tool for the Retrieval of Interacting Genes (STRING), the protein-protein interaction (PPI) network was visualized using Cytoscape. Subnetwork analysis was performed using ReactomeFI. A total of 154 upregulated and 27 downregulated DEGs were identified in the 15-min treated group, compared with the control, and 235 upregulated and 233 downregulated DEGs were identified in the 45-min treated group, compared with the control. The upregulated DEGs were significantly enriched in the ribosome pathway. Based on the PPI network, PRP5, RCL1, NOP13, NOP4 and MRT4 were the top five nodes in the 15-min treated comparison. GIS2, URA3, NOP58, ELP3 and PLP7 were the top five nodes in the 45-min treated comparison, and its subnetwork was significantly enriched in the ribosome pathway. The macrocyclic compound RF59 had a notable effect on the ribosome and its associated pathways of C. albicans. RCL1, NOP4, MRT4, GIS2 and NOP58 may be important in RF59-resistance. PMID:28944888

  11. Vitamin K2 alleviates type 2 diabetes in rats by induction of osteocalcin gene expression.

    PubMed

    Hussein, Atef G; Mohamed, Randa H; Shalaby, Sally M; Abd El Motteleb, Dalia M

    2018-03-01

    The biological mechanisms behind the association between vitamin K (Vit K) and glucose metabolism are uncertain. We aimed to analyze the expression of insulin 1 (Ins 1), insulin 2 (Ins 2), cyclin D2, adiponectin, and UCP-1. In addition, we aimed to estimate the doses of Vit K2 able to affect various aspects of glucose and energy metabolism in type 2 diabetes. Thirty adult male rats were allocated equally into five groups: control group, diabetes mellitus group, and groups 3, 4, and 5, which received Vit K2 at three daily dose levels (10, 15, and 30 mg/kg, respectively) for 8 wk. At the end of the study, blood samples were collected to quantify total osteocalcin, fasting plasma glucose, fasting insulin, and relevant variables. The expression of the OC, Ins 1, Ins 2, cyclin D2, adiponectin, and UCP-1 genes was analyzed by real-time polymerase chain reaction. After administration of Vit K2, a dose-dependent decrease in fasting plasma glucose, hemoglobin A1c and homeostatic model assessment method insulin resistance, and a dose-dependent increase in fasting insulin and homeostatic model assessment method β cell function levels, when compared with diabetes mellitus rats, were detected. There was significant upregulation of OC, Ins 1, Ins 2, and cyclin D2 gene expression in the three treated groups in a dose-dependent manner when compared with the diabetic rats. However, expression of adiponectin and UCP-1 was significantly increased at the highest dose (30 mg/kg daily) only. Vit K2 administration could improve glycemic status in type 2 diabetic rats by induction of OC gene expression. Osteocalcin could increase β-cell proliferation, energy expenditure, and adiponectin expression. Different concentrations of Vit K2 were required to affect glucose metabolism and insulin sensitivity.

  12. Application of community phylogenetic approaches to understand gene expression: differential exploration of venom gene space in predatory marine gastropods.

    PubMed

    Chang, Dan; Duda, Thomas F

    2014-06-05

    Predatory marine gastropods of the genus Conus exhibit substantial variation in venom composition both within and among species. Apart from mechanisms associated with extensive turnover of gene families and rapid evolution of genes that encode venom components ('conotoxins'), the evolution of distinct conotoxin expression patterns is an additional source of variation that may drive interspecific differences in the utilization of species' 'venom gene space'. To determine the evolution of expression patterns of venom genes of Conus species, we evaluated the expression of A-superfamily conotoxin genes of a set of closely related Conus species by comparing recovered transcripts of A-superfamily genes that were previously identified from the genomes of these species. We modified community phylogenetics approaches to incorporate phylogenetic history and disparity of genes and their expression profiles to determine patterns of venom gene space utilization. Less than half of the A-superfamily gene repertoire of these species is expressed, and only a few orthologous genes are coexpressed among species. Species exhibit substantially distinct expression strategies, with some expressing sets of closely related loci ('under-dispersed' expression of available genes) while others express sets of more disparate genes ('over-dispersed' expression). In addition, expressed genes show higher dN/dS values than either unexpressed or ancestral genes; this implies that expression exposes genes to selection and facilitates rapid evolution of these genes. Few recent lineage-specific gene duplicates are expressed simultaneously, suggesting that expression divergence among redundant gene copies may be established shortly after gene duplication. Our study demonstrates that venom gene space is explored differentially by Conus species, a process that effectively permits the independent and rapid evolution of venoms in these species.

  13. A Leveraged Signal-to-Noise Ratio (LSTNR) Method to Extract Differentially Expressed Genes and Multivariate Patterns of Expression From Noisy and Low-Replication RNAseq Data

    PubMed Central

    Lozoya, Oswaldo A.; Santos, Janine H.; Woychik, Richard P.

    2018-01-01

    To life scientists, one important feature offered by RNAseq, a next-generation sequencing tool used to estimate changes in gene expression levels, lies in its unprecedented resolution. It can score countable differences in transcript numbers among thousands of genes and between experimental groups, all at once. However, its high cost limits experimental designs to very small sample sizes, usually N = 3, which often results in statistically underpowered analysis and poor reproducibility. All these issues are compounded by the presence of experimental noise, which is harder to distinguish from instrumental error when sample sizes are limiting (e.g., small-budget pilot tests), experimental populations exhibit biologically heterogeneous or diffuse expression phenotypes (e.g., patient samples), or when discriminating among transcriptional signatures of closely related experimental conditions (e.g., toxicological modes of action, or MOAs). Here, we present a leveraged signal-to-noise ratio (LSTNR) thresholding method, founded on generalized linear modeling (GLM) of aligned read detection limits to extract differentially expressed genes (DEGs) from noisy low-replication RNAseq data. The LSTNR method uses an agnostic independent filtering strategy to define the dynamic range of detected aggregate read counts per gene, and assigns statistical weights that prioritize genes with better sequencing resolution in differential expression analyses. 
To assess its performance, we implemented the LSTNR method to analyze three separate datasets: first, using a systematically noisy in silico dataset, we demonstrated that LSTNR can extract pre-designed patterns of expression and discriminate between “noise” and “true” differentially expressed pseudogenes at a 100% success rate; then, we illustrated how the LSTNR method can assign patient-derived breast cancer specimens correctly to one out of their four reported molecular subtypes (luminal A, luminal B, Her2-enriched and basal-like); and last, we showed the ability to retrieve five different modes of action (MOA) elicited in livers of rats exposed to three toxicants under three nutritional routes by using the LSTNR method. By combining differential measurements with resolving power to detect DEGs, the LSTNR method offers an alternative approach to interrogate noisy and low-replication RNAseq datasets, which handles multiple biological conditions at once, and defines benchmarks to validate RNAseq experiments with standard benchtop assays. PMID:29868123

  14. Transcriptional and Enzymatic Profiling of Pleurotus ostreatus Laccase Genes in Submerged and Solid-State Fermentation Cultures

    PubMed Central

    Castanera, Raúl; Pérez, Gúmer; Omarini, Alejandra; Alfaro, Manuel; Pisabarro, Antonio G.; Faraco, Vincenza; Amore, Antonella

    2012-01-01

    The genome of the white rot basidiomycete Pleurotus ostreatus includes 12 phenol oxidase (laccase) genes. In this study, we examined their expression profiles in different fungal strains under different culture conditions (submerged and solid cultures) and in the presence of a wheat straw extract, which was used as an inducer of the laccase gene family. We used a reverse transcription-quantitative PCR (RT-qPCR)-based approach and focused on determining the reaction parameters (in particular, the reference gene set for the normalization and reaction efficiency determinations) used to achieve an accurate estimation of the relative gene expression values. The results suggested that (i) laccase gene transcription is upregulated in the induced submerged fermentation (iSmF) cultures but downregulated in the solid fermentation (SSF) cultures, (ii) the Lacc2 and Lacc10 genes are the main sources of laccase activity in the iSmF cultures upon induction with water-soluble wheat straw extracts, and (iii) an additional, as-yet-uncharacterized activity (Unk1) is specifically induced in SSF cultures that complements the activity of Lacc2 and Lacc10. Moreover, both the enzymatic laccase activities and the Lacc gene family transcription profiles greatly differ between closely related strains. These differences can be targeted for biotechnological breeding programs for enzyme production in submerged fermentation reactors. PMID:22467498
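
    The abstract above stresses two reaction parameters for RT-qPCR: the reference gene set used for normalization and the reaction efficiency. A common way to combine them is an efficiency-corrected expression ratio normalized to the geometric mean of several reference genes. The Python sketch below illustrates that general calculation only; it is not taken from the paper, and the gene names, efficiencies, and Ct values are hypothetical.

```python
import math
from typing import Dict, Tuple

# (amplification factor per cycle, Ct in control, Ct in treated sample)
# An amplification factor of 2.0 means perfect doubling each cycle.
Gene = Tuple[float, float, float]


def relative_quantity(efficiency: float, ct_control: float, ct_sample: float) -> float:
    """Efficiency-corrected quantity of one gene in the sample relative to control."""
    return efficiency ** (ct_control - ct_sample)


def normalized_ratio(target: Gene, references: Dict[str, Gene]) -> float:
    """Target relative quantity divided by the geometric mean of the
    reference-gene relative quantities."""
    rq_target = relative_quantity(*target)
    log_sum = sum(math.log(relative_quantity(*g)) for g in references.values())
    norm_factor = math.exp(log_sum / len(references))
    return rq_target / norm_factor


# Hypothetical measurements (not data from the study).
target_gene: Gene = (2.0, 25.0, 22.0)          # 3 cycles earlier in the sample
reference_genes: Dict[str, Gene] = {
    "ref_a": (2.0, 18.0, 18.0),                # unchanged
    "ref_b": (2.0, 20.0, 19.0),                # 1 cycle earlier
}
ratio = normalized_ratio(target_gene, reference_genes)
print(f"normalized expression ratio: {ratio:.3f}")
```

    Using the geometric rather than the arithmetic mean keeps one strongly shifted reference gene from dominating the normalization factor.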

  15. Profiling of hepatic gene expression of mice fed with edible Japanese mushrooms by DNA microarray analysis: comparison among Pleurotus ostreatus, Grifola frondosa, and Hypsizigus marmoreus.

    PubMed

    Sato, Mayumi; Tokuji, Yoshihiko; Yoneyama, Shozo; Fujii-Akiyama, Kyoko; Kinoshita, Mikio; Ohnishi, Masao

    2011-10-12

    To compare and estimate the effects of dietary intake of three kinds of mushrooms (Pleurotus ostreatus, Grifola frondosa, and Hypsizigus marmoreus), mice were fed a diet containing 10-14% of each mushroom for 4 weeks. Triacylglycerol in the liver and plasma decreased and plasma cholesterol increased in the P. ostreatus-fed group compared with those in the control group. Cholesterol in the liver was lower in the G. frondosa-fed group than in the control group, but no changes were found in the H. marmoreus-fed group. DNA microarray analysis of the liver revealed differences of gene expression patterns among mushrooms. Ctp1a and Fabp families were upregulated in the P. ostreatus-fed group, which were considered to promote lipid transport and β-oxidation. In the G. frondosa-fed group, not only the gene involved in signal transduction of innate immunity via TLR3 and interferon but also virus resistance genes, such as Mx1, Rsad2, and Oas1, were upregulated.

  16. Evaluation of genistein ability to modulate CTGF mRNA/protein expression, genes expression of TGFβ isoforms and expression of selected genes regulating cell cycle in keloid fibroblasts in vitro.

    PubMed

    Jurzak, Magdalena; Adamczyk, Katarzyna; Antończak, Paweł; Garncarczyk, Agnieszka; Kuśmierz, Dariusz; Latocha, Małgorzata

    2014-01-01

    Keloids are characterized by overgrowth of connective tissue in the skin that arises as a consequence of abnormal wound healing. Normal wound healing is regulated by a complex set of interactions within a network of profibrotic and antifibrotic cytokines that regulate new extracellular matrix (ECM) synthesis and remodeling. These proteins include transforming growth factor β (TGFβ) isoforms and connective tissue growth factor (CTGF). TGFβ1 stimulates fibroblasts to synthesize and contract ECM and acts as a central mediator of profibrotic response. CTGF is induced by TGFβ1 and is considered a downstream mediator of TGFβ1 action in fibroblasts. CTGF plays a crucial role in keloid pathogenesis by promoting prolonged collagen synthesis and deposition and, as a consequence, a sustained fibrotic response. During keloid formation, besides imbalanced ECM synthesis and degradation, fibroblast proliferation and resistance to apoptosis are observed. Key genes that may play a role in keloid formation and growth include the suppressor gene p53, the cyclin-dependent kinase inhibitor CDKN1A (p21), and BCL2 family genes: antiapoptotic BCL-2 and proapoptotic BAX. Genistein (4',5,7-trihydroxyisoflavone) exhibits multidirectional biological action. The concentration of genistein is relatively high in soybean. Genistein has been shown to be an effective antioxidant and chemopreventive agent. Genistein can bind to estrogen receptors (ERs) and modulate estrogen action due to its structural similarity to human estrogens. Genistein also inhibits the transcription factor NFκB, Akt and AP-1 signaling pathways, which are important for cytokine expression and cell proliferation, differentiation, survival and apoptosis. The aim of the study was to investigate genistein as a potential inhibitor of CTGF and TGFβ1, β2 and β3 isoform expression and a potential regulator of p53, CDKN1A (p21), BAX and BCL-2 expression in normal fibroblasts and fibroblasts derived from keloids cultured in vitro. Real-time RT-qPCR was used to estimate the transcription level of selected genes in normal and keloid fibroblasts treated with genistein. Secreted/cell-associated CTGF protein was evaluated in cell growth medium by ELISA. Total protein was quantified by fluorimetric assay in cell lysates (Quant-iT™ Protein Assay Kit). It was found that expression of the TGFβ1, β2 and β3 genes is decreased by genistein. Genistein suppresses the expression of CTGF mRNA and CTGF protein in a concentration-dependent manner, and expression of the p53 and p21 genes is modulated by genistein in a concentration-dependent manner. The agent also modulates the BAX/BCL-2 ratio in the examined cells in vitro.

  17. Evaluating Fumonisin Gene Expression in Fusarium verticillioides.

    PubMed

    Scala, Valeria; Visentin, Ivan; Cardinale, Francesca

    2017-01-01

    Transcript levels of key genes in a biosynthetic pathway are often taken as a proxy for metabolite production. This is the case for FUM1, which encodes the first dedicated enzyme in the metabolic pathway leading to the production of fumonisin mycotoxins by fungal species belonging to the genus Fusarium. FUM1 expression can be quantified by different methods; here, we detail a protocol based on quantitative reverse transcriptase polymerase chain reaction (RT-qPCR), by which relative or absolute transcript abundance can be estimated in Fusaria grown in vitro or in planta. Because commercial kits for RNA extraction and cDNA synthesis are very seldom optimized for fungal samples, we developed a protocol tailored for these organisms, which stands alone but can also be easily integrated with specific commercially available reagents and kits.
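
    Absolute transcript abundance in RT-qPCR is typically read off a standard curve: Ct values from a dilution series of known copy number are regressed on log10(copies), and unknown samples are interpolated. The Python sketch below shows that general calculation under stated assumptions; it is not the protocol from this record, and the dilution series and Ct values are invented for illustration.

```python
from typing import List, Tuple


def fit_standard_curve(log10_copies: List[float], ct_values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope: float) -> float:
    """Efficiency as a fraction (1.0 = 100%), derived from the curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Interpolate the copy number of an unknown sample from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (10^3 to 10^7 copies) and its Ct values.
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_ct = [30.04, 26.72, 23.40, 20.08, 16.76]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
efficiency = amplification_efficiency(slope)
unknown_copies = copies_from_ct(23.40, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency:.1%}, copies={unknown_copies:.3g}")
```

    A slope near -3.32 corresponds to roughly 100% efficiency, which is why the slope is a convenient quality check on the dilution series.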

  18. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (critical in certain cases) limitation, which almost all existing studies ignore, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely due to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
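
    Diffusion distance compares two nodes by how similarly a random walk started at each spreads over the network after t steps, so it accounts for all connecting paths rather than only the shortest one. The Python sketch below uses the standard random-walk formulation on an undirected graph; the abstract does not state the exact variant the authors used, so this is an assumption, and the four-gene toy network is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(adj: Matrix) -> Matrix:
    """Row-normalize the adjacency matrix into random-walk probabilities.

    Assumes every node has at least one edge (no all-zero rows).
    """
    return [[a / sum(row) for a in row] for row in adj]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(r[x] * b[x][j] for x in range(inner)) for j in range(cols)] for r in a]


def mat_power(p: Matrix, t: int) -> Matrix:
    """t-step transition probabilities, for t >= 1."""
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result


def diffusion_distance(adj: Matrix, i: int, j: int, t: int) -> float:
    """Distance between the t-step walk distributions from nodes i and j,
    weighted by the inverse stationary distribution (degree / total degree)."""
    p_t = mat_power(transition_matrix(adj), t)
    total_degree = sum(sum(row) for row in adj)
    stationary = [sum(row) / total_degree for row in adj]
    squared = sum(
        (p_t[i][k] - p_t[j][k]) ** 2 / stationary[k] for k in range(len(adj))
    )
    return squared ** 0.5


# Hypothetical network: genes 0, 1, 2 form a triangle; gene 3 hangs off gene 2.
toy_adj: Matrix = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
d01 = diffusion_distance(toy_adj, 0, 1, t=2)
d03 = diffusion_distance(toy_adj, 0, 3, t=2)
print(f"d(0,1)={d01:.4f}  d(0,3)={d03:.4f}")
```

    In the toy network, genes 0 and 1 share the same neighborhood and come out closer to each other than gene 0 is to the peripheral gene 3.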

  19. Recombinant adeno-associated virus mediates a high level of gene transfer but less efficient integration in the K562 human hematopoietic cell line.

    PubMed Central

    Malik, P; McQuiston, S A; Yu, X J; Pepper, K A; Krall, W J; Podsakoff, G M; Kurtzman, G J; Kohn, D B

    1997-01-01

    We tested the ability of a recombinant adeno-associated virus (rAAV) vector to express and integrate exogenous DNA into human hematopoietic cells in the absence of selection. We developed an rAAV vector, AAV-tNGFR, carrying a truncated rat nerve growth factor receptor (tNGFR) cDNA as a cell surface reporter under the control of the Moloney murine leukemia virus (MoMuLV) long terminal repeat. An analogous MoMuLV-based retroviral vector (L-tNGFR) was used in parallel, and gene transfer and expression in human hematopoietic cells were assessed by flow cytometry and DNA analyses. Following gene transfer into K562 cells with AAV-tNGFR at a multiplicity of infection (MOI) of 13 infectious units (IU), 26 to 38% of cells expressed tNGFR on the surface early after transduction, but the proportion of tNGFR expressing cells steadily declined to 3.0 to 3.5% over 1 month of culture. At an MOI of 130 IU, nearly all cells expressed tNGFR immediately posttransduction, but the proportion of cells expressing tNGFR declined to 62% over 2 months of culture. The decline in the proportion of AAV-tNGFR-expressing cells was associated with ongoing losses of vector genomes. In contrast, K562 cells transduced with the retroviral vector L-tNGFR expressed tNGFR in a constant fraction. Integration analyses on clones showed that integration occurred at different sites. Integration frequencies were estimated at about 49% at an MOI of 130 and 2% at an MOI of 1.3. Transduction of primary human CD34+ progenitor cells by AAV-tNGFR was less efficient than with K562 cells and showed a declining percentage of cells expressing tNGFR over 2 weeks of culture. Thus, purified rAAV caused very high gene transfer and expression in human hematopoietic cells early after transduction, which steadily declined during cell passage in the absence of selection. Although the efficiency of integration was low, overall integration was markedly improved at a high MOI. 
While prolonged episomal persistence may be adequate for gene therapy of nondividing cells, a very high MOI or improvements in basic aspects of AAV-based vectors may be necessary to improve integration frequency in the rapidly dividing hematopoietic cell population. PMID:9032306

  20. Recombinant adeno-associated virus mediates a high level of gene transfer but less efficient integration in the K562 human hematopoietic cell line.

    PubMed

    Malik, P; McQuiston, S A; Yu, X J; Pepper, K A; Krall, W J; Podsakoff, G M; Kurtzman, G J; Kohn, D B

    1997-03-01

    We tested the ability of a recombinant adeno-associated virus (rAAV) vector to express and integrate exogenous DNA into human hematopoietic cells in the absence of selection. We developed an rAAV vector, AAV-tNGFR, carrying a truncated rat nerve growth factor receptor (tNGFR) cDNA as a cell surface reporter under the control of the Moloney murine leukemia virus (MoMuLV) long terminal repeat. An analogous MoMuLV-based retroviral vector (L-tNGFR) was used in parallel, and gene transfer and expression in human hematopoietic cells were assessed by flow cytometry and DNA analyses. Following gene transfer into K562 cells with AAV-tNGFR at a multiplicity of infection (MOI) of 13 infectious units (IU), 26 to 38% of cells expressed tNGFR on the surface early after transduction, but the proportion of tNGFR expressing cells steadily declined to 3.0 to 3.5% over 1 month of culture. At an MOI of 130 IU, nearly all cells expressed tNGFR immediately posttransduction, but the proportion of cells expressing tNGFR declined to 62% over 2 months of culture. The decline in the proportion of AAV-tNGFR-expressing cells was associated with ongoing losses of vector genomes. In contrast, K562 cells transduced with the retroviral vector L-tNGFR expressed tNGFR in a constant fraction. Integration analyses on clones showed that integration occurred at different sites. Integration frequencies were estimated at about 49% at an MOI of 130 and 2% at an MOI of 1.3. Transduction of primary human CD34+ progenitor cells by AAV-tNGFR was less efficient than with K562 cells and showed a declining percentage of cells expressing tNGFR over 2 weeks of culture. Thus, purified rAAV caused very high gene transfer and expression in human hematopoietic cells early after transduction, which steadily declined during cell passage in the absence of selection. Although the efficiency of integration was low, overall integration was markedly improved at a high MOI. 
While prolonged episomal persistence may be adequate for gene therapy of nondividing cells, a very high MOI or improvements in basic aspects of AAV-based vectors may be necessary to improve integration frequency in the rapidly dividing hematopoietic cell population.

  1. Downregulated PITX1 Modulated by MiR-19a-3p Promotes Cell Malignancy and Predicts a Poor Prognosis of Gastric Cancer by Affecting Transcriptionally Activated PDCD5.

    PubMed

    Qiao, Fengchang; Gong, Pihai; Song, Yunwei; Shen, Xiaohui; Su, Xianwei; Li, Yiping; Wu, Huazhang; Zhao, Zhujiang; Fan, Hong

    2018-01-01

    PITX1 has been identified as a potential tumor-suppressor gene in several malignant tumors. The molecular mechanism underlying PITX1, particularly its function as a transcription factor regulating gene expression during tumorigenesis, is still poorly understood. The expression level and location of PITX1 were determined by quantitative reverse transcription PCR (qRT-PCR) and immunohistochemical staining in gastric cancer (GC). The effect of PITX1 on the GC cell proliferation and tumorigenesis was analyzed in vitro and in vivo. To explore how PITX1 suppresses cell proliferation, we used PITX1-ChIP-sequencing to measure genome-wide binding sites of PITX1 and assessed global function associations based on its putative target genes. ChIP-PCR, electrophoretic mobility shift assay, and promoter reporter assays examined whether PITX1 bound to PDCD5 and regulated its expression. The function of PDCD5 in GC cell apoptosis was further examined in vitro and in vivo. The relationship between the PITX1 protein level and GC patient prognosis was evaluated by the Kaplan-Meier estimator. Meanwhile, the expression level of miR-19a-3p, which is related to PITX1, was also detected by luciferase reporter assay, qRT-PCR, and western blotting. The expression level of PITX1 was decreased in GC tissues and cell lines. Elevated PITX1 expression significantly suppressed the cell proliferation of GC cells and tumorigenesis in vitro and in vivo. PITX1 knockdown blocked its inhibition of GC cell proliferation. PITX1 bound to whole genome-wide sites, with these targets enriched on genes with functions mainly related to cell growth and apoptosis. PITX1 bound to PDCD5, an apoptosis-related gene, during tumorigenesis, and cis-regulated PDCD5 expression. Increased PDCD5 expression in GC cells not only induced GC cell apoptosis, but also suppressed GC cell growth in vitro and in vivo. Moreover, PITX1 expression was regulated by miR-19a-3p. 
More importantly, a decreased level of PITX1 protein was correlated with poor GC patient prognosis. Decreased expression of PITX1 predicts shorter overall survival in GC patients. As a transcriptional activator, PITX1 regulates apoptosis-related genes, including PDCD5, during gastric carcinogenesis. These data indicate PDCD5 to be a novel and feasible therapeutic target for GC. © 2018 The Author(s). Published by S. Karger AG, Basel.
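
    The abstract evaluates prognosis with the Kaplan-Meier estimator but does not show the calculation. A minimal sketch of the standard product-limit form follows; the function name, argument names, and toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

    Comparing two such curves, for example low versus high PITX1 protein level, would additionally need a test such as the log-rank test, which is not sketched here.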

  2. Sex-Specific Associations between Particulate Matter Exposure and Gene Expression in Independent Discovery and Validation Cohorts of Middle-Aged Men and Women.

    PubMed

    Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S

    2017-04-01

    Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. The objective was to identify transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2 years were estimated for each participant's residential address using spatiotemporal interpolation in combination with a dispersion model. Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m³ in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating characteristic curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects.
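
    The area under the ROC curve reported above can be read as the probability that a randomly chosen high-exposure participant receives a higher expression-based score than a randomly chosen low-exposure participant. A minimal sketch of that pairwise calculation follows; the scores are invented for illustration and the function name is hypothetical.

```python
def roc_auc(scores_high, scores_low):
    """AUC as the fraction of (high, low) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    if not scores_high or not scores_low:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for high in scores_high:
        for low in scores_low:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(scores_high) * len(scores_low))


# Hypothetical expression-based scores for two exposure groups.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.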

  3. High natural gene expression variation in the reef-building coral Acropora millepora: potential for acclimative and adaptive plasticity

    PubMed Central

    2013-01-01

    Background Ecosystems worldwide are suffering the consequences of anthropogenic impact. The diverse ecosystem of coral reefs, for example, are globally threatened by increases in sea surface temperatures due to global warming. Studies to date have focused on determining genetic diversity, the sequence variability of genes in a species, as a proxy to estimate and predict the potential adaptive response of coral populations to environmental changes linked to climate changes. However, the examination of natural gene expression variation has received less attention. This variation has been implicated as an important factor in evolutionary processes, upon which natural selection can act. Results We acclimatized coral nubbins from six colonies of the reef-building coral Acropora millepora to a common garden in Heron Island (Great Barrier Reef, GBR) for a period of four weeks to remove any site-specific environmental effects on the physiology of the coral nubbins. By using a cDNA microarray platform, we detected a high level of gene expression variation, with 17% (488) of the unigenes differentially expressed across coral nubbins of the six colonies (jsFDR-corrected, p < 0.01). Among the main categories of biological processes found differentially expressed were transport, translation, response to stimulus, oxidation-reduction processes, and apoptosis. We found that the transcriptional profiles did not correspond to the genotype of the colony characterized using either an intron of the carbonic anhydrase gene or microsatellite loci markers. Conclusion Our results provide evidence of the high inter-colony variation in A. millepora at the transcriptomic level grown under a common garden and without a correspondence with genotypic identity. This finding brings to our attention the importance of taking into account natural variation between reef corals when assessing experimental gene expression differences. 
The high transcriptional variation detected in this study is interpreted and discussed within the context of adaptive potential and phenotypic plasticity of reef corals. Whether this variation will allow coral reefs to survive current challenges remains unknown. PMID:23565725

  4. The human disease network in terms of dysfunctional regulatory mechanisms.

    PubMed

    Yang, Jing; Wu, Su-Juan; Dai, Wen-Tao; Li, Yi-Xue; Li, Yuan-Yuan

    2015-10-08

    Elucidation of human disease similarities has emerged as an active research area, which is highly relevant to etiology, disease classification, and drug repositioning. In pioneering studies, disease similarity was commonly estimated according to clinical manifestation. Subsequently, scientists started to investigate disease similarity based on gene-phenotype knowledge, which was inevitably biased toward well-studied diseases. In recent years, estimating disease similarity from transcriptomic behavior has significantly enhanced the probability of finding novel disease relationships. However, currently available studies usually mine expression data through differential expression analysis, which is considered to have little chance of unraveling dysfunctional regulatory relationships, the causal pathogenesis of diseases. We developed a computational approach to measure human disease similarity based on expression data. Differential coexpression analysis, instead of differential expression analysis, was employed to calculate the differential coexpression level of every gene for each disease, which was then summarized to the pathway level. Disease similarity was eventually calculated as the partial correlation coefficients of pathways' differential coexpression values between any two diseases. The significance of disease relationships was evaluated by permutation test. Based on mRNA expression data and a differential coexpression analysis-based method, we built a human disease network involving 1326 significant disease-disease links among 108 diseases. Compared with disease relationships captured by a differential expression analysis-based method, our disease links shared known disease genes and drugs more significantly. 
Some novel disease relationships were discovered, for example obesity and cancer, obesity and psoriasis, and lung adenocarcinoma and S. pneumonia, which had commonly been regarded as unrelated to each other but were recently found to share similar molecular mechanisms. Additionally, it was found that both the type of disease and the type of affected tissue influenced the degree of disease similarity. A sub-network including allergic asthma, type 2 diabetes and chronic kidney disease was extracted to demonstrate the exploration of their common pathogenesis. The present study produces a global view of the human diseasome for the first time from the viewpoint of regulation mechanisms, which could therefore provide insightful clues to etiology and pathogenesis, and help to perform drug repositioning and design novel therapeutic interventions.
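
    The abstract scores each disease pair by a correlation of pathway-level values and assesses significance by permutation. The sketch below illustrates the permutation step only, and as a simplifying assumption it uses a plain Pearson correlation rather than the partial correlation the authors describe. All names and the example vectors are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    The pathway labels of y are shuffled to build the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical pathway-level differential coexpression values for two diseases.
disease_a = [0.1, 0.4, 0.5, 0.9, 1.3, 1.8, 2.0, 2.6]
disease_b = [0.2, 0.5, 0.4, 1.0, 1.2, 1.9, 2.1, 2.5]
p_value = permutation_p_value(disease_a, disease_b)
```

    With 108 diseases there are thousands of pairs, so in practice the resulting p-values would also need a multiple-testing correction; the abstract does not say which one was used.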

  5. Quantifying circular RNA expression from RNA-seq data using model-based framework.

    PubMed

    Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun-A; Zhou, Tong; Gu, Wanjun

    2017-07-15

    Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, their cell type- and tissue-specific expression implies crucial functions in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate transcript expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, affected the quantification performance for circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. In addition, considering circular transcripts during expression quantification from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. Our proposed strategy was implemented in a program named Sailfish-cir. Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir. Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
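
    The abstract does not spell out how a circular transcript is turned into a pseudo-linear one. One plausible construction, shown here purely as an assumption and not as the Sailfish-cir implementation, is to append the first read_length - 1 bases of the circle to its end, so that a read spanning the back-splice junction can align contiguously. The function name and parameters are hypothetical.

```python
def pseudo_linearize(circ_seq: str, read_length: int) -> str:
    """Return a linear sequence whose tail repeats the head of the circle.

    circ_seq: circular transcript sequence, written from the back-splice site.
    read_length: sequencing read length (assumed fixed for the library).
    """
    if not circ_seq:
        raise ValueError("circular sequence must not be empty")
    if read_length < 2:
        return circ_seq
    overhang = read_length - 1
    # Repeat the circle when it is shorter than the required overhang.
    repeats = overhang // len(circ_seq) + 1
    return circ_seq + (circ_seq * repeats)[:overhang]


# A 6-base circle and 4-base reads: the junction read "ACAC" is now contiguous.
linear = pseudo_linearize("ACGTAC", 4)
```

    The extended sequence could then be added to the linear transcriptome before indexing, so that one model-based quantifier handles both transcript types.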

  6. FARO server: Meta-analysis of gene expression by matching gene expression signatures to a compendium of public gene expression data.

    PubMed

    Manijak, Mieszko P; Nielsen, Henrik B

    2011-06-11

    Although systematic analysis of gene annotation is a powerful tool for interpreting gene expression data, it is sometimes blurred by incomplete gene annotation, missing expression responses of key genes, and secondary gene expression responses. These shortcomings may be partially circumvented by instead matching gene expression signatures to signatures of other experiments. To facilitate this, we present the Functional Association Response by Overlap (FARO) server, which matches input signatures to a compendium of 242 gene expression signatures extracted from more than 1700 Arabidopsis microarray experiments. We thereby present a publicly available tool for robust characterization of Arabidopsis gene expression experiments, which can point to similar experimental factors in other experiments. The server is available at http://www.cbs.dtu.dk/services/faro/.

  7. A new estimator of the discovery probability.

    PubMed

    Favaro, Stefano; Lijoi, Antonio; Prünster, Igor

    2012-12-01

    Species sampling problems have a long history in ecological and biological studies and a number of issues, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety, are to be addressed. Such inferential problems have recently emerged also in genomic applications, however, exhibiting some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes) and only a small portion of the library has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it allows to achieve the degree of flexibility typically needed in this framework. Based on an observed sample of size n, focus will be on prediction of a key aspect of the outcome from an additional sample of size m, namely, the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. Such an estimator admits a closed-form expression that can be exactly evaluated. The result we obtain allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species, as m increases. Natural applications are represented by the estimation of the probability of discovering rare genes within genomic libraries and the results are illustrated by means of two expressed sequence tags datasets. © 2012, The International Biometric Society.
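
    The Bayesian nonparametric estimator derived in the paper is not reproduced in the abstract. For orientation, the sketch below implements the classical Good-Turing estimator instead, a frequentist baseline for the same quantity: the probability that the next observation is a species seen exactly r times so far, estimated as (r + 1) N_{r+1} / n, where N_{r+1} is the number of species seen r + 1 times. The function name and toy sample are hypothetical.

```python
from collections import Counter


def good_turing_discovery(sample, r=0):
    """Good-Turing estimate of the probability that the next draw is a
    species observed exactly r times in the sample (r=0 means a new species).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    species_counts = Counter(sample)
    # How many species were seen once, twice, and so on.
    freq_of_freq = Counter(species_counts.values())
    return (r + 1) * freq_of_freq.get(r + 1, 0) / n


# Toy library of sequenced tags: genes b and d are singletons.
tags = ["a", "a", "b", "c", "c", "c", "d"]
p_new = good_turing_discovery(tags)        # probability of a new gene
p_rare = good_turing_discovery(tags, r=1)  # probability of a gene seen once
```

    Unlike the estimator in the abstract, this baseline only addresses the very next observation and does not condition on an additional unobserved sample of size m.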

  8. Inference of quantitative models of bacterial promoters from time-series reporter gene data.

    PubMed

    Stefan, Diana; Pinel, Corinne; Pinhal, Stéphane; Cinquemani, Eugenio; Geiselmann, Johannes; de Jong, Hidde

    2015-01-01

    The inference of regulatory interactions and quantitative models of gene regulation from time-series transcriptomics data has been extensively studied and applied to a range of problems in drug discovery, cancer research, and biotechnology. The application of existing methods is commonly based on implicit assumptions on the biological processes under study. First, the measurements of mRNA abundance obtained in transcriptomics experiments are taken to be representative of protein concentrations. Second, the observed changes in gene expression are assumed to be solely due to transcription factors and other specific regulators, while changes in the activity of the gene expression machinery and other global physiological effects are neglected. While convenient in practice, these assumptions are often not valid and bias the reverse engineering process. Here we systematically investigate, using a combination of models and experiments, the importance of this bias and possible corrections. We measure in real time and in vivo the activity of genes involved in the FliA-FlgM module of the E. coli motility network. From these data, we estimate protein concentrations and global physiological effects by means of kinetic models of gene expression. Our results indicate that correcting for the bias of commonly-made assumptions improves the quality of the models inferred from the data. Moreover, we show by simulation that these improvements are expected to be even stronger for systems in which protein concentrations have longer half-lives and the activity of the gene expression machinery varies more strongly across conditions than in the FliA-FlgM module. The approach proposed in this study is broadly applicable when using time-series transcriptome data to learn about the structure and dynamics of regulatory networks. 
In the case of the FliA-FlgM module, our results demonstrate the importance of global physiological effects and the active regulation of FliA and FlgM half-lives for the dynamics of FliA-dependent promoters.
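
    The kinetic models themselves are not given in the abstract. A common one-step reporter model, used here as an assumption, is dp/dt = a(t) - (gamma + mu(t)) p(t), where p is reporter concentration, a is promoter activity, gamma is a degradation constant and mu is the growth rate (dilution). Rearranging gives a(t) = dp/dt + (gamma + mu(t)) p(t), which the sketch below evaluates on interval midpoints. All names and numbers are hypothetical.

```python
def promoter_activity(times, reporter, growth_rate, degradation_rate):
    """Estimate promoter activity a(t) = dp/dt + (gamma + mu(t)) * p(t).

    times, reporter, growth_rate: equal-length sequences of measurements.
    degradation_rate: reporter degradation constant gamma (assumed constant).
    Returns one estimate per interval between consecutive time points.
    """
    activity = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        # Finite-difference derivative of the reporter concentration.
        dp_dt = (reporter[i + 1] - reporter[i]) / dt
        p_mid = 0.5 * (reporter[i] + reporter[i + 1])
        mu_mid = 0.5 * (growth_rate[i] + growth_rate[i + 1])
        activity.append(dp_dt + (degradation_rate + mu_mid) * p_mid)
    return activity


# Steady state: constant reporter level, so synthesis balances loss.
steady = promoter_activity([0, 1, 2], [10.0, 10.0, 10.0], [0.5, 0.5, 0.5], 0.1)
```

    The example makes the abstract's point concrete: if the growth rate mu changes between conditions, the same reporter signal implies a different promoter activity, so ignoring global physiological effects biases the inferred regulation.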

  9. Childhood tuberculosis is associated with decreased abundance of T cell gene transcripts and impaired T cell function.

    PubMed

    Hemingway, Cheryl; Berk, Maurice; Anderson, Suzanne T; Wright, Victoria J; Hamilton, Shea; Eleftherohorinou, Hariklia; Kaforou, Myrsini; Goldgof, Greg M; Hickman, Katy; Kampmann, Beate; Schoeman, Johan; Eley, Brian; Beatty, David; Pienaar, Sandra; Nicol, Mark P; Griffiths, Michael J; Waddell, Simon J; Newton, Sandra M; Coin, Lachlan J; Relman, David A; Montana, Giovanni; Levin, Michael

    2017-01-01

    The WHO estimates around a million children contract tuberculosis (TB) annually with over 80 000 deaths from dissemination of infection outside of the lungs. The insidious onset and association with skin test anergy suggests failure of the immune system to both recognise and respond to infection. To understand the immune mechanisms, we studied genome-wide whole blood RNA expression in children with TB meningitis (TBM). Findings were validated in a second cohort of children with TBM and pulmonary TB (PTB), and functional T-cell responses studied in a third cohort of children with TBM, other extrapulmonary TB (EPTB) and PTB. The predominant RNA transcriptional response in children with TBM was decreased abundance of multiple genes, with 140/204 (68%) of all differentially regulated genes showing reduced abundance compared to healthy controls. Findings were validated in a second cohort with concordance of the direction of differential expression in both TBM (r2 = 0.78 p = 2x10-16) and PTB patients (r2 = 0.71 p = 2x10-16) when compared to a second group of healthy controls. Although the direction of expression of these significant genes was similar in the PTB patients, the magnitude of differential transcript abundance was less in PTB than in TBM. The majority of genes were involved in activation of leucocytes (p = 2.67E-11) and T-cell receptor signalling (p = 6.56E-07). Less abundant gene expression in immune cells was associated with a functional defect in T-cell proliferation that recovered after full TB treatment (p<0.0003). Multiple genes involved in T-cell activation show decreased abundance in children with acute TB, who also have impaired functional T-cell responses. Our data suggest that childhood TB is associated with an acquired immune defect, potentially resulting in failure to contain the pathogen. Elucidation of the mechanism causing the immune paresis may identify new treatment and prevention strategies.

  10. Genome-wide investigation and expression analyses of WD40 protein family in the model plant foxtail millet (Setaria italica L.).

    PubMed

    Mishra, Awdhesh Kumar; Muthamilarasan, Mehanathan; Khan, Yusuf; Parida, Swarup Kumar; Prasad, Manoj

    2014-01-01

    WD40 proteins play a crucial role in diverse protein-protein interactions by acting as scaffolding molecules and thus assisting in the proper activity of proteins. Hence, systematic characterization and expression profiling of WD40 genes in foxtail millet would enable us to understand the networks of WD40 proteins and their biological processes and gene functions. In the present study, a genome-wide survey was conducted and 225 potential WD40 genes were identified. Phylogenetic analysis categorized the WD40 proteins into 5 distinct sub-families (I-V). Gene Ontology annotation revealed the biological roles of the WD40 proteins along with their cellular components and molecular functions. In silico comparative mapping with sorghum, maize and rice demonstrated the orthologous relationships and chromosomal rearrangements, including duplication, inversion and deletion, of WD40 genes. Estimation of synonymous and non-synonymous substitution rates revealed their evolutionary significance in terms of gene duplication and divergence. Expression profiling against abiotic stresses provided novel insights into specific and/or overlapping expression patterns of SiWD40 genes. Homology modeling-based three-dimensional structure prediction was performed to understand the molecular functions of WD40 proteins. Although recent findings have shown the importance of WD40 domains in acting as hubs for cellular networks during many biological processes, they have received less research attention than other common domains. Being highly promiscuous interactors, WD40 domains are versatile in mediating critical cellular functions, and hence this genome-wide study, especially in the model crop foxtail millet, would serve as a blueprint for functional characterization of WD40s in millets and bioenergy grass species. 
In addition, the present analyses would also assist the research community in choosing the candidate WD40s for comprehensive studies towards crop improvement of millets and biofuel grasses.

  11. Genome-Wide Investigation and Expression Analyses of WD40 Protein Family in the Model Plant Foxtail Millet (Setaria italica L.)

    PubMed Central

    Mishra, Awdhesh Kumar; Muthamilarasan, Mehanathan; Khan, Yusuf; Parida, Swarup Kumar; Prasad, Manoj

    2014-01-01

    WD40 proteins play a crucial role in diverse protein-protein interactions by acting as scaffolding molecules and thus assisting in the proper activity of proteins. Hence, systematic characterization and expression profiling of WD40 genes in foxtail millet would enable us to understand the networks of WD40 proteins and their biological processes and gene functions. In the present study, a genome-wide survey was conducted and 225 potential WD40 genes were identified. Phylogenetic analysis categorized the WD40 proteins into 5 distinct sub-families (I–V). Gene Ontology annotation revealed the biological roles of the WD40 proteins along with their cellular components and molecular functions. In silico comparative mapping with sorghum, maize and rice demonstrated the orthologous relationships and chromosomal rearrangements, including duplication, inversion and deletion, of WD40 genes. Estimation of synonymous and non-synonymous substitution rates revealed their evolutionary significance in terms of gene duplication and divergence. Expression profiling against abiotic stresses provided novel insights into specific and/or overlapping expression patterns of SiWD40 genes. Homology modeling-based three-dimensional structure prediction was performed to understand the molecular functions of WD40 proteins. Although recent findings have shown the importance of WD40 domains in acting as hubs for cellular networks during many biological processes, they have received less research attention than other common domains. Being highly promiscuous interactors, WD40 domains are versatile in mediating critical cellular functions, and hence this genome-wide study, especially in the model crop foxtail millet, would serve as a blueprint for functional characterization of WD40s in millets and bioenergy grass species. 
In addition, the present analyses would also assist the research community in choosing the candidate WD40s for comprehensive studies towards crop improvement of millets and biofuel grasses. PMID:24466268

  12. Annotated Gene and Proteome Data Support Recognition of Interconnections Between the Results of Different Experiments in Space Research

    NASA Astrophysics Data System (ADS)

    Bauer, Johann; Wehland, Markus; Pietsch, Jessica; Sickmann, Albert; Weber, Gerhard; Grimm, Daniela

    2016-06-01

    In a series of studies, human thyroid and endothelial cells exposed to real or simulated microgravity were analyzed in terms of changes in gene expression patterns or protein content. Due to the limitation of available cells in many space research experiments, comparative and control experiments had to be done in a serial manner. Therefore, detected genes or proteins were annotated with gene names and SwissProt numbers, in order to allow searches for interconnections between results obtained in different experiments by different methods. A crosscheck of several studies on the behavior of cytoskeletal genes and proteins suggested that clusters of cytoskeletal components change differently under the influence of microgravity and/or vibration in different cell types. The result that LOX and ISG15 gene expression were clearly altered during the Shenzhou-8 spaceflight mission could be estimated by comparison with the results of other experiments. The more than 100-fold down-regulation of LOX supports our hypothesis that the amount and stability of extracellular matrix have a great influence on the formation of three-dimensional aggregates under microgravity. The approximately 40-fold up-regulation of ISG15 cannot yet be explained in detail, but strongly suggests that ISGylation, an alternative form of posttranslational modification, plays a role in longterm cultures.

  13. Identification of the pheromone biosynthesis genes from the sex pheromone gland transcriptome of the diamondback moth, Plutella xylostella.

    PubMed

    Chen, Da-Song; Dai, Jian-Qing; Han, Shi-Chou

    2017-11-24

    The diamondback moth (DBM) is estimated to impose increasing costs on the global agricultural economy as the global area of Brassica vegetable crops and oilseed rape expands. Sex pheromone traps have been outstanding tools in Integrated Pest Management for many years and provide an effective approach for DBM population monitoring and control. The ratio of two major sex pheromone compounds shows geographical variation. However, limited information on DBM pheromone biosynthesis hampers our understanding of the diversity in pheromone compound ratios. Here, we constructed a transcriptomic library from the DBM pheromone gland and identified genes putatively involved in fatty acid biosynthesis, pheromone functional group transfer, and β-oxidation. In addition, odorant binding protein, chemosensory protein and pheromone binding protein genes encoded in the pheromone gland transcriptome suggest that female DBM may receive odors or pheromone compounds via their pheromone gland and ovipositor system. Tissue expression profiles further revealed that two ALR, three DES and one FAR5 genes were pheromone gland (PG) biased, while some chemoreception genes were expressed extensively in PG, pupa, antenna and leg tissues. Finally, the candidate genes from this large-scale transcriptome information may be useful for characterizing a presumed biosynthetic pathway of the DBM sex pheromone.

  14. Digital gene expression analysis of the zebra finch genome

    PubMed Central

    2010-01-01

    Background In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. 
Expression of MHC genes in particular, corresponds well with expression patterns in other vertebrates. PMID:20359325

  15. The relationship between quantitative human epidermal growth factor receptor 2 gene expression by the 21-gene reverse transcriptase polymerase chain reaction assay and adjuvant trastuzumab benefit in Alliance N9831.

    PubMed

    Perez, Edith A; Baehner, Frederick L; Butler, Steven M; Thompson, E Aubrey; Dueck, Amylou C; Jamshidian, Farid; Cherbavaz, Diana; Yoshizawa, Carl; Shak, Steven; Kaufman, Peter A; Davidson, Nancy E; Gralow, Julie; Asmann, Yan W; Ballman, Karla V

    2015-10-01

    The N9831 trial demonstrated the efficacy of adjuvant trastuzumab for patients with human epidermal growth factor receptor 2 (HER2) locally positive tumors by protein or gene analysis. We used the 21-gene assay to examine the association of quantitative HER2 messenger RNA (mRNA) gene expression and benefit from trastuzumab. N9831 tested the addition of trastuzumab to chemotherapy in stage I-III HER2-positive breast cancer. For two of the arms of the trial, doxorubicin and cyclophosphamide followed by paclitaxel (AC-T) and doxorubicin and cyclophosphamide followed by paclitaxel with concurrent trastuzumab (AC-TH), recurrence score (RS) and HER2 mRNA expression were determined by the 21-gene assay (Oncotype DX®) (negative <10.7, equivocal 10.7 to <11.5, and positive ≥11.5 log2 expression units). Cox regression was used to assess the association of HER2 expression with trastuzumab benefit in preventing distant recurrence. Median follow-up was 7.4 years. Of 1,940 total patients, 901 had consent and sufficient tissue. HER2 by reverse transcriptase polymerase chain reaction (RT-PCR) was negative in 130 (14%), equivocal in 85 (9%), and positive in 686 (76%) patients. Concordance between HER2 assessments was 95% for RT-PCR versus central immunohistochemistry (IHC) (>10% positive cells = positive), 91% for RT-PCR versus central fluorescence in situ hybridization (FISH) (≥2.0 = positive) and 94% for central IHC versus central FISH. In the primary analysis, the association of HER2 expression by 21-gene assay with trastuzumab benefit was marginally nonsignificant (nonlinear p = 0.057). In hormone receptor-positive patients (local IHC) the association was significant (p = 0.002). The association was nonlinear with the greatest estimated benefit at lower and higher HER2 expression levels. Concordance among HER2 assessments by central IHC, FISH, and RT-PCR was similar and high. Association of HER2 mRNA expression with trastuzumab benefit as measured by time to distant recurrence was nonsignificant. A consistent benefit of trastuzumab irrespective of mHER2 levels was observed in patients with either IHC-positive or FISH-positive tumors. A trend for benefit was also observed for the small groups of patients with negative results by any or all of the central assays. Clinicaltrials.gov NCT00005970. Registered 5 July 2000.
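
    The HER2 mRNA cut-offs quoted in this abstract (negative <10.7, equivocal 10.7 to <11.5, positive ≥11.5 log2 expression units) can be written as a small classifier. The following is only an illustrative sketch: the function name and the sample values are hypothetical and do not come from the trial.

```python
def classify_her2(log2_expr: float) -> str:
    """Classify HER2 mRNA status using the cut-offs quoted in the abstract."""
    if log2_expr < 10.7:
        return "negative"
    if log2_expr < 11.5:
        return "equivocal"
    return "positive"


# Hypothetical expression values, chosen to sit on and around the boundaries.
samples = {"A": 9.8, "B": 10.7, "C": 11.49, "D": 11.5}
calls = {name: classify_her2(value) for name, value in samples.items()}
```

    Note that the lower bound of each class is inclusive, so a value of exactly 10.7 is called equivocal and exactly 11.5 is called positive.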

  16. No Control Genes Required: Bayesian Analysis of qRT-PCR Data

    PubMed Central

    Matz, Mikhail V.; Wright, Rachel M.; Scott, James G.

    2013-01-01

    Background Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process. Results In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests. Conclusions Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. 
These procedures have been implemented as the MCMC.qpcr package in R. PMID:23977043
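
    For orientation, the standard delta-delta Ct calculation that this abstract uses as its baseline can be sketched as follows. This is not the MCMC.qpcr method and does not use that package; it assumes perfect amplification efficiency (a doubling per cycle), and the function name and Ct values are hypothetical.

```python
from statistics import mean


def delta_delta_ct_fold_change(target_treated, ref_treated,
                               target_control, ref_control):
    """Relative expression of a target gene, treated versus control.

    Each argument is a list of Ct values. Assumes 100% amplification
    efficiency, so one cycle corresponds to a two-fold difference.
    """
    # Normalize the target to the reference (control) gene in each condition.
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    # Lower Ct means more template, hence the negative exponent.
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in treated samples while the reference gene is unchanged.
fold_change = delta_delta_ct_fold_change(
    target_treated=[22.0, 22.0], ref_treated=[18.0, 18.0],
    target_control=[24.0, 24.0], ref_control=[18.0, 18.0],
)
```

    The calculation depends entirely on the reference gene being stable, which is the assumption the Bayesian approach described above is designed to relax.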

  17. A novel approach for human whole transcriptome analysis based on absolute gene expression of microarray data.

    PubMed

    Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R; Del Río-Navarro, Blanca E; Mendoza-Vargas, Alfredo; Sánchez, Filiberto; Ochoa-Leyva, Adrian

    2017-01-01

    In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. Absolute gene expression analysis allows transcriptome profiling of all expressed genes under a specific biological condition without the need for a reference sample. However, background fluorescence makes it challenging to determine absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we developed a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. We extracted RNA from leukocyte samples of 16 children (nine males and seven females, ages 6-10 years). An Affymetrix GeneChip Human Gene 1.0 ST Array was run for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. Interestingly, these pathways were related to the typical functions of leukocytes, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in already published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing transcriptome profiling of microarray data without the need for an additional reference experiment. Our approach, based on establishing a threshold for absolute gene expression analysis, offers a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need for additional samples for relative expression experiments.
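
    The thresholding idea in this abstract can be illustrated with a short sketch. The abstract does not state how the threshold is computed from the Y-chromosome signals, so the rule used here (the mean signal of the three named Y-linked genes across female samples) is an assumption made purely for illustration. The signal values and the names GENE_A and GENE_B are hypothetical.

```python
from statistics import mean


def background_threshold(female_y_signals):
    """Mean log2 signal of Y-linked genes in female samples.

    Averaging is an assumed rule for this sketch; the abstract does not
    specify the exact formula.
    """
    values = [v for per_gene in female_y_signals.values() for v in per_gene]
    return mean(values)


def call_expressed(gene_signals, threshold):
    """Flag genes whose signal exceeds the background threshold."""
    return {gene: signal > threshold for gene, signal in gene_signals.items()}


# Hypothetical log2 signals for the three Y-linked genes in two female samples.
female_y = {"DDX3Y": [3.0, 3.2], "TXLNG2P": [2.8, 3.0], "EIF1AY": [3.1, 2.9]}
threshold = background_threshold(female_y)
calls = call_expressed({"GENE_A": 8.5, "GENE_B": 2.7}, threshold)
```

    Because female samples carry no Y chromosome, any signal on these probes reflects background rather than transcript, which is why it can serve as a reference-free cut-off.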

  18. Microarray analysis of a salamander hopeful monster reveals transcriptional signatures of paedomorphic brain development

    PubMed Central

    2010-01-01

    Background The Mexican axolotl (Ambystoma mexicanum) is considered a hopeful monster because it exhibits an adaptive and derived mode of development - paedomorphosis - that has evolved rapidly and independently among tiger salamanders. Unlike related tiger salamanders that undergo metamorphosis, axolotls retain larval morphological traits into adulthood and thus present an adult body plan that differs dramatically from the ancestral (metamorphic) form. The basis of paedomorphic development was investigated by comparing temporal patterns of gene transcription between axolotl and tiger salamander larvae (Ambystoma tigrinum tigrinum) that typically undergo a metamorphosis. Results Transcript abundances from whole brain and pituitary were estimated via microarray analysis on four different days post hatching (42, 56, 70, 84 dph) and regression modeling was used to independently identify genes that were differentially expressed as a function of time in both species. Collectively, more differentially expressed genes (DEGs) were identified as unique to the axolotl (n = 76) and tiger salamander (n = 292) than were identified as shared (n = 108). All but two of the shared DEGs exhibited the same temporal pattern of expression and the unique genes tended to show greater changes later in the larval period when tiger salamander larvae were undergoing anatomical metamorphosis. A second, complementary analysis that directly compared the expression of 1320 genes between the species identified 409 genes that differed as a function of species or the interaction between time and species. Of these 409 DEGs, 84% exhibited higher abundances in tiger salamander larvae at all sampling times. Conclusions Many of the unique tiger salamander transcriptional responses are probably associated with metamorphic biological processes. However, the axolotl also showed unique patterns of transcription early in development. 
In particular, the axolotl showed a genome-wide reduction in mRNA abundance across loci, including genes that regulate hypothalamic-pituitary activities. This suggests that an axolotl's failure to undergo anatomical metamorphosis late in the larval period is indirectly associated with a mechanism(s) that acts earlier in development to broadly program transcription. The axolotl hopeful monster provides a model to identify mechanisms of early brain development that proximally and ultimately affect the expression of adult phenotypes. PMID:20584293

  19. An improved method for functional similarity analysis of genes based on Gene Ontology.

    PubMed

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the information content (IC) of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure the gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/.

  20. Integrative sparse principal component analysis of gene expression data.

    PubMed

    Liu, Mengque; Fan, Xinyan; Fang, Kuangnan; Zhang, Qingzhao; Ma, Shuangge

    2017-12-01

    In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps the PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the "small sample size, high dimensionality" characteristic of gene expression data, the analysis results generated from a single dataset are often unsatisfactory. In contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform "classic" meta-analysis, other multi-dataset techniques, and single-dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance. © 2017 WILEY PERIODICALS, INC.

  1. Constitutive Expression of a Transcription Termination Factor by a Repressed Prophage: Promoters for Transcribing the Phage HK022 nun Gene

    PubMed Central

    King, Rodney A.; Madsen, Peter L.; Weisberg, Robert A.

    2000-01-01

    Lysogens of phage HK022 are resistant to infection by phage λ. Lambda resistance is caused by the action of the HK022 Nun protein, which prematurely terminates early λ transcripts. We report here that transcription of the nun gene initiates at a constitutive prophage promoter, PNun, located just upstream of the protein coding sequence. The 5′ end of the transcript was determined by primer extension analysis of RNA isolated from HK022 lysogens or RNA made in vitro by transcribing a template containing the promoter with purified Escherichia coli RNA polymerase. Inactivation of PNun by mutation greatly reduced Nun activity and Nun antigen in an HK022 lysogen. However, a low level of residual activity was detected, suggesting that a secondary promoter also contributes to nun expression. We found one possible secondary promoter, PNun′, just upstream of PNun. Neither promoter is likely to increase the expression of other phage genes in a lysogen because their transcripts should be terminated downstream of nun. We estimate that HK022 lysogens in stationary phase contain several hundred molecules of Nun per cell and that cells in exponential phase probably contain fewer. PMID:10629193

  2. Involvement of aryl hydrocarbon receptor signaling in the development of small cell lung cancer induced by HPV E6/E7 oncoproteins

    PubMed Central

    2011-01-01

    Background Lung cancers consist of four major types that, for clinico-pathological reasons, are often divided into two broad categories: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). All major histological types of lung cancer are associated with smoking, although the association is stronger for SCLC and squamous cell carcinoma than adenocarcinoma. To date, epidemiological studies have identified several environmental, genetic, hormonal and viral factors associated with lung cancer risk. It has been estimated that 15-25% of human cancers may have a viral etiology. The human papillomavirus (HPV) is a proven cause of most human cervical cancers, and might have a role in other malignancies including vulva, skin, oesophagus, head and neck cancer. HPV has also been speculated to have a role in the pathogenesis of lung cancer. To validate the hypothesis of HPV involvement in small cell lung cancer pathogenesis we performed gene expression profiling of a transgenic mouse model of SCLC induced by HPV-16 E6/E7 oncoproteins. Methods Gene expression profiling of SCLC was performed using the Agilent whole mouse genome microarray (4 × 44k), representing ~41000 genes and mouse transcripts. Samples were obtained from two HPV16-E6/E7 transgenic mouse models and from normal lung of littermates. Data analyses were performed using GeneSpring 10 and the functional classification of deregulated genes was performed using Ingenuity Pathway Analysis (Ingenuity® Systems, http://www.ingenuity.com). Results Analysis of deregulated genes induced by the expression of E6/E7 oncoproteins supports the hypothesis of a linkage between HPV infection and SCLC development. Indeed, comparison of deregulated genes in our system and those in human SCLC showed that many of them are located in the Aryl Hydrocarbon Receptor Signal transduction pathway. Conclusions In this study, the global gene expression analysis of a transgenic mouse model of SCLC induced by HPV-16 E6/E7 oncoproteins led us to identify several genes involved in SCLC tumor development. Furthermore, our study revealed that Aryl Hydrocarbon Receptor Signaling is the pathway primarily affected by E6/E7 oncoprotein expression and that this pathway is also deregulated in human SCLC. Our results provide the basis for the development of new therapeutic approaches against human SCLC. PMID:21205295

  3. Interleukins 1alpha and 1beta secreted by some melanoma cell lines strongly reduce expression of MITF-M and melanocyte differentiation antigens.

    PubMed

    Kholmanskikh, Olga; van Baren, Nicolas; Brasseur, Francis; Ottaviani, Sabrina; Vanacker, Julie; Arts, Nathalie; van der Bruggen, Pierre; Coulie, Pierre; De Plaen, Etienne

    2010-10-01

    We report that melanoma cell lines expressing the interleukin-1 receptor exhibit 4- to 10-fold lower levels of mRNA of microphthalmia-associated transcription factor (MITF-M) when treated with interleukin-1beta. This effect is NF-kappaB and JNK-dependent. MITF-M regulates the expression of melanocyte differentiation genes such as MLANA, tyrosinase and gp100, which encode antigens recognized on melanoma cells by autologous cytolytic T lymphocytes. Accordingly, treating some melanoma cells with IL-1beta reduced by 40-100% their ability to activate such antimelanoma cytolytic T lymphocytes. Finally, we observed large amounts of biologically active IL-1alpha or IL-1beta secreted by two melanoma cell lines that did not express MITF-M, suggesting an autocrine MITF-M downregulation. We estimate that approximately 13% of melanoma cell lines are MITF-M-negative and secrete IL-1 cytokines. These results indicate that the repression of melanocyte-differentiation genes by IL-1 produced by stromal cells or by tumor cells themselves may represent an additional mechanism of melanoma immune escape.

  4. Differential gene expression analysis in glioblastoma cells and normal human brain cells based on GEO database.

    PubMed

    Wang, Anping; Zhang, Guibin

    2017-11-01

    The differentially expressed genes between glioblastoma (GBM) cells and normal human brain cells were investigated, and pathway analysis and protein interaction network analysis were performed for these genes. GSE12657 and GSE42656 gene chips, which contain gene expression profiles of GBM, were obtained from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI). The 'limma' package in 'R' software was used to analyze the differentially expressed genes in the two gene chips, and gene integration was performed using the 'RobustRankAggreg' package. Finally, pheatmap software was used for heatmap analysis and Cytoscape, DAVID, STRING and KOBAS were used for protein-protein interaction, Gene Ontology (GO) and KEGG analyses. The results were as follows: i) 702 differentially expressed genes were identified in GSE12657, of which 548 were significantly upregulated and 154 were significantly downregulated (p<0.01, fold-change >1), and 1,854 differentially expressed genes were identified in GSE42656, of which 1,068 were significantly upregulated and 786 were significantly downregulated (p<0.01, fold-change >1). A total of 167 differentially expressed genes, including 100 upregulated genes and 67 downregulated genes, were identified after gene integration, and the genes showed significantly different expression levels in GBM compared with normal human brain cells (p<0.05). ii) Interactions between the protein products of 101 differentially expressed genes were identified using STRING and an expression network was established. A key gene, called CALM3, was identified by Cytoscape software. iii) GO enrichment analysis showed that differentially expressed genes were mainly enriched in 'neurotransmitter:sodium symporter activity' and 'neurotransmitter transporter activity', which can affect the activity of neurotransmitter transportation. KEGG pathway analysis showed that the differentially expressed genes were mainly enriched in 'protein processing in endoplasmic reticulum', which can affect protein processing in endoplasmic reticulum. The results showed that: i) 167 differentially expressed genes were identified from two gene chips after integration; and ii) a protein interaction network was established, and GO and KEGG pathway analyses were successfully performed to identify and annotate the key gene, which provides new insights for studies on GBM at the gene level.
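
    The filtering and integration steps described in this abstract can be illustrated with a simplified sketch. It assumes that "fold-change >1" refers to an absolute log2 fold change above 1, which the abstract does not state explicitly. It also replaces the rank aggregation performed by the 'RobustRankAggreg' package with a plain intersection of genes that agree in direction, so it is a stand-in rather than the authors' procedure. All gene names and values are hypothetical.

```python
def significant_genes(results, p_cutoff=0.01, lfc_cutoff=1.0):
    """Split genes passing both cut-offs into up- and downregulated sets.

    `results` maps a gene name to a (log2 fold change, p-value) pair.
    """
    up, down = set(), set()
    for gene, (log2_fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2_fc) > lfc_cutoff:
            (up if log2_fc > 0 else down).add(gene)
    return up, down


def shared_genes(first, second):
    """Keep genes significant in both datasets with the same direction."""
    up1, down1 = significant_genes(first)
    up2, down2 = significant_genes(second)
    return up1 & up2, down1 & down2


# Hypothetical (log2 fold change, p-value) pairs, not values from the study.
dataset_a = {"G1": (2.3, 0.001), "G2": (-1.8, 0.004), "G3": (1.5, 0.2)}
dataset_b = {"G1": (1.9, 0.002), "G2": (-2.1, 0.0005), "G3": (1.7, 0.003)}
shared_up, shared_down = shared_genes(dataset_a, dataset_b)
```

    In this toy input, G3 is dropped because it passes the p-value cut-off in only one of the two datasets.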

  5. Osteograft, plastic material for regenerative medicine

    NASA Astrophysics Data System (ADS)

    Zaidman, A. M.; Korel, A. V.; Shevchenko, A. I.; Shchelkunova, E. I.; Sherman, K. M.; Predein, Yu. A.; Kosareva, O. S.

    2016-08-01

    Creating tissue-engineering constructs based on the mechanism of cartilage-bone evolution is promising for traumatology and orthopedics. Such a graft was obtained from a chondrograft by transdifferentiation. The chondrograft placed in osteogenic medium undergoes osteogenic differentiation over 14-30 days. Tissue specificity of the osteograft was studied by morphology, immunohistochemistry, and electron microscopy, and the expression of the corresponding genes was estimated. The expression of osteonectin, fibronectin, type I collagen, isolectin and CD44 was detected. Alkaline phosphatase and matrix vesicles were detected in osteoblasts. Calcifications were observed in the matrix. Expression of chondrogenic proteins was absent. These findings demonstrate the tissue specificity of the developed osteograft.

  6. Discovery and validation of a glioblastoma co-expressed gene module

    PubMed Central

    Dunwoodie, Leland J.; Poehlman, William L.; Ficklin, Stephen P.; Feltus, Frank Alexander

    2018-01-01

    Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network. PMID:29541392

  7. Discovery and validation of a glioblastoma co-expressed gene module.

    PubMed

    Dunwoodie, Leland J; Poehlman, William L; Ficklin, Stephen P; Feltus, Frank Alexander

    2018-02-16

    Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.

  8. A new measure for gene expression biclustering based on non-parametric correlation.

    PubMed

    Flores, Jose L; Inza, Iñaki; Larrañaga, Pedro; Calvo, Borja

    2013-12-01

    Biclustering, one of the emerging techniques for analyzing DNA microarray data, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as a quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. The proposed measure, called Spearman's biclustering measure (SBM), estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as the fitness function. This approach has been examined from different points of view using artificial and real microarrays. The assessment process involved the use of quality indexes, a set of reference bicluster patterns including new patterns, and a set of statistical tests. Performance was also examined using real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
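
    To illustrate why a rank-based correlation can recognize shifting, scaling and inversion patterns, the sketch below scores a bicluster by the mean absolute Spearman correlation between its gene rows. This is a simplified stand-in and not the published SBM formula, which the abstract does not give (SBM also considers conditions, which this sketch ignores). The function names and the toy matrix are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def row_coherence(bicluster):
    """Mean absolute Spearman correlation over all pairs of gene rows."""
    return mean(abs(spearman(a, b)) for a, b in combinations(bicluster, 2))


# Toy bicluster: a base profile plus shifted, scaled and inverted copies.
base = [1.0, 2.0, 4.0, 8.0]
bicluster = [base, [v + 10 for v in base], [v * 3 for v in base], base[::-1]]
coherence = row_coherence(bicluster)
```

    Shifting and scaling preserve the rank order of a profile and inversion reverses it, so every pair of rows in the toy bicluster has a rank correlation of +1 or -1.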

  9. Predicting chemical bioavailability using microarray gene expression data and regression modeling: A tale of three explosive compounds.

    PubMed

    Gong, Ping; Nan, Xiaofei; Barker, Natalie D; Boyd, Robert E; Chen, Yixin; Wilkins, Dawn E; Johnson, David R; Suedel, Burton C; Perkins, Edward J

    2016-03-08

    Chemical bioavailability is an important dose metric in environmental risk assessment. Although many approaches have been used to evaluate bioavailability, not a single approach is free from limitations. Previously, we developed a new genomics-based approach that integrated microarray technology and regression modeling for predicting bioavailability (tissue residue) of explosives compounds in exposed earthworms. In the present study, we further compared 18 different regression models and performed variable selection simultaneously with parameter estimation. This refined approach was applied to both previously collected and newly acquired earthworm microarray gene expression datasets for three explosive compounds. Our results demonstrate that a prediction accuracy of R(2) = 0.71-0.82 was achievable at a relatively low model complexity with as few as 3-10 predictor genes per model. These results are much more encouraging than our previous ones. This study has demonstrated that our approach is promising for bioavailability measurement, which warrants further studies of mixed contamination scenarios in field settings.

  10. Gastrointestinal Fibroblasts Have Specialized, Diverse Transcriptional Phenotypes: A Comprehensive Gene Expression Analysis of Human Fibroblasts

    PubMed Central

    Ishii, Genichiro; Aoyagi, Kazuhiko; Sasaki, Hiroki; Ochiai, Atsushi

    2015-01-01

    Background Fibroblasts are the principal stromal cells that exist in whole organs and play vital roles in many biological processes. Although the functional diversity of fibroblasts has been estimated, a comprehensive analysis of fibroblasts from the whole body has not been performed and their transcriptional diversity has not been sufficiently explored. The aim of this study was to elucidate the transcriptional diversity of human fibroblasts within the whole body. Methods Global gene expression analysis was performed on 63 human primary fibroblasts from 13 organs. Of these, 32 fibroblasts from gastrointestinal organs (gastrointestinal fibroblasts: GIFs) were obtained from a pair of 2 anatomical sites: the submucosal layer (submucosal fibroblasts: SMFs) and the subperitoneal layer (subperitoneal fibroblasts: SPFs). Using hierarchical clustering analysis, we elucidated identifiable subgroups of fibroblasts and analyzed the transcriptional character of each subgroup. Results In unsupervised clustering, 2 major clusters that separate GIFs and non-GIFs were observed. Organ- and anatomical site-dependent clusters within GIFs were also observed. The signature genes that discriminated GIFs from non-GIFs, SMFs from SPFs, and the fibroblasts of one organ from another organ consisted of genes associated with transcriptional regulation, signaling ligands, and extracellular matrix remodeling. Conclusions GIFs are characteristic fibroblasts with specific gene expressions from transcriptional regulation, signaling ligands, and extracellular matrix remodeling related genes. In addition, the anatomical site- and organ-dependent diversity of GIFs was also discovered. These features of GIFs contribute to their specific physiological function and homeostatic maintenance, and create a functional diversity of the gastrointestinal tract. PMID:26046848

  11. Mining of Ruminant Microbial Phytase (RPHY1) from Metagenomic Data of Mehsani Buffalo Breed: Identification, Gene Cloning, and Characterization.

    PubMed

    Mootapally, Chandra Shekar; Nathani, Neelam M; Patel, Amrutlal K; Jakhesara, Subhash J; Joshi, Chaitanya G

    2016-01-01

    Phytases have been widely used as animal feed supplements to increase the availability of digestible phosphorus, especially in monogastric animals fed cereal grains. The present study describes the identification of a full-length phytase gene of Prevotella species present in Mehsani buffalo rumen. The gene, designated as RPHY1, consists of 1,251 bp and is expressed into protein with 417 amino acids. A homology search of the deduced amino acid sequence of the RPHY1 phytase gene in a nonredundant protein database showed that it shares 92% similarity with the histidine acid phosphatase domain. Subsequently, the RPHY1 gene was expressed using a pET32a expression vector in Escherichia coli BL21 and purified using a His60 Ni-NTA gravity column. The mass of the purified RPHY1 was estimated to be approximately 63 kDa by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). The optimal RPHY1 enzyme activity was observed at 55°C (pH 5) and exhibited good stability at 5°C and within the acidic pH range. Significant inhibition of RPHY1 activity was observed for Mg2+ and K+ metal ions, while Ca2+, Mn2+, and Na+ slightly inhibited enzyme activity. The RPHY1 phytase was susceptible to SDS, and it was highly stimulated in the presence of EDTA. Overall, the observed comparatively high enzyme activity levels and characteristics of the RPHY1 gene mined from rumen prove its promising candidature as a feed supplement enzyme in animal farming. © 2016 S. Karger AG, Basel.

  12. Occupational styrene exposure induces stress-responsive genes involved in cytoprotective and cytotoxic activities.

    PubMed

    Strafella, Elisabetta; Bracci, Massimo; Staffolani, Sara; Manzella, Nicola; Giantomasi, Daniele; Valentino, Matteo; Amati, Monica; Tomasetti, Marco; Santarelli, Lory

    2013-01-01

    The aim of this study was to evaluate the expression of a panel of genes involved in toxicology in response to styrene exposure at levels below the occupational standard setting. Workers in a fiber glass boat industry were evaluated for a panel of stress- and toxicity-related genes and associated with biochemical parameters related to hepatic injury. Urinary styrene metabolites (MA+PGA) of subjects and environmental sampling data collected for air at workplace were used to estimate styrene exposure. Expression array analysis revealed massive upregulation of genes encoding stress-responsive proteins (HSPA1L, EGR1, IL-6, IL-1β, TNSF10 and TNFα) in the styrene-exposed group; the levels of cytokines released were further confirmed in serum. The exposed workers were then stratified by styrene exposure levels. EGR1 gene upregulation paralleled the expression and transcriptional protein levels of IL-6, TNSF10 and TNFα in styrene exposed workers, even at low level. The activation of the EGR1 pathway observed at low-styrene exposure was associated with a slight increase of hepatic markers found in highly exposed subjects, even though they were within normal range. The ALT and AST levels were not affected by alcohol consumption, and positively correlated with urinary styrene metabolites as evaluated by multiple regression analysis. The pro-inflammatory cytokines IL-6 and TNFα are the primary mediators of processes involved in the hepatic injury response and regeneration. Here, we show that styrene induced stress responsive genes involved in cytoprotection and cytotoxicity at low-exposure, that proceed to a mild subclinical hepatic toxicity at high-styrene exposure.

  13. Genome-Wide Investigation and Expression Profiling of AP2/ERF Transcription Factor Superfamily in Foxtail Millet (Setaria italica L.)

    PubMed Central

    Lata, Charu; Mishra, Awdhesh Kumar; Muthamilarasan, Mehanathan; Bonthala, Venkata Suresh; Khan, Yusuf; Prasad, Manoj

    2014-01-01

    The APETALA2/ethylene-responsive element binding factor (AP2/ERF) family is one of the largest transcription factor (TF) families in plants and includes four major sub-families, namely AP2, DREB (dehydration responsive element binding), ERF (ethylene responsive factors) and RAV (Related to ABI3/VP). AP2/ERFs are known to play significant roles in various plant processes including growth and development and biotic and abiotic stress responses. Considering this, a comprehensive genome-wide study was conducted in foxtail millet (Setaria italica L.). A total of 171 AP2/ERF genes were identified by systematic sequence analysis and were physically mapped onto nine chromosomes. Phylogenetic analysis grouped AP2/ERF genes into six classes (I to VI). Duplication analysis revealed that 12 (∼7%) SiAP2/ERF genes were tandemly repeated and 22 (∼13%) were segmentally duplicated. Comparative physical mapping between foxtail millet AP2/ERF genes and their orthologs in sorghum (18 genes), maize (14 genes), rice (9 genes) and Brachypodium (6 genes) provided evolutionary insights into the AP2/ERF gene family and showed a decrease in orthology with increasing phylogenetic distance. The evolutionary significance in terms of gene duplication and divergence was analyzed by estimating synonymous and non-synonymous substitution rates. Expression profiling of candidate AP2/ERF genes against drought, salt and phytohormones revealed insights into their precise and/or overlapping expression patterns, which could be responsible for their functional divergence in foxtail millet. The study showed that the genes SiAP2/ERF-069, SiAP2/ERF-103 and SiAP2/ERF-120 may be considered as potential candidate genes for further functional validation as well as for utilization in crop improvement programs for stress resistance, since these genes were up-regulated under drought and salinity stresses in an ABA-dependent manner. Altogether, the present study provides new insights into the evolution, divergence and systematic functional analysis of the AP2/ERF gene family at the genome level in foxtail millet, which may be utilized for improving stress adaptation and tolerance in millets, cereals and bioenergy grasses. PMID:25409524

  15. Zebrafish fin immune responses during high mortality infections with viral haemorrhagic septicemia rhabdovirus. A proteomic and transcriptomic approach.

    PubMed

    Encinas, Paloma; Rodriguez-Milla, Miguel A; Novoa, Beatriz; Estepa, Amparo; Figueras, Antonio; Coll, Julio

    2010-09-27

    Despite rhabdoviral infections being one of the best known fish diseases, the gene expression changes induced at the surface tissues after the natural route of infection (infection-by-immersion) have not been described yet. This work describes the differential infected versus non-infected expression of proteins and immune-related transcripts in fins and organs of zebrafish Danio rerio shortly after infection-by-immersion with viral haemorrhagic septicemia virus (VHSV). Two-dimensional differential gel electrophoresis detected variations on the protein levels of the enzymes of the glycolytic pathway and cytoskeleton components but it detected very few immune-related proteins. Differential expression of immune-related gene transcripts estimated by quantitative polymerase chain reaction arrays and hybridization to oligo microarrays showed that while more transcripts increased in fins than in organs (spleen, head kidney and liver), more transcripts decreased in organs than in fins. Increased differential transcript levels in fins detected by both arrays corresponded to previously described infection-related genes such as complement components (c3b, c8 and c9) or class I histocompatibility antigens (mhc1) and to newly described genes such as secreted immunoglobulin domain (sid4), macrophage stimulating factor (mst1) and a cluster differentiation antigen (cd36). The genes described would contribute to the knowledge of the earliest molecular events occurring in the fish surfaces at the beginning of natural rhabdoviral infections and/or might be new candidates to be tested as adjuvants for fish vaccines.

  16. Assessment of reference gene stability influenced by extremely divergent disease symptoms in Solanum lycopersicum L.

    PubMed

    Wieczorek, Przemysław; Wrzesińska, Barbara; Obrępalska-Stęplowska, Aleksandra

    2013-12-01

    Tomato (Solanum lycopersicum L.) is one of the most economically important vegetables worldwide. Its scientific importance stems from the fact that the genome of S. lycopersicum has been sequenced. This allows researchers to study fundamental mechanisms that play an essential role during tomato development and in the response to environmental factors that contribute significantly to alterations in cell metabolism. In parallel with the development of contemporary genetics and the constant increase in sequencing data, progress has to be matched by improvement of the experimental methods used for studying gene function and gene expression levels, of which the quantitative polymerase chain reaction (qPCR) is still the most reliable. As with other nucleic acid-based methods used for comparing the abundance of specific RNAs, RT-qPCR data have to be normalised to the levels of RNAs that are stably represented in a cell. To achieve this, the so-called housekeeping genes (i.e., RNAs encoding, for instance, proteins playing an important role in cell metabolism or structure maintenance) are used for normalisation of the target gene expression data. However, a number of studies have indicated the transcriptional instability of commonly used reference genes analysed in different situations or conditions; for instance, the origin of cells, tissue types, or environmental or other experimental conditions. The expression of ten common housekeeping genes of S. lycopersicum, namely EF1α, TUB, CAC, EXP, RPL8, GAPDH, TBP, ACT, SAND and 18S rRNA, was examined during viral infections of tomato. Changes in the expression levels of the genes were estimated by comparing non-inoculated tomato plants with those infected with commonly known tomato viral pathogens, Tomato torrado virus, Cucumber mosaic virus, Tobacco mosaic virus and Pepino mosaic virus, which induce a diverse range of disease symptoms on the common host, from mild leaf chlorosis to very severe stem necrosis. Despite this wide range of disease symptoms, it is concluded that ACT, CAC and EF1α are the most suitable reference genes for studies of host-virus interactions in tomato.
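
The normalisation of RT-qPCR data to stably expressed reference genes, as described in the abstract above, can be illustrated with a minimal sketch. It assumes the widely used 2^-ΔΔCt model with roughly 100% amplification efficiency, which the abstract itself does not specify; the function name and all Ct values below are hypothetical and chosen only for illustration.

```python
def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene by the 2^-ddCt model (assumed, not from the source).

    ct_refs_* are lists of Ct values for the reference genes
    (for example three genes such as ACT, CAC and EF1a).
    """
    # Averaging Ct values of several reference genes is equivalent to taking
    # the geometric mean of their quantities under the 2^-Ct assumption.
    ref_treated = sum(ct_refs_treated) / len(ct_refs_treated)
    ref_control = sum(ct_refs_control) / len(ct_refs_control)

    # dCt: target Ct normalised to the reference-gene average in each condition.
    d_ct_treated = ct_target_treated - ref_treated
    d_ct_control = ct_target_control - ref_control

    # ddCt compares the treated (e.g. infected) sample with the control sample.
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the infected plant while the reference genes stay unchanged.
fold = relative_expression(
    ct_target_treated=22.0, ct_refs_treated=[18.0, 20.0, 19.0],
    ct_target_control=24.0, ct_refs_control=[18.0, 20.0, 19.0],
)
print(f"fold change: {fold:.2f}")
```

If a reference gene itself shifts between conditions, the dCt values absorb that shift and the fold change is biased, which is why the stability of candidate reference genes needs to be checked under the conditions being studied.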

  17. Gender-Specific Gene Expression in Post-Mortem Human Brain: Localization to Sex Chromosomes

    PubMed Central

    Vawter, Marquis P; Evans, Simon; Choudary, Prabhakara; Tomita, Hiroaki; Meador-Woodruff, Jim; Molnar, Margherita; Li, Jun; Lopez, Juan F; Myers, Rick; Cox, David; Watson, Stanley J; Akil, Huda; Jones, Edward G; Bunney, William E

    2011-01-01

    Gender differences in brain development and in the prevalence of neuropsychiatric disorders such as depression have been reported. Gender differences in human brain might be related to patterns of gene expression. Microarray technology is one useful method for investigation of gene expression in brain. We investigated gene expression, cell types, and regional expression patterns of differentially expressed sex chromosome genes in brain. We profiled gene expression in male and female dorsolateral prefrontal cortex, anterior cingulate cortex, and cerebellum using the Affymetrix oligonucleotide microarray platform. Differentially expressed genes between males and females on the Y chromosome (DBY, SMCY, UTY, RPS4Y, and USP9Y) and X chromosome (XIST) were confirmed using real-time PCR measurements. In situ hybridization confirmed the differential expression of gender-specific genes and neuronal expression of XIST, RPS4Y, SMCY, and UTY in three brain regions examined. The XIST gene, which silences gene expression on regions of the X chromosome, is expressed in a subset of neurons. Since a subset of neurons express gender-specific genes, neural subpopulations may exhibit a subtle sexual dimorphism at the level of differences in gene regulation and function. The distinctive pattern of neuronal expression of XIST, RPS4Y, SMCY, and UTY and other sex chromosome genes in neuronal subpopulations may possibly contribute to gender differences in prevalence noted for some neuropsychiatric disorders. Studies of the protein expression of these sex-chromosome-linked genes in brain tissue are required to address the functional consequences of the observed gene expression differences. PMID:14583743

  18. Identification and gene expression of anaerobically induced enolase in Echinochloa phyllopogon and Echinochloa crus-pavonis.

    PubMed Central

    Fox, T C; Mujer, C V; Andrews, D L; Williams, A S; Cobb, B G; Kennedy, R A; Rumpho, M E

    1995-01-01

    Enolase (2-phospho-D-glycerate hydrolase, EC 4.2.1.11) has been identified as an anaerobic stress protein in Echinochloa oryzoides based on the homology of its internal amino acid sequence with those of enolases from other organisms, by immunological reactivity, and induction of catalytic activity during anaerobic stress. Enolase activity was induced 5-fold in anoxically treated seedlings of three flood-tolerant species (E. oryzoides, Echinochloa phyllopogon, and rice [Oryza sativa L.]) but not in the flood-intolerant species (Echinochloa crus-pavonis). A 540-bp fragment of the enolase gene was amplified by polymerase chain reaction from cDNAs of E. phyllopogon and maize (Zea mays L.) and used to estimate the number of enolase genes and to study the expression of enolase transcripts in E. phyllopogon, E. crus-pavonis, and maize. Southern blot analysis indicated that only one enolase gene is present in either E. phyllopogon or E. crus-pavonis. Three patterns of enolase gene expression were observed in the three species studied. In E. phyllopogon, enolase induction at both the mRNA and enzyme activity levels was sustained at all times with a further induction after 48 h of anoxia. In contrast, enolase was induced in hypoxically treated maize root tips only at the mRNA level. In E. crus-pavonis, enolase mRNA and enzyme activity were induced during hypoxia, but activity was only transiently elevated. These results suggest that enolase expression in maize and E. crus-pavonis during anoxia are similarly regulated at the transcriptional level but differ in posttranslational regulation, whereas enolase is fully induced in E. phyllopogon during anaerobiosis. PMID:7480340

  19. The upregulation of immune responses in tyrosine hydroxylase (TH) silenced Litopenaeus vannamei.

    PubMed

    Mapanao, Ratchaneegorn; Chang, Chin-Chyuan; Cheng, Winton

    2017-02-01

    Catecholamines (CAs) play a crucial role in maintaining physiological and immune homeostasis in invertebrates and vertebrates under stressful conditions. Tyrosine hydroxylase (TH) is the first and rate-limiting enzyme in CA synthesis. To develop an effective CA-related immunological defense system against stress and pathogen infection, various criteria were evaluated in TH double-stranded (ds) RNA-injected white shrimp, Litopenaeus vannamei. Specifically, the relative transcript quantification of TH, dopamine β-hydroxylase (DBH), crustacean hyperglycemic hormone (CHH), and other immune-related genes; TH activity in the haemolymph; and the estimation of l-dihydroxyphenylalanine (l-DOPA), glucose, and lactate levels in the haemolymph were examined. TH depletion revealed a significant increase in the total haemocyte count; granular cells; semigranular cells; respiratory bursts (RBs, release of superoxide anion); superoxide dismutase (SOD) activity; phagocytic activity and clearance efficiency; and the expression of lipopolysaccharide and β-1,3-glucan-binding protein and peroxinectin, SOD, crustin, and lysozyme genes. In addition, the reduction of TH gene expression and activity was accompanied by a decline of phenoloxidase (PO) activity per granulocyte, lower glucose and lactate levels, and significantly lower expression of DBH and CHH genes. However, the number of hyaline cells, activity of PO, RBs per haemocyte, and expression of POI and POII genes were not significantly different in the LvTH-silenced shrimp. Notably, the survival ratio of LvTH-silenced shrimp was significantly higher than that of shrimp injected with diethyl pyrocarbonate-water and nontargeting dsRNA when challenged with Vibrio alginolyticus. Therefore, the depletion of TH can enhance disease resistance in shrimp by upregulating specific immune parameters but downregulating the levels of carbohydrate metabolites.

  20. Gene expression of stem cells at different stages of ontological human development.

    PubMed

    Allegra, Adolfo; Altomare, Roberta; Curcio, Patrizia; Santoro, Alessandra; Lo Monte, Attilio I; Mazzola, Sergio; Marino, Angelo

    2013-10-01

    The aim was to compare multipotent mesenchymal stem cells (MSCs) obtained from chorionic villi (CV), amniotic fluid (AF) and placenta with regard to their phenotype and gene expression, in order to understand whether MSCs derived from different extra-embryonic tissues, at different stages of human ontological development, present distinct stemness characteristics. MSCs obtained from 30 samples of CV, 30 of AF and 10 placentas (obtained from elective caesarean sections) were compared. MSCs at second confluence cultures were characterized by immunophenotypic analysis with flow cytometry using FACS CANTO II. The expression of the genes Oct-4 (Octamer-binding transcription factor 4, also known as POU5F1), Sox-2 (SRY box-containing factor 2), Nanog, Rex-1 (Zfp-42) and Pax-6 (Paired Box Protein-6) was analyzed. Real-time quantitative PCR was performed on an ABI Prism 7700, after RNA isolation and reverse transcription into cDNA. Statistical analysis was performed using the non-parametric Kruskal-Wallis test (XLSTAT 2011) and confirmed by REST software, to estimate fold changes between samples. A gene was defined as differentially expressed if the p-value was <0.05. Cells from all samples were negative for the haematopoietic antigens CD45, CD34, CD117 and CD33 and positive for the typical MSC antigens CD13, CD73 and CD90. Nevertheless, MSCs from AF and placentas showed different fluorescence intensity, reflecting the heterogeneity of these tissues. The gene expression of OCT-4, SOX-2 and NANOG was not significantly different among the three groups. In AF, REX-1 and PAX-6 showed a higher expression in comparison to CV. MSCs of different extra-embryonic tissues showed no differences in immunophenotype when collected from second confluence cultures. The expression of OCT-4, NANOG and SOX-2 was not significantly different, demonstrating that all fetal sources are suitable for obtaining MSCs. These results open new possibilities for the clinical use of MSCs derived from easily accessible sources, in order to develop new protocols for clinical and experimental research.
