Science.gov

Sample records for based gene expression

  1. Classification of genes based on gene expression analysis

    SciTech Connect

    Angelova, M. Myers, C. Faith, J.

    2008-05-15

    Systems biology and bioinformatics are now major fields for productive research. DNA microarrays and other array technologies and genome sequencing have advanced to the point that it is now possible to monitor gene expression on a genomic scale. Gene expression analysis is discussed and some important clustering techniques are considered. The patterns identified in the data suggest similarities in gene behavior, which provide useful information about gene function. We discuss measures for investigating the homogeneity of gene expression data in order to optimize the clustering process. We contribute to the knowledge of functional roles and regulation of E. coli genes by proposing a classification of these genes based on consistently correlated genes in expression data and similarities of gene expression patterns. A new visualization tool for targeted projection pursuit and dimensionality reduction of gene expression data is demonstrated.

  2. Gene-Ontology-based clustering of gene expression data.

    PubMed

    Adryan, Boris; Schuh, Reinhard

    2004-11-01

    The expected correlation between genetic co-regulation and affiliation to a common biological process does not necessarily hold when numerical cluster algorithms are applied to gene expression data. GO-Cluster uses the tree structure of the Gene Ontology database as a framework for numerical clustering, thus allowing a simple visualization of gene expression data at various levels of the ontology tree. The 32-bit Windows application is freely available at http://www.mpibpc.mpg.de/go-cluster/

  3. Robust PCA based method for discovering differentially expressed genes.

    PubMed

    Liu, Jin-Xing; Wang, Yu-Tian; Zheng, Chun-Hou; Sha, Wen; Mi, Jian-Xun; Xu, Yong

    2013-01-01

    How to identify a set of genes that are relevant to a key biological process is an important issue in current molecular biology. In this paper, we propose a novel method to discover differentially expressed genes based on robust principal component analysis (RPCA). In our method, we treat the differentially and non-differentially expressed genes as perturbation signals S and low-rank matrix A, respectively. Perturbation signals S can be recovered from the gene expression data by using RPCA. To discover the differentially expressed genes associated with specific biological processes or functions, the scheme is as follows. First, the matrix D of expression data is decomposed into the sum of two matrices, A and S, by using RPCA. Second, the differentially expressed genes are identified based on matrix S. Finally, the differentially expressed genes are evaluated with tools based on Gene Ontology. A large number of experiments on hypothetical and real gene expression data are also provided, and the experimental results show that our method is efficient and effective.
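
    The second step of the scheme above, selecting genes from the sparse matrix S, might look roughly like the following. This is only a sketch: it assumes S has already been produced by some RPCA solver with one row per gene, and the scoring rule (sum of absolute values per row) and the `top_k` cutoff are illustrative assumptions, not details given in the abstract.

```python
def rank_genes_by_perturbation(S, gene_names, top_k):
    """Rank genes by the total magnitude of their row in the sparse matrix S.

    S is a list of rows (one per gene); the scoring rule is an assumption.
    """
    scores = [sum(abs(v) for v in row) for row in S]
    order = sorted(range(len(S)), key=lambda i: scores[i], reverse=True)
    return [(gene_names[i], scores[i]) for i in order[:top_k]]


# Hypothetical toy matrix: 3 genes x 4 samples.
S = [
    [0.0, 0.0, 0.0, 0.0],   # unperturbed gene
    [2.5, -1.5, 0.0, 3.0],  # strongly perturbed gene
    [0.0, 0.5, 0.0, 0.0],   # weakly perturbed gene
]
genes = ["geneA", "geneB", "geneC"]
top = rank_genes_by_perturbation(S, genes, top_k=2)
print(top)
```

    Rows of S that are entirely zero receive a score of zero, which matches the idea that non-differentially expressed genes are absorbed by the low-rank part A.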

  4. Control of alphavirus-based gene expression using engineered riboswitches.

    PubMed

    Bell, Christie L; Yu, Dong; Smolke, Christina D; Geall, Andrew J; Beard, Clayton W; Mason, Peter W

    2015-09-01

    Alphavirus-based replicons are a promising nucleic acid vaccine platform characterized by robust gene expression and immune responses. To further explore their use in vaccination, replicons were engineered to allow conditional control over their gene expression. Riboswitches, comprising a ribozyme actuator and RNA aptamer sensor, were engineered into the replicon 3' UTR. Binding of ligand to aptamer modulates ribozyme activity and, therefore, gene expression. Expression from DNA-launched and VRP-packaged replicons containing riboswitches was successfully regulated, achieving a 47-fold change in expression and modulation of the resulting type I interferon response. Moreover, we developed a novel control architecture where riboswitches were integrated into the 3' and 5' UTR of the subgenomic RNA region of the TC-83 virus, leading to an 1160-fold regulation of viral replication. Our studies demonstrate that the use of riboswitches for control of RNA replicon expression and viral replication holds promise for development of novel and safer vaccination strategies.

  5. Nonlinear model-based method for clustering periodically expressed genes.

    PubMed

    Tian, Li-Ping; Liu, Li-Zhi; Zhang, Qian-Wei; Wu, Fang-Xiang

    2011-01-01

    Clustering periodically expressed genes from their time-course expression data could help understand the molecular mechanism of the underlying biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodic processes. Each periodic process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two-stage method is proposed to estimate the model parameters, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g., k-means) for periodically expressed gene data, and thus it is an effective cluster analysis method for periodically expressed gene data.
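
    As a rough illustration of the model form described above (an offset plus a sine and cosine term), the sketch below fits a single harmonic to one expression profile. It is not the authors' two-stage estimator: it assumes evenly spaced time points that cover a whole number of periods, in which case the least-squares coefficients have a simple closed form. The function name and the toy profile are hypothetical.

```python
import math


def fit_single_harmonic(values, periods_covered=1):
    """Least-squares fit of x(t) = c + a*cos(wt) + b*sin(wt).

    Assumes evenly spaced samples spanning exactly `periods_covered`
    full periods, so the sine and cosine regressors are orthogonal.
    """
    n = len(values)
    angles = [2.0 * math.pi * periods_covered * k / n for k in range(n)]
    offset = sum(values) / n
    a = 2.0 / n * sum(v * math.cos(w) for v, w in zip(values, angles))
    b = 2.0 / n * sum(v * math.sin(w) for v, w in zip(values, angles))
    return offset, a, b


# Noise-free toy profile with known coefficients (c=1.0, a=2.0, b=-0.5).
n_points = 12
profile = [
    1.0
    + 2.0 * math.cos(2 * math.pi * k / n_points)
    - 0.5 * math.sin(2 * math.pi * k / n_points)
    for k in range(n_points)
]
offset, a, b = fit_single_harmonic(profile)
print(offset, a, b)
```

    With Gaussian noise added to the profile, the same formulas give the least-squares estimates under the stated sampling assumption; unevenly sampled data would need a general regression instead.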

  6. Interpolation based consensus clustering for gene expression time series.

    PubMed

    Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung

    2015-04-16

    Unsupervised analyses such as clustering are the essential tools required to interpret time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and uncertainty of measurement, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous; therefore, the datasets collected from time series experiments are often found to have an insufficient number of data points and, as a result, compensation for missing data can also be an issue. An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationship between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Some real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm. The proposed algorithm has benefitted from the use of cubic B-splines interpolation, sliding-window, affinity propagation, gene relativity graph, and a consensus process, and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated the relationships between the expressed genes, which may give some insights into the biological processes involved.

  7. Multiclass cancer classification based on gene expression comparison

    PubMed Central

    Yang, Sitan; Naiman, Daniel Q.

    2016-01-01

    As the complexity and heterogeneity of cancer is being increasingly appreciated through genomic analyses, microarray-based cancer classification comprising multiple discriminatory molecular markers is an emerging trend. Such multiclass classification problems pose new methodological and computational challenges for developing novel and effective statistical approaches. In this paper, we introduce a new approach for classifying multiple disease states associated with cancer based on gene expression profiles. Our method focuses on detecting small sets of genes in which the relative comparison of their expression values leads to class discrimination. For an m-class problem, the classification rule typically depends on a small number of m-gene sets, which provide transparent decision boundaries and allow for potential biological interpretations. We first test our approach on seven common gene expression datasets and compare it with popular classification methods including support vector machines and random forests. We then consider an extremely large cohort of leukemia cancer to further assess its effectiveness. In both experiments, our method yields results comparable to or even better than benchmark classifiers. In addition, we demonstrate that our approach can integrate pathway analysis of gene expression to provide accurate and biologically meaningful classification. PMID:24918456
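
    The idea of classifying by relative expression comparison can be illustrated in its simplest two-class form, where a single gene pair is chosen so that the ordering of the two expression values separates the classes. The sketch below is that simplified two-class case only, not the m-gene-set rule of the paper; the function names and toy data are hypothetical.

```python
from itertools import combinations


def best_comparison_pair(samples, labels):
    """Return (score, i, j) for the gene pair whose ordering best separates two classes.

    The score is |P(x_i < x_j | class 1) - P(x_i < x_j | class 2)|.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    n_genes = len(samples[0])
    best = (-1.0, None, None)
    for i, j in combinations(range(n_genes), 2):
        freqs = []
        for c in classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            freqs.append(sum(r[i] < r[j] for r in rows) / len(rows))
        score = abs(freqs[0] - freqs[1])
        if score > best[0]:
            best = (score, i, j)
    return best


def predict(sample, i, j, label_if_less, label_otherwise):
    """Apply the comparison rule: one label if x_i < x_j, the other if not."""
    return label_if_less if sample[i] < sample[j] else label_otherwise


# Hypothetical toy data: 4 samples x 3 genes.
samples = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 0.5],
    [5.0, 1.0, 3.1],
    [7.0, 2.0, 1.0],
]
labels = ["A", "A", "B", "B"]
score, i, j = best_comparison_pair(samples, labels)
print(score, i, j, predict([0.5, 4.0, 2.0], i, j, "A", "B"))
```

    Because the rule depends only on which of two values is larger, it is unaffected by any monotone rescaling applied to a whole sample, which is part of what makes such decision boundaries easy to interpret.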

  8. Molecular-genetic imaging based on reporter gene expression.

    PubMed

    Kang, Joo Hyun; Chung, June-Key

    2008-06-01

    Molecular imaging includes proteomic, metabolic, cellular biologic process, and genetic imaging. In a narrow sense, molecular imaging means genetic imaging and can be called molecular-genetic imaging. Imaging reporter genes play a leading role in molecular-genetic imaging. There are 3 major methods of molecular-genetic imaging, based on optical, MRI, and nuclear medicine modalities. For each of these modalities, various reporter genes and probes have been developed, and these have resulted in successful transitions from bench to bedside applications. Each of these imaging modalities has its unique advantages and disadvantages. Fluorescent and bioluminescent optical imaging modalities are simple, less expensive, more convenient, and more user friendly than other imaging modalities. Another advantage, especially of bioluminescence imaging, is its ability to detect low levels of gene expression. MRI has the advantage of high spatial resolution, whereas nuclear medicine methods are highly sensitive and allow data from small-animal imaging studies to be translated to clinical practice. Moreover, multimodality imaging reporter genes will allow us to choose the imaging technologies that are most appropriate for the biologic problem at hand and facilitate the clinical application of reporter gene technologies. Reporter genes can be used to visualize the levels of expression of particular exogenous and endogenous genes and several intracellular biologic phenomena, including specific signal transduction pathways, nuclear receptor activities, and protein-protein interactions. This technique provides a straightforward means of monitoring tumor mass and can visualize the in vivo distributions of target cells, such as immune cells and stem cells. Molecular imaging has gradually evolved into an important tool for drug discovery and development, and transgenic mice with an imaging reporter gene can be useful during drug and stem cell therapy development. 
Moreover, instrumentation

  9. An Agent-Based Clustering Approach for Gene Selection in Gene Expression Microarray.

    PubMed

    Ramos, Juan; Castellanos-Garzón, José A; González-Briones, Alfonso; de Paz, Juan F; Corchado, Juan M

    2017-03-09

    Gene selection is a major research area in microarray analysis, which seeks to discover differentially expressed genes for a particular target annotation. Such genes, also often called informative genes, are able to differentiate tissue samples belonging to different classes of the studied disease. Despite the large number of proposals, the complexity imposed by this problem remains a challenge today. This research proposes a gene selection approach by means of a clustering-based multi-agent system. This proposal manages different filter methods and gene clustering through coordinated agents to discover informative gene subsets. To assess the reliability of our approach, we have used four important public gene expression datasets: two lung cancer datasets, a colon cancer dataset, and a leukemia dataset. The achieved results have been validated through cluster validity measures, visual analytics, and a classifier, and compared with other gene selection methods, proving the reliability of our proposal.

  10. Ranking Candidate Disease Genes from Gene Expression and Protein Interaction: A Katz-Centrality Based Approach

    PubMed Central

    Zhao, Jing; Yang, Ting-Hong; Huang, Yongxu; Holme, Petter

    2011-01-01

    Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders. PMID:21912686
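
    For readers unfamiliar with the centrality measure named in the title, the sketch below computes plain Katz centrality by fixed-point iteration of x = alpha * A x + beta. It is a generic illustration, not the authors' scoring scheme: how expression data and interaction strengths enter the weights is not specified in the abstract, and the toy network, `alpha` and `beta` are assumptions.

```python
def katz_centrality(adjacency, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha * A x + beta until successive iterates agree.

    Converges when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    n = len(adjacency)
    x = [0.0] * n
    for _ in range(max_iter):
        new_x = [
            alpha * sum(adjacency[i][j] * x[j] for j in range(n)) + beta
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("Katz iteration did not converge; try a smaller alpha")


# Hypothetical star-shaped interaction network: node 0 touches nodes 1, 2, 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = katz_centrality(star)
print(scores)
```

    Nodes that are close to many other nodes accumulate higher scores, which is the sense in which a candidate gene near other candidates in the interaction network is ranked more highly.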

  11. Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

    PubMed

    Zhao, Jing; Yang, Ting-Hong; Huang, Yongxu; Holme, Petter

    2011-01-01

    Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.

  12. Computing gene expression data with a knowledge-based gene clustering approach.

    PubMed

    Rosa, Bruce A; Oh, Sookyung; Montgomery, Beronda L; Chen, Jin; Qin, Wensheng

    2010-01-01

    Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach which identifies and saves important functional genes before filtering based on variability and fold change differences was utilized to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The common genes to both of these clusters were identified, and the genes in the common cluster were ranked based on their coexpression to the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists scored higher gene enrichment scores than two individual clustering methods. In addition, the filtering method increased the proportion of light responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared to gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.

  13. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for
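
    The empirical p-value of an observed error rate mentioned above can be sketched as follows. The classifier itself is outside this sketch: `permuted_errors` is assumed to hold error rates obtained by refitting on resampled data (for example, with phenotype labels shuffled), which is an assumption since the abstract only says "an adequate resampling method". The add-one correction is likewise a common convention, not something stated in the source.

```python
def empirical_p_value(observed_error, permuted_errors):
    """Share of resampled error rates that are at least as low as the observed one.

    Uses the (count + 1) / (B + 1) convention so the p-value is never zero.
    """
    at_least_as_good = sum(e <= observed_error for e in permuted_errors)
    return (at_least_as_good + 1) / (len(permuted_errors) + 1)


# Hypothetical numbers: one observed error rate and nine resampled ones.
observed = 0.10
permuted = [0.45, 0.50, 0.38, 0.52, 0.09, 0.47, 0.41, 0.55, 0.49]
p = empirical_p_value(observed, permuted)
print(p)
```

    A small p-value indicates that the gene set classifies the samples better than would be expected if the labels were unrelated to expression.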

  14. Identifying the optimal gene and gene set in hepatocellular carcinoma based on differential expression and differential co-expression algorithm.

    PubMed

    Dong, Li-Yang; Zhou, Wei-Zhong; Ni, Jun-Wei; Xiang, Wei; Hu, Wen-Hao; Yu, Chang; Li, Hai-Yan

    2017-02-01

    The objective of this study was to identify the optimal gene and gene set for hepatocellular carcinoma (HCC) utilizing the differential expression and differential co-expression (DEDC) algorithm. The DEDC algorithm consisted of four parts: calculating differential expression (DE) by absolute t-value in t-statistics; computing differential co-expression (DC) based on Z-test; determining optimal thresholds on the basis of Chi-squared (χ2) maximization and the corresponding gene was the optimal gene; and evaluating functional relevance of genes categorized into different partitions to determine the optimal gene set with highest mean minimum functional information (FI) gain (Δ*G). The optimal thresholds divided genes into four partitions, high DE and high DC (HDE-HDC), high DE and low DC (HDE-LDC), low DE and high DC (LDE-HDC), and low DE and low DC (LDE-LDC). In addition, the optimal gene was validated by conducting reverse transcription-polymerase chain reaction (RT-PCR) assay. The optimal thresholds for DC and DE were 1.032 and 1.911, respectively. Using the optimal gene, the genes were divided into four partitions: HDE-HDC (2,053 genes), HDE-LDC (2,822 genes), LDE-HDC (2,622 genes), and LDE-LDC (6,169 genes). The optimal gene was microtubule-associated protein RP/EB family member 1 (MAPRE1), and RT-PCR assay validated the significant difference between the HCC and normal state. The optimal gene set was nucleoside metabolic process (GO:0009116) with Δ*G = 18.681 and 24 HDE-HDC partitions in total. In conclusion, we successfully investigated the optimal gene, MAPRE1, and gene set, nucleoside metabolic process, which may be potential biomarkers for targeted therapy and provide significant insight for revealing the pathological mechanism underlying HCC.
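
    The four-way partitioning by the two thresholds reported above can be sketched as a simple lookup. The threshold values (1.911 for DE, 1.032 for DC) come from the abstract; whether a score exactly equal to a threshold counts as "high" is not stated, so the inclusive comparison here is an assumption, and the function name is hypothetical.

```python
DE_THRESHOLD = 1.911  # reported optimal threshold for differential expression
DC_THRESHOLD = 1.032  # reported optimal threshold for differential co-expression


def partition(de_score, dc_score,
              de_threshold=DE_THRESHOLD, dc_threshold=DC_THRESHOLD):
    """Assign a gene to HDE-HDC, HDE-LDC, LDE-HDC or LDE-LDC."""
    de_part = "HDE" if de_score >= de_threshold else "LDE"
    dc_part = "HDC" if dc_score >= dc_threshold else "LDC"
    return f"{de_part}-{dc_part}"


# Hypothetical scores for four genes.
print(partition(2.5, 1.2), partition(2.5, 0.3),
      partition(0.4, 1.2), partition(0.4, 0.3))
```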

  15. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods.

    PubMed

    Huttenhower, Curtis; Flamholz, Avi I; Landis, Jessica N; Sahi, Sauhard; Myers, Chad L; Olszewski, Kellen L; Hibbs, Matthew A; Siemers, Nathan O; Troyanskaya, Olga G; Coller, Hilary A

    2007-07-12

    The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a
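
    The mutual-nearest-neighbor requirement described above can be illustrated with a small sketch that keeps an edge between two genes only when each appears in the other's list of k most similar genes. This covers only the graph-building step, not the clique-based cluster extraction of NNN; Pearson correlation as the similarity measure, the function names, and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_nearest_neighbor_edges(profiles, k):
    """Return the set of gene pairs that are in each other's top-k neighbors."""
    names = list(profiles)
    neighbors = {}
    for g in names:
        ranked = sorted(
            (h for h in names if h != g),
            key=lambda h: pearson(profiles[g], profiles[h]),
            reverse=True,
        )
        neighbors[g] = set(ranked[:k])
    return {
        frozenset((g, h))
        for g in names
        for h in neighbors[g]
        if g in neighbors[h]
    }


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
    "g5": [1.0, 3.0, 1.0, 3.0],
}
edges = mutual_nearest_neighbor_edges(profiles, k=1)
print(sorted(sorted(e) for e in edges))
```

    In this toy example g5 has a nearest neighbor of its own but is nobody's nearest neighbor, so it ends up with no edges, which mirrors how genes without sufficiently similar partners remain unclustered.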

  16. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    PubMed Central

    Huttenhower, Curtis; Flamholz, Avi I; Landis, Jessica N; Sahi, Sauhard; Myers, Chad L; Olszewski, Kellen L; Hibbs, Matthew A; Siemers, Nathan O; Troyanskaya, Olga G; Coller, Hilary A

    2007-01-01

    Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets

  17. Identification of feature genes for smoking-related lung adenocarcinoma based on gene expression profile data

    PubMed Central

    Liu, Ying; Ni, Ran; Zhang, Hui; Miao, Lijun; Wang, Jing; Jia, Wenqing; Wang, Yuanyuan

    2016-01-01

    This study aimed to identify the genes and pathways associated with smoking-related lung adenocarcinoma. Three lung adenocarcinoma-associated datasets (GSE43458, GSE10072, and GSE50081), the subjects of which included smokers and nonsmokers, were downloaded to screen the differentially expressed feature genes between smokers and nonsmokers. Based on the identified feature genes, we constructed the protein–protein interaction (PPI) network and optimized feature genes using the closeness centrality (CC) algorithm. Then, the support vector machine (SVM) classification model was constructed based on the feature genes with higher CC values. Finally, pathway enrichment analysis of the feature genes was performed. A total of 213 down-regulated and 83 up-regulated differentially expressed genes were identified. In the constructed PPI network, the top ten nodes with higher degrees and CC values included ANK3, EPHA4, FGFR2, etc. The SVM classifier was constructed with 27 feature genes, which could accurately identify smokers and nonsmokers. Pathway enrichment analysis for the 27 feature genes revealed that they were significantly enriched in five pathways, including proteoglycans in cancer (EGFR, SDC4, SDC2, etc.), and Ras signaling pathway (FGFR2, PLA2G1B, EGFR, etc.). The 27 feature genes, such as EPHA4, FGFR2, and EGFR for SVM classifier construction and cancer-related pathways of Ras signaling pathway and proteoglycans in cancer may play key roles in the progression and development of smoking-related lung adenocarcinoma. PMID:27994470
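
    Closeness centrality, used above to prioritize nodes in the PPI network, can be sketched for an unweighted graph with a breadth-first search. This is the standard textbook definition (number of reachable nodes divided by the sum of shortest-path distances to them), not necessarily the exact variant used in the study; the toy graph uses placeholder node names rather than real interaction data.

```python
from collections import deque


def closeness_centrality(graph, node):
    """Closeness of `node` in an unweighted graph given as an adjacency dict."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    # An isolated node reaches nothing; report zero closeness for it.
    return (len(distances) - 1) / total if total > 0 else 0.0


# Hypothetical path-shaped network: a - b - c - d.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}
closeness = {n: closeness_centrality(graph, n) for n in graph}
print(closeness)
```

    Nodes in the middle of the network score higher than peripheral ones, so ranking by this value favors genes whose products sit close to many others in the interaction network.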

  18. Gene expression of a gene family in maize based on noncollinear haplotypes

    PubMed Central

    Song, Rentao; Messing, Joachim

    2003-01-01

    Genomic regions of nearly every species diverged into different haplotypes, mostly based on point mutations, small deletions, and insertions that do not affect the collinearity of genes within a species. However, the same genomic interval containing the z1C gene cluster of two inbred lines of Zea mays significantly lost their gene collinearity and also differed in the regulation of each remaining gene set. Furthermore, when inbreds were reciprocally crossed, hybrids exhibited an unexpected shift of expression patterns so that “overdominance” instead of “dominance complementation” of allelic and nonallelic gene expression occurred. The same interval also differed in length (360 vs. 263 kb). Segmental rearrangements led to sequence changes, which were further enhanced by the insertion of different transposable elements. Changes in gene order affected not only z1C genes but also three unrelated genes. However, the orthologous interval between two subspecies of rice (not rice cultivars) was conserved in length and gene order, whereas changes between two maize inbreds were as drastic as changes between maize and sorghum. Given that chromosomes could conceivably consist of intervals of haplotypes that are highly diverged, one could envision endless breeding opportunities because of their linear arrangement along a chromosome and their expression potential in hybrid combinations (“binary” systems). The implication of such a hypothesis for heterosis is discussed. PMID:12853580

  19. Gene regulatory network clustering for graph layout based on microarray gene expression data.

    PubMed

    Kojima, Kaname; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru

    2010-01-01

    We propose a statistical model that simultaneously estimates a gene regulatory network and identifies gene modules from time series gene expression data from microarray experiments. Under the assumption that genes in the same module are densely connected, the proposed method detects gene modules based on the variational Bayesian technique. The model can also incorporate existing biological prior knowledge such as protein subcellular localization. We applied the proposed model to time series data from a synthetically generated network and verified its effectiveness. The proposed model is also applied to time series microarray data from HeLa cells. The detected gene module information greatly helps in drawing the estimated gene network.

  20. Transcriptome-based gene expression profiling identifies differentially expressed genes critical for salt stress response in radish (Raphanus sativus L.).

    PubMed

    Sun, Xiaochuan; Xu, Liang; Wang, Yan; Luo, Xiaobo; Zhu, Xianwen; Kinuthia, Karanja Benard; Nie, Shanshan; Feng, Haiyang; Li, Chao; Liu, Liwang

    2016-02-01

    Transcriptome-based gene expression analysis identifies many critical salt-responsive genes in radish and facilitates further dissecting the molecular mechanism underlying salt stress response. Salt stress severely impacts plant growth and development. Radish, a moderately salt-sensitive vegetable crop, has been studied for decades with respect to its physiological and biochemical performance under salt stress. However, no systematic study on isolation and identification of genes involved in salt stress response has been performed in radish, and the molecular mechanism governing this process is still unclear. Here, the RNA-Seq technique was applied to analyze the transcriptomic changes in radish roots treated with salt (200 mM NaCl) for 48 h in comparison with those cultured under normal conditions. In total, 8709 differentially expressed genes (DEGs) including 3931 up- and 4778 down-regulated genes were identified. Functional annotation analysis indicated that many genes could be involved in several aspects of salt stress response including stress sensing and signal transduction, osmoregulation, ion homeostasis and ROS scavenging. The association analysis of salt-responsive genes and miRNAs showed that 36 miRNA-mRNA pairs were negatively correlated in expression trends. Reverse-transcription quantitative PCR (RT-qPCR) analysis revealed that the expression profiles of DEGs were in line with results from the RNA-Seq analysis. Furthermore, a putative model of DEGs and miRNA-mediated gene regulation was proposed to elucidate how radish sensed and responded to salt stress. This study represents the first comprehensive transcriptome-based gene expression profiling under salt stress in radish. The outcomes of this study could facilitate further dissecting the molecular mechanism underlying salt stress response and provide a valuable platform for further genetic improvement of salt tolerance in radish breeding programs.

  1. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    PubMed Central

    Mehridehnavi, Alireza; Ziaei, Lia

    2013-01-01

    Background: To date, various methods have been used for dimension reduction, classification, clustering, and prediction of cancers based on gene expression profiling. The aim of this study is to extract the most significant genes and to classify Diffuse Large B-cell Lymphoma (DLBCL) patients on the basis of their gene expression profiles. Materials and Methods: We studied 40 DLBCL patients and 4026 genes. We used an Artificial Neural Network (ANN) to classify patients into two groups: Germinal center and Activated like. Because we were faced with a low number of patients (40) and numerous genes (4026), we tried to deploy one optimum network and achieve minimum error. Moreover, we used the signal-to-noise (S/N) ratio as the main tool for dimension reduction. We tried to select suitable training data and so to train just one network instead of 26 networks. Finally, we extracted the two most significant genes. Result: In this study, the two most significant genes based on their S/N ratios were selected. After selection of suitable training samples, the training and testing errors were 0 and 7%, respectively. Conclusion: We have shown that using the two most significant genes based on their S/N ratios, together with selection of suitable training samples, can classify DLBCL patients with a rather good result. With the aid of these methods we could compensate for the small number of patients, improve classification accuracy, and reduce computational complexity and thus running time. PMID:23977654
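
    The S/N-based gene selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the formula, so the sketch assumes the commonly used definition S/N = (mean_a - mean_b) / (sd_a + sd_b), and the gene names, class labels, and values are hypothetical toy data.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Assumed definition: (mean_a - mean_b) / (sd_a + sd_b) for one gene."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def top_genes(expr, labels, k=2):
    """Rank genes by |S/N| between the two classes and return the top k names.

    expr maps gene name -> list of expression values, one per patient.
    labels is a parallel list of class names (exactly two distinct values).
    """
    class_a, class_b = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = abs(signal_to_noise(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 3 genes measured in 6 patients from two groups.
labels = ["GC", "GC", "GC", "ABC", "ABC", "ABC"]
expr = {
    "g1": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # separates the groups strongly
    "g2": [2.0, 3.0, 2.5, 2.4, 2.6, 2.9],  # uninformative
    "g3": [0.5, 0.7, 0.6, 2.0, 2.3, 1.9],  # separates, opposite direction
}
selected = top_genes(expr, labels, k=2)
```

    The selected genes would then serve as the only inputs to a classifier; the neural network itself is outside the scope of this sketch.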

  2. GEDA: new knowledge base of gene expression in drug addiction.

    PubMed

    Suh, Young Ju; Yang, Moon Hee; Yoon, Suk Joon; Park, Jong Hoon

    2006-07-31

    Abuse of drugs can elicit compulsive drug seeking behaviors upon repeated administration, and ultimately leads to the phenomenon of addiction. We developed a procedure for the standardization of microarray gene expression data of rat brain in drug addiction and stored them in a single integrated database system, focusing on more effective data processing and interpretation. Another characteristic of the present database is that it has a systematic flexibility for statistical analysis and linking with other databases. Basically, we adopt an intelligent SQL querying system, as the foundation of our DB, in order to set up an interactive module which can automatically read the raw gene expression data in the standardized format. We maximize the usability of this DB, helping users study significant gene expression and identify biological function of the genes through integrated up-to-date gene information such as GO annotation and metabolic pathway. For collecting the latest information of selected gene from the database, we also set up the local BLAST search engine and nonredundant sequence database updated by NCBI server on a daily basis. We find that the present database is a useful query interface and data-mining tool, specifically for finding out the genes related to drug addiction. We apply this system to the identification and characterization of methamphetamine-induced genes' behavior in rat brain.

  3. Network-Based Inference Framework for Identifying Cancer Genes from Gene Expression Data

    PubMed Central

    Yang, Bo; Zhang, Junying; Yin, Yaling; Zhang, Yuanyuan

    2013-01-01

    Great efforts have been devoted to reducing the uncertainty of detected cancer genes, as accurate identification of oncogenes is of tremendous significance and helps unravel the biological behavior of tumors. In this paper, we present a differential network-based framework to detect biologically meaningful cancer-related genes. Firstly, a gene regulatory network construction algorithm is proposed, in which a boosting regression based on likelihood score and informative prior is employed for improving accuracy of identification. Secondly, with the algorithm, two gene regulatory networks are constructed from case and control samples independently. Thirdly, by subtracting the two networks, a differential-network model is obtained and then used to rank differentially expressed hub genes for identification of cancer biomarkers. Compared with two existing gene-based methods (t-test and lasso), the method has a significant improvement in accuracy both on synthetic datasets and two real breast cancer datasets. Furthermore, six identified genes (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) susceptible to breast cancer were verified through literature mining, GO analysis, and pathway functional enrichment analysis. Among these oncogenes, TSPYL5 and CCNE2 are already known as prognostic biomarkers in breast cancer, CD55 has been suspected of playing an important role in breast cancer prognosis from literature evidence, and the other three genes are newly discovered breast cancer biomarkers. More generally, the differential-network schema can be extended to other complex diseases for detection of disease-associated genes. PMID:24073403
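
    The "subtract the two networks, then rank hubs" step can be illustrated with a small sketch. It assumes the two networks are already built and are given as weighted undirected edge maps; the boosting-regression construction step is not reproduced, and the scoring rule (summed absolute weight change per gene) and the gene names A to D are hypothetical choices made for illustration.

```python
def edge(a, b):
    """Undirected edge key for two gene names."""
    return frozenset((a, b))


def differential_network(case_net, control_net):
    """Edge-wise difference of two weighted networks (missing edge = weight 0)."""
    edges = set(case_net) | set(control_net)
    return {e: case_net.get(e, 0.0) - control_net.get(e, 0.0) for e in edges}


def rank_hubs(diff_net):
    """Rank genes by the summed absolute weight change over their edges."""
    score = {}
    for e, delta in diff_net.items():
        for gene in e:
            score[gene] = score.get(gene, 0.0) + abs(delta)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score, key=lambda g: (-score[g], g))


# Hypothetical networks inferred separately from case and control samples.
case = {edge("A", "B"): 0.9, edge("A", "C"): 0.8, edge("B", "C"): 0.1}
control = {edge("A", "B"): 0.1, edge("B", "C"): 0.1, edge("C", "D"): 0.5}
ranking = rank_hubs(differential_network(case, control))
```

    Gene A ranks first here because both of its edges change strongly between the two conditions, while the unchanged B-C edge contributes nothing.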

  4. A biomarker-based screen of a gene expression compendium ...

    EPA Pesticide Factsheets

    Computational approaches were developed to identify factors that regulate Nrf2 in a large gene expression compendium of microarray profiles including >2000 comparisons which queried the effects of chemicals, genes, diets, and infectious agents on gene expression in the mouse liver. A gene expression biomarker of 48 genes which accurately predicted Nrf2 activation was used to identify factors which resulted in a gene expression profile with significant correlation to the biomarker. A number of novel insights were made. Chemicals that activated the xenosensor constitutive activated receptor (CAR) consistently activated Nrf2 across hundreds of profiles, possibly downstream of Cyp-induced increases in oxidative stress. Nrf2 activation was also found to be negatively regulated by the growth hormone (GH)- and androgen-regulated transcription factor STAT5b, a transcription factor suppressed by CAR. Nrf2 was activated when STAT5b was suppressed in female mice vs. male mice, after exposure to estrogens, or in genetic mutants in which GH signaling was disrupted. A subset of the mutants that show STAT5b suppression and Nrf2 activation result in increased resistance to environmental stressors and increased longevity. This study describes a novel approach for understanding the network of factors that regulate the Nrf2 pathway and highlights novel interactions between Nrf2, CAR and STAT5b transcription factors. (This abstract does not represent EPA policy.)

  6. Density based pruning for identification of differentially expressed genes from microarray data

    PubMed Central

    2010-01-01

    Motivation: Identification of differentially expressed genes from microarray datasets is one of the most important analyses for microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be improved by considering other features of differentially expressed genes. Results: We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference of gene expression and the average expression level. A density-based pruning algorithm (DB Pruning) is developed to pick out potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change. Conclusions: Density-based pruning of non-differentially expressed genes is an effective method for enhancing statistical-testing-based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune PMID:21047384
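
    The two-dimensional feature mapping and the idea of pruning the dense region can be sketched as below. This is only an illustration under stated assumptions: density is estimated by counting neighbors within a fixed radius, and the radius, the neighbor threshold, and the toy expression values are hypothetical; the published DB Pruning algorithm may differ in all of these.

```python
from math import hypot
from statistics import mean


def features(case, control):
    """Map one gene to (average difference, average expression level)."""
    return (mean(case) - mean(control), mean(case + control))


def sparse_genes(points, radius, max_neighbors):
    """Keep genes with at most max_neighbors other genes within radius.

    Genes in the dense core are pruned as likely non-differentially
    expressed; genes in sparse regions are kept as candidates.
    """
    kept = []
    for name, (x, y) in points.items():
        neighbors = sum(
            1
            for other, (ox, oy) in points.items()
            if other != name and hypot(x - ox, y - oy) <= radius
        )
        if neighbors <= max_neighbors:
            kept.append(name)
    return kept


# Hypothetical toy data: (case replicates, control replicates) per gene.
raw = {
    "g1": ([5.0, 5.5], [5.0, 5.5]),
    "g2": ([5.0, 5.0], [5.5, 5.0]),
    "g3": ([5.5, 5.5], [5.0, 5.5]),
    "g4": ([5.0, 5.5], [5.5, 5.5]),
    "g5": ([9.0, 9.0], [5.0, 5.0]),  # large difference, far from the dense core
}
points = {gene: features(case, control) for gene, (case, control) in raw.items()}
candidates = sparse_genes(points, radius=1.0, max_neighbors=0)
```

    In the workflow the abstract describes, a statistical test such as the t-test would then be applied only to the surviving candidates.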

  7. Modeling Gene Networks in Saccharomyces cerevisiae Based on Gene Expression Profiles.

    PubMed

    Zhang, Yulin; Lv, Kebo; Wang, Shudong; Su, Jionglong; Meng, Dazhi

    2015-01-01

    Detailed and innovative analysis of gene regulatory network structures may reveal novel insights into biological mechanisms. Here we study how the gene regulatory network in Saccharomyces cerevisiae can differ under aerobic and anaerobic conditions. To achieve this, we discretized the gene expression profiles and calculated the self-entropy of down- and upregulation of gene expression as well as the joint entropy. Based on these quantities, the uncertainty coefficient was calculated for each gene triplet, following which separate gene logic networks were constructed for the aerobic and anaerobic conditions. Four structural parameters (average degree, average clustering coefficient, average shortest path, and average betweenness) were used to compare the structure of the corresponding aerobic and anaerobic logic networks. Five genes were identified as putative key components of the two energy metabolisms. Furthermore, community analysis using the Newman fast algorithm revealed two significant communities for the aerobic but only one for the anaerobic network. DAVID Gene Functional Classification suggests that, under aerobic conditions, one such community reflects the cell cycle and cell replication, while the other one is linked to the mitochondrial respiratory chain function.
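
    The entropy quantities mentioned above can be made concrete with a short sketch. It is a pairwise simplification (the abstract computes the coefficient for gene triplets), it assumes a simple threshold discretization into two states, and it uses the standard definition U(X|Y) = (H(X) + H(Y) - H(X,Y)) / H(X); the profiles and the threshold are hypothetical.

```python
from collections import Counter
from math import log2


def discretize(profile, threshold):
    """Two-state discretization: 1 = above threshold, 0 = otherwise."""
    return [1 if v > threshold else 0 for v in profile]


def entropy(states):
    """Shannon entropy in bits of a sequence of discrete states."""
    n = len(states)
    return -sum((c / n) * log2(c / n) for c in Counter(states).values())


def joint_entropy(xs, ys):
    """Entropy of the paired states of two genes."""
    return entropy(list(zip(xs, ys)))


def uncertainty_coefficient(xs, ys):
    """Fraction of the entropy of X that is explained by knowing Y."""
    hx = entropy(xs)
    if hx == 0:
        return 0.0  # X is constant, so there is nothing to explain
    return (hx + entropy(ys) - joint_entropy(xs, ys)) / hx


# Hypothetical expression profiles over four conditions.
gene_a = discretize([0.1, 0.9, 0.2, 0.8], threshold=0.5)  # 0, 1, 0, 1
gene_b = discretize([0.7, 0.3, 0.9, 0.1], threshold=0.5)  # 1, 0, 1, 0
gene_c = discretize([0.9, 0.9, 0.1, 0.1], threshold=0.5)  # 1, 1, 0, 0

u_ab = uncertainty_coefficient(gene_a, gene_b)  # b fully determines a
u_ac = uncertainty_coefficient(gene_a, gene_c)  # c says nothing about a
```

    A coefficient near 1 suggests a candidate logic relation between the genes, and a coefficient near 0 suggests independence.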

  8. Study of human dopamine sulfotransferases based on gene expression programming.

    PubMed

    Si, Hongzong; Zhao, Jiangang; Cui, Lianhua; Lian, Ning; Feng, Hanlin; Duan, Yun-Bo; Hu, Zhide

    2011-09-01

    A quantitative model is developed to predict the Km of 47 human dopamine sulfotransferases by gene expression programming. Each kind of compound is represented by several calculated structural descriptors of moment of inertia A, average electrophilic reactivity index for a C atom, relative number of triple bonds, RNCG relative negative charge, HA-dependent HDSA-1, and HBCA H-bonding charged surface area. Eight fitness functions of the gene expression programming method are used to find the best nonlinear model. For the best quantitative model, the squared standard error and the squared correlation coefficient are 0.096 and 0.91 for the training data set, and 0.102 and 0.88 for the test set, respectively. It is shown that the gene expression programming-predicted results with fitness function are in good agreement with experimental ones. © 2011 John Wiley & Sons A/S.

  9. Integrated pathway-based transcription regulation network mining and visualization based on gene expression profiles.

    PubMed

    Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko

    2016-06-01

    Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypothesis generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostate cancer.

  10. Synthetic RNA-based switches for mammalian gene expression control.

    PubMed

    Ausländer, Simon; Fussenegger, Martin

    2017-04-04

    Synthetic ribonucleic acid (RNA)-based gene switches control RNA functions in a ligand-responsive manner. Key building blocks are aptamers that specifically bind to small molecules or protein ligands. Engineering approaches often combine rational design and high-throughput screening to identify optimal connection sites or sequences. In this report, we discuss basic principles and emerging design strategies for the engineering of RNA-based gene switches in mammalian cells. Their small size compared with those of transcriptional gene switches, together with advancements in design strategies and performance, may bring RNA-based switches to the forefront of biomedical and biotechnological applications.

  11. Noise-based switches and amplifiers for gene expression

    PubMed Central

    Hasty, Jeff; Pradines, Joel; Dolnik, Milos; Collins, J. J.

    2000-01-01

    The regulation of cellular function is often controlled at the level of gene transcription. Such genetic regulation usually consists of interacting networks, whereby gene products from a single network can act to control their own expression or the production of protein in another network. Engineered control of cellular function through the design and manipulation of such networks lies within the constraints of current technology. Here we develop a model describing the regulation of gene expression and elucidate the effects of noise on the formulation. We consider a single network derived from bacteriophage λ and construct a two-parameter deterministic model describing the temporal evolution of the concentration of λ repressor protein. Bistability in the steady-state protein concentration arises naturally, and we show how the bistable regime is enhanced with the addition of the first operator site in the promoter region. We then show how additive and multiplicative external noise can be used to regulate expression. In the additive case, we demonstrate the utility of such control through the construction of a protein switch, whereby protein production is turned “on” and “off” by using short noise pulses. In the multiplicative case, we show that small deviations in the transcription rate can lead to large fluctuations in the production of protein, and we describe how these fluctuations can be used to amplify protein production significantly. These results suggest that an external noise source could be used as a switch and/or amplifier for gene expression. Such a development could have important implications for gene therapy. PMID:10681449
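
    The switching behavior of a bistable system under a short additive perturbation can be illustrated with a toy model. This sketch does not reproduce the λ repressor model of the paper: it assumes a generic double-well equation dx/dt = x - x^3 + u(t) with stable states at -1 ("off") and +1 ("on"), and it replaces the noise pulse with a deterministic rectangular pulse so that the outcome is reproducible. All names and parameter values are hypothetical.

```python
def simulate(x0, pulse_height=0.0, pulse_end=0.0, t_end=20.0, dt=0.01):
    """Euler integration of dx/dt = x - x**3 + u(t).

    u(t) equals pulse_height while t < pulse_end and 0 afterwards.
    Returns the state at time t_end.
    """
    x = x0
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        u = pulse_height if t < pulse_end else 0.0
        x += dt * (x - x**3 + u)
    return x


# Without a pulse the system stays in the "off" state.
resting = simulate(-1.0)

# A short additive pulse pushes the state over the barrier at x = 0,
# after which it settles in the "on" state and stays there.
switched = simulate(-1.0, pulse_height=1.0, pulse_end=3.0)
```

    The point of the toy model is that the perturbation only needs to last long enough to cross the unstable point; the system then remains in the new state without further input.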

  12. Investigation of candidate genes for osteoarthritis based on gene expression profiles.

    PubMed

    Dong, Shuanghai; Xia, Tian; Wang, Lei; Zhao, Qinghua; Tian, Jiwei

    2016-12-01

    To explore the mechanism of osteoarthritis (OA) and provide valid biological information for further investigation. Gene expression profile of GSE46750 was downloaded from Gene Expression Omnibus database. The Linear Models for Microarray Data (limma) package (Bioconductor project, http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify differentially expressed genes (DEGs) in inflamed OA samples. Gene Ontology function enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of DEGs were performed based on Database for Annotation, Visualization and Integrated Discovery data, and protein-protein interaction (PPI) network was constructed based on the Search Tool for the Retrieval of Interacting Genes/Proteins database. Regulatory network was screened based on Encyclopedia of DNA Elements. Molecular Complex Detection was used for sub-network screening. Two sub-networks with highest node degree were integrated with transcriptional regulatory network and KEGG functional enrichment analysis was processed for 2 modules. In total, 401 up- and 196 down-regulated DEGs were obtained. Up-regulated DEGs were involved in inflammatory response, while down-regulated DEGs were involved in cell cycle. PPI network with 2392 protein interactions was constructed. Moreover, 10 genes including Interleukin 6 (IL6) and Aurora B kinase (AURKB) were found to be outstanding in PPI network. There are 214 up- and 8 down-regulated transcription factor (TF)-target pairs in the TF regulatory network. Module 1 had TFs including SPI1, PRDM1, and FOS, while module 2 contained FOSL1. The nodes in module 1 were enriched in chemokine signaling pathway, while the nodes in module 2 were mainly enriched in cell cycle. The screened DEGs including IL6, AGT, and AURKB might be potential biomarkers for gene therapy for OA by being regulated by TFs such as FOS and SPI1, and participating in the cell cycle and cytokine-cytokine receptor

  13. A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes

    PubMed Central

    Seo, Minseok; Shin, Su-kyung; Kwon, Eun-Young; Kim, Sung-Eun; Bae, Yun-Jung; Lee, Seungyeoun; Sung, Mi-Kyung; Choi, Myung-Sook; Park, Taesung

    2016-01-01

    Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naïve approach that detects DEGs and PAGs separately and then identifies the genes in the intersection of the lists of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs, or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method simultaneously identifies genes that are both DEGs and PAGs. This method uses standard regression models but adopts a different null hypothesis from ordinary regression models, which allows us to perform joint identification in one step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes, which are both DEGs and PAGs, than other methods. Simulation studies showed that they have more power than other methods. Through analysis of
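
    The two baseline strategies described above, the naïve intersection and the hierarchical selection, can be contrasted in a few lines. This sketch does not implement the proposed model-based method. It assumes per-gene p-values are already available and, as an illustrative choice not stated in the abstract, applies a Bonferroni correction at each testing stage; the gene names and p-values are hypothetical.

```python
def significant(pvals, alpha):
    """Bonferroni: keep genes with p < alpha / (number of tests)."""
    cutoff = alpha / len(pvals)
    return {gene for gene, p in pvals.items() if p < cutoff}


def naive_joint(p_deg, p_pag, alpha=0.05):
    """Test DEGs and PAGs separately over all genes, then intersect."""
    return significant(p_deg, alpha) & significant(p_pag, alpha)


def hierarchical_joint(p_deg, p_pag, alpha=0.05):
    """Select DEGs first, then test for phenotype association among DEGs only."""
    degs = significant(p_deg, alpha)
    if not degs:
        return set()
    return significant({gene: p_pag[gene] for gene in degs}, alpha)


# Hypothetical p-values for five genes.
p_deg = {"g1": 0.001, "g2": 0.002, "g3": 0.5, "g4": 0.6, "g5": 0.7}
p_pag = {"g1": 0.001, "g2": 0.02, "g3": 0.001, "g4": 0.5, "g5": 0.9}

naive = naive_joint(p_deg, p_pag)
hierarchical = hierarchical_joint(p_deg, p_pag)
```

    In this toy case the hierarchical approach recovers g2 because the second stage corrects for two tests instead of five, which shows how the two-step procedures can disagree on the same data.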

  14. Automated target preparation for microarray-based gene expression analysis.

    PubMed

    Raymond, Frédéric; Metairon, Sylviane; Borner, Roland; Hofmann, Markus; Kussmann, Martin

    2006-09-15

    DNA microarrays have rapidly evolved toward a platform for massively parallel gene expression analysis. Despite its widespread use, the technology has been criticized as vulnerable to technical variability. Addressing this issue, recent comparative, interplatform, and interlaboratory studies have revealed that, given defined procedures for "wet lab" experiments and data processing, a satisfactory reproducibility and little experimental variability can be achieved. In view of these advances in standardization, the requirement for uniform sample preparation becomes evident, especially if a microarray platform is used as a facility, i.e., by different users working in the laboratory. While one option to reduce technical variability is to dedicate one laboratory technician to all microarray studies, we have decided to automate the entire RNA sample preparation implementing a liquid handling system coupled to a thermocycler and a microtiter plate reader. Indeed, automated RNA sample preparation prior to chip analysis enables (1) the reduction of experimentally caused result variability, (2) the separation of (important) biological variability from (undesired) experimental variation, and (3) interstudy comparison of gene expression results. Our robotic platform can process up to 24 samples in parallel, using an automated sample preparation method that produces high-quality biotin-labeled cRNA ready to be hybridized on Affymetrix GeneChips. The results show that the technical interexperiment variation is less pronounced than with manually prepared samples. Moreover, experiments using the same starting material showed that the automated process yields a good reproducibility between samples.

  15. Combining gene annotations and gene expression data in model-based clustering: weighted method.

    PubMed

    Huang, Desheng; Wei, Peng; Pan, Wei

    2006-01-01

    It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to the standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: various gene functional groups form the strata in a stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative to clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights between the clustering results of the stratified analysis and that of the global analysis; the weight is determined by data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitate choosing appropriate gene functional groups as priors. We use simulated data and real data to demonstrate the feasibility and advantages of the proposed method.

  16. Optimization Based Tumor Classification from Microarray Gene Expression Data

    PubMed Central

    Dagliyan, Onur; Uney-Yuksektepe, Fadime; Kavakli, I. Halil; Turkay, Metin

    2011-01-01

    Background: An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up or down regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up or down regulated genes that can be clinically utilized to build assays for the diagnosis or to follow progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization-based method, which is a user-friendly and efficient classifier, may allow clinicians to diagnose and follow progression of certain cancer types. Methodology/Principal Findings: We apply the HBE algorithm to some well-known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT) to find some predictor genes that can be utilized for diagnosis and prognosis in a robust manner with a high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for the gene selection. The results are compared with those from other studies and biological roles of selected genes in corresponding cancer type are described. Conclusions/Significance: The performance of our algorithm overall was better than the other algorithms reported in the literature and classifiers found in the WEKA data-mining package. Since it does not require a parameter optimization and it consistently achieves a very high prediction rate on different type of

  17. Expression profile based gene clusters for ischemic stroke detection.

    PubMed

    Adamski, Mateusz G; Li, Yan; Wagner, Erin; Yu, Hua; Seales-Bailey, Chloe; Soper, Steven A; Murphy, Michael; Baird, Alison E

    2014-09-01

    In microarray studies, alterations in gene expression in circulating leukocytes have shown utility for ischemic stroke diagnosis. We studied forty candidate markers identified in three gene expression profiles to (1) quantitate individual transcript expression, (2) identify transcript clusters and (3) assess the clinical diagnostic utility of the clusters identified for ischemic stroke detection. Using high-throughput next-generation qPCR, 16 of the 40 transcripts were significantly up-regulated in stroke patients relative to control subjects (p<0.05). Six clusters of between 5 and 7 transcripts were identified that discriminated between stroke and control (p values between 1.01e-9 and 0.03). A 7-transcript cluster containing PLBD1, PYGL, BST1, DUSP1, FOS, VCAN and FCGR1A showed high accuracy for stroke classification (AUC=0.854). These results validate and improve upon the diagnostic value of transcripts identified in microarray studies for ischemic stroke. The clusters identified show promise for acute ischemic stroke detection. Copyright © 2014 Elsevier Inc. All rights reserved.
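
    For readers unfamiliar with the AUC figure quoted above, the sketch below shows one standard way to compute it: as the probability that a randomly chosen stroke sample receives a higher cluster score than a randomly chosen control, with ties counted as one half. The scores are hypothetical and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical cluster scores for four stroke patients and four controls.
stroke = [0.9, 0.8, 0.6, 0.4]
control = [0.7, 0.5, 0.3, 0.2]
value = auc(stroke, control)  # 13 of the 16 pairs are ranked correctly
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.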

  18. Persistent gene expression in mouse nasal epithelia following feline immunodeficiency virus-based vector gene transfer.

    PubMed

    Sinn, Patrick L; Burnight, Erin R; Hickey, Melissa A; Blissard, Gary W; McCray, Paul B

    2005-10-01

    Gene transfer development for treatment or prevention of cystic fibrosis lung disease has been limited by the inability of vectors to efficiently and persistently transduce airway epithelia. Influenza A is an enveloped virus with natural lung tropism; however, pseudotyping feline immunodeficiency virus (FIV)-based lentiviral vector with the hemagglutinin envelope protein proved unsuccessful. Conversely, pseudotyping FIV with the envelope protein from influenza D (Thogoto virus GP75) resulted in titers of 10^6 transducing units (TU)/ml and conferred apical entry into well-differentiated human airway epithelial cells. Baculovirus GP64 envelope glycoproteins share sequence identity with influenza D GP75 envelope glycoproteins. Pseudotyping FIV with GP64 from three species of baculovirus resulted in titers of 10^7 to 10^9 TU/ml. Of note, GP64 from Autographa californica multicapsid nucleopolyhedrovirus resulted in high-titer FIV preparations (approximately 10^9 TU/ml) and conferred apical entry into polarized primary cultures of human airway epithelia. Using a luciferase reporter gene and bioluminescence imaging, we observed persistent gene expression from in vivo gene transfer in the mouse nose with A. californica GP64-pseudotyped FIV (AcGP64-FIV). Longitudinal bioluminescence analysis documented persistent expression in nasal epithelia for approximately 1 year without significant decline. According to histological analysis using a LacZ reporter gene, olfactory and respiratory epithelial cells were transduced. In addition, methylcellulose-formulated AcGP64-FIV transduced mouse nasal epithelia with much greater efficiency than similarly formulated vesicular stomatitis virus glycoprotein-pseudotyped FIV. These data suggest that AcGP64-FIV efficiently transduces and persistently expresses a transgene in nasal epithelia in the absence of agents that disrupt the cellular tight junction integrity.

  19. Gene Expression-Based Biomarkers for Anopheles gambiae Age Grading

    PubMed Central

    Wang, Mei-Hui; Marinotti, Osvaldo; Zhong, Daibin; James, Anthony A.; Walker, Edward; Guda, Tom; Kweka, Eliningaya J.; Githure, John; Yan, Guiyun

    2013-01-01

    Information on population age structure of mosquitoes under natural conditions is fundamental to the understanding of vectorial capacity and crucial for assessing the impact of vector control measures on malaria transmission. Transcriptional profiling has been proposed as a method for predicting mosquito age for Aedes and Anopheles mosquitoes; however, whether this new method is adequate for natural conditions is unknown. This study tests the applicability of transcriptional profiling for age-grading of Anopheles gambiae, the most important malaria vector in Africa. The transcript abundance of two An. gambiae genes, AGAP009551 and AGAP011615, was measured during aging under laboratory and field conditions in three mosquito strains. Age-dependent monotonic changes in transcript levels were observed in all strains evaluated. These genes were validated as age-grading biomarkers using the mark, release and recapture (MRR) method. The MRR method showed a good correspondence between actual and predicted age, and thus demonstrated the value of age classifications derived from the transcriptional profiling of these two genes. The technique was used to establish the age structure of mosquito populations from two malaria-endemic areas in western Kenya. The population age structure determined by the transcriptional profiling method was consistent with that based on mosquito parity. This study demonstrates that the transcription profiling method based on two genes is valuable for age determination of natural mosquitoes, providing a new approach for determining a key life history trait of malaria vectors. PMID:23936017

  20. Detecting Essential Proteins Based on Network Topology, Gene Expression Data and Gene Ontology Information.

    PubMed

    Zhang, Wei; Xu, Jia; Li, Yuanyuan; Zou, Xiufen

    2016-10-07

    The identification of essential proteins in protein-protein interaction (PPI) networks is of great significance for understanding cellular processes. With the increasing availability of large-scale PPI data, numerous centrality measures based on network topology have been proposed to detect essential proteins from PPI networks. However, most of the current approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology annotation information. In this paper, we propose a novel centrality measure, called TEO, for identifying essential proteins by combining network topology, gene expression profiles and GO information. To evaluate the performance of the TEO method, we compare it with five other methods (degree, betweenness, NC, Pec, CowEWC) in detecting essential proteins from two different yeast PPI datasets. The simulation results show that adding GO information can effectively improve the predicted precision and that our method outperforms the others in predicting essential proteins.

  1. Single base resolution analysis of 5-hydroxymethylcytosine in 188 human genes: implications for hepatic gene expression.

    PubMed

    Ivanov, Maxim; Kals, Mart; Lauschke, Volker; Barragan, Isabel; Ewels, Philip; Käller, Max; Axelsson, Tomas; Lehtiö, Janne; Milani, Lili; Ingelman-Sundberg, Magnus

    2016-08-19

    To improve the epigenomic analysis of tissues rich in 5-hydroxymethylcytosine (hmC), we developed a novel protocol called TAB-Methyl-SEQ, which allows for single base resolution profiling of both hmC and 5-methylcytosine by targeted next-generation sequencing. TAB-Methyl-SEQ data were extensively validated by a set of five methodologically different protocols. Importantly, these extensive cross-comparisons revealed that protocols based on Tet1-assisted bisulfite conversion provided more precise hmC values than TrueMethyl-based methods. A total of 109 454 CpG sites were analyzed by TAB-Methyl-SEQ for mC and hmC in 188 genes from 20 different adult human livers. We describe three types of variability of hepatic hmC profiles: (i) sample-specific variability at 40.8% of CpG sites analyzed, where the local hmC values correlate to the global hmC content of livers (measured by LC-MS), (ii) gene-specific variability, where hmC levels in the coding regions positively correlate to expression of the respective gene and (iii) site-specific variability, where prominent hmC peaks span only 1 to 3 neighboring CpG sites. Our data suggest that both the gene- and site-specific components of hmC variability might contribute to the epigenetic control of hepatic genes. The protocol described here should be useful for targeted DNA analysis in a variety of applications. © The Author 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Design-Based Learning for Biology: Genetic Engineering Experience Improves Understanding of Gene Expression

    ERIC Educational Resources Information Center

    Ellefson, Michelle R.; Brinker, Rebecca A.; Vernacchio, Vincent J.; Schunn, Christian D.

    2008-01-01

    Gene expression is a difficult topic for students to learn and comprehend, at least partially because it involves various biochemical structures and processes occurring at the microscopic level. Designer Bacteria, a design-based learning (DBL) unit for high-school students, applies principles of DBL to the teaching of gene expression. Throughout…

  4. Screening key genes associated with congenital heart defects in Down syndrome based on differential expression network.

    PubMed

    Yu, Shan; Yi, Huani; Wang, Zhimin; Dong, Juan

    2015-01-01

    Down syndrome (DS) is the most common viable chromosomal disorder, with intellectual impairment and several other developmental abnormalities. Forty to fifty percent of newborns with DS have some form of congenital heart defect (CHD). The genome of CHD in DS has already been obtained, but the underlying genomic or gene expression variation that contributes to the manifestation of CHD in DS is still unknown. This study aimed to identify key genes in patients with CHD in DS. A differential expression network (DEN) approach was employed to analyze the dysregulated genes and pathways. First, the differentially expressed genes (DEGs) between CHD in DS and normal subjects were screened based on the microarray expression data. Next, the differential interactions were identified using Spearman correlation coefficients of edges in different conditions. The DEN was then constructed by combining both DEGs and differential interactions, and hub genes were obtained by degree centrality analysis of the DEN. Meanwhile, disease genes included in the DEN were also ascertained. When gene expression values in the different conditions were analyzed, no DEGs were identified. However, a total of 984 gene pairs with significant differential expression were identified. Consequently, the DEN was constructed using only differential edges. In this network, eight hub genes were identified, of which four (UBC, APP, HUWE1 and SRC) were both hub genes and disease genes. The DEN approach should be regarded as a useful complement to traditional differential gene methods. We provide several potential biomarkers for CHD in DS.
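
The differential-edge idea in the abstract above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that an edge counts as "differential" when the absolute difference between its Spearman correlations in the two conditions reaches a cutoff; the cutoff `min_delta`, the gene labels A to D and all expression values are hypothetical toy data.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def differential_edges(case, control, min_delta):
    """Gene pairs whose correlation changes by at least min_delta (assumed rule)."""
    edges = []
    for g1, g2 in combinations(sorted(case), 2):
        delta = abs(spearman(case[g1], case[g2]) - spearman(control[g1], control[g2]))
        if delta >= min_delta:
            edges.append((g1, g2))
    return edges


def degree(edges):
    """Degree centrality as a plain edge count per gene."""
    counts = {}
    for g1, g2 in edges:
        counts[g1] = counts.get(g1, 0) + 1
        counts[g2] = counts.get(g2, 0) + 1
    return counts


# Hypothetical toy data: gene -> expression values across five samples.
case = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 4, 3, 2, 1], "D": [1, 3, 2, 5, 4]}
control = {"A": [1, 2, 3, 4, 5], "B": [10, 8, 6, 4, 2], "C": [1, 2, 3, 4, 5], "D": [1, 3, 2, 5, 4]}

edges = differential_edges(case, control, min_delta=1.8)
hub_counts = degree(edges)
print(edges, hub_counts)
```

In this toy example only the pairs involving gene A flip the sign of their correlation completely, so A ends up with the highest degree; a real analysis would also need a significance test for each edge.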

  5. Evaluation and validation of reference genes for normalization of quantitative real-time PCR based gene expression studies in peanut.

    PubMed

    Reddy, Dumbala Srinivas; Bhatnagar-Mathur, Pooja; Cindhuri, Katamreddy Sri; Sharma, Kiran K

    2013-01-01

    Quantitative real-time PCR (qPCR) based techniques have become essential for gene expression studies and high-throughput molecular characterization of transgenic events. Normalizing to a reference gene in relative quantification makes qPCR results more reliable than absolute quantification, but requires robust reference genes. Since an ideal reference gene should be species specific, no single internal control gene is universal for use as a reference gene across various plant developmental stages and diverse growth conditions. Here, we present validation studies of multiple stably expressed reference genes in cultivated peanut with minimal variations in temporal and spatial expression when subjected to various biotic and abiotic stresses. Stability in the expression of eight candidate reference genes including ADH3, ACT11, ATPsyn, CYP2, ELF1B, G6PD, LEC and UBC1 was compared in diverse peanut plant samples. The samples were categorized into distinct experimental sets to check the suitability of candidate genes for accurate and reliable normalization of gene expression using qPCR. Stability in expression of the reference genes in eight sets of samples was determined by the geNorm and NormFinder methods. While three candidate reference genes including ADH3, G6PD and ELF1B were identified to be stably expressed across experiments, LEC was observed to be the least stable, and hence must be avoided for gene expression studies in peanut. Inclusion of the former two genes gave sufficiently reliable results; nonetheless, the addition of the third reference gene ELF1B may be potentially better in a diverse set of tissue samples of peanut.
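
As a rough illustration of the geNorm stability measure mentioned above: the M value of a candidate gene is the mean, over all other candidates, of the standard deviation of their log2 expression ratios across samples (lower M means more stable). The sketch below assumes 100% amplification efficiency, so that a difference in Cq values stands in for a log2 ratio; the gene labels and Cq values are hypothetical and are not data from the study.

```python
from itertools import combinations
from statistics import stdev


def genorm_m(cq):
    """Return gene -> M value from a dict of gene -> Cq values (same sample order).

    Assumes 100% PCR efficiency, so Cq differences equal log2 expression ratios.
    """
    genes = sorted(cq)
    pair_sd = {}
    for g1, g2 in combinations(genes, 2):
        log_ratios = [a - b for a, b in zip(cq[g1], cq[g2])]
        pair_sd[(g1, g2)] = stdev(log_ratios)
    m_values = {}
    for gene in genes:
        sds = [sd for pair, sd in pair_sd.items() if gene in pair]
        m_values[gene] = sum(sds) / len(sds)
    return m_values


# Hypothetical Cq values for three candidate reference genes in four samples.
cq_values = {
    "refA": [20.0, 20.5, 21.0, 20.2],
    "refB": [22.0, 22.5, 23.0, 22.2],  # tracks refA exactly
    "refC": [25.0, 23.0, 27.0, 24.0],  # varies independently
}

m_values = genorm_m(cq_values)
least_stable = max(m_values, key=m_values.get)
print(m_values, least_stable)
```

The full geNorm procedure additionally removes the least stable gene and recomputes M stepwise; that loop is omitted here for brevity.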

  6. Evaluation and Validation of Reference Genes for Normalization of Quantitative Real-Time PCR Based Gene Expression Studies in Peanut

    PubMed Central

    Cindhuri, Katamreddy Sri; Sharma, Kiran K.

    2013-01-01

    Quantitative real-time PCR (qPCR) based techniques have become essential for gene expression studies and high-throughput molecular characterization of transgenic events. Normalizing to a reference gene in relative quantification makes qPCR results more reliable than absolute quantification, but requires robust reference genes. Since an ideal reference gene should be species specific, no single internal control gene is universal for use as a reference gene across various plant developmental stages and diverse growth conditions. Here, we present validation studies of multiple stably expressed reference genes in cultivated peanut with minimal variations in temporal and spatial expression when subjected to various biotic and abiotic stresses. Stability in the expression of eight candidate reference genes including ADH3, ACT11, ATPsyn, CYP2, ELF1B, G6PD, LEC and UBC1 was compared in diverse peanut plant samples. The samples were categorized into distinct experimental sets to check the suitability of candidate genes for accurate and reliable normalization of gene expression using qPCR. Stability in expression of the reference genes in eight sets of samples was determined by the geNorm and NormFinder methods. While three candidate reference genes including ADH3, G6PD and ELF1B were identified to be stably expressed across experiments, LEC was observed to be the least stable, and hence must be avoided for gene expression studies in peanut. Inclusion of the former two genes gave sufficiently reliable results; nonetheless, the addition of the third reference gene ELF1B may be potentially better in a diverse set of tissue samples of peanut. PMID:24167633

  7. Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition

    PubMed Central

    2013-01-01

    Background: DNA microarrays are used to discover genes expressed differentially between various biological conditions. In microarray experiments the number of analyzed samples is often much lower than the number of genes (probe sets), which leads to many false discoveries. Multiple testing correction methods control the number of false discoveries but decrease the sensitivity of discovering differentially expressed genes. To address this problem, filtering methods for improving the power of detection of differentially expressed genes were proposed in earlier papers. These techniques are two-step procedures: in the first step a pool of non-informative genes is removed, and in the second step only the pool of retained genes is searched for differentially expressed genes. Results: A very important parameter to choose is the proportion between the sizes of the pools of removed and retained genes. The new method we propose determines close-to-optimal threshold values for sample means and sample variances for gene filtering. The method is adaptive and based on the decomposition of the histogram of gene expression means or variances into a mixture of Gaussian components. Conclusions: By analyzing several publicly available datasets and simulated datasets, we demonstrate that our adaptive method increases the sensitivity of finding differentially expressed genes compared to previous methods of filtering microarray data based on fixed threshold values. PMID:23510016
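
A minimal sketch of mixture-based filtering is given below. It is not the authors' procedure: it assumes exactly two Gaussian components fitted to per-gene mean expression by plain expectation-maximization, and it keeps a gene when the higher-mean component is the more probable one. The simulated gene means, the number of components and the keep rule are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    comps = []
    # Initialise one component from each half of the sorted data.
    for part in (ordered[: n // 2], ordered[n // 2:]):
        mu = sum(part) / len(part)
        var = sum((x - mu) ** 2 for x in part) / len(part)
        comps.append((0.5, mu, max(math.sqrt(var), 1e-3)))
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, mu, sd) for w, mu, sd in comps]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component.
        new_comps = []
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            new_comps.append((nk / n, mu, max(math.sqrt(var), 1e-3)))
        comps = new_comps
    return sorted(comps, key=lambda c: c[1])


def keep_high_component(xs, comps):
    """Indices of values for which the higher-mean component is more probable."""
    low, high = comps
    kept = []
    for i, x in enumerate(xs):
        p_low = low[0] * normal_pdf(x, low[1], low[2])
        p_high = high[0] * normal_pdf(x, high[1], high[2])
        if p_high > p_low:
            kept.append(i)
    return kept


# Simulated per-gene mean expression: 300 low "background" genes, 100 expressed genes.
random.seed(0)
gene_means = [random.gauss(3.0, 0.5) for _ in range(300)]
gene_means += [random.gauss(9.0, 1.0) for _ in range(100)]

components = fit_two_gaussians(gene_means)
kept = keep_high_component(gene_means, components)
print(components, len(kept))
```

The appeal of this kind of filter is that the cutoff follows from the fitted components rather than from a fixed, hand-picked threshold; choosing the number of components is a separate model-selection question not addressed in this sketch.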

  8. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface

    PubMed Central

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-01-01

    Background: Bioinformatics often leverages recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results: We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database, which consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion: dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble those of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156

  9. Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models.

    PubMed

    Hirose, Osamu; Yoshida, Ryo; Imoto, Seiya; Yamaguchi, Rui; Higuchi, Tomoyuki; Charnock-Jones, D Stephen; Print, Cristin; Miyano, Satoru

    2008-04-01

    Statistical inference of gene networks by using time-course microarray gene expression profiles is an essential step towards understanding the temporal structure of gene regulatory mechanisms. Unfortunately, most of the current studies have been limited to analysing a small number of genes because the length of time-course gene expression profiles is fairly short. One promising approach to overcome such a limitation is to infer gene networks by exploring the potential transcriptional modules which are sets of genes sharing a common function or involved in the same pathway. In this article, we present a novel approach based on the state space model to identify the transcriptional modules and module-based gene networks simultaneously. The state space model has the potential to infer large-scale gene networks, e.g. of order 10(3), from time-course gene expression profiles. Particularly, we succeeded in the identification of a cell cycle system by using the gene expression profiles of Saccharomyces cerevisiae in which the length of the time-course and number of genes were 24 and 4382, respectively. However, when analysing shorter time-course data, e.g. of length 10 or less, the parameter estimations of the state space model often fail due to overfitting. To extend the applicability of the state space model, we provide an approach to use the technical replicates of gene expression profiles, which are often measured in duplicate or triplicate. The use of technical replicates is important for achieving highly-efficient inferences of gene networks with short time-course data. The potential of the proposed method has been demonstrated through the time-course analysis of the gene expression profiles of human umbilical vein endothelial cells (HUVECs) undergoing growth factor deprivation-induced apoptosis. Supplementary Information and the software (TRANS-MNET) are available at http://daweb.ism.ac.jp/~yoshidar/software/ssm/.

  10. Gene expression-based biomarkers for discriminating early and late stage of clear cell renal cancer

    PubMed Central

    Bhalla, Sherry; Chaudhary, Kumardeep; Kumar, Ritesh; Sehgal, Manika; Kaur, Harpreet; Sharma, Suresh; Raghava, Gajendra P. S.

    2017-01-01

    In this study, an attempt has been made to identify expression-based gene biomarkers that can discriminate between early- and late-stage clear cell renal cell carcinoma (ccRCC) patients. We have analyzed the gene expression of 523 samples to identify genes that are differentially expressed in the early and late stages of ccRCC. First, a threshold-based method has been developed, which attained a maximum accuracy of 71.12% with ROC 0.67 using the single gene NR3C2. To improve the performance of the threshold-based method, we combined two or more genes and achieved a maximum accuracy of 70.19% with ROC of 0.74 using eight genes on the validation dataset. These eight genes include four underexpressed (NR3C2, ENAM, DNASE1L3, FRMPD2) and four overexpressed (PLEKHA9, MAP6D1, SMPD4, C11orf73) genes in the late stage of ccRCC. Second, models were developed using state-of-the-art techniques and achieved a maximum accuracy of 72.64% and 0.81 ROC using 64 genes on the validation dataset. Similar accuracy was obtained with 38 genes selected from a subset of genes involved in cancer hallmark biological processes. Our analysis further implied a need to develop gender-specific models for stage classification. A web server, CancerCSP, has been developed to predict the stage of ccRCC using gene expression data derived from RNAseq experiments. PMID:28349958
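
A single-gene threshold classifier of the kind described above can be sketched as follows. The rule assumed here (call a sample late stage when the gene's expression falls below a cutoff, and pick the cutoff that maximizes accuracy) is an illustration for a gene that is underexpressed in late stage; the expression values and labels are hypothetical, not data from the study.

```python
def best_threshold(values, labels):
    """Return (accuracy, threshold) for the rule: predict late stage when value < threshold.

    values: expression of one gene per sample; labels: 1 = late stage, 0 = early stage.
    Candidate thresholds are midpoints between consecutive distinct values,
    plus one cutoff below the minimum and one above the maximum.
    """
    distinct = sorted(set(values))
    cuts = [distinct[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts.append(distinct[-1] + 1.0)
    best = (0.0, cuts[0])
    for cut in cuts:
        correct = sum((v < cut) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best[0]:
            best = (accuracy, cut)
    return best


# Hypothetical expression values for one underexpressed gene in eight samples.
expression = [8.1, 7.9, 7.5, 6.0, 5.5, 7.7, 5.0, 4.8]
stage = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = early, 1 = late

accuracy, threshold = best_threshold(expression, stage)
print(accuracy, threshold)
```

In practice the cutoff should be chosen on training data and the accuracy reported on a separate validation set, as the abstract does.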

  11. Identification of hub genes and pathways associated with retinoblastoma based on co-expression network analysis.

    PubMed

    Wang, Q L; Chen, X; Zhang, M H; Shen, Q H; Qin, Z M

    2015-12-08

    The objective of this paper was to identify hub genes and pathways associated with retinoblastoma using centrality analysis of the co-expression network and pathway-enrichment analysis. The co-expression network of retinoblastoma was constructed by weighted gene co-expression network analysis (WGCNA) based on differentially expressed (DE) genes, and clusters were obtained through the molecular complex detection (MCODE) algorithm. Degree centrality analysis of the co-expression network was performed to explore hub genes present in retinoblastoma. Pathway-enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Validation of hub gene expression in retinoblastoma was performed by reverse transcription-polymerase chain reaction (RT-PCR) analysis. The co-expression network based on 221 DE genes between retinoblastoma and normal controls consisted of 210 nodes and 3965 edges, and 5 clusters of the network were evaluated. Centrality analysis of the co-expression network identified 21 hub genes, such as SNORD115-41, RASSF2, and SNORD115-44. According to RT-PCR analysis, 16 of the 21 hub genes were differentially expressed, including RASSF2 and CDCA7, and 5 were not differentially expressed in retinoblastoma compared to normal controls. Pathway analysis showed that genes in 2 clusters were enriched in 3 pathways: purine metabolism, p53 signaling pathway, and melanogenesis. In this study, we successfully identified 16 hub genes and 3 pathways associated with retinoblastoma, which may be potential biomarkers for early detection and therapy for retinoblastoma.

  12. Multiobjective binary biogeography based optimization for feature selection using gene expression data.

    PubMed

    Li, Xiangtao; Yin, Minghao

    2013-12-01

    Gene expression data play an important role in the development of efficient cancer diagnosis and classification. However, gene expression data are usually redundant and noisy, and only a subset of genes present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has attracted increasing interest in the field of bioinformatics. In this paper, a multi-objective biogeography based optimization method is proposed to select a small subset of informative genes relevant to the classification. In the proposed algorithm, first, the Fisher-Markov selector is used to choose the 60 top-ranked genes. Second, to make biogeography based optimization suitable for the discrete problem, binary biogeography based optimization, called BBBO, is proposed based on a binary migration model and a binary mutation model. Then, multi-objective binary biogeography based optimization, which we call MOBBBO, is proposed by integrating the non-dominated sorting method and the crowding distance method into the BBBO framework. Finally, the MOBBBO method is used for gene selection, and a support vector machine is used as the classifier with leave-one-out cross-validation (LOOCV). To show the effectiveness and efficiency of the algorithm, it is tested on ten gene expression benchmark datasets. Experimental results demonstrate that the proposed method is better than, or at least comparable with, previous particle swarm optimization (PSO) and support vector machine (SVM) approaches from the literature when considering the quality of the solutions obtained.

  13. Screening for genes and subnetworks associated with pancreatic cancer based on the gene expression profile.

    PubMed

    Long, Jin; Liu, Zhe; Wu, Xingda; Xu, Yuanhong; Ge, Chunlin

    2016-05-01

    The present study aimed to screen for potential genes and subnetworks associated with pancreatic cancer (PC) using the gene expression profile. The expression profile GSE16515, which included 36 PC tissue samples and 16 normal samples, was downloaded from the Gene Expression Omnibus database. The limma package in R was used to screen differentially expressed genes (DEGs), which were grouped as up- and downregulated genes. Then, PFSNet was applied to perform subnetwork analysis for all the DEGs. Moreover, Gene Ontology (GO) and REACTOME pathway enrichment analysis of up- and downregulated genes was performed, followed by protein-protein interaction (PPI) network construction using the Search Tool for the Retrieval of Interacting Genes. In total, 1,989 DEGs including 1,461 up- and 528 downregulated genes were screened out. Subnetworks including pancreatic cancer in PC tissue samples and intercellular adhesion in normal samples were identified, respectively. A total of 8 significant REACTOME pathways for upregulated DEGs, such as hemostasis and cell cycle, mitotic, were identified. Moreover, 4 significant REACTOME pathways for downregulated DEGs, including regulation of β-cell development and transmembrane transport of small molecules, were screened out. Additionally, DEGs with high connectivity degrees in the module of the protein-protein interaction network, such as CCNA2 (cyclin A2) and PBK (PDZ binding kinase), were mainly enriched in the cell-division cycle. CCNA2 and PBK of the module and their related pathway, the cell-division cycle, and two subnetworks (pancreatic cancer and intercellular adhesion subnetworks) may be pivotal for further understanding of the molecular mechanism of PC.

  14. Integrating Biological Covariates into Gene Expression-Based Predictors of Radiation Sensitivity

    PubMed Central

    Kamath, Vidya P.; Torres-Roca, Javier F.

    2017-01-01

    The use of gene expression-based classifiers has resulted in a number of promising potential signatures of patient diagnosis, prognosis, and response to therapy. However, these approaches have also created difficulties in trying to use gene expression alone to predict a complex trait. A practical approach to this problem is to integrate existing biological knowledge with gene expression to build a composite predictor. We studied the problem of predicting radiation sensitivity within human cancer cell lines from gene expression. First, we present evidence for the need to integrate known biological conditions (tissue of origin, RAS, and p53 mutational status) into a gene expression prediction problem involving radiation sensitivity. Next, we demonstrate, using linear regression, a technique for incorporating this knowledge. The resulting correlations between gene expression and radiation sensitivity improved through the use of this technique (best-fit adjusted R² increased from 0.3 to 0.84). Overfitting of data was examined through the use of simulation. The results reinforce the concept that radiation sensitivity is not driven solely by gene expression, but rather by a combination of distinct parameters. We show that accounting for biological heterogeneity significantly improves the ability of the model to identify genes that are associated with radiosensitivity.
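
For reference, the adjusted coefficient of determination quoted above is conventionally defined as shown below for a linear model with n observations and p predictors (not counting the intercept). The values of n and p used in the study are not given in this abstract, so the symbols are left general.

```latex
\bar{R}^{2} \;=\; 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1},
\qquad
R^{2} \;=\; 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```

Because the correction factor (n - 1)/(n - p - 1) grows with p, adding covariates such as tissue of origin raises the adjusted value only if they reduce the residual error by more than the penalty for the extra parameters.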

  15. Localization of the modified base J in telomeric VSG gene expression sites of Trypanosoma brucei.

    PubMed

    van Leeuwen, F; Wijsman, E R; Kieft, R; van der Marel, G A; van Boom, J H; Borst, P

    1997-12-01

    African trypanosomes such as Trypanosoma brucei undergo antigenic variation in the bloodstream of their mammalian hosts by regularly changing the variant surface glycoprotein (VSG) gene expressed. The transcribed VSG gene is invariably located in a telomeric expression site. There are multiple expression sites and one way to change the VSG gene expressed is by activating a new site and inactivating the previously active one. The mechanisms that control expression site switching are unknown, but have been suggested to involve epigenetic regulation. We have found previously that VSG genes in silent (but not active) expression sites contain modified restriction endonuclease cleavage sites, and we have presented circumstantial evidence indicating that this is attributable to the presence of a novel modified base beta-D-glucosyl-hydroxymethyluracil, or J. To directly test this, we have generated antisera that specifically recognize J-containing DNA and have used these to determine the precise location of this modified thymine in the telomeric VSG expression sites. By anti J-DNA immunoprecipitations, we found that J is present in telomeric VSG genes in silenced expression sites and not in actively transcribed telomeric VSG genes. J was absent from inactive chromosome-internal VSG genes. DNA modification was also found at the boundaries of expression sites. In the long 50-bp repeat arrays upstream of the promoter and in the telomeric repeat arrays downstream of the VSG gene, J was found both in silent and active expression sites. This suggests that silencing results in a gradient of modification spreading from repetitive DNA flanks into the neighboring expression site sequences. In this paper, we discuss the possible role of J in silencing of expression sites.

  16. Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

    PubMed Central

    Li, Xia; Rao, Shaoqi; Jiang, Wei; Li, Chuanxing; Xiao, Yun; Guo, Zheng; Zhang, Qingpu; Wang, Lihong; Du, Lei; Li, Jing; Li, Li; Zhang, Tianwen; Wang, Qing K

    2006-01-01

    Background: It is one of the ultimate goals of modern biological research to fully elucidate the intricate interplay and regulation of the molecular determinants that propel and characterize the progression of diverse life phenomena such as cell cycling, development, aging, and the progressive and recurrent pathogenesis of complex diseases. A vast amount of large-scale, genome-wide time-resolved data is becoming increasingly available, which provides a golden opportunity to tackle the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results: This methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulations of genes that can span any unit(s) of time intervals. This bioinformatics toolbox provides a unified approach to uncovering time trends of gene regulation through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with current knowledge of phase characteristics for the cell cycle. Conclusion: We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex gene regulations related to
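
The notion of a delayed correlation between two genes can be illustrated with a short sketch. This is not the TdGRN toolbox, which the abstract describes as using decision analysis of a time-delayed expression matrix; the sketch simply computes Pearson correlations between a regulator at time t and a target at time t + lag and reports the lag with the largest absolute correlation. The series below are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def best_lag(regulator, target, max_lag):
    """Return (lag, r) maximizing |r| between regulator[t] and target[t + lag]."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = regulator[: len(regulator) - lag]
        y = target[lag:]
        if len(x) < 3:  # too few overlapping time points to correlate
            break
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical time series: the target follows the regulator two time points later.
regulator = [1, 3, 2, 5, 4, 7, 6, 9, 8, 6, 3, 1]
target = [0, 0, 3, 7, 5, 11, 9, 15, 13, 19, 17, 13]

lag, r = best_lag(regulator, target, max_lag=4)
print(lag, round(r, 3))
```

A negative r at the best lag would suggest delayed repression rather than activation; with short time courses, many lags and many gene pairs, such correlations need correction for multiple testing before any edge is trusted.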

  17. Design-based learning for biology: Genetic engineering experience improves understanding of gene expression.

    PubMed

    Ellefson, Michelle R; Brinker, Rebecca A; Vernacchio, Vincent J; Schunn, Christian D

    2008-07-01

    Gene expression is a difficult topic for students to learn and comprehend, at least partially because it involves various biochemical structures and processes occurring at the microscopic level. Designer Bacteria, a design-based learning (DBL) unit for high-school students, applies principles of DBL to the teaching of gene expression. Throughout the 8-week unit, students genetically engineer bacteria to meet a need in their own lives. Through a series of investigations, discussions, and design modifications, students learn about the molecular processes and structures involved in gene expression, and how these processes and structures are dependent upon various environmental variables. This article is intended to describe the Designer Bacteria unit and report preliminary results of student progress and performance on pre-unit and post-unit assessments. Teacher experiences and student progress indicate that Designer Bacteria successfully taught core aspects of gene expression through DBL. Copyright © 2008 International Union of Biochemistry and Molecular Biology, Inc.

  18. Prediction of highly expressed genes in microbes based on chromatin accessibility

    PubMed Central

    Willenbrock, Hanni; Ussery, David W

    2007-01-01

    Background: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes. Results: We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. Conclusion: This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches. PMID:17295928

  19. GeneShelf: a web-based visual interface for large gene expression time-series data repositories.

    PubMed

    Kim, Bohyoung; Lee, Bongshin; Knoblach, Susan; Hoffman, Eric; Seo, Jinwook

    2009-01-01

    The widespread use of high-throughput gene expression analysis techniques has enabled the biomedical research community to share a huge body of gene expression datasets in many public databases on the web. However, current gene expression data repositories provide static representations of the data and support limited interactions. This hinders biologists from effectively exploring shared gene expression datasets. Responding to the growing need for better interfaces to improve the utility of the public datasets, we have designed and developed a new web-based visual interface entitled GeneShelf (http://bioinformatics.cnmcresearch.org/GeneShelf). It builds upon a zoomable grid display to represent two categorical dimensions. It also incorporates an augmented timeline with expandable time points that better shows multiple data values for the focused time point by embedding bar charts. We applied GeneShelf to one of the largest microarray datasets generated to study the progression and recovery process of spinal cord injuries in mice and rats. We present a case study and a preliminary qualitative user study with biologists to show the utility and usability of GeneShelf.

  20. Gene Expression Signature-Based Screening Identifies New Broadly Effective Influenza A Antivirals

    PubMed Central

    Josset, Laurence; Textoris, Julien; Loriod, Béatrice; Ferraris, Olivier; Moules, Vincent; Lina, Bruno; N'Guyen, Catherine; Diaz, Jean-Jacques; Rosa-Calatrava, Manuel

    2010-01-01

    Classical antiviral therapies target viral proteins and are consequently subject to resistance. To counteract this limitation, alternative strategies have been developed that target cellular factors. We hypothesized that such an approach could also be useful to identify broad-spectrum antivirals. The influenza A virus was used as a model for its viral diversity and because of the need to develop therapies against unpredictable viruses as recently underlined by the H1N1 pandemic. We proposed to identify a gene-expression signature associated with infection by different influenza A virus subtypes which would allow the identification of potential antiviral drugs with a broad anti-influenza spectrum of activity. We analyzed the cellular gene expression response to infection with five different human and avian influenza A virus strains and identified 300 genes as differentially expressed between infected and non-infected samples. The 20 most dysregulated genes were used to screen the connectivity map, a database of drug-associated gene expression profiles. Candidate antivirals were then identified by their inverse correlation to the query signature. We hypothesized that such molecules would induce an unfavorable cellular environment for influenza virus replication. Eight potential antivirals including ribavirin were identified and their effects were tested in vitro on five influenza A strains. Six of the molecules inhibited influenza viral growth. The new pandemic H1N1 virus, which was not used to define the gene expression signature of infection, was inhibited by five out of the eight identified molecules, demonstrating that this strategy could contribute to identifying new broad anti-influenza agents acting on cellular gene expression. The identified infection signature genes, the expression of which is modified upon infection, could encode cellular proteins involved in the viral life cycle. This is the first study showing that gene expression-based screening can be

  1. GEPAS: a web-based resource for microarray gene expression data analysis

    PubMed Central

    Herrero, Javier; Al-Shahrour, Fátima; Díaz-Uriarte, Ramón; Mateos, Álvaro; Vaquerizas, Juan M.; Santoyo, Javier; Dopazo, Joaquín

    2003-01-01

    We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es). PMID:12824345

  2. A new gene regulatory network model based on BP algorithm for interrogating differentially expressed genes of Sea Urchin.

    PubMed

    Liu, Longlong; Zhao, Tingting; Ma, Meng; Wang, Yan

    2016-01-01

    Computer science and mathematical theories are combined to analyze the complex interactions among genes, which are simplified to a network to establish a theoretical model for the analysis of the structure, module and dynamic properties. However, traditional models of gene regulatory networks often lack an effective method for handling gene expression data because of high time and space complexity. In this paper, we propose a new model for constructing gene regulatory networks using a back propagation (BP) neural network based on predictive function and network topology. Combined with complex nonlinear mapping and self-learning, the BP neural network was mapped into a complex network. Network characteristics were obtained from the parameters of the average path length, average clustering coefficient, average degree, modularity, and map density to simulate the real gene network by an artificial network. Through the statistical analysis and comparison of network parameters of Sea Urchin mRNA microarray data under different temperatures, the values of the network parameters were observed. Differentially expressed Sea Urchin genes associated with temperature were determined by calculating the difference in the degree of each gene between the networks. The new model we developed is suitable for simulating gene regulatory networks and is capable of determining differentially expressed genes.
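
    The network parameters named in this abstract can be illustrated on a toy graph. The following is a minimal sketch, not the authors' BP-based model: it assumes each condition has already been reduced to an undirected gene network, and the gene names, the two toy networks, and all function names are hypothetical.

```python
from itertools import combinations

def degrees(adj):
    """Degree of every node in an undirected graph {node: set(neighbours)}."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def average_degree(adj):
    deg = degrees(adj)
    return sum(deg.values()) / len(deg)

def density(adj):
    """Edges present divided by edges possible in an undirected simple graph."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * edges / (n * (n - 1))

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def degree_shift(adj_a, adj_b):
    """Absolute change in degree of each gene between two condition networks."""
    deg_a, deg_b = degrees(adj_a), degrees(adj_b)
    return {gene: abs(deg_a[gene] - deg_b.get(gene, 0)) for gene in deg_a}

# Hypothetical networks for two temperature conditions (illustrative only).
low_temp = {"g1": {"g2", "g3"}, "g2": {"g1", "g3"},
            "g3": {"g1", "g2", "g4"}, "g4": {"g3"}}
high_temp = {"g1": {"g2"}, "g2": {"g1"}, "g3": {"g4"}, "g4": {"g3"}}

shift = degree_shift(low_temp, high_temp)
# Genes with the largest degree change are the candidates for
# temperature-associated differential expression in this sketch.
candidates = sorted(shift, key=lambda g: (-shift[g], g))
```

    Average path length and modularity are omitted here to keep the sketch short; a graph library would normally supply them.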

  3. The Correlation-Base-Selection Algorithm for Diagnostic Schizophrenia Based on Blood-Based Gene Expression Signatures

    PubMed Central

    Zhang, Hang; Xie, Ziyang; Yang, Yuwen; Zhao, Yizhen

    2017-01-01

    Microarray analysis of gene expression is often used to diagnose different types of disease. Many studies report remarkable achievements in nervous system disease. Clinical diagnosis of schizophrenia (SCZ) still depends on doctors' experience, which is unreliable and needs to be more objective and quantified. To solve this problem, we collected whole blood gene expression data from four studies, including 152 individuals with schizophrenia (SCZ) and 138 normal controls in different regions. The correlation-based feature selection (CFS, one of the machine learning methods) algorithm was applied in this study, and 103 significantly differentially expressed genes between patients and controls, called “feature genes,” were selected; then, a model for SCZ diagnosis was built. The samples were subdivided into 10 groups, and cross-validation showed that the model we constructed achieved nearly 100% classification accuracy. Mathematical evaluation of the datasets before and after data processing proved the effectiveness of our algorithm. Feature genes were enriched in Parkinson's disease, oxidative phosphorylation, and TGF-beta signaling pathways, which were previously reported to be associated with SCZ. These results suggest that the analysis of gene expression in whole blood by our model could be a useful tool for diagnosing SCZ. PMID:28280741
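
    Correlation-based feature selection scores a feature subset by rewarding correlation with the class label and penalizing redundancy among the features. The sketch below shows the standard CFS merit heuristic with a simple greedy forward search. It is an illustration under stated assumptions, not the pipeline used in the study: Pearson correlation stands in for the correlation measure, the search strategy is simplified, and the gene names and values are synthetic.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def cfs_merit(features, labels, subset):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(features, labels):
    """Add the feature that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max((cfs_merit(features, labels, selected + [f]), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Synthetic toy data: 0 = control, 1 = patient (not from the study).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "geneA": [1, 2, 1, 5, 6, 5],              # tracks the label
    "geneB": [1.1, 2.1, 1.0, 5.2, 6.1, 5.0],  # nearly redundant with geneA
    "geneC": [3, 1, 2, 3, 1, 2],              # uncorrelated with the label
}
selected, merit = greedy_cfs(features, labels)
```

    In this toy set the uninformative geneC lowers the merit of any subset it joins, so the greedy search never selects it.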

  4. Gene expression profiling based identification of cell surface targets for developing multimeric ligands in pancreatic cancer

    PubMed Central

    Balagurunathan, Yoganand; Morse, David L.; Hostetter, Galen; Shanmugam, Vijayalakshmi; Stafford, Phillip; Shack, Sonsoles; Pearson, John; Trissal, Maria; Demeure, Michael J.; Von Hoff, Daniel D.; Hruby, Victor J.; Gillies, Robert J.; Han, Haiyong

    2008-01-01

    Multimeric ligands are ligands that contain multiple binding domains that simultaneously target multiple cell surface proteins. Due to cooperative binding, multimeric ligands can have high avidity for cells (tumor) expressing all targeting proteins and only show minimal binding to cells (normal tissues) expressing none or only some of the targets. Identifying combinations of targets that concurrently express in tumor cells, but not in normal cells is a challenging task. Here, we describe a novel approach for identifying such combinations using genome-wide gene expression profiling followed by immunohistochemistry. We first generated a database of mRNA gene expression profiles for 28 pancreatic cancer specimens and 103 normal tissue samples representing 28 unique tissue/cell types using DNA microarrays. The expression data for genes that encode proteins with cell surface epitopes were then extracted from the database and analyzed using a novel multivariate rule-based computational approach to identify gene combinations that are expressed at an efficient binding level in tumors, but not in normal tissues. These combinations were further ranked according to the proportion of tumor samples that expressed the sets at efficient levels. Protein expression of the genes contained in the top ranked combinations was confirmed using immunohistochemistry on a pancreatic tumor tissue and normal tissue microarrays. Co-expression of targets was further validated by their combined expression in pancreatic cancer cell lines using immunocytochemistry. These validated gene combinations thus encompass a list of cell surface targets that can be used to develop multimeric ligands for the imaging and treatment of pancreatic cancer. PMID:18765825
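
    The ranking step described above can be sketched as a search over gene combinations. This is a hedged illustration, not the paper's multivariate rule set: the single expression threshold standing in for an "efficient binding level", the rule that no normal sample may co-express the full set, and all gene names and values are assumptions made for the example.

```python
from itertools import combinations

def expressing_fraction(samples, genes, threshold):
    """Fraction of samples in which every gene in `genes` meets the threshold."""
    hits = sum(all(sample[g] >= threshold for g in genes) for sample in samples)
    return hits / len(samples)

def rank_combinations(tumors, normals, size, threshold):
    """Keep combinations never co-expressed in normals; rank by tumor coverage."""
    genes = sorted(tumors[0])
    ranked = []
    for combo in combinations(genes, size):
        if expressing_fraction(normals, combo, threshold) > 0:
            continue  # some normal tissue expresses the whole set: reject
        ranked.append((expressing_fraction(tumors, combo, threshold), combo))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked

# Synthetic expression values for three hypothetical surface-protein genes.
tumors = [{"A": 9, "B": 8, "C": 2},
          {"A": 7, "B": 9, "C": 8},
          {"A": 8, "B": 1, "C": 9}]
normals = [{"A": 9, "B": 1, "C": 8},
           {"A": 1, "B": 9, "C": 1}]

ranked = rank_combinations(tumors, normals, size=2, threshold=5)
```

    Each normal tissue here expresses at least one of the targets individually, which is why the filter tests the full combination rather than single genes.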

  5. An artificial cell based on gene expression in vesicle

    NASA Astrophysics Data System (ADS)

    Noireaux, Vincent

    2006-03-01

    A new experimental approach is presented to build an artificial cell using the translation machinery of a cell-free expression system as the hardware and a DNA synthetic program as the software. Cytoplasmic extracts, encapsulated in phospholipid vesicles, are used to assemble custom-made genetic circuits to develop the functions of a minimal cell. The objective is to understand how a DNA algorithm can be designed to build an operating system that has some of the properties of life. We show how a long-lived bioreactor is built to carry out in vitro transcription and translation in cell-sized vesicles. To develop the synthetic membrane into an active interface, a few amphipathic peptides and an insertion mechanism of integral membrane proteins have been tested. With vesicles composed of different phospholipids, the fusion protein alpha-hemolysin-eGFP can be expressed to reveal patterns on the membrane. Finally, specific degradation mechanisms are introduced to create a sink for the synthesized messengers and proteins. Perspectives and limitations of this approach will be discussed.

  6. Cytomegalovirus Replicon-Based Regulation of Gene Expression In Vitro and In Vivo

    PubMed Central

    Mohr, Hermine; Mohr, Christian A.; Schneider, Marlon R.; Scrivano, Laura; Adler, Barbara; Kraner-Schreiber, Simone; Schnieke, Angelika; Dahlhoff, Maik; Wolf, Eckhard; Koszinowski, Ulrich H.; Ruzsics, Zsolt

    2012-01-01

    There is increasing evidence for a connection between DNA replication and the expression of adjacent genes. Therefore, this study addressed the question of whether a herpesvirus origin of replication can be used to activate or increase the expression of adjacent genes. Cell lines carrying an episomal vector, in which reporter genes are linked to the murine cytomegalovirus (MCMV) origin of lytic replication (oriLyt), were constructed. Reporter gene expression was silenced by a histone-deacetylase-dependent mechanism, but was resolved upon lytic infection with MCMV. Replication of the episome was observed subsequent to infection, leading to the induction of gene expression by more than 1000-fold. oriLyt-based regulation thus provided a unique opportunity for virus-induced conditional gene expression without the need for an additional induction mechanism. This principle was exploited to show effective late trans-complementation of the toxic viral protein M50 and the glycoprotein gO of MCMV. Moreover, the application of this principle for intracellular immunization against herpesvirus infection was demonstrated. The results of the present study show that viral infection specifically activated the expression of a dominant-negative transgene, which inhibited viral growth. This conditional system was operative in explant cultures of transgenic mice, but not in vivo. Several applications are discussed. PMID:22685399

  7. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    PubMed

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Since a key step in the analysis of gene expression data is to detect groups of genes that have similar expression patterns, clustering techniques are commonly used to analyze gene expression data. Data representation plays an important role in clustering analysis. The non-negative matrix factorization (NMF) is a widely used data representation method with great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method, whose natural properties make it more extrapolating, especially for small sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm, and then apply it to represent gene expression data for subsequent clustering. The clustering experiments are conducted on five commonly used gene datasets, and the results indicate that the proposed HR-NMF outperforms LR-based NMF and original NMF, which suggests the potential application of HR-NMF for gene expression data.
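
    For orientation, manifold-regularized NMF is usually written as a reconstruction term plus a smoothness penalty. The form below is a sketch in the common graph-regularized notation and is not taken from the paper: the symbols X (data matrix), U and V (non-negative factors), the weight \lambda, and the Hessian-energy matrix H are assumptions introduced here.

```latex
\min_{U \ge 0,\; V \ge 0} \;
  \lVert X - U V^{\top} \rVert_F^{2}
  \;+\; \lambda \, \operatorname{Tr}\!\left( V^{\top} H V \right)
```

    Under these assumptions, replacing H with a graph Laplacian gives the Laplacian-regularized variant, and setting \lambda = 0 recovers plain NMF.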

  8. Knowledge-based analysis of microarray gene expression data by using support vector machines

    SciTech Connect

    William Grundy; Manuel Ares, Jr.; David Haussler

    2001-06-18

    The authors introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. They test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, they use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

  9. Expression of Human Skin-Specific Genes Defined by Transcriptomics and Antibody-Based Profiling

    PubMed Central

    Edqvist, Per-Henrik D.; Fagerberg, Linn; Hallström, Björn M.; Danielsson, Angelika; Edlund, Karolina; Uhlén, Mathias

    2014-01-01

    To increase our understanding of skin, it is important to define the molecular constituents of the cell types and epidermal layers that signify normal skin. We have combined a genome-wide transcriptomics analysis, using deep sequencing of mRNA from skin biopsies, with immunohistochemistry-based protein profiling to characterize the landscape of gene and protein expression in normal human skin. The transcriptomics and protein expression data of skin were compared to 26 (RNA) and 44 (protein) other normal tissue types. All 20,050 putative protein-coding genes were classified into categories based on patterns of expression. We found that 417 genes showed elevated expression in skin, with 106 genes expressed at least five-fold higher than in other tissues. The 106 genes categorized as skin enriched encoded well-known proteins involved in epidermal differentiation and proteins with unknown functions and expression patterns in skin, including the C1orf68 protein, which showed the highest relative enrichment in skin. In conclusion, we have applied a genome-wide analysis to identify the human skin-specific proteome and map the precise localization of the corresponding proteins in different compartments of the skin, to facilitate further functional studies to explore the molecular repertoire of normal skin and to identify biomarkers related to various skin diseases. PMID:25411189

  10. TABASCO: A single molecule, base-pair resolved gene expression simulator

    PubMed Central

    Kosuri, Sriram; Kelly, Jason R; Endy, Drew

    2007-01-01

    Background: Experimental studies of gene expression have identified some of the individual molecular components and elementary reactions that comprise and control cellular behavior. Given our current understanding of gene expression, and the goals of biotechnology research, both scientists and engineers would benefit from detailed simulators that can explicitly compute genome-wide expression levels as a function of individual molecular events, including the activities and interactions of molecules on DNA at single base pair resolution. However, for practical reasons including computational tractability, available simulators have not been able to represent genome-scale models of gene expression at this level of detail. Results: Here we develop a simulator, TABASCO, which enables the precise representation of individual molecules and events in gene expression for genome-scale systems. We use a single molecule computational engine to track individual molecules interacting with and along nucleic acid polymers at single base resolution. Tabasco uses logical rules to automatically update and delimit the set of species and reactions that comprise a system during simulation, thereby avoiding the need for a priori specification of all possible combinations of molecules and reaction events. We confirm that single molecule, base-pair resolved simulation using TABASCO (Tabasco) can accurately compute gene expression dynamics and, moving beyond previous simulators, provide for the direct representation of intermolecular events such as polymerase collisions and promoter occlusion. We demonstrate the computational capacity of Tabasco by simulating the entirety of gene expression during bacteriophage T7 infection; for reference, the 39,937 base pair T7 genome encodes 56 genes that are transcribed by two types of RNA polymerases active across 22 promoters. Conclusion: Tabasco enables genome-scale simulation of transcription and translation at individual molecule and single base

  11. TABASCO: A single molecule, base-pair resolved gene expression simulator.

    PubMed

    Kosuri, Sriram; Kelly, Jason R; Endy, Drew

    2007-12-19

    Experimental studies of gene expression have identified some of the individual molecular components and elementary reactions that comprise and control cellular behavior. Given our current understanding of gene expression, and the goals of biotechnology research, both scientists and engineers would benefit from detailed simulators that can explicitly compute genome-wide expression levels as a function of individual molecular events, including the activities and interactions of molecules on DNA at single base pair resolution. However, for practical reasons including computational tractability, available simulators have not been able to represent genome-scale models of gene expression at this level of detail. Here we develop a simulator, TABASCO (http://openwetware.org/wiki/TABASCO), which enables the precise representation of individual molecules and events in gene expression for genome-scale systems. We use a single molecule computational engine to track individual molecules interacting with and along nucleic acid polymers at single base resolution. Tabasco uses logical rules to automatically update and delimit the set of species and reactions that comprise a system during simulation, thereby avoiding the need for a priori specification of all possible combinations of molecules and reaction events. We confirm that single molecule, base-pair resolved simulation using TABASCO (Tabasco) can accurately compute gene expression dynamics and, moving beyond previous simulators, provide for the direct representation of intermolecular events such as polymerase collisions and promoter occlusion. We demonstrate the computational capacity of Tabasco by simulating the entirety of gene expression during bacteriophage T7 infection; for reference, the 39,937 base pair T7 genome encodes 56 genes that are transcribed by two types of RNA polymerases active across 22 promoters. Tabasco enables genome-scale simulation of transcription and translation at individual molecule and single base

  12. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    SciTech Connect

    Tucker, James D.; Joiner, Michael C.; Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V.; Chinkhota, Chantelle N.; Smolinski, Joseph M.; Divine, George W.; Auner, Gregory W.

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.

  13. Allen Brain Atlas-Driven Visualizations: a web-based gene expression energy visualization tool

    PubMed Central

    Zaldivar, Andrew; Krichmar, Jeffrey L.

    2014-01-01

    The Allen Brain Atlas-Driven Visualizations (ABADV) is a publicly accessible web-based tool created to retrieve and visualize expression energy data from the Allen Brain Atlas (ABA) across multiple genes and brain structures. Though the ABA offers their own search engine and software for researchers to view their growing collection of online public data sets, including extensive gene expression and neuroanatomical data from human and mouse brain, many of their tools limit the number of genes and brain structures researchers can view at once. To complement their work, ABADV generates multiple pie charts, bar charts and heat maps of expression energy values for any given set of genes and brain structures. Such a suite of free and easy-to-understand visualizations allows for easy comparison of gene expression across multiple brain areas. In addition, each visualization links back to the ABA so researchers may view a summary of the experimental detail. ABADV is currently supported on modern web browsers and is compatible with expression energy data from the Allen Mouse Brain Atlas in situ hybridization data. With this web application, researchers can immediately obtain and survey large amounts of expression energy data from the ABA, which they can then use to supplement their work or perform meta-analysis. In the future, we hope to enable ABADV across multiple data resources. PMID:24904397

  14. Allen Brain Atlas-Driven Visualizations: a web-based gene expression energy visualization tool.

    PubMed

    Zaldivar, Andrew; Krichmar, Jeffrey L

    2014-01-01

    The Allen Brain Atlas-Driven Visualizations (ABADV) is a publicly accessible web-based tool created to retrieve and visualize expression energy data from the Allen Brain Atlas (ABA) across multiple genes and brain structures. Though the ABA offers their own search engine and software for researchers to view their growing collection of online public data sets, including extensive gene expression and neuroanatomical data from human and mouse brain, many of their tools limit the number of genes and brain structures researchers can view at once. To complement their work, ABADV generates multiple pie charts, bar charts and heat maps of expression energy values for any given set of genes and brain structures. Such a suite of free and easy-to-understand visualizations allows for easy comparison of gene expression across multiple brain areas. In addition, each visualization links back to the ABA so researchers may view a summary of the experimental detail. ABADV is currently supported on modern web browsers and is compatible with expression energy data from the Allen Mouse Brain Atlas in situ hybridization data. With this web application, researchers can immediately obtain and survey large amounts of expression energy data from the ABA, which they can then use to supplement their work or perform meta-analysis. In the future, we hope to enable ABADV across multiple data resources.

  15. Weighted gene co-expression based biomarker discovery for psoriasis detection.

    PubMed

    Sundarrajan, Sudharsana; Arumugam, Mohanapriya

    2016-11-15

    Psoriasis is a chronic inflammatory disease of the skin with an unknown aetiology. The disease manifests itself as red and silvery scaly plaques distributed over the scalp, lower back and extensor aspects of the limbs. After receiving scant consideration for quite a few years, psoriasis has now become a prominent focus for new drug development. A group of closely connected and differentially co-expressed genes may act in a network and may serve as molecular signatures for an underlying phenotype. Weighted gene coexpression network analysis (WGCNA), a systems biology approach, has been utilized for identification of new molecular targets for psoriasis. Gene coexpression relationships were investigated in 58 psoriatic lesional samples, resulting in five gene modules clustered based on the gene coexpression patterns. The coexpression pattern was validated using three psoriatic datasets. Ten highly connected and informative genes from each module were selected and termed psoriasis-specific hub signatures. A random forest based binary classifier built using the expression profiles of signature genes robustly distinguished psoriatic samples from the normal samples in the validation set with an accuracy of 0.95 to 1. These signature genes may serve as potential candidates for biomarker discovery leading to new therapeutic targets. WGCNA, a network-based approach, has provided an alternative path to mine out key controllers and drivers of psoriasis. The study principle from the current work can be extended to other pathological conditions.
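
    The idea of picking highly connected hub genes from a weighted coexpression network can be sketched in a few lines. This is a minimal illustration, assuming an unsigned network with adjacency |cor|^beta and treating all genes as a single module; it does not reproduce the module detection or the random forest step of the study. The value beta=6, the gene names, and the expression values are assumptions chosen for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def soft_threshold_adjacency(profiles, beta=6):
    """Weighted adjacency a_ij = |cor(x_i, x_j)| ** beta (unsigned network)."""
    genes = sorted(profiles)
    adj = {g: {} for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            weight = abs(pearson(profiles[a], profiles[b])) ** beta
            adj[a][b] = adj[b][a] = weight
    return adj

def connectivity(adj):
    """Connectivity of a gene: the sum of its adjacency weights."""
    return {gene: sum(nbrs.values()) for gene, nbrs in adj.items()}

def top_hubs(adj, n):
    """The n genes with the highest connectivity (ties broken by name)."""
    conn = connectivity(adj)
    return sorted(conn, key=lambda g: (-conn[g], g))[:n]

# Synthetic expression profiles across five samples (illustrative only).
profiles = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [1.1, 1.9, 3.2, 3.9, 5.1],
    "g4": [5, 1, 4, 2, 3],  # weakly related to the others
}
adj = soft_threshold_adjacency(profiles)
hubs = top_hubs(adj, 3)
```

    Raising the correlation to a power suppresses weak correlations, so a loosely related gene such as g4 in this toy set contributes almost nothing to any gene's connectivity.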

  16. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    DOE PAGES

    Wang, Pin; Wang, Yunshan; Hang, Bo; ...

    2016-07-11

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.

  17. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    SciTech Connect

    Wang, Pin; Wang, Yunshan; Hang, Bo; Zou, Xiaoping; Mao, Jian-Hua

    2016-07-11

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.

  18. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    PubMed Central

    Hang, Bo; Zou, Xiaoping; Mao, Jian-Hua

    2016-01-01

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A network was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system. PMID:27419373

  19. Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

    PubMed Central

    Wang, Haiying; Zheng, Huiru; Simpson, David; Azuaje, Francisco

    2006-01-01

    Background: Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data. Results: Based on the analysis of publicly available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was first performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than relatively more complex algorithms, e.g. neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re-sampling techniques were

  20. WF-MSB: a weighted fuzzy-based biclustering method for gene expression data.

    PubMed

    Chen, Lien-Chin; Yu, Philip S; Tseng, Vincent S

    2011-01-01

    Biclustering is an important analysis method on gene expression data for finding a subset of genes sharing compatible expression patterns. Although some biclustering algorithms have been proposed, few provided a query-driven approach for biologists to search for biclusters that contain a certain gene of interest. In this paper, we proposed a generalised fuzzy-based approach, namely Weighted Fuzzy-based Maximum Similarity Biclustering (WF-MSB), for extracting a query-driven bicluster based on the user-defined reference gene. A fuzzy-based similarity measurement and condition weighting approach are used to extract significant biclusters in expression levels. Both the most similar bicluster and the most dissimilar bicluster to the reference gene are discovered by WF-MSB. The proposed WF-MSB method was evaluated in comparison with MSBE on real yeast microarray data and synthetic data sets. The experimental results show that WF-MSB can effectively find the biclusters with significant GO-based functional meanings.

  1. Ontology based molecular signatures for immune cell types via gene expression analysis.

    PubMed

    Meehan, Terrence F; Vasilevsky, Nicole A; Mungall, Christopher J; Dougall, David S; Haendel, Melissa A; Blake, Judith A; Diehl, Alexander D

    2013-08-30

    New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an 'Ontologically BAsed Molecular Signature' (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type's identity. We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identified genes uniquely expressed in these B cells compared to other B cell types. This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis - providing a new method for defining novel biomarkers and providing an opportunity for new biological insights.

  2. Integrating biological knowledge based on functional annotations for biclustering of gene expression data.

    PubMed

    Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S

    2015-05-01

    Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques have recently been improved with the use of prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the input of the algorithm is only a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the to-be-optimized fitness function in the scatter search scheme. The measure FracGO is based on the biological enrichment and SimNTO is based on the overlap among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm for two datasets and show the algorithm performs better when biological knowledge is integrated. Moreover, the analysis and comparison between the two different biological measures is presented and it is concluded that the differences depend on both the data source and how the annotation file has been built in the case GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmark and an analysis of the overlap among biclusters reveals that the biclusters obtained show low overlap. The proposed methodology is a general-purpose algorithm which allows

  3. A Pathway Based Classification Method for Analyzing Gene Expression for Alzheimer’s Disease Diagnosis

    PubMed Central

    Voyle, Nicola; Keohane, Aoife; Newhouse, Stephen; Lunnon, Katie; Johnston, Caroline; Soininen, Hilkka; Kloszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon; Hodges, Angela; Kiddle, Steven; Dobson, Richard JB.

    2015-01-01

    Background: Recent studies indicate that gene expression levels in blood may be able to differentiate subjects with Alzheimer’s disease (AD) from normal elderly controls and mild cognitively impaired (MCI) subjects. However, there is limited replicability at the single marker level. A pathway-based interpretation of gene expression may prove more robust. Objectives: This study aimed to investigate whether a case/control classification model built on pathway level data was more robust than a gene level model and may consequently perform better in test data. The study used two batches of gene expression data from the AddNeuroMed (ANM) and Dementia Case Registry (DCR) cohorts. Methods: Our study used Illumina Human HT-12 Expression BeadChips to collect gene expression from blood samples. Random forest modeling with recursive feature elimination was used to predict case/control status. Age and APOE ɛ4 status were used as covariates for all analyses. Results: Gene and pathway level models performed similarly to each other and to a model based on demographic information only. Conclusions: Any potential increase in concordance from the novel pathway level approach used here has not led to a greater predictive ability in these datasets. However, we have only tested one method for creating pathway level scores. Further, we have been able to benchmark pathways against genes in datasets that had been extensively harmonized. Further work should focus on the use of alternative methods for creating pathway level scores, in particular those that incorporate pathway topology, and the use of an endophenotype based approach. PMID:26484910

  4. A stationary wavelet entropy-based clustering approach accurately predicts gene expression.

    PubMed

    Nguyen, Nha; Vo, An; Choi, Inchan; Won, Kyoung-Jae

    2015-03-01

    Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches to maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation.

  5. Genomic DNA-based absolute quantification of gene expression in Vitis

    USDA-ARS's Scientific Manuscript database

    Many studies in which gene expression is quantified by polymerase chain reaction represent the expression of a gene of interest (GOI) relative to that of a reference gene (RG). Relative expression is founded on the assumptions that RG expression is stable across samples, treatments, organs, etc., an...

  6. QSAR study of 1,4-dihydropyridine calcium channel antagonists based on gene expression programming.

    PubMed

    Si, Hong Zong; Wang, Tao; Zhang, Ke Jun; Hu, Zhi De; Fan, Bo Tao

    2006-07-15

    Gene expression programming, a novel machine learning algorithm, is used for the first time to develop a quantitative model as a potential screening mechanism for a series of 1,4-dihydropyridine calcium channel antagonists. The heuristic method was used to search the descriptor space and select the descriptors responsible for activity. A nonlinear, six-descriptor model based on gene expression programming was set up, with a mean-square error of 0.19 and a predicted correlation coefficient (R²) of 0.92. This paper provides a new and effective method for drug design and screening.
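
    The two figures of merit quoted in this abstract can be computed as in the minimal sketch below. It assumes that the reported R2 is the squared Pearson correlation between observed and predicted activities; the abstract does not say which definition was used, and the activity values here are made-up illustrative numbers, not data from the study.

```python
import math


def mean_square_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation (one common meaning of 'R2'; an assumption here)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical activities, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

mse = mean_square_error(observed, predicted)
r2 = squared_correlation(observed, predicted)
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")
```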

  7. A method for multiplex gene synthesis employing error correction based on expression.

    PubMed

    Hsiau, Timothy H-C; Sukovich, David; Elms, Phillip; Prince, Robin N; Strittmatter, Tobias; Stritmatter, Tobias; Ruan, Paul; Curry, Bo; Anderson, Paige; Sampson, Jeff; Anderson, J Christopher

    2015-01-01

    Our ability to engineer organisms with new biosynthetic pathways and genetic circuits is limited by the availability of protein characterization data and the cost of synthetic DNA. With new tools for reading and writing DNA, there are opportunities for scalable assays that more efficiently and cost effectively mine for biochemical protein characteristics. To that end, we have developed the Multiplex Library Synthesis and Expression Correction (MuLSEC) method for rapid assembly, error correction, and expression characterization of many genes as a pooled library. This methodology enables gene synthesis from microarray-synthesized oligonucleotide pools with a one-pot technique, eliminating the need for robotic liquid handling. Post assembly, the gene library is subjected to an ampicillin based quality control selection, which serves as both an error correction step and a selection for proteins that are properly expressed and folded in E. coli. Next generation sequencing of post selection DNA enables quantitative analysis of gene expression characteristics. We demonstrate the feasibility of this approach by building and testing over 90 genes for empirical evidence of soluble expression. This technique reduces the problem of part characterization to multiplex oligonucleotide synthesis and deep sequencing, two technologies under extensive development with projected cost reduction.

  8. Identification and expression of cuticular protein genes based on Locusta migratoria transcriptome

    PubMed Central

    Zhao, Xiaoming; Gou, Xin; Qin, Zhongyu; Li, Daqi; Wang, Yan; Ma, Enbo; Li, Sheng; Zhang, Jianzhen

    2017-01-01

    Many types of cuticular proteins are found in a single insect species, and their number and features are very diversified among insects. The cuticle matrix consists of many different proteins that confer the physical properties of the exoskeleton. However, the number and properties of cuticle proteins in Locusta migratoria remain unclear. In the present study, Illumina sequencing and de novo assembly were combined to characterize the transcriptome of L. migratoria. Eighty-one cuticular protein genes were identified and divided into five groups: the CPR family (51), Tweedle (2), CPF/CPFLs (9), CPAP family (9), and other genes (10). Based on the expression patterns in different tissues and stages, most of the genes tested were distributed in the integument, pronotum and wings, and expressed in selected stages with different patterns. The results showed no obvious correlation between the expression patterns and the conserved motifs. Additionally, each cluster displayed a different expression pattern and may possess a different function in the cuticle. Furthermore, the large variety of genes displaying differential expression during the molting cycle may be associated with cuticle formation and may provide insights into the gene networks related to cuticle formation. PMID:28368027

  9. Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study

    PubMed Central

    Shedden, Kerby; Taylor, Jeremy M.G.; Enkemann, Steve A.; Tsao, Ming S.; Yeatman, Timothy J.; Gerald, William L.; Eschrich, Steve; Jurisica, Igor; Venkatraman, Seshan E.; Meyerson, Matthew; Kuick, Rork; Dobbin, Kevin K.; Lively, Tracy; Jacobson, James W.; Beer, David G.; Giordano, Thomas J.; Misek, David E.; Chang, Andrew C.; Zhu, Chang Qi; Strumpf, Dan; Hanash, Samir; Shepherd, Francis A.; Ding, Kuyue; Seymour, Lesley; Naoki, Katsuhiko; Pennell, Nathan; Weir, Barbara; Verhaak, Roel; Ladd-Acosta, Christine; Golub, Todd; Gruidl, Mike; Szoke, Janos; Zakowski, Maureen; Rusch, Valerie; Kris, Mark; Viale, Agnes; Motoi, Noriko; Travis, William; Sharma, Anupama

    2009-01-01

    Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) can be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas. PMID:18641660

  10. VAMPIRE microarray suite: a web-based platform for the interpretation of gene expression data

    PubMed Central

    Hsiao, Albert; Ideker, Trey; Olefsky, Jerrold M.; Subramaniam, Shankar

    2005-01-01

    Microarrays are invaluable high-throughput tools used to snapshot the gene expression profiles of cells and tissues. Among the most basic and fundamental questions asked of microarray data is whether individual genes are significantly activated or repressed by a particular stimulus. We have previously presented two Bayesian statistical methods for this level of analysis, collectively known as variance-modeled posterior inference with regional exponentials (VAMPIRE). These methods each require a sophisticated modeling step followed by integration of a posterior probability density. We present here a publicly available, web-based platform that allows users to easily load data, associate related samples and identify differentially expressed features using the VAMPIRE statistical framework. In addition, this suite of tools seamlessly integrates a novel gene annotation tool, known as GOby, which identifies statistically overrepresented gene groups. Unlike other tools in this genre, GOby can localize enrichment while respecting the hierarchical structure of annotation systems like Gene Ontology (GO). By identifying statistically significant enrichment of GO terms, Kyoto Encyclopedia of Genes and Genomes pathways, and TRANSFAC transcription factor binding sites, users can gain substantial insight into the physiological significance of sets of differentially expressed genes. The VAMPIRE microarray suite can be accessed at . PMID:15980550

  11. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarrays have become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  12. Transcriptome based identification and tissue expression profiles of chemosensory genes in Blattella germanica (Blattaria: Blattidae).

    PubMed

    Niu, Dong-Juan; Liu, Yan; Dong, Xiao-Tong; Dong, Shuang-Lin

    2016-06-01

    Blattella germanica is one of the most notorious household insect pests, and evolutionarily more primitive than the well-studied moths and flies, regarding the molecular mechanisms of chemosensation. In this study, we sequenced, for the first time, the antennal transcriptome of B. germanica using the Illumina HiSeq™ 2000 platform and then conducted the bioinformatic analysis of the data. In total, we identified 73 putative chemosensory genes, with 62 genes being novel in this species. These chemosensory genes included 48 odorant binding proteins (OBPs), 9 chemosensory proteins (CSPs), 6 sensory neuron membrane proteins (SNMPs), 5 odorant receptors (ORs) and 5 ionotropic receptors (IRs). Notably, Plus-C OBPs account for an exceptionally high proportion (39.58%) of the total 48 OBPs in this primitive insect. To predict the chemosensory functions of the genes, a detailed global tissue expression profiling was investigated by reverse transcription polymerase chain reaction (RT-PCR). Most OBP genes showed a chemosensory tissue biased profile, while CSP transcripts were widely and evenly expressed in different tissues. Furthermore, we found that more than half the chemosensory genes were expressed in the cerci, implying the important chemosensory functions of the organ in B. germanica. Taken together, our study provides important bases for elucidation of the molecular mechanisms and evolution of insect chemosensation, and for development of the chemosensation based techniques to control B. germanica.

  13. Association Rule Based Similarity Measures for the Clustering of Gene Expression Data

    PubMed Central

    Sethi, Prerna; Alagiriswamy, Sathya

    2010-01-01

    In life threatening diseases, such as cancer, where the effective diagnosis includes annotation, early detection, distinction, and prediction, data mining and statistical approaches offer the promise for precise, accurate, and functionally robust analysis of gene expression data. The computational extraction of derived patterns from microarray gene expression is a non-trivial task that involves sophisticated algorithm design and analysis for specific domain discovery. In this paper, we have proposed a formal approach for feature extraction by first applying feature selection heuristics based on the statistical impurity measures, the Gini Index, Max Minority, and the Twoing Rule and obtaining the top 100-400 genes. We then analyze the associative dependencies between the genes and assign weights to the genes based on their degree of participation in the rules. Consequently, we present a weighted Jaccard and vector cosine similarity measure to compute the similarity between the discovered rules. Finally, we group the rules by applying hierarchical clustering. To demonstrate the usability and efficiency of the concept of our technique, we applied it to three publicly available, multiclass cancer gene expression datasets and performed a biomedical literature search to support the effectiveness of our results. PMID:21603179
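
    The similarity step described in this abstract can be sketched roughly as follows. The abstract does not give the exact formulas, so the definitions below are assumptions: each rule is treated as a set of genes, the weighted Jaccard score is the total weight of shared genes divided by the total weight of all genes in either rule, and the cosine score compares the two rules as weight vectors. The gene names and weights are hypothetical.

```python
import math


def weighted_jaccard(rule_a, rule_b, weights):
    """Weight of shared genes divided by weight of all genes in either rule."""
    union = rule_a | rule_b
    if not union:
        return 0.0
    shared = rule_a & rule_b
    return sum(weights[g] for g in shared) / sum(weights[g] for g in union)


def weighted_cosine(rule_a, rule_b, weights):
    """Cosine of the angle between the two rules viewed as gene-weight vectors."""
    genes = sorted(rule_a | rule_b)
    vec_a = [weights[g] if g in rule_a else 0.0 for g in genes]
    vec_b = [weights[g] if g in rule_b else 0.0 for g in genes]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical gene weights (e.g. degree of participation in rules) and two rules.
weights = {"g1": 3.0, "g2": 1.0, "g3": 2.0}
rule_a = {"g1", "g2"}
rule_b = {"g1", "g3"}

jaccard_score = weighted_jaccard(rule_a, rule_b, weights)
cosine_score = weighted_cosine(rule_a, rule_b, weights)
print(jaccard_score, cosine_score)
```

    A pairwise matrix of such scores (converted to distances, for example as one minus the similarity) is the kind of input a hierarchical clustering routine would then take.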

  14. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network.

    PubMed

    Wei, Pi-Jing; Zhang, Di; Xia, Junfeng; Zheng, Chun-Hou

    2016-12-23

    Cancer is a complex disease which is characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic and transcriptomic data, can be measured from each individual. Correspondingly, one of the key challenges is to pinpoint functional driver mutations or pathways, which contribute to tumorigenesis, from millions of functionally neutral passenger mutations. In this paper, in order to identify driver genes effectively, we applied a generalized additive model to mutation profiles to filter genes with long length and constructed a new gene-gene interaction network. Then we integrated the mutation data and expression data into the gene-gene interaction network. Lastly, a greedy algorithm was used to prioritize candidate driver genes from the integrated data. We named the proposed method Length-Net-Driver (LNDriver). Experiments on three TCGA datasets, i.e., head and neck squamous cell carcinoma, kidney renal clear cell carcinoma and thyroid carcinoma, demonstrated that the proposed method was effective. Also, it can identify not only frequently mutated drivers, but also rare candidate driver genes.

  15. Identification of potential crucial genes associated with steroid-induced necrosis of femoral head based on gene expression profile.

    PubMed

    Lin, Zhe; Lin, Yongsheng

    2017-09-05

    The aim of this study was to explore potential crucial genes associated with the steroid-induced necrosis of femoral head (SINFH) and to provide valid biological information for further investigation of SINFH. The gene expression profile GSE26316, generated from 3 SINFH rat samples and 3 normal rat samples, was downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identified using the LIMMA package. After functional enrichment analyses of DEGs, protein-protein interaction (PPI) network and sub-PPI network analyses were conducted based on the STRING database and Cytoscape. In total, 59 up-regulated DEGs and 156 downregulated DEGs were identified. The up-regulated DEGs were mainly involved in functions about immunity (e.g. Fcer1A and Il7R), and the downregulated DEGs were mainly enriched in muscle system process (e.g. Tnni2, Mylpf and Myl1). The PPI network of DEGs consisted of 123 nodes and 300 interactions. Tnni2, Mylpf, and Myl1 were the top 3 outstanding genes based on both subgraph centrality and degree centrality evaluation. These three genes interacted with each other in the network. Furthermore, the significant network module was composed of 22 downregulated genes (e.g. Tnni2, Mylpf and Myl1). These genes were mainly enriched in functions like muscle system process. The DEGs related to the regulation of immune system process (e.g. Fcer1A and Il7R), and DEGs correlated with muscle system process (e.g. Tnni2, Mylpf and Myl1) may be closely associated with the progress of SINFH, although this still needs to be confirmed by experiments.
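
    Degree centrality, one of the two measures this abstract uses to rank genes in the PPI network, can be illustrated with the short sketch below. The edge list is a toy example: only the mutual interactions among Tnni2, Mylpf and Myl1 are stated in the abstract, and the nodes geneA and geneB and their edges are hypothetical placeholders. Normalizing by (n - 1) is a common convention and is assumed here; the study's tooling may have reported raw degrees.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Toy undirected network; geneA and geneB are hypothetical placeholder nodes.
edges = [
    ("Tnni2", "Mylpf"),
    ("Tnni2", "Myl1"),
    ("Mylpf", "Myl1"),
    ("Tnni2", "geneA"),
    ("Myl1", "geneB"),
]

centrality = degree_centrality(edges)
for gene, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {score:.2f}")
```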

  16. Gene expression based mouse brain parcellation using Markov random field regularized non-negative matrix factorization

    NASA Astrophysics Data System (ADS)

    Pathak, Sayan D.; Haynor, David R.; Thompson, Carol L.; Lein, Ed; Hawrylycz, Michael

    2009-02-01

    Understanding the geography of genetic expression in the mouse brain has opened previously unexplored avenues in neuroinformatics. The Allen Brain Atlas (www.brain-map.org) (ABA) provides genome-wide colorimetric in situ hybridization (ISH) gene expression images at high spatial resolution, all mapped to a common three-dimensional 200 μm³ spatial framework defined by the Allen Reference Atlas (ARA) and is a unique data set for studying expression based structural and functional organization of the brain. The goal of this study was to facilitate an unbiased data-driven structural partitioning of the major structures in the mouse brain. We have developed an algorithm that uses nonnegative matrix factorization (NMF) to perform parts based analysis of ISH gene expression images. The standard NMF approach and its variants are limited in their ability to flexibly integrate prior knowledge, in the context of spatial data. In this paper, we introduce spatial connectivity as an additional regularization in NMF decomposition via the use of Markov Random Fields (mNMF). The mNMF algorithm alternates neighborhood updates with iterations of the standard NMF algorithm to exploit spatial correlations in the data. We present the algorithm and show the subdivisions of hippocampus and somatosensory cortex obtained via this approach. The results are compared with established neuroanatomic knowledge. We also highlight novel gene expression based subdivisions of the hippocampus identified by using the mNMF algorithm.
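
    For orientation, the standard NMF step that the abstract builds on can be sketched with the widely used multiplicative update rules for the squared-error objective. This is only the plain NMF baseline: the Markov random field neighborhood updates that make up mNMF are not described in enough detail in the abstract and are deliberately left out. The small matrix below is a made-up example, not expression data, and all function names are illustrative.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(squared_error(v, w, h))
    return w, h, errors


# Made-up non-negative matrix with two obvious "parts" (blocks).
v = [[5.0, 5.0, 0.0, 0.0],
     [4.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 2.0, 2.0]]

w, h, errors = nmf(v, rank=2)
print(f"error before: {errors[0]:.3f}, after: {errors[-1]:.6f}")
```

    Because every factor stays non-negative, each column of W can be read as an additive "part"; in a spatial setting, a regularizer such as the one the abstract describes would additionally encourage neighboring locations to share parts.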

  17. Biclustering of Gene Expression Data by Correlation-Based Scatter Search

    PubMed Central

    2011-01-01

    Background: The analysis of data generated by microarray technology is very useful to understand how the genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as merit function but interesting and relevant patterns from a biological point of view such as shifting and scaling patterns may not be detected using this measure. However, it is important to discover this type of pattern since commonly the genes can present a similar behavior although their expression levels vary in different ranges or magnitudes. Methods: Scatter Search is an evolutionary technique that is based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search with the aim of finding biclusters from gene expression data. In this algorithm the proposed fitness function is based on the linear correlation among genes to detect shifting and scaling patterns from genes and an improvement method is included in order to select just positively correlated genes. Results: The proposed algorithm has been tested with three real data sets, namely the Yeast Cell Cycle dataset, human B-cells lymphoma dataset and Yeast Stress dataset, finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function are compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database. PMID:21261986
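
    The motivation given in this abstract, that a correlation-based merit function sees scaling patterns which the Mean Squared Residue penalizes, can be checked on a toy example. The sketch below uses the usual Cheng and Church definition of the mean squared residue and the Pearson correlation; the paper's actual fitness function is not given in the abstract, and the expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_squared_residue(rows):
    """Mean of (a_ij - row mean - column mean + overall mean)^2 over a bicluster."""
    n_rows, n_cols = len(rows), len(rows[0])
    row_means = [sum(r) / n_cols for r in rows]
    col_means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum((rows[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)


# Invented profiles: one base gene, one shifted copy, one scaled copy.
base = [1.0, 2.0, 4.0, 3.0]
shifted = [value + 5.0 for value in base]  # shifting pattern
scaled = [value * 3.0 for value in base]   # scaling pattern

msr_shift = mean_squared_residue([base, shifted])
msr_scale = mean_squared_residue([base, scaled])
corr_shift = pearson(base, shifted)
corr_scale = pearson(base, scaled)
print(msr_shift, msr_scale, corr_shift, corr_scale)
```

    In this example the shifted pair has zero residue, the scaled pair has a clearly non-zero residue, and both pairs have a correlation of one, which is the behavior the abstract appeals to when it bases the fitness function on linear correlation.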

  18. Analysis of ripening-related gene expression in papaya using an Arabidopsis-based microarray

    PubMed Central

    2012-01-01

    Background: Papaya (Carica papaya L.) is a commercially important crop that produces climacteric fruits with a soft and sweet pulp that contain a wide range of health promoting phytochemicals. Despite its importance, little is known about transcriptional modifications during papaya fruit ripening and their control. In this study we report the analysis of ripe papaya transcriptome by using a cross-species (XSpecies) microarray technique based on the phylogenetic proximity between papaya and Arabidopsis thaliana. Results: Papaya transcriptome analyses resulted in the identification of 414 ripening-related genes with some having their expression validated by qPCR. The transcription profile was compared with that from ripening tomato and grape. There were many similarities between papaya and tomato especially with respect to the expression of genes encoding proteins involved in primary metabolism, regulation of transcription, biotic and abiotic stress and cell wall metabolism. XSpecies microarray data indicated that transcription factors (TFs) of the MADS-box, NAC and AP2/ERF gene families were involved in the control of papaya ripening and revealed that cell wall-related gene expression in papaya had similarities to the expression profiles seen in Arabidopsis during hypocotyl development. Conclusion: The cross-species array experiment identified a ripening-related set of genes in papaya allowing the comparison of transcription control between papaya and other fruit bearing taxa during the ripening process. PMID:23256600

  19. Monoterpenoid-based preparations in beehives affect learning, memory, and gene expression in the bee brain.

    PubMed

    Bonnafé, Elsa; Alayrangues, Julie; Hotier, Lucie; Massou, Isabelle; Renom, Allan; Souesme, Guillaume; Marty, Pierre; Allaoua, Marion; Treilhou, Michel; Armengaud, Catherine

    2017-02-01

    Bees are exposed in their environment to contaminants that can weaken the colony and contribute to bee declines. Monoterpenoid-based preparations can be introduced into hives to control the parasitic mite Varroa destructor. The long-term effects of monoterpenoids are poorly investigated. Olfactory conditioning of the proboscis extension reflex (PER) has been used to evaluate the impact of stressors on cognitive functions of the honeybee such as learning and memory. The authors tested the PER to odorants on bees after exposure to monoterpenoids in hives. Octopamine receptors, transient receptor potential-like (TRPL), and γ-aminobutyric acid channels are thought to play a critical role in the memory of food experience. Gene expression levels of Amoa1, Rdl, and trpl were evaluated in parallel in the bee brain because these genes code for the cellular targets of monoterpenoids and some pesticides and neural circuits of memory require their expression. The miticide impaired the PER to odors in the 3 wk following treatment. Short-term and long-term olfactory memories were improved months after introduction of the monoterpenoids into the beehives. Chronic exposure to the miticide had significant effects on Amoa1, Rdl, and trpl gene expressions and modified seasonal changes in the expression of these genes in the brain. The decrease of expression of these genes in winter could partly explain the improvement of memory. The present study has led to new insights into alternative treatments, especially on their effects on memory and expression of selected genes involved in this cognitive function. Environ Toxicol Chem 2017;36:337-345. © 2016 SETAC.

  20. Predicting Autism Spectrum Disorder Using Blood-based Gene Expression Signatures and Machine Learning

    PubMed Central

    Oh, Dong Hoon; Kim, Il Bin; Kim, Seok Hyeon; Ahn, Dong Hyun

    2017-01-01

    Objective The aim of this study was to identify a transcriptomic signature that could be used to classify subjects with autism spectrum disorder (ASD) compared to controls on the basis of blood gene expression profiles. The gene expression profiles could ultimately be used as diagnostic biomarkers for ASD. Methods We used the published microarray data (GSE26415) from the Gene Expression Omnibus database, which included 21 young adults with ASD and 21 age- and sex-matched unaffected controls. Nineteen differentially expressed probes were identified from a training dataset (n=26, 13 ASD cases and 13 controls) using the limma package in R language (adjusted p value <0.05) and were further analyzed in a test dataset (n=16, 8 ASD cases and 8 controls) using machine learning algorithms. Results Hierarchical cluster analysis showed that subjects with ASD were relatively well-discriminated from controls. Based on the support vector machine and K-nearest neighbors analysis, validation of 19-DE probes with a test dataset resulted in an overall class prediction accuracy of 93.8% as well as a sensitivity and specificity of 100% and 87.5%, respectively. Conclusion The results of our exploratory study suggest that the gene expression profiles identified from the peripheral blood samples of young adults with ASD can be used to identify a biological signature for ASD. Further study using a larger cohort and more homogeneous datasets is required to improve the diagnostic accuracy. PMID:28138110
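
The accuracy, sensitivity and specificity reported above follow from simple confusion-matrix arithmetic on the test set of 8 cases and 8 controls. The sketch below assumes the counts implied by the reported percentages (8 true positives, 0 false negatives, 7 true negatives, 1 false positive). The abstract does not list these counts, so they are an inference, not reported data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of cases detected
    specificity = tn / (tn + fp)                 # fraction of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all calls correct
    return accuracy, sensitivity, specificity


# Counts inferred from the reported percentages (assumption, not from the abstract).
accuracy, sensitivity, specificity = classification_metrics(tp=8, fn=0, tn=7, fp=1)
print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

With 16 test subjects, one misclassified control accounts for the entire gap between 100% and the reported 93.8% accuracy (15/16 = 93.75%), consistent with the authors' call for a larger cohort.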

  1. Two novel gene expression systems based on the yeasts Schwanniomyces occidentalis and Pichia stipitis.

    PubMed

    Piontek, M; Hagedorn, J; Hollenberg, C P; Gellissen, G; Strasser, A W

    1998-09-01

    Two non-Saccharomyces yeasts have been developed as hosts for heterologous gene expression. The celD gene from Clostridium thermocellum, encoding a heat-stable cellulase, served as the test sequence. The first system is based on the amylolytic species Schwanniomyces occidentalis, the second on the xylolytic species Pichia stipitis. The systems comprise auxotrophic host strains (trp5 in the case of S. occidentalis; trp5-10, his3 in the case of P. stipitis) and suitable transformation vectors. Vector components consist of an S. occidentalis-derived autonomously replicating sequence (SwARS) and the Saccharomyces cerevisiae-derived TRP5 sequence for plasmid propagation and selection in the yeast hosts, an ori and an ampicillin-resistance sequence for propagation and selection in a bacterial host. A range of vectors has been engineered employing different promoter elements for heterologous gene expression control in both species. Homologous elements derived from highly expressed genes of the respective hosts appeared to be of superior quality: in the case of S. occidentalis that of the GAM1 gene, in the case of P. stipitis that of the XYL1 gene. Further elements tested are the S. cerevisiae-derived ADH1 and PDC1 promoter sequences.

  2. Correction of sequence-based artifacts in serial analysis of gene expression.

    PubMed

    Akmaev, Viatcheslav R; Wang, Clarence J

    2004-05-22

    Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression, through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter SAGE applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods, such as SAGE and Long SAGE, is the unavoidable occurrence of artifact sequences resulting from sequencing errors. By virtue of their low, random incidence, such tag errors have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts. We present estimates for occurrences of tag errors, and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the polymerase chain reaction and sequencing error contributions. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrences of singleton tags. The SAGEScreen software is available for academic users from the first author.

  3. ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data

    PubMed Central

    2011-01-01

    RNA-Seq and microarray platforms have emerged as important tools for detecting changes in gene expression and RNA processing in biological samples. We present ExpressionPlot, a software package consisting of a default back end, which prepares raw sequencing or Affymetrix microarray data, and a web-based front end, which offers a biologically centered interface to browse, visualize, and compare different data sets. Download and installation instructions, a user's manual, discussion group, and a prototype are available at http://expressionplot.com/. PMID:21797991

  4. Microarray Based Gene Expression Analysis of Murine Brown and Subcutaneous Adipose Tissue: Significance with Human

    PubMed Central

    Boparai, Ravneet K.; Kondepudi, Kanthi Kiran; Mantri, Shrikant; Bishnoi, Mahendra

    2015-01-01

    Background Two types of adipose tissues, white (WAT) and brown (BAT) are found in mammals. Increasingly novel strategies are being proposed for the treatment of obesity and its associated complications by altering amount and/or activity of BAT using mouse models. Methodology/Principal Findings The present study was designed to: (a) investigate the differential expression of genes in LACA mice subcutaneous WAT (sWAT) and BAT using mouse DNA microarray, (b) to compare mouse differential gene expression with previously published human data; to understand any inter-species differences between the two and (c) to make a comparative assessment with C57BL/6 mouse strain. In mouse microarray studies, over 7003, 1176 and 401 probe sets showed more than two-fold, five-fold and ten-fold change respectively in differential expression between murine BAT and WAT. Microarray data was validated using quantitative RT-PCR of key genes showing high expression in BAT (Fabp3, Ucp1, Slc27a1) and sWAT (Ms4a1, H2-Ob, Bank1) or showing relatively low expression in BAT (Pgk1, Cox6b1) and sWAT (Slc20a1, Cd74). Multi-omic pathway analysis was employed to understand possible links between the organisms. When murine two-fold data was compared with published human BAT and sWAT data, 90 genes showed parallel differential expression in both mouse and human. Out of these 90 genes, 46 showed the same pattern of differential expression whereas the pattern was opposite for the remaining 44 genes. Based on our microarray results and its comparison with human data, we were able to identify genes (targets) (a) which can be studied in mouse model systems to extrapolate results to human (b) where caution should be exercised before extrapolation of murine data to human. Conclusion Our study provides evidence for inter-species (mouse vs human) differences in differential gene expression between sWAT and BAT.
Critical understanding of this data may help in development of novel ways to engineer one form of adipose

  5. Microarray based gene expression analysis of murine brown and subcutaneous adipose tissue: significance with human.

    PubMed

    Baboota, Ritesh K; Sarma, Siddhartha M; Boparai, Ravneet K; Kondepudi, Kanthi Kiran; Mantri, Shrikant; Bishnoi, Mahendra

    2015-01-01

    Two types of adipose tissues, white (WAT) and brown (BAT) are found in mammals. Increasingly novel strategies are being proposed for the treatment of obesity and its associated complications by altering amount and/or activity of BAT using mouse models. The present study was designed to: (a) investigate the differential expression of genes in LACA mice subcutaneous WAT (sWAT) and BAT using mouse DNA microarray, (b) to compare mouse differential gene expression with previously published human data; to understand any inter-species differences between the two and (c) to make a comparative assessment with C57BL/6 mouse strain. In mouse microarray studies, over 7003, 1176 and 401 probe sets showed more than two-fold, five-fold and ten-fold change respectively in differential expression between murine BAT and WAT. Microarray data was validated using quantitative RT-PCR of key genes showing high expression in BAT (Fabp3, Ucp1, Slc27a1) and sWAT (Ms4a1, H2-Ob, Bank1) or showing relatively low expression in BAT (Pgk1, Cox6b1) and sWAT (Slc20a1, Cd74). Multi-omic pathway analysis was employed to understand possible links between the organisms. When murine two-fold data was compared with published human BAT and sWAT data, 90 genes showed parallel differential expression in both mouse and human. Out of these 90 genes, 46 showed the same pattern of differential expression whereas the pattern was opposite for the remaining 44 genes. Based on our microarray results and its comparison with human data, we were able to identify genes (targets) (a) which can be studied in mouse model systems to extrapolate results to human (b) where caution should be exercised before extrapolation of murine data to human. Our study provides evidence for inter-species (mouse vs human) differences in differential gene expression between sWAT and BAT.
Critical understanding of this data may help in development of novel ways to engineer one form of adipose tissue to another using murine model with focus on

  6. A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression.

    PubMed

    Yan, Xiting; Liang, Anqi; Gomez, Jose; Cohn, Lauren; Zhao, Hongyu; Chupp, Geoffrey L

    2017-06-20

    Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. An alternative method to examine disease phenotypes is to use pre-defined biological pathways. These pathways have been shown to be perturbed in different ways in different subjects who have similar clinical features. We hypothesize that differences in the expression of genes in a given pathway are more predictive of biological differences than standard approaches and, if integrated into clustering analysis, will enhance the robustness and accuracy of the clustering method. To examine this hypothesis, we developed a novel computational method to assess the biological differences between samples using gene expression data by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Pre-defined biological pathways were downloaded and genes in each pathway were used to cluster samples using the Gaussian mixture model. The clustering results across different pathways were then summarized to calculate the pathway-based distance score between samples. This method was applied to both simulated and real data sets and compared to the traditional Euclidean distance and another pathway-based clustering method, Pathifier. The results show that the pathway-based distance score performs significantly better than the Euclidean distance, especially when the heterogeneity is low and genes in the same pathways are correlated. Compared to Pathifier, we demonstrated that our approach achieves higher accuracy and robustness for small pathways. When the pathway size is large, by downsampling the pathways into smaller pathways, our
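
The idea of summarizing per-pathway clusterings into a distance between samples can be sketched in Python as follows. The abstract does not say how the clusterings are summarized. This sketch assumes one simple choice, the fraction of pathways in which two samples fall into different clusters, and it takes the per-pathway cluster labels as given rather than fitting Gaussian mixture models. The function `pathway_distance` and the labels in `labels_by_pathway` are hypothetical.

```python
def pathway_distance(labels_by_pathway, sample_a, sample_b):
    """Fraction of pathways whose clustering separates the two samples.

    labels_by_pathway maps a pathway name to a dict of sample -> cluster label.
    This summary rule is an illustrative assumption, not the published score.
    """
    disagreements = sum(
        1
        for labels in labels_by_pathway.values()
        if labels[sample_a] != labels[sample_b]
    )
    return disagreements / len(labels_by_pathway)


# Hypothetical cluster labels, as if produced by one clustering per pathway.
labels_by_pathway = {
    "pathway_1": {"s1": 0, "s2": 0, "s3": 1},
    "pathway_2": {"s1": 0, "s2": 1, "s3": 1},
    "pathway_3": {"s1": 1, "s2": 1, "s3": 0},
    "pathway_4": {"s1": 0, "s2": 0, "s3": 1},
}

for a, b in [("s1", "s2"), ("s1", "s3"), ("s2", "s3")]:
    print(a, b, pathway_distance(labels_by_pathway, a, b))
```

A matrix of such pairwise scores could then be passed to any distance-based clustering method in place of the Euclidean distance.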

  7. A novel sparse coding algorithm for classification of tumors based on gene expression data.

    PubMed

    Kolali Khormuji, Morteza; Bazrafkan, Mehrnoosh

    2016-06-01

    High-dimensional genomic and proteomic data play an important role in many applications in medicine such as prognosis of diseases, diagnosis, prevention and molecular biology, to name a few. Classifying such data is a challenging task due to the various issues such as curse of dimensionality, noise and redundancy. Recently, some researchers have used the sparse representation (SR) techniques to analyze high-dimensional biological data in various applications in classification of cancer patients based on gene expression datasets. A common problem with all SR-based biological data classification methods is that they cannot utilize the topological (geometrical) structure of data. More precisely, these methods transfer the data into sparse feature space without preserving the local structure of data points. In this paper, we proposed a novel SR-based cancer classification algorithm based on gene expression data that takes into account the geometrical information of all data. Precisely speaking, we incorporate the local linear embedding algorithm into the sparse coding framework, by which we can preserve the geometrical structure of all data. For performance comparison, we applied our algorithm on six tumor gene expression datasets, by which we demonstrate that the proposed method achieves higher classification accuracy than state-of-the-art SR-based tumor classification algorithms.

  8. How does gene expression level contribute to thermophilic adaptation of prokaryotes? An exploration based on predictors.

    PubMed

    Wang, Ji; Ma, Bin-Guang; Zhang, Hong-Yu; Chen, Ling-Ling; Zhang, Shi-Cui

    2008-09-15

    By analyzing the predicted gene expression levels of 33 prokaryotes with living temperature span from <10 degrees C to >100 degrees C, a universal positive correlation was found between the percentage of predicted highly expressed genes and the organisms' optimal growth temperature. A physical interpretation of the correlation revealed that highly expressed genes are statistically more thermostable than lowly expressed genes. These findings show the possibility of the significant contribution of gene expression level to the prokaryotic thermal adaptation and provide evidence for the translational selection pressure on the thermostability of natural proteins during evolution.

  9. A stochastic model for optimizing composite predictors based on gene expression profiles.

    PubMed

    Ramanathan, Murali

    2003-07-01

    This project was done to develop a mathematical model for optimizing composite predictors based on gene expression profiles from DNA arrays and proteomics. The problem was amenable to a formulation and solution analogous to the portfolio optimization problem in mathematical finance: it requires the optimization of a quadratic function subject to linear constraints. The performance of the approach was compared to that of neighborhood analysis using a data set containing cDNA array-derived gene expression profiles from 14 multiple sclerosis patients receiving intramuscular interferon-beta1a. The Markowitz portfolio model predicts that the covariance between genes can be exploited to construct an efficient composite. The model predicts that a composite is not needed for maximizing the mean value of a treatment effect: only a single gene is needed, but the usefulness of the effect measure may be compromised by high variability. The model optimized the composite to yield the highest mean for a given level of variability or the least variability for a given mean level. The choices that meet these optimization criteria lie on a curve, in the plot of composite mean vs. composite variability, referred to as the "efficient frontier." When a composite is constructed using the model, it outperforms the composite constructed using the neighborhood analysis method. The Markowitz portfolio model may find potential applications in constructing composite biomarkers and in the pharmacogenomic modeling of treatment effects derived from gene expression endpoints.
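
The trade-off between composite mean and composite variability can be made concrete with the standard two-asset minimum-variance formula from portfolio theory, applied here to two genes. This is a minimal sketch, not the model fitted in the study. The means, variances and covariance below are hypothetical values chosen for illustration.

```python
def min_variance_weight(var_1, var_2, cov_12):
    """Weight on gene 1 that minimizes the variance of w*g1 + (1 - w)*g2."""
    return (var_2 - cov_12) / (var_1 + var_2 - 2.0 * cov_12)


def composite_stats(w, mean_1, mean_2, var_1, var_2, cov_12):
    """Mean and variance of the composite w*g1 + (1 - w)*g2."""
    mean = w * mean_1 + (1.0 - w) * mean_2
    variance = (
        w ** 2 * var_1
        + (1.0 - w) ** 2 * var_2
        + 2.0 * w * (1.0 - w) * cov_12
    )
    return mean, variance


# Hypothetical treatment-effect statistics for two genes (assumed values).
mean_1, var_1 = 2.0, 4.0   # larger mean effect, but highly variable
mean_2, var_2 = 1.0, 1.0   # smaller mean effect, less variable
cov_12 = -1.0              # negatively correlated responses

w = min_variance_weight(var_1, var_2, cov_12)
composite_mean, composite_var = composite_stats(w, mean_1, mean_2, var_1, var_2, cov_12)
print(w, composite_mean, composite_var)
```

With these assumed numbers, gene 1 alone gives the highest mean but a variance of 4, while the minimum-variance composite has a lower variance than either gene on its own. This mirrors the abstract's point that a single gene maximizes the mean but a composite controls variability.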

  10. Gene expression-based classifications of fibroadenomas and phyllodes tumours of the breast.

    PubMed

    Vidal, Maria; Peg, Vicente; Galván, Patricia; Tres, Alejandro; Cortés, Javier; Ramón y Cajal, Santiago; Rubio, Isabel T; Prat, Aleix

    2015-06-01

    using gene expression-based data is feasible and might provide clinically useful biological and prognostic information.

  11. Lineage relationship of prostate cancer cell types based on gene expression.

    PubMed

    Pascal, Laura E; Vêncio, Ricardo Zn; Vessella, Robert L; Ware, Carol B; Vêncio, Eneida F; Denyer, Gareth; Liu, Alvin Y

    2011-05-23

    Prostate tumor heterogeneity is a major factor in disease management. Heterogeneity could be due to multiple cancer cell types with distinct gene expression. Of clinical importance is the so-called cancer stem cell type. Cell type-specific transcriptomes are used to examine lineage relationship among cancer cell types and their expression similarity to normal cell types including stem/progenitor cells. Transcriptomes were determined by Affymetrix DNA array analysis for the following cell types. Putative prostate progenitor cell populations were characterized and isolated by expression of the membrane transporter ABCG2. Stem cells were represented by embryonic stem and embryonal carcinoma cells. The cancer cell types were Gleason pattern 3 (glandular histomorphology) and pattern 4 (aglandular) sorted from primary tumors, cultured prostate cancer cell lines originally established from metastatic lesions, xenografts LuCaP 35 (adenocarcinoma phenotype) and LuCaP 49 (neuroendocrine/small cell carcinoma) grown in mice. No gene expression differences were detected among serial passages of the LuCaP xenografts. Based on transcriptomes, the different cancer cell types could be clustered into a luminal-like grouping and a non-luminal-like (also not basal-like) grouping. The non-luminal-like types showed expression more similar to that of stem/progenitor cells than the luminal-like types. However, none showed expression of stem cell genes known to maintain stemness. Non-luminal-like types are all representatives of aggressive disease, and this could be attributed to the similarity in overall gene expression to stem and progenitor cell types.

  12. Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

    PubMed Central

    Sandie, Reatha; Palidwor, Gareth A; Huska, Matthew R; Porter, Christopher J; Krzyzanowski, Paul M; Muro, Enrique M; Perez-Iratxeta, Carolina; Andrade-Navarro, Miguel A

    2009-01-01

    Background Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data, based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. Conclusion StemBase can be used to study gene expression in human and murine stem cells and is available at . PMID:19284540

  13. Gene expression-based prognostic signatures in lung cancer: ready for clinical use?

    PubMed

    Subramanian, Jyothi; Simon, Richard

    2010-04-07

    A substantial number of studies have reported the development of gene expression-based prognostic signatures for lung cancer. The ultimate aim of such studies should be the development of well-validated clinically useful prognostic signatures that improve therapeutic decision making beyond current practice standards. We critically reviewed published studies reporting the development of gene expression-based prognostic signatures for non-small cell lung cancer to assess the progress made toward this objective. Studies published between January 1, 2002, and February 28, 2009, were identified through a PubMed search. Following hand-screening of abstracts of the identified articles, 16 were selected as relevant. Those publications were evaluated in detail for appropriateness of the study design, statistical validation of the prognostic signature on independent datasets, presentation of results in an unbiased manner, and demonstration of medical utility for the new signature beyond that obtained using existing treatment guidelines. Based on this review, we found little evidence that any of the reported gene expression signatures are ready for clinical application. We also found serious problems in the design and analysis of many of the studies. We suggest a set of guidelines to aid the design, analysis, and evaluation of prognostic signature studies. These guidelines emphasize the importance of focused study planning to address specific medically important questions and the use of unbiased analysis methods to evaluate whether the resulting signatures provide evidence of medical utility beyond standard of care-based prognostic factors.

  14. Ontology based molecular signatures for immune cell types via gene expression analysis

    PubMed Central

    2013-01-01

    Background New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identified genes uniquely expressed in these B cells compared to other B cell types. Conclusions This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649

  15. The distribution-based p-value for the outlier sum in differential gene expression analysis.

    PubMed

    Chen, Lin-An; Chen, Dung-Tsa; Chan, Wenyaw

    2010-03-01

    Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes where only a small subset of disease samples shows unusually high gene expression, but they did not develop their distributional properties and formal statistical inference. In this study, a new outlier sum for detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the large-sample theory. We compare the proposed method with existing outlier sum methods by power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes.
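
For orientation, the earlier outlier-sum statistic that this record builds on can be sketched in Python as it is commonly described. Each gene is standardized by its median and median absolute deviation over all samples, and standardized disease values above the third quartile plus one interquartile range are summed. The sketch does not implement the new statistic or the distribution-based p-value proposed in the abstract. The unscaled MAD, the `inclusive` quartile rule and the toy expression values are assumptions.

```python
from statistics import median, quantiles


def outlier_sum(control, disease):
    """Outlier sum for one gene: sum of standardized disease values above q75 + IQR."""
    all_values = list(control) + list(disease)
    med = median(all_values)
    mad = median([abs(v - med) for v in all_values])  # unscaled MAD (assumption)
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be standardized")

    def standardize(v):
        return (v - med) / mad

    z_all = [standardize(v) for v in all_values]
    q25, _, q75 = quantiles(z_all, n=4, method="inclusive")
    threshold = q75 + (q75 - q25)
    return sum(z for z in (standardize(v) for v in disease) if z > threshold)


# Hypothetical expression values for one gene across 6 controls and 6 disease samples.
control = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
disease_with_outliers = [1.0, 1.1, 0.9, 6.0, 7.0, 1.2]  # two samples overexpress
disease_without = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0]

print(outlier_sum(control, disease_with_outliers))
print(outlier_sum(control, disease_without))
```

A large value flags a gene that is strongly overexpressed in only a subset of disease samples. Turning that value into a p-value is the step the abstract addresses with asymptotic theory.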

  16. Characteristics and Validation Techniques for PCA-Based Gene-Expression Signatures

    PubMed Central

    Welsh, Eric A.

    2017-01-01

    Background. Many gene-expression signatures exist for describing the biological state of profiled tumors. Principal Component Analysis (PCA) can be used to summarize a gene signature into a single score. Our hypothesis is that gene signatures can be validated when applied to new datasets, using inherent properties of PCA. Results. This validation is based on four key concepts. Coherence: elements of a gene signature should be correlated beyond chance. Uniqueness: the general direction of the data being examined can drive most of the observed signal. Robustness: if a gene signature is designed to measure a single biological effect, then this signal should be sufficiently strong and distinct compared to other signals within the signature. Transferability: the derived PCA gene signature score should describe the same biology in the target dataset as it does in the training dataset. Conclusions. The proposed validation procedure ensures that PCA-based gene signatures perform as expected when applied to datasets other than those that the signatures were trained upon. Complex signatures, describing multiple independent biological components, are also easily identified. PMID:28265563
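
Summarizing a gene signature into a single PCA score, as described above, can be sketched in pure Python with power iteration on the gene covariance matrix. This is an illustrative sketch only. It does not reproduce the paper's four validation measures. The function `pc1_scores`, the use of the variance fraction explained by the first component as a rough indicator of coherence, and the toy data are assumptions.

```python
from math import sqrt


def pc1_scores(matrix, iterations=200):
    """First-principal-component scores for rows = samples, columns = signature genes.

    Returns (scores, loadings, explained_fraction). Power iteration is used, so the
    sketch assumes a dominant first component and a non-degenerate covariance.
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    v = [1.0 / sqrt(p)] * p  # starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    explained = eigenvalue / sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return scores, v, explained


# Hypothetical expression of a 3-gene signature in 5 samples (assumed values).
samples = [
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 8.0, 4.2],
    [5.0, 9.8, 5.0],
]

scores, loadings, explained = pc1_scores(samples)
print(scores)
print(loadings)
print(explained)
```

When the signature genes move together, as in this toy data, the first component captures nearly all of the variance and the score orders the samples consistently. A low explained fraction would suggest that the signature mixes several independent signals.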

  17. Combining Click Chemistry-Based Proteomics With Dox-Inducible Gene Expression.

    PubMed

    Gebert, J; Schnölzer, M; Warnken, U; Kopitz, J

    2017-01-01

    Inactivating mutations in single genes can trigger, prevent, promote, or alleviate diseases. Identifying such disease-related genes is a main pillar of medical research. Since proteins play a crucial role in mediating these effects, their impact on the diseased cells' proteome including posttranslational modifications has to be elucidated for a detailed understanding of the role of these genes in the disease process. In complex disorders, like cancer, several genes contribute to the disease process, thereby hampering the assignment of a proteomic change to the corresponding causative gene. To enable comprehensive screening for the impact of inactivation of a gene, e.g., loss of a tumor suppressor in cancer, on the cellular proteome, we present a strategy based on a combination of three technologies: recombinase-mediated cassette exchange, click chemistry, and mass spectrometry. The methodology is exemplified by the analysis of the proteomic changes induced by the loss of a tumor suppressor gene in colorectal cancer cells. To demonstrate the applicability to screen for posttranslational modification changes, we also describe the analysis of protein glycosylation changes caused by the tumor suppressor inactivation. In principle, this strategy can be applied to analyze the effects of any gene of interest on protein expression as well as posttranslational modification by glycosylation. Moreover, adaptation of the strategy to an appropriate cell culture model has the potential for application on a broad range of diseases where the disease-promoting mutations have been identified. © 2017 Elsevier Inc. All rights reserved.

  18. Outcome-based profiling of astrocytic tumours identifies prognostic gene expression signatures which link molecular and morphology-based pathology.

    PubMed

    Beetz, Christian; Bergner, Sven; Brodoehl, Stefan; Brodhun, Michael; Ewald, Christian; Kalff, Rolf; Krüger, Jutta; Patt, Stephan; Kiehntopf, Michael; Deufel, Thomas

    2006-11-01

    Astrocytomas are intracranial malignancies for which invasive growth and high motility of tumour cells preclude total resection; the tumours usually recur in a more aggressive and, eventually, lethal form. Clinical outcome is highly variable and the accuracy of morphology-based prognostic statements is limited. In order to identify novel molecular markers for prognosis we obtained expression profiles of: i) tumours associated with particularly long recurrence-free intervals, ii) tumours which led to rapid patient death, and iii) tumour-free control brain. Unsupervised data analysis completely separated the three sample entities indicating a strong impact of the selection criteria on general gene expression. Consequently, significant numbers of specifically expressed genes could be identified for each entity. An extended set of tumours was then investigated by RT-PCR targeting 12 selected genes. Data from these experiments were summarised into a sample-specific index which assigns tumours to high- and low-risk groups as successfully as does morphology-based grading. Moreover, this index directly correlates with definite survival suggesting that integrated gene expression data allow individualised prognostic statements. We also analysed localisation of selected marker transcripts by in situ hybridization. Our finding of cell-specificity for some of these outcome-determining genes relates global expression data to the presence of morphological correlates of tumour behaviour and, thus, provides a link between morphology-based and molecular pathology. Our identification of expression signatures that are associated individually with clinical outcome confirms the prognostic relevance of gene expression data and, thus, represents a step towards eventually implementing molecular diagnosis into clinical practice in neuro-oncology.

  19. An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora.

    PubMed

    Mondego, Jorge Mc; Vidal, Ramon O; Carazzolle, Marcelo F; Tokuda, Eric K; Parizzi, Lucas P; Costa, Gustavo Gl; Pereira, Luiz Fp; Andrade, Alan C; Colombo, Carlos A; Vieira, Luiz Ge; Pereira, Gonçalo Ag

    2011-02-08

    Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70 and 30% of commercial production, respectively. C. arabica is an allotetraploid from a recent hybridization of the diploid species, C. canephora and C. eugenioides. C. arabica has lower genetic diversity and results in a higher quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency. Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using KA/KS analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories. We present the first comprehensive genome-wide transcript profile study of C. arabica and C

  20. An EST-based analysis identifies new genes and reveals distinctive gene expression features of Coffea arabica and Coffea canephora

    PubMed Central

    2011-01-01

    Background: Coffee is one of the world's most important crops; it is consumed worldwide and plays a significant role in the economy of producing countries. Coffea arabica and C. canephora are responsible for 70 and 30% of commercial production, respectively. C. arabica is an allotetraploid from a recent hybridization of the diploid species, C. canephora and C. eugenioides. C. arabica has lower genetic diversity and results in a higher quality beverage than C. canephora. Research initiatives have been launched to produce genomic and transcriptomic data about Coffea spp. as a strategy to improve breeding efficiency. Results: Assembling the expressed sequence tags (ESTs) of C. arabica and C. canephora produced by the Brazilian Coffee Genome Project and the Nestlé-Cornell Consortium revealed 32,007 clusters of C. arabica and 16,665 clusters of C. canephora. We detected different GC3 profiles between these species that are related to their genome structure and mating system. BLAST analysis revealed similarities between coffee and grape (Vitis vinifera) genes. Using KA/KS analysis, we identified coffee genes under purifying and positive selection. Protein domain and gene ontology analyses suggested differences between Coffea spp. data, mainly in relation to complex sugar synthases and nucleotide binding proteins. OrthoMCL was used to identify specific and prevalent coffee protein families when compared to five other plant species. Among the interesting families annotated are new cystatins, glycine-rich proteins and RALF-like peptides. Hierarchical clustering was used to independently group C. arabica and C. canephora expression clusters according to expression data extracted from EST libraries, resulting in the identification of differentially expressed genes. Based on these results, we emphasize gene annotation and discuss plant defenses, abiotic stress and cup quality-related functional categories. Conclusion: We present the first comprehensive genome-wide transcript

  1. A contribution to the study of plant development evolution based on gene co-expression networks

    PubMed Central

    Romero-Campero, Francisco J.; Lucas-Reina, Eva; Said, Fatima E.; Romero, José M.; Valverde, Federico

    2013-01-01

    Phototrophic eukaryotes are among the most successful organisms on Earth due to their unparalleled efficiency at capturing light energy and fixing carbon dioxide to produce organic molecules. A conserved and efficient network of light-dependent regulatory modules could be the basis of this success. This regulatory system conferred early advantages to phototrophic eukaryotes that allowed for specialization, complex developmental processes and modern plant characteristics. We have studied light-dependent gene regulatory modules from algae to plants employing integrative-omics approaches based on gene co-expression networks. Our study reveals some remarkably conserved ways in which eukaryotic phototrophs deal with day length and light signaling. Here we describe how a family of Arabidopsis transcription factors involved in photoperiod response has evolved from a single algal gene according to the innovation, amplification and divergence theory of gene evolution by duplication. These modifications of the gene co-expression networks from the ancient unicellular green alga Chlamydomonas reinhardtii to the modern brassica Arabidopsis thaliana may hint at the evolution and specialization of plants and other organisms. PMID:23935602

  2. SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies.

    PubMed

    Pirooznia, Mehdi; Seifuddin, Fayaz; Goes, Fernando S; Leek, Jeffrey T; Zandi, Peter P

    2013-03-11

    Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis. Here we have developed a web application called SVAw (Surrogate variable analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software has been developed based on the open-source Bioconductor SVA package. In our software, we have extended the SVA program functionality in three aspects: (i) SVAw performs a fully automated and user-friendly analysis workflow; (ii) it calculates probe/gene statistics for both pre- and post-SVA analysis and provides a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file, including a graphical comparison of the outcome for the user. SVAw is a freely accessible web server solution for the surrogate variable analysis of high-throughput datasets and facilitates removing all unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. The executable packages for both the web and standalone applications and the installation instructions can be downloaded from our web site.

  3. Anticancer drug clustering in lung cancer based on gene expression profiles and sensitivity database

    PubMed Central

    Gemma, Akihiko; Li, Cai; Sugiyama, Yuka; Matsuda, Kuniko; Seike, Yoko; Kosaihira, Seiji; Minegishi, Yuji; Noro, Rintaro; Nara, Michiya; Seike, Masahiro; Yoshimura, Akinobu; Shionoya, Aki; Kawakami, Akiko; Ogawa, Naoki; Uesaka, Haruka; Kudoh, Shoji

    2006-01-01

    Background: The effect of current therapies in improving the survival of lung cancer patients remains far from satisfactory. It is consequently desirable to find more appropriate therapeutic opportunities based on informed insights. A molecular pharmacological analysis was undertaken to design an improved chemotherapeutic strategy for advanced lung cancer. Methods: We related the cytotoxic activity of each of the commonly used anti-cancer agents (docetaxel, paclitaxel, gemcitabine, vinorelbine, 5-FU, SN-38, cisplatin (CDDP), and carboplatin (CBDCA)) to the corresponding expression pattern in each of the cell lines using a modified NCI program. Results: We performed gene expression analysis in lung cancer cell lines using cDNA filter and high-density oligonucleotide arrays. We also examined the sensitivity of these cell lines to these drugs via MTT assay. To obtain reproducible gene-drug sensitivity correlation data, we separately analyzed two sets of lung cancer cell lines, namely 10 and 19. In our gene-drug correlation analyses, gemcitabine consistently belonged to an isolated cluster in a reproducible fashion. On the other hand, docetaxel, paclitaxel, 5-FU, SN-38, CBDCA and CDDP were gathered together into one large cluster. Conclusion: These results suggest that chemotherapy regimens including gemcitabine should be evaluated in second-line chemotherapy in cases where the first-line chemotherapy did not include this drug. Gene expression-drug sensitivity correlations, as provided by the NCI program, may yield improved therapeutic options for treatment of specific tumor types. PMID:16813650
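
The gene-drug correlation step described in this abstract can be illustrated with a minimal Python sketch. It does not reproduce the modified NCI program, whose details the abstract does not give. It assumes that each drug is summarised by its Pearson correlations with every gene across cell lines, and that drugs are then compared by correlating those profiles. All gene names, drug names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four cell lines, three genes, three drugs.
expression = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [4.0, 3.0, 2.0, 1.0],
    "gene_3": [1.0, 3.0, 2.0, 4.0],
}
sensitivity = {
    "drug_x": [1.0, 2.0, 3.0, 4.0],
    "drug_y": [2.0, 4.0, 6.0, 8.0],
    "drug_z": [4.0, 3.0, 2.0, 1.0],
}

# Each drug gets a profile: its correlation with every gene across cell lines.
profiles = {
    drug: [pearson(expr, sens) for expr in expression.values()]
    for drug, sens in sensitivity.items()
}

def drug_similarity(a, b):
    """Similarity of two drugs = correlation of their gene-correlation profiles."""
    return pearson(profiles[a], profiles[b])
```

In this toy data, drug_x and drug_y have identical profiles and would fall into one cluster, while drug_z has the opposite profile and would be isolated. A real analysis would feed such similarities into a hierarchical clustering routine.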

  4. An siRNA-based method for efficient silencing of gene expression in mature brown adipocytes

    PubMed Central

    Isidor, Marie S.; Winther, Sally; Basse, Astrid L.; Petersen, M. Christine H.; Cannon, Barbara; Nedergaard, Jan; Hansen, Jacob B.

    2016-01-01

    Brown adipose tissue is a promising therapeutic target for opposing obesity, glucose intolerance and insulin resistance. The ability to modulate gene expression in mature brown adipocytes is important to understand brown adipocyte function and delineate novel regulatory mechanisms of non-shivering thermogenesis. The aim of this study was to optimize a lipofection-based small interfering RNA (siRNA) transfection protocol for efficient silencing of gene expression in mature brown adipocytes. We determined that a critical parameter was to deliver the siRNA to mature adipocytes by reverse transfection, i.e. transfection of non-adherent cells. Using this protocol, we effectively knocked down both high- and low-abundance transcripts in a model of mature brown adipocytes (WT-1) as well as in primary mature mouse brown adipocytes. A functional consequence of the knockdown was confirmed by an attenuated increase in uncoupled respiration (thermogenesis) in response to β-adrenergic stimulation of mature WT-1 brown adipocytes transfected with uncoupling protein 1 siRNA. Efficient gene silencing was also obtained in various mouse and human white adipocyte models (3T3-L1, primary mouse white adipocytes, hMADS) with the ability to undergo “browning.” In summary, we report an easy and versatile reverse siRNA transfection protocol to achieve specific silencing of gene expression in various models of mature brown and browning-competent white adipocytes, including primary cells. PMID:27386153

  5. An siRNA-based method for efficient silencing of gene expression in mature brown adipocytes.

    PubMed

    Isidor, Marie S; Winther, Sally; Basse, Astrid L; Petersen, M Christine H; Cannon, Barbara; Nedergaard, Jan; Hansen, Jacob B

    2016-01-01

    Brown adipose tissue is a promising therapeutic target for opposing obesity, glucose intolerance and insulin resistance. The ability to modulate gene expression in mature brown adipocytes is important to understand brown adipocyte function and delineate novel regulatory mechanisms of non-shivering thermogenesis. The aim of this study was to optimize a lipofection-based small interfering RNA (siRNA) transfection protocol for efficient silencing of gene expression in mature brown adipocytes. We determined that a critical parameter was to deliver the siRNA to mature adipocytes by reverse transfection, i.e. transfection of non-adherent cells. Using this protocol, we effectively knocked down both high- and low-abundance transcripts in a model of mature brown adipocytes (WT-1) as well as in primary mature mouse brown adipocytes. A functional consequence of the knockdown was confirmed by an attenuated increase in uncoupled respiration (thermogenesis) in response to β-adrenergic stimulation of mature WT-1 brown adipocytes transfected with uncoupling protein 1 siRNA. Efficient gene silencing was also obtained in various mouse and human white adipocyte models (3T3-L1, primary mouse white adipocytes, hMADS) with the ability to undergo "browning." In summary, we report an easy and versatile reverse siRNA transfection protocol to achieve specific silencing of gene expression in various models of mature brown and browning-competent white adipocytes, including primary cells.

  6. Candidate genes and pathogenesis investigation for sepsis-related acute respiratory distress syndrome based on gene expression profile.

    PubMed

    Wang, Min; Yan, Jingjun; He, Xingxing; Zhong, Qiang; Zhan, Chengye; Li, Shusheng

    2016-04-18

    Acute respiratory distress syndrome (ARDS) is a potentially devastating form of acute inflammatory lung injury as well as a major cause of acute respiratory failure. Although researchers have made significant progress in elucidating the pathophysiology of this complex syndrome over the years, the absence of a universally accepted, detailed disease mechanism has so far led to a series of practical problems for definitive treatment. This study aimed to predict genes or pathways associated with sepsis-related ARDS based on a public microarray dataset and to further explore the molecular mechanism of ARDS. A total of 122 up-regulated and 91 down-regulated differentially expressed genes (DEGs) were obtained. The up- and down-regulated DEGs were mainly involved in functions such as mitotic cell cycle and pathways such as cell cycle. Protein-protein interaction network analysis of ARDS revealed 20 hub genes including cyclin B1 (CCNB1), cyclin B2 (CCNB2) and topoisomerase II alpha (TOP2A). A total of seven transcription factors including forkhead box protein M1 (FOXM1) and 30 target genes were revealed in the transcription factor-target gene regulation network. Furthermore, co-cited genes including CCNB2-CCNB1 were revealed in literature mining for relations among ARDS-related genes. Pathways such as mitotic cell cycle were closely related to the development of ARDS. Genes including CCNB1, CCNB2 and TOP2A, as well as transcription factors such as FOXM1, might be used as novel gene therapy targets for sepsis-related ARDS.
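
Identifying hub genes in a protein-protein interaction network can be sketched in Python as below. The abstract does not state how hubs were defined, so this sketch assumes the common convention of ranking nodes by degree (number of interaction partners). The edge list and gene names are hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical undirected interaction edges (placeholder gene names).
edges = [
    ("gene_a", "gene_b"),
    ("gene_a", "gene_c"),
    ("gene_a", "gene_d"),
    ("gene_b", "gene_c"),
    ("gene_d", "gene_e"),
]

def degree_ranking(edge_list):
    """Return (gene, degree) pairs sorted by descending degree, then by name."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

ranking = degree_ranking(edges)
# Assumed cutoff for illustration: keep the single best-connected gene.
hubs = [gene for gene, _ in ranking[:1]]
```

With real data the cutoff would be chosen to match the analysis, for example the top 20 nodes.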

  7. Node-based learning of differential networks from multi-platform gene expression data.

    PubMed

    Ou-Yang, Le; Zhang, Xiao-Fei; Wu, Min; Li, Xiao-Li

    2017-06-01

    Recovering gene regulatory networks and exploring the network rewiring between two different disease states are important for revealing the mechanisms behind disease progression. The advent of high-throughput experimental techniques has made it possible to infer gene regulatory networks and differential networks using computational methods. However, most existing differential network analysis methods are designed for single-platform data analysis and assume that differences between networks are driven by individual edges. Therefore, they cannot take into account the common information shared across different data platforms and may fail to identify driver genes that lead to changes in the network. In this study, we develop a node-based multi-view differential network analysis model to simultaneously estimate multiple gene regulatory networks and their differences from multi-platform gene expression data. Our model can leverage the strength across multiple data platforms to improve the accuracy of network inference and differential network estimation. Simulation studies demonstrate that our model can obtain more accurate estimations of gene regulatory networks and differential networks than other existing state-of-the-art models. We apply our model to TCGA ovarian cancer samples to identify network rewiring associated with drug resistance. We observe from our experiments that the hub nodes of our identified differential networks include known drug resistance-related genes and potential targets that are useful to improve the treatment of drug-resistant tumors.

  8. A mRNA-based thermosensor controls expression of rhizobial heat shock genes

    PubMed Central

    Nocker, Andreas; Hausherr, Thomas; Balsiger, Sylvia; Krstulovic, Nila-Pia; Hennecke, Hauke; Narberhaus, Franz

    2001-01-01

    Expression of several heat shock operons, mainly coding for small heat shock proteins, is under the control of ROSE (repression of heat shock gene expression) in various rhizobial species. This negatively cis-acting element confers temperature control by preventing expression at physiological temperatures. We provide evidence that ROSE-mediated regulation occurs at the post-transcriptional level. A detailed mutational analysis of ROSE1–hspA translationally fused to lacZ revealed that its highly conserved 3′-half is required for repression at normal temperatures (30°C). The mRNA in this region is predicted to form an extended secondary structure that looks very similar in all 15 known ROSE elements. Nucleotides involved in base pairing are strongly conserved, whereas nucleotides in loop regions are more divergent. Base substitutions leading to derepression of the lacZ fusion at 30°C exclusively resided in potential stem structures. Optimised base pairing by elimination of a bulged residue and by introduction of complementary nucleotides in internal loops resulted in ROSE elements that were tightly repressed not only at normal but also at heat shock temperatures. We propose a model in which the temperature-regulated secondary structure of ROSE mRNA influences heat shock gene expression by controlling ribosome access to the ribosome-binding site. PMID:11726689

  9. A mRNA-based thermosensor controls expression of rhizobial heat shock genes.

    PubMed

    Nocker, A; Hausherr, T; Balsiger, S; Krstulovic, N P; Hennecke, H; Narberhaus, F

    2001-12-01

    Expression of several heat shock operons, mainly coding for small heat shock proteins, is under the control of ROSE (repression of heat shock gene expression) in various rhizobial species. This negatively cis-acting element confers temperature control by preventing expression at physiological temperatures. We provide evidence that ROSE-mediated regulation occurs at the post-transcriptional level. A detailed mutational analysis of ROSE(1)-hspA translationally fused to lacZ revealed that its highly conserved 3'-half is required for repression at normal temperatures (30°C). The mRNA in this region is predicted to form an extended secondary structure that looks very similar in all 15 known ROSE elements. Nucleotides involved in base pairing are strongly conserved, whereas nucleotides in loop regions are more divergent. Base substitutions leading to derepression of the lacZ fusion at 30°C exclusively resided in potential stem structures. Optimised base pairing by elimination of a bulged residue and by introduction of complementary nucleotides in internal loops resulted in ROSE elements that were tightly repressed not only at normal but also at heat shock temperatures. We propose a model in which the temperature-regulated secondary structure of ROSE mRNA influences heat shock gene expression by controlling ribosome access to the ribosome-binding site.

  10. Microarray-Based Analysis of Differential Gene Expression between Infective and Noninfective Larvae of Strongyloides stercoralis

    PubMed Central

    Ramanathan, Roshan; Varma, Sudhir; Ribeiro, José M. C.; Myers, Timothy G.; Nolan, Thomas J.; Abraham, David; Lok, James B.; Nutman, Thomas B.

    2011-01-01

    Background: Differences between noninfective first-stage (L1) and infective third-stage (L3i) larvae of parasitic nematode Strongyloides stercoralis at the molecular level are relatively uncharacterized. DNA microarrays were developed and utilized for this purpose. Methods and Findings: Oligonucleotide hybridization probes for the array were designed to bind 3,571 putative mRNA transcripts predicted by analysis of 11,335 expressed sequence tags (ESTs) obtained as part of the Nematode EST project. RNA obtained from S. stercoralis L3i and L1 was co-hybridized to each array after labeling the individual samples with different fluorescent tags. Bioinformatic predictions of gene function were developed using a novel cDNA Annotation System software. We identified 935 differentially expressed genes (469 L3i-biased; 466 L1-biased) having two-fold expression differences or greater and microarray signals with a p value < 0.01. Based on a functional analysis, L1 larvae have a larger number of genes putatively involved in transcription (p = 0.004), and L3i larvae have biased expression of putative heat shock proteins (such as hsp-90). Genes with products known to be immunoreactive in S. stercoralis-infected humans (such as SsIR and NIE) had L3i biased expression. Abundantly expressed L3i contigs of interest included S. stercoralis orthologs of cytochrome oxidase ucr 2.1 and hsp-90, which may be potential chemotherapeutic targets. The S. stercoralis ortholog of fatty acid and retinol binding protein-1, successfully used in a vaccine against Ancylostoma ceylanicum, was identified among the 25 most highly expressed L3i genes. The sperm-containing glycoprotein domain, utilized in a vaccine against the nematode Cooperia punctata, was exclusively found in L3i biased genes and may be a valuable S. stercoralis target of interest. Conclusions: A new DNA microarray tool for the examination of S. stercoralis biology has been developed and provides new and valuable insights regarding
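
The selection rule stated in this abstract (two-fold expression difference or greater, with p value < 0.01) can be written as a short Python sketch. The record layout, gene identifiers and signal values are hypothetical. How the study computed its p values is not described in the abstract, so they are treated here as given inputs.

```python
import math

# Hypothetical records: (gene_id, mean L3i signal, mean L1 signal, p value).
# The values are made up for illustration only.
records = [
    ("gene_a", 800.0, 200.0, 0.001),  # 4-fold higher in L3i
    ("gene_b", 150.0, 450.0, 0.004),  # 3-fold higher in L1
    ("gene_c", 300.0, 280.0, 0.200),  # no meaningful change
    ("gene_d", 900.0, 300.0, 0.050),  # large change, but p value too high
]

def classify(l3i, l1, p, min_fold=2.0, max_p=0.01):
    """Return 'L3i-biased', 'L1-biased', or None for one gene."""
    log2_ratio = math.log2(l3i / l1)
    # Reject genes that miss either the significance or the fold-change cutoff.
    if p >= max_p or abs(log2_ratio) < math.log2(min_fold):
        return None
    return "L3i-biased" if log2_ratio > 0 else "L1-biased"

calls = {gene: classify(l3i, l1, p) for gene, l3i, l1, p in records}
```

Working on the log2 ratio makes the two-fold cutoff symmetric: a gene twice as high in L1 is treated the same way as one twice as high in L3i.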

  11. Microarray-Based Gene Expression Profiling to Elucidate Effectiveness of Fermented Codonopsis lanceolata in Mice

    PubMed Central

    Choi, Woon Yong; Kim, Ji Seon; Park, Sung Jin; Ma, Choong Je; Lee, Hyeon Yong

    2014-01-01

    In this study, the effect of Codonopsis lanceolata fermented by lactic acid on controlling gene expression levels related to obesity was observed in an oligonucleotide chip microarray. Among 8170 genes, 393 genes were upregulated and 760 genes were downregulated on feeding the fermented C. lanceolata (FCL). Another 374 genes were upregulated and 527 genes downregulated without feeding the sample. The genes were not affected by the FCL sample. It was interesting that among those genes, Cytochrome P450, Dmbt1, LOC76487, and thyroid hormones, etc., were mostly up- or downregulated. These genes are more related to lipid synthesis. We could conclude that the FCL possibly controlled the gene expression levels related to lipid synthesis, which resulted in reducing obesity. However, more detailed protein expression experiments should be carried out. PMID:24717412

  12. Microarray-based gene expression profiling to elucidate effectiveness of fermented Codonopsis lanceolata in mice.

    PubMed

    Choi, Woon Yong; Kim, Ji Seon; Park, Sung Jin; Ma, Choong Je; Lee, Hyeon Yong

    2014-04-08

    In this study, the effect of Codonopsis lanceolata fermented by lactic acid on controlling gene expression levels related to obesity was observed in an oligonucleotide chip microarray. Among 8170 genes, 393 genes were upregulated and 760 genes were downregulated on feeding the fermented C. lanceolata (FCL). Another 374 genes were upregulated and 527 genes downregulated without feeding the sample. The genes were not affected by the FCL sample. It was interesting that among those genes, Cytochrome P450, Dmbt1, LOC76487, and thyroid hormones, etc., were mostly up- or downregulated. These genes are more related to lipid synthesis. We could conclude that the FCL possibly controlled the gene expression levels related to lipid synthesis, which resulted in reducing obesity. However, more detailed protein expression experiments should be carried out.

  13. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction

    PubMed Central

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining. PMID:26751200

  14. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  15. Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene.

    PubMed

    Narise, Takafumi; Sakurai, Nozomu; Obayashi, Takeshi; Ohta, Hiroyuki; Shibata, Daisuke

    2017-06-05

    Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., provide biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally analyzed via over-representation analysis (ORA). A limitation of this approach may be that users can predict gene functions only based on the strongly co-expressed genes. In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while considering the genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds to select co-expressed genes, and performed gene set enrichment analysis (GSEA) applied to a ranked list of genes ordered by the co-expression degree. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between the gene and pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which concluded that the p-score could improve the performance of the ORA. We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato. The database allows users to predict pathways that are relevant to a
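
Over-representation analysis applied at several co-expression thresholds can be sketched in Python as follows. The sketch assumes the standard upper-tail hypergeometric test for ORA, which the abstract does not spell out, and it does not implement the database's p-score. The gene names, co-expression scores, pathway membership and thresholds are all hypothetical.

```python
from math import comb

def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap)."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total

# Hypothetical co-expression scores of ten genes with one query gene.
scores = {
    "g1": 0.95, "g2": 0.90, "g3": 0.85, "g4": 0.60, "g5": 0.55,
    "g6": 0.40, "g7": 0.30, "g8": 0.20, "g9": 0.10, "g10": 0.05,
}
pathway = {"g1", "g2", "g4"}  # hypothetical pathway membership

# Repeat the test for several cutoffs that define "co-expressed".
results = {}
for threshold in (0.8, 0.5):
    selected = {gene for gene, score in scores.items() if score >= threshold}
    overlap = len(selected & pathway)
    results[threshold] = ora_p_value(len(scores), len(pathway), len(selected), overlap)
```

Looser thresholds bring weakly co-expressed genes into the selected set, which is how an analysis of this kind can take into account genes beyond the strongly co-expressed ones.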

  16. Identifying differential networks based on multi-platform gene expression data.

    PubMed

    Ou-Yang, Le; Yan, Hong; Zhang, Xiao-Fei

    2016-12-20

    Exploring how the structure of a gene regulatory network differs between two different disease states is fundamental for understanding the biological mechanisms behind disease development and progression. Recently, with rapid advances in microarray technologies, gene expression profiles of the same patients can be collected from multiple microarray platforms. However, previous differential network analysis methods were usually developed based on a single type of platform, which could not utilize the common information shared across different platforms. In this study, we introduce a multi-view differential network analysis model to infer the differential network between two different patient groups based on gene expression profiles collected from multiple platforms. Unlike previous differential network analysis models that need to analyze each platform separately, our model can draw support from multiple data platforms to jointly estimate the differential networks and produce more accurate and reliable results. Our simulation studies demonstrate that our method consistently outperforms other available differential network analysis methods. We also applied our method to identify network rewiring associated with platinum resistance using TCGA ovarian cancer samples. The experimental results demonstrate that the hub genes in our identified differential networks on the PI3K/AKT/mTOR pathway play an important role in drug resistance.

  17. Identification of key genes associated with the human abdominal aortic aneurysm based on the gene expression profile

    PubMed Central

    Chen, Xudong; Zheng, Chengfei; He, Yunjun; Tian, Lu; Li, Jianhui; Li, Donglin; Jin, Wei; Li, Ming; Zheng, Shusen

    2015-01-01

    The present study aimed to screen the key genes associated with abdominal aortic aneurysm (AAA) in the neck, and to investigate the molecular mechanism underlying the development of AAA. The gene expression profile GSE47472, including 14 AAA neck samples and eight donor controls, was downloaded from the Gene Expression Omnibus database. The total AAA samples were grouped into two types to avoid bias. Differentially expressed genes (DEGs) were screened in patients with AAA and subsequently compared with donor controls using the linear models for microarray data (limma) package in R, followed by gene ontology enrichment analysis. Furthermore, a protein-protein interaction (PPI) network based on the DEGs was constructed to detect highly connected regions using a Cytoscape plugin. In total, 388 DEGs in the AAA samples were identified. These DEGs were predominantly associated with limb development, including embryonic limb development and appendage development. Nuclear receptor co-repressor 1 (NCOR1), histone 4 (H4), E2F transcription factor 4 (E2F4) and hepatocyte nuclear factor 4α (HNF4A) were the four transcription factors associated with AAA. Furthermore, HNF4A indirectly interacted with the other three transcription factors. Additionally, six clusters were selected from the PPI network. The DEG screening process and the construction of an interaction network provided insight into the mechanism of AAA. HNF4A may exert an important role in AAA development through its interactions with the three other transcription factors (E2F4, NCOR1 and H4), and the mechanism of this coordinated regulation of the transcription factors in AAA may provide a suitable target for the development of therapeutic intervention strategies. PMID:26498477

  18. Expression QTL-based analyses reveal candidate causal genes and loci across five tumor types.

    PubMed

    Li, Qiyuan; Stram, Alexander; Chen, Constance; Kar, Siddhartha; Gayther, Simon; Pharoah, Paul; Haiman, Christopher; Stranger, Barbara; Kraft, Peter; Freedman, Matthew L

    2014-10-01

    The majority of trait-associated loci discovered through genome-wide association studies are located outside of known protein coding regions. Consequently, it is difficult to ascertain the mechanism underlying these variants and to pinpoint the causal alleles. Expression quantitative trait loci (eQTLs) provide an organizing principle to address both of these issues. eQTLs are genetic loci that correlate with RNA transcript levels. Large-scale data sets such as the Cancer Genome Atlas (TCGA) provide an ideal opportunity to systematically evaluate eQTLs as they have generated multiple data types on hundreds of samples. We evaluated the determinants of gene expression (germline variants and somatic copy number and methylation) and performed cis-eQTL analyses for mRNA expression and miRNA expression in five tumor types (breast, colon, kidney, lung and prostate). We next tested 149 known cancer risk loci for eQTL effects, and observed that 42 (28.2%) were significantly associated with at least one transcript. Lastly, we described a fine-mapping strategy for these 42 eQTL target-gene associations based on an integrated strategy that combines the eQTL level of significance and the regulatory potential as measured by DNaseI hypersensitivity. For each of the risk loci, our analyses suggested 1 to 81 candidate causal variants that may be prioritized for downstream functional analysis. In summary, our study provided a comprehensive landscape of the genetic determinants of gene expression in different tumor types and ranked the genes and loci for further functional assessment of known cancer risk loci.

  19. Expression QTL-based analyses reveal candidate causal genes and loci across five tumor types

    PubMed Central

    Li, Qiyuan; Stram, Alexander; Chen, Constance; Kar, Siddhartha; Gayther, Simon; Pharoah, Paul; Haiman, Christopher; Stranger, Barbara; Kraft, Peter; Freedman, Matthew L.

    2014-01-01

    The majority of trait-associated loci discovered through genome-wide association studies are located outside of known protein coding regions. Consequently, it is difficult to ascertain the mechanism underlying these variants and to pinpoint the causal alleles. Expression quantitative trait loci (eQTLs) provide an organizing principle to address both of these issues. eQTLs are genetic loci that correlate with RNA transcript levels. Large-scale data sets such as the Cancer Genome Atlas (TCGA) provide an ideal opportunity to systematically evaluate eQTLs as they have generated multiple data types on hundreds of samples. We evaluated the determinants of gene expression (germline variants and somatic copy number and methylation) and performed cis-eQTL analyses for mRNA expression and miRNA expression in five tumor types (breast, colon, kidney, lung and prostate). We next tested 149 known cancer risk loci for eQTL effects, and observed that 42 (28.2%) were significantly associated with at least one transcript. Lastly, we described a fine-mapping strategy for these 42 eQTL target–gene associations based on an integrated strategy that combines the eQTL level of significance and the regulatory potential as measured by DNaseI hypersensitivity. For each of the risk loci, our analyses suggested 1 to 81 candidate causal variants that may be prioritized for downstream functional analysis. In summary, our study provided a comprehensive landscape of the genetic determinants of gene expression in different tumor types and ranked the genes and loci for further functional assessment of known cancer risk loci. PMID:24907074
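
    The abstract describes eQTLs as loci that correlate with RNA transcript levels. As a minimal sketch of that idea only (not the authors' pipeline, which also accounts for somatic copy number and methylation), the following assumes a single variant coded as alternate-allele dosage (0, 1 or 2) and fits an ordinary least-squares line of expression on dosage. The function name and the toy data are hypothetical.

```python
from math import sqrt


def eqtl_slope_and_r(dosages, expression):
    """Return (least-squares slope, Pearson r) of expression on allele dosage."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical toy data: six samples, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r = eqtl_slope_and_r(dosages, expression)
print(f"slope per allele = {slope:.3f}, r = {r:.3f}")
```

    A real analysis would add covariates and multiple-testing correction; this sketch shows only the per-variant association at the core of a cis-eQTL test.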

  20. Gene expression based evidence of innate immune response activation in the epithelium with oral lichen planus

    PubMed Central

    Adami, Guy R.; Yeung, Alexander C.F.; Stucki, Grant; Kolokythas, Antonia; Sroussi, Herve Y.; Cabay, Robert J.; Kuzin, Igor; Schwartz, Joel L.

    2014-01-01

    Objective Oral lichen planus (OLP) is a disease of the oral mucosa of unknown cause producing lesions with an intense band-like inflammatory infiltrate of T cells to the subepithelium and keratinocyte cell death. We performed gene expression analysis of the oral epithelium of lesions in subjects with OLP and its sister disease, oral lichenoid reaction (OLR), in order to better understand the role of the keratinocytes in these diseases. Design Fourteen patients with OLP or OLR were included in the study, along with a control group of 23 subjects with a variety of oral diseases and a normal group of 17 subjects with no clinically visible mucosal abnormalities. Various proteins have been associated with OLP, based on detection of secreted proteins or changes in RNA levels in tissue samples consisting of epithelium, stroma, and immune cells. The mRNA level of twelve of these genes expressed in the epithelium was tested in the three groups. Results Four genes showed increased expression in the epithelium of OLP patients: CD14, CXCL1, IL8, and TLR1, and at least two of these proteins, TLR1 and CXCL1, were expressed at substantial levels in oral keratinocytes. Conclusions Because of the large accumulation of T cells in lesions of OLP it has long been thought to be an adaptive immunity malfunction. We provide evidence that there is increased expression of innate immune genes in the epithelium with this illness, suggesting a role for this process in the disease and a possible target for treatment. PMID:24581860

  1. Function-Based Metagenomic Library Screening and Heterologous Expression Strategy for Genes Encoding Phosphatase Activity.

    PubMed

    Villamizar, Genis A Castillo; Nacke, Heiko; Daniel, Rolf

    2017-01-01

    The release of phosphate from inorganic and organic phosphorus compounds can be mediated enzymatically. Phosphate-releasing enzymes, comprising acid and alkaline phosphatases, are recognized as useful biocatalysts in applications such as plant and animal nutrition, bioremediation and diagnostic analysis. Metagenomic approaches provide access to novel phosphatase-encoding genes. Here, we describe a function-based screening approach for rapid identification of genes conferring phosphatase activity from small-insert and large-insert metagenomic libraries derived from various environments. This approach bears the potential for discovery of entirely novel phosphatase families or subfamilies and members of known enzyme classes hydrolyzing phosphomonoester bonds such as phytases. In addition, we provide a strategy for efficient heterologous phosphatase gene expression.

  2. Circular RNA and gene expression profiles in gastric cancer based on microarray chip technology.

    PubMed

    Sui, Weiguo; Shi, Zhoufang; Xue, Wen; Ou, Minglin; Zhu, Ying; Chen, Jiejing; Lin, Hua; Liu, Fuhua; Dai, Yong

    2017-03-01

    The aim of the present study was to screen gastric cancer (GC) tissue and adjacent tissue for differences in mRNA and circular RNA (circRNA) expression, to analyze the differences in circRNA and mRNA expression, and to investigate the circRNA expression in gastric carcinoma and its mechanism. circRNA and mRNA differential expression profiles generated using Agilent microarray technology were analyzed in the GC tissues and adjacent tissues. qRT-PCR was used to verify the differential expression of circRNAs and mRNAs according to the interactions between circRNAs and miRNAs as well as the possible existence of miRNA and mRNA interactions. We found that: i) the circRNA expression profile revealed 1,285 significant differences in circRNA expression, with circRNA expression downregulated in 594 samples and upregulated in 691 samples via interactions with miRNAs. The qRT-PCR validation experiments showed that hsa_circRNA_400071, hsa_circRNA_000543 and hsa_circRNA_001959 expression was consistent with the microarray analysis results. ii) 29,112 genes were found in the GC tissues and adjacent tissues, including 5,460 differentially expressed genes. Among them, 2,390 differentially expressed genes were upregulated and 3,070 genes were downregulated. Gene Ontology (GO) analysis of the differentially expressed genes revealed these genes involved in biological process classification, cellular component classification and molecular function classification. Pathway analysis of the differentially expressed genes identified 83 significantly enriched genes, including 28 upregulated genes and 55 downregulated genes. iii) 69 differentially expressed circRNAs were found that might adsorb specific miRNAs to regulate the expression of their target gene mRNAs. The conclusions are: i) differentially expressed circRNAs had corresponding miRNA binding sites. These circRNAs regulated the expression of target genes through interactions with miRNAs and might become new molecular biomarkers for GC

  3. A gene expression microarray for Nicotiana benthamiana based on de novo transcriptome sequence assembly.

    PubMed

    Goralski, Michal; Sobieszczanska, Paula; Obrepalska-Steplowska, Aleksandra; Swiercz, Aleksandra; Zmienko, Agnieszka; Figlerowicz, Marek

    2016-01-01

    Nicotiana benthamiana has been widely used in laboratories around the world for studying plant-pathogen interactions and posttranscriptional gene expression silencing. Yet the exploration of its transcriptome has lagged behind due to the lack of both adequate sequence information and genome-wide analysis tools, such as DNA microarrays. Despite the increasing use of high-throughput sequencing technologies, the DNA microarrays still remain a popular gene expression tool, because they are cheaper and less demanding regarding bioinformatics skills and computational effort. We designed a gene expression microarray with 103,747 60-mer probes, based on two recently published versions of N. benthamiana transcriptome (v.3 and v.5). Both versions were reconstructed from RNA-Seq data of non-strand-specific pooled-tissue libraries, so we defined the sense strand of the contigs prior to designing the probe. To accomplish this, we combined a homology search against Arabidopsis thaliana proteins and hybridization to a test 244k microarray containing pairs of probes, which represented individual contigs. We identified the sense strand in 106,684 transcriptome contigs and used this information to design an Nb-105k microarray on an Agilent eArray platform. Following hybridization of RNA samples from N. benthamiana roots and leaves we demonstrated that the new microarray had high specificity and sensitivity for detection of differentially expressed transcripts. We also showed that the data generated with the Nb-105k microarray may be used to identify incorrectly assembled contigs in the v.5 transcriptome, by detecting inconsistency in the gene expression profiles, which is indicated using multiple microarray probes that match the same v.5 primary transcripts. We provided a complete design of an oligonucleotide microarray that may be applied to the research of N. benthamiana transcriptome. This, in turn, will allow the N. benthamiana research community to take full advantage of

  4. A reference gene set for sex pheromone biosynthesis and degradation genes from the diamondback moth, Plutella xylostella, based on genome and transcriptome digital gene expression analyses.

    PubMed

    He, Peng; Zhang, Yun-Fei; Hong, Duan-Yang; Wang, Jun; Wang, Xing-Liang; Zuo, Ling-Hua; Tang, Xian-Fu; Xu, Wei-Ming; He, Ming

    2017-03-01

    comprehensive gene data set of sex pheromone biosynthesis and degradation enzyme related genes in DBM created by genome- and transcriptome-wide identification, characterization and expression profiling. Our findings provide a basis to better understand the function of genes with tissue enriched expression. The results also provide information on the genes involved in sex pheromone biosynthesis and degradation, and may be useful to identify potential gene targets for pest control strategies by disrupting the insect-insect communication using pheromone-based behavioral antagonists.

  5. Tissue-based microarray expression of genes predictive of metastasis in uveal melanoma and differentially expressed in metastatic uveal melanoma.

    PubMed

    Demirci, Hakan; Reed, David; Elner, Victor M

    2013-10-01

    To screen the microarray expression of CDH1, ECM1, EIF1B, FXR1, HTR2B, ID2, LMCD1, LTA4H, MTUS1, RAB31, ROBO1, and SATB1 genes which are predictive of primary uveal melanoma metastasis, and NFKB2, PTPN18, MTSS1, GADD45B, SNCG, HHIP, IL12B, CDK4, RPLP0, RPS17, RPS12 genes that are differentially expressed in metastatic uveal melanoma in normal whole human blood and tissues prone to metastatic involvement by uveal melanoma. We screened the GeneNote and GNF BioGPS databases for microarray analysis of genes predictive of primary uveal melanoma metastasis and those differentially expressed in metastatic uveal melanoma in normal whole blood, liver, lung and skin. Microarray analysis showed expression of all 22 genes in normal whole blood, liver, lung and skin, which are the most common sites of metastases. In the GNF BioGPS database, data for expression of the HHIP gene in normal whole blood and skin was not complete. Microarray analysis of genes predicting systemic metastasis of uveal melanoma and genes differentially expressed in metastatic uveal melanoma may not be used as a biomarker for metastasis in whole blood, liver, lung, and skin. Their expression in tissues prone to metastasis may suggest that they play a role in tropism of uveal melanoma metastasis to these tissues.

  6. A cell-based in vitro alternative to identify skin sensitizers by gene expression

    SciTech Connect

    Hooyberghs, Jef; Schoeters, Elke; Lambrechts, Nathalie; Nelissen, Inge; Witters, Hilda; Schoeters, Greet; Heuvel, Rosette van den

    2008-08-15

    The ethical and economic burden associated with animal testing for assessment of skin sensitization has triggered intensive research effort towards development and validation of alternative methods. In addition, new legislation on the registration and use of cosmetics and chemicals promote the use of suitable alternatives for hazard assessment. Our previous studies demonstrated that human CD34+ progenitor-derived dendritic cells from cord blood express specific gene profiles upon exposure to low molecular weight sensitizing chemicals. This paper presents a classification model based on this cell type which is successful in discriminating sensitizing chemicals from non-sensitizing chemicals based on transcriptome analysis of 13 genes. Expression profiles of a set of 10 sensitizers and 11 non-sensitizers were analyzed by RT-PCR using 9 different exposure conditions and a total of 73 donor samples. Based on these data a predictive dichotomous classifier for skin sensitizers has been constructed, which is referred to as VITOSENS. In a first step the dimensionality of the input data was reduced by selectively rejecting a number of exposure conditions and genes. Next, the generalization of a linear classifier was evaluated by a cross-validation which resulted in a prediction performance with a concordance of 89%, a specificity of 97% and a sensitivity of 82%. These results show that the present model may be a useful human in vitro alternative for further use in a test strategy towards the reduction of animal use for skin sensitization.

  7. A cell-based in vitro alternative to identify skin sensitizers by gene expression.

    PubMed

    Hooyberghs, Jef; Schoeters, Elke; Lambrechts, Nathalie; Nelissen, Inge; Witters, Hilda; Schoeters, Greet; Van Den Heuvel, Rosette

    2008-08-15

    The ethical and economic burden associated with animal testing for assessment of skin sensitization has triggered intensive research effort towards development and validation of alternative methods. In addition, new legislation on the registration and use of cosmetics and chemicals promote the use of suitable alternatives for hazard assessment. Our previous studies demonstrated that human CD34(+) progenitor-derived dendritic cells from cord blood express specific gene profiles upon exposure to low molecular weight sensitizing chemicals. This paper presents a classification model based on this cell type which is successful in discriminating sensitizing chemicals from non-sensitizing chemicals based on transcriptome analysis of 13 genes. Expression profiles of a set of 10 sensitizers and 11 non-sensitizers were analyzed by RT-PCR using 9 different exposure conditions and a total of 73 donor samples. Based on these data a predictive dichotomous classifier for skin sensitizers has been constructed, which is referred to as VITOSENS. In a first step the dimensionality of the input data was reduced by selectively rejecting a number of exposure conditions and genes. Next, the generalization of a linear classifier was evaluated by a cross-validation which resulted in a prediction performance with a concordance of 89%, a specificity of 97% and a sensitivity of 82%. These results show that the present model may be a useful human in vitro alternative for further use in a test strategy towards the reduction of animal use for skin sensitization.
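
    The abstract reports the classifier's performance as concordance, specificity and sensitivity. As a minimal sketch of how these three figures follow from a table of true and predicted labels (the standard definitions, not the authors' cross-validation code), the following assumes binary labels where 1 means sensitizer and 0 means non-sensitizer. The function name and the toy labels are hypothetical and do not reproduce the study's data.

```python
def classifier_metrics(true_labels, predicted_labels):
    """Concordance (accuracy), sensitivity and specificity for 0/1 labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return {
        "concordance": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of sensitizers detected
        "specificity": tn / (tn + fp),  # fraction of non-sensitizers cleared
    }


# Hypothetical toy labels: five sensitizers followed by five non-sensitizers.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = classifier_metrics(truth, predicted)
print(metrics)
```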

  8. Use of lactobacilli and their pheromone-based regulatory mechanism in gene expression and drug delivery.

    PubMed

    Diep, D B; Mathiesen, G; Eijsink, V G H; Nes, I F

    2009-01-01

    Lactobacilli are common microorganisms in diverse vegetables and meat products and several of these are also indigenous inhabitants in the gastro-intestinal (GI) tract of humans and animals where they are believed to have health promoting effects on the host. One of the highly appreciated probiotic effects is their ability to inhibit the growth of pathogens by producing antimicrobial peptides, so-called bacteriocins. Production of some bacteriocins has been shown to be strictly regulated through a quorum-sensing based mechanism mediated by a secreted peptide-pheromone (also called induction peptide; IP), a membrane-located sensor (histidine protein kinase; HPK) and a cytoplasmic response regulator (RR). The interaction between an IP and its sensor, which is highly specific, leads to activation of the cognate RR which in turn binds to regulated promoters and activates gene expression. The HPKs and RRs are built up by conserved modules, and the signalling between them within a network is efficient and directional, and can easily be activated by exogenously added synthetic IPs. Consequently, components from such regulatory networks have successfully been exploited in construction of a number of inducible gene expression systems. In this review, we discuss some well-characterised quorum sensing networks involved in bacteriocin production in lactobacilli, with special focus on the use of the regulatory components in gene expression and on lactobacilli as potential delivery vehicle for therapeutic and vaccine purposes.

  9. Network based analyses of gene expression profile of LCN2 overexpression in esophageal squamous cell carcinoma

    PubMed Central

    Wu, Bingli; Li, Chunquan; Du, Zepeng; Yao, Qianlan; Wu, Jianyi; Feng, Li; Zhang, Pixian; Li, Shang; Xu, Liyan; Li, Enmin

    2014-01-01

    LCN2 (lipocalin 2) is a member of the lipocalin family of proteins that transport small, hydrophobic ligands. LCN2 is elevated in various cancers including esophageal squamous cell carcinoma (ESCC). In this study, LCN2 was overexpressed in the EC109 ESCC cell line and we applied integrated analyses of the gene expression data to identify a protein-protein interaction (PPI) network to enhance our understanding of the role of LCN2 in ESCC. Through further mining of PPI sub-networks, hundreds of differentially expressed genes (DEGs) were identified to interact with thousands of other proteins. Subcellular localization analyses found the DEGs and their directly or indirectly interacting proteins distributed in multiple layers, which was applied to analyze the possible paths between two DEGs. Gene Ontology annotation generated a functional annotation map and found hundreds of significant terms, especially those associated with the known and potential roles of LCN2 protein. The algorithm of Random Walk with Restart was applied to prioritize the DEGs and identified several cancer-related DEGs ranked closest to LCN2 protein. These analyses based on PPI network have greatly expanded our understanding of the mRNA expression profile of LCN2 overexpression for future examination of the roles and mechanisms of LCN2. PMID:24954627
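
    The abstract names Random Walk with Restart as the method used to rank DEGs by proximity to LCN2 in the PPI network. The following is a minimal sketch of the generic algorithm, not the authors' implementation. It assumes an undirected, unweighted network given as an adjacency dictionary in which every node has at least one neighbour, and iterates p ← (1 − r)·W·p + r·p0 until convergence. The restart probability of 0.3 and the node names in the toy network are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds."""
    nodes = sorted(adjacency)
    if not seeds or any(s not in adjacency for s in seeds):
        raise ValueError("seeds must be a non-empty subset of the network nodes")
    if any(len(adjacency[n]) == 0 for n in nodes):
        raise ValueError("this sketch assumes every node has at least one neighbour")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction `restart` of the mass jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: the rest is split evenly among each node's neighbours.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a simple chain seed - g1 - g2 - g3.
network = {
    "seed": ["g1"],
    "g1": ["seed", "g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2"],
}
scores = random_walk_with_restart(network, seeds={"seed"})
ranking = sorted((n for n in scores if n != "seed"), key=scores.get, reverse=True)
print(ranking)
```

    Nodes with higher steady-state probability are considered closer to the seed, which is how a ranking of candidate genes can be read off the result.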

  10. An 8-gene qRT-PCR-based gene expression score that has prognostic value in early breast cancer

    PubMed Central

    2010-01-01

    Background Gene expression profiling may improve prognostic accuracy in patients with early breast cancer. Our objective was to demonstrate that it is possible to develop a simple molecular signature to predict distant relapse. Methods We included 153 patients with stage I-II hormonal receptor-positive breast cancer. RNA was isolated from formalin-fixed paraffin-embedded samples and qRT-PCR amplification of 83 genes was performed with gene expression assays. The genes we analyzed were those included in the 70-Gene Signature, the Recurrence Score and the Two-Gene Index. The association among gene expression, clinical variables and distant metastasis-free survival was analyzed using Cox regression models. Results An 8-gene prognostic score was defined. Distant metastasis-free survival at 5 years was 97% for patients defined as low-risk by the prognostic score versus 60% for patients defined as high-risk. The 8-gene score remained a significant factor in multivariate analysis and its performance was similar to that of two validated gene profiles: the 70-Gene Signature and the Recurrence Score. The validity of the signature was verified in independent cohorts obtained from the GEO database. Conclusions This study identifies a simple gene expression score that complements histopathological prognostic factors in breast cancer, and can be determined in paraffin-embedded samples. PMID:20584321

  11. A Digital Gene Expression-Based Bovine Gene Atlas Evaluating 92 Adult, Juvenile and Fetal Cattle Tissues

    USDA-ARS's Scientific Manuscript database

    A comprehensive transcriptome survey, or “Gene Atlas,” provides information essential for a complete understanding of the genomic biology of an organism. Using a digital gene expression approach, we developed a Gene Atlas of RNA abundance in 92 adult, juvenile and fetal cattle tissues. The samples...

  12. In vivo imaging of inducible tyrosinase gene expression with an ultrasound array-based photoacoustic system

    NASA Astrophysics Data System (ADS)

    Harrison, Tyler; Paproski, Robert J.; Zemp, Roger J.

    2012-02-01

    Tyrosinase, a key enzyme in the production of melanin, has shown promise as a reporter of genetic activity. While green fluorescent protein has been used extensively in this capacity, it is limited in its ability to provide information deep in tissue at a reasonable resolution. As melanin is a strong absorber of light, it is possible to image gene expression using tyrosinase with photoacoustic imaging technologies, resulting in excellent resolutions at multiple-centimeter depths. While our previous work has focused on creating and imaging MCF-7 cells with doxycycline-controlled tyrosinase expression, we have now established the viability of these cells in a murine model. Using an array-based photoacoustic imaging system with 5 MHz center frequency, we capture interleaved ultrasound and photoacoustic images of tyrosinase-expressing MCF-7 tumors both in a tissue mimicking phantom, and in vivo. Images of both the tyrosinase-expressing tumor and a control tumor are presented as both coregistered ultrasound-photoacoustic B-scan images and 3-dimensional photoacoustic volumes created by mechanically scanning the transducer. We find that the tyrosinase-expressing tumor is visible with a signal level 12 dB greater than that of the control tumor in vivo. Phantom studies with excised tumors show that the tyrosinase-expressing tumor is visible at depths in excess of 2 cm, and have suggested that our imaging system is sensitive to a transfection rate of less than 1%.

  13. Sex-based differences in gene expression in hippocampus following postnatal lead exposure

    SciTech Connect

    Schneider, J.S.; Anderson, D.W.; Sonnenahalli, H.; Vadigepalli, R.

    2011-10-15

    The influence of sex as an effect modifier of childhood lead poisoning has received little systematic attention. Considering the paucity of information available concerning the interactive effects of lead and sex on the brain, the current study examined the interactive effects of lead and sex on gene expression patterns in the hippocampus, a structure involved in learning and memory. Male or female rats were fed either 1500 ppm lead-containing chow or control chow for 30 days beginning at weaning. Blood lead levels were 26.7 ± 2.1 μg/dl and 27.1 ± 1.7 μg/dl for females and males, respectively. The expression of 175 unique genes was differentially regulated between control male and female rats. A total of 167 unique genes were differentially expressed in response to lead in either males or females. Lead exposure had a significant effect without a significant difference between male and female responses in 77 of these genes. In another set of 71 genes, there were significant differences in male vs. female response. A third set of 30 genes was differentially expressed in opposite directions in males vs. females, with the majority of genes expressed at a lower level in females than in males. Highly differentially expressed genes in males and females following lead exposure were associated with diverse biological pathways and functions. These results show that a brief exposure to lead produced significant changes in expression of a variety of genes in the hippocampus and that the response of the brain to a given lead exposure may vary depending on sex. - Highlights: > Postnatal lead exposure has a significant effect on hippocampal gene expression patterns. > At least one set of genes was affected in opposite directions in males and females. > Differentially expressed genes were associated with diverse biological pathways.

  14. Ribozyme-based aminoglycoside switches of gene expression engineered by genetic selection in S. cerevisiae.

    PubMed

    Klauser, Benedikt; Atanasov, Janina; Siewert, Lena K; Hartig, Jörg S

    2015-05-15

    Systems for conditional gene expression are powerful tools in basic research as well as in biotechnology. For future applications, it is of great importance to engineer orthogonal genetic switches that function reliably in diverse contexts. RNA-based switches have the advantage that effector molecules interact immediately with regulatory modules inserted into the target RNAs, getting rid of the need of transcription factors usually mediating genetic control. Artificial riboswitches are characterized by their simplicity and small size accompanied by a high degree of modularity. We have recently reported a series of hammerhead ribozyme-based artificial riboswitches that allow for post-transcriptional regulation of gene expression via switching mRNA, tRNA, or rRNA functions. A more widespread application was so far hampered by moderate switching performances and a limited set of effector molecules available. Here, we report the re-engineering of hammerhead ribozymes in order to respond efficiently to aminoglycoside antibiotics. We first established an in vivo selection protocol in Saccharomyces cerevisiae that enabled us to search large sequence spaces for optimized switches. We then envisioned and characterized a novel strategy of attaching the aptamer to the ribozyme catalytic core, increasing the design options for rendering the ribozyme ligand-dependent. These innovations enabled the development of neomycin-dependent RNA modules that switch gene expression up to 25-fold. The presented aminoglycoside-responsive riboswitches belong to the best-performing RNA-based genetic regulators reported so far. The developed in vivo selection protocol should allow for sampling of large sequence spaces for engineering of further optimized riboswitches.

  15. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits the design of appropriate classifiers. In this paper, a transductive learning method that combines a filtering strategy within the transductive framework with a progressive labeling strategy is presented. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and can effectively solve the problem of estimating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.

  16. Subtyping of Glioma by Combining Gene Expression and CNVs Data Based on a Compressive Sensing Approach

    PubMed Central

    Tang, Wenlong; Cao, Hongbao; Zhang, Ji-Gang; Duan, Junbo; Lin, Dongdong; Wang, Yu-Ping

    2013-01-01

    It is recognized that a combined analysis of different types of genomic measurements tends to give more reliable classification results. However, how to efficiently combine data with different resolutions is challenging. We propose a novel compressed sensing based approach for the combined analysis of gene expression and copy number variants data for the purpose of subtyping six types of Gliomas. Experimental results show that the proposed combined approach can substantially improve the classification accuracy compared to that of using either individual data type alone. The proposed approach can be applicable to many other types of genomic data. PMID:25267935

  17. Problem-Based Test: The Effect of Fibroblast Growth Factor on Gene Expression

    ERIC Educational Resources Information Center

    Szeberenyi, Jozsef

    2011-01-01

    This paper shows the results of an experiment in which the effects of fibroblast growth factor (FGF), actinomycin D (Act D; an inhibitor of transcription), and cycloheximide (CHX; an inhibitor of translation) were studied on the expression of two genes: a gene called "Fnk" and the gene coding for glyceraldehyde-3-phosphate dehydrogenase (GAPDH).…

  18. Problem-Based Test: The Effect of Fibroblast Growth Factor on Gene Expression

    ERIC Educational Resources Information Center

    Szeberenyi, Jozsef

    2011-01-01

    This paper shows the results of an experiment in which the effects of fibroblast growth factor (FGF), actinomycin D (Act D; an inhibitor of transcription), and cycloheximide (CHX; an inhibitor of translation) were studied on the expression of two genes: a gene called "Fnk" and the gene coding for glyceraldehyde-3-phosphate dehydrogenase (GAPDH).…

  19. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    PubMed Central

    Warnat, Patrick; Eils, Roland; Brors, Benedikt

    2005-01-01

    Background The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever increasing amount of microarray data from cancer studies. Although similar questions for the same type of cancer are addressed in these different studies, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods. Results In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that important genes with regard to the biology of leukemia are selected in an integrated analysis, which are missed in either single-set analysis. Conclusion Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and microarray technologies
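
    The abstract mentions quantile discretization as one way to make expression values from different platforms numerically comparable. The following is a minimal sketch of that idea only. The paper's exact procedure, and its median rank score transform, are not reproduced here. It assumes equal-sized bins assigned within each sample by rank, with ties broken by position. The function name, the bin count of 4 and the toy values are hypothetical.

```python
def quantile_discretize(values, n_bins=4):
    """Map each value in one sample to a bin index 0..n_bins-1 by its rank."""
    if n_bins < 1 or len(values) < n_bins:
        raise ValueError("need at least one value per bin")
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Integer arithmetic puts ranks into n_bins near-equal groups.
        bins[idx] = rank * n_bins // len(values)
    return bins


# Hypothetical toy sample: the same eight genes on two differently scaled platforms.
platform_a = [5.0, 1.0, 3.0, 9.0, 7.0, 2.0, 8.0, 4.0]
platform_b = [100 * v + 50 for v in platform_a]
bins_a = quantile_discretize(platform_a)
bins_b = quantile_discretize(platform_b)
print(bins_a, bins_b)
```

    Because the bins depend only on rank order within a sample, any monotone rescaling between platforms leaves them unchanged, which is the property that makes the discretized values comparable across data sets.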

  20. Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments

    PubMed Central

    Petryszak, Robert; Burdett, Tony; Fiorelli, Benedetto; Fonseca, Nuno A.; Gonzalez-Porta, Mar; Hastings, Emma; Huber, Wolfgang; Jupp, Simon; Keays, Maria; Kryvych, Nataliya; McMurry, Julie; Marioni, John C.; Malone, James; Megy, Karine; Rustici, Gabriella; Tang, Amy Y.; Taubert, Jan; Williams, Eleanor; Mannion, Oliver; Parkinson, Helen E.; Brazma, Alvis

    2014-01-01

    Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user. PMID:24304889

  1. Ecdysone Receptor-based Singular Gene Switches for Regulated Transgene Expression in Cells and Adult Rodent Tissues

    PubMed Central

    Lee, Seoghyun; Sohn, Kyung-Cheol; Choi, Dae-Kyoung; Won, Minho; Park, Kyeong Ah; Ju, Sung-Kyu; Kang, Kidong; Bae, Young-Ki; Hur, Gang Min; Ro, Hyunju

    2016-01-01

    Controlled gene expression is an indispensable technique in biomedical research. Here, we report a convenient, straightforward, and reliable way to induce expression of a gene of interest with negligible background expression compared to the most widely used tetracycline (Tet)-regulated system. Exploiting a Drosophila ecdysone receptor (EcR)-based gene regulatory system, we generated nonviral and adenoviral singular vectors designated as pEUI(+) and pENTR-EUI, respectively, which contain all the required elements to guarantee regulated transgene expression (GAL4-miniVP16-EcR, termed GvEcR hereafter, and 10 tandem repeats of an upstream activation sequence promoter followed by a multiple cloning site). Through the transient and stable transfection of mammalian cell lines with reporter genes, we validated that tebufenozide, an ecdysone agonist, reversibly induced gene expression, in a dose- and time-dependent manner, with negligible background expression. In addition, we created an adenovirus derived from the pENTR-EUI vector that readily infected not only cultured cells but also rodent tissues and was sensitive to tebufenozide treatment for regulated transgene expression. These results suggest that EcR-based singular gene regulatory switches would be convenient tools for the induction of gene expression in cells and tissues in a tightly controlled fashion. PMID:27673563

  2. Blood-based gene expression signatures of medication-free outpatients with major depressive disorder: integrative genome-wide and candidate gene analyses

    PubMed Central

    Hori, Hiroaki; Sasayama, Daimei; Teraishi, Toshiya; Yamamoto, Noriko; Nakamura, Seiji; Ota, Miho; Hattori, Kotaro; Kim, Yoshiharu; Higuchi, Teruhiko; Kunugi, Hiroshi

    2016-01-01

    Several microarray-based studies have investigated gene expression profiles in major depressive disorder (MDD), yet with highly variable findings. We examined blood-based genome-wide expression signatures of MDD, focusing on molecular pathways and networks underlying differentially expressed genes (DEGs) and behaviours of hypothesis-driven, evidence-based candidate genes for depression. Agilent human whole-genome arrays were used to measure gene expression in 14 medication-free outpatients with MDD who were at least moderately ill and 14 healthy controls matched pairwise for age and sex. After filtering, we compared expression of entire probes between patients and controls and identified DEGs. The DEGs were evaluated by pathway and network analyses. For the candidate gene analysis, we utilized 169 previously prioritized genes and examined their case-control separation efficiency and correlational co-expression network in patients relative to controls. The 317 screened DEGs mapped to a significantly over-represented pathway, the “synaptic transmission” pathway. The protein-protein interaction network was also significantly enriched, in which a number of key molecules for depression were included. The co-expression network of candidate genes was markedly disrupted in patients. This study provided evidence for an altered molecular network along with several key molecules in MDD and confirmed that the candidate genes are worthwhile targets for depression research. PMID:26728011

  3. Blood-based gene expression signatures of medication-free outpatients with major depressive disorder: integrative genome-wide and candidate gene analyses.

    PubMed

    Hori, Hiroaki; Sasayama, Daimei; Teraishi, Toshiya; Yamamoto, Noriko; Nakamura, Seiji; Ota, Miho; Hattori, Kotaro; Kim, Yoshiharu; Higuchi, Teruhiko; Kunugi, Hiroshi

    2016-01-05

    Several microarray-based studies have investigated gene expression profiles in major depressive disorder (MDD), yet with highly variable findings. We examined blood-based genome-wide expression signatures of MDD, focusing on molecular pathways and networks underlying differentially expressed genes (DEGs) and behaviours of hypothesis-driven, evidence-based candidate genes for depression. Agilent human whole-genome arrays were used to measure gene expression in 14 medication-free outpatients with MDD who were at least moderately ill and 14 healthy controls matched pairwise for age and sex. After filtering, we compared expression of entire probes between patients and controls and identified DEGs. The DEGs were evaluated by pathway and network analyses. For the candidate gene analysis, we utilized 169 previously prioritized genes and examined their case-control separation efficiency and correlational co-expression network in patients relative to controls. The 317 screened DEGs mapped to a significantly over-represented pathway, the "synaptic transmission" pathway. The protein-protein interaction network was also significantly enriched, in which a number of key molecules for depression were included. The co-expression network of candidate genes was markedly disrupted in patients. This study provided evidence for an altered molecular network along with several key molecules in MDD and confirmed that the candidate genes are worthwhile targets for depression research.

  4. Interpreting the gene expression microarray results: a user-based experience.

    PubMed

    Melissari, Erika; Di Russo, Manuela; Mariotti, Veronica; Righi, Marco; Iofrida, Caterina; Pellegrini, Silvia

    2013-06-01

    In recent years many tools have been developed to cope with the interpretation of gene expression results from microarray experiments. The effectiveness of these tools largely depends on their ease of use by biomedical researchers. Indeed, tools based on effective computational methods cannot be fully exploited by users if they are not supported by an intuitive interface, a large set of utilities and effective outputs. In this paper, 10 tools for the interpretation of gene expression microarray results have been tested on 11 microarray datasets and evaluated according to eight assessment criteria: 1. interface design and usability, 2. ease of input submission, 3. effectiveness of output representation, 4. effectiveness of the downloaded outputs, 5. possibility to submit multiple gene IDs, 6. sources of information, 7. provision of different statistical tests and 8. provision of multiple test correction methods. Strengths and weaknesses of each tool are highlighted to: a. provide useful tips to users dealing with the biological interpretation of microarray results; b. draw the attention of software developers to the usability of their tools.

  5. Integrated Microfluidic Devices for Automated Microarray-Based Gene Expression and Genotyping Analysis

    NASA Astrophysics Data System (ADS)

    Liu, Robin H.; Lodes, Mike; Fuji, H. Sho; Danley, David; McShea, Andrew

    Microarray assays typically involve multistage sample processing and fluidic handling, which are generally labor-intensive and time-consuming. Automation of these processes would improve robustness, reduce run-to-run and operator-to-operator variation, and reduce costs. In this chapter, a fully integrated and self-contained microfluidic biochip device that has been developed to automate the fluidic handling steps for microarray-based gene expression or genotyping analysis is presented. The device consists of a semiconductor-based CustomArray® chip with 12,000 features and a microfluidic cartridge. The CustomArray was manufactured using a semiconductor-based in situ synthesis technology. The microfluidic cartridge consists of microfluidic pumps, mixers, valves, fluid channels, and reagent storage chambers. Microarray hybridization and subsequent fluidic handling and reactions (including a number of washing and labeling steps) were performed in this fully automated and miniature device before fluorescent image scanning of the microarray chip. Electrochemical micropumps were integrated in the cartridge to provide pumping of liquid solutions. A micromixing technique based on gas bubbling generated by electrochemical micropumps was developed. Low-cost check valves were implemented in the cartridge to prevent cross-talk of the stored reagents. Gene expression studies of the human leukemia cell line (K562) and genotyping detection and sequencing of influenza A subtypes have been demonstrated using this integrated biochip platform. For gene expression assays, the microfluidic CustomArray device detected sample RNAs with a concentration as low as 0.375 pM. Detection was quantitative over more than three orders of magnitude. Experiments also showed that chip-to-chip variability was low, indicating that the integrated microfluidic devices eliminate manual fluidic handling steps that can be a significant source of variability in genomic analysis. The genotyping results showed

  6. Dissecting Gene Expression Changes Accompanying a Ploidy-Based Phenotypic Switch

    PubMed Central

    Cromie, Gareth A.; Tan, Zhihao; Hays, Michelle; Jeffery, Eric W.; Dudley, Aimée M.

    2016-01-01

    Aneuploidy, a state in which the chromosome number deviates from a multiple of the haploid count, significantly impacts human health. The phenotypic consequences of aneuploidy are believed to arise from gene expression changes associated with the altered copy number of genes on the aneuploid chromosomes. To dissect the mechanisms underlying altered gene expression in aneuploids, we used RNA-seq to measure transcript abundance in colonies of the haploid Saccharomyces cerevisiae strain F45 and two aneuploid derivatives harboring disomies of chromosomes XV and XVI. F45 colonies display complex “fluffy” morphologies, while the disomic colonies are smooth, resembling laboratory strains. Our two disomes displayed similar transcriptional profiles, a phenomenon not driven by their shared smooth colony morphology nor simply by their karyotype. Surprisingly, the environmental stress response (ESR) was induced in F45, relative to the two disomes. We also identified genes whose expression reflected a nonlinear interaction between the copy number of a transcriptional regulatory gene on chromosome XVI, DIG1, and the copy number of other chromosome XVI genes. DIG1 and the remaining chromosome XVI genes also demonstrated distinct contributions to the effect of the chromosome XVI disome on ESR gene expression. Expression changes in aneuploids appear to reflect a mixture of effects shared between different aneuploidies and effects unique to perturbing the copy number of particular chromosomes, including nonlinear copy number interactions between genes. The balance between these two phenomena is likely to be genotype- and environment-specific. PMID:27836908

  7. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes with those of randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary
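    The record above describes comparing Pearson correlations of neighboring genes against randomly selected gene pairs under an FDR threshold, without specifying the statistical procedure. The sketch below is one plausible reading, not the authors' method: it assumes an empirical null built from random pairs and swaps in the Benjamini-Hochberg step-up rule for FDR control. The function name, the `n_random` default and the use of Benjamini-Hochberg are all assumptions for illustration.

```python
import numpy as np


def neighbor_coexpression(expr, n_random=10000, fdr=0.1, seed=0):
    """Flag adjacent gene pairs that are unusually co-expressed.

    expr: genes x samples array, rows ordered by chromosomal position.
          Genes with constant expression are assumed to be filtered out.
    Returns (r_adj, p, significant) for the pairs (i, i + 1).
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    # Centre and scale each gene so a dot product equals Pearson's r.
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    r_adj = np.sum(z[:-1] * z[1:], axis=1)

    # Empirical null: correlations of randomly drawn gene pairs.
    i = rng.integers(0, len(z), n_random)
    j = rng.integers(0, len(z), n_random)
    keep = i != j
    r_null = np.sort(np.sum(z[i[keep]] * z[j[keep]], axis=1))

    # One-sided empirical p-value: share of null pairs with r >= observed.
    n_ge = len(r_null) - np.searchsorted(r_null, r_adj, side="left")
    p = (n_ge + 1) / (len(r_null) + 1)

    # Benjamini-Hochberg step-up rule at the requested FDR level.
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= fdr * np.arange(1, m + 1) / m
    k = np.nonzero(passed)[0].max() + 1 if passed.any() else 0
    significant = np.zeros(m, dtype=bool)
    significant[order[:k]] = True
    return r_adj, p, significant
```

    Runs of consecutive significant pairs could then be merged into candidate clusters; how the study actually delimited its 3- to 22-gene clusters is not stated in the abstract.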

  8. Effect of electro-acupuncture on gene expression in heart of rats with stress-induced pre-hypertension based on gene chip technology.

    PubMed

    Guo, Yan; Xie, Xiaojia; Guo, Changqing; Wang, Zhaoyang; Liu, Qingguo

    2015-06-01

    To explore the effect of electro-acupuncture (EA) on gene expression in the heart of rats with stress-induced pre-hypertension and to reveal its biological mechanism using gene chip technology. Twenty-seven male Wistar rats were randomly divided into 3 groups. The stress-induced hypertensive rat model was prepared by electric foot-shocks combined with generated noise. The modeling cycle lasted for 14 days, and EA intervention was applied to rats in the model + EA group during model preparation. Rat Gene 2.0 Sense Target Array technology was used to determine gene expression profiles, and the screened key genes were verified by real-time quantitative polymerase chain reaction (RT-PCR). Compared with the blank control group, 390 genes were changed in the model group; compared with the model control group, 330 genes were changed in the model + EA group. Significance analysis of gene function showed that the differentially expressed genes are involved in biological processes, molecular functions and cellular components. The RT-PCR results for the screened key genes were consistent with those of the gene chip test. EA could significantly lower the blood pressure of rats with stress-induced pre-hypertension and affect the gene expression profile in their hearts. Genes related to the contraction of vascular smooth muscle may be involved in EA's anti-hypertensive mechanism.

  9. Robust depth-based tools for the analysis of gene expression data.

    PubMed

    López-Pintado, Sara; Romo, Juan; Torrente, Aurora

    2010-04-01

    Microarray experiments provide data on the expression levels of thousands of genes and, therefore, statistical methods applicable to the analysis of such high-dimensional data are needed. In this paper, we propose robust nonparametric tools for the description and analysis of microarray data based on the concept of functional depth, which measures the centrality of an observation within a sample. We show that this concept can be easily adapted to high-dimensional observations and, in particular, to gene expression data. This allows the development of the following depth-based inference tools: (1) a scale curve for measuring and visualizing the dispersion of a set of points, (2) a rank test for deciding if 2 groups of multidimensional observations come from the same population, and (3) supervised classification techniques for assigning a new sample to one of G given groups. We apply these methods to microarray data, and to simulated data including contaminated models, and show that they are robust, efficient, and competitive with other procedures proposed in the literature, outperforming them in some situations.
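    The abstract does not say which notion of functional depth is used. As a hedged illustration of how centrality within a sample can be scored, the sketch below computes the modified band depth with bands formed by pairs of observations, one widely used functional depth; choosing this particular depth, and the function name, are assumptions of the example.

```python
from itertools import combinations

import numpy as np


def modified_band_depth(data):
    """Modified band depth (bands from pairs) for each observation.

    data: n observations x p coordinates (e.g. samples x genes).
    Larger values indicate more central observations.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    depth = np.zeros(n)
    for j, k in combinations(range(n), 2):
        lower = np.minimum(data[j], data[k])
        upper = np.maximum(data[j], data[k])
        # Fraction of coordinates at which each observation lies in the band.
        inside = (data >= lower) & (data <= upper)
        depth += inside.mean(axis=1)
    # Average over all n * (n - 1) / 2 bands.
    return depth / (n * (n - 1) / 2)
```

    Ordering observations from deepest to shallowest gives the centre-outward ranking on which tools such as scale curves, rank tests and depth-based classifiers can be built. The loop over all pairs is quadratic in the number of observations, which is acceptable for the sample sizes typical of microarray studies.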

  10. A host-based RT-PCR gene expression signature to identify acute respiratory viral infection.

    PubMed

    Zaas, Aimee K; Burke, Thomas; Chen, Minhua; McClain, Micah; Nicholson, Bradly; Veldman, Timothy; Tsalik, Ephraim L; Fowler, Vance; Rivers, Emanuel P; Otero, Ronny; Kingsmore, Stephen F; Voora, Deepak; Lucas, Joseph; Hero, Alfred O; Carin, Lawrence; Woods, Christopher W; Ginsburg, Geoffrey S

    2013-09-18

    Improved ways to diagnose acute respiratory viral infections could decrease inappropriate antibacterial use and serve as a vital triage mechanism in the event of a potential viral pandemic. Measurement of the host response to infection is an alternative to pathogen-based diagnostic testing and may improve diagnostic accuracy. We have developed a host-based assay with a reverse transcription polymerase chain reaction (RT-PCR) TaqMan low-density array (TLDA) platform for classifying respiratory viral infection. We developed the assay using two cohorts experimentally infected with influenza A H3N2/Wisconsin or influenza A H1N1/Brisbane, and validated the assay in a sample of adults presenting to the emergency department with fever (n = 102) and in healthy volunteers (n = 41). Peripheral blood RNA samples were obtained from individuals who underwent experimental viral challenge or who presented to the emergency department and had microbiologically proven viral respiratory infection or systemic bacterial infection. The selected gene set on the RT-PCR TLDA assay classified participants with experimentally induced influenza H3N2 and H1N1 infection with 100 and 87% accuracy, respectively. We validated this host gene expression signature in a cohort of 102 individuals arriving at the emergency department. The sensitivity of the RT-PCR test was 89% [95% confidence interval (CI), 72 to 98%], and the specificity was 94% (95% CI, 86 to 99%). These results show that RT-PCR-based detection of a host gene expression signature can classify individuals with respiratory viral infection and sets the stage for prospective evaluation of this diagnostic approach in a clinical setting.
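    For readers less familiar with the reported metrics, the short sketch below shows how sensitivity and specificity follow from the four cells of a 2 x 2 diagnostic table. The counts in the usage note are hypothetical and are not the study's data, which the abstract does not break down.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    tp, fn: infected individuals classified as positive / negative.
    tn, fp: non-infected individuals classified as negative / positive.
    """
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return sensitivity, specificity
```

    With hypothetical counts of 8 true positives, 2 false negatives, 45 true negatives and 5 false positives, the function returns a sensitivity of 0.8 and a specificity of 0.9.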

  11. Parkinson's disease candidate gene prioritization based on expression profile of midbrain dopaminergic neurons

    PubMed Central

    2010-01-01

    Background Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s), as part of the etiology of Parkinson's disease, within the large number of genes residing in these loci can be achieved through several approaches, including screening methods, and considering appropriate criteria. Since several of the identified Parkinson's disease genes are expressed in the substantia nigra pars compacta of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes. Methods In this work we have combined the findings of six rodent transcriptome analysis studies on the gene expression profile of midbrain dopaminergic neurons with the PARK loci in the OMIM (Online Mendelian Inheritance in Man) database to identify new candidate genes for Parkinson's disease. Results Merging the two datasets, we identified 20 genes within PARK loci, 7 of which are located in an orphan Parkinson's disease locus and one of which had been identified as a disease gene. In addition to identifying a set of candidates for further genetic association studies, these results show that the criterion of expression in midbrain dopaminergic neurons may be used to narrow down the number of genes in PARK loci for such studies. PMID:20716345

  12. A fuzzy gene expression-based computational approach improves breast cancer prognostication

    PubMed Central

    2010-01-01

    Early gene expression studies classified breast tumors into at least three clinically relevant subtypes. Although most current gene signatures are prognostic for estrogen receptor (ER) positive/human epidermal growth factor receptor 2 (HER2) negative breast cancers, few are informative for ER negative/HER2 negative and HER2 positive subtypes. Here we present Gene Expression Prognostic Index Using Subtypes (GENIUS), a fuzzy approach for prognostication that takes into account the molecular heterogeneity of breast cancer. In systematic evaluations, GENIUS significantly outperformed current gene signatures and clinical indices in the global population of patients. PMID:20156340

  13. Human Lacrimal Gland Gene Expression

    PubMed Central

    Aakalu, Vinay Kumar; Parameswaran, Sowmya; Maienschein-Cline, Mark; Bahroos, Neil; Shah, Dhara; Ali, Marwan; Krishnakumar, Subramanian

    2017-01-01

    Background The study of human lacrimal gland biology and development is limited. Lacrimal gland tissue is damaged or poorly functional in a number of disease states including dry eye disease. Development of cell based therapies for lacrimal gland diseases requires a better understanding of the gene expression and signaling pathways in lacrimal gland. Differential gene expression analysis between lacrimal gland and other embryologically similar tissues may be helpful in furthering our understanding of lacrimal gland development. Methods We performed global gene expression analysis of human lacrimal gland tissue using Affymetrix® gene expression arrays. Primary data from our laboratory were compared with datasets available in the NLM GEO database for other surface ectodermal tissues including salivary gland, skin, conjunctiva and corneal epithelium. Results The analysis revealed statistically significant differences in the gene expression of lacrimal gland tissue compared to other ectodermal tissues. The lacrimal gland specific, cell surface secretory protein encoding genes and critical signaling pathways which distinguish lacrimal gland from other ectodermal tissues are described. Conclusions Differential gene expression in human lacrimal gland compared with other ectodermal tissue types revealed interesting patterns which may serve as the basis for future studies in directed differentiation among other areas. PMID:28081151

  14. A modified ABCDE model of flowering in orchids based on gene expression profiling studies of the moth orchid Phalaenopsis aphrodite.

    PubMed

    Su, Chun-Lin; Chen, Wan-Chieh; Lee, Ann-Ying; Chen, Chun-Yi; Chang, Yao-Chien Alex; Chao, Ya-Ting; Shih, Ming-Che

    2013-01-01

    Previously we developed genomic resources for orchids, including transcriptomic analyses using next-generation sequencing techniques and construction of a web-based orchid genomic database. Here, we report a modified molecular model of flower development in the Orchidaceae based on functional analysis of gene expression profiles in Phalaenopsis aphrodite (a moth orchid) that revealed novel roles for the transcription factors involved in floral organ pattern formation. Phalaenopsis orchid floral organ-specific genes were identified by microarray analysis. Several critical transcription factors, including AP3, PI, AP1 and AGL6, displayed distinct spatial distribution patterns. Phylogenetic analysis of orchid MADS box genes was conducted to infer the evolutionary relationship among floral organ-specific genes. The results suggest that duplication of MADS box genes in orchids may have resulted in their gaining novel functions during evolution. Based on these analyses, a modified model of orchid flowering was proposed. Comparison of the expression profiles of flowers of a peloric mutant and wild-type Phalaenopsis orchid further identified genes associated with lip morphology and peloric effects. Large scale investigation of gene expression profiles revealed that homeotic genes from the ABCDE model of flower development classes A and B in the Phalaenopsis orchid have novel functions due to evolutionary diversification, and display differential expression patterns.

  15. MicroRNA and target gene expression based clustering of oral cancer, precancer and normal tissues.

    PubMed

    Roy, Roshni; Singh, Richa; Chattopadhyay, Esita; Ray, Anindita; Sarkar, Navonil De; Aich, Ritesh; Paul, Ranjan Rashmi; Pal, Mousumi; Roy, Bidyut

    2016-11-15

    Development of oral cancer is usually preceded by a precancerous lesion. Despite histopathological diagnosis, development of disease specific biomarkers continues to be a promising field of study. Expression of miRNAs and their target genes was studied in oral cancer and two types of precancer lesions to look for disease specific gene expression patterns. Expression of miR-26a, miR-29a, miR-34b and miR-423 and their 11 target genes was determined in 20 oral leukoplakia, 20 lichen planus and 20 cancer tissues with respect to 20 normal tissues using a qPCR assay. Expression data were then used for cluster analysis of normal as well as disease tissues. Expression of miR-26a and miR-29a was significantly down regulated in leukoplakia and cancer tissues but up regulated in lichen planus tissues. Expression of target genes such as ADAMTS7, ATP1B1, COL4A2, CPEB3, CDK6, DNMT3a and PI3KR1 was significantly down regulated in at least two of three disease types with respect to normal tissues. Negative correlations between expression levels of miRNAs and their targets were observed in normal tissues but not in disease tissues, implying altered miRNA-target interaction in the disease state. Specific expression profiles of miRNAs and target genes formed separate clusters of normal, lichen planus and cancer tissues. Our results suggest that alterations in expression of selected miRNAs and target genes may play important roles in the progression of precancer to cancer. Expression profiles of miRNA and target genes may be useful to differentiate cancer and lichen planus from normal tissues, thereby bolstering their role in diagnostics.

  16. Gene expression technology

    SciTech Connect

    Goeddel, D.V.

    1990-01-01

    The articles in this volume were assembled to enable the reader to design effective strategies for the expression of cloned genes and cDNAs. More than a compilation of papers describing the multitude of techniques now available for expressing cloned genes, this volume provides a manual that should prove useful for solving the majority of expression problems one is likely to encounter. The four major expression systems commonly available to most investigators are stressed: Escherichia coli, Bacillus subtilis, yeast, and mammalian cells. Each of these systems has its advantages and disadvantages, details of which are found in Chapter 1 and the strategic overviews for the four major sections of the volume. The papers in each of these sections provide many suggestions on how to proceed if initial expression levels are not sufficient.

  17. Differential evolution of MAGE genes based on expression pattern and selection pressure.

    PubMed

    Zhao, Qi; Caballero, Otavia L; Simpson, Andrew J G; Strausberg, Robert L

    2012-01-01

    Starting from publicly-accessible datasets, we have utilized comparative and phylogenetic genome analyses to characterize the evolution of the human MAGE gene family. Our characterization of genomic structures in representative genomes of primates, rodents, carnivora, and macroscelidea indicates that both Type I and Type II MAGE genes have undergone lineage-specific evolution. The restricted expression pattern in germ cells of Type I MAGE orthologs is observed throughout evolutionary history. Unlike Type II MAGEs that have conserved promoter sequences, Type I MAGEs lack promoter conservation, suggesting that epigenetic regulation is a central mechanism for controlling their expression. Codon analysis shows that Type I but not Type II MAGE genes have been under positive selection. The combination of genomic and expression analysis suggests that Type I MAGE promoters and genes continue to evolve in the hominin lineage, perhaps towards functional diversification or acquiring additional specific functions, and that selection pressure at codon level is associated with expression spectrum.

  18. Differential Evolution of MAGE Genes Based on Expression Pattern and Selection Pressure

    PubMed Central

    Zhao, Qi; Caballero, Otavia L.; Simpson, Andrew J. G.; Strausberg, Robert L.

    2012-01-01

    Starting from publicly-accessible datasets, we have utilized comparative and phylogenetic genome analyses to characterize the evolution of the human MAGE gene family. Our characterization of genomic structures in representative genomes of primates, rodents, carnivora, and macroscelidea indicates that both Type I and Type II MAGE genes have undergone lineage-specific evolution. The restricted expression pattern in germ cells of Type I MAGE orthologs is observed throughout evolutionary history. Unlike Type II MAGEs that have conserved promoter sequences, Type I MAGEs lack promoter conservation, suggesting that epigenetic regulation is a central mechanism for controlling their expression. Codon analysis shows that Type I but not Type II MAGE genes have been under positive selection. The combination of genomic and expression analysis suggests that Type I MAGE promoters and genes continue to evolve in the hominin lineage, perhaps towards functional diversification or acquiring additional specific functions, and that selection pressure at codon level is associated with expression spectrum. PMID:23133577

  19. Sex-based differences in gene expression in hippocampus following postnatal lead exposure.

    PubMed

    Schneider, J S; Anderson, D W; Sonnenahalli, H; Vadigepalli, R

    2011-10-15

    The influence of sex as an effect modifier of childhood lead poisoning has received little systematic attention. Considering the paucity of information available concerning the interactive effects of lead and sex on the brain, the current study examined the interactive effects of lead and sex on gene expression patterns in the hippocampus, a structure involved in learning and memory. Male or female rats were fed either 1500 ppm lead-containing chow or control chow for 30 days beginning at weaning. Blood lead levels were 26.7±2.1 μg/dl and 27.1±1.7 μg/dl for females and males, respectively. The expression of 175 unique genes was differentially regulated between control male and female rats. A total of 167 unique genes were differentially expressed in response to lead in either males or females. Lead exposure had a significant effect without a significant difference between male and female responses in 77 of these genes. In another set of 71 genes, there were significant differences in male vs. female response. A third set of 30 genes was differentially expressed in opposite directions in males vs. females, with the majority of genes expressed at a lower level in females than in males. Highly differentially expressed genes in males and females following lead exposure were associated with diverse biological pathways and functions. These results show that a brief exposure to lead produced significant changes in expression of a variety of genes in the hippocampus and that the response of the brain to a given lead exposure may vary depending on sex.

  20. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    PubMed Central

    Matsumoto, Tomotaka; Mineta, Katsuhiko; Osada, Naoki; Araki, Hitoshi

    2015-01-01

    Recent studies suggest the existence of stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics is not well understood. SGE can increase the phenotypic variation and act as a load for individuals, if they are at the adaptive optimum in a stable environment. On the other hand, part of the phenotypic variation caused by SGE might become advantageous if individuals at the adaptive optimum become genetically less-adaptive, for example due to an environmental change. Furthermore, SGE of unimportant genes might have little or no fitness consequences. Thus, SGE can be advantageous, disadvantageous, or selectively neutral depending on its context. In addition, there might be a genetic basis that regulates the magnitude of SGE, which is often referred to as “modifier genes,” but little is known about the conditions under which such an SGE-modifier gene evolves. In the present study, we conducted individual-based computer simulations to examine these conditions in a diploid model. In the simulations, we considered a single locus that determines organismal fitness for simplicity, and that SGE on the locus creates fitness variation in a stochastic manner. We also considered another locus that modifies the magnitude of SGE. Our results suggested that SGE was always deleterious in stable environments and increased the fixation probability of deleterious mutations in this model. Even under frequently changing environmental conditions, only very strong natural selection made SGE adaptive. These results suggest that the evolution of SGE-modifier genes requires strict balance among the strength of natural selection, magnitude of SGE, and frequency of environmental changes. However, the degree of dominance affected the condition under which SGE becomes advantageous, indicating a better opportunity for the evolution of SGE in different genetic

  1. A model of gene expression based on random dynamical systems reveals modularity properties of gene regulatory networks.

    PubMed

    Antoneli, Fernando; Ferreira, Renata C; Briones, Marcelo R S

    2016-06-01

    Here we propose a new approach to modeling gene expression based on the theory of random dynamical systems (RDS) that provides a general coupling prescription between the nodes of any given regulatory network, given that the dynamics of each node is modeled by an RDS. The main virtues of this approach are the following: (i) it provides a natural way to obtain arbitrarily large networks by coupling together simple basic pieces, thus revealing the modularity of regulatory networks; (ii) the assumptions about the stochastic processes used in the modeling are fairly general, in the sense that the only requirement is stationarity; (iii) there is a well developed mathematical theory, which is a blend of smooth dynamical systems theory, ergodic theory and stochastic analysis that allows one to extract relevant dynamical and statistical information without solving the system; (iv) one may obtain the classical rate equations from the corresponding stochastic version by averaging the dynamic random variables (small noise limit). It is important to emphasize that unlike the deterministic case, where coupling two equations is a trivial matter, coupling two RDS is non-trivial, especially in our case, where the coupling is performed between a state variable of one gene and the switching stochastic process of another gene and, hence, it is not a priori true that the resulting coupled system will satisfy the definition of a random dynamical system. We shall provide the necessary arguments that ensure that our coupling prescription does indeed furnish a coupled regulatory network of random dynamical systems. Finally, the fact that classical rate equations are the small noise limit of our stochastic model ensures that any validation or prediction made on the basis of the classical theory is also a validation or prediction of our model. We illustrate our framework with some simple examples of single-gene systems and network motifs.

  2. A bayesian mixed regression based prediction of quantitative traits from molecular marker and gene expression data.

    PubMed

    Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J

    2011-01-01

    Both molecular marker and gene expression data were considered alone as well as jointly to serve as additive predictors for two pathogen-activity-phenotypes in real recombinant inbred lines of soybean. For unobserved phenotype prediction, we used Bayesian hierarchical regression modeling, where the number of possible predictors in the model was controlled by different selection strategies tested. Our initial findings were submitted for DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged to be the best in sub-challenge B3 wherein both functional genomic and genetic data were used to predict the phenotypes. In this work we further improve upon this previous work by considering various predictor selection strategies, and cross-validation was used to measure accuracy of in-data and out-data predictions. The results from various model choices indicate that for this data use of both data types (namely functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can be easily achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not small. We also further studied gene-set enrichment (for continuous phenotype) in the biological process in question and chromosomal enrichment of the gene set. The methodological contribution of this paper is in exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of covariates were explored and all methods were implemented under the Bayesian hierarchical modeling framework with indicator-based covariate selection. All the models based on a careful variable selection procedure were found to produce significant results based on a permutation test.

  3. A COMPRESSED SENSING BASED APPROACH FOR SUBTYPING OF LEUKEMIA FROM GENE EXPRESSION DATA

    PubMed Central

    Tang, Wenlong; Cao, Hongbao; Duan, Junbo; Wang, Yu-Ping

    2014-01-01

    With the development of genomic techniques, the demand for new methods that can handle high-throughput genome-wide data effectively is becoming stronger than ever before. Compressed sensing (CS) is an emerging approach in statistics and signal processing. With the CS theory, a signal can be uniquely reconstructed or approximated from its sparse representations, which can therefore better distinguish different types of signals. However, the application of CS approach to genome-wide data analysis has been rarely investigated. We propose a novel CS-based approach for genomic data classification and test its performance in the subtyping of leukemia through gene expression analysis. The detection of subtypes of cancers such as leukemia according to different genetic markups is significant, which holds promise for the individualization of therapies and improvement of treatments. In our work, four statistical features were employed to select significant genes for the classification. With our selected genes out of 7,129 ones, the proposed CS method achieved a classification accuracy of 97.4% when evaluated with the cross validation and 94.3% when evaluated with another independent data set. The robustness of the method to noise was also tested, giving good performance. Therefore, this work demonstrates that the CS method can effectively detect subtypes of leukemia, implying improved accuracy of diagnosis of leukemia. PMID:21976380

  4. Model-Based Characterization of Inflammatory Gene Expression Patterns of Activated Macrophages

    PubMed Central

    Ehlting, Christian; Thomas, Maria; Zanger, Ulrich M.; Sawodny, Oliver; Häussinger, Dieter; Bode, Johannes G.

    2016-01-01

    Macrophages are cells with remarkable plasticity. They integrate signals from their microenvironment leading to context-dependent polarization into classically (M1) or alternatively (M2) activated macrophages, representing two extremes of a broad spectrum of divergent phenotypes. Thereby, macrophages deliver protective and pro-regenerative signals towards injured tissue but, depending on the eliciting damage, may also be responsible for the generation and aggravation of tissue injury. Although incompletely understood, there is emerging evidence that macrophage polarization is critical for these antagonistic roles. To identify activation-specific expression patterns of chemokines and cytokines that may confer these distinct effects a systems biology approach was applied. A comprehensive literature-based Boolean model was developed to describe the M1 (LPS-activated) and M2 (IL-4/13-activated) polarization types. The model was validated using high-throughput transcript expression data from murine bone marrow derived macrophages. By dynamic modeling of gene expression, the chronology of pathway activation and autocrine signaling was estimated. Our results provide a deepened understanding of the physiological balance leading to M1/M2 activation, indicating the relevance of co-regulatory signals at the level of Akt1 or Akt2 that may be important for directing macrophage polarization. PMID:27464342

  5. Census of genes expressed in porcine embryos and reproductive tissues by mining an expressed sequence tag database based on human genes.

    PubMed

    Jiang, Zhihua; Zhang, Ming; Wasem, Vaughn D; Michal, Jennifer J; Zhang, Hao; Wright, Raymond W

    2003-10-01

    A total of 98,898 expressed sequence tags (ESTs) derived from embryos and reproductive tissues in pigs were identified in the GenBank "est_others" database. Pig embryos were collected at 11, 12, 13, 14, 15, 20, 30, and 45 days after gestation. The reproductive tissues were sampled from testis, ovary, endometrium, hypothalamus, anterior pituitary, uterus, and placenta. A gene-oriented approach was developed to annotate these porcine EST sequences to census the genes expressed from these sources. Of the 33 308 mRNA sequences from the human genes used as references (data accessed on 1 November 2002), 9410 had the porcine EST homologs expressed in embryos and 11 795 had the EST homologs expressed in reproductive tissues. The entire genome contributes at least 28.3% of its genes to embryo development and 35.4% of its genes to reproduction. Using the EST entry numbers as indicators of gene expression, we determined that the gene expression patterns differ significantly between embryos and reproductive tissues in pigs. The basic active genes were identified for each source, but most of them are not coexpressed abundantly. Few genes were expressed on the Y chromosome (P < 0.01), but they may represent counterparts of the double-dose genes that remain active in an inactivated X chromosome in females but are needed for proper development and growth. The census provides a panel of transcripts in a broad sense that can be used as targets to study the mechanisms involved in embryo development and reproduction in pigs and other mammals, including humans.
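
    The two genome-wide percentages quoted in this abstract follow directly from the counts it reports. The short check below is only an arithmetic sketch; the constant and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above; the names are illustrative only.
HUMAN_REFERENCE_MRNAS = 33308   # human mRNA sequences used as references
EMBRYO_HOMOLOGS = 9410          # references with porcine EST homologs in embryos
REPRODUCTIVE_HOMOLOGS = 11795   # references with homologs in reproductive tissues


def percent_of_reference(count: int, total: int = HUMAN_REFERENCE_MRNAS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    print(percent_of_reference(EMBRYO_HOMOLOGS))        # 28.3
    print(percent_of_reference(REPRODUCTIVE_HOMOLOGS))  # 35.4
```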

  6. SVD-based anatomy of gene expressions for correlation analysis in Arabidopsis thaliana.

    PubMed

    Fukushima, Atsushi; Wada, Masayoshi; Kanaya, Shigehiko; Arita, Masanori

    2008-12-01

    Gene co-expression analysis has been widely used in recent years for predicting unknown gene function and its regulatory mechanisms. The predictive accuracy depends on the quality and the diversity of the data set used. In this report, we applied singular value decomposition (SVD) to array experiments in public databases to find that co-expression linkage could be estimated by a much smaller number of array data. Correlations of co-expressed genes were assessed using two regulatory mechanisms (feedback loop of the fundamental circadian clock and a global transcription factor Myb28), as well as metabolic pathways in the AraCyc database. Our conclusion is that a smaller number of informative arrays across tissues can suffice to reproduce comparable results with a state-of-the-art co-expression software tool. In our SVD analysis of the Arabidopsis data set, array experiments that contributed most as the principal components included stamen development, germinating seed and stress responses on leaf.
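
    The idea of picking a small set of informative arrays with SVD and then computing co-expression on that subset can be sketched as follows. This is a minimal illustration, not the authors' procedure: the scoring rule (variance-weighted squared loadings on the leading components) and the function names are assumptions made for the example.

```python
import numpy as np


def rank_arrays_by_svd(expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Rank arrays (columns) by their weight in the leading singular vectors.

    expr is a genes x arrays matrix. Returns column indices, most
    informative first. The scoring rule is an assumption for illustration.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Squared loading of each array on each top component, weighted by s^2.
    weights = (s[:n_components, None] ** 2) * (vt[:n_components] ** 2)
    return np.argsort(weights.sum(axis=0))[::-1]


def coexpression(expr: np.ndarray, arrays: np.ndarray) -> np.ndarray:
    """Pearson correlation between genes, using only the selected arrays."""
    return np.corrcoef(expr[:, arrays])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = 0.01 * rng.standard_normal((6, 20))      # synthetic background noise
    pattern = np.array([1, -1, 1, -1, 1, -1])[:, None]
    expr[:, :4] += pattern * np.array([3.0, -3.0, 2.0, -2.0])
    top = rank_arrays_by_svd(expr)[:4]
    print(sorted(top.tolist()))                     # the four signal-bearing arrays
    print(np.round(coexpression(expr, top), 2))
```

    In this synthetic case only four of the twenty arrays carry signal, and the gene-gene correlations computed on those four already recover the two anti-correlated gene groups.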

  7. Partial Least Squares Based Gene Expression Analysis in EBV-Positive and EBV-Negative Posttransplant Lymphoproliferative Disorders.

    PubMed

    Wu, Sa; Zhang, Xin; Li, Zhi-Ming; Shi, Yan-Xia; Huang, Jia-Jia; Xia, Yi; Yang, Hang; Jiang, Wen-Qi

    2013-01-01

    Post-transplant lymphoproliferative disorder (PTLD) is a common complication of therapeutic immunosuppression after organ transplantation. Gene expression profiling facilitates the identification of biological differences between Epstein-Barr virus (EBV) positive and negative PTLDs. Previous studies mainly implemented variance/regression analysis without considering unaccounted array specific factors. The aim of this study is to investigate the gene expression difference between EBV positive and negative PTLDs through partial least squares (PLS) based analysis. With a microarray data set from the Gene Expression Omnibus database, we performed PLS based analysis. We acquired 1188 differentially expressed genes. Pathway and Gene Ontology enrichment analysis identified significant over-representation of dysregulated genes in immune response and cancer related biological processes. Network analysis identified three hub genes with degrees higher than 15, including CREBBP, ATXN1, and PML. Proteins encoded by CREBBP and PML have been reported to interact with EBV before. Our findings shed light on expression distinction of EBV positive and negative PTLDs with the hope to offer theoretical support for future therapeutic study.

  8. Prediction on the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase based on gene expression programming.

    PubMed

    Li, Yuqin; You, Guirong; Jia, Baoxiu; Si, Hongzong; Yao, Xiaojun

    2014-01-01

    Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on the HM and GEP separately, and the two prediction models led to good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful in predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and in providing theoretical information for studying the new drugs.
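
    For readers unfamiliar with how a linear QSAR model is scored, the sketch below fits an ordinary least-squares model on a descriptor matrix and reports R². It is a generic illustration on synthetic numbers: it does not reproduce the heuristic method, GEP, or CODESSA, and all function names are assumptions. Only the shape (33 compounds, 5 descriptors) mirrors the abstract.

```python
import numpy as np


def fit_linear_qsar(descriptors: np.ndarray, activity: np.ndarray) -> np.ndarray:
    """Least-squares fit of activity ~ intercept + descriptors."""
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    coef, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return coef


def predict(coef: np.ndarray, descriptors: np.ndarray) -> np.ndarray:
    """Apply fitted coefficients (intercept first) to a descriptor matrix."""
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    return design @ coef


def r_squared(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((33, 5))                 # synthetic descriptors
    true_coef = np.array([0.5, -1.0, 2.0, 0.0, 0.3])
    y = 10.0 + x @ true_coef + 0.5 * rng.standard_normal(33)
    coef = fit_linear_qsar(x, y)
    print(r_squared(y, predict(coef, x)))
```

    An R² computed on the training compounds, as here, tends to be optimistic; a held-out or cross-validated R² is the more conservative figure.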

  9. A new measure for gene expression biclustering based on non-parametric correlation.

    PubMed

    Flores, Jose L; Inza, Iñaki; Larrañaga, Pedro; Calvo, Borja

    2013-12-01

    One of the emerging techniques for analyzing DNA microarray data, known as biclustering, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as the quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. The proposed measure is called Spearman's biclustering measure (SBM), which performs an estimation of the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed by using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as the fitness function. This approach has been examined from different points of view by using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of bicluster patterns of reference including new patterns, and a set of statistical tests. Performance was also examined using real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
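
    Why a rank correlation picks up shifting, scaling and inversion patterns can be shown with a few lines of code. The sketch below is not the published SBM formula: it simply averages the absolute Spearman correlation over all gene pairs in a candidate bicluster, and it scores genes only, not conditions. The function names are assumptions.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def mean_abs_spearman(bicluster) -> float:
    """Average |Spearman rho| over all pairs of rows (genes).

    Assumes at least two rows and no constant row.
    """
    rows = np.asarray(bicluster, dtype=float)
    n = rows.shape[0]
    scores = [abs(spearman(rows[i], rows[j]))
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))


if __name__ == "__main__":
    base = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
    bicluster = np.vstack([base, base + 10.0, 3.0 * base, -base])
    print(mean_abs_spearman(bicluster))  # shifted, scaled and inverted rows all score as coherent
```

    Taking the absolute value is what lets the inverted row count as coherent; without it, the inverse relationship would lower the score instead.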

  10. A PSO-Based Approach for Pathway Marker Identification From Gene Expression Data.

    PubMed

    Mandal, Monalisa; Mondal, Jyotirmay; Mukhopadhyay, Anirban

    2015-09-01

    In this article, a new and robust pathway activity inference scheme is proposed from gene expression data using Particle Swarm Optimization (PSO). From microarray gene expression data, the corresponding pathway information of the genes is collected from a public database. For identifying the pathway markers, the expression values of each pathway consisting of genes, termed as pathway activity, are summarized. To measure the goodness of a pathway activity vector, t-score is widely used in the existing literature. The weakness of existing techniques for inferring pathway activity is that they intend to consider all the member genes of a pathway. But in reality, all the member genes may not be significant to the corresponding pathway. Therefore, only those genes that are responsible in the corresponding pathway should be included. Motivated by this, in the proposed method, using PSO, important genes with respect to each pathway are identified. The objective is to maximize the average t-score. For the pathway activities inferred from different percentages of significant pathways, the average absolute t-scores are plotted. In addition, the top 50% pathway markers are evaluated using 10-fold cross validation and its performance is compared with that of other existing techniques. Biological relevance of the results is also studied.
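
    The objective described here, choosing the member genes whose summarized activity best separates two classes by t-score, can be illustrated on a toy pathway. The sketch below swaps PSO for an exhaustive search over gene subsets, which is only feasible for a handful of genes; the mean as the activity summary, the Welch form of the t statistic, and all names are assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def pathway_activity(expr: np.ndarray, genes) -> np.ndarray:
    """Summarize a pathway as the per-sample mean of the chosen genes.

    expr is a genes x samples matrix.
    """
    return expr[list(genes)].mean(axis=0)


def t_score(activity: np.ndarray, labels: np.ndarray) -> float:
    """Two-sample (Welch) t statistic between label 1 and label 0.

    Assumes non-zero within-class variance in at least one class.
    """
    case, control = activity[labels == 1], activity[labels == 0]
    se = np.sqrt(case.var(ddof=1) / len(case) + control.var(ddof=1) / len(control))
    return float((case.mean() - control.mean()) / se)


def best_gene_subset(expr: np.ndarray, labels: np.ndarray):
    """Exhaustive stand-in for PSO: the subset with the largest |t|."""
    best_genes, best_score = None, -np.inf
    for size in range(1, expr.shape[0] + 1):
        for genes in combinations(range(expr.shape[0]), size):
            score = abs(t_score(pathway_activity(expr, genes), labels))
            if score > best_score:
                best_genes, best_score = genes, score
    return best_genes, best_score


if __name__ == "__main__":
    labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
    expr = np.array([                       # toy data, 4 genes x 8 samples
        [5, 6, 5, 6, 1, 2, 1, 2],
        [6, 6, 5, 5, 2, 2, 1, 1],
        [1, 9, 2, 8, 9, 1, 8, 2],
        [3, 3, 4, 4, 4, 4, 3, 3],
    ], dtype=float)
    print(best_gene_subset(expr, labels))
    print(abs(t_score(pathway_activity(expr, range(4)), labels)))
```

    In the toy data the two informative genes together give a much larger absolute t-score than the full four-gene pathway, which is the motivation the abstract gives for selecting member genes.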

  11. Benzodiazepines use and breast cancer risk: A population-based study and gene expression profiling evidence.

    PubMed

    Iqbal, Usman; Chang, Tzu-Hao; Nguyen, Phung-Anh; Syed-Abdul, Shabbir; Yang, Hsuan-Chia; Huang, Chih-Wei; Atique, Suleman; Yang, Wei-Chung; Moldovan, Max; Jian, Wen-Shan; Hsu, Min-Huei; Yen, Yun; Jack Li, Yu-Chuan

    2017-08-26

    The aim of this study was to investigate whether long-term use of Benzodiazepines (BZDs) is associated with breast cancer risk through the combination of population-based observational and gene expression profiling evidence. We conducted a population-based case-control study by using the 1998 to 2009 Taiwan National Health Insurance Research Database and investigated the association between BZDs use and breast cancer risk. We selected subjects aged > 20 years and six eligible controls matched for age, sex and the index date (i.e., free of any cancer at the case diagnosis date) by using propensity scores. A bioinformatics analysis approach was also performed for the identification of oncogenesis effects of BZDs on breast cancer. We used breast cancer gene expression data from the Cancer Genome Atlas and perturbagen signatures of BZDs from the Library of Integrated Cellular Signatures database in order to identify the oncogenesis effects of BZDs on breast cancer. We found evidence of increased breast cancer risk for diazepam (OR, 1.16; 95%CI, 0.95-1.42; connectivity score [CS], 0.3016), zolpidem (OR, 1.11; 95%CI, 0.95-1.30; CS, 0.2738), but not for lorazepam (OR, 1.04; 95%CI, 0.89-1.23; CS, -0.2952) consistently in both methods. The finding for alprazolam was contradictory from the two methods. Diazepam and zolpidem trends showed association, although not statistically significant, with breast cancer risk in both epidemiological and bioinformatics analyses outcomes. The methodological value of our study is in introducing the way of combining epidemiological and bioinformatics approaches in order to answer a common scientific question. Combining the two approaches would be a substantial step towards uncovering, validation and further application of previously unknown scientific knowledge to the emerging field of precision medicine informatics. Copyright © 2017. Published by Elsevier Inc.
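
    The phrase "although not statistically significant" corresponds to 95% confidence intervals that include 1. The sketch below shows how an odds ratio and a Wald interval are computed from an unmatched 2×2 table, and how the interval is read. It is a textbook calculation on hypothetical counts, not the matched analysis used in the study, and the function names are assumptions.

```python
import math


def odds_ratio_with_ci(exposed_cases: int, unexposed_cases: int,
                       exposed_controls: int, unexposed_controls: int,
                       z: float = 1.96):
    """Odds ratio and Wald confidence interval from an unmatched 2x2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def interval_includes_one(lower: float, upper: float) -> bool:
    """True when the interval contains the null value OR = 1."""
    return lower <= 1.0 <= upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_with_ci(20, 80, 10, 90))
    # Interval reported in the abstract for diazepam.
    print(interval_includes_one(0.95, 1.42))
```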

  12. Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels.

    PubMed

    Podolsky, Maxim D; Barchuk, Anton A; Kuznetcov, Vladimir I; Gusarova, Natalia F; Gaidukov, Vadim S; Tarakanov, Segrey A

    2016-01-01

    Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in lung cancer type or malignant growth determination lead to degraded treatment efficacy, because anticancer strategy depends on tumor morphology. We have made an attempt to evaluate effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was to execute a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was to make a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), naive Bayes classifier with assumption of both a normal distribution of attributes and a distribution through histograms, support vector machine and C4.5 decision tree. Effectiveness of machine learning algorithms was evaluated with the Matthews correlation coefficient. The support vector machine method showed best results among data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness in the University of Michigan data set. However, the C4.5 decision tree showed best results for the University of Toronto data set. Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
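
    The Matthews correlation coefficient used to score the classifiers can be computed directly from a confusion matrix. The sketch below covers the binary case only (several tasks in the abstract are multi-class), and the function name and the zero-denominator convention are assumptions.

```python
import math


def matthews_corrcoef_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    print(matthews_corrcoef_binary(tp=50, tn=40, fp=0, fn=0))  # perfect classifier
    print(matthews_corrcoef_binary(tp=5, tn=5, fp=5, fn=5))    # chance-level classifier
```

    Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which makes it less misleading on data sets with unbalanced class sizes.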

  13. Classification of Dengue Fever Patients Based on Gene Expression Data Using Support Vector Machines

    PubMed Central

    Khan, Asif M.; Gil, Laura H. V. G.; Marques, Ernesto T. A.; Calzavara-Silva, Carlos E.; Tan, Tin Wee

    2010-01-01

    Background: Symptomatic infection by dengue virus (DENV) can range from dengue fever (DF) to dengue haemorrhagic fever (DHF); however, the determinants of DF or DHF progression are not completely understood. It is hypothesised that host innate immune response factors are involved in modulating the disease outcome and the expression levels of genes involved in this response could be used as early prognostic markers for disease severity. Methodology/Principal Findings: mRNA expression levels of genes involved in DENV innate immune responses were measured using quantitative real time PCR (qPCR). Here, we present a novel application of the support vector machines (SVM) algorithm to analyze the expression pattern of 12 genes in peripheral blood mononuclear cells (PBMCs) of 28 dengue patients (13 DHF and 15 DF) during acute viral infection. The SVM model was trained using gene expression data of these genes and achieved the highest accuracy of ∼85% with leave-one-out cross-validation. Through selective removal of gene expression data from the SVM model, we have identified seven genes (MYD88, TLR7, TLR3, MDA5, IRF3, IFN-α and CLEC5A) that may be central in differentiating DF patients from DHF, with MYD88 and TLR7 observed to be the most important. Though the individual removal of expression data of five other genes had no impact on the overall accuracy, a significant combined role was observed when the SVM model of the two main genes (MYD88 and TLR7) was re-trained to include the five genes, increasing the overall accuracy to ∼96%. Conclusions/Significance: Here, we present a novel use of the SVM algorithm to classify DF and DHF patients, as well as to elucidate the significance of the various genes involved. It was observed that seven genes are critical in classifying DF and DHF patients: TLR3, MDA5, IRF3, IFN-α, CLEC5A, and the two most important MYD88 and TLR7. While these preliminary results are promising, further experimental investigation is necessary to validate

  14. A gene-expression-based neural code for food abundance that modulates lifespan.

    PubMed

    Entchev, Eugeni V; Patel, Dhaval S; Zhan, Mei; Steele, Andrew J; Lu, Hang; Ch'ng, QueeLim

    2015-05-12

    How the nervous system internally represents environmental food availability is poorly understood. Here, we show that quantitative information about food abundance is encoded by combinatorial neuron-specific gene-expression of conserved TGFβ and serotonin pathway components in Caenorhabditis elegans. Crosstalk and auto-regulation between these pathways alters the shape, dynamic range, and population variance of the gene-expression responses of daf-7 (TGFβ) and tph-1 (tryptophan hydroxylase) to food availability. These intricate regulatory features provide distinct mechanisms for TGFβ and serotonin signaling to tune the accuracy of this multi-neuron code: daf-7 primarily regulates gene-expression variability, while tph-1 primarily regulates the dynamic range of gene-expression responses. This code is functional because daf-7 and tph-1 mutations bidirectionally attenuate food level-dependent changes in lifespan. Our results reveal a neural code for food abundance and demonstrate that gene expression serves as an additional layer of information processing in the nervous system to control long-term physiology.

  15. A gene-expression-based neural code for food abundance that modulates lifespan

    PubMed Central

    Entchev, Eugeni V; Patel, Dhaval S; Zhan, Mei; Steele, Andrew J; Lu, Hang; Ch'ng, QueeLim

    2015-01-01

    How the nervous system internally represents environmental food availability is poorly understood. Here, we show that quantitative information about food abundance is encoded by combinatorial neuron-specific gene-expression of conserved TGFβ and serotonin pathway components in Caenorhabditis elegans. Crosstalk and auto-regulation between these pathways alters the shape, dynamic range, and population variance of the gene-expression responses of daf-7 (TGFβ) and tph-1 (tryptophan hydroxylase) to food availability. These intricate regulatory features provide distinct mechanisms for TGFβ and serotonin signaling to tune the accuracy of this multi-neuron code: daf-7 primarily regulates gene-expression variability, while tph-1 primarily regulates the dynamic range of gene-expression responses. This code is functional because daf-7 and tph-1 mutations bidirectionally attenuate food level-dependent changes in lifespan. Our results reveal a neural code for food abundance and demonstrate that gene expression serves as an additional layer of information processing in the nervous system to control long-term physiology. DOI: http://dx.doi.org/10.7554/eLife.06259.001 PMID:25962853

  16. Forensic diagnosis of ante- and postmortem burn based on aquaporin-3 gene expression in the skin.

    PubMed

    Kubo, Hidemichi; Hayashi, Takahito; Ago, Kazutoshi; Ago, Mihoko; Kanekura, Takuro; Ogata, Mamoru

    2014-05-01

    In order to diagnose death associated with fire, it is essential to show that the person was exposed to heat while still alive. We investigated both AQP1 and AQP3 expression in the skin of an experimental burn model, as well as in forensic autopsy cases, and discuss its role in the differential diagnosis of ante- and postmortem burns. In animal experiments, there was no difference in AQP1 gene expression among four groups (n=4): antemortem burn, postmortem burn, mechanical wound, and control. However, AQP3 expression in the antemortem burn was increased significantly compared with that of the other groups even at 5 min after burn. Water content of the skin was decreased significantly by the burn procedure. Consistent with animal experiments, AQP3 gene expression in the skin of antemortem burn cases was increased significantly compared with postmortem burns, mechanical wounds, and controls (n=12 in each group). These observations suggest that dermal AQP3 gene expression was increased to maintain water homeostasis in response to dehydration from burn. Finally, our results suggest that AQP3 gene expression may be useful for forensic molecular diagnosis of antemortem burn.

  17. Gene expression profile in cardiovascular disease and preeclampsia: a meta-analysis of the transcriptome based on raw data from human studies deposited in Gene Expression Omnibus.

    PubMed

    Sitras, V; Fenton, C; Acharya, G

    2015-02-01

    Cardiovascular disease (CVD) and preeclampsia (PE) share common clinical features. We aimed to identify common transcriptomic signatures involved in CVD and PE in humans. Meta-analysis of individual raw microarray data deposited in GEO, obtained from blood samples of patients with CVD versus controls and placental samples from women with PE versus healthy women with uncomplicated pregnancies. Annotation of cases versus control samples was taken directly from the microarray documentation. Genes that showed a significant differential expression in the majority of experiments were selected for subsequent analysis. Hypergeometric gene list analysis was performed using Bioconductor GOstats package. Bioinformatic analysis was performed in PANTHER. Seven studies in CVD and 5 studies in PE were eligible for meta-analysis. A total of 181 genes were found to be differentially expressed in microarray studies investigating gene expression in blood samples obtained from patients with CVD compared to controls and 925 genes were differentially expressed between preeclamptic and healthy placentas. Among these differentially expressed genes, 22 were common between CVD and PE. Bioinformatic analysis of these genes revealed oxidative stress, p53 pathway feedback, inflammation mediated by chemokines and cytokines, interleukin signaling, B-cell activation, PDGF signaling, Wnt signaling, integrin signaling and Alzheimer disease pathways to be involved in the pathophysiology of both CVD and PE. Metabolism, development, response to stimulus, immune response and cell communication were the associated biologic processes in both conditions. Gene set enrichment analysis showed the following overlapping pathways between CVD and PE: TGF-β-signaling, apoptosis, graft-versus-host disease, allograft rejection, chemokine signaling, steroid hormone synthesis, type I and II diabetes mellitus, VEGF signaling, pathways in cancer, GNRH signaling, Huntington's disease and Notch signaling. CVD and PE

  18. Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

    PubMed

    Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

    2015-12-01

    Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method that predicts, with high accuracy, the carcinogenicity of chemicals that target the liver. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. The gene expression data obtained were applied to CARCINOscreen®. Predictive scores obtained by CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggest that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting an appropriate prediction-score boundary to classify the chemicals into carcinogens and non-carcinogens.
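
    The boundary-based classification described in this abstract can be illustrated with a minimal sketch. It is not the CARCINOscreen® scoring method, which the abstract does not describe; it only shows how a single cutoff separates scores in the reported ranges (> 2 for carcinogens, below 0.5 for non-carcinogens). The function name, the default boundary of 1.0, and the example scores are hypothetical.

```python
def classify_by_score(score, boundary=1.0):
    """Label a chemical from its prediction score using a single cutoff.

    The default boundary of 1.0 is an assumption for illustration; any value
    between 0.5 and 2 would separate the score ranges reported in the abstract.
    """
    return "carcinogen" if score > boundary else "non-carcinogen"


# Hypothetical scores, chosen only to fall inside the reported ranges.
example_scores = {"compound_A": 2.6, "compound_B": 0.2}
labels = {name: classify_by_score(s) for name, s in example_scores.items()}
```

    Because the two reported score ranges do not overlap, the choice of boundary within the gap does not change the labels in this example; with real data the boundary would need to be chosen per strain, as the abstract suggests.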

  19. Expression of cell cycle-regulated genes and prostate cancer prognosis in a population-based cohort.

    PubMed

    Rubicz, Rohina; Zhao, Shanshan; April, Craig; Wright, Jonathan L; Kolb, Suzanne; Coleman, Ilsa; Lin, Daniel W; Nelson, Peter S; Ostrander, Elaine A; Feng, Ziding; Fan, Jian-Bing; Stanford, Janet L

    2015-09-01

    Prostate cancer (PCa) is clinically and biologically heterogeneous, making it difficult to predict at detection whether it will take an indolent or aggressive disease course. Cell cycle-regulated genes may be more highly expressed in actively dividing cells, with transcript levels reflecting tumor growth rate. Here, we evaluated expression of cell cycle genes in relation to PCa outcomes in a population-based cohort. Gene expression data were generated from tumor tissues obtained at radical prostatectomy for 383 population-based patients (12.3-year average follow-up). The overall mean and individual transcript levels of 30 selected cell cycle genes were compared between patients with no evidence of recurrence (73%) and those who recurred (27%) or died (7%) from PCa. The multivariate adjusted hazard ratio (HR) for a change from the 25th to 75th percentile of mean gene expression level (range 8.02-10.05) was 1.25 (95% CI 0.96-1.63; P = 0.10) for PCa recurrence risk, and did not vary substantially by Gleason score, TMPRSS2-ERG fusion status, or family history of PCa. For lethal PCa, the HR for a change (25th to 75th percentile) in mean gene expression level was 2.04 (95% CI 1.26-3.31; P = 0.004), adjusted for clinicopathological variables. The ROC curve for mean gene expression level alone (AUC = 0.740) did not perform as well as clinicopathological variables alone (AUC = 0.803) for predicting lethal PCa, and the addition of mean gene expression to clinicopathological variables did not substantially improve prediction (AUC = 0.827; P = 0.18). Higher TK1 expression was strongly associated with both recurrent (P = 6.7 × 10^-5) and lethal (P = 6.4 × 10^-6) PCa. Mean expression level for 30 selected cell cycle-regulated genes was unrelated to recurrence risk, but was associated with a twofold increase in risk of lethal PCa. However, gene expression had less discriminatory accuracy than clinical variables alone for predicting lethal

  20. Development of a plant viral-vector-based gene expression assay for the screening of yeast cytochrome p450 monooxygenases.

    PubMed

    Hanley, Kathleen; Nguyen, Long V; Khan, Faizah; Pogue, Gregory P; Vojdani, Fakhrieh; Panda, Sanjay; Pinot, Franck; Oriedo, Vincent B; Rasochova, Lada; Subramanian, Mani; Miller, Barbara; White, Earl L

    2003-02-01

    Development of a gene discovery tool for heterologously expressed cytochrome P450 monooxygenases has been inherently difficult. The activity assays are labor-intensive and not amenable to parallel screening. Additionally, biochemical confirmation requires coexpression of a homologous P450 reductase or complementary heterologous activity. Plant virus gene expression systems have been utilized for a diverse group of organisms. In this study we describe a method using an RNA vector expression system to phenotypically screen for cytochrome P450-dependent fatty acid omega-hydroxylase activity. Yarrowia lipolytica CYP52 gene family members involved in n-alkane assimilation were amplified from genomic DNA, cloned into a plant virus gene expression vector, and used as a model system for determining heterologous expression. Plants infected with virus vectors expressing the yeast CYP52 genes (YlALK1-YlALK7) showed a distinct necrotic lesion phenotype on inoculated plant leaves. No phenotype was detected on negative control constructs. YlALK3-, YlALK5-, and YlALK7-inoculated plants all catalyzed the terminal hydroxylation of lauric acid as confirmed using thin-layer and gas chromatography/mass spectrometry methods. The plant-based cytochrome P450 phenotypic screen was tested on an n-alkane-induced Yarrowia lipolytica plant virus expression library. A subset of 1,025 random library clones, including YlALK1-YlALK7 constructs, were tested on plants. All YlALK gene constructs scored positive in the randomized screen. Following nucleotide sequencing of the clones that scored positive using a phenotypic screen, approximately 5% were deemed appropriate for further biochemical analysis. This report illustrates the utility of a plant-based system for expression of heterologous cytochrome P450 monooxygenases and for the assignment of gene function.

  1. Effect of Long-Term Storage in TRIzol on Microarray-Based Gene Expression Profiling

    PubMed Central

    Ma, Wencai; Wang, Michael; Wang, Zhi-Qiang; Sun, Luhong; Graber, David; Matthews, Jairo; Champlin, Richard; Yi, Qing; Orlowski, Robert Z.; Kwak, Larry W.; Weber, Donna M.; Thomas, Sheeba K.; Shah, Jatin; Kornblau, Steven; Davis, R. Eric

    2010-01-01

    Background Although TRIzol is widely used for preservation and isolation of RNA, there is suspicion that prolonged sample storage in TRIzol may affect array-based gene expression profiling (GEP), via premature termination during reverse transcription (RT). Methods GEP on Illumina arrays compared paired aliquots (cryopreserved or stored in TRIzol) of primary samples of multiple myeloma (MM) and acute myeloid leukemia (AML). Data were analyzed at the “probe level” (a single consensus value) or “bead level” (multiple measurements provided by individual beads). Results TRIzol storage does not affect standard probe-level comparisons between sample groups: different preservation methods did not generate differentially-expressed probes (DEPs) within MM or AML sample groups, or substantially affect the many DEPs distinguishing between these groups. Differences were found by gene set enrichment analysis, but were dismissible because of instability with permutation of sample labels, unbalanced restriction to TRIzol aliquots, inconsistency between MM and AML groups, and lack of biological plausibility. Bead-level comparisons found many DEPs within sample pairs, but most (73%) were <2-fold changed. There was no consistent evidence that TRIzol causes premature RT termination. Instead, a subset of DEPs were systematically due to increased signals in TRIzol-preserved samples from probes near the 5’ end of transcripts, suggesting better mRNA preservation with TRIzol. Conclusions TRIzol preserves RNA quality well, without a deleterious effect on GEP. Samples stored frozen with and without TRIzol may be compared by GEP with only minor concern for systematic artifacts. Impact The standard practice of prolonged sample storage in TRIzol is suitable for GEP. PMID:20805315

  2. Microfluidic droplet-based PCR instrumentation for high-throughput gene expression profiling and biomarker discovery

    PubMed Central

    Hayes, Christopher J.; Dalton, Tara M.

    2015-01-01

    PCR is a common and often indispensable technique used in medical and biological research labs for a variety of applications. Real-time quantitative PCR (RT-qPCR) has become a definitive technique for quantitating differences in gene expression levels between samples. Yet, in spite of this importance, reliable methods to quantitate nucleic acid amounts in a higher throughput remain elusive. In the following paper, a unique design to quantify gene expression levels at the nanoscale in a continuous flow system is presented. Fully automated, high-throughput, low volume amplification of deoxynucleotides (DNA) in a droplet based microfluidic system is described. Unlike some conventional qPCR instrumentation that use integrated fluidic circuits or plate arrays, the instrument performs qPCR in a continuous, micro-droplet flowing process with droplet generation, distinctive reagent mixing, thermal cycling and optical detection platforms all combined on one complete instrument. Detailed experimental profiling of reactions of less than 300 nl total volume is achieved using the platform demonstrating the dynamic range to be 4 order logs and consistent instrument sensitivity. Furthermore, reduced pipetting steps by as much as 90% and a unique degree of hands-free automation makes the analytical possibilities for this instrumentation far reaching. In conclusion, a discussion of the first demonstrations of this approach to perform novel, continuous high-throughput biological screens is presented. The results generated from the instrument, when compared with commercial instrumentation, demonstrate the instrument reliability and robustness to carry out further studies of clinical significance with added throughput and economic benefits. PMID:27077035

  3. In Silico Analysis of Microarray-Based Gene Expression Profiles Predicts Tumor Cell Response to Withanolides

    PubMed Central

    Efferth, Thomas; Greten, Henry Johannes

    2012-01-01

    Withania somnifera (L.) Dunal (Indian ginseng, winter cherry, Solanaceae) is widely used in traditional medicine. Roots are either chewed or used to prepare beverages (aqueous decocts). The major secondary metabolites of Withania somnifera are the withanolides, which are C-28-steroidal lactone triterpenoids. Withania somnifera extracts exert chemopreventive and anticancer activities in vitro and in vivo. The aims of the present in silico study were, firstly, to investigate whether tumor cells develop cross-resistance between standard anticancer drugs and withanolides and, secondly, to elucidate the molecular determinants of sensitivity and resistance of tumor cells towards withanolides. Using IC50 concentrations of eight different withanolides (withaferin A, withaferin A diacetate, 3-azerininylwithaferin A, withafastuosin D diacetate, 4-B-hydroxy-withanolide E, isowithanololide E, withafastuosin E, and withaperuvin) and 19 established anticancer drugs, we analyzed the cross-resistance profile of 60 tumor cell lines. The cell lines revealed cross-resistance between the eight withanolides. Consistent cross-resistance between withanolides and nitrosoureas (carmustin, lomustin, and semimustin) was also observed. Then, we performed transcriptomic microarray-based COMPARE and hierarchical cluster analyses of mRNA expression to identify mRNA expression profiles predicting sensitivity or resistance towards withanolides. Genes from diverse functional groups were significantly associated with response of tumor cells to withaferin A diacetate, e.g. genes functioning in DNA damage and repair, stress response, cell growth regulation, extracellular matrix components, cell adhesion and cell migration, constituents of the ribosome, cytoskeletal organization and regulation, signal transduction, transcription factors, and others. PMID:27605335

  4. ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions

    PubMed Central

    Tan, Jie; Hammond, John H.; Hogan, Deborah A.

    2016-01-01

    ABSTRACT The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (analysis using denoising autoencoders of gene expression), and apply it to the publicly available gene expression data compendium for Pseudomonas aeruginosa. In this approach, the machine-learned ADAGE model contained 50 nodes which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, co-operonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments and open source code for use with other species and settings. Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species.
IMPORTANCE The quantity and breadth of genome-scale data sets that examine RNA expression in diverse

  5. ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions.

    PubMed

    Tan, Jie; Hammond, John H; Hogan, Deborah A; Greene, Casey S

    2016-01-01

    The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (analysis using denoising autoencoders of gene expression), and apply it to the publicly available gene expression data compendium for Pseudomonas aeruginosa. In this approach, the machine-learned ADAGE model contained 50 nodes which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, co-operonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments and open source code for use with other species and settings. Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species.
IMPORTANCE The quantity and breadth of genome-scale data sets that examine RNA expression in diverse bacterial

  6. A Cas9-based toolkit to program gene expression in Saccharomyces cerevisiae

    PubMed Central

    Reider Apel, Amanda; d'Espaux, Leo; Wehrs, Maren; Sachs, Daniel; Li, Rachel A.; Tong, Gary J.; Garber, Megan; Nnadi, Oge; Zhuang, William; Hillson, Nathan J.; Keasling, Jay D.; Mukhopadhyay, Aindrila

    2017-01-01

    Despite the extensive use of Saccharomyces cerevisiae as a platform for synthetic biology, strain engineering remains slow and laborious. Here, we employ CRISPR/Cas9 technology to build a cloning-free toolkit that addresses commonly encountered obstacles in metabolic engineering, including chromosomal integration locus and promoter selection, as well as protein localization and solubility. The toolkit includes 23 Cas9-sgRNA plasmids, 37 promoters of various strengths and temporal expression profiles, and 10 protein-localization, degradation and solubility tags. We facilitated the use of these parts via a web-based tool, that automates the generation of DNA fragments for integration. Our system builds upon existing gene editing methods in the thoroughness with which the parts are standardized and characterized, the types and number of parts available and the ease with which our methodology can be used to perform genetic edits in yeast. We demonstrated the applicability of this toolkit by optimizing the expression of a challenging but industrially important enzyme, taxadiene synthase (TXS). This approach enabled us to diagnose an issue with TXS solubility, the resolution of which yielded a 25-fold improvement in taxadiene production. PMID:27899650

  7. A Cas9-based toolkit to program gene expression in Saccharomyces cerevisiae

    SciTech Connect

    Reider Apel, Amanda; d'Espaux, Leo; Wehrs, Maren; Sachs, Daniel; Li, Rachel A.; Tong, Gary J.; Garber, Megan; Nnadi, Oge; Zhuang, William; Hillson, Nathan J.; Keasling, Jay D.; Mukhopadhyay, Aindrila

    2016-11-28

    Despite the extensive use of Saccharomyces cerevisiae as a platform for synthetic biology, strain engineering remains slow and laborious. Here, we employ CRISPR/Cas9 technology to build a cloning-free toolkit that addresses commonly encountered obstacles in metabolic engineering, including chromosomal integration locus and promoter selection, as well as protein localization and solubility. The toolkit includes 23 Cas9-sgRNA plasmids, 37 promoters of various strengths and temporal expression profiles, and 10 protein-localization, degradation and solubility tags. We facilitated the use of these parts via a web-based tool, that automates the generation of DNA fragments for integration. Our system builds upon existing gene editing methods in the thoroughness with which the parts are standardized and characterized, the types and number of parts available and the ease with which our methodology can be used to perform genetic edits in yeast. We demonstrated the applicability of this toolkit by optimizing the expression of a challenging but industrially important enzyme, taxadiene synthase (TXS). This approach enabled us to diagnose an issue with TXS solubility, the resolution of which yielded a 25-fold improvement in taxadiene production.

  8. A Cas9-based toolkit to program gene expression in Saccharomyces cerevisiae.

    PubMed

    Reider Apel, Amanda; d'Espaux, Leo; Wehrs, Maren; Sachs, Daniel; Li, Rachel A; Tong, Gary J; Garber, Megan; Nnadi, Oge; Zhuang, William; Hillson, Nathan J; Keasling, Jay D; Mukhopadhyay, Aindrila

    2017-01-09

    Despite the extensive use of Saccharomyces cerevisiae as a platform for synthetic biology, strain engineering remains slow and laborious. Here, we employ CRISPR/Cas9 technology to build a cloning-free toolkit that addresses commonly encountered obstacles in metabolic engineering, including chromosomal integration locus and promoter selection, as well as protein localization and solubility. The toolkit includes 23 Cas9-sgRNA plasmids, 37 promoters of various strengths and temporal expression profiles, and 10 protein-localization, degradation and solubility tags. We facilitated the use of these parts via a web-based tool, that automates the generation of DNA fragments for integration. Our system builds upon existing gene editing methods in the thoroughness with which the parts are standardized and characterized, the types and number of parts available and the ease with which our methodology can be used to perform genetic edits in yeast. We demonstrated the applicability of this toolkit by optimizing the expression of a challenging but industrially important enzyme, taxadiene synthase (TXS). This approach enabled us to diagnose an issue with TXS solubility, the resolution of which yielded a 25-fold improvement in taxadiene production.

  9. A Cas9-based toolkit to program gene expression in Saccharomyces cerevisiae

    DOE PAGES

    Reider Apel, Amanda; d'Espaux, Leo; Wehrs, Maren; ...

    2016-11-28

    Despite the extensive use of Saccharomyces cerevisiae as a platform for synthetic biology, strain engineering remains slow and laborious. Here, we employ CRISPR/Cas9 technology to build a cloning-free toolkit that addresses commonly encountered obstacles in metabolic engineering, including chromosomal integration locus and promoter selection, as well as protein localization and solubility. The toolkit includes 23 Cas9-sgRNA plasmids, 37 promoters of various strengths and temporal expression profiles, and 10 protein-localization, degradation and solubility tags. We facilitated the use of these parts via a web-based tool, that automates the generation of DNA fragments for integration. Our system builds upon existing gene editing methods in the thoroughness with which the parts are standardized and characterized, the types and number of parts available and the ease with which our methodology can be used to perform genetic edits in yeast. We demonstrated the applicability of this toolkit by optimizing the expression of a challenging but industrially important enzyme, taxadiene synthase (TXS). This approach enabled us to diagnose an issue with TXS solubility, the resolution of which yielded a 25-fold improvement in taxadiene production.

  10. Analysis of differentially co-expressed genes based on microarray data of hepatocellular carcinoma.

    PubMed

    Wang, Y; Jiang, T; Li, Z; Lu, L; Zhang, R; Zhang, D; Wang, X; Tan, J

    2017-01-01

    Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. Although great progress in the diagnosis and management of HCC has been made, the exact molecular mechanisms remain poorly understood. The study aims to identify potential biomarkers for HCC progression, mainly at the transcriptional level. In this study, chip data GSE29721 was utilized, which contains 10 HCC samples and 10 normal adjacent tissue samples. Differentially expressed genes (DEGs) between the two sample types were selected by t-test. Next, the differentially co-expressed genes (DCGs) and differentially co-expressed links (DCLs) were identified with the DCGL package in R with a threshold of q < 0.25. Afterwards, pathway enrichment analysis of the DCGs was carried out with DAVID. Then, DCLs were mapped to the TRANSFAC database to reveal associations between relevant transcription factors (TFs) and their target genes. Quantitative real-time RT-PCR was performed for TFs or genes of interest. As a result, a total of 388 DCGs and 35,771 DCLs were obtained. The predominant pathways enriched by these genes were cytokine-cytokine receptor interaction, ECM-receptor interaction and the TGF-β signaling pathway. Three TF-target interactions, LEF1-NCAM1, EGR1-FN1 and FOS-MT2A, were predicted. Compared with control, expression of the TF genes EGR1, FOS and ETS2 was up-regulated in the HCC cell line HepG2, while LEF1 was down-regulated. Except for NCAM1, all the target genes were up-regulated in HepG2. Our findings suggest these TFs and genes might play important roles in the pathogenesis of HCC and may be used as therapeutic targets for HCC management.
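
    The q < 0.25 threshold mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how q-values were computed, so this assumes the Benjamini-Hochberg adjustment, a common choice; it is not the DCGL package's implementation. The function names and the gene labels in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_by_q(gene_p_values, threshold=0.25):
    """Keep the genes whose adjusted q-value is below the threshold."""
    genes = list(gene_p_values)
    q_values = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q < threshold]
```

    For example, select_by_q({"g1": 0.01, "g2": 0.04, "g3": 0.03, "g4": 0.5}) keeps g1, g2 and g3, whose adjusted values fall below 0.25, and drops g4.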

  11. shinyGEO: a web-based application for analyzing gene expression omnibus datasets.

    PubMed

    Dumas, Jasmine; Gargano, Michael A; Dancik, Garrett M

    2016-12-01

    The Gene Expression Omnibus (GEO) is a public repository of gene expression data. Although GEO has its own tool, GEO2R, for data analysis, evaluation of single genes is not straightforward and survival analysis in specific GEO datasets is not possible without bioinformatics expertise. We describe a web application, shinyGEO, that allows a user to download gene expression data sets directly from GEO in order to perform differential expression and survival analysis for a gene of interest. In addition, shinyGEO supports customized graphics, sample selection, data export and R code generation so that all analyses are reproducible. The availability of shinyGEO makes GEO datasets more accessible to non-bioinformaticians, promising to lead to better understanding of biological processes and genetic diseases such as cancer. Web application and source code are available from http://gdancik.github.io/shinyGEO/ CONTACT: dancikg@easternct.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  12. Host-Based Peripheral Blood Gene Expression Analysis for Diagnosis of Infectious Diseases.

    PubMed

    Holcomb, Zachary E; Tsalik, Ephraim L; Woods, Christopher W; McClain, Micah T

    2017-02-01

    Emerging pandemic infectious threats, inappropriate antibacterial use contributing to multidrug resistance, and increased morbidity and mortality from diagnostic delays all contribute to a need for improved diagnostics in the field of infectious diseases. Historically, diagnosis of infectious diseases has relied on pathogen detection; however, a novel concept to improve diagnostics in infectious diseases relies instead on the detection of changes in patterns of gene expression in circulating white blood cells in response to infection. Alterations in peripheral blood gene expression in the infected state are robust and reproducible, yielding diagnostic and prognostic information to help facilitate patient treatment decisions.

  13. A blood-based gene expression and signaling pathway analysis to differentiate between high and low grade gliomas.

    PubMed

    Ponnampalam, Stephen N; Kamaluddin, Nor Rizan; Zakaria, Zubaidah; Matheneswaran, Vickneswaran; Ganesan, Dharmendra; Haspani, Mohammed Saffari; Ryten, Mina; Hardy, John A

    2017-01-01

    The aims of the present study were to undertake gene expression profiling of the blood of glioma patients to determine key genetic components of signaling pathways and to develop a panel of genes that could be used as a potential blood-based biomarker to differentiate between high and low grade gliomas, non-gliomas and control samples. In this study, blood samples were obtained from glioma patients, non-glioma and control subjects. Ten samples each were obtained from patients with high and low grade tumours, respectively, ten samples from non-glioma patients and twenty samples from control subjects. Total RNA was isolated from each sample after which first and second strand synthesis was performed. The resulting cRNA was then hybridized with the Agilent Whole Human Genome (4x44K) microarray chip according to the manufacturer's instructions. Universal Human Reference RNA and samples were labeled with Cy3 CTP and Cy5 CTP, respectively. Microarray data were analyzed by the Agilent Gene Spring 12.1V software using stringent criteria which included at least a 2-fold difference in gene expression between samples. Statistical analysis was performed using the unpaired Student's t-test with a p<0.01. Pathway enrichment was also performed, with key genes selected for validation using droplet digital polymerase chain reaction (ddPCR). The gene expression profiling indicated that there were a substantial number of genes that were differentially expressed with more than a 2-fold change (p<0.01) between each of the four different conditions. We selected key genes within significant pathways that were analyzed through pathway enrichment. These key genes included regulators of cell proliferation, transcription factors, cytokines and tumour suppressor genes. In the present study, we showed that key genes involved in significant and well established pathways, could possibly be used as a potential blood-based biomarker to differentiate between high and low grade gliomas, non-gliomas and

  14. A blood-based gene expression and signaling pathway analysis to differentiate between high and low grade gliomas

    PubMed Central

    Ponnampalam, Stephen N.; Kamaluddin, Nor Rizan; Zakaria, Zubaidah; Matheneswaran, Vickneswaran; Ganesan, Dharmendra; Haspani, Mohammed Saffari; Ryten, Mina; Hardy, John A.

    2016-01-01

    The aims of the present study were to undertake gene expression profiling of the blood of glioma patients to determine key genetic components of signaling pathways and to develop a panel of genes that could be used as a potential blood-based biomarker to differentiate between high and low grade gliomas, non-gliomas and control samples. In this study, blood samples were obtained from glioma patients, non-glioma and control subjects. Ten samples each were obtained from patients with high and low grade tumours, respectively, ten samples from non-glioma patients and twenty samples from control subjects. Total RNA was isolated from each sample after which first and second strand synthesis was performed. The resulting cRNA was then hybridized with the Agilent Whole Human Genome (4×44K) microarray chip according to the manufacturer's instructions. Universal Human Reference RNA and samples were labeled with Cy3 CTP and Cy5 CTP, respectively. Microarray data were analyzed by the Agilent Gene Spring 12.1V software using stringent criteria which included at least a 2-fold difference in gene expression between samples. Statistical analysis was performed using the unpaired Student's t-test with a P<0.01. Pathway enrichment was also performed, with key genes selected for validation using droplet digital polymerase chain reaction (ddPCR). The gene expression profiling indicated that there were a substantial number of genes that were differentially expressed with more than a 2-fold change (P<0.01) between each of the four different conditions. We selected key genes within significant pathways that were analyzed through pathway enrichment. These key genes included regulators of cell proliferation, transcription factors, cytokines and tumour suppressor genes. In the present study, we showed that key genes involved in significant and well established pathways, could possibly be used as a potential blood-based biomarker to differentiate between high and low grade gliomas, non-gliomas and

  15. Pathway and gene ontology based analysis of gene expression in a rat model of cerebral ischemic tolerance.

    PubMed

    Feng, Zheng; Davis, Daniel P; Sásik, Roman; Patel, Hemal H; Drummond, John C; Patel, Piyush M

    2007-10-26

    Ischemic tolerance is a phenomenon whereby a sublethal ischemic insult [ischemic preconditioning (IPC)] provides robust protection against subsequent lethal ischemia. Activation of N-methyl-D-aspartate (NMDA) receptors and subsequent new gene transcription are required for tolerance. We utilized the NMDA antagonist, MK801, prior to the IPC stimulus to separate candidate genes from epiphenomena. Rats were divided into four groups: vehicle/IPC (preconditioned), MK801/IPC (attenuated preconditioning), vehicle/sham (non-preconditioned), and MK801/sham (non-preconditioned). Hippocampi (5/group/time point) were harvested immediately after ischemia as well as 1, 4, and 24 h post-ischemia to profile gene expression patterns using microarray analyses. Extracted mRNAs were pooled and subsequently hybridized to Affymetrix arrays. In addition, groups of rats were sacrificed for Western blot analysis and histological studies. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) analyses were used to identify functionally related groups of genes whose modulation was statistically significant, while hierarchical cluster analysis was used to visualize the fold expression within these groups. Significantly modulated pathways included: MAP kinase signaling pathway, Toll receptor pathway, TGF-beta signaling pathways, and pathways associated with ribosome function and oxidative phosphorylation. Our data suggest that the tolerant brain responds to subsequent ischemic stress by partially downregulating inflammatory and upregulating protein synthesis and energy metabolism pathways.
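
    Pathway and GO analyses of this kind often score each gene group with a hypergeometric over-representation test. The abstract does not state which test the authors used, so the sketch below is only an assumed, generic illustration of that test. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for pathway or GO term enrichment.

    Probability of seeing at least n_overlap term members when n_selected
    genes are drawn without replacement from n_background genes, of which
    n_in_term belong to the term.
    """
    max_overlap = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    # math.comb returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

    With a background of 10 genes, 4 of them in a pathway, and 3 modulated genes of which 2 are in the pathway, the function returns 40/120, that is one third. In practice the p-values for all tested terms would also be corrected for multiple testing.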

  16. Study of gene function based on spatial co-expression in a high-resolution mouse brain atlas

    PubMed Central

    Liu, Zheng; Yan, S Frank; Walker, John R; Zwingman, Theresa A; Jiang, Tao; Li, Jing; Zhou, Yingyao

    2007-01-01

    Background The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the underlying rich information of this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we studied the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions. Results In order to address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed histogram-row-column (HRC) algorithm. We demonstrated how the HRC algorithm offers the sensitivity of identifying a manageable number of gene pairs based on automatic pattern searching from an original large brain image collection. This tool enables us to quickly identify genes of similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories. Conclusion Given a query brain image, HRC is a fully automated algorithm that is able to quickly mine vast number of brain images and identify a manageable subset of genes that potentially shares similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of certain gene function. Databases such as ABA provide

  17. Study of gene function based on spatial co-expression in a high-resolution mouse brain atlas.

    PubMed

    Liu, Zheng; Yan, S Frank; Walker, John R; Zwingman, Theresa A; Jiang, Tao; Li, Jing; Zhou, Yingyao

    2007-04-16

    The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the underlying rich information of this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we studied the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions. In order to address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed histogram-row-column (HRC) algorithm. We demonstrated how the HRC algorithm offers the sensitivity of identifying a manageable number of gene pairs based on automatic pattern searching from an original large brain image collection. This tool enables us to quickly identify genes of similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories. Given a query brain image, HRC is a fully automated algorithm that is able to quickly mine vast number of brain images and identify a manageable subset of genes that potentially shares similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of certain gene function. Databases such as ABA provide valuable data source for

  18. Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO.

    PubMed

    Zuo, Yiming; Cui, Yi; Yu, Guoqiang; Li, Ruijiang; Ressom, Habtom W

    2017-02-10

    Conventional differential gene expression analysis by methods such as Student's t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate the rewiring interactions in disease versus control groups. In this paper, we apply the weighted graphical LASSO (wgLASSO) algorithm to integrate a data-driven network model with prior biological knowledge (i.e., protein-protein interactions) for biological network inference. We propose a novel differentially weighted graphical LASSO (dwgLASSO) algorithm that builds group-specific networks and performs network-based differential gene expression analysis to select biomarker candidates by considering their topological differences between the groups. Through simulation, we showed that wgLASSO can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection, graphical LASSO), even when only a moderate level of information is available as prior biological knowledge. We evaluated the performance of dwgLASSO for survival time prediction using two microarray breast cancer datasets previously reported by Bild et al. and van de Vijver et al. Compared with the top 10 significant genes selected by a conventional differential gene expression analysis method, the top 10 significant genes selected by dwgLASSO in the dataset from Bild et al. led to a significantly improved survival time prediction in the independent dataset from van de Vijver et al. Among the 10 genes selected by dwgLASSO, UBE2S, SALL2, XBP1 and KIAA0922 have been confirmed by literature survey to be highly relevant in breast cancer biomarker discovery studies. Additionally, we tested dwgLASSO on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC) on tumor samples and their corresponding non-tumorous liver tissues. Improved

  19. A novel mutual information-based Boolean network inference method from time-series gene expression data

    PubMed Central

    Barman, Shohag; Kwon, Yung-Keun

    2017-01-01

    Background Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. Results In this study, we employed a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods, REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Conclusions Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network. PMID:28178334
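
    The first step described above, ranking candidate regulators by mutual information, can be sketched roughly as follows. This is a minimal illustration and not the MIBNI implementation: it scores each gene's state at time t against the target's state at time t+1 and omits the iterative swapping stage. The function names and the toy Boolean series are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_regulators(series, target):
    """Rank genes by MI between their state at time t and the target's state at t+1."""
    future = series[target][1:]
    scores = {
        gene: mutual_information(states[:-1], future)
        for gene, states in series.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: gene C copies gene A with a one-step delay.
series = {
    "A": [0, 1, 1, 0, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 1, 1, 0, 0],
    "C": [0, 0, 1, 1, 0, 1, 0, 0],
}
ranking = rank_regulators(series, "C")
print(ranking[0])
```

    In this toy series the delayed copy makes A the top-ranked candidate regulator of C; real expression data would first need to be binarized, a step the sketch assumes has already been done.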

  20. Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation.

    PubMed

    Richard, Arianne C; Lyons, Paul A; Peters, James E; Biasci, Daniele; Flint, Shaun M; Lee, James C; McKinney, Eoin F; Siegel, Richard M; Smith, Kenneth G C

    2014-08-04

    Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study. Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this "gold-standard" comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues. Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.

  1. The epithelial sodium channel γ-subunit gene and blood pressure: family based association, renal gene expression, and physiological analyses.

    PubMed

    Büsst, Cara J; Bloomer, Lisa D S; Scurrah, Katrina J; Ellis, Justine A; Barnes, Timothy A; Charchar, Fadi J; Braund, Peter; Hopkins, Paul N; Samani, Nilesh J; Hunt, Steven C; Tomaszewski, Maciej; Harrap, Stephen B

    2011-12-01

    Variants in the gene encoding the γ-subunit of the epithelial sodium channel (SCNN1G) are associated with both Mendelian and quantitative effects on blood pressure. Here, in 4 cohorts of 1611 white European families composed of a total of 8199 individuals, we undertook staged testing of candidate single-nucleotide polymorphisms for SCNN1G (supplemented with imputation based on data from the 1000 Genomes Project) followed by a meta-analysis in all of the families of the strongest candidate. We also examined relationships between the genotypes and relevant intermediate renal phenotypes, as well as expression of SCNN1G in human kidneys. We found that an intronic single-nucleotide polymorphism of SCNN1G (rs13331086) was significantly associated with age-, sex-, and body mass index-adjusted blood pressure in each of the 4 populations (P<0.05). In an inverse variance-weighted meta-analysis of this single-nucleotide polymorphism in all 4 of the populations, each additional minor allele copy was associated with a 1-mm Hg increase in systolic blood pressure and 0.52-mm Hg increase in diastolic blood pressure (SE=0.33, P=0.002 for systolic blood pressure; SE=0.21, P=0.011 for diastolic blood pressure). The same allele was also associated with higher 12-hour overnight urinary potassium excretion (P=0.04), consistent with increased epithelial sodium channel activity. Renal samples from hypertensive subjects showed a nonsignificant (P=0.07) 1.7-fold higher expression of SCNN1G compared with normotensive controls. These data provide genetic and phenotypic evidence in support of a role for a common genetic variant of SCNN1G in blood pressure determination.
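
    For readers unfamiliar with inverse variance-weighted meta-analysis, the pooling step can be sketched as below, assuming a fixed-effect model and a normal approximation for the P value. The per-cohort effect sizes and standard errors are hypothetical placeholders; the abstract reports only the pooled results.

```python
from math import erfc, sqrt


def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect pooled estimate, its standard error, and a two-sided P value."""
    weights = [1.0 / se ** 2 for se in std_errors]  # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return pooled, pooled_se, p_value


# Hypothetical per-cohort effects (mm Hg per minor allele) and standard errors.
cohort_effects = [1.2, 0.8, 1.1, 0.9]
cohort_ses = [0.7, 0.6, 0.8, 0.6]
pooled, pooled_se, p_value = inverse_variance_meta(cohort_effects, cohort_ses)
print(round(pooled, 2), round(pooled_se, 2))
```

    Cohorts with smaller standard errors receive larger weights, so the pooled standard error is always smaller than that of any single cohort.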

  2. Positively regulated glycerol/G3P-dependent Bacillus subtilis gene expression system based on anti-termination.

    PubMed

    Lewin, Anna; Su, Xiao-Dong; Hederstedt, Lars

    2009-01-01

    Plasmid pLALA was constructed for glycerol or glycerol-3-phosphate inducible plasmid-borne gene expression in Bacillus subtilis and closely related Gram-positive bacteria. Gene expression using pLALA is based on anti-termination of transcription and involves the B. subtilis GlpP protein that in the presence of glycerol-3-phosphate acts as an anti-terminator protein by binding to the 5'-untranslated region of glpD mRNA. Properties and the usefulness of the system, denoted LALA, were validated by inducible production in B. subtilis strains of two water-soluble proteins (beta-galactosidase and a protein phospho-tyrosine phosphatase), and one integral membrane protein (heme A synthase). The advantages of LALA are that it is based on positive control, that it does not involve a DNA-binding protein, and that glycerol, a cheap and stable compound, can be used as the inducer of gene expression.

  3. Interaction-Based Feature Selection for Uncovering Cancer Driver Genes Through Copy Number-Driven Expression Level.

    PubMed

    Park, Heewon; Niida, Atsushi; Imoto, Seiya; Miyano, Satoru

    2017-02-01

    Driver gene selection is crucial to understand the heterogeneous system of cancer. To identify cancer driver genes, various statistical strategies have been proposed; L1-type regularization methods in particular have drawn a large amount of attention. However, the statistical approaches have been developed from a purely algorithmic and statistical point of view, and the existing studies have applied them to genomic data analysis without consideration of biological knowledge. We consider a statistical strategy incorporating biological knowledge to identify cancer driver genes. Alterations of copy number have been considered to drive cancer pathogenesis processes, and a region of strong interaction between copy number alterations and expression levels is known as a tumor-related symptom. We incorporate the influence of copy number alterations on expression levels into the cancer driver gene-selection process. To quantify the dependence of copy number alterations on expression levels, we consider [Formula: see text] and [Formula: see text] effects of copy number alterations on expression levels of genes, and incorporate the symptom of tumor pathogenesis into gene-selection procedures. We then propose an interaction-based feature-selection strategy based on the adaptive L1-type regularization and random lasso procedures. The proposed method imposes a large penalty on genes corresponding to a low dependency of the two features, so that the coefficients of these genes are estimated to be small or exactly 0. This implies that the proposed method can provide biologically relevant results in cancer driver gene selection. Monte Carlo simulations and analysis of the Cancer Genome Atlas (TCGA) data show that the proposed strategy is effective for high-dimensional genomic data analysis. Furthermore, the proposed method provides reliable and biologically relevant results for cancer driver gene selection in TCGA data analysis.

  4. Entropy-based cluster validation and estimation of the number of clusters in gene expression data.

    PubMed

    Novoselova, Natalia; Tom, Igor

    2012-10-01

    Many external and internal validity measures have been proposed in order to estimate the number of clusters in gene expression data, but as a rule they do not consider the stability of the groupings produced by a clustering algorithm. Based on the approach of assessing the predictive power or stability of a partitioning, we propose a new measure of cluster validation and a selection procedure to determine the suitable number of clusters. The validity measure is based on the estimation of the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme or consensus clustering. According to the proposed selection procedure, the stable clustering result is determined with reference to the validity measure for the null hypothesis encoding the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to estimate the clustering results on several datasets. The proposed procedure produced an accurate and robust estimate of the number of clusters, which is in agreement with the biological knowledge and gold standards of cluster quality.
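
    The consensus matrix mentioned above can be illustrated with a short sketch. It assumes the usual definition from consensus clustering (the fraction of resampling runs in which two items are clustered together, among the runs in which both were sampled) and does not reproduce the authors' "clearness" measure, which the abstract does not define. The labelings are hypothetical; None marks an item left out of a resample.

```python
def consensus_matrix(labelings):
    """Fraction of runs in which items i and j share a cluster, among runs sampling both."""
    n = len(labelings[0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] is None or labels[j] is None:
                    continue  # at least one item was not drawn in this resample
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical cluster labels for four items across three resampled runs.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, None],
    [0, 0, None, 1],
]
matrix = consensus_matrix(runs)
print(matrix[0][1], matrix[0][2])
```

    Entries close to 0 or 1 indicate pairs that are grouped consistently across resamples; intermediate values indicate an unstable partitioning.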

  5. Structure-based predictions broadly link transcription factor mutations to gene expression changes in cancers.

    PubMed

    Ashworth, Justin; Bernard, Brady; Reynolds, Sheila; Plaisier, Christopher L; Shmulevich, Ilya; Baliga, Nitin S

    2014-12-01

    Thousands of unique mutations in transcription factors (TFs) arise in cancers, and the functional and biological roles of relatively few of these have been characterized. Here, we used structure-based methods developed specifically for DNA-binding proteins to systematically predict the consequences of mutations in several TFs that are frequently mutated in cancers. The explicit consideration of protein-DNA interactions was crucial to explain the roles and prevalence of mutations in TP53 and RUNX1 in cancers, and resulted in a higher specificity of detection for known p53-regulated genes among genetic associations between TP53 genotypes and genome-wide expression in The Cancer Genome Atlas, compared to existing methods of mutation assessment. Biophysical predictions also indicated that the relative prevalence of TP53 missense mutations in cancer is proportional to their thermodynamic impacts on protein stability and DNA binding, which is consistent with the selection for the loss of p53 transcriptional function in cancers. Structure and thermodynamics-based predictions of the impacts of missense mutations that focus on specific molecular functions may be increasingly useful for the precise and large-scale inference of aberrant molecular phenotypes in cancer and other complex diseases.

  6. Network-Based Gene Expression Biomarkers for Cold and Heat Patterns of Rheumatoid Arthritis in Traditional Chinese Medicine

    PubMed Central

    Lu, Cheng; Niu, Xuyan; Xiao, Cheng; Chen, Gao; Zha, Qinglin; Guo, Hongtao; Jiang, Miao; Lu, Aiping

    2012-01-01

    In Traditional Chinese Medicine (TCM), patients with Rheumatoid Arthritis (RA) can be classified into two main patterns: cold-pattern and heat-pattern. This paper identified the network-based gene expression biomarkers for both cold- and heat-patterns of RA. Gene expression profilings of CD4+ T cells from cold-pattern RA patients, heat-pattern RA patients, and healthy volunteers were obtained using microarray. The differentially expressed genes and related networks were explored using DAVID, GeneSpring software, and the protein-protein interactions (PPI) method. EIF4A2, CCNT1, and IL7R, which were related to the up-regulation of cell proliferation and the Jak-STAT cascade, were significant gene biomarkers of the TCM cold pattern of RA. PRKAA1, HSPA8, and LSM6, which were related to fatty acid metabolism and the I-κB kinase/NF-κB cascade, were significant biomarkers of the TCM heat-pattern of RA. The network-based gene expression biomarkers for the TCM cold- and heat-patterns may be helpful for the further stratification of RA patients when deciding on interventions or clinical trials. PMID:22536280

  7. Gene Expression Music Algorithm-Based Characterization of the Ewing Sarcoma Stem Cell Signature

    PubMed Central

    2016-01-01

    Gene Expression Music Algorithm (GEMusicA) is a method for the transformation of DNA microarray data into melodies that can be used for the characterization of differentially expressed genes. Using this method we compared gene expression profiles from endothelial cells (EC), hematopoietic stem cells, neuronal stem cells, embryonic stem cells (ESC), and mesenchymal stem cells (MSC) and defined a set of genes that can discriminate between the different stem cell types. We analyzed the behavior of public microarray data sets from Ewing sarcoma (“Ewing family tumors,” EFT) cell lines and biopsies in GEMusicA after prefiltering DNA microarray data for the probe sets from the stem cell signature. Our results demonstrate that individual Ewing sarcoma cell lines have a high similarity to ESC or EC. Ewing sarcoma cell lines with inhibited Ewing sarcoma breakpoint region 1-Friend leukemia virus integration 1 (EWSR1-FLI1) oncogene retained the similarity to ESC and EC. However, correlation coefficients between GEMusicA-processed expression data between EFT and ESC decreased whereas correlation coefficients between EFT and EC as well as between EFT and MSC increased after knockdown of EWSR1-FLI1. Our data support the concept of EFT being derived from cells with features of embryonic and endothelial cells. PMID:27446218

  8. Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

    PubMed Central

    Yang, Fengxi; Zhu, Genfa

    2015-01-01

    Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutants corresponding to different floral morphogenesis, which validated their specialized roles in floral patterning and further supported the effectiveness of our in silico analysis. This dataset generated in our study provides new insights into the molecular mechanisms underlying floral

  9. An expression index for Affymetrix GeneChips based on the generalized logarithm.

    PubMed

    Zhou, Lei; Rocke, David M

    2005-11-01

    Affymetrix GeneChip high-density oligonucleotide arrays interrogate a single transcript using multiple short 25mer probes. Usually, a necessary step in the analysis of experiments using these GeneChips is to summarize each of these probe sets into a single expression index that can then be used for determining differential expression, for classification, for clustering, and for other analyses. In this paper, we propose a new expression index that is competitive with the best existing methods, and superior in many cases. We call this expression index method GLA, for GLog Average, since after normalization at the probe level, we take the mean generalized logarithm of perfect match probes. In this paper, we use Affycomp as the primary tool to assess the weaknesses and strengths of GLA. Comparisons are made between GLA and most widely used summary methods (RMA, MAS5.0 and MBEI) in great detail. The substantial reduction in variability and increased ability to detect differential expression, together with the simplicity of implementation, make GLA a plausible candidate for analysis of Affymetrix GeneChip data.
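
    The summarization idea (a mean generalized logarithm of perfect match probes) can be sketched as follows. One widely used form of the generalized logarithm is glog(x) = ln(x + sqrt(x^2 + lambda)); the abstract does not state which parameterization or probe-level normalization GLA uses, so the form, the value of lam, and the probe intensities below are all assumptions for illustration only.

```python
from math import log, sqrt
from statistics import mean


def glog(x, lam):
    """Generalized logarithm: close to ln(2x) for large x, finite at zero and below."""
    return log(x + sqrt(x * x + lam))


def glog_average(pm_intensities, lam):
    """Summarize one probe set as the mean glog of its perfect match intensities."""
    return mean(glog(x, lam) for x in pm_intensities)


# Hypothetical background-corrected PM intensities for one probe set.
# Background correction can produce values at or below zero, where a
# plain logarithm would fail.
probe_set = [-5.0, 0.0, 120.0, 340.0, 95.0]
index = glog_average(probe_set, lam=100.0)
print(round(index, 3))
```

    Unlike a plain log transform, the generalized logarithm stays defined for zero and negative intensities, which is why it suits low-expression probes.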

  10. Analysis of dysregulation of immune system in pancreatic cancer based on gene expression profile.

    PubMed

    Wang, Baosheng; Sun, Shaolong; Liu, Zhen

    2014-07-01

    The aim of this study was to explore the dysregulated expression of the immune system in pancreatic cancer and clarify the pathogenesis of pancreatic cancer. The dataset GSE15471 was downloaded from the GEO database, and Student's t test was used to screen differentially expressed genes (DEGs) between the pancreatic cancer group and the normal control group. Functional annotation from the Kyoto Encyclopedia of Genes and Genomes (KEGG) was employed to explore the biological functions in which the significant DEGs are involved. We obtained 988 significant DEGs, including 832 up-regulated genes and 156 down-regulated genes. The ratio of up-regulated to down-regulated genes was 5.3. A total of 13 biological pathways were significantly enriched with DEGs in the KEGG pathway enrichment analysis. Finally, we constructed an overall network of the immune system in pancreatic cancer from this pathway information. Our study reveals that dysregulated pathways in pancreatic cancer are associated with the immune system. We also identify some important molecular biomarkers of pancreatic cancer, including CXCR4 and CD4. Dysfunctional pathways and important molecular biomarkers of pancreatic cancer will provide useful information for potential treatment of pancreatic cancer.

  11. Gene expression analysis of biopsy samples reveals critical limitations of transcriptome‐based molecular classifications of hepatocellular carcinoma

    PubMed Central

    Makowska, Zuzanna; Boldanova, Tujana; Adametz, David; Quagliata, Luca; Vogt, Julia E.; Dill, Michael T.; Matter, Mathias S.; Roth, Volker; Terracciano, Luigi

    2016-01-01

    Molecular classification of hepatocellular carcinomas (HCC) could guide patient stratification for personalized therapies targeting subclass‐specific cancer ‘driver pathways’. Currently, there are several transcriptome‐based molecular classifications of HCC with different subclass numbers, ranging from two to six. They were established using resected tumours that introduce a selection bias towards patients without liver cirrhosis and with early stage HCCs. We generated and analyzed gene expression data from paired HCC and non‐cancerous liver tissue biopsies from 60 patients as well as five normal liver samples. Unbiased consensus clustering of HCC biopsy profiles identified 3 robust classes. Class membership correlated with survival, tumour size and with Edmondson and Barcelona Clinical Liver Cancer (BCLC) stage. When focusing only on the gene expression of the HCC biopsies, we could validate previously reported classifications of HCC based on expression patterns of signature genes. However, the subclass‐specific gene expression patterns were no longer preserved when the fold‐change relative to the normal tissue was used. The majority of genes believed to be subclass‐specific turned out to be cancer‐related genes differentially regulated in all HCC patients, with quantitative rather than qualitative differences between the molecular subclasses. With the exception of a subset of samples with a definitive β‐catenin gene signature, biological pathway analysis could not identify class‐specific pathways reflecting the activation of distinct oncogenic programs. In conclusion, we have found that gene expression profiling of HCC biopsies has limited potential to direct therapies that target specific driver pathways, but can identify subgroups of patients with different prognosis. PMID:27499918

  12. Histogenetic compartments of the mouse centromedial and extended amygdala based on gene expression patterns during development.

    PubMed

    García-López, Margarita; Abellán, Antonio; Legaz, Isabel; Rubenstein, John L R; Puelles, Luis; Medina, Loreta

    2008-01-01

    The amygdala controls emotional and social behavior and regulates instinctive reflexes such as defense and reproduction by way of descending projections to the hypothalamus and brainstem. The descending amygdalar projections are suggested to show a cortico-striato-pallidal organization similar to that of the basal ganglia (Swanson [2000] Brain Res 886:113-164). To test this model we investigated the embryological origin and molecular properties of the mouse centromedial and extended amygdalar subdivisions, which constitute major sources of descending projections. We analyzed the distribution of key regulatory genes that show restricted expression patterns within the subpallium (Dlx5, Nkx2.1, Lhx6, Lhx7/8, Lhx9, Shh, and Gbx1), as well as genes considered markers for specific subpallial neuronal subpopulations. Our results indicate that most of the centromedial and extended amygdala is formed by cells derived from multiple subpallial subdivisions. Contrary to a previous suggestion, only the central--but not the medial--amygdala derives from the lateral ganglionic eminence and has striatal-like features. The medial amygdala and a large part of the extended amygdala (including the bed nucleus of the stria terminalis) consist of subdivisions or cell groups that derive from subpallial, pallial (ventral pallium), or extratelencephalic progenitor domains. The subpallial part includes derivatives from the medial ganglionic eminence, the anterior peduncular area, and possibly a novel subdivision, called here commissural preoptic area, located at the base of the septum and related to the anterior commissure. Our study provides a molecular and morphological foundation for understanding the complex embryonic origins and adult organization of the centromedial and extended amygdala.

  13. Histogenetic Compartments of the Mouse Centromedial and Extended Amygdala Based on Gene Expression Patterns during Development

    PubMed Central

    García-López, Margarita; Abellán, Antonio; Legaz, Isabel; Rubenstein, John L.R.; Puelles, Luis; Medina, Loreta

    2016-01-01

    The amygdala controls emotional and social behavior and regulates instinctive reflexes such as defense and reproduction by way of descending projections to the hypothalamus and brainstem. The descending amygdalar projections are suggested to show a cortico-striato-pallidal organization similar to that of the basal ganglia (Swanson [2000] Brain Res 886:113–164). To test this model we investigated the embryological origin and molecular properties of the mouse centromedial and extended amygdalar subdivisions, which constitute major sources of descending projections. We analyzed the distribution of key regulatory genes that show restricted expression patterns within the subpallium (Dlx5, Nkx2.1, Lhx6, Lhx7/8, Lhx9, Shh, and Gbx1), as well as genes considered markers for specific subpallial neuronal subpopulations. Our results indicate that most of the centromedial and extended amygdala is formed by cells derived from multiple subpallial subdivisions. Contrary to a previous suggestion, only the central—but not the medial—amygdala derives from the lateral ganglionic eminence and has striatal-like features. The medial amygdala and a large part of the extended amygdala (including the bed nucleus of the stria terminalis) consist of subdivisions or cell groups that derive from subpallial, pallial (ventral pallium), or extratelencephalic progenitor domains. The subpallial part includes derivatives from the medial ganglionic eminence, the anterior peduncular area, and possibly a novel subdivision, called here commissural preoptic area, located at the base of the septum and related to the anterior commissure. Our study provides a molecular and morphological foundation for understanding the complex embryonic origins and adult organization of the centromedial and extended amygdala. PMID:17990271

  14. Validation of endogenous reference genes in Buglossoides arvensis for normalizing RT-qPCR-based gene expression data.

    PubMed

    Gadkar, Vijay J; Filion, Martin

    2015-01-01

    Selection of a stably expressed reference gene (RG) is an important step for generating reliable and reproducible quantitative real-time reverse transcription polymerase chain reaction (RT-qPCR) gene expression data. In this study, we sought to validate RGs for Buglossoides arvensis, a plant of high nutraceutical value whose refined seed oil is entering the market under the commercial trade name Ahiflower™. This weed plant has received attention for its natural ability to significantly accumulate the poly-unsaturated fatty acid (PUFA) stearidonic acid (SDA, C18:4n-3) in its seeds, which is uncommon for most plant species. Ten candidate RGs (β-Act, 18S rRNA, EF-1a, α-Tub, UBQ, α-actin, CAC, PP2a, RUBISCO, GAPDH) were isolated from B. arvensis and TaqMan™ compliant primers/probes were designed for RT-qPCR analysis. Abundance of these gene transcripts was analyzed across different tissues and growth regimes. Two of the most widely used algorithms, geNorm and NormFinder, showed variation in expression levels of these RGs. However, combinatorial analysis of the results clearly identified CAC and α-actin as the most stable and unstable RG candidates, respectively. This study has for the first time identified and validated RGs in the non-model system B. arvensis, a weed plant projected to become an important yet sustainable source of dietary omega-3 PUFA.
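
    The stability ranking mentioned above can be illustrated with a minimal sketch of a geNorm-style stability measure M, here taken to be the mean, over all other candidates, of the standard deviation of pairwise log2 expression ratios across samples. This is a simplified reading of the geNorm idea, not the authors' pipeline; the gene names and quantities are hypothetical.

```python
import math
from statistics import mean, stdev


def genorm_m(expression):
    """Return a geNorm-style stability value M for each candidate gene.

    expression maps a gene name to its relative quantities (all positive),
    listed in the same sample order for every gene. A lower M means the gene
    keeps a steadier ratio to the other candidates across samples.
    """
    m_values = {}
    for gene_j, values_j in expression.items():
        pairwise_sds = []
        for gene_k, values_k in expression.items():
            if gene_k == gene_j:
                continue
            # Log2 ratio of the two genes in each sample.
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            pairwise_sds.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sds)
    return m_values


# Hypothetical relative quantities for three candidates in three samples.
quantities = {
    "ref1": [1.0, 2.0, 4.0],
    "ref2": [2.0, 4.0, 8.0],   # constant ratio to ref1
    "noisy": [1.0, 8.0, 2.0],  # ratio to the others varies strongly
}
m = genorm_m(quantities)
ranking = sorted(m, key=m.get)  # most stable candidates first
```

    In this toy input the two candidates with a constant ratio share the lowest M, and the erratic candidate ranks last.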

  15. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle.

    PubMed

    Song, Hyun-Seob; Reifman, Jaques; Wallqvist, Anders

    2014-01-01

    Prediction of possible flux distributions in a metabolic network provides detailed phenotypic information that links metabolism to cellular physiology. To estimate metabolic steady-state fluxes, the most common approach is to solve a set of macroscopic mass balance equations subjected to stoichiometric constraints while attempting to optimize an assumed optimal objective function. This assumption is justifiable in specific cases but may be invalid when tested across different conditions, cell populations, or other organisms. With an aim to providing a more consistent and reliable prediction of flux distributions over a wide range of conditions, in this article we propose a framework that uses the flux minimization principle to predict active metabolic pathways from mRNA expression data. The proposed algorithm minimizes a weighted sum of flux magnitudes, while biomass production can be bounded to fit an ample range from very low to very high values according to the analyzed context. We have formulated the flux weights as a function of the corresponding enzyme reaction's gene expression value, enabling the creation of context-specific fluxes based on a generic metabolic network. In case studies of wild-type Saccharomyces cerevisiae, and wild-type and mutant Escherichia coli strains, our method achieved high prediction accuracy, as gauged by correlation coefficients and sums of squared error, with respect to the experimentally measured values. In contrast to other approaches, our method was able to provide quantitative predictions for both model organisms under a variety of conditions. Our approach requires no prior knowledge or assumption of a context-specific metabolic functionality and does not require trial-and-error parameter adjustments. Thus, our framework is of general applicability for modeling the transcription-dependent metabolism of bacteria and yeasts.

  16. Network-Based Meta-Analyses of Associations of Multiple Gene Expression Profiles with Bone Mineral Density Variations in Women

    PubMed Central

    Niu, Tianhua; Zhou, Yu; Zhang, Lan; Zeng, Yong; Zhu, Wei; Wang, Yu-ping; Deng, Hong-wen

    2016-01-01

    Background: Existing microarray studies of bone mineral density (BMD) have been critical for understanding the pathophysiology of osteoporosis, and have identified a number of candidate genes. However, these studies were limited by their relatively small sample sizes and were usually analyzed individually. Here, we propose a novel network-based meta-analysis approach that combines data across six microarray studies to identify functional modules from human protein-protein interaction (PPI) data, and highlight several differentially expressed genes (DEGs) and a functional module that may play an important role in BMD regulation in women. Methods: Expression profiling studies were identified by searching PubMed, Gene Expression Omnibus (GEO) and ArrayExpress. Two meta-analysis methods were applied across different gene expression profiling studies. The first, a nonparametric Fisher’s method, combined p-values from individual experiments to identify genes with large effect sizes. The second method combined effect sizes from individual datasets into a meta-effect size to gain a higher precision of effect size estimation across all datasets. Genes with Q test’s p-values < 0.05 or I² values > 50% were assessed by a random effects model and the remainder by a fixed effects model. Using Fisher’s combined p-values, functional modules were identified through an integrated analysis of microarray data in the context of large protein–protein interaction (PPI) networks. Two previously published meta-analysis studies of genome-wide association (GWA) datasets were used to determine whether these module genes were genetically associated with BMD. Pathway enrichment analysis was performed with a hypergeometric test. Results: Six gene expression datasets were identified, which included a total of 249 (129 high BMD and 120 low BMD) female subjects. Using a network-based meta-analysis, a consensus module containing 58 genes (nodes) and 83 edges was detected. Pathway enrichment
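
    Fisher's method, the first meta-analysis step named above, can be sketched as follows. The statistic -2 Σ ln(p_i) is compared with a chi-square distribution with 2k degrees of freedom for k independent p-values; because the degrees of freedom are even, the tail probability has a closed form and no external library is needed. The function names and input p-values are illustrative assumptions, not taken from the study.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p-values for one gene across studies."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))


# Hypothetical p-values for one gene in two independent datasets.
statistic, combined = fisher_combined_p([0.05, 0.05])
```

    A useful sanity check is that a single p-value is returned unchanged, since the chi-square tail with 2 degrees of freedom is exp(-x/2).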

  17. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 Expressed Sequence Tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression against these stresses. Thus, the EST dataset is very important for discovering the potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp.

  18. Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data.

    PubMed

    Ma, Yuanyuan; Hu, Xiaohua; He, Tingting; Jiang, Xingpeng

    2016-12-01

    Nonnegative matrix factorization (NMF) has received considerable attention due to its interpretation of observed samples as combinations of different components, and has been successfully used as a clustering method. As an extension of NMF, Symmetric NMF (SNMF) inherits the advantages of NMF. Unlike NMF, however, SNMF takes a nonnegative similarity matrix as an input, and two lower rank nonnegative matrices (H, H^T) are computed as an output to approximate the original similarity matrix. Laplacian regularization has improved the clustering performance of NMF and SNMF. However, Laplacian regularization (LR), as a classic manifold regularization method, suffers from some problems because of its weak extrapolating ability. In this paper, we propose a novel variant of SNMF, called Hessian regularization based symmetric nonnegative matrix factorization (HSNMF), for this purpose. In contrast to Laplacian regularization, Hessian regularization fits the data perfectly and extrapolates nicely to unseen data. We conduct extensive experiments on several datasets including text data, gene expression data and HMP (Human Microbiome Project) data. The results show that the proposed method outperforms other methods, which suggests the potential application of HSNMF in biological data clustering. Copyright © 2016. Published by Elsevier Inc.

  19. Proposed method for dimensionality reduction based on framework in gene expression domain.

    PubMed

    Macedo, D C; Ishikawa, E C M; Santos, C B; Matos, S N; Borges, H B; Francisco, A C

    2014-12-12

    The excessive use of attributes may affect the search for patterns and extraction of useful knowledge, because they harm the learning performance of algorithms in both speed and success rate. The use of dimensionality reduction methods is therefore an important alternative; however, these methods do not deal with the reduction of attributes in a specific area. This article presents a method based on framework concepts of domain for reducing attributes in a domain. The input method is a set of databases related to a domain, and the main process is the identification of common and variable attributes, plus the reduction of attributes in the original database. The proposed method was applied in the gene expression domain, using databases. The method can be used to analyze the most relevant attributes in a specific domain, granting greater confidence for models created for the application of a data mining task, thus, a previously known method in data mining. Attribute selection was also applied in the three databases for the comparison of the results. Analyses of the results using the criterion of cross-validation revealed that the employment of the methods resulted in the improvement of success rates compared to the databases containing the full range of attributes.

  20. RNase One Gene Isolation, Expression, and Affinity Purification Models Research Experimental Progression and Culminates with Guided Inquiry-Based Experiments

    ERIC Educational Resources Information Center

    Bailey, Cheryl P.

    2009-01-01

    This new biochemistry laboratory course moves through a progression of experiments that generates a platform for guided inquiry-based experiments. RNase One gene is isolated from prokaryotic genomic DNA, expressed as a tagged protein, affinity purified, and tested for activity and substrate specificity. Student pairs present detailed explanations…

  2. Impaired brain StAR and HSP70 gene expression in zebrafish exposed to Methyl-Parathion based insecticide.

    PubMed

    da Rosa, João Gabriel Santos; Koakoski, Gessi; Piato, Angelo L; Bogo, Maurício Reis; Bonan, Carla Denise; Barcellos, Leonardo José Gil

    2016-01-01

    Fish production ponds and natural water body areas located in close proximity to agricultural fields receive water with variable amounts of agrochemicals, and consequently, compounds that produce adverse effects may reach nontarget organisms. The aim of this study was to investigate whether waterborne methyl-parathion-based insecticide (MPBI) affected gene expression patterns of brain glucocorticoid receptor (GR), steroidogenic acute regulatory protein (StAR), and heat shock protein 70 (hsp70) in adult zebrafish (Danio rerio) exposed to this chemical for 96 h. Treated fish exposed to MPBI-contaminated water showed an inhibition of brain StAR and hsp70 gene expression. Data demonstrated that MPBI produced a decrease in brain StAR and hsp70 gene expression.

  3. Multi-parametric profiling network based on gene expression and phenotype data: a novel approach to developmental neurotoxicity testing.

    PubMed

    Nagano, Reiko; Akanuma, Hiromi; Qin, Xian-Yang; Imanishi, Satoshi; Toyoshiba, Hiroyoshi; Yoshinaga, Jun; Ohsako, Seiichiroh; Sone, Hideko

    2012-01-01

    The establishment of more efficient approaches for developmental neurotoxicity testing (DNT) has been an emerging issue for children's environmental health. Here we describe a systematic approach for DNT using the neuronal differentiation of mouse embryonic stem cells (mESCs) as a model of fetal programming. During embryoid body (EB) formation, mESCs were exposed to 12 chemicals for 24 h and then global gene expression profiling was performed using whole genome microarray analysis. Gene expression signatures for seven kinds of gene sets related to neuronal development and neuronal diseases were selected for further analysis. At the later stages of neuronal cell differentiation from EBs, neuronal phenotypic parameters were determined using a high-content image analyzer. Bayesian network analysis was then performed based on global gene expression and neuronal phenotypic data to generate comprehensive networks with a linkage between early events and later effects. Furthermore, the probability distribution values for the strength of the linkage between parameters in each network were calculated and then used in principal component analysis. The characterization of chemicals according to their neurotoxic potential reveals that the multi-parametric analysis based on phenotype and gene expression profiling during neuronal differentiation of mESCs can provide a useful tool to monitor fetal programming and to predict developmentally neurotoxic compounds.

  4. SNP-based large-scale identification of allele-specific gene expression in human B cells.

    PubMed

    Song, Min-Young; Kim, Hye-Eun; Kim, Sun; Choi, Ick-Hwa; Lee, Jong-Keuk

    2012-02-10

    Polymorphism and variations in gene expression provide the genetic basis for human variation. Allelic variation of gene expression, in particular, may play a crucial role in phenotypic variation and disease susceptibility. To identify genes with allelic expression in human cells, we genotyped genomic DNA and cDNA isolated from 31 immortalized B cell lines from three Centre d'Etude du Polymorphisme Humain (CEPH) families using high-density single-nucleotide polymorphism (SNP) chips containing 13,900 exonic SNPs. We identified seven SNPs in five genes with monoallelic expression and 146 SNPs in 125 genes with allelic imbalance in expression, with preferentially higher expression of one allele in a heterozygous individual. The monoallelically expressed genes (ERAP2, MDGA1, LOC644422, SDCCAG3P1 and CLTCL1) were regulated by cis-acting, non-imprinted differential allelic control. In addition, all monoallelic gene expression patterns and allelic imbalances in gene expression in B cells were transmitted from parents to offspring in the pedigree, indicating genetic transmission of allelic gene expression. Furthermore, frequent allele substitution, probably due to RNA editing, was also observed in 21 genes in 23 SNPs as well as in 48 SNPs located in regions containing no known genes. In this study, we demonstrated that allelic gene expression is frequently observed in human B cells, and SNP chips are very useful tools for detecting allelic gene expression. Overall, our data provide a valuable framework for better understanding allelic gene expression in human B cells.

  5. Differential gene detection incorporating common expression patterns

    NASA Astrophysics Data System (ADS)

    Oba, Shigeyuki; Ishii, Shin

    2009-12-01

    In detection of differentially expressed (DE) genes between different groups of samples based on a high-throughput expression measurement system, we often use classical statistical testing based on a simple assumption that the expression of a certain DE gene in one group is higher or lower on average than that in the other group. Based on this simple assumption, the theory of optimal discovery procedure (ODP) (Storey, 2005) provided an optimal thresholding function for DE gene detection. However, expression patterns of DE genes over samples may have a structure that is not exactly consistent with group labels assigned to the samples. Appropriate treatment of such a structure can increase the detection ability. Namely, genes showing similar expression patterns to other biologically meaningful genes can be regarded as statistically more significant than those showing expression patterns independent of other genes, even if differences in mean expression levels are comparable. In this study, we propose a new statistical thresholding function based on a latent variable model incorporating expression patterns together with the ODP theory. The latent variable model assumes hidden common signals behind expression patterns over samples and the ODP theory is extended to involve the latent variables. When applied to several gene expression data matrices which include cluster structures or 'cancer outlier' structures, the newly-proposed thresholding functions showed prominently better detection performance of DE genes than the original ODP thresholding function did. We also demonstrate how the proposed methods behave through analyses of real breast cancer and lymphoma datasets.

  6. Gene expression-based biological test for major depressive disorder: an advanced study

    PubMed Central

    Watanabe, Shin-ya; Numata, Shusuke; Iga, Jun-ichi; Kinoshita, Makoto; Umehara, Hidehiro; Ishii, Kazuo; Ohmori, Tetsuro

    2017-01-01

    Purpose: Recently, we distinguished patients with major depressive disorder (MDD) from nonpsychiatric controls with high accuracy using a panel of five gene expression markers (ARHGAP24, HDAC5, PDGFC, PRNP, and SLC6A4) in leukocytes. In the present study, we examined whether this biological test is able to discriminate patients with MDD from those without MDD, including those with schizophrenia and bipolar disorder. Patients and methods: We measured messenger ribonucleic acid expression levels of the aforementioned five genes in peripheral leukocytes in 17 patients with schizophrenia and 36 patients with bipolar disorder using quantitative real-time polymerase chain reaction (PCR), and we combined these expression data with our previous expression data of 25 patients with MDD and 25 controls. Subsequently, a linear discriminant function was developed for use in discriminating between patients with MDD and without MDD. Results: This expression panel was able to segregate patients with MDD from those without MDD with a sensitivity and specificity of 64% and 67.9%, respectively. Conclusion: Further research to identify MDD-specific markers is needed to improve the performance of this biological test. PMID:28260899
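
    The reported sensitivity and specificity can be reproduced with a short worked example. The group sizes (25 patients with MDD; 25 + 17 + 36 = 78 subjects without MDD) come from the abstract, but the confusion-matrix counts below are an inference chosen to be consistent with the reported percentages; the abstract does not state them.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly excluded
    return sensitivity, specificity


# Assumed counts, not given in the abstract:
# 16 of 25 MDD patients classified as MDD, 53 of 78 non-MDD subjects as non-MDD.
sens, spec = sensitivity_specificity(tp=16, fn=9, tn=53, fp=25)
summary = f"sensitivity {sens:.1%}, specificity {spec:.1%}"
# summary == "sensitivity 64.0%, specificity 67.9%"
```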

  7. STARNET 2: a web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data

    PubMed Central

    Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent

    2009-01-01

    Background: Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results: STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion: STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to
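
    The pairwise Pearson correlation step described above can be sketched in a few lines. This is an illustration of the general technique only, not STARNET 2's code; the gene names, expression profiles, and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold):
    """List gene pairs whose absolute correlation reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles over four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],    # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],    # falls as geneA rises
    "geneD": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the others
}
edges = coexpression_edges(profiles, threshold=0.9)
```

    Keeping edges by absolute correlation retains strongly anti-correlated pairs as well; whether those belong in a co-expression network is a design choice that depends on the question being asked.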

  8. Prediction of Metabolic Flux Distribution from Gene Expression Data Based on the Flux Minimization Principle

    DTIC Science & Technology

    2014-11-14

    [PLOS ONE, November 2014, Volume 9, Issue 11, e112524] problem. Modification of the FBA algorithm to incorporate additional biological information from gene expression profiles is... We set the maximization of biomass production as the objective of FBA and implemented it in two different forms: without flux minimization (or

  9. Principal Components Analysis Based Unsupervised Feature Extraction Applied to Gene Expression Analysis of Blood from Dengue Haemorrhagic Fever Patients

    PubMed Central

    Taguchi, Y-h.

    2017-01-01

    Dengue haemorrhagic fever (DHF) sometimes occurs after recovery from the disease caused by Dengue virus (DENV), and is often fatal. However, the mechanism of DHF has not been determined, possibly because no suitable methodologies are available to analyse this disease. Therefore, more innovative methods are required to analyse the gene expression profiles of DENV-infected patients. Principal components analysis (PCA)-based unsupervised feature extraction (FE) was applied to the gene expression profiles of DENV-infected patients, and an integrated analysis of two independent data sets identified 46 genes as critical for DHF progression. PCA using only these 46 genes rendered the two data sets highly consistent. The application of PCA to the 46 genes of an independent third data set successfully predicted the progression of DHF. A fourth in vitro data set confirmed the identification of the 46 genes. These 46 genes included interferon- and heme-biosynthesis-related genes. The former are enriched in binding sites for STAT1, STAT2, and IRF1, which are associated with DHF-promoting antibody-dependent enhancement, whereas the latter are considered to be related to the dysfunction of spliceosomes, which may mediate haemorrhage. These results are outcomes that other types of bioinformatic analysis could hardly achieve. PMID:28276456

  10. COPD subtypes identified by network-based clustering of blood gene expression

    PubMed Central

    Chang, Yale; Glass, Kimberly; Liu, Yang-Yu; Silverman, Edwin K.; Crapo, James D.; Tal-Singer, Ruth; Bowler, Russ; Dy, Jennifer; Cho, Michael; Castaldi, Peter

    2016-01-01

    One of the most common smoking-related diseases, chronic obstructive pulmonary disease (COPD), results from a dysregulated, multi-tissue inflammatory response to cigarette smoke. We hypothesized that systemic inflammatory signals in genome-wide blood gene expression can identify clinically important COPD-related disease subtypes, and we leveraged pre-existing gene interaction networks to guide unsupervised clustering of blood microarray expression data. Using network-informed non-negative matrix factorization, we analyzed genome-wide blood gene expression from 229 former smokers in the ECLIPSE Study, and we identified novel, clinically relevant molecular subtypes of COPD. These network-informed clusters were more stable and more strongly associated with measures of lung structure and function than clusters derived from a network-naïve approach, and they were associated with subtype-specific enrichment for inflammatory and protein catabolic pathways. These clusters were successfully reproduced in an independent sample of 135 smokers from the COPDGene Study. PMID:26773458

  11. Correlation of gene expression and contaminant concentrations in wild largescale suckers: a field-based study

    USGS Publications Warehouse

    Christiansen, Helena E.; Mehinto, Alvine C.; Yu, Fahong; Perry, Russell W.; Denslow, Nancy D.; Maule, Alec G.; Mesa, Matthew G.

    2014-01-01

    Toxic compounds such as organochlorine pesticides (OCs), polychlorinated biphenyls (PCBs), and polybrominated diphenyl ether flame retardants (PBDEs) have been detected in fish, birds, and aquatic mammals that live in the Columbia River or use food resources from within the river. We developed a custom microarray for largescale suckers (Catostomus macrocheilus) and used it to investigate the molecular effects of contaminant exposure on wild fish in the Columbia River. Using Significance Analysis of Microarrays (SAM) we identified 72 probes representing 69 unique genes with expression patterns that correlated with hepatic tissue levels of OCs, PCBs, or PBDEs. These genes were involved in many biological processes previously shown to respond to contaminant exposure, including drug and lipid metabolism, apoptosis, cellular transport, oxidative stress, and cellular chaperone function. The relation between gene expression and contaminant concentration suggests that these genes may respond to environmental contaminant exposure and are promising candidates for further field and laboratory studies to develop biomarkers for monitoring exposure of wild fish to contaminant mixtures found in the Columbia River Basin. The array developed in this study could also be a useful tool for studies involving endangered sucker species and other sucker species used in contaminant research.

  12. A tree of life based on ninety-eight expressed genes conserved across diverse eukaryotic species

    PubMed Central

    Jayaswal, Pawan Kumar; Dogra, Vivek; Shanker, Asheesh; Sharma, Tilak Raj

    2017-01-01

    Rapid advances in DNA sequencing technologies have resulted in the accumulation of large data sets in the public domain, facilitating comparative studies to provide novel insights into the evolution of life. Phylogenetic studies across the eukaryotic taxa have been reported but on the basis of a limited number of genes. Here we present a genome-wide analysis across different plant, fungal, protist, and animal species, with reference to the 36,002 expressed genes of the rice genome. Our analysis revealed 9831 genes unique to rice and 98 genes conserved across all 49 eukaryotic species analysed. The 98 genes conserved across diverse eukaryotes mostly exhibited binding and catalytic activities and shared common sequence motifs, and hence appeared to have a common origin. The 98 conserved genes belonged to 22 functional gene families including 26S protease, actin, ADP–ribosylation factor, ATP synthase, casein kinase, DEAD-box protein, DnaK, elongation factor 2, glyceraldehyde 3-phosphate, phosphatase 2A, ras-related protein, Ser/Thr protein phosphatase family protein, tubulin, ubiquitin and others. The consensus Bayesian eukaryotic tree of life developed in this study demonstrated widely separated clades of plants, fungi, and animals. Musa acuminata provided an evolutionary link between monocotyledons and dicotyledons, and Salpingoeca rosetta provided an evolutionary link between fungi and animals, indicating that protozoan species are close relatives of fungi and animals. The divergence times for 1176 species pairs were estimated accurately by integrating fossil information with synonymous substitution rates in the comprehensive set of 98 genes. The present study provides valuable insight into the evolution of eukaryotes. PMID:28922368

  13. Predicting gene targets of perturbations via network-based filtering of mRNA expression compendia

    PubMed Central

    Cosgrove, Elissa J.; Zhou, Yingchun; Gardner, Timothy S.; Kolaczyk, Eric D.

    2008-01-01

    Motivation: DNA microarrays are routinely applied to study diseased or drug-treated cell populations. A critical challenge is distinguishing the genes directly affected by these perturbations from the hundreds of genes that are indirectly affected. Here, we developed a sparse simultaneous equation model (SSEM) of mRNA expression data and applied Lasso regression to estimate the model parameters, thus constructing a network model of gene interaction effects. This inferred network model was then used to filter data from a given experimental condition of interest and predict the genes directly targeted by that perturbation. Results: Our proposed SSEM–Lasso method demonstrated substantial improvement in sensitivity compared with other tested methods for predicting the targets of perturbations in both simulated datasets and microarray compendia. In simulated data, for two different network types, and over a wide range of signal-to-noise ratios, our algorithm demonstrated a 167% increase in sensitivity on average for the top 100 ranked genes, compared with the next best method. Our method also performed well in identifying targets of genetic perturbations in microarray compendia, with up to a 24% improvement in sensitivity on average for the top 100 ranked genes. The overall performance of our network-filtering method shows promise for identifying the direct targets of genetic dysregulation in cancer and disease from expression profiles. Availability: Microarray data are available at the Many Microbe Microarrays Database (M3D, http://m3d.bu.edu). Algorithm scripts are available at the Gardner Lab website (http://gardnerlab.bu.edu/SSEMLasso). Contact: kolaczyk@math.bu.edu Supplementary information: Supplementary Data are available at Bioinformatics on line. PMID:18779235
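
    The Lasso estimation step named above can be illustrated with a minimal coordinate-descent sketch for a single regression, minimizing (1/(2n))·||y − Xb||² + λ·||b||₁. This shows only the generic Lasso technique, not the authors' SSEM implementation; the function names, the toy design matrix, and the value of λ are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit Lasso coefficients by cyclic coordinate descent.

    X is a list of rows (samples x predictors), y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of predictor j with the residual that ignores j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Toy data: predictor 0 drives the response, predictor 1 contributes only noise.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

    The L1 penalty sets the weak coefficient exactly to zero while shrinking the strong one, which is the sparsity property that makes Lasso suitable for inferring a sparse gene interaction network.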

  14. Gene Expression Profiling of Development and Anthocyanin Accumulation in Kiwifruit (Actinidia chinensis) Based on Transcriptome Sequencing

    PubMed Central

    Zeng, Shaohua; Xiao, Gong; Wang, Gan; Wang, Ying; Peng, Ming; Huang, Hongwen

    2015-01-01

    Red-fleshed kiwifruit (Actinidia chinensis Planch. ‘Hongyang’) is a promising commercial cultivar due to its nutritious value and unique flesh color, derived from vitamin C and anthocyanins. In this study, we obtained transcriptome data of ‘Hongyang’ from seven developmental stages using Illumina sequencing. We mapped 39–54 million reads to the recently sequenced kiwifruit genome and other databases to define gene structure, to analyze alternative splicing, and to quantify gene transcript abundance at different developmental stages. The transcript profiles throughout red kiwifruit development were constructed and analyzed, with a focus on the biosynthesis and metabolism of compounds such as phytohormones, sugars, starch and L-ascorbic acid, which are indispensable for the development and formation of quality fruit. Candidate genes for these pathways were identified through MapMan and phylogenetic analysis. The transcript levels of genes involved in sucrose and starch metabolism were consistent with the change in soluble sugar and starch content throughout kiwifruit development. The metabolism of L-ascorbic acid was very active, primarily through the L-galactose pathway. The genes responsible for the accumulation of anthocyanin in red kiwifruit were identified, and their expression levels were investigated during kiwifruit development. This survey of gene expression during kiwifruit development paves the way for further investigation of the development of this uniquely colored and nutritious fruit and reveals which factors are needed for high quality fruit formation. This transcriptome data and its analysis will be useful for improving kiwifruit genome annotation, for basic fruit molecular biology research, and for kiwifruit breeding and improvement. PMID:26301713

  15. Gene Expression Profiling of Development and Anthocyanin Accumulation in Kiwifruit (Actinidia chinensis) Based on Transcriptome Sequencing.

    PubMed

    Li, Wenbin; Liu, Yifei; Zeng, Shaohua; Xiao, Gong; Wang, Gan; Wang, Ying; Peng, Ming; Huang, Hongwen

    2015-01-01

    Red-fleshed kiwifruit (Actinidia chinensis Planch. 'Hongyang') is a promising commercial cultivar due to its nutritious value and unique flesh color, derived from vitamin C and anthocyanins. In this study, we obtained transcriptome data of 'Hongyang' from seven developmental stages using Illumina sequencing. We mapped 39-54 million reads to the recently sequenced kiwifruit genome and other databases to define gene structure, to analyze alternative splicing, and to quantify gene transcript abundance at different developmental stages. The transcript profiles throughout red kiwifruit development were constructed and analyzed, with a focus on the biosynthesis and metabolism of compounds such as phytohormones, sugars, starch and L-ascorbic acid, which are indispensable for the development and formation of quality fruit. Candidate genes for these pathways were identified through MapMan and phylogenetic analysis. The transcript levels of genes involved in sucrose and starch metabolism were consistent with the change in soluble sugar and starch content throughout kiwifruit development. The metabolism of L-ascorbic acid was very active, primarily through the L-galactose pathway. The genes responsible for the accumulation of anthocyanin in red kiwifruit were identified, and their expression levels were investigated during kiwifruit development. This survey of gene expression during kiwifruit development paves the way for further investigation of the development of this uniquely colored and nutritious fruit and reveals which factors are needed for high quality fruit formation. This transcriptome data and its analysis will be useful for improving kiwifruit genome annotation, for basic fruit molecular biology research, and for kiwifruit breeding and improvement.

  16. Expression profile based gene clusters for ischemic stroke detection Whole blood gene clusters for ischemic stroke detection

    PubMed Central

    Adamski, Mateusz G; Li, Yan; Wagner, Erin; Yu, Hua; Seales-Bailey, Chloe; Soper, Steven A; Murphy, Michael; Baird, Alison E

    2014-01-01

    In microarray studies, alterations in gene expression in circulating leukocytes have shown utility for ischemic stroke diagnosis. We studied forty candidate markers identified in three gene expression profiles to (1) quantitate individual transcript expression, (2) identify transcript clusters and (3) assess the clinical diagnostic utility of the clusters identified for ischemic stroke detection. Using high throughput next generation qPCR, 16 of the 40 transcripts were significantly up-regulated in stroke patients relative to control subjects (p<0.05). Six clusters of between 5 and 7 transcripts discriminated between stroke and control (p values between 1.01e-9 and 0.03). A 7 transcript cluster containing PLBD1, PYGL, BST1, DUSP1, FOS, VCAN and FCGR1A showed high accuracy for stroke classification (AUC=0.854). These results validate and improve upon the diagnostic value of transcripts identified in microarray studies for ischemic stroke. The clusters identified show promise for acute ischemic stroke detection. PMID:25135788

  17. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using integrated statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  18. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using integrated statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  19. Real-time polymerase chain reaction-based exponential sample amplification for microarray gene expression profiling.

    PubMed

    Nagy, Zsolt B; Kelemen, János Z; Fehér, Liliána Z; Zvara, Agnes; Juhász, Kata; Puskás, László G

    2005-02-01

    Conventional approaches to target labeling for gene expression analysis using microarray technology typically require relatively large amounts of RNA, a serious limitation when the available sample is limited. Here we describe an alternative exponential sample amplification method that uses quantitative real-time polymerase chain reaction (QRT-PCR) to follow the amplification and eliminate the overamplified cDNA, which could distort the quantitative ratio of the starting mRNA population. Probes were generated from nonamplified, PCR-amplified, and real-time-PCR-amplified cDNA samples from lipopolysaccharide-treated and nontreated mouse macrophages and hybridized to mouse cDNA microarrays. Signals obtained from the three protocols were compared. Reproducibility and reliability of the methods were determined. The Pearson correlation coefficients for replica experiments were r=0.927 and r=0.687 for QRT-PCR-amplification and PCR-overamplification protocols, respectively. A chi-squared test showed that overamplification resulted in major biases in expression ratios, while these alterations could be eliminated by following the cycling status with QRT-PCR. Our exponential sample amplification protocol preserves the original expression ratios and allows unbiased gene expression analysis from minute amounts of starting material.
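
    The reproducibility figures in the abstract above (r=0.927 versus r=0.687) are Pearson correlation coefficients between replicate experiments. As a minimal sketch of how such a coefficient is computed, the following uses hypothetical log2 expression ratios; the data values and the function name are illustrative assumptions and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centered values
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log2 expression ratios from two replicate hybridizations
replicate_a = [1.2, -0.4, 2.1, 0.0, -1.5, 0.8]
replicate_b = [1.0, -0.6, 2.3, 0.1, -1.2, 0.9]
r = pearson_r(replicate_a, replicate_b)
print(f"r = {r:.3f}")
```

    A value of r close to 1 indicates that the two replicates rank and scale the genes almost identically, which is the sense in which the abstract reports the QRT-PCR-monitored protocol as more reproducible.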

  20. A novel PCR-based method for high throughput prokaryotic expression of antimicrobial peptide genes

    PubMed Central

    2012-01-01

    Background: To facilitate the screening of large quantities of new antimicrobial peptides (AMPs), we describe a cost-effective method for high throughput prokaryotic expression of AMPs. EDDIE, an autoproteolytic mutant of the N-terminal autoprotease, Npro, from classical swine fever virus, was selected as a fusion protein partner. The expression system was used for high-level expression of six antimicrobial peptides with different sizes: Bombinin-like peptide 7, Temporin G, hexapeptide, Combi-1, human Histatin 9, and human Histatin 6. These expressed AMPs were purified and evaluated for antimicrobial activity. Results: Two or four primers were used to synthesize each AMP gene in a single step PCR. Each synthetic gene was then cloned into the pET30a/His-EDDIE-GFP vector via an in vivo recombination strategy. Each AMP was then expressed as an Npro fusion protein in Escherichia coli. The expressed fusion proteins existed as inclusion bodies in the cytoplasm and the expression levels of the six AMPs reached up to 40% of the total cell protein content. On in vitro refolding, the fusion AMPs were released from the C-terminal end of the autoprotease by self-cleavage, leaving AMPs with an authentic N terminus. The released fusion partner was easily purified by Ni-NTA chromatography. All recombinant AMPs displayed expected antimicrobial activity against E. coli, Micrococcus luteus and S. cerevisiae. Conclusions: The method described in this report allows the fast synthesis of genes that are optimized for over-expression in E. coli and for the production of sufficiently large amounts of peptides for functional and structural characterization. The Npro partner system, without the need for chemical or enzymatic removal of the fusion tag, is a low-cost, efficient way of producing AMPs for characterization. The cloning method, combined with bioinformatic analyses from genome and EST sequence data, will also be useful for screening new AMPs. Plasmid pET30a/His-EDDIE-GFP also provides

  1. [Study on action mechanism and material base of compound Danshen dripping pills in treatment of carotid atherosclerosis based on techniques of gene expression profile and molecular fingerprint].

    PubMed

    Zhou, Wei; Song, Xiang-gang; Chen, Chao; Wang, Shu-mei; Liang, Sheng-wang

    2015-08-01

    The action mechanism and material base of compound Danshen dripping pills in the treatment of carotid atherosclerosis are discussed in this paper on the basis of gene expression profiles and molecular fingerprints. First, gene expression profiles of atherosclerotic carotid artery tissues and histologically normal human tissues were collected and screened using significance analysis of microarray (SAM) to identify differentially expressed genes; the differential genes were then analyzed by Gene Ontology (GO) analysis and KEGG pathway analysis. To avoid missing genes with modest differential expression but biological importance, Gene Set Enrichment Analysis (GSEA) was performed, and 7 chemical ingredients with higher negative enrichment scores were obtained by the Cmap method, implying that they could reversely regulate the gene expression profiles of pathological tissues. Last, based on the hypothesis that similar structures have similar activities, 336 ingredients of compound Danshen dripping pills were compared with the 7 drug molecules using 2D molecular fingerprints. The results showed that 147 differential genes, including 60 up-regulated genes and 87 down-regulated genes, were screened out by SAM. In GO analysis, Biological Process (BP) is mainly concerned with biological adhesion, response to wounding and inflammatory response; Cellular Component (CC) is mainly concerned with extracellular region, extracellular space and plasma membrane; while Molecular Function (MF) is mainly concerned with antigen binding, metalloendopeptidase activity and peptide binding. KEGG pathway analysis is mainly concerned with JAK-STAT, RIG-I like receptor and PPAR signaling pathway. There were 10 compounds, such as hexadecane, with Tanimoto coefficients greater than 0.85, which implied that they may be the active ingredients (AIs) of compound Danshen dripping pills in treatment of carotid atherosclerosis (CAs). The present method can be applied to the research on material
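
    The abstract above ranks ingredients by the Tanimoto coefficient between 2D molecular fingerprints, with 0.85 as the cut-off. A minimal sketch of that similarity measure follows. It assumes each fingerprint is represented as a set of on-bit indices; the bit values, variable names and that representation are hypothetical, since the abstract does not say which fingerprint type or software was used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union

# Hypothetical on-bit indices for a reference drug and a candidate ingredient
reference_drug = {1, 4, 7, 9, 15, 22, 30, 41, 52, 63}
ingredient = {1, 4, 7, 9, 15, 22, 30, 41, 52}

score = tanimoto(reference_drug, ingredient)   # 9 shared bits / 10 bits in the union
is_candidate = score > 0.85                    # threshold quoted in the abstract
print(score, is_candidate)
```

    With this definition the score is the number of shared on-bits divided by the number of bits set in either fingerprint, so identical fingerprints score 1.0 and disjoint ones score 0.0.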

  2. Prioritization of candidate genes for cattle reproductive traits, based on protein-protein interactions, gene expression, and text-mining.

    PubMed

    Hulsegge, Ina; Woelders, Henri; Smits, Mari; Schokker, Dirkjan; Jiang, Li; Sørensen, Peter

    2013-05-15

    Reproduction is of significant economic importance in dairy cattle. Improved understanding of mechanisms that control estrous behavior and other reproduction traits could help in developing strategies to improve and/or monitor these traits. The objective of this study was to predict and rank genes and processes in brain areas and pituitary involved in reproductive traits in cattle using information derived from three different data sources: gene expression, protein-protein interactions, and literature. We identified 59, 89, 53, 23, and 71 genes in bovine amygdala, dorsal hypothalamus, hippocampus, pituitary, and ventral hypothalamus, respectively, potentially involved in processes underlying estrus and estrous behavior. Functional annotation of the candidate genes points to a number of tissue-specific processes of which the "neurotransmitter/ion channel/synapse" process in the amygdala, "steroid hormone receptor activity/ion binding" in the pituitary, "extracellular region" in the ventral hypothalamus, and "positive regulation of transcription/metabolic process" in the dorsal hypothalamus are most prominent. The regulation of the functional processes in the various tissues operates at different biological levels, including transcriptional, posttranscriptional, extracellular, and intercellular signaling levels.

  3. Nucleic-acid based gene therapeutics: delivery challenges and modular design of nonviral gene carriers and expression cassettes to overcome intracellular barriers for sustained targeted expression.

    PubMed

    Hsu, Charlie Yu Ming; Uludağ, Hasan

    2012-05-01

    The delivery of nucleic acid molecules into cells to alter physiological functions at the genetic level is a powerful approach to treat a wide range of inherited and acquired disorders. Biocompatible materials such as cationic polymers, lipids, and peptides are being explored as safer alternatives to viral gene carriers. However, the comparatively low efficiency of nonviral carriers currently hampers their translation into clinical settings. Controlling the size and stability of carrier/nucleic acid complexes is one of the primary hurdles as the physicochemical properties of the complexes can define the uptake pathways, which dictate intracellular routing, endosomal processing, and nucleocytoplasmic transport. In addition to nuclear import, subnuclear trafficking, posttranscriptional events, and immune responses can further limit transfection efficiency. Chemical moieties, reactive linkers or signal peptides have been conjugated to carriers to prevent aggregation, induce membrane destabilization and localize to subcellular compartments. Genetic elements can be inserted into the expression cassette to facilitate nuclear targeting, delimit expression to targeted tissue, and modulate transgene expression. The modular option afforded by both gene carriers and expression cassettes provides a two-tier multicomponent delivery system that can be optimized for targeted gene delivery in a variety of settings.

  4. Gene Express Inc.

    PubMed

    Saccomanno, Colette F

    2006-07-01

    Gene Express, Inc. is a technology-licensing company and provider of Standardized Reverse Transcription Polymerase Chain Reaction (StaRT-PCR) services. Designed by and for clinical researchers involved in pharmaceutical, biomarker and molecular diagnostic product development, StaRT-PCR is a unique quantitative and standardized multigene expression measurement platform. StaRT-PCR meets all of the performance characteristics defined by the US FDA as required to support regulatory submissions [101,102], and by the Clinical Laboratory Improvement Act of 1988 (CLIA) as necessary to support diagnostic testing [1]. A standardized mixture of internal standards (SMIS), manufactured in bulk, provides integrated quality control wherein each native template target gene is measured relative to a competitive template internal standard. Bulk production enables the compilation of a comprehensive standardized database from across multiple experiments, across collaborating laboratories and across the entire clinical development lifecycle of a given compound or diagnostic product. For the first time, all these data are able to be directly compared. Access to such a database can dramatically shorten the time from investigational new drug (IND) to new drug application (NDA), or save time and money by hastening a substantiated 'no-go' decision. High-throughput StaRT-PCR is conducted at the company's automated Standardized Expression Measurement (SEM) Center. Currently optimized for detection on a microcapillary electrophoretic platform, StaRT-PCR products also may be analyzed on microarray, high-performance liquid chromatography (HPLC), or matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) platforms. SEM Center services deliver standardized genomic data--data that will accelerate the application of pharmacogenomic technology to new drug and diagnostic test development and facilitate personalized medicine.

  5. N-Acetylglucosamine Utilization by Saccharomyces cerevisiae Based on Expression of Candida albicans NAG Genes

    PubMed Central

    Wendland, Jürgen; Schaub, Yvonne; Walther, Andrea

    2009-01-01

    Synthesis of chitin de novo from glucose involves a linear pathway in Saccharomyces cerevisiae. Several of the pathway genes, including GNA1, are essential. Genes for chitin catabolism are absent in S. cerevisiae. Therefore, S. cerevisiae cannot use chitin as a carbon source. Chitin is the second most abundant polysaccharide after cellulose and consists of N-acetylglucosamine (GlcNAc) moieties. Here, we have generated S. cerevisiae strains that are able to use GlcNAc as a carbon source by expressing four Candida albicans genes (NAG3 or its NAG4 paralog, NAG5, NAG2, and NAG1) encoding a GlcNAc permease, a GlcNAc kinase, a GlcNAc-6-phosphate deacetylase, and a glucosamine-6-phosphate deaminase, respectively. Expression of NAG3 and NAG5 or NAG4 and NAG5 in S. cerevisiae resulted in strains in which the otherwise-essential ScGNA1 could be deleted. These strains required the presence of GlcNAc in the medium, indicating that uptake of GlcNAc and its phosphorylation were achieved. Expression of all four NAG genes produced strains that could use GlcNAc as the sole carbon source for growth. Utilization of a GlcNAc catabolic pathway for bioethanol production using these strains was tested. However, fermentation was slow and yielded only minor amounts of ethanol (approximately 3.0 g/liter), suggesting that fructose-6-phosphate produced from GlcNAc under these conditions is largely consumed to maintain cellular functions and promote growth. Our results present the first step toward tapping a novel, renewable carbon source for biofuel production. PMID:19648376

  6. Identification of gene expression-based prognostic markers in the hematopoietic stem cells of patients with myelodysplastic syndromes.

    PubMed

    Pellagatti, Andrea; Benner, Axel; Mills, Ken I; Cazzola, Mario; Giagounidis, Aristoteles; Perry, Janet; Malcovati, Luca; Della Porta, Matteo G; Jädersten, Martin; Verma, Amit; McDonald, Emma-Jane; Killick, Sally; Hellström-Lindberg, Eva; Bullinger, Lars; Wainscoat, James S; Boultwood, Jacqueline

    2013-10-01

    The diagnosis of patients with myelodysplastic syndromes (MDS) is largely dependent on morphologic examination of bone marrow aspirates. Several criteria that form the basis of the classifications and scoring systems most commonly used in clinical practice are affected by operator-dependent variation. To identify standardized molecular markers that would allow prediction of prognosis, we have used gene expression profiling (GEP) data on CD34+ cells from patients with MDS to determine the relationship between gene expression levels and prognosis. GEP data on CD34+ cells from 125 patients with MDS with a minimum 12-month follow-up since date of bone marrow sample collection were included in this study. Supervised principal components and lasso penalized Cox proportional hazards regression (Coxnet) were used for the analysis. We identified several genes, the expression of which was significantly associated with survival of patients with MDS, including LEF1, CDH1, WT1, and MN1. The Coxnet predictor, based on expression data on 20 genes, outperformed other predictors, including one that additionally used clinical information. Our Coxnet gene signature based on CD34+ cells significantly identified a separation of patients with good or bad prognosis in an independent GEP data set based on unsorted bone marrow mononuclear cells, demonstrating that our signature is robust and may be applicable to bone marrow cells without the need to isolate CD34+ cells. We present a new, valuable GEP-based signature for assessing prognosis in MDS. GEP-based signatures correlating with clinical outcome may significantly contribute to a refined risk classification of MDS.

  7. Surface EMG-based Sketching Recognition Using Two Analysis Windows and Gene Expression Programming

    PubMed Central

    Yang, Zhongliang; Chen, Yumiao

    2016-01-01

    Sketching is one of the most important processes in the conceptual stage of design. Previous studies have relied largely on the analyses of sketching process and outcomes, whereas surface electromyographic (sEMG) signals associated with sketching have received little attention. In this study, we propose a method in which 11 basic one-stroke sketching shapes are identified from the sEMG signals generated by the forearm and upper arm muscles from 4 subjects. Time domain features such as integrated electromyography, root mean square and mean absolute value were extracted with analysis windows of two length conditions for pattern recognition. After reducing data dimensionality using principal component analysis, the shapes were classified using Gene Expression Programming (GEP). The performance of the GEP classifier was compared to the Back Propagation neural network (BPNN) and the Elman neural network (ENN). Feature extraction with the short analysis window (250 ms with a 250 ms increment) improved the recognition rate by around 6.4% on average compared with the long analysis window (2500 ms with a 2500 ms increment). The average recognition rate for the eleven basic one-stroke sketching patterns achieved by the GEP classifier was 96.26% in the training set and 95.62% in the test set, which was superior to the performance of the BPNN and ENN classifiers. The results show that the GEP classifier is able to perform well with either length of the analysis window. Thus, the proposed GEP model shows promise for recognizing sketching based on sEMG signals. PMID:27790083
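
    The time domain features named in the abstract above (integrated electromyography, root mean square and mean absolute value) are computed per analysis window. The sketch below shows one common way to define them over windows advanced by a fixed increment. The function names, the toy signal and the 1000 Hz sampling rate are hypothetical; the abstract does not state the sampling rate or the exact feature definitions the authors used.

```python
import math

def window_features(signal, window_len, step):
    """Return a list of (IEMG, RMS, MAV) tuples, one per analysis window.

    window_len and step are given in samples; windows that would run past
    the end of the signal are dropped.
    """
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    features = []
    for start in range(0, len(signal) - window_len + 1, step):
        window = signal[start:start + window_len]
        abs_sum = sum(abs(v) for v in window)
        iemg = abs_sum                                           # integrated EMG
        rms = math.sqrt(sum(v * v for v in window) / window_len)  # root mean square
        mav = abs_sum / window_len                               # mean absolute value
        features.append((iemg, rms, mav))
    return features

def ms_to_samples(duration_ms, sampling_rate_hz):
    """Convert a window length in milliseconds to a sample count."""
    return int(round(duration_ms * sampling_rate_hz / 1000))

# Toy signal with non-overlapping windows of two samples
toy_signal = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]
toy_features = window_features(toy_signal, window_len=2, step=2)

# Short window from the abstract (250 ms with a 250 ms increment),
# at an assumed sampling rate of 1000 Hz
short_window = ms_to_samples(250, 1000)
print(toy_features, short_window)
```

    Setting the increment equal to the window length, as in both conditions quoted in the abstract, yields non-overlapping windows; a smaller increment would produce overlapping ones.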

  8. Surface EMG-based Sketching Recognition Using Two Analysis Windows and Gene Expression Programming.

    PubMed

    Yang, Zhongliang; Chen, Yumiao

    2016-01-01

    Sketching is one of the most important processes in the conceptual stage of design. Previous studies have relied largely on the analyses of sketching process and outcomes, whereas surface electromyographic (sEMG) signals associated with sketching have received little attention. In this study, we propose a method in which 11 basic one-stroke sketching shapes are identified from the sEMG signals generated by the forearm and upper arm muscles from 4 subjects. Time domain features such as integrated electromyography, root mean square and mean absolute value were extracted with analysis windows of two length conditions for pattern recognition. After reducing data dimensionality using principal component analysis, the shapes were classified using Gene Expression Programming (GEP). The performance of the GEP classifier was compared to the Back Propagation neural network (BPNN) and the Elman neural network (ENN). Feature extraction with the short analysis window (250 ms with a 250 ms increment) improved the recognition rate by around 6.4% on average compared with the long analysis window (2500 ms with a 2500 ms increment). The average recognition rate for the eleven basic one-stroke sketching patterns achieved by the GEP classifier was 96.26% in the training set and 95.62% in the test set, which was superior to the performance of the BPNN and ENN classifiers. The results show that the GEP classifier is able to perform well with either length of the analysis window. Thus, the proposed GEP model shows promise for recognizing sketching based on sEMG signals.

  9. Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes

    PubMed Central

    Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S.

    2016-01-01

    Motivation: Modern high-throughput biotechnologies such as microarray are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, hence the classical ‘large p, small n’ problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. Results: Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework in which the repository of historical data can be exploited to build informative priors and used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the ‘large p, small n’ problem. Availability and implementation: Our method is implemented in R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26519502

  10. Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes.

    PubMed

    Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S

    2016-03-01

    Modern high-throughput biotechnologies such as microarray are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, hence the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework in which the repository of historical data can be exploited to build informative priors and used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. Our method is implemented in R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  11. Endoribonuclease-Based Two-Component Repressor Systems for Tight Gene Expression Control in Plants

    DOE PAGES

    Liang, Yan; Richardson, Sarah; Yan, Jingwei; ...

    2017-01-17

    © 2017 American Chemical Society. Tight control and multifactorial regulation of gene expression are important challenges in genetic engineering and are critical for the development of regulatory circuits. Meeting these challenges will facilitate transgene expression regulation and support the fine-tuning of metabolic pathways to avoid the accumulation of undesired intermediates. By employing the endoribonuclease Csy4 and its recognition sequence from Pseudomonas aeruginosa and manipulating 5′UTR of mRNA, we developed a two-component expression-repression system to tightly control synthesis of transgene products. We demonstrated that this regulatory device was functional in monocotyledonous and dicotyledonous plant species, and showed that it can be used to repress transgene expression by > 400-fold and to synchronize transgene repression. In addition to tissue-specific transgene repression, this system offers stimuli-dependent expression control. Using a bioinformatics approach, we identified 54 orthologous systems from various bacteria, and then validated in planta the activity for a few of those systems, demonstrating the potential diversity of such a two-component repressor system.

  12. Gene expression-based dosimetry by dose and time in mice following acute radiation exposure.

    PubMed

    Tucker, James D; Divine, George W; Grever, William E; Thomas, Robert A; Joiner, Michael C; Smolinski, Joseph M; Auner, Gregory W

    2013-01-01

    Rapid and reliable methods for performing biological dosimetry are of paramount importance in the event of a large-scale nuclear event. Traditional dosimetry approaches lack the requisite rapid assessment capability, ease of use, portability and low cost, which are factors needed for triaging a large number of victims. Here we describe the results of experiments in which mice were acutely exposed to ⁶⁰Co gamma rays at doses of 0 (control) to 10 Gy. Blood was obtained from irradiated mice 0.5, 1, 2, 3, 5, and 7 days after exposure. mRNA expression levels of 106 selected genes were obtained by reverse-transcription real time PCR. Stepwise regression of dose received against individual gene transcript expression levels provided optimal dosimetry at each time point. The results indicate that only 4-7 different gene transcripts are needed to explain ≥ 0.69 of the variance (R²), and that receiver-operator characteristics, a measure of sensitivity and specificity, of ≥ 0.93 for these statistical models were achieved at each time point. These models provide an excellent description of the relationship between the actual and predicted doses up to 6 Gy. At doses of 8 and 10 Gy there appears to be saturation of the radiation-response signals with a corresponding diminution of accuracy. These results suggest that similar analyses in humans may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations.
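
    As a rough illustration of how a stepwise regression begins, the sketch below scores each transcript by the fraction of dose variance it explains on its own and picks the best one. It is a simplification and not the authors' procedure: a full stepwise analysis would go on adding (and possibly removing) transcripts, and the gene names and values are invented placeholders.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def best_single_gene(expression, dose):
    """Return (gene, R^2) for the gene that best explains dose by itself."""
    scores = {gene: r_squared(values, dose) for gene, values in expression.items()}
    gene = max(scores, key=scores.get)
    return gene, scores[gene]


# Hypothetical expression levels for four mice at doses 0, 2, 4 and 6 Gy.
dose = [0.0, 2.0, 4.0, 6.0]
expression = {
    "gene_a": [1.0, 2.1, 2.9, 4.0],  # rises almost linearly with dose
    "gene_b": [3.0, 1.0, 3.5, 1.2],  # no clear dose trend
}
gene, score = best_single_gene(expression, dose)
```

    In this toy data `gene_a` is selected, since it alone explains nearly all of the variance in dose.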

  13. Gene Expression-Based Dosimetry by Dose and Time in Mice Following Acute Radiation Exposure

    PubMed Central

    Tucker, James D.; Divine, George W.; Grever, William E.; Thomas, Robert A.; Joiner, Michael C.; Smolinski, Joseph M.; Auner, Gregory W.

    2013-01-01

    Rapid and reliable methods for performing biological dosimetry are of paramount importance in the event of a large-scale nuclear event. Traditional dosimetry approaches lack the requisite rapid assessment capability, ease of use, portability and low cost, which are factors needed for triaging a large number of victims. Here we describe the results of experiments in which mice were acutely exposed to ⁶⁰Co gamma rays at doses of 0 (control) to 10 Gy. Blood was obtained from irradiated mice 0.5, 1, 2, 3, 5, and 7 days after exposure. mRNA expression levels of 106 selected genes were obtained by reverse-transcription real time PCR. Stepwise regression of dose received against individual gene transcript expression levels provided optimal dosimetry at each time point. The results indicate that only 4–7 different gene transcripts are needed to explain ≥ 0.69 of the variance (R²), and that receiver-operator characteristics, a measure of sensitivity and specificity, of ≥ 0.93 for these statistical models were achieved at each time point. These models provide an excellent description of the relationship between the actual and predicted doses up to 6 Gy. At doses of 8 and 10 Gy there appears to be saturation of the radiation-response signals with a corresponding diminution of accuracy. These results suggest that similar analyses in humans may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations. PMID:24358280

  14. Analysis of gene expression at the single-cell level using microdroplet-based microfluidic technology

    PubMed Central

    Mary, Pascaline; Dauphinot, Luce; Bois, Nadège; Potier, Marie-Claude; Studer, Vincent; Tabeling, Patrick

    2011-01-01

    In the present work, we have measured the messenger RNA expression of specific genes both from total RNA and cells encapsulated in droplets. The microfluidic chip introduced includes the following functionalities: RNA/cell encapsulation, lysis, reverse transcription and real-time polymerase chain reaction. We have shown that simplex and duplex gene expression measurements can be carried out over a population of 100 purified RNA samples encapsulated simultaneously in 2 nl droplets in less than 2 h. An analysis of 100 samples containing one to three cells has shown excellent consistency with standard techniques regarding average values. The cell-to-cell distributions of the E-cadherin expression suggest fluctuations on the order of 80% in the number of transcripts, which is highly consistent with the general findings from the literature. A mathematical model has also been introduced to strengthen the interpretation of our results. The present work paves the way for the systematic acquisition of such information in biological and biomedical studies. PMID:21716808

  15. Microarray-based characterization of differential gene expression during vocal fold wound healing in rats.

    PubMed

    Welham, Nathan V; Ling, Changying; Dawson, John A; Kendziorski, Christina; Thibeault, Susan L; Yamashita, Masaru

    2015-03-01

    The vocal fold (VF) mucosa confers elegant biomechanical function for voice production but is susceptible to scar formation following injury. Current understanding of VF wound healing is hindered by a paucity of data and is therefore often generalized from research conducted in skin and other mucosal systems. Here, using a previously validated rat injury model, expression microarray technology and an empirical Bayes analysis approach, we generated a VF-specific transcriptome dataset to better capture the system-level complexity of wound healing in this specialized tissue. We measured differential gene expression at 3, 14 and 60 days post-injury compared to experimentally naïve controls, pursued functional enrichment analyses to refine and add greater biological definition to the previously proposed temporal phases of VF wound healing, and validated the expression and localization of a subset of previously unidentified repair- and regeneration-related genes at the protein level. Our microarray dataset is a resource for the wider research community and has the potential to stimulate new hypotheses and avenues of investigation, improve biological and mechanistic insight, and accelerate the identification of novel therapeutic targets.

  16. Genome-Based Genetic Tool Development for Bacillus methanolicus: Theta- and Rolling Circle-Replicating Plasmids for Inducible Gene Expression and Application to Methanol-Based Cadaverine Production.

    PubMed

    Irla, Marta; Heggeset, Tonje M B; Nærdal, Ingemar; Paul, Lidia; Haugen, Tone; Le, Simone B; Brautaset, Trygve; Wendisch, Volker F

    2016-01-01

    Bacillus methanolicus is a thermophilic methylotroph able to overproduce amino acids from methanol, a substrate not used for human or animal nutrition. Based on our previous RNA-seq analysis a mannitol inducible promoter and a putative mannitol activator gene mtlR were identified. The mannitol inducible promoter was applied for controlled gene expression using fluorescent reporter proteins and a flow cytometry analysis, and improved by changing the -35 promoter region and by co-expression of the mtlR regulator gene. For independent complementary gene expression control, the heterologous xylose-inducible system from B. megaterium was employed and a two-plasmid gene expression system was developed. Four different replicons for expression vectors were compared with respect to their copy number and stability. As an application example, methanol-based production of cadaverine was shown to be improved from 11.3 to 17.5 g/L when a heterologous lysine decarboxylase gene cadA was expressed from a theta-replicating rather than a rolling-circle replicating vector. The current work on inducible promoter systems and compatible theta- or rolling circle-replicating vectors is an important extension of the poorly developed B. methanolicus genetic toolbox, valuable for genetic engineering and further exploration of this bacterium.

  17. Genome-Based Genetic Tool Development for Bacillus methanolicus: Theta- and Rolling Circle-Replicating Plasmids for Inducible Gene Expression and Application to Methanol-Based Cadaverine Production

    PubMed Central

    Irla, Marta; Heggeset, Tonje M. B.; Nærdal, Ingemar; Paul, Lidia; Haugen, Tone; Le, Simone B.; Brautaset, Trygve; Wendisch, Volker F.

    2016-01-01

    Bacillus methanolicus is a thermophilic methylotroph able to overproduce amino acids from methanol, a substrate not used for human or animal nutrition. Based on our previous RNA-seq analysis a mannitol inducible promoter and a putative mannitol activator gene mtlR were identified. The mannitol inducible promoter was applied for controlled gene expression using fluorescent reporter proteins and a flow cytometry analysis, and improved by changing the -35 promoter region and by co-expression of the mtlR regulator gene. For independent complementary gene expression control, the heterologous xylose-inducible system from B. megaterium was employed and a two-plasmid gene expression system was developed. Four different replicons for expression vectors were compared with respect to their copy number and stability. As an application example, methanol-based production of cadaverine was shown to be improved from 11.3 to 17.5 g/L when a heterologous lysine decarboxylase gene cadA was expressed from a theta-replicating rather than a rolling-circle replicating vector. The current work on inducible promoter systems and compatible theta- or rolling circle-replicating vectors is an important extension of the poorly developed B. methanolicus genetic toolbox, valuable for genetic engineering and further exploration of this bacterium. PMID:27713731

  18. Simulating the time series of a selected gene expression profile in an agent-based tumor model

    NASA Astrophysics Data System (ADS)

    Mansury, Yuri; Deisboeck, Thomas S.

    2004-09-01

    To elucidate the role of environmental conditions in molecular-level dynamics and to study their impact on macroscopic brain tumor growth patterns, the expression of the genes Tenascin C and PCNA in a 2D agent-based model for the migratory trait is calibrated using experimental data from the literature, while the expression of these genes for the proliferative trait is obtained as the model output. Numerical results confirm that the gene expression of Tenascin C is indeed consistently higher in the migratory glioma cell phenotype and show that the expression of PCNA is consistently higher among proliferating tumor cells. Intriguingly, the time series of the tumor cells’ gene expression exhibit a sudden change in behavior during the invasion of the tumor into a nutrient-abundant region, showing a robust positive correlation between the expression of Tenascin C and the tumor’s diameter, yet a strong negative correlation between the expression of PCNA and the diameter. These molecular-level dynamics correspond to the emergence of a structural asymmetry in the form of a bulging tumor rim in the nutrient-abundant region. The simulated time series thus supports the critical role of the migratory cell phenotype during both the tumor system’s overall macroscopic expansion and the evolvement of regional growth patterns, particularly in the later stages. Furthermore, detrended fluctuation analysis (DFA) suggests that for prediction purposes, the simulated gene expression profiles of Tenascin C and PCNA that were determined separately for the migrating and proliferating phenotypes exhibit lesser predictability than those of the phenotypic mixture combining all viable tumor cells typically found in clinical biopsies. Finally, partitioning the tumor into distinct geographic regions of interest (ROI) reveals that the gene expression profile of tumor cells in the quadrant close to the nutrient-abundant region is representative for the entire tumor whereas the expression

  19. Quantitative Reverse Transcription-qPCR-Based Gene Expression Analysis in Plants.

    PubMed

    Abdallah, Heithem Ben; Bauer, Petra

    2016-01-01

    The investigation of gene expression is an initial and essential step to understand the function of a gene in a physiological context. Reverse transcription-quantitative real-time PCR (RT-qPCR) assays are reproducible, quantitative, and fast. They can be adapted to study model and non-model plant species without the need to have whole genome or transcriptome sequence data available. Here, we provide a protocol for a reliable RT-qPCR assay, which can be easily adapted to any plant species of interest. We describe the design of the qPCR strategy and primer design, considerations for plant material generation, RNA preparation and cDNA synthesis, qPCR setup and run, and qPCR data analysis, interpretation, and final presentation.
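
    The abstract does not say which quantification model the protocol uses. As one common possibility, the sketch below applies the 2^-ΔΔCt (Livak) calculation, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The function name and the Ct values are illustrative only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt calculation.

    Each argument is a quantification cycle (Ct) value. The reference gene
    normalizes for input amount; the control condition sets fold change 1.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values: the target crosses threshold 3 cycles earlier in the
# treated sample once both are normalized to the reference gene.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
```

    A difference of three cycles corresponds to a 2³ = 8-fold higher transcript level, since each cycle ideally doubles the amount of product.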

  20. Digital gene expression analysis based on integrated de novo transcriptome assembly of sweet potato [Ipomoea batatas (L.) Lam].

    PubMed

    Tao, Xiang; Gu, Ying-Hong; Wang, Hai-Yan; Zheng, Wen; Li, Xiao; Zhao, Chuan-Wu; Zhang, Yi-Zheng

    2012-01-01

    Sweet potato (Ipomoea batatas L. [Lam.]) ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to get a comprehensive and integrated genomic resource and better understanding of gene expression patterns in different tissues and at various developmental stages. Illumina paired-end (PE) RNA-Sequencing was performed, and generated 48.7 million 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥ 100 bp), which correspond to 41.1 million base pairs, by using a combined assembly strategy. Transcripts were annotated by Blast2GO; 51,763 transcripts had BLASTX hits, of which 39,677 have GO terms and 14,117 have ECs that are associated with 147 KEGG pathways. Furthermore, transcriptome differences of seven tissues were analyzed by using Illumina digital gene expression (DGE) tag profiling and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism and potential stress tolerance and insect resistance were also identified. The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots, tissue-specific gene expression, potential biotic and abiotic stress response in sweet

  1. Gene expression-based chemical genomics identifies potential therapeutic drugs in hepatocellular carcinoma.

    PubMed

    Chen, Ming-Huang; Yang, Wu-Lung R; Lin, Kuan-Ting; Liu, Chia-Hung; Liu, Yu-Wen; Huang, Kai-Wen; Chang, Peter Mu-Hsin; Lai, Jin-Mei; Hsu, Chun-Nan; Chao, Kun-Mao; Kao, Cheng-Yan; Huang, Chi-Ying F

    2011-01-01

    Hepatocellular carcinoma (HCC) is an aggressive tumor with a poor prognosis. Currently, only sorafenib is approved by the FDA for advanced HCC treatment; therefore, there is an urgent need to discover candidate therapeutic drugs for HCC. We hypothesized that if a drug signature could reverse, at least in part, the gene expression signature of HCC, it might have the potential to inhibit HCC-related pathways and thereby treat HCC. To test this hypothesis, we first built an integrative platform, the "Encyclopedia of Hepatocellular Carcinoma genes Online 2", dubbed EHCO2, to systematically collect, organize and compare the publicly available data from HCC studies. The resulting collection includes a total of 4,020 genes. To systematically query the Connectivity Map (CMap), which includes 6,100 drug-mediated expression profiles, we further designed various gene signature selection and enrichment methods, including a randomization technique, majority vote, and clique analysis. Subsequently, 28 out of 50 prioritized drugs, including tanespimycin, trichostatin A, thioguanosine, and several anti-psychotic drugs with anti-tumor activities, were validated via MTT cell viability assays and clonogenic assays in HCC cell lines. To accelerate their future clinical use, possibly through drug-repurposing, we selected two well-established drugs to test in mice, chlorpromazine and trifluoperazine. Both drugs inhibited orthotopic liver tumor growth. In conclusion, we successfully discovered and validated existing drugs for potential HCC therapeutic use with the pipeline of Connectivity Map analysis and lab verification, thereby suggesting the usefulness of this procedure to accelerate drug repurposing for HCC treatment.
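
    The signature-reversal hypothesis can be illustrated with a very simple overlap score. This is a sketch under stated assumptions, not the Connectivity Map's rank-based scoring nor the authors' selection methods: signatures are reduced to sets of up- and down-regulated genes, and the gene identifiers are placeholders.

```python
def reversal_score(disease_up, disease_down, drug_up, drug_down):
    """Score in [-1, 1]: +1 if the drug reverses every disease gene,
    -1 if it mimics every one, 0 if there is no net overlap."""
    reversed_count = len(disease_up & drug_down) + len(disease_down & drug_up)
    mimicked_count = len(disease_up & drug_up) + len(disease_down & drug_down)
    total = len(disease_up) + len(disease_down)
    return (reversed_count - mimicked_count) / total if total else 0.0


# Placeholder signatures (hypothetical gene identifiers).
disease_up = {"G1", "G2", "G3"}
disease_down = {"G4"}
drug_up = {"G4", "G9"}
drug_down = {"G1", "G2"}
score = reversal_score(disease_up, disease_down, drug_up, drug_down)
```

    Here three of the four disease genes move in the opposite direction under the drug and none move in the same direction, giving a score of 0.75. Drugs could then be ranked by this score before any laboratory validation.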

  2. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data

    PubMed Central

    Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  3. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data.

    PubMed

    Ganesh Kumar, Pugalendhi; Kavitha, Muthu Subash; Ahn, Byeong-Cheol

    2016-01-01

    This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified

  4. Understanding development and stem cells using single cell-based analyses of gene expression.

    PubMed

    Kumar, Pavithra; Tan, Yuqi; Cahan, Patrick

    2017-01-01

    In recent years, genome-wide profiling approaches have begun to uncover the molecular programs that drive developmental processes. In particular, technical advances that enable genome-wide profiling of thousands of individual cells have provided the tantalizing prospect of cataloging cell type diversity and developmental dynamics in a quantitative and comprehensive manner. Here, we review how single-cell RNA sequencing has provided key insights into mammalian developmental and stem cell biology, emphasizing the analytical approaches that are specific to studying gene expression in single cells. © 2017. Published by The Company of Biologists Ltd.

  5. A gene expression signature-based approach reveals the mechanisms of action of the Chinese herbal medicine berberine

    PubMed Central

    Lee, Kuen-Haur; Lo, Hsiang-Ling; Tang, Wan-Chun; Hsiao, Heidi Hao-yun; Yang, Pei-Ming

    2014-01-01

    Berberine (BBR), a traditional Chinese herbal medicine, was shown to display anticancer activity. In this study, we attempted to provide a global view of the molecular pathways associated with its anticancer effect through a gene expression-based chemical approach. BBR-induced differentially expressed genes obtained from the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) were analyzed using the Connectivity Map (CMAP) database to compare similarities of gene expression profiles between BBR and CMAP compounds. Candidate compounds were further analyzed using the Search Tool for Interactions of Chemicals (STITCH) database to explore chemical-protein interactions. Results showed that BBR may inhibit protein synthesis, histone deacetylase (HDAC), or AKT/mammalian target of rapamycin (mTOR) pathways. Further analyses demonstrated that BBR inhibited global protein synthesis and basal AKT activity, and induced endoplasmic reticulum (ER) stress and autophagy, which was associated with activation of AMP-activated protein kinase (AMPK). However, BBR did not alter mTOR or HDAC activities. Interestingly, BBR induced the acetylation of α-tubulin, a substrate of HDAC6. In addition, the combination of BBR and SAHA, a pan-HDAC inhibitor, synergistically inhibited cell proliferation and induced cell cycle arrest. Our results provide novel insights into the mechanisms of action of BBR in cancer therapy. PMID:25227736

  6. A gene expression signature-based approach reveals the mechanisms of action of the Chinese herbal medicine berberine.

    PubMed

    Lee, Kuen-Haur; Lo, Hsiang-Ling; Tang, Wan-Chun; Hsiao, Heidi Hao-yun; Yang, Pei-Ming

    2014-09-17

    Berberine (BBR), a traditional Chinese herbal medicine, was shown to display anticancer activity. In this study, we attempted to provide a global view of the molecular pathways associated with its anticancer effect through a gene expression-based chemical approach. BBR-induced differentially expressed genes obtained from the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) were analyzed using the Connectivity Map (CMAP) database to compare similarities of gene expression profiles between BBR and CMAP compounds. Candidate compounds were further analyzed using the Search Tool for Interactions of Chemicals (STITCH) database to explore chemical-protein interactions. Results showed that BBR may inhibit protein synthesis, histone deacetylase (HDAC), or AKT/mammalian target of rapamycin (mTOR) pathways. Further analyses demonstrated that BBR inhibited global protein synthesis and basal AKT activity, and induced endoplasmic reticulum (ER) stress and autophagy, which was associated with activation of AMP-activated protein kinase (AMPK). However, BBR did not alter mTOR or HDAC activities. Interestingly, BBR induced the acetylation of α-tubulin, a substrate of HDAC6. In addition, the combination of BBR and SAHA, a pan-HDAC inhibitor, synergistically inhibited cell proliferation and induced cell cycle arrest. Our results provide novel insights into the mechanisms of action of BBR in cancer therapy.

  7. Revealing constitutively expressed resistance genes in Agrostis species using PCR-based motif-directed RNA fingerprinting.

    PubMed

    Budak, Hikmet; Su, Senem; Ergen, Neslihan

    2006-12-01

    Agrostis species are mainly used in athletic fields and golf courses. Their integrity is maintained by fungicides, which makes the development of disease-resistant varieties a high priority. However, there is a lack of knowledge about resistance (R) genes and their use for genetic improvement in Agrostis species. The objective of this study was to identify and clone constitutively expressed cDNAs encoding R gene-like (RGL) sequences from three Agrostis species (colonial bentgrass (A. capillaris L.), creeping bentgrass (A. stolonifera L.) and velvet bentgrass (A. canina L.)) by PCR-based motif-directed RNA fingerprinting towards relatively conserved nucleotide binding site (NBS) domains. Sixty-one constitutively expressed cDNA sequences were identified and characterized. Sequence analysis of ESTs and probable translation products revealed that RGLs are highly conserved among these three Agrostis species. Fifteen of them were shown to share conserved motifs found in other plant disease resistance genes such as MLA13, Xa1, YR6, YR23 and RPP5. The molecular evolutionary forces, analysed using the Ka/Ks ratio, reflected purifying selection both on NBS and leucine-rich repeat (LRR) intervening regions of discovered RGL sequences in these species. This study presents, for the first time, isolation and characterization of constitutively expressed RGL sequences from Agrostis species revealing the presence of TNL (TIR-NBS-LRR) type R genes in monocot plants. The characterized RGLs will further enhance knowledge on the molecular evolution of the R gene family in grasses.
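
    For readers unfamiliar with the Ka/Ks ratio, the sketch below shows the usual reading: a ratio well below 1 indicates purifying selection, a ratio near 1 is consistent with neutral evolution, and a ratio above 1 suggests positive selection. The tolerance `tol` around 1 and the example rates are arbitrary illustrative choices, not values from the study.

```python
def selection_regime(ka, ks, tol=0.05):
    """Classify selection from nonsynonymous (Ka) and synonymous (Ks) rates.

    tol is a hypothetical band around 1 within which the ratio is treated
    as compatible with neutral evolution.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return ratio, "purifying"
    if ratio > 1 + tol:
        return ratio, "positive"
    return ratio, "neutral"


# Made-up rates: few amino-acid-changing substitutions relative to silent ones.
ratio, regime = selection_regime(ka=0.02, ks=0.10)
```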

  8. Network-based analysis of differentially expressed genes in cerebrospinal fluid (CSF) and blood reveals new candidate genes for multiple sclerosis

    PubMed Central

    Safari-Alighiarloo, Nahid; Taghizadeh, Mohammad; Tabatabaei, Seyyed Mohammad; Namaki, Saeed

    2016-01-01

    Background: The involvement of multiple genes and missing heritability, which are dominant in complex diseases such as multiple sclerosis (MS), entail using network biology to better elucidate their molecular basis and genetic factors. We therefore aimed to integrate interactome (protein–protein interaction (PPI)) and transcriptome data to construct and analyze PPI networks for MS. Methods: Gene expression profiles in paired cerebrospinal fluid (CSF) and peripheral blood mononuclear cell (PBMC) samples from MS patients, sampled in relapse or remission, and from controls were analyzed. Differentially expressed genes that were determined only in CSF (MS vs. control) and PBMCs (relapse vs. remission) were separately integrated with PPI data to construct the Query-Query PPI (QQPPI) networks. The networks were further analyzed to investigate the more central genes, functional modules and complexes involved in MS progression. Results: The networks were analyzed and high-centrality genes were identified. Exploration of functional modules and complexes showed that the majority of high-centrality genes were incorporated in biological pathways driving MS pathogenesis. Proteasome and spliceosome were also prominent among the enriched pathways in PBMCs (relapse vs. remission), as identified by both modularity and clique analyses. Finally, the STK4, RB1, CDKN1A, CDK1, RAC1, EZH2 and SDCBP genes in CSF (MS vs. control) and the CDC37, MAP3K3 and MYC genes in PBMCs (relapse vs. remission) were identified as potential candidate genes for MS; these were the more central genes involved in biological pathways. Discussion: This study showed that network-based analysis could explicate the complex interplay between biological processes underlying MS. Furthermore, experimental validation of candidate genes can lead to identification of potential therapeutic targets. PMID:28028462
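
    The abstract does not state which centrality measures were used. As a minimal sketch, assuming degree centrality (the fraction of other proteins a protein interacts with), high-centrality nodes in a PPI network could be ranked as follows. The protein names and edges are placeholders, not interactions from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to (number of distinct neighbours) / (n - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}


# Hypothetical undirected interaction list.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
centrality = degree_centrality(edges)
ranked = sorted(centrality, key=centrality.get, reverse=True)
```

    In this toy network `P1` interacts with every other protein and is ranked first. Other measures, such as betweenness or closeness, can rank nodes differently.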

  9. Microarray-based analysis of gene expression in Lycopersicon esculentum seedling roots in response to cadmium, chromium, mercury, and lead.

    PubMed

    Hou, Jing; Liu, Xinhui; Wang, Juan; Zhao, Shengnan; Cui, Baoshan

    2015-02-03

    The effects of heavy metals in agricultural soils have received special attention due to their potential for accumulation in crops, which can affect species at all trophic levels. Therefore, there is a critical need for reliable bioassays for assessing risk levels due to heavy metals in agricultural soil. In the present study, we used microarrays to investigate changes in gene expression of Lycopersicon esculentum in response to Cd-, Cr-, Hg-, or Pb-spiked soil. Exposure to 1/10 median lethal concentrations (LC50) of Cd, Cr, Hg, or Pb for 7 days resulted in expression changes in 29 Cd-specific, 58 Cr-specific, 192 Hg-specific and 864 Pb-specific genes as determined by microarray analysis, whereas conventional morphological and physiological bioassays did not reveal any toxicant stresses. Hierarchical clustering analysis showed that the characteristic gene expression profiles induced by Cd, Cr, Hg, and Pb were distinct from not only the control but also one another. Furthermore, a total of three genes related to "ion transport" for Cd, 14 genes related to "external encapsulating structure organization", "reproductive developmental process", "lipid metabolic process" and "response to stimulus" for Cr, 11 genes related to "cellular metabolic process" and "cellular response to stimulus" for Hg, 78 genes related to 20 biological processes (e.g., DNA metabolic process, monosaccharide catabolic process, cell division) for Pb were identified and selected as their potential biomarkers. These findings demonstrated that microarray-based analysis of Lycopersicon esculentum was a sensitive tool for the early detection of potential toxicity of heavy metals in agricultural soil, as well as an effective tool for identifying the heavy metal-specific genes, which should be useful for assessing risk levels due to heavy metals in agricultural soil.

  10. Gene expression inference with deep learning.

    PubMed

    Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

    2016-06-15

    Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX (contact: xhx@ics.uci.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
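
    The comparison metric in this abstract can be illustrated with a small sketch. It assumes that "relative improvement" means the reduction in mean absolute error as a percentage of the baseline (LR) error; the abstract does not spell out the formula, and the function names and toy numbers below are hypothetical, not taken from D-GEX.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |truth - prediction| over all values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percent reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values only: "measured" expression of four target genes and two sets of predictions.
truth = [1.0, 2.0, 3.0, 4.0]
lr_pred = [1.5, 2.5, 2.5, 4.5]      # stand-in for a linear-regression baseline
dl_pred = [1.25, 2.25, 2.75, 4.25]  # stand-in for a nonlinear model

lr_mae = mean_absolute_error(truth, lr_pred)
dl_mae = mean_absolute_error(truth, dl_pred)
print(lr_mae, dl_mae, relative_improvement(lr_mae, dl_mae))
```

    Under this definition, the reported 15.33% would mean that the deep learning error is about 0.85 times the LR error on the same genes.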

  11. Nonadditive gene expression in polyploids.

    PubMed

    Yoo, Mi-Jeong; Liu, Xiaoxian; Pires, J Chris; Soltis, Pamela S; Soltis, Douglas E

    2014-01-01

    Allopolyploidy involves hybridization and duplication of divergent parental genomes and provides new avenues for gene expression. The expression levels of duplicated genes in polyploids can show deviation from parental additivity (the arithmetic average of the parental expression levels). Nonadditive expression has been widely observed in diverse polyploids and comprises at least three possible scenarios: (a) The total gene expression level in a polyploid is similar to that of one of its parents (expression-level dominance); (b) total gene expression is lower or higher than in both parents (transgressive expression); and (c) the relative contribution of the parental copies (homeologs) to the total gene expression is unequal (homeolog expression bias). Several factors may result in expression nonadditivity in polyploids, including maternal-paternal influence, gene dosage balance, cis- and/or trans-regulatory networks, and epigenetic regulation. As our understanding of nonadditive gene expression in polyploids remains limited, a new generation of investigators should explore additional phenomena (e.g., alternative splicing) and use other high-throughput "omics" technologies to measure the impact of nonadditive expression on phenotype, proteome, and metabolome.
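
    The three scenarios above can be made concrete with a minimal sketch. It assumes one expression value per parent and per polyploid, and a relative tolerance `tol` for deciding that two levels are "similar"; both the tolerance and the function names are hypothetical, since real studies use statistical tests on replicated data rather than a fixed cutoff.

```python
def classify_total_expression(parent_a, parent_b, polyploid, tol=0.1):
    """Label total expression in a polyploid relative to its two parents.

    tol is an illustrative relative tolerance, not a published threshold.
    """
    def similar(x, y):
        return abs(x - y) <= tol * max(abs(x), abs(y))

    mid_parent = (parent_a + parent_b) / 2.0  # parental additivity
    low, high = min(parent_a, parent_b), max(parent_a, parent_b)

    if similar(polyploid, mid_parent):
        return "additive"
    if polyploid > high and not similar(polyploid, high):
        return "transgressive (higher than both parents)"
    if polyploid < low and not similar(polyploid, low):
        return "transgressive (lower than both parents)"
    if similar(polyploid, parent_a):
        return "expression-level dominance (parent A)"
    if similar(polyploid, parent_b):
        return "expression-level dominance (parent B)"
    return "nonadditive (intermediate)"


def homeolog_share(copy_a, copy_b):
    """Fraction of total expression contributed by the A homeolog.

    Values away from 0.5 indicate homeolog expression bias.
    """
    return copy_a / (copy_a + copy_b)


print(classify_total_expression(10.0, 20.0, 15.0))
print(classify_total_expression(10.0, 20.0, 30.0))
print(classify_total_expression(10.0, 20.0, 19.5))
print(homeolog_share(30.0, 10.0))
```

    Note that homeolog bias is independent of the first two categories: a gene can be additive in total expression while one parental copy contributes most of the transcripts.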

  12. Evolution of Gene Expression after Gene Amplification

    PubMed Central

    Garcia, Nelson; Zhang, Wei; Wu, Yongrui; Messing, Joachim

    2015-01-01

    We took a rather unique approach to investigate the conservation of gene expression of prolamin storage protein genes across two different subfamilies of the Poaceae. We took advantage of oat plants carrying single maize chromosomes in different cultivars, called oat–maize addition (OMA) lines, which permitted us to determine whether regulation of gene expression was conserved between the two species. We found that γ-zeins are expressed in OMA7.06, which carries maize chromosome 7 even in the absence of the trans-acting maize prolamin-box-binding factor (PBF), which regulates their expression. This is likely because oat PBF can substitute for the function of maize PBF as shown in our transient expression data, using a γ-zein promoter fused to green fluorescent protein (GFP). Despite this conservation, the younger, recently amplified prolamin genes in maize, absent in oat, are not expressed in the corresponding OMAs. However, maize can express the oldest prolamin gene, the wheat high-molecular weight glutenin Dx5 gene, even when maize Pbf is knocked down (through PbfRNAi), and/or another maize transcription factor, Opaque-2 (O2) is knocked out (in maize o2 mutant). Therefore, older genes are conserved in their regulation, whereas younger ones diverged during evolution and eventually acquired a new repertoire of suitable transcriptional activators. PMID:25912045

  13. Gene Expression Studies in Mosquitoes

    PubMed Central

    Chen, Xiao-Guang; Mathur, Geetika; James, Anthony A.

    2009-01-01

    Research on gene expression in mosquitoes is motivated by both basic and applied interests. Studies of genes involved in hematophagy, reproduction, olfaction, and immune responses reveal an exquisite confluence of biological adaptations that result in these highly-successful life forms. The requirement of female mosquitoes for a bloodmeal for propagation has been exploited by a wide diversity of viral, protozoan and metazoan pathogens as part of their life cycles. Identifying genes involved in host-seeking, blood feeding and digestion, reproduction, insecticide resistance and susceptibility/refractoriness to pathogen development is expected to provide the bases for the development of novel methods to control mosquito-borne diseases. Advances in mosquito transgenesis technologies, the availability of whole genome sequence information, mass sequencing and analyses of transcriptomes and RNAi techniques will assist development of these tools as well as deepen the understanding of the underlying genetic components for biological phenomena characteristic of these insect species. PMID:19161831

  14. Molecular markers of early Parkinson's disease based on gene expression in blood.

    PubMed

    Scherzer, Clemens R; Eklund, Aron C; Morse, Lee J; Liao, Zhixiang; Locascio, Joseph J; Fefer, Daniel; Schwarzschild, Michael A; Schlossmacher, Michael G; Hauser, Michael A; Vance, Jeffery M; Sudarsky, Lewis R; Standaert, David G; Growdon, John H; Jensen, Roderick V; Gullans, Steven R

    2007-01-16

    Parkinson's disease (PD) progresses relentlessly and affects five million people worldwide. Laboratory tests for PD are critically needed for developing treatments designed to slow or prevent progression of the disease. We performed a transcriptome-wide scan in 105 individuals to interrogate the molecular processes perturbed in cellular blood of patients with early-stage PD. The molecular multigene marker here identified is associated with risk of PD in 66 samples of the training set comprising healthy and disease controls [third tertile cross-validated odds ratio of 5.7 (P for trend 0.005)]. It is further validated in 39 independent test samples [third tertile odds ratio of 5.1 (P for trend 0.04)]. Insights into disease-linked processes detectable in peripheral blood are offered by 22 unique genes differentially expressed in patients with PD versus healthy individuals. These include the co-chaperone ST13, which stabilizes heat-shock protein 70, a modifier of alpha-synuclein misfolding and toxicity. ST13 messenger RNA copies are lower in patients with PD (mean ± SE 0.59 ± 0.05) than in controls (0.96 ± 0.09) (P = 0.002) in two independent populations. Thus, gene expression signals measured in blood can facilitate the development of biomarkers for PD.

  15. Molecular markers of early Parkinson's disease based on gene expression in blood

    PubMed Central

    Scherzer, Clemens R.; Eklund, Aron C.; Morse, Lee J.; Liao, Zhixiang; Locascio, Joseph J.; Fefer, Daniel; Schwarzschild, Michael A.; Schlossmacher, Michael G.; Hauser, Michael A.; Vance, Jeffery M.; Sudarsky, Lewis R.; Standaert, David G.; Growdon, John H.; Jensen, Roderick V.; Gullans, Steven R.

    2007-01-01

    Parkinson's disease (PD) progresses relentlessly and affects five million people worldwide. Laboratory tests for PD are critically needed for developing treatments designed to slow or prevent progression of the disease. We performed a transcriptome-wide scan in 105 individuals to interrogate the molecular processes perturbed in cellular blood of patients with early-stage PD. The molecular multigene marker here identified is associated with risk of PD in 66 samples of the training set comprising healthy and disease controls [third tertile cross-validated odds ratio of 5.7 (P for trend 0.005)]. It is further validated in 39 independent test samples [third tertile odds ratio of 5.1 (P for trend 0.04)]. Insights into disease-linked processes detectable in peripheral blood are offered by 22 unique genes differentially expressed in patients with PD versus healthy individuals. These include the cochaperone ST13, which stabilizes heat-shock protein 70, a modifier of α-synuclein misfolding and toxicity. ST13 messenger RNA copies are lower in patients with PD (mean ± SE 0.59 ± 0.05) than in controls (0.96 ± 0.09) (P = 0.002) in two independent populations. Thus, gene expression signals measured in blood can facilitate the development of biomarkers for PD. PMID:17215369

  16. Prediction of lung cancer based on serum biomarkers by gene expression programming methods.

    PubMed

    Yu, Zhuang; Chen, Xiao-Zheng; Cui, Lian-Hua; Si, Hong-Zong; Lu, Hai-Jiao; Liu, Shi-Hai

    2014-01-01

    In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcino-embryonic antigen (CEA), neurone specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, classification of lung tumors was made based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal biomarker joint models with a powerful computerized tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It specifically focuses on relationships between variables in sets of data and then builds models to explain these relationships, and has been successfully used in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH and Cyfra21-1 are also significant in lung cancer; building on CEA and NSE, we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH) and GEP3 (CEA, NSE, CRP). The best classification result was obtained when CEA, NSE and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% in the training set and 88.9% in the test set. With GEP2, the accuracy was significantly decreased, by 1.5% in the training set and 6.6% in the test set; with GEP3, the decreases were 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive serum biomarker in discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in the diagnosis of lung cancer.
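
    The accuracy rates quoted for the best model follow directly from the reported counts, as this short check shows. The helper name is hypothetical; the counts are those given in the abstract.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified subjects, as a percentage."""
    return 100.0 * correct / total


n_patients = 120 + 60            # NSCLC + SCLC patients
train_total, test_total = 135, 45

# The training and test sets together account for every patient.
print(train_total + test_total == n_patients)

print(round(accuracy_percent(128, train_total), 1))  # 94.8
print(round(accuracy_percent(40, test_total), 1))    # 88.9
```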

  17. Noninferiority tests based on concordance correlation coefficient for assessment of the agreement for gene expression data from microarray experiments.

    PubMed

    Liao, Chen-Tuo; Lin, Chia-Ying; Liu, Jen-Pei

    2007-01-01

    Microarray is one of the breakthrough technologies of the twenty-first century. Despite its great potential, the translation of microarray technology into clinically useful commercial products has not been as rapid as the technology promised. One of the primary reasons is the lack of agreement and poor reproducibility of the intensity measurements of gene expression obtained from microarray experiments. Current practice often tests the hypothesis of a zero Pearson correlation coefficient to assess the agreement of gene expression levels between technical replicates from microarray experiments. However, the Pearson correlation coefficient evaluates linear association between two variables and fails to take into account changes in accuracy and precision. Hence, it is not appropriate for evaluating agreement of gene expression levels between technical replicates. Therefore, we propose to use the concordance correlation coefficient to assess agreement of gene expression levels between technical replicates. We also apply Generalized Pivotal Quantities to obtain the exact confidence interval for the concordance coefficient. In addition, based on the concept of the noninferiority test, a one-sided (1 − α) lower confidence limit for the concordance correlation coefficient is employed to test the hypothesis that the agreement of expression levels of the same genes between two technical replicates exceeds some minimal requirement of agreement. We conducted a simulation study, under various combinations of mean differences, variability, and sample size, to empirically compare the performance of different methods for assessment of agreement in terms of coverage probability, expected length, size, and power. Numerical data from published papers illustrate the application of the proposed methods.
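
    The difference between association and agreement can be seen in a small sketch. It uses the standard sample form of Lin's concordance correlation coefficient, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), with 1/n variances; the toy replicate values are invented for illustration, and the confidence-limit and noninferiority procedures from the abstract are not reproduced here.

```python
from math import sqrt


def _moments(x, y):
    """Means, variances and covariance using 1/n normalisation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson(x, y):
    """Pearson correlation: measures linear association only."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / sqrt(sxx * syy)


def concordance_correlation(x, y):
    """Lin's concordance correlation: penalises shifts in location and scale."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Toy technical replicates: the second is the first shifted by a constant bias of 2.
rep1 = [1.0, 2.0, 3.0, 4.0]
rep2 = [3.0, 4.0, 5.0, 6.0]

print(pearson(rep1, rep2))                  # perfect linear association
print(concordance_correlation(rep1, rep2))  # much lower: the replicates disagree
```

    A constant offset leaves the Pearson coefficient at 1 while the concordance coefficient drops well below 1, which is the behaviour the abstract argues for.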

  18. Live fluorescent RNA-based detection of pluripotency gene expression in embryonic and induced pluripotent stem cells of different species.

    PubMed

    Lahm, Harald; Doppler, Stefanie; Dreßen, Martina; Werner, Astrid; Adamczyk, Klaudia; Schrambke, Dominic; Brade, Thomas; Laugwitz, Karl-Ludwig; Deutsch, Marcus-André; Schiemann, Matthias; Lange, Rüdiger; Moretti, Alessandra; Krane, Markus

    2015-02-01

    The generation of induced pluripotent stem (iPS) cells has successfully been achieved in many species. However, the identification of truly reprogrammed iPS cells still remains laborious and the detection of pluripotency markers requires fixation of cells in most cases. Here, we report an approach with nanoparticles carrying Cy3-labeled sense oligonucleotide reporter strands coupled to gold-particles. These molecules are directly added to cultured cells without any manipulation and gene expression is evaluated microscopically after overnight incubation. To simultaneously detect gene expression in different species, probe sequences were chosen according to interspecies homology. With a common target-specific probe we could successfully demonstrate expression of the GAPDH house-keeping gene in somatic cells and expression of the pluripotency markers NANOG and GDF3 in embryonic stem cells and iPS cells of murine, human, and porcine origin. The population of target gene positive cells could be purified by fluorescence-activated cell sorting. After lentiviral transduction of murine tail-tip fibroblasts Nanog-specific probes identified truly reprogrammed murine iPS cells in situ during development based on their Cy3-fluorescence. The intensity of Nanog-specific fluorescence correlated positively with an increased capacity of individual clones to differentiate into cells of all three germ layers. Our approach offers a universal tool to detect intracellular gene expression directly in live cells of any desired origin without the need for manipulation, thus allowing conservation of the genetic background of the target cell. Furthermore, it represents an easy, scalable method for efficient screening of pluripotency which is highly desirable during high-throughput cell reprogramming and after genomic editing of pluripotent stem cells. © 2014 AlphaMed Press.

  19. Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review.

    PubMed

    Raddatz, Barbara B; Spitzbarth, Ingo; Matheis, Katja A; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang; Ulrich, Reiner

    2017-09-01

    High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publicly available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.

  20. The Gene Expression Omnibus Database.

    PubMed

    Clough, Emily; Barrett, Tanya

    2016-01-01

    The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.

  1. Sphingoid Base Metabolism in Yeast: Mapping Gene Expression Patterns Into Qualitative Metabolite Time Course Predictions

    PubMed Central

    2001-01-01

    Can qualitative metabolite time course predictions be inferred from measured mRNA expression patterns? Speaking against this possibility is the large number of ‘decoupling’ control points that lie between these variables, i.e. translation, protein degradation, enzyme inhibition and enzyme activation. Speaking for it is the notion that these control points might be coordinately regulated such that action exerted on the mRNA level is informative of action exerted on the protein and metabolite levels. A simple kinetic model of sphingoid base metabolism in yeast is postulated. When the enzyme activities in this model are modulated proportional to mRNA expression levels measured in heat shocked yeast, the model yields a transient rise and fall in sphingoid bases followed by a permanent rise in ceramide. This finding is in qualitative agreement with experiments and is thus consistent with the aforementioned coordinated control system hypothesis. PMID:18629242

  2. Integrating phenotype and gene expression data for predicting gene function.

    PubMed

    Malone, Brandon M; Perkins, Andy D; Bridges, Susan M

    2009-10-08

    This paper presents a framework for integrating disparate data sets to predict gene function. The algorithm constructs a graph, called an integrated similarity graph, by computing similarities based upon both gene expression and textual phenotype data. This integrated graph is then used to make predictions about whether individual genes should be assigned a particular annotation from the Gene Ontology. A combined graph was generated from publicly-available gene expression data and phenotypic information from Saccharomyces cerevisiae. This graph was used to assign annotations to genes, as were graphs constructed from gene expression data and textual phenotype information alone. While the F-measure appeared similar for all three methods, annotations based upon the integrated similarity graph exhibited a better overall precision than gene expression or phenotype information alone can generate. The integrated approach was also able to assign almost as many annotations as the gene expression method alone, and generated significantly more total and correct assignments than the phenotype information could provide. These results suggest that augmenting standard gene expression data sets with publicly-available textual phenotype data can help generate more precise functional annotation predictions while mitigating the weaknesses of a standard textual phenotype approach.

  3. PCD-GED: Protein complex detection considering PPI dynamics based on time series gene expression data.

    PubMed

    Lakizadeh, Amir; Jalili, Saeed; Marashi, Sayed-Amir

    2015-08-07

    Detection of protein complexes from protein-protein interaction (PPI) networks is essential to understand the function of cell machinery. However, available PPIs are static, and cannot reflect the dynamics inherent in real networks. Our method uses time series gene expression data in addition to PPI networks to detect protein complexes. The proposed method generates a series of time-sequenced subnetworks (TSN) according to the time that the interactions are activated. It finds, from each TSN, the protein complexes by employing the weighted clustering coefficient and maximal weighted density concepts. The final set of detected protein complexes is obtained from the union of all complexes from the different subnetworks. Our findings suggest that employing these considerations can produce far better results in the protein complex detection problem. Copyright © 2015 Elsevier Ltd. All rights reserved.
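
    The overall flow described in this abstract (split a static PPI network into time-sequenced subnetworks, detect complexes in each, take the union) can be sketched as follows. Two parts are assumptions made only for illustration: an interaction is treated as active at a time point when both proteins are expressed at or above a fixed threshold, and connected components stand in for the paper's detector based on the weighted clustering coefficient and maximal weighted density. All names are hypothetical.

```python
def time_sequenced_subnetworks(edges, expression, threshold):
    """One edge list per time point, keeping edges whose endpoints are both active.

    The activity rule (expression >= threshold) is an assumed simplification.
    """
    n_times = len(next(iter(expression.values())))
    subnetworks = []
    for t in range(n_times):
        active = {p for p, series in expression.items() if series[t] >= threshold}
        subnetworks.append([(u, v) for u, v in edges if u in active and v in active])
    return subnetworks


def connected_components(edges):
    """Placeholder complex detector: connected components of one subnetwork."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, components = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbours[node] - comp)
        seen |= comp
        components.add(frozenset(comp))
    return components


def detect_complexes(edges, expression, threshold):
    """Union of the complexes found in every time-sequenced subnetwork."""
    complexes = set()
    for sub in time_sequenced_subnetworks(edges, expression, threshold):
        complexes |= connected_components(sub)
    return complexes


# Toy static network and two-time-point expression series.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
expr = {"A": [1, 0], "B": [1, 1], "C": [0, 1], "D": [0, 1]}
print(detect_complexes(ppi, expr, threshold=1))
```

    In this toy case the static network is a single chain, but the time-resolved view separates it into two candidate complexes, {A, B} and {B, C, D}, that are active at different times.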

  4. Investigation of TbfA in Riemerella anatipestifer using plasmid-based methods for gene over-expression and knockdown

    PubMed Central

    Liu, MaFeng; Wang, MengYi; Zhu, DeKang; Wang, MingShu; Jia, RenYong; Chen, Shun; Sun, KunFeng; Yang, Qiao; Wu, Ying; Chen, XiaoYue; Biville, Francis; Cheng, AnChun

    2016-01-01

    Riemerella anatipestifer is a duck pathogen that has caused serious economic losses to the duck industry worldwide. Despite this, there are few reported studies of the physiological and pathogenic mechanisms of Riemerella anatipestifer infection. In a previous study, we showed that TonB1 and TonB2 were involved in hemin uptake. TonB family protein (TbfA) was not investigated, since knockout of this gene was not successful at that time. Here, we used plasmid-based gene over-expression and knockdown to investigate its function. First, we constructed three Escherichia-Riemerella anatipestifer shuttle vectors containing three different native Riemerella anatipestifer promoters. The shuttle plasmids were introduced into Riemerella anatipestifer ATCC11845 by conjugation at an efficiency of 5 × 10⁻⁵ antibiotic-resistant transconjugants per recipient cell. Based on the high-expression shuttle vector pLMF03, a method for gene knockdown was established. Knockdown of TbfA in Riemerella anatipestifer ATCC11845 decreased the organism’s growth ability in TSB medium but did not affect its hemin utilization. In contrast, over-expression of TbfA in Riemerella anatipestifer ATCC11845ΔtonB1ΔtonB2 significantly promoted the organism’s growth in TSB medium but significantly inhibited its hemin utilization. Collectively, these findings suggest that TbfA is not involved in hemin utilization by Riemerella anatipestifer. PMID:27845444

  5. Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data.

    PubMed

    Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2007-01-01

    Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.

  6. Effect of medium/ω-6 long chain triglyceride-based emulsion on leucocyte death and inflammatory gene expression

    PubMed Central

    Cury-Boaventura, M F; Gorjão, R; Martins de Lima, T; Fiamoncini, J; Godoy, A B P; Deschamphs, F C; Soriano, F G; Curi, R

    2011-01-01

    Lipid emulsion (LE) containing medium/ω-6 long chain triglyceride-based emulsion (MCT/ω-6 LCT LE) has been recommended in the place of ω-6 LCT-based emulsion to prevent impairment of immune function. The impact of MCT/ω-6 LCT LE on lymphocyte and neutrophil death and expression of genes related to inflammation was investigated. Seven volunteers were recruited and infusion of MCT/ω-6 LCT LE was performed for 6 h. Four volunteers received saline and no change was found. Blood samples were collected before, immediately afterwards and 18 h after LE infusion. Lymphocytes and neutrophils were studied immediately after isolation and after 24 and 48 h in culture. The following determinations were carried out: plasma-free fatty acids, triacylglycerol and cholesterol concentrations, plasma fatty acid composition, neutral lipid accumulation in lymphocytes and neutrophils, signs of lymphocyte and neutrophil death and lymphocyte expression of genes related to inflammation. MCT/ω-6 LCT LE induced lymphocyte and neutrophil death. The mechanism for MCT/ω-6 LCT LE-dependent induction of leucocyte death may involve changes in neutral lipid content and modulation of expression of genes related to cell death, proteolysis, cell signalling, inflammatory response, oxidative stress and transcription. PMID:21682721

  7. Identification of co-expression gene networks, regulatory genes and pathways for obesity based on adipose tissue RNA Sequencing in a porcine model.

    PubMed

    Kogelman, Lisette J A; Cirera, Susanna; Zhernakova, Daria V; Fredholm, Merete; Franke, Lude; Kadarmideen, Haja N

    2014-09-30

    Obesity is a complex metabolic condition in strong association with various diseases, like type 2 diabetes, resulting in major public health and economic implications. Obesity is the result of environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identification of co-expressed and regulatory genes in RNA extracted from relevant tissues representing lean and obese individuals provides an entry point for the identification of genes and pathways of importance to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study in-depth organ-level transcriptomic regulations of obesity, unfeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulations of obesity using RNA Sequencing based systems biology approaches in a porcine model. We selected 36 animals for RNA Sequencing from a previously created F2 pig population representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. WGCNA revealed five modules which were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways enlightening the association between obesity and other diseases, like osteoporosis (osteoclast differentiation, P = 1.4E-7), and immune-related complications (e.g. Natural killer cell mediated cytotoxicity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5). Lemon-Tree identified three potential regulator genes, using confident scores, for the WGCNA module which was associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores respectively 95.30, 62.28, and 34.58). Moreover, detection
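
    The first step of a weighted co-expression analysis can be sketched in a few lines. This is not the WGCNA R package and not the authors' pipeline: it only shows the soft-thresholding idea for an unsigned network, where the adjacency between two genes is the absolute Pearson correlation raised to a power beta. The value beta = 6 and the toy profiles are assumptions for illustration; the abstract does not report the power used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for constant profiles


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency |cor|**beta for every gene pair.

    beta is an illustrative choice; in practice it is tuned per data set.
    """
    genes = sorted(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Toy expression profiles across four samples.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with g1
    "g4": [1.0, 3.0, 2.0, 4.0],   # correlation 0.8 with g1
}
for pair, weight in soft_threshold_adjacency(toy).items():
    print(pair, round(weight, 4))
```

    Raising the correlation to a power keeps strong relationships near 1 while pushing moderate ones toward 0 (here 0.8 becomes about 0.26), so modules are driven by the most tightly co-expressed genes.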

  8. Identification of co-expression gene networks, regulatory genes and pathways for obesity based on adipose tissue RNA Sequencing in a porcine model

    PubMed Central

    2014-01-01

    Background: Obesity is a complex metabolic condition in strong association with various diseases, like type 2 diabetes, resulting in major public health and economic implications. Obesity is the result of environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identification of co-expressed and regulatory genes in RNA extracted from relevant tissues representing lean and obese individuals provides an entry point for the identification of genes and pathways of importance to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study in-depth organ-level transcriptomic regulations of obesity, unfeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulations of obesity using RNA Sequencing based systems biology approaches in a porcine model. Methods: We selected 36 animals for RNA Sequencing from a previously created F2 pig population representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. Results: WGCNA revealed five modules which were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways enlightening the association between obesity and other diseases, like osteoporosis (osteoclast differentiation, P = 1.4E-7), and immune-related complications (e.g. Natural killer cell mediated cytotoxicity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5). Lemon-Tree identified three potential regulator genes, using confident scores, for the WGCNA module which was associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores respectively 95.30, 62.28, and

  9. De Novo Sequencing-Based Transcriptome and Digital Gene Expression Analysis Reveals Insecticide Resistance-Relevant Genes in Propylaea japonica (Thunberg) (Coleoptea: Coccinellidae)

    PubMed Central

    Jin, Feng-Liang; Qiu, Bao-Li; Wu, Jian-Hui; Ren, Shun-Xiang

    2014-01-01

    The ladybird Propylaea japonica (Thunberg) is one of the most important natural enemies of aphids in China. This species is threatened by the extensive use of insecticides but genomics-based information on the molecular mechanisms underlying insecticide resistance is limited. Hence, we analyzed the transcriptome and expression profile data of P. japonica in order to gain a deeper understanding of insecticide resistance in ladybirds. We performed de novo assembly of a transcriptome using Illumina's Solexa sequencing technology and short reads. A total of 27,243,552 reads were generated. These were assembled into 81,458 contigs and 33,647 unigenes (6,862 clusters and 26,785 singletons). Of the unigenes, 23,965 (71.22%) have putative homologues in the non-redundant (nr) protein database from NCBI, using BLASTX, with a cut-off E-value of 10^-5. We examined COG, GO and KEGG annotations to better understand the functions of these unigenes. Digital gene expression (DGE) libraries showed differences in gene expression profiles between two insecticide resistant strains. When compared with an insecticide susceptible profile, a total of 4,692 genes were significantly up- or down-regulated in a moderately resistant strain. Among these genes, 125 putative insecticide resistance genes were identified. To confirm the DGE results, 16 selected genes were validated using quantitative real time PCR (qRT-PCR). This study is the first to report genetic information on P. japonica and has greatly enriched the sequence data for ladybirds. The large number of gene sequences produced from the transcriptome and DGE sequencing will greatly improve our understanding of this important insect at the molecular level, and could contribute to in-depth research into insecticide resistance mechanisms. PMID:24959827

  10. Quantitative multiplex quantum dot in-situ hybridisation based gene expression profiling in tissue microarrays identifies prognostic genes in acute myeloid leukaemia

    SciTech Connect

    Tholouli, Eleni; MacDermott, Sarah; Hoyland, Judith; Yin, John Liu; Byers, Richard

    2012-08-24

    Highlights: ► Development of a quantitative high throughput in situ expression profiling method. ► Application to a tissue microarray of 242 AML bone marrow samples. ► Identification of HOXA4, HOXA9, Meis1 and DNMT3A as prognostic markers in AML. Abstract: Measurement and validation of microarray gene signatures in routine clinical samples is problematic and a rate limiting step in translational research. In order to facilitate measurement of microarray identified gene signatures in routine clinical tissue a novel method combining quantum dot based oligonucleotide in situ hybridisation (QD-ISH) and post-hybridisation spectral image analysis was used for multiplex in-situ transcript detection in archival bone marrow trephine samples from patients with acute myeloid leukaemia (AML). Tissue-microarrays were prepared into which white cell pellets were spiked as a standard. Tissue microarrays were made using routinely processed bone marrow trephines from 242 patients with AML. QD-ISH was performed for six candidate prognostic genes using triplex QD-ISH for DNMT1, DNMT3A, DNMT3B, and for HOXA4, HOXA9, Meis1. Scrambled oligonucleotides were used to correct for background staining followed by normalisation of expression against the expression values for the white cell pellet standard. Survival analysis demonstrated that low expression of HOXA4 was associated with poorer overall survival (p = 0.009), whilst high expression of HOXA9 (p < 0.0001), Meis1 (p = 0.005) and DNMT3A (p = 0.04) were associated with early treatment failure. These results demonstrate application of a standardised, quantitative multiplex QD-ISH method for identification of prognostic markers in formalin-fixed paraffin-embedded clinical samples, facilitating measurement of gene expression signatures in routine clinical samples.

  11. Gene expression signature based screening identifies ribonucleotide reductase as a candidate therapeutic target in Ewing sarcoma

    PubMed Central

    Goss, Kelli L.; Gordon, David J.

    2016-01-01

    There is a critical need in cancer therapeutics to identify targeted therapies that will improve outcomes and decrease toxicities compared to conventional, cytotoxic chemotherapy. Ewing sarcoma is a highly aggressive bone and soft tissue cancer that is caused by the EWS-FLI1 fusion protein. Although EWS-FLI1 is specific for cancer cells, and required for tumorigenesis, directly targeting this transcription factor has proven challenging. Consequently, targeting unique dependencies or key downstream mediators of EWS-FLI1 represents an important alternative strategy. We used gene expression data derived from a genetically defined model of Ewing sarcoma to interrogate the Connectivity Map and identify a class of drugs, iron chelators, that downregulate a significant number of EWS-FLI1 target genes. We then identified ribonucleotide reductase M2 (RRM2), the iron-dependent subunit of ribonucleotide reductase (RNR), as one mediator of iron chelator toxicity in Ewing sarcoma cells. Inhibition of RNR in Ewing sarcoma cells caused apoptosis in vitro and attenuated tumor growth in an in vivo xenograft model. Additionally, we discovered that the sensitivity of Ewing sarcoma cells to inhibition or suppression of RNR is mediated, in part, by high levels of SLFN11, a protein that sensitizes cells to DNA damage. This work demonstrates a unique dependency of Ewing sarcoma cells on RNR and supports further investigation of RNR inhibitors, which are currently used in clinical practice, as a novel approach for treating Ewing sarcoma. PMID:27557498

  12. Comparative Analysis of RNAi-Based Methods to Down-Regulate Expression of Two Genes Expressed at Different Levels in Myzus persicae

    PubMed Central

    Mulot, Michaël; Boissinot, Sylvaine; Monsion, Baptiste; Rastegar, Maryam; Clavijo, Gabriel; Halter, David; Bochet, Nicole; Erdinger, Monique; Brault, Véronique

    2016-01-01

    With the increasing availability of aphid genomic data, it is necessary to develop robust functional validation methods to evaluate the role of specific aphid genes. This work represents the first study in which five different techniques, all based on RNA interference and on oral acquisition of double-stranded RNA (dsRNA), were developed to silence two genes, ALY and Eph, potentially involved in polerovirus transmission by aphids. Efficient silencing of only Eph transcripts, which are less abundant than those of ALY, could be achieved by feeding aphids on transgenic Arabidopsis thaliana expressing an RNA hairpin targeting Eph, on Nicotiana benthamiana infected with a Tobacco rattle virus (TRV)-Eph recombinant virus, or on in vitro-synthesized Eph-targeting dsRNA. These experiments showed that the silencing efficiency may differ greatly between genes and that aphid gut cells seem to be preferentially affected by the silencing mechanism after oral acquisition of dsRNA. In addition, the use of plants infected with recombinant TRV proved to be a promising technique to silence aphid genes as it does not require plant transformation. This work highlights the need to pursue development of innovative strategies to reproducibly achieve reduction of expression of aphid genes. PMID:27869783

  13. Method of controlling gene expression

    DOEpatents

    Peters, Norman K.; Frost, John W.; Long, Sharon R.

    1991-12-03

    A method of controlling expression of a DNA segment under the control of a nod gene promoter which comprises administering to a host containing a nod gene promoter an amount sufficient to control expression of the DNA segment of a compound of the formula: ##STR1## in which each R is independently H or OH, is described.

  14. The flow of gene expression.

    PubMed

    Misteli, Tom

    2004-03-01

    Gene expression is a highly interconnected multistep process. A recent meeting in Iguazu Falls, Argentina, highlighted the need to uncover both the molecular details of each single step as well as the mechanisms of coordination among processes in order to fully understand the expression of genes.

  15. Discovery of molecular associations among aging, stem cells, and cancer based on gene expression profiling.

    PubMed

    Wang, Xiaosheng

    2013-04-01

    The emergence of a huge volume of "omics" data enables a computational approach to the investigation of the biology of cancer. The cancer informatics approach is a useful supplement to the traditional experimental approach. I reviewed several reports that used a bioinformatics approach to analyze the associations among aging, stem cells, and cancer by microarray gene expression profiling. The high expression of aging- or human embryonic stem cell-related molecules in cancer suggests that certain important mechanisms are commonly underlying aging, stem cells, and cancer. These mechanisms are involved in cell cycle regulation, metabolic process, DNA damage response, apoptosis, p53 signaling pathway, immune/inflammatory response, and other processes, suggesting that cancer is a developmental and evolutional disease that is strongly related to aging. Moreover, these mechanisms demonstrate that the initiation, proliferation, and metastasis of cancer are associated with the deregulation of stem cells. These findings provide insights into the biology of cancer. Certainly, the findings that are obtained by the informatics approach should be justified by experimental validation. This review also noted that next-generation sequencing data provide enriched sources for cancer informatics study.

  16. Discovering modulators of gene expression

    PubMed Central

    Babur, Özgün; Demir, Emek; Gönen, Mithat; Sander, Chris; Dogrusoz, Ugur

    2010-01-01

    Proteins that modulate the activity of transcription factors, often called modulators, play a critical role in creating tissue- and context-specific gene expression responses to the signals cells receive. GEM (Gene Expression Modulation) is a probabilistic framework that predicts modulators, their affected targets and mode of action by combining gene expression profiles, protein–protein interactions and transcription factor–target relationships. Using GEM, we correctly predicted a significant number of androgen receptor modulators and observed that most modulators can both act as co-activators and co-repressors for different target genes. PMID:20466809

  17. Evaluation of light regulatory potential of Calvin cycle steps based on large-scale gene expression profiling data.

    PubMed

    Sun, Ning; Ma, Ligeng; Pan, Deyun; Zhao, Hongyu; Deng, Xing Wang

    2003-11-01

    Although large-scale gene expression data have been studied from many perspectives, they have not been systematically integrated to infer the regulatory potentials of individual genes in specific pathways. Here we report the analysis of expression patterns of genes in the Calvin cycle from 95 Arabidopsis microarray experiments, which revealed a consistent gene regulation pattern in most experiments. This identified pattern, likely due to gene regulation by light rather than feedback regulations of the metabolite fluxes in the Calvin cycle, is remarkably consistent with the rate-limiting roles of the enzymes encoded by these genes reported from both experimental and modeling approaches. Therefore, the regulatory potential of the genes in a pathway may be inferred from their expression patterns. Furthermore, gene expression analysis in the context of a known pathway helps to categorize various biological perturbations that would not be recognized with the prevailing methods.

  18. Widespread ectopic expression of olfactory receptor genes

    PubMed Central

    Feldmesser, Ester; Olender, Tsviya; Khen, Miriam; Yanai, Itai; Ophir, Ron; Lancet, Doron

    2006-01-01

    Background: Olfactory receptors (ORs) are the largest gene family in the human genome. Although they are expected to be expressed specifically in olfactory tissues, some ectopic expression has been reported, with special emphasis on sperm and testis. The present study systematically explores the expression patterns of OR genes in a large number of tissues and assesses the potential functional implication of such ectopic expression. Results: We analyzed the expression of hundreds of human and mouse OR transcripts, via EST and microarray data, in several dozen human and mouse tissues. Different tissues had specific, relatively small OR gene subsets which had particularly high expression levels. In testis, average expression was not particularly high, and very few highly expressed genes were found, none corresponding to ORs previously implicated in sperm chemotaxis. Higher expression levels were more common for genes with a non-OR genomic neighbor. Importantly, no correlation in expression levels was detected for human-mouse orthologous pairs. Also, no significant difference in expression levels was seen between intact and pseudogenized ORs, except for the pseudogenes of subfamily 7E, which has undergone a human-specific expansion. Conclusion: The OR superfamily as a whole shows widespread, locus-dependent and heterogeneous expression, in agreement with a neutral or near neutral evolutionary model for transcription control. These results cannot reject the possibility that small OR subsets might play functional roles in different tissues; however, considerable care should be exercised when offering a functional interpretation for ectopic OR expression based only on transcription information. PMID:16716209

  19. ANMM4CBR: a case-based reasoning method for gene expression data classification.

    PubMed

    Yao, Bangpeng; Li, Shao

    2010-01-06

    Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms. In order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data. The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contain a high level of noise. The source code is attached as an additional file of this paper.

  20. An expanded maize gene expression atlas based on RNA sequencing and its use to explore root development

    DOE PAGES

    Stelpflug, Scott C.; Sekhon, Rajandeep S.; Vaillancourt, Brieanne; ...

    2015-12-30

    Comprehensive and systematic transcriptome profiling provides valuable insight into biological and developmental processes that occur throughout the life cycle of a plant. We have enhanced our previously published microarray-based gene atlas of maize (Zea mays L.) inbred B73 to now include 79 distinct replicated samples that have been interrogated using RNA sequencing (RNA-seq). The current version of the atlas includes 50 original array-based gene atlas samples, a time-course of 12 stalk and leaf samples postflowering, and an additional set of 17 samples from the maize seedling and adult root system. The entire dataset contains 4.6 billion mapped reads, with an average of 20.5 million mapped reads per biological replicate, allowing for detection of genes with lower transcript abundance. As the new root samples represent key additions to the previously examined tissues, we highlight insights into the root transcriptome, which is represented by 28,894 (73.2%) annotated genes in maize. Additionally, we observed remarkable expression differences across both the longitudinal (four zones) and radial gradients (cortical parenchyma and stele) of the primary root supported by fourfold differential expression of 9353 and 4728 genes, respectively. Among the latter were 1110 genes that encode transcription factors, some of which are orthologs of previously characterized transcription factors known to regulate root development in Arabidopsis thaliana (L.) Heynh., while most are novel, and represent attractive targets for reverse genetics approaches to determine their roles in this important organ. As a result, this comprehensive transcriptome dataset is a powerful tool toward understanding maize development, physiology, and phenotypic diversity.

  1. An expanded maize gene expression atlas based on RNA sequencing and its use to explore root development

    SciTech Connect

    Stelpflug, Scott C.; Sekhon, Rajandeep S.; Vaillancourt, Brieanne; Hirsch, Candice N.; Buell, C. Robin; de Leon, Natalia; Kaeppler, Shawn M.

    2015-12-30

    Comprehensive and systematic transcriptome profiling provides valuable insight into biological and developmental processes that occur throughout the life cycle of a plant. We have enhanced our previously published microarray-based gene atlas of maize (Zea mays L.) inbred B73 to now include 79 distinct replicated samples that have been interrogated using RNA sequencing (RNA-seq). The current version of the atlas includes 50 original array-based gene atlas samples, a time-course of 12 stalk and leaf samples postflowering, and an additional set of 17 samples from the maize seedling and adult root system. The entire dataset contains 4.6 billion mapped reads, with an average of 20.5 million mapped reads per biological replicate, allowing for detection of genes with lower transcript abundance. As the new root samples represent key additions to the previously examined tissues, we highlight insights into the root transcriptome, which is represented by 28,894 (73.2%) annotated genes in maize. Additionally, we observed remarkable expression differences across both the longitudinal (four zones) and radial gradients (cortical parenchyma and stele) of the primary root supported by fourfold differential expression of 9353 and 4728 genes, respectively. Among the latter were 1110 genes that encode transcription factors, some of which are orthologs of previously characterized transcription factors known to regulate root development in Arabidopsis thaliana (L.) Heynh., while most are novel, and represent attractive targets for reverse genetics approaches to determine their roles in this important organ. As a result, this comprehensive transcriptome dataset is a powerful tool toward understanding maize development, physiology, and phenotypic diversity.

  2. Blood-Based Gene Expression Signatures of Autistic Infants and Toddlers

    PubMed Central

    Glatt, Stephen J.; Tsuang, Ming T.; Winn, Mary; Chandler, Sharon D.; Collins, Melanie; Lopez, Linda; Weinfeld, Melanie; Carter, Cindy; Schork, Nicholas

    2013-01-01

    Objective: Autism spectrum disorders (ASDs) are highly heritable neurodevelopmental disorders with clinical onset during the first years of life. ASD-risk biomarkers expressed early in life could significantly impact diagnosis and treatment, but no transcriptome-wide biomarker classifiers derived from fresh blood samples from children with autism have yet emerged. Method: Using a community-based, prospective, longitudinal method, we identified 60 infants and toddlers at-risk for ASDs (autistic disorder and pervasive developmental disorder), 34 at-risk for language delay (LD), 17 at-risk for global developmental delay (DD), and 68 typically developing (TD) comparison children. Diagnoses were confirmed via longitudinal follow-up. Each child's mRNA expression profile in peripheral blood mononuclear cells (PBMCs) was determined by microarray. Results: Potential ASD biomarkers were discovered in one half of the sample and used to build a classifier with high diagnostic accuracy in the remaining half of the sample. Conclusions: The mRNA expression abnormalities reliably observed in PBMCs, which are safely and easily assayed in babies, offer the first potential peripheral blood-based early biomarker panel of risk for autism in infants and toddlers. Future work should verify these biomarkers and evaluate whether they may also serve as indirect indices of deviant molecular neural mechanisms in autism. PMID:22917206

  3. Development of tobacco ringspot virus-based vectors for foreign gene expression and virus-induced gene silencing in a variety of plants.

    PubMed

    Zhao, Fumei; Lim, Seungmo; Igori, Davaajargal; Yoo, Ran Hee; Kwon, Suk-Yoon; Moon, Jae Sun

    2016-05-01

    We report here the development of tobacco ringspot virus (TRSV)-based vectors for the transient expression of foreign genes and for the analysis of endogenous gene function in plants using virus-induced gene silencing (VIGS). The jellyfish green fluorescent protein (GFP) gene was inserted between the TRSV movement protein (MP) and coat protein (CP) regions, resulting in high in-frame expression of the RNA2-encoded viral polyprotein. GFP was released from the polyprotein via an N-terminal homologous MP-CP cleavage site and a C-terminal foot-and-mouth disease virus (FMDV) 2A catalytic peptide in Nicotiana benthamiana. The VIGS target gene was introduced in the sense and antisense orientations into a SnaBI site, which was created by mutating the sequence following the CP stop codon. VIGS of phytoene desaturase (PDS) in N. benthamiana, Arabidopsis ecotype Col-0, cucurbits and legumes led to obvious photo-bleaching phenotypes. A significant reduction in PDS mRNA levels in silenced plants was confirmed by semi-quantitative RT-PCR. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. A Cross-Species Gene Expression Marker-Based Genetic Map and QTL Analysis in Bambara Groundnut.

    PubMed

    Chai, Hui Hui; Ho, Wai Kuan; Graham, Neil; May, Sean; Massawe, Festo; Mayes, Sean

    2017-02-22

    Bambara groundnut (Vigna subterranea (L.) Verdc.) is an underutilised legume crop, which has long been recognised as a protein-rich and drought-tolerant crop, used extensively in Sub-Saharan Africa. The aim of the study was to identify quantitative trait loci (QTL) involved in agronomic and drought-related traits using an expression marker-based genetic map based on major crop resources developed in soybean. The gene expression markers (GEMs) were generated at the (unmasked) probe-pair level after cross-hybridisation of bambara groundnut leaf RNA to the Affymetrix Soybean Genome GeneChip. A total of 753 markers grouped at an LOD (Logarithm of odds) of three, with 527 markers mapped into linkage groups. From this initial map, a spaced expression marker-based genetic map consisting of 13 linkage groups containing 218 GEMs, spanning 982.7 cM (centimorgan) of the bambara groundnut genome, was developed. Of the QTL detected, 46% were detected in both control and drought treatment populations, suggesting that they are the result of intrinsic trait differences between the parental lines used to construct the cross, with 31% detected in only one of the conditions. The present GEM map in bambara groundnut provides one technically feasible route for the translation of information and resources from major and model plant species to underutilised and resource-poor crops.

  5. A Cross-Species Gene Expression Marker-Based Genetic Map and QTL Analysis in Bambara Groundnut

    PubMed Central

    Chai, Hui Hui; Ho, Wai Kuan; Graham, Neil; May, Sean; Massawe, Festo; Mayes, Sean

    2017-01-01

    Bambara groundnut (Vigna subterranea (L.) Verdc.) is an underutilised legume crop, which has long been recognised as a protein-rich and drought-tolerant crop, used extensively in Sub-Saharan Africa. The aim of the study was to identify quantitative trait loci (QTL) involved in agronomic and drought-related traits using an expression marker-based genetic map based on major crop resources developed in soybean. The gene expression markers (GEMs) were generated at the (unmasked) probe-pair level after cross-hybridisation of bambara groundnut leaf RNA to the Affymetrix Soybean Genome GeneChip. A total of 753 markers grouped at an LOD (Logarithm of odds) of three, with 527 markers mapped into linkage groups. From this initial map, a spaced expression marker-based genetic map consisting of 13 linkage groups containing 218 GEMs, spanning 982.7 cM (centimorgan) of the bambara groundnut genome, was developed. Of the QTL detected, 46% were detected in both control and drought treatment populations, suggesting that they are the result of intrinsic trait differences between the parental lines used to construct the cross, with 31% detected in only one of the conditions. The present GEM map in bambara groundnut provides one technically feasible route for the translation of information and resources from major and model plant species to underutilised and resource-poor crops. PMID:28241413

  6. Gene Expression Patterns in Ovarian Carcinomas

    PubMed Central

    Schaner, Marci E.; Ross, Douglas T.; Ciaravino, Giuseppe; Sørlie, Therese; Troyanskaya, Olga; Diehn, Maximilian; Wang, Yan C.; Duran, George E.; Sikic, Thomas L.; Caldeira, Sandra; Skomedal, Hanne; Tu, I-Ping; Hernandez-Boussard, Tina; Johnson, Steven W.; O'Dwyer, Peter J.; Fero, Michael J.; Kristensen, Gunnar B.; Børresen-Dale, Anne-Lise; Hastie, Trevor; Tibshirani, Robert; van de Rijn, Matt; Teng, Nelson N.; Longacre, Teri A.; Botstein, David; Brown, Patrick O.; Sikic, Branimir I.

    2003-01-01

    We used DNA microarrays to characterize the global gene expression patterns in surface epithelial cancers of the ovary. We identified groups of genes that distinguished the clear cell subtype from other ovarian carcinomas, grade I and II from grade III serous papillary carcinomas, and ovarian from breast carcinomas. Six clear cell carcinomas were distinguished from 36 other ovarian carcinomas (predominantly serous papillary) based on their gene expression patterns. The differences may yield insights into the worse prognosis and therapeutic resistance associated with clear cell carcinomas. A comparison of the gene expression patterns in the ovarian cancers to published data of gene expression in breast cancers revealed a large number of differentially expressed genes. We identified a group of 62 genes that correctly classified all 125 breast and ovarian cancer specimens. Among the best discriminators more highly expressed in the ovarian carcinomas were PAX8 (paired box gene 8), mesothelin, and ephrin-B1 (EFNB1). Although estrogen receptor was expressed in both the ovarian and breast cancers, genes that are coregulated with the estrogen receptor in breast cancers, including GATA-3, LIV-1, and X-box binding protein 1, did not show a similar pattern of coexpression in the ovarian cancers. PMID:12960427

  7. Differential expression analysis for individual cancer samples based on robust within-sample relative gene expression orderings across multiple profiling platforms

    PubMed Central

    Guan, Qingzhou; Chen, Rou; Yan, Haidan; Cai, Hao; Guo, You; Li, Mengyao; Li, Xiangyu; Tong, Mengsha; Ao, Lu; Li, Hongdong; Hong, Guini; Guo, Zheng

    2016-01-01

    The highly stable within-sample relative expression orderings (REOs) of gene pairs in a particular type of human normal tissue are widely reversed in the cancer condition. Based on this finding, we have recently proposed an algorithm named RankComp to detect differentially expressed genes (DEGs) for individual disease samples measured by a particular platform. In this paper, with 461 normal lung tissue samples separately measured by four commonly used platforms, we demonstrated that tens of millions of gene pairs with significantly stable REOs in normal lung tissue can be consistently detected in samples measured by different platforms. However, about 20% of stable REOs commonly detected by two different platforms (e.g., Affymetrix and Illumina platforms) showed inconsistent REO patterns due to the differences in probe design principles. Based on the significantly stable REOs (FDR<0.01) for normal lung tissue consistently detected by the four platforms, which tended to have large rank differences, RankComp detected on average 1184, 1335 and 1116 DEGs per sample with average precisions of 96.51%, 95.95% and 94.78% in three evaluation datasets with 25, 57 and 58 paired lung cancer and normal samples, respectively. Individualized pathway analysis revealed some common and subtype-specific functional mechanisms of lung cancer. Similar results were observed for colorectal cancer. In conclusion, based on the cross-platform significantly stable REOs for a particular normal tissue, differentially expressed genes and pathways in any disease sample measured by any of the platforms can be readily and accurately detected, which could be further exploited for dissecting the heterogeneity of cancer. PMID:27634898
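
    The idea of a within-sample relative expression ordering (REO) can be illustrated with a small sketch. This is not RankComp itself: the abstract describes a significance criterion (FDR<0.01) for stability, whereas the sketch below substitutes a plain agreement fraction. The function names, the `min_fraction` threshold and the toy expression values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def stable_reos(normal_samples, min_fraction=0.95):
    """Return ordered gene pairs (higher, lower) whose ordering holds in at
    least min_fraction of the normal samples.

    Assumption: a simple agreement fraction stands in for the statistical
    stability test described in the abstract.
    """
    genes = sorted(normal_samples[0])
    n = len(normal_samples)
    stable = []
    for a, b in combinations(genes, 2):
        a_higher = sum(1 for s in normal_samples if s[a] > s[b])
        b_higher = sum(1 for s in normal_samples if s[b] > s[a])
        if a_higher / n >= min_fraction:
            stable.append((a, b))
        elif b_higher / n >= min_fraction:
            stable.append((b, a))
    return stable


def reversed_pairs(sample, stable):
    """List the stable pairs whose ordering is flipped in a single sample."""
    return [(hi, lo) for hi, lo in stable if sample[hi] < sample[lo]]


# Toy data (hypothetical values): three normal samples and one disease sample.
normals = [
    {"G1": 9.0, "G2": 5.0, "G3": 2.0},
    {"G1": 8.5, "G2": 5.5, "G3": 2.5},
    {"G1": 9.2, "G2": 4.8, "G3": 1.9},
]
tumour = {"G1": 3.0, "G2": 5.1, "G3": 2.2}

stable = stable_reos(normals)
flipped = reversed_pairs(tumour, stable)
print(flipped)
```

    Because only the ordering of two genes inside one sample is compared, the result does not depend on the absolute scale of the measurements, which is consistent with the cross-platform motivation given in the abstract.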

  8. An approach to transgene expression in liver endothelial cells using a liposome-based gene vector coated with hyaluronic acid.

    PubMed

    Yamada, Yuma; Hashida, Masahiro; Hayashi, Yasuhiro; Tabata, Mai; Hyodo, Mamoru; Ara, Mst Naznin; Ohga, Noritaka; Hida, Kyoko; Harashima, Hideyoshi

    2013-09-01

    Dysfunctional sinusoidal liver endothelial cells (LECs) are associated with liver diseases, such as liver fibrosis, cirrhosis, and portal hypertension. Because of this, gene therapy targeted to LECs would be a useful and productive strategy for directly treating these diseases at the level of genes. Here, we report on the development of a transgene vector that specifically targets LECs. The vector is a liposome-based gene vector coated with hyaluronic acid (HA). HA is a natural ligand for LECs and confers desirable properties on particles, rendering them biodegradable, biocompatible, and nonimmunogenic. In this study, we constructed HA-modified carriers, and evaluated cellular uptake and transfection activity using cultured LECs from KSN nude mice (KSN-LECs). Cellular uptake analyses showed that KSN-LECs recognized the HA-modified carriers more effectively than skin endothelial cells. The transfection assay indicated that the efficient gene expression in KSN-LECs, using the HA-modified carriers, required an adequate lipid composition and a functional device to control intracellular trafficking. This finding contributes to our overall knowledge of transgene expression targeted to LECs.

  9. Identification of biomarkers that distinguish chemical contaminants based on gene expression profiles

    PubMed Central

    2014-01-01

    Background: High throughput transcriptomics profiles such as those generated using microarrays have been useful in identifying biomarkers for different classification and toxicity prediction purposes. Here, we investigated the use of microarrays to predict chemical toxicants and their possible mechanisms of action. Results: In this study, in vitro cultures of primary rat hepatocytes were exposed to 105 chemicals and vehicle controls, representing 14 compound classes. We comprehensively compared various normalization methods for gene expression profiles, feature selection algorithms and classification algorithms for the classification of these 105 chemicals into 14 compound classes. We found that normalization had little effect on the averaged classification accuracy. Two support vector machine (SVM) methods, LibSVM and sequential minimal optimization, had better classification performance than other methods. SVM recursive feature selection (SVM-RFE) had the highest overfitting rate when an independent dataset was used for a prediction. Therefore, we developed a new feature selection algorithm called gradient method that had a relatively high training classification as well as prediction accuracy with the lowest overfitting rate of the methods tested. Analysis of biomarkers that distinguished the 14 classes of compounds identified a group of genes principally involved in cell cycle function that were significantly downregulated by metal and inflammatory compounds, but were induced by anti-microbial, cancer related drugs, pesticides, and PXR mediators. Conclusions: Our results indicate that using microarrays and a supervised machine learning approach to predict chemical toxicants, their potential toxicity and mechanisms of action is practical and efficient. Choosing the right feature and classification algorithms for this multiple category classification and prediction is critical. PMID:24678894

  10. RNase one gene isolation, expression, and affinity purification models research experimental progression and culminates with guided inquiry-based experiments.

    PubMed

    Bailey, Cheryl P

    2009-01-01

    This new biochemistry laboratory course moves through a progression of experiments that generates a platform for guided inquiry-based experiments. RNase One gene is isolated from prokaryotic genomic DNA, expressed as a tagged protein, affinity purified, and tested for activity and substrate specificity. Student pairs present detailed explanations of materials and methods and the semester culminates in a poster session. Experimental plans take into account the expense and time required to move from gene isolation to enzyme assays. This combination of instructor-guided and student-designed experiments is a manageable foray into guided inquiry-based learning in a biochemistry laboratory course, while providing a cohesive story and context for individual experiments.

  11. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold.

    PubMed

    Glass, Edmund R; Dozmorov, Mikhail G

    2016-10-06

    The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. 
    The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions

  12. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    PubMed

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large volume of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. Mononucleotide SSR motifs were the most frequent (60.44% in contigs and 62.16% in singletons), whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of the SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function.

  13. Gene set analysis for longitudinal gene expression data

    PubMed Central

    2011-01-01

    Background Gene set analysis (GSA) has become a successful tool to interpret gene expression profiles in terms of biological functions, molecular pathways, or genomic locations. GSA performs statistical tests for independent microarray samples at the level of gene sets rather than individual genes. Nowadays, an increasing number of microarray studies are conducted to explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is repeatedly measured over time such that a GSA needs to take into account the within-gene correlations in addition to possible between-gene correlations. Results We provide a robust nonparametric approach to compare the expressions of longitudinally measured sets of genes under multiple treatments or experimental conditions. The limiting distributions of our statistics are derived when the number of genes goes to infinity while the number of replications can be small. When the number of genes in a gene set is small, we recommend permutation tests based on our nonparametric test statistics to achieve reliable type I error and better power while incorporating unknown correlations between and within-genes. Simulation results demonstrate that the proposed method has a greater power than other methods for various data distributions and heteroscedastic correlation structures. This method was used for an IL-2 stimulation study and significantly altered gene sets were identified. Conclusions The simulation study and the real data application showed that the proposed gene set analysis provides a promising tool for longitudinal microarray analysis. R scripts for simulating longitudinal data and calculating the nonparametric statistics are posted on the North Dakota INBRE website http://ndinbre.org/programs/bioinformatics.php. Raw microarray data is available in Gene Expression Omnibus (National Center for Biotechnology Information) with accession number GSE6085. 
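
    The permutation-testing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' nonparametric statistic for longitudinal data: it assumes each sample has already been reduced to one summary value for the gene set, uses a plain difference in means as the test statistic, and shuffles condition labels. All names and numbers are hypothetical.

```python
import random

def gene_set_score(values_a, values_b):
    """Difference in mean gene-set expression between two conditions."""
    return sum(values_a) / len(values_a) - sum(values_b) / len(values_b)

def permutation_p_value(values_a, values_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value obtained by shuffling condition labels."""
    rng = random.Random(seed)
    observed = abs(gene_set_score(values_a, values_b))
    pooled = list(values_a) + list(values_b)
    n_a = len(values_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(gene_set_score(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-sample gene-set summaries (made-up values).
treated = [2.1, 2.4, 2.2, 2.6, 2.5]
control = [1.0, 1.2, 0.9, 1.1, 1.3]
p = permutation_p_value(treated, control)
print(f"permutation p-value: {p:.4f}")
```

    With small gene sets or few replicates, a label-shuffling test of this kind avoids relying on a limiting distribution, which is the situation in which the abstract recommends permutation tests.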

  14. Delivery of RNA-based molecules to human hematopoietic stem and progenitor cells for modulation of gene expression.

    PubMed

    Diener, Yvonne; Bosio, Andreas; Bissels, Ute

    2016-11-01

    Gene modulation of human hematopoietic stem and progenitor cells (HSPCs) harbors great potential for therapeutic application of these cells and presents a versatile tool in basic research to enhance our understanding of HSPC biology. However, stable genetic modification might be adverse, particularly in clinical settings. Here, we review a broad range of approaches to transient, nonviral modulation of protein expression with a focus on RNA-based methods. We compare different delivery methods and describe the usefulness of RNA molecules for overexpression as well as downregulation of proteins in HSPCs.

  15. Machine learning-based classification of diffuse large B-cell lymphoma patients by eight gene expression profiles.

    PubMed

    Zhao, Shuangtao; Dong, Xiaoli; Shen, Wenzhi; Ye, Zhen; Xiang, Rong

    2016-05-01

    Gene expression profiling (GEP) has divided diffuse large B-cell lymphoma (DLBCL) into molecular subgroups: germinal center B-cell like (GCB), activated B-cell like (ABC), and unclassified (UC) subtypes. However, this prognostically significant classification has not been applied in clinical practice, since more than 1000 genes had to be detected and interpretation was difficult. To classify cancer samples validly, eight significant genes (MYBL1, LMO2, BCL6, MME, IRF4, NFKBIZ, PDE4B, and SLA) were selected in 414 patients treated with CHOP/R-CHOP chemotherapy from Gene Expression Omnibus (GEO) data sets. Cutoffs for each gene were obtained using receiver-operating characteristic (ROC) curves. A new model based on the support vector machine (SVM) estimated the probability of membership in one of two subgroups: GCB and Non-GCB (ABC and UC). Furthermore, multivariate analysis validated the model in two other cohorts comprising 855 cases in all. As a result, patients in the training and validation cohorts were stratified into two subgroups with 94.0%, 91.0%, and 94.4% concordance with GEP, respectively. Patients with the Non-GCB subtype had significantly poorer outcomes than those with the GCB subtype, which agreed with the prognostic power of GEP classification. Moreover, the similar prognosis observed in the low (0-2) and high (3-5) IPI score groups demonstrated that the new model was independent of IPI as well as of the GEP method. In conclusion, our new model could effectively stratify DLBCL patients on the CHOP/R-CHOP regimen in a way that matches GEP subtypes.
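
    The abstract does not say how the per-gene cutoffs were read off the ROC curves. One common choice is the threshold that maximizes Youden's J (sensitivity + specificity - 1); the sketch below assumes that criterion, assumes higher expression indicates the positive class, and uses made-up values.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    labels: 1 for the positive class, 0 for the negative class.
    A sample is called positive when its value is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical expression values for one gene and hypothetical GCB labels.
expression = [1.0, 1.5, 2.0, 2.2, 3.1, 3.5, 4.0, 4.2]
is_gcb = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(expression, is_gcb)
print(cutoff, j)
```

    Dichotomizing each gene this way yields binary features that a classifier such as an SVM could then combine; that pipeline step is likewise an assumption about the workflow rather than a detail given in the abstract.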

  16. Gene Expression-Based Classifiers Identify Staphylococcus aureus Infection in Mice and Humans

    PubMed Central

    Cyr, Derek D.; Zhang, Yurong; van Velkinburgh, Jennifer C.; Langley, Raymond J.; Glickman, Seth W.; Cairns, Charles B.; Zaas, Aimee K.; Rivers, Emanuel P.; Otero, Ronny M.; Veldman, Tim; Kingsmore, Stephen F.; Lucas, Joseph; Woods, Christopher W.; Ginsburg, Geoffrey S.; Fowler, Vance G.

    2013-01-01

    Staphylococcus aureus causes a spectrum of human infection. Diagnostic delays and uncertainty lead to treatment delays and inappropriate antibiotic use. A growing literature suggests the host’s inflammatory response to the pathogen represents a potential tool to improve upon current diagnostics. The hypothesis of this study is that the host responds differently to S. aureus than to E. coli infection in a quantifiable way, providing a new diagnostic avenue. This study uses Bayesian sparse factor modeling and penalized binary regression to define peripheral blood gene-expression classifiers of murine and human S. aureus infection. The murine-derived classifier distinguished S. aureus infection from healthy controls and Escherichia coli-infected mice across a range of conditions (mouse and bacterial strain, time post infection) and was validated in outbred mice (AUC>0.97). A S. aureus classifier derived from a cohort of 94 human subjects distinguished S. aureus blood stream infection (BSI) from healthy subjects (AUC 0.99) and E. coli BSI (AUC 0.84). Murine and human responses to S. aureus infection share common biological pathways, allowing the murine model to classify S. aureus BSI in humans (AUC 0.84). Both murine and human S. aureus classifiers were validated in an independent human cohort (AUC 0.95 and 0.92, respectively). The approach described here lends insight into the conserved and disparate pathways utilized by mice and humans in response to these infections. Furthermore, this study advances our understanding of S. aureus infection; the host response to it; and identifies new diagnostic and therapeutic avenues. PMID:23326304

  17. DNA microarray-based experimental strategy for trustworthy expression profiling of the hippocampal genes by astaxanthin supplementation in adult mouse

    PubMed Central

    Yook, Jang Soo; Shibato, Junko; Rakwal, Randeep; Soya, Hideaki

    2015-01-01

    Naturally occurring astaxanthin (ASX) is a notable carotenoid and dietary supplement that has strong antioxidant and anti-inflammatory properties, as well as neuroprotective effects in the brain, which it reaches by crossing the blood–brain barrier. In particular, we are interested in the role of ASX as a brain food. Although ASX has been suggested to have potential benefits for brain function, the underlying molecular mechanisms and events mediating such effects remain unknown. Here we examined molecular factors in the hippocampus of adult mice fed ASX diets (0.1% and 0.5% doses) using DNA microarray (Agilent 4 × 44 K whole mouse genome chip) analysis. In this study, we describe in detail our experimental workflow and protocol, and validate quality controls with housekeeping gene expression (Gapdh and Beta-actin) using a dye-swap based approach to support our microarray data, which have been uploaded to Gene Expression Omnibus (accession number GSE62197) as a gene resource for the scientific community. These data will also form an important basis for further detailed experiments and bioinformatics analysis with the aim of unraveling the potential molecular pathways or mechanisms underlying the positive effects of ASX supplementation on the brain, in particular the hippocampus. PMID:26981356

  18. Assessment of the effectiveness of a nuclear-launched TMV-based replicon as a tool for foreign gene expression in plants in comparison to direct gene expression from a nuclear promoter.

    PubMed

    Man, Michal; Epel, Bernard L

    2006-02-01

    An environmentally safe Tobacco Mosaic Virus (TMV)-based expression replicon was constructed that lacks movement protein (MP) and coat protein (CP), and which expresses the green fluorescent protein (GFP) gene from a full CP subgenomic promoter. The TMV replicon, whose cDNA was positioned between an enhanced Cauliflower Mosaic Virus 35S promoter (CaMV) and a self-cleaving hammerhead ribozyme with a downstream nopaline synthase gene polyadenylation signal [nos-poly(A)], was assessed for its effectiveness to accumulate GFP upon agroinfiltration into plant leaves compared to a control construct in which GFP was directly expressed from the enhanced CaMV 35S promoter. It was determined that individually expressing cells produced ca. 9-fold more GFP from the TMV-based replicon than from the enhanced 35S promoter. In contrast, GFP measurements from total leaf extracts determined that leaves infiltrated with the TMV-based replicon produced ca. 7-fold less GFP than the control construct. These apparently contradictory results can be explained by the low infectivity of the TMV-based replicon as it was found that the number of foci expressing GFP produced in leaves agroinfiltrated with the TMV-based replicon was ca. 66-fold lower than produced by the control.
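
    The reconciliation of the two measurements can be checked with simple arithmetic. The sketch below assumes that total leaf GFP scales as the number of expressing foci multiplied by the GFP yield per expressing cell, and that foci are of comparable size in both constructs; the fold values are those quoted in the abstract.

```python
per_cell_fold = 9.0     # ca. 9-fold more GFP per expressing cell (replicon vs. control)
foci_fold = 1.0 / 66.0  # ca. 66-fold fewer GFP-expressing foci (replicon vs. control)

# Assumption: total GFP ~ (number of foci) x (GFP per expressing cell).
total_fold = per_cell_fold * foci_fold
fold_less = 1.0 / total_fold

print(f"replicon total GFP relative to control: {total_fold:.3f}")
print(f"equivalent to about {fold_less:.1f}-fold less GFP")
```

    Under that assumption the expected deficit is about 7.3-fold, in line with the ca. 7-fold reduction measured in total leaf extracts.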

  19. Mining Gene Expression Data of Multiple Sclerosis

    PubMed Central

    Zhu, Zhenli; Huang, Zhengliang; Li, Ke

    2014-01-01

    Objectives Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. Materials and methods Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associating with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models’ performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined. Results An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. Conclusions The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases. PMID:24932510

  20. The functional landscape of mouse gene expression

    PubMed Central

    Zhang, Wen; Morris, Quaid D; Chang, Richard; Shai, Ofer; Bakowski, Malina A; Mitsakakis, Nicholas; Mohammad, Naveed; Robinson, Mark D; Zirngibl, Ralph; Somogyi, Eszter; Laurin, Nancy; Eftekharpour, Eftekhar; Sat, Eric; Grigull, Jörg; Pan, Qun; Peng, Wen-Tao; Krogan, Nevan; Greenblatt, Jack; Fehlings, Michael; van der Kooy, Derek; Aubin, Jane; Bruneau, Benoit G; Rossant, Janet; Blencowe, Benjamin J; Frey, Brendan J; Hughes, Timothy R

    2004-01-01

    Background Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function. Results We generated microarray expression data for nearly 40,000 known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide arrays. We show that quantitative transcriptional co-expression is a powerful predictor of gene function. Hundreds of functional categories, as defined by Gene Ontology 'Biological Processes', are associated with characteristic expression patterns across all tissues, including categories that bear no overt relationship to the tissue of origin. In contrast, simple tissue-specific restriction of expression is a poor predictor of which genes are in which functional categories. As an example, the highly conserved mouse gene PWP1 is widely expressed across different tissues but is co-expressed with many RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1 is required for rRNA biogenesis. Conclusions We conclude that 'functional genomics' strategies based on quantitative transcriptional co-expression will be as fruitful in mammals as they have been in simpler organisms, and that transcriptional control of mammalian physiology is more modular than is generally appreciated. Our data and analyses provide a public resource for mammalian functional genomics. PMID:15588312
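
    The co-expression strategy described here can be sketched as a "guilt by association" lookup: an uncharacterized gene is assigned to the functional category of the annotated gene whose expression profile across tissues it correlates with best. The gene names, profiles, and the use of a single nearest neighbour by Pearson correlation are illustrative assumptions, not the authors' procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression profiles across four tissues.
annotated = {
    "rna_proc_1": [1.0, 2.0, 3.0, 4.0],
    "rna_proc_2": [2.0, 4.1, 5.9, 8.0],
    "muscle_1": [4.0, 1.0, 3.0, 0.5],
}
query = [1.1, 2.1, 2.9, 4.2]  # hypothetical uncharacterized gene

# Rank annotated genes by correlation with the query profile.
best_match = max(annotated, key=lambda gene: pearson(query, annotated[gene]))
print(best_match)
```

    Note that the query gene is expressed in every tissue, so tissue restriction alone says little about it, whereas its correlation pattern still points to a functional group. This mirrors the PWP1 example in the abstract.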

  1. Relationships among CFTR expression, HCO3- secretion, and host defense may inform gene- and cell-based cystic fibrosis therapies.

    PubMed

    Shah, Viral S; Ernst, Sarah; Tang, Xiao Xiao; Karp, Philip H; Parker, Connor P; Ostedgaard, Lynda S; Welsh, Michael J

    2016-05-10

    Cystic fibrosis (CF) is caused by mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR) anion channel. Airway disease is the major source of morbidity and mortality. Successful implementation of gene- and cell-based therapies for CF airway disease requires knowledge of relationships among percentages of targeted cells, levels of CFTR expression, correction of electrolyte transport, and rescue of host defense defects. Previous studies suggested that, when ∼10-50% of airway epithelial cells expressed CFTR, they generated nearly wild-type levels of Cl(-) secretion; overexpressing CFTR offered no advantage compared with endogenous expression levels. However, recent discoveries focused attention on CFTR-mediated HCO3 (-) secretion and airway surface liquid (ASL) pH as critical for host defense and CF pathogenesis. Therefore, we generated porcine airway epithelia with varying ratios of CF and wild-type cells. Epithelia with a 50:50 mix secreted HCO3 (-) at half the rate of wild-type epithelia. Likewise, heterozygous epithelia (CFTR(+/-) or CFTR(+/∆F508)) expressed CFTR and secreted HCO3 (-) at ∼50% of wild-type values. ASL pH, antimicrobial activity, and viscosity showed similar relationships to the amount of CFTR. Overexpressing CFTR increased HCO3 (-) secretion to rates greater than wild type, but ASL pH did not exceed wild-type values. Thus, in contrast to Cl(-) secretion, the amount of CFTR is rate-limiting for HCO3 (-) secretion and for correcting host defense abnormalities. In addition, overexpressing CFTR might produce a greater benefit than expressing CFTR at wild-type levels when targeting small fractions of cells. These findings may also explain the risk of airway disease in CF carriers.

  2. Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments

    PubMed Central

    Kapushesky, Misha; Adamusiak, Tomasz; Burdett, Tony; Culhane, Aedin; Farne, Anna; Filippov, Alexey; Holloway, Ele; Klebanov, Andrey; Kryvych, Nataliya; Kurbatova, Natalja; Kurnosov, Pavel; Malone, James; Melnichuk, Olga; Petryszak, Robert; Pultsin, Nikolay; Rustici, Gabriella; Tikhonov, Andrew; Travillian, Ravensara S.; Williams, Eleanor; Zorin, Andrey; Parkinson, Helen; Brazma, Alvis

    2012-01-01

    Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species, which contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies. PMID:22064864

  3. Tuning noise in gene expression.

    PubMed

    Tyagi, Sanjay

    2015-05-05

    The relative contribution of promoter architecture and the associated chromatin environment in regulating gene expression noise has remained elusive. In their recent work, Arkin, Schaffer and colleagues (Dey et al, 2015) show that mean expression and noise for a given promoter at different genomic loci are uncorrelated and influenced by the local chromatin environment.

  4. Monoallelic Gene Expression in Mammals.

    PubMed

    Chess, Andrew

    2016-11-23

    Monoallelic expression not due to cis-regulatory sequence polymorphism poses an intriguing problem in epigenetics because it requires the unequal treatment of two segments of DNA that are present in the same nucleus and that can indeed have absolutely identical sequences. Here, I focus on a few recent developments in the field of monoallelic expression that are of particular interest and raise interesting questions for future work. One development is regarding analyses of imprinted genes, in which recent work suggests the possibility that intriguing networks of imprinted genes exist and are important for genetic and physiological studies. Another issue that has been raised in recent years by a number of publications is the question of how skewed allelic expression should be for it to be designated as monoallelic expression and, further, what methods are appropriate or inappropriate for analyzing genomic data to examine allele-specific expression. Perhaps the most exciting recent development in mammalian monoallelic expression is a clever and carefully executed analysis of genetic diversity of autosomal genes subject to random monoallelic expression (RMAE), which provides compelling evidence for distinct evolutionary forces acting on random monoallelically expressed genes.

  5. LuxCDE-luxAB-based promoter reporter system to monitor the Yersinia enterocolitica O:3 gene expression in vivo

    PubMed Central

    Bozcal, Elif; Dagdeviren, Melih; Uzel, Atac

    2017-01-01

    It is crucial to understand the in vitro and in vivo regulation of the virulence factor genes of bacterial pathogens. In this study, we describe the construction of a versatile reporter system for Yersinia enterocolitica serotype O:3 (YeO3) based on the luxCDABE operon. In strain YeO3-luxCDE we integrated the luciferase substrate biosynthetic genes, luxCDE, into the genome of the bacterium so that the substrate is constitutively produced. The luxAB genes that encode the luciferase enzyme were cloned into a suicide vector to allow cloning of any promoter-containing fragment upstream the genes. When the obtained suicide-construct is mobilized into YeO3-luxCDE bacteria, it integrates into the recipient genome via homologous recombination between the cloned promoter fragment and the genomic promoter sequence and thereby generates a single-copy and stable promoter reporter. Lipopolysaccharide (LPS) O-antigen (O-ag) and outer core hexasaccharide (OC) of YeO3 are virulence factors necessary to colonization of the intestine and establishment of infection. To monitor the activities of the OC and O-ag gene cluster promoters we constructed the reporter strains YeO3-Poc::luxAB and YeO3-Pop1::luxAB, respectively. In vitro, at 37°C both promoter activities were highest during logarithmic growth and decreased when the bacteria entered stationary growth phase. At 22°C the OC gene cluster promoter activity increased during the late logarithmic phase. Both promoters were more active in late stationary phase. To monitor the promoter activities in vivo, mice were infected intragastrically and the reporter activities monitored by the IVIS technology. The mouse experiments revealed that both LPS promoters were well expressed in vivo and could be detected by IVIS, mainly from the intestinal region of orally infected mice. PMID:28235077

  6. Network-based differential gene expression analysis suggests cell cycle related genes regulated by E2F1 underlie the molecular difference between smoker and non-smoker lung adenocarcinoma

    PubMed Central

    2013-01-01

    Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first to identify gene-gene relationships under the studied phenotype, then to integrate them with gene expression changes for prioritizing signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that top differentially regulated genes identified by the rank sum test in different sets are not consistent, while top ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. Conclusions In this paper, we develop nDGE to prioritize

  7. Selection of reference genes for qPCR- and ddPCR-based analyses of gene expression in Senescing Barley leaves.

    PubMed

    Zmienko, Agnieszka; Samelak-Czajka, Anna; Goralski, Michal; Sobieszczuk-Nowicka, Ewa; Kozlowski, Piotr; Figlerowicz, Marek

    2015-01-01

    Leaf senescence is a tightly regulated developmental or stress-induced process. It is accompanied by dramatic changes in cell metabolism and structure, eventually leading to the disintegration of chloroplasts, the breakdown of leaf proteins, internucleosomal fragmentation of nuclear DNA and ultimately cell death. In light of the global and intense reorganization of the senescing leaf transcriptome, measuring time-course gene expression patterns in this model is challenging due to the evident problems associated with selecting stable reference genes. We have used oligonucleotide microarray data to identify 181 genes with stable expression in the course of dark-induced senescence of barley leaf. From those genes, we selected 5 candidates and confirmed their invariant expression by both reverse transcription quantitative PCR and droplet digital PCR (ddPCR). We used the selected reference genes to normalize the level of the expression of the following senescence-responsive genes in ddPCR assays: SAG12, ICL, AGXT, CS and RbcS. We were thereby able to achieve a substantial reduction in the data variability. Although the use of reference genes is not considered mandatory in ddPCR assays, our results show that it is advisable in special cases, specifically those that involve the following conditions: i) a low number of repeats, ii) the detection of low-fold changes in gene expression or iii) series data comparisons (such as time-course experiments) in which large sample variation greatly affects the overall gene expression profile and biological interpretation of the data.
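
    Normalizing a target gene to reference genes can be sketched as dividing the target signal by the geometric mean of the reference signals in the same sample. The abstract does not state how the reference genes were combined, so the geometric mean is an assumption here, and the copy numbers are made up.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_to_references(target_copies, reference_copies):
    """Target signal relative to the geometric mean of the reference genes."""
    return target_copies / geometric_mean(reference_copies)

# Hypothetical ddPCR concentrations (copies per microliter) at two time points.
samples = {
    "day0": {"target": 200.0, "refs": [100.0, 400.0]},
    "day7": {"target": 900.0, "refs": [150.0, 600.0]},
}
normalized = {
    name: normalize_to_references(s["target"], s["refs"])
    for name, s in samples.items()
}
raw_fold = samples["day7"]["target"] / samples["day0"]["target"]
normalized_fold = normalized["day7"] / normalized["day0"]
print(f"raw fold change: {raw_fold:.2f}, normalized: {normalized_fold:.2f}")
```

    In this made-up case the raw 4.5-fold change shrinks to 3-fold after normalization because the day 7 sample carries 1.5 times more input, which is the kind of sample-to-sample variation that reference genes are meant to absorb in time-course comparisons.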

  8. Mass spectrometry-based proteomics identifies UPF1 as a critical gene expression regulator in MonoMac 6 cells.

    PubMed

    Ochs, Meike J; Ossipova, Elena; Oliynyk, Ganna; Steinhilber, Dieter; Suess, Beatrix; Jakobsson, Per-Johan

    2013-06-07

    5-Lipoxygenase (5-LO) catalyzes the two initial steps in the biosynthesis of leukotrienes, a group of inflammatory lipid mediators derived from arachidonic acid. Recently, we have demonstrated that 5-LO mRNA expression is regulated by alternative splicing and nonsense-mediated mRNA decay (NMD). In addition to this, 5-LO protein expression was reduced on translational level in UPF1 knockdown cells, suggesting that UPF1 has a positive influence on 5-LO translation. Therefore, a mass spectrometry-based proteomics study was performed to identify compartment-specific protein expression changes upon UPF1 knockdown in differentiated and undifferentiated MM6 cells. The proteomics analysis revealed that the knockdown of UPF1 results in numerous protein changes in the microsomal fraction (~21%) but not in the cytosolic fraction (<1%). The results suggest that UPF1 is a critical gene expression regulator in a compartment-specific way. During differentiation by TGFβ and calcitriol, the majority of UPF1 regulated proteins were adjusted to normal level. This indicates that the translational regulation by UPF1 can potentially be cell differentiation-dependent.

  9. Norovirus gene expression and replication.

    PubMed

    Thorne, Lucy G; Goodfellow, Ian G

    2014-02-01

    Noroviruses are small, positive-sense RNA viruses within the family Caliciviridae, and are now accepted widely as a major cause of acute gastroenteritis in both developed and developing countries. Despite their impact, our understanding of the life cycle of noroviruses has lagged behind that of other RNA viruses due to the inability to culture human noroviruses (HuNVs). Our knowledge of norovirus biology has improved significantly over the past decade as a result of numerous technological advances. The use of a HuNV replicon, improved biochemical and cell-based assays, combined with the discovery of a murine norovirus capable of replication in cell culture, has improved greatly our understanding of the molecular mechanisms of norovirus genome translation and replication, as well as the interaction with host cell processes. In this review, the current state of knowledge of the intracellular life of noroviruses is discussed with particular emphasis on the mechanisms of viral gene expression and viral genome replication.

  10. Sex-based differences in myocardial gene expression in recently deceased organ donors with no prior cardiovascular disease.

    PubMed

    InanlooRahatloo, Kolsoum; Liang, Grace; Vo, Davis; Ebert, Antje; Nguyen, Ivy; Nguyen, Patricia K

    2017-01-01

    Sex differences in the development of the normal heart and the prevalence of cardiomyopathies have been reported. The molecular basis of these differences remains unclear. Sex differences in the human heart might be related to patterns of gene expression. Recent studies have shown sex-specific differences in gene expression in tissues including the brain, kidney, skeletal muscle, and liver. Similar data are limited for the heart. Herein we address this issue by analyzing donor and post-mortem adult human heart samples originating from 46 control individuals to study whole-genome gene expression in the human left ventricle. Using data from the Genotype-Tissue Expression (GTEx) project, we compared the transcriptome expression profiles of male and female hearts. We found that genes located on sex chromosomes were the most abundant ones among the sexually dimorphic genes. The majority of differentially expressed autosomal genes were those involved in the regulation of inflammation, which has been found to be an important contributor to left ventricular remodeling. Specifically, genes on autosomal chromosomes encoding chemokines with inflammatory functions (e.g., CCL4, CX3CL1, TNFAIP3) and a gene that regulates adhesion of immune cells to the endothelium (VCAM1) were identified with sex-specific expression levels. This study underlines the relevance of sex as an important modifier of cardiac gene expression. These results have important implications for understanding differences in the physiology of the male and female heart transcriptome and how they may lead to sex-specific differences in human cardiac health and its control.

  11. A Modified ABCDE Model of Flowering in Orchids Based on Gene Expression Profiling Studies of the Moth Orchid Phalaenopsis aphrodite

    PubMed Central

    Lee, Ann-Ying; Chen, Chun-Yi; Chang, Yao-Chien Alex; Chao, Ya-Ting; Shih, Ming-Che

    2013-01-01

    Previously we developed genomic resources for orchids, including transcriptomic analyses using next-generation sequencing techniques and construction of a web-based orchid genomic database. Here, we report a modified molecular model of flower development in the Orchidaceae based on functional analysis of gene expression profiles in Phalaenopsis aphrodite (a moth orchid) that revealed novel roles for the transcription factors involved in floral organ pattern formation. Phalaenopsis orchid floral organ-specific genes were identified by microarray analysis. Several critical transcription factors, including AP3, PI, AP1 and AGL6, displayed distinct spatial distribution patterns. Phylogenetic analysis of orchid MADS box genes was conducted to infer the evolutionary relationship among floral organ-specific genes. The results suggest that duplication of MADS box genes in orchids may have resulted in their gaining novel functions during evolution. Based on these analyses, a modified model of orchid flowering was proposed. Comparison of the expression profiles of flowers of a peloric mutant and wild-type Phalaenopsis orchid further identified genes associated with lip morphology and peloric effects. Large-scale investigation of gene expression profiles revealed that homeotic genes from classes A and B of the ABCDE model of flower development have novel functions in the Phalaenopsis orchid due to evolutionary diversification, and display differential expression patterns. PMID:24265826

  12. Dynamic modeling of gene expression data

    PubMed Central

    Holter, Neal S.; Maritan, Amos; Cieplak, Marek; Fedoroff, Nina V.; Banavar, Jayanth R.

    2001-01-01

    We describe the time evolution of gene expression levels by using a time translational matrix to predict future expression levels of genes based on their expression levels at some initial time. We deduce the time translational matrix for previously published DNA microarray gene expression data sets by modeling them within a linear framework by using the characteristic modes obtained by singular value decomposition. The resulting time translation matrix provides a measure of the relationships among the modes and governs their time evolution. We show that a truncated matrix linking just a few modes is a good approximation of the full time translation matrix. This finding suggests that the number of essential connections among the genes is small. PMID:11172013
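
    The approach in this abstract can be illustrated with a minimal sketch. It assumes equally spaced time points, treats the rows of V^T from the singular value decomposition as the characteristic modes, and fits the time translation matrix by ordinary least squares. The function name, the synthetic data, and the choice of two modes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def time_translation_matrix(expr, n_modes):
    """Estimate M such that y[:, t+1] is approximately M @ y[:, t].

    expr    : (genes x timepoints) expression matrix, equally spaced samples assumed.
    n_modes : number of leading characteristic modes to keep (truncation).
    """
    # SVD of the expression matrix; rows of vt are temporal patterns (modes).
    u, s, vt = np.linalg.svd(expr, full_matrices=False)
    # Mode amplitudes over time, shape (n_modes, timepoints).
    y = s[:n_modes, None] * vt[:n_modes]
    past, future = y[:, :-1], y[:, 1:]
    # Least-squares solution of future = M @ past.
    m = future @ np.linalg.pinv(past)
    return m, u[:, :n_modes], y

# Synthetic example (hypothetical data): two latent modes that rotate and decay,
# mixed linearly into 20 "genes" observed at 12 time points.
rng = np.random.default_rng(0)
theta, decay = 0.5, 0.9
a = decay * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
z = np.empty((2, 12))
z[:, 0] = [1.0, 0.0]
for t in range(11):
    z[:, t + 1] = a @ z[:, t]
expr = rng.normal(size=(20, 2)) @ z

m, gene_loadings, y = time_translation_matrix(expr, n_modes=2)
# Predict the next expression profile from the last observed one.
next_profile = gene_loadings @ (m @ y[:, -1])
```

    Keeping only a few modes makes the fitted matrix small (here 2 x 2), which is the sense in which a truncated matrix can approximate the full time translation matrix.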

  13. Dynamic modeling of gene expression data

    NASA Technical Reports Server (NTRS)

    Holter, N. S.; Maritan, A.; Cieplak, M.; Fedoroff, N. V.; Banavar, J. R.

    2001-01-01

    We describe the time evolution of gene expression levels by using a time translational matrix to predict future expression levels of genes based on their expression levels at some initial time. We deduce the time translational matrix for previously published DNA microarray gene expression data sets by modeling them within a linear framework by using the characteristic modes obtained by singular value decomposition. The resulting time translation matrix provides a measure of the relationships among the modes and governs their time evolution. We show that a truncated matrix linking just a few modes is a good approximation of the full time translation matrix. This finding suggests that the number of essential connections among the genes is small.

  15. Mining Temporal Protein Complex Based on the Dynamic PIN Weighted with Connected Affinity and Gene Co-Expression.

    PubMed

    Shen, Xianjun; Yi, Li; Jiang, Xingpeng; He, Tingting; Hu, Xiaohua; Yang, Jincai

    2016-01-01

    The identification of temporal protein complexes would contribute greatly to our knowledge of the dynamic organization characteristics in protein interaction networks (PINs). Recent studies have focused on integrating gene expression data into a static PIN to construct a dynamic PIN that reveals the dynamic evolutionary procedure of protein interactions, but in practice they fail to recognize the active time points of proteins with low or high expression levels. We construct a Time-Evolving PIN (TEPIN) with a novel method called Deviation Degree, which is designed to identify the active time points of proteins based on the deviation degree of their own expression values. Moreover, owing to the differences between protein interactions, we weight TEPIN with connected affinity and gene co-expression to quantify the degree of these interactions. To validate the efficiency of our methods, the ClusterONE, CAMSE and MCL algorithms are applied to the TEPIN, DPIN (a dynamic PIN constructed with the state-of-the-art three-sigma method) and SPIN (the original static PIN) to detect temporal protein complexes. Each algorithm performs better on our TEPIN than on the other networks in terms of match degree, sensitivity, specificity, F-measure and function enrichment. In conclusion, our Deviation Degree method successfully eliminates the disadvantages of the previous state-of-the-art dynamic PIN construction methods. Moreover, the biological nature of protein interactions can be well described in our weighted network. Weighted TEPIN is a useful approach for detecting temporal protein complexes and revealing the dynamic protein assembly process for cellular organization.

  16. Microarray-based gene expression analysis of strong seed dormancy in rice cv. N22 and less dormant mutant derivatives.

    PubMed

    Wu, Tao; Yang, Chunyan; Ding, Baoxu; Feng, Zhiming; Wang, Qian; He, Jun; Tong, Jianhua; Xiao, Langtao; Jiang, Ling; Wan, Jianmin

    2016-02-01

    Seed dormancy in rice is an important trait related to pre-harvest sprouting resistance. In order to understand the molecular mechanisms of seed dormancy, gene expression was investigated by transcriptome analysis using seeds of the strongly dormant cultivar N22 and its less dormant mutants Q4359 and Q4646 at 24 days after heading (DAH). Microarray data revealed more differentially expressed genes in Q4359 than in Q4646 compared to N22. Most genes differing between Q4646 and N22 also differed between Q4359 and N22. GO analysis of genes differentially expressed in both Q4359 and Q4646 revealed that some genes such as those for starch biosynthesis were repressed, whereas metabolic genes such as those for carbohydrate metabolism were enhanced in Q4359 and Q4646 seeds relative to N22. Expression of some genes involved in cell redox homeostasis and chromatin remodeling differed significantly only between Q4359 and N22. The results suggested a close correlation between cell redox homeostasis, chromatin remodeling and seed dormancy. In addition, some genes involved in ABA signaling were down-regulated, and several genes involved in GA biosynthesis and signaling were up-regulated. These observations suggest that reduced seed dormancy in Q4359 was regulated by ABA-GA antagonism. A few differentially expressed genes were located in the regions containing qSdn-1 and qSdn-5, suggesting that they could be candidate genes underlying seed dormancy. Our work provides useful leads to further determine the underlying mechanisms of seed dormancy and for cloning seed dormancy genes from N22.

  17. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    PubMed Central

    2013-01-01

    Background: Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results: A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student's t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis, but none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions: The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919

  18. Differential Hippocampal Gene Expression and Pathway Analysis in an Etiology-Based Mouse Model of Major Depressive Disorder

    PubMed Central

    Zubenko, George S.; Hughes, Hugh B.; Jordan, Rick M.; Lyons-Weiler, James; Cohen, Bruce M.

    2015-01-01

    We have recently reported the creation and initial characterization of an etiology-based recombinant mouse model of a severe and inherited form of Major Depressive Disorder (MDD). This was achieved by replacing the corresponding mouse DNA sequence with a 6-base DNA sequence from the human CREB1 promoter that is associated with MDD in individuals from families with recurrent, early-onset MDD (RE-MDD). In the current study, we explored the effect of the pathogenic Creb1 allele on gene expression in the mouse hippocampus, a brain region that is altered in structure and function in MDD. Mouse whole-genome profiling was performed using the Illumina MouseWG-6 v2.0 Expression BeadChip microarray. Univariate analysis identified 269 differentially-expressed genes in the hippocampus of the mutant mouse. Pathway analyses highlighted 11 KEGG pathways: the phosphatidylinositol signaling system, which has been widely implicated in MDD, Bipolar Disorder, and the action of mood stabilizers; gap junction and long-term potentiation, which mediate cognition and memory functions often impaired in MDD; cardiac muscle contraction, insulin signaling pathway, and three neurodegenerative brain disorders (Alzheimer’s, Parkinson’s, and Huntington’s Diseases) that are associated with MDD; ribosome and proteasome pathways affecting protein synthesis/degradation; and the oxidative phosphorylation pathway that is key to energy production. These findings illustrate the merit of this congenic C57BL/6 recombinant mouse as a model of RE-MDD, and demonstrate its potential for highlighting molecular and cellular pathways that contribute to the biology of MDD. The results also inform our understanding of the mechanisms that underlie the comorbidity of MDD with other disorders. PMID:25059218

  19. Digital gene expression signatures for maize development.

    PubMed

    Eveland, Andrea L; Satoh-Nagasawa, Namiko; Goldshmidt, Alexander; Meyer, Sandra; Beatty, Mary; Sakai, Hajime; Ware, Doreen; Jackson, David

    2010-11-01

    Genome-wide expression signatures detect specific perturbations in developmental programs and contribute to functional resolution of key regulatory networks. In maize (Zea mays) inflorescences, mutations in the RAMOSA (RA) genes affect the determinacy of axillary meristems and thus alter branching patterns, an important agronomic trait. In this work, we developed and tested a framework for analysis of tag-based, digital gene expression profiles using Illumina's high-throughput sequencing technology and the newly assembled B73 maize reference genome. We also used a mutation in the RA3 gene to identify putative expression signatures specific to stem cell fate in axillary meristem determinacy. The RA3 gene encodes a trehalose-6-phosphate phosphatase and may act at the interface between developmental and metabolic processes. Deep sequencing of digital gene expression libraries, representing three biological replicate ear samples from wild-type and ra3 plants, generated 27 million 20- to 21-nucleotide reads with frequencies spanning 4 orders of magnitude. Unique sequence tags were anchored to 3'-ends of individual transcripts by DpnII and NlaIII digests, which were multiplexed during sequencing. We mapped 86% of nonredundant signature tags to the maize genome, which associated with 37,117 gene models and unannotated regions of expression. In total, 66% of genes were detected by at least nine reads in immature maize ears. We used comparative genomics to leverage existing information from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) in functional analyses of differentially expressed maize genes. Results from this study provide a basis for the analysis of short-read expression data in maize and resolved specific expression signatures that will help define mechanisms of action for the RA3 gene.

  20. Gold nanoprobe-based method for sensing activated leukocyte cell adhesion molecule (ALCAM) gene expression, as a breast cancer biomarker.

    PubMed

    Eskandari, Leila; Akbarzadeh, Abolfazl; Zarghami, Nosratollah; Rahmati-Yamchi, Mohammad

    2017-03-01

    In breast cancer, a proper biomarker for the assessment of metastasis and poor prognosis is the RNA of the activated leukocyte cell adhesion molecule (ALCAM) gene, which is expressed at high levels in breast tumors. We applied DNA-functionalized gold nanoparticles as target-specific probes for detecting specific sequences of DNA or RNA. At high MgCl2 concentrations, nanoprobes aggregate in the absence of the complementary DNA sequence, and the alteration in the solution color is detectable by evaluating the localized surface plasmon resonance (LSPR). In the presence of complementary DNA, however, nanoprobes hybridize to the complementary sequence; therefore, no aggregation takes place, and no color change is observed. We designed a gold nanoprobe-based method that promptly detects ALCAM gene expression in a low reaction volume with high sensitivity and specificity. This method is simple, fast, selective, and quantitative and can be done with small concentrations of the target (fmol/μL). The limit of detection of the method corresponded to 300 fmol/μL of synthetic ALCAM target.

  1. Sustained expression and safety of human GNE in normal mice after gene transfer based on AAV8 systemic delivery.

    PubMed

    Mitrani-Rosenbaum, Stella; Yakovlev, Lena; Becker Cohen, Michal; Telem, Michal; Elbaz, Moran; Yanay, Nurit; Yotvat, Hagit; Ben Shlomo, Uri; Harazi, Avi; Fellig, Yakov; Argov, Zohar; Sela, Ilan

    2012-11-01

    GNE myopathy is an autosomal recessive adult onset disorder caused by mutations in the GNE gene. GNE encodes the bifunctional enzyme UDP-N-acetylglucosamine 2-epimerase/N-acetyl mannosamine kinase, the key enzyme in the biosynthesis pathway of sialic acid. Additional functions for GNE have been described recently, but the mechanism leading from GNE mutation to this myopathy is unclear. Therefore a gene therapy approach could address all potential defects caused by GNE mutations in muscle. We show that AAV8 viral vectors carrying wild type human GNE cDNA are able to transduce murine muscle cells and human GNE myopathy-derived muscle cells in culture and to express the transgene in these cells. Furthermore, the intravenous administration of this viral vector to healthy mice allows expression of the GNE transgene mRNA and of the coexpressed luciferase protein, for at least 6 months in skeletal muscles, with no clinical or pathological signs of focal or general toxicity, neither from the virus particles nor from the wild type human GNE overexpression. Our results support the future use of an AAV8 based vector platform for a safe and efficient therapy of muscle in GNE myopathy.

  2. Pathway-based factor analysis of gene expression data produces highly heritable phenotypes that associate with age.

    PubMed

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-03-09

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10⁻⁵). These phenotypes are more heritable (h² = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.

  3. Pathway-Based Factor Analysis of Gene Expression Data Produces Highly Heritable Phenotypes That Associate with Age

    PubMed Central

    Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard

    2015-01-01

    Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10⁻⁵). These phenotypes are more heritable (h² = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824
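
    The Bonferroni threshold quoted in this abstract is consistent with simple arithmetic on the numbers it reports. The short check below assumes a family-wise error rate of 0.05 (the abstract does not state this value) and one test per pathway phenotype.

```python
# Counts taken from the abstract: 186 KEGG pathways, five phenotypes each.
n_pathways = 186
phenotypes_per_pathway = 5

# Assumption: a conventional family-wise error rate of 0.05.
alpha = 0.05

n_tests = n_pathways * phenotypes_per_pathway   # number of pathway phenotypes tested
threshold = alpha / n_tests                     # Bonferroni-corrected per-test threshold

print(n_tests, f"{threshold:.2e}")  # 930 5.38e-05
```

    Under that assumption the corrected threshold matches the reported value of about 5.38 × 10⁻⁵.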

  4. Differential gene expression in glaucoma.

    PubMed

    Jakobs, Tatjana C

    2014-07-01

    In glaucoma, regardless of its etiology, retinal ganglion cells degenerate and eventually die. Although age and elevated intraocular pressure (IOP) are the main risk factors, there are still many mysteries in the pathogenesis of glaucoma. The advent of genome-wide microarray expression screening together with the availability of animal models of the disease has allowed analysis of differential gene expression in all parts of the eye in glaucoma. This review will outline the findings of recent genome-wide expression studies and discuss their commonalities and differences. A common finding was the differential regulation of genes involved in inflammation and immunity, including the complement system and the cytokines transforming growth factor β (TGFβ) and tumor necrosis factor α (TNFα). Other genes of interest have roles in the extracellular matrix, cell-matrix interactions and adhesion, the cell cycle, and the endothelin system.

  5. Differential Gene Expression in Glaucoma

    PubMed Central

    Jakobs, Tatjana C.

    2014-01-01

    In glaucoma, regardless of its etiology, retinal ganglion cells degenerate and eventually die. Although age and elevated intraocular pressure (IOP) are the main risk factors, there are still many mysteries in the pathogenesis of glaucoma. The advent of genome-wide microarray expression screening together with the availability of animal models of the disease has allowed analysis of differential gene expression in all parts of the eye in glaucoma. This review will outline the findings of recent genome-wide expression studies and discuss their commonalities and differences. A common finding was the differential regulation of genes involved in inflammation and immunity, including the complement system and the cytokines transforming growth factor β (TGFβ) and tumor necrosis factor α (TNFα). Other genes of interest have roles in the extracellular matrix, cell–matrix interactions and adhesion, the cell cycle, and the endothelin system. PMID:24985133

  6. RNA-seq based gene expression analysis of ovarian granulosa cells exposed to zearalenone in vitro: significance to steroidogenesis

    PubMed Central

    Zhang, Guo-Liang; Zhang, Rui-Qian; Sun, Xiao-Feng; Cheng, Shun-Feng; Wang, Yu-Feng; Ji, Chuan-Liang; Feng, Yan-Zhong; Yu, Jie; Ge, Wei; Zhao, Yong; Sun, Shi-Duo; Shen, Wei; Li, Lan

    2017-01-01

    Zearalenone (ZEA) is a natural contaminant of various food and feed products representing a significant problem worldwide. Since the occurrence of ZEA in grains and feeds is frequent, the present study was carried out to evaluate the possible effects of ZEA on steroid production and gene expression of porcine granulosa cells, using RNA-seq analysis. Porcine granulosa cells were administered 10 μM and 30 μM ZEA during 72 h of culture in vitro. Following ZEA treatment the gene expression profile of control and exposed granulosa cells was compared using RNA-seq analysis. The results showed that in the exposed granulosa cells ZEA significantly altered the transcript levels, particularly steroidogenesis associated genes. Compared with the control group, 10 μM and 30 μM ZEA treatment significantly increased the mRNA expression of EDN1, IER3, TGFβ and BDNF genes and significantly reduced the mRNA expression of IGF-1 and SFRP2 genes. In particular, ZEA significantly decreased the expression of genes essential for estrogen synthesis including FSHR, CYP19A1 and HSD17β in granulosa cells. Furthermore, Q-PCR and Western-blot analysis also confirmed reduced expression of these genes in ZEA exposed granulosa cells. These effects were associated with a significant reduction of 17β-estradiol concentrations in the culture medium of granulosa cells. Collectively, these results demonstrated a concretely deleterious effect of ZEA exposure on the mRNA expression of steroidogenesis related genes and the production of steroid hormones in porcine ovarian granulosa cells in vitro. PMID:28969048

  7. ESTROGENIC STATUS MODULATES DMBA-MEDIATED HEPATIC GENE EXPRESSION: MICROARRAY-BASED ANALYSIS

    USDA-ARS's Scientific Manuscript database

    Estrogenic status in women influences the metabolism and toxicity of polycyclic aromatic hydrocarbons (PAH). The objective of this study was to examine the influence of estradiol (E2) on 7,12 dimethylbenz(a)anthracene (DMBA), a ligand for aryl hydrocarbon receptor, mediated changes on gene expressio...

  8. Imputing gene expression to maximize platform compatibility.

    PubMed

    Zhou, Weizhuang; Han, Lichy; Altman, Russ B

    2017-02-15

    Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes and the HG-U133A array contains a proper subset (21 722 probes). When different platforms are involved, the subset of common genes is most easily compared. This approach results in the exclusion of substantial measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate the improved performance of the enlarged feature space generated by our model in downstream analysis.
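
    The general idea of inferring platform-specific genes from genes common to both platforms can be sketched as a regression problem. The abstract does not say which model family the authors used, so the closed-form ridge regression below, the function names, and the synthetic data are all illustrative assumptions rather than the published method.

```python
import numpy as np

def fit_ridge(x_common, y_unique, lam=1.0):
    """Fit ridge regression from common-gene expression to platform-unique genes.

    x_common : (samples x common genes) training matrix.
    y_unique : (samples x unique genes) training targets.
    lam      : ridge penalty (hypothetical default).
    """
    x_mean, y_mean = x_common.mean(axis=0), y_unique.mean(axis=0)
    xc, yc = x_common - x_mean, y_unique - y_mean
    # Closed-form ridge solution on centered data.
    w = np.linalg.solve(xc.T @ xc + lam * np.eye(xc.shape[1]), xc.T @ yc)
    b = y_mean - x_mean @ w  # intercept recovered from the means
    return w, b

def impute(x_common, w, b):
    """Predict platform-unique gene expression from common-gene expression."""
    return x_common @ w + b

# Synthetic stand-in for expression data: 10 common genes, 3 unique genes.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 10))
true_w = rng.normal(size=(10, 3))
y = x @ true_w + 0.1 * rng.normal(size=(200, 3))

# Train on 150 samples, evaluate on the 50 held-out samples.
w, b = fit_ridge(x[:150], y[:150])
pred = impute(x[150:], w, b)
correlations = [np.corrcoef(pred[:, j], y[150:, j])[0, 1] for j in range(3)]
```

    Correlation between imputed and measured values on held-out samples is one way to check such a model, in the spirit of the evaluation the abstract describes.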

  9. [Expression of prn gene of Bordetella bronchiseptica and development of a recombinant protein-based indirect ELISA for antibodies detection].

    PubMed

    Zhao, Zhanqin; Xue, Yun; Wu, Bin; Tang, Xibiao; Chen, Huanchun; Li, Zengqiang; Hu, Ruiming; Zhang, Jianmin; Duan, Longchuan

    2008-03-01

    We developed an indirect ELISA method for detecting Bordetella bronchiseptica (Bb) pertactin antibodies based on the recombinant pertactin protein expressed in an Escherichia coli (DE3) strain. The prn gene encoding Bb pertactin was fused downstream of glutathione S-transferase (GST) in the pGEX-KG expression vector, resulting in the fusion expression plasmid pGEX-prn. SDS-PAGE showed that the GST-PRN fusion protein was expressed at a high level in BL21 carrying pGEX-prn. The strong reactivity of the GST-PRN fusion protein, specifically with antiserum against porcine bordetellosis caused by Bb HH0809, was identified by Western blot. The recombinant protein fragment rPRN was purified from the GST-PRN fusion protein digested by the protease thrombin, with a purity of 93.1%. The rPRN-based indirect ELISA was developed for detecting antibodies against PRN. The ELISA could detect positive samples in experimentally infected pigs fourteen days post inoculation, and its sensitivity was over 4 times higher than that of the latex agglutination test with killed Bb as the coating antigen. Thirty-two point seven percent of 1,229 clinical samples were positive, while no false positive results were found in testing 7 antisera against porcine bacterial diseases. Serum samples from two bordetellosis-positive pig farms were tested by the indirect ELISA method and the results indicated that pigs were infected by Bb during the nursery period. The assay showed excellent specificity, sensitivity and repeatability, and can be useful for epidemiological surveys and clinical diagnosis of swine bordetellosis.

  10. Transgenic Arabidopsis Gene Expression System

    NASA Technical Reports Server (NTRS)

    Ferl, Robert; Paul, Anna-Lisa

    2009-01-01

    The Transgenic Arabidopsis Gene Expression System (TAGES) investigation is one in a pair of investigations that use the Advanced Biological Research System (ABRS) facility. TAGES uses Arabidopsis thaliana, thale cress, with sensor promoter-reporter gene constructs that render the plants as biomonitors (an organism used to determine the quality of the surrounding environment) of their environment using real-time nondestructive Green Fluorescent Protein (GFP) imagery and traditional postflight analyses.

  11. Covariance Structure Models for Gene Expression Microarray Data

    ERIC Educational Resources Information Center

    Xie, Jun; Bentler, Peter M.

    2003-01-01

    Covariance structure models are applied to gene expression data using a factor model, a path model, and their combination. The factor model is based on a few factors that capture most of the expression information. A common factor of a group of genes may represent a common protein factor for the transcript of the co-expressed genes, and hence, it…

  13. Stochastic Mechanisms in Gene Expression

    NASA Astrophysics Data System (ADS)

    McAdams, Harley H.; Arkin, Adam

    1997-02-01

    In cellular regulatory networks, genetic activity is controlled by molecular signals that determine when and how often a given gene is transcribed. In genetically controlled pathways, the protein product encoded by one gene often regulates expression of other genes. The time delay, after activation of the first promoter, to reach an effective level to control the next promoter depends on the rate of protein accumulation. We have analyzed the chemical reactions controlling transcript initiation and translation termination in a single such "genetically coupled" link as a precursor to modeling networks constructed from many such links. Simulation of the processes of gene expression shows that proteins are produced from an activated promoter in short bursts of variable numbers of proteins that occur at random time intervals. As a result, there can be large differences in the time between successive events in regulatory cascades across a cell population. In addition, the random pattern of expression of competitive effectors can produce probabilistic outcomes in switching mechanisms that select between alternative regulatory paths. The result can be a partitioning of the cell population into different phenotypes as the cells follow different paths. There are numerous unexplained examples of phenotypic variations in isogenic populations of both prokaryotic and eukaryotic cells that may be the result of these stochastic gene expression mechanisms.
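
    The bursty protein production described in this abstract can be illustrated with a small stochastic simulation. The sketch below is not the authors' model: it assumes a generic two-stage scheme (mRNA and protein, each with first-order decay) simulated with Gillespie's algorithm, which the abstract does not name. The function name and all rate constants are hypothetical values chosen only to make bursts visible.

```python
import numpy as np

def simulate_bursts(k_m=0.1, g_m=1.0, k_p=20.0, g_p=0.05, t_end=200.0, seed=0):
    """Gillespie simulation of a two-stage gene expression model.

    Hypothetical rates: k_m (transcription), g_m (mRNA decay),
    k_p (translation per mRNA), g_p (protein decay).
    """
    rng = np.random.default_rng(seed)
    t, m, p = 0.0, 0, 0
    times, proteins = [t], [p]
    while t < t_end:
        # Propensities of the four reactions in the current state.
        rates = np.array([k_m, g_m * m, k_p * m, g_p * p])
        total = rates.sum()
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.exponential(1.0 / total)
        # Pick which reaction fires, in proportion to its propensity.
        reaction = rng.choice(4, p=rates / total)
        if reaction == 0:
            m += 1
        elif reaction == 1:
            m -= 1
        elif reaction == 2:
            p += 1
        else:
            p -= 1
        times.append(t)
        proteins.append(p)
    return np.array(times), np.array(proteins)

times, proteins = simulate_bursts()
```

    Because mRNA is rare and short-lived under these assumed rates, protein counts rise in short runs while a transcript exists and decay in between. Changing the seed gives a different trajectory, which is the cell-to-cell variability the abstract refers to.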

  14. Zipf's Law in Gene Expression

    NASA Astrophysics Data System (ADS)

    Furusawa, Chikara; Kaneko, Kunihiko

    2003-02-01

    Using data from gene expression databases on various organisms and tissues, including yeast, nematodes, human normal and cancer tissues, and embryonic stem cells, we found that the abundances of expressed genes exhibit a power-law distribution with an exponent close to -1; i.e., they obey Zipf’s law. Furthermore, by simulations of a simple model with an intracellular reaction network, we found that Zipf’s law of chemical abundance is a universal feature of cells where such a network optimizes the efficiency and faithfulness of self-reproduction. These findings provide novel insights into the nature of the organization of reaction dynamics in living cells.
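
    A power-law exponent of the kind reported here can be estimated from a rank-abundance table. The following is a minimal sketch, assuming a plain least-squares fit on log-log axes. The abstract does not state the authors' fitting procedure, and the synthetic data and the function name are illustrative only.

```python
import numpy as np

def zipf_exponent(abundances):
    """Least-squares slope of log(abundance) against log(rank)."""
    # Sort from most to least abundant and drop non-positive values,
    # which cannot be placed on a log scale.
    values = np.sort(np.asarray(abundances, dtype=float))[::-1]
    values = values[values > 0]
    ranks = np.arange(1, values.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(values), 1)
    return slope

# Synthetic abundances that follow abundance = C / rank exactly,
# so the fitted exponent should be very close to -1.
synthetic = 1000.0 / np.arange(1, 501)
exponent = zipf_exponent(synthetic)
```

    On real expression data, a log-log regression is sensitive to the low-abundance tail, so the fitted value depends on which range of ranks is included.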

  15. Bayesian modeling of differential gene expression.

    PubMed

    Lewin, Alex; Richardson, Sylvia; Marshall, Clare; Glazier, Anne; Aitman, Tim

    2006-03-01

    We present a Bayesian hierarchical model for detecting differentially expressing genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different nonlinear functions as part of our model-based approach to normalization. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modeling the array effects (normalization) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates. In an application to mouse knockout data, Gene Ontology annotations over- and underrepresented among the genes on the chosen list are consistent with biological expectations.

  16. Serum-based culture conditions provoke gene expression variability in mouse embryonic stem cells as revealed by single cell analysis

    PubMed Central

    Guo, Guoji; Pinello, Luca; Han, Xiaoping; Lai, Shujing; Shen, Li; Lin, Ta-Wei; Zou, Keyong; Yuan, Guo-Cheng; Orkin, Stuart H.

    2015-01-01

    Summary: Variation in gene expression is an important feature of mouse embryonic stem cells (ESCs). However, the mechanisms responsible for global gene expression variation in ESCs are not fully understood. We performed single cell mRNA-seq analysis of mouse ESCs and uncovered significant heterogeneity in ESCs cultured in serum. We define highly variable gene clusters with distinct chromatin states and show that bivalent genes are prone to expression variation. At the same time, we identify an ESC priming pathway that initiates the exit from the naïve ESC state. Finally, we provide evidence that a large proportion of intracellular network variability is due to the extracellular culture environment. Serum free culture reduces cellular heterogeneity and transcriptome variation in ESCs. PMID:26804902

  17. Plant Omics Data Center: An Integrated Web Repository for Interspecies Gene Expression Networks with NLP-Based Curation

    PubMed Central

    Ohyanagi, Hajime; Takano, Tomoyuki; Terashima, Shin; Kobayashi, Masaaki; Kanno, Maasa; Morimoto, Kyoko; Kanegae, Hiromi; Sasaki, Yohei; Saito, Misa; Asano, Satomi; Ozaki, Soichi; Kudo, Toru; Yokoyama, Koji; Aya, Koichiro; Suwabe, Keita; Suzuki, Go; Aoki, Koh; Kubo, Yasutaka; Watanabe, Masao; Matsuoka, Makoto; Yano, Kentaro

    2015-01-01

    Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources. PMID:25505034

  18. In vivo effects on photosynthesis gene expression of base pair exchanges in the gene encoding the light-responsive BLUF domain of AppA in Rhodobacter sphaeroides.

    PubMed

    Metz, Sebastian; Hendriks, Johnny; Jäger, Andreas; Hellingwerf, Klaas; Klug, Gabriele

    2010-01-01

    The Rhodobacter sphaeroides protein AppA has the unique quality of sensing and transmitting light and redox signals. By acting as antirepressor to the PpsR protein, it acts as a major regulator in photosynthesis gene expression. In this study, we show that by introducing amino acid exchanges into the AppA protein, the in vivo activity as an antirepressor can be greatly altered. The tryptophan 104 to phenylalanine (W104F) base exchange greatly diminished blue-light sensitivity of the BLUF domain. From the obtained in vivo data, the difference in thermal recovery rate of the signaling state of the BLUF domain between the wild type and mutated protein was calculated, predicting an about 10-fold faster recovery in the mutant, which is consistent with in vitro data. Introduction of a tyrosine 21 to phenylalanine (Y21F) or to cysteine (Y21C) mutation led to a complete loss of AppA antirepressor activity, while additionally leading to an increase of photosynthesis gene expression after illumination with high blue-light quantities. Interestingly, this effect is not visible in a W104F/Y21F double mutant that again shows a wild-type-like behavior of the BLUF domain after blue-light illumination, thus restoring the activity of AppA.

  19. Possible prediction of the response of esophageal squamous cell carcinoma to neoadjuvant chemotherapy based on gene expression profiling.

    PubMed

    Shen, Lu-Yan; Wang, Hui; Dong, Bin; Yan, Wan-Pu; Lin, Yao; Shi, Qi; Chen, Ke-Neng

    2016-01-26

    Heterogeneous efficacy of neoadjuvant chemotherapy has led to controversies that have limited its application in clinical practice. Thus, we aimed to identify potential biomarkers predicting esophageal squamous cell carcinoma (ESCC) chemo-responsiveness by gene expression profiling. A CCK8 assay was used to evaluate the growth inhibitory effect of different concentrations of cisplatin and paclitaxel on the ESCC cell lines EC109, KYSE450, KYSE410, KYSE510, and KYSE150 to differentiate between chemosensitive and chemoresistant cell lines. Gene expression profiling and real-time PCR were applied to analyze and validate the gene expression differences between chemosensitive and chemoresistant cell lines. IHC was conducted to examine the expression of the selected target markers MUC4, MUC13, and MUC20 in 186 ESCC resection samples, and the relationship between their expression and tumor regression grade (TRG) was analyzed. Moreover, RNAi was conducted to instantly block the expression of MUC4, MUC13, and MUC20 to observe their influences on chemo-responsiveness. EC109 was found to be relatively sensitive to both cisplatin and paclitaxel, while KYSE410 was relatively resistant to cisplatin and KYSE510 was relatively resistant to paclitaxel. Gene expression profiling analysis showed that 2018 genes were differentially expressed in the sensitive cell line compared to the resistant cell lines. The expression patterns of MUC4, MUC13, and MUC20 were validated. Low expression of MUC4 and MUC20 in resection samples was significantly correlated with better TRG. Blockage of MUC20 and MUC13 decreased the drug-resistance capacity and chemosensitivity, respectively. MUC4 and MUC20 were identified as potential biomarkers for predicting the efficacy of neoadjuvant chemotherapy in ESCC patients.

  20. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database.

    PubMed

    Tian, Feng; Zhao, Jinlong; Fan, Xinlei; Kang, Zhenxing

    2017-01-01

    Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The pathogenic mechanism of tumor development is unclear. The aim of this study was to identify key genes as diagnostic biomarkers in lung SCC metastasis. We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The 49 DEGs in the two gene co-expression modules were used to construct a PPI network. CFTR, in the black module, was the hub protein, with connectivity to 182 genes. The qRT-PCR results showed that FIGF, SFTPD, and DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis, and that CFTR, SCGB3A2, SSTR1, SCTR, and ROPN1L showed a tendency toward down-regulation in lung SCC with metastasis compared to lung SCC without metastasis. The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC.

  1. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database

    PubMed Central

    Tian, Feng; Zhao, Jinlong; Kang, Zhenxing

    2017-01-01

    Background: Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. The pathogenic mechanism of tumor development is unclear. The aim of this study was to identify key genes as diagnostic biomarkers in lung SCC metastasis. Methods: We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Results: Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three and 26 DEGs were significantly enriched in the pink and black modules, respectively. KEGG pathway analysis showed that the 26 DEGs in the black module were significantly enriched in the bile secretion pathway. The 49 DEGs in the two gene co-expression modules were used to construct a PPI network. CFTR, in the black module, was the hub protein, with connectivity to 182 genes. The qRT-PCR results showed that FIGF, SFTPD, and DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis, and that CFTR, SCGB3A2, SSTR1, SCTR, and ROPN1L showed a tendency toward down-regulation in lung SCC with metastasis compared to lung SCC without metastasis. Conclusions: The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnostic biomarkers or therapeutic targets for lung SCC. PMID:28203405

  2. Blood-Based Gene Expression Signatures of Infants and Toddlers with Autism

    ERIC Educational Resources Information Center

    Glatt, Stephen J.; Tsuang, Ming T.; Winn, Mary; Chandler, Sharon D.; Collins, Melanie; Lopez, Linda; Weinfeld, Melanie; Carter, Cindy; Schork, Nicholas; Pierce, Karen; Courchesne, Eric

    2012-01-01

    Objective: Autism spectrum disorders (ASDs) are highly heritable neurodevelopmental disorders that onset clinically during the first years of life. ASD risk biomarkers expressed early in life could significantly impact diagnosis and treatment, but no transcriptome-wide biomarker classifiers derived from fresh blood samples from children with…

  4. Growth response and expression of muscle growth-related candidate genes in adult zebrafish fed plant and fishmeal protein-based diets.

    PubMed

    Ulloa, Pilar E; Peña, Andrea A; Lizama, Carla D; Araneda, Cristian; Iturra, Patricia; Neira, Roberto; Medrano, Juan F

    2013-03-01

    The main objective of this study was to examine the effects of a plant protein- vs. fishmeal-based diet on growth response in a population of 24 families, as well as expression of growth-related genes in the muscle of adult zebrafish (Danio rerio). Each family was split to create two fish populations with similar genetic backgrounds, and the fish were fed either fishmeal (FM diet) or plant protein (PP diet) as the unique protein source in their diets from 35 to 98 days postfertilization (dpf). To understand the effect of the PP diet on gene expression, individuals from three families, representative of the mean weight in both populations, were selected. To understand the effect of familial variation on gene expression, the same families were evaluated separately. At 98 dpf, growth-related genes Igf1a, Igf2a, mTOR, Pld1a, Mrf4, Myod, Myogenin, and Myostatin1b were evaluated. In males, Myogenin, Mrf4, and Igf2a showed changes attributable to the PP diet. In females, the PP diet did not modulate the expression of any of the eight genes studied. An effect of familial variation on gene expression was observed among families. This study shows that PP diet and family variation have effects on gene expression in fish muscle.

  5. Gene expression of the endolymphatic sac.

    PubMed

    Friis, Morten; Martin-Bertelsen, Tomas; Friis-Hansen, Lennart; Winther, Ole; Henao, Ricardo; Sørensen, Mads Sølvsten; Qvortrup, Klaus

    2011-12-01

    The endolymphatic sac is part of the membranous inner ear and is thought to play a role in the fluid homeostasis and immune defense of the inner ear; however, the exact function of the endolymphatic sac is not fully known. Many of the detected mRNAs in this study suggest that the endolymphatic sac has multiple and diverse functions in the inner ear. The objective of this study was to provide a comprehensive review of the genes expressed in the endolymphatic sac in the rat and perform a functional characterization based on measured mRNA abundance. Microarray technology was used to investigate the gene expression of the endolymphatic sac with the surrounding dura. Characteristic and novel endolymphatic sac genes were determined by comparing with expressions in pure dura. In all, 463 genes were identified specific for the endolymphatic sac. Functional annotation clustering revealed 29 functional clusters.

  6. Neighboring Genes Show Correlated Evolution in Gene Expression.

    PubMed

    Ghanbarian, Avazeh T; Hurst, Laurence D

    2015-07-01

    When considering the evolution of a gene's expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression-profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression or is gene expression evolution autonomous? To address this here we consider evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking.

  7. Gene expression changes in children with autism.

    PubMed

    Gregg, Jeffrey P; Lit, Lisa; Baron, Colin A; Hertz-Picciotto, Irva; Walker, Wynn; Davis, Ryan A; Croen, Lisa A; Ozonoff, Sally; Hansen, Robin; Pessah, Isaac N; Sharp, Frank R

    2008-01-01

    The objective of this study was to identify gene expression differences in blood in children with autism (AU) and autism spectrum disorder (ASD) compared to general population controls. Transcriptional profiles were compared with age- and gender-matched, typically developing children from the general population (GP). The AU group was subdivided based on a history of developmental regression (A-R) or a history of early onset (A-E without regression). Total RNA from blood was processed on human Affymetrix microarrays. Thirty-five children with AU (17 with early onset autism and 18 with autism with regression) and 14 ASD children (who did not meet criteria for AU) were compared to 12 GP children. Unpaired t tests (corrected for multiple comparisons with a false discovery rate of 0.05) detected a number of genes that were regulated more than 1.5-fold for AU versus GP (n=55 genes), for A-E versus GP (n=140 genes), for A-R versus GP (n=20 genes), and for A-R versus A-E (n=494 genes). No genes were significantly regulated for ASD versus GP. There were 11 genes shared between the comparisons of all autism subgroups to GP (AU, A-E, and A-R versus GP) and these genes were all expressed in natural killer cells and many belonged to the KEGG natural killer cytotoxicity pathway (p=0.02). A subset of these genes (n=7) was tested with qRT-PCR and all genes were found to be differentially expressed (p<0.05). We conclude that the gene expression data support emerging evidence for abnormalities in peripheral blood leukocytes in autism that could represent a genetic and/or environmental predisposition to the disorder.
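
    The multiple-comparison correction mentioned in this abstract can be sketched in a few lines. The abstract gives only the false discovery rate (0.05), not the procedure, so the Benjamini-Hochberg step-up method used below is an assumption, and the p-values are made-up illustrative numbers rather than data from the study.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of tests rejected at false discovery rate q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds: the k-th smallest p-value is compared with (k / m) * q.
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.nonzero(passed)[0].max()
        rejected[order[: k + 1]] = True
    return rejected

# Illustrative p-values only (not taken from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(example_p, q=0.05)
```

    In a study like this one, the mask would typically be combined with a separate fold-change filter (here, more than 1.5-fold) to produce the final gene lists.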

  8. Resource Sharing Controls Gene Expression Bursting.

    PubMed

    Caveney, Patrick M; Norred, S Elizabeth; Chin, Charles W; Boreyko, Jonathan B; Razooky, Brandon S; Retterer, Scott T; Collier, C Patrick; Simpson, Michael L

    2017-02-17

    Episodic gene expression, with periods of high expression separated by periods of no expression, is a pervasive biological phenomenon. This bursty pattern of expression draws from a finite reservoir of expression machinery in a highly time variant way, i.e., requiring no resources most of the time but drawing heavily on them during short intense bursts, that intimately links expression bursting and resource sharing. Yet, most recent investigations have focused on specific molecular mechanisms intrinsic to the bursty behavior of individual genes, while little is known about the interplay between resource sharing and global expression bursting behavior. Here, we confine Escherichia coli cell extract in both cell-sized microfluidic chambers and lipid-based vesicles to explore how resource sharing influences expression bursting. Interestingly, expression burst size, but not burst frequency, is highly sensitive to the size of the shared transcription and translation resource pools. The intriguing implication of these results is that expression bursts are more readily amplified than initiated, suggesting that burst formation occurs through positive feedback or cooperativity. When extrapolated to prokaryotic cells, these results suggest that large translational bursts may be correlated with large transcriptional bursts. This correlation is supported by recently reported transcription and translation bursting studies in E. coli. The results reported here demonstrate a strong intimate link between global expression burst patterns and resource sharing, and they suggest that bursting plays an important role in optimizing the use of limited, shared expression resources.

  9. MIrExpress: A Database for Gene Coexpression Correlation in Immune Cells Based on Mutual Information and Pearson Correlation.

    PubMed

    Wang, Luman; Mo, Qiaochu; Wang, Jianxin

    2015-01-01

    Most current gene coexpression databases support analysis of the linear correlation of gene pairs but not their nonlinear correlation, which hinders precise evaluation of gene-gene coexpression strengths. Here, we report a new database, MIrExpress, which takes advantage of information theory, as well as the Pearson linear correlation method, to measure the linear correlation, nonlinear correlation, and their hybrid of cell-specific gene coexpressions in immune cells. For a given gene pair or probe set pair input by web users, both mutual information (MI) and Pearson correlation coefficient (r) are calculated, and several corresponding values are reported to reflect their coexpression correlation nature, including MI and r values, their respective rank orderings, their rank comparison, and their hybrid correlation value. Furthermore, for a given gene, the 10 most relevant genes are displayed from the MI, r, or hybrid perspective, respectively. Currently, the database includes a total of 16 human cell groups, involving 20,283 human genes. The expression data and the calculated correlation results from the database are interactively accessible on the web page and can be used for other related applications and research.
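
    The contrast between Pearson correlation and mutual information drawn in this abstract can be shown on a toy example. This is a sketch only: the histogram-based MI estimator, the bin count, and the function names are assumptions and do not reproduce how MIrExpress computes its values.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def mutual_information(x, y, bins=10):
    """Histogram estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0                     # skip empty cells to avoid log(0)
    ratio = pxy[nonzero] / (px @ py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log(ratio)))

# A purely nonlinear dependence: y is fully determined by x,
# yet the linear correlation is essentially zero.
x = np.linspace(-1.0, 1.0, 201)
y_nonlinear = x ** 2
r = pearson_r(x, y_nonlinear)
mi = mutual_information(x, y_nonlinear)
```

    Here r is close to zero while the MI estimate is clearly positive, which is the kind of gene pair a linear-only coexpression measure would miss. Histogram estimates depend on the bin count and sample size, so absolute MI values are comparable only under the same settings.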

  10. PCR-based amplification and heterologous expression of Pseudomonas alcohol dehydrogenase genes from the soil metagenome for biocatalysis.

    PubMed

    Itoh, Nobuya; Isotani, Kentaro; Makino, Yoshihide; Kato, Masaki; Kitayama, Kouta; Ishimota, Tuyoshi

    2014-02-05

    The amplification of useful genes from metagenomes offers great biotechnological potential. We employed this approach to isolate alcohol dehydrogenase (adh) genes from Pseudomonas to aid in the synthesis of optically pure alcohols from various ketones. A PCR primer combination synthesized by reference to the adh sequences of known Pseudomonas genes was used to amplify full-length adh genes directly from 17 samples of DNA extracted from soil. Three such adh preparations were used to construct Escherichia coli plasmid libraries. Of the approximately 2800 colonies obtained, 240 putative adh-positive clones were identified by colony-PCR. Next, 23 functional adh genes named using the descriptors HBadh and HPadh were analyzed. The adh genes obtained via this metagenomic approach varied in their DNA and amino acid sequences. Expression of the gene products in E. coli indicated varying substrate specificity. Two representative genes, HBadh-1 and HPadh-24, expressed in E. coli and Pseudomonas putida, respectively, were purified and characterized in detail. The enzyme products of these genes were confirmed to be useful for producing anti-Prelog chiral alcohols.

  11. Mapping in an apple (Malus x domestica) F1 segregating population based on physical clustering of differentially expressed genes.

    PubMed

    Jensen, Philip J; Fazio, Gennaro; Altman, Naomi; Praul, Craig; McNellis, Timothy W

    2014-04-04

    Apple tree breeding is slow and difficult due to long generation times, self-incompatibility, and complex genetics. The identification of molecular markers linked to traits of interest is a way to expedite the breeding process. In the present study, we aimed to identify genes whose steady-state transcript abundance was associated with inheritance of specific traits segregating in an apple (Malus × domestica) rootstock F1 breeding population, including resistance to powdery mildew (Podosphaera leucotricha) disease and woolly apple aphid (Eriosoma lanigerum). Transcription profiling was performed for 48 individual F1 apple trees from a cross of two highly heterozygous parents, using RNA isolated from healthy, actively-growing shoot tips and a custom apple DNA oligonucleotide microarray representing 26,000 unique transcripts. Genome-wide expression profiles were not clear indicators of powdery mildew or woolly apple aphid resistance phenotype. However, standard differential gene expression analysis between phenotypic groups of trees revealed relatively small sets of genes with trait-associated expression levels. For example, thirty genes were identified that were differentially expressed between trees resistant and susceptible to powdery mildew. Interestingly, the genes encoding twenty-four of these transcripts were physically clustered on chromosome 12. Similarly, seven genes were identified that were differentially expressed between trees resistant and susceptible to woolly apple aphid, and the genes encoding five of these transcripts were also clustered, this time on chromosome 17. In each case, the gene clusters were in the vicinity of previously identified major quantitative trait loci for the corresponding trait. Similar results were obtained for a series of molecular traits. Several of the differentially expressed genes were used to develop DNA polymorphism markers linked to powdery mildew disease and woolly apple aphid resistance. Gene expression profiling

  12. Visualizing spatiotemporal dynamics of apoptosis after G1 arrest by human T cell leukemia virus type 1 Tax and insights into gene expression changes using microarray-based gene expression analysis

    PubMed Central

    2012-01-01

    Background: Human T cell leukemia virus type 1 (HTLV-1) Tax is a potent activator of viral and cellular gene expression that interacts with a number of cellular proteins. Many reports show that Tax is capable of regulating cell cycle progression and apoptosis both positively and negatively. However, it remains unclear why the Tax oncoprotein induces cell cycle arrest and apoptosis, or whether Tax-induced apoptosis is dependent upon its ability to induce G1 arrest. The present study used time-lapse imaging to explore the spatiotemporal patterns of cell cycle dynamics in Tax-expressing HeLa cells containing the fluorescent ubiquitination-based cell cycle indicator, Fucci2. A large-scale host cell gene profiling approach was also used to identify the genes involved in Tax-mediated cell signaling events related to cellular proliferation and apoptosis. Results: Tax-expressing apoptotic cells showed a rounded morphology and detached from the culture dish after cell cycle arrest at the G1 phase. Thus, it appears that Tax induces apoptosis through pathways identical to those involved in G1 arrest. To elucidate the mechanism(s) by which Tax induces cell cycle arrest and apoptosis, regulation of host cellular genes by Tax was analyzed using a microarray containing approximately 18,400 human mRNA transcripts. Seventeen genes related to cell cycle regulation were identified as being up- or downregulated >2.0-fold in Tax-expressing cells. Several genes, including SMAD3, JUN, GADD45B, DUSP1 and IL8, were involved in cellular proliferation, responses to cellular stress and DNA damage, or inflammation and immune responses. Additionally, 23 pro- and anti-apoptotic genes were deregulated by Tax, including TNFAIP3, TNFRS9, BIRC3 and IL6. Furthermore, the kinetics of IL8, SMAD3, CDKN1A, GADD45A, GADD45B and IL6 expression were altered following the induction of Tax, and correlated closely with the morphological changes observed by time-lapse imaging. Conclusions: Taken

  13. Neighboring Genes Show Correlated Evolution in Gene Expression

    PubMed Central

    Ghanbarian, Avazeh T.; Hurst, Laurence D.

    2015-01-01

    When considering the evolution of a gene’s expression profile, we commonly assume that this is unaffected by its genomic neighborhood. This is, however, in contrast to what we know about the lack of autonomy between neighboring genes in gene expression profiles in extant taxa. Indeed, in all eukaryotic genomes genes of similar expression profile tend to cluster, reflecting chromatin level dynamics. Does it follow that if a gene increases expression in a particular lineage then the genomic neighbors will also increase in their expression, or is gene expression evolution autonomous? To address this, we here consider the evolution of human gene expression since the human-chimp common ancestor, allowing for both variation in estimation of current expression level and error in Bayesian estimation of the ancestral state. We find that in all tissues and both sexes, the change in gene expression of a focal gene on average predicts the change in gene expression of neighbors. The effect is highly pronounced in the immediate vicinity (<100 kb) but extends much further. Sex-specific expression change is also genomically clustered. As genes increasing their expression in humans tend to avoid nuclear lamina domains and be enriched for the gene activator 5-hydroxymethylcytosine, we conclude that, most probably owing to chromatin level control of gene expression, a change in gene expression of one gene likely affects the expression evolution of neighbors, what we term expression piggybacking, an analog of hitchhiking. PMID:25743543

  14. Hexosamine template. A platform for modulating gene expression and for sugar-based drug discovery.

    PubMed

    Elmouelhi, Noha; Aich, Udayanath; Paruchuri, Venkata D P; Meledeo, M Adam; Campbell, Christopher T; Wang, Jean J; Srinivas, Raja; Khanna, Hargun S; Yarema, Kevin J

    2009-04-23

    This study investigates the breadth of cellular responses engendered by short chain fatty acid (SCFA)-hexosamine hybrid molecules, a class of compounds long used in "metabolic glycoengineering" that are now emerging as drug candidates. First, a "mix and match" strategy showed that different SCFA (n-butyrate and acetate) appended to the same core sugar altered biological activity, complementing previous results [Campbell et al. J. Med. Chem. 2008, 51, 8135-8147] where a single type of SCFA elicited distinct responses. Microarray profiling then compared transcriptional responses engendered by regioisomerically modified ManNAc, GlcNAc, and GalNAc analogues in MDA-MB-231 cells. These data, which were validated by qRT-PCR or Western analysis for ID1, TP53, HPSE, NQO1, EGR1, and VEGFA, showed a two-pronged response where a core set of genes was coordinately regulated by all analogues while each analogue simultaneously uniquely regulated a larger number of genes. Finally, AutoDock modeling supported a mechanism where the analogues directly interact with elements of the NF-kappaB pathway. Together, these results establish the SCFA-hexosamine template as a versatile platform for modulating biological activity and developing new therapeutics.

  15. A New Drug Combinatory Effect Prediction Algorithm on the Cancer Cell Based on Gene Expression and Dose-Response Curve.

    PubMed

    Goswami, C Pankaj; Cheng, L; Alexander, P S; Singal, A; Li, L

    2015-02-01

    Gene expression data before and after treatment with an individual drug and the IC20 of dose-response data were utilized to predict two drugs' interaction effects on a diffuse large B-cell lymphoma (DLBCL) cancer cell. A novel drug interaction scoring algorithm was developed to account for either synergistic or antagonistic effects between drug combinations. Different core gene selection schemes were investigated, which included the whole gene set, the drug-sensitive gene set, the drug-sensitive minus drug-resistant gene set, and the known drug target gene set. The prediction scores were compared with the observed drug interaction data at 6, 12, and 24 hours with a probability concordance (PC) index. The test result shows the concordance between observed and predicted drug interaction ranking reaches a PC index of 0.605. The scoring reliability and efficiency were further confirmed in five drug interaction studies published in the GEO database.
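
    The abstract does not define the PC index, so the following is only a minimal sketch of one generic way to compare an observed and a predicted ranking: the fraction of item pairs that both rankings order the same way. The function name `pairwise_concordance` and the tie handling (half credit for ties in the prediction, observed ties skipped) are assumptions for illustration, not the authors' scoring algorithm.

```python
from itertools import combinations


def pairwise_concordance(observed, predicted):
    """Fraction of comparable pairs ordered the same way in both lists.

    Hypothetical helper: pairs tied in `observed` are skipped, and pairs
    tied only in `predicted` count as half concordant.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        d_pred = predicted[i] - predicted[j]
        if d_obs == 0:
            continue  # no observed ordering to agree or disagree with
        comparable += 1
        if d_pred == 0:
            concordant += 0.5
        elif (d_obs > 0) == (d_pred > 0):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

    Under this definition a value of 1.0 means identical rankings, 0.0 means fully reversed rankings, and 0.5 is what a random predictor would be expected to score.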

  16. Leveraging global gene expression patterns to predict expression of unmeasured genes.

    PubMed

    Rudd, James; Zelaya, René A; Demidenko, Eugene; Goode, Ellen L; Greene, Casey S; Doherty, Jennifer A

    2015-12-15

    Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high-grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes. We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user-specified size based on a specific correlation threshold (|rP|) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1-3), and |rP| (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS-selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP|. GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost-effective way to increase the amount of gene expression information obtained in large studies. This method can be applied
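
    The selection step described in this abstract can be illustrated with a small greedy sketch. It assumes a precomputed gene-by-gene correlation matrix and treats an unmeasured gene as predictable once at least `redundancy` selected genes correlate with it at or above the threshold. The function name `greedy_gene_selection`, the tie-breaking rule (lowest index wins) and the toy matrix are hypothetical; the abstract does not give the published algorithm's exact objective.

```python
def greedy_gene_selection(corr, dm_size, r_threshold=0.6, redundancy=1):
    """Greedily choose `dm_size` directly measured (DM) genes.

    corr: square list-of-lists correlation matrix (gene x gene).
    Returns (selected DM gene indices, predictable unmeasured gene indices).
    """
    n = len(corr)
    # linked[g]: other genes whose |r| with g meets the threshold
    linked = [
        {h for h in range(n) if h != g and abs(corr[g][h]) >= r_threshold}
        for g in range(n)
    ]
    selected = []
    support = [0] * n  # how many selected DM genes are linked to each gene

    def coverage(candidate):
        # Number of unmeasured genes that would be predictable
        # if `candidate` were added to the DM set.
        measured = set(selected) | {candidate}
        return sum(
            1
            for h in range(n)
            if h not in measured
            and support[h] + (1 if h in linked[candidate] else 0) >= redundancy
        )

    for _ in range(min(dm_size, n)):
        candidates = [g for g in range(n) if g not in selected]
        best = max(candidates, key=coverage)  # first maximum wins on ties
        selected.append(best)
        for h in linked[best]:
            support[h] += 1

    predictable = [
        h for h in range(n) if h not in selected and support[h] >= redundancy
    ]
    return selected, predictable


# Toy correlation matrix, made up for illustration only:
# genes 0-2 form one correlated group, genes 3-4 another.
toy_corr = [
    [1.0, 0.8, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.5, 0.1, 0.1],
    [0.8, 0.5, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
dm_genes, inferred = greedy_gene_selection(toy_corr, dm_size=2)
```

    On the toy matrix, measuring genes 0 and 3 makes genes 1, 2 and 4 predictable. Raising `redundancy` or `r_threshold` makes the criterion stricter, so fewer genes qualify for a given DM set size.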

  17. Using PCR to Target Misconceptions about Gene Expression

    PubMed Central

    Wright, Leslie K.; Newman, Dina L.

    2013-01-01

    We present a PCR-based laboratory exercise that can be used with first- or second-year biology students to help overcome common misconceptions about gene expression. Biology students typically do not have a clear understanding of the difference between genes (DNA) and gene expression (mRNA/protein) and often believe that genes exist in an organism or cell only when they are expressed. This laboratory exercise allows students to carry out a PCR-based experiment designed to challenge their misunderstanding of the difference between genes and gene expression. Students first transform E. coli with a plasmid containing an inducible GFP gene and observe induced and uninduced colonies. The following exercise creates cognitive dissonance when actual PCR results contradict their initial (incorrect) predictions of the presence of the GFP gene in transformed cells. Field testing of this laboratory exercise resulted in learning gains on both knowledge and application questions on concepts related to genes and gene expression. PMID:23858358

  18. Construction of a plasmid vector based on the pMV158 replicon for cloning and inducible gene expression in Streptococcus pneumoniae.

    PubMed

    Ruiz-Masó, José A; López-Aguilar, Celeste; Nieto, Concha; Sanz, Marta; Burón, Patricia; Espinosa, Manuel; del Solar, Gloria

    2012-01-01

    We report the construction of a plasmid vector designed for regulated gene expression in Streptococcus pneumoniae. The new vector, pLS1ROM, is based on the replicon of the streptococcal promiscuous rolling circle replication (RCR) plasmid pMV158. We inserted the controllable promoter P(M) of the S. pneumoniae malMP operon, followed by a multi-cloning site sequence aimed to facilitate the insertion of target genes. The expression from P(M) is negatively regulated by the transcriptional repressor MalR, which is released from the DNA operator sequence by growing the cells in maltose-containing media. To get a highly regulated expression of the target gene, MalR was provided in cis by inserting the malR gene under control of the constitutive P(tet) promoter, which in pMV158 directs expression of the tetL gene. To test the functionality of the system, we cloned the reporter gene gfp from Aequorea victoria, encoding the green fluorescent protein (GFP). Pneumococcal cells harboring the recombinant plasmid rendered GFP fluorescence in a maltose-dependent mode with undetectable background levels in the absence of the inducer. The new vector, pLS1ROM, exhibits full structural and segregational stability and constitutes a valuable tool for genetic manipulation and regulated gene expression in S. pneumoniae. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. Protein structure protection commits gene expression patterns.

    PubMed

    Chen, Jianping; Liang, Han; Fernández, Ariel

    2008-01-01

    Gene co-expressions often determine module-defining spatial and temporal concurrences of proteins. Yet, little effort has been devoted to tracing coordinating signals for expression correlations to the three-dimensional structures of gene products. We performed a global structure-based analysis of the yeast and human proteomes and contrasted this information against their respective transcriptome organizations obtained from comprehensive microarray data. We show that protein vulnerability quantifies dosage sensitivity for metabolic adaptation phases and tissue-specific patterns of mRNA expression, determining the extent of co-expression similarity of binding partners. The role of protein intrinsic disorder in transcriptome organization is also delineated by interrelating vulnerability, disorder propensity and co-expression patterns. Extremely vulnerable human proteins are shown to be subject to severe post-transcriptional regulation of their expression through significant micro-RNA targeting, making mRNA levels poor surrogates for protein-expression levels. By contrast, in yeast the expression of extremely under-wrapped proteins is likely regulated through protein aggregation. Thus, the 85 most vulnerable proteins in yeast include the five confirmed prions, while in human, the genes encoding extremely vulnerable proteins are predicted to be targeted by microRNAs. Hence, in both vastly different organisms protein vulnerability emerges as a structure-encoded signal for post-transcriptional regulation. Vulnerability of protein structure and the concurrent need to maintain structural integrity are shown to quantify dosage sensitivity, compelling gene expression patterns across tissue types and temporal adaptation phases in a quantifiable manner. Extremely vulnerable proteins impose additional constraints on gene expression: They are subject to high levels of regulation at the post-transcriptional level.

  20. Time-course investigation of the gene expression profile during Fasciola hepatica infection: A microarray-based study

    PubMed Central

    Rojas-Caraballo, Jose; López-Abán, Julio; Fernández-Soto, Pedro; Vicente, Belén; Collía, Francisco; Muro, Antonio

    2015-01-01

    Fasciolosis is listed as one of the most important neglected tropical diseases according to the World Health Organization and is also considered a reemerging disease in humans. Although several studies describe the immune response induced by Fasciola hepatica in the mammalian host, investigations aimed at identifying the expression profile of genes involved in inducing hepatic injury are currently scarce. Data presented here belong to a time-course investigation of the gene expression profile in the liver of BALB/c mice infected with F. hepatica metacercariae at 7 and 21 days after experimental infection. The data published here have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE69588, previously published by Rojas-Caraballo et al. (2015) in PLoS One [1]. PMID:26697343

  1. Time-course investigation of the gene expression profile during Fasciola hepatica infection: A microarray-based study.

    PubMed

    Rojas-Caraballo, Jose; López-Abán, Julio; Fernández-Soto, Pedro; Vicente, Belén; Collía, Francisco; Muro, Antonio

    2015-12-01

    Fasciolosis is listed as one of the most important neglected tropical diseases according to the World Health Organization and is also considered a reemerging disease in humans. Although several studies describe the immune response induced by Fasciola hepatica in the mammalian host, investigations aimed at identifying the expression profile of genes involved in inducing hepatic injury are currently scarce. Data presented here belong to a time-course investigation of the gene expression profile in the liver of BALB/c mice infected with F. hepatica metacercariae at 7 and 21 days after experimental infection. The data published here have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE69588, previously published by Rojas-Caraballo et al. (2015) in PLoS One [1].

  2. A new high-performance heterologous fungal expression system based on regulatory elements from the Aspergillus terreus terrein gene cluster.

    PubMed

    Gressler, Markus; Hortschansky, Peter; Geib, Elena; Brock, Matthias

    2015-01-01

    Recently, the Aspergillus terreus terrein gene cluster was identified and selected for development of a new heterologous expression system. The cluster encodes the specific transcription factor TerR that is indispensable for terrein cluster induction. To identify TerR binding sites, different recombinant versions of the TerR DNA-binding domain were analyzed for specific motif recognition. The high-affinity consensus motif TCGGHHWYHCGGH was identified from genes required for terrein production, and binding site mutations confirmed their essential contribution to gene expression in A. terreus. A combination of TerR with its terA target promoter was tested as a recombinant expression system in the heterologous host Aspergillus niger. TerR-mediated target promoter activation was directly dependent on its transcription level. Therefore, terR was expressed under control of the regulatable amylase promoter PamyB and the resulting activation of the terA target promoter was compared with activation levels obtained from direct expression of reporters from the strong gpdA control promoter. Here, the coupled system outcompeted the direct expression system. When the coupled system was used for heterologous polyketide synthase expression, high metabolite levels were produced. Additionally, expression of the Aspergillus nidulans polyketide synthase gene orsA revealed lecanoric acid rather than orsellinic acid as the major polyketide synthase product. Domain swapping experiments assigned this depside formation from orsellinic acid to the OrsA thioesterase domain. These experiments confirm the suitability of the expression system, especially for high-level metabolite production in heterologous hosts.
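
    The consensus motif TCGGHHWYHCGGH is written in IUPAC nucleotide codes (H = A, C or T; W = A or T; Y = C or T). As a minimal sketch of how such a motif could be searched for in a sequence, the code below expands the codes into a regular expression and reports forward-strand matches. The helper names and the example fragment are hypothetical and are not taken from the study; a real promoter scan would usually also check the reverse complement.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Expand an IUPAC motif string into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


TERR_MOTIF = "TCGGHHWYHCGGH"  # consensus motif quoted in the abstract

# Made-up sequence fragment for illustration only.
example = "GGATCGGATACCCGGATTT"
hits = find_motif(example, TERR_MOTIF)  # -> [3]
```

    Expanding the motif once into a compiled pattern keeps the scan a single pass over the sequence; replacing any of the fixed CGG half-sites in the example removes the match.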

  3. A new high-performance heterologous fungal expression system based on regulatory elements from the Aspergillus terreus terrein gene cluster

    PubMed Central

    Gressler, Markus; Hortschansky, Peter; Geib, Elena; Brock, Matthias

    2015-01-01

    Recently, the Aspergillus terreus terrein gene cluster was identified and selected for development of a new heterologous expression system. The cluster encodes the specific transcription factor TerR that is indispensable for terrein cluster induction. To identify TerR binding sites, different recombinant versions of the TerR DNA-binding domain were analyzed for specific motif recognition. The high-affinity consensus motif TCGGHHWYHCGGH was identified from genes required for terrein production, and binding site mutations confirmed their essential contribution to gene expression in A. terreus. A combination of TerR with its terA target promoter was tested as a recombinant expression system in the heterologous host Aspergillus niger. TerR-mediated target promoter activation was directly dependent on its transcription level. Therefore, terR was expressed under control of the regulatable amylase promoter PamyB and the resulting activation of the terA target promoter was compared with activation levels obtained from direct expression of reporters from the strong gpdA control promoter. Here, the coupled system outcompeted the direct expression system. When the coupled system was used for heterologous polyketide synthase expression, high metabolite levels were produced. Additionally, expression of the Aspergillus nidulans polyketide synthase gene orsA revealed lecanoric acid rather than orsellinic acid as the major polyketide synthase product. Domain swapping experiments assigned this depside formation from orsellinic acid to the OrsA thioesterase domain. These experiments confirm the suitability of the expression system, especially for high-level metabolite production in heterologous hosts. PMID:25852654

  4. Estrogen-Responsive Transient Expression Assay Using a Brain Aromatase-Based Reporter Gene in Zebrafish (Danio rerio)

    PubMed Central

    Kim, Dong-Jae; Seok, Seung-Hyeok; Baek, Min-Won; Lee, Hui-Young; Na, Yi-Rang; Park, Sung-Hoon; Lee, Hyun-Kyoung; Dutta, Noton Kumar; Kawakami, Koichi; Park, Jae-Hak

    2009-01-01

    Whereas endogenous estrogens play an important role in the development, maintenance, and function of female and male reproductive organs, xenoestrogens present in the environment disrupt normal endocrine function in humans and wildlife. Various in vivo and in vitro assays have been developed to screen these xenoestrogens. However, traditional in vivo assays are laborious and unsuitable for large-scale screening, and in vitro assays do not necessarily replicate in vivo functioning. To overcome these limitations, we developed a transient expression assay in zebrafish, into which a brain aromatase (cyp19a1b)-based estrogen-responsive reporter gene was introduced. In response to 17β-estradiol (10⁻⁶ M) and heptachlor (10⁻⁶ M), zebrafish embryos carrying the reporter construct expressed enhanced green fluorescent protein in the olfactory bulb, telencephalon, preoptic area, and mediobasal hypothalamus. This system will serve to model the in vivo conversion and breakdown of estrogenic compounds and thus provide a rapid preliminary screening method to estimate their estrogenicity. PMID:19887024

  5. Regulation of ABO gene expression.

    PubMed

    Kominato, Yoshihiko; Hata, Yukiko; Matsui, Kazuhiro; Takizawa, Hisao

    2005-07-01

    The ABO blood group system is important in blood transfusions and in identifying individuals during criminal investigations. Two carbohydrate antigens, the A and B antigens, and their antibodies constitute this system. Although biochemical and molecular genetic studies have demonstrated the molecular basis of the histo-blood group ABO system, some aspects remain to be elucidated. To explain the molecular basis of how the ABO genes are controlled in cell type-specific expression, during n