Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes
NASA Astrophysics Data System (ADS)
Ashkani, Jahanshah; Naidoo, Kevin J.
2016-05-01
Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.
Artificial neural network classifier predicts neuroblastoma patients' outcome.
Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi
2016-11-08
More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the tumor aggressive phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop an MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent set of 82 tumors. The NB-hypo classifier predicted the patients' outcome with an accuracy of 87%. NB-hypo classifier prediction resulted in 2% classification error when applied to clinically defined low/intermediate-risk neuroblastoma patients. The prediction was 100% accurate in assessing the death of five low/intermediate-risk patients.
GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.
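The leave-one-out protocol described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: a simple nearest-centroid rule stands in for the MLP, and the function names and expression values are hypothetical.

```python
from statistics import fmean


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean profile is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [fmean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Made-up expression values for two probe sets (stand-in for the 62-probe signature)
profiles = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [3.0, 3.2], [2.9, 3.1], [3.3, 2.8]]
outcomes = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
loo_accuracy = leave_one_out_accuracy(profiles, outcomes)
```

In the study's design the leave-one-out estimate comes from the 100 training tumors only; the 82 external tumors would be scored once with the final model and never enter this loop.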
Reddy, Anupama; Growney, Joseph D; Wilson, Nick S; Emery, Caroline M; Johnson, Jennifer A; Ward, Rebecca; Monaco, Kelli A; Korn, Joshua; Monahan, John E; Stump, Mark D; Mapa, Felipa A; Wilson, Christopher J; Steiger, Janine; Ledell, Jebediah; Rickles, Richard J; Myer, Vic E; Ettenberg, Seth A; Schlegel, Robert; Sellers, William R; Huet, Heather A; Lehár, Joseph
2015-01-01
Death Receptor 5 (DR5) agonists demonstrate anti-tumor activity in preclinical models but have yet to demonstrate robust clinical responses. A key limitation may be the lack of patient selection strategies to identify those most likely to respond to treatment. To overcome this limitation, we screened a DR5 agonist Nanobody across >600 cell lines representing 21 tumor lineages and assessed molecular features associated with response. High expression of DR5 and Casp8 were significantly associated with sensitivity, but their expression thresholds were difficult to translate due to low dynamic ranges. To address the translational challenge of establishing thresholds of gene expression, we developed a classifier based on ratios of genes that predicted response across lineages. The ratio classifier outperformed the DR5+Casp8 classifier, as well as standard approaches for feature selection and classification using genes, instead of ratios. This classifier was independently validated using 11 primary patient-derived pancreatic xenograft models showing perfect predictions as well as a striking linearity between prediction probability and anti-tumor response. A network analysis of the genes in the ratio classifier captured important biological relationships mediating drug response, specifically identifying key positive and negative regulators of DR5 mediated apoptosis, including DR5, CASP8, BID, cFLIP, XIAP and PEA15. Importantly, the ratio classifier shows translatability across gene expression platforms (from Affymetrix microarrays to RNA-seq) and across model systems (in vitro to in vivo). Our approach of using gene expression ratios presents a robust and novel method for constructing translatable biomarkers of compound response, which can also probe the underlying biology of treatment response. PMID:26378449
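The idea of building features from gene ratios rather than single genes can be illustrated with a short sketch. It assumes linear-scale expression values and uses all pairwise log2 ratios; the abstract does not say which ratios or which transform the authors used, so the function name, the values, and the choice of pairs are hypothetical. One plausible reason such features transfer between platforms is that a sample-wide multiplicative scale factor cancels in a ratio, which the example checks.

```python
import math
from itertools import combinations


def log_ratio_features(expression, genes):
    """Return log2(expr[a] / expr[b]) for every unordered gene pair (a, b)."""
    return {
        f"{a}/{b}": math.log2(expression[a] / expression[b])
        for a, b in combinations(genes, 2)
    }


# Made-up linear-scale expression values; gene names are taken from the abstract
genes = ["DR5", "CASP8", "XIAP"]
sample = {"DR5": 8.0, "CASP8": 4.0, "XIAP": 2.0}

# Same sample after a hypothetical platform-wide change of scale
rescaled = {gene: value * 10 for gene, value in sample.items()}

features = log_ratio_features(sample, genes)
features_rescaled = log_ratio_features(rescaled, genes)
```

Single-gene thresholds would shift under the rescaling, while the ratio features stay the same up to floating-point error.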
Computerized system for recognition of autism on the basis of gene expression microarray data.
Latkowski, Tomasz; Osowski, Stanislaw
2015-01-01
The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes which are strictly associated with autism. The paper presents an application of different methods of gene selection, to select the most representative input attributes for an ensemble of classifiers. The set of classifiers is responsible for distinguishing autism data from the reference class. Simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection combined with a genetic algorithm and SVM classifier have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in early recognition of autism on the basis of gene expression microarray analysis.
Novel gene sets improve set-level classification of prokaryotic gene expression data.
Holec, Matěj; Kuželka, Ondřej; Železný, Filip
2015-10-28
Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
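The conversion from gene-level to set-level features can be sketched as below. Averaging the member genes is an assumption made here for illustration; the abstract does not state which aggregation the authors used, and the gene and set names are hypothetical.

```python
from statistics import fmean


def set_level_features(gene_expression, gene_sets):
    """Collapse a gene-level profile into one averaged feature per gene set."""
    features = {}
    for name, members in gene_sets.items():
        # Ignore set members that were not measured in this sample
        values = [gene_expression[g] for g in members if g in gene_expression]
        features[name] = fmean(values)
    return features


# Hypothetical gene-level profile and two hypothetical regulatory gene sets
profile = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 3.0}
gene_sets = {"regulon_A": ["g1", "g2"], "regulon_B": ["g3", "g4"]}

set_profile = set_level_features(profile, gene_sets)
```

Four gene-level features become two set-level features here; a classifier would then be trained on the shorter vectors.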
A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco
2011-01-01
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
Menke, Andreas; Arloth, Janine; Pütz, Benno; Weber, Peter; Klengel, Torsten; Mehta, Divya; Gonik, Mariya; Rex-Haffner, Monika; Rubel, Jennifer; Uhr, Manfred; Lucae, Susanne; Deussing, Jan M; Müller-Myhsok, Bertram; Holsboer, Florian; Binder, Elisabeth B
2012-01-01
Although gene expression profiles in peripheral blood in major depression are not likely to identify genes directly involved in the pathomechanism of affective disorders, they may serve as biomarkers for this disorder. As previous studies using baseline gene expression profiles have provided mixed results, our approach was to use an in vivo dexamethasone challenge test and to compare glucocorticoid receptor (GR)-mediated changes in gene expression between depressed patients and healthy controls. Whole genome gene expression data (baseline and following GR-stimulation with 1.5 mg dexamethasone p.o.) from two independent cohorts were analyzed to identify gene expression patterns that would predict case and control status using a training (N=18 cases/18 controls) and a test cohort (N=11/13). Dexamethasone led to reproducible regulation of 2670 genes in controls and 1151 transcripts in cases. Several genes, including FKBP5 and DUSP1, previously associated with the pathophysiology of major depression, were found to be reliable markers of GR-activation. Using random forest analyses for classification, GR-stimulated gene expression outperformed baseline gene expression as a classifier for case and control status with a correct classification of 79.1% vs 41.6% in the test cohort. GR-stimulated gene expression performed best in dexamethasone non-suppressor patients (88.7% correctly classified with 100% sensitivity), but also correctly classified 77.3% of the suppressor patients (76.7% sensitivity), when using a refined set of 19 genes. Our study suggests that in vivo stimulated gene expression in peripheral blood cells could be a promising molecular marker of altered GR-functioning, an important component of the underlying pathology, in patients suffering from depressive episodes. PMID:22237309
Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona
2013-06-01
To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values≤false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra.
Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona
2014-01-01
Objective: To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. Methods: We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values ≤ false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Results: Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. Conclusions: We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra. PMID:23223423
Hung, Fei-Hung; Chiu, Hung-Wen
2015-01-01
Gene expression profiles differ between diseases. Even diseases at the same stage exhibit different gene expression, as do different subtypes at a single lesion site. Distinguishing disease subtypes at a single lesion site is difficult. Historically, subtypes were first distinguished by doctors, and further differences were later found through pathological experiments. For example, a brain tumor can be classified according to its origin, its cell-type origin, or the tumor site. Because of advances in bioinformatics and in techniques for accumulating gene expression data, researchers can now use gene expression data to classify disease subtypes. However, because the operation of a biopathway is closely related to the disease mechanism, gene expression profiles alone are insufficient for clustering disease subtypes. In this study, we collected gene expression data from healthy samples and four myelodysplastic syndrome subtypes and applied a method that integrates protein-protein interaction and gene expression data to identify different patterns of disease subtypes. We hope the method will prove efficient for classifying disease subtypes in the future.
Comparative analyses identify molecular signature of MRI-classified SVZ-associated glioblastoma
Lin, Chin-Hsing Annie; Rhodes, Christopher T.; Lin, ChenWei; Phillips, Joanna J.; Berger, Mitchel S.
2017-01-01
Glioblastoma (GBM) is a highly aggressive brain cancer with limited therapeutic options. While efforts to identify genes responsible for GBM have revealed mutations and aberrant gene expression associated with distinct types of GBM, patients with GBM are often diagnosed and classified based on MRI features. Therefore, we seek to identify molecular representatives in parallel with MRI classification for group I and group II primary GBM associated with the subventricular zone (SVZ). As group I and II GBM contain a stem-like signature, we compared gene expression profiles between these 2 groups of primary GBM and endogenous neural stem progenitor cells to reveal dysregulation of cell cycle, chromatin status, cellular morphogenesis, and signaling pathways in these 2 types of MRI-classified GBM. In the absence of IDH mutation, several genes associated with metabolism are differentially expressed in these subtypes of primary GBM, implying that metabolic reprogramming occurs in the tumor microenvironment. Furthermore, histone lysine methyltransferase EZH2 was upregulated while histone lysine demethylases KDM2 and KDM4 were downregulated in both group I and II primary GBM. Lastly, we identified 9 common genes across large data sets of gene expression profiles among MRI-classified group I/II GBM, a large cohort of GBM subtypes from TCGA, and glioma stem cells by unsupervised clustering comparison. These commonly upregulated genes have known functions in cell cycle, centromere assembly, chromosome segregation, and mitotic progression. Our findings highlight altered expression of genes important in chromosome integrity across all GBM, suggesting a common mechanism of disrupted fidelity of chromosome structure in GBM. PMID:28278055
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets.
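The top scoring pair criterion mentioned above ranks a gene pair (i, j) by how differently the ordering x_i < x_j occurs in the two classes. The sketch below covers only this scoring step for a two-class problem; the SVM base classifiers and the genetic algorithm are not shown, and the function names and data are hypothetical.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """Return |P(x_i < x_j | class A) - P(x_i < x_j | class B)| for two classes."""
    class_a, class_b = sorted(set(labels))

    def ordering_frequency(cls):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        return sum(s[i] < s[j] for s in rows) / len(rows)

    return abs(ordering_frequency(class_a) - ordering_frequency(class_b))


def top_scoring_pairs(samples, labels, k=1):
    """Rank all gene index pairs by TSP score and keep the best k."""
    n_genes = len(samples[0])
    ranked = sorted(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(samples, labels, *pair),
        reverse=True,
    )
    return ranked[:k]


# Made-up expression values for three genes in six samples
samples = [
    [1, 5, 3], [2, 6, 1], [1, 4, 5],      # tumor
    [5, 1, 3], [6, 2, 7], [4, 1, 0.5],    # normal
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

best_pairs = top_scoring_pairs(samples, labels, k=1)
best_score = tsp_score(samples, labels, 0, 1)
```

In this toy data, gene 0 is below gene 1 in every tumor sample and above it in every normal sample, so the pair (0, 1) receives the maximum score of 1. Each selected pair would then define the 2-D space on which one base classifier is trained.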
Hinchcliff, Monique; Huang, Chiang-Ching; Wood, Tammara A.; Mahoney, J. Matthew; Martyanov, Viktor; Bhattacharyya, Swati; Tamaki, Zenshiro; Lee, Jungwha; Carns, Mary; Podlusky, Sofia; Sirajuddin, Arlene; Shah, Sanjiv J; Chang, Rowland W.; Lafyatis, Robert; Varga, John; Whitfield, Michael L.
2013-01-01
Heterogeneity in systemic sclerosis/SSc confounds clinical trials. We previously identified ‘intrinsic’ gene expression subsets by analysis of SSc skin. Here we test the hypotheses that skin gene expression signatures including intrinsic subset are associated with skin score/MRSS improvement during mycophenolate mofetil (MMF) treatment. Gene expression and intrinsic subset assignment were measured in 12 SSc patients’ biopsies and ten controls at baseline, and from serial biopsies of one cyclophosphamide-treated patient, and nine MMF-treated patients. Gene expression changes during treatment were determined using paired t-tests corrected for multiple hypothesis testing. MRSS improved in four of seven MMF-treated patients classified as the inflammatory intrinsic subset. Three patients without MRSS improvement were classified as normal-like or fibroproliferative intrinsic subsets. 321 genes (FDR <5%) were differentially expressed at baseline between patients with and without MRSS improvement during treatment. Expression of 571 genes (FDR <10%) changed between pre- and post-MMF treatment biopsies for patients demonstrating MRSS improvement. Gene expression changes in skin are only seen in patients with MRSS improvement. Baseline gene expression in skin, including intrinsic subset assignment, may identify SSc patients whose MRSS will improve during MMF treatment, suggesting that gene expression in skin may allow targeted treatment in SSc. PMID:23677167
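The FDR thresholds quoted above imply an adjustment of per-gene p-values for multiple testing. The abstract does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, e.g. from per-gene paired t-tests
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)

# Indices of genes that pass a 5% FDR threshold
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

Only the first gene passes the 5% threshold here, although three raw p-values are below 0.05.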
Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E
2016-03-11
Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a multiclass classification problem. One approach to solving it is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning to a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is applied to each subset. Then, the features selected from each subset are passed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia
2012-01-01
Objectives: To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods: We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results: Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single study-based differential analysis.
Conclusion: Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of "injury" gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
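The random effects step, which integrates one gene's effect sizes across experiments, can be sketched as follows. The abstract does not name the estimator; the DerSimonian-Laird method is assumed here as a common choice, and the function name and numbers are hypothetical.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled effect and between-study variance tau^2.

    Needs at least two studies; with one study the denominator `c` is zero.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2


# Made-up per-experiment effect sizes for one gene, with their sampling variances
pooled_effect, tau2 = random_effects_pool([0.0, 1.0], [0.1, 0.1])

# Perfectly consistent experiments give zero between-study variance
consistent_effect, consistent_tau2 = random_effects_pool([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
```

When tau^2 is zero the weights reduce to the fixed-effect weights; larger heterogeneity makes the weights more equal across experiments.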
A new approach to enhance the performance of decision tree for classifying gene expression data.
Hassan, Md; Kotagiri, Ramamohanarao
2013-12-20
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to such classification problems. However, existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm in which each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well-known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
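How a linear composite of genes can be a better split candidate than any single gene can be shown by comparing AUC values. In this sketch the weights are fixed by hand; the abstract does not describe how the authors choose genes or weights, and the function names and data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite(sample, weights):
    """Linear combination of the gene values in one sample."""
    return sum(w * x for w, x in zip(weights, sample))


# Made-up expression values for two genes; label 1 marks the positive class
samples = [[3, 1], [1, 3], [2, 2.5], [1, 1], [2, 0.5], [0.5, 2]]
labels = [1, 1, 1, 0, 0, 0]

auc_gene0 = auc([s[0] for s in samples], labels)
auc_gene1 = auc([s[1] for s in samples], labels)
auc_composite = auc([composite(s, (1.0, 1.0)) for s in samples], labels)
```

Neither gene separates the classes on its own in this toy data, while their sum does, so a node that splits on the composite feature would be preferred under an AUC criterion.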
Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data
2013-01-01
Background Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. Conclusions A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression. PMID:23369200
Classification of ductal carcinoma in situ by gene expression profiling
Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes BG; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J
2006-01-01
Introduction Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight in the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Methods Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. Results DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Conclusion Using supervised classification, we identified a gene expression classifier of 35 genes, which differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified separating between well- and poorly differentiated DCIS samples. PMID:17069663
Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J
2007-09-21
Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. 
Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
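A weighted-voting combination of several base classifiers, as described above, might look like the following. This is a minimal sketch under stated assumptions: the weights here are arbitrary illustrative numbers (for example, they could come from cross-validated performance), and the abstract does not specify how the authors derive them.

```python
def weighted_vote(scores_by_classifier, weights):
    """Combine per-gene scores from several classifiers into one score per gene.

    scores_by_classifier: one list of scores per classifier, aligned by gene.
    weights: one non-negative weight per classifier (hypothetical values).
    """
    total = sum(weights)
    n_genes = len(scores_by_classifier[0])
    return [
        sum(w * s[i] for w, s in zip(weights, scores_by_classifier)) / total
        for i in range(n_genes)
    ]


# Two hypothetical classifiers scoring two genes for "responds to stress".
combined = weighted_vote([[0.9, 0.2], [0.5, 0.6]], weights=[3.0, 1.0])
```

Because the first classifier carries three times the weight of the second, the combined score for each gene stays close to the first classifier's opinion.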
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, M; Craft, D
Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods.
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.
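The first PICS step, collapsing a gene-level matrix into a pathway-level matrix, can be illustrated with a small example. This is a sketch under a loud assumption: the abstract names a "normal tissue centroid (NTC)" score but does not define it, so the code below simply takes the Euclidean distance between a sample and the mean normal-tissue profile over a pathway's genes. The data layout and the name `pathway_scores` are hypothetical.

```python
from math import sqrt


def pathway_scores(expr, pathways, normal_ids):
    """Collapse gene expression into one score per (sample, pathway).

    expr: {sample_id: {gene: value}}
    pathways: {pathway_name: [gene, ...]}
    normal_ids: sample ids of normal tissue used to build the centroid.
    The distance-to-centroid definition is an assumption, not the PICS formula.
    """
    scores = {}
    for sample, profile in expr.items():
        scores[sample] = {}
        for name, genes in pathways.items():
            # Mean normal-tissue expression of each gene in the pathway.
            centroid = [
                sum(expr[n][g] for n in normal_ids) / len(normal_ids) for g in genes
            ]
            scores[sample][name] = sqrt(
                sum((profile[g] - c) ** 2 for g, c in zip(genes, centroid))
            )
    return scores


expr = {
    "normal1": {"A": 1.0, "B": 1.0},
    "normal2": {"A": 3.0, "B": 1.0},
    "tumour1": {"A": 5.0, "B": 5.0},
}
scores = pathway_scores(expr, {"P": ["A", "B"]}, ["normal1", "normal2"])
```

The resulting sample-by-pathway table is far narrower than the gene matrix, which is what makes the subsequent k-means or hierarchical clustering step tractable.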
Li, Jiangeng; Su, Lei; Pang, Zenan
2015-12-01
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
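The combination of a filter score with redundancy exclusion described above can be sketched with the classical Fisher score as a stand-in. This is not the MFA score or MFA score+ itself, whose graph-embedding definition is not given in the abstract; the correlation threshold `max_corr` and all function names are assumptions made for illustration.

```python
from math import sqrt


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one gene."""
    mean_all = sum(values) / len(values)
    between = within = 0.0
    for c in sorted(set(labels)):
        grp = [v for v, y in zip(values, labels) if y == c]
        mu = sum(grp) / len(grp)
        between += len(grp) * (mu - mean_all) ** 2
        within += sum((v - mu) ** 2 for v in grp)
    return between / within if within > 0 else float("inf")


def pearson(a, b):
    """Pearson correlation of two equally long value lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def select_genes(genes, labels, k, max_corr=0.9):
    """Greedy selection: best-scoring genes first, skipping near-duplicates."""
    ranked = sorted(genes, key=lambda g: fisher_score(genes[g], labels), reverse=True)
    chosen = []
    for g in ranked:
        if all(abs(pearson(genes[g], genes[h])) < max_corr for h in chosen):
            chosen.append(g)
        if len(chosen) == k:
            break
    return chosen


genes = {
    "g1": [1.0, 2.0, 9.0, 10.0],
    "g2": [1.0, 3.0, 9.0, 11.0],  # nearly a copy of g1
    "g3": [5.0, 1.0, 6.0, 4.0],
}
selected = select_genes(genes, [0, 0, 1, 1], k=2)
```

Although `g2` has the second-highest score, it is skipped because it is almost perfectly correlated with `g1`, so the second slot goes to the weaker but non-redundant `g3`.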
Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud
2018-01-01
Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. Similar to differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion.
We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
AUCTSP: an improved biomarker gene pair class predictor.
Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T
2018-06-26
The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects, as opposed to within each individual subject, results in significant performance gains in classification accuracy.
In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
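For orientation, the original TSP score that AUCTSP builds on can be written in a few lines. This is a sketch of the standard within-subject definition only (the difference between classes in how often gene i is expressed below gene j); the AUC-based estimator proposed in the abstract is not reproduced because its exact form is not given there. Function names and the toy data are hypothetical.

```python
from itertools import combinations


def tsp_score(xi, xj, labels):
    """|P(xi < xj | class 1) - P(xi < xj | class 0)| for one gene pair."""
    def frac(c):
        idx = [k for k, y in enumerate(labels) if y == c]
        return sum(1 for k in idx if xi[k] < xj[k]) / len(idx)
    return abs(frac(1) - frac(0))


def top_scoring_pair(expr, labels):
    """Exhaustively search all gene pairs for the highest TSP score."""
    return max(
        combinations(sorted(expr), 2),
        key=lambda pair: tsp_score(expr[pair[0]], expr[pair[1]], labels),
    )


# Toy data: g1 and g2 swap order between classes; g3 is constant.
expr = {
    "g1": [1.0, 2.0, 5.0, 6.0],
    "g2": [3.0, 4.0, 2.0, 1.0],
    "g3": [2.0, 2.0, 2.0, 2.0],
}
labels = [0, 0, 1, 1]
best = top_scoring_pair(expr, labels)
```

Note that the constant gene `g3` still earns a score of 0.5 when paired with either informative gene, which illustrates how a gene with no differential expression of its own (a "pivot" gene) can enter a pair under the within-subject comparison.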
Li, Qiyuan; Eklund, Aron C.; Juul, Nicolai; Haibe-Kains, Benjamin; Workman, Christopher T.; Richardson, Andrea L.; Szallasi, Zoltan; Swanton, Charles
2010-01-01
Background Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold[1], [2]. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance with clinical outcome. Methodology/Principal Findings Firstly, ER status was discriminated by fitting the bimodal expression of ESR1 to a mixed Gaussian model. The discriminative power of ESR1 suggested bimodal expression as an efficient way to stratify breast cancer; therefore we identified a set of genes whose expression was both strongly bimodal, mimicking ESR expression status, and highly expressed in breast epithelial cell lines, to derive a 23-gene ER expression signature-based classifier. We assessed our classifiers in seven published breast cancer cohorts by comparing the gene expression-based ER status to IHC-based ER status as a predictor of clinical outcome in both untreated and tamoxifen treated cohorts. In untreated breast cancer cohorts, the 23 gene signature-based ER status provided significantly improved prognostic power compared to IHC-based ER status (P = 0.006). In tamoxifen-treated cohorts, the 23 gene ER expression signature predicted clinical outcome (HR = 2.20, P = 0.00035). 
These complementary ER signature-based strategies estimated that between 15.1% and 21.8% patients of IHC-based negative ER status would be classified with ER positive breast cancer. Conclusion/Significance Expression-based ER status classification may complement IHC to minimise false negative ER status classification and optimise patient stratification for endocrine therapies. PMID:21152022
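Fitting the bimodal expression of a single gene such as ESR1 to a two-component Gaussian mixture, as described above, can be sketched with a plain EM loop. This is a generic illustration, not the authors' fitting procedure: the initialisation, the fixed iteration count, the variance floor and the toy expression values are all assumptions.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sd):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))


def fit_two_gaussians(xs, iters=100):
    """EM for a 1-D mixture of two Gaussians; returns (means, sds, weights)."""
    xs = sorted(xs)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sd = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sd[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate mean, spread and mixing weight per component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(sqrt(var), 1e-3)  # floor avoids a collapsed component
            weight[k] = nk / len(xs)
    return mu, sd, weight


def er_status(x, mu, sd, weight):
    """Assign a sample to the component with the larger weighted density."""
    hi = 1 if mu[1] > mu[0] else 0
    lo = 1 - hi
    p_hi = weight[hi] * normal_pdf(x, mu[hi], sd[hi])
    p_lo = weight[lo] * normal_pdf(x, mu[lo], sd[lo])
    return "positive" if p_hi > p_lo else "negative"


# Hypothetical log-expression values with a low and a high mode.
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mu, sd, weight = fit_two_gaussians(values)
```

The classification threshold then falls out of the fitted densities rather than being set empirically, which is the contrast with IHC scoring that the abstract draws.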
Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos
2005-01-01
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment search tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
A consensus prognostic gene expression classifier for ER positive breast cancer
Teschendorff, Andrew E; Naderi, Ali; Barbosa-Morais, Nuno L; Pinder, Sarah E; Ellis, Ian O; Aparicio, Sam; Brenton, James D; Caldas, Carlos
2006-01-01
Background A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer. Results Here we perform a combined analysis of three major breast cancer microarray data sets to home in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation. Conclusion The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors. PMID:17076897
A comparison of machine learning techniques for survival prediction in breast cancer
2011-01-01
Background The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one of such signatures, the well established 70-gene signature. Results We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection. Conclusions Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data. PMID:21569330
Transferring genomics to the clinic: distinguishing Burkitt and diffuse large B cell lymphomas.
Sha, Chulin; Barrans, Sharon; Care, Matthew A; Cunningham, David; Tooze, Reuben M; Jack, Andrew; Westhead, David R
2015-01-01
Classifiers based on molecular criteria such as gene expression signatures have been developed to distinguish Burkitt lymphoma and diffuse large B cell lymphoma, which help to explore the intermediate cases where traditional diagnosis is difficult. Transfer of these research classifiers into a clinical setting is challenging because there are competing classifiers in the literature based on different methodology and gene sets with no clear best choice; classifiers based on one expression measurement platform may not transfer effectively to another; and, classifiers developed using fresh frozen samples may not work effectively with the commonly used and more convenient formalin fixed paraffin-embedded samples used in routine diagnosis. Here we thoroughly compared two published high profile classifiers developed on data from different Affymetrix array platforms and fresh-frozen tissue, examining their transferability and concordance. Based on this analysis, a new Burkitt and diffuse large B cell lymphoma classifier (BDC) was developed and employed on Illumina DASL data from our own paraffin-embedded samples, allowing comparison with the diagnosis made in a central haematopathology laboratory and evaluation of clinical relevance. We show that both previous classifiers can be recapitulated using very much smaller gene sets than originally employed, and that the classification result is closely dependent on the Burkitt lymphoma criteria applied in the training set. The BDC classification on our data exhibits high agreement (~95 %) with the original diagnosis. A simple outcome comparison in the patients presenting intermediate features on conventional criteria suggests that the cases classified as Burkitt lymphoma by BDC have worse response to standard diffuse large B cell lymphoma treatment than those classified as diffuse large B cell lymphoma. 
In this study, we comprehensively investigate two previous Burkitt lymphoma molecular classifiers, and implement a new gene expression classifier, BDC, that works effectively on paraffin-embedded samples and provides useful information for treatment decisions. The classifier is available as a free software package under the GNU public licence within the R statistical software environment through the link http://www.bioinformatics.leeds.ac.uk/labpages/softwares/ or on github https://github.com/Sharlene/BDC.
Huang, Jianyan; Zhao, Xiaobo; Weng, Xiaoyu; Wang, Lei; Xie, Weibo
2012-01-01
Background The B-box (BBX)-containing proteins are a class of zinc finger proteins that contain one or two B-box domains and play important roles in plant growth and development. The Arabidopsis BBX gene family has recently been re-identified and renamed. However, there has not been a genome-wide survey of the rice BBX (OsBBX) gene family until now. Methodology/Principal Findings In this study, we identified 30 rice BBX genes through a comprehensive bioinformatics analysis. Each gene was assigned a uniform nomenclature. We described the chromosome localizations, gene structures, protein domains, phylogenetic relationship, whole life-cycle expression profile and diurnal expression patterns of the OsBBX family members. Based on the phylogeny and domain constitution, the OsBBX gene family was classified into five subfamilies. The gene duplication analysis revealed that only chromosomal segmental duplication contributed to the expansion of the OsBBX gene family. The expression profile of the OsBBX genes was analyzed by Affymetrix GeneChip microarrays throughout the entire life-cycle of rice cultivar Zhenshan 97 (ZS97). In addition, microarray analysis was performed to obtain the expression patterns of these genes under light/dark conditions and after three phytohormone treatments. This analysis revealed that the expression patterns of the OsBBX genes could be classified into eight groups. Eight genes were regulated under the light/dark treatments, and eleven genes showed differential expression under at least one phytohormone treatment. Moreover, we verified the diurnal expression of the OsBBX genes using the data obtained from the Diurnal Project and qPCR analysis, and the results indicated that many of these genes had a diurnal expression pattern. Conclusions/Significance The combination of the genome-wide identification and the expression and diurnal analysis of the OsBBX gene family should facilitate additional functional studies of the OsBBX genes. PMID:23118960
Building gene expression profile classifiers with a simple and efficient rejection option in R.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez
2011-01-01
The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which reduces the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since tuning the parameters of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups.
The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
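A rejection option of the kind described above can be as small as a pair of threshold checks on the classifier's class probabilities. The sketch below is written in Python for illustration, although the paper targets R; the two rules and their default thresholds (`min_top`, `min_margin`) are hypothetical and are not the decision rules published by the authors.

```python
def classify_with_reject(probs, min_top=0.6, min_margin=0.2):
    """Return the predicted class, or "reject" when the evidence is weak.

    probs: {class_name: estimated probability} from any multi-class classifier
    (at least two classes). Both thresholds are illustrative assumptions.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (_, p2) = ranked[0], ranked[1]
    # Rule 1: the winning class must be confident in absolute terms.
    # Rule 2: it must also clearly beat the runner-up.
    if p1 < min_top or p1 - p2 < min_margin:
        return "reject"
    return best


confident = classify_with_reject({"ALL": 0.85, "AML": 0.10, "CLL": 0.05})
unsure = classify_with_reject({"ALL": 0.40, "AML": 0.35, "CLL": 0.25})
```

In practice the thresholds would be tuned on held-out data, which is the step the abstract proposes to automate with an evolutionary strategy.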
Andries, Erik; Hagstrom, Thomas; Atlas, Susan R; Willman, Cheryl
2007-02-01
Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
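The filter factor representation mentioned above can be written out in standard notation. This is the textbook form from the regularization literature, given here as an assumption about what the abstract refers to; the symbols X (expression matrix), y (class labels), w (discriminant weights) and the regularization parameter lambda are our own naming, and the abstract does not state which filters the authors actually used.

```latex
% SVD of the expression matrix (samples by genes)
X = U \Sigma V^{T} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T}

% Filtered solution of the linear discrimination problem X w \approx y
w_{\mathrm{filt}} = \sum_{i=1}^{r} f_i \, \frac{u_i^{T} y}{\sigma_i} \, v_i

% Two common choices of filter factors
f_i^{\mathrm{Tikhonov}} = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}},
\qquad
f_i^{\mathrm{TSVD}} =
\begin{cases}
1, & i \le k \\
0, & i > k
\end{cases}
```

With all f_i = 1 the sum reduces to the unregularized minimum-norm solution, whose terms with small singular values amplify noise; the filter factors damp exactly those terms, which is why inspecting them helps when choosing the regularization parameter.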
Role of the Chemokine MCP-1 in Sensitization of PKC-Mediated Apoptosis in Prostate Cancer Cells
2010-02-01
component. As phorbol esters are strong inducers of gene expression, we analyzed changes in gene expression using Affymetrix microarrays. These studies...were carried out at the UPenn Microarray Facility. We studied the dynamics of changes in gene expression by PMA at different times between 0 and 24 h...after PMA treatment. We identified ~ 5,000 PMA- genes up- or down-regulated by PMA (> 2-fold change), identified early and late genes , and classified
Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin
2009-01-01
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
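The idea of aggregating binary classifiers with per-classifier weights can be sketched as a weighted decoding over a code matrix. This is only an illustration under stated assumptions: the weights below are hand-picked rather than tuned from data as the abstract describes, and the function name and score values are hypothetical.

```python
def weighted_decode(code_matrix, weights, binary_outputs):
    """Return the index of the class whose code row best matches the outputs.

    code_matrix[c][j] is +1, -1 or 0: class c is positive, negative or
    unused for binary classifier j. weights[j] scales classifier j, and
    binary_outputs[j] is the signed score produced by classifier j.
    """
    scores = [
        sum(w * m * f for w, m, f in zip(weights, row, binary_outputs))
        for row in code_matrix
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# One-versus-the-rest coding for three classes.
ONE_VS_REST = [[+1, -1, -1], [-1, +1, -1], [-1, -1, +1]]
outputs = [0.2, 0.6, -0.9]

print(weighted_decode(ONE_VS_REST, [1.0, 1.0, 1.0], outputs))  # plain voting
print(weighted_decode(ONE_VS_REST, [1.0, 0.1, 1.0], outputs))  # classifier 1 down-weighted
```

With equal weights this reduces to ordinary one-versus-the-rest voting; changing a single weight can change the winning class, which is why the choice of weights matters.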
Genome-wide identification of the potato WRKY transcription factor family.
Zhang, Chao; Wang, Dongdong; Yang, Chenghui; Kong, Nana; Shi, Zheng; Zhao, Peng; Nan, Yunyou; Nie, Tengkun; Wang, Ruoqiu; Ma, Haoli; Chen, Qin
2017-01-01
WRKY transcription factors play pivotal roles in regulation of stress responses. This study identified 79 WRKY genes in potato (Solanum tuberosum). Based on multiple sequence alignment and phylogenetic relationships, WRKY genes were classified into three major groups. The majority of WRKY genes belonged to Group II (52 StWRKYs), Group III had 14 and Group I consisted of 13. The phylogenetic tree further classified Group II into five sub-groups. All StWRKY genes except StWRKY79 were mapped on potato chromosomes, with eight tandem duplication gene pairs and seven segmental duplication gene pairs found from StWRKY family genes. The expression analysis of 22 StWRKYs showed their differential expression levels under various stress conditions. Cis-element prediction showed that a large number of elements related to drought, heat and salicylic acid were present in the promotor regions of StWRKY genes. The expression analysis indicated that seven StWRKYs seemed to respond to stress (heat, drought and salinity) and salicylic acid treatment. These genes are candidates for abiotic stress signaling for further research. PMID:28727761
Bruno, Rossella; Alì, Greta; Giannini, Riccardo; Proietti, Agnese; Lucchi, Marco; Chella, Antonio; Melfi, Franca; Mussi, Alfredo; Fontanini, Gabriella
2017-01-10
Malignant pleural mesothelioma (MPM) is a rare asbestos related cancer, aggressive and unresponsive to therapies. Histological examination of pleural lesions is the gold standard of MPM diagnosis, although it is sometimes hard to discriminate the epithelioid type of MPM from benign mesothelial hyperplasia (MH). This work aims to define a new molecular tool for the differential diagnosis of MPM, using the expression profile of 117 genes deregulated in this tumour. The gene expression analysis was performed by nanoString System on tumour tissues from 36 epithelioid MPM and 17 MH patients, and on 14 mesothelial pleural samples analysed in a blind way. Data analysis included raw nanoString data normalization, unsupervised cluster analysis by Pearson correlation, non-parametric Mann-Whitney U-test and molecular classification by the Uncorrelated Shrunken Centroid (USC) Algorithm. The Mann-Whitney U-test found 35 genes upregulated and 31 downregulated in MPM. The unsupervised cluster analysis revealed two clusters, one composed only of MPM and one only of MH samples, thus revealing class-specific gene profiles. The Uncorrelated Shrunken Centroid algorithm identified two classifiers, one including 22 genes and the other 40 genes, able to properly classify all the samples as benign or malignant using gene expression data; both classifiers were also able to correctly determine, in a blind analysis, the diagnostic categories of all the 14 unknown samples. In conclusion we delineated a diagnostic tool combining molecular data (gene expression) and computational analysis (USC algorithm), which can be applied in the clinical practice for the differential diagnosis of MPM.
Baker, J B; Dutta, D; Watson, D; Maddala, T; Munneke, B M; Shak, S; Rowinsky, E K; Xu, L-A; Harbison, C T; Clark, E A; Mauro, D J; Khambata-Ford, S
2011-02-01
Although it is accepted that metastatic colorectal cancers (mCRCs) that carry activating mutations in KRAS are unresponsive to anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, a significant fraction of KRAS wild-type (wt) mCRCs are also unresponsive to anti-EGFR therapy. Genes encoding EGFR ligands amphiregulin (AREG) and epiregulin (EREG) are promising gene expression-based markers but have not been incorporated into a test to dichotomise KRAS wt mCRC patients with respect to sensitivity to anti-EGFR treatment. We used RT-PCR to test 110 candidate gene expression markers in primary tumours from 144 KRAS wt mCRC patients who received monotherapy with the anti-EGFR antibody cetuximab. Results were correlated with multiple clinical endpoints: disease control, objective response, and progression-free survival (PFS). Expression of many of the tested candidate genes, including EREG and AREG, strongly associate with all clinical endpoints. Using multivariate analysis with two-layer five-fold cross-validation, we constructed a four-gene predictive classifier. Strikingly, patients below the classifier cutpoint had PFS and disease control rates similar to those of patients with KRAS mutant mCRC. Gene expression appears to identify KRAS wt mCRC patients who receive little benefit from cetuximab. It will be important to test this model in an independent validation study.
Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai
2018-06-20
The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence for patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between the early recurrence (ER) group and the non-early recurrence (non-ER) group of HCC. Least absolute shrinkage and selection operator (LASSO) logistic regression models were used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validate the predictive value of this classifier. Furthermore, a co-expression network based on these lncRNAs and their highly related genes was constructed, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of genes in the network were performed. We identified 10 differentially expressed lncRNAs, including 3 that were upregulated and 7 that were downregulated in the ER group. The lncRNA-based classifier was constructed based on 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084), and its accuracy was 0.83 in the training set, 0.87 in the test set and 0.84 in the total set. ROC curve analysis showed that the AUROC was 0.741 in the training set, 0.824 in the test set and 0.765 in the total set. A functional enrichment analysis suggested that the genes highly related to 4 lncRNAs were involved in the immune system. This 7-lncRNA expression profile can effectively predict early recurrence after surgical resection for HCC.
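For readers unfamiliar with the AUROC figures quoted above, the sketch below computes the area under the ROC curve directly from its rank interpretation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The scores are invented for illustration and have no connection to the study's data.

```python
def auroc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the pairs where a positive sample outscores a negative one,
    with ties counting one half, divided by the number of pairs.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for recurrence (positive) and
# non-recurrence (negative) cases.
print(auroc([0.9, 0.4], [0.5, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is fine for small cohorts; rank-based implementations give the same value faster.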
Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W
2006-03-01
Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
Multiclass cancer diagnosis using tumor gene expression signatures
Ramaswamy, S.; Tamayo, P.; Rifkin, R.; ...
2001-12-11
The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
A 10-Gene Classifier for Indeterminate Thyroid Nodules: Development and Multicenter Accuracy Study
González, Hernán E.; Martínez, José R.; Vargas-Salas, Sergio; Solar, Antonieta; Veliz, Loreto; Cruz, Francisco; Arias, Tatiana; Loyola, Soledad; Horvath, Eleonora; Tala, Hernán; Traipe, Eufrosina; Meneses, Manuel; Marín, Luis; Wohllk, Nelson; Diaz, René E.; Véliz, Jesús; Pineda, Pedro; Arroyo, Patricia; Mena, Natalia; Bracamonte, Milagros; Miranda, Giovanna; Bruce, Elsa
2017-01-01
Background: In most of the world, diagnostic surgery remains the most frequent approach for indeterminate thyroid cytology. Although several molecular tests are available for testing in centralized commercial laboratories in the United States, there are no available kits for local laboratory testing. The aim of this study was to develop a prototype in vitro diagnostic (IVD) gene classifier for the further characterization of nodules with an indeterminate thyroid cytology. Methods: In a first stage, the expression of 18 genes was determined by quantitative polymerase chain reaction (qPCR) in a broad histopathological spectrum of 114 fresh-tissue biopsies. Expression data were used to train several classifiers by supervised machine learning approaches. Classifiers were tested in an independent set of 139 samples. In a second stage, the best classifier was chosen as a model to develop a multiplexed-qPCR IVD prototype assay, which was tested in a prospective multicenter cohort of fine-needle aspiration biopsies. Results: In tissue biopsies, the best classifier, using only 10 genes, reached an optimal and consistent performance in the ninefold cross-validated testing set (sensitivity 93% and specificity 81%). In the multicenter cohort of fine-needle aspiration biopsy samples, the 10-gene signature, built into a multiplexed-qPCR IVD prototype, showed an area under the curve of 0.97, a positive predictive value of 78%, and a negative predictive value of 98%. By Bayes' theorem, the IVD prototype is expected to achieve a positive predictive value of 64–82% and a negative predictive value of 97–99% in patients with a cancer prevalence range of 20–40%. Conclusions: A new multiplexed-qPCR IVD prototype is reported that accurately classifies thyroid nodules and may provide a future solution suitable for local reference laboratory testing. PMID:28521616
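The prevalence-dependent predictive values mentioned in the Results follow from Bayes' theorem, which can be sketched as below. The sensitivity and specificity passed in are hypothetical placeholders, since the abstract does not report those two figures for the fine-needle aspiration cohort, so the output is not expected to reproduce the 64–82% and 97–99% ranges quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics evaluated over a 20-40% prevalence range.
for prevalence in (0.20, 0.30, 0.40):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

PPV rises and NPV falls as prevalence increases, which is why the abstract reports both values as ranges rather than single numbers.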
Metaplastic breast carcinomas display genomic and transcriptomic heterogeneity
Weigelt, Britta; Ng, Charlotte KY; Shen, Ronglai; Popova, Tatiana; Schizas, Michail; Natrajan, Rachael; Mariani, Odette; Stern, Marc-Henri; Norton, Larry; Vincent-Salomon, Anne; Reis-Filho, Jorge S
2015-01-01
Metaplastic breast carcinoma is a rare and aggressive histologic type of breast cancer, preferentially displaying a triple-negative phenotype. We sought to define the transcriptomic heterogeneity of metaplastic breast cancers on the basis of current gene expression microarray-based classifiers, and to determine whether these tumors display gene copy number profiles consistent with those of BRCA1-associated breast cancers. Twenty-eight consecutive triple-negative metaplastic breast carcinomas were reviewed, and the metaplastic component present in each frozen specimen was defined (ie, spindle cell, squamous, chondroid metaplasia). RNA and DNA extracted from frozen sections with tumor cell content >60% were subjected to gene expression (Illumina HumanHT-12 v4) and copy number profiling (Affymetrix SNP 6.0), respectively. Using the best practice PAM50/claudin-low microarray-based classifier, all metaplastic breast carcinomas with spindle cell metaplasia were of claudin-low subtype, whereas those with squamous or chondroid metaplasia were preferentially of basal-like subtype. Triple-negative breast cancer subtyping using a dedicated website (http://cbc.mc.vanderbilt.edu/tnbc/) revealed that all metaplastic breast carcinomas with chondroid metaplasia were of mesenchymal-like subtype, spindle cell carcinomas preferentially of unstable or mesenchymal stem-like subtype, and those with squamous metaplasia were of multiple subtypes. None of the cases was classified as immunomodulatory or luminal androgen receptor subtype. Integrative clustering, combining gene expression and gene copy number data, revealed that metaplastic breast carcinomas with spindle cell and chondroid metaplasia were preferentially classified as of integrative clusters 4 and 9, respectively, whereas those with squamous metaplasia were classified into six different clusters. Eight of the 26 metaplastic breast cancers subjected to SNP6 analysis were classified as BRCA1-like. 
The diversity of histologic features of metaplastic breast carcinomas is reflected at the transcriptomic level, and an association between molecular subtypes and histology was observed. BRCA1-like genomic profiles were found only in a subset (31%) of metaplastic breast cancers, and were not associated with a specific molecular or histologic subtype. PMID:25412848
Sarin, Hemant
2017-03-01
To study the conserved basis for gene expression in comparative cell types at opposite ends of the cell pressuromodulation spectrum, the lymphatic endothelial cell and the blood microvascular capillary endothelial cell. The mechanism for gene expression is studied in terms of the 5' -> 3' direction paired point tropy quotients ( prpT Q s) and the final 5' -> 3' direction episodic sub-episode block sums split-integrated weighted average-averaged gene overexpression tropy quotient ( esebssiwaagoT Q ). The final 5' -> 3' esebssiwaagoT Q classifies a lymphatic endothelial cell overexpressed gene as a supra-pressuromodulated gene ( esebssiwaagoT Q ≥ 0.25 < 0.75) every time and classifies a blood microvascular capillary endothelial cell overexpressed gene every time as an infra-pressuromodulated gene ( esebssiwaagoT Q < 0.25) (100% sensitivity; 100% specificity). Horizontal alignment of 5' -> 3' intergene distance segment tropy wrt the gene is the basis for DNA transcription in the pressuromodulated state.
Kinetics of transcription of infectious laryngotracheitis virus genes.
Mahmoudian, Alireza; Markham, Philip F; Noormohammadi, Amir H; Browning, Glenn F
2012-03-01
The kinetics of expression of only a few genes of infectious laryngotracheitis virus (ILTV) have been determined, using northern blot analysis. We used quantitative reverse transcriptase PCR to examine the kinetics of expression of 74 ILTV genes in LMH cells. ICP4 was the only gene fully expressed in the presence of cycloheximide, and thus classified as immediate-early. The genes most highly expressed early in infection, and thus classified as early, included UL1 (gL), UL2, UL3, UL4, UL5, UL6, UL7, UL8, UL13, UL14, UL19, UL20, UL23 (TK), UL25, UL28, UL29, UL31, UL33, UL34, UL38, UL39, UL40, UL42, UL43, UL44 (gC), UL47, UL48 (α-TIF), UL49, UL54 (ICP27), US3 and US10. ORF A, ORF B, ORF C, ORF E, sORF 4/3, UL[-1], UL0, UL3.5, UL9, UL10 (gM), UL11, UL15a, UL15b, UL18, UL22 (gH), UL24, UL26, UL30, UL32, UL36, UL45, UL49.5 (gN), UL52, US2, US4 (gG), US5 (gJ) and US9 were most highly expressed late in infection and were thus considered late genes. Several genes, including ORF D, UL12, UL17, UL21, UL27 (gB), UL35, UL37, UL41, UL46, UL50, UL51, UL53 (gK), US8 (gE), US6 (gD) and US7 (gI), had features of both early and late genes and were classified as early/late. Our findings suggest that transcription from most ILTV genes is leaky or subject to more complex patterns of regulation than those classically described for herpesviruses. This is the first study examining global expression of ILTV genes, and the data provide a basis for future investigations of the pathogenesis of infection with ILTV.
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
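The performance measures listed in this abstract all derive from a confusion matrix, as the sketch below shows. An accuracy of 94.38% on 35 + 143 = 178 samples corresponds to 168 correct calls (168/178 ≈ 0.9438), but the abstract does not say how the 10 errors split between the two classes, so the split used here (5 false negatives, 5 false positives) is purely hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# 35 metastatic and 143 non-metastatic samples; the error split is assumed.
metrics = confusion_metrics(tp=30, fn=5, tn=138, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With classes this imbalanced, accuracy alone says little about the minority class, which is why sensitivity and the predictive values are usually reported alongside it.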
Oberthuer, André; Berthold, Frank; Warnat, Patrick; Hero, Barbara; Kahlert, Yvonne; Spitz, Rüdiger; Ernestus, Karen; König, Rainer; Haas, Stefan; Eils, Roland; Schwab, Manfred; Brors, Benedikt; Westermann, Frank; Fischer, Matthias
2006-11-01
To develop a gene expression-based classifier for neuroblastoma patients that reliably predicts courses of the disease. Two hundred fifty-one neuroblastoma specimens were analyzed using a customized oligonucleotide microarray comprising 10,163 probes for transcripts with differential expression in clinical subgroups of the disease. Subsequently, the prediction analysis for microarrays (PAM) was applied to a first set of patients with maximally divergent clinical courses (n = 77). The classification accuracy was estimated by a complete 10-times-repeated 10-fold cross validation, and a 144-gene predictor was constructed from this set. This classifier's predictive power was evaluated in an independent second set (n = 174) by comparing results of the gene expression-based classification with those of risk stratification systems of current trials from Germany, Japan, and the United States. The first set of patients was accurately predicted by PAM (cross-validated accuracy, 99%). Within the second set, the PAM classifier significantly separated cohorts with distinct courses (3-year event-free survival [EFS] 0.86 +/- 0.03 [favorable; n = 115] v 0.52 +/- 0.07 [unfavorable; n = 59] and 3-year overall survival 0.99 +/- 0.01 v 0.84 +/- 0.05; both P < .0001) and separated risk groups of current neuroblastoma trials into subgroups with divergent outcome (NB2004: low-risk 3-year EFS 0.86 +/- 0.04 v 0.25 +/- 0.15, P < .0001; intermediate-risk 1.00 v 0.57 +/- 0.19, P = .018; high-risk 0.81 +/- 0.10 v 0.56 +/- 0.08, P = .06). In a multivariate Cox regression model, the PAM predictor classified patients of the second set more accurately than risk stratification of current trials from Germany, Japan, and the United States (P < .001; hazard ratio, 4.756 [95% CI, 2.544 to 8.893]). Integration of gene expression-based class prediction of neuroblastoma patients may improve risk estimation of current neuroblastoma trials.
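The 10-times-repeated 10-fold cross-validation scheme can be sketched as a generator of train/test index splits. This is a generic illustration, not the study's pipeline: the function name and seed are assumptions, and the classifier itself (PAM) is not implemented. In a complete cross-validation, any gene selection would be redone inside each training fold rather than once on all samples.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(k):
            test = order[fold::k]  # every k-th sample goes to this fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


# 77 samples, as in the first patient set described above.
splits = list(repeated_kfold(77))
print(len(splits))
```

Each sample is held out exactly once per repeat, so averaging accuracy over all splits smooths out the variance caused by any single random partition.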
Chang, Yu-Chun; Ding, Yan; Dong, Lingsheng; Zhu, Lang-Jing; Jensen, Roderick V.
2018-01-01
Background: Using DNA microarrays, we previously identified 451 genes expressed in 19 different human tissues. Although ubiquitously expressed, the variable expression patterns of these “housekeeping genes” (HKGs) could separate one normal human tissue type from another. Current focus on identifying “specific disease markers” is problematic as single gene expression in a given sample represents the specific cellular states of the sample at the time of collection. In this study, we examine the diagnostic and prognostic potential of the variable expressions of HKGs in lung cancers. Methods: Microarray and RNA-seq data for normal lungs, lung adenocarcinomas (AD), squamous cell carcinomas of the lung (SQCLC), and small cell carcinomas of the lung (SCLC) were collected from online databases. Using 374 of 451 HKGs, differentially expressed genes between pairs of sample types were determined via two-sided, homoscedastic t-test. Principal component analysis and hierarchical clustering classified normal lung and lung cancer subtypes according to relative gene expression variations. We used uni- and multivariate Cox regressions to identify significant predictors of overall survival in AD patients. Classifying genes were selected using a set of training samples and then validated using an independent test set. Gene Ontology was examined by PANTHER. Results: This study showed that the differential expression patterns of 242, 245, and 99 HKGs were able to distinguish normal lung from AD, SCLC, and SQCLC, respectively. From these, 70 HKGs were common across the three lung cancer subtypes. These HKGs have low expression variation compared to current lung cancer markers (e.g., EGFR, KRAS) and were involved in the most common biological processes (e.g., metabolism, stress response). In addition, the expression pattern of 106 HKGs alone was a significant classifier of AD versus SQCLC.
We further highlighted that a panel of 13 HKGs was an independent predictor of overall survival and cumulative risk in AD patients. Discussion: Here we report that HKG expression patterns may be an effective tool for evaluation of lung cancer states. For example, the differential expression pattern of 70 HKGs alone can separate normal lung tissue from various lung cancers, while a panel of 106 HKGs was a capable class predictor of subtypes of non-small cell carcinomas. We also reported that HKGs have significantly lower variance compared to traditional cancer markers across samples, highlighting the robustness of a panel of genes over any one specific biomarker. Using RNA-seq data, we showed that the expression pattern of 13 HKGs is a significant, independent predictor of overall survival for AD patients. This reinforces the predictive power of a HKG panel across different gene expression measurement platforms. Thus, we propose the expression patterns of HKGs alone may be sufficient for the diagnosis and prognosis of individuals with lung cancer. PMID:29761043
GEsture: an online hand-drawing tool for gene expression pattern search.
Wang, Chunyan; Xu, Yiqing; Wang, Xuelin; Zhang, Li; Wei, Suyun; Ye, Qiaolin; Zhu, Youxiang; Yin, Hengfu; Nainwal, Manoj; Tanon-Reyes, Luis; Cheng, Feng; Yin, Tongming; Ye, Ning
2018-01-01
Gene expression profiling data provide useful information for the investigation of biological function and process. However, identifying a specific expression pattern from extensive time series gene expression data is not an easy task. Clustering, a popular method, is often used to group genes with similar expression; however, genes with a 'desirable' or 'user-defined' pattern cannot be efficiently detected by clustering methods. To address these limitations, we developed an online tool called GEsture. Users can draw or graph a curve using a mouse instead of inputting abstract parameters of clustering methods. Taking a gene expression curve as input, GEsture searches time series datasets for genes showing similar, opposite and time-delay expression patterns. We presented three examples that illustrate the capacity of GEsture in gene hunting while following users' requirements. GEsture also provides visualization tools (such as expression pattern figure, heat map and correlation network) to display the search results. The result outputs may provide useful information for researchers to understand the targets, function and biological processes of the involved genes.
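One simple way to score similar, opposite and time-delayed patterns against a drawn curve is Pearson correlation, optionally with a lag. The abstract does not state which similarity measure GEsture uses, so the measure, the function names and the example profiles below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_pearson(query, profile, lag):
    """Correlate query[t] with profile[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(query, profile)
    return pearson(query[:-lag], profile[lag:])


query = [0, 1, 2, 3, 2, 1]      # curve drawn by the user
similar = [1, 3, 5, 7, 5, 3]    # same shape, different scale
opposite = [3, 2, 1, 0, 1, 2]   # mirror image of the query
delayed = [9, 0, 1, 2, 3, 2]    # query shape arriving one time point later

print(pearson(query, similar))
print(pearson(query, opposite))
print(lagged_pearson(query, delayed, lag=1))
```

A correlation near +1 flags a similar pattern, near -1 an opposite one, and a high correlation at a positive lag flags a time-delayed one.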
By Stuart G. Baker. The program requires Mathematica 7.01.0. The key function is Classify[datalist, options], where datalist = {data, genename, dataname}; data = {matrix for class 0, matrix for class 1}, where each matrix is gene expression by specimen; genename is a list of names of genes; and dataname = {name of data set, name of class0, name of class1}.
Liu, Yanqiu; Lu, Huijuan; Yan, Ke; Xia, Haixia; An, Chunlin
2016-01-01
Embedding cost-sensitive factors into the classifiers increases the classification stability and reduces the classification costs for classifying high-scale, redundant, and imbalanced datasets, such as the gene expression data. In this study, we extend our previous work, that is, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm as the cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed rejection cost into the CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the rejection cost embedded CS-D-ELM algorithm effectively reduces the average and overall cost of the classification process, while the classification accuracy still remains competitive. The proposed method can be extended to classification problems of other redundant and imbalanced data.
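The decision stage of a cost-sensitive classifier with a rejection cost can be sketched as a minimum expected cost rule: predict the class with the lowest expected misclassification cost, and reject when even that cost exceeds the cost of rejecting. This is a generic rule under assumed inputs, not the CS-D-ELM algorithm itself; the extreme learning machine that would produce the class probabilities is not shown, and the cost values are hypothetical.

```python
def min_cost_decision(posteriors, cost, reject_cost):
    """Return the class index with minimum expected cost, or None to reject.

    posteriors[j]: estimated probability that the true class is j.
    cost[i][j]: cost of predicting class i when the true class is j.
    reject_cost: fixed cost of declining to classify the sample.
    """
    expected = [sum(c * p for c, p in zip(row, posteriors)) for row in cost]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best if expected[best] <= reject_cost else None


# Hypothetical costs: predicting class 0 when the truth is class 1 costs 5.
COST = [[0, 5],
        [1, 0]]

print(min_cost_decision([0.7, 0.3], COST, reject_cost=0.8))
print(min_cost_decision([0.7, 0.3], COST, reject_cost=0.5))
```

In the first call the asymmetric costs push the decision to class 1 even though class 0 is more probable; in the second, the cheaper rejection option wins and the sample is left unclassified.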
Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell
2013-01-01
Background: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637
Lung tumor diagnosis and subtype discovery by gene expression profiling.
Wang, Lu-yong; Tu, Zhuowen
2006-01-01
The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, this becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes on a molecular basis that differ remarkably in their response to therapies. It is critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, applied to gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance while also being capable of detecting disease subtypes based on gene expression profiles.
Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.
Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V
2017-09-30
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence. PMID:28973974
Li, Ye Long; Dai, Xin Ren; Yue, Xun; Gao, Xin-Qi; Zhang, Xian Sheng
2014-10-01
In maize, 1,491 small secreted peptides were identified and classified according to the characteristics of their peptide sequences. The expression of a subset of SSP genes in reproductive tissues was determined by qRT-PCR. Small secreted peptides (SSPs) are important cell-cell communication messengers in plants. Most information on plant SSPs comes from Arabidopsis thaliana and Oryza sativa, while little is known about the SSPs of other grass species such as maize (Zea mays). In this study, we identified 1,491 SSP genes from maize genomic sequences. These putative SSP genes were distributed throughout the ten maize chromosomes. Among them, 611 SSPs were classified into 198 superfamilies according to their conserved domains, and 725 SSPs with four or more cysteines at their C-termini shared similar cysteine arrangements with their counterparts in other plant species. Moreover, the SSPs requiring post-translational modification, as well as defensin-like (DEFL) proteins, were identified. Further, the expression levels of 110 SSP genes were analyzed in reproductive tissues, including male flower, pollen, silk, and ovary. Most of the genes encoding basal-layer antifungal peptide-like, small coat protein-like, thioredoxin-like, γ-thionin-like, and DEFL proteins showed high expression levels in the ovary and male flower compared with their levels in silk and mature pollen. The rapid alkalinization factor-like genes were highly expressed only in the mature ovary and mature pollen, and pollen Ole e 1-like genes showed low expression in silk. The results of this study provide basic information for further analysis of SSP functions in the reproductive process of maize.
Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara
2017-01-01
Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material. PMID:28574441
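The relationship between the accuracy, sensitivity, and specificity figures quoted above can be made concrete with a short sketch. The confusion-matrix counts below are hypothetical and are not the study's data (the abstract does not report the counts, and its validation set contained 71 samples); they were chosen only so that the three metrics come out at the same percentages. Treating FTC as the positive class is also an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp / fn : positive cases classified correctly / incorrectly
    tn / fp : negative cases classified correctly / incorrectly
    """
    sensitivity = tp / (tp + fn)          # fraction of positives detected
    specificity = tn / (tn + fp)          # fraction of negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the study): 25 positive and 25 negative cases.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=19, fn=6, tn=20, fp=5)
print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f}")
```

When the two classes are the same size, as in this made-up example, accuracy is simply the mean of sensitivity and specificity; with unbalanced classes it is weighted toward the larger class.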
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify a special type of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate the insignificant, low-significance, and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. 
Here, each of the other rule mining methods or rule-based classifiers also starts from the same post-discretized data matrix. Finally, we have also included an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level. PMID:25830807
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insight into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at both the gene and alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis, and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data is more informative for biological classification tasks than gene-level expression data. Our large-scale assessment uses multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods. 
The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Ushijima, Masaru; Mashima, Tetsuo; Tomida, Akihiro; Dan, Shingo; Saito, Sakae; Furuno, Aki; Tsukahara, Satomi; Seimiya, Hiroyuki; Yamori, Takao; Matsuura, Masaaki
2013-03-01
Genome-wide transcriptional expression analysis is a powerful strategy for characterizing the biological activity of anticancer compounds. It is often instructive to identify gene sets involved in the activity of a given drug compound for comparison with different compounds. Currently, however, there is no comprehensive gene expression database and related application system that is (i) specialized in anticancer agents, (ii) easy to use, and (iii) open to the public. To develop a public gene expression database of antitumor agents, we first examined gene expression profiles in human cancer cells after exposure to 35 compounds including 25 clinically used anticancer agents. Gene signatures were extracted and classified as upregulated or downregulated after exposure to the drug. Hierarchical clustering showed that drugs with similar mechanisms of action, such as genotoxic drugs, were clustered. Connectivity map analysis further revealed that our gene signature data reflected modes of action of the respective agents. Together with the database, we developed analysis programs that calculate scores for ranking changes in gene expression and for searching statistically significant pathways from the Kyoto Encyclopedia of Genes and Genomes database in order to analyze the datasets more easily. Our database and the analysis programs are available online at our website (http://scads.jfcr.or.jp/db/cs/). Using these systems, we successfully showed that proteasome inhibitors are selectively classified as endoplasmic reticulum stress inducers and induce atypical endoplasmic reticulum stress. Thus, our public access database and related analysis programs constitute a set of efficient tools to evaluate the mode of action of novel compounds and identify promising anticancer lead compounds. © 2012 Japanese Cancer Association.
Release of (and lessons learned from mining) a pioneering large toxicogenomics database.
Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R
2015-07-01
We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds were also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, to classify probes into different confidence classes. High confidence unambiguously annotated probes were used to create gene-level data which served as starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R-package containing the gene-level data and show how it can be used for expression-based similarity searches. Comparing the same biological samples run on the Affymetrix and the Codelink platform, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated in toxicogenomics pipelines of users.
NASA Astrophysics Data System (ADS)
Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia
2002-06-01
We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples following exposure for 24 hours with phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes clustered specific gene expression profiles that separated samples based on exposure to a particular class of compound.
voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.
Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet
2017-01-01
RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
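The sparsity of nearest-shrunken-centroid classifiers comes from soft thresholding: each gene's standardized class-versus-overall difference is shrunk toward zero, and genes whose differences vanish for every class drop out of the classifier. The sketch below illustrates only that shrinkage step. It does not implement voomNSC, the voom precision weights, or the weighted statistics proposed in the abstract; the gene names, scores, and threshold are hypothetical.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within +/- delta become zero."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


def surviving_genes(scores, delta):
    """Keep genes with at least one nonzero shrunken score.

    scores : {gene: [standardized difference for each class]}
    delta  : shrinkage threshold (larger values give sparser classifiers)
    """
    kept = {}
    for gene, per_class in scores.items():
        shrunken = [soft_threshold(d, delta) for d in per_class]
        if any(s != 0.0 for s in shrunken):
            kept[gene] = shrunken
    return kept


# Hypothetical standardized differences for three genes and two classes.
scores = {
    "geneA": [2.5, -2.5],
    "geneB": [0.4, -0.4],
    "geneC": [-1.5, 1.5],
}
kept = surviving_genes(scores, delta=1.0)
print(sorted(kept))
```

In this made-up example the weakly discriminating gene is removed, while the other two are retained with reduced scores. In practice the threshold is usually chosen by cross-validation.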
Li, Changning; Nong, Qian; Solanki, Manoj Kumar; Liang, Qiang; Xie, Jinlan; Liu, Xiaoyan; Li, Yijie; Wang, Weizan; Yang, Litao; Li, Yangrui
2016-01-01
Water stress causes considerable yield losses in sugarcane. To investigate differentially expressed genes under water stress, a pot experiment was performed with the sugarcane variety GT21 at three water-deficit levels (mild, moderate, and severe) during the elongation stage and gene expression was analyzed using microarray technology. Physiological parameters of sugarcane showed significant alterations in response to drought stress. Based on the expression profile of 15,593 sugarcane genes, 1,501 (9.6%) genes were differentially expressed under different water-level treatments; 821 genes were upregulated and 680 genes were downregulated. A gene similarity analysis showed that approximately 62.6% of the differentially expressed genes shared homology with functional proteins. In a Gene Ontology (GO) analysis, 901 differentially expressed genes were assigned to 36 GO categories. Moreover, 325 differentially expressed genes were classified into 101 pathway categories involved in various processes, such as the biosynthesis of secondary metabolites, ribosomes, carbon metabolism, etc. In addition, some unannotated genes were detected; these may provide a basis for studies of water-deficit tolerance. The reliability of the observed expression patterns was confirmed by RT-PCR. The results of this study may help identify useful genes for improving drought tolerance in sugarcane. PMID:27170459
Genome-wide survey and expression analysis of F-box genes in chickpea.
Gupta, Shefali; Garg, Vanika; Kant, Chandra; Bhatia, Sabhyata
2015-02-13
The F-box genes constitute one of the largest gene families in plants involved in degradation of cellular proteins. F-box proteins can recognize a wide array of substrates and regulate many important biological processes such as embryogenesis, floral development, plant growth and development, biotic and abiotic stress, hormonal responses and senescence, among others. However, little is known about the F-box genes in the important legume crop, chickpea. The available draft genome sequence of chickpea allowed us to conduct a genome-wide survey of the F-box gene family in chickpea. A total of 285 F-box genes were identified in chickpea which were classified based on their C-terminal domain structures into 10 subfamilies. Thirteen putative novel motifs were also identified in F-box proteins with no known functional domain at their C-termini. The F-box genes were physically mapped on the 8 chickpea chromosomes and duplication events were investigated which revealed that the F-box gene family expanded largely due to tandem duplications. Phylogenetic analysis classified the chickpea F-box genes into 9 clusters. Also, maximum syntenic relationship was observed with soybean followed by Medicago truncatula, Lotus japonicus and Arabidopsis. Digital expression analysis of F-box genes in various chickpea tissues as well as under abiotic stress conditions utilizing the available chickpea transcriptome data revealed differential expression patterns with several F-box genes specifically expressing in each tissue, few of which were validated by using quantitative real-time PCR. The genome-wide analysis of chickpea F-box genes provides new opportunities for characterization of candidate F-box genes and elucidation of their function in growth, development and stress responses for utilization in chickpea improvement.
A Bronchial Genomic Classifier for the Diagnostic Evaluation of Lung Cancer.
Silvestri, Gerard A; Vachani, Anil; Whitney, Duncan; Elashoff, Michael; Porta Smith, Kate; Ferguson, J Scott; Parsons, Ed; Mitra, Nandita; Brody, Jerome; Lenburg, Marc E; Spira, Avrum
2015-07-16
Bronchoscopy is frequently nondiagnostic in patients with pulmonary lesions suspected to be lung cancer. This often results in additional invasive testing, although many lesions are benign. We sought to validate a bronchial-airway gene-expression classifier that could improve the diagnostic performance of bronchoscopy. Current or former smokers undergoing bronchoscopy for suspected lung cancer were enrolled at 28 centers in two multicenter prospective studies (AEGIS-1 and AEGIS-2). A gene-expression classifier was measured in epithelial cells collected from the normal-appearing mainstem bronchus to assess the probability of lung cancer. A total of 639 patients in AEGIS-1 (298 patients) and AEGIS-2 (341 patients) met the criteria for inclusion. A total of 43% of bronchoscopic examinations were nondiagnostic for lung cancer, and invasive procedures were performed after bronchoscopy in 35% of patients with benign lesions. In AEGIS-1, the classifier had an area under the receiver-operating-characteristic curve (AUC) of 0.78 (95% confidence interval [CI], 0.73 to 0.83), a sensitivity of 88% (95% CI, 83 to 92), and a specificity of 47% (95% CI, 37 to 58). In AEGIS-2, the classifier had an AUC of 0.74 (95% CI, 0.68 to 0.80), a sensitivity of 89% (95% CI, 84 to 92), and a specificity of 47% (95% CI, 36 to 59). The combination of the classifier plus bronchoscopy had a sensitivity of 96% (95% CI, 93 to 98) in AEGIS-1 and 98% (95% CI, 96 to 99) in AEGIS-2, independent of lesion size and location. In 101 patients with an intermediate pretest probability of cancer, the negative predictive value of the classifier was 91% (95% CI, 75 to 98) among patients with a nondiagnostic bronchoscopic examination. The gene-expression classifier improved the diagnostic performance of bronchoscopy for the detection of lung cancer. 
In intermediate-risk patients with a nondiagnostic bronchoscopic examination, a negative classifier score provides support for a more conservative diagnostic approach. (Funded by Allegro Diagnostics and others; AEGIS-1 and AEGIS-2 ClinicalTrials.gov numbers, NCT01309087 and NCT00746759.).
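Negative predictive value depends on the pretest probability of disease as well as on sensitivity and specificity, which is why the abstract reports it for a specific intermediate-risk group. The sketch below applies Bayes' rule to the AEGIS-1 sensitivity (88%) and specificity (47%) quoted above, over a range of hypothetical prevalence values. It is not a reproduction of the study's 91% figure, which was measured in a particular patient subgroup; the function name and the prevalence grid are assumptions.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(no disease | negative test), by Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)    # disease-free, test negative
    false_neg = (1.0 - sensitivity) * prevalence   # diseased, test negative
    return true_neg / (true_neg + false_neg)


# Sensitivity and specificity as reported for AEGIS-1; prevalences are hypothetical.
for prevalence in (0.1, 0.3, 0.5, 0.7):
    npv = negative_predictive_value(0.88, 0.47, prevalence)
    print(f"prevalence={prevalence:.1f}  NPV={npv:.3f}")
```

The loop shows NPV falling as prevalence rises, so a negative result is most reassuring when the pretest probability of cancer is low to intermediate.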
NASA Astrophysics Data System (ADS)
Lu, Xinguo; Chen, Dan
2017-08-01
Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits progress in designing appropriate classifiers. In this paper, we address a transductive learning method that combines a filtering strategy within the transductive framework with a progressive labeling strategy. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and it can effectively solve the problem of evaluating the proportion of positive and negative samples in the work set. Our experimental results demonstrate that the proposed technique has great potential for cancer prediction based on gene expression.
The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in
Shahdoust, Maryam; Hajizadeh, Ebrahim; Mozdarani, Hossein; Chehrei, Ali
2013-01-01
Cigarette smoking is the major risk factor for development of lung cancer. Identifying the effects of tobacco on airway gene expression may provide insight into the causes. This research aimed to compare gene expression of large airway epithelium cells in normal smokers (n=13) and non-smokers (n=9) in order to find genes that discriminate between the two groups and to assess the effects of cigarette smoking on large airway epithelium cells. Genes discriminating smokers from non-smokers were identified by applying a neural network clustering method, growing self-organizing maps (GSOM), to microarray data according to class discrimination scores. An index was computed based on the difference between the mean expression of each gene in the two groups. This clustering approach made it possible to compare thousands of genes simultaneously. The approach compared the means of 7,129 genes in smokers and non-smokers simultaneously and classified the genes of large airway epithelium cells that were differentially expressed in smokers compared with non-smokers. Seven genes were identified with the largest expression differences between smokers and non-smokers: NQO1, H19, ALDH3A1, AKR1C1, ABHD2, GPX2 and ADH7. Most (NQO1, ALDH3A1, AKR1C1, H19 and GPX2) are known to be clinically notable in lung cancer studies. Furthermore, statistical discriminant analysis showed that these genes could classify samples as smokers or non-smokers with 100% accuracy. In the resulting GSOM map, other nodes with high average discrimination scores included genes with alterations strongly related to lung cancer, such as AKR1C3, CYP1B1, UCHL1 and AKR1B10. Comparing the expression of thousands of genes at the same time through this clustering revealed alterations in normal smokers. Most of the identified genes were strongly relevant to lung cancer in the existing literature. The genes may be utilized to identify smokers with increased risk for lung cancer. 
A large sample study is now recommended to determine relations between the genes ABHD2 and ADH7 and smoking.
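The gene-ranking idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact index, so the formula used here (absolute difference of group means divided by the summed standard deviations) is an assumption, the gene names are hypothetical, and the expression values are invented; the GSOM clustering step itself is not reproduced.

```python
from statistics import mean, stdev

def discrimination_score(smokers, non_smokers):
    """Assumed index: |difference of means| scaled by the summed standard deviations."""
    spread = stdev(smokers) + stdev(non_smokers)
    return abs(mean(smokers) - mean(non_smokers)) / spread

def rank_genes(expression):
    """expression maps gene -> (smoker values, non-smoker values); highest score first."""
    scores = {g: discrimination_score(s, n) for g, (s, n) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented values for two hypothetical genes.
toy = {
    "GENE_A": ([9.0, 10.0, 11.0], [1.0, 2.0, 3.0]),
    "GENE_B": ([5.0, 6.0, 7.0], [5.0, 6.5, 7.0]),
}
print(rank_genes(toy))
```

Genes whose group means differ strongly relative to their spread rise to the top of the ranking, which is the property a class discrimination score is meant to capture.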
Molecular classification of gastric cancer: a new paradigm.
Shah, Manish A; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y; Klimstra, David S; Gerdes, Hans; Kelsen, David P
2011-05-01
Gastric cancer may be subdivided into 3 distinct subtypes--proximal, diffuse, and distal gastric cancer--based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (National Cancer Institute, NCI #5917) underwent endoscopic biopsy for fresh tumor procurement. Four to 6 targeted biopsies of the primary tumor were obtained. Macrodissection was carried out to ensure more than 80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the 3 gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross-validation error was 0.14, suggesting that more than 85% of samples were classified correctly. Gene set analysis with the false discovery rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Subtypes of gastric cancer that have epidemiologic and histologic distinctions are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype. ©2011 AACR.
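As a rough guide to the leave-one-out figure above: if the error of 0.14 was computed over all 36 samples, it corresponds to about 5 misclassified samples (0.14 × 36 ≈ 5). The sketch below shows how such an error can be computed. The nearest-centroid classifier is an assumption (the abstract does not name the supervised method), and the two-gene profiles are invented.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train, x):
    """train is a list of (features, label); return the label of the closest class centroid."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(by_label, key=lambda lab: sq_dist(centroid(by_label[lab]), x))

def loocv_error(samples):
    """Hold out each sample once, train on the rest, return the misclassified fraction."""
    errors = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, features) != label:
            errors += 1
    return errors / len(samples)

# Invented two-gene profiles for the three subtypes.
toy = [
    ([1.0, 1.0], "proximal"), ([1.2, 0.9], "proximal"),
    ([5.0, 5.0], "diffuse"), ([5.1, 4.8], "diffuse"),
    ([9.0, 1.0], "distal"), ([8.8, 1.2], "distal"),
]
print(loocv_error(toy))
```

Because each sample is predicted by a model that never saw it, the resulting fraction is a less optimistic estimate than accuracy measured on the training data.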
Molecular Classification of Gastric Cancer: A new paradigm
Shah, Manish A.; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y.; Klimstra, David S.; Gerdes, Hans; Kelsen, David P.
2011-01-01
Purpose: Gastric cancer may be subdivided into three distinct subtypes (proximal, diffuse, and distal gastric cancer) based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Experimental Design: Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (NCI 5917) underwent endoscopic biopsy for fresh tumor procurement. 4–6 targeted biopsies of the primary tumor were obtained. Macrodissection was performed to ensure >80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Results: Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the three gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross validation error was 0.14, suggesting that >85% of samples were classified correctly. Gene set analysis with the False Discovery Rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Conclusions: Subtypes of gastric cancer that have epidemiologic and histologic distinction are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype. PMID:21430069
2002-01-01
their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested
Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar
2016-04-01
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. These datasets often contain a large number of expression values, but only a fraction of them correspond to genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
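The general shape of a MapReduce-style K-nearest-neighbor classifier can be sketched in plain Python. This is not the authors' mrKNN or a Hadoop job: the function names, the two-partition layout, and the data are all hypothetical, and the map and reduce steps are ordinary function calls standing in for distributed tasks.

```python
import heapq
from collections import Counter

def map_partition(partition, query, k):
    """Map step: the k nearest (distance, label) pairs found within one data partition."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(features, query)) ** 0.5, label)
        for features, label in partition
    ]
    return heapq.nsmallest(k, dists)

def reduce_neighbors(local_results, k):
    """Reduce step: merge local candidates, keep the global k nearest, majority-vote."""
    merged = heapq.nsmallest(k, (pair for local in local_results for pair in local))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]

def mr_knn(partitions, query, k=3):
    """Run the map step on every partition, then a single reduce step."""
    return reduce_neighbors([map_partition(p, query, k) for p in partitions], k)

# Invented two-feature samples split across two partitions.
partitions = [
    [([0.0, 0.0], "normal"), ([0.1, 0.2], "normal")],
    [([5.0, 5.0], "tumor"), ([5.2, 4.9], "tumor"), ([0.2, 0.1], "normal")],
]
print(mr_knn(partitions, [0.1, 0.1]))
```

Each partition only needs to return its own k best candidates, so the reduce step merges a small number of pairs regardless of how large the partitions are.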
2012-01-01
Background: Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results: Multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions: The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic.
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515
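Fisher's discriminant ratio, mentioned above as a measure of data complexity, is commonly defined per feature as the squared difference of the two class means divided by the sum of the class variances. The sketch below uses that common form; whether the study used exactly this variant is an assumption, and the numbers are invented.

```python
from statistics import mean, variance

def fisher_ratio(class_a, class_b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one feature; undefined if both variances are 0."""
    return (mean(class_a) - mean(class_b)) ** 2 / (variance(class_a) + variance(class_b))

def max_fisher_ratio(rows_a, rows_b):
    """Largest per-feature ratio across all features; higher means easier to separate."""
    return max(
        fisher_ratio(fa, fb) for fa, fb in zip(zip(*rows_a), zip(*rows_b))
    )

# Invented data: feature 0 separates the classes, feature 1 does not.
rows_a = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
rows_b = [[7.0, 10.0], [8.0, 12.0], [9.0, 11.0]]
print(max_fisher_ratio(rows_a, rows_b))
```

A low maximum ratio indicates heavily overlapping classes, which is the kind of condition for which the abstract suggests a larger sample size may be needed.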
Yu, Shunying; Yuan, Chengmei; Hong, Wu; Wang, Zuowei; Cui, Jian; Shi, Tieliu; Fang, Yiru
2012-01-01
Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression and leads to significant psychosocial functional impairment, as does major depressive disorder (MDD). Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure and studies on SSD are limited. The present study compared leukocyte expression profiles and performed classification using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were utilized for training and testing on candidate signature expression profiles from the signature selection step. First, we identified 63 differentially expressed SSD signatures in contrast to control (P ≤ 5.0E-4) and 30 differentially expressed MDD signatures in contrast to control. Then, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Second, in order to prioritize biomarkers for SSD and MDD together, we selected top gene signatures from each set of pair-wise comparison results and merged the signatures to generate better profiles for clearly classifying SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pair-wise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and blood cell–derived RNA of these 48 gene models may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls. PMID:22348066
Pentheroudakis, George; Kotoula, Vassiliki; Eleftheraki, Anastasia G; Tsolaki, Eleftheria; Wirtz, Ralph M; Kalogeras, Konstantine T; Batistatou, Anna; Bobos, Mattheos; Dimopoulos, Meletios A; Timotheadou, Eleni; Gogas, Helen; Christodoulou, Christos; Papadopoulou, Kyriaki; Efstratiou, Ioannis; Scopa, Chrisoula D; Papaspyrou, Irene; Vlachodimitropoulos, Dimitrios; Linardou, Helena; Samantas, Epaminontas; Pectasides, Dimitrios; Pavlidis, Nicholas; Fountzilas, George
2013-01-01
Discrepant data have been published on the incidence and prognostic significance of ESR1 gene amplification in early breast cancer. Formalin-fixed paraffin-embedded tumor blocks were collected from women with early breast cancer participating in two HeCOG adjuvant trials. Messenger RNA was studied by quantitative PCR, ER protein expression was centrally assessed using immunohistochemistry (IHC) and ESR1 gene copy number by dual fluorescent in situ hybridization probes. In a total of 1010 women with resected node-positive early breast adenocarcinoma, the tumoral ESR1/CEP6 gene ratio was suggestive of deletion in 159 (15.7%), gene gain in 551 (54.6%) and amplification in 42 cases (4.2%), with only 30 tumors (3%) harboring five or more ESR1 copies. Gene copy number ratio showed a significant, though weak correlation to mRNA and protein expression (Spearman's Rho <0.23, p = 0.01). ESR1 clusters were observed in 9.5% (57 gain, 38 amplification) of cases. In contrast to mRNA and protein expression, which were favorable prognosticators, gene copy number changes did not obtain prognostic significance. When ESR1/CEP6 gene ratio was combined with function (as defined by ER protein and mRNA expression) in a molecular classifier, the Gene Functional profile, it was functional status that impacted on prognosis. In univariate analysis, patients with functional tumors (positive ER protein expression and gene ratio normal or gain/amplification) fared better than those with non-functional tumors with ESR1 gain (HR for relapse or death 0.49-0.64, p = 0.003). Significant interactions were observed between gene gain/amplification and paclitaxel therapy (trend for DFS benefit from paclitaxel only in patients with ESR1 gain/amplification, p = 0.066) and Gene Functional profile with HER2 amplification (Gene Functional profile prognostic only in HER2-normal cases, p = 0.029). 
ESR1 gene deletion and amplification do not per se constitute prognostic markers; instead, they can be classified into distinct prognostic groups according to their protein-mediated functional status.
A novel algorithm for simplification of complex gene classifiers in cancer
Wilson, Raphael A.; Teng, Ling; Bachmeyer, Karen M.; Bissonnette, Mei Lin Z.; Husain, Aliya N.; Parham, David M.; Triche, Timothy J.; Wing, Michele R.; Gastier-Foster, Julie M.; Barr, Frederic G.; Hawkins, Douglas S.; Anderson, James R.; Skapek, Stephen X.; Volchenboum, Samuel L.
2013-01-01
The clinical application of complex molecular classifiers as diagnostic or prognostic tools has been limited by the time and cost needed to apply them to patients. Using an existing fifty-gene expression signature known to separate two molecular subtypes of the pediatric cancer rhabdomyosarcoma, we show that an exhaustive iterative search algorithm can distill this complex classifier down to two or three features with equal discrimination. We validated the two-gene signatures using three separate and distinct data sets, including one that uses degraded RNA extracted from formalin-fixed, paraffin-embedded material. Finally, to demonstrate the generalizability of our algorithm, we applied it to a lung cancer data set to find minimal gene signatures that can distinguish survival. Our approach can easily be generalized and coupled to existing technical platforms to facilitate the discovery of simplified signatures that are ready for routine clinical use. PMID:23913937
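The idea of distilling a large signature down to a few genes by exhaustive search can be sketched as follows. This is not the authors' algorithm: scoring each subset by the resubstitution accuracy of a nearest-centroid rule is an assumption, and the gene names, subtype labels, and values are invented.

```python
from itertools import combinations

def accuracy(samples, genes):
    """Resubstitution accuracy of a nearest-centroid rule restricted to `genes`."""
    labels = sorted({lab for _, lab in samples})
    cent = {}
    for lab in labels:
        rows = [prof for prof, l in samples if l == lab]
        cent[lab] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}

    def predict(prof):
        return min(labels, key=lambda lab: sum((prof[g] - cent[lab][g]) ** 2 for g in genes))

    return sum(predict(prof) == lab for prof, lab in samples) / len(samples)

def best_subsets(samples, all_genes, size=2):
    """Score every gene subset of the given size; return the top scorers and their score."""
    scored = {combo: accuracy(samples, combo) for combo in combinations(all_genes, size)}
    top = max(scored.values())
    return sorted(c for c, s in scored.items() if s == top), top

# Invented profiles: G1 and G2 separate the subtypes only when used together.
samples = [
    ({"G1": 1.0, "G2": 0.0, "G3": 4.0}, "A"),
    ({"G1": 3.0, "G2": 2.0, "G3": 6.0}, "A"),
    ({"G1": 0.0, "G2": 1.0, "G3": 6.0}, "B"),
    ({"G1": 2.0, "G2": 3.0, "G3": 4.0}, "B"),
]
print(best_subsets(samples, ["G1", "G2", "G3"], size=2))
```

The number of subsets grows combinatorially with signature size, which is why starting from an existing fifty-gene signature, rather than the whole genome, keeps an exhaustive search over pairs and triples tractable.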
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity.
Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Ando, Tatsuya; Suguro, Miyuki; Kobayashi, Takeshi; Seto, Masao; Honda, Hiroyuki
2003-10-01
A fuzzy neural network (FNN) using gene expression profile data can select combinations of genes from thousands of genes, and is applicable to predict outcome for cancer patients after chemotherapy. However, wide clinical heterogeneity reduces the accuracy of prediction. To overcome this problem, we have proposed an FNN system based on majoritarian decision using multiple noninferior models. We used transcriptional profiling data, which were obtained from "Lymphochip" DNA microarrays (http://llmpp.nih.gov/DLBCL), reported by Rosenwald (N Engl J Med 2002; 346: 1937-47). When the data were analyzed by our FNN system, accuracy (73.4%) of outcome prediction using only 1 FNN model with 4 genes was higher than that (68.5%) of the Cox model using 17 genes. Higher accuracy (91%) was obtained when an FNN system with 9 noninferior models, consisting of 35 independent genes, was used. The genes selected by the system included genes that are informative in the prognosis of Diffuse large B-cell lymphoma (DLBCL), such as genes showing an expression pattern similar to that of CD10 and BCL-6 or similar to that of IRF-4 and BCL-4. We classified 220 DLBCL patients into 5 groups using the prediction results of 9 FNN models. These groups may correspond to DLBCL subtypes. In group A containing half of the 220 patients, patients with poor outcome were found to satisfy 2 rules, i.e., high expression of MAX dimerization with high expression of unknown A (LC_26146), or high expression of MAX dimerization with low expression of unknown B (LC_33144). The present paper is the first to describe the multiple noninferior FNN modeling system. This system is a powerful tool for predicting outcome and classifying patients, and is applicable to other heterogeneous diseases.
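The majority decision over multiple models described above can be sketched in a few lines. The three threshold rules below are hypothetical stand-ins for the noninferior FNN models; the gene names and cut-offs are invented and carry no biological meaning.

```python
from collections import Counter

def majority_vote(models, sample):
    """Ask every model for a prediction and return the most frequent answer."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical stand-in models; an odd number avoids ties between two outcomes.
models = [
    lambda s: "poor" if s["gene_a"] > 1.0 else "good",
    lambda s: "poor" if s["gene_b"] < 0.5 else "good",
    lambda s: "poor" if s["gene_a"] > 1.0 and s["gene_c"] > 2.0 else "good",
]
print(majority_vote(models, {"gene_a": 1.5, "gene_b": 0.2, "gene_c": 1.0}))
```

Combining several models in this way lets patients who are misjudged by one rule still be classified correctly when the other rules agree.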
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
Wang, Zichen; Monteiro, Caroline D.; Jagodnik, Kathleen M.; Fernandez, Nicolas F.; Gundersen, Gregory W.; Rouillard, Andrew D.; Jenkins, Sherry L.; Feldmann, Axel S.; Hu, Kevin S.; McDermott, Michael G.; Duan, Qiaonan; Clark, Neil R.; Jones, Matthew R.; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R.; Szeto, Gregory L.; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M.; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M.; Kruth, Candice D.; Bongio, Nicholas J.; Mathur, Vaibhav; Todoric, Radmila D; Rubin, Udi E.; Malatras, Apostolos; Fulp, Carl T.; Galindo, John A.; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C.; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H.; Allison, Lindsey R.; Gamallo, Pablo; de Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma'ayan, Avi
2016-01-01
Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization. PMID:27667448
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd.
Wang, Zichen; Monteiro, Caroline D; Jagodnik, Kathleen M; Fernandez, Nicolas F; Gundersen, Gregory W; Rouillard, Andrew D; Jenkins, Sherry L; Feldmann, Axel S; Hu, Kevin S; McDermott, Michael G; Duan, Qiaonan; Clark, Neil R; Jones, Matthew R; Kou, Yan; Goff, Troy; Woodland, Holly; Amaral, Fabio M R; Szeto, Gregory L; Fuchs, Oliver; Schüssler-Fiorenza Rose, Sophia M; Sharma, Shvetank; Schwartz, Uwe; Bausela, Xabier Bengoetxea; Szymkiewicz, Maciej; Maroulis, Vasileios; Salykin, Anton; Barra, Carolina M; Kruth, Candice D; Bongio, Nicholas J; Mathur, Vaibhav; Todoric, Radmila D; Rubin, Udi E; Malatras, Apostolos; Fulp, Carl T; Galindo, John A; Motiejunaite, Ruta; Jüschke, Christoph; Dishuck, Philip C; Lahl, Katharina; Jafari, Mohieddin; Aibar, Sara; Zaravinos, Apostolos; Steenhuizen, Linda H; Allison, Lindsey R; Gamallo, Pablo; de Andres Segura, Fernando; Dae Devlin, Tyler; Pérez-García, Vicente; Ma'ayan, Avi
2016-09-26
Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
Gene Expression in Parp1 Deficient Mice Exposed to a Median Lethal Dose of Gamma Rays.
Kumar, M A Suresh; Laiakis, Evagelia C; Ghandhi, Shanaz A; Morton, Shad R; Fornace, Albert J; Amundson, Sally A
2018-05-10
There is a current interest in the development of biodosimetric methods for rapidly assessing radiation exposure in the wake of a large-scale radiological event. This work was initially focused on determining the exposure dose to an individual using biological indicators. Gene expression signatures show promise for biodosimetric application, but little is known about how these signatures might translate for the assessment of radiological injury in radiosensitive individuals, who comprise a significant proportion of the general population, and who would likely require treatment after exposure to lower doses. Using Parp1-/- mice as a model radiation-sensitive genotype, we have investigated the effect of this DNA repair deficiency on the gene expression response to radiation. Although Parp1 is known to play general roles in regulating transcription, the pattern of gene expression changes observed in Parp1-/- mice 24 h after irradiation with an LD50/30 was remarkably similar to that in wild-type mice after exposure to an LD50/30. Similar levels of activation of both the p53 and NFκB radiation response pathways were indicated in both strains. In contrast, exposure of wild-type mice to a sublethal dose equal to the Parp1-/- LD50/30 resulted in a lower-magnitude gene expression response. Thus, Parp1-/- mice displayed a heightened gene expression response to radiation, which was more similar to the wild-type response to an equitoxic dose than to an equal absorbed dose. Gene expression classifiers trained on the wild-type data correctly identified all wild-type samples as unexposed, exposed to a sublethal dose or exposed to an LD50/30. All unexposed samples from Parp1-/- mice were also correctly classified with the same gene set, and 80% of irradiated Parp1-/- samples were identified as exposed to an LD50/30.
The results of this study suggest that, at least for some pathways that may influence radiosensitivity in humans, specific gene expression signatures have the potential to accurately detect the extent of radiological injury, rather than serving only as a surrogate of physical radiation dose.
Diotel, Nicolas; Rodriguez Viales, Rebecca; Armant, Olivier; März, Martin; Ferg, Marco; Rastegar, Sepand; Strähle, Uwe
2015-01-01
The zebrafish has become a model to study adult vertebrate neurogenesis. In particular, the adult telencephalon has been an intensely studied structure in the zebrafish brain. Differential expression of transcriptional regulators (TRs) is a key feature of development and tissue homeostasis. Here we report an expression map of 1,202 TR genes in the telencephalon of adult zebrafish. Our results are summarized in a database with search and clustering functions to identify genes expressed in particular regions of the telencephalon. We classified 562 genes into 13 distinct patterns, including genes expressed in the proliferative zone. The remaining 640 genes displayed unique and complex patterns of expression and could thus not be grouped into distinct classes. The neurogenic ventricular regions express overlapping but distinct sets of TR genes, suggesting regional differences in the neurogenic niches in the telencephalon. In summary, the small telencephalon of the zebrafish shows a remarkable complexity in TR gene expression. The adult zebrafish telencephalon has become a model to study neurogenesis. We established the expression pattern of more than 1200 transcription regulators (TR) in the adult telencephalon. The neurogenic regions express overlapping but distinct sets of TR genes suggesting regional differences in the neurogenic potential. J. Comp. Neurol. 523:1202–1221, 2015. © 2015 Wiley Periodicals, Inc. PMID:25556858
Diotel, Nicolas; Rodriguez Viales, Rebecca; Armant, Olivier; März, Martin; Ferg, Marco; Rastegar, Sepand; Strähle, Uwe
2015-06-01
The zebrafish has become a model to study adult vertebrate neurogenesis. In particular, the adult telencephalon has been an intensely studied structure in the zebrafish brain. Differential expression of transcriptional regulators (TRs) is a key feature of development and tissue homeostasis. Here we report an expression map of 1,202 TR genes in the telencephalon of adult zebrafish. Our results are summarized in a database with search and clustering functions to identify genes expressed in particular regions of the telencephalon. We classified 562 genes into 13 distinct patterns, including genes expressed in the proliferative zone. The remaining 640 genes displayed unique and complex patterns of expression and could thus not be grouped into distinct classes. The neurogenic ventricular regions express overlapping but distinct sets of TR genes, suggesting regional differences in the neurogenic niches in the telencephalon. In summary, the small telencephalon of the zebrafish shows a remarkable complexity in TR gene expression. The adult zebrafish telencephalon has become a model to study neurogenesis. We established the expression pattern of more than 1200 transcription regulators (TR) in the adult telencephalon. The neurogenic regions express overlapping but distinct sets of TR genes suggesting regional differences in the neurogenic potential. © 2015 Wiley Periodicals, Inc.
Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G
2014-12-05
Applying machine learning methods to microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of each gene was treated independently, suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user.
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
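The composite feature described above, pathway activity as a weighted linear sum of the expression of a pathway's genes, can be illustrated with a short sketch. In the paper the weights come from an optimisation model; here they are fixed by hand, which is an assumption made purely for illustration, and the pathway and gene names are hypothetical.

```python
def pathway_activity(sample, weights):
    """Weighted linear sum of the expression of a pathway's constituent genes."""
    return sum(w * sample[gene] for gene, w in weights.items())

def activity_profile(sample, pathways):
    """Map each pathway name to its activity, giving a low-dimensional feature vector."""
    return {name: pathway_activity(sample, w) for name, w in pathways.items()}

# Invented expression values and hand-picked weights (not optimised).
sample = {"G1": 2.0, "G2": 4.0, "G3": 1.0}
pathways = {
    "pathway_x": {"G1": 0.5, "G2": 0.5},
    "pathway_y": {"G3": 1.0, "G2": -0.25},
}
print(activity_profile(sample, pathways))
```

A classifier then works on one number per pathway instead of one number per gene, and genes given a weight of zero simply drop out of the pathway's activity.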
An Ensemble Framework Coping with Instability in the Gene Selection Process.
Castellanos-Garzón, José A; Ramos, Juan; López-Sánchez, Daniel; de Paz, Juan F; Corchado, Juan M
2018-03-01
This paper proposes an ensemble framework for gene selection, which is aimed at addressing instability problems presented in the gene filtering task. The complex process of gene selection from gene expression data faces different instability problems from the informative gene subsets found by different filter methods. This makes the identification of significant genes by the experts difficult. The instability of results can come from filter methods, gene classifier methods, different datasets of the same disease and multiple valid groups of biomarkers. Even though there is a wide number of proposals, the complexity imposed by this problem remains a challenge today. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. This framework performs a process of stable feature selection, facing the problems above and, thus, providing a more suitable and reliable solution for clinical and research purposes. Our proposal involves a process of multistage gene filtering, in which several ensemble strategies for gene selection were added in such a way that different classifiers simultaneously assess gene subsets to face instability. Firstly, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability according to filter methods). Next, we apply an ensemble of known classifiers to filter genes relevant to all classifiers at a time (stability according to classification methods). The achieved results were evaluated in two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability according to the disease, for which promising results were achieved.
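Two of the ensemble ideas above can be sketched with set operations: pooling the genes proposed by several filter methods, then keeping only those that every classifier finds relevant. This is a simplified reading of the abstract, not the five-stage framework itself; the function name and gene names are hypothetical.

```python
def ensemble_select(filter_outputs, classifier_relevant):
    """Union the genes proposed by each filter, then keep those all classifiers agree on."""
    candidates = set().union(*filter_outputs)
    agreed = set.intersection(*classifier_relevant) if classifier_relevant else set()
    return sorted(candidates & agreed)

# Invented outputs of three filter methods and two classifiers.
filter_outputs = [{"G1", "G2"}, {"G2", "G3"}, {"G4"}]
classifier_relevant = [{"G1", "G2", "G4"}, {"G2", "G4", "G5"}]
print(ensemble_select(filter_outputs, classifier_relevant))
```

The union step favors diversity across filter methods, while the intersection step keeps only genes whose relevance does not depend on a single classifier.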
Gaffoor, Iffa; Brown, Daren W.; Plattner, Ron; Proctor, Robert H.; Qi, Weihong; Trail, Frances
2005-01-01
Polyketides are a class of secondary metabolites that exhibit a vast diversity of form and function. In fungi, these compounds are produced by large, multidomain enzymes classified as type I polyketide synthases (PKSs). In this study we identified and functionally disrupted 15 PKS genes from the genome of the filamentous fungus Gibberella zeae. Five of these genes are responsible for producing the mycotoxins zearalenone, aurofusarin, and fusarin C and the black perithecial pigment. A comprehensive expression analysis of the 15 genes revealed diverse expression patterns during grain colonization, plant colonization, sexual development, and mycelial growth. Expression of one of the PKS genes was not detected under any of 18 conditions tested. This is the first study to genetically characterize a complete set of PKS genes from a single organism. PMID:16278459
Kurian, S. M.; Williams, A. N.; Gelbart, T.; Campbell, D.; Mondala, T. S.; Head, S. R.; Horvath, S.; Gaber, L.; Thompson, R.; Whisenant, T.; Lin, W.; Langfelder, P.; Robison, E. H.; Schaffer, R. L.; Fisher, J. S.; Friedewald, J.; Flechner, S. M.; Chan, L. K.; Wiseman, A. C.; Shidban, H.; Mendez, R.; Heilman, R.; Abecassis, M. M.; Marsh, C. L.; Salomon, D. R.
2015-01-01
There are no minimally invasive diagnostic metrics for acute kidney transplant rejection (AR), especially in the setting of the common confounding diagnosis, acute dysfunction with no rejection (ADNR). Thus, though kidney transplant biopsies remain the gold standard, they are invasive, have substantial risks, sampling error issues and significant costs and are not suitable for serial monitoring. Global gene expression profiles of 148 peripheral blood samples from transplant patients with excellent function and normal histology (TX; n = 46), AR (n = 63) and ADNR (n = 39), from two independent cohorts were analyzed with DNA microarrays. We applied a new normalization tool, frozen robust multi-array analysis, particularly suitable for clinical diagnostics, multiple prediction tools to discover, refine and validate robust molecular classifiers and we tested a novel one-by-one analysis strategy to model the real clinical application of this test. Multiple three-way classifier tools identified 200 highest value probesets with sensitivity, specificity, positive predictive value, negative predictive value and area under the curve for the validation cohort ranging from 82% to 100%, 76% to 95%, 76% to 95%, 79% to 100%, 84% to 100% and 0.817 to 0.968, respectively. We conclude that peripheral blood gene expression profiling can be used as a minimally invasive tool to accurately reveal TX, AR and ADNR in the setting of acute kidney transplant dysfunction. PMID:24725967
Clustering change patterns using Fourier transformation with time-course gene expression data.
Kim, Jaehee
2011-01-01
To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
NASA Astrophysics Data System (ADS)
Prayuni, Kinasih; Dwivany, Fenny M.
2015-09-01
Banana is classified as a climacteric fruit, whose ripening is regulated by ethylene. Ethylene is synthesized from ACC (1-aminocyclopropane-1-carboxylic acid) by the ACC oxidase enzyme, which is encoded by the ACO gene. Controlling the expression of an important gene in the ethylene biosynthesis pathway has become a target for delaying the ripening process. Therefore, in a previous study we designed a MaACO-RNAi construct to control MaACO gene expression. In this research, we study the effectiveness of different transient transformation methods to deliver the construct. Direct injection, with or without vacuum infiltration, was used to deliver the MaACO-RNAi construct. All of the methods successfully delivered the construct into banana fruits based on the RT-PCR result.
2013-01-01
Background: Differential diagnosis between malignant follicular thyroid cancer (FTC) and benign follicular thyroid adenoma (FTA) is a great challenge for even an experienced pathologist and requires special effort. Molecular markers may potentially support a differential diagnosis between FTC and FTA in postoperative specimens. The purpose of this study was to derive molecular support for differential post-operative diagnosis, in the form of a simple multigene mRNA-based classifier that would differentiate between FTC and FTA tissue samples. Methods: A molecular classifier was created based on a combined analysis of two microarray datasets (using 66 thyroid samples). The performance of the classifier was assessed using an independent dataset comprising 71 formalin-fixed paraffin-embedded (FFPE) samples (31 FTC and 40 FTA), which were analysed by quantitative real-time PCR (qPCR). In addition, three other microarray datasets (62 samples) were used to confirm the utility of the classifier. Results: Five of 8 genes selected from training datasets (ELMO1, EMCN, ITIH5, KCNAB1, SLCO2A1) were amplified by qPCR in FFPE material from an independent sample set. Three other genes did not amplify in FFPE material, probably due to low abundance. All 5 analysed genes were downregulated in FTC compared to FTA. The sensitivity and specificity of the 5-gene classifier tested on the FFPE dataset were 71% and 72%, respectively. Conclusions: The proposed approach could support histopathological examination: the 5-gene classifier may aid in molecular discrimination between FTC and FTA in FFPE material. PMID:24099521
NASA Astrophysics Data System (ADS)
Toledo-Suárez, Carlos D.
A way of increasing the cardinality of an alphabet used to write rules in a learning classifier system is proposed, extending the idea of relational schemata. Theoretical justifications are given for the possible reduction in the number of rules needed to solve problems that such extended alphabets (st-alphabets) imply. It is shown that when expressed as bipolar neural networks, the matching process of rules over st-alphabets strongly resembles a gene expression mechanism applied to a system over {0,1,#}. In spite of the apparent drawbacks the explicit use of such relational alphabets would imply, their successful implementation in an information gain based classifier system (IGCS) is presented.
Chao, Nan; Liu, Shu-Xin; Liu, Bing-Mei; Li, Ning; Jiang, Xiang-Ning; Gai, Ying
2014-11-01
Nine CAD/CAD-like genes in P. tomentosa were classified into four classes based on expression patterns, phylogenetic analysis and biochemical properties with modification for the previous claim of SAD. Cinnamyl alcohol dehydrogenase (CAD) functions in monolignol biosynthesis and plays a critical role in wood development and defense. In this study, we isolated and cloned nine CAD/CAD-like genes in the Populus tomentosa genome. We investigated differential expression using microarray chips and found that PtoCAD1 was highly expressed in bud, root and vascular tissues (xylem and phloem) with the greatest expression in the root. Differential expression in tissues was demonstrated for PtoCAD3, PtoCAD6 and PtoCAD9. Biochemical analysis of purified PtoCADs in vitro indicated PtoCAD1, PtoCAD2 and PtoCAD8 had detectable activity against both coniferaldehyde and sinapaldehyde. PtoCAD1 used both substrates with high efficiency. PtoCAD2 showed no specific requirement for sinapaldehyde in spite of its high identity with so-called PtrSAD (sinapyl alcohol dehydrogenase). In addition, the enzymatic activity of PtoCAD1 and PtoCAD2 was affected by temperature. We classified these nine CAD/CAD-like genes into four classes: class I included PtoCAD1, which was a bona fide CAD with the highest activity; class II included PtoCAD2, -5, -7, -8, which might function in monolignol biosynthesis and defense; class III genes included PtoCAD3, -6, -9, which have a distinct expression pattern; class IV included PtoCAD12, which has a distinct structure. These data suggest divergence of the PtoCADs and their homologs, related to their functions. We propose genes in class II are a subset of CAD genes that evolved before angiosperms appeared. These results suggest CAD/CAD-like genes in classes I and II play a role in monolignol biosynthesis and contribute to our knowledge of lignin biosynthesis in P. tomentosa.
Construction of diagnosis system and gene regulatory networks based on microarray analysis.
Hong, Chun-Fu; Chen, Ying-Chen; Chen, Wei-Chun; Tu, Keng-Chang; Tsai, Meng-Hsiun; Chan, Yung-Kuan; Yu, Shyr Shen
2018-05-01
A microarray analysis generally contains expression data of thousands of genes, but most of them are irrelevant to the disease of interest, which complicates the analysis of genes related to a specific disease. Therefore, filtering out a few essential genes as well as their regulatory networks is critical, and a disease can be easily diagnosed just depending on the expression profiles of a few critical genes. In this study, a target gene screening (TGS) system, which is a microarray-based information system that integrates F-statistics, pattern recognition matching, a two-layer K-means classifier, a Parameter Detection Genetic Algorithm (PDGA), a genetic-based gene selector (GBG selector) and the association rule, was developed to screen out a small subset of genes that can discriminate malignant stages of cancers. During the first stage, F-statistic, pattern recognition matching, and a two-layer K-means classifier were applied in the system to filter out the 20 critical genes most relevant to ovarian cancer from 9600 genes, and the PDGA was used to decide the fittest values of the parameters for these critical genes. Among the 20 critical genes, 15 are associated with cancer progression. In the second stage, we further employed a GBG selector and the association rule to screen out seven target gene sets, each with only four to six genes, and each of which can precisely identify the malignancy stage of ovarian cancer based on their expression profiles. We further deduced the gene regulatory networks of the 20 critical genes by applying the Pearson correlation coefficient to evaluate the correlation between the expression of each gene at the same stages and at different stages. Correlations between gene pairs were calculated, and then, three regulatory networks were deduced. These correlations were further confirmed by the Ingenuity pathway analysis. 
The prognostic significances of the genes identified via regulatory networks were examined using online tools, and most represented biomarker candidates. In summary, our proposed system provides a new strategy to identify critical genes or biomarkers, as well as their regulatory networks, from microarray data. Copyright © 2018. Published by Elsevier Inc.
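The Pearson correlation step mentioned in the abstract above can be sketched as follows. This is only an illustration of the coefficient itself, not the TGS system: the helper `pearson` and the three expression vectors are hypothetical and carry no data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical expression levels of three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

A coefficient near +1 or -1 between two genes would suggest a candidate edge in a co-expression network; the cutoff for drawing an edge is a separate design choice that the abstract does not specify.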
Blood-Based Gene Expression Signatures of Infants and Toddlers with Autism
ERIC Educational Resources Information Center
Glatt, Stephen J.; Tsuang, Ming T.; Winn, Mary; Chandler, Sharon D.; Collins, Melanie; Lopez, Linda; Weinfeld, Melanie; Carter, Cindy; Schork, Nicholas; Pierce, Karen; Courchesne, Eric
2012-01-01
Objective: Autism spectrum disorders (ASDs) are highly heritable neurodevelopmental disorders that onset clinically during the first years of life. ASD risk biomarkers expressed early in life could significantly impact diagnosis and treatment, but no transcriptome-wide biomarker classifiers derived from fresh blood samples from children with…
Identification of a Genomic Signature Predicting for Recurrence in Early Stage Ovarian Cancer
2015-12-01
early stage ovarian cancer to help researchers worldwide identify biomarkers that can aid early detection and inform novel targets for therapy. This...to detect differentially expressed genes after transformation using Voom. When using the top 5 genes to build the classifier, it predicted...to analyze expression of micro-RNA in these samples. Thus, at the end of the third year of funding we started a parallel analysis of RNAseq, DNA- CNV
Co-expression network analysis of duplicate genes in maize (Zea mays L.) reveals no subgenome bias.
Li, Lin; Briskine, Roman; Schaefer, Robert; Schnable, Patrick S; Myers, Chad L; Flagel, Lex E; Springer, Nathan M; Muehlbauer, Gary J
2016-11-04
Gene duplication is prevalent in many species and can result in coding and regulatory divergence. Gene duplications can be classified as whole genome duplication (WGD), tandem and inserted (non-syntenic). In maize, WGD resulted in the subgenomes maize1 and maize2, of which maize1 is considered the dominant subgenome. However, the landscape of co-expression network divergence of duplicate genes in maize is still largely uncharacterized. To address the consequence of gene duplication on co-expression network divergence, we developed a gene co-expression network from RNA-seq data derived from 64 different tissues/stages of the maize reference inbred-B73. WGD, tandem and inserted gene duplications exhibited distinct regulatory divergence. Inserted duplicate genes were more likely to be singletons in the co-expression networks, while WGD duplicate genes were likely to be co-expressed with other genes. Tandem duplicate genes were enriched in the co-expression pattern where co-expressed genes were nearly identical for the duplicates in the network. Older gene duplications exhibit more extensive co-expression variation than younger duplications. Overall, non-syntenic genes primarily from inserted duplications show more co-expression divergence. Also, such enlarged co-expression divergence is significantly related to duplication age. Moreover, subgenome dominance was not observed in the co-expression networks - maize1 and maize2 exhibit similar levels of intra subgenome correlations. Intriguingly, the level of inter subgenome co-expression was similar to the level of intra subgenome correlations, genes from specific subgenomes were not likely to be enriched in co-expression network modules, and the hub genes were not predominantly from any specific subgenome in maize. 
Our work provides a comprehensive analysis of maize co-expression network divergence for three different types of gene duplications and identifies potential relationships between duplication types, duplication ages and co-expression consequences.
Bruno, Maria E C; Rogier, Eric W; Arsenescu, Razvan I; Flomenhoft, Deborah R; Kurkjian, Cathryn J; Ellis, Gavin I; Kaetzel, Charlotte S
2015-10-01
Inflammatory bowel diseases (IBD), including Crohn's disease (CD) and ulcerative colitis (UC), are characterized by chronic intestinal inflammation due to immunological, microbial, and environmental factors in genetically predisposed individuals. Advances in the diagnosis, prognosis, and treatment of IBD require the identification of robust biomarkers that can be used for molecular classification of diverse disease presentations. We previously identified five genes, RELA, TNFAIP3 (A20), PIGR, TNF, and IL8, whose mRNA levels in colonic mucosal biopsies could be used in a multivariate analysis to classify patients with CD based on disease behavior and responses to therapy. We compared expression of these five biomarkers in IBD patients classified as having CD or UC, and in healthy controls. Patients with CD were characterized as having decreased median expression of TNFAIP3, PIGR, and TNF in non-inflamed colonic mucosa as compared to healthy controls. By contrast, UC patients exhibited decreased expression of PIGR and elevated expression of IL8 in colonic mucosa compared to healthy controls. A multivariate analysis combining mRNA levels for all five genes resulted in segregation of individuals based on disease presentation (CD vs. UC) as well as severity, i.e., patients in remission versus those with acute colitis at the time of biopsy. We propose that this approach could be used as a model for molecular classification of IBD patients, which could further be enhanced by the inclusion of additional genes that are identified by functional studies, global gene expression analyses, and genome-wide association studies.
Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.
Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun
2018-04-01
We aimed to develop a molecular classifier that can predict lymphatic invasion and to assess its clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) in data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we found differentially expressed genes. The performance of the classifier was validated by receiver operating characteristics analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples of Samsung Medical Center and the curatedOvarianData database. We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by logistic regression, LDA, and SVM algorithm (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using the RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p=0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk groups, which resulted in a survival difference in mRNA profiles (log-rank p-value=0.011). In external validation, the gene signature was well correlated with prediction of lymphatic invasion and patients' survival. The molecular signature model predicting lymphatic invasion performed well and was also associated with survival of EOC patients.
Gene Expression Signatures Based on Variability can Robustly Predict Tumor Progression and Prognosis
Dinalankara, Wikum; Bravo, Héctor Corrada
2015-01-01
Gene expression signatures are commonly used to create cancer prognosis and diagnosis methods, yet only a small number of them are successfully deployed in the clinic since many fail to replicate performance on subsequent validation. A primary reason for this lack of reproducibility is the fact that these signatures attempt to model the highly variable and unstable genomic behavior of cancer. Our group recently introduced gene expression anti-profiles as a robust methodology to derive gene expression signatures based on the observation that while gene expression measurements are highly heterogeneous across tumors of a specific cancer type relative to the normal tissue, their degree of deviation from normal tissue expression in specific genes involved in tissue differentiation is a stable tumor mark that is reproducible across experiments and cancer types. Here we show that constructing gene expression signatures based on variability and the anti-profile approach yields classifiers capable of successfully distinguishing benign growths from cancerous growths based on deviation from normal expression. We then show that this same approach generates stable and reproducible signatures that predict probability of relapse and survival based on tumor gene expression. These results suggest that using the anti-profile framework for the discovery of genomic signatures is an avenue leading to the development of reproducible signatures suitable for adoption in clinical settings. PMID:26078586
Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo
2005-01-01
A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.
Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J
2018-05-29
Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.
Paul, Topon Kumar; Iba, Hitoshi
2009-01-01
In order to get a better understanding of different types of cancers and to find the possible biomarkers for diseases, recently, many researchers are analyzing the gene expression data using various machine learning techniques. However, due to a very small number of training samples compared to the huge number of genes and class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels with majority voting technique. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancers being studied in this paper.
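The majority voting step described above can be sketched in a few lines. This is a simplified illustration rather than MVGPC itself: in the paper the rules are evolved with genetic programming, whereas the three rules, the gene names and the thresholds below are hand-written, hypothetical stand-ins.

```python
from collections import Counter


def majority_vote(rules, sample):
    """Apply every rule to the sample and return the most frequent label."""
    labels = [rule(sample) for rule in rules]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical rules; each maps an expression profile to a class label.
rules = [
    lambda s: "tumor" if s["gene_x"] > 2.0 else "normal",
    lambda s: "tumor" if s["gene_y"] - s["gene_z"] > 0.5 else "normal",
    lambda s: "tumor" if s["gene_x"] * s["gene_z"] > 3.0 else "normal",
]

# Hypothetical test sample.
sample = {"gene_x": 2.5, "gene_y": 1.0, "gene_z": 0.8}
predicted = majority_vote(rules, sample)
```

Using an odd number of rules avoids ties in the two-class case; for multiclass data a tie-breaking policy would have to be chosen, which this sketch leaves to the order in which labels are first seen.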
2013-01-01
Background: Qualitative alterations or abnormal expression of microRNAs (miRNAs) in colon cancer have mainly been demonstrated in primary tumors. Poorly overlapping sets of oncomiRs, tumor suppressor miRNAs and metastamiRs have been linked with distinct stages in the progression of colorectal cancer. To identify changes in both miRNA and gene expression levels among normal colon mucosa, primary tumor and liver metastasis samples, and to classify miRNAs into functional networks, in this work miRNA and gene expression profiles in 158 samples from 46 patients were analysed. Results: Most changes in miRNA and gene expression levels had already manifested in the primary tumors while these levels were almost stably maintained in the subsequent primary tumor-to-metastasis transition. In addition, comparing normal tissue, tumor and metastasis, we did not observe general impairment or any rise in miRNA biogenesis. While only few mRNAs were found to be differentially expressed between primary colorectal carcinoma and liver metastases, miRNA expression profiles can classify primary tumors and metastases well, including differential expression of miR-10b, miR-210 and miR-708. Of 82 miRNAs that were modulated during tumor progression, 22 were involved in EMT. qRT-PCR confirmed the down-regulation of miR-150 and miR-10b in both primary tumor and metastasis compared to normal mucosa and of miR-146a in metastases compared to primary tumor. The upregulation of miR-201 in metastasis compared both with normal and primary tumour was also confirmed. A preliminary survival analysis considering differentially expressed miRNAs suggested a possible link between miR-10b expression in metastasis and patient survival. By integrating miRNA and target gene expression data, we identified a combination of interconnected miRNAs, which are organized into sub-networks, including several regulatory relationships with differentially expressed genes. 
Key regulatory interactions were validated experimentally. Specific mixed circuits involving miRNAs and transcription factors were identified and deserve further investigation. The suppressor activity of miR-182 on ENTPD5 gene was identified for the first time and confirmed in an independent set of samples. Conclusions: Using a large dataset of CRC miRNA and gene expression profiles, we describe the interplay of miRNA groups in regulating gene expression, which in turn affects modulated pathways that are important for tumor development. PMID:23987127
Abruzzo, Lynne V; Barron, Lynn L; Anderson, Keith; Newman, Rachel J; Wierda, William G; O'brien, Susan; Ferrajoli, Alessandra; Luthra, Madan; Talwalkar, Sameer; Luthra, Rajyalakshmi; Jones, Dan; Keating, Michael J; Coombes, Kevin R
2007-09-01
To develop a model incorporating relevant prognostic biomarkers for untreated chronic lymphocytic leukemia patients, we re-analyzed the raw data from four published gene expression profiling studies. We selected 88 candidate biomarkers linked to immunoglobulin heavy-chain variable region gene (IgV(H)) mutation status and produced a reliable and reproducible microfluidics quantitative real-time polymerase chain reaction array. We applied this array to a training set of 29 purified samples from previously untreated patients. In an unsupervised analysis, the samples clustered into two groups. Using a cutoff point of 2% homology to the germline IgV(H) sequence, one group contained all 14 IgV(H)-unmutated samples; the other contained all 15 mutated samples. We confirmed the differential expression of 37 of the candidate biomarkers using two-sample t-tests. Next, we constructed 16 different models to predict IgV(H) mutation status and evaluated their performance on an independent test set of 20 new samples. Nine models correctly classified 11 of 11 IgV(H)-mutated cases and eight of nine IgV(H)-unmutated cases, with some models using three to seven genes. Thus, we can classify cases with 95% accuracy based on the expression of as few as three genes.
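The 95% figure follows directly from the test-set counts quoted above, as this short check shows. The variable names are illustrative; the counts are the ones reported in the abstract.

```python
# Test-set results reported in the abstract for the best models.
correct_mutated, total_mutated = 11, 11      # IgV(H)-mutated cases
correct_unmutated, total_unmutated = 8, 9    # IgV(H)-unmutated cases

# Overall accuracy: correctly classified cases over all 20 test samples.
accuracy = (correct_mutated + correct_unmutated) / (total_mutated + total_unmutated)

# Per-class rates, which the single accuracy figure hides.
rate_mutated = correct_mutated / total_mutated
rate_unmutated = correct_unmutated / total_unmutated
```

With only 20 test samples, a single misclassified case shifts the accuracy by five percentage points, which is worth keeping in mind when reading the headline number.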
Identification of innate lymphoid cells in single-cell RNA-Seq data.
Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David
2017-07-01
Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help explore their conservation in distant vertebrate species.
Nikolaidis, Nikolas; Nei, Masatoshi
2004-03-01
We have identified the Hsp70 gene superfamily of the nematode Caenorhabditis briggsae and investigated the evolution of these genes in comparison with Hsp70 genes from C. elegans, Drosophila, and yeast. The Hsp70 genes are classified into three monophyletic groups according to their subcellular localization, namely, cytoplasm (CYT), endoplasmic reticulum (ER), and mitochondria (MT). The Hsp110 genes can be classified into the polyphyletic CYT group and the monophyletic ER group. The different Hsp70 and Hsp110 groups appeared to evolve following the model of divergent evolution. This model can also explain the evolution of the ER and MT genes. On the other hand, the CYT genes are divided into heat-inducible and constitutively expressed genes. The constitutively expressed genes have evolved more or less following the birth-and-death process, and the rates of gene birth and gene death are different between the two nematode species. By contrast, some heat-inducible genes show an intraspecies phylogenetic clustering. This suggests that they are subject to sequence homogenization resulting from gene conversion-like events. In addition, the heat-inducible genes show high levels of sequence conservation in both intra-species and inter-species comparisons, and in most cases, amino acid sequence similarity is higher than nucleotide sequence similarity. This indicates that purifying selection also plays an important role in maintaining high sequence similarity among paralogous Hsp70 genes. Therefore, we suggest that the CYT heat-inducible genes have been subjected to a combination of purifying selection, birth-and-death process, and gene conversion-like events.
Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes
Manning, Cerys; Rattray, Magnus
2017-01-01
Multiple biological processes are driven by oscillatory gene expression at different time scales. Pulsatile dynamics are thought to be widespread, and single-cell live imaging of gene expression has led to a surge of dynamic, possibly oscillatory, data for different gene networks. However, the regulation of gene expression at the level of an individual cell involves reactions between finite numbers of molecules, and this can result in inherent randomness in expression dynamics, which blurs the boundaries between aperiodic fluctuations and noisy oscillators. This poses a new challenge to the experimentalist because neither intuition nor pre-existing methods work well for identifying oscillatory activity in noisy biological time series. Thus, there is an acute need for an objective statistical method for classifying whether an experimentally derived noisy time series is periodic. Here, we present a new data analysis method that combines mechanistic stochastic modelling with the powerful methods of non-parametric regression with Gaussian processes. Our method can distinguish oscillatory gene expression from random fluctuations of non-oscillatory expression in single-cell time series, despite peak-to-peak variability in period and amplitude of single-cell oscillations. We show that our method outperforms the Lomb-Scargle periodogram in successfully classifying cells as oscillatory or non-oscillatory in data simulated from a simple genetic oscillator model and in experimental data. Analysis of bioluminescent live-cell imaging shows a significantly greater number of oscillatory cells when luciferase is driven by a Hes1 promoter (10/19), which has previously been reported to oscillate, than the constitutive MoMuLV 5’ LTR (MMLV) promoter (0/25). The method can be applied to data from any gene network to both quantify the proportion of oscillating cells within a population and to measure the period and quality of oscillations. 
It is publicly available as a MATLAB package. PMID:28493880
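The idea of scoring a time series under an oscillatory versus a non-oscillatory Gaussian process can be sketched as follows. This is a minimal illustration and not the authors' MATLAB package: the covariance functions (an exponentially decaying kernel, with and without a cosine term), the fixed noise variance, the coarse parameter grid, and the names gp_log_likelihood and oscillation_llr are all assumptions made for this example, and the data are simulated.

```python
import numpy as np

def gp_log_likelihood(t, y, alpha, beta, signal_var, noise_var):
    """Log marginal likelihood of y under a zero-mean GP with covariance
    signal_var * exp(-alpha*|tau|) * cos(beta*tau) + noise_var * I.
    beta = 0 gives a non-oscillatory (Ornstein-Uhlenbeck-type) covariance."""
    tau = np.abs(t[:, None] - t[None, :])
    K = signal_var * np.exp(-alpha * tau) * np.cos(beta * tau)
    K += noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)          # K = L L^T
    w = np.linalg.solve(L, y)          # w @ w equals y^T K^{-1} y
    return -0.5 * w @ w - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2 * np.pi)

def oscillation_llr(t, y, noise_var, alphas, betas):
    """Log-likelihood ratio of the best oscillatory fit over the best
    non-oscillatory fit, using a simple grid search (assumed, for brevity)."""
    y = y - y.mean()
    signal_var = y.var()
    best_ou = max(gp_log_likelihood(t, y, a, 0.0, signal_var, noise_var)
                  for a in alphas)
    best_osc, best_beta = max(
        (gp_log_likelihood(t, y, a, b, signal_var, noise_var), b)
        for a in alphas for b in betas)
    return best_osc - best_ou, best_beta

# Simulated example: a noisy cosine with period 5 (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(0.0, 20.0, 0.5)
y = np.cos(2 * np.pi * t / 5.0) + rng.normal(0.0, 0.1, t.size)
llr, beta = oscillation_llr(t, y, noise_var=0.01,
                            alphas=[0.05, 0.2, 0.5, 1.0, 2.0],
                            betas=np.linspace(0.5, 3.0, 11))
print(f"LLR = {llr:.1f}, estimated period = {2 * np.pi / beta:.2f}")
```

A larger ratio favours the oscillatory model. In practice the ratio would still need to be calibrated against a null distribution before cells are labelled as oscillatory, a step this sketch omits.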
Comparison of transcriptomic signature of post-Chernobyl and postradiotherapy thyroid tumors.
Ory, Catherine; Ugolin, Nicolas; Hofman, Paul; Schlumberger, Martin; Likhtarev, Illya A; Chevillard, Sylvie
2013-11-01
We previously identified two highly discriminating and predictive radiation-induced transcriptomic signatures by comparing series of sporadic and postradiotherapy thyroid tumors (322-gene signature), and by reanalyzing a previously published data set of sporadic and post-Chernobyl thyroid tumors (106-gene signature). The aim of the present work was (i) to compare the two signatures in terms of gene expression deregulations and molecular features/pathways, and (ii) to test the capacity of the postradiotherapy signature in classifying the post-Chernobyl series of tumors and reciprocally of the post-Chernobyl signature in classifying the postradiotherapy-induced tumors. We now explored if postradiotherapy and post-Chernobyl papillary thyroid carcinomas (PTC) display common molecular features by comparing molecular pathways deregulated in the two tumor series, and tested the potential of gene subsets of the postradiotherapy signature to classify the post-Chernobyl series (14 sporadic and 12 post-Chernobyl PTC), and reciprocally of gene subsets of the post-Chernobyl signature to classify the postradiotherapy series (15 sporadic and 12 postradiotherapy PTC), by using conventional principal component analysis. We found that the five genes common to the two signatures classified the learning/training tumors (used to search these signatures) of both the postradiotherapy (seven PTC) and the post-Chernobyl (six PTC) thyroid tumor series as compared with the sporadic tumors (seven sporadic PTC in each series). Importantly, these five genes were also effective for classifying independent series of postradiotherapy (five PTC) and post-Chernobyl (six PTC) tumors compared to independent series of sporadic tumors (eight PTC and six PTC respectively; testing tumors). Moreover, part of each postradiotherapy (32 genes) and post-Chernobyl signature (16 genes) cross-classified the respective series of thyroid tumors. 
Finally, several molecular pathways deregulated in post-Chernobyl tumors matched those found to be deregulated in postradiotherapy tumors. Overall, our data suggest that thyroid tumors that developed following either external exposure or internal (131)I contamination shared common molecular features, related to DNA repair, oxidative and endoplasmic reticulum stresses, allowing their classification as radiation-induced tumors in comparison with sporadic counterparts, independently of doses and dose rates, which suggests there may be a "general" radiation-induced signature of thyroid tumors.
How large a training set is needed to develop a classifier for microarray data?
Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M
2008-01-01
A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
Cangelosi, Davide; Muselli, Marco; Parodi, Stefano; Blengio, Fabiola; Becherini, Pamela; Versteeg, Rogier; Conte, Massimo; Varesio, Luigi
2014-01-01
A cancer patient's outcome is written, in part, in the gene expression profile of the tumor. We previously identified a 62-probe-set signature (NB-hypo) to identify tissue hypoxia in neuroblastoma tumors and showed that NB-hypo stratified neuroblastoma patients into good and poor outcome groups. It was important to develop a prognostic classifier to cluster patients into risk groups benefiting from defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. We explored the application to gene expression data of Rulex, a novel software suite including the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones and the Logic Learning Machine (LLM) model for intelligible rule generation. We applied Rulex components to the problem of predicting the outcome of neuroblastoma patients on the basis of the 62-probe-set NB-hypo gene expression signature. The resulting classifier consisted of 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the LLM algorithm applied to microarray data and patients' classification. The LLM performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant in predicting outcome among the 62 of the NB-hypo signature. Rules are easily interpretable as they involve only a few conditions. 
Our findings provided evidence that the application of Rulex to the expression values of NB-hypo signature created a set of accurate, high quality, consistent and interpretable rules for the prediction of neuroblastoma patients' outcome. We identified the Rulex weighted classification as a flexible tool that can support clinical decisions. For these reasons, we consider Rulex to be a useful tool for cancer classification from microarray gene expression data.
Flores-Monterroso, Aranzazu; Canales, Javier; de la Torre, Fernando; Ávila, Concepción; Cánovas, Francisco M
2013-06-01
Ectomycorrhizal associations are of major ecological importance in temperate and boreal forests. The development of a functional ectomycorrhiza requires many genetic and biochemical changes. In this study, suppressive subtraction hybridization was used to identify differentially expressed genes in the roots of maritime pine (Pinus pinaster Aiton) inoculated with Laccaria bicolor, a mycorrhizal fungus. A total of 200 unigenes were identified as being differentially regulated in maritime pine roots during the development of mycorrhiza. These unigenes were classified into 10 categories according to the function of their homologues in the GenBank database. Approximately 40 % of the differentially expressed transcripts were genes that coded for unknown proteins in the databases or that had no homology to known genes. A group of these differentially expressed genes was selected to validate the results using quantitative real-time PCR. The transcript levels of the representative genes were compared between the non-inoculated and inoculated plants at 1, 5, 15 and 30 days after inoculation. The observed expression patterns indicate (1) changes in the composition of the cell wall, (2) tight regulation of defence genes during the development of mycorrhiza and (3) changes in carbon and nitrogen metabolism. Ammonium excess or deficiency dramatically affected the stability of ectomycorrhiza and altered gene expression in maritime pine roots.
NASA Astrophysics Data System (ADS)
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-06-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high-throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of the most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-01-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high-throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of the most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs. PMID:26047353
Spectral biclustering of microarray data: coclustering genes and conditions.
Kluger, Yuval; Basri, Ronen; Chang, Joseph T; Gerstein, Mark
2003-04-01
Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find "marker genes" that are differentially expressed in particular sets of "conditions." We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data).
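The core step described above, normalizing the expression matrix and then reading a checkerboard off the singular vectors, can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes a strictly positive matrix, uses independent row and column rescaling as one plausible reading of the normalization, and splits rows and columns into only two groups by the sign of the second singular vectors. The function name spectral_bicluster_2x2 and the toy matrix are hypothetical.

```python
import numpy as np

def spectral_bicluster_2x2(A):
    """Split rows and columns of a positive matrix into two groups each,
    using the second singular vectors of the rescaled matrix."""
    A = np.asarray(A, dtype=float)
    r = A.sum(axis=1)                       # row sums
    c = A.sum(axis=0)                       # column sums
    A_norm = A / np.sqrt(np.outer(r, c))    # R^{-1/2} A C^{-1/2}
    U, s, Vt = np.linalg.svd(A_norm, full_matrices=False)
    # The first singular pair is nearly constant after rescaling;
    # the second pair carries the checkerboard partition.
    row_labels = (U[:, 1] > 0).astype(int)
    col_labels = (Vt[1, :] > 0).astype(int)
    return row_labels, col_labels

# Toy checkerboard: 10 "genes" x 8 "conditions", two blocks high, two low.
rng = np.random.default_rng(0)
row_sign = np.array([1] * 5 + [-1] * 5)
col_sign = np.array([1] * 4 + [-1] * 4)
A = 5.5 + 4.5 * np.outer(row_sign, col_sign) + rng.normal(0.0, 0.1, (10, 8))
row_labels, col_labels = spectral_bicluster_2x2(A)
print(row_labels, col_labels)
```

The 0/1 labels are arbitrary up to a global flip, because the sign of a singular vector is not determined. Real data would typically need more than two groups per axis, for example by clustering several leading singular vectors.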
PAM50 Breast Cancer Subtyping by RT-qPCR and Concordance with Standard Clinical Molecular Markers
2012-01-01
Background Many methodologies have been used in research to identify the “intrinsic” subtypes of breast cancer commonly known as Luminal A, Luminal B, HER2-Enriched (HER2-E) and Basal-like. The PAM50 gene set is often used for gene expression-based subtyping; however, surrogate subtyping using panels of immunohistochemical (IHC) markers is still widely used clinically. Discrepancies between these methods may lead to different treatment decisions. Methods We used the PAM50 RT-qPCR assay to expression profile 814 tumors from the GEICAM/9906 phase III clinical trial that enrolled women with locally advanced primary invasive breast cancer. All samples were scored at a single site by IHC for estrogen receptor (ER), progesterone receptor (PR), and Her2/neu (HER2) protein expression. Equivocal HER2 cases were confirmed by chromogenic in situ hybridization (CISH). Single gene scores by IHC/CISH were compared with RT-qPCR continuous gene expression values and “intrinsic” subtype assignment by the PAM50. High, medium, and low expression for ESR1, PGR, ERBB2, and proliferation were selected using quartile cut-points from the continuous RT-qPCR data across the PAM50 subtype assignments. Results ESR1, PGR, and ERBB2 gene expression had high agreement with established binary IHC cut-points (area under the curve (AUC) ≥ 0.9). Estrogen receptor positivity by IHC was strongly associated with Luminal (A and B) subtypes (92%), but only 75% of ER negative tumors were classified into the HER2-E and Basal-like subtypes. Luminal A tumors more frequently expressed PR than Luminal B (94% vs 74%) and Luminal A tumors were less likely to have high proliferation (11% vs 77%). Seventy-seven percent (30/39) of ER-/HER2+ tumors by IHC were classified as the HER2-E subtype. Triple negative tumors were mainly comprised of Basal-like (57%) and HER2-E (30%) subtypes. 
Single gene scoring for ESR1, PGR, and ERBB2 was more prognostic than the corresponding IHC markers as shown in a multivariate analysis. Conclusions The standard immunohistochemical panel for breast cancer (ER, PR, and HER2) does not adequately identify the PAM50 gene expression subtypes. Although there is high agreement between biomarker scoring by protein immunohistochemistry and gene expression, the gene expression determinations for ESR1 and ERBB2 status were more prognostic. PMID:23035882
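Agreement between a continuous expression value and a binary IHC call, reported above as an AUC, can be computed with the rank-based definition of the area under the ROC curve. The sketch below is a generic illustration and not the study's analysis pipeline. The function name auc and the four expression values are hypothetical and are not data from the trial.

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical continuous expression values and binary IHC status (1 = positive).
expression = [0.1, 0.4, 0.35, 0.8]
ihc_status = [0, 0, 1, 1]
print(auc(expression, ihc_status))  # 3 of 4 positive/negative pairs are ordered correctly -> 0.75
```

An AUC of 1.0 means every IHC-positive sample has higher expression than every IHC-negative sample, and 0.5 means the expression value carries no information about the IHC call.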
Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro
2015-12-01
Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method that predicts, with high accuracy, the carcinogenicity of chemicals that target the liver. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. The gene expression data obtained were applied to CARCINOscreen®. Predictive scores obtained by CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggested that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.
ESTs Analysis Reveals Putative Genes Involved in Symbiotic Seed Germination in Dendrobium officinale
Zhao, Ming-Ming; Zhang, Gang; Zhang, Da-Wei; Hsiao, Yu-Yun; Guo, Shun-Xing
2013-01-01
Dendrobium officinale (Orchidaceae) is one of the world’s most endangered plants with great medicinal value. In nature, D. officinale seeds must establish symbiotic relationships with fungi to germinate. However, the molecular events involved in the interaction between fungus and plant during this process are poorly understood. To isolate the genes involved in symbiotic germination, a suppression subtractive hybridization (SSH) cDNA library of symbiotically germinated D. officinale seeds was constructed. From this library, 1437 expressed sequence tags (ESTs) were clustered to 1074 Unigenes (including 902 singletons and 172 contigs), which were searched against the NCBI non-redundant (NR) protein database (E-value cutoff, e-5). Based on sequence similarity with known proteins, 579 differentially expressed genes in D. officinale were identified and classified into different functional categories by Gene Ontology (GO), Clusters of Orthologous Groups of proteins (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expression levels of 15 selected genes emblematic of symbiotic germination were confirmed via real-time quantitative PCR. These genes were classified into various categories, including defense and stress response, metabolism, transcriptional regulation, transport process and signal transduction pathways. All transcripts were upregulated in the symbiotically germinated seeds (SGS). The functions of these genes in symbiotic germination were predicted. Furthermore, two fungus-induced calcium-dependent protein kinases (CDPKs), which were upregulated 6.76- and 26.69-fold in SGS compared with un-germinated seeds (UGS), were cloned from D. officinale and characterized for the first time. This study provides the first global overview of genes putatively involved in D. officinale symbiotic seed germination and provides a foundation for further functional research regarding symbiotic relationships in orchids. PMID:23967335
Zhao, Ming-Ming; Zhang, Gang; Zhang, Da-Wei; Hsiao, Yu-Yun; Guo, Shun-Xing
2013-01-01
Dendrobium officinale (Orchidaceae) is one of the world's most endangered plants with great medicinal value. In nature, D. officinale seeds must establish symbiotic relationships with fungi to germinate. However, the molecular events involved in the interaction between fungus and plant during this process are poorly understood. To isolate the genes involved in symbiotic germination, a suppression subtractive hybridization (SSH) cDNA library of symbiotically germinated D. officinale seeds was constructed. From this library, 1437 expressed sequence tags (ESTs) were clustered to 1074 Unigenes (including 902 singletons and 172 contigs), which were searched against the NCBI non-redundant (NR) protein database (E-value cutoff, e(-5)). Based on sequence similarity with known proteins, 579 differentially expressed genes in D. officinale were identified and classified into different functional categories by Gene Ontology (GO), Clusters of Orthologous Groups of proteins (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expression levels of 15 selected genes emblematic of symbiotic germination were confirmed via real-time quantitative PCR. These genes were classified into various categories, including defense and stress response, metabolism, transcriptional regulation, transport process and signal transduction pathways. All transcripts were upregulated in the symbiotically germinated seeds (SGS). The functions of these genes in symbiotic germination were predicted. Furthermore, two fungus-induced calcium-dependent protein kinases (CDPKs), which were upregulated 6.76- and 26.69-fold in SGS compared with un-germinated seeds (UGS), were cloned from D. officinale and characterized for the first time. This study provides the first global overview of genes putatively involved in D. officinale symbiotic seed germination and provides a foundation for further functional research regarding symbiotic relationships in orchids.
Wang, Shengji; Wang, Jiying; Yao, Wenjing; Zhou, Boru; Li, Renhua; Jiang, Tingbo
2014-10-01
Spatio-temporal expression patterns of 13 out of 119 poplar WRKY genes indicated dynamic and tissue-specific roles of WRKY family proteins in salinity stress tolerance. To understand the expression patterns of poplar WRKY genes under salinity stress, 51 of the 119 WRKY genes were selected from di-haploid Populus simonii × P. nigra by quantitative real-time PCR (qRT-PCR). We used qRT-PCR to profile the expression of the top 13 genes under salinity stress across seven time points, and employed RNA-Seq platforms to cross-validate it. Results demonstrated that all 13 WRKY genes were expressed in root, stem, and leaf tissues, but their expression levels and overall patterns varied notably in these tissues. Regarding overall gene expression in roots, the 13 genes were significantly highly expressed at all six time points after the treatment, reaching the plateau of expression at hour 9. In leaves, the 13 genes were similarly up-regulated from 3 to 12 h in response to NaCl treatment. In stems, however, expression levels of the 13 genes did not show significant changes after the NaCl treatment. Regarding individual gene expression across the time points and the three tissues, the 13 genes can be classified into three clusters: the lowly expressed Cluster 1 containing PthWRKY28, 45 and 105; the intermediately expressed Cluster 2 including PthWRKY56, 88 and 116; and the highly expressed Cluster 3 consisting of PthWRKY41, 44, 51, 61, 62, 75 and 106. In general, genes in Clusters 2 and 3 displayed a dynamic pattern of "induced amplification-recovering", suggesting that these WRKY genes and corresponding pathways may play a critical role in mediating salt response and tolerance in a dynamic and tissue-specific manner.
Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco
2018-03-01
This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms (k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Of the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene (XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.
Comprehensive analysis of mitogen-activated protein kinase cascades in chrysanthemum
Ding, Lian; Zhang, Xue; Li, Peiling; Liu, Ye
2018-01-01
Background Mitogen-activated protein kinase (MAPK) cascades, an important type of pathway in eukaryotic signaling networks, play a key role in plant defense responses, growth and development. Methods Phylogenetic analysis and conserved motif analysis of the MKK and MPK families in Arabidopsis thaliana, Helianthus annuus and Chrysanthemum morifolium were used to classify MKK genes and MPK genes. qRT-PCR was used to determine the expression patterns of CmMPK and CmMKK genes, and a yeast two-hybrid assay was applied to clarify the interactions between CmMPKs and CmMKKs. Results We characterized six MKK genes and 11 MPK genes in chrysanthemum based on transcriptomic sequences and classified these genes into four groups. qRT-PCR analysis demonstrated that CmMKKs and CmMPKs exhibited various expression patterns in different organs of chrysanthemum and in response to abiotic stresses and phytohormone treatments. Furthermore, a yeast two-hybrid assay was applied to analyze the interaction between CmMKKs and CmMPKs and reveal the MAPK cascades in chrysanthemum. Discussion Our data led us to propose that CmMKK4-CmMPK13 and CmMKK2-CmMPK4 may be involved in regulating salt resistance, and that CmMKK9 and CmMPK6 may be related to temperature stress. PMID:29942696
2010-01-01
Background Infection by infectious laryngotracheitis virus (ILTV; gallid herpesvirus 1) causes acute respiratory diseases in chickens often with high mortality. To better understand host-ILTV interactions at the host transcriptional level, a microarray analysis was performed using 4 × 44 K Agilent chicken custom oligo microarrays. Results Microarrays were hybridized using the two color hybridization method with total RNA extracted from ILTV infected chicken embryo lung cells at 0, 1, 3, 5, and 7 days post infection (dpi). Results showed that 789 genes were differentially expressed in response to ILTV infection that include genes involved in the immune system (cytokines, chemokines, MHC, and NF-κB), cell cycle regulation (cyclin B2, CDK1, and CKI3), matrix metalloproteinases (MMPs) and cellular metabolism. Differential expression for 20 out of 789 genes were confirmed by quantitative reverse transcription-PCR (qRT-PCR). A bioinformatics tool (Ingenuity Pathway Analysis) used to analyze biological functions and pathways on the group of 789 differentially expressed genes revealed that 21 possible gene networks with intermolecular connections among 275 functionally identified genes. These 275 genes were classified into a number of functional groups that included cancer, genetic disorder, cellular growth and proliferation, and cell death. Conclusion The results of this study provide comprehensive knowledge on global gene expression, and biological functionalities of differentially expressed genes in chicken embryo lung cells in response to ILTV infections. PMID:20663125
Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal
2017-01-01
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. Firstly, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply a t-test to both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package to perform a non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method and other methods based on classification performances obtained using several well-known classifiers.
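The maximal-relevance, minimal-redundancy idea behind the gene-selection step can be sketched as a greedy loop. This is a plain, unweighted variant for illustration only: the abstract does not specify how the variable weighting is defined, so it is not reproduced here. Absolute Pearson correlation stands in for the relevance and redundancy measures, which is an assumption, and the function name mrmr_select and the simulated data are hypothetical.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedily pick k feature indices, maximizing relevance to y minus
    mean redundancy with the features already selected."""
    n_features = X.shape[1]
    # Relevance: |correlation| between each feature and the target.
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    # Redundancy: |correlation| between every pair of features.
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - feat_corr[j, selected].mean()
                  for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Simulated data: feature 1 is a near-copy of feature 0, feature 2 is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])
y = a + 0.5 * b
selected = mrmr_select(X, y, 2)
print(selected)
```

Although the near-copy has higher relevance than the independent feature, the redundancy penalty means it is passed over in the second round, which is the behaviour the criterion is designed to produce.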
Identifying gnostic predictors of the vaccine response.
Haining, W Nicholas; Pulendran, Bali
2012-06-01
Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis of their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of vaccine prediction. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. Copyright © 2012 Elsevier Ltd. All rights reserved.
Identifying gnostic predictors of the vaccine response
Haining, W. Nicholas; Pulendran, Bali
2012-01-01
Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis for their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of outcome classification. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. PMID:22633886
Identification and characterization of the grape WRKY family.
Zhang, Ying; Feng, Jian Can
2014-01-01
WRKY transcription factors have functions in plant growth and development and in response to biotic and abiotic stresses. Many studies have focused on functional identification of WRKY transcription factors, but little is known about the molecular phylogeny or global expression patterns of the complete WRKY family. In this study, we identified 80 WRKY proteins encoded in the grape genome. Based on the structural features of these proteins, the grape WRKY genes were classified into three groups (groups 1-3). Analysis of WRKY gene expression profiles indicated that 28 WRKY genes were differentially expressed in response to biotic stress caused by grape whiterot and/or salicylic acid (SA). Of these, 16 WRKY genes were upregulated by both whiterot pathogenic bacteria and SA. The results indicated that these 16 WRKY proteins participate in the SA-dependent defense signaling pathway. This study provides a basis for cloning genes with specific functions from grape.
A Host-Based RT-PCR Gene Expression Signature to Identify Acute Respiratory Viral Infection
Zaas, Aimee K.; Burke, Thomas; Chen, Minhua; McClain, Micah; Nicholson, Bradly; Veldman, Timothy; Tsalik, Ephraim L.; Fowler, Vance; Rivers, Emanuel P.; Otero, Ronny; Kingsmore, Stephen F.; Voora, Deepak; Lucas, Joseph; Hero, Alfred O.; Carin, Lawrence; Woods, Christopher W.; Ginsburg, Geoffrey S.
2014-01-01
Improved ways to diagnose acute respiratory viral infections could decrease inappropriate antibacterial use and serve as a vital triage mechanism in the event of a potential viral pandemic. Measurement of the host response to infection is an alternative to pathogen-based diagnostic testing and may improve diagnostic accuracy. We have developed a host-based assay with a reverse transcription polymerase chain reaction (RT-PCR) TaqMan low-density array (TLDA) platform for classifying respiratory viral infection. We developed the assay using two cohorts experimentally infected with influenza A H3N2/Wisconsin or influenza A H1N1/Brisbane, and validated the assay in a sample of adults presenting to the emergency department with fever (n = 102) and in healthy volunteers (n = 41). Peripheral blood RNA samples were obtained from individuals who underwent experimental viral challenge or who presented to the emergency department and had microbiologically proven viral respiratory infection or systemic bacterial infection. The selected gene set on the RT-PCR TLDA assay classified participants with experimentally induced influenza H3N2 and H1N1 infection with 100 and 87% accuracy, respectively. We validated this host gene expression signature in a cohort of 102 individuals arriving at the emergency department. The sensitivity of the RT-PCR test was 89% [95% confidence interval (CI), 72 to 98%], and the specificity was 94% (95% CI, 86 to 99%). These results show that RT-PCR–based detection of a host gene expression signature can classify individuals with respiratory viral infection and sets the stage for prospective evaluation of this diagnostic approach in a clinical setting. PMID:24048524
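The sensitivity and specificity figures quoted in the abstract above are ratios over a confusion table. A minimal sketch of that arithmetic follows; the counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying table.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly infected individuals that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of uninfected individuals that the test calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT taken from the study).
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting both values matters for a triage test: high specificity limits unnecessary antiviral work-ups, while high sensitivity limits missed infections.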
Orozco, Carlos A; Acevedo, Andrés; Cortina, Lazaro; Cuellar, Gina E; Duarte, Mónica; Martín, Liliana; Mesa, Néstor M; Muñoz, Javier; Portilla, Carlos A; Quijano, Sandra M; Quintero, Guillermo; Rodriguez, Miriam; Saavedra, Carlos E; Groot, Helena; Torres, María M; López-Segura, Valeriano
2013-01-01
A variety of genetic alterations are considered hallmarks of cancer development and progression. The Ikaros gene family, encoding key transcription factors in hematopoietic development, provides several examples, as genetic defects in these genes are associated with the development of different types of leukemia. However, the complex patterns of expression of isoforms in Ikaros family genes have prevented their use as clinical markers. In this study, we propose the use of the expression profiles of the Ikaros isoforms to classify various hematological tumor diseases. We have standardized a quantitative PCR protocol to estimate the expression levels of the Ikaros gene exons. Our analysis reveals that these levels are associated with specific types of leukemia and we have found differences in the levels of expression relative to five interexonic Ikaros regions for all diseases studied. In conclusion, our method has allowed us to precisely discriminate between B-ALL, CLL and MM cases. Differences between the groups of lymphoid and myeloid pathologies were also identified in the same way.
Yan, Yan; Wang, Lianzhe; Ding, Zehong; Tie, Weiwei; Ding, Xupo; Zeng, Changying; Wei, Yunxie; Zhao, Hongliang; Peng, Ming; Hu, Wei
2016-01-01
Mitogen-activated protein kinases (MAPKs) play central roles in plant developmental processes, hormone signaling transduction, and responses to abiotic stress. However, no data are currently available about the MAPK family in cassava, an important tropical crop. Herein, 21 MeMAPK genes were identified from cassava. Phylogenetic analysis indicated that MeMAPKs could be classified into four subfamilies. Gene structure analysis demonstrated that the number of introns in MeMAPK genes ranged from 1 to 10, suggesting large variation among cassava MAPK genes. Conserved motif analysis indicated that all MeMAPKs had typical protein kinase domains. Transcriptomic analysis suggested that MeMAPK genes showed differential expression patterns in distinct tissues and in response to drought stress between wild subspecies and cultivated varieties. Interaction networks and co-expression analyses revealed that crucial pathways controlled by MeMAPK networks may be involved in the differential response to drought stress in different accessions of cassava. Expression of nine selected MAPK genes showed that these genes could comprehensively respond to osmotic, salt, cold, oxidative stressors, and abscisic acid (ABA) signaling. These findings yield new insights into the transcriptional control of MAPK gene expression, provide an improved understanding of abiotic stress responses and signaling transduction in cassava, and lead to potential applications in the genetic improvement of cassava cultivars. PMID:27625666
Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou
2016-01-01
The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts. PMID:26907269
Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou
2016-02-23
The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts.
Zhang, Bin; Liu, Xia; Zhao, Guangyao; Mao, Xinguo; Li, Ang; Jing, Ruilian
2014-06-01
Wheat (Triticum aestivum L.) is one of the most important crops in the world. Squamosa-promoter binding protein (SBP)-box genes play a critical role in regulating flower and fruit development. In this study, 10 novel SBP-box genes (TaSPL genes) were isolated from wheat (Triticum aestivum L. cultivar Yanzhan 4110). Phylogenetic analysis classified the TaSPL genes into five groups (G1-G5). The motif combinations and expression patterns of the TaSPL genes varied among the five groups, with each having its own distinctive characteristics: TaSPL20/21 in G1 and TaSPL17 in G2 were mainly expressed in the shoot apical meristem and the young ear, and their expression levels responded to development of the ear; TaSPL6/15 belonging to G3 were upregulated and TaSPL1/23 in G4 were downregulated during grain development; the gene in G5 (TaSPL3) was expressed constitutively. Thus, the consistency of the phylogenetic analysis, motif compositions, and expression patterns of the TaSPL genes revealed specific gene structures and functions. On the other hand, the diverse gene structures and different expression patterns suggested that wheat SBP-box genes have a wide range of functions. The results also suggest a potential role for wheat SBP-box genes in ear development. This study provides a significant starting point for functional analysis of SBP-box genes in wheat. © 2014 The Authors. Journal of Integrative Plant Biology Published by Wiley Publishing Asia Pty Ltd on behalf of Institute of Botany, Chinese Academy of Sciences.
Saga, Yukika; Inamura, Tomoka; Shimada, Nao; Kawata, Takefumi
2016-05-01
STATa, a Dictyostelium homologue of metazoan signal transducer and activator of transcription, is important for the organizer function in the tip region of the migrating Dictyostelium slug. We previously showed that ecmF gene expression depends on STATa in prestalk A (pstA) cells, where STATa is activated. Deletion and site-directed mutagenesis analysis of the ecmF/lacZ fusion gene in wild-type and STATa null strains identified an imperfect inverted repeat sequence, ACAAATANTATTTGT, as a STATa-responsive element. An upstream sequence element was required for efficient expression in the rear region of pstA zone; an element downstream of the inverted repeat was necessary for sufficient prestalk expression during culmination. Band shift analyses using purified STATa protein detected no sequence-specific binding to those ecmF elements. The only verified upregulated target gene of STATa is cudA gene; CudA directly activates expL7 gene expression in prestalk cells. However, ecmF gene expression was almost unaffected in a cudA null mutant. Several previously reported putative STATa target genes were also expressed in cudA null mutant but were downregulated in STATa null mutant. Moreover, mybC, which encodes another transcription factor, belonged to this category, and ecmF expression was downregulated in a mybC null mutant. These findings demonstrate the existence of a genetic hierarchy for pstA-specific genes, which can be classified into two distinct STATa downstream pathways, CudA dependent and independent. The ecmF expression is indirectly upregulated by STATa in a CudA-independent activation manner but dependent on MybC, whose expression is positively regulated by STATa. © 2016 Japanese Society of Developmental Biologists.
ADGO: analysis of differentially expressed gene sets using composite GO annotation.
Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun
2006-09-15
Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms, the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast.
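The composite-annotation idea in the abstract above can be sketched as a set intersection followed by a gene-set score. The sketch below is illustrative only: the gene names, the per-gene scores, and the Z-statistic used for scoring are assumptions and are not taken from the ADGO implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def z_score(gene_set, scores):
    """Z-statistic of a gene set's mean score against the background of all genes.

    This scoring rule is an assumption for illustration, not ADGO's own.
    """
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    members = [scores[g] for g in gene_set if g in scores]
    return (mean(members) - mu) * sqrt(len(members)) / sigma


# Hypothetical per-gene expression-change scores.
scores = {
    "g1": 2.0, "g2": 1.8, "g3": -1.0, "g4": -1.2,
    "g5": 0.1, "g6": -0.1, "g7": 0.0, "g8": 0.2,
}

# Hypothetical unary annotations: molecular function F and cellular component C.
function_f = {"g1", "g2", "g3", "g5"}
component_c = {"g1", "g2", "g4", "g6"}

# Composite annotation: genes carrying both F and C.
composite = function_f & component_c

for name, gene_set in [("F", function_f), ("C", component_c), ("F & C", composite)]:
    print(name, round(z_score(gene_set, scores), 2))
```

With these made-up numbers, the intersected set scores higher than either unary set, which is the situation the abstract describes: a composite term can be significant while its parent terms are not.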
Chang, Yan-Li; Li, Wen-Yan; Miao, Hai; Yang, Shuai-Qi; Li, Ri; Wang, Xiang; Li, Wen-Qiang; Chen, Kun-Ming
2016-02-23
Plasma membrane NADPH oxidases (NOXs) are key producers of reactive oxygen species under both normal and stress conditions in plants and they form functional subfamilies. Studies of these subfamilies indicated that they show considerable evolutionary selection. We performed a comparative genomic analysis that identified 50 ferric reduction oxidases (FRO) and 77 NOX gene homologs from 20 species representing the eight major plant lineages within the supergroup Plantae: glaucophytes, rhodophytes, chlorophytes, bryophytes, lycophytes, gymnosperms, monocots, and eudicots. Phylogenetic and structural analysis classified these FRO and NOX genes into four well-conserved groups represented as NOX, FRO I, FRO II, and FRO III. Further phylogenetic and exon/intron structure analysis of the NOXs showed that single intron loss and gain had occurred during the evolution of NOX family genes, yielding diversified gene structures; the NOXs were classified into four conserved subfamilies, represented as Sub.I, Sub.II, Sub.III, and Sub.IV. Additionally, both available global microarray data analysis and quantitative real-time PCR experiments revealed that the NOX genes in Arabidopsis and rice (Oryza sativa) have different expression patterns in different developmental stages, various abiotic stresses and hormone treatments. Finally, coexpression network analysis of NOX genes in Arabidopsis and rice revealed that NOXs have significantly correlated expression profiles with genes that are involved in plant metabolic and resistance processes. All these results suggest that the NOX family underscores functional diversity and divergence in plants. This finding will facilitate further studies of the NOX family and provide valuable information for functional validation of this family in plants. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Evolutionary Analysis and Expression Profiling of Zebra Finch Immune Genes
Ekblom, Robert; French, Lisa; Slate, Jon; Burke, Terry
2010-01-01
Genes of the immune system are generally considered to evolve rapidly due to host–parasite coevolution. They are therefore of great interest in evolutionary biology and molecular ecology. In this study, we manually annotated 144 avian immune genes from the zebra finch (Taeniopygia guttata) genome and conducted evolutionary analyses of these by comparing them with their orthologs in the chicken (Gallus gallus). Genes classified as immune receptors showed elevated dN/dS ratios compared with other classes of immune genes. Immune genes in general also appear to be evolving more rapidly than other genes, as inferred from a higher dN/dS ratio compared with the rest of the genome. Furthermore, ten genes (of 27) for which sequence data were available from at least three bird species showed evidence of positive selection acting on specific codons. From transcriptome data of eight different tissues, we found evidence for expression of 106 of the studied immune genes, with primary expression of most of these in bursa, blood, and spleen. These immune-related genes showed a more tissue-specific expression pattern than other genes in the zebra finch genome. Several of the avian immune genes investigated here provide strong candidates for in-depth studies of molecular adaptation in birds. PMID:20884724
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.
Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping
2016-08-26
Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.
Differential gene expression patterns between smokers and non‐smokers: cause or consequence?
Jansen, Rick; Brooks, Andy; Willemsen, Gonneke; van Grootheest, Gerard; de Geus, Eco; Smit, Jan H.; Penninx, Brenda W.; Boomsma, Dorret I.
2015-01-01
The molecular mechanisms causing smoking‐induced health decline are largely unknown. To elucidate the molecular pathways involved in cause and consequences of smoking behavior, we conducted a genome‐wide gene expression study in peripheral blood samples targeting 18 238 genes. Data of 743 smokers, 1686 never smokers and 890 ex‐smokers were available from two population‐based cohorts from the Netherlands. In addition, data of 56 monozygotic twin pairs discordant for ever smoking were used. One hundred thirty‐two genes were differentially expressed between current smokers and never smokers (P < 1.2 × 10⁻⁶, Bonferroni correction). The most significant genes were G protein‐coupled receptor 15 (P < 1 × 10⁻¹⁵⁰) and leucine‐rich repeat neuronal 3 (P < 1 × 10⁻⁴⁴). The smoking‐related genes were enriched for immune system, blood coagulation, natural killer cell and cancer pathways. By taking the data of ex‐smokers into account, expression of these 132 genes was classified into reversible (94 genes), slowly reversible (31 genes), irreversible (6 genes) or inconclusive (1 gene). Expression of 6 of the 132 genes (three reversible and three slowly reversible) was confirmed to be reactive to smoking as they were differentially expressed in monozygotic pairs discordant for smoking. Cis‐expression quantitative trait loci for GPR56 and RARRES3 (downregulated in smokers) were associated with increased number of cigarettes smoked per day in a large genome‐wide association meta‐analysis, suggesting a causative effect of GPR56 and RARRES3 expression on smoking behavior. In conclusion, differential gene expression patterns in smokers are extensive and cluster in several underlying disease pathways. Gene expression differences seem mainly direct consequences of smoking, and largely reversible after smoking cessation. However, we also identified DNA variants that may influence smoking behavior via the mediating gene expression. PMID:26594007
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
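The abstract above reports classifier performance as AUC together with the standard error of the Wilcoxon statistic, SE(W). A minimal sketch of how such a pair can be computed is given below. It assumes the Hanley and McNeil (1982) approximation for the standard error, which the abstract does not confirm as the authors' choice, and the classifier scores are hypothetical.

```python
from math import sqrt


def auc_wilcoxon(pos_scores, neg_scores):
    """AUC as the Wilcoxon/Mann-Whitney probability that a positive outranks a negative.

    Ties count as half a win.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC under the Hanley-McNeil approximation (assumed here)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


# Hypothetical decision scores for high-fatigue (positive) and low-fatigue (negative) patients.
high_fatigue = [0.9, 0.8, 0.7, 0.4]
low_fatigue = [0.6, 0.5, 0.3, 0.2]

auc = auc_wilcoxon(high_fatigue, low_fatigue)
se = hanley_mcneil_se(auc, len(high_fatigue), len(low_fatigue))
print(f"AUC={auc:.3f} SE(W)={se:.3f}")
```

The standard error shrinks as the group sizes grow, so very small values such as those quoted in the abstract would normally reflect many evaluated samples or repeated resampling.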
Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures
Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt
2005-01-01
A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class. PMID:15867433
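The "reward" and "penalty" genes described in the abstract above correspond to positive and negative weights in a linear decision function. The following sketch shows that reading with a short weighted gene list; the gene names, weights, bias, and expression values are all hypothetical and are not drawn from the study's signatures.

```python
def signature_score(weights, bias, expression):
    """Linear decision value: bias plus the weighted sum of signature-gene expression."""
    return bias + sum(w * expression.get(gene, 0.0) for gene, w in weights.items())


def split_rewards_penalties(weights):
    """Separate signature genes by the sign of their weight."""
    rewards = sorted(g for g, w in weights.items() if w > 0)
    penalties = sorted(g for g, w in weights.items() if w < 0)
    return rewards, penalties


# Hypothetical drug signature: a short weighted gene list.
weights = {"geneA": 1.5, "geneB": 0.5, "geneC": -2.0}
bias = -0.5

# Hypothetical log-ratio expression values for one treated sample.
sample = {"geneA": 1.0, "geneB": 1.0, "geneC": 0.2}

score = signature_score(weights, bias, sample)
rewards, penalties = split_rewards_penalties(weights)
print("in class" if score > 0 else "not in class", rewards, penalties)
```

A sample that strongly expresses a penalty gene is pushed below the decision threshold even when its reward genes are induced, which is one way such signatures can keep false positives low.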
Matsui, H; Nakamura, G; Ishiga, Y; Toshima, H; Inagaki, Y; Toyoda, K; Shiraishi, T; Ichinose, Y
2004-02-01
Recently, we observed that expression of a pea gene (S64) encoding an oxophytodienoic acid reductase (OPR) was induced by a suppressor of pea defense responses, secreted by the pea pathogen Mycosphaerella pinodes. Because it is known that OPRs are usually encoded by families of homologous genes, we screened for genomic and cDNA clones encoding members of this putative OPR family in pea. We isolated five members of the OPR gene family from a pea genomic DNA library, and amplified six cDNA clones, including S64, by RT-PCR (reverse transcriptase-PCR). Sequencing analysis revealed that S64 corresponds to PsOPR2, and the amino acid sequences of the predicted products of the six OPR-like genes shared more than 80% identity with each other. Based on their sequence similarity, all these OPR-like genes code for OPRs of subgroup I, i.e., enzymes which are not required for jasmonic acid biosynthesis. However, the genes varied in their exon/intron organization and in their promoter sequences. To investigate the expression of each individual OPR-like gene, RT-PCR was performed using gene-specific primers. The results indicated that the OPR-like gene most strongly induced by the inoculation of pea plants with a compatible pathogen and by treatment with the suppressor from M. pinodes was PsOPR2. Furthermore, the ability of the six recombinant OPR-like proteins to reduce a model substrate, 2-cyclohexen-1-one (2-CyHE), was investigated. The results indicated that PsOPR1, 4 and 6 display robust activity, and PsOPR2 has a most remarkable ability to reduce 2-CyHE, whereas PsOPR3 has little and PsOPR5 does not reduce this compound. Thus, the six OPR-like proteins can be classified into four types. Interestingly, the gene structures, expression profiles, and enzymatic activities used to classify each member of the pea OPR-like gene family are clearly correlated, indicating that each member of this OPR-like family has a distinct function.
Ngaki, Micheline N.; Wang, Bing; Sahu, Binod B.; Srivastava, Subodh K.; Farooqi, Mohammad S.; Kambakam, Sekhar; Swaminathan, Sivakumar
2016-01-01
Fusarium virguliforme causes the serious disease sudden death syndrome (SDS) in soybean. Host resistance to this pathogen is partial and is encoded by a large number of quantitative trait loci, each conditioning small effects. Breeding SDS resistance is therefore challenging, and identification of single-gene encoded novel resistance mechanisms is becoming a priority to fight this devastating fungal pathogen. In this transcriptomic study we identified a few putative soybean defense genes, expression of which is suppressed during F. virguliforme infection. The F. virguliforme infection-suppressed genes were broadly classified into four major classes. The steady state transcript levels of many of these genes were suppressed to undetectable levels immediately following F. virguliforme infection. One of these classes contains two novel genes encoding ankyrin repeat-containing proteins. Expression of one of these genes, GmARP1, during F. virguliforme infection enhances SDS resistance among the transgenic soybean plants. Our data suggest that GmARP1 is a novel defense gene and the pathogen presumably suppresses its expression to establish a compatible interaction. PMID:27760122
Selective modes determine evolutionary rates, gene compactness and expression patterns in Brassica.
Guo, Yue; Liu, Jing; Zhang, Jiefu; Liu, Shengyi; Du, Jianchang
2017-07-01
It has been well documented that most nuclear protein-coding genes in organisms can be classified into two categories: positively selected genes (PSGs) and negatively selected genes (NSGs). The characteristics and evolutionary fates of different types of genes, however, have been poorly understood. In this study, the rates of nonsynonymous substitution (Ka) and the rates of synonymous substitution (Ks) were investigated by comparing the orthologs between the two sequenced Brassica species, Brassica rapa and Brassica oleracea, and the evolutionary rates, gene structures, expression patterns, and codon bias were compared between PSGs and NSGs. The resulting data show that PSGs have higher protein evolutionary rates, lower synonymous substitution rates, shorter gene length, fewer exons, higher functional specificity, lower expression level, higher tissue-specific expression and stronger codon bias than NSGs. Although the quantities and values are different, the relative features of PSGs and NSGs have been largely verified in the model species Arabidopsis. These data suggest that PSGs and NSGs differ not only under selective pressure (Ka/Ks), but also in their evolutionary, structural and functional properties, indicating that selective modes may serve as a determinant factor for measuring evolutionary rates, gene compactness and expression patterns in Brassica. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
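The two-way split into PSGs and NSGs described in the abstract above rests on the ratio of nonsynonymous to synonymous substitution rates. A minimal sketch follows, assuming the conventional cutoff of Ka/Ks > 1 for positive selection; the abstract does not state the threshold the authors used, and the ortholog names and rates are hypothetical.

```python
def classify_by_selection(ka, ks):
    """Label an ortholog pair as PSG or NSG from its Ka/Ks ratio.

    Assumes the conventional cutoff Ka/Ks > 1 for positive selection.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    return "PSG" if ka / ks > 1 else "NSG"


# Hypothetical (Ka, Ks) estimates for Brassica rapa / Brassica oleracea ortholog pairs.
orthologs = {
    "pair1": (0.30, 0.20),
    "pair2": (0.02, 0.25),
    "pair3": (0.10, 0.40),
}

labels = {name: classify_by_selection(ka, ks) for name, (ka, ks) in orthologs.items()}
print(labels)
```

Pairs with Ks equal to zero are rejected rather than labelled, because the ratio is undefined when no synonymous change has been observed.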
Comparative antennal transcriptome of Apis cerana cerana from four developmental stages.
Zhao, Huiting; Peng, Zhu; Du, Yali; Xu, Kai; Guo, Lina; Yang, Shuang; Ma, Weihua; Jiang, Yusuo
2018-06-20
Apis cerana cerana, an important endemic honey bee species in China, possesses valuable characteristics such as a sensitive olfactory system, good foraging ability, and strong resistance to parasitic mites. Here, we performed transcriptome sequencing of the antenna, the major chemosensory organ of the bee, using an Illumina sequencer, to identify typical differentially expressed genes (DEGs) in adult worker bees of different ages, namely, T1 (1 day); T2 (10 days); T3 (15 days); and T4 (25 days). Surprisingly, the expression levels of DEGs changed significantly between the T1 period and the other three periods. All the DEGs were classified into 26 expression profiles by trend analysis. Selected trend clusters were analyzed, and valuable information on gene expression patterns was obtained. We found that the expression levels of genes encoding cuticle proteins declined after eclosion, while those of immunity-related genes increased. In addition, genes encoding venom proteins and major royal jelly proteins were enriched at the T2 stage; small heat shock proteins showed significantly higher expression at the T3 stage; and some metabolism-related genes were more highly expressed at the T4 stage. The DEGs identified in this study may serve as a valuable resource for the characterization of expression patterns of antennal genes in A. cerana cerana. Furthermore, this study provides insights into the relationship between labor division in social bees and gene function. Copyright © 2018. Published by Elsevier B.V.
Soybean kinome: functional classification and gene expression patterns
Liu, Jinyi; Chen, Nana; Grant, Joshua N.; Cheng, Zong-Ming (Max); Stewart, C. Neal; Hewezi, Tarek
2015-01-01
The protein kinase (PK) gene family is one of the largest and most highly conserved gene families in plants and plays a role in nearly all biological functions. While a large number of genes have been predicted to encode PKs in soybean, a comprehensive functional classification and global analysis of expression patterns of this large gene family is lacking. In this study, we identified the entire soybean PK repertoire or kinome, which comprised 2166 putative PK genes, representing 4.67% of all soybean protein-coding genes. The soybean kinome was classified into 19 groups, 81 families, and 122 subfamilies. The receptor-like kinase (RLK) group was remarkably large, containing 1418 genes. Collinearity analysis indicated that whole-genome segmental duplication events may have played a key role in the expansion of the soybean kinome, whereas tandem duplications might have contributed to the expansion of specific subfamilies. Gene structure, subcellular localization prediction, and gene expression patterns indicated extensive functional divergence of PK subfamilies. Global gene expression analysis of soybean PK subfamilies revealed tissue- and stress-specific expression patterns, implying regulatory functions over a wide range of developmental and physiological processes. In addition, tissue and stress co-expression network analysis uncovered specific subfamilies with narrow or wide interconnected relationships, indicative of their association with particular or broad signalling pathways, respectively. Taken together, our analyses provide a foundation for further functional studies to reveal the biological and molecular functions of PKs in soybean. PMID:25614662
Molecular-Directed Treatment of Differentiated Thyroid Cancer: Advances in Diagnosis and Treatment.
Yip, Linwah; Sosa, Julie Ann
2016-07-01
Thyroid cancer incidence is increasing, and when fine-needle aspiration biopsy results are cytologically indeterminate, the diagnosis is often still established only after thyroidectomy. Molecular marker testing may be helpful in guiding patient-oriented and tailored management of thyroid nodules and thyroid cancer. To summarize available data on the use of molecular testing to improve the diagnosis and prognostication of thyroid cancer. A MEDLINE review was conducted using the primary search terms molecular, thyroid cancer, thyroid nodule, and gene expression classifier in search strings. Articles were restricted to those published between January 1, 2010, and June 1, 2015, inclusive of adult humans, and reported in the English language only. Of 867 titles screened, 67 articles were further identified for review of the full text. The 2 most studied molecular marker testing techniques for indeterminate thyroid nodules include gene expression classifier analysis and evaluation for somatic mutations or rearrangements that are commonly found in thyroid cancer (7-gene panel). Nodules with benign results on gene expression classifier analysis can be associated with less than a 5% risk of cancer and may be observed, while nodules with positive results on the 7-gene panel may have a higher risk of cancer (80%-100%) and definitive surgery can be recommended. However, cancer prevalence and geographic variations in histologic subtypes may affect accuracy and clinical applicability of both tests. Molecular marker tests such as ThyroSeq version 2.1 are more comprehensive, but they need further validation. Preoperative risk stratification using molecular markers also may be used to better define the optimal extent of thyroidectomy for patients with thyroid cancer. 
Molecular markers potentially can augment the diagnostic specificity of fine-needle aspiration biopsy to better differentiate cytologically indeterminate nodules that can be safely observed from cytologically indeterminate nodules that may be associated with differentiated thyroid cancer. Long-term follow-up data are still needed; in the end, patient preference regarding the relative risks and benefits of molecular testing is at the crux of decision making.
Knapp, Dunja; Schulz, Herbert; Rascon, Cynthia Alexander; Volkmer, Michael; Scholz, Juliane; Nacu, Eugen; Le, Mu; Novozhilov, Sergey; Tazaki, Akira; Protze, Stephanie; Jacob, Tina; Hubner, Norbert; Habermann, Bianca; Tanaka, Elly M.
2013-01-01
Understanding how the limb blastema is established after the initial wound healing response is an important aspect of regeneration research. Here we performed parallel expression profile time courses of healing lateral wounds versus amputated limbs in axolotl. This comparison between wound healing and regeneration allowed us to identify amputation-specific genes. By clustering the expression profiles of these samples, we could detect three distinguishable phases of gene expression: early wound healing, followed by a transition phase leading to establishment of the limb development program, which correspond to the three phases of limb regeneration that had been defined by morphological criteria. By focusing on the transition phase, we identified 93 strictly amputation-associated genes, many of which are implicated in oxidative-stress response, chromatin modification, epithelial development or limb development. We further classified the genes based on whether they were or were not significantly expressed in the developing limb bud. The specific localization of 53 selected candidates within the blastema was investigated by in situ hybridization. In summary, we identified a set of genes that are expressed specifically during regeneration and are therefore likely candidates for the regulation of blastema formation. PMID:23658691
Yang, Chengqing; Hu, Guoqin; Li, Zezhi; Wang, Qingzhong; Wang, Xuemei; Yuan, Chengmei; Wang, Zuowei; Hong, Wu; Lu, Weihong; Cao, Lan; Chen, Jun; Wang, Yong; Yu, Shunying; Zhou, Yimin; Yi, Zhenghui; Fang, Yiru
2017-01-01
Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression and can lead to significant psychosocial functional impairment. Although the pathogenesis of major depressive disorder (MDD) and SSD remains poorly understood, a number of studies have found that many of the same genetic factors play important roles in the etiology of these two disorders. The differential gene expression between MDD and SSD, however, is still unknown. In our previous study, we compared leukocyte expression profiles and performed classification using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD and matched healthy controls (8 subjects in each group), and ultimately determined 48 gene expression signatures. Based on these findings, we further examined whether the mRNA of these genes was differentially expressed in peripheral blood in patients with SSD, patients with MDD and healthy controls (60 subjects in each group). Using quantitative real-time reverse transcription-polymerase chain reaction (RT-qPCR), we obtained relative gene expression levels among the three groups. We found that three of the forty-eight co-regulated genes, CD84, STRN and CTNS, were differentially expressed in peripheral blood among the three groups (F = 3.528, p = 0.034; F = 3.382, p = 0.039; F = 3.801, p = 0.026, respectively), while there were no significant differences for the other genes. CD84, STRN and CTNS may have significant value for diagnosis and for classifying SSD, MDD and healthy controls.
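The F statistics quoted in this abstract are consistent with a comparison of one measurement across three groups. The abstract does not name the test, so the sketch below assumes a one-way ANOVA and shows how such an F value is formed from between-group and within-group variation. The function name and the toy expression values are hypothetical and unrelated to the study's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample lists."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total sample count
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Toy relative expression levels for three groups (hypothetical).
ssd, mdd, controls = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]
f_value = one_way_anova_f([ssd, mdd, controls])
print(round(f_value, 3))
```

Converting F to a p value additionally requires the F distribution with (k - 1, N - k) degrees of freedom, which is left out here to keep the sketch dependency-free.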
Uchida, Masaya; Hirano, Masashi; Ishibashi, Hiroshi; Kobayashi, Jun; Kagami, Yoshihiro; Koyanagi, Akiko; Kusano, Teruhiko; Koga, Minoru; Arizono, Koji
2016-11-01
Nonylphenol (NP) has been classified as an endocrine-disrupting chemical. In this study, we conducted a mysid DNA microarray analysis, using an array with 2240 oligo DNA probes, to observe differential gene expression in the mysid crustacean (Americamysis bahia) exposed to 1, 3, 10 and 30 μg/l of NP for 14 days. We found that 31, 27, 39 and 68 genes were differentially expressed at the respective concentrations. Among these genes, the expression of five particular genes was regulated in a similar manner at all concentrations of NP exposure. We therefore focused on one gene encoding cuticle protein and another encoding cuticular protein analogous to peritrophins 1-H precursor. These genes were down-regulated by NP exposure in a dose-dependent manner, suggesting that they are related to a reduction in the number of molts in mysids. Thus, they might become useful molecular biomarker candidates for evaluating molting inhibition in mysids. Copyright © 2016 Elsevier Inc. All rights reserved.
Dong, Wei-Feng; Canil, Sarah; Lai, Raymond; Morel, Didier; Swanson, Paul E.; Izevbaye, Iyare
2018-01-01
A new automated MYC IHC classifier based on bivariate logistic regression is presented. The predictor relies on image analysis developed with the open-source ImageJ platform. From a histologic section immunostained for MYC protein, 2 dimensionless quantitative variables are extracted: (a) relative distance between nuclei positive for MYC IHC based on a Euclidean minimum spanning tree graph and (b) coefficient of variation of the MYC IHC stain intensity among MYC IHC-positive nuclei. Distance between positive nuclei is suggested to correlate inversely with MYC gene rearrangement status, whereas coefficient of variation is suggested to correlate inversely with physiological regulation of MYC protein expression. The bivariate classifier was compared with 2 other MYC IHC classifiers (based on percentage of MYC IHC-positive nuclei), all tested on 113 lymphomas including mostly diffuse large B-cell lymphomas with known MYC fluorescent in situ hybridization (FISH) status. The bivariate classifier strongly outperformed the “percentage of MYC IHC-positive nuclei” methods to predict MYC+ FISH status with 100% sensitivity (95% confidence interval, 94-100) associated with 80% specificity. The test is rapidly performed and might at a minimum provide primary IHC screening for MYC gene rearrangement status in diffuse large B-cell lymphomas. Furthermore, as this bivariate classifier actually predicts “permanent overexpressed MYC protein status,” it might identify nontranslocation-related chromosomal anomalies missed by FISH. PMID:27093450
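The two variables described in this abstract can be illustrated with a short sketch. It computes the mean edge length of a Euclidean minimum spanning tree over nucleus centroids (Prim's algorithm), the coefficient of variation of stain intensities, and a generic two-variable logistic function. Several points are assumptions: the abstract does not say how the distance is made dimensionless, so the raw mean edge length is returned; the population standard deviation is used for the coefficient of variation; and no regression coefficients are published in the abstract, so they are left as caller-supplied parameters. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def mst_mean_edge(points):
    """Mean edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known link from each node to the tree
    best[0] = 0.0              # start growing the tree from the first point
    total = 0.0
    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total / (n - 1)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    return pstdev(values) / mean(values)


def logistic_probability(x1, x2, b0, b1, b2):
    """Generic bivariate logistic model; coefficients must be supplied by the caller."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


# Hypothetical nucleus centroids (pixels) and stain intensities.
centroids = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
intensities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mst_mean_edge(centroids), coefficient_of_variation(intensities))
```

The quadratic-time version of Prim's algorithm is used because it needs no priority queue and is adequate for the few hundred nuclei of a typical field of view.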
Szabo, Eva; Miller, Mark Steven; Lubet, Ronald A.; You, Ming; Wang, Yian
2017-01-01
Due to exposure to environmental toxicants, a “field cancerization” effect occurs in the lung resulting in the development of a field of initiated but morphologically normal-appearing cells in the damaged epithelium of bronchial airways with dysregulated gene expression patterns. Using a mouse model of lung squamous cell carcinoma (SCC), we performed transcriptome sequencing (RNA-Seq) to profile bronchial airway gene expression and found activation of the PI3K and Myc signaling networks in cytologically normal bronchial airway epithelial cells of mice with preneoplastic lung SCC lesions, which was reversed by treatment with the PI3K inhibitor XL-147 and pioglitazone, respectively. Activated MYC signaling was also present in premalignant and tumor tissues from human lung SCC patients. In addition, we identified a key microRNA, mmu-miR-449c-5p, whose suppression significantly up-regulated Myc expression in the normal bronchial airway epithelial cells of mice with early-stage SCC lesions. We developed a novel bronchial genomic classifier in mice and validated it in humans. In the classifier, Ppbp (pro-platelet basic protein) was overexpressed 115-fold in the bronchial airways of mice with preneoplastic lung SCC lesions. This is the first report that demonstrates Ppbp as a novel biomarker in the bronchial airway for lung cancer diagnosis. PMID:27935865
Evolution and expression analysis of the grape (Vitis vinifera L.) WRKY gene family.
Guo, Chunlei; Guo, Rongrong; Xu, Xiaozhao; Gao, Min; Li, Xiaoqin; Song, Junyang; Zheng, Yi; Wang, Xiping
2014-04-01
WRKY proteins comprise a large family of transcription factors that play important roles in plant defence regulatory networks, including responses to various biotic and abiotic stresses. To date, no large-scale study of WRKY genes has been undertaken in grape (Vitis vinifera L.). In this study, a total of 59 putative grape WRKY genes (VvWRKY) were identified and renamed on the basis of their respective chromosome distribution. A multiple sequence alignment analysis using all predicted grape WRKY gene coding sequences, together with those from Arabidopsis thaliana and tomato (Solanum lycopersicum), indicated that the 59 VvWRKY genes can be classified into three main groups (I-III). An evaluation of the duplication events suggested that several WRKY genes arose before the divergence of the grape and Arabidopsis lineages. Moreover, expression profiles derived from semiquantitative PCR and real-time quantitative PCR analyses showed distinct expression patterns in various tissues and in response to different treatments. Four VvWRKY genes showed significantly higher expression in roots or leaves, 55 responded to varying degrees to at least one abiotic stress treatment, and the expression of 38 was altered following powdery mildew (Erysiphe necator) infection. Most VvWRKY genes were downregulated in response to abscisic acid or salicylic acid treatments, while the expression of a subset was upregulated by methyl jasmonate or ethylene treatments.
Evolution and expression analysis of the grape (Vitis vinifera L.) WRKY gene family
Guo, Chunlei; Guo, Rongrong; Wang, Xiping
2014-01-01
WRKY proteins comprise a large family of transcription factors that play important roles in plant defence regulatory networks, including responses to various biotic and abiotic stresses. To date, no large-scale study of WRKY genes has been undertaken in grape (Vitis vinifera L.). In this study, a total of 59 putative grape WRKY genes (VvWRKY) were identified and renamed on the basis of their respective chromosome distribution. A multiple sequence alignment analysis using all predicted grape WRKY gene coding sequences, together with those from Arabidopsis thaliana and tomato (Solanum lycopersicum), indicated that the 59 VvWRKY genes can be classified into three main groups (I–III). An evaluation of the duplication events suggested that several WRKY genes arose before the divergence of the grape and Arabidopsis lineages. Moreover, expression profiles derived from semiquantitative PCR and real-time quantitative PCR analyses showed distinct expression patterns in various tissues and in response to different treatments. Four VvWRKY genes showed significantly higher expression in roots or leaves, 55 responded to varying degrees to at least one abiotic stress treatment, and the expression of 38 was altered following powdery mildew (Erysiphe necator) infection. Most VvWRKY genes were downregulated in response to abscisic acid or salicylic acid treatments, while the expression of a subset was upregulated by methyl jasmonate or ethylene treatments. PMID:24510937
Gene Discovery in Bladder Cancer Progression using cDNA Microarrays
Sanchez-Carbayo, Marta; Socci, Nicholas D.; Lozano, Juan Jose; Li, Wentian; Charytonowicz, Elizabeth; Belbin, Thomas J.; Prystowsky, Michael B.; Ortiz, Angel R.; Childs, Geoffrey; Cordon-Cardo, Carlos
2003-01-01
To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more importantly, it separated carcinoma in situ from papillary superficial lesions and subgroups within early-stage and invasive tumors displaying different overall survival. Additionally, it recognized early-stage tumors showing gene profiles similar to invasive disease. Different techniques including standard t-test, single-gene logistic regression, and support vector machine algorithms were applied to identify relevant genes involved in bladder cancer progression. Cytokeratin 20, neuropilin-2, p21, and p33ING1 were selected among the top ranked molecular targets differentially expressed and validated by immunohistochemistry using tissue microarrays (n = 173). Their expression patterns were significantly associated with pathological stage, tumor grade, and altered retinoblastoma (RB) expression. Moreover, p33ING1 expression levels were significantly associated with overall survival. Analysis of the annotation of the most significant genes revealed the relevance of critical genes and pathways during bladder cancer progression, including the overexpression of oncogenic genes such as DEK in superficial tumors or immune response genes such as Cd86 antigen in invasive disease. Gene profiling successfully classified bladder tumors based on their progression and clinical outcome. The present study has identified molecular biomarkers of potential clinical significance and critical molecular targets associated with bladder cancer progression. PMID:12875971
Shared Gene Expression Alterations in Nasal and Bronchial Epithelium for Lung Cancer Detection.
2017-07-01
We previously derived and validated a bronchial epithelial gene expression biomarker to detect lung cancer in current and former smokers. Given that bronchial and nasal epithelial gene expression are similarly altered by cigarette smoke exposure, we sought to determine if cancer-associated gene expression might also be detectable in the more readily accessible nasal epithelium. Nasal epithelial brushings were prospectively collected from current and former smokers undergoing diagnostic evaluation for pulmonary lesions suspicious for lung cancer in the AEGIS-1 (n = 375) and AEGIS-2 (n = 130) clinical trials and gene expression profiled using microarrays. All statistical tests were two-sided. We identified 535 genes that were differentially expressed in the nasal epithelium of AEGIS-1 patients diagnosed with lung cancer vs those with benign disease after one year of follow-up (P < .001). Using bronchial gene expression data from the AEGIS-1 patients, we found statistically significant concordant cancer-associated gene expression alterations between the two airway sites (P < .001). Differentially expressed genes in the nose were enriched for genes associated with the regulation of apoptosis and immune system signaling. A nasal lung cancer classifier derived in the AEGIS-1 cohort that combined clinical factors (age, smoking status, time since quit, mass size) and nasal gene expression (30 genes) had statistically significantly higher area under the curve (0.81; 95% confidence interval [CI] = 0.74 to 0.89, P = .01) and sensitivity (0.91; 95% CI = 0.81 to 0.97, P = .03) than a clinical-factor only model in independent samples from the AEGIS-2 cohort. These results support the conclusion that the airway epithelial field of lung cancer-associated injury in ever smokers extends to the nose and demonstrate the potential of using nasal gene expression as a noninvasive biomarker for lung cancer detection. © The Author 2017. Published by Oxford University Press. All rights reserved.
Wang, S-N; Shan, S; Zheng, Y; Peng, Y; Lu, Z-Y; Yang, Y-Q; Li, R-J; Zhang, Y-J; Guo, Y-Y
2017-08-01
Odorant receptors (ORs) expressed in the antennae of parasitoid wasps are responsible for detection of various lipophilic airborne molecules. In the present study, 107 novel OR genes were identified from Microplitis mediator antennal transcriptome data. Phylogenetic analysis of the set of OR genes from M. mediator and Microplitis demolitor revealed that M. mediator OR (MmedOR) genes can be classified into different subfamilies, and the majority of MmedORs in each subfamily shared high sequence identities and clear orthologous relationships to M. demolitor ORs. Within a subfamily, six MmedOR genes, MmedOR98, 124, 125, 126, 131 and 155, shared a similar gene structure and were tightly linked in the genome. To evaluate whether the clustered MmedOR genes share common regulatory features, the transcription profile and expression characteristics of the six closely related OR genes were investigated in M. mediator. Rapid amplification of cDNA ends-PCR experiments revealed that the OR genes within the cluster were transcribed as single mRNAs, and a bicistronic mRNA for two adjacent genes (MmedOR124 and MmedOR98) was also detected in female antennae by reverse transcription PCR. In situ hybridization experiments indicated that each OR gene within the cluster was expressed in a different number of cells. Moreover, there was no co-expression of the two highly related OR genes, MmedOR124 and MmedOR98, which appeared to be individually expressed in a distinct population of neurons. Overall, there were distinct expression profiles of closely related MmedOR genes from the same cluster in M. mediator. These data provide a basic understanding of the olfactory coding in parasitoid wasps. © 2017 The Royal Entomological Society.
Chen, Xue; Chen, Zhu; Zhao, Hualin; Zhao, Yang; Cheng, Beijiu; Xiang, Yan
2014-01-01
Homeodomain-leucine zipper (HD-Zip) proteins, a group of homeobox transcription factors, participate in various aspects of normal plant growth and developmental processes as well as environmental responses. To date, no overall analysis or expression profiling of the HD-Zip gene family in soybean (Glycine max) has been reported. An investigation of the soybean genome revealed 88 putative HD-Zip genes. These genes were classified into four subfamilies, I to IV, based on phylogenetic analysis. In each subfamily, the constituent parts of gene structure and motif were relatively conserved. A total of 87 out of 88 genes were distributed unequally on 20 chromosomes with 36 segmental duplication events, indicating that segmental duplication is important for the expansion of the HD-Zip family. Analysis of the Ka/Ks ratios showed that the duplicated genes of the HD-Zip family basically underwent purifying selection with restrictive functional divergence after the duplication events. Analysis of expression profiles showed that 80 genes are differentially expressed across 14 tissues, and 59 HD-Zip genes are differentially expressed under salinity and drought stress, with 20 paralogous pairs showing nearly identical expression patterns and three paralogous pairs diversifying significantly under drought stress. Quantitative real-time RT-PCR (qRT-PCR) analysis of six paralogous pairs of 12 selected soybean HD-Zip genes under both drought and salinity stress confirmed their stress-inducible expression patterns. This study presents a thorough overview of the soybean HD-Zip gene family and provides a new perspective on the evolution of this gene family. The results indicate that HD-Zip family genes may be involved in many plant responses to stress conditions. Additionally, this study provides a solid foundation for uncovering the biological roles of HD-Zip genes in soybean growth and development.
Comparative transcriptome analysis of the Asteraceae halophyte Karelinia caspica under salt stress.
Zhang, Xia; Liao, Maoseng; Chang, Dan; Zhang, Fuchun
2014-12-17
Much attention has been given to the potential of halophytes as sources of tolerance traits for introduction into cereals. However, a great deal remains unknown about the diverse mechanisms employed by halophytes to cope with salinity. To characterize salt tolerance mechanisms underlying Karelinia caspica, an Asteraceae halophyte, we performed large-scale transcriptomic analysis using a high-throughput Illumina sequencing platform. Comparative gene expression analysis was performed to correlate the effects of salt stress and ABA regulation at the molecular level. Total sequence reads generated by pyrosequencing were assembled into 287,185 non-redundant transcripts with an average length of 652 bp. Using the BLAST function in the Swiss-Prot, NCBI nr, GO, KEGG, and KOG databases, a total of 216,416 coding sequences associated with known proteins were annotated. Among these, 35,533 unigenes were classified into 69 gene ontology categories, and 18,378 unigenes were classified into 202 known pathways. Based on the fold changes observed when comparing the salt stress and control samples, 60,127 unigenes were differentially expressed, with 38,122 and 22,005 up- and down-regulated, respectively. Several of the differentially expressed genes are known to be involved in the signaling pathway of the plant hormone ABA, including ABA metabolism, transport, and sensing as well as the ABA signaling cascade. Transcriptome profiling of K. caspica contributes to a comprehensive understanding of K. caspica at the molecular level. Moreover, the global survey of differentially expressed genes in this species under salt stress and analyses of the effects of salt stress and ABA regulation will contribute to the identification and characterization of genes and molecular mechanisms underlying salt stress responses in Asteraceae plants.
Microarray gene expression profiling using core biopsies of renal neoplasia.
Rogers, Craig G; Ditlev, Jonathon A; Tan, Min-Han; Sugimura, Jun; Qian, Chao-Nan; Cooper, Jeff; Lane, Brian; Jewett, Michael A; Kahnoski, Richard J; Kort, Eric J; Teh, Bin T
2009-01-01
We investigate the feasibility of using microarray gene expression profiling technology to analyze core biopsies of renal tumors for classification of tumor histology. Core biopsies were obtained ex vivo from 7 renal tumors (comprising four histological subtypes) following radical nephrectomy using 18-gauge biopsy needles. RNA was isolated from these samples and, in the case of biopsy samples, amplified by in vitro transcription. Microarray analysis was then used to quantify the mRNA expression patterns in these samples relative to non-diseased renal tissue mRNA. Genes with significant variation across all non-biopsy tumor samples were identified, and the relationship between tumor and biopsy samples in terms of expression levels of these genes was then quantified by Euclidean distance and visualized by complete linkage clustering. Final pathologic assessment of kidney tumors demonstrated clear cell renal cell carcinoma (4), oncocytoma (1), angiomyolipoma (1) and adrenocortical carcinoma (1). By Euclidean distance, five of the seven biopsy samples were most similar in gene expression to the resected tumors from which they were derived. All seven biopsies were assigned to the correct histological class by hierarchical clustering. We demonstrate the feasibility of gene expression profiling of core biopsies of renal tumors to classify tumor histology.
Microarray gene expression profiling using core biopsies of renal neoplasia
Rogers, Craig G.; Ditlev, Jonathon A.; Tan, Min-Han; Sugimura, Jun; Qian, Chao-Nan; Cooper, Jeff; Lane, Brian; Jewett, Michael A.; Kahnoski, Richard J.; Kort, Eric J.; Teh, Bin T.
2009-01-01
We investigate the feasibility of using microarray gene expression profiling technology to analyze core biopsies of renal tumors for classification of tumor histology. Core biopsies were obtained ex vivo from 7 renal tumors (comprising four histological subtypes) following radical nephrectomy using 18-gauge biopsy needles. RNA was isolated from these samples and, in the case of biopsy samples, amplified by in vitro transcription. Microarray analysis was then used to quantify the mRNA expression patterns in these samples relative to non-diseased renal tissue mRNA. Genes with significant variation across all non-biopsy tumor samples were identified, and the relationship between tumor and biopsy samples in terms of expression levels of these genes was then quantified by Euclidean distance and visualized by complete linkage clustering. Final pathologic assessment of kidney tumors demonstrated clear cell renal cell carcinoma (4), oncocytoma (1), angiomyolipoma (1) and adrenocortical carcinoma (1). By Euclidean distance, five of the seven biopsy samples were most similar in gene expression to the resected tumors from which they were derived. All seven biopsies were assigned to the correct histological class by hierarchical clustering. We demonstrate the feasibility of gene expression profiling of core biopsies of renal tumors to classify tumor histology. PMID:19966938
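The matching step in this abstract, pairing each biopsy with its most similar resected tumor by Euclidean distance, can be sketched as follows, together with the complete-linkage rule that defines the distance between two clusters as their largest pairwise distance. The function names, sample labels, and expression vectors are hypothetical; the study's actual gene set and values are not given in the abstract.

```python
import math


def nearest_tumor(biopsy, tumors):
    """Return the label of the tumor profile closest to the biopsy profile."""
    return min(tumors, key=lambda name: math.dist(biopsy, tumors[name]))


def complete_linkage(cluster_a, cluster_b):
    """Complete-linkage distance: the largest pairwise Euclidean distance."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical expression vectors over the same three genes.
tumors = {"T1": [1.0, 0.0, 2.0], "T2": [0.0, 3.0, 1.0]}
biopsies = {"B1": [0.9, 0.2, 1.8], "B2": [0.1, 2.5, 1.2]}

matches = {name: nearest_tumor(profile, tumors) for name, profile in biopsies.items()}
print(matches)
```

A full hierarchical clustering would repeatedly merge the two clusters with the smallest complete-linkage distance; only the distance rule itself is shown here.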
Transcriptomic investigation of meat tenderness in two Italian cattle breeds.
Bongiorni, S; Gruber, C E M; Bueno, S; Chillemi, G; Ferrè, F; Failla, S; Moioli, B; Valentini, A
2016-06-01
Our objectives for this study were to understand the biological basis of meat tenderness and to provide an overview of the gene expression profiles related to meat quality as a tool for selection. Through deep mRNA sequencing, we analyzed gene expression in muscle tissues of two Italian cattle breeds: Maremmana and Chianina. We uncovered several differentially expressed genes that encode proteins belonging to a family of tripartite motif proteins, which are involved in growth, cell differentiation and apoptosis, such as TRIM45, or play an essential role in regulating skeletal muscle differentiation and the regeneration of adult skeletal muscle, such as TRIM32. Other differentially expressed genes (SCN2B, SLC9A7 and KCNK3) emphasize the involvement of potassium-sodium pumps in tender meat. By mapping splice junctions in RNA-Seq reads, we found significant differences in gene isoform expression levels. The PRKAG3 gene, which is involved in the regulation of energy metabolism, showed four isoforms that were differentially expressed. This distinct pattern of PRKAG3 gene expression could indicate impaired glycogen storage in skeletal muscle, and consequently, this gene very likely has a role in the tenderization process. Furthermore, with this deep RNA sequencing, we captured a high number of expressed SNPs; for example, we found 1462 homozygous SNPs showing the alternative allele with a 100% frequency when comparing tender and tough meat. SNPs were then classified into categories by their position and also by their effect on gene coding (174 non-synonymous polymorphisms) based on the available UMD_3.1 annotations. © 2016 Stichting International Foundation for Animal Genetics.
Learning a Markov Logic network for supervised gene regulatory network inference
2013-01-01
Background Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. Results We propose to learn a Markov Logic network, i.e., a set of weighted rules that conclude on the predicate “regulates”, starting from a known gene regulatory network involved in the proliferation/differentiation switch of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. 
However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. Conclusions The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge. PMID:24028533
Learning a Markov Logic network for supervised gene regulatory network inference.
Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence
2013-09-12
Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e., a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the proliferation/differentiation switch of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. 
However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.
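Asymmetric bagging, which the abstract uses to cope with unbalanced training data, can be sketched as follows. This is a sketch under stated assumptions rather than the authors' method: it reads "asymmetric bagging" in the common sense of keeping every minority-class example in each bag and bootstrapping an equal number of majority-class examples, and it swaps the Markov Logic Network base learner for a simple nearest-centroid scorer. All function names and the synthetic gene-pair features are hypothetical.

```python
import numpy as np

def fit_centroids(x, y):
    """Stand-in base learner: store one centroid per class."""
    return x[y == 1].mean(axis=0), x[y == 0].mean(axis=0)

def predict_score(model, x):
    """Score in (0, 1); higher means closer to the positive centroid."""
    pos_c, neg_c = model
    d_pos = np.linalg.norm(x - pos_c, axis=1)
    d_neg = np.linalg.norm(x - neg_c, axis=1)
    return 1.0 / (1.0 + np.exp(d_pos - d_neg))

def asymmetric_bagging(x, y, n_bags=10, seed=0):
    """Each bag = all positives + an equally sized bootstrap of negatives."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_bags):
        sampled_neg = rng.choice(neg_idx, size=len(pos_idx), replace=True)
        idx = np.concatenate([pos_idx, sampled_neg])
        models.append(fit_centroids(x[idx], y[idx]))
    return models

def bagged_predict(models, x):
    """Average the scores of the individual base learners."""
    return np.mean([predict_score(m, x) for m in models], axis=0)

# Synthetic, unbalanced data: 10 "regulation" pairs vs 200 "no regulation" pairs.
data_rng = np.random.default_rng(1)
x = np.vstack([data_rng.normal(2.0, 1.0, size=(10, 5)),
               data_rng.normal(0.0, 1.0, size=(200, 5))])
y = np.array([1] * 10 + [0] * 200)

models = asymmetric_bagging(x, y, n_bags=10)
scores = bagged_predict(models, x)
```

Because every bag is balanced, no single base learner is dominated by the majority class, and averaging over bags reduces the variance introduced by subsampling the negatives.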
Zaas, Aimee K.; Chen, Minhua; Varkey, Jay; Veldman, Timothy; Hero, Alfred O.; Lucas, Joseph; Huang, Yongsheng; Turner, Ronald; Gilbert, Anthony; Lambkin-Williams, Robert; Øien, N. Christine; Nicholson, Bradly; Kingsmore, Stephen; Carin, Lawrence; Woods, Christopher W.; Ginsburg, Geoffrey S.
2010-01-01
Summary Acute respiratory infections (ARI) are a common reason for seeking medical attention and the threat of pandemic influenza will likely add to these numbers. Using human viral challenge studies with live rhinovirus, respiratory syncytial virus, and influenza A, we developed peripheral blood gene expression signatures that distinguish individuals with symptomatic ARI from uninfected individuals with > 95% accuracy. We validated this “acute respiratory viral” signature - encompassing genes with a known role in host defense against viral infections - across each viral challenge. We also validated the signature in an independently acquired dataset for influenza A and classified infected individuals from healthy controls with 100% accuracy. In the same dataset, we could also distinguish viral from bacterial ARIs (93% accuracy). These results demonstrate that ARIs induce changes in human peripheral blood gene expression that can be used to diagnose a viral etiology of respiratory infection and triage symptomatic individuals. PMID:19664979
Dunne, Philip D.; Alderdice, Matthew; O'Reilly, Paul G.; Roddy, Aideen C.; McCorry, Amy M. B.; Richman, Susan; Maughan, Tim; McDade, Simon S.; Johnston, Patrick G.; Longley, Daniel B.; Kay, Elaine; McArt, Darragh G.; Lawler, Mark
2017-01-01
Stromal-derived intratumoural heterogeneity (ITH) has been shown to undermine molecular stratification of patients into appropriate prognostic/predictive subgroups. Here, using several clinically relevant colorectal cancer (CRC) gene expression signatures, we assessed the susceptibility of these signatures to the confounding effects of ITH using gene expression microarray data obtained from multiple tumour regions of a cohort of 24 patients, including central tumour, the tumour invasive front and lymph node metastasis. Sample clustering alongside correlative assessment revealed variation in the ability of each signature to cluster samples according to patient-of-origin rather than region-of-origin within the multi-region dataset. Signatures focused on cancer-cell intrinsic gene expression were found to produce more clinically useful, patient-centred classifiers, as exemplified by the CRC intrinsic signature (CRIS), which robustly clustered samples by patient-of-origin rather than region-of-origin. These findings highlight the potential of cancer-cell intrinsic signatures to reliably stratify CRC patients by minimising the confounding effects of stromal-derived ITH. PMID:28561046
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
Vallejos, Catalina A.; Marioni, John C.; Richardson, Sylvia
2015-01-01
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach. PMID:26107944
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.
Vallejos, Catalina A; Marioni, John C; Richardson, Sylvia
2015-06-01
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated to high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable supports the efficacy of our approach.
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2011-01-01
Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
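The idea of building a co-expression network and splitting it into modules can be sketched as below. The assumptions are deliberate simplifications: plain Pearson correlation stands in for the weighted correlation coefficient mentioned in the abstract, modules are taken to be connected components of the thresholded network, and the threshold of 0.8, the toy expression matrix, and the function name `coexpression_modules` are all illustrative.

```python
import numpy as np

def coexpression_modules(expr, threshold=0.8):
    """expr: genes x samples matrix. Returns (correlation matrix, module ids)."""
    corr = np.corrcoef(expr)              # gene-to-gene Pearson correlation
    adj = np.abs(corr) >= threshold       # keep only strong co-expression edges
    np.fill_diagonal(adj, False)
    n = len(expr)
    module = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if module[start] != -1:
            continue
        # Depth-first search labels one connected component as one module.
        stack = [start]
        module[start] = current
        while stack:
            gene = stack.pop()
            for neighbour in np.flatnonzero(adj[gene]):
                if module[neighbour] == -1:
                    module[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return corr, module

# Toy data: two uncorrelated expression patterns across 8 samples,
# each carried by three genes at different amplitudes plus small noise.
rng = np.random.default_rng(0)
pattern_a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
pattern_b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
expr = np.vstack([scale * pattern_a for scale in (1, 2, 3)] +
                 [scale * pattern_b for scale in (1, 2, 3)])
expr = expr + rng.normal(0.0, 0.1, size=expr.shape)

corr, module = coexpression_modules(expr)
```

Genes that follow the same pattern end up in the same module regardless of amplitude, since correlation is scale-invariant.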
Govindaraj, Lekha; Gupta, Tania; Esvaran, Vijaya Gowri; Awasthi, Arvind Kumar; Ponnuvel, Kangayam M
2016-04-01
Sugar transporters play an essential role in controlling carbohydrate transport and are responsible for mediating the movement of sugars into cells. These genes exist as large multigene families within the insect genome. In insects, sugar transporters not only have a role in sugar transport, but may also act as receptors for virus entry. Genome-wide annotation of the silkworm Bombyx mori (B. mori) revealed 100 putative sugar transporter (BmST) genes, which exist as a large multigene family and were classified into 11 subfamilies through phylogenetic analysis. Chromosomes 27, 26 and 20 were found to possess the highest number of BmST paralogous genes, harboring 22, 7 and 6 genes, respectively. These genes occurred in clusters exhibiting the phenomenon of tandem gene duplication. The ovary, silk gland, hemocytes, midgut and Malpighian tubules were the tissues/cells enriched with BmST gene expression. The BmST gene BGIBMGA001498 had the maximum number of EST transcripts (134) and was expressed exclusively in the Malpighian tubule. The EST transcripts of the clustered BmST genes on chromosome 27 were distributed across various tissues, such as the testis, ovary, silk gland, Malpighian tubule, maxillary galea, prothoracic gland, epidermis, fat body and midgut. Three sugar transporter genes (BmST) were constitutively expressed in the susceptible race and were downregulated upon BmNPV infection at 12 h post infection (hpi). The expression pattern of these three genes was validated through real-time PCR in the midgut tissues at different time intervals from 0 to 30 hpi. In the susceptible B. mori race, sugar transporter genes were constitutively expressed, making the host succumb to viral infection. Copyright © 2015 Elsevier B.V. All rights reserved.
Koul, Sweaty; Khandrika, Lakshmipathi; Meacham, Randall B.; Koul, Hari K.
2012-01-01
Nephrolithiasis is a multi-factorial disease which, in the majority of cases, involves the renal deposition of calcium oxalate. Oxalate is a metabolic end product excreted primarily by the kidney. Previous studies have shown that elevated levels of oxalate are detrimental to the renal epithelial cells; however, interactions between oxalate and renal epithelial cells are not completely understood. In this study, we utilized an unbiased approach of gene expression profiling using Affymetrix HG_U133_plus2 gene chips to understand the global gene expression changes in human renal epithelial cells [HK-2] after exposure to oxalate. We analyzed the expression of 47,000 transcripts and variants, including 38,500 well characterized human genes, in the HK-2 cells after 4 hours and 24 hours of oxalate exposure. Gene expression was compared among replicates as per the Affymetrix statistical program. Gene expression among various groups was compared using various analytical tools, and differentially expressed genes were classified according to the Gene Ontology Functional Category. The results from this study show that oxalate exposure induces significant expression changes in many genes. We show for the first time that oxalate exposure both induces and shuts off genes differentially. We found 750 up-regulated and 2276 down-regulated genes which have not been reported before. Our results also show that exposure of renal cells to oxalate results in the regulation of genes that are associated with specific molecular functions, biological processes, and cellular components. In addition, we have identified a set of 20 genes that is differentially regulated by oxalate irrespective of duration of exposure and may be useful in monitoring oxalate nephrotoxicity. Taken together, our studies profile global gene expression changes and provide a unique insight into oxalate renal cell interactions and oxalate nephrotoxicity. PMID:23028475
Pangeni, Rajendra P; Zhang, Zhou; Alvarez, Angel A; Wan, Xuechao; Sastry, Namratha; Lu, Songjian; Shi, Taiping; Huang, Tianzhi; Lei, Charles X; James, C David; Kessler, John A; Brennan, Cameron W; Nakano, Ichiro; Lu, Xinghua; Hu, Bo; Zhang, Wei; Cheng, Shi-Yuan
2018-06-21
Glioma stem cells (GSCs), a subpopulation of tumor cells, contribute to tumor heterogeneity and therapy resistance. Gene expression profiling classified glioblastoma (GBM) and GSCs into four transcriptomically-defined subtypes. Here, we determined the DNA methylation signatures in transcriptomically pre-classified GSC and GBM bulk tumors subtypes. We hypothesized that these DNA methylation signatures correlate with gene expression and are uniquely associated either with only GSCs or only GBM bulk tumors. Additional methylation signatures may be commonly associated with both GSCs and GBM bulk tumors, i.e., common to non-stem-like and stem-like tumor cell populations and correlating with the clinical prognosis of glioma patients. We analyzed Illumina 450K methylation array and expression data from a panel of 23 patient-derived GSCs. We referenced these results with The Cancer Genome Atlas (TCGA) GBM datasets to generate methylomic and transcriptomic signatures for GSCs and GBM bulk tumors of each transcriptomically pre-defined tumor subtype. Survival analyses were carried out for these signature genes using publicly available datasets, including from TCGA. We report that DNA methylation signatures in proneural and mesenchymal tumor subtypes are either unique to GSCs, unique to GBM bulk tumors, or common to both. Further, dysregulated DNA methylation correlates with gene expression and clinical prognoses. Additionally, many previously identified transcriptionally-regulated markers are also dysregulated due to DNA methylation. The subtype-specific DNA methylation signatures described in this study could be useful for refining GBM sub-classification, improving prognostic accuracy, and making therapeutic decisions.
Expression analysis of genes encoding double B-box zinc finger proteins in maize.
Li, Wenlan; Wang, Jingchao; Sun, Qi; Li, Wencai; Yu, Yanli; Zhao, Meng; Meng, Zhaodong
2017-11-01
The B-box proteins play key roles in plant development. The double B-box (DBB) family is one of the subfamilies of the B-box family, with two B-box domains and without a CCT domain. In this study, 12 maize double B-box genes (ZmDBBs) were identified through a genome-wide survey. Phylogenetic analysis of DBB proteins from maize, rice, Sorghum bicolor, Arabidopsis, and poplar classified them into five major clades. Gene duplication analysis indicated that segmental duplications made a large contribution to the expansion of ZmDBBs. Furthermore, a large number of cis-acting regulatory elements related to plant development, response to light and phytohormones were identified in the promoter regions of the ZmDBB genes. The expression patterns of the ZmDBB genes in various tissues and different developmental stages demonstrated that ZmDBBs might play essential roles in plant development, and some ZmDBB genes might have unique functions in specific developmental stages. In addition, several ZmDBB genes showed diurnal expression patterns. The expression levels of some ZmDBB genes changed significantly under light/dark treatment conditions and phytohormone treatments, implying that they might participate in light and hormone signaling pathways. Our results will provide new information to better understand the complexity of the DBB gene family in maize.
Yang, Yan; Zhou, Yuan; Chi, Yingjun; Fan, Baofang; Chen, Zhixiang
2017-12-19
WRKY proteins are a superfamily of plant transcription factors with important roles in plants. WRKY proteins have been extensively analyzed in plant species including Arabidopsis and rice. Here we report characterization of soybean WRKY gene family and their functional analysis in resistance to soybean cyst nematode (SCN), the most important soybean pathogen. Through search of the soybean genome, we identified 174 genes encoding WRKY proteins that can be classified into seven groups as established in other plants. WRKY variants including a WRKY-related protein unique to legumes have also been identified. Expression analysis reveals both diverse expression patterns in different soybean tissues and preferential expression of specific WRKY groups in certain tissues. Furthermore, a large number of soybean WRKY genes were responsive to salicylic acid. To identify soybean WRKY genes that promote soybean resistance to SCN, we first screened soybean WRKY genes for enhancing SCN resistance when over-expressed in transgenic soybean hairy roots. To confirm the results, we transformed five WRKY genes into a SCN-susceptible soybean cultivar and generated transgenic soybean lines. Transgenic soybean lines overexpressing three WRKY transgenes displayed increased resistance to SCN. Thus, WRKY genes could be explored to develop new soybean cultivars with enhanced resistance to SCN.
Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou
2013-08-01
Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects that damage wooden structures. Current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the differentially expressed genes mediated by chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites induced by a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron) for 24 h, by using the RNA ligase-mediated rapid amplification of cDNA ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.
Guo, Zhenhua; Adomas, Aleksandra B; Jackson, Erin D; Qin, Hong; Townsend, Jeffrey P
2011-06-01
We investigated the mechanism underlying the natural variation in longevity within natural populations using the model budding yeast, Saccharomyces cerevisiae. We analyzed whole-genome gene expression in four progeny of a natural S. cerevisiae strain that display differential replicative aging. Genes with different expression levels in short- and long-lived strains were classified disproportionately into metabolism, transport, development, transcription or cell cycle, and organelle organization (mitochondrial, chromosomal, and cytoskeletal). With several independent validating experiments, we detected 15 genes with consistent differential expression levels between the long- and the short-lived progeny. Among those 15, SIR2, HSP30, and TIM17 were upregulated in long-lived strains, which is consistent with the known effects of gene silencing, stress response, and mitochondrial function on aging. The link between SIR2 and yeast natural life span variation offers some intriguing ties to the allelic association of the human homolog SIRT1 to visceral obesity and metabolic response to lifestyle intervention. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Wang, Meng; Xu, Zongchang; Ding, Anming; Kong, Yingzhen
2018-05-24
Xyloglucan endotransglucosylase/hydrolase genes (XTHs) encode enzymes required for the reconstruction and modification of xyloglucan backbones, which result in changes of cell wall extensibility during growth. A total of 56 NtXTH genes were identified from common tobacco, and 50 cDNA fragments were verified by PCR amplification. The 56 NtXTH genes could be classified into two subfamilies, Group I/II and Group III, according to their phylogenetic relationships. The gene structure, chromosomal localization, conserved protein domain prediction, sub-cellular localization of NtXTH proteins and evolutionary relationships among Nicotiana tabacum, Nicotiana sylvestris, Nicotiana tomentosiformis, Arabidopsis, and rice were also analyzed. The NtXTH expression profiles analyzed by the TobEA database and qRT-PCR revealed that NtXTHs display different expression patterns in different tissues. Notably, the expression patterns of 12 NtXTHs responding to environmental stresses, including salinity, alkali, heat and chilling, and to plant hormones, including IAA and brassinolide, were characterized. These results should be useful for functional studies of NtXTHs during different growth cycles and stresses.
On the statistical assessment of classifiers using DNA microarray data
Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F
2006-01-01
Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependant on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. 
Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019), respectively. Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, as identified by the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Performance then degrades progressively, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis is discussed. Conclusions The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171
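The assessment procedure in this abstract (a Leave-K-Out cross-validation error, a permutation test for its significance, and a signal-to-noise statistic for ranking genes) can be sketched on synthetic data. Several things here are assumptions made for illustration: a nearest-centroid classifier replaces the WVA, RLS and SVM classifiers of the paper, the signal-to-noise statistic is taken in its usual form (difference of class means over the sum of class standard deviations), and the data, split counts and function names are invented.

```python
import numpy as np

def nearest_centroid_error(x_train, y_train, x_test, y_test):
    """Stand-in classifier: assign each test sample to the nearer class centroid."""
    c1 = x_train[y_train == 1].mean(axis=0)
    c0 = x_train[y_train == 0].mean(axis=0)
    closer_to_1 = (np.linalg.norm(x_test - c1, axis=1)
                   < np.linalg.norm(x_test - c0, axis=1))
    return np.mean(closer_to_1.astype(int) != y_test)

def leave_k_out_error(x, y, k, n_splits, rng):
    """Average test error over random splits that hold out k samples.
    k must stay below the smaller class size so both classes remain in training."""
    errors = []
    for _ in range(n_splits):
        order = rng.permutation(len(y))
        test, train = order[:k], order[k:]
        errors.append(nearest_centroid_error(x[train], y[train], x[test], y[test]))
    return float(np.mean(errors))

def permutation_p_value(x, y, k, n_splits, n_perm, rng):
    """Fraction of label shufflings whose error is at most the observed error."""
    observed = leave_k_out_error(x, y, k, n_splits, rng)
    null = [leave_k_out_error(x, rng.permutation(y), k, n_splits, rng)
            for _ in range(n_perm)]
    p = (1 + sum(e <= observed for e in null)) / (1 + n_perm)
    return observed, p

def signal_to_noise(x, y):
    """Per-gene (mean1 - mean0) / (std1 + std0)."""
    a, b = x[y == 1], x[y == 0]
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1))

# Synthetic data: 20 normal and 20 tumor samples, 50 genes, 10 of them informative.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
x = rng.normal(size=(40, 50))
x[y == 1, :10] += 2.0

error_rate, p_value = permutation_p_value(x, y, k=10, n_splits=20, n_perm=200, rng=rng)
s2n = signal_to_noise(x, y)
```

The p-value answers the question the abstract poses about statistical significance: how often a classifier trained on meaningless labels would reach an error rate at least as low as the observed one.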
Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions
Kluger, Yuval; Basri, Ronen; Chang, Joseph T.; Gerstein, Mark
2003-01-01
Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find “marker genes” that are differentially expressed in particular sets of “conditions.” We have developed a method that simultaneously clusters genes and conditions, finding distinctive “checkerboard” patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data). PMID:12671006
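The core of the idea, that a checkerboard shows up in singular vectors after normalization, can be sketched for the simplest 2 x 2 case. This sketch assumes the independent row/column rescaling variant (dividing each entry by the square roots of its row and column sums) and splits rows and columns by the sign of the second singular vector pair. The paper describes several normalization variants and a more general clustering step, neither of which is reproduced here. The data and the function name `spectral_bicluster_2x2` are invented.

```python
import numpy as np

def spectral_bicluster_2x2(a):
    """Split rows and columns of a positive matrix into two groups each.

    Rescales by row and column sums, then uses the second left and right
    singular vectors; the first pair is the trivial, uninformative one.
    """
    row_scale = 1.0 / np.sqrt(a.sum(axis=1))
    col_scale = 1.0 / np.sqrt(a.sum(axis=0))
    a_norm = row_scale[:, None] * a * col_scale[None, :]
    u, s, vt = np.linalg.svd(a_norm, full_matrices=False)
    return u[:, 1] > 0, vt[1] > 0, s

# Synthetic checkerboard: genes in group 0 are high in condition group 0 and
# low in group 1, and vice versa, plus small positive noise.
rng = np.random.default_rng(0)
true_rows = np.array([0, 0, 0, 0, 0, 1, 1, 1])
true_cols = np.array([0, 0, 0, 0, 1, 1])
block_means = np.array([[10.0, 1.0],
                        [1.0, 10.0]])
data = block_means[true_rows][:, true_cols] + rng.uniform(0.0, 0.2, size=(8, 6))

# Shuffle rows and columns so the block structure is hidden.
row_perm = rng.permutation(8)
col_perm = rng.permutation(6)
shuffled = data[row_perm][:, col_perm]

row_side, col_side, singular_values = spectral_bicluster_2x2(shuffled)
```

The sign of a singular vector is arbitrary, so the recovered groups match the true ones only up to swapping the two labels. For checkerboards with more than two row or column classes, several singular vectors would be needed together with a clustering step such as k-means.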
Karanjalker, G R; Ravishankar, K V; Shivashankara, K S; Dinesh, M R; Roy, T K; Sudhakar Rao, D V
2018-01-01
Mango (Mangifera indica L.) fruits are generally classified based on peel color into green, yellow, and red types. Mango peel turns from green to yellow or red, or retains its green color, during ripening. Carotenoids and anthocyanins are the important pigments responsible for the colors of fruits. In the present study, peels of different colored cultivars at three ripening stages were characterized by pigment, color, and gene expression analysis. The yellow colored cultivar "Arka Anmol" showed higher carotenoid content, with β-carotene followed by violaxanthin as the major carotenoid compounds that increased during ripening. The red colored cultivars were characterized by higher anthocyanin content, with cyanidin-3-O-monoglucosides and peonidin-3-O-glucosides as the major anthocyanins. The gene expression analysis by qRT-PCR showed higher expression of the carotenoid biosynthetic genes lycopene-β-cyclase and violaxanthin-de-epoxidase in the yellow colored cv. "Arka Anmol," and the expression was found to increase during ripening. However, in the red colored cv. "Janardhan Pasand," there was increased expression of all anthocyanin biosynthetic genes, including the transcription factors MYB and basic helix loop. This indicated the regulation of the anthocyanins by these genes in red mango peel. The results showed that the accumulation pattern of particular pigments and higher expression of specific biosynthetic genes in mango peel impart different colors.
Wang, Hai-Tao; Kong, Jian-Ping; Ding, Fang; Wang, Xiu-Qin; Wang, Ming-Rong; Liu, Lian-Xin; Wu, Min; Liu, Zhi-Hua
2003-01-01
AIM: To obtain human esophageal cancer EC9706 cells stably expressing epithelial membrane protein-1 (EMP-1) by integrating a eukaryotic plasmid harboring the open reading frame (ORF) of human EMP-1, and then to study the mechanism by which EMP-1 acts on cell proliferation and alters the gene expression profile. METHODS: The authors first constructed a pcDNA3.1/myc-his expression vector harboring the ORF of EMP-1 and then transfected it into the human esophageal carcinoma cell line EC9706. The positive clones were analyzed by Western blot and RT-PCR. Moreover, the cell growth curve was observed and the cell cycle was checked by FACS. Using cDNA microarray technology, the authors compared the gene expression pattern in positive clones with that in controls. To confirm the gene expression profile, semi-quantitative RT-PCR was carried out for 4 randomly picked differentially expressed genes. The differentially expressed genes were classified according to their function and cellular component. RESULTS: The human EMP-1 gene can be stably expressed in the EC9706 cell line transfected with human EMP-1. The authors found that cell growth decreased in the transfected positive clones, with S phase arrested and G1 phase prolonged. By cDNA microarray analysis, 35 genes showed an over 2.0 fold change in expression level after transfection, with 28 genes being consistently up-regulated and 7 genes being down-regulated. Among the classified genes, almost half of the induced genes (13 out of 28 genes) were related to cell signaling, cell communication and particularly to adhesion. CONCLUSION: Overexpression of the human EMP-1 gene can inhibit the proliferation of EC9706 cells, with S phase arrested and G1 phase prolonged. The cDNA microarray analysis suggested that EMP-1 may be one of the regulators involved in cell signaling, cell communication and adhesion. PMID:12632483
Goldman, Gustavo H.; dos Reis Marques, Everaldo; Custódio Duarte Ribeiro, Diógenes; Ângelo de Souza Bernardes, Luciano; Quiapin, Andréa Carla; Vitorelli, Patrícia Marostica; Savoldi, Marcela; Semighini, Camile P.; de Oliveira, Regina C.; Nunes, Luiz R.; Travassos, Luiz R.; Puccia, Rosana; Batista, Wagner L.; Ferreira, Leslie Ecker; Moreira, Júlio C.; Bogossian, Ana Paula; Tekaia, Fredj; Nobrega, Marina Pasetto; Nobrega, Francisco G.; Goldman, Maria Helena S.
2003-01-01
Paracoccidioides brasiliensis, a thermodimorphic fungus, is the causative agent of the prevalent systemic mycosis in Latin America, paracoccidioidomycosis. We present here a survey of expressed genes in the yeast pathogenic phase of P. brasiliensis. We obtained 13,490 expressed sequence tags from both 5′ and 3′ ends. Clustering analysis yielded the partial sequences of 4,692 expressed genes that were functionally classified by similarity to known genes. We have identified several Candida albicans virulence and pathogenicity homologues in P. brasiliensis. Furthermore, we have analyzed the expression of some of these genes during the dimorphic yeast-mycelium-yeast transition by real-time quantitative reverse transcription-PCR. Clustering analysis of the mycelium-yeast transition revealed three groups: (i) RBT, hydrophobin, and isocitrate lyase; (ii) malate dehydrogenase, contigs Pb1067 and Pb1145, GPI, and alternative oxidase; and (iii) ubiquitin, delta-9-desaturase, HSP70, HSP82, and HSP104. The first two groups displayed high mRNA expression in the mycelial phase, whereas the third group showed higher mRNA expression in the yeast phase. Our results suggest the possible conservation of pathogenicity and virulence mechanisms among fungi, expand considerably gene identification in P. brasiliensis, and provide a broader basis for further progress in understanding its biological peculiarities. PMID:12582121
USDA-ARS's Scientific Manuscript database
A feruloyl esterase (FAE) gene was isolated from a rumen microbial metagenome, cloned into E. coli, and expressed in active form. The enzyme (RuFae4) was classified as a Type D feruloyl esterase based on its action on synthetic substrates and ability to release diferulates. The RuFae4 alone releas...
Wang, Hao; Yin, Xiangjing; Li, Xiaoqin; Wang, Li; Zheng, Yi; Xu, Xiaozhao; Zhang, Yucheng; Wang, Xiping
2014-01-01
Plant zinc finger-homeodomain (ZHD) genes encode a family of transcription factors that have been demonstrated to play an important role in the regulation of plant growth and development. In this study, we identified a total of 13 ZHD genes (VvZHD) in the grape genome that were further classified into at least seven groups. Genome synteny analysis revealed that a number of VvZHD genes were present in the corresponding syntenic blocks of Arabidopsis, indicating that they arose before the divergence of these two species. Gene expression analysis showed that the identified VvZHD genes displayed distinct spatiotemporal expression patterns, and were differentially regulated under various stress conditions and hormone treatments, suggesting that the grape VvZHDs might be also involved in plant response to a variety of biotic and abiotic insults. Our work provides insightful information and knowledge about the ZHD genes in grape, which provides a framework for further characterization of their roles in regulation of stress tolerance as well as other aspects of grape productivity. PMID:24705465
2013-01-01
Background Gene expression data could substantially help the development of efficient cancer diagnosis and classification platforms. Recently, many researchers have analyzed gene expression data using diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Many computational methods have difficulty selecting small subsets because the number of samples is small compared to the huge number of genes (high dimension), and because of irrelevant and noisy genes. Methods We propose an enhanced binary particle swarm optimization to select small subsets of informative genes that are significant for cancer classification. The proposed method introduces a particle speed, a rule, and a modified sigmoid function to increase the probability that the bits in a particle’s position are zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The proposed method outperformed previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires less computational time than BPSO. PMID:23617960
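To make the role of the sigmoid concrete, the sketch below shows the standard binary PSO step in which each bit of a particle's position is drawn from a sigmoid of its velocity. The abstract does not specify the authors' modified sigmoid, so the `shift` parameter here is purely a hypothetical stand-in that lowers the chance of a bit being 1 (a gene being selected); it should not be read as the published method.

```python
import math
import random


def bit_probability(velocity, shift=0.0):
    """Probability that a bit is set to 1; `shift` is a hypothetical knob."""
    return 1.0 / (1.0 + math.exp(-(velocity - shift)))


def sample_position(velocities, rng, shift=0.0):
    """Draw one binary position (1 = gene selected) from per-bit velocities."""
    return [1 if rng.random() < bit_probability(v, shift) else 0
            for v in velocities]


# Same random draws, with and without the hypothetical shift.
velocities = [0.0] * 1000
plain = sample_position(velocities, random.Random(0))
sparse = sample_position(velocities, random.Random(0), shift=3.0)
```

Because both calls consume identical random draws, every gene selected in `sparse` is also selected in `plain`, which illustrates how biasing the transfer function toward zero shrinks the selected subset.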
Huang, Hung-Chung; Jupiter, Daniel; VanBuren, Vincent
2010-01-01
Background Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. Methodology/Principal Findings In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as ‘brain group’ and ‘non-brain group’; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. Conclusions/Significance The methodology employed here may be used to facilitate disease-specific biomarker discovery. PMID:20140228
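The four distribution metrics and the Kolmogorov-Smirnov distance named above are simple to compute. The sketch below is a minimal illustration that assumes population (not sample-corrected) moments and excess kurtosis; the abstract does not say which estimators or category thresholds the authors used, so none are implied here.

```python
import math


def distribution_metrics(values):
    """Mean, standard deviation, skewness and excess kurtosis of one profile."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    sd = math.sqrt(m2)  # a constant profile (sd == 0) would need special handling
    return {
        "mean": mean,
        "sd": sd,
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4 - 3.0,
    }


def ks_distance(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    def ecdf(sample, x):
        return sum(1 for value in sample if value <= x) / len(sample)

    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)


# A two-state (on/off) profile: symmetric, with strongly negative kurtosis.
switch_like = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
metrics = distribution_metrics(switch_like)

# Hypothetical brain vs non-brain values for one gene with no overlap.
distance = ks_distance([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
```

A bimodal profile such as `switch_like` pushes the kurtosis toward its lower bound, which is one reason kurtosis is a natural ingredient for flagging switch-like genes.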
Li, Wen; Li, Deng-Di; Han, Li-Hong; Tao, Miao; Hu, Qian-Qian; Wu, Wen-Ying; Zhang, Jing-Bo; Li, Xue-Bao; Huang, Geng-Qing
2017-08-31
TCP proteins are plant-specific transcription factors (TFs) that perform a variety of physiological functions in plant growth and development. In this study, 74 non-redundant TCP genes were identified in the upland cotton (Gossypium hirsutum L.) genome. The cotton TCP family can be classified into two classes (class I and class II) that can be further divided into 11 types (groups) based on their motif composition. Quantitative RT-PCR analysis indicated that GhTCPs display different expression patterns in cotton tissues. The majority of these genes are preferentially or specifically expressed in cotton leaves, while some GhTCP genes are highly expressed in initiating fibers and/or elongating fibers of cotton. Yeast two-hybrid results indicated that GhTCPs can interact with each other to form homodimers or heterodimers. In addition, GhTCP14a and GhTCP22 can interact with some transcription factors which are involved in fiber development. These results lay a solid foundation for further study of the functions of TCP genes during cotton fiber development.
Reverse engineering of gene regulatory networks.
Cho, K H; Choo, S M; Jung, S H; Kim, J R; Choi, H S; Kim, J
2007-05-01
Systems biology is a multi-disciplinary approach to the study of the interactions of various cellular mechanisms and cellular components. Owing to the development of new technologies that simultaneously measure the expression of genetic information, systems biological studies involving gene interactions are increasingly prominent. In this regard, reconstructing gene regulatory networks (GRNs) forms the basis for the dynamical analysis of gene interactions and related effects on cellular control pathways. Various approaches of inferring GRNs from gene expression profiles and biological information, including machine learning approaches, have been reviewed, with a brief introduction of DNA microarray experiments as typical tools for measuring levels of messenger ribonucleic acid (mRNA) expression. In particular, the inference methods are classified according to the required input information, and the main idea of each method is elucidated by comparing its advantages and disadvantages with respect to the other methods. In addition, recent developments in this field are introduced and discussions on the challenges and opportunities for future research are provided.
Sakaki, Mizuho; Ebihara, Yukiko; Okamura, Kohji; Nakabayashi, Kazuhiko; Igarashi, Arisa; Matsumoto, Kenji; Hata, Kenichiro; Kobayashi, Yoshiro
2017-01-01
Cellular senescence is classified into two groups: replicative and premature senescence. Gene expression and epigenetic changes are reported to differ between these two groups and between cell types. Normal human diploid fibroblast TIG-3 cells have often been used in cellular senescence research; however, their epigenetic profiles are still not fully understood. To elucidate how cellular senescence is epigenetically regulated in TIG-3 cells, we analyzed the gene expression and DNA methylation profiles of three types of senescent cells, namely, replicatively senescent, ras-induced senescent (RIS), and non-permissive temperature-induced senescent SVts8 cells, using gene expression and DNA methylation microarrays. Genes involved in the cell cycle and in the immune response were commonly down- and up-regulated, respectively, in all three types of senescent cells. Altered DNA methylation patterns were observed in replicatively senescent cells, but not in prematurely senescent cells. Interestingly, hypomethylated CpG sites detected in non-CpG island regions (“open sea”) were enriched in immune response-related genes that had non-CpG island promoters. The integrated analysis of gene expression and methylation in replicatively senescent cells demonstrated that 867 differentially expressed genes, including cell cycle- and immune response-related genes, were associated with DNA methylation changes in CpG sites close to the transcription start sites (TSSs). Furthermore, several miRNAs regulated in part through DNA methylation were found to affect the expression of their targeted genes. Taken together, these results indicate that the epigenetic changes of DNA methylation regulate the expression of a certain portion of genes and partly contribute to the introduction and establishment of replicative senescence. PMID:28158250
Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar
2016-01-01
Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and an RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten-fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
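The feature-selection step described above can be illustrated with a small greedy mRMR routine. This is a sketch under assumptions rather than the authors' pipeline: it assumes expression values have already been discretized (here to 0/1), scores each candidate as relevance minus mean redundancy, and uses hypothetical gene names. The SVM training that follows in the study is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


def mrmr(features, labels, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    relevance = {name: mutual_information(values, labels)
                 for name, values in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, values in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(values, features[s])
                                 for s in selected) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data: 4 normal (0) and 4 cancer (1) samples, discretized expression.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],       # most relevant
    "gene_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # equally relevant but redundant
    "gene_b": [0, 1, 0, 0, 1, 1, 0, 1],       # weaker, but adds new information
}
chosen = mrmr(features, labels, 2)
```

Ranking by relevance alone would pick `gene_a` and its duplicate; the redundancy penalty is what makes the second pick `gene_b` instead.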
Song, Aiping; Li, Peiling; Xin, Jingjing; Chen, Sumei; Zhao, Kunkun; Wu, Dan; Fan, Qingqing; Gao, Tianwei; Chen, Fadi; Guan, Zhiyong
2016-01-01
The homeodomain-leucine zipper (HD-Zip) transcription factor family is a key transcription factor family and unique to the plant kingdom. It consists of a homeodomain and a leucine zipper that serve in combination as a dimerization motif. The family can be classified into four subfamilies, and these subfamilies participate in the development of hormones and mediation of hormone action and are involved in plant responses to environmental conditions. However, limited information on this gene family is available for the important chrysanthemum ornamental species (Chrysanthemum morifolium). Here, we characterized 17 chrysanthemum HD-Zip genes based on transcriptome sequences. Phylogenetic analyses revealed that 17 CmHB genes were distributed in the HD-Zip subfamilies I and II and identified two pairs of putative orthologous proteins in Arabidopsis and chrysanthemum and four pairs of paralogous proteins in chrysanthemum. The software MEME was used to identify 7 putative motifs with E values less than 1e-3 in the chrysanthemum HD-Zip factors, and they can be clearly classified into two groups based on the composition of the motifs. A bioinformatics analysis predicted that 8 CmHB genes could be targeted by 10 miRNA families, and the expression of these 17 genes in response to phytohormone treatments and abiotic stresses was characterized. The results presented here will promote research on the various functions of the HD-Zip gene family members in plant hormones and stress responses. PMID:27196930
Genome-wide analysis of WRKY gene family in Cucumis sativus
2011-01-01
Background WRKY proteins are a large family of transcriptional regulators in higher plants. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. Prior to the present study, only one full-length cucumber WRKY protein had been reported. The recent publication of the draft genome sequence of cucumber allowed us to conduct a genome-wide search for cucumber WRKY proteins, and to compare these positively identified proteins with their homologs in model plants, such as Arabidopsis. Results We identified a total of 55 WRKY genes in the cucumber genome. According to structural features of their encoded proteins, the cucumber WRKY (CsWRKY) genes were classified into three groups (group 1-3). Analysis of expression profiles of CsWRKY genes indicated that 48 WRKY genes display differential expression either in their transcript abundance or in their expression patterns under normal growth conditions, and 23 WRKY genes were differentially expressed in response to at least one abiotic stress (cold, drought or salinity). The expression profiles of stress-inducible CsWRKY genes were correlated with those of their putative Arabidopsis WRKY (AtWRKY) orthologs, except for the group 3 WRKY genes. Interestingly, duplicated group 3 AtWRKY genes appear to have been under positive selection pressure during evolution. In contrast, there was no evidence of recent gene duplication or positive selection pressure among CsWRKY group 3 genes, which may have led to the expressional divergence of group 3 orthologs. Conclusions Fifty-five WRKY genes were identified in cucumber and the structure of their encoded proteins, their expression, and their evolution were examined. Considering that there has been extensive expansion of group 3 WRKY genes in angiosperms, the occurrence of different evolutionary events could explain the functional divergence of these genes. PMID:21955985
Genome-wide analysis of WRKY gene family in Cucumis sativus.
Ling, Jian; Jiang, Weijie; Zhang, Ying; Yu, Hongjun; Mao, Zhenchuan; Gu, Xingfang; Huang, Sanwen; Xie, Bingyan
2011-09-28
WRKY proteins are a large family of transcriptional regulators in higher plants. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. Prior to the present study, only one full-length cucumber WRKY protein had been reported. The recent publication of the draft genome sequence of cucumber allowed us to conduct a genome-wide search for cucumber WRKY proteins, and to compare these positively identified proteins with their homologs in model plants, such as Arabidopsis. We identified a total of 55 WRKY genes in the cucumber genome. According to structural features of their encoded proteins, the cucumber WRKY (CsWRKY) genes were classified into three groups (group 1-3). Analysis of expression profiles of CsWRKY genes indicated that 48 WRKY genes display differential expression either in their transcript abundance or in their expression patterns under normal growth conditions, and 23 WRKY genes were differentially expressed in response to at least one abiotic stress (cold, drought or salinity). The expression profiles of stress-inducible CsWRKY genes were correlated with those of their putative Arabidopsis WRKY (AtWRKY) orthologs, except for the group 3 WRKY genes. Interestingly, duplicated group 3 AtWRKY genes appear to have been under positive selection pressure during evolution. In contrast, there was no evidence of recent gene duplication or positive selection pressure among CsWRKY group 3 genes, which may have led to the expressional divergence of group 3 orthologs. Fifty-five WRKY genes were identified in cucumber and the structure of their encoded proteins, their expression, and their evolution were examined. Considering that there has been extensive expansion of group 3 WRKY genes in angiosperms, the occurrence of different evolutionary events could explain the functional divergence of these genes.
Genome-wide identification of the SWEET gene family in wheat.
Gao, Yue; Wang, Zi Yuan; Kumar, Vikranth; Xu, Xiao Feng; Yuan, De Peng; Zhu, Xiao Feng; Li, Tian Ya; Jia, Baolei; Xuan, Yuan Hu
2018-02-05
The SWEET (sugars will eventually be exported transporter) family is a newly characterized group of sugar transporters. In plants, the key roles of SWEETs in phloem transport, nectar secretion, pollen nutrition, stress tolerance, and plant-pathogen interactions have been identified. SWEET family genes have been characterized in many plant species, but a comprehensive analysis of SWEET members has not yet been performed in wheat. Here, 59 wheat SWEETs (hereafter TaSWEETs) were identified through homology searches. Analyses of phylogenetic relationships, numbers of transmembrane helices (TMHs), gene structures, and motifs showed that TaSWEETs carrying 3-7 TMHs could be classified into four clades with 10 different types of motifs. Examination of the expression patterns of 18 SWEET genes revealed that a few are tissue-specific while most are ubiquitously expressed. In addition, the stem rust-mediated expression patterns of SWEET genes were monitored using a stem rust-susceptible cultivar, 'Little Club' (LC). The resulting data showed that the expression of five out of the 18 SWEETs tested was induced following inoculation. In conclusion, we provide the first comprehensive analysis of the wheat SWEET gene family. Information regarding the phylogenetic relationships, gene structures, and expression profiles of SWEET genes in different tissues and following stem rust disease inoculation will be useful in identifying the potential roles of SWEETs in specific developmental and pathogenic processes. Copyright © 2017 Elsevier B.V. All rights reserved.
Lim, Pek Siew; Hardy, Kristine; Peng, Kaiman; Shannon, Frances M
2016-03-01
T cell activation involves recognition by the T cell receptor of a foreign antigen complexed to the major histocompatibility complex on the antigen-presenting cell. This leads to activation of signaling pathways, which ultimately induce key cytokine genes responsible for eradication of foreign antigens. We used the mouse EL4 T cell as a model system to study genes that are induced as a result of T cell activation using phorbol myristate acetate (PMA) and calcium ionomycin (I) as stimuli. We were also interested in examining the importance of new protein synthesis in regulating the expression of genes involved in T cell activation. Thus we pre-treated mouse EL4 T cells with cycloheximide, a protein synthesis inhibitor, and left the cells unstimulated or stimulated them with PMA/I for 4 h. We performed microarray expression profiling of these cells to correlate gene expression with the chromatin state of T cells upon T cell activation [1]. Here, we detail further information and analysis of the microarray data, which show that T cell activation leads to differential expression of genes and that inducible genes can be further classified as primary and secondary response genes based on their dependence on protein synthesis. The data are available in the Gene Expression Omnibus under accession number GSE13278.
Ornostay, Anna; Cowie, Andrew M; Hindle, Matthew; Baker, Christopher J O; Martyniuk, Christopher J
2013-12-01
The herbicide linuron (LIN) is an endocrine disruptor with an anti-androgenic mode of action. The objectives of this study were to (1) improve knowledge of androgen and anti-androgen signaling in the teleostean ovary and to (2) assess the ability of gene networks and machine learning to classify LIN as an anti-androgen using transcriptomic data. Ovarian explants from vitellogenic fathead minnows (FHMs) were exposed to three concentrations of either 5α-dihydrotestosterone (DHT), flutamide (FLUT), or LIN for 12h. Ovaries exposed to DHT showed a significant increase in 17β-estradiol (E2) production while FLUT and LIN had no effect on E2. To improve understanding of androgen receptor signaling in the ovary, a reciprocal gene expression network was constructed for DHT and FLUT using pathway analysis and these data suggested that steroid metabolism, translation, and DNA replication are processes regulated through AR signaling in the ovary. Sub-network enrichment analysis revealed that FLUT and LIN shared more regulated gene networks in common compared to DHT. Using transcriptomic datasets from different fish species, machine learning algorithms classified LIN successfully with other anti-androgens. This study advances knowledge regarding molecular signaling cascades in the ovary that are responsive to androgens and anti-androgens and provides proof of concept that gene network analysis and machine learning can classify priority chemicals using experimental transcriptomic data collected from different fish species. © 2013.
Case-based retrieval framework for gene expression data.
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J
2015-01-01
The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children's Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
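The retrieval step described above can be sketched as a k-nearest-neighbor search with per-feature weights. This is a minimal illustration, not the published framework: the weighted Euclidean distance, the weights themselves, the patient records and the labels are all hypothetical, and the feature-selection stage that would produce the weights is not shown.

```python
import math
from collections import Counter


def weighted_distance(profile_a, profile_b, weights):
    """Euclidean distance in which each gene contributes according to its weight."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(profile_a, profile_b, weights)))


def retrieve(cases, query, weights, k=3):
    """Return the k stored cases most similar to the query profile."""
    ranked = sorted(cases,
                    key=lambda case: weighted_distance(case["profile"], query, weights))
    return ranked[:k]


def predict(cases, query, weights, k=3):
    """Majority label among the retrieved cases."""
    votes = Counter(case["label"] for case in retrieve(cases, query, weights, k))
    return votes.most_common(1)[0][0]


# Hypothetical case base: two genes per patient, the second one uninformative.
cases = [
    {"id": "P1", "profile": [1.0, 5.0], "label": "low-risk"},
    {"id": "P2", "profile": [1.2, 0.0], "label": "low-risk"},
    {"id": "P3", "profile": [3.0, 5.1], "label": "high-risk"},
    {"id": "P4", "profile": [3.1, 0.2], "label": "high-risk"},
    {"id": "P5", "profile": [0.9, 2.5], "label": "low-risk"},
]
weights = [1.0, 0.0]  # assumed output of a feature-weighting step
new_patient = [1.1, 5.0]

similar = retrieve(cases, new_patient, weights, k=3)
label = predict(cases, new_patient, weights, k=3)
```

Setting a weight to zero removes that gene from the comparison entirely, so the quality of the retrieved cases depends heavily on how the weights are learned.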
Nakaoka, Hirofumi; Tajima, Atsushi; Yoneyama, Taku; Hosomichi, Kazuyoshi; Kasuya, Hidetoshi; Mizutani, Tohru; Inoue, Ituro
2014-08-01
The rupture of intracranial aneurysm (IA) causes subarachnoid hemorrhage associated with high morbidity and mortality. We compared gene expression profiles in aneurysmal domes between unruptured IAs and ruptured IAs (RIAs) to elucidate biological mechanisms predisposing to the rupture of IA. We determined gene expression levels of 8 RIAs, 5 unruptured IAs, and 10 superficial temporal arteries with Agilent microarrays. To explore biological heterogeneity of IAs, we classified the samples into subgroups showing similar gene expression patterns, using clustering methods. The clustering analysis identified 4 groups: superficial temporal arteries and unruptured IAs were aggregated into their own clusters, whereas RIAs segregated into 2 distinct subgroups (early and late RIAs). Comparing gene expression levels between early RIAs and unruptured IAs, we identified 430 upregulated and 617 downregulated genes in early RIAs. The upregulated genes were associated with inflammatory and immune responses and phagocytosis, including S100/calgranulin genes (S100A8, S100A9, and S100A12). The downregulated genes suggest mechanical weakness of aneurysm walls. The expression of the Krüppel-like family of transcription factors (KLF2, KLF12, and KLF15), which are anti-inflammatory regulators, and of CDKN2A, which is located on chromosome 9p21, the most consistently replicated locus in genome-wide association studies of IA, was also downregulated. We demonstrate that gene expression patterns of RIAs were different according to the age of patients. The results suggest that macrophage-mediated inflammation is a key biological pathway for IA rupture. The identified genes can be good candidates for molecular markers of rupture-prone IAs and therapeutic targets. © 2014 American Heart Association, Inc.
Feng, Juerong; Zhou, Rui; Chang, Ying; Liu, Jing; Zhao, Qiu
2017-01-01
Hepatocellular carcinoma (HCC) has a high incidence and mortality worldwide, and its carcinogenesis and progression are influenced by a complex network of gene interactions. A weighted gene co-expression network was constructed to identify gene modules associated with the clinical traits in HCC (n = 214). Among the 13 modules, high correlation was only found between the red module and metastasis risk (classified by the HCC metastasis gene signature) (R2 = −0.74). Moreover, in the red module, 34 network hub genes for metastasis risk were identified, six of which (ABAT, AGXT, ALDH6A1, CYP4A11, DAO and EHHADH) were also hub nodes in the protein-protein interaction network of the module genes. Thus, a total of six hub genes were identified. In validation, all hub genes showed a negative correlation with the four-stage HCC progression (P for trend < 0.05) in the test set. Furthermore, in the training set, HCC samples with any hub gene lowly expressed demonstrated a higher recurrence rate and poorer survival rate (hazard ratios with 95% confidence intervals > 1). RNA-sequencing data of 142 HCC samples showed consistent results in the prognosis. Gene set enrichment analysis (GSEA) demonstrated that in the samples with any hub gene highly expressed, a total of 24 functional gene sets were enriched, most of which focused on amino acid metabolism and oxidation. In conclusion, co-expression network analysis identified six hub genes in association with HCC metastasis risk and prognosis, which might improve the prognosis by influencing amino acid metabolism and oxidation. PMID:28430663
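A weighted co-expression network of the kind mentioned above is commonly built by raising absolute Pearson correlations to a soft-threshold power and summing each gene's edge weights to obtain its connectivity. The sketch below assumes an unsigned network with a power of 6, a common choice that is an assumption here and not a value taken from the abstract; gene names and expression values are made up, and module detection is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def connectivity(expression, beta=6):
    """Sum of soft-thresholded edge weights |r|**beta for every gene."""
    genes = list(expression)
    return {
        gene: sum(abs(pearson(expression[gene], expression[other])) ** beta
                  for other in genes if other != gene)
        for gene in genes
    }


# Hypothetical profiles across five samples.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.5],  # tracks gene_a
    "gene_c": [5.0, 4.0, 3.0, 2.0, 1.2],   # mirrors gene_a
    "gene_d": [3.0, 1.0, 4.0, 1.0, 5.0],   # only loosely related
}
k = connectivity(expression)
least_connected = min(k, key=k.get)
```

Genes with the highest connectivity inside a module are the natural hub-gene candidates; the soft threshold suppresses weak correlations such as those involving `gene_d` without cutting them off entirely.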
Molecular profiling identifies prognostic markers of stage IA lung adenocarcinoma.
Zhang, Jie; Shao, Jinchen; Zhu, Lei; Zhao, Ruiying; Xing, Jie; Wang, Jun; Guo, Xiaohui; Tu, Shichun; Han, Baohui; Yu, Keke
2017-09-26
We previously showed that different pathologic subtypes were associated with different prognostic values in patients with stage IA lung adenocarcinoma (AC). We hypothesize that differential gene expression profiles of different subtypes may be valuable factors for prognosis in stage IA lung adenocarcinoma. We performed microarray gene expression profiling on tumor tissues micro-dissected from patients with acinar and solid predominant subtypes of stage IA lung adenocarcinoma. These patients had undergone a lobectomy and mediastinal lymph node dissection at the Shanghai Chest Hospital, Shanghai, China in 2012. No patient had preoperative treatment. We performed the Gene Set Enrichment Analysis (GSEA) analysis to look for gene expression signatures associated with tumor subtypes. The histologic subtypes of all patients were classified according to the 2015 WHO lung Adenocarcinoma classification. We found that patients with the solid predominant subtype are enriched for genes involved in RNA polymerase activity as well as inactivation of the p53 pathway. Further, we identified a list of genes that may serve as prognostic markers for stage IA lung adenocarcinoma. Validation in the TCGA database shows that these genes are correlated with survival, suggesting that they are novel prognostic factors for stage IA lung adenocarcinoma. In conclusion, we have uncovered novel prognostic factors for stage IA lung adenocarcinoma using gene expression profiling in combination with histopathology subtyping.
Shah, Syed Tariq; Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Arain, Saima; Yu, Shuxun
2013-12-01
NAC (NAM, ATAF, and CUC) is a plant-specific transcription factor family with diverse roles in plant development and stress regulation. In this report, stress-responsive NAC genes (GhNAC8-GhNAC17) isolated from cotton (Gossypium hirsutum L.) were characterised in the context of leaf senescence and stress tolerance. The characterisation of NAC genes during leaf senescence has not yet been reported for cotton. Based on the sequence characterisation, these GhNACs could be classified into three groups belonging to three known NAC sub-families. Their predicted amino acid sequences exhibited similarities to NAC genes from other plant species. Senescent leaves were the sites of maximum expression for all GhNAC genes except GhNAC10 and GhNAC13, which showed maximum expression in fibres, collected from 25 days post anthesis (DPA) plants. The ten GhNAC genes displayed differential expression patterns and levels during natural and induced leaf senescence. Quantitative RT-PCR and promoter analyses suggest that these genes are induced by ABA, ethylene, drought, salinity, cold, heat, and other hormonal treatments. These results support a role for cotton GhNAC genes in transcriptional regulation of leaf senescence, stress tolerance and other developmental stages of cotton. © 2013.
Yu, Liying; Tang, Weiqi; He, Weiyi; Ma, Xiaoli; Vasseur, Liette; Baxter, Simon W; Yang, Guang; Huang, Shiguo; Song, Fengqin; You, Minsheng
2015-03-10
Cytochrome P450 monooxygenases are present in almost all organisms and can play vital roles in hormone regulation, metabolism of xenobiotics and in biosynthesis or inactivation of endogenous compounds. In the present study, a genome-wide approach was used to identify and analyze the P450 gene family of diamondback moth, Plutella xylostella, a destructive worldwide pest of cruciferous crops. We identified 85 putative cytochrome P450 genes from the P. xylostella genome, including 84 functional genes and 1 pseudogene. These genes were classified into 26 families and 52 subfamilies. A phylogenetic tree constructed with three additional insect species shows extensive gene expansions of P. xylostella P450 genes from clans 3 and 4. Gene expression of cytochrome P450s was quantified across multiple developmental stages (egg, larva, pupa and adult) and tissues (head and midgut) using P. xylostella strains susceptible or resistant to the insecticides chlorpyrifos and fipronil. Expression of the lepidopteran-specific CYP367s predominantly occurred in head tissue, suggesting a role in either olfaction or detoxification. CYP340s with abundant transposable elements and relatively high expression in the midgut probably contribute to the detoxification of insecticides or plant toxins in P. xylostella. This study will facilitate future functional studies of the P. xylostella P450s in detoxification.
Yu, Liying; Tang, Weiqi; He, Weiyi; Ma, Xiaoli; Vasseur, Liette; Baxter, Simon W.; Yang, Guang; Huang, Shiguo; Song, Fengqin; You, Minsheng
2015-01-01
Cytochrome P450 monooxygenases are present in almost all organisms and can play vital roles in hormone regulation, metabolism of xenobiotics and in biosynthesis or inactivation of endogenous compounds. In the present study, a genome-wide approach was used to identify and analyze the P450 gene family of diamondback moth, Plutella xylostella, a destructive worldwide pest of cruciferous crops. We identified 85 putative cytochrome P450 genes from the P. xylostella genome, including 84 functional genes and 1 pseudogene. These genes were classified into 26 families and 52 subfamilies. A phylogenetic tree constructed with three additional insect species shows extensive gene expansions of P. xylostella P450 genes from clans 3 and 4. Gene expression of cytochrome P450s was quantified across multiple developmental stages (egg, larva, pupa and adult) and tissues (head and midgut) using P. xylostella strains susceptible or resistant to the insecticides chlorpyrifos and fipronil. Expression of the lepidopteran-specific CYP367s predominantly occurred in head tissue, suggesting a role in either olfaction or detoxification. CYP340s with abundant transposable elements and relatively high expression in the midgut probably contribute to the detoxification of insecticides or plant toxins in P. xylostella. This study will facilitate future functional studies of the P. xylostella P450s in detoxification. PMID:25752830
Zhang, Ao; Tian, Suyan
2018-05-01
Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different weights for each gene, each based on gene connectivity information, and then conducted feature selection upon the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of the specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only once this issue is resolved will wide utilization of the weighting methods be possible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
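The weighting idea in this abstract can be sketched roughly as follows. The abstract does not give the three weight definitions, so this sketch substitutes one assumed scheme: each gene's weight is its degree in a pathway graph divided by the largest degree. Expression values are multiplied by the weights and genes are ranked by the absolute difference in class means. The edge list, gene names, labels and ranking score are all hypothetical.

```python
def degree_weights(edges, genes):
    """Assumed weight: a gene's degree in the pathway graph, scaled to [0, 1]."""
    degree = {g: 0 for g in genes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    top = max(degree.values()) or 1  # avoid dividing by zero on an empty graph
    return {g: degree[g] / top for g in genes}

def weight_profiles(expr, weights):
    """Multiply every expression value of a gene by that gene's weight."""
    return {g: [v * weights[g] for v in values] for g, values in expr.items()}

def rank_genes(expr, labels):
    """Rank genes by the absolute difference between class means (labels 0/1)."""
    def mean(xs):
        return sum(xs) / len(xs)
    scores = {}
    for g, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[g] = abs(mean(pos) - mean(neg))
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy inputs: g1 is the best-connected gene in the pathway graph.
toy_edges = [("g1", "g2"), ("g1", "g3")]
toy_expr = {"g1": [1.0, 1.0, 3.0, 3.0],
            "g2": [0.0, 0.0, 3.0, 3.0],
            "g3": [2.0, 2.0, 2.0, 2.0]}
toy_labels = [0, 0, 1, 1]

toy_weights = degree_weights(toy_edges, list(toy_expr))
ranking_raw = rank_genes(toy_expr, toy_labels)
ranking_weighted = rank_genes(weight_profiles(toy_expr, toy_weights), toy_labels)
```

In the toy data the raw ranking puts g2 first, while the weighted ranking promotes the better-connected g1. That is the kind of shift a weighting method is meant to produce, and it shows why the choice of weights matters.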
Ferreira Filho, Jaire Alves; Horta, Maria Augusta Crivelente; Beloti, Lilian Luzia; Dos Santos, Clelton Aparecido; de Souza, Anete Pereira
2017-10-12
Trichoderma harzianum is used in biotechnology applications due to its ability to produce powerful enzymes for the conversion of lignocellulosic substrates into soluble sugars. Active enzymes involved in carbohydrate metabolism are defined as carbohydrate-active enzymes (CAZymes), and the most abundant family in the CAZy database is the glycoside hydrolases. The enzymes of this family play a fundamental role in the decomposition of plant biomass. In this study, the CAZymes of T. harzianum were identified and classified using bioinformatic approaches after which the expression profiles of all annotated CAZymes were assessed via RNA-Seq, and a phylogenetic analysis was performed. A total of 430 CAZymes (3.7% of the total proteins for this organism) were annotated in T. harzianum, including 259 glycoside hydrolases (GHs), 101 glycosyl transferases (GTs), 6 polysaccharide lyases (PLs), 22 carbohydrate esterases (CEs), 42 auxiliary activities (AAs) and 46 carbohydrate-binding modules (CBMs). Among the identified T. harzianum CAZymes, 47% were predicted to harbor a signal peptide sequence and were therefore classified as secreted proteins. The GH families were the CAZyme class with the greatest number of expressed genes, including GH18 (23 genes), GH3 (17 genes), GH16 (16 genes), GH2 (13 genes) and GH5 (12 genes). A phylogenetic analysis of the proteins in the AA9/GH61, CE5 and GH55 families showed high functional variation among the proteins. Identifying the main proteins used by T. harzianum for biomass degradation can ensure new advances in the biofuel production field. Herein, we annotated and characterized the expression levels of all of the CAZymes from T. harzianum, which may contribute to future studies focusing on the functional and structural characterization of the identified proteins.
Yin, Rui; Zhao, Mingzhu; Wang, Kangyu; Lin, Yanping; Wang, Yanfang; Sun, Chunyu; Wang, Yi; Zhang, Meiping
2017-01-01
Ginseng, Panax ginseng C.A. Meyer, is one of the most important medicinal plants for human health and medicine. It has been documented that over 80% of genes conferring resistance to bacteria, viruses, fungi and nematodes are contributed by the nucleotide binding site (NBS)-encoding gene family. Therefore, identification and characterization of NBS genes expressed in ginseng are paramount to its genetic improvement and breeding. However, little is known about the NBS-encoding genes in ginseng. Here we report genome-wide identification and systems analysis of the NBS genes actively expressed in ginseng (PgNBS genes). Four hundred twelve PgNBS gene transcripts, derived from 284 gene models, were identified from the transcriptomes of 14 ginseng tissues. These genes were classified into eight types, including TNL, TN, CNL, CN, NL, N, RPW8-NL and RPW8-N. Seven conserved motifs were identified in both the Toll/interleukin-1 receptor (TIR) and coiled-coil (CC) typed genes whereas six were identified in the RPW8 typed genes. Phylogenetic analysis showed that the PgNBS gene family is an ancient family, with the vast majority of its genes originating before ginseng itself. Despite belonging to one family, the PgNBS genes have differentiated dramatically in function and fall into numerous functional categories. The expression of the PgNBS genes varies across tissues, different aged roots and the roots of different genotypes. However, they are coordinated in expression, forming a single co-expression network. These results provide a deeper understanding of the origin, evolution, functional differentiation and expression dynamics of the NBS-encoding gene family in plants in general and in ginseng in particular, and an NBS gene toolkit useful for isolation and characterization of disease resistance genes and for enhanced disease resistance breeding in ginseng and related species.
Wang, Kangyu; Lin, Yanping; Wang, Yanfang; Sun, Chunyu; Wang, Yi
2017-01-01
Ginseng, Panax ginseng C.A. Meyer, is one of the most important medicinal plants for human health and medicine. It has been documented that over 80% of genes conferring resistance to bacteria, viruses, fungi and nematodes are contributed by the nucleotide binding site (NBS)-encoding gene family. Therefore, identification and characterization of NBS genes expressed in ginseng are paramount to its genetic improvement and breeding. However, little is known about the NBS-encoding genes in ginseng. Here we report genome-wide identification and systems analysis of the NBS genes actively expressed in ginseng (PgNBS genes). Four hundred twelve PgNBS gene transcripts, derived from 284 gene models, were identified from the transcriptomes of 14 ginseng tissues. These genes were classified into eight types, including TNL, TN, CNL, CN, NL, N, RPW8-NL and RPW8-N. Seven conserved motifs were identified in both the Toll/interleukin-1 receptor (TIR) and coiled-coil (CC) typed genes whereas six were identified in the RPW8 typed genes. Phylogenetic analysis showed that the PgNBS gene family is an ancient family, with the vast majority of its genes originating before ginseng itself. Despite belonging to one family, the PgNBS genes have differentiated dramatically in function and fall into numerous functional categories. The expression of the PgNBS genes varies across tissues, different aged roots and the roots of different genotypes. However, they are coordinated in expression, forming a single co-expression network. These results provide a deeper understanding of the origin, evolution, functional differentiation and expression dynamics of the NBS-encoding gene family in plants in general and in ginseng in particular, and an NBS gene toolkit useful for isolation and characterization of disease resistance genes and for enhanced disease resistance breeding in ginseng and related species. PMID:28727829
Kaur, Surleen; Archer, Kellie J; Devi, M Gouri; Kriplani, Alka; Strauss, Jerome F; Singh, Rita
2012-10-01
Polycystic ovary syndrome (PCOS) is a heterogeneous, genetically complex, endocrine disorder of uncertain etiology in women. Our aim was to compare the gene expression profiles in stimulated granulosa cells of PCOS women with and without insulin resistance vs. matched controls. This study included 12 normal ovulatory women (controls), 12 women with PCOS without evidence for insulin resistance (PCOS non-IR), and 16 women with insulin resistance (PCOS-IR) undergoing in vitro fertilization. Granulosa cell gene expression profiling was accomplished using Affymetrix Human Genome-U133 arrays. Differentially expressed genes were classified according to gene ontology using ingenuity pathway analysis tools. Microarray results for selected genes were confirmed by real-time quantitative PCR. A total of 211 genes were differentially expressed in PCOS non-IR and PCOS-IR granulosa cells (fold change≥1.5; P≤0.001) vs. matched controls. Diabetes mellitus and inflammation genes were significantly increased in PCOS-IR patients. Real-time quantitative PCR confirmed higher expression of NCF2 (2.13-fold), TCF7L2 (1.92-fold), and SERPINA1 (5.35-fold). Increased expression of inflammation genes ITGAX (3.68-fold) and TAB2 (1.86-fold) was confirmed in PCOS non-IR. Different cardiometabolic disease genes were differentially expressed in the two groups. Decreased expression of CAV1 (-3.58-fold) in PCOS non-IR and SPARC (-1.88-fold) in PCOS-IR was confirmed. Differential expression of genes involved in TGF-β signaling (IGF2R, increased; and HAS2, decreased), and oxidative stress (TXNIP, increased) was confirmed in both groups. Microarray analysis demonstrated differential expression of genes linked to diabetes mellitus, inflammation, cardiovascular diseases, and infertility in the granulosa cells of PCOS women with and without insulin resistance. 
Because these dysregulated genes are also involved in oxidative stress, lipid metabolism, and insulin signaling, we hypothesize that these genes may be involved in follicular growth arrest and metabolic disorders associated with the different phenotypes of PCOS.
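The selection rule quoted in this abstract (fold change ≥ 1.5; P ≤ 0.001) can be written as a small filter. This is only a sketch. It assumes linear-scale group means and the common signed convention in which a decrease is reported as the negative reciprocal of the ratio, which would be consistent with values such as "-3.58-fold" above. The gene names, means and P values are made up for illustration.

```python
def signed_fold_change(case_mean, control_mean):
    """Ratio for increases; negative reciprocal of the ratio for decreases."""
    ratio = case_mean / control_mean
    return ratio if ratio >= 1 else -1 / ratio

def differentially_expressed(rows, min_fold=1.5, max_p=0.001):
    """Keep genes whose absolute fold change and P value pass both cutoffs."""
    selected = []
    for name, case_mean, control_mean, p in rows:
        fc = signed_fold_change(case_mean, control_mean)
        if abs(fc) >= min_fold and p <= max_p:
            selected.append((name, round(fc, 2)))
    return selected

# Hypothetical rows: (gene, case mean, control mean, P value), linear scale.
toy_rows = [
    ("gene_up", 213.0, 100.0, 0.0005),    # passes both cutoffs
    ("gene_down", 50.0, 179.0, 0.0008),   # strong decrease, passes
    ("gene_flat", 110.0, 100.0, 0.0001),  # significant but small change
    ("gene_noisy", 300.0, 100.0, 0.2),    # large change but not significant
]
toy_hits = differentially_expressed(toy_rows)
```

Requiring both cutoffs excludes genes that change significantly but only slightly, as well as genes with large but unreliable changes.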
Tang, Xin-Ran; Li, Ying-Qin; Liang, Shao-Bo; Jiang, Wei; Liu, Fang; Ge, Wen-Xiu; Tang, Ling-Long; Mao, Yan-Ping; He, Qing-Mei; Yang, Xiao-Jing; Zhang, Yuan; Wen, Xin; Zhang, Jian; Wang, Ya-Qin; Zhang, Pan-Pan; Sun, Ying; Yun, Jing-Ping; Zeng, Jing; Li, Li; Liu, Li-Zhi; Liu, Na; Ma, Jun
2018-03-01
Gene expression patterns can be used as prognostic biomarkers in various types of cancers. We aimed to identify a gene expression pattern for individual distant metastatic risk assessment in patients with locoregionally advanced nasopharyngeal carcinoma. In this multicentre, retrospective, cohort analysis, we included 937 patients with locoregionally advanced nasopharyngeal carcinoma from three Chinese hospitals: the Sun Yat-sen University Cancer Center (Guangzhou, China), the Affiliated Hospital of Guilin Medical University (Guilin, China), and the First People's Hospital of Foshan (Foshan, China). Using microarray analysis, we profiled mRNA gene expression between 24 paired locoregionally advanced nasopharyngeal carcinoma tumours from patients at Sun Yat-sen University Cancer Center with or without distant metastasis after radical treatment. Differentially expressed genes were examined using digital expression profiling in a training cohort (Guangzhou training cohort; n=410) to build a gene classifier using a penalised regression model. We validated the prognostic accuracy of this gene classifier in an internal validation cohort (Guangzhou internal validation cohort, n=204) and two external independent cohorts (Guilin cohort, n=165; Foshan cohort, n=158). The primary endpoint was distant metastasis-free survival. Secondary endpoints were disease-free survival and overall survival. We identified 137 differentially expressed genes between metastatic and non-metastatic locoregionally advanced nasopharyngeal carcinoma tissues. A distant metastasis gene signature for locoregionally advanced nasopharyngeal carcinoma (DMGN) that consisted of 13 genes was generated to classify patients into high-risk and low-risk groups in the training cohort. 
Patients with high-risk scores in the training cohort had shorter distant metastasis-free survival (hazard ratio [HR] 4·93, 95% CI 2·99-8·16; p<0·0001), disease-free survival (HR 3·51, 2·43-5·07; p<0·0001), and overall survival (HR 3·22, 2·18-4·76; p<0·0001) than patients with low-risk scores. The prognostic accuracy of DMGN was validated in the internal and external cohorts. Furthermore, among patients with low-risk scores in the combined training and internal cohorts, concurrent chemotherapy improved distant metastasis-free survival compared with those patients who did not receive concurrent chemotherapy (HR 0·40, 95% CI 0·19-0·83; p=0·011), whereas patients with high-risk scores did not benefit from concurrent chemotherapy (HR 1·03, 0·71-1·50; p=0·876). This was also validated in the two external cohorts combined. We developed a nomogram based on the DMGN and other variables that predicted an individual's risk of distant metastasis, which was strengthened by adding Epstein-Barr virus DNA status. The DMGN is a reliable prognostic tool for distant metastasis in patients with locoregionally advanced nasopharyngeal carcinoma and might be able to predict which patients benefit from concurrent chemotherapy. It has the potential to guide treatment decisions for patients at different risk of distant metastasis. The National Natural Science Foundation of China, the National Science & Technology Pillar Program during the Twelfth Five-year Plan Period, the Natural Science Foundation of Guang Dong Province, the National Key Research and Development Program of China, the Innovation Team Development Plan of the Ministry of Education, the Health & Medical Collaborative Innovation Project of Guangzhou City, China, and the Program of Introducing Talents of Discipline to Universities. Copyright © 2018 Elsevier Ltd. All rights reserved.
Chen, Jihua; Uto, Takuhiro; Tanigawa, Shunsuke; Yamada-Kato, Tomeo; Fujii, Makoto; Hou, DE-Xing
2010-01-01
6-(Methylsulfinyl)hexyl isothiocyanate (6-MSITC) is a bioactive ingredient of wasabi [Wasabia japonica (Miq.) Matsumura], which is a popular pungent spice of Japan. To evaluate the anti-inflammatory function and underlying genes targeted by 6-MSITC, gene expression profiling through DNA microarray was performed in mouse macrophages. Among 22,050 oligonucleotides, the expression levels of 406 genes were increased by ≥3-fold in lipopolysaccharide (LPS)-activated RAW264 cells, 238 gene signals of which were attenuated by 6-MSITC (≥2-fold). Expression levels of 717 genes were decreased by ≥3-fold in LPS-activated cells, of which 336 gene signals were restored by 6-MSITC (≥2-fold). Utilizing group analysis, 206 genes affected by 6-MSITC with a ≥2-fold change were classified into 35 categories relating to biological processes (81), molecular functions (108) and signaling pathways (17). The genes were further categorized as 'defense, inflammatory response, cytokine activities and receptor activities' and some were confirmed by real-time polymerase chain reaction. Ingenuity pathway analysis further revealed that wasabi 6-MSITC regulated the relevant networks of chemokines, interleukins and interferons to exert its anti-inflammatory function.
CHEN, JIHUA; UTO, TAKUHIRO; TANIGAWA, SHUNSUKE; YAMADA-KATO, TOMEO; FUJII, MAKOTO; HOU, DE-XING
2010-01-01
6-(Methylsulfinyl)hexyl isothiocyanate (6-MSITC) is a bioactive ingredient of wasabi [Wasabia japonica (Miq.) Matsumura], which is a popular pungent spice of Japan. To evaluate the anti-inflammatory function and underlying genes targeted by 6-MSITC, gene expression profiling through DNA microarray was performed in mouse macrophages. Among 22,050 oligonucleotides, the expression levels of 406 genes were increased by ≥3-fold in lipopolysaccharide (LPS)-activated RAW264 cells, 238 gene signals of which were attenuated by 6-MSITC (≥2-fold). Expression levels of 717 genes were decreased by ≥3-fold in LPS-activated cells, of which 336 gene signals were restored by 6-MSITC (≥2-fold). Utilizing group analysis, 206 genes affected by 6-MSITC with a ≥2-fold change were classified into 35 categories relating to biological processes (81), molecular functions (108) and signaling pathways (17). The genes were further categorized as ‘defense, inflammatory response, cytokine activities and receptor activities’ and some were confirmed by real-time polymerase chain reaction. Ingenuity pathway analysis further revealed that wasabi 6-MSITC regulated the relevant networks of chemokines, interleukins and interferons to exert its anti-inflammatory function. PMID:23136589
Yoshikawa, Mamoru; Kojima, Hiromi; Wada, Kota; Tsukidate, Toshiharu; Okada, Naoko; Saito, Hirohisa; Moriyama, Hiroshi
2006-07-01
To investigate the role of fibroblasts in the pathogenesis of cholesteatoma. Tissue specimens were obtained from our patients. Middle ear cholesteatoma-derived fibroblasts (MECFs) and postauricular skin-derived fibroblasts (SFs) as controls were then cultured for a few weeks. These fibroblasts were stimulated with interleukin (IL) 1alpha and/or IL-1beta before gene expression assays. We used the human genome U133A probe array (GeneChip) and real-time polymerase chain reaction to examine and compare the gene expression profiles of the MECFs and SFs. Six patients who had undergone tympanoplasty. The IL-1alpha-regulated genes were classified into 4 distinct clusters on the basis of profiles differentially regulated by SF and MECF using a hierarchical clustering analysis. The messenger RNA expressions of LARC (liver and activation-regulated chemokine), GMCSF (granulocyte-macrophage colony-stimulating factor), epiregulin, ICAM1 (intercellular adhesion molecule 1), and TGFA (transforming growth factor alpha) were more strongly up-regulated by IL-1alpha and/or IL-1beta in MECF than in SF, suggesting that these fibroblasts derived from different tissues retained their typical gene expression profiles. Fibroblasts may play a role in hyperkeratosis of middle ear cholesteatoma by releasing molecules involved in inflammation and epidermal growth. These fibroblasts may retain tissue-specific characteristics presumably controlled by epigenetic mechanisms.
Genome-Wide Identification and Expression Analysis of the UGlcAE Gene Family in Tomato.
Ding, Xing; Li, Jinhua; Pan, Yu; Zhang, Yue; Ni, Lei; Wang, Yaling; Zhang, Xingguo
2018-05-27
UGlcAE interconverts UDP-d-galacturonic acid and UDP-d-glucuronic acid, and UDP-d-galacturonic acid is an activated precursor for the synthesis of pectins in plants. In this study, we identified nine UGlcAE protein-encoding genes in tomato. The nine UGlcAE genes were distributed on eight chromosomes in tomato, and the corresponding proteins contained one or two trans-membrane domains. The phylogenetic analysis showed that SlUGlcAE genes could be divided into seven groups, designated UGlcAE1 to UGlcAE6, of which UGlcAE2 was split into two groups. Expression profile analysis revealed that the SlUGlcAE genes display diverse expression patterns in various tomato tissues. Selective pressure analysis indicated that all of the amino acid sites of SlUGlcAE proteins are undergoing purifying selection. Fifteen stress-, hormone-, and development-related elements were identified in the upstream regions (0.5 kb) of these SlUGlcAE genes. Furthermore, we investigated the expression patterns of SlUGlcAE genes in response to three hormones (indole-3-acetic acid (IAA), gibberellin (GA), and salicylic acid (SA)). We measured firmness, pectin contents, and expression levels of UGlcAE family genes during the development of tomato fruit. Here, we systematically summarize the general characteristics of the SlUGlcAE genes in tomato, which could provide a basis for further functional studies of tomato UGlcAE genes.
Yadav, Inderjit S.; Sharma, Amandeep; Kaur, Satinder; Nahar, Natasha; Bhardwaj, Subhash C.; Sharma, Tilak R.; Chhuneja, Parveen
2016-01-01
Leaf rust caused by Puccinia triticina (Pt) is one of the most important diseases of bread wheat globally. Recent advances in sequencing technologies have provided opportunities to analyse the complete transcriptomes of the host as well as pathogen for studying differential gene expression during infection. Pathogen induced differential gene expression was characterized in a near isogenic line carrying leaf rust resistance gene Lr57 and susceptible recipient genotype WL711. RNA samples were collected at five different time points 0, 12, 24, 48, and 72 h post inoculation (HPI) with Pt 77-5. A total of 3020 transcripts were differentially expressed with 1458 and 2692 transcripts in WL711 and WL711+Lr57, respectively. The highest number of differentially expressed transcripts was detected at 12 HPI. Functional categorization using Blast2GO classified the genes into biological processes, molecular function and cellular components. WL711+Lr57 showed much higher number of differentially expressed nucleotide binding and leucine rich repeat genes and expressed more protein kinases and pathogenesis related proteins such as chitinases, glucanases and other PR proteins as compared to susceptible genotype. Pathway annotation with KEGG categorized genes into 13 major classes with carbohydrate metabolism being the most prominent followed by amino acid, secondary metabolites, and nucleotide metabolism. Gene co-expression network analysis identified four and eight clusters of highly correlated genes in WL711 and WL711+Lr57, respectively. Comparative analysis of the differentially expressed transcripts led to the identification of some transcripts which were specifically expressed only in WL711+Lr57. It was apparent from the whole transcriptome sequencing that the resistance gene Lr57 directed the expression of different genes involved in building the resistance response in the host to combat invading pathogen. 
The RNAseq data and differentially expressed transcripts identified in the present study are a genomic resource that can be used for further study of the host-pathogen interaction for Lr57 and of the wheat transcriptome in general. PMID:28066494
Nirmala, Nanguneri; Grom, Alexei; Gram, Hermann
2014-09-01
This review summarizes biomarkers in systemic juvenile idiopathic arthritis (sJIA). Broadly, the markers are classified under protein, cellular, gene expression and genetic markers. We also compare the biomarkers in sJIA to biomarkers in cryopyrin-associated periodic syndrome (CAPS). Recent publications showing the similarity of clinical response of sJIA and CAPS to anti-interleukin 1 therapies prompted a comparison at the biomarker level. sJIA traditionally is classified under the umbrella of juvenile idiopathic arthritis. At the clinical phenotypic level, sJIA has several features that are more similar to those seen in CAPS. In this review, we summarize biomarkers in sJIA and CAPS and draw upon the various similarities and differences between the two families of diseases. The main differences between sJIA and CAPS biomarkers are in the genetic markers, with CAPS being a family of monogenic diseases with mutations in NLRP3. There have been a small number of publications describing cellular biomarkers in sJIA with no such studies described for CAPS. Many of the protein markers characteristic of sJIA are also seen to characterize CAPS. The gene expression data in both sJIA and CAPS show a strong upregulation of innate immunity pathways. In addition, we describe a strong similarity between sJIA and CAPS at the gene expression level in which several genes that form a part of the erythropoiesis signature are upregulated in both sJIA and CAPS.
Nirmala, Nanguneri; Grom, Alexei; Gram, Hermann
2015-01-01
Purpose of review: This review summarizes biomarkers in Systemic Juvenile Idiopathic Arthritis (sJIA). Broadly, the markers are classified under protein, cellular, gene expression and genetic markers. We also compare the biomarkers in sJIA to biomarkers in cryopyrin associated periodic syndromes (CAPS). Recent findings: Recent publications showing the similarity of clinical response of sJIA and CAPS to anti-IL-1 therapies prompted a comparison at the biomarker level. Summary: sJIA traditionally is classified under the umbrella of juvenile idiopathic arthritis. At the clinical phenotypic level, sJIA has several features that are more similar to those seen in Cryopyrin Associated Periodic Syndromes (CAPS). In this review, we summarize biomarkers in sJIA and CAPS and draw upon the various similarities and differences between the two families of diseases. The main difference between sJIA and CAPS biomarkers is in the genetic markers, with CAPS being a family of monogenic diseases with mutations in NLRP3. There have been a small number of publications describing cellular biomarkers in sJIA with no such studies described for CAPS. Many of the protein markers characteristic of sJIA are also seen to characterize CAPS. The gene expression data in both sJIA and CAPS show a strong upregulation of innate immunity pathways. In addition, we describe a strong similarity between sJIA and CAPS at the gene expression level where several genes that form a part of the erythropoiesis signature are upregulated in both sJIA and CAPS. PMID:25050926
Hox expression in the direct-type developing sand dollar Peronella japonica.
Tsuchimoto, Jun; Yamaguchi, Masaaki
2014-08-01
Echinoderms are a curious group of deuterostomes that forms a clade with hemichordates but has a pentameral body plan. Hox complex plays a pivotal role in axial patterning in bilaterians and often occurs in a cluster on the chromosome. In contrast to hemichordates with an organized Hox cluster, the sea urchin Strongylocentrotus purpuratus has a Hox cluster with an atypical organization. However, the current data on hox expression in sea urchin rudiments are fragmentary. We report a comprehensive examination of hox expression in a sand dollar echinoid. Nine hox genes are expressed in the adult rudiment, which are classified into two groups, but hox11/13b belongs to both: one with linear expression in the coelomic mesoderm and another with radial expression around the adult mouth. The linear genes may endow the coelom/mesentery with axial information to direct postmetamorphic transformation of the digestive tract, whereas the radial genes developmentally correlate with the morphological novelties of echinoderms and/or sea urchins. Recruitment of the radial genes except hox11/13b appears to be accompanied by the loss of ancestral/axial roles. This in toto co-option of the hox genes provides insight into the molecular mechanisms underlying the evolution of echinoderms from a bilateral ancestor. © 2014 Wiley Periodicals, Inc.
Zhou, Shuang-Shuang; Sun, Ze; Ma, Weihua; Chen, Wei; Wang, Man-Qun
2014-03-01
We sequenced the antenna transcriptome of the brown planthopper (BPH), Nilaparvata lugens (Stål), a global rice pest, and performed transcriptome analysis on BPH antenna. We obtained about 40 million 90-bp reads that were assembled into 75,874 unigenes with a mean size of 456 bp. Among the antenna transcripts, 32,856 (43%) showed significant similarity (E-value < 1e-5) to known proteins in the NCBI database. Gene ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to classify functions of BPH antenna genes. We identified 10 odorant-binding proteins (OBPs), including 7 previously unidentified, and 11 chemosensory proteins (CSPs), including two new members. The expression profiles of 4 OBPs and 2 CSPs were determined by q-PCR for antenna, abdomen, leg and wing of insects of different age, gender, and mating status including two BPH adult wing-morphology types. NlugCSP10 and 4 OBPs appeared to be antenna-specific because they were highly and differentially expressed in male and female antennae. NlugCSP11 was expressed ubiquitously, with particularly high expression in wings. The transcript levels of several olfactory genes depended on adult wing form, age, gender, and mating status, although no clear expression patterns were determined. Copyright © 2013 Elsevier Inc. All rights reserved.
Mancia, Annalaura; Ryan, James C; Van Dolah, Frances M; Kucklick, John R; Rowles, Teresa K; Wells, Randall S; Rosel, Patricia E; Hohn, Aleta A; Schwacke, Lori H
2014-09-01
As top-level predators, common bottlenose dolphins (Tursiops truncatus) are particularly sensitive to chemical and biological contaminants that accumulate and biomagnify in the marine food chain. This work investigates the potential use of microarray technology and gene expression profile analysis to screen common bottlenose dolphins for exposure to environmental contaminants through the immunological and/or endocrine perturbations associated with these agents. A dolphin microarray representing 24,418 unigene sequences was used to analyze blood samples collected from 47 dolphins during capture-release health assessments from five different US coastal locations (Beaufort, NC, Sarasota Bay, FL, Saint Joseph Bay, FL, Sapelo Island, GA and Brunswick, GA). Organohalogen contaminants including pesticides, polychlorinated biphenyl congeners (PCBs) and polybrominated diphenyl ether congeners were determined in blubber biopsy samples from the same animals. A subset of samples (n = 10, males; n = 8, females) with the highest and the lowest measured values of PCBs in their blubber was used as strata to determine the differential gene expression of the exposure extremes through machine learning classification algorithms. A set of genes associated primarily with nuclear and DNA stability, cell division and apoptosis regulation, intra- and extra-cellular traffic, and immune response activation was selected by the algorithm for identifying the two exposure extremes. In order to test the hypothesis that these gene expression patterns reflect PCB exposure, we next investigated the blood transcriptomes of the remaining dolphin samples using machine-learning approaches, including K-nn and Support Vector Machines classifiers. Using the derived gene sets, the algorithms worked very well (100% success rate) at classifying dolphins according to the contaminant load accumulated in their blubber. 
These results suggest that gene expression profile analysis may provide a valuable means to screen for indicators of chemical exposure. Copyright © 2014 Elsevier Ltd. All rights reserved.
Chai, Wenbo; Si, Weina; Ji, Wei; Qin, Qianqian; Zhao, Manli; Jiang, Haiyang
2018-01-01
HD-Zip proteins represent the major transcription factors in higher plants, playing essential roles in plant development and stress responses. Foxtail millet is a crop used to investigate the systems biology of millet and biofuel grasses, yet the HD-Zip gene family has not been studied in foxtail millet. For further investigation of the expression profile of the HD-Zip gene family in foxtail millet, a comprehensive genome-wide expression analysis was conducted in this study. We found 47 protein-encoding genes in foxtail millet using BLAST search tools; the putative proteins were classified into four subfamilies, namely, subfamilies I, II, III, and IV. Gene structure and motif analysis indicate that the genes in one subfamily were conserved. Promoter analysis showed that HD-Zip genes were involved in abiotic stress. Duplication analysis revealed that 8 (~17%) hdz genes were tandemly duplicated and 28 (58%) were segmentally duplicated; purifying duplication plays important roles in gene expansion. Microsynteny analysis revealed the maximum relationship in foxtail millet-sorghum and foxtail millet-rice. Expression profiling upon the abiotic stresses of drought and high salinity and the biotic stress of ABA revealed that some genes regulated responses to drought and salinity stresses via an ABA-dependent process, especially sihdz29 and sihdz45. Our study provides new insight into evolutionary and functional analyses of HD-Zip genes involved in environmental stress responses in foxtail millet.
Yang, Liang; Du, Zhenguo; Gao, Feng; Wu, Kangcheng; Xie, Lianhui; Li, Yi; Wu, Zujian; Wu, Jianguo
2014-05-06
Rice dwarf virus (RDV) is the causal agent of rice dwarf disease, which limits rice production in many areas of Southeast Asia. Transcriptional changes of rice in response to RDV infection have been characterized by Shimizu et al. and Satoh et al. Both studies found induction of defense related genes and correlations between transcriptional changes and symptom development in RDV-infected rice. However, the same rice cultivar, namely Nipponbare, belonging to the japonica subspecies of rice, was used in both studies. Gene expression changes of the indica subspecies of rice, namely Oryza sativa L. ssp. indica cv Yixiang2292, which shows moderate resistance to RDV, in response to RDV infection were characterized using an Affymetrix Rice Genome Array. Differentially expressed genes (DEGs) were classified according to their Gene Ontology (GO) annotation. The effects of transient expression of Pns11 in Nicotiana benthamiana on the expression of nucleolar genes were studied using real-time PCR (RT-PCR). 856 genes involved in defense or other physiological processes were identified to be DEGs, most of which showed up-regulation. Ribosome- and nucleolus related genes were significantly enriched in the DEGs. Representative genes related to nucleolar function exhibited altered expression in N. benthamiana plants transiently expressing Pns11 of RDV. Induction of defense related genes is common for rice infected with RDV. There is a correlation between symptom severity and transcriptional alteration in RDV infected rice. Besides the ribosome, RDV may also target the nucleolus to manipulate the translation machinery of rice. Given the tight links between nucleolus and ribosome, it is intriguing to speculate that RDV may enhance expression of ribosomal genes by targeting the nucleolus through Pns11.
Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders
Mutch, David M.; Temanni, M. Ramzi; Henegar, Corneliu; Combes, Florence; Pelloux, Véronique; Holst, Claus; Sørensen, Thorkild I. A.; Astrup, Arne; Martinez, J. Alfredo; Saris, Wim H. M.; Viguerie, Nathalie; Langin, Dominique; Zucker, Jean-Daniel; Clément, Karine
2007-01-01
Background The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. Methodology/Principal Findings The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8–12 kg weight loss) could always be differentiated from non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%±8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9%±2.2%. Conclusion Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition. PMID:18094752
Martin, C E; Paibomesai, M A; Emam, S M; Gallienne, J; Hine, B C; Thompson-Crispi, K A; Mallard, B A
2016-03-01
Genetic selection for enhanced immune response has been shown to decrease disease occurrence in dairy cattle. Cows can be classified as high (H), average, or low responders based on antibody-mediated immune response (AMIR), predominated by type-2 cytokine production, and cell-mediated immune response (CMIR) through estimated breeding values for these traits. The purpose of this study was to identify in vitro tests that correlate with in vivo immune response phenotyping in dairy cattle. Blood mononuclear cells (BMC) isolated from cows classified as H-AMIR and H-CMIR through estimated breeding values for immune response traits were stimulated with concanavalin A (ConA; Sigma Aldrich, St. Louis, MO), and gene expression, cytokine production, and cell proliferation were determined at multiple time points. A repeated measures model, which included the effects of immune response group, parity, and stage of lactation, was used to compare differences between immune response phenotype groups. The H-AMIR cows produced more IL-4 protein than H-CMIR cows at 48 h; however, no difference in gene expression of type-2 transcription factor GATA3 or IL4 was noted. The BMC from H-CMIR cows had increased production of IFN-γ protein at 48, 72, and 96 h compared with H-AMIR animals. Further, H-CMIR cows had increased expression of the IFNG gene at 16, 24, and 48 h post-treatment with ConA, although expression of the type-1 transcription factor gene TBX21 did not differ between immune response groups. Although proliferation of BMC increased from 24 to 72 h after ConA stimulation, no differences were found between the immune response groups. Overall, stimulation of H-AMIR and H-CMIR bovine BMC with ConA resulted in distinct cytokine production profiles according to genetically defined groups. These distinct cytokine profiles could be used to define disease resistance phenotypes in dairy cows according to stimulation in vitro; however, other immune response phenotypes should be assessed. 
Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Tomlins, Scott A.; Alshalalfa, Mohammed; Davicioni, Elai; Erho, Nicholas; Yousefi, Kasra; Zhao, Shuang; Haddad, Zaid; Den, Robert B.; Dicker, Adam P.; Trock, Bruce; DeMarzo, Angelo; Ross, Ashley; Schaeffer, Edward M.; Klein, Eric A.; Magi-Galluzzi, Cristina; Karnes, Jeffery R.; Jenkins, Robert B.; Feng, Felix Y.
2015-01-01
Background Prostate cancer (PCa) molecular subtypes have been defined by essentially mutually exclusive events, including ETS gene fusions (most commonly involving ERG) and SPINK1 over-expression. Clinical assessment may aid in disease stratification, complementing available prognostic tests. Objective To determine the analytical validity and clinicopathological associations of microarray-based molecular subtyping. Design, Setting and Participants We analyzed Affymetrix GeneChip expression profiles for 1,577 patients from eight radical prostatectomy (RP) cohorts, including 1,351 cases assessed using the Decipher prognostic assay (performed in a CLIA-certified laboratory). A microarray-based (m-) random forest ERG classification model was trained and validated. Outlier expression analysis was used to predict other mutually exclusive non-ERG ETS gene rearrangements (ETS+) or SPINK1 over-expression (SPINK1+). Outcome Measurements Associations with clinical features and outcomes by multivariable logistic regression analysis and receiver operating characteristic curves. Results and Limitations The m-ERG classifier showed 95% accuracy in an independent validation subset (n=155 samples). Across cohorts, 45%, 9%, 8% and 38% of PCa were classified as m-ERG+, m-ETS+, m-SPINK1+, and triple negative (m-ERG−/m-ETS−/m-SPINK1−), respectively. Gene expression profiling supports three underlying molecularly defined groups (m-ERG+, m-ETS+ and m-SPINK1+/triple negative). On multivariable analysis, m-ERG+ tumors were associated with lower preoperative serum PSA and Gleason scores, but enriched for extraprostatic extension (p<0.001). m-ETS+ tumors were associated with seminal vesicle invasion (p=0.01), while m-SPINK1+/triple negative tumors had higher Gleason scores and were more frequent in Black/African American patients (p<0.001). Clinical outcomes were not significantly different between subtypes. 
Conclusions A clinically available prognostic test (Decipher) can also assess PCa molecular subtypes, obviating the need for additional testing. Clinicopathological differences were found among subtypes based on global expression patterns. PMID:25964175
Yao, Qiu-Yang; Xia, En-Hua; Liu, Fei-Hu; Gao, Li-Zhi
2015-02-15
WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes. Copyright © 2014 Elsevier B.V. All rights reserved.
2012-01-01
Background Alteration in gene expression resulting from allopolyploidization is a prominent feature in plants, but its spectrum and extent are not fully known. Common wheat (Triticum aestivum) was formed via allohexaploidization about 10,000 years ago, and became the most important crop plant. To gain further insights into the genome-wide transcriptional dynamics associated with the onset of common wheat formation, we conducted microarray-based genome-wide gene expression analysis on two newly synthesized allohexaploid wheat lines with chromosomal stability and a genome constitution analogous to that of the present-day common wheat. Results Multi-color GISH (genomic in situ hybridization) was used to identify individual plants from two nascent allohexaploid wheat lines between Triticum turgidum (2n = 4x = 28; genome BBAA) and Aegilops tauschii (2n = 2x = 14; genome DD), which had a stable chromosomal constitution analogous to that of common wheat (2n = 6x = 42; genome BBAADD). Genome-wide analysis of gene expression was performed for these allohexaploid lines along with their parental plants from T. turgidum and Ae. tauschii, using the Affymetrix Gene Chip Wheat Genome-Array. Comparison with the parental plants coupled with inclusion of empirical mid-parent values (MPVs) revealed that whereas the great majority of genes showed the expected parental additivity, two major patterns of alteration in gene expression in the allohexaploid lines were identified: parental dominance expression and non-additive expression. Genes involved in each of the two altered expression patterns could be classified into three distinct groups, stochastic, heritable and persistent, based on their transgenerational heritability and inter-line conservation. Strikingly, whereas both altered patterns of gene expression showed a propensity of inheritance, identity of the involved genes was highly stochastic, consistent with the involvement of diverse Gene Ontology (GO) terms. 
Nonetheless, those genes showing non-additive expression exhibited a significant enrichment for vesicle-function. Conclusions Our results show that two patterns of global alteration in gene expression are conditioned by allohexaploidization in wheat, that is, parental dominance expression and non-additive expression. Both altered patterns of gene expression but not the identity of the genes involved are likely to play functional roles in stabilization and establishment of the newly formed allohexaploid plants, and hence, relevant to speciation and evolution of T. aestivum. PMID:22277161
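The comparison against mid-parent values described in this abstract can be illustrated with a minimal Python sketch. It assumes that expression values are on a log2 scale, that the MPV is a weighted mean of the two parental values, and that a fixed tolerance decides whether two values count as equal. The function name, the weight, the tolerance, and the example numbers are all hypothetical and are not taken from the study, which may have used a different statistical test.

```python
def classify_expression(hybrid, parent_a, parent_b, weight_a=0.5, tol=0.1):
    """Label one gene's expression in a hybrid relative to its two parents.

    Hypothetical rule for illustration only: values are log2 expression
    levels, the mid-parent value (MPV) is a weighted mean of the parents,
    and `tol` is an arbitrary tolerance for calling two values equal.
    """
    mpv = weight_a * parent_a + (1.0 - weight_a) * parent_b
    if abs(hybrid - mpv) <= tol:
        return "additive"            # hybrid matches the mid-parent value
    if abs(hybrid - parent_a) <= tol or abs(hybrid - parent_b) <= tol:
        return "parental dominance"  # hybrid matches one parent instead
    return "non-additive"            # hybrid matches neither


# Made-up log2 values: (hybrid, parent A, parent B)
for values in [(8.0, 6.0, 10.0), (10.05, 6.0, 10.0), (12.0, 6.0, 10.0)]:
    print(values, classify_expression(*values))
```

A real analysis would replace the fixed tolerance with a per-gene statistical test across replicates; the sketch only shows how the three categories relate to the MPV.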
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren’s Syndrome
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J.; Gillespie, Colin S.; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T.; Emery, Paul; Lanyon, Peter; Hunter, John A.; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E.; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Background Fatigue is a debilitating condition with a significant impact on patients’ quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren’s Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Methods Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren’s Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Results Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Conclusions Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS. PMID:26694930
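The evaluation metric named in this abstract, AUC with a standard error of the Wilcoxon statistic, can be sketched in plain Python. The abstract does not say which standard-error formula was used; the sketch below assumes the widely used Hanley and McNeil approximation, and the function name and the example scores are hypothetical.

```python
import math


def auc_and_se(pos_scores, neg_scores):
    """AUC as the Mann-Whitney/Wilcoxon statistic, plus an approximate SE.

    The SE uses the Hanley-McNeil approximation (an assumption here; the
    source abstract does not specify its formula).
    """
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive sample ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    auc = wins / (n_pos * n_neg)
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return auc, math.sqrt(max(var, 0.0))


# Made-up classifier scores for high-fatigue (positive) and low-fatigue samples
auc, se = auc_and_se([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC = {auc:.3f}, SE(W) = {se:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred patients; a rank-based computation would be preferable for larger data.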
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype.
Blenkiron, Cherie; Goldstein, Leonard D; Thorne, Natalie P; Spiteri, Inmaculada; Chin, Suet-Feung; Dunning, Mark J; Barbosa-Morais, Nuno L; Teschendorff, Andrew E; Green, Andrew R; Ellis, Ian O; Tavaré, Simon; Caldas, Carlos; Miska, Eric A
2007-01-01
MicroRNAs (miRNAs), a class of short non-coding RNAs found in many plants and animals, often act post-transcriptionally to inhibit gene expression. Here we report the analysis of miRNA expression in 93 primary human breast tumors, using a bead-based flow cytometric miRNA expression profiling method. Of 309 human miRNAs assayed, we identify 133 miRNAs expressed in human breast and breast tumors. We used mRNA expression profiling to classify the breast tumors as luminal A, luminal B, basal-like, HER2+ and normal-like. A number of miRNAs are differentially expressed between these molecular tumor subtypes and individual miRNAs are associated with clinicopathological factors. Furthermore, we find that miRNAs could classify basal versus luminal tumor subtypes in an independent data set. In some cases, changes in miRNA expression correlate with genomic loss or gain; in others, changes in miRNA expression are likely due to changes in primary transcription and or miRNA biogenesis. Finally, the expression of DICER1 and AGO2 is correlated with tumor subtype and may explain some of the changes in miRNA expression observed. This study represents the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer and may serve as a basis for functional studies of the role of miRNAs in the etiology of breast cancer. Furthermore, we demonstrate that bead-based flow cytometric miRNA expression profiling might be a suitable platform to classify breast cancer into prognostic molecular subtypes.
Cheng, Hongtao; Hao, Mengyu; Wang, Wenxiang; Mei, Desheng; Tong, Chaobo; Wang, Hui; Liu, Jia; Fu, Li; Hu, Qiong
2016-09-08
SBP-box genes belong to one of the largest families of transcription factors. Though members of this family have been characterized to be important regulators of diverse biological processes, information of SBP-box genes in the third most important oilseed crop Brassica napus is largely undefined. In the present study, by whole genome bioinformatics analysis and transcriptional profiling, 58 putative members of SBP-box gene family in oilseed rape (Brassica napus L.) were identified and their expression pattern in different tissues as well as possible interaction with miRNAs were analyzed. In addition, B. napus lines with contrasting branch angle were used for investigating the involvement of SBP-box genes in plant architecture regulation. Detailed gene information, including genomic organization, structural feature, conserved domain and phylogenetic relationship of the genes were systematically characterized. By phylogenetic analysis, BnaSBP proteins were classified into eight distinct groups representing the clear orthologous relationships to their family members in Arabidopsis and rice. Expression analysis in twelve tissues including vegetative and reproductive organs showed different expression patterns among the SBP-box genes and a number of the genes exhibit tissue specific expression, indicating their diverse functions involved in the developmental process. Forty-four SBP-box genes were ascertained to contain the putative miR156 binding site, with 30 and 14 of the genes targeted by miR156 at the coding and 3'UTR region, respectively. Relative expression level of miR156 is varied across tissues. Different expression pattern of some BnaSBP genes and the negative correlation of transcription levels between miR156 and its target BnaSBP gene were observed in lines with different branch angle. Taken together, this study represents the first systematic analysis of the SBP-box gene family in Brassica napus. 
The data presented here provide a foundation for understanding the crucial roles of BnaSBP genes in plant development and other biological processes.
Dynamic changes in gene expression during human trophoblast differentiation.
Handwerger, Stuart; Aronow, Bruce
2003-01-01
The genetic program that directs human placental differentiation is poorly understood. In a recent study, we used DNA microarray analyses to determine genes that are dynamically regulated during human placental development in an in vitro model system in which highly purified cytotrophoblast cells aggregate spontaneously and fuse to form a multinucleated syncytium that expresses placental lactogen, human chorionic gonadotropin, and other proteins normally expressed by fully differentiated syncytiotrophoblast cells. Of the 6918 genes present on the Incyte Human GEM V microarray that we analyzed over a 9-day period, 141 were induced and 256 were downregulated by more than 2-fold. The dynamically regulated genes fell into nine distinct kinetic patterns of induction or repression, as detected by the K-means algorithm. Classifying the genes according to functional characteristics, the regulated genes could be divided into six overall categories: cell and tissue structural dynamics, cell cycle and apoptosis, intercellular communication, metabolism, regulation of gene expression, and expressed sequence tags and function unknown. Gene expression changes within key functional categories were tightly coupled to the morphological changes that occurred during trophoblast differentiation. Within several key gene categories (e.g., cell and tissue structure), many genes were strongly activated, while others with related function were strongly repressed. These findings suggest that trophoblast differentiation is augmented by "categorical reprogramming" in which the ability of induced genes to function is enhanced by diminished synthesis of other genes within the same category. We also observed categorical reprogramming in human decidual fibroblasts decidualized in vitro in response to progesterone, estradiol, and cyclic AMP. 
While there was little overlap between genes that are dynamically regulated during trophoblast differentiation versus decidualization, many of the categories in which genes were strongly activated also contained genes whose expression was strongly diminished. Taken together, these findings point to a fundamental role for simultaneous induction and repression of mRNAs that encode functionally related proteins during the differentiation process.
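The grouping of regulated genes into kinetic patterns with the K-means algorithm can be sketched as follows. This is a minimal plain-Python version of Lloyd's algorithm, not the software used in the study; the starting centroids are supplied by the caller so that the result is deterministic, and the four time-course profiles are made-up log2 fold changes.

```python
import math


def kmeans(profiles, centroids, n_iter=20):
    """Lloyd's k-means on equal-length expression time courses.

    `centroids` are caller-supplied starting points (an assumption made
    here to keep the sketch deterministic).
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in profiles]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical log2 fold changes at four time points for four genes
profiles = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.2, 2.1, 2.9],
            [0.0, -1.0, -2.0, -3.0], [0.0, -0.8, -2.2, -3.1]]
labels, _ = kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(labels)
```

With nine clusters, as reported in the abstract, the choice of starting centroids matters more, and running several random restarts is a common precaution.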
Meesapyodsuk, Dauenpen; Balsevich, John; Reed, Darwin W.; Covello, Patrick S.
2007-01-01
Saponaria vaccaria (Caryophyllaceae), a soapwort, known in western Canada as cowcockle, contains bioactive oleanane-type saponins similar to those found in soapbark tree (Quillaja saponaria; Rosaceae). To improve our understanding of the biosynthesis of these saponins, a combined polymerase chain reaction and expressed sequence tag approach was taken to identify the genes involved. A cDNA encoding a β-amyrin synthase (SvBS) was isolated by reverse transcription-polymerase chain reaction and characterized by expression in yeast (Saccharomyces cerevisiae). The SvBS gene is predominantly expressed in leaves. A S. vaccaria developing seed expressed sequence tag collection was developed and used for the isolation of a full-length cDNA bearing sequence similarity to ester-forming glycosyltransferases. The gene product of the cDNA, classified as UGT74M1, was expressed in Escherichia coli, purified, and identified as a triterpene carboxylic acid glucosyltransferase. UGT74M1 is expressed in roots and leaves and appears to be involved in monodesmoside biosynthesis in S. vaccaria. PMID:17172290
DNA methylation patterns and gene expression associated with litter size in Berkshire pig placenta
Kwon, Seulgi; Park, Da Hye; Kim, Tae Wan; Kang, Deok Gyeong; Yu, Go Eun; Kim, Il-Suk; Park, Hwa Chun; Ha, Jeongim; Kim, Chul Wook
2017-01-01
Increasing litter size is of great interest to the pig industry. DNA methylation is an important epigenetic modification that regulates gene expression, resulting in livestock phenotypes such as disease resistance, milk production, and reproduction. We classified Berkshire pigs into two groups according to litter size and estimated breeding value: smaller (SLG) and larger (LLG) litter size groups. Genome-wide DNA methylation and gene expression were analyzed using placenta genomic DNA and RNA to identify differentially methylated regions (DMRs) and differentially expressed genes (DEGs) associated with litter size. The methylation levels of CpG dinucleotides in different genomic regions were noticeably different between the groups, while global methylation pattern was similar, and excluding intergenic regions they were found the most frequently in gene body regions. Next, we analyzed RNA-Seq data to identify DEGs between the SLG and LLG groups. A total of 1591 DEGs were identified: 567 were downregulated and 1024 were upregulated in LLG compared to SLG. To identify genes that simultaneously exhibited changes in DNA methylation and mRNA expression, we integrated and analyzed the data from bisulfite-Seq and RNA-Seq. Nine DEGs positioned in DMRs were found. The expression of only three of these genes (PRKG2, CLCA4, and PCK1) was verified by RT-qPCR. Furthermore, we observed the same methylation patterns in blood samples as in the placental tissues by PCR-based methylation analysis. Together, these results provide useful data regarding potential epigenetic markers for selecting hyperprolific sows. PMID:28880934
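The integration step, finding DEGs positioned in DMRs, amounts to an interval-overlap query. A minimal sketch follows, assuming closed genomic intervals on named chromosomes; the function name, gene labels, and coordinates are hypothetical placeholders and do not come from the study.

```python
def degs_in_dmrs(degs, dmrs):
    """Return names of DEGs whose genomic span overlaps at least one DMR.

    degs: list of (name, chrom, start, end)
    dmrs: list of (chrom, start, end)
    Coordinates are treated as closed intervals (an assumption).
    """
    hits = []
    for name, chrom, start, end in degs:
        for d_chrom, d_start, d_end in dmrs:
            same_chrom = chrom == d_chrom
            overlaps = start <= d_end and d_start <= end
            if same_chrom and overlaps:
                hits.append(name)
                break  # one overlapping DMR is enough
    return hits


# Made-up coordinates for illustration only
degs = [("geneA", "chr1", 100, 500),
        ("geneB", "chr1", 900, 1200),
        ("geneC", "chr2", 100, 500)]
dmrs = [("chr1", 450, 600), ("chr2", 600, 700)]
print(degs_in_dmrs(degs, dmrs))
```

The nested loop is adequate for a few thousand genes and regions; an interval tree or sorted sweep would scale better for genome-wide inputs.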
Su, Yuhua; Nielsen, Dahlia; Zhu, Lei; Richards, Kristy; Suter, Steven; Breen, Matthew; Motsinger-Reif, Alison; Osborne, Jason
2013-01-05
A bivariate mixture model utilizing information across two species was proposed to solve the fundamental problem of identifying differentially expressed genes in microarray experiments. The model utility was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as being differentially expressed in both species and the human genes in this cluster serve as a good predictor for classifying diffuse large-B-cell lymphoma (DLBCL) patients into two subgroups, the germinal center B-cell-like diffuse large B-cell lymphoma and the activated B-cell-like diffuse large B-cell lymphoma. The number of human genes that were observed to be significantly differentially expressed (21) from the two-species analysis was very small compared to the number of human genes (190) identified with only one-species analysis (human data). The genes may be clinically relevant/important, as this small set achieved low misclassification rates of DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that the stratification based on gene-expression profiling using the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.
Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc
2012-01-06
An important challenge in systems biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 Affymetrix arrays, respectively. Using the area under the curve of the receiver operating characteristics and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm at identifying target genes of TFs that already have a wide range of identified target genes, but better for TFs with only a few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm, and particularly suitable to discover target genes of "orphan" TFs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bock, K.W.; Honys, D.; Ward, J.M.
Male fertility depends on the proper development of the male gametophyte, successful pollen germination, tube growth and delivery of the sperm cells to the ovule. Previous studies have shown that nutrients like boron, and ion gradients or currents of Ca2+, H+, and K+ are critical for pollen tube growth. However, the molecular identities of transporters mediating these fluxes are mostly unknown. As a first step to integrate transport with pollen development and function, a genome-wide analysis of transporter genes expressed in the male gametophyte at four developmental stages was conducted. About 1269 genes encoding classified transporters were collected from the Arabidopsis thaliana genome. Of 757 transporter genes expressed in pollen, 16% or 124 genes, including AHA6, CNGC18, TIP1.3 and CHX08, are specifically or preferentially expressed relative to sporophytic tissues. Some genes are highly expressed in microspores and bicellular pollen (COPT3, STP2, OPT9); while others are activated only in tricellular or mature pollen (STP11, LHT7). Analyses of entire gene families showed that a subset of genes, including those expressed in sporophytic tissues, were developmentally-regulated during pollen maturation. Early and late expression patterns revealed by transcriptome analysis are supported by promoter::GUS analyses of CHX genes and by other methods. Recent genetic studies based on a few transporters, including plasma membrane H+ pump AHA3, Ca2+ pump ACA9, and K+ channel SPIK, further support the expression patterns and the inferred functions revealed by our analyses. Thus, revealing the distinct expression patterns of specific transporters and unknown polytopic proteins during microgametogenesis provides new insights for strategic mutant analyses necessary to integrate the roles of transporters and potential receptors with male gametophyte development.
Bock, Kevin W; Honys, David; Ward, John M; Padmanaban, Senthilkumar; Nawrocki, Eric P; Hirschi, Kendal D; Twell, David; Sze, Heven
2006-04-01
Male fertility depends on the proper development of the male gametophyte, successful pollen germination, tube growth, and delivery of the sperm cells to the ovule. Previous studies have shown that nutrients like boron, and ion gradients or currents of Ca2+, H+, and K+ are critical for pollen tube growth. However, the molecular identities of transporters mediating these fluxes are mostly unknown. As a first step to integrate transport with pollen development and function, a genome-wide analysis of transporter genes expressed in the male gametophyte at four developmental stages was conducted. Approximately 1,269 genes encoding classified transporters were collected from the Arabidopsis (Arabidopsis thaliana) genome. Of 757 transporter genes expressed in pollen, 16% or 124 genes, including AHA6, CNGC18, TIP1.3, and CHX08, are specifically or preferentially expressed relative to sporophytic tissues. Some genes are highly expressed in microspores and bicellular pollen (COPT3, STP2, OPT9), while others are activated only in tricellular or mature pollen (STP11, LHT7). Analyses of entire gene families showed that a subset of genes, including those expressed in sporophytic tissues, was developmentally regulated during pollen maturation. Early and late expression patterns revealed by transcriptome analysis are supported by promoter::beta-glucuronidase analyses of CHX genes and by other methods. Recent genetic studies based on a few transporters, including plasma membrane H+ pump AHA3, Ca2+ pump ACA9, and K+ channel SPIK, further support the expression patterns and the inferred functions revealed by our analyses. Thus, revealing the distinct expression patterns of specific transporters and unknown polytopic proteins during microgametogenesis provides new insights for strategic mutant analyses necessary to integrate the roles of transporters and potential receptors with male gametophyte development.
Seet, Li-Fong; Narayanaswamy, Arun; Finger, Sharon N; Htoon, Hla M; Nongpiur, Monisha E; Toh, Li Zhen; Ho, Henrietta; Perera, Shamira A; Wong, Tina T
2016-11-01
This study aimed to evaluate differences in iris gene expression profiles between primary angle closure glaucoma (PACG) and primary open angle glaucoma (POAG) and their interaction with biometric characteristics. Prospective study. Thirty-five subjects with PACG and thirty-three subjects with POAG who required trabeculectomy were enrolled at the Singapore National Eye Centre, Singapore. Iris specimens, obtained by iridectomy, were analysed by real-time polymerase chain reaction for expression of type I collagen, vascular endothelial growth factor (VEGF)-A, -B and -C, as well as VEGF receptors (VEGFRs) 1 and 2. Anterior segment optical coherence tomography (ASOCT) imaging for biometric parameters, including anterior chamber depth (ACD), anterior chamber volume (ACV) and lens vault (LV), was also performed pre-operatively. Relative mRNA levels between PACG and POAG irises, biometric measurements, discriminant analyses using genes and biometric parameters. COL1A1, VEGFB, VEGFC and VEGFR2 mRNA expression was higher in PACG compared to POAG irises. LV, ACD and ACV were significantly different between the two subgroups. Discriminant analyses based on gene expression, biometric parameters or a combination of both gene expression and biometrics (LV and ACV), correctly classified 94.1%, 85.3% and 94.1% of the original PACG and POAG cases, respectively. The discriminant function combining genes and biometrics demonstrated the highest accuracy in cross-validated classification of the two glaucoma subtypes. Distinct iris gene expression supports the pathophysiological differences that exist between PACG and POAG. Biometric parameters can combine with iris gene expression to more accurately define PACG from POAG. © 2016 The Authors. Clinical & Experimental Ophthalmology published by John Wiley & Sons Australia, Ltd on behalf of Royal Australian and New Zealand College of Ophthalmologists.
Dakshinamurthy, Amirtha Ganesh; Ramesar, Rajkumar; Goldberg, Paul; Blackburn, Jonathan M
2008-11-01
Cancer-testis (CT) antigens are a group of tumor antigens that are expressed in the testis and aberrantly in cancerous tissue but not in somatic tissues. The testis is an immune-privileged site because of the presence of a blood-testis barrier; as a result, CT antigens are considered to be essentially tumor specific and are attractive targets for immunotherapy. CT antigens are classified as the CT-X and the non-X CT antigens depending on the chromosomal location to which the genes are mapped. CT-X antigens are typically highly immunogenic and hence the first step towards tailored immunotherapy is to elucidate the expression profile of CT-X antigens in the respective tumors. In this study we investigated the expression profile of 16 CT-X antigen genes in 34 colorectal cancer (CRC) patients using reverse transcription-polymerase chain reaction. We observed that 12 of the 16 CT-X antigen genes studied did not show expression in any of the CRC samples analyzed. The other 4 CT-X antigen genes showed low frequency of expression and exhibited a highly variable expression profile when compared to other populations. Thus, our study forms the first report on the expression profile of CT-X antigen genes among CRC patients in the genetically diverse South African population. The results of our study suggest that genetic and ethnic variations in population might have a role in the expression of the CT-X antigen genes. Thus our results have significant implications for anti-CT antigen-based immunotherapy trials in this population.
Du, Jiancan; Hu, Simin; Yu, Qin; Wang, Chongde; Yang, Yunqiang; Sun, Hang; Yang, Yongping; Sun, Xudong
2017-01-01
The teosinte branched1/cycloidea/proliferating cell factor (TCP) gene family is a plant-specific transcription factor that participates in the control of plant development by regulating cell proliferation. However, no report is currently available about this gene family in turnips ( Brassica rapa ssp. rapa ). In this study, a genome-wide analysis of TCP genes was performed in turnips. Thirty-nine TCP genes in turnip genome were identified and distributed on 10 chromosomes. Phylogenetic analysis clearly showed that the family was classified as two clades: class I and class II. Gene structure and conserved motif analysis showed that the same clade genes have similar gene structures and conserved motifs. The expression profiles of 39 TCP genes were determined through quantitative real-time PCR. Most CIN-type BrrTCP genes were highly expressed in leaf. The members of CYC/TB1 subclade are highly expressed in flower bud and weakly expressed in root. By contrast, class I clade showed more widespread but less tissue-specific expression patterns. Yeast two-hybrid data show that BrrTCP proteins preferentially formed heterodimers. The function of BrrTCP2 was confirmed through ectopic expression of BrrTCP2 in wild-type and loss-of-function ortholog mutant of Arabidopsis. Overexpression of BrrTCP2 in wild-type Arabidopsis resulted in the diminished leaf size. Overexpression of BrrTCP2 in triple mutants of tcp2/4/10 restored the leaf phenotype of tcp2/4/10 to the phenotype of wild type. The comprehensive analysis of turnip TCP gene family provided the foundation to further study the roles of TCP genes in turnips.
Liu, Chaoyang; Xie, Tao; Chen, Chenjie; Luan, Aiping; Long, Jianmei; Li, Chuhao; Ding, Yaqi; He, Yehua
2017-07-01
The MYB proteins comprise one of the largest families of plant transcription factors, which are involved in various plant physiological and biochemical processes. Pineapple (Ananas comosus) is one of the three most important tropical fruits worldwide. The completion of pineapple genome sequencing provides a great opportunity to investigate the organization and evolutionary traits of pineapple MYB genes at the genome-wide level. In the present study, a total of 94 pineapple R2R3-MYB genes were identified and further phylogenetically classified into 26 subfamilies, as supported by the conserved gene structures and motif composition. Collinearity analysis indicated that segmental duplication events played a crucial role in the expansion of the pineapple MYB gene family. Further comparative phylogenetic analysis suggested that there have been functional divergences of the MYB gene family during plant evolution. RNA-seq data from different tissues and developmental stages revealed distinct temporal and spatial expression profiles of the AcMYB genes. Further quantitative expression analysis showed the specific expression patterns of the selected putative stress-related AcMYB genes in response to distinct abiotic stress and hormonal treatments. The comprehensive expression analysis of the pineapple MYB genes, especially the tissue-preferential and stress-responsive genes, could provide valuable clues for further functional characterization. In this work, we systematically identified AcMYB genes by analyzing the pineapple genome sequence using a set of bioinformatics approaches. Our findings provide a global insight into the organization, phylogeny and expression patterns of the pineapple R2R3-MYB genes, and hence contribute to the greater understanding of their biological roles in pineapple.
St-Amand, Jonny; Yoshioka, Mayumi; Tanaka, Keitaro; Nishida, Yuichiro
2012-01-01
To identify preferentially expressed genes in the central endocrine organs of the hypothalamus and pituitary gland, we generated transcriptome-wide mRNA profiles of the hypothalamus, pituitary gland, and parietal cortex in male mice (12–15 weeks old) using serial analysis of gene expression (SAGE). Total counts of SAGE tags for the hypothalamus, pituitary gland, and parietal cortex were 165824, 126688, and 161045 tags, respectively. This represented 59244, 45151, and 55131 distinct tags, respectively. Comparison of these mRNA profiles revealed that 22 mRNA species, including three potential novel transcripts, were preferentially expressed in the hypothalamus. In addition to well-known hypothalamic transcripts, such as hypocretin, several genes involved in hormone function, intracellular transduction, metabolism, protein transport, steroidogenesis, extracellular matrix, and brain disease were identified as preferentially expressed hypothalamic transcripts. In the pituitary gland, 106 mRNA species, including 60 potential novel transcripts, were preferentially expressed. In addition to well-known pituitary genes, such as growth hormone and thyroid stimulating hormone beta, a number of genes classified to function in transport, amino acid metabolism, intracellular transduction, cell adhesion, disulfide bond formation, stress response, transcription, protein synthesis, and turnover, cell differentiation, the cell cycle, and in the cytoskeleton and extracellular matrix were also preferentially expressed. In conclusion, the current study identified not only well-known hypothalamic and pituitary transcripts but also a number of new candidates likely to be involved in endocrine homeostatic systems regulated by the hypothalamus and pituitary gland. PMID:22649398
St-Amand, Jonny; Yoshioka, Mayumi; Tanaka, Keitaro; Nishida, Yuichiro
2011-01-01
To identify preferentially expressed genes in the central endocrine organs of the hypothalamus and pituitary gland, we generated transcriptome-wide mRNA profiles of the hypothalamus, pituitary gland, and parietal cortex in male mice (12-15 weeks old) using serial analysis of gene expression (SAGE). Total counts of SAGE tags for the hypothalamus, pituitary gland, and parietal cortex were 165824, 126688, and 161045 tags, respectively. This represented 59244, 45151, and 55131 distinct tags, respectively. Comparison of these mRNA profiles revealed that 22 mRNA species, including three potential novel transcripts, were preferentially expressed in the hypothalamus. In addition to well-known hypothalamic transcripts, such as hypocretin, several genes involved in hormone function, intracellular transduction, metabolism, protein transport, steroidogenesis, extracellular matrix, and brain disease were identified as preferentially expressed hypothalamic transcripts. In the pituitary gland, 106 mRNA species, including 60 potential novel transcripts, were preferentially expressed. In addition to well-known pituitary genes, such as growth hormone and thyroid stimulating hormone beta, a number of genes classified to function in transport, amino acid metabolism, intracellular transduction, cell adhesion, disulfide bond formation, stress response, transcription, protein synthesis, and turnover, cell differentiation, the cell cycle, and in the cytoskeleton and extracellular matrix were also preferentially expressed. In conclusion, the current study identified not only well-known hypothalamic and pituitary transcripts but also a number of new candidates likely to be involved in endocrine homeostatic systems regulated by the hypothalamus and pituitary gland.
Kawaura, Kanako; Mochida, Keiichi; Yamazaki, Yukiko; Ogihara, Yasunari
2006-04-01
In this study, we constructed a 22k wheat oligo-DNA microarray. A total of 148,676 expressed sequence tags of common wheat were collected from the database of the Wheat Genomics Consortium of Japan. These were grouped into 34,064 contigs, which were then used to design an oligonucleotide DNA microarray. Following a multistep selection of the sense strand, 21,939 60-mer oligo-DNA probes were selected for attachment on the microarray slide. This 22k oligo-DNA microarray was used to examine the transcriptional response of wheat to salt stress. More than 95% of the probes gave reproducible hybridization signals when targeted with RNAs extracted from salt-treated wheat shoots and roots. With the microarray, we identified 1,811 genes whose expressions changed more than 2-fold in response to salt. These included genes known to mediate response to salt, as well as unknown genes, and they were classified into 12 major groups by hierarchical clustering. These gene expression patterns were also confirmed by real-time reverse transcription-PCR. Many of the genes with unknown function were clustered together with genes known to be involved in response to salt stress. Thus, analysis of gene expression patterns combined with gene ontology should help identify the function of the unknown genes. Also, functional analysis of these wheat genes should provide new insight into the response to salt stress. Finally, these results indicate that the 22k oligo-DNA microarray is a reliable method for monitoring global gene expression patterns in wheat.
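The more-than-2-fold criterion used above to select salt-responsive genes can be illustrated with a minimal sketch. It assumes per-gene signal intensities for a control and a treated sample are already normalized; the function name, the pseudocount, and the example values are hypothetical and are not taken from the study.

```python
import math

def fold_change_hits(control, treated, min_fold=2.0, pseudocount=1.0):
    """Return {gene: log2 ratio} for genes changing more than min_fold
    in either direction (up or down) between control and treated."""
    threshold = math.log2(min_fold)
    hits = {}
    for gene, ctrl_signal in control.items():
        if gene not in treated:
            continue  # skip genes measured in only one condition
        # The pseudocount (an assumption) avoids division by zero for silent genes.
        ratio = (treated[gene] + pseudocount) / (ctrl_signal + pseudocount)
        log_ratio = math.log2(ratio)
        if abs(log_ratio) > threshold:
            hits[gene] = log_ratio
    return hits

# Hypothetical intensities, for illustration only.
control = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
salt = {"geneA": 399.0, "geneB": 219.0, "geneC": 9.0}
hits = fold_change_hits(control, salt)
```

Working on the log2 scale treats induction and repression symmetrically, so a gene falling to less than half of its control level is selected by the same threshold as one that more than doubles.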
Vimolmangkang, Sornkanok; Zheng, Danman; Han, Yuepeng; Khan, M Awais; Soria-Guerra, Ruth Elena; Korban, Schuyler S
2014-01-15
Although the mechanism of light regulation of color pigmentation of apple fruit is not fully understood, it has been shown that light can regulate expression of genes in the anthocyanin biosynthesis pathway by inducing transcription factors (TFs). Moreover, expression of genes encoding enzymes involved in this pathway may be coordinately regulated by multiple TFs. In this study, fruits on trees of apple cv. Red Delicious were covered with paper bags during early stages of fruit development and then removed prior to maturation to analyze the transcriptome in the exocarp of apple fruit. Comparisons of gene expression profiles of fruit covered with paper bags (dark-grown treatment) and those subjected to 14 h light treatment, following removal of paper bags, were investigated using an apple microarray of 40,000 sequences. Expression profiles were investigated over three time points, at one week intervals, during fruit development. Overall, 736 genes with expression values greater than two-fold were found to be modulated by light treatment. Light-induced products were classified into 19 categories with highest scores in primary metabolism (17%) and transcription (12%). Based on the Arabidopsis gene ontology annotation, 18 genes were identified as TFs. To further confirm expression patterns of flavonoid-related genes, these were subjected to quantitative RT-PCR (qRT-PCR) using fruit of red-skinned apple cv. Red Delicious and yellow-skinned apple cv. Golden Delicious. Of these, two genes showed higher levels of expression in 'Red Delicious' than in 'Golden Delicious', and were likely involved in the regulation of fruit red color pigmentation. © 2013 Elsevier B.V. All rights reserved.
Cao, Chuanwang; Wang, Zhiying; Niu, Changying; Desneux, Nicolas; Gao, Xiwu
2013-01-01
Phenol is a major pollutant in aquatic ecosystems due to its chemical stability, water solubility and environmental mobility. To date, little is known about the molecular modifications of invertebrates under phenol stress. In the present study, we used Solexa sequencing technology to investigate the transcriptome and differentially expressed genes (DEGs) of midges (Chironomus kiinensis) in response to phenol stress. A total of 51,518,972 and 51,150,832 clean reads in the phenol-treated and control libraries, respectively, were obtained and assembled into 51,014 non-redundant (Nr) consensus sequences. A total of 6,032 unigenes were classified by Gene Ontology (GO), and 18,366 unigenes were categorized into 238 Kyoto Encyclopedia of Genes and Genomes (KEGG) categories. These genes included representatives from almost all functional categories. A total of 10,724 differentially expressed genes (P value <0.05) were detected in a comparative analysis of the expression profiles between phenol-treated and control C. kiinensis including 8,390 upregulated and 2,334 downregulated genes. The expression levels of 20 differentially expressed genes were confirmed by real-time RT-PCR, and the trends in gene expression that were observed matched the Solexa expression profiles, although the magnitude of the variations was different. Through pathway enrichment analysis, significantly enriched pathways were identified for the DEGs, including metabolic pathways, aryl hydrocarbon receptor (AhR), pancreatic secretion and neuroactive ligand-receptor interaction pathways, which may be associated with the phenol responses of C. kiinensis. Using Solexa sequencing technology, we identified several groups of key candidate genes as well as important biological pathways involved in the molecular modifications of chironomids under phenol stress. PMID:23527048
Geng, Xiaodong; Wang, Yuanda; Hong, Quan; Yang, Jurong; Zheng, Wei; Zhang, Gang; Cai, Guangyan; Chen, Xiangmei; Wu, Di
2015-01-01
Rhabdomyolysis is a threatening syndrome because it causes the breakdown of skeletal muscle. Muscle destruction leads to the release of myoglobin, intracellular proteins, and electrolytes into the circulation. The aim of this study was to investigate the differences in gene expression profiles and signaling pathways upon rhabdomyolysis-induced acute kidney injury (AKI). In this study, we used glycerol-induced renal injury as a model of rhabdomyolysis-induced AKI. We analyzed data and relevant information from the Gene Expression Omnibus database (No: GSE44925). The gene expression data for three untreated mice were compared to data for five mice with rhabdomyolysis-induced AKI. The expression profiling of the three untreated mice and the five rhabdomyolysis-induced AKI mice was performed using microarray analysis. We examined the levels of Cyp3a13, Rela, Aldh7a1, Jun, CD14, and Cdkn1a using RT-PCR to determine the accuracy of the microarray results. The microarray analysis showed that there were 1050 downregulated and 659 upregulated genes in the rhabdomyolysis-induced AKI mice compared to the control group. The interactions of all differentially expressed genes in the Signal-Net were analyzed. Cyp3a13 and Rela had the most interactions with other genes. The data showed that Rela and Aldh7a1 were the key nodes and had important positions in the Signal-Net. The genes Jun, CD14, and Cdkn1a were also significantly upregulated. The pathway analysis classified the differentially expressed genes into 71 downregulated and 48 upregulated pathways including the PI3K/Akt, MAPK, and NF-κB signaling pathways. The results of this study indicate that the NF-κB, MAPK, PI3K/Akt, and apoptotic pathways are regulated in rhabdomyolysis-induced AKI.
A deep learning-based multi-model ensemble method for cancer prediction.
Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong
2018-01-01
Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
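The majority voting algorithm that the abstract above uses as a comparison baseline can be sketched as follows. This is only the baseline, not the deep learning-based ensemble itself; the abstract does not name the five classifiers, so the prediction lists below are hypothetical, and the tie-breaking rule is an assumption.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length label lists, one per classifier.
    Ties go to the earliest-listed classifier whose label is among the
    most frequent ones (an assumed convention).
    """
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        winner = next(vote for vote in sample_votes if counts[vote] == top)
        combined.append(winner)
    return combined

# Hypothetical outputs of five classifiers on four samples (1 = tumor, 0 = normal).
predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
consensus = majority_vote(predictions)
```

A learned combiner such as the one the abstract describes replaces the fixed counting rule with a model trained on the classifiers' outputs, so that more reliable classifiers can carry more weight.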
Novel Genomic and Evolutionary Insight of WRKY Transcription Factors in Plant Lineage
Mohanta, Tapan Kumar; Park, Yong-Hwan; Bae, Hanhong
2016-01-01
The evolutionarily conserved WRKY transcription factor (TF) regulates different aspects of gene expression in plants, and modulates growth, development, as well as biotic and abiotic stress responses. Therefore, understanding the details regarding WRKY TFs is very important. In this study, large-scale genomic analyses of the WRKY TF gene family from 43 plant species were conducted. The results of our study revealed that WRKY TFs could be grouped and specifically classified as those belonging to the monocot or dicot plant lineage. In this study, we identified several novel WRKY TFs. To our knowledge, this is the first report on a revised grouping system of the WRKY TF gene family in plants. The different forms of novel chimeric forms of WRKY TFs in the plant genome might play a crucial role in their evolution. Tissue-specific gene expression analyses in Glycine max and Phaseolus vulgaris showed that WRKY11-1, WRKY11-2 and WRKY11-3 were ubiquitously expressed in all tissue types, and WRKY15-2 was highly expressed in the stem, root, nodule and pod tissues in G. max and P. vulgaris. PMID:27853303
Novel Genomic and Evolutionary Insight of WRKY Transcription Factors in Plant Lineage.
Mohanta, Tapan Kumar; Park, Yong-Hwan; Bae, Hanhong
2016-11-17
The evolutionarily conserved WRKY transcription factor (TF) regulates different aspects of gene expression in plants, and modulates growth, development, as well as biotic and abiotic stress responses. Therefore, understanding the details regarding WRKY TFs is very important. In this study, large-scale genomic analyses of the WRKY TF gene family from 43 plant species were conducted. The results of our study revealed that WRKY TFs could be grouped and specifically classified as those belonging to the monocot or dicot plant lineage. In this study, we identified several novel WRKY TFs. To our knowledge, this is the first report on a revised grouping system of the WRKY TF gene family in plants. The different forms of novel chimeric forms of WRKY TFs in the plant genome might play a crucial role in their evolution. Tissue-specific gene expression analyses in Glycine max and Phaseolus vulgaris showed that WRKY11-1, WRKY11-2 and WRKY11-3 were ubiquitously expressed in all tissue types, and WRKY15-2 was highly expressed in the stem, root, nodule and pod tissues in G. max and P. vulgaris.
Chen, Xue; Chen, Zhu; Zhao, Hualin; Zhao, Yang; Cheng, Beijiu; Xiang, Yan
2014-01-01
Background Homeodomain-leucine zipper (HD-Zip) proteins, a group of homeobox transcription factors, participate in various aspects of normal plant growth and developmental processes as well as environmental responses. To date, no overall analysis or expression profiling of the HD-Zip gene family in soybean (Glycine max) has been reported. Methods and Findings An investigation of the soybean genome revealed 88 putative HD-Zip genes. These genes were classified into four subfamilies, I to IV, based on phylogenetic analysis. In each subfamily, the constituent parts of gene structure and motif were relatively conserved. A total of 87 out of 88 genes were distributed unequally on 20 chromosomes with 36 segmental duplication events, indicating that segmental duplication is important for the expansion of the HD-Zip family. Analysis of the Ka/Ks ratios showed that the duplicated genes of the HD-Zip family basically underwent purifying selection with restrictive functional divergence after the duplication events. Analysis of expression profiles showed that 80 genes differentially expressed across 14 tissues, and 59 HD-Zip genes are differentially expressed under salinity and drought stress, with 20 paralogous pairs showing nearly identical expression patterns and three paralogous pairs diversifying significantly under drought stress. Quantitative real-time RT-PCR (qRT-PCR) analysis of six paralogous pairs of 12 selected soybean HD-Zip genes under both drought and salinity stress confirmed their stress-inducible expression patterns. Conclusions This study presents a thorough overview of the soybean HD-Zip gene family and provides a new perspective on the evolution of this gene family. The results indicate that HD-Zip family genes may be involved in many plant responses to stress conditions. Additionally, this study provides a solid foundation for uncovering the biological roles of HD-Zip genes in soybean growth and development. PMID:24498296
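The Ka/Ks interpretation mentioned above follows a common convention: a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. A minimal sketch, assuming Ka and Ks have already been estimated for each duplicated pair; the function name, the tolerance band around 1, and the example values are hypothetical.

```python
def selection_mode(ka, ks, tolerance=0.05):
    """Classify a duplicated gene pair by its Ka/Ks ratio.

    ka: nonsynonymous substitution rate; ks: synonymous substitution rate.
    tolerance is an assumed band around 1 that is treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"

# Hypothetical substitution rates for three duplicated pairs.
modes = [selection_mode(0.02, 0.2), selection_mode(0.3, 0.2), selection_mode(0.2, 0.2)]
```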
Probabilistic classifiers with high-dimensional data
Kim, Kyung In; Simon, Richard
2011-01-01
For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers: well-calibratedness and refinement and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
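The idea of checking whether a probabilistic classifier is well calibrated can be illustrated with a generic binned reliability check. This is a common textbook-style diagnostic, not the evaluation measures developed in the paper above; the function names, the bin count, and the example data are assumptions.

```python
def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed event frequency, count)
    tuple per non-empty bin. labels are 0/1 outcomes.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        table.append((mean_p, frequency, len(members)))
    return table

def calibration_gap(probs, labels, n_bins=5):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    total = len(probs)
    return sum(count / total * abs(mean_p - frequency)
               for mean_p, frequency, count in reliability_table(probs, labels, n_bins))

# Hypothetical predicted class probabilities and true outcomes.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
gap = calibration_gap(probs, labels)
```

A gap near zero means predicted probabilities match observed frequencies within each bin. With the few samples typical of small n, large p problems, such binned estimates are noisy, so a cross-validated assessment of the kind the abstract proposes is the more careful route.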
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Jian; Luo, Mao; Zhu, Ye
2015-03-27
Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.
Single-cell transcriptional analysis of taste sensory neuron pair in Caenorhabditis elegans.
Takayama, Jun; Faumont, Serge; Kunitomo, Hirofumi; Lockery, Shawn R; Iino, Yuichi
2010-01-01
The nervous system is composed of a wide variety of neurons. A description of the transcriptional profiles of each neuron would yield enormous information about the molecular mechanisms that define morphological or functional characteristics. Here we show that RNA isolation from single neurons is feasible by using an optimized mRNA tagging method. This method extracts transcripts in the target cells by co-immunoprecipitation of the complexes of RNA and epitope-tagged poly(A) binding protein expressed specifically in the cells. With this method and genome-wide microarray, we compared the transcriptional profiles of two functionally different neurons in the main C. elegans gustatory neuron class ASE. Eight of the 13 known subtype-specific genes were successfully detected. Additionally, we identified nine novel genes including a receptor guanylyl cyclase, secreted proteins, a TRPC channel and uncharacterized genes conserved among nematodes, suggesting the two neurons are substantially different than previously thought. The expression of these novel genes was controlled by the previously known regulatory network for subtype differentiation. We also describe unique motif organization within individual gene groups classified by the expression patterns in ASE. Our study paves the way to the complete catalog of the expression profiles of individual C. elegans neurons.
Zhang, Hailing; Cao, Yingping; Shang, Chen; Li, Jikai; Wang, Jianli; Wu, Zhenying; Ma, Lichao; Qi, Tianxiong; Fu, Chunxiang; Hu, Baozhong
2017-01-01
The GRAS gene family is a large plant-specific family of transcription factors that are involved in diverse processes during plant development. Medicago truncatula is an ideal model plant for genetic research in legumes, and specifically for studying nodulation, which is crucial for nitrogen fixation. In this study, 59 MtGRAS genes were identified and classified into eight distinct subgroups based on phylogenetic relationships. Motifs located in the C-termini were conserved across the subgroups, while motifs in the N-termini were subfamily specific. Gene duplication was the main evolutionary force for MtGRAS expansion, especially proliferation of the LISCL subgroup. Seventeen duplicated genes showed strong effects of purifying selection and diverse expression patterns, highlighting their functional importance and diversification after duplication. Thirty MtGRAS genes, including NSP1 and NSP2, were preferentially expressed in nodules, indicating possible roles in the process of nodulation. A transcriptome study, combined with gene expression analysis under different stress conditions, suggested potential functions of MtGRAS genes in various biological pathways and stress responses. Taken together, these comprehensive analyses provide basic information for understanding the potential functions of GRAS genes, and will facilitate further discovery of MtGRAS gene functions. PMID:28945786
Han, Yahui; Ding, Ting; Su, Bo; Jiang, Haiyang
2016-01-01
Members of the chalcone synthase (CHS) family participate in the synthesis of a series of secondary metabolites in plants, fungi and bacteria. The metabolites play important roles in protecting land plants against various environmental stresses during the evolutionary process. Our research was conducted on comprehensive investigation of CHS genes in maize (Zea mays L.), including their phylogenetic relationships, gene structures, chromosomal locations and expression analysis. Fourteen CHS genes (ZmCHS01–14) were identified in the genome of maize, representing one of the largest numbers of CHS family members identified in one organism to date. The gene family was classified into four major classes (classes I–IV) based on their phylogenetic relationships. Most of them contained two exons and one intron. The 14 genes were unevenly located on six chromosomes. Two segmental duplication events were identified, which might contribute to the expansion of the maize CHS gene family to some extent. In addition, quantitative real-time PCR and microarray data analyses suggested that ZmCHS genes exhibited various expression patterns, indicating functional diversification of the ZmCHS genes. Our results will contribute to future studies of the complexity of the CHS gene family in maize and provide valuable information for the systematic analysis of the functions of the CHS gene family. PMID:26828478
Evaluation of xenobiotic-induced changes in gene expression as a method to identify and classify potential toxicants is being pursued by industry and regulatory agencies worldwide. A workshop was held at the Research Triangle Park campus of the Environmental Protection Agency to...
Peláez-García, Alberto; Yébenes, Laura; Berjón, Alberto; Angulo, Antonia; Zamora, Pilar; Sánchez-Méndez, José Ignacio; Espinosa, Enrique; Redondo, Andrés; Heredia-Soto, Victoria; Mendiola, Marta; Feliú, Jaime
2017-01-01
Purpose: To compare the concordance in risk classification between the EndoPredict and the MammaPrint scores obtained for the same cancer samples on 40 estrogen-receptor positive/HER2-negative breast carcinomas. Methods: Formalin-fixed, paraffin-embedded invasive breast carcinoma tissues that were previously analyzed with MammaPrint as part of routine care of the patients, and were classified as high-risk (20 patients) and low-risk (20 patients), were selected to be analyzed by the EndoPredict assay, a second generation gene expression test that combines expression of 8 genes (EP score) with two clinicopathological variables (tumor size and nodal status, EPclin score). Results: The EP score classified 15 patients as low-risk and 25 patients as high-risk. EPclin re-classified 5 of the 25 EP high-risk patients into low-risk, resulting in a total of 20 high-risk and 20 low-risk tumors. EP score and MammaPrint score were significantly correlated (p = 0.008). Twelve of 20 samples classified as low-risk by MammaPrint were also low-risk by EP score (60%). Seventeen of 20 MammaPrint high-risk tumors were also high-risk by EP score. The overall concordance between EP score and MammaPrint was 72.5% (κ = 0.45; 95% CI, 0.182 to 0.718). EPclin score also correlated with MammaPrint results (p = 0.004). Discrepancies between the two tests occurred in 10 cases: 5 MammaPrint low-risk patients were classified as EPclin high-risk and 5 MammaPrint high-risk patients were classified as low-risk by EPclin, giving an overall concordance of 75% (κ = 0.5; 95% CI, 0.232 to 0.768). Conclusions: This pilot study demonstrates a limited concordance between MammaPrint and EndoPredict. Differences in results could be explained by the inclusion of different gene sets in each platform, the use of different methodology, and the inclusion of clinicopathological parameters, such as tumor size and nodal status, in the EndoPredict test. PMID:28886093
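The concordance and κ values reported above can be reproduced from the counts given in the abstract. The sketch below assumes the standard Cohen's kappa for a 2×2 table; the function name and argument order are illustrative, and the off-diagonal cells (8 and 3 for the EP score) are inferred from "12 of 20" and "17 of 20" rather than stated directly.

```python
def concordance_and_kappa(both_low, a_low_b_high, a_high_b_low, both_high):
    """Return (observed agreement, Cohen's kappa) for two binary tests A and B.

    Arguments are the four cells of the 2x2 cross-classification.
    """
    n = both_low + a_low_b_high + a_high_b_low + both_high
    observed = (both_low + both_high) / n
    a_low = (both_low + a_low_b_high) / n   # marginal: test A says low-risk
    b_low = (both_low + a_high_b_low) / n   # marginal: test B says low-risk
    expected = a_low * b_low + (1 - a_low) * (1 - b_low)  # chance agreement
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# A = MammaPrint, B = EP score: 12 low/low, 8 low/high, 3 high/low, 17 high/high
ep_agreement, ep_kappa = concordance_and_kappa(12, 8, 3, 17)

# A = MammaPrint, B = EPclin: 5 discordant cases in each direction
epclin_agreement, epclin_kappa = concordance_and_kappa(15, 5, 5, 15)
```

With these counts the chance agreement is 0.5 in both comparisons, because MammaPrint splits the 40 samples evenly, so κ is simply twice the excess of observed agreement over 0.5.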
Genome-wide differential gene expression in immortalized DF-1 chicken embryo fibroblast cell line
2011-01-01
Background: When compared to primary chicken embryo fibroblast (CEF) cells, the immortal DF-1 CEF line exhibits enhanced growth rates and susceptibility to oxidative stress. Although genes responsible for cell cycle regulation and antioxidant functions have been identified, the genome-wide transcription profile of immortal DF-1 CEF cells has not been previously reported. Global gene expression profiling of primary CEF and DF-1 cells was performed using a 4X44K chicken oligo microarray. Results: A total of 3876 differentially expressed genes were identified with a 2-fold cutoff, including 1706 up-regulated and 2170 down-regulated genes in DF-1 cells. Network and functional analyses using Ingenuity Pathways Analysis (IPA, Ingenuity® Systems, http://www.ingenuity.com) revealed that 902 of the 3876 differentially expressed genes were classified into a number of functional groups including cellular growth and proliferation, cell cycle, cellular movement, cancer, genetic disorders, and cell death. Also, the top 5 gene networks with intermolecular connections were identified. Bioinformatic analyses suggested that DF-1 cells were characterized by enhanced molecular mechanisms for cell cycle progression and proliferation, suppressed cell death pathways, altered cellular morphogenesis, and accelerated capacity for molecule transport. Key molecules for these functions include E2F1, BRCA1, SRC, CASP3, and the peroxidases. Conclusions: The global gene expression profiles provide insight into the cellular mechanisms that regulate the unique characteristics observed in immortal DF-1 CEF cells. PMID:22111699
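As an illustration of what a 2-fold cutoff means in practice, here is a minimal sketch that labels genes as up- or down-regulated from two expression tables. The gene names and intensity values are hypothetical, and the function is an assumption for illustration; the abstract does not describe the statistical filtering that a real microarray analysis would normally apply alongside the fold-change threshold.

```python
import math


def call_by_fold_change(reference, sample, cutoff=2.0):
    """Label each gene 'up', 'down', or 'unchanged' in `sample` relative to
    `reference`, using a symmetric fold-change cutoff on the log2 scale."""
    threshold = math.log2(cutoff)
    calls = {}
    for gene, ref_value in reference.items():
        log_fc = math.log2(sample[gene] / ref_value)
        if log_fc >= threshold:
            calls[gene] = "up"
        elif log_fc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical normalized intensities (not data from the study)
primary_cef = {"gene_a": 100.0, "gene_b": 400.0, "gene_c": 250.0}
df1_cells = {"gene_a": 450.0, "gene_b": 90.0, "gene_c": 300.0}
calls = call_by_fold_change(primary_cef, df1_cells)
```

Working on the log2 scale keeps the cutoff symmetric: a 2-fold increase and a 2-fold decrease sit at +1 and -1 respectively.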
Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang
2018-01-01
Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of the classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303
Ye, Jianqiu; Yang, Hai; Shi, Haitao; Wei, Yunxie; Tie, Weiwei; Ding, Zehong; Yan, Yan; Luo, Ying; Xia, Zhiqiang; Wang, Wenquan; Peng, Ming; Li, Kaimian; Zhang, He; Hu, Wei
2017-11-02
Mitogen-activated protein kinase kinase kinases (MAPKKKs), an important unit of the MAPK cascade, play crucial roles in plant development and response to various stresses. However, little is known concerning the MAPKKK family in the important subtropical and tropical crop cassava. In this study, 62 MAPKKK genes were identified in the cassava genome, and were classified into 3 subfamilies based on phylogenetic analysis. Most MAPKKKs in the same subfamily shared similar gene structures and conserved motifs. The comprehensive transcriptome analysis showed that MAPKKK genes participated in tissue development and response to drought stress. Comparative expression profiles revealed that many MAPKKK genes were activated in cultivated varieties SC124 and Arg7 and that the function of MeMAPKKKs in drought resistance may be different between SC124/Arg7 and W14. Expression analyses of the 7 selected MeMAPKKK genes showed that most of them were significantly upregulated by osmotic, salt and ABA treatments, but only slightly induced by H2O2 and cold stresses. Taken together, this study identified candidate MeMAPKKK genes for genetic improvement of abiotic stress resistance and provided new insights into MAPKKK-mediated cassava resistance to drought stress.
Analysis of barosensitive mechanisms in yeast for Pressure Regulated Fermentation
NASA Astrophysics Data System (ADS)
Nomura, Kazuki; Iwahashi, Hitoshi; Iguchi, Akinori; Shigematsu, Toru
2013-06-01
Introduction: We intend to develop a novel food processing technology, Pressure Regulated Fermentation (PReF), using pressure-sensitive (barosensitive) fermentation microorganisms. The objective of our study is to clarify barosensitive mechanisms for application to PReF technology. We isolated the Saccharomyces cerevisiae barosensitive mutant a924E1, derived from the parent strain KA31a. Methods: Gene expression levels were analyzed by DNA microarray. Genes with altered expression levels were classified according to gene function. Mutated genes were estimated by mating and producing diploid strains and confirmed by PCR of mitochondrial DNA (mtDNA). Results and Discussion: Gene expression profiles showed that genes in the 'Energy' functional category and genes encoding proteins localized in 'Mitochondria' were significantly down-regulated in the mutant. These results suggest a respiratory deficiency and a relationship between barosensitivity and respiratory deficiency. Since the respiratory functions of the diploids showed non-Mendelian inheritance, the respiratory deficiency was indicated to be due to an mtDNA mutation. PCR analysis showed that the region of the COX1 locus was deleted. The COX1 gene encodes subunit 1 of cytochrome c oxidase. For this reason, barosensitivity is strongly correlated with mitochondrial functions.
Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.
Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A
2018-05-25
We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.
2016-01-04
2016 (wileyonlinelibrary.com) DOI 10.1002/jat.3278 Systems toxicology of chemically induced liver and kidney injuries: histopathology-associated gene...injuries that classify 11 liver and eight kidney histopathology endpoints based on dose-dependent activation of the identified modules. We showed that...well as determine whether the injury module activation was specific to the tissue of origin (liver and kidney). The generated modules provide a link
Tsunashima, Ryo; Naoi, Yasuto; Shimazu, Kenzo; Kagara, Naofumi; Shimoda, Masashi; Tanei, Tomonori; Miyake, Tomohiro; Kim, Seung Jin; Noguchi, Shinzaburo
2018-05-04
Prediction models for late (> 5 years) recurrence in ER-positive breast cancer need to be developed for the accurate selection of patients for extended hormonal therapy. We attempted to develop such a prediction model focusing on the differences in gene expression between breast cancers with early and late recurrence. For the training set, 779 ER-positive breast cancers treated with tamoxifen alone for 5 years were selected from the databases (GSE6532, GSE12093, GSE17705, and GSE26971). For the validation set, 221 ER-positive breast cancers treated with adjuvant hormonal therapy for 5 years with or without chemotherapy at our hospital were included. Gene expression was assayed by DNA microarray analysis (Affymetrix U133 plus 2.0). With the 42 genes differentially expressed in early and late recurrence breast cancers in the training set, a prediction model (42GC) for late recurrence was constructed. The patients classified by 42GC into the late recurrence-like group showed a significantly (P = 0.006) higher late recurrence rate as expected but a significantly (P = 1.62 × E-13) lower rate for early recurrence than non-late recurrence-like group. These observations were confirmed for the validation set, i.e., P = 0.020 for late recurrence and P = 5.70 × E-5 for early recurrence. We developed a unique prediction model (42GC) for late recurrence by focusing on the biological differences between breast cancers with early and late recurrence. Interestingly, patients in the late recurrence-like group by 42GC were at low risk for early recurrence.
Genome-Wide Identification and Expression Analysis of the WRKY Gene Family in Cassava
Wei, Yunxie; Shi, Haitao; Xia, Zhiqiang; Tie, Weiwei; Ding, Zehong; Yan, Yan; Wang, Wenquan; Hu, Wei; Li, Kaimian
2016-01-01
The WRKY family, a large family of transcription factors (TFs) found in higher plants, plays central roles in many aspects of physiological processes and adaption to environment. However, little information is available regarding the WRKY family in cassava (Manihot esculenta). In the present study, 85 WRKY genes were identified from the cassava genome and classified into three groups according to conserved WRKY domains and zinc-finger structure. Conserved motif analysis showed that all of the identified MeWRKYs had the conserved WRKY domain. Gene structure analysis suggested that the number of introns in MeWRKY genes varied from 1 to 5, with the majority of MeWRKY genes containing three exons. Expression profiles of MeWRKY genes in different tissues and in response to drought stress were analyzed using the RNA-seq technique. The results showed that 72 MeWRKY genes had differential expression in their transcript abundance and 78 MeWRKY genes were differentially expressed in response to drought stresses in different accessions, indicating their contribution to plant developmental processes and drought stress resistance in cassava. Finally, the expression of 9 WRKY genes was analyzed by qRT-PCR under osmotic, salt, ABA, H2O2, and cold treatments, indicating that MeWRKYs may be involved in different signaling pathways. Taken together, this systematic analysis identifies some tissue-specific and abiotic stress-responsive candidate MeWRKY genes for further functional assays in planta, and provides a solid foundation for understanding of abiotic stress responses and signal transduction mediated by WRKYs in cassava. PMID:26904033
Characterization of two rice MADS box genes that control flowering time.
Kang, H G; Jang, S; Chung, J E; Cho, Y G; An, G
1997-08-31
Plants contain a variety of the MADS box genes that encode regulatory proteins and play important roles in both the formation of flower meristem and the determination of floral organ identity. We have characterized two flower-specific cDNAs from rice, designated OsMADS7 and OsMADS8. The cDNAs displayed the structure of a typical plant MADS box gene, which consists of the MADS domain, I region, K domain, and C-terminal region. These genes were classified as members of the AGL2 gene family based on sequence homology. The OsMADS7 and 8 proteins were most homologous to OM1 and FBP2, respectively. The OsMADS7 and 8 transcripts were detectable primarily in carpels and also weakly in anthers. During flower development, the OsMADS genes started to express at the young flower stage and the expression continued to the late stage of flower development. The OsMADS7 and 8 genes were mapped on the long arms of the chromosome 8 and 9, respectively. To study the functions of the genes, the cDNA clones were expressed ectopically using the CaMV 35S promoter in a heterologous tobacco plant system. Transgenic plants expressing the OsMADS genes exhibited the phenotype of early flowering and dwarfism. The strength of the phenotypes was proportional to the levels of transgene expression and the phenotypes were co-inherited with the kanamycin resistant gene to the next generation. These results indicate that OsMADS7 and 8 are structurally related to the AGL2 family and are involved in controlling flowering time.
Li, Chang-Lin; Li, Kai-Cheng; Wu, Dan; Chen, Yan; Luo, Hao; Zhao, Jing-Rong; Wang, Sa-Shuang; Sun, Ming-Ming; Lu, Ying-Jin; Zhong, Yan-Qing; Hu, Xu-Ye; Hou, Rui; Zhou, Bei-Bei; Bao, Lan; Xiao, Hua-Sheng; Zhang, Xu
2016-01-01
Sensory neurons are distinguished by distinct signaling networks and receptive characteristics. Thus, sensory neuron types can be defined by linking transcriptome-based neuron typing with the sensory phenotypes. Here we classify somatosensory neurons of the mouse dorsal root ganglion (DRG) by high-coverage single-cell RNA-sequencing (10 950 ± 1 218 genes per neuron) and neuron size-based hierarchical clustering. Moreover, single DRG neurons responding to cutaneous stimuli are recorded using an in vivo whole-cell patch clamp technique and classified by neuron-type genetic markers. Small diameter DRG neurons are classified into one type of low-threshold mechanoreceptor and five types of mechanoheat nociceptors (MHNs). Each of the MHN types is further categorized into two subtypes. Large DRG neurons are categorized into four types, including neurexophilin 1-expressing MHNs and mechanical nociceptors (MNs) expressing BAI1-associated protein 2-like 1 (Baiap2l1). Mechanoreceptors expressing trafficking protein particle complex 3-like and Baiap2l1-marked MNs are subdivided into two subtypes each. These results provide a new system for cataloging somatosensory neurons and their transcriptome databases. PMID:26691752
Xu, Fan; Yang, Jing; Chen, Jin; Wu, Qingyuan; Gong, Wei; Zhang, Jianguo; Shao, Weihua; Mu, Jun; Yang, Deyu; Yang, Yongtao; Li, Zhiwei; Xie, Peng
2015-04-03
Recent depression research has revealed a growing awareness of how to best classify depression into depressive subtypes. Appropriately subtyping depression can lead to identification of subtypes that are more responsive to current pharmacological treatment and aid in separating out depressed patients in which current antidepressants are not particularly effective. Differential co-expression analysis (DCEA) and differential regulation analysis (DRA) were applied to compare the transcriptomic profiles of peripheral blood lymphocytes from patients with two depressive subtypes: major depressive disorder (MDD) and subsyndromal symptomatic depression (SSD). Six differentially regulated genes (DRGs) (FOSL1, SRF, JUN, TFAP4, SOX9, and HLF) and 16 transcription factor-to-target differentially co-expressed gene links or pairs (TF2target DCLs) appear to be the key differential factors in MDD; in contrast, one DRG (PATZ1) and eight TF2target DCLs appear to be the key differential factors in SSD. There was no overlap between the MDD target genes and SSD target genes. Venlafaxine (Efexor™, Effexor™) appears to have a significant effect on the gene expression profile of MDD patients but no significant effect on the gene expression profile of SSD patients. DCEA and DRA revealed no apparent similarities between the differential regulatory processes underlying MDD and SSD. This bioinformatic analysis may provide novel insights that can support future antidepressant R&D efforts.
Li, Jun; Hou, Hongmin; Li, Xiaoqin; Xiang, Jiang; Yin, Xiangjing; Gao, Hua; Zheng, Yi; Bassett, Carole L; Wang, Xiping
2013-09-01
SQUAMOSA promoter binding protein (SBP)-box genes encode a family of plant-specific transcription factors and play many crucial roles in plant development. In this study, 27 SBP-box gene family members were identified in the apple (Malus × domestica Borkh.) genome, 15 of which were suggested to be putative targets of MdmiR156. Plant SBPs were classified into eight groups according to the phylogenetic analysis of SBP-domain proteins. Gene structure, gene chromosomal location and synteny analyses of MdSBP genes within the apple genome demonstrated that tandem and segmental duplications, as well as whole genome duplications, have likely contributed to the expansion and evolution of the SBP-box gene family in apple. Additionally, synteny analysis between apple and Arabidopsis indicated that several paired homologs of MdSBP and AtSPL genes were located in syntenic genomic regions. Tissue-specific expression analysis of MdSBP genes in apple demonstrated their diversified spatiotemporal expression patterns. Most MdmiR156-targeted MdSBP genes, which had relatively high transcript levels in stems, leaves, apical buds and some floral organs, exhibited a more differential expression pattern than most MdmiR156-nontargeted MdSBP genes. Finally, expression analysis of MdSBP genes in leaves upon various plant hormone treatments showed that many MdSBP genes were responsive to different plant hormones, indicating that MdSBP genes may be involved in responses to hormone signaling during stress or in apple development. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
Heng, Yujing Jan; Pennell, Craig Edward; Chua, Hon Nian; Perkins, Jonathan Edward; Lye, Stephen James
2014-01-01
Threatened preterm labor (TPTL) is defined as persistent premature uterine contractions between 20 and 37 weeks of gestation and is the most common condition that requires hospitalization during pregnancy. Most of these TPTL women continue their pregnancies to term while only an estimated 5% will deliver a premature baby within ten days. The aim of this work was to study differential whole blood gene expression associated with spontaneous preterm birth (sPTB) within 48 hours of hospital admission. Peripheral blood was collected at point of hospital admission from 154 women with TPTL before any medical treatment. Microarrays were utilized to investigate differential whole blood gene expression between TPTL women who did (n = 48) or did not have a sPTB (n = 106) within 48 hours of admission. Total leukocyte and neutrophil counts were significantly higher (35% and 41% respectively) in women who had sPTB than women who did not deliver within 48 hours (p<0.001). Fetal fibronectin (fFN) test was performed on 62 women. There was no difference in the urine, vaginal and placental microbiology and histopathology reports between the two groups of women. There were 469 significant differentially expressed genes (FDR<0.05); 28 differentially expressed genes were chosen for microarray validation using qRT-PCR and 20 out of 28 genes were successfully validated (p<0.05). An optimal random forest classifier model to predict sPTB was achieved using the top nine differentially expressed genes coupled with peripheral clinical blood data (sensitivity 70.8%, specificity 75.5%). These differentially expressed genes may further elucidate the underlying mechanisms of sPTB and pave the way for future systems biology studies to predict sPTB. PMID:24828675
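For readers who want to relate the reported sensitivity and specificity back to patient counts, the following sketch computes both from a confusion matrix. The cell counts (34 of 48 sPTB cases detected, 80 of 106 non-sPTB cases correctly excluded) are an assumption: they are inferred as the whole numbers consistent with the reported 70.8% and 75.5%, not figures stated in the abstract.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity = detected cases / all cases;
    specificity = correctly excluded non-cases / all non-cases."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Inferred counts (assumption): 48 women with sPTB, 106 without
sens, spec = sensitivity_specificity(true_pos=34, false_neg=14,
                                     true_neg=80, false_pos=26)
sens_pct = round(sens * 100, 1)
spec_pct = round(spec * 100, 1)
```

Under these assumed counts, roughly 14 women who delivered within 48 hours would have been missed and 26 who did not deliver would have been flagged, which gives a sense of the classifier's practical trade-off.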
Serial analysis of gene expression in a rat lung model of asthma.
Yin, Lei-Miao; Jiang, Gong-Hao; Wang, Yu; Wang, Yan; Liu, Yan-Yan; Jin, Wei-Rong; Zhang, Zen; Xu, Yu-Dong; Yang, Yong-Qing
2008-11-01
The pathogenesis and molecular mechanism underlying asthma remain undetermined. The purpose of this study was to identify genes and pathways involved in the early airway response (EAR) phase of asthma by using serial analysis of gene expression (SAGE). Two SAGE tag libraries of lung tissues derived from a rat model of asthma and controls were generated. Bioinformatic analyses were carried out using the Database for Annotation, Visualization and Integrated Discovery Functional Annotation Tool, Gene Ontology (GO) TreeMachine and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A total of 26 552 SAGE tags of asthmatic rat lung were obtained, of which 12 221 were unique tags. Of the unique tags, 55.5% were matched with known genes. By comparison of the two libraries, 186 differentially expressed tags (P < 0.05) were identified, of which 103 were upregulated and 83 were downregulated. Using the bioinformatic tools these genes were classified into 23 functional groups, 15 KEGG pathways and 37 enriched GO categories. The bioinformatic analyses of gene distribution, enriched categories and the involvement of specific pathways in the SAGE libraries have provided information on regulatory networks of the EAR phase of asthma. Analyses of the regulated genes of interest may inform new hypotheses, increase our understanding of the disease and provide a foundation for future research.
Mitsumoto, Koji; Watanabe, Rina; Nakao, Katsuki; Yonenaka, Hisaki; Hashimoto, Takao; Kato, Norihisa; Kumrungsee, Thanutchaporn; Yanaka, Noriyuki
2017-09-01
A choline-deficient diet is extensively used as a model of nonalcoholic fatty liver disease (NAFLD). In this study, we explored genes in the liver for which the expression changed in response to the choline-deficient (CD) diet. Male CD-1 mice were divided into two groups and fed a CD diet with or without 0.2% choline bitartrate for one or three weeks. Hepatic levels of choline metabolites were analyzed by using liquid chromatography mass spectrometry and hepatic gene expression profiles were examined by DNA microarray analysis. The CD diet lowered liver choline metabolites after one week and exacerbated fatty liver between one and three weeks. We identified >300 genes whose expression was significantly altered in the livers of mice after consumption of this CD diet for one week and showed that liver gene expression profiles could be classified into six distinct groups. This study showed that STAT1 and interferon-regulated genes were up-regulated after the CD diet consumption and that the Stat1 mRNA level was negatively correlated with liver phosphatidylcholine level. Stat1 mRNA expression was actually up-regulated in isolated hepatocytes from the mouse liver with the CD diet. This study provides insight into the genomic effects of the CD diet through the Stat1 expression, which might be involved in NAFLD development. Copyright © 2017 Elsevier Inc. All rights reserved.
Transcriptional Analysis of Flowering Time in Switchgrass
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tornqvist, Carl-Erik; Vaillancourt, Brieanne; Kim, Jeongwoon; ...
2017-04-27
Over the past two decades, switchgrass (Panicum virgatum) has emerged as a priority biofuel feedstock. The bulk of switchgrass biomass is in the vegetative portion of the plant; therefore, increasing the length of vegetative growth will lead to an increase in overall biomass yield. The goal of this study was to gain insight into the control of flowering time in switchgrass that would assist in development of cultivars with longer vegetative phases through delayed flowering. RNA sequencing was used to assess genome-wide expression profiles across a developmental series between switchgrass genotypes belonging to the two main ecotypes: upland, typically early flowering, and lowland, typically late flowering. Leaf blades and tissues enriched for the shoot apical meristem (SAM) were collected in a developmental series from emergence through anthesis for RNA extraction. RNA from samples that flanked the SAM transition stage was sequenced for expression analyses. The analyses revealed differential expression patterns between early- and late-flowering genotypes for known flowering time orthologs. Namely, genes shown to play roles in photoperiod response and the circadian clock in other species were identified as potential candidates for regulating flowering time in the switchgrass genotypes analyzed. Based on their expression patterns, many of the differentially expressed genes could also be classified as putative promoters or repressors of flowering. The candidate genes presented here may then be used to guide switchgrass improvement through marker-assisted breeding and/or transgenic or gene editing approaches.
Prager, Rita; Rabsch, Wolfgang; Streckel, Wiebke; Voigt, Wolfgang; Tietze, Erhardt; Tschäpe, Helmut
2003-01-01
Salmonella enterica serotype O1,4,5,12:Hb:1,2, designated according to the current Kauffmann-White scheme as S. enterica serotype Paratyphi B, is a very diverse serotype with respect to its clinical and microbiological properties. PCR and blot techniques, which identify the presence, polymorphism, and expression of various effector protein genes, help to distinguish between strains with systemic and enteric outcomes of disease. All serotype Paratyphi B strains from systemic infections have been found to be somewhat genetically related with respect to the pattern of their virulence genes sopB, sopD, sopE1, avrA, and sptP as well as other molecular properties (multilocus enzyme electrophoresis type, pulsed-field gel electrophoresis [PFGE] type, ribotype, and IS200 type). They have been classified as members of the systemic pathovar (SPV). All these SPV strains possess a new sopE1-carrying bacteriophage (designated ΦSopE309) with high SopE1 protein expression but lack the commonly occurring avrA determinant. They exhibit normal SopB protein expression but lack SopD protein production. In contrast, strains from enteric infections classified as belonging to the enteric pathovar possess various combinations of the respective virulence genes, PFGE pattern, and ribotypes. We propose that the PCR technique for testing for the presence of the virulence genes sopE1 and avrA be used as a diagnostic tool for identifying both pathovars of S. enterica serotype Paratyphi B. This will be of great public health importance, since strains of serotype Paratyphi B have recently reemerged worldwide. PMID:12958256
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets are differentially expressed (enrichment and/or deletion) across phenotypes. However, little attention has been given to the discriminatory power of gene sets and the classification of patients. In this study, we propose a method of gene set analysis in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The empirical p-value of the observed out-of-bag (OOB) error rate of the classifier, obtained with an adequate resampling method, is introduced to identify differentially expressed gene sets. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. 
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
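The resampling idea in this abstract (comparing an observed classification error with the errors obtained after shuffling the phenotype labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: to stay dependency-free, a leave-one-out nearest-centroid error stands in for the Random Forest OOB error, and the function names `loo_centroid_error` and `empirical_p_value`, the permutation count and the toy expression values are all hypothetical.

```python
import random


def loo_centroid_error(rows, labels):
    """Leave-one-out error of a nearest-centroid classifier.

    Assumption: this is only a stand-in for the RF out-of-bag error
    described in the abstract, chosen to avoid external libraries.
    """
    errors = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Accumulate per-class feature sums without the held-out sample.
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            acc = sums.setdefault(other_label, [0.0] * len(row))
            for k, value in enumerate(other):
                acc[k] += value
            counts[other_label] = counts.get(other_label, 0) + 1

        def squared_distance(cls):
            return sum((v - s / counts[cls]) ** 2 for v, s in zip(row, sums[cls]))

        predicted = min(sums, key=squared_distance)
        errors += predicted != label
    return errors / len(rows)


def empirical_p_value(rows, labels, n_perm=200, seed=0):
    """Permutation p-value: share of label shufflings whose error is
    at least as low as the observed error (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = loo_centroid_error(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_centroid_error(rows, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy "gene set" of two genes measured in eight samples.
toy_rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
toy_labels = ["A"] * 4 + ["B"] * 4
observed_error, p_value = empirical_p_value(toy_rows, toy_labels)
print(observed_error, p_value)
```

A small p-value here indicates that the gene set separates the phenotype classes better than would be expected by chance under label shuffling, which is the sense in which the abstract calls a gene set differentially expressed.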
Moriau, L; Michelet, B; Bogaerts, P; Lambert, L; Michel, A; Oufattole, M; Boutry, M
1999-07-01
The plasma membrane H+-ATPase couples ATP hydrolysis to proton transport, thereby establishing the driving force for solute transport across the plasma membrane. In Nicotiana plumbaginifolia, this enzyme is encoded by at least nine pma (plasma membrane H+-ATPase) genes. Four of these are classified into two gene subfamilies, pma1-2-3 and pma4, which are the most highly expressed in plant species. We have isolated genomic clones for pma2 and pma4. Mapping of their transcript 5' end revealed the presence of a long leader that contained small open reading frames, regulatory features typical of other pma genes. The gusA reporter gene was then used to determine the expression of pma2, pma3 and pma4 in N. tabacum. These data, together with those obtained previously for pma1, led to the following conclusions. (i) The four pma-gusA genes were all expressed in root, stem, leaf and flower organs, but each in a cell-type specific manner. Expression in these organs was confirmed at the protein level, using subfamily-specific antibodies. (ii) pma4-gusA was expressed in many cell types and notably in root hair and epidermis, in companion cells, and in guard cells, indicating that in N. plumbaginifolia the same H+-ATPase isoform might be involved in mineral nutrition, phloem loading and control of stomata aperture. (iii) The second gene subfamily is composed, in N. plumbaginifolia, of a single gene (pma4) with a wide expression pattern and, in Arabidopsis thaliana, of three genes (aha1, aha2, aha3), at least two of them having a more restrictive expression pattern. (iv) Some cell types expressed pma2 and pma4 at the same time, which encode H+-ATPases with different enzymatic properties.
Isolation and characterization of two VpYABBY genes from wild Chinese Vitis pseudoreticulata.
Xiang, J; Liu, R Q; Li, T M; Han, L J; Zou, Y; Xu, T F; Wei, J Y; Wang, Y J; Xu, Y
2013-12-01
The establishment of abaxial-adaxial polarity is an important feature of the development of lateral organs in plants. Members of the YABBY gene family appear to be seed-plant-specific transcriptional regulators that play critical roles in promoting abaxial cell fate in the model eudicot Arabidopsis thaliana. However, a recent study has shown that the roles of YABBY genes are not conserved in the development of angiosperms. The establishment of abaxial-adaxial polarity has not been studied in perennial fruit crops. Grapes are an important fruit crop in many regions of the world. Investigating YABBY genes in grapevines should help us to discover more about the key genetic and molecular pathways in grapevine development. To characterize YABBY genes in grapevines, two YABBY genes, VpYABBY1 (GenBank accession No. KC139089) and VpYABBY2 (GenBank accession No. KC139090), were isolated from the wild Chinese species Vitis pseudoreticulata. Both of these encode YABBY proteins. Sequence characterization and phylogenetic analyses show that VpYABBY1 is classified into the FIL subfamily while VpYABBY2 is a member of the YAB2 subfamily of Arabidopsis thaliana. Subcellular localization analysis indicates that VpYABBY1 and VpYABBY2 proteins are localized in the nucleus. Tissue-specific expression analysis reveals that VpYABBY1 is expressed strongly in young leaves of grape but only weakly in mature leaves. Meanwhile, VpYABBY2 is expressed in grape stems, flowers, tendrils, and leaves. Ectopic expression of VpYABBY1 in transgenic Arabidopsis plants caused partial abaxialization of the adaxial epidermis of leaves, similar to plants over-expressing FIL or YAB3, which have abaxialized lateral organs. By contrast, ectopic expression of VpYABBY2 in Arabidopsis did not cause any alteration in the adaxial-abaxial polarity. 
Sequence characterization and phylogenetic analysis revealed that VpYABBY1 and VpYABBY2 are classified into two different subfamilies. They have diverged functionally in the control of lateral organ development. VpYABBY1 may have a function in leaf development, while VpYABBY2 may play a specific role in carpel development and grape berry morphogenesis. It is further possible that during the evolution of different species, YABBY family members have preserved different expression regulatory systems and functions.
Intron-loss evolution of hatching enzyme genes in Teleostei
2010-01-01
Background: Hatching enzyme, belonging to the astacin metallo-protease family, digests the egg envelope at embryo hatching. Orthologous genes of the enzyme are found in all vertebrate genomes. Recently, we found that exon-intron structures of the genes were conserved among tetrapods, while the genes of teleosts frequently lost their introns. Occurrence of such intron losses in teleostean hatching enzyme genes is an uncommon evolutionary event, as most eukaryotic genes are generally known to be interrupted by introns and the intron insertion sites are conserved from species to species. Here, we report on extensive studies of the exon-intron structures of teleostean hatching enzyme genes for insight into how and why introns were lost during evolution. Results: We investigated the evolutionary pathway of intron losses in hatching enzyme genes of 27 species of Teleostei. Hatching enzyme genes of basal teleosts are of only one type, which conserves the 9-exon-8-intron structure of an assumed ancestor. On the other hand, otocephalans and euteleosts possess two types of hatching enzyme genes, suggesting a gene duplication event in the common ancestor of otocephalans and euteleosts. The duplicated genes were classified into two clades, clades I and II, based on phylogenetic analysis. In otocephalans and euteleosts, clade I genes developed a phylogeny-specific structure, such as an 8-exon-7-intron, 5-exon-4-intron, 4-exon-3-intron or intron-less structure. In contrast to the clade I genes, the structures of clade II genes were relatively stable in their configuration, and were similar to that of the ancestral genes. Expression analyses revealed that hatching enzyme genes were highly expressed compared with housekeeping genes. When expression levels were compared between clade I and II genes, clade I genes tended to be expressed more highly than clade II genes. 
Conclusions: Hatching enzyme genes evolved to lose their introns, and the intron-loss events occurred at specific points of teleostean phylogeny. We propose that the high-expression hatching enzyme genes frequently lost their introns during the evolution of teleosts, while the low-expression genes maintained the exon-intron structure of the ancestral gene. PMID:20796321
Expression Loss and Revivification of RhoB Gene in Ovary Carcinoma Carcinogenesis and Development
Liu, Yingwei; Song, Na; Ren, Kexing; Meng, Shenglan; Xie, Yao; Long, Qida; Chen, Xiancheng; Zhao, Xia
2013-01-01
RhoB, a small GTPase belonging to the Ras protein superfamily, might have suppressive activity in cancer progression. Here, expression of the RhoB gene was evaluated in human benign, borderline and malignant ovary tumors by immunostaining, with normal ovary tissue as control. Malignant tumors were assessed according to Federation Internationale de Gynecologie Obstetrique (FIGO) guidelines and classified as stage I-IV. Revivification of the RhoB gene was investigated by analyzing the effect of the histone deacetylase (HDAC) inhibitor trichostatin (TSA) and the methyltransferase inhibitor 5-azacytidine (5-Aza) on ovarian cancer cells via RT-PCR and western blot. Apoptosis of ovary cancer cells was detected using flow cytometry and fluorescence microscopy. RhoB expression was detected in normal ovary epithelium and borderline tumors, and was significantly decreased or lost in the majority of ovarian cancer specimens (P<0.05). RhoB expression decreased significantly from stage II (71.4%) to stage III (43.5%) to stage IV (18.2%, P<0.05). TSA could both significantly revive the RhoB gene and mediate apoptosis of ovarian cancer cells, but 5-Aza could not. Interference with revivification of the RhoB gene resulted in reduced ovary carcinoma cell apoptosis. It is proposed that loss of RhoB expression occurs frequently in ovary carcinogenesis and progression and that its expression could be regulated by histone deacetylation but not by promoter hypermethylation; RhoB may therefore serve as a prospective gene treatment target for patients with ovarian malignancy not responding to standard therapies. PMID:24223801
Chatterjee, Sankhadeep; Dey, Nilanjan; Shi, Fuqian; Ashour, Amira S; Fong, Simon James; Sen, Soumya
2018-04-01
Dengue fever detection and classification have a vital role due to the recent outbreaks of different kinds of dengue fever. Recent advances in microarray technology can be employed for such classification. Several studies have established that the gene selection phase plays a significant role in classifier performance. Accordingly, the current study focused on detecting two different variations, namely dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method is proposed to select the most promising genes for the classification process. A modified cuckoo search optimization algorithm is then used to train an artificial neural network (ANN-MCS) that classifies unknown subjects into three classes: DF, DHF, and a third class containing convalescent and normal cases. The proposed method was compared with three other well-known classifiers: a multilayer perceptron feed-forward network (MLP-FFN), an artificial neural network (ANN) trained with cuckoo search (ANN-CS), and an ANN trained with PSO (ANN-PSO). Experiments were carried out with different numbers of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model was employed for classification. The results were compared in terms of confusion-matrix-based performance metrics. The experimental results indicated a highly statistically significant improvement of the proposed classifier over the traditional ANN-CS model.
Genome-wide characterization of monomeric transcriptional regulators in Mycobacterium tuberculosis.
Feng, Lipeng; Chen, Zhenkang; Wang, Zhongwei; Hu, Yangbo; Chen, Shiyun
2016-05-01
Gene transcription catalysed by RNA polymerase is regulated by transcriptional regulators, which play central roles in the control of gene transcription in both eukaryotes and prokaryotes. In regulating gene transcription, many regulators form dimers that bind to DNA with repeated motifs. However, some regulators function as monomers, and their mechanisms of gene expression control are largely uncharacterized. Here we systematically characterized monomeric versus dimeric regulators in the tuberculosis causative agent Mycobacterium tuberculosis. Of the >160 transcriptional regulators annotated in M. tuberculosis, 154 were tested; 22 % probably act as monomers, and most of these are annotated as hypothetical regulators. Notably, all members of the WhiB-like protein family are classified as monomers. To further investigate mechanisms of monomeric regulators, we analysed the actions of these WhiB proteins and found that the majority interact with the principal sigma factor σA, which is also a monomeric protein within the RNA polymerase holoenzyme. Taken together, our study for the first time globally classified monomeric regulators in M. tuberculosis and suggested a mechanism by which monomeric regulators control gene transcription through interaction with monomeric sigma factors.
A novel approach for dimension reduction of microarray.
Aziz, Rabia; Verma, C K; Srivastava, Namita
2017-12-01
This paper proposes a new hybrid search technique for feature (gene) selection (FS) using Independent Component Analysis (ICA) and Artificial Bee Colony (ABC), called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) algorithm. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of the extraction approach, to reduce the size of the data, and the wrapper approach, to optimize the reduced feature vectors. The technique was evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published Minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. To further assess how well ICA+ABC performs as a feature selection method for the NB classifier, it was also compared with combinations of ICA and popular filter techniques and with other similar bio-inspired algorithms such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The results show that ICA+ABC is able to generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared with previously suggested methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Gene expression of glucose transporter (GLUT) 1, 3 and 4 in bovine follicle and corpus luteum.
Nishimoto, H; Matsutani, R; Yamamoto, S; Takahashi, T; Hayashi, K-G; Miyamoto, A; Hamano, S; Tetsuka, M
2006-01-01
Glucose is the main energy substrate in the bovine ovary, and a sufficient supply of it is necessary to sustain the ovarian activity. Glucose cannot permeate the plasma membrane, and its uptake is mediated by a number of glucose transporters (GLUT). In the present study, we investigated the gene expression of GLUT1, 3 and 4 in the bovine follicle and corpus luteum (CL). Ovaries were obtained from Holstein × Japanese Black F1 heifers. Granulosa cells and theca interna layers were harvested from follicles classified into five categories by their physiologic status: follicular size (≥ 8.5 mm: dominant; < 8.5 mm: subordinate), ratio of estradiol (E(2)) to progesterone in follicular fluid (≥ 1: E(2) active; < 1: E(2) inactive), and stage of estrous cycle (luteal phase, follicular phase). CL were also classified by the stage of estrous cycle. Expression levels of GLUT1, 3 and 4 mRNA were quantified by real-time PCR. The mRNA for GLUT1 and 3 were detected in the bovine follicle and CL at levels comparable to those in classic GLUT-expressing organs such as brain and heart. Much lower but appreciable levels of GLUT4 were also detected in these tissues. The gene expression of these GLUT showed tissue- and stage-specific patterns. Despite considerable differences in physiologic conditions, similar levels of GLUT1, 3 and 4 mRNA were expressed in subordinate follicles as well as dominant E(2)-active follicles in both luteal and follicular phases, whereas a notable increase in the gene expression of these GLUT was observed in dominant E(2)-inactive follicles undergoing the atretic process. In these follicles, highly significant negative correlations were observed between the concentrations of glucose in follicular fluid and the levels of GLUT1 and 3 mRNA in granulosa cells, implying that the local glucose environment affects glucose uptake of follicles. 
These results indicate that GLUT1 and 3 act as major transporters of glucose while GLUT4 may play a supporting role in the bovine follicle and CL.
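The two numeric criteria quoted in this abstract (follicle diameter of at least 8.5 mm, and an estradiol-to-progesterone ratio of at least 1) can be written as a small rule. This is only an illustrative sketch: the function name `classify_follicle` and the example values are hypothetical, both hormone concentrations are assumed to be in the same unit, and the stage of the estrous cycle, which the study also used, is left out.

```python
def classify_follicle(diameter_mm, estradiol, progesterone):
    """Apply the size and E2/P4 criteria stated in the abstract.

    Assumption: estradiol and progesterone are follicular-fluid
    concentrations expressed in the same unit.
    """
    size_class = "dominant" if diameter_mm >= 8.5 else "subordinate"
    # estradiol >= progesterone is equivalent to a ratio >= 1 and
    # avoids dividing by a zero progesterone value.
    activity = "E2-active" if estradiol >= progesterone else "E2-inactive"
    return size_class, activity


# Hypothetical measurements, not taken from the study.
print(classify_follicle(10.0, 50.0, 20.0))
print(classify_follicle(6.0, 5.0, 20.0))
```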
Decision trees for the analysis of genes involved in Alzheimer's disease pathology.
Mestizo Gutiérrez, Sonia L; Herrera Rivero, Marisol; Cruz Ramírez, Nicandro; Hernández, Elena; Aranda-Abreu, Gonzalo E
2014-09-21
Alzheimer's disease (AD) is characterized by a gradual loss of memory, orientation, judgement and language. There is still no cure for this disorder. AD pathogenesis remains largely unknown and its underlying molecular mechanisms are not yet fully understood. Several studies have shown that the abnormal accumulation of beta-amyloid and tau proteins occurs 10 to 20 years before the onset of symptoms of the disease, so it is extremely important to identify changes in the brain before the first symptoms. We used decision trees to classify 31 individuals (9 healthy controls and 22 AD patients in three different stages of disease) according to the expression of 69 genes previously reported in a meta-analysis, plus the expression levels of APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. We also included in our analysis the MMSE (Mini-Mental State Examination) scores and number of NFT (neurofibrillary tangles). Results allowed us to generate a model of classification values for different AD stages of severity, according to MMSE scores, and to identify the expression level of protein tau that may determine the onset (incipient stage) of AD. We used decision trees to model the different stages of AD (severe, moderate, incipient and control) based on the meta-analysis of gene expression levels plus MMSE and NFT scores. Both classifiers reported the variable MMSE as most informative; however, we found that the protein tau also plays an important role in the onset of AD. Copyright © 2014 Elsevier Ltd. All rights reserved.
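A decision tree ranks a variable such as the MMSE score as "most informative" by how much a split on it reduces class uncertainty. The sketch below shows one common criterion, information gain over a numeric threshold. It is an assumption that this resembles the criterion used in the study, and the function names and the toy scores and labels are hypothetical, not data from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (information gain, threshold) of the best binary split."""
    base = entropy(labels)
    best = (0.0, None)
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[0]:
            best = (gain, threshold)
    return best


# Hypothetical MMSE-like scores and diagnoses for six individuals.
toy_scores = [29, 28, 27, 18, 15, 10]
toy_diagnoses = ["control", "control", "control", "AD", "AD", "AD"]
print(best_threshold(toy_scores, toy_diagnoses))
```

In a full tree, this search is repeated over every candidate variable (gene expression levels, MMSE, NFT counts), and the variable with the largest gain is placed at the root.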
Galatola, Martina; Cielo, Donatella; Panico, Camilla; Stellato, Pio; Malamisura, Basilio; Carbone, Lorenzo; Gianfrani, Carmen; Troncone, Riccardo; Greco, Luigi; Auricchio, Renata
2017-09-01
The prevalence of celiac disease (CD) has increased significantly in recent years, and risk prediction and early diagnosis have become imperative, especially in at-risk families. In a previous study, we identified individuals with CD based on the expression profile of a set of candidate genes in peripheral blood monocytes. Here we evaluated the expression of a panel of CD candidate genes in peripheral blood mononuclear cells from at-risk infants long before any symptoms or production of antibodies. We analyzed the expression of a set of 9 candidate genes associated with CD in 22 human leukocyte antigen predisposed children from families at risk for CD, studied from birth to 6 years of age. Nine of them developed CD (patients) and 13 did not (controls). We analyzed gene expression at 3 different time points (age matched in the 2 groups): 4-19 months before diagnosis, at the time of CD diagnosis, and after at least 1 year of a gluten-free diet. Controls were evaluated at similar ages. Three genes (KIAA, TAGAP [T-cell Activation GTPase Activating Protein], and SH2B3 [SH2B Adaptor Protein 3]) were overexpressed in patients, compared with controls, at least 9 months before CD diagnosis. In a stepwise discriminant analysis, 4 genes (RGS1 [Regulator of G-protein signaling 1], TAGAP, TNFSF14 [Tumor Necrosis Factor (Ligand) Superfamily member 14], and SH2B3) differentiated patients from controls before serum antibody production and clinical symptoms. The multivariate equation correctly classified CD from non-CD children in 95.5% of patients. The expression of a small set of candidate genes in peripheral blood mononuclear cells can predict CD at least 9 months before the appearance of any clinical and serological signs of the disease.
Liu, Qin; Dang, Huijie; Chen, Zhijian; Wu, Junzheng; Chen, Yinhua; Chen, Songbi; Luo, Lijuan
2018-03-26
The sugar transporter (STP) gene family encodes monosaccharide transporters that contain 12 transmembrane domains and belong to the major facilitator superfamily. STP genes play critical roles in monosaccharide distribution and participate in diverse plant metabolic processes. To investigate the potential roles of STPs in cassava (Manihot esculenta) tuber root growth, genome-wide identification and expression and functional analyses of the STP gene family were performed in this study. A total of 20 MeSTP genes (MeSTP1-20) containing the Sugar_tr conserved motifs were identified from the cassava genome, which could be further classified into four distinct groups in the phylogenetic tree. The expression profiles of the MeSTP genes explored using RNA-seq data showed that most of the MeSTP genes exhibited tissue-specific expression, and 15 out of 20 MeSTP genes were mainly expressed in the early storage root of cassava. qRT-PCR analysis further confirmed that most of the MeSTPs displayed higher expression in roots after 30 and 40 days of growth, suggesting that these genes may be involved in the early growth of tuber roots. Although all the MeSTP proteins exhibited plasma membrane localization, variations in monosaccharide transport activity were found through a complementation analysis in a yeast (Saccharomyces cerevisiae) mutant defective in monosaccharide uptake. Among them, MeSTP2, MeSTP15, and MeSTP19 were able to efficiently complement the uptake of five monosaccharides in the yeast mutant, while MeSTP3 and MeSTP16 only grew on medium containing galactose, suggesting that these two MeSTP proteins are transporters specific for galactose. This study provides significant insights into the potential functions of MeSTPs in early tuber root growth, which possibly involves the regulation of monosaccharide distribution.
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
Li, Dong-Mei; Staehelin, Christian; Zhang, Yi-Shun; Peng, Shao-Lin
2009-09-01
The influence of Cuscuta campestris on its host Mikania micrantha has been studied with respect to biomass accumulation, physiology and ecology. Molecular events of this parasitic plant-plant interaction are poorly understood, however. In this study, we identified novel genes from M. micrantha induced by C. campestris infection. Genes expressed upon parasitization by C. campestris at early post-penetration stages were investigated by construction and characterization of subtracted cDNA libraries from shoots and stems of M. micrantha. Three hundred and three presumably up-regulated expressed sequence tags (ESTs) were identified and classified in functional categories, such as "metabolism", "cell defence and stress", "transcription factor", "signal transduction", "transportation" and "photosynthesis". In shoots and stems of infected M. micrantha, genes associated with defence responses and cell wall modifications were induced, confirming similar data from other parasitic plant-plant interactions. However, gene expression profiles in infected shoots and stems were found to be different. Compared to infected shoots, more genes induced in response to biotic and abiotic stress factors were identified in infected stems. Furthermore, database comparisons revealed a notable number of M. micrantha ESTs that matched genes with unknown function. Expression analysis by quantitative real-time RT-PCR of 21 genes (from different functional categories) showed significantly increased levels for 13 transcripts in response to C. campestris infection. In conclusion, this study provides an overview of genes from parasitized M. micrantha at early post-penetration stages. The acquired data form the basis for a molecular understanding of host reactions in response to parasitic plants.
2014-01-01
Background: The chicken eggshell is a natural mechanical barrier to protect egg components from physical damage and microbial penetration. Its integrity and strength are critical for the development of the embryo and for ensuring that consumers receive a table egg free of pathogens. This study compared global gene expression in laying hen uterus in the presence or absence of shell calcification in order to characterize gene products involved in the supply of minerals and/or the shell biomineralization process. Results: Microarrays were used to identify a repertoire of 302 over-expressed genes during shell calcification. GO term enrichment was performed to provide a global interpretation of the functions of the over-expressed genes, and revealed that the most over-represented proteins are related to reproductive functions. Our analysis identified 16 gene products encoding proteins involved in mineral supply, and allowed updating of the general model describing uterine ion transporters during eggshell calcification. A list of 57 proteins potentially secreted into the uterine fluid to be active in the mineralization process was also established. They were classified according to their potential functions (biomineralization, proteoglycans, molecular chaperone, antimicrobials and proteases/antiproteases). Conclusions: Our study provides detailed descriptions of genes and corresponding proteins over-expressed when the shell is mineralizing. Some of these proteins involved in the supply of minerals and influencing the shell fabric to protect the egg contents are potentially useful biological markers for the genetic improvement of eggshell quality. PMID:24649854
Ao, Lu; Zhang, Zimei; Guan, Qingzhou; Guo, Yating; Guo, You; Zhang, Jiahui; Lv, Xingwei; Huang, Haiyan; Zhang, Huarong; Wang, Xianlong; Guo, Zheng
2018-04-23
Currently, using biopsy specimens to confirm suspicious liver lesions of early hepatocellular carcinoma is not entirely reliable because of insufficient sampling amount and inaccurate sampling location. It is necessary to develop a signature to aid early hepatocellular carcinoma diagnosis using biopsy specimens even when the sampling location is inaccurate. Based on the within-sample relative expression orderings of gene pairs, we identified a simple qualitative signature to distinguish both hepatocellular carcinoma and adjacent non-tumour tissues from cirrhosis tissues of non-hepatocellular carcinoma patients. A signature consisting of 19 gene pairs was identified in the training data sets and validated in 2 large collections of samples from biopsy and surgical resection specimens. For biopsy specimens, 95.7% of 141 hepatocellular carcinoma tissues and all (100%) of 108 cirrhosis tissues of non-hepatocellular carcinoma patients were correctly classified. Notably, all (100%) of 60 hepatocellular carcinoma adjacent normal tissues and 77.5% of 80 hepatocellular carcinoma adjacent cirrhosis tissues were classified as hepatocellular carcinoma. For surgical resection specimens, 99.7% of 733 hepatocellular carcinoma specimens were correctly classified as hepatocellular carcinoma, while 96.1% of 254 hepatocellular carcinoma adjacent cirrhosis tissues and 95.9% of 538 hepatocellular carcinoma adjacent normal tissues were classified as hepatocellular carcinoma. In contrast, 17.0% of 47 cirrhosis tissues from non-hepatocellular carcinoma patients waiting for liver transplantation were classified as hepatocellular carcinoma, indicating that some patients with long-lasting cirrhosis could have already gained hepatocellular carcinoma characteristics.
The signature can distinguish both hepatocellular carcinoma tissues and tumour-adjacent tissues from cirrhosis tissues of non-hepatocellular carcinoma patients even using inaccurately sampled biopsy specimens, which can aid early diagnosis of hepatocellular carcinoma. © 2018 The Authors. Liver International Published by John Wiley & Sons Ltd.
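The idea of a qualitative signature built from within-sample relative expression orderings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gene names, the three-pair signature, and the majority-vote decision rule are all assumptions made for the example (the abstract states only that 19 gene pairs are used).

```python
from typing import Dict, List, Optional, Tuple

GenePair = Tuple[str, str]


def reo_votes(expression: Dict[str, float], pairs: List[GenePair]) -> int:
    """Count pairs (a, b) whose within-sample ordering is expression[a] > expression[b]."""
    return sum(1 for a, b in pairs if expression[a] > expression[b])


def classify_sample(
    expression: Dict[str, float],
    pairs: List[GenePair],
    min_votes: Optional[int] = None,
) -> str:
    """Label a sample from pair orderings; a strict majority is an assumed default rule."""
    if min_votes is None:
        min_votes = len(pairs) // 2 + 1
    if reo_votes(expression, pairs) >= min_votes:
        return "HCC-like"
    return "non-HCC cirrhosis"


# Hypothetical signature and sample; real signatures would use measured genes.
PAIRS: List[GenePair] = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")]
sample = {
    "GENE_A": 8.1, "GENE_B": 5.0,
    "GENE_C": 2.2, "GENE_D": 3.9,
    "GENE_E": 7.4, "GENE_F": 6.8,
}
label = classify_sample(sample, PAIRS)
```

Because only the ordering within each pair matters, rescaling all values in a sample by a positive constant leaves the label unchanged, which is one reason rank-based signatures of this kind are less sensitive to platform and normalization differences.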
Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M.
2014-01-01
Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the “proline knot” motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1–3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants. PMID:24516650
Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M
2014-01-01
Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.
Roy, Janine; Aust, Daniela; Knösel, Thomas; Rümmele, Petra; Jahnke, Beatrix; Hentrich, Vera; Rückert, Felix; Niedergethmann, Marco; Weichert, Wilko; Bahra, Marcus; Schlitt, Hans J.; Settmacher, Utz; Friess, Helmut; Büchler, Markus; Saeger, Hans-Detlev; Schroeder, Michael; Pilarsky, Christian; Grützmann, Robert
2012-01-01
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state-of-the-art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state-of-the-art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploiting these data for personalized cancer therapies in clinical practice. PMID:22615549
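A PageRank-style ranking that combines a per-gene expression score with a gene network can be sketched as a personalized PageRank computed by power iteration. This is an illustrative sketch under stated assumptions rather than the published algorithm: the function name `network_rank`, the use of a non-negative per-gene prior (for example, the absolute correlation of expression with survival), the undirected network, and the damping value are all hypothetical choices.

```python
from typing import Dict, Iterable, Set, Tuple


def network_rank(
    prior: Dict[str, float],
    edges: Iterable[Tuple[str, str]],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Personalized PageRank over an undirected gene network.

    `prior` holds a non-negative expression-derived score per gene (assumed
    to have a positive sum); it is normalized and used as the restart vector.
    """
    genes = sorted(prior)
    total = sum(prior.values())
    restart = {g: prior[g] / total for g in genes}

    neighbours: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    score = dict(restart)
    for _ in range(iterations):
        new = {g: (1.0 - damping) * restart[g] for g in genes}
        for g in genes:
            if neighbours[g]:
                # Spread this gene's score evenly over its network neighbours.
                share = damping * score[g] / len(neighbours[g])
                for n in neighbours[g]:
                    new[n] += share
            else:
                # Isolated gene: return its score through the restart vector.
                for n in genes:
                    new[n] += damping * score[g] * restart[n]
        score = new
    return score


# Hypothetical star-shaped network with equal expression priors.
prior = {"G1": 0.2, "G2": 0.2, "G3": 0.2, "G4": 0.2}
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4")]
scores = network_rank(prior, edges)
top_gene = max(scores, key=scores.get)
```

With equal priors the hub gene G1 receives the highest score purely from network position; with unequal priors, genes that are both well connected and individually informative rise to the top, which is the intuition behind coupling expression with network information.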
Liao, Hui-Ling; Burns, Jacqueline K
2012-05-01
Distribution of viable Candidatus Liberibacter asiaticus (CaLas) in sweet orange fruit and leaves ('Hamlin' and 'Valencia') and transcriptomic changes associated with huanglongbing (HLB) infection in fruit tissues are reported. Viable CaLas was present in most fruit tissues tested in HLB trees, with the highest titre detected in vascular tissue near the calyx abscission zone. Transcriptomic changes associated with HLB infection were analysed in flavedo (FF), vascular tissue (VT), and juice vesicles (JV) from symptomatic (SY), asymptomatic (AS), and healthy (H) fruit. In SY 'Hamlin', HLB altered the expression of more genes in FF and VT than in JV, whereas in SY 'Valencia', the number of genes whose expression was changed by HLB was similar in these tissues. The expression of more genes was altered in SY 'Valencia' JV than in SY 'Hamlin' JV. More genes were also affected in AS 'Valencia' FF and VT than in AS 'Valencia' JV. Most genes whose expression was changed by HLB were classified as transporters or involved in carbohydrate metabolism. Physiological characteristics of HLB-infected and girdled fruit were compared to differentiate between HLB-specific and carbohydrate metabolism-related symptoms. SY and girdled fruit were smaller than H and ungirdled fruit, respectively, with poor juice quality. However, girdling did not cause misshapen fruit or differential peel coloration. Quantitative PCR analysis indicated that many selected genes changed their expression significantly in SY flavedo but not in girdled flavedo. Mechanisms regulating development of HLB symptoms may lie in the host disease response rather than being a direct consequence of carbohydrate starvation.
Sadanandam, Anguraj; Wullschleger, Stephan; Lyssiotis, Costas A.; Grötzinger, Carsten; Barbi, Stefano; Bersani, Samantha; Körner, Jan; Wafy, Ismael; Mafficini, Andrea; Lawlor, Rita T.; Simbolo, Michele; Asara, John M.; Bläker, Hendrik; Cantley, Lewis C.; Wiedenmann, Bertram; Scarpa, Aldo; Hanahan, Douglas
2016-01-01
Seeking to assess the representative and instructive value of an engineered mouse model of pancreatic neuroendocrine tumors (PanNET) for its cognate human cancer, we profiled and compared mRNA and miRNA transcriptomes of tumors from both. Mouse PanNET tumors could be classified into two distinctive subtypes, well-differentiated islet/insulinoma tumors (IT) and poorly differentiated tumors associated with liver metastases, dubbed metastasis-like primary (MLP). Human PanNETs were independently classified into these same two subtypes, along with a third, specific gene mutation–enriched subtype. The MLP subtypes in human and mouse were similar to liver metastases in terms of miRNA and mRNA transcriptome profiles and signature genes. The human/mouse MLP subtypes also similarly expressed genes known to regulate early pancreas development, whereas the IT subtypes expressed genes characteristic of mature islet cells, suggesting different tumorigenesis pathways. In addition, these subtypes exhibit distinct metabolic profiles marked by differential pyruvate metabolism, substantiating the significance of their separate identities. Significance: This study involves a comprehensive cross-species integrated analysis of multi-omics profiles and histology to stratify PanNETs into subtypes with distinctive characteristics. We provide support for the RIP1-TAG2 mouse model as representative of its cognate human cancer with prospects to better understand PanNET heterogeneity and consider future applications of personalized cancer therapy. PMID:26446169
Sadanandam, Anguraj; Wullschleger, Stephan; Lyssiotis, Costas A; Grötzinger, Carsten; Barbi, Stefano; Bersani, Samantha; Körner, Jan; Wafy, Ismael; Mafficini, Andrea; Lawlor, Rita T; Simbolo, Michele; Asara, John M; Bläker, Hendrik; Cantley, Lewis C; Wiedenmann, Bertram; Scarpa, Aldo; Hanahan, Douglas
2015-12-01
Seeking to assess the representative and instructive value of an engineered mouse model of pancreatic neuroendocrine tumors (PanNET) for its cognate human cancer, we profiled and compared mRNA and miRNA transcriptomes of tumors from both. Mouse PanNET tumors could be classified into two distinctive subtypes, well-differentiated islet/insulinoma tumors (IT) and poorly differentiated tumors associated with liver metastases, dubbed metastasis-like primary (MLP). Human PanNETs were independently classified into these same two subtypes, along with a third, specific gene mutation-enriched subtype. The MLP subtypes in human and mouse were similar to liver metastases in terms of miRNA and mRNA transcriptome profiles and signature genes. The human/mouse MLP subtypes also similarly expressed genes known to regulate early pancreas development, whereas the IT subtypes expressed genes characteristic of mature islet cells, suggesting different tumorigenesis pathways. In addition, these subtypes exhibit distinct metabolic profiles marked by differential pyruvate metabolism, substantiating the significance of their separate identities. This study involves a comprehensive cross-species integrated analysis of multi-omics profiles and histology to stratify PanNETs into subtypes with distinctive characteristics. We provide support for the RIP1-TAG2 mouse model as representative of its cognate human cancer with prospects to better understand PanNET heterogeneity and consider future applications of personalized cancer therapy. ©2015 American Association for Cancer Research.
Jue, Dengwei; Sang, Xuelian; Liu, Liqin; Shu, Bo; Wang, Yicheng; Xie, Jianghui; Liu, Chengming; Shi, Shengyou
2018-03-15
Ubiquitin-conjugating enzymes (E2s or UBC enzymes) play vital roles in plant development and in combating various biotic and abiotic stresses. Longan (Dimocarpus longan Lour.) is an important fruit tree in the subtropical region of Southeast Asia and Australia; however, the characteristics of the UBC gene family in longan remain unknown. In this study, 40 D. longan UBC genes (DlUBCs), which were classified into 15 groups, were identified in the longan genome. An RNA-seq based analysis showed that DlUBCs showed distinct expression in nine longan tissues. Genome-wide RNA-seq and qRT-PCR based gene expression analysis revealed that 11 DlUBCs were up- or down-regulated in the cultivar "Sijimi" (SJ), suggesting that these genes may be important for flower induction. Finally, qRT-PCR analysis showed that the mRNA levels of 13 DlUBCs under salicylic acid (SA) treatment, seven under methyl jasmonate (MeJA) treatment, 27 under heat treatment, and 16 under cold treatment were up- or down-regulated. These results indicated that the DlUBCs may play important roles in responses to abiotic stresses. Taken together, our results provide a comprehensive insight into the organization, phylogeny, and expression patterns of the longan UBC genes, and therefore contribute to a greater understanding of their biological roles in longan.
Xu, Zhenbo; Xie, Jinhong; Liu, Junyan; Ji, Lili; Soteyome, Thanapop; Peters, Brian M; Chen, Dingqiang; Li, Bing; Li, Lin; Shirtliff, Mark E
2017-03-01
Bacillus cereus is one of the most common opportunistic pathogens responsible for various foodborne diseases. To investigate the regulatory mechanism of B. cereus under high osmotic pressure, two B. cereus strains, B25 and B26, were isolated from industrial soy sauce residue with a high salt concentration. Resequencing was performed on the Illumina/Solexa platform, and 13,646 SNPs and 434 InDels were identified as common variants between B25 and B26 against the reference genome, followed by COG, GO, and KEGG enrichment analysis. Furthermore, 49 key genes involved in Na+/H+, K+ transporters, dipeptide or tripeptide transporters, and stress response were selected and classified into 27 groups. Further validation was performed by qRT-PCR, and 4 candidate genes were found to be most associated with osmotic response. Gene expression of the 4 candidate genes was then analyzed accordingly, and down-regulation was observed for genes BC0669 and BC0754, associated with the K+ transport system. However, dramatic up-regulation was detected for gene BC2114, associated with glutathione peroxidase, indicating the activation of antioxidant responses by osmotic stress via genetic regulation. In conclusion, bioinformatic analysis and gene expression profiling provide the basis for further investigation of the genetic and regulatory mechanisms of bacterial salt tolerance. Copyright © 2017 Elsevier Ltd. All rights reserved.
Yang, Congcong; Ding, Puyang; Liu, Yaxi; Qiao, Linyi; Chang, Zhijian; Geng, Hongwei; Wang, Penghao; Jiang, Qiantao; Wang, Jirui; Chen, Guoyue; Wei, Yuming; Zheng, Youliang; Lan, Xiujin
2017-01-01
The MADS-box genes encode transcription factors with key roles in plant growth and development. A comprehensive analysis of the MADS-box gene family in bread wheat (Triticum aestivum) has not yet been conducted, and our understanding of their roles in stress is rather limited. Here, we report the identification and characterization of the MADS-box gene family in wheat. A total of 180 MADS-box genes classified as 32 Mα, 5 Mγ, 5 Mδ, and 138 MIKC types were identified. Evolutionary analysis of the orthologs among T. urartu, Aegilops tauschii and wheat as well as homeologous sequences analysis among the three sub-genomes in wheat revealed that gene loss and chromosomal rearrangements occurred during and/or after the origin of bread wheat. Forty wheat MADS-box genes that were expressed throughout the investigated tissues and development stages were identified. The genes that were regulated in response to both abiotic stresses (i.e., phosphorus deficiency, drought, heat, and combined drought and heat) and biotic stresses (i.e., Fusarium graminearum, Septoria tritici, stripe rust and powdery mildew) were detected as well. A few notable MADS-box genes were specifically expressed in a single tissue, and these showed relatively larger expression differences between stress and control treatments. The expression patterns of a considerable number of MADS-box genes differed from those of their orthologs in Brachypodium, rice, and Arabidopsis. Collectively, the present study provides new insights into the possible roles of MADS-box genes in response to stresses and will be valuable for further functional studies of important candidate MADS-box genes. PMID:28742823
Ma, Jian; Yang, Yujie; Luo, Wei; Yang, Congcong; Ding, Puyang; Liu, Yaxi; Qiao, Linyi; Chang, Zhijian; Geng, Hongwei; Wang, Penghao; Jiang, Qiantao; Wang, Jirui; Chen, Guoyue; Wei, Yuming; Zheng, Youliang; Lan, Xiujin
2017-01-01
The MADS-box genes encode transcription factors with key roles in plant growth and development. A comprehensive analysis of the MADS-box gene family in bread wheat (Triticum aestivum) has not yet been conducted, and our understanding of their roles in stress is rather limited. Here, we report the identification and characterization of the MADS-box gene family in wheat. A total of 180 MADS-box genes classified as 32 Mα, 5 Mγ, 5 Mδ, and 138 MIKC types were identified. Evolutionary analysis of the orthologs among T. urartu, Aegilops tauschii and wheat as well as homeologous sequences analysis among the three sub-genomes in wheat revealed that gene loss and chromosomal rearrangements occurred during and/or after the origin of bread wheat. Forty wheat MADS-box genes that were expressed throughout the investigated tissues and development stages were identified. The genes that were regulated in response to both abiotic stresses (i.e., phosphorus deficiency, drought, heat, and combined drought and heat) and biotic stresses (i.e., Fusarium graminearum, Septoria tritici, stripe rust and powdery mildew) were detected as well. A few notable MADS-box genes were specifically expressed in a single tissue, and these showed relatively larger expression differences between stress and control treatments. The expression patterns of a considerable number of MADS-box genes differed from those of their orthologs in Brachypodium, rice, and Arabidopsis. Collectively, the present study provides new insights into the possible roles of MADS-box genes in response to stresses and will be valuable for further functional studies of important candidate MADS-box genes.
Hou, Xiao-Jin; Li, Si-Bei; Liu, Sheng-Rui; Hu, Chun-Gen; Zhang, Jin-Zhi
2014-01-01
MYB family genes are widely distributed in plants and comprise one of the largest transcription factor families involved in various developmental processes and defense responses of plants. To date, few MYB genes and little expression profiling have been reported for citrus. Here, we describe and classify 177 members of the sweet orange MYB gene (CsMYB) family in terms of their genomic gene structures and similarity to their putative Arabidopsis orthologs. According to these analyses, these CsMYBs were categorized into four groups (4R-MYB, 3R-MYB, 2R-MYB and 1R-MYB). Gene structure analysis revealed that 1R-MYB genes possess relatively more introns as compared with 2R-MYB genes. Investigation of their chromosomal localizations revealed that these CsMYBs are distributed across nine chromosomes. Sweet orange includes a relatively small number of MYB genes compared with the 198 members in Arabidopsis, presumably due to a paralog reduction related to repetitive sequence insertion into the promoter and non-coding transcribed regions of the genes. Comparative studies of CsMYBs and Arabidopsis showed that CsMYBs had fewer gene duplication events. Expression analysis revealed that the MYB gene family has a wide expression profile in sweet orange development and plays important roles in development and stress responses. In addition, 337 new putative microsatellites with flanking sequences sufficient for primer design were also identified from the 177 CsMYBs. These results provide a useful reference for the selection of candidate MYB genes for cloning and further functional analysis for citrus. PMID:25375352
Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.; ...
2015-03-27
Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termed as ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.
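A centrality analysis on a gene co-expression network can be illustrated with a small sketch. The abstract does not say which centrality measure or network construction was used, so everything here is an assumption for illustration: edges are drawn where the absolute Pearson correlation between two expression profiles meets a threshold of 0.8, and normalized degree is used as the centrality measure. The gene names and profiles are hypothetical, and profiles are assumed to be non-constant.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def degree_centrality(
    profiles: Dict[str, List[float]], threshold: float = 0.8
) -> Dict[str, float]:
    """Normalized degree in a network linking genes with |r| >= threshold."""
    genes = list(profiles)
    degree = {g: 0 for g in genes}
    for a, b in combinations(genes, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            degree[a] += 1
            degree[b] += 1
    return {g: d / (len(genes) - 1) for g, d in degree.items()}


# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
centrality = degree_centrality(profiles)
```

Genes with high centrality in such a network would be the candidates for the "topologically important" set, which could then be compared against genes flagged by knock-out simulations.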
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Hyun-Seob; McClure, Ryan S.; Bernstein, Hans C.
Cyanobacteria dynamically relay environmental inputs to intracellular adaptations through a coordinated adjustment of photosynthetic efficiency and carbon processing rates. The output of such adaptations is reflected through changes in transcriptional patterns and metabolic flux distributions that ultimately define growth strategy. To address interrelationships between metabolism and regulation, we performed integrative analyses of metabolic and gene co-expression networks in a model cyanobacterium, Synechococcus sp. PCC 7002. Centrality analyses using the gene co-expression network identified a set of key genes, which were defined here as ‘topologically important.’ Parallel in silico gene knock-out simulations, using the genome-scale metabolic network, classified what we termed as ‘functionally important’ genes, deletion of which affected growth or metabolism. A strong positive correlation was observed between topologically and functionally important genes. Functionally important genes exhibited variable levels of topological centrality; however, the majority of topologically central genes were found to be functionally essential for growth. Subsequent functional enrichment analysis revealed that both functionally and topologically important genes in Synechococcus sp. PCC 7002 are predominantly associated with translation and energy metabolism, two cellular processes critical for growth. This research demonstrates how synergistic network-level analyses can be used for reconciliation of metabolic and gene expression data to uncover fundamental biological principles.
Qi, Yanxiang; Liu, Xiaomei; Pu, Jinji
2018-01-01
NAC transcription factors are involved in plant development and in responses to various stress stimuli. However, little information is available concerning the NAC family in the woodland strawberry. Herein, 37 NAC genes were identified from the woodland strawberry genome and classified into 13 groups based on phylogenetic analysis. Further analyses of gene structure and conserved motifs showed that the members of each subgroup are closely related. Quantitative real-time PCR evaluation of different tissues revealed distinct spatial expression profiles of the FvNAC genes. The expression of FvNAC genes was examined comprehensively under abiotic stress (cold, heat, drought, salt), signal molecule treatments (H2O2, ABA, melatonin, rapamycin), and biotic stress (Colletotrichum gloeosporioides and Ralstonia solanacearum). Expression profiles derived from quantitative real-time PCR suggested that 5 FvNAC genes responded dramatically to the various abiotic and biotic stresses, indicating their contribution to abiotic and biotic stress resistance in woodland strawberry. Interestingly, FvNAC genes responded more strongly to cold treatment than to the other abiotic stresses, and H2O2 elicited a greater response than ABA, melatonin, or rapamycin. For biotic stresses, 3 FvNAC genes were up-regulated during infection with C. gloeosporioides, while 6 FvNAC genes were down-regulated during infection with R. solanacearum. In conclusion, this study identified candidate FvNAC genes to be used for the genetic improvement of abiotic and biotic stress tolerance in woodland strawberry. PMID:29897926
Identification of an Efficient Gene Expression Panel for Glioblastoma Classification
Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni
2016-01-01
We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170
The interplay of post-translational modification and gene therapy.
Osamor, Victor Chukwudi; Chinedu, Shalom N; Azuh, Dominic E; Iweala, Emeka Joshua; Ogunlana, Olubanke Olujoke
2016-01-01
Several proteins interact either to activate or repress the expression of other genes during transcription. Based on the impact of these activities, the proteins can be classified into readers, modifier writers, and modifier erasers depending on whether histone marks are read, added, or removed, respectively, from a specific amino acid. Transcription is controlled by dynamic epigenetic marks with serious health implications in certain complex diseases, whose understanding may be useful in gene therapy. This work highlights traditional and current advances in post-translational modifications with relevance to gene therapy delivery. We report that enhanced understanding of epigenetic machinery provides clues to functional implication of certain genes/gene products and may facilitate transition toward revision of our clinical treatment procedure with effective fortification of gene therapy delivery.
Yang, Wei; Yang, Chunping; Zhang, Jin; Yang, Yang; Wang, Baoxin; Guan, Fengrong
2018-01-01
The white-striped longhorn beetle Batocera horsfieldi (Coleoptera: Cerambycidae) is a polyphagous wood-boring pest that causes substantial damage to the lumber industry. Olfactory proteins are crucial components of related processes, but no B. horsfieldi genome is readily available for olfactory protein analysis. In the present study, developmental transcriptomes of larvae from the first instar to the prepupal stage, pupae, and adults (females and males) from emergence to mating were built by RNA sequencing to establish a genetic background that may help in understanding olfactory genes. Approximately 199 million clean reads were obtained and assembled into 171,664 transcripts, which were classified into 23,380, 26,511, 22,393, 30,270, and 87,732 unigenes for larvae, pupae, females, males, and combined datasets, respectively. The unigenes were annotated against NCBI’s non-redundant nucleotide and protein sequences, Swiss-Prot, Gene Ontology (GO), Pfam, Clusters of Eukaryotic Orthologous Groups (KOG), and KEGG Orthology (KO) databases. A total of 43,197 unigenes were annotated into 55 sub-categories under the three main GO categories; 25,237 unigenes were classified into 26 functional KOG categories, and 25,814 unigenes were classified into five functional KEGG Pathway categories. RSEM software identified 2,983, 3,097, 870, 2,437, 5,161, and 2,882 genes that were differentially expressed between larvae and males, larvae and pupae, larvae and females, males and females, males and pupae, and females and pupae, respectively. Among them, genes encoding seven candidate odorant binding proteins (OBPs) and three chemosensory proteins (CSPs) were identified. RT-PCR and RT-qPCR analyses showed that BhorOBP3, BhorCSP2, and BhorOBPC1/C3/C4 were highly expressed in the antenna of males, indicating that these genes may play key roles in foraging and host-orientation in B. horsfieldi. 
Our results provide valuable molecular information about the olfactory system in B. horsfieldi and will help guide future functional studies on olfactory genes. PMID:29474419
Chen, Jing; Zhang, Hanping; Feng, Mingfeng; Zuo, Dengpan; Hu, Yahui; Jiang, Tong
2016-07-13
Woodland strawberry (Fragaria vesca) infected with Strawberry vein banding virus (SVBV) exhibits chlorotic symptoms along the leaf veins. However, little is known about the molecular mechanism of strawberry disease caused by SVBV. We performed the next-generation sequencing (RNA-Seq) study to identify gene expression changes induced by SVBV in woodland strawberry using mock-inoculated plants as a control. Using RNA-Seq, we have identified 36,850 unigenes, of which 517 were differentially expressed in the virus-infected plants (DEGs). The unigenes were annotated and classified with Gene Ontology (GO), Clusters of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. The KEGG pathway analysis of these genes suggested that strawberry disease caused by SVBV may affect multiple processes including pigment metabolism, photosynthesis and plant-pathogen interactions. Our research provides comprehensive transcriptome information regarding SVBV infection in strawberry.
Discovery and explanation of drug-drug interactions via text mining.
Percha, Bethany; Garten, Yael; Altman, Russ B
2012-01-01
Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity - we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly-proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.
Zhu, Mingku; Chen, Guoping; Dong, Tingting; Wang, Lingling; Zhang, Jianling; Zhao, Zhiping; Hu, Zongli
2015-01-01
The DEAD-box RNA helicases are involved in almost every aspect of RNA metabolism and are associated with diverse cellular functions including plant growth and development; their importance in response to biotic and abiotic stresses is only beginning to emerge. However, no DEAD-box gene has been well characterized in tomato so far. In this study, we report the identification and characterization of two putative DEAD-box RNA helicase genes from tomato, SlDEAD30 and SlDEAD31, which were classified as stress-related DEAD-box proteins by phylogenetic analysis. Expression analysis indicated that SlDEAD30 was highly expressed in roots and mature leaves, while SlDEAD31 was constantly expressed in various tissues. Furthermore, the expression of both genes was induced mainly in roots under NaCl stress, and SlDEAD31 mRNA was also increased by heat, cold, and dehydration. In stress assays, transgenic tomato plants overexpressing SlDEAD31 exhibited dramatically enhanced salt tolerance and slightly improved drought resistance, as shown by significantly enhanced expression of multiple biotic and abiotic stress-related genes, higher survival rate, relative water content (RWC) and chlorophyll content, and lower water loss rate and malondialdehyde (MDA) production compared to wild-type plants. Collectively, these results provide a preliminary characterization of SlDEAD30 and SlDEAD31 genes in tomato, and suggest that stress-responsive SlDEAD31 is essential for salt and drought tolerance and stress-related gene regulation in plants.
Use of toxicogenomics for identifying genetic markers of pulmonary oedema
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balharry, Dominique; Oreffo, Victor; Richards, Roy
2005-04-15
This study was undertaken primarily to identify genetic markers of oedema and inflammation. Mild pulmonary injury was induced following the instillation of the oedema-producing agent, bleomycin (0.5 units). Oedema was then confirmed by conventional toxicology (lavage protein levels, free cell counts and lung/body weight ratios) and histology 3 days post-bleomycin instillation. The expression profile of 1176 mRNA species was determined for bleomycin-exposed lung (Clontech Atlas macroarray, n = 9). To obtain pertinent results from these data, it was necessary to develop a simple, effective method for bioinformatic analysis of altered gene expression. Data were log10 transformed followed by global normalisation. Differential gene expression was accepted if: (a) genes were statistically significant (P ≤ 0.05) from a two-tailed t test; (b) genes were consistently outside a two standard deviation (SD) range from control levels. A combination of these techniques identified 31 mRNA transcripts (approximately 3%) which were significantly altered in bleomycin treated tissue. Of these genes, 26 were down-regulated whilst only five were up-regulated. Two distinct clusters were identified, with 17 genes classified as encoding hormone receptors, and nine as encoding ion channels. Both these clusters were consistently down-regulated. The magnitude of the changes in gene expression was quantified and confirmed by Q-PCR (n = 6), validating the macroarray data and the bioinformatic analysis employed. In conclusion, this study has developed a suitable macroarray analysis procedure and provides the basis for a better understanding of the gene expression changes occurring during the early phase of drug-induced pulmonary oedema.
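The two acceptance criteria in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: "global normalisation" is taken here to mean centring each array on its own mean log10 intensity, "consistently outside" is read as every treated replicate lying beyond two SDs of the control mean, and the two-tailed t test P value is assumed to be computed elsewhere and passed in. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log10_normalise(intensities):
    """Log10-transform one array, then centre it on its own mean.

    Mean-centring is an assumed form of 'global normalisation'.
    """
    logs = [math.log10(v) for v in intensities]
    centre = mean(logs)
    return [v - centre for v in logs]


def outside_two_sd(control_values, treated_values):
    """True if every treated value is more than 2 SD from the control mean."""
    mu = mean(control_values)
    sd = stdev(control_values)  # sample standard deviation of the controls
    return all(abs(v - mu) > 2 * sd for v in treated_values)


def is_differential(control_values, treated_values, p_value, alpha=0.05):
    """Accept a gene only if both criteria (a) and (b) hold.

    p_value is the two-tailed t test result, supplied by the caller.
    """
    return p_value <= alpha and outside_two_sd(control_values, treated_values)
```

A gene whose treated replicates sit far below a tight control distribution passes criterion (b), but it is still rejected if the supplied P value exceeds 0.05.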
Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein
2013-01-01
RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
The Ets Transcription Factor EHF as a Regulator of Cornea Epithelial Cell Identity*
Stephens, Denise N.; Klein, Rachel Herndon; Salmans, Michael L.; Gordon, William; Ho, Hsiang; Andersen, Bogi
2013-01-01
The cornea is the clear, outermost portion of the eye composed of three layers: an epithelium that provides a protective barrier while allowing transmission of light into the eye, a collagen-rich stroma, and an endothelium monolayer. How cornea development and aging is controlled is poorly understood. Here we characterize the mouse cornea transcriptome from early embryogenesis through aging and compare it with transcriptomes of other epithelial tissues, identifying cornea-enriched genes, pathways, and transcriptional regulators. Additionally, we profiled cornea epithelium and stroma, defining genes enriched in these layers. Over 10,000 genes are differentially regulated in the mouse cornea across the time course, showing dynamic expression during development and modest expression changes in fewer genes during aging. A striking transition time point for gene expression between postnatal days 14 and 28 corresponds with completion of cornea development at the transcriptional level. Clustering classifies co-expressed, and potentially co-regulated, genes into biologically informative categories, including groups that exhibit epithelial or stromal enriched expression. Based on these findings, and through loss of function studies and ChIP-seq, we show that the Ets transcription factor EHF promotes cornea epithelial fate through complementary gene activating and repressing activities. Furthermore, we identify potential interactions between EHF, KLF4, and KLF5 in promoting cornea epithelial differentiation. These data provide insights into the mechanisms underlying epithelial development and aging, identifying EHF as a regulator of cornea epithelial identity and pointing to interactions between Ets and KLF factors in promoting epithelial fate. Furthermore, this comprehensive gene expression data set for the cornea is a powerful tool for discovery of novel cornea regulators and pathways. PMID:24142692
Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla
2016-06-01
Context .- A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective .- To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design .- The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results .- The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions .- The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. 
The LSP panel can supplement morphologic assessment of lung cancers, particularly when classification by standard methods is challenging.
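Centroid-based subtyping of the kind described in this abstract can be illustrated with a small Python sketch. The abstract does not state how a sample is compared with the subtype centroids, so the Euclidean distance used here is an assumption, and the function names and the two-gene toy profiles are hypothetical rather than taken from the LSP.

```python
import math


def centroid(profiles):
    """Per-gene mean expression across the training profiles of one subtype."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def classify(sample, centroids):
    """Return the subtype whose centroid is nearest to the sample.

    Euclidean distance is an assumed metric; the abstract does not name one.
    """
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Toy two-gene training profiles (made up for illustration only).
training = {
    "adenocarcinoma": [[1.0, 0.0], [3.0, 0.0]],
    "squamous cell carcinoma": [[0.0, 4.0], [0.0, 6.0]],
}
centroids = {label: centroid(rows) for label, rows in training.items()}
```

Calling `classify(sample, centroids)` on a new expression profile returns the label of the closest centroid; a real panel would use all 57 genes and the three subtypes named in the abstract.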
Clinical Value of Prognosis Gene Expression Signatures in Colorectal Cancer: A Systematic Review
Cordero, David; Riccadonna, Samantha; Solé, Xavier; Crous-Bou, Marta; Guinó, Elisabet; Sanjuan, Xavier; Biondo, Sebastiano; Soriano, Antonio; Jurman, Giuseppe; Capella, Gabriel; Furlanello, Cesare; Moreno, Victor
2012-01-01
Introduction The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. Random Forest classifier was used to build predictors from the gene lists. Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-tests probabilities were computed in stage II and III samples. Results Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic. PMID:23145004
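For readers unfamiliar with the Matthews correlation coefficient used above as the accuracy measure, a minimal Python sketch of the standard formula follows. The function name is hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here, not something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

Unlike raw accuracy, the coefficient stays near zero for a classifier that simply predicts the majority class, which matters when good-prognosis and poor-prognosis samples are unbalanced.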
Thuwajit, Chanitra; Thuwajit, Peti; Uchida, Kazuhiko; Daorueang, Daoyot; Kaewkes, Sasithorn; Wongkham, Sopit; Miwa, Masanao
2006-06-14
This study investigated the mechanism of fibroblast cell proliferation stimulated by the Opisthorchis viverrini excretory/secretory (ES) product. NIH-3T3 mouse fibroblast cells were treated with O. viverrini ES product by non-contact co-culture with the adult parasites. Total RNA from NIH-3T3 cells treated and untreated with O. viverrini was extracted, reverse transcribed and hybridized with the mouse 15K complementary DNA (cDNA) array. The results were analyzed with ArrayVision version 5 and GeneSpring version 5 software. After normalization, genes up-regulated 2-fold or more in parasite-treated relative to untreated NIH-3T3 cells were defined as differentially expressed. The expression levels of the signal transduction genes were validated by semi-quantitative SYBR-based real-time RT-PCR. Among a total of 15,000 genes/ESTs, 239 genes with established cell proliferation-related function were up-regulated 2-fold or more by O. viverrini ES product compared with cells not exposed to the parasitic product. These genes were classified into groups including energy and metabolism, signal transduction, protein synthesis and translation, matrix and structural protein, transcription control, cell cycle and DNA replication. Moreover, the expression of serine-threonine kinase receptor, receptor tyrosine kinase and collagen production-related genes was up-regulated by O. viverrini ES product. The expression levels of the signal transduction genes pkC, pdgfr alpha, jak 1, eps 8, tgf beta 1i4, strap and h ras measured by real-time RT-PCR agreed with those obtained from the cDNA array. However, only the up-regulation of pkC, eps 8 and tgf beta 1i4, which are downstream signaling molecules of either epidermal growth factor (EGF) or transforming growth factor-beta (TGF-beta), was statistically significant (P < 0.05).
O. viverrini ES product stimulates significant changes in gene expression in several functional categories, mainly including transcripts related to cell proliferation. The TGF-beta and EGF signal transduction pathways are indicated as the possible pathways of O. viverrini-driven cell proliferation.
Hu, Ruibo; Yu, Changjiang; Wang, Xiaoyu; Jia, Chunlin; Pei, Shengqiang; He, Kang; He, Guo; Kong, Yingzhen; Zhou, Gongke
2017-01-01
HIGHLIGHT De novo transcriptome profiling of five tissues reveals candidate genes putatively involved in rhizome development in M. lutarioriparius. Miscanthus lutarioriparius is a promising lignocellulosic feedstock for second-generation bioethanol production. However, the genomic resources for this species are relatively limited, which hampers our understanding of the molecular mechanisms underlying many important biological processes. In this study, we performed the first de novo transcriptome analysis of five tissues (leaf, stem, root, lateral bud and rhizome bud) of M. lutarioriparius with an emphasis on identifying putative genes involved in rhizome development. Approximately 66 gigabase (GB) paired-end clean reads were obtained and assembled into 169,064 unigenes with an average length of 759 bp. Among these unigenes, 103,899 (61.5%) were annotated in seven public protein databases. Differential gene expression profiling analysis revealed that 4,609, 3,188, 1,679, 1,218, and 1,077 genes were predominantly expressed in root, leaf, stem, lateral bud, and rhizome bud, respectively. Their expression patterns were further classified into 12 distinct clusters. Pathway enrichment analysis revealed that genes predominantly expressed in rhizome bud were mainly involved in primary metabolism and hormone signaling and transduction pathways. Notably, 19 transcription factors (TFs) and 16 hormone signaling pathway-related genes were identified as predominantly expressed in rhizome bud compared with the other tissues, suggesting putative roles in rhizome formation and development. In addition, a predictive regulatory network was constructed between four TFs and six auxin and abscisic acid (ABA)-related genes. Furthermore, the expression of 24 rhizome-specific genes was validated by quantitative real-time RT-PCR (qRT-PCR) analysis. 
Taken together, this study provides a global portrait of gene expression across five different tissues and reveals preliminary insights into rhizome growth and development. The data presented will contribute to our understanding of the molecular mechanisms underlying rhizome development in M. lutarioriparius and substantially enrich the genomic resources of Miscanthus. PMID:28446913
Martín, Juan F; Rodríguez-García, Antonio; Liras, Paloma
2017-05-01
Phosphate limitation is important for production of antibiotics and other secondary metabolites in Streptomyces. Phosphate control is mediated by the two-component system PhoR-PhoP. Following phosphate depletion, PhoP stimulates expression of genes involved in scavenging, transport and mobilization of phosphate, and represses the utilization of nitrogen sources. PhoP reduces expression of genes for aerobic respiration and activates nitrate respiration genes. PhoP activates genes for teichuronic acid formation and reduces expression of genes for phosphate-rich teichoic acid biosynthesis. In Streptomyces coelicolor, PhoP represses several differentiation and pleiotropic regulatory genes, which affects development and, indirectly, antibiotic biosynthesis. A new bioinformatics analysis of the putative PhoP-binding sequences in Streptomyces avermitilis was made. Many sequences in the S. avermitilis genome showed high weight values and were classified according to the available genetic information. These genes encode phosphate scavenging proteins, phosphate transporters and nitrogen metabolism genes. Among the genes highlighted in the new studies were aveR, located in the avermectin gene cluster and encoding a LAL-type regulator, and afsS, which is regulated by PhoP and AfsR. The sequence logo for S. avermitilis PHO boxes is similar to that of S. coelicolor, with differences in the weight value for specific nucleotides in the sequence.
Genome-wide identification and characterization of Fox genes in the silkworm, Bombyx mori.
Song, JiangBo; Li, ZhiQuan; Tong, XiaoLing; Chen, Cong; Chen, Min; Meng, Gang; Chen, Peng; Li, ChunLin; Xin, YaQun; Gai, TingTing; Dai, FangYin; Lu, Cheng
2015-09-01
The forkhead box (Fox) transcription factor family is characterized by the forkhead domain, a winged DNA-binding domain. The Fox genes have been classified into 23 subfamilies, designated FoxA to FoxS, of which the FoxR and FoxS subfamilies are specific to vertebrates. In this review, using whole-genome scanning, we identified 17 distinct Fox genes distributed on 13 chromosomes of the silkworm, Bombyx mori. A phylogenetic tree showed that the silkworm Fox genes could be classified into 13 subfamilies. The FoxK subfamily is specifically absent from the silkworm, although it is present in other lepidopteran insects, including Danaus plexippus and Heliconius melpomene. Microarray data revealed that the Fox genes have distinct expression patterns in the tissues on day 3 of the 5th instar larva. A Gene Ontology analysis suggested that the Fox genes have roles in cellular components, molecular functions, and biological processes, except in pore complex biogenesis. An analysis of the selective pressure on the proteins indicated that most of the amino acid sites in the Fox proteins are undergoing strong purifying selection. Here, we summarize the general characteristics of the Fox genes in the silkworm, which should support further functional studies of the silkworm Fox proteins.
Begum, Tina; Ghosh, Tapash Chandra
2014-10-05
To date, numerous studies have been attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In our present study, we have considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects, that is, specific disease (SPD) gene (one gene: one defect) and shared disease (SHD) gene (one gene: multiple defects). Here, we have compared the evolutionary rates of these two groups of genes, that is, SPD genes and SHD genes with respect to ND genes. We observed that the average evolutionary rates are slow in SHD group, intermediate in SPD group, and fast in ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of their gene expression levels and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation of the interactome network, show tissue-restricted expression, and are involved in transmembrane transport. Among all the factors, our regression analyses interestingly suggest the independent effects of 1) drug-induced perturbation and 2) the interaction term of expression breadth and transmembrane transport on protein evolutionary rates. We reasoned that the drug-induced network disruption is a combination of several edgetic perturbations and, thus, has more severe effect on gene phenotypes. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Muthamilarasan, Mehanathan; Bonthala, Venkata S.; Khandelwal, Rohit; Jaishankar, Jananee; Shweta, Shweta; Nawaz, Kashif; Prasad, Manoj
2015-01-01
Transcription factors (TFs) are major players in stress signaling and constitute an integral part of signaling networks. Among the major TFs, WRKY proteins play pivotal roles in regulation of transcriptional reprogramming associated with stress responses. In view of this, genome- and transcriptome-wide identification of the WRKY TF family was performed in the C4 model plants, Setaria italica (SiWRKY) and S. viridis (SvWRKY), respectively. The study identified 105 SiWRKY and 44 SvWRKY proteins that were computationally analyzed for their physicochemical properties. Sequence alignment and phylogenetic analysis classified these proteins into three major groups, namely I, II, and III, with the majority of WRKY proteins belonging to group II (53 SiWRKY and 23 SvWRKY), followed by group III (39 SiWRKY and 11 SvWRKY) and group I (10 SiWRKY and 6 SvWRKY). Group II proteins were further classified into 5 subgroups (IIa to IIe) based on their phylogeny. Domain analysis showed the presence of the WRKY motif and zinc finger-like structures in these proteins, along with additional domains in a few proteins. All SiWRKY genes were physically mapped on the S. italica genome and their duplication analysis revealed that 10 and 8 gene pairs underwent tandem and segmental duplications, respectively. Comparative mapping of SiWRKY and SvWRKY genes in related C4 panicoid genomes demonstrated the orthologous relationships between these genomes. In silico expression analysis of SiWRKY and SvWRKY genes showed their differential expression patterns in different tissues and stress conditions. Expression profiling of candidate SiWRKY genes in response to stress (dehydration and salinity) and hormone treatments (abscisic acid, salicylic acid, and methyl jasmonate) suggested the putative involvement of SiWRKY066 and SiWRKY082 in stress and hormone signaling. These genes could be potential candidates for further characterization to delineate their functional roles in abiotic stress signaling. PMID:26635818
Functional dissection of drought-responsive gene expression patterns in Cynodon dactylon L.
Kim, Changsoo; Lemke, Cornelia; Paterson, Andrew H
2009-05-01
Water deficit is one of the main abiotic factors that affect plant productivity in subtropical regions. To identify genes induced during the water stress response in Bermudagrass (Cynodon dactylon), cDNA macroarrays were used. The macroarray analysis identified 189 drought-responsive candidate genes from C. dactylon, of which 120 were up-regulated and 69 were down-regulated. The candidate genes were classified into seven groups by cluster analysis of expression levels across two intensities and three durations of imposed stress. Annotation using BLASTX suggested that up-regulated genes may be involved in proline biosynthesis, signal transduction pathways, protein repair systems, and removal of toxins, while down-regulated genes were mostly related to basic plant metabolism such as photosynthesis and glycolysis. The functional classification of gene ontology (GO) was consistent with the BLASTX results, also suggesting some crosstalk between abiotic and biotic stress. Comparative analysis of cis-regulatory elements from the candidate genes implicated specific elements in drought response in Bermudagrass. Although only a subset of genes was studied, Bermudagrass shared many drought-responsive genes and cis-regulatory elements with other botanical models, supporting a strategy of cross-taxon application of drought-responsive genes, regulatory cues, and physiological-genetic information.
A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.
Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong
2017-01-01
To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
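The two-stage selection described above (a K-S filter followed by CFS) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the top-`ks_keep` cutoff, the greedy forward search, the use of absolute Pearson correlation in the CFS merit, and all function names are hypothetical choices made here. The SVM evaluation step is left out.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cfs_merit(genes, subset, labels):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff), using absolute correlations."""
    k = len(subset)
    r_cf = sum(abs(pearson(genes[g], labels)) for g in subset) / k
    if k == 1:
        return r_cf
    pairs = [(g, h) for i, g in enumerate(subset) for h in subset[i + 1:]]
    r_ff = sum(abs(pearson(genes[g], genes[h])) for g, h in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def select_genes(genes, labels, ks_keep=3):
    """Stage 1: keep the top genes by K-S statistic. Stage 2: greedy forward CFS."""
    def split(values):
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        return pos, neg

    ranked = sorted(genes, key=lambda g: ks_statistic(*split(genes[g])), reverse=True)
    candidates = ranked[:ks_keep]
    chosen, best = [], 0.0
    while True:
        scored = [(cfs_merit(genes, chosen + [g], labels), g)
                  for g in candidates if g not in chosen]
        if not scored:
            break
        merit, gene = max(scored)
        if merit <= best:  # stop once adding a gene no longer raises the merit
            break
        chosen.append(gene)
        best = merit
    return chosen


# Toy data (invented for illustration): g2 is a rescaled copy of g1, g3 is noise.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_genes = {
    "g1": [1, 2, 3, 10, 11, 12],
    "g2": [2, 4, 6, 20, 22, 24],
    "g3": [5, 1, 4, 2, 6, 3],
}
selected = select_genes(toy_genes, toy_labels)
print(selected)
```

On the toy data the K-S stage ranks the two informative genes first, and the CFS stage then keeps only one of them, because the second adds redundancy without adding class information.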
Sommariva, Michele; De Cecco, Loris; De Cesare, Michelandrea; Sfondrini, Lucia; Ménard, Sylvie; Melani, Cecilia; Delia, Domenico; Zaffaroni, Nadia; Pratesi, Graziella; Uva, Valentina; Tagliabue, Elda; Balsari, Andrea
2011-10-15
Synthetic oligodeoxynucleotides expressing CpG motifs (CpG-ODN) are a Toll-like receptor 9 (TLR9) agonist that can enhance the antitumor activity of DNA-damaging chemotherapy and radiation therapy in preclinical mouse models. We hypothesized that the success of these combinations is related to the ability of CpG-ODN to modulate genes involved in DNA repair. We conducted an in silico analysis of genes implicated in DNA repair in data sets obtained from murine colon carcinoma cells in mice injected intratumorally with CpG-ODN and from splenocytes in mice treated intraperitoneally with CpG-ODN. CpG-ODN treatment caused downregulation of DNA repair genes in tumors. Microarray analyses of human IGROV-1 ovarian carcinoma xenografts in mice treated intraperitoneally with CpG-ODN confirmed in silico findings. When combined with the DNA-damaging drug cisplatin, CpG-ODN significantly increased the life span of mice compared with individual treatments. In contrast, CpG-ODN led to an upregulation of genes involved in DNA repair in immune cells. Cisplatin-treated patients with ovarian carcinoma as well as anthracycline-treated patients with breast cancer who are classified as "CpG-like" for the level of expression of CpG-ODN modulated DNA repair genes have a better outcome than patients classified as "CpG-untreated-like," indicating the relevance of these genes in the tumor cell response to DNA-damaging drugs. Taken together, the findings provide evidence that the tumor microenvironment can sensitize cancer cells to DNA-damaging chemotherapy, thereby expanding the benefits of CpG-ODN therapy beyond induction of a strong immune response.
Dunne, Philip D; McArt, Darragh G; Bradley, Conor A; O'Reilly, Paul G; Barrett, Helen L; Cummins, Robert; O'Grady, Tony; Arthur, Ken; Loughrey, Maurice B; Allen, Wendy L; McDade, Simon S; Waugh, David J; Hamilton, Peter W; Longley, Daniel B; Kay, Elaine W; Johnston, Patrick G; Lawler, Mark; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra
2016-08-15
A number of independent gene expression profiling studies have identified transcriptional subtypes in colorectal cancer with potential diagnostic utility, culminating in publication of a colorectal cancer Consensus Molecular Subtype classification. The worst prognostic subtype has been defined by genes associated with stem-like biology. Recently, it has been shown that the majority of genes associated with this poor prognostic group are stromal derived. We investigated the potential for tumor misclassification into multiple diagnostic subgroups based on tumoral region sampled. We performed multiregion tissue RNA extraction/transcriptomic analysis using colorectal-specific arrays on invasive front, central tumor, and lymph node regions selected from tissue samples from 25 colorectal cancer patients. We identified a consensus 30-gene list, which represents the intratumoral heterogeneity within a cohort of primary colorectal cancer tumors. Using a series of online datasets, we showed that this gene list displays prognostic potential HR = 2.914 (confidence interval 0.9286-9.162) in stage II/III colorectal cancer patients, but in addition, we demonstrated that these genes are stromal derived, challenging the assumption that poor prognosis tumors with stem-like biology have undergone a widespread epithelial-mesenchymal transition. Most importantly, we showed that patients can be simultaneously classified into multiple diagnostically relevant subgroups based purely on the tumoral region analyzed. Gene expression profiles derived from the nonmalignant stromal region can influence assignment of colorectal cancer transcriptional subtypes, questioning the current molecular classification dogma and highlighting the need to consider pathology sampling region and degree of stromal infiltration when employing transcription-based classifiers to underpin clinical decision making in colorectal cancer. Clin Cancer Res; 22(16); 4095-104. 
©2016 AACR. See related commentary by Morris and Kopetz, p. 3989.
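The hazard ratio quoted in this abstract comes with an interval whose lower bound is below 1, which affects how strongly the prognostic claim can be read. The short check below back-calculates an approximate test statistic from the reported numbers. It assumes, without confirmation from the source, that the interval is a 95% Wald interval that is symmetric on the log scale; the variable names are illustrative.

```python
from math import log
from statistics import NormalDist

# Values quoted in the abstract.
hr, ci_low, ci_high = 2.914, 0.9286, 9.162

# Assumption: 95% Wald interval, symmetric around log(HR).
z_crit = NormalDist().inv_cdf(0.975)
se_log_hr = (log(ci_high) - log(ci_low)) / (2 * z_crit)

# Wald z statistic and two-sided p-value implied by that assumption.
z = log(hr) / se_log_hr
p_two_sided = 2 * (1 - NormalDist().cdf(z))

print(f"SE(log HR) = {se_log_hr:.3f}, z = {z:.2f}, p = {p_two_sided:.3f}")
```

Under these assumptions the implied two-sided p-value lies above 0.05, consistent with an interval that includes 1.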
Kusy, Maciej; Obrzut, Bogdan; Kluska, Jacek
2013-12-01
The aim of this article was to compare the gene expression programming (GEP) method with three types of neural networks in the prediction of adverse events of radical hysterectomy in cervical cancer patients. One hundred and seven patients treated by radical hysterectomy were analyzed. Each record representing a single patient consisted of 10 parameters. The occurrence and lack of perioperative complications imposed a two-class classification problem. In the simulations, the GEP algorithm was compared to a multilayer perceptron (MLP), a radial basis function neural network, and a probabilistic neural network. The generalization ability of the models was assessed on the basis of their accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). The GEP classifier provided the best results in the prediction of the adverse events, with an accuracy of 71.96 %. Comparable but slightly worse outcomes were obtained using MLP, i.e., 71.87 %. For each of the measured indices (accuracy, sensitivity, specificity, and AUROC), the standard deviation was the smallest for the models generated by the GEP classifier.
Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.
Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang
2015-01-01
Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fournier, Marcia V.; Martin, Katherine J.; Kenny, Paraic A.
To understand how non-malignant human mammary epithelial cells (HMEC) transit from a disorganized proliferating to an organized growth arrested state, and to relate this process to the changes that occur in breast cancer, we studied gene expression changes in non-malignant HMEC grown in three-dimensional cultures, and in a previously published panel of microarray data for 295 breast cancer samples. We hypothesized that the gene expression pattern of organized and growth arrested mammary acini would share similarities with breast tumors with good prognoses. Using Affymetrix HG-U133A microarrays, we analyzed the expression of 22,283 gene transcripts in two HMEC cell lines, 184 (finite life span) and HMT3522 S1 (immortal non-malignant), on successive days post-seeding in a laminin-rich extracellular matrix assay. Both HMECs underwent growth arrest in G0/G1 and differentiated into polarized acini between days 5 and 7. We identified gene expression changes with the same temporal pattern in both lines. We show that genes that are significantly lower in the organized, growth arrested HMEC than in their proliferating counterparts can be used to classify breast cancer patients into poor and good prognosis groups with high accuracy. This study represents a novel unsupervised approach to identifying breast cancer markers that may be of use clinically.
Rojas-Peña, Monica L.; Olivares-Navarrete, Rene; Hyzy, Sharon; Arafat, Dalia; Schwartz, Zvi; Boyan, Barbara D.; Williams, Joseph; Gibson, Greg
2014-01-01
Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention. PMID:25184005
Combining Gene Signatures Improves Prediction of Breast Cancer Survival
Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian
2011-01-01
Background: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion: Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set. PMID:21423775
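The penalised Cox fit described in this abstract can be written compactly. What follows is the standard textbook form of the L2-penalised (ridge) partial log-likelihood, offered as a sketch: the symbols are notation assumed here, not taken from the abstract, and the paper's exact scaling of the penalty term may differ.

```latex
\ell_\lambda(\beta)
  = \sum_{i:\,\delta_i = 1}
    \Bigl[ x_i^{\top}\beta
      - \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr]
  - \frac{\lambda}{2}\,\lVert \beta \rVert_2^{2},
\qquad
\hat\beta_\lambda = \arg\max_{\beta}\, \ell_\lambda(\beta)
```

Here $x_i$ is the expression vector of individual $i$ on one gene set, $\delta_i$ is the event indicator, $R(t_i)$ is the set of individuals still at risk at time $t_i$, and $\lambda \ge 0$ is the penalty weight that the abstract says is chosen by cross-validation. The predicted risk for a test individual is then a monotone function of $x^{\top}\hat\beta_\lambda$.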
[The application of gene expression programming in the diagnosis of heart disease].
Dai, Wenbin; Zhang, Yuntao; Gao, Xingyu
2009-02-01
GEP (gene expression programming) is a new genetic algorithm that has proved to be excellent at function finding. In this paper, GEP is applied to heart disease data to set up a diagnostic model. Eight variables (Sex, Chest pain, Blood pressure, Angina, Peak, Slope, Colored vessels and Thal) are selected from thirteen variables to form a classification function. This function is used to predict a forecasting set of 100 samples, and the accuracy is 87%. Other algorithms such as SVM (support vector machine) are applied to the same data, and the forecasting results show that GEP performs better than the other algorithms.
Genetic and cytokine changes associated with symptomatic stages of CLL.
Agarwal, Amit; Cooke, Lawrence; Riley, Christopher; Qi, Wenqing; Mount, David; Mahadevan, Daruka
2014-09-01
The pathogenesis and drug resistance of symptomatic CLL patients involve genetic changes associated with the CLL clone as well as changes within the microenvironment. To further understand these processes, we compared early stage CLL to symptomatic late stage using gene expression and serum cytokine profiling to gain insight into the genetic and microenvironment changes associated with the most severe form of the disease. Patients were classified into low stage (Rai stage 0/I/II) and high stage (Rai stage III/IV). Gene expression profiles were obtained on pretreatment samples using the HG-U133A 2.0 Affymetrix platform. A comparison of low versus high stage CLL revealed a set of 21 differentially expressed genes. Fifteen genes were up-regulated in the high stage compared to low stage, while 6 genes were down-regulated. Analysis of GO molecular function revealed that 9 of 21 genes were involved in transcription factor activity. Serum cytokine profiles showed six cytokines to be significantly different in high stage patients. Two chemokines, SDF-1/CXCL12 and uPAR, known to be involved in stem cell mobilization and homing, were increased in serum of high stage patients. This study has identified therapeutic targets for symptomatic CLL patients.
Li, Xiang; Bi, Zhenghong; Di, Rong; Liang, Peng; He, Qiguang; Liu, Wenbo; Miao, Weiguo; Zheng, Fucong
2016-01-01
Powdery mildew is an important disease of rubber trees caused by Oidium heveae B. A. Steinmann. As far as we know, none of the resistance genes related to powdery mildew have been isolated from the rubber tree. There is little information available at the molecular level regarding how a rubber tree develops defense mechanisms against this pathogen. We have studied rubber tree mRNA transcripts from the resistant RRIC52 cultivar by differential display analysis. Leaves inoculated with the spores of O. heveae were collected from 0 to 120 hpi in order to identify pathogen-regulated genes at different infection stages. We identified 78 rubber tree genes that were differentially expressed during the plant–pathogen interaction. BLAST analysis for these 78 ESTs classified them into seven functional groups: cell wall and membrane pathways, transcription factor and regulatory proteins, transporters, signal transduction, phytoalexin biosynthesis, other metabolism functions, and unknown functions. The gene expression for eight of these genes was validated by qRT-PCR in both RRIC52 and the partially susceptible Reyan 7-33-97 cultivars, revealing the similar or differential changes of gene expressions between these two cultivars. This study has improved our overall understanding of the molecular mechanisms of rubber tree resistance to powdery mildew. PMID:26840302
Forreryd, Andy; Johansson, Henrik; Albrekt, Ann-Sofie; Lindstedt, Malin
2014-05-16
Allergic contact dermatitis (ACD) develops upon exposure to certain chemical compounds termed skin sensitizers. To reduce the occurrence of skin sensitizers, chemicals are regularly screened for their capacity to induce sensitization. The recently developed Genomic Allergen Rapid Detection (GARD) assay is an in vitro alternative to animal testing for identification of skin sensitizers, classifying chemicals by evaluating transcriptional levels of a genomic biomarker signature. During assay development and biomarker identification, genome-wide expression analysis was applied using microarrays covering approximately 30,000 transcripts. However, the microarray platform suffers from drawbacks in terms of low sample throughput, high cost per sample and time consuming protocols and is a limiting factor for adaption of GARD into a routine assay for screening of potential sensitizers. With the purpose to simplify assay procedures, improve technical parameters and increase sample throughput, we assessed the performance of three high throughput gene expression platforms--nCounter®, BioMark HD™ and OpenArray®--and correlated their performance metrics against our previously generated microarray data. We measured the levels of 30 transcripts from the GARD biomarker signature across 48 samples. Detection sensitivity, reproducibility, correlations and overall structure of gene expression measurements were compared across platforms. Gene expression data from all of the evaluated platforms could be used to classify most of the sensitizers from non-sensitizers in the GARD assay. Results also showed high data quality and acceptable reproducibility for all platforms but only medium to poor correlations of expression measurements across platforms. In addition, evaluated platforms were superior to the microarray platform in terms of cost efficiency, simplicity of protocols and sample throughput. 
We evaluated the performance of three non-array based platforms using a limited set of transcripts from the GARD biomarker signature. We demonstrated that it was possible to achieve acceptable discriminatory power in terms of separation between sensitizers and non-sensitizers in the GARD assay while reducing assay costs, simplify assay procedures and increase sample throughput by using an alternative platform, providing a first step towards the goal to prepare GARD for formal validation and adaption of the assay for industrial screening of potential sensitizers.
Córdoba, S; Balcells, I; Castelló, A; Ovilo, C; Noguera, J L; Timoneda, O; Sánchez, A
2015-10-05
Prolificacy can directly impact porcine profitability, but large genetic variation and low heritability have been found regarding litter size among porcine breeds. To identify key differences in gene expression associated with swine reproductive efficiency, we performed a transcriptome analysis of sows' endometrium from an Iberian x Meishan F2 population at day 30-32 of gestation, classified according to their estimated breeding value (EBV) as high (H, EBV > 0) and low (L, EBV < 0) prolificacy phenotypes. For each sample, mRNA and small RNA libraries were RNA-sequenced, identifying 141 genes and 10 miRNAs differentially expressed between H and L groups. We selected four miRNAs based on their role in reproduction, and five genes displaying the highest differences and a positive mapping into known reproductive QTLs for RT-qPCR validation on the whole extreme population. Significant differences were validated for the genes PTGS2 (p = 0.03; H/L ratio = 3.50), PTHLH (p = 0.03; H/L ratio = 3.69), MMP8 (p = 0.01; H/L ratio = 4.41) and SCNN1G (p = 0.04; H/L ratio = 3.42). Although selected miRNAs showed similar expression levels between H and L groups, significant correlation was found between the expression level of ssc-miR-133a (p < 0.01) and ssc-miR-92a (p < 0.01) and validated genes. These results provide a better understanding of the genetic architecture of prolificacy-related traits and embryo implantation failure in pigs.
Schuur, Eric; Angel Aristizabal, Javier; Bargallo Rocha, Juan Enrique; Cabello, Cesar; Elizalde, Roberto; García‐Estévez, Laura; Gomez, Henry L.; Katz, Artur; Nuñez De Pierro, Aníbal
2017-01-01
Risk stratification of patients with early stage breast cancer may support adjuvant chemotherapy decision‐making. This review details the development and validation of six multi‐gene classifiers, each of which claims to provide useful prognostic and possibly predictive information for early stage breast cancer patients. A careful assessment is presented of each test's analytical validity, clinical validity, and clinical utility, as well as the quality of evidence supporting its use. PMID:28211064
Vidal Insua, Yolanda; De La Cámara, Juan; Brozos Vázquez, Elena; Fernández, Ana; Vázquez Rivera, Francisca; Villanueva Silva, Mª José; Barbazán, Jorge; Muinelo-Romay, Laura; Candamio Folgar, Sonia; Abalo, Alicia; López-López, Rafael; Abal, Miguel; Alonso-Alconada, Lorena
2017-01-01
Colorectal cancer (CRC) is one of the major causes of cancer-related deaths. Early detection of tumor relapse is crucial for determining the most appropriate therapeutic management. In clinical practice, computed tomography (CT) is routinely used, but small tumor changes are difficult to visualize, and reliable blood-based prognostic and monitoring biomarkers are urgently needed. The aim of this study was to prospectively validate a gene expression panel (composed of GAPDH, VIL1, CLU, TIMP1, TLN1, LOXL3 and ZEB2) for detecting circulating tumor cells (CTCs) as a prognostic and predictive tool in blood samples from 94 metastatic CRC (mCRC) patients. Patients with higher gene panel expression before treatment had reduced progression-free survival (PFS) and overall survival (OS) rates compared with patients with low expression (p = 0.003 and p ≤ 0.001, respectively). Patients with increased expression of CTC markers during treatment presented PFS and OS times of 8.95 and 11.74 months, respectively, compared with 14.41 and 24.7 months for patients presenting decreased expression (PFS: p = 0.020; OS: p ≤ 0.001). Patients classified as non-responders by CTCs with treatment, but classified as responders by CT scan, showed significantly shorter survival times (PFS: 8.53 vs. 11.70 months; OS: 10.37 vs. 24.13 months). In conclusion, our CTC detection panel demonstrated efficacy for early treatment response assessment in mCRC patients, with increased reliability compared to CT scan. PMID:28608814
Sharma, Rita; Cao, Peijian; Jung, Ki-Hong; Sharma, Manoj K.; Ronald, Pamela C.
2013-01-01
Glycoside hydrolases (GH) catalyze the hydrolysis of glycosidic bonds in cell wall polymers and can have major effects on cell wall architecture. Taking advantage of the massive datasets available in public databases, we have constructed a rice phylogenomic database of GHs (http://ricephylogenomics.ucdavis.edu/cellwalls/gh/). This database integrates multiple data types including the structural features, orthologous relationships, mutant availability, and gene expression patterns for each GH family in a phylogenomic context. The rice genome encodes 437 GH genes classified into 34 families. Based on pairwise comparison with eight dicot and four monocot genomes, we identified 138 GH genes that are highly diverged between monocots and dicots, 57 of which have diverged further in rice as compared with four monocot genomes scanned in this study. Chromosomal localization and expression analysis suggest a role for both whole-genome and localized gene duplications in expansion and diversification of GH families in rice. We examined the meta-profiles of expression patterns of GH genes in twenty different anatomical tissues of rice. Transcripts of 51 genes exhibit tissue or developmental stage-preferential expression, whereas, seventeen other genes preferentially accumulate in actively growing tissues. When queried in RiceNet, a probabilistic functional gene network that facilitates functional gene predictions, nine out of seventeen genes form a regulatory network with the well-characterized genes involved in biosynthesis of cell wall polymers including cellulose synthase and cellulose synthase-like genes of rice. Two-thirds of the GH genes in rice are up regulated in response to biotic and abiotic stress treatments indicating a role in stress adaptation. Our analyses identify potential GH targets for cell wall modification. PMID:23986771
Genome-Wide Analysis of the NAC Gene Family in Physic Nut (Jatropha curcas L.)
Wu, Zhenying; Xu, Xueqin; Xiong, Wangdan; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Wu, Guojiang; Jiang, Huawu
2015-01-01
The NAC proteins (NAM, ATAF1/2 and CUC2) are plant-specific transcriptional regulators that have a conserved NAM domain in the N-terminus. They are involved in various biological processes, including both biotic and abiotic stress responses. In the present study, a total of 100 NAC genes (JcNAC) were identified in physic nut (Jatropha curcas L.). Based on phylogenetic analysis and gene structures, 83 JcNAC genes were classified as members of, or proposed to be diverged from, 39 previously predicted orthologous groups (OGs) of NAC sequences. Physic nut has a single intron-containing NAC gene subfamily that has been lost in many plants. The JcNAC genes are non-randomly distributed across the 11 linkage groups of the physic nut genome, and appear to be preferentially retained duplicates that arose from both ancient and recent duplication events. Digital gene expression analysis indicates that some of the JcNAC genes have tissue-specific expression profiles (e.g. in leaves, roots, stem cortex or seeds), and 29 genes differentially respond to abiotic stresses (drought, salinity, phosphorus deficiency and nitrogen deficiency). Our results will be helpful for further functional analysis of the NAC genes in physic nut. PMID:26125188
Genome-Wide Analyses of the Soybean F-Box Gene Family in Response to Salt Stress
Jia, Qi; Xiao, Zhi-Xia; Wong, Fuk-Ling; Sun, Song; Liang, Kang-Jing; Lam, Hon-Ming
2017-01-01
The F-box family is one of the largest gene families in plants that regulate diverse life processes, including salt responses. However, the knowledge of the soybean F-box genes and their roles in salt tolerance remains limited. Here, we conducted a genome-wide survey of the soybean F-box family, and their expression analysis in response to salinity via in silico analysis of online RNA-sequencing (RNA-seq) data and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) to predict their potential functions. A total of 725 potential F-box proteins encoded by 509 genes were identified and classified into 9 subfamilies. The gene structures, conserved domains and chromosomal distributions were characterized. There are 76 pairs of duplicate genes identified, including genome-wide segmental and tandem duplication events, which lead to the expansion of the number of F-box genes. The in silico expression analysis showed that these genes would be involved in diverse developmental functions and play an important role in salt response. Our qRT-PCR analysis confirmed 12 salt-responding F-box genes. Overall, our results provide useful information on soybean F-box genes, especially their potential roles in salt tolerance. PMID:28417911
Genome-Wide Analyses of the Soybean F-Box Gene Family in Response to Salt Stress.
Jia, Qi; Xiao, Zhi-Xia; Wong, Fuk-Ling; Sun, Song; Liang, Kang-Jing; Lam, Hon-Ming
2017-04-12
The F-box family is one of the largest gene families in plants that regulate diverse life processes, including salt responses. However, the knowledge of the soybean F-box genes and their roles in salt tolerance remains limited. Here, we conducted a genome-wide survey of the soybean F-box family, and their expression analysis in response to salinity via in silico analysis of online RNA-sequencing (RNA-seq) data and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) to predict their potential functions. A total of 725 potential F-box proteins encoded by 509 genes were identified and classified into 9 subfamilies. The gene structures, conserved domains and chromosomal distributions were characterized. There are 76 pairs of duplicate genes identified, including genome-wide segmental and tandem duplication events, which lead to the expansion of the number of F-box genes. The in silico expression analysis showed that these genes would be involved in diverse developmental functions and play an important role in salt response. Our qRT-PCR analysis confirmed 12 salt-responding F-box genes. Overall, our results provide useful information on soybean F-box genes, especially their potential roles in salt tolerance.
Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Lu, Kun; Xu, Xinfu; Wang, Rui; Li, Jiana; Qu, Cunmin
2017-10-24
The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus.
Zhou, Yan; Xu, Daixiang; Jia, Ledong; Huang, Xiaohu; Ma, Guoqiang; Wang, Shuxian; Zhu, Meichen; Zhang, Aoxiang; Guan, Mingwei; Xu, Xinfu; Wang, Rui; Li, Jiana
2017-01-01
The basic region/leucine zipper motif (bZIP) transcription factor family is one of the largest families of transcriptional regulators in plants. bZIP genes have been systematically characterized in some plants, but not in rapeseed (Brassica napus). In this study, we identified 247 BnbZIP genes in the rapeseed genome, which we classified into 10 subfamilies based on phylogenetic analysis of their deduced protein sequences. The BnbZIP genes were grouped into functional clades with Arabidopsis genes with similar putative functions, indicating functional conservation. Genome mapping analysis revealed that the BnbZIPs are distributed unevenly across all 19 chromosomes, and that some of these genes arose through whole-genome duplication and dispersed duplication events. All expression profiles of 247 bZIP genes were extracted from RNA-sequencing data obtained from 17 different B. napus ZS11 tissues with 42 various developmental stages. These genes exhibited different expression patterns in various tissues, revealing that these genes are differentially regulated. Our results provide a valuable foundation for functional dissection of the different BnbZIP homologs in B. napus and its parental lines and for molecular breeding studies of bZIP genes in B. napus. PMID:29064393
Zhang, Zongying; Jiang, Shenghui; Wang, Nan; Li, Min; Ji, Xiaohao; Sun, Shasha; Liu, Jingxuan; Wang, Deyun; Xu, Haifeng; Qi, Sumin; Wu, Shujing; Fei, Zhangjun; Feng, Shouqian; Chen, Xuesen
2015-01-01
Apple is one of the most economically important horticultural fruit crops worldwide. It is critical to gain insights into fruit ripening and softening to improve apple fruit quality and extend shelf life. In this study, forward and reverse suppression subtractive hybridization libraries were generated from ‘Taishanzaoxia’ apple fruits sampled around the ethylene climacteric to isolate ripening- and softening-related genes. A set of 648 unigenes were derived from sequence alignment and cluster assembly of 918 expressed sequence tags. According to gene ontology functional classification, 390 out of 443 unigenes (88%) were assigned to the biological process category, 356 unigenes (80%) were classified in the molecular function category, and 381 unigenes (86%) were allocated to the cellular component category. A total of 26 unigenes differentially expressed during fruit development period were analyzed by quantitative RT-PCR. These genes were involved in cell wall modification, anthocyanin biosynthesis, aroma production, stress response, metabolism, transcription, or were non-annotated. Some genes associated with cell wall modification, anthocyanin biosynthesis and aroma production were up-regulated and significantly correlated with ethylene production, suggesting that fruit texture, coloration and aroma may be regulated by ethylene in ‘Taishanzaoxia’. Some of the identified unigenes associated with fruit ripening and softening have not been characterized in public databases. The results contribute to an improved characterization of changes in gene expression during apple fruit ripening and softening. PMID:26719904
Global Genetic Response in a Cancer Cell: Self-Organized Coherent Expression Dynamics
Tsuchiya, Masa; Hashimoto, Midori; Takenaka, Yoshiko; Motoike, Ikuko N.; Yoshikawa, Kenichi
2014-01-01
Understanding the basic mechanism of the spatio-temporal self-control of genome-wide gene expression engaged with the complex epigenetic molecular assembly is one of the major challenges in current biological science. In this study, the genome-wide dynamical profile of gene expression was analyzed for MCF-7 breast cancer cells induced by two distinct ErbB receptor ligands: epidermal growth factor (EGF) and heregulin (HRG), which drive cell proliferation and differentiation, respectively. We sought to elucidate how global genetic responses emerge and to decipher the underlying principle of dynamic self-control of genome-wide gene expression. The whole mRNA expression was classified into about a hundred groups according to the root mean square fluctuation (rmsf). These expression groups showed characteristic time-dependent correlations, indicating the existence of collective behaviors on the ensemble of genes with respect to mRNA expression and also to temporal changes in expression. All-or-none responses were observed for HRG and EGF (biphasic statistics) at around 10–20 min. The emergence of time-dependent collective behaviors of expression occurred through bifurcation of a coherent expression state (CES). In the ensemble of mRNA expression, the self-organized CESs reveal distinct characteristic expression domains for biphasic statistics, which notably exhibit the presence of criticality in the expression profile as a route for genomic transition. In time-dependent changes in the expression domains, the dynamics of CES reveals that the temporal development of the characteristic domains is characterized as an autonomous bistable switch, which exhibits dynamic criticality (the temporal development of criticality) in the genome-wide coherent expression dynamics. It is expected that elucidation of the biophysical origin of such critical behavior will shed light on the underlying mechanism of the control of the whole genome. PMID:24831017
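The grouping step described in this abstract (ranking mRNA profiles by root mean square fluctuation and splitting them into groups) can be illustrated with a minimal sketch. The abstract does not give the exact formula or group sizes, so the definition used here (the population standard deviation of one time course) and the equal-size split are assumptions, and the gene names and values are hypothetical.

```python
import math

def rmsf(series):
    """Root mean square fluctuation of one expression time course.

    Assumed definition: sqrt of the mean squared deviation from the
    temporal mean (the abstract does not spell out the formula).
    """
    mean = sum(series) / len(series)
    return math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))

def group_by_rmsf(profiles, n_groups):
    """Rank genes by rmsf (ascending) and cut the ranking into chunks.

    profiles: dict mapping gene name -> list of expression values over time.
    Chunks hold ceil(n_genes / n_groups) genes, so the last chunk may be
    smaller and fewer than n_groups chunks may be returned.
    """
    ranked = sorted(profiles, key=lambda gene: rmsf(profiles[gene]))
    size = math.ceil(len(ranked) / n_groups)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# Hypothetical toy data: four genes, four time points each.
toy_profiles = {
    "geneA": [1.0, 1.0, 1.0, 1.0],
    "geneB": [1.0, 3.0, 1.0, 3.0],
    "geneC": [0.0, 4.0, 0.0, 4.0],
    "geneD": [2.0, 2.1, 2.0, 2.1],
}
print(group_by_rmsf(toy_profiles, 2))
```

With about a hundred groups, as in the study, each chunk would correspond to one expression group whose collective behavior is then followed over time.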
Buck, E L; Mizubuti, I Y; Alfieri, A A; Otonel, R A A; Buck, L Y; Souza, F P; Prado-Calixto, O P; Poveda-Parra, A R; Alexandre Filho, L; Lopera-Barrero, N M
2017-03-16
Propolis can be used as growth enhancer due to its antimicrobial, antioxidant, and immune-stimulant properties, but its effects on morphometry and muscle gene expression are largely unknown. The present study evaluates the influence of propolis on muscle morphometry and myostatin gene expression in Nile tilapia (Oreochromis niloticus) bred in net cages. Reversed males (GIFT strain) with an initial weight of 170 ± 25 g were distributed in a (2 x 4) factorial scheme, with two diets (DPRO, commercial diet with 4% propolis ethanol extract and DCON, commercial diet without propolis, control) and four assessment periods (0, 35, 70, and 105 experimental days). Muscles were evaluated at each assessment period. Histomorphometric analysis classified the fiber diameters into four groups: <20 μm; 20-30 μm; 30-50 μm; and > 50 μm. RT-qPCR was performed to assess myostatin gene expression. Fibers < 20 µm diameter were more frequent in DPRO than in DCON at all times. Fiber percentages >30 µm (30-50 and > 50 µm) at 70 days were 25.39% and 40.07% for DPRO and DCON, respectively. There was greater myostatin gene expression at 105 days, averaging 1.93 and 1.89 for DCON and DPRO, respectively, with no significant difference in any of the analyzed periods. Propolis ethanol extract did not affect the diameter of muscle fibers or the gene expression of myostatin. Future studies should describe the mechanisms of natural products' effects on muscle growth and development since these factors are highly relevant for fish production performance.
Kameshwar, Ayyappa Kumar Sista; Qin, Wensheng
2017-10-01
Lignin, the most complex and abundant biopolymer on the earth's surface, attains its stability from intricate polyphenolic units and non-phenolic bonds, making it difficult to depolymerize or separate from other units of biomass. Its unusual lignin-degrading ability and the availability of an annotated genome make Phanerochaete chrysosporium ideal for studying lignin-degrading mechanisms. Decoding and understanding the molecular mechanisms underlying the process of lignin degradation will significantly aid the progressing biofuel industries and lead to the production of commercially vital platform chemicals. In this study, we have performed a large-scale metadata analysis to understand the common gene expression patterns of P. chrysosporium during lignin degradation. Gene expression datasets were retrieved from the NCBI GEO database and analyzed using GEO2R and Bioconductor packages. Commonly expressed, statistically significant genes among different datasets were further considered to understand their involvement in lignin degradation and detoxification mechanisms. We have observed three sets of enzymes commonly expressed during ligninolytic conditions, which were later classified into primary ligninolytic, aromatic compound-degrading and other necessary enzymes. Similarly, we have observed three sets of genes coding for detoxification and stress-responsive, phase I and phase II metabolic enzymes. Results obtained in this study indicate the coordinated action of enzymes involved in lignin depolymerization and detoxification-stress responses under ligninolytic conditions. We have developed a tentative network of genes and enzymes involved in lignin degradation and detoxification mechanisms by P. chrysosporium based on the literature and results obtained in this study. However, the ambiguity arising from the higher expression of several uncharacterized proteins necessitates further proteomic studies in P. chrysosporium.
Li, Weiwei; Zhao, Lei; Meng, Fei; Wang, Yunsheng; Tan, Huarong; Yang, Hua; Wei, Chaoling; Wan, Xiaochun; Gao, Liping; Xia, Tao
2013-01-01
Phenolic compounds in tea plant [Camellia sinensis (L.)] play a crucial role in dominating tea flavor and possess a number of key pharmacological benefits on human health. The present research aimed to study the profile of tissue-specific, development-dependent accumulation pattern of phenolic compounds in tea plant. A total of 50 phenolic compounds were identified qualitatively using liquid chromatography in tandem mass spectrometry technology, of which 29 were quantified based on their fragmentation behaviors. Most of the phenolic compounds were higher in the younger leaves than in the stem and root, whereas the total amount of proanthocyanidins was unexpectedly higher in the root. The expression patterns of 63 structural and regulator genes involved in the shikimic acid, phenylpropanoid, and flavonoid pathways were analyzed by quantitative real-time polymerase chain reaction and cluster analysis. Based on the similarity of their expression patterns, the genes were classified into two main groups: C1 and C2; and the genes in group C1 had high relative expression level in the root or low in the bud and leaves. The expression patterns of genes in C2-2-1 and C2-2-2-1 groups were probably responsible for the development-dependent accumulation of phenolic compounds in the leaves. Enzymatic analysis suggested that the accumulation of catechins was influenced simultaneously by catabolism and anabolism. Further research is recommended to determine the expression patterns of various genes and the reason for the variation in contents of different compounds in different growth stages and also in different organs. PMID:23646127
Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin
2011-10-01
Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy in order to study gene expression, including development, digestion and reproduction, of T. canis. ESTs provide a rapid way to identify genes, particularly in organisms for which we have very little molecular information. In this study, a cDNA library was constructed from a female adult of T. canis and 215 high-quality ESTs from 5'-ends of the cDNA clones representing 79 unigenes were obtained. The titer of the primary cDNA library was 1.83×10^6 pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900 bp with an average length of 656 bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences containing 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The remaining 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) were closely matched to the known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR examination. This work will provide a valuable resource for further investigations of stage-, sex- and tissue-specific gene transcription or expression. Copyright © 2011. Published by Elsevier Inc.
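The counts and percentages quoted in this abstract are mutually consistent, which a short arithmetic check makes explicit. This is only a reader's sanity check on the reported figures; the helper name `share` is introduced here for illustration.

```python
def share(part, total):
    """Percentage of `part` in `total`, rounded to two decimals."""
    return round(100 * part / total, 2)

total_unigenes = 28 + 51      # contigs + singletons
total_ests = 70 + 145         # ESTs without + with database matches

print(total_unigenes, total_ests)                  # unigene and EST totals
print(share(18, total_unigenes), share(70, total_ests))    # novel fractions
print(share(61, total_unigenes), share(145, total_ests))   # matched fractions
```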
2012-01-01
Background: Ethylene production and signalling play an important role in somatic embryogenesis, especially for species that are recalcitrant in in vitro culture. The AP2/ERF superfamily has been identified and classified in Hevea brasiliensis. This superfamily includes the ERFs involved in response to ethylene. The relative transcript abundance of ethylene biosynthesis genes and of AP2/ERF genes was analysed during somatic embryogenesis for callus lines with different regeneration potential, in order to identify genes regulated during that process. Results: The analysis of relative transcript abundance was carried out by real-time RT-PCR for 142 genes. The transcripts of ERFs from group I, VII and VIII were abundant at all stages of the somatic embryogenesis process. Forty genetic expression markers for callus regeneration capacity were identified. Fourteen markers were found for proliferating calli and 35 markers for calli at the end of the embryogenesis induction phase. Sixteen markers discriminated between normal and abnormal embryos and, lastly, there were 36 markers of conversion into plantlets. A phylogenetic analysis comparing the sequences of the AP2 domains of Hevea and Arabidopsis genes enabled us to predict the function of 13 expression marker genes. Conclusions: This first characterization of the AP2/ERF superfamily in Hevea revealed dramatic regulation of the expression of AP2/ERF genes during the somatic embryogenesis process. The gene expression markers of proliferating callus capacity to regenerate plants by somatic embryogenesis should make it possible to predict callus lines suitable to be used for multiplication. Further functional characterization of these markers opens up prospects for discovering specific AP2/ERF functions in the Hevea species for which somatic embryogenesis is difficult. PMID:23268714
Combining multiple decisions: applications to bioinformatics
NASA Astrophysics Data System (ADS)
Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.
2008-01-01
Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
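The general ECOC idea underlying both reviewed approaches can be sketched as follows: each class gets a binary codeword, one binary classifier is trained per bit, and a sample is assigned to the class whose codeword is closest to the vector of binary predictions. The sketch below uses plain Hamming-distance decoding, not the weighted or probabilistic bit-inversion decoders the article reviews, and the 4-class, 7-bit code matrix is a hypothetical example chosen for illustration.

```python
from itertools import combinations

# Hypothetical code matrix: one 7-bit codeword per class (rows); each
# column defines the two-class split learned by one binary classifier.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],  # class 0
    [0, 0, 0, 0, 1, 1, 1],  # class 1
    [0, 0, 1, 1, 0, 0, 1],  # class 2
    [0, 1, 0, 1, 0, 1, 0],  # class 3
]

def hamming(a, b):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def min_row_distance(code_matrix=CODE_MATRIX):
    """Smallest Hamming distance between any two class codewords."""
    return min(hamming(r1, r2) for r1, r2 in combinations(code_matrix, 2))

def decode(bits, code_matrix=CODE_MATRIX):
    """Return the index of the class whose codeword is nearest to `bits`.

    `bits` stands for the outputs of the binary classifiers on one sample.
    Ties are broken in favour of the lowest class index.
    """
    distances = [hamming(bits, row) for row in code_matrix]
    return distances.index(min(distances))

# Outputs for a class-2 sample in which the last binary classifier erred.
noisy_bits = [0, 0, 1, 1, 0, 0, 0]
print(min_row_distance(), decode(noisy_bits))
```

Because any two codewords in this matrix differ in four bits, one wrong binary decision still decodes to the correct class; the reviewed methods replace this hard nearest-codeword rule with tuned classifier weights or a probabilistic error model.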
Geng, Xiaodong; Wang, Yuanda; Hong, Quan; Yang, Jurong; Zheng, Wei; Zhang, Gang; Cai, Guangyan; Chen, Xiangmei; Wu, Di
2015-01-01
Purpose: Rhabdomyolysis is a threatening syndrome because it causes the breakdown of skeletal muscle. Muscle destruction leads to the release of myoglobin, intracellular proteins, and electrolytes into the circulation. The aim of this study was to investigate the differences in gene expression profiles and signaling pathways upon rhabdomyolysis-induced acute kidney injury (AKI). Methods: In this study, we used glycerol-induced renal injury as a model of rhabdomyolysis-induced AKI. We analyzed data and relevant information from the Gene Expression Omnibus database (No: GSE44925). The gene expression data for three untreated mice were compared to data for five mice with rhabdomyolysis-induced AKI. The expression profiling of the three untreated mice and the five rhabdomyolysis-induced AKI mice was performed using microarray analysis. We examined the levels of Cyp3a13, Rela, Aldh7a1, Jun, CD14, and Cdkn1a using RT-PCR to determine the accuracy of the microarray results. Results: The microarray analysis showed that there were 1050 downregulated and 659 upregulated genes in the rhabdomyolysis-induced AKI mice compared to the control group. The interactions of all differentially expressed genes in the Signal-Net were analyzed. Cyp3a13 and Rela had the most interactions with other genes. The data showed that Rela and Aldh7a1 were the key nodes and had important positions in the Signal-Net. The genes Jun, CD14, and Cdkn1a were also significantly upregulated. The pathway analysis classified the differentially expressed genes into 71 downregulated and 48 upregulated pathways including the PI3K/Akt, MAPK, and NF-κB signaling pathways. Conclusion: The results of this study indicate that the NF-κB, MAPK, PI3K/Akt, and apoptotic pathways are regulated in rhabdomyolysis-induced AKI. PMID:26823722
Talar, Urszula; Kiełbowicz-Matuk, Agnieszka; Czarnecka, Jagoda; Rorat, Tadeusz
2017-01-01
Plant B-box domain proteins (BBX) mediate many light-influenced developmental processes including seedling photomorphogenesis, seed germination, shade avoidance and photoperiodic regulation of flowering. Despite the wide range of potential functions, the current knowledge regarding BBX proteins in major crop plants is scarce. In this study, we identify and characterize the StBBX gene family in potato, which is composed of 30 members, with regard to structural properties and expression profiles under diurnal cycle, etiolation and de-etiolation. Based on domain organization and phylogenetic relationships, StBBX genes have been classified into five groups. Using real-time quantitative PCR, we found that expression of most of them oscillates following a 24-h rhythm; however, large differences in expression profiles were observed between the genes regarding amplitude and position of the maximal and minimal expression levels in the day/night cycle. On the basis of the time-of-day/time-of-night, we distinguished three expression groups specifically expressed during the light and two during the dark phase. In addition, we showed that the expression of several StBBX genes is under the control of the circadian clock and that some others are specifically associated with the etiolation and de-etiolation conditions. Thus, we concluded that StBBX proteins are likely key players involved in the complex diurnal and circadian networks regulating plant development as a function of light conditions and day duration.
Liu, Xiang; Li, Shangqi; Peng, Wenzhu; Feng, Shuaisheng; Feng, Jianxin; Mahboob, Shahid; Al-Ghanim, Khalid A; Xu, Peng
2016-01-01
The ATP-binding cassette (ABC) gene family is considered to be one of the largest gene families in all forms of prokaryotic and eukaryotic life. Although the ABC transporter genes have been annotated in some species, detailed information about the ABC superfamily and the evolutionary characterization of ABC genes in common carp (Cyprinus carpio) are still unclear. In this research, we identified 61 ABC transporter genes in the common carp genome. Phylogenetic analysis revealed that they could be classified into seven subfamilies, namely 11 ABCAs, six ABCBs, 19 ABCCs, eight ABCDs, two ABCEs, four ABCFs, and 11 ABCGs. Comparative analysis of the ABC genes in seven vertebrate species including common carp, showed that at least 10 common carp genes were retained from the third round of whole genome duplication, while 12 duplicated ABC genes may have come from the fourth round of whole genome duplication. Gene losses were also observed for 14 ABC genes. Expression profiles of the 61 ABC genes in six common carp tissues (brain, heart, spleen, kidney, intestine, and gill) revealed extensive functional divergence among the ABC genes. Different copies of some genes had tissue-specific expression patterns, which may indicate some gene function specialization. This study provides essential genomic resources for future studies in common carp.
Peng, Wenzhu; Feng, Shuaisheng; Feng, Jianxin; Mahboob, Shahid; Al-Ghanim, Khalid A.
2016-01-01
The ATP-binding cassette (ABC) gene family is considered to be one of the largest gene families in all forms of prokaryotic and eukaryotic life. Although the ABC transporter genes have been annotated in some species, detailed information about the ABC superfamily and the evolutionary characterization of ABC genes in common carp (Cyprinus carpio) are still unclear. In this research, we identified 61 ABC transporter genes in the common carp genome. Phylogenetic analysis revealed that they could be classified into seven subfamilies, namely 11 ABCAs, six ABCBs, 19 ABCCs, eight ABCDs, two ABCEs, four ABCFs, and 11 ABCGs. Comparative analysis of the ABC genes in seven vertebrate species including common carp, showed that at least 10 common carp genes were retained from the third round of whole genome duplication, while 12 duplicated ABC genes may have come from the fourth round of whole genome duplication. Gene losses were also observed for 14 ABC genes. Expression profiles of the 61 ABC genes in six common carp tissues (brain, heart, spleen, kidney, intestine, and gill) revealed extensive functional divergence among the ABC genes. Different copies of some genes had tissue-specific expression patterns, which may indicate some gene function specialization. This study provides essential genomic resources for future studies in common carp. PMID:27058731
Identification and expression profiles of the WRKY transcription factor family in Ricinus communis.
Li, Hui-Liang; Zhang, Liang-Bo; Guo, Dong; Li, Chang-Zhu; Peng, Shi-Qing
2012-07-25
In plants, WRKY proteins constitute a large family of transcription factors. They are involved in many biological processes, such as plant development, metabolism, and responses to biotic and abiotic stresses. A large number of WRKY transcription factors have been reported from Arabidopsis, rice, and other higher plants. The recent publication of the draft genome sequence of castor bean (Ricinus communis) has allowed a genome-wide search for R. communis WRKY (RcWRKY) transcription factors and the comparison of these positively identified proteins with their homologs in model plants. A total of 47 WRKY genes were identified in the castor bean genome. According to the structural features of the WRKY domain, the RcWRKY are classified into seven main phylogenetic groups. Furthermore, putative orthologs of RcWRKY proteins in Arabidopsis and rice could now be assigned. An analysis of expression profiles of RcWRKY genes indicates that 47 WRKY genes display differential expressions either in their transcript abundance or expression patterns under normal growth conditions. Copyright © 2012 Elsevier B.V. All rights reserved.
Qian, Liwei; Zheng, Haoran; Zhou, Hong; Qin, Ruibin; Li, Jinlong
2013-01-01
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction. PMID:23516469
Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo
2015-01-01
Background The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply an ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes on assigning correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied on the list of 50 genes from the PAM50 method. Conclusions The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing a single sample classifiers to predict breast cancer intrinsic subtypes. 
PMID:26132585
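The ensemble idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the 24 Weka classifiers are combined, so simple majority voting is an assumption here, and the function names and example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label seen first."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def agreement(predictions):
    """Fraction of classifiers that agree with the winning label."""
    winner = majority_vote(predictions)
    return predictions.count(winner) / len(predictions)


# Hypothetical subtype calls for one sample from five classifiers.
votes = ["Luminal A", "Luminal A", "Luminal B", "Luminal A", "Basal-like"]
print(majority_vote(votes), agreement(votes))
```

A low agreement value would flag a sample whose subtype label is unstable across classifiers, which is the kind of inconsistency a single sample predictor cannot reveal.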
Predicting breast cancer using an expression values weighted clinical classifier.
Thomas, Minta; De Brabanter, Kris; Suykens, Johan A K; De Moor, Bart
2014-12-31
Clinical data, such as patient history, laboratory analysis, ultrasound parameters-which are the basis of day-to-day clinical decision support-are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. While bringing up the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier to integrate two data sources: microarray and clinical parameters. We compared and evaluated the proposed methods on five breast cancer case studies. Compared to LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD) and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under ROC Curve (AUC), on all breast cancer case studies. Thus a clinical classifier weighted with microarray data set results in significantly improved diagnosis, prognosis and prediction responses to therapy. The proposed model has been shown as a promising mathematical framework in both data fusion and non-linear classification problems.
Aspler, Anne L; Bolshin, Carly; Vernon, Suzanne D; Broderick, Gordon
2008-09-26
Genomic profiling of peripheral blood reveals altered immunity in chronic fatigue syndrome (CFS) however interpretation remains challenging without immune demographic context. The object of this work is to identify modulation of specific immune functional components and restructuring of co-expression networks characteristic of CFS using the quantitative genomics of peripheral blood. Gene sets were constructed a priori for CD4+ T cells, CD8+ T cells, CD19+ B cells, CD14+ monocytes and CD16+ neutrophils from published data. A group of 111 women were classified using empiric case definition (U.S. Centers for Disease Control and Prevention) and unsupervised latent cluster analysis (LCA). Microarray profiles of peripheral blood were analyzed for expression of leukocyte-specific gene sets and characteristic changes in co-expression identified from topological evaluation of linear correlation networks. Median expression for a set of 6 genes preferentially up-regulated in CD19+ B cells was significantly lower in CFS (p = 0.01) due mainly to PTPRK and TSPAN3 expression. Although no other gene set was differentially expressed at p < 0.05, patterns of co-expression in each group differed markedly. Significant co-expression of CD14+ monocyte with CD16+ neutrophil (p = 0.01) and CD19+ B cell sets (p = 0.00) characterized CFS and fatigue phenotype groups. Also in CFS was a significant negative correlation between CD8+ and both CD19+ up-regulated (p = 0.02) and NK gene sets (p = 0.08). These patterns were absent in controls. Dissection of blood microarray profiles points to B cell dysfunction with coordinated immune activation supporting persistent inflammation and antibody-mediated NK cell modulation of T cell activity. This has clinical implications as the CD19+ genes identified could provide robust and biologically meaningful basis for the early detection and unambiguous phenotyping of CFS.
Guo, Xiaoyun; Li, Zezhi; Zhang, Chen; Yi, Zhenghui; Li, Haozhe; Cao, Lan; Yuan, Chengmei; Hong, Wu; Wu, Zhiguo; Peng, Daihui; Chen, Jun; Xia, Weiping; Zhao, Guoqing; Wang, Fan; Yu, Shunying; Cui, Donghong; Xu, Yifeng; Golam, Chowdhury M I; Smith, Alicia K; Wang, Tong; Fang, Yiru
2015-10-01
Subsyndromal symptomatic depression (SSD) is a common disease with significant social dysfunction. However, SSD is still not well understood and its pathophysiology remains unclear. We classified 48 candidate genes for SSD according to our previous study into clusters and pathways using DAVID Bioinformatics Functional Annotation Tool. We further replicated the result by using real-time Quantitative PCR (qPCR) studies to examine the expression of identified genes (i.e., STAT5b, PRKCB1, ABL1 and NRAS) in another group of Han Chinese patients with SSD (n = 50). We further validated the result by examining PRKCB1 expression collected from MDD patients (n = 20). To test whether a deficit in PRKCB1 expression leads to dysregulation in PRKCB1 dependent transcript networks, we tested mRNA expression levels for the remaining 44 genes out of 48 genes in SSD patients. Finally, the power of discovery was improved by incorporating information from expression quantitative trait loci (eQTL) analysis. The results showed that PRKCB1 gene expression in peripheral blood mononuclear cells (PBMC) was 33.3% down-regulated in SSD patients (n = 48, t = 3.202, p = 0.002), with a more dramatic (n = 17, 49%) down-regulation in MDD patients than in controls (n = 49, t = 2.114, p = 0.001). We also identified 37 genes that displayed a strong correlation with PRKCB1 mRNA expression levels in SSD patients. The expression of PRKCB1 was regulated by multiple single nucleotide polymorphisms (SNPs) both at the transcript level and exon level. In conclusion, we first found a significant decrease of PRKCB1 mRNA expression in SSD, suggesting PRKCB1 might be the candidate gene and biomarker for SSD. Copyright © 2015 Elsevier Ltd. All rights reserved.
An atlas of active enhancers across human cell types and tissues
NASA Astrophysics Data System (ADS)
Andersson, Robin; Gebhard, Claudia; Miguel-Escalada, Irene; Hoof, Ilka; Bornholdt, Jette; Boyd, Mette; Chen, Yun; Zhao, Xiaobei; Schmidl, Christian; Suzuki, Takahiro; Ntini, Evgenia; Arner, Erik; Valen, Eivind; Li, Kang; Schwarzfischer, Lucia; Glatz, Dagmar; Raithel, Johanna; Lilje, Berit; Rapin, Nicolas; Bagger, Frederik Otzen; Jørgensen, Mette; Andersen, Peter Refsing; Bertin, Nicolas; Rackham, Owen; Burroughs, A. Maxwell; Baillie, J. Kenneth; Ishizu, Yuri; Shimizu, Yuri; Furuhata, Erina; Maeda, Shiori; Negishi, Yutaka; Mungall, Christopher J.; Meehan, Terrence F.; Lassmann, Timo; Itoh, Masayoshi; Kawaji, Hideya; Kondo, Naoto; Kawai, Jun; Lennartsson, Andreas; Daub, Carsten O.; Heutink, Peter; Hume, David A.; Jensen, Torben Heick; Suzuki, Harukazu; Hayashizaki, Yoshihide; Müller, Ferenc; Consortium, The Fantom; Forrest, Alistair R. R.; Carninci, Piero; Rehli, Michael; Sandelin, Albin
2014-03-01
Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.
Tang, Qing; Zang, Gonggu; Cheng, Chaohua; Luan, Mingbao; Dai, Zhigang; Xu, Ying; Yang, Zemao; Zhao, Lining; Su, Jianguang
2017-01-01
Boehmeria tricuspis includes sexually reproducing diploid and apomictic triploid individuals. Previously, we established that triploid B. tricuspis reproduces through obligate diplospory. To understand the molecular basis of apomictic development in B. tricuspis, we sequenced and compared transcriptomic profiles of the flowers of sexual and apomictic plants at four key developmental stages. A total of 283,341 unique transcripts were obtained from 1,463 million high-quality paired-end reads. In total, 18,899 unigenes were differentially expressed between the reproductive types at the four stages. By classifying the transcripts into gene ontology categories of differentially expressed genes, we showed that differential plant hormone signal transduction, cell cycle regulation, and transcription factor regulation are possibly involved in apomictic development and/or a polyploidization response in B. tricuspis. Furthermore, we suggest that specific gene families are possibly related to apomixis and might have important effects on diplosporous floral development. These results make a notable contribution to our understanding of the molecular basis of diplosporous development in B. tricuspis. PMID:28382950
QSAR Study for Carcinogenic Potency of Aromatic Amines Based on GEP and MLPs
Song, Fucheng; Zhang, Anling; Liang, Hui; Cui, Lianhua; Li, Wenlian; Si, Hongzong; Duan, Yunbo; Zhai, Honglin
2016-01-01
A new analysis strategy was used to classify the carcinogenicity of aromatic amines. The physical-chemical parameters are closely related to the carcinogenicity of compounds. Quantitative structure activity relationship (QSAR) is a method of predicting the carcinogenicity of aromatic amines, which can reveal the relationship between carcinogenicity and physical-chemical parameters. This study used gene expression programming (via APS software) and multilayer perceptrons (via Weka software) to predict the carcinogenicity of aromatic amines. Both methods relied on molecular descriptors calculated by CODESSA software, and eight molecular descriptors were selected to build function equations. The accuracy of gene expression programming in the training and test sets is 0.92 and 0.82, respectively, and the accuracy of multilayer perceptrons in the training and test sets is 0.84 and 0.74, respectively. The precision of gene expression programming is clearly superior to that of multilayer perceptrons in both the training set and the test set. The QSAR application in the identification of carcinogenic compounds is a highly efficient method. PMID:27854309
Wang, Zhuo; Jin, Shuilin; Liu, Guiyou; Zhang, Xiurui; Wang, Nan; Wu, Deliang; Hu, Yang; Zhang, Chiping; Jiang, Qinghua; Xu, Li; Wang, Yadong
2017-05-23
The development of single-cell RNA sequencing has enabled profound discoveries in biology, ranging from the dissection of the composition of complex tissues to the identification of novel cell types and dynamics in some specialized cellular environments. However, the large-scale generation of single-cell RNA-seq (scRNA-seq) data collected at multiple time points remains a challenge to the effective measurement of gene expression patterns in transcriptome analysis. We present an algorithm based on the Dynamic Time Warping score (DTWscore) combined with time-series data that enables the detection of gene expression changes across scRNA-seq samples and the recovery of potential cell types from complex mixtures of multiple cell types. The DTWscore successfully classifies cells of different types with the most highly variable genes from time-series scRNA-seq data. The study was confined to methods that are implemented and available within the R framework. Sample datasets and R packages are available at https://github.com/xiaoxiaoxier/DTWscore .
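As background for the abstract above, the classic dynamic time warping distance between two expression time series can be sketched as follows. This is the textbook dynamic-programming recurrence with an absolute-difference cost, written in Python for illustration; it is not the DTWscore R package's implementation, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two series with the same shape but different timing align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because warping absorbs timing differences, a gene whose expression follows the same trajectory at a different pace in two cells scores as similar, while genuinely different trajectories score as distant.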
Wei, Jiankai; Zhang, Xiaojun; Yu, Yang; Huang, Hao; Li, Fuhua; Xiang, Jianhai
2014-01-01
Penaeid shrimp have a distinctive metamorphosis stage during early development. Although morphological and biochemical studies of this ontogeny have been conducted for decades, research at the gene expression level is still scarce. In this study, we have investigated the transcriptomes of five continuous developmental stages in Pacific white shrimp (Litopenaeus vannamei) with high throughput Illumina sequencing technology. The reads were assembled and clustered into 66,815 unigenes, of which 32,398 have putative homologues in the nr database, 14,981 have been classified into diverse functional categories by Gene Ontology (GO) annotation and 26,257 have been associated with 255 pathways by KEGG pathway mapping. Meanwhile, the differentially expressed genes (DEGs) between adjacent developmental stages were identified and gene expression patterns were clustered. By GO term enrichment analysis, KEGG pathway enrichment analysis and functional gene profiling, the physiological changes during shrimp metamorphosis could be better understood, especially histogenesis, diet transition, muscle development and exoskeleton reconstruction. In conclusion, this is the first study that characterized the integrated transcriptomic profiles during early development of penaeid shrimp, and these findings will serve as significant references for shrimp developmental biology and aquaculture research. PMID:25197823
Atilano, Shari R.; Malik, Deepika; Chwa, Marilyn; Cáceres-Del-Carpio, Javier; Nesburn, Anthony B.; Boyer, David S.; Kuppermann, Baruch D.; Jazwinski, S. Michal; Miceli, Michael V.; Wallace, Douglas C.; Udar, Nitin; Kenney, M. Cristina
2015-01-01
Mitochondrial (mt) DNA can be classified into haplogroups representing different geographic and/or racial origins of populations. The H haplogroup is protective against age-related macular degeneration (AMD), while the J haplogroup is high risk for AMD. In the present study, we performed comparison analyses of human retinal cell cybrids, which possess identical nuclei, but mtDNA from subjects with either the H or J haplogroups, and demonstrate differences in total global methylation, and expression patterns for two genes related to acetylation and five genes related to methylation. Analyses revealed that untreated-H and -J cybrids have different expression levels for nuclear genes (CFH, EFEMP1, VEGFA and NFkB2). However, expression levels for these genes become equivalent after treatment with a methylation inhibitor, 5-aza-2′-deoxycytidine. Moreover, sequencing of the entire mtDNA suggests that differences in epigenetic status found in cybrids are likely due to single nucleotide polymorphisms (SNPs) within the haplogroup profiles rather than rare variants or private SNPs. In conclusion, our findings indicate that mtDNA variants can mediate methylation profiles and transcription for inflammation, angiogenesis and various signaling pathways, which are important in several common diseases. PMID:25964427
Cloud-scale genomic signals processing classification analysis for gene expression microarray data.
Harvey, Benjamin; Soo-Yeon Ji
2014-01-01
As microarray data available to scientists continues to increase in size and complexity, it has become overwhelmingly important to find multiple ways to bring inference through analysis of DNA/mRNA sequence data that is useful to scientists. Though there have been many attempts to elucidate the issue of bringing forth biological inference by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale classification analysis of microarray data using Wavelet thresholding in a Cloud environment to identify significantly expressed features. This paper proposes a novel methodology that uses Wavelet based Denoising to initialize a threshold for determination of significantly expressed genes for classification. Additionally, this research was implemented and encompassed within a cloud-based distributed processing environment. Cloud computing and Wavelet thresholding were used for the classification of 14 tumor classes from the Global Cancer Map (GCM). The results proved to be more accurate than using a predefined p-value for differential expression classification. This novel methodology analyzed Wavelet based threshold features of gene expression in a Cloud environment, furthermore classifying the expression of samples by analyzing gene patterns, which inform us of biological processes. Moreover, it enables researchers to face the present and forthcoming challenges that may arise in the analysis of large microarray datasets in functional genomics.
Kim, Kyu-Tae; Lee, Hye Won; Lee, Hae-Ock; Kim, Sang Cheol; Seo, Yun Jee; Chung, Woosung; Eum, Hye Hyeon; Nam, Do-Hyun; Kim, Junhyong; Joo, Kyeung Min; Park, Woong-Yang
2015-06-19
Intra-tumoral genetic and functional heterogeneity correlates with cancer clinical prognoses. However, the mechanisms by which intra-tumoral heterogeneity impacts therapeutic outcome remain poorly understood. RNA sequencing (RNA-seq) of single tumor cells can provide comprehensive information about gene expression and single-nucleotide variations in individual tumor cells, which may allow for the translation of heterogeneous tumor cell functional responses into customized anti-cancer treatments. We isolated 34 patient-derived xenograft (PDX) tumor cells from a lung adenocarcinoma patient tumor xenograft. Individual tumor cells were subjected to single cell RNA-seq for gene expression profiling and expressed mutation profiling. Fifty tumor-specific single-nucleotide variations, including KRAS(G12D), were observed to be heterogeneous in individual PDX cells. Semi-supervised clustering, based on KRAS(G12D) mutant expression and a risk score representing expression of 69 lung adenocarcinoma-prognostic genes, classified PDX cells into four groups. PDX cells that survived in vitro anti-cancer drug treatment displayed transcriptome signatures consistent with the group characterized by KRAS(G12D) and low risk score. Single-cell RNA-seq on viable PDX cells identified a candidate tumor cell subgroup associated with anti-cancer drug resistance. Thus, single-cell RNA-seq is a powerful approach for identifying unique tumor cell-specific gene expression profiles which could facilitate the development of optimized clinical anti-cancer strategies.
Shang, Zhiwei; Li, Hongwen
2017-10-01
Vitiligo is an acquired skin disease with pigmentary disorder. Autoimmune destruction of melanocytes is thought to be major factor in the etiology of vitiligo. miRNA-based regulators of gene expression have been reported to play crucial roles in autoimmune disease. Therefore, we attempt to profile the miRNA expressions and predict their potential targets, assessing the biological functions of differentially expressed miRNA. Total RNA was extracted from peripheral blood of vitiligo (experimental group, n = 5) and non-vitiligo (control group, n = 5) age-matched patients. Samples were hybridized to a miRNA array. Box, scatter and principal component analysis plots were performed, followed by unsupervised hierarchical clustering analysis to classify the samples. Quantitative reverse transcription polymerase chain reaction (RT-PCR) was conducted for validation of microarray data. Three different databases, TargetScan, PITA and microRNA.org, were used to predict the potential target genes. Gene ontology (GO) annotation and pathway analysis were performed to assess the potential functions of predicted genes of identified miRNA. A total of 100 (29 upregulated and 71 downregulated) miRNA were filtered by volcano plot analysis. Four miRNA were validated by quantitative RT-PCR as significantly downregulated in the vitiligo group. The functions of predicted target genes associated with differentially expressed miRNA were assessed by GO analysis, showing that the GO term with most significantly enriched target genes was axon guidance, and that the axon guidance pathway was most significantly correlated with these miRNA. In conclusion, we identified four downregulated miRNA in vitiligo and assessed the potential functions of target genes related to these differentially expressed miRNA. © 2017 Japanese Dermatological Association.
Liao, Hui-Ling; Burns, Jacqueline K.
2012-01-01
Distribution of viable Candidatus Liberibacter asiaticus (CaLas) in sweet orange fruit and leaves (‘Hamlin’ and ‘Valencia’) and transcriptomic changes associated with huanglongbing (HLB) infection in fruit tissues are reported. Viable CaLas was present in most fruit tissues tested in HLB trees, with the highest titre detected in vascular tissue near the calyx abscission zone. Transcriptomic changes associated with HLB infection were analysed in flavedo (FF), vascular tissue (VT), and juice vesicles (JV) from symptomatic (SY), asymptomatic (AS), and healthy (H) fruit. In SY ‘Hamlin’, HLB altered the expression of more genes in FF and VT than in JV, whereas in SY ‘Valencia’, the number of genes whose expression was changed by HLB was similar in these tissues. The expression of more genes was altered in SY ‘Valencia’ JV than in SY ‘Hamlin’ JV. More genes were also affected in AS ‘Valencia’ FF and VT than in AS ‘Valencia’ JV. Most genes whose expression was changed by HLB were classified as transporters or involved in carbohydrate metabolism. Physiological characteristics of HLB-infected and girdled fruit were compared to differentiate between HLB-specific and carbohydrate metabolism-related symptoms. SY and girdled fruit were smaller than H and ungirdled fruit, respectively, with poor juice quality. However, girdling did not cause misshapen fruit or differential peel coloration. Quantitative PCR analysis indicated that many selected genes changed their expression significantly in SY flavedo but not in girdled flavedo. Mechanisms regulating development of HLB symptoms may lie in the host disease response rather than being a direct consequence of carbohydrate starvation. PMID:22407645
van Doorn, Remco; Dijkman, Remco; Vermeer, Maarten H; Out-Luiting, Jacoba J; van der Raaij-Helmer, Elisabeth M H; Willemze, Rein; Tensen, Cornelis P
2004-08-15
Sézary syndrome (Sz) is a malignancy of CD4+ memory skin-homing T cells and presents with erythroderma, lymphadenopathy, and peripheral blood involvement. To gain more insight into the molecular features of Sz, oligonucleotide array analysis was performed comparing gene expression patterns of CD4+ T cells from peripheral blood of patients with Sz with those of patients with erythroderma secondary to dermatitis and healthy controls. Using unsupervised hierarchical clustering, gene expression patterns of T cells from patients with Sz were classified separately from those of benign T cells. One hundred twenty-three genes were identified as significantly differentially expressed and had an average fold change exceeding 2. T cells from patients with Sz demonstrated decreased expression of the following hematopoietic malignancy-linked tumor suppressor genes: TGF-beta receptor II, Mxi1, Riz1, CREB-binding protein, BCL11a, STAT4, and Forkhead Box O1A. Moreover, the tyrosine kinase receptor EphA4 and the potentially oncogenic transcription factor Twist were highly and selectively expressed in T cells of patients with Sz. High expression of EphA4 and Twist was also observed in lesional skin biopsy specimens of a subset of patients with cutaneous T cell lymphomas related to Sz, whereas their expression was nearly undetectable in benign T cells or in skin lesions of patients with inflammatory dermatoses. Detection of EphA4 and Twist may be used in the molecular diagnosis of Sz and related cutaneous T-cell lymphomas. Furthermore, the membrane-bound EphA4 receptor may serve as a target for directed therapeutic intervention.
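The fold-change criterion mentioned in the abstract above can be sketched as a simple filter. This sketch assumes linear-scale, strictly positive expression values and treats a change of more than 2-fold in either direction as passing; the abstract does not specify these details, the statistical significance test is omitted, and the function names are hypothetical.

```python
from statistics import mean


def fold_change(case_values, control_values):
    """Ratio of mean expression in cases to mean expression in controls."""
    return mean(case_values) / mean(control_values)


def exceeds_fold_change(case_values, control_values, threshold=2.0):
    """True if expression is up or down by more than `threshold`-fold."""
    fc = fold_change(case_values, control_values)
    return fc > threshold or fc < 1.0 / threshold


# Hypothetical intensities for one gene in patient and control T cells.
print(exceeds_fold_change([8, 12], [2, 3]))
```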
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.
2008-02-01
WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify a minimum number of 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 to 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group specific expression patterns. While putative orthologs of the HvWRKY transcription factors have been inferred from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlative expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities between orthologs with Arabidopsis, but also relatedness in their expression patterns. This correlative expression is indicative for a putative conserved function of related WRKY proteins in mono- and dicot species.
Xu, Zongda; Zhang, Qixiang; Sun, Lidan; Du, Dongliang; Cheng, Tangren; Pan, Huitang; Yang, Weiru; Wang, Jia
2014-10-01
MADS-box genes encode transcription factors that play crucial roles in plant development, especially in flower and fruit development. To gain insight into this gene family in Prunus mume, an important ornamental and fruit plant in East Asia, and to elucidate their roles in flower organ determination and fruit development, we performed a genome-wide identification, characterisation and expression analysis of MADS-box genes in this Rosaceae tree. In this study, 80 MADS-box genes were identified in P. mume and categorised into MIKC, Mα, Mβ, Mγ and Mδ groups based on gene structures and phylogenetic relationships. The MIKC group could be further classified into 12 subfamilies. The FLC subfamily was absent in P. mume and the six tandemly arranged DAM genes might experience a species-specific evolution process in P. mume. The MADS-box gene family might experience an evolution process from MIKC genes to Mδ genes to Mα, Mβ and Mγ genes. The expression analysis suggests that P. mume MADS-box genes have diverse functions in P. mume development and the functions of duplicated genes diverged after the duplication events. In addition to its involvement in the development of female gametophytes, type I genes also play roles in male gametophytes development. In conclusion, this study adds to our understanding of the roles that the MADS-box genes played in flower and fruit development and lays a foundation for selecting candidate genes for functional studies in P. mume and other species. Furthermore, this study also provides a basis to study the evolution of the MADS-box family.
Prediction of in vivo hepatotoxicity effects using in vitro ...
High-throughput in vitro transcriptomics data support molecular understanding of chemical-induced toxicity. Here, we evaluated the utility of such data to predict liver toxicity. First, in vitro gene expression data for 93 genes was generated following exposure of metabolically competent HepaRG cells to 1060 environmental chemicals from the US EPA ToxCast library. The empirical relationship between these data and rat chronic liver endpoints from animal studies in the Toxicity Reference Database (ToxRefDB) was then evaluated using machine learning techniques. Chemicals were classified as positive (242) or negative (135) based on observed hepatic histopathologic effects, and divided into three categories: hypertrophy (183), injury (112) and proliferative lesions (101). Hepatotoxicants were classified on the basis of the bioactivity of 93 genes (descriptors) using six machine learning algorithms: linear discriminant analysis, naïve Bayes, support vector classification, classification and regression trees, k-nearest neighbors, and an ensemble of classifiers. Classification performance was evaluated using 10-fold cross-validation testing, and in-loop, filter-based, feature subset selection. The best balanced accuracy for prediction of hypertrophy, injury and proliferative lesions were 0.81 ± 0.07, 0.79 ± 0.08 and 0.77 ± 0.09, respectively. Gene specific perturbation of xenobiotic metabolism enzymes (CYP7A1/2E1/4A11/1A1/4A22) and transporters (ABCG2, ABCB11, SLC22
Gene expression information improves reliability of receptor status in breast cancer patients
Kenn, Michael; Schlangen, Karin; Castillo-Tong, Dan Cacsire; Singer, Christian F.; Cibena, Michael; Koelbl, Heinz; Schreiner, Wolfgang
2017-01-01
Immunohistochemical (IHC) determination of receptor status in breast cancer patients is frequently inaccurate. Since it directs the choice of systemic therapy, it is essential to increase its reliability. We increase the validity of IHC receptor expression by additionally considering gene expression (GE) measurements. Crisp therapeutic decisions are based on IHC estimates, even if they are borderline reliable. We further improve decision quality by a responsibility function, defining a critical domain for gene expression. Refined normalization is devised to file any newly diagnosed patient into existing databases. Our approach renders receptor estimates more reliable by identifying patients with questionable receptor status. The approach is also more efficient since the rate of conclusive samples is increased. We have curated and evaluated gene expression data, together with clinical information, from 2880 breast cancer patients. Combining IHC with gene expression information yields a method that is more reliable and also more efficient than common practice up to now. Several types of possibly suboptimal treatment allocations, based on IHC receptor status alone, are enumerated. A ‘therapy allocation check’ identifies patients possibly misclassified. Estrogen: false negative 8%, false positive 6%. Progesterone: false negative 14%, false positive 11%. HER2: false negative 2%, false positive 50%. Possible implications are discussed. We propose an ‘expression look-up-plot’, allowing for a significant potential to improve the quality of precision medicine. Methods are developed and exemplified here for breast cancer patients, but they may readily be transferred to diagnostic data relevant for therapeutic decisions in other fields of oncology. PMID:29100391
Labourier, Emmanuel; Shifrin, Alexander; Busseniers, Anne E; Lupo, Mark A; Manganelli, Monique L; Andruss, Bernard; Wylie, Dennis; Beaudenon-Huibregtse, Sylvie
2015-07-01
Molecular testing for oncogenic mutations or gene expression in fine-needle aspirations (FNAs) from thyroid nodules with indeterminate cytology identifies a subset of benign or malignant lesions with high predictive value. This study aimed to evaluate a novel diagnostic algorithm combining mutation detection and miRNA expression to improve the diagnostic yield of molecular cytology. Surgical specimens and preoperative FNAs (n = 638) were tested for 17 validated gene alterations using the miRInform Thyroid test and with a 10-miRNA gene expression classifier generating positive (malignant) or negative (benign) results. Cross-sectional sampling of thyroid nodules with atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) or follicular neoplasm/suspicious for a follicular neoplasm (FN/SFN) cytology (n = 109) was conducted at 12 endocrinology centers across the United States. Qualitative molecular results were compared with surgical histopathology to determine diagnostic performance and model clinical effect. Mutations were detected in 69% of nodules with malignant outcome. Among mutation-negative specimens, miRNA testing correctly identified 64% of malignant cases and 98% of benign cases. The diagnostic sensitivity and specificity of the combined algorithm was 89% (95% confidence interval [CI], 73-97%) and 85% (95% CI, 75-92%), respectively. At 32% cancer prevalence, 61% of the molecular results were benign with a negative predictive value of 94% (95% CI, 85-98%). Independently of variations in cancer prevalence, the test increased the yield of true benign results by 65% relative to mRNA-based gene expression classification and decreased the rate of avoidable diagnostic surgeries by 69%. 
Multiplatform testing for DNA, mRNA, and miRNA can accurately classify benign and malignant thyroid nodules, increase the diagnostic yield of molecular cytology, and further improve the preoperative risk-based management of benign nodules with AUS/FLUS or FN/SFN cytology.
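As a rough check on the figures in this abstract, the sketch below applies Bayes' rule to the reported sensitivity (89%), specificity (85%) and cancer prevalence (32%). The function name `predictive_values` is a hypothetical helper introduced here for illustration and is not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv, negative_call_rate) for a binary test via Bayes' rule."""
    tp = sensitivity * prevalence                   # true positives per unit population
    fn = (1.0 - sensitivity) * prevalence           # false negatives
    tn = specificity * (1.0 - prevalence)           # true negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    negative_call_rate = tn + fn                    # share of all results called benign
    return ppv, npv, negative_call_rate


# Values reported in the abstract: sensitivity 89%, specificity 85%, prevalence 32%.
ppv, npv, neg_rate = predictive_values(0.89, 0.85, 0.32)
print(f"NPV = {npv:.2f}, benign results = {neg_rate:.2f}")
```

Under these inputs the computed negative predictive value and the share of benign results round to the 94% and 61% quoted in the abstract, which suggests the reported numbers are mutually consistent.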
Kianmehr, Keivan; Alhajj, Reda
2008-09-01
In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-quality source of discrimination knowledge that can substantially impact the prediction power of SVM and associative classification techniques. They also provide users with greater convenience in terms of understandability and interpretability. We have used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model.
From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.
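The idea of presenting rule-based feature vectors to a learner, rather than the raw training samples, can be sketched as follows. This is a minimal illustration only: the abstract does not specify how each feature value is computed, so weighting a matched rule by its confidence is an assumption, and the rules, item names and the helper `rule_feature_vector` are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A class association rule: (antecedent items, predicted class label, confidence).
Rule = Tuple[FrozenSet[str], str, float]


def rule_feature_vector(sample: FrozenSet[str], rules: List[Rule]) -> List[float]:
    """One feature per rule: the rule's confidence if the sample contains
    every item of the antecedent, else 0.0 (assumed weighting scheme)."""
    return [conf if antecedent <= sample else 0.0
            for antecedent, _label, conf in rules]


# Hypothetical rules mined from discretized expression data.
rules = [
    (frozenset({"geneA=high"}), "tumor", 0.9),
    (frozenset({"geneB=low", "geneC=high"}), "normal", 0.8),
]
samples = [
    frozenset({"geneA=high", "geneB=low"}),
    frozenset({"geneB=low", "geneC=high"}),
]

features = [rule_feature_vector(s, rules) for s in samples]
print(features)  # [[0.9, 0.0], [0.0, 0.8]]
```

The resulting vectors, one row per sample and one column per rule, would then be passed to an SVM implementation of choice in place of the original attributes; each column stays interpretable because it maps back to a single rule.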
Hou, Zhaoqi; Jia, Bing; Li, Fei; Liu, Pu; Liu, Li; Ye, Zhenfeng; Zhu, Liwu; Wang, Qi; Heng, Wei
2018-01-01
The plant genes encoding ABCGs that have been identified to date play a role in suberin formation in response to abiotic and biotic stress. In the present study, 80 ABCG genes were identified in 'Dangshansuli' Chinese white pear and designated as PbABCGs. Based on the structural characteristics and phylogenetic analysis, the PbABCG family genes could be classified into seven main groups: classes A-G. Segmental and dispersed duplications were the primary forces underlying the PbABCG gene family expansion in 'Dangshansuli' pear. Most of the PbABCG duplicated gene pairs date to the recent whole-genome duplication that occurred 30~45 million years ago. Purifying selection has also played a critical role in the evolution of the ABCG genes. Ten PbABCG genes screened in the transcriptome of 'Dangshansuli' pear and its russet mutant 'Xiusu' were validated, and the expression levels of the PbABCG genes exhibited significant differences at different stages. The results presented here will undoubtedly be useful for better understanding of the complexity of the PbABCG gene family and will facilitate the functional characterization of suberin formation in the russet mutant.
Pasture-feeding of Charolais steers influences skeletal muscle metabolism and gene expression.
Cassar-Malek, I; Jurie, C; Bernard, C; Barnola, I; Micol, D; Hocquette, J-F
2009-10-01
Extensive beef production systems on pasture are promoted to improve animal welfare and beef quality. This study aimed to compare the influence on muscle characteristics of two management approaches representative of intensive and extensive production systems. One group of 6 Charolais steers was fed maize-silage indoors and another group of 6 Charolais steers grazed on pasture. Activities of enzymes representative of glycolytic and oxidative (Isocitrate dehydrogenase [ICDH], citrate synthase [CS], hydroxyacyl-CoA dehydrogenase [HAD]) muscle metabolism were assessed in Rectus abdominis (RA) and Semitendinosus (ST) muscles. Activities of oxidative enzymes ICDH, CS and HAD were higher in muscles from grazing animals demonstrating a plasticity of muscle metabolism according to the production and feeding system. Gene expression profiling in RA and ST muscles was performed on both production groups using a multi-tissue bovine cDNA repertoire. Variance analysis showed an effect of the muscle type and of the production system on gene expression (P<0.001). A list of the 212 most variable genes according to the production system was established, of which 149 genes corresponded to identified genes. They were classified according to their gene function annotation mainly in the "protein metabolism and modification", "signal transduction", "cell cycle", "developmental processes" and "muscle contraction" biological processes. Selenoprotein W was found to be underexpressed in pasture-fed animals and could be proposed as a putative gene marker of the grass-based system. In conclusion, enzyme-specific adaptations and gene expression modifications were observed in response to the production system and some of them could be candidates for grazing or grass-feeding traceability.
Nectoux, J; Fichou, Y; Rosas-Vargas, H; Cagnard, N; Bahi-Buisson, N; Nusbaum, P; Letourneur, F; Chelly, J; Bienvenu, T
2010-07-01
More than 90% of Rett syndrome (RTT) patients have heterozygous mutations in the X-linked methyl-CpG binding protein 2 (MECP2) gene that encodes the methyl-CpG-binding protein 2, a transcriptional modulator. Because MECP2 is subjected to X chromosome inactivation (XCI), girls with RTT either express the wild-type or mutant allele in each individual cell. To test the consequences of MECP2 mutations resulting from a genome-wide transcriptional dysregulation and to identify its target genes in a system that circumvents the functional mosaicism resulting from XCI, we carried out gene expression profiling of clonal populations derived from fibroblast primary cultures expressing exclusively either the wild-type or the mutant MECP2 allele. Clonal cultures were obtained from skin biopsy of three RTT patients carrying either a nonsense or a frameshift MECP2 mutation. For each patient, gene expression profiles of wild-type and mutant clones were compared by oligonucleotide expression microarray analysis. Firstly, clustering analysis classified the RTT patients according to their genetic background and MECP2 mutation. Secondly, expression profiling by microarray analysis and quantitative RT-PCR indicated four up-regulated genes and five down-regulated genes significantly dysregulated in all our statistical analyses, including excellent potential candidate genes for the understanding of the pathophysiology of this neurodevelopmental disease. Thirdly, chromatin immunoprecipitation analysis confirmed MeCP2 binding to respective CpG islands in three out of four up-regulated candidate genes and sequencing of bisulphite-converted DNA indicated that MeCP2 preferentially binds to methylated-DNA sequences. Most importantly, the finding that at least two of these genes (BMCC1 and RNF182) were shown to be involved in cell survival and/or apoptosis may suggest that impaired MeCP2 function could alter the survival of neurons thus compromising brain function without inducing cell death.
A 15-gene signature for prediction of colon cancer recurrence and prognosis based on SVM.
Xu, Guangru; Zhang, Minghui; Zhu, Hongxing; Xu, Jinhua
2017-03-10
This study aimed to screen for a gene signature that distinguishes colon cancer patients at high risk of recurrence from those at low risk and predicts their prognosis. Five microarray datasets of colon cancer samples were collected from the Gene Expression Omnibus database and one was obtained from The Cancer Genome Atlas (TCGA). After preprocessing, data in GSE17537 were analyzed using the Linear Models for Microarray data (LIMMA) method to identify the differentially expressed genes (DEGs). The DEGs further underwent PPI network-based neighborhood scoring and support vector machine (SVM) analyses to screen the feature genes associated with recurrence and prognosis, which were then validated by four datasets (GSE38832, GSE17538, GSE28814 and TCGA) using SVM and Cox regression analyses. A total of 1207 genes were identified as DEGs between recurrence and no-recurrence samples, including 726 downregulated and 481 upregulated genes. Using SVM analysis and confirmation across five gene expression profile datasets, a 15-gene signature (HES5, ZNF417, GLRA2, OR8D2, HOXA7, FABP6, MUSK, HTR6, GRIP2, KLRK1, VEGFA, AKAP12, RHEB, NCRNA00152 and PMEPA1) was identified as a predictor of recurrence risk and prognosis for colon cancer patients. Our identified 15-gene signature may be useful to classify colon cancer patients with different prognosis and some genes in this signature may represent new therapeutic targets. Copyright © 2016. Published by Elsevier B.V.
Radiation Dose-Rate Effects on Gene Expression in a Mouse Biodosimetry Model
Paul, Sunirmal; Smilenov, Lubomir B.; Elliston, Carl D.; Amundson, Sally A.
2015-01-01
In the event of a nuclear accident or radiological terrorist attack, there will be a pressing need for biodosimetry to triage a large, potentially exposed population and to assign individuals to appropriate treatment. Exposures from fallout are likely, resulting in protracted dose delivery that would, in turn, impact the extent of injury. Biodosimetry approaches that can distinguish such low-dose-rate (LDR) exposures from acute exposures have not yet been developed. In this study, we used the C57BL/6 mouse model in an initial investigation of the impact of low-dose-rate delivery on the transcriptomic response in blood. While a large number of the same genes responded to LDR and acute radiation exposures, for many genes the magnitude of response was lower after LDR exposures. Some genes, however, were differentially expressed (P < 0.001, false discovery rate < 5%) in mice exposed to LDR compared with mice exposed to acute radiation. We identified a set of 164 genes that correctly classified 97% of the samples in this experiment as exposed to acute or LDR radiation using a support vector machine algorithm. Gene expression is a promising approach to radiation biodosimetry, enhanced greatly by this first demonstration of its potential for distinguishing between acute and LDR exposures. Further development of this aspect of radiation biodosimetry, either as part of a complete gene expression biodosimetry test or as an adjunct to other methods, could provide vital triage information in a mass radiological casualty event. PMID:26114327
Jourda, Cyril; Cardi, Céline; Gibert, Olivier; Giraldo Toro, Andrès; Ricci, Julien; Mbéguié-A-Mbéguié, Didier; Yahiaoui, Nabila
2016-01-01
Starch is the most widespread and abundant storage carbohydrate in plants. It is also a major feature of cultivated bananas as it accumulates to large amounts during banana fruit development before almost complete conversion to soluble sugars during ripening. Little is known about the structure of major gene families involved in banana starch metabolism and their evolution compared to other species. To identify genes involved in banana starch metabolism and investigate their evolutionary history, we analyzed six gene families playing a crucial role in plant starch biosynthesis and degradation: the ADP-glucose pyrophosphorylases (AGPases), starch synthases (SS), starch branching enzymes (SBE), debranching enzymes (DBE), α-amylases (AMY) and β-amylases (BAM). Using comparative genomics and phylogenetic approaches, these genes were classified into families and sub-families and orthology relationships with functional genes in Eudicots and in grasses were identified. In addition to known ancestral duplications shaping starch metabolism gene families, independent evolution in banana and grasses also occurred through lineage-specific whole genome duplications for specific sub-families of AGPase, SS, SBE, and BAM genes; and through gene-scale duplications for AMY genes. In particular, banana lineage duplications yielded a set of AGPase, SBE and BAM genes that were highly or specifically expressed in banana fruits. Gene expression analysis highlighted a complex transcriptional reprogramming of starch metabolism genes during ripening of banana fruits. A differential regulation of expression between banana gene duplicates was identified for SBE and BAM genes, suggesting that part of starch metabolism regulation in the fruit evolved in the banana lineage. PMID:27994606
Ahn, Jun Cheul; Kim, Dae-Won; You, Young Nim; Seok, Min Sook; Park, Jeong Mee; Hwang, Hyunsik; Kim, Beom-Gi; Luan, Sheng; Park, Hong-Seog; Cho, Hye Sun
2010-11-18
FK506 binding proteins (FKBPs) and cyclophilins (CYPs) are abundant and ubiquitous proteins belonging to the peptidyl-prolyl cis/trans isomerase (PPIase) superfamily, which regulate much of metabolism through chaperone activity or isomerization of proline residues during protein folding. They are collectively referred to as immunophilins (IMMs), being present in almost all cellular organs. In particular, a number of IMMs relate to environmental stresses. FKBP and CYP proteins in rice (Oryza sativa cv. Japonica) were identified and classified, and each IMM was given an appropriate name, considering the ortholog relation with Arabidopsis and Chlamydomonas or the molecular weight of the proteins. 29 FKBP and 27 CYP genes can putatively be identified in rice; among them, a number of genes can be putatively classified as orthologs of Arabidopsis IMMs. However, some genes were novel and did not match those of Arabidopsis and Chlamydomonas, and several genes were paralogs arising from gene duplication. Among 56 IMMs in rice, a significant number are regulated by salt and/or desiccation stress. In addition, their expression levels in response to water stress have been analyzed in different tissues, and the subcellular locations of some IMMs were determined by tagging with GFP. Like other green photosynthetic organisms such as Arabidopsis (23 FKBPs and 29 CYPs) and Chlamydomonas (23 FKBs and 26 CYNs), rice has the highest number of IMM genes among organisms reported so far, suggesting that the numbers relate closely to photosynthesis. Classification of the putative FKBPs and CYPs in rice provides information about their evolutionary/functional significance when comparisons are drawn with the relatively well studied genera, Arabidopsis and Chlamydomonas. In addition, many of the genes upregulated by water stress offer the possibility of manipulating the stress responses in rice.
Liu, Peng; Stajich, Jason E
2015-04-01
Batrachochytrium dendrobatidis (Bd) is the causative agent of chytridiomycosis responsible for worldwide decline in amphibian populations. Previous analysis of the Bd genome revealed a unique expansion of the carbohydrate-binding module family 18 (CBM18) predicted to be a sub-class of chitin recognition domains. CBM expansions have been linked to the evolution of pathogenicity in a variety of fungal species by protecting the fungus from the host. Based on phylogenetic analysis and presence of additional protein domains, the gene family can be classified into 3 classes: Tyrosinase-, Deacetylase-, and Lectin-like. Examination of the mRNA expression levels from sporangia and zoospores of nine of the cbm18 genes found that the Lectin-like genes had the highest expression while the Tyrosinase-like genes showed little expression, especially in zoospores. Heterologous expression of GFP-tagged copies of four CBM18 genes in Saccharomyces cerevisiae demonstrated that two copies containing secretion signal peptides are trafficked to the cell boundary. The Lectin-like genes cbm18-ll1 and cbm18-ll2 co-localized with the chitinous cell boundaries visualized by staining with calcofluor white. In vitro assays of the full length and single domain copies from CBM18-LL1 demonstrated chitin binding and no binding to cellulose or xylan. Expressed CBM18 domain proteins were demonstrated to protect the fungus, Trichoderma reesei, in vitro against hydrolysis from exogenously added chitinase, likely by binding and limiting exposure of fungal chitin. These results demonstrate that cbm18 genes can play a role in fungal defense and expansion of their copy number may be an important pathogenicity factor of this emerging infectious disease of amphibians. Copyright © 2015 Elsevier Inc. All rights reserved.
Miura, Shigenori; Zou, Wen; Ueda, Mitsuyoshi; Tanaka, Atsuo
2000-01-01
A Saccharomyces cerevisiae strain, KK-211, isolated by the long-term bioprocess of stereoselective reduction in isooctane, showed extremely high tolerance to the solvent, which is toxic to yeast cells, but, in comparison with its wild-type parent, DY-1, showed low tolerance to hydrophilic organic solvents, such as dimethyl sulfoxide and ethanol. In order to detect the isooctane tolerance-associated genes, mRNA differential display (DD) was employed using mRNAs isolated from strains DY-1 and KK-211 cultivated without isooctane, and from strain KK-211 cultivated with isooctane. Thirty genes were identified as being differentially expressed in these three types of cells and were classified into three groups according to their expression patterns. These patterns were further confirmed and quantified by Northern blot analysis. On the DD fingerprints, the expression of 14 genes, including MUQ1, PRY2, HAC1, AGT1, GAC1, and ICT1 (YLR099c) was induced, while the expression of the remaining 16 genes, including JEN1, PRY1, PRY3, and KRE1, was decreased, in strain KK-211 cultivated with isooctane. The genes represented by HAC1, PRY1, and ICT1 have been reported to be associated with cell stress, and AGT1 and GAC1 have been reported to be involved in the uptake of trehalose and the production of glycogen, respectively. MUQ1 and KRE1, encoding proteins associated with cell surface maintenance, were also detected. Based on these results, we concluded that alteration of expression levels of multiple genes, not of a single gene, might be the critical determinant for isooctane tolerance in strain KK-211. PMID:11055939
[Effects of aconite root on energy metabolism and expression of related genes in rats].
Yu, Huayun; Ji, Xuming; Wu, Zhichun; Wang, Shijun
2011-09-01
To study the influence of aconite root, a Chinese medicinal herb with hot property, on energy metabolism and the gene expression spectrum, and to analyze the possible mechanism of its effect. Thirty-two SPF Wistar rats were randomly divided into an aconite root group and a control group. Decoction of aconite root and NS were intragastrically administered at 10 mL x kg(-1), respectively, once a day for 20 days. Temperature, energy intake (EI), digestive energy (DE) and metabolic energy (ME) were measured. The activity of ATPase and succinate dehydrogenase (SDH) in liver was detected by colorimetry. Gene expression in liver was detected with Illumina's rat ref-12 gene array. The differentially expressed genes were selected, annotated and classified based on gene ontology (GO). Real-time quantitative reverse-transcriptase PCR (Q-RT-PCR) was used to test the accuracy of the array results. Compared with the control group, the toe temperature (TT) on the 10th and 20th day after administration, the EI/BM (body mass), DE/BM, ME/BM and the activity of Na+-K+-ATPase, Ca2+-Mg2+-ATPase and SDH of liver in the aconite root group increased significantly (P<0.05). There were 592 differentially expressed genes in the aconite root group compared with the control group. Based on GO analysis, the most significant genes were related to metabolic process (lgP = -15.5897). Aconite root could improve energy metabolism in rats by influencing the metabolic processes of sugar, lipid and amino acid, which may be the main molecular mechanism of warming yang and dispelling cold for the treatment of the cold syndrome according to Chinese medicine theory.
Tang, Pei-An; Wu, Hai-Jing; Xue, Hao; Ju, Xing-Rong; Song, Wei; Zhang, Qi-Lin; Yuan, Ming-Long
2017-07-30
The Indian meal moth Plodia interpunctella (Lepidoptera: Pyralidae) is a worldwide pest that causes serious damage to stored foods. Although much effort has been devoted to this species due to its economic importance, the study of the genetic basis of development, behavior and insecticide resistance has been greatly hampered by the lack of genomic information. In this study, we used a high-throughput sequencing platform to perform a de novo transcriptome assembly and tag-based digital gene expression profiling (DGE) analyses across four different developmental stages of P. interpunctella (egg, third-instar larvae, pupae and adult). We obtained approximately 9 gigabytes (GB) of clean data and recovered 84,938 unigenes, including 37,602 clusters and 47,336 singletons. These unigenes were annotated using BLAST against the non-redundant protein databases and then functionally classified based on Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of differentially expressed genes were identified by pairwise comparisons among different developmental stages. Gene expression profiles changed dramatically between developmental stage transitions. Some of these differentially expressed genes were related to digestion and cuticularization. Quantitative real-time PCR results of six randomly selected genes confirmed the findings in the DGEs. Furthermore, we identified over 8000 microsatellite markers and 97,648 single nucleotide polymorphisms which will be useful for population genetics studies of P. interpunctella. This transcriptomic information provided insight into the developmental basis of P. interpunctella and will be helpful for establishing integrated management strategies and developing new targets of insecticides for this serious pest. Copyright © 2017 Elsevier B.V. All rights reserved.
Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.
Diao, Wei-Ping; Snyder, John C; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge
2016-01-01
The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating multiple biological processes, especially in regulating defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper.
Yang, Yongchao; Wang, Yongqi; Mo, Yanling; Zhang, Ruimin; Zhang, Yong; Ma, Jianxiang; Wei, Chunhua
2018-01-01
Despite identification of WRKY family genes in numerous plant species, little is known about WRKY genes in watermelon, one of the most economically important fruit crops around the world. Here, we identified a total of 63 putative WRKY genes in watermelon and classified them into three major groups (I-III) and five subgroups (IIa-IIe) in group II. The structure analysis indicated that ClWRKYs with different WRKY domains or motifs may play different roles by regulating respective target genes. The expression of ClWRKYs in different tissues indicates that they are involved in the growth and development of various tissues. Furthermore, the diverse responses of ClWRKYs to drought, salt, or cold stress suggest that they positively or negatively affect plant tolerance to various abiotic stresses. In addition, the altered expression patterns of ClWRKYs in response to phytohormones such as ABA, SA, MeJA, and ETH imply the occurrence of complex cross-talk between ClWRKYs and plant hormone signals in regulating plant physiological and biological processes. Taken together, our findings provide valuable clues to further explore the function and regulatory mechanisms of ClWRKY genes in watermelon growth, development, and adaptation to environmental stresses. PMID:29338040
Yang, Xiaozhen; Li, Hao; Yang, Yongchao; Wang, Yongqi; Mo, Yanling; Zhang, Ruimin; Zhang, Yong; Ma, Jianxiang; Wei, Chunhua; Zhang, Xian
2018-01-01
Despite identification of WRKY family genes in numerous plant species, little is known about WRKY genes in watermelon, one of the most economically important fruit crops around the world. Here, we identified a total of 63 putative WRKY genes in watermelon and classified them into three major groups (I-III) and five subgroups (IIa-IIe) in group II. The structure analysis indicated that ClWRKYs with different WRKY domains or motifs may play different roles by regulating respective target genes. The expression of ClWRKYs in different tissues indicates that they are involved in the growth and development of various tissues. Furthermore, the diverse responses of ClWRKYs to drought, salt, or cold stress suggest that they positively or negatively affect plant tolerance to various abiotic stresses. In addition, the altered expression patterns of ClWRKYs in response to phytohormones such as ABA, SA, MeJA, and ETH imply the occurrence of complex cross-talk between ClWRKYs and plant hormone signals in regulating plant physiological and biological processes. Taken together, our findings provide valuable clues to further explore the function and regulatory mechanisms of ClWRKY genes in watermelon growth, development, and adaptation to environmental stresses.
Sweeney, Torres; Lejeune, Alex; Moloney, Aidan P; Monahan, Frank J; Gettigan, Paul Mc; Downey, Gerard; Park, Stephen D E; Ryan, Marion T
2016-09-21
Differences between cattle production systems can influence the nutritional and sensory characteristics of beef, in particular its fatty acid (FA) composition. As beef products derived from pasture-based systems can demand a higher premium from consumers, there is a need to understand the biological characteristics of pasture-produced meat and subsequently to develop methods of authentication for these products. Here, we describe an approach to authentication that focuses on differences in the transcriptomic profile of muscle from animals finished in different systems of production of practical relevance to the Irish beef industry. The objectives of this study were to identify a panel of differentially expressed (DE) genes/networks in the muscle of cattle raised outdoors on pasture compared to animals raised indoors on a concentrate-based diet and to subsequently identify an optimum panel which can classify the meat based on production system. A comparison of the muscle transcriptome of outdoor/pasture-fed and indoor/concentrate-fed cattle resulted in the identification of 26 DE genes. Functional analysis of these genes identified two significant networks (1: Energy Production, Lipid Metabolism, Small Molecule Biochemistry; and 2: Lipid Metabolism, Molecular Transport, Small Molecule Biochemistry), both of which are involved in FA metabolism. The expression of selected up-regulated genes in the outdoor/pasture-fed animals correlated positively with the total n-3 FA content of the muscle. The pathway and network analysis of the DE genes indicates that peroxisome proliferator-activated receptor (PPAR) and FYN/AMPK could be implicated in the regulation of these alterations to the lipid profile.
In terms of authentication, the expression profile of three DE genes (ALAD, EIF4EBP1 and NPNT) could almost completely separate the samples based on production system (95% authentication for animals on a pasture-based diet and 100% for animals on a concentrate-based diet) in this context. The majority of DE genes between muscle of the outdoor/pasture-fed and concentrate-fed cattle were related to lipid metabolism and in particular β-oxidation. In this experiment the combined expression profiles of ALAD, EIF4EBP1 and NPNT were optimal in classifying the muscle transcriptome based on production system. Given the overall lack of comparable studies and variable concordance with those that do exist, the use of transcriptomic data in authenticating production systems requires more exploration across a range of contexts and breeds.
2011-01-01
Background: Gene co-expression, in the form of a correlation coefficient, has been valuable in the analysis, classification and prediction of protein-protein interactions. However, it is susceptible to bias from a few samples having a large effect on the correlation coefficient. Gene co-expression stability is a means of quantifying this bias, with high stability indicating robust, unbiased co-expression correlation coefficients. We assess the utility of gene co-expression stability as an additional measure to support the co-expression correlation in the analysis of protein-protein interaction networks. Results: We studied the patterns of co-expression correlation and stability in interacting proteins with respect to their interaction promiscuity, levels of intrinsic disorder, and essentiality or disease-relatedness. Co-expression stability, along with co-expression correlation, acts as a better classifier of hub proteins in interaction networks than co-expression correlation alone, enabling the identification of a class of hubs that are functionally distinct from the widely accepted transient (date) and obligate (party) hubs. Proteins with high levels of intrinsic disorder have low co-expression correlation and high stability with their interaction partners, suggesting their involvement in transient interactions, except for a small group that have high co-expression correlation and are typically subunits of stable complexes. Similar behavior was seen for disease-related and essential genes. Interacting proteins that are both disordered have higher co-expression stability than ordered protein pairs. Using co-expression correlation and stability, we found that transient interactions are more likely to occur between an ordered and a disordered protein while obligate interactions primarily occur between proteins that are either both ordered, or disordered.
Conclusions: We observe that co-expression stability shows distinct patterns in structurally and functionally different groups of proteins and interactions. We conclude that it is a useful and important measure to be used in concert with gene co-expression correlation for further insights into the characteristics of proteins in the context of their interaction network. PMID:22369639
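The sensitivity of a correlation coefficient to a few influential samples can be illustrated with a leave-one-out check. The abstract does not give the paper's formula for co-expression stability, so the spread measure below (maximum minus minimum of the leave-one-out Pearson correlations) is an assumed stand-in, and the function names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def loo_correlation_range(x, y):
    """Spread (max - min) of Pearson r when each sample is left out in turn.
    A small spread means no single sample drives the correlation."""
    rs = [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
    return max(rs) - min(rs)


# Gene pair whose correlation is supported by every sample.
robust_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
robust_y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Gene pair whose high correlation comes from a single extreme sample.
outlier_x = [1.0, 1.2, 0.9, 1.1, 1.0, 10.0]
outlier_y = [1.1, 0.9, 1.0, 1.2, 0.8, 10.0]

print(loo_correlation_range(robust_x, robust_y))
print(loo_correlation_range(outlier_x, outlier_y))
```

Both pairs have a high overall correlation, but only the second collapses when its extreme sample is removed, so its leave-one-out spread is large. This is the kind of bias that a stability measure is meant to flag.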
Zhao, Chen; Mao, Jinghe; Ai, Junmei; Shenwu, Ming; Shi, Tieliu; Zhang, Daqing; Wang, Xiaonan; Wang, Yunliang; Deng, Youping
2013-01-01
Insulin resistance is a key element in the pathogenesis of type 2 diabetes mellitus. Plasma free fatty acids were assumed to mediate the insulin resistance, while the relationship between lipid and glucose disposal remains to be demonstrated across liver, skeletal muscle and blood. We profiled both lipidomics and gene expression of 144 total peripheral blood samples, 84 from patients with T2D and 60 from healthy controls. Then, factor and partial least squares models were used to perform a combined analysis of lipidomics and gene expression profiles to uncover the bioprocesses that are associated with lipidomic profiles in type 2 diabetes. According to factor analysis of the lipidomic profile, several species of lipids were found to be correlated with different phenotypes, including diabetes-related C23:2CE, C23:3CE, C23:4CE, ePE36:4, ePE36:5, ePE36:6; race-related (African-American) PI36:1; and sex-related PE34:1 and LPC18:2. The major variance of gene expression profile was not caused by known factors and no significant difference can be directly derived from differential gene expression profile. However, the combination of lipidomic and gene expression analyses allows us to reveal the correlation between the altered lipid profile with significantly enriched pathways, such as one carbon pool by folate, arachidonic acid metabolism, insulin signaling pathway, amino sugar and nucleotide sugar metabolism, propanoate metabolism, and starch and sucrose metabolism. The genes in these pathways showed a good capability to classify diabetes samples. Combined analysis of gene expression and lipidomic profiling reveals type 2 diabetes-associated lipid species and enriched biological pathways in peripheral blood, while gene expression profile does not show direct correlation. Our findings provide a new clue to better understand the mechanism of disordered lipid metabolism in association with type 2 diabetes.
Mertens, Tinne C J; van der Does, Anne M; Kistemaker, Loes E; Ninaber, Dennis K; Taube, Christian; Hiemstra, Pieter S
2017-07-01
Allergic airways inflammation in asthma is characterized by an airway epithelial gene signature composed of POSTN, CLCA1, and SERPINB2. This Th2 gene signature is proposed as a tool to classify patients with asthma into Th2-high and Th2-low phenotypes. However, many asthmatics smoke and the effects of cigarette smoke exposure on the epithelial Th2 gene signature are largely unknown. Therefore, we investigated the combined effect of IL-13 and whole cigarette smoke (CS) on the Th2 gene signature and the mucin-related genes MUC5AC and SPDEF in air-liquid interface differentiated human bronchial (ALI-PBEC) and tracheal epithelial cells (ALI-PTEC). Cultures were exposed to IL-13 for 14 days followed by 5 days of IL-13 with CS exposure. Alternatively, cultures were exposed once daily to CS for 14 days, followed by 5 days CS with IL-13. POSTN, SERPINB2, and CLCA1 expression were measured 24 h after the last exposure to CS and IL-13. In both models POSTN, SERPINB2, and CLCA1 expression were increased by IL-13. CS markedly affected the IL-13-induced Th2 gene signature as indicated by a reduced POSTN, CLCA1, and MUC5AC expression in both models. In contrast, IL-13-induced SERPINB2 expression remained unaffected by CS, whereas SPDEF expression was additively increased. Importantly, cessation of CS exposure failed to restore IL-13-induced POSTN and CLCA1 expression. We show for the first time that CS differentially affects the IL-13-induced gene signature for Th2-high asthma. These findings provide novel insights into the interaction between Th2 inflammation and cigarette smoke that is important for asthma pathogenesis and biomarker-guided therapy in asthma. © 2017 The Authors. Physiological Reports published by Wiley Periodicals, Inc. on behalf of The Physiological Society and the American Physiological Society.
Rendo-Urteaga, Tara; García-Calzón, Sonia; González-Muniesa, Pedro; Milagro, Fermín I; Chueca, María; Oyarzabal, Mirentxu; Azcona-Sanjulián, M Cristina; Martínez, J Alfredo; Marti, Amelia
2015-01-28
The present study analyses the gene expression profile of peripheral blood mononuclear cells (PBMC) from obese boys. The aims of the present study were to identify baseline differences between low responders (LR) and high responders (HR) after 10 weeks of a moderate energy-restricted dietary intervention, and to compare the gene expression profile between the baseline and the endpoint of the nutritional intervention. Spanish obese boys (age 10-14 years) were advised to follow a 10-week moderate energy-restricted diet. Participants were classified into two groups based on the association between the response to the nutritional intervention and the changes in BMI standard deviation score (BMI-SDS): HR group (n 6), who had a more decreased BMI-SDS; LR group (n 6), who either maintained or had an even increased BMI-SDS. The expression of 28,869 genes was analysed in PBMC from both groups at baseline and after the nutritional intervention, using the Affymetrix Human Gene 1.1 ST 24-Array plate microarray. At baseline, the HR group showed a lower expression of inflammation and immune response-related pathways, which suggests that the LR group could have a more developed pro-inflammatory phenotype. Concomitantly, LEPR and SIRPB1 genes were highly expressed in the LR group, indicating a tendency towards an impaired immune response and leptin resistance. Moreover, the moderate energy-restricted diet was able to down-regulate the inflammatory 'mitogen-activated protein kinase signalling pathway' in the HR group, as well as some inflammatory genes (AREG and TNFAIP3). The present study confirms that changes in the gene expression profile of PBMC in obese boys may help to understand the weight-loss response. However, further research is required to confirm these findings.
Carbajo, Daniel; Magi, Shigeyuki; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Arner, Erik; Forrest, Alistair R R; Carninci, Piero; Hayashizaki, Yoshihide; Daub, Carsten O; Okada-Hatakeyama, Mariko; Mar, Jessica C
2015-01-01
Understanding how cells use complex transcriptional programs to alter their fate in response to specific stimuli is an important question in biology. For the MCF-7 human breast cancer cell line, we applied gene expression trajectory models to identify the genes involved in driving cell fate transitions. We modified trajectory models to account for the scenario where cells were exposed to different stimuli, in this case epidermal growth factor and heregulin, to arrive at different cell fates, i.e. proliferation and differentiation respectively. Using genome-wide CAGE time series data collected from the FANTOM5 consortium, we identified the sets of promoters that were involved in the transition of MCF-7 cells to their specific fates versus those with expression changes that were generic to both stimuli. Of the 1,552 promoters identified, 1,091 had stimulus-specific expression while 461 promoters had generic expression profiles over the time course surveyed. Many of these stimulus-specific promoters mapped to key regulators of the ERK (extracellular signal-regulated kinases) signaling pathway such as FHL2 (four and a half LIM domains 2). We observed that in general, generic promoters peaked in their expression early on in the time course, while stimulus-specific promoters tended to show activation of their expression at a later stage. The genes that mapped to stimulus-specific promoters were enriched for pathways that control focal adhesion, p53 signaling and MAPK signaling while generic promoters were enriched for cell death, transcription and the cell cycle. We identified 162 genes that were controlled by an alternative promoter during the time course where a subset of 37 genes had separate promoters that were classified as stimulus-specific and generic. The results of our study highlighted the degree of complexity involved in regulating a cell fate transition where multiple promoters mapping to the same gene can demonstrate quite divergent expression profiles.
Celik Altunoglu, Yasemin; Baloglu, Mehmet Cengiz; Baloglu, Pinar; Yer, Esra Nurten; Kara, Sibel
2017-01-01
Late embryogenesis abundant (LEA) proteins are a large and diverse group of polypeptides which were first identified during seed dehydration and then in vegetative plant tissues during different stress responses. Gene family members of LEA proteins have since been detected in various organisms. However, there was no report on this protein family in watermelon and melon until this study. A total of 73 LEA genes from watermelon (ClLEA) and 61 LEA genes from melon (CmLEA) were identified in this comprehensive study. They were classified into four and three distinct clusters in watermelon and melon, respectively. There was a correlation between gene structure and motif composition within each LEA group. Segmental duplication played an important role in LEA gene expansion in watermelon. Maximum gene ontology of LEA genes was observed with poplar LEA genes. For evaluation of tissue-specific expression patterns of ClLEA and CmLEA genes, publicly available RNA-seq data were analyzed. The expression of selected LEA genes in root and leaf tissues of drought-stressed watermelon and melon was examined using qRT-PCR. Among them, the ClLEA-12, -17, and -46 genes were quickly induced after drought application. Therefore, they might be considered early response genes for water limitation conditions in watermelon. In addition, the CmLEA-42 and -43 genes were found to be up-regulated in both tissues of melon under drought stress. Our results can open up new frontiers in understanding the functions of these important family members under normal developmental stages and stress conditions by bioinformatics and transcriptomic approaches.
Sharma, Mukul; Vedithi, Sundeep Chaitanya; Das, Madhusmita; Roy, Anindya; Ebenezer, Mannam
2017-01-01
Survival of Mycobacterium leprae, the causative bacterium of leprosy, in the human host is dependent to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. The genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae, which was obtained from the signal intensities of 60 bp probes tiling the entire genome with 10 bp overlaps. Transcripts corresponding to all 61 genes were identified in the transcriptome data, with expression levels ranging from 0.18 to 2.47 fold (normalized with 16S rRNA). The mRNA expression levels of a representative set of seven genes (four annotated and three hypothetical protein coding genes) were analyzed using quantitative Polymerase Chain Reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. RNA expression levels were higher for genes involved in homologous recombination, whereas the genes with a low level of expression are involved in the direct repair pathway.
This study provided preliminary information on the potential DNA repair pathways that are extant in M. leprae and the associated genes.
Gene selection with multiple ordering criteria.
Chen, James J; Tsai, Chen-An; Tzeng, Shengli; Chen, Chun-Houh
2007-03-05
A microarray study may select different differentially expressed gene sets because of different selection criteria. For example, the fold-change and p-value are two commonly known criteria to select differentially expressed genes under two experimental conditions. These two selection criteria often result in incompatible selected gene sets. Also, in a two-factor, say, treatment by time experiment, the investigator may be interested in one gene list that responds to both treatment and time effects. We propose three layer ranking algorithms, point-admissible, line-admissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to the three univariate ranking criteria, fold-change, p-value, and frequency of selections by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (less than 20 per group) and detecting a 4-fold change or less, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard p-value ranking. Three applications are presented. The first application illustrates a use of the layer rankings to potentially improve predictive accuracy. The second application illustrates an application to a two-factor experiment involving two dose levels and two time points. The layer rankings are applied to selecting differentially expressed genes relating to the dose and time effects. In the third application, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations to provide a ranking system from a long list of differentially expressed genes generated from the three dilution concentrations. 
The layer ranking algorithms are useful to help investigators in selecting the most promising genes from multiple gene lists generated by different filter, normalization, or analysis methods for various objectives.
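The Pareto layer ranking mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each gene carries a tuple of scores in which larger is better (for example absolute log fold-change and -log10 p-value), peels off the non-dominated genes as layer 1, and repeats on the remainder. This is the textbook notion of Pareto layers. The paper's exact definitions, and its point-admissible and convex variants, may differ. Gene names and scores are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores):
    """Split genes into successive Pareto layers.

    scores: dict mapping gene name -> tuple of criteria (larger is better).
    Returns a list of layers; each layer is a sorted list of gene names.
    """
    remaining = dict(scores)
    layers = []
    while remaining:
        # A gene belongs to the current layer if no other remaining gene dominates it.
        front = [
            g for g, s in remaining.items()
            if not any(dominates(t, s) for h, t in remaining.items() if h != g)
        ]
        layers.append(sorted(front))
        for g in front:
            del remaining[g]
    return layers


# Hypothetical genes scored by (|log2 fold-change|, -log10 p-value).
genes = {
    "g1": (3.0, 5.0),
    "g2": (2.0, 6.0),
    "g3": (1.0, 1.0),
    "g4": (2.5, 4.0),
    "g5": (0.5, 0.5),
}
print(pareto_layers(genes))
# [['g1', 'g2'], ['g4'], ['g3'], ['g5']]
```

Here g1 and g2 share the first layer because each beats the other on one criterion, which is the situation the abstract describes when fold-change and p-value rankings disagree.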
Robles, Ana I.; Arai, Eri; Mathé, Ewy A.; Okayama, Hirokazu; Schetter, Aaron J.; Brown, Derek; Petersen, David; Bowman, Elise D.; Noro, Rintaro; Welsh, Judith A.; Edelman, Daniel C.; Stevenson, Holly S.; Wang, Yonghong; Tsuchiya, Naoto; Kohno, Takashi; Skaug, Vidar; Mollerup, Steen; Haugen, Aage; Meltzer, Paul S.; Yokota, Jun; Kanai, Yae
2015-01-01
Introduction Up to 30% of Stage I lung cancer patients suffer recurrence within 5 years of curative surgery. We sought to improve existing protein-coding gene and microRNA expression prognostic classifiers by incorporating epigenetic biomarkers. Methods Genome-wide screening of DNA methylation and pyrosequencing analysis of HOXA9 promoter methylation were performed in two independently collected cohorts of Stage I lung adenocarcinoma. The prognostic value of HOXA9 promoter methylation alone and in combination with mRNA and miRNA biomarkers was assessed by Cox regression and Kaplan-Meier survival analysis in both cohorts. Results Promoters of genes marked by Polycomb in Embryonic Stem Cells were methylated de novo in tumors and identified patients with poor prognosis. The HOXA9 locus was methylated de novo in Stage I tumors (P < 0.0005). High HOXA9 promoter methylation was associated with worse cancer-specific survival (Hazard Ratio [HR], 2.6; P = 0.02) and recurrence-free survival (HR, 3.0; P = 0.01), and identified high-risk patients in stratified analysis of Stage IA and IB. Four protein-coding gene (XPO1, BRCA1, HIF1α, DLC1), miR-21 expression and HOXA9 promoter methylation were each independently associated with outcome (HR, 2.8; P = 0.002; HR, 2.3; P = 0.01; and HR, 2.4; P = 0.005, respectively), and, when combined, identified high-risk, therapy-naïve, Stage I patients (HR, 10.2; P = 3 × 10⁻⁵). All associations were confirmed in two independently collected cohorts. Conclusion A prognostic classifier comprising three types of genomic and epigenomic data may help guide the postoperative management of Stage I lung cancer patients at high risk of recurrence. PMID:26134223
Kocak, H; Ackermann, S; Hero, B; Kahlert, Y; Oberthuer, A; Juraeva, D; Roels, F; Theissen, J; Westermann, F; Deubzer, H; Ehemann, V; Brors, B; Odenthal, M; Berthold, F; Fischer, M
2013-04-11
Neuroblastoma is an embryonal malignancy of the sympathetic nervous system. Spontaneous regression and differentiation of neuroblastoma is observed in a subset of patients, and has been suggested to represent delayed activation of physiologic molecular programs of fetal neuroblasts. Homeobox genes constitute an important family of transcription factors, which play a fundamental role in morphogenesis and cell differentiation during embryogenesis. In this study, we demonstrate that expression of the majority of the human HOX class I homeobox genes is significantly associated with clinical covariates in neuroblastoma using microarray expression data of 649 primary tumors. Moreover, a HOX gene expression-based classifier predicted neuroblastoma patient outcome independently of age, stage and MYCN amplification status. Among all HOX genes, HOXC9 expression was most prominently associated with favorable prognostic markers. Most notably, elevated HOXC9 expression was significantly associated with spontaneous regression in infant neuroblastoma. Re-expression of HOXC9 in three neuroblastoma cell lines led to a significant reduction in cell viability, and abrogated tumor growth almost completely in neuroblastoma xenografts. Neuroblastoma growth arrest was related to the induction of programmed cell death, as indicated by an increase in the sub-G1 fraction and translocation of phosphatidylserine to the outer membrane. Programmed cell death was associated with the release of cytochrome c from the mitochondria into the cytosol and activation of the intrinsic cascade of caspases, indicating that HOXC9 re-expression triggers the intrinsic apoptotic pathway. Collectively, our results show a strong prognostic impact of HOX gene expression in neuroblastoma, and may point towards a role of Hox-C9 in neuroblastoma spontaneous regression.
Oufattole, M; Arango, M; Boutry, M
2000-04-01
To analyze in detail the multigene family encoding the plasma-membrane H(+)-ATPase (pma) in Nicotiana plumbaginifolia Viv., five new pma genes (pma 5-9) were isolated. Three of these (pma 6, 8, 9) were fully characterized and classified into new and independent subfamilies. Their cell-type expression was followed by the beta-glucuronidase (gusA) reporter-gene method. While the pma8-gusA transgene was not expressed in transgenic tobacco, expression of the two other transgenes (pma6- and pma9-gusA) was found to be restricted to particular cell types. In the vegetative tissues, pma6-gusA expression was limited to the head cells of the leaf short trichomes, involved in secretion, and to the cortical parenchyma of the young nodes where the developing leaves and axillary flowering stalks join the stem. In the latter tissues, gene expression was enhanced by mechanical stress, suggesting that H(+)-ATPase might be involved in the strength of the tissues and their resistance to mechanical trauma. The pma9-gusA transgene was mainly expressed in the apical meristem of adventitious roots and axillary buds as well as in the phloem tissues of the stem, in which expression depended on the developmental stage. In flowers, pma9-gusA expression was limited to the mature pollen grains and the young fertilized ovules, while that of pma6-gusA was identified in most of the organs. Reverse transcription-polymerase chain reaction of leaf and stem RNA confirmed the expression of pma 6 and 9, while pma8 was found to be expressed in both organs at a lower level. In conclusion, although pma 6 and 9 had a more restricted expression pattern than the previously characterized pma genes, they were nevertheless expressed in cell types in which H(+)-ATPase had not been previously detected.
Li, Lian; Wu, Jie; Luo, Man; Sun, Yu; Wang, Genlin
2016-05-01
Summer heat stress (HS) is a major contributing factor in low fertility in lactating dairy cows in hot environments. Heat stress inhibits ovarian follicular development, leading to diminished reproductive efficiency of dairy cows during summer. Ovarian follicle development is a complex process. During follicle development, granulosa cells (GCs) replicate, secrete hormones, and support the growth of the oocyte. To obtain an overview of the effects of heat stress on GCs, digital gene expression profiling was employed to screen and identify differentially expressed genes (DEGs; false discovery rate (FDR) ≤ 0.001, fold change ≥2) of cultured GCs during heat stress. A total of 1211 DEGs, including 175 upregulated and 1036 downregulated genes, were identified and classified into Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The results suggested that heat stress triggers a dramatic and complex program of altered gene expression in GCs. We hypothesized that heat stress could induce the apoptosis and dysfunction of GCs. Real-time reverse transcription-polymerase chain reaction (RT-PCR) was used to evaluate the expression of steroidogenic genes (steroidogenic acute regulatory protein (Star), cytochrome P-450 (CYP11A1), CYP19A1, and steroidogenic factor 1 (SF-1)) and apoptosis-related genes (caspase-3, BCL-2, and BAX). Radioimmunoassay (RIA) was used to analyze the levels of 17β-estradiol (E2) and progesterone (P4). We also assessed the apoptosis of GCs by flow cytometry. Our data suggested that heat stress induced GC apoptosis through the BAX/BCL-2 pathway and reduced the steroidogenic gene messenger RNA (mRNA) expression and E2 synthesis. These results suggest that the decreased function of GCs may cause ovarian dysfunction and offer an improved understanding of the molecular mechanism responsible for the low fertility in cattle in summer.
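The screening thresholds quoted in the abstract above (FDR ≤ 0.001, fold change ≥ 2) can be written as a small filter. The sketch below is illustrative only. It assumes fold change is given as a treated/control ratio and that "fold change ≥ 2" applies symmetrically, so a ratio of 0.5 or less counts as downregulated. The gene names and values are hypothetical, not data from the study.

```python
def select_degs(records, fdr_cutoff=0.001, min_fold=2.0):
    """Split records into upregulated and downregulated gene lists.

    records: iterable of (gene, fold_change, fdr), where fold_change is the
    treated/control expression ratio (ASSUMPTION: a ratio, not a log value).
    """
    up, down = [], []
    for gene, fold_change, fdr in records:
        if fdr > fdr_cutoff:
            continue  # not significant at the chosen FDR
        if fold_change >= min_fold:
            up.append(gene)
        elif fold_change <= 1.0 / min_fold:
            down.append(gene)
    return up, down


# Hypothetical measurements: (gene, heat-stressed/control ratio, FDR).
records = [
    ("geneA", 4.0, 0.0001),   # passes both thresholds, upregulated
    ("geneB", 0.25, 0.0005),  # passes both thresholds, downregulated
    ("geneC", 3.0, 0.01),     # large change but FDR too high
    ("geneD", 1.5, 0.0001),   # significant but change too small
]
up, down = select_degs(records)
print(up, down)
```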
Frerich, Candace A.; Brayer, Kathryn J.; Painter, Brandon M.; Kang, Huining; Mitani, Yoshitsugu; El-Naggar, Adel K.; Ness, Scott A.
2018-01-01
The relative rarity of salivary gland adenoid cystic carcinoma (ACC) and its slow growing yet aggressive nature has complicated the development of molecular markers for patient stratification. To analyze molecular differences linked to the protracted disease course of ACC and metastases that form 5 or more years after diagnosis, detailed RNA-sequencing (RNA-seq) analysis was performed on 68 ACC tumor samples, starting with archived, formalin-fixed paraffin-embedded (FFPE) samples up to 25 years old, so that clinical outcomes were available. A statistical peak-finding approach was used to classify the tumors that expressed MYB or MYBL1, which had overlapping gene expression signatures, from a group that expressed neither oncogene and displayed a unique phenotype. Expression of MYB or MYBL1 was closely correlated to the expression of the SOX4 and EN1 genes, suggesting that they are direct targets of Myb proteins in ACC tumors. Unsupervised hierarchical clustering identified a subgroup of approximately 20% of patients with exceptionally poor overall survival (median less than 30 months) and a unique gene expression signature resembling embryonic stem cells. The results provide a strategy for stratifying ACC patients and identifying the high-risk, poor-outcome group that are candidates for personalized therapies. PMID:29484115
Nikiforova, Marina N; Mercurio, Stephanie; Wald, Abigail I; Barbi de Moura, Michelle; Callenberg, Keith; Santana-Santos, Lucas; Gooding, William E; Yip, Linwah; Ferris, Robert L; Nikiforov, Yuri E
2018-04-15
Molecular tests have clinical utility for thyroid nodules with indeterminate fine-needle aspiration (FNA) cytology, although their performance requires further improvement. This study evaluated the analytical performance of the newly created ThyroSeq v3 test. ThyroSeq v3 is a DNA- and RNA-based next-generation sequencing assay that analyzes 112 genes for a variety of genetic alterations, including point mutations, insertions/deletions, gene fusions, copy number alterations, and abnormal gene expression, and it uses a genomic classifier (GC) to separate malignant lesions from benign lesions. It was validated in 238 tissue samples and 175 FNA samples with known surgical follow-up. Analytical performance studies were conducted. In the training tissue set of samples, ThyroSeq GC detected more than 100 genetic alterations, including BRAF, RAS, TERT, and DICER1 mutations, NTRK1/3, BRAF, and RET fusions, 22q loss, and gene expression alterations. GC cutoffs were established to distinguish cancer from benign nodules with 93.9% sensitivity, 89.4% specificity, and 92.1% accuracy. This correctly classified most papillary, follicular, and Hurthle cell lesions, medullary thyroid carcinomas, and parathyroid lesions. In the FNA validation set, the GC sensitivity was 98.0%, the specificity was 81.8%, and the accuracy was 90.9%. Analytical accuracy studies demonstrated a minimal required nucleic acid input of 2.5 ng, a 12% minimal acceptable tumor content, and reproducible test results under variable stress conditions. The ThyroSeq v3 GC analyzes 5 different classes of molecular alterations and provides high accuracy for detecting all common types of thyroid cancer and parathyroid lesions. The analytical sensitivity, specificity, and robustness of the test have been successfully validated and indicate its suitability for clinical use. Cancer 2018;124:1682-90. © 2018 American Cancer Society.
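For readers less familiar with the performance measures reported above, the sketch below shows how sensitivity, specificity, and accuracy follow from a confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow. They are not the study's data, which the abstract does not break down.

```python
def classifier_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: malignant called malignant   fn: malignant called benign
    fp: benign called malignant      tn: benign called benign
    """
    sensitivity = tp / (tp + fn)            # fraction of cancers detected
    specificity = tn / (tn + fp)            # fraction of benign nodules cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 100 nodules (50 malignant, 50 benign).
sens, spec, acc = classifier_metrics(tp=45, fn=5, fp=10, tn=40)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Accuracy depends on the mix of malignant and benign cases, so it is not always a simple average of sensitivity and specificity, which is one reason abstracts such as this one report all three.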
Macqueen, Daniel J.; Wilcox, Alexander H.
2014-04-09
The calpains are a superfamily of proteases with extensive relevance to human health and welfare. Vast research attention is given to the vertebrate 'classical' subfamily, making it surprising that the evolutionary origins, distribution and relationships of these genes is poorly characterized. Consequently, there exists uncertainty about the conservation of gene family structure, function and expression that has been principally defined from work with mammals. Here, more than 200 vertebrate classical calpains were incorporated in phylogenetic analyses spanning an unprecedented range of taxa, including jawless and cartilaginous fish. We demonstrate that the common vertebrate ancestor had at least six classical calpains, including a single gene that gave rise to CAPN11, 1, 2 and 8 in the early jawed fish lineage, plus CAPN3, 9, 12, 13 and a novel calpain gene, hereafter named CAPN17. We reveal that while all vertebrate classical calpains have been subject to persistent purifying selection during evolution, the degree and nature of selective pressure has often been lineage-dependent. The tissue expression of the complete classic calpain family was assessed in representative teleost fish, amphibians, reptiles and mammals. This highlighted systematic divergence in expression across vertebrate taxa, with most classic calpain genes from fish and amphibians having more extensive tissue distribution than in amniotes. Our data suggest that classical calpain functions have frequently diverged during vertebrate evolution and challenge the ongoing value of the established system of classifying calpains by expression. PMID:24718597
Stateman, William A.; Knöppel, Alexandra B.; Flegel, Willy A.; Henkin, Robert I.
2015-01-01
PURPOSE Our previous study of Type II congenital smell loss patients revealed a statistically significant lower prevalence of an FY (ACKR1, formerly DARC) haplotype compared to controls. The present study correlates this genetic feature with subgroups of patients defined by specific smell and taste functions. METHODS Smell and taste function measurements were performed by use of olfactometry and gustometry to define degree of abnormality of smell and taste function. Smell loss was classified as anosmia or hyposmia (types I, II or III). Taste loss was similarly classified as ageusia or hypogeusia (types I, II or III). Based upon these results patient erythrocyte antigen expression frequencies were categorized by smell and taste loss with results compared between patients within the Type II group and published controls. RESULTS Comparison of antigen expression frequencies revealed a statistically significant decrease in incidence of an Fyb haplotype only among patients with type I hyposmia and any form of taste loss (hypogeusia). In all other patient groups erythrocyte antigens were expressed at normal frequencies. CONCLUSIONS Data suggest that Type II congenital smell loss patients who exhibit both type I hyposmia and hypogeusia are genetically distinct from all other patients with Type II congenital smell loss. This distinction is based on decreased Fyb expression which correlated with abnormalities in two sensory modalities (hyposmia type I and hypogeusia). Only patients with these two specific sensory abnormalities expressed the Fyb antigen (encoded by the ACKR1 gene on the long arm of chromosome 1) at frequencies different from controls. PMID:27968956
A survey of human brain transcriptome diversity at the single cell level.
Darmanis, Spyros; Sloan, Steven A; Zhang, Ye; Enge, Martin; Caneda, Christine; Shuer, Lawrence M; Hayden Gephart, Melanie G; Barres, Ben A; Quake, Stephen R
2015-06-09
The human brain is a tissue of vast complexity in terms of the cell types it comprises. Conventional approaches to classifying cell types in the human brain at single cell resolution have been limited to exploring relatively few markers and therefore have provided a limited molecular characterization of any given cell type. We used single cell RNA sequencing on 466 cells to capture the cellular complexity of the adult and fetal human brain at a whole transcriptome level. Healthy adult temporal lobe tissue was obtained during surgical procedures where otherwise normal tissue was removed to gain access to deeper hippocampal pathology in patients with medical refractory seizures. We were able to classify individual cells into all of the major neuronal, glial, and vascular cell types in the brain. We were able to divide neurons into individual communities and show that these communities preserve the categorization of interneuron subtypes that is typically observed with the use of classic interneuron markers. We then used single cell RNA sequencing on fetal human cortical neurons to identify genes that are differentially expressed between fetal and adult neurons and those genes that display an expression gradient that reflects the transition between replicating and quiescent fetal neuronal populations. Finally, we observed the expression of major histocompatibility complex type I genes in a subset of adult neurons, but not fetal neurons. The work presented here demonstrates the applicability of single cell RNA sequencing on the study of the adult human brain and constitutes a first step toward a comprehensive cellular atlas of the human brain.
Novel organization of the common nodulation genes in Rhizobium leguminosarum bv. phaseoli strains.
Vázquez, M; Dávalos, A; de las Peñas, A; Sánchez, F; Quinto, C
1991-01-01
Nodulation by Rhizobium, Bradyrhizobium, and Azorhizobium species in the roots of legumes and nonlegumes requires the proper expression of plant genes and of both common and specific bacterial nodulation genes. The common nodABC genes form an operon or are physically mapped together in all species studied thus far. Rhizobium leguminosarum bv. phaseoli strains are classified in two groups. The type I group has reiterated nifHDK genes and a narrow host range of nodulation. The type II group has a single copy of the nifHDK genes and a wide host range of nodulation. We have found by genetic and nucleotide sequence analysis that in type I strain CE-3, the functional common nodA gene is separated from the nodBC genes by 20 kb and thus is transcriptionally separated from the latter genes. This novel organization could be the result of a complex rearrangement, as we found zones of identity between the two separated nodA and nodBC regions. Moreover, this novel organization of the common nodABC genes seems to be a general characteristic of R. leguminosarum bv. phaseoli type I strains. Despite the separation, the coordination of the expression of these genes seems not to be altered. PMID:1991718
Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi
2010-01-01
Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722
Kim, Minseung; Zorraquino, Violeta; Tagkopoulos, Ilias
2015-03-01
A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred by the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy between 70.0% (±3.5%) and 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
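The idea of voting over one-test decision trees can be illustrated with a minimal sketch. This is not the authors' ESC procedure, whose details the abstract does not give; it assumes one stump per probe, a threshold chosen to minimise training errors, and a simple majority vote. The expression values and the labels "AD" and "SCC" are toy data invented for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A stump is (probe_index, threshold, label_if_above, label_if_not_above).
Stump = Tuple[int, float, str, str]


def fit_stump(samples: Sequence[Sequence[float]], labels: Sequence[str], probe: int) -> Stump:
    """Choose the threshold and label orientation with the fewest training errors."""
    values = sorted(set(s[probe] for s in samples))
    classes = sorted(set(labels))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best_errors, best_stump = None, None
    for t in candidates:
        for above, below in ((classes[0], classes[-1]), (classes[-1], classes[0])):
            errors = sum(
                (above if s[probe] > t else below) != y
                for s, y in zip(samples, labels)
            )
            if best_errors is None or errors < best_errors:
                best_errors, best_stump = errors, (probe, t, above, below)
    return best_stump


def predict_stump(stump: Stump, sample: Sequence[float]) -> str:
    """Apply the single test of one stump."""
    probe, threshold, above, below = stump
    return above if sample[probe] > threshold else below


def predict_ensemble(stumps: List[Stump], sample: Sequence[float]) -> str:
    """Return the label that receives the most stump votes."""
    votes = Counter(predict_stump(s, sample) for s in stumps)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three probes, two samples per class.
samples = [
    [5.1, 0.2, 3.0],
    [4.8, 0.4, 2.7],
    [1.2, 2.9, 0.5],
    [0.9, 3.1, 0.8],
]
labels = ["AD", "AD", "SCC", "SCC"]

stumps = [fit_stump(samples, labels, probe) for probe in range(3)]
print(predict_ensemble(stumps, [4.5, 0.3, 0.6]))  # two of three stumps vote "AD"
```

Because each stump only compares a probe with its own threshold, a vote can be taken without putting the probes on a common scale, which is one plausible reading of how such ensembles sidestep cross-platform normalization.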
Ghatei, Najmeh; Nabavi, Ariane Sadr; Toosi, Mohammad Hossein Bahreyni; Azimian, Hosein; Homayoun, Mansour; Targhi, Reza Ghasemnezhad; Haghir, Hossein
2017-09-01
Overuse of cell phones has increased considerably among young people and pregnant women. We examined the effect of mobile phone radiation on gene expression in the cerebellum of BALB/c mice exposed before and after birth. A mobile phone jammer, an instrument that blocks signals between cellular phones and base transceiver stations (two frequencies, 900 and 1800 MHz), was used for exposure. Twelve pregnant BALB/c mice were divided into two groups (n=6): the first group was irradiated during pregnancy (19th day) and the second group was not irradiated during pregnancy. After birth, offspring were classified into four groups (n=4): Group 1: control; Group 2: B1 (irradiated after birth); Group 3: B2 (irradiated during pregnancy and after birth); Group 4: B3 (irradiated during pregnancy). At maturity (8-10 weeks old), the mice were dissected and the cerebellum was isolated. The expression levels of the bax, bcl-2, p21 and p53 genes were examined by real-time reverse transcription polymerase chain reaction (real-time RT-PCR). The data showed that mobile phone radio waves had no significant effect on the expression levels of bcl-2 and p53 (P > 0.05). In addition, bax expression decreased and p21 expression increased compared with the control group (P < 0.05). From these data it could be concluded that mobile phone radiation did not induce apoptosis in cells of the cerebellum and that injured cells can be repaired through cell cycle arrest.
Ivanov, Sergey V.; Kuzmin, Igor; Wei, Ming-Hui; Pack, Svetlana; Geil, Laura; Johnson, Bruce E.; Stanbridge, Eric J.; Lerman, Michael I.
1998-01-01
To discover genes involved in von Hippel-Lindau (VHL)-mediated carcinogenesis, we used renal cell carcinoma cell lines stably transfected with wild-type VHL-expressing transgenes. Large-scale RNA differential display technology applied to these cell lines identified several differentially expressed genes, including an alpha carbonic anhydrase gene, termed CA12. The deduced protein sequence was classified as a one-pass transmembrane CA possessing an apparently intact catalytic domain in the extracellular CA module. Reintroduced wild-type VHL strongly inhibited the overexpression of the CA12 gene in the parental renal cell carcinoma cell lines. Similar results were obtained with CA9, encoding another transmembrane CA with an intact catalytic domain. Although both domains of the VHL protein contribute to regulation of CA12 expression, the elongin binding domain alone could effectively regulate CA9 expression. We mapped CA12 and CA9 loci to chromosome bands 15q22 and 17q21.2 respectively, regions prone to amplification in some human cancers. Additional experiments are needed to define the role of CA IX and CA XII enzymes in the regulation of pH in the extracellular microenvironment and its potential impact on cancer cell growth. PMID:9770531
He, Qiuling; Jones, Don C.; Li, Wei; Xie, Fuliang; Ma, Jun; Sun, Runrun; Wang, Qinglian; Zhu, Shuijin; Zhang, Baohong
2016-01-01
The R2R3-MYB is one of the largest families of transcription factors, which have been implicated in multiple biological processes. There is great diversity in the number of R2R3-MYB genes in different plants. However, there is no report on genome-wide characterization of this gene family in cotton. In the present study, a total of 205 putative R2R3-MYB genes were identified in the cotton D genome (Gossypium raimondii), a number much larger than that found in other cash crops with fully sequenced genomes. These GrMYBs were classified into 13 groups with the R2R3-MYB genes from Arabidopsis and rice. The amino acid motifs and phylogenetic tree were predicted and analyzed. The sequences of GrMYBs were distributed across 13 chromosomes at various densities. The results showed that the expansion of the G. raimondii R2R3-MYB family was mainly attributable to whole genome duplication and segmental duplication. Moreover, the expression patterns of 52 selected GrMYBs and 46 GaMYBs were tested in roots and leaves under different abiotic stress conditions. The results revealed that the MYB genes in cotton were differentially expressed under salt and drought stress treatment. Our results will be useful for determining the precise role of the MYB genes during stress responses with crop improvement. PMID:27009386
Qian, Baoying; Xue, Liangyi; Huang, Hongli
2016-01-01
The large yellow croaker (Larimichthys crocea) is an economically important fish species in the Chinese mariculture industry. To understand the molecular basis underlying the response to fasting, Illumina HiSeq™ 2000 was used to analyze the liver transcriptome of fasting large yellow croakers. A total of 54,933,550 clean reads were obtained and assembled into 110,364 contigs. Annotation to the NCBI database identified a total of 38,728 unigenes, of which 19,654 were classified into Gene Ontology and 22,683 were found in Kyoto Encyclopedia of Genes and Genomes (KEGG). Comparative analysis of the expression profiles between fasting fish and normal-feeding fish identified a total of 7,623 differentially expressed genes (P < 0.05), including 2,500 upregulated genes and 5,123 downregulated genes. Dramatic differences were observed in the genes involved in metabolic pathways such as fat digestion and absorption, citrate cycle, and glycolysis/gluconeogenesis, and similar results were also found in the transcriptome of skeletal muscle. Further qPCR analysis confirmed that the genes encoding the factors involved in those pathways significantly changed in terms of expression levels. The results of the present study provide insights into the molecular mechanisms underlying the metabolic response of the large yellow croaker to fasting and identify areas that require further investigation. PMID:26967898
Gene expression profiling assigns CHEK2 1100delC breast cancers to the luminal intrinsic subtypes.
Nagel, Jord H A; Peeters, Justine K; Smid, Marcel; Sieuwerts, Anieta M; Wasielewski, Marijke; de Weerd, Vanja; Trapman-Jansen, Anita M A C; van den Ouweland, Ans; Brüggenwirth, Hennie; van I Jcken, Wilfred F J; Klijn, Jan G M; van der Spek, Peter J; Foekens, John A; Martens, John W M; Schutte, Mieke; Meijers-Heijboer, Hanne
2012-04-01
CHEK2 1100delC is a moderate-risk cancer susceptibility allele that confers a high breast cancer risk in a polygenic setting. Gene expression profiling of CHEK2 1100delC breast cancers may reveal clues to the nature of the polygenic CHEK2 model and its genes involved. Here, we report global gene expression profiles of a cohort of 155 familial breast cancers, including 26 CHEK2 1100delC mutant tumors. In line with previous work, all CHEK2 1100delC mutant tumors clustered among the hormone receptor-positive breast cancers. In the hormone receptor-positive subset, a 40-gene CHEK2 signature was subsequently defined that significantly associated with CHEK2 1100delC breast cancers. The identification of a CHEK2 gene signature implies an unexpected biological homogeneity among the CHEK2 1100delC breast cancers. In addition, all 26 CHEK2 1100delC tumors classified as luminal intrinsic subtype breast cancers, with 8 luminal A and 18 luminal B tumors. This biological make-up of CHEK2 1100delC breast cancers suggests that a relatively limited number of additional susceptibility alleles are involved in the polygenic CHEK2 model. Identification of these as-yet-unknown susceptibility alleles should be aided by clues from the 40-gene CHEK2 signature.
Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling.
Hoefnagel, Juliette J; Dijkman, Remco; Basso, Katia; Jansen, Patty M; Hallermann, Christian; Willemze, Rein; Tensen, Cornelis P; Vermeer, Maarten H
2005-05-01
In the European Organization for Research and Treatment of Cancer (EORTC) classification 2 types of primary cutaneous large B-cell lymphoma (PCLBCL) are distinguished: primary cutaneous follicle center cell lymphomas (PCFCCL) and PCLBCL of the leg (PCLBCL-leg). Distinction between the two groups is considered important because of differences in prognosis (5-year survival > 95% and 52%, respectively) and the first choice of treatment (radiotherapy or systemic chemotherapy, respectively), but is not generally accepted. To establish a molecular basis for this subdivision in the EORTC classification, we investigated the gene expression profiles of 21 PCLBCLs by oligonucleotide microarray analysis. Hierarchical clustering based on a B-cell signature (7450 genes) classified PCLBCL into 2 distinct subgroups consisting of, respectively, 8 PCFCCLs and 13 PCLBCLs-leg. PCLBCLs-leg showed increased expression of genes associated with cell proliferation; the proto-oncogenes Pim-1, Pim-2, and c-Myc; and the transcription factors Mum1/IRF4 and Oct-2. In the group of PCFCCL high expression of SPINK2 was observed. Further analysis suggested that PCFCCLs and PCLBCLs-leg have expression profiles similar to that of germinal center B-cell-like and activated B-cell-like diffuse large B-cell lymphoma, respectively. The results of this study suggest that different pathogenetic mechanisms are involved in the development of PCFCCLs and PCLBCLs-leg and provide molecular support for the subdivision used in the EORTC classification.
Chen, Xi'en; Zhang, Ya-lin
2015-04-01
The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests on crucifer crops worldwide. In this study, 19 cDNAs encoding glutathione S-transferases (GSTs) were identified from the genomic and transcriptomic database for DBM (KONAGAbase) and further characterized. Phylogenetic analysis showed that the 19 GSTs were classified into six different cytosolic classes, including four in delta, six in epsilon, three in omega, two in sigma, one in theta and one in zeta. Two GSTs were unclassified. RT-PCR analysis revealed that most GST genes were expressed in all developmental stages, with higher expression in the larval stages. Six DBM GSTs were expressed at the highest levels in the midgut tissue. Twelve purified recombinant GSTs showed varied enzymatic properties towards 1-chloro-2,4-dinitrobenzene and glutathione, whereas rPxGSTo2, rPxGSTz1 and rPxGSTu2 had no activity. Real-time quantitative PCR revealed that expression levels of the 19 DBM GST genes were varied and changed after exposure to acephate, indoxacarb, beta-cypermethrin and spinosad. PxGSTd3 was significantly overexpressed, while PxGSTe3 and PxGSTs2 were significantly downregulated by all four insecticide exposures. The changes in DBM GST gene expression levels exposed to different insecticides indicate that they may play individual roles in tolerance to insecticides and xenobiotics. © 2014 Society of Chemical Industry.
A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status.
Sood, Sanjana; Gallagher, Iain J; Lunnon, Katie; Rullman, Eric; Keohane, Aoife; Crossland, Hannah; Phillips, Bethan E; Cederholm, Tommy; Jensen, Thomas; van Loon, Luc J C; Lannfelt, Lars; Kraus, William E; Atherton, Philip J; Howard, Robert; Gustafsson, Thomas; Hodges, Angela; Timmons, James A
2015-09-07
Diagnostics of the human ageing process may help predict future healthcare needs or guide preventative measures for tackling diseases of older age. We take a transcriptomics approach to build the first reproducible multi-tissue RNA expression signature by gene-chip profiling tissue from sedentary normal subjects who reached 65 years of age in good health. One hundred and fifty probe-sets form an accurate classifier of young versus older muscle tissue and this healthy ageing RNA classifier performed consistently in independent cohorts of human muscle, skin and brain tissue (n = 594, AUC = 0.83-0.96) and thus represents a biomarker for biological age. Using the Uppsala Longitudinal Study of Adult Men birth-cohort (n = 108) we demonstrate that the RNA classifier is insensitive to confounding lifestyle biomarkers, while greater gene score at age 70 years is independently associated with better renal function at age 82 years and longevity. The gene score is 'up-regulated' in healthy human hippocampus with age, and when applied to blood RNA profiles from two large independent age-matched dementia case-control data sets (n = 717) the healthy controls have significantly greater gene scores than those with cognitive impairment. Alone, or when combined with our previously described prototype Alzheimer disease (AD) RNA 'disease signature', the healthy ageing RNA classifier is diagnostic for AD. We identify a novel and statistically robust multi-tissue RNA signature of human healthy ageing that can act as a diagnostic of future health, using only a peripheral blood sample. This RNA signature has great potential to assist research aimed at finding treatments for and/or management of AD and other ageing-related conditions.
Delgado Sandoval, Silvia del Carmen; Abraham Juárez, María Jazmín; Simpson, June
2012-03-01
Agave tequilana is a monocarpic perennial species that flowers after 5-8 years of vegetative growth signaling the end of the plant's life cycle. When fertilization is unsuccessful, vegetative bulbils are induced on the umbels of the inflorescence near the bracteoles from newly formed meristems. Although the regulation of inflorescence and flower development has been described in detail for monocarpic annuals and polycarpic species, little is known at the molecular level for these processes in monocarpic perennials, and few studies have been carried out on bulbils. Histological samples revealed the early induction of umbel meristems soon after the initiation of the vegetative to inflorescence transition in A. tequilana. To identify candidate genes involved in the regulation of floral induction, a search for MADS-box transcription factor ESTs was conducted using an A. tequilana transcriptome database. Seven different MIKC MADS genes classified into 6 different types were identified based on previously characterized A. thaliana and O. sativa MADS genes and sequences from non-grass monocotyledons. Quantitative real-time PCR analysis of the seven candidate MADS genes in vegetative, inflorescence, bulbil and floral tissues uncovered novel patterns of expression for some of the genes in comparison with orthologous genes characterized in other species. In situ hybridization studies using two different genes showed expression in specific tissues of vegetative meristems and floral buds. Distinct MADS gene regulatory patterns in A. tequilana may be related to the specific reproductive strategies employed by this species.
Pan, Lin-Jie; Jiang, Ling
2014-03-01
The WRKY transcription factor (TF) plays a very important role in the response of plants to various abiotic and biotic stresses. A local papaya database was built from the GenBank expressed sequence tag database using the BioEdit software. Fifty-two coding sequences of Carica papaya WRKY TFs were predicted using the tBLASTn tool. The phylogenetic tree of the WRKY proteins was classified. The expression profiles of 13 selected C. papaya WRKY TF genes under stress induction were constructed by quantitative real-time polymerase chain reaction. The expression levels of these WRKY genes in response to 3 abiotic and 2 biotic stresses were evaluated. TF807.3 and TF72.14 were upregulated by low temperature; TF807.3, TF43.76, TF12.199 and TF12.62 were involved in the response to drought stress; TF9.35, TF18.51, TF72.14 and TF12.199 were involved in the response to wounding; TF12.199, TF807.3, TF21.156 and TF18.51 were induced by the PRSV pathogen; TF72.14 and TF43.76 were upregulated by SA. The changes in expression of these eight genes, normalized against the housekeeping gene actin, were significant at the 0.01 probability level. These WRKY TFs could be related to the corresponding stress resistance and selected as candidate genes, especially the two genes TF807.3 and TF12.199, which were regulated notably by four stresses respectively. This study may provide useful information and candidate genes for the development of transgenic stress-tolerant papaya varieties.
Abolhassani, Hassan; Farrokhi, Amir Salek; Pourhamdi, Shabnam; Mohammadinejad, Payam; Sadeghi, Bamdad; Moazzeni, Seyed-Mohammad; Aghamohammadi, Asghar
2013-08-01
Common variable immunodeficiency (CVID) is a heterogeneous disorder characterized by reduced serum levels of IgG, IgA or IgM and recurrent bacterial infections. Class switch recombination (CSR), a critical process in immunoglobulin production, is defective in a group of CVID patients. Activation-induced cytidine deaminase (AID) protein is an important molecule involved in the CSR process. The aim of this study was to investigate AID gene mRNA production in a group of CVID patients to assess the possible role of this molecule in the disorder. Peripheral blood mononuclear cells (PBMC) of 29 CVID patients and 21 healthy controls were isolated and stimulated with CD40L and IL-4 to induce AID gene expression. After 5 days, AID gene mRNA production was investigated by real-time polymerase chain reaction. The AID gene was expressed in all of the studied patients. However, the mean density of extracted AID mRNA was higher in CVID patients (230.95±103.04 ng/ml) than in controls (210.00±44.72 ng/ml; P=0.5). CVID cases with lower levels of AID had decreased total IgE levels (P=0.04) and stimulated IgE production (P=0.02), while cases with increased levels of AID presented higher levels of IgA (P=0.04), higher numbers of B cells (P=0.02) and autoimmune disease (P=0.02). Different levels of AID gene expression may have important roles in dysregulation of the immune system and the final clinical presentation in CVID patients. Therefore, investigating the expression of the AID gene can help in classifying CVID patients.
Ryan, Natalia; Chorley, Brian; Tice, Raymond R.; Judson, Richard; Corton, J. Christopher
2016-01-01
Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including “very weak” agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells. PMID:26865669
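For readers unfamiliar with the metric, balanced accuracy for a two-class problem is commonly defined as the mean of sensitivity and specificity. The sketch below assumes that definition (the abstract does not spell out its formula) and uses hypothetical confusion-matrix counts; only the 18-of-21 figure comes from the abstract.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced counts: 10 positives and 90 negatives.
tp, fn, tn, fp = 5, 5, 90, 0
plain_accuracy = (tp + tn) / (tp + fn + tn + fp)   # 0.95, dominated by the negatives
balanced = balanced_accuracy(tp, fn, tn, fp)       # 0.75, both classes weighted equally

# The reference-chemical figure quoted in the abstract: 18 of 21 correct.
reference_percent = round(100 * 18 / 21)           # 86

print(plain_accuracy, balanced, reference_percent)
```

The toy counts show why the metric is preferred when one class is rare: plain accuracy stays high even though half of the positives are missed.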
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver a transparent decision process to the user. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameters learning problem. A shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
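As background for the shrinkage step, the generic 1-norm soft-margin linear SVM can be written as follows. This is the textbook formulation, not the MK-SVM program of the paper, whose exact form the abstract does not give; the symbols w, b, ξ, C, x_i and y_i are introduced here purely for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \lVert w \rVert_1 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

Writing w = p - q with p, q ≥ 0 and replacing the 1-norm by the sum of the entries of p and q turns this into a linear program. The 1-norm penalty tends to drive many weights exactly to zero, and the features with nonzero weight are the selected ones.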
2012-01-01
Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
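The two aggregation steps described above, ranking genes by how often they appear in the selected subsets and combining the subset classifiers by majority vote, can be sketched in a few lines. The sketch does not implement HBSA itself; the gene names, subsets and votes are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def rank_genes_by_frequency(subsets: Iterable[Set[str]]) -> List[Tuple[str, int]]:
    """Count how many selected subsets contain each gene, most frequent first."""
    counts = Counter(gene for subset in subsets for gene in subset)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the class label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical gene subsets, as if returned by a subset search.
selected_subsets = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G2", "G4"}]
ranking = rank_genes_by_frequency(selected_subsets)
print(ranking)  # [('G1', 3), ('G2', 2), ('G3', 1), ('G4', 1)]

# Hypothetical votes from three classifiers, one per subset.
print(majority_vote(["tumor", "normal", "tumor"]))  # tumor
```

Under this scheme a gene that recurs across many good subsets rises to the top of the ranking, which matches the abstract's use of occurrence frequency as a measure of significance.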
Competing endogenous RNA regulatory network in papillary thyroid carcinoma.
Chen, Shouhua; Fan, Xiaobin; Gu, He; Zhang, Lili; Zhao, Wenhua
2018-05-11
The present study aimed to screen all types of RNAs involved in the development of papillary thyroid carcinoma (PTC). RNA‑sequencing data of PTC and normal samples were used for screening differentially expressed (DE) microRNAs (DE‑miRNAs), long non‑coding RNAs (DE‑lncRNAs) and genes (DEGs). Subsequently, lncRNA‑miRNA, miRNA‑gene (that is, miRNA‑mRNA) and gene‑gene interaction pairs were extracted and used to construct regulatory networks. Feature genes in the miRNA‑mRNA network were identified by topological analysis and recursive feature elimination analysis. A support vector machine (SVM) classifier was built using 15 feature genes, and its classification effect was validated using two microarray data sets that were downloaded from the Gene Expression Omnibus (GEO) database. In addition, Gene Ontology function and Kyoto Encyclopedia Genes and Genomes pathway enrichment analyses were conducted for genes identified in the ceRNA network. A total of 506 samples, including 447 tumor samples and 59 normal samples, were obtained from The Cancer Genome Atlas (TCGA); 16 DE‑lncRNAs, 917 DEGs and 30 DE‑miRNAs were screened. The miRNA‑mRNA regulatory network comprised 353 nodes and 577 interactions. From these data, 15 feature genes with high predictive precision (>95%) were extracted from the network and were used to form an SVM classifier with an accuracy of 96.05% (486/506) for PTC samples downloaded from TCGA, and accuracies of 96.81 and 98.46% for GEO downloaded data sets. The ceRNA regulatory network comprised 596 lines (or interactions) and 365 nodes. Genes in the ceRNA network were significantly enriched in 'neuron development', 'differentiation', 'neuroactive ligand‑receptor interaction', 'metabolism of xenobiotics by cytochrome P450', 'drug metabolism' and 'cytokine‑cytokine receptor interaction' pathways. 
Hox transcript antisense RNA, miRNA‑206 and kallikrein‑related peptidase 10 were nodes in the ceRNA regulatory network of the selected feature gene, and they may serve important roles in the development of PTC.
Verhagen, Lilly M; Zomer, Aldert; Maes, Mailis; Villalba, Julian A; Del Nogal, Berenice; Eleveld, Marc; van Hijum, Sacha Aft; de Waard, Jacobus H; Hermans, Peter Wm
2013-02-01
Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts.
2013-01-01
Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
Conclusions Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts. PMID:23375113
Sestak, Karol; Conroy, Lauren; Aye, Pyone P.; Mehra, Smriti; Doxiadis, Gaby G.; Kaushal, Deepak
2011-01-01
Background A non-human primate (NHP) model of gluten sensitivity was employed to study the gene perturbations associated with dietary gluten changes in small intestinal tissues from gluten-sensitive rhesus macaques (Macaca mulatta). Methodology Stages of remission and relapse were accomplished in gluten-sensitive animals by administration of gluten-free (GFD) and gluten-containing (GD) diets, as described previously. Pin-head-sized biopsies, obtained non-invasively by pediatric endoscope from the duodenum while on GFD or GD, were used for preparation of total RNA and gene profiling, using the commercial Rhesus Macaque Microarray (Agilent Technologies), targeting expression of over 20,000 genes. Principal Findings When compared with normal healthy controls, gluten-sensitive macaques showed differential gene expression induced by GD. While observed gene perturbations were classified into one of 12 overlapping categories - cancer, metabolism, digestive tract function, immune response, cell growth, signal transduction, autoimmunity, detoxification of xenobiotics, apoptosis, actin-collagen deposition, neuronal and unknown function - this study focused on cancer-related gene networks such as the cytochrome P450 family (detoxification function) and actin-collagen-matrix metalloproteinase (MMP) genes. Conclusions/Significance A loss of detoxification function, paralleled by the need to metabolize carcinogens, was revealed in gluten-sensitive animals while on GD. An increase in cancer-promoting factors and a simultaneous decrease in cancer-preventing factors associated with altered expression of the actin-collagen-MMP gene network were noted. In addition, gluten-sensitive macaques showed a reduced number of differentially expressed genes, including the cancer-associated ones, upon withdrawal of dietary gluten. Taken together, these findings indicate potentially expanded utility of gluten-sensitive rhesus macaques in cancer research. PMID:21533263
Hu, Wei; Xia, Zhiqiang; Yan, Yan; Ding, Zehong; Tie, Weiwei; Wang, Lianzhe; Zou, Meiling; Wei, Yunxie; Lu, Cheng; Hou, Xiaowan; Wang, Wenquan; Peng, Ming
2015-01-01
Cassava is an important food and potential biofuel crop that is tolerant to multiple abiotic stressors. The mechanisms underlying these tolerances are currently poorly understood. CBL-interacting protein kinases (CIPKs) have been shown to play crucial roles in plant developmental processes, hormone signaling transduction, and in the response to abiotic stress. However, no data are currently available about the CIPK family in cassava. In this study, a total of 25 CIPK genes were identified from the cassava genome based on our previous genome sequencing data. Phylogenetic analysis suggested that the 25 MeCIPKs could be classified into four subfamilies, which was supported by exon-intron organizations and the architectures of conserved protein motifs. Transcriptomic analysis of a wild subspecies and two cultivated varieties showed that most MeCIPKs had different expression patterns between the wild subspecies and the cultivars in different tissues or in response to drought stress. Some orthologous genes involved in CIPK interaction networks were identified between Arabidopsis and cassava. The interaction networks and co-expression patterns of these orthologous genes revealed that the crucial pathways controlled by CIPK networks may be involved in the differential response to drought stress in different accessions of cassava. Nine MeCIPK genes were selected to investigate their transcriptional response to various stimuli, and the results showed the comprehensive response of the tested MeCIPK genes to osmotic, salt, cold, and oxidative stressors and to ABA signaling. The identification and expression analysis of the CIPK family suggested that CIPK genes are important components of development and multiple signal transduction pathways in cassava. The findings of this study will help lay a foundation for the functional characterization of the CIPK gene family and provide an improved understanding of abiotic stress responses and signaling transduction in cassava. PMID:26579161
Wei, Wei; Hu, Yang; Han, Yong-Tao; Zhang, Kai; Zhao, Feng-Li; Feng, Jia-Yue
2016-08-01
WRKY proteins comprise a large family of transcription factors that play important roles in response to biotic and abiotic stresses and in plant growth and development. To date, little is known about the WRKY gene family in strawberry. In this study, we identified 62 WRKY genes (FvWRKYs) in the wild diploid woodland strawberry (Fragaria vesca, 2n = 2x = 14) accession Heilongjiang-3. According to the phylogenetic analysis and structural features, these identified strawberry FvWRKY genes were classified into three main groups. In addition, eight FvWRKY-GFP fusion proteins showed distinct subcellular localizations in Arabidopsis mesophyll protoplasts. Furthermore, we examined the expression of the 62 FvWRKY genes in 'Heilongjiang-3' under various conditions, including biotic stress (Podosphaera aphanis), abiotic stresses (drought, salt, cold, and heat), and hormone treatments (abscisic acid, ethephon, methyl jasmonate, and salicylic acid). The expression levels of 33 FvWRKY genes were upregulated, while 12 FvWRKY genes were downregulated during powdery mildew infection. FvWRKY genes responded to drought and salt treatment to a greater extent than to temperature stress. Expression profiles derived from quantitative real-time PCR suggested that 11 FvWRKY genes responded dramatically to various stimuli at the transcriptional level, indicating versatile roles in responses to biotic and abiotic stresses. Interaction networks revealed that the crucial pathways controlled by WRKY proteins may be involved in the differential response to biotic stress. Taken together, the present work may provide the basis for future studies of the genetic modification of WRKY genes for pathogen resistance and stress tolerance in strawberry. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Xiao, Lin-Fan; Zhang, Wei; Jing, Tian-Xing; Zhang, Meng-Yi; Miao, Ze-Qing; Wei, Dan-Dan; Yuan, Guo-Rui; Wang, Jin-Jun
2018-03-01
The ATP-binding cassette (ABC) is the largest transporter gene family and the genes play key roles in xenobiotic resistance, metabolism, and development of all phyla. However, the specific functions of ABC gene families in insects are unclear. We report a genome-wide identification, phylogenetic, and transcriptional analysis of the ABC genes in the oriental fruit fly, Bactrocera dorsalis (Hendel). We identified a total of 47 ABC genes (BdABCs) from the transcriptomic and genomic databases of B. dorsalis and classified these genes into eight subfamilies (A-H), including 7 ABCAs, 7 ABCBs, 9 ABCCs, 2 ABCDs, 1 ABCE, 3 ABCFs, 15 ABCGs, and 3 ABCHs. Comparative phylogenetic analysis of the ABCs suggests an orthologous relationship between B. dorsalis and other insect species in which these genes have been related to pesticide resistance and essential biological processes. Comparison of transcriptome and relative expression patterns of BdABCs indicated diverse multifunctions within different B. dorsalis tissues. The expression of 4, 10, and 14 BdABCs from 18 BdABCs was significantly upregulated after exposure to LD50s of malathion, avermectin, and beta-cypermethrin, respectively. The maximum expression level of most BdABCs (including BdABCFs, BdABCGs, and BdABCHs) occurred at 48 h post exposure, whereas BdABCEs peaked at 24 h after treatment. Furthermore, RNA interference-mediated suppression of BdABCB7 resulted in increased toxicity of malathion against B. dorsalis. These data suggest that ABC transporter genes might play key roles in xenobiotic metabolism and biosynthesis in B. dorsalis. Copyright © 2017 Elsevier Inc. All rights reserved.
Huang, Jinguang; Zheng, Chengchao
2013-01-01
RNA helicases are enzymes that are thought to unwind double-stranded RNA molecules in an energy-dependent fashion through the hydrolysis of NTP. RNA helicases are associated with all processes involving RNA molecules, including nuclear transcription, editing, splicing, ribosome biogenesis, RNA export, and organelle gene expression. The involvement of RNA helicase in response to stress and in plant growth and development has been reported previously. While their importance in Arabidopsis and Oryza sativa has been partially studied, the function of RNA helicase proteins is poorly understood in Zea mays and Glycine max. In this study, we identified a total of RNA helicase genes in Arabidopsis and other crop species genome by genome-wide comparative in silico analysis. We classified the RNA helicase genes into three subfamilies according to the structural features of the motif II region, such as DEAD-box, DEAH-box and DExD/H-box, and different species showed different patterns of alternative splicing. Secondly, chromosome location analysis showed that the RNA helicase protein genes were distributed across all chromosomes with different densities in the four species. Thirdly, phylogenetic tree analyses identified the relevant homologs of DEAD-box, DEAH-box and DExD/H-box RNA helicase proteins in each of the four species. Fourthly, microarray expression data showed that many of these predicted RNA helicase genes were expressed in different developmental stages and different tissues under normal growth conditions. Finally, real-time quantitative PCR analysis showed that the expression levels of 10 genes in Arabidopsis and 13 genes in Zea mays were in close agreement with the microarray expression data. To our knowledge, this is the first report of a comparative genome-wide analysis of the RNA helicase gene family in Arabidopsis, Oryza sativa, Zea mays and Glycine max. 
This study provides valuable information for understanding the classification and putative functions of the RNA helicase gene family in crop growth and development. PMID:24265739
Xu, Ruirui; Zhang, Shizhong; Huang, Jinguang; Zheng, Chengchao
2013-01-01
RNA helicases are enzymes that are thought to unwind double-stranded RNA molecules in an energy-dependent fashion through the hydrolysis of NTP. RNA helicases are associated with all processes involving RNA molecules, including nuclear transcription, editing, splicing, ribosome biogenesis, RNA export, and organelle gene expression. The involvement of RNA helicase in response to stress and in plant growth and development has been reported previously. While their importance in Arabidopsis and Oryza sativa has been partially studied, the function of RNA helicase proteins is poorly understood in Zea mays and Glycine max. In this study, we identified a total of RNA helicase genes in Arabidopsis and other crop species genome by genome-wide comparative in silico analysis. We classified the RNA helicase genes into three subfamilies according to the structural features of the motif II region, such as DEAD-box, DEAH-box and DExD/H-box, and different species showed different patterns of alternative splicing. Secondly, chromosome location analysis showed that the RNA helicase protein genes were distributed across all chromosomes with different densities in the four species. Thirdly, phylogenetic tree analyses identified the relevant homologs of DEAD-box, DEAH-box and DExD/H-box RNA helicase proteins in each of the four species. Fourthly, microarray expression data showed that many of these predicted RNA helicase genes were expressed in different developmental stages and different tissues under normal growth conditions. Finally, real-time quantitative PCR analysis showed that the expression levels of 10 genes in Arabidopsis and 13 genes in Zea mays were in close agreement with the microarray expression data. To our knowledge, this is the first report of a comparative genome-wide analysis of the RNA helicase gene family in Arabidopsis, Oryza sativa, Zea mays and Glycine max. 
This study provides valuable information for understanding the classification and putative functions of the RNA helicase gene family in crop growth and development.
SRY protein is expressed in ovotestis and streak gonads from human sex-reversal.
Salas-Cortés, L; Jaubert, F; Nihoul-Feketé, C; Brauner, R; Rosemblatt, M; Fellous, M
2000-01-01
In mammals, a master gene located on the Y chromosome, the testis-determining gene SRY, controls sex determination. SRY protein is expressed in the genital ridge before testis determination, and in the testis it is expressed in Sertoli and germ cells. Completely sex-reversed patients are classified as either 46,XX males or 46,XY females. SRY mutations have been described in only 15% of patients with 46,XY complete or partial gonadal dysgenesis. However, although incomplete or partial sex-reversal affects 46,XX true hermaphrodites, 46,XY gonadal dysgenesis, and 46,XX/46,XY mosaicism, only 15% of the 46,XX true hermaphrodites analyzed have the SRY gene. Here, we demonstrate that the SRY protein is expressed in the tubules of streak gonads and rete testis, indicating that the SRY protein is normally expressed early during testis determination. Based on these results, we propose that some factors downstream from SRY may be mutated in these 46,XY sex-reversal patients. We have also analyzed SRY protein expression in the ovotestis from 46,XX true hermaphrodites and 46,XX/46,XY mosaicism, demonstrating SRY protein expression in both testicular and ovarian portions in these patients. This suggests that the SRY protein does not inhibit ovary development. These results confirm that other factors are needed for complete testis development, in particular, those downstream of the SRY protein. Copyright 2001 S. Karger AG, Basel
Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study
Broman, Karl W.; Keller, Mark P.; Broman, Aimee Teo; Kendziorski, Christina; Yandell, Brian S.; Sen, Śaunak; Attie, Alan D.
2015-01-01
In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual’s eQTL genotype based on expression data alone. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes (although we further identified a small number of sample mix-ups in each of the six panels of gene expression microarrays). Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup. PMID:26290572
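The matching step described in this abstract can be sketched as follows. This is a simplified illustration, not the API of the R/lineup package: it assumes the expression-based classifier has already produced a predicted genotype string per sample (one character per eQTL), and the sample names, genotype codes, and the 0.5 mismatch threshold are hypothetical.

```python
def mismatch_rate(a, b):
    """Fraction of eQTL at which two genotype vectors disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def find_mixups(predicted, observed, threshold=0.5):
    """Map each suspicious expression sample to its best-matching DNA sample.

    predicted: sample -> genotypes predicted from expression data
    observed:  sample -> genotypes observed in the DNA data
    threshold: hypothetical cut-off above which a sample is flagged
    """
    fixes = {}
    for sample, pred in predicted.items():
        if mismatch_rate(pred, observed[sample]) <= threshold:
            continue  # observed genotypes agree with the prediction
        # Search all DNA samples for the one closest to the prediction.
        best = min(observed, key=lambda dna: mismatch_rate(pred, observed[dna]))
        fixes[sample] = best
    return fixes

# Hypothetical data in which the DNA samples of m2 and m3 were swapped.
predicted = {"m1": "AABBH", "m2": "HBABA", "m3": "BBHAA"}
observed = {"m1": "AABBH", "m2": "BBHAA", "m3": "HBABA"}
fixes = find_mixups(predicted, observed)
```

In this toy case the swap is recovered because each flagged sample has exactly one DNA sample that matches its predicted genotypes; with real data the best match would also need to be judged against the remaining candidates.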
Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.
Broman, Karl W; Keller, Mark P; Broman, Aimee Teo; Kendziorski, Christina; Yandell, Brian S; Sen, Śaunak; Attie, Alan D
2015-08-19
In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual's eQTL genotype based on expression data alone. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes (although we further identified a small number of sample mix-ups in each of the six panels of gene expression microarrays). Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup. Copyright © 2015 Broman et al.
Bing, Feng; Zhao, Yu
2016-01-01
To screen for biomarkers that can predict prognosis after chemotherapy for breast cancer, three microarray datasets of breast cancer patients undergoing chemotherapy were collected from the Gene Expression Omnibus database. After preprocessing, data in GSE41112 were analyzed using significance analysis of microarrays to screen the differentially expressed genes (DEGs). The DEGs were further analyzed by Differentially Coexpressed Genes and Links to construct a function module, the prognostic efficacy of which was verified by the other two datasets (GSE22226 and GSE58644) using Kaplan-Meier plots. The genes in the function module were subjected to a univariate Cox regression analysis to confirm whether the expression of each prognostic gene was associated with survival. A total of 511 DEGs between breast cancer patients who received chemotherapy or not were obtained, consisting of 421 upregulated and 90 downregulated genes. Using the Differentially Coexpressed Genes and Links package, 1,244 differentially coexpressed genes (DCGs) were identified, among which 36 DCGs were regulated by the transcription factor complex NFY (NFYA, NFYB, NFYC). These 39 genes formed a gene module that classified the samples in GSE22226 and GSE58644 into three subtypes, and these subtypes exhibited significantly different survival rates. Furthermore, several genes of the 39 DCGs were shown to be significantly associated with good (such as CDC20) and poor (such as ARID4A) prognoses following chemotherapy. Our present study provides a series of biomarkers for predicting prognosis after chemotherapy, or targets for the development of alternative treatments (i.e., CDC20 and ARID4A), in breast cancer patients.
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Xia, Shengjun; Chen, Yu; Jiang, Jiafu; Chen, Sumei; Guan, Zhiyong; Fang, Weimin; Chen, Fadi
2013-01-01
The molecular mechanisms underlying gravitropic bending of shoots are poorly understood, and how genes relate to this growth process is still unclear. To identify genes related to asymmetric growth in the creeping shoots of chrysanthemum, suppression subtractive hybridization was used to visualize differential gene expression in the upper and lower halves of creeping shoots of ground-cover chrysanthemum under gravistimulation. Sequencing of 43 selected clones produced 41 unigenes (40 singletons and 1 unigenes), which were classifiable into 9 functional categories. Genes involved in cell wall biosynthesis were notably frequent among those up-regulated in the upper or lower side during gravistimulation, such as beta tubulin (TUB), subtilisin-like protease (SBT), glutathione S-transferase (GST), expansin-like protein (EXP), lipid transfer proteins (LTPs), glycine-rich protein (GRP) and membrane proteins. Our findings also highlighted the function of some metal transporters during asymmetric growth, including the boron transporter (BT) and ZIP transporter (ZT), which are thought to act primarily in maintaining the integrity of cell walls and to play important roles in cellulose biosynthesis. CmTUB (beta tubulin) was cloned, and its expression profile and phylogeny were examined, because the plant cell cytoskeleton is well known to be involved in gravitropic bending growth.
Kamei, Asuka; Watanabe, Yuki; Shinozaki, Fumika; Yasuoka, Akihito; Shimada, Kousuke; Kondo, Kaori; Ishijima, Tomoko; Toyoda, Tsudoi; Arai, Soichi; Kondo, Takashi; Abe, Keiko
2017-02-01
Maple syrup contains various polyphenols, and we investigated the effects of a polyphenol-rich maple syrup extract (MSXH) on the physiology of mice fed a high-fat diet (HFD). The mice were fed a low-fat diet (LFD), an HFD, or an HFD supplemented with 0.02% (002MSXH) or 0.05% MSXH (005MSXH) for 4 weeks. Global gene expression analysis of the liver was performed, and the differentially expressed genes were classified into three expression patterns: pattern A (LFD < HFD > 002MSXH = 005MSXH, LFD > HFD < 002MSXH = 005MSXH), pattern B (LFD < HFD = 002MSXH > 005MSXH, LFD > HFD = 002MSXH < 005MSXH), and pattern C (LFD < HFD > 002MSXH < 005MSXH, LFD > HFD < 002MSXH > 005MSXH). Pattern A was enriched in glycolysis, fatty acid metabolism, and folate metabolism. Pattern B was enriched in the tricarboxylic acid cycle, while pattern C was enriched in gluconeogenesis, cholesterol metabolism, amino acid metabolism, and endoplasmic reticulum stress-related events. Our study suggests that MSXH ingestion produced (i) a dose-dependent pattern in genes involved in energy metabolism and (ii) a reversed pattern in genes involved in stress responses. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
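The three expression patterns defined in this abstract can be written down as a small lookup over pairwise comparisons. The sketch below is illustrative only: the abstract does not say how "equal" and "different" were decided, so the fixed tolerance `tol` is a hypothetical stand-in for whatever statistical test the authors used, and the function names are invented for this example.

```python
# Each key is (LFD vs HFD, HFD vs 002MSXH, 002MSXH vs 005MSXH).
PATTERNS = {
    ("<", ">", "="): "A", (">", "<", "="): "A",
    ("<", "=", ">"): "B", (">", "=", "<"): "B",
    ("<", ">", "<"): "C", (">", "<", ">"): "C",
}

def compare(a, b, tol):
    """Return '=', '<' or '>' for two expression values.

    tol is a hypothetical tolerance replacing a real significance test.
    """
    if abs(a - b) <= tol:
        return "="
    return "<" if a < b else ">"

def classify_pattern(lfd, hfd, msxh002, msxh005, tol=0.1):
    """Return 'A', 'B', 'C', or None if no pattern from the abstract fits."""
    key = (
        compare(lfd, hfd, tol),
        compare(hfd, msxh002, tol),
        compare(msxh002, msxh005, tol),
    )
    return PATTERNS.get(key)

# Hypothetical expression values for three genes.
pattern_a = classify_pattern(1.0, 2.0, 1.2, 1.25)
pattern_b = classify_pattern(2.0, 1.0, 1.05, 2.0)
pattern_c = classify_pattern(1.0, 2.0, 1.0, 1.5)
```

Genes whose comparisons match none of the six orderings return `None`, which corresponds to genes that fall outside patterns A to C.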
Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino
2014-01-30
The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypotheses in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypotheses over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of raw microarray data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypotheses, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.
Two Clinical Phenotypes in Polycythemia Vera
Spivak, Jerry L.; Considine, Michael; Williams, Donna M.; Talbot, Conover C.; Rogers, Ophelia; Moliterno, Alison R.; Jie, Chunfa; Ochs, Michael F.
2014-01-01
BACKGROUND Polycythemia vera is the ultimate phenotypic consequence of the V617F mutation in Janus kinase 2 (encoded by JAK2), but the extent to which this mutation influences the behavior of the involved CD34+ hematopoietic stem cells is unknown. METHODS We analyzed gene expression in CD34+ peripheral-blood cells from 19 patients with polycythemia vera, using oligonucleotide microarray technology after correcting for potential confounding by sex, since the phenotypic features of the disease differ between men and women. RESULTS Men with polycythemia vera had twice as many up-regulated or down-regulated genes as women with polycythemia vera, in a comparison of gene expression in the patients and in healthy persons of the same sex, but there were 102 genes with differential regulation that was concordant in men and women. When these genes were used for class discovery by means of unsupervised hierarchical clustering, the 19 patients could be divided into two groups that did not differ significantly with respect to age, neutrophil JAK2 V617F allele burden, white-cell count, platelet count, or clonal dominance. However, they did differ significantly with respect to disease duration; hemoglobin level; frequency of thromboembolic events, palpable splenomegaly, and splenectomy; chemotherapy exposure; leukemic transformation; and survival. The unsupervised clustering was confirmed by a supervised approach with the use of a top-scoring-pair classifier that segregated the 19 patients into the same two phenotypic groups with 100% accuracy. CONCLUSIONS Removing sex as a potential confounder, we identified an accurate molecular method for classifying patients with polycythemia vera according to disease behavior, independently of their JAK2 V617F allele burden, and identified previously unrecognized molecular pathways in polycythemia vera outside the canonical JAK2 pathway that may be amenable to targeted therapy. PMID:25162887
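A top-scoring-pair classifier, the supervised method named in this abstract, predicts class from the relative order of two genes' expression values within a sample. The sketch below shows the basic idea under simplifying assumptions: it handles exactly two classes, keeps the first pair found when scores tie (the published method uses a secondary score), and the tiny expression matrix and labels are hypothetical.

```python
from itertools import combinations

def train_tsp(samples, labels):
    """Pick the gene pair whose ordering differs most between two classes.

    Returns (i, j, class_if_less, class_otherwise), where class_if_less is
    predicted when gene i is expressed below gene j.
    """
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles two classes only"
    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        freq = []
        for c in classes:
            rows = [s for s, l in zip(samples, labels) if l == c]
            # Fraction of samples in class c with gene i below gene j.
            freq.append(sum(r[i] < r[j] for r in rows) / len(rows))
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            if freq[0] > freq[1]:
                best = (score, i, j, classes[0], classes[1])
            else:
                best = (score, i, j, classes[1], classes[0])
    return best[1:]

def predict_tsp(model, sample):
    """Classify one sample from the order of the selected gene pair."""
    i, j, if_less, otherwise = model
    return if_less if sample[i] < sample[j] else otherwise

# Hypothetical expression values: rows are patients, columns are genes.
samples = [[1, 5, 3], [2, 6, 3], [5, 1, 3], [6, 2, 3]]
labels = ["group1", "group1", "group2", "group2"]
model = train_tsp(samples, labels)
prediction = predict_tsp(model, [0, 9, 3])
```

Because the rule depends only on within-sample ordering, it is unaffected by monotone per-sample normalization, which is one reason this kind of classifier is attractive for small microarray cohorts.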
Schmidt-Heck, Wolfgang; Wönne, Eva C; Hiller, Thomas; Menzel, Uwe; Koczan, Dirk; Damm, Georg; Seehofer, Daniel; Knöspel, Fanny; Freyer, Nora; Guthke, Reinhard; Dooley, Steven; Zeilinger, Katrin
2017-05-01
The liver is the major site for alcohol metabolism in the body and therefore the primary target organ for ethanol (EtOH)-induced toxicity. In this study, we investigated the in vitro response of human liver cells to different EtOH concentrations in a perfused bioartificial liver device that mimics the complex architecture of the natural organ. Primary human liver cells were cultured in the bioartificial liver device and treated for 24 hours with medium containing 150 mM (low), 300 mM (medium), or 600 mM (high) EtOH, while a control culture was kept untreated. Gene expression patterns for each EtOH concentration were monitored using Affymetrix Human Gene 1.0 ST Gene chips. Scaled expression profiles of differentially expressed genes (DEGs) were clustered using the fuzzy c-means algorithm. In addition, functional classification methods, KEGG pathway mapping, and a machine learning approach (Random Forest) were used. A total of 966 (150 mM EtOH), 1,334 (300 mM EtOH), or 4,132 (600 mM EtOH) genes were found to be differentially expressed. Dose-response relationships of the identified clusters of co-expressed genes showed monotonic, threshold, or nonmonotonic (hormetic) behavior. Functional classification of DEGs revealed that low or medium EtOH concentrations elicit adaptation processes, while alterations observed for the high EtOH concentration reflect the response to cellular damage. The genes displaying a hormetic response were functionally characterized by overrepresented "cellular ketone metabolism" and "carboxylic acid metabolism." Altered expression of the genes BAHD1 and H3F3B was identified as sufficient to classify the samples according to the applied EtOH doses. Different pathways of metabolic and epigenetic regulation are affected by EtOH exposure and partly undergo hormetic regulation in the bioartificial liver device. Gene expression changes observed at high EtOH concentrations reflect in some aspects the situation of alcoholic hepatitis in humans. Copyright © 2017 by the Research Society on Alcoholism.
Jayaraman, Ananthi; Puranik, Swati; Rai, Neeraj Kumar; Vidapu, Sudhakar; Sahu, Pranav Pankaj; Lata, Charu; Prasad, Manoj
2008-11-01
Plant growth and productivity are affected by various abiotic stresses such as heat, drought, cold, and salinity. The mechanism of salt tolerance is one of the most important subjects in plant science, as salt stress decreases agricultural production worldwide. In the present study we used the cDNA-AFLP technique to compare gene expression profiles of a salt-tolerant and a salt-sensitive cultivar of foxtail millet (Setaria italica) in response to salt stress, in order to identify early responsive differentially expressed transcripts that accumulate upon salt stress, and validated the results by quantitative real-time PCR (qRT-PCR). The expression profile was compared between a salt-tolerant (Prasad) and a susceptible variety (Lepakshi) of foxtail millet under control conditions (L0 and P0) and after 1 h (L1 and P1) of salt stress. We identified 90 differentially expressed transcript-derived fragments (TDFs), of which 86 were classified on the basis of their complete presence or absence (qualitative variants) and 4 on the basis of differential expression levels (quantitative variants) in the two varieties. Finally, we identified 27 non-redundant differentially expressed cDNAs unique to the salt-tolerant variety, representing different groups of genes involved in metabolism, cellular transport, cell signaling, transcriptional regulation, mRNA splicing, seed development and storage, etc. As analyzed by qRT-PCR, seven of nine such genes showed a significant increase in expression in the tolerant variety after 1 h of salt stress compared with the salt-sensitive variety. The direct and indirect relationships of the identified TDFs with the salinity tolerance mechanism are discussed.
Qiu, Lingling; Jiang, Bo; Fang, Jia; Shen, Yike; Fang, Zhongxiang; Rm, Saravana Kumar; Yi, Keke; Shen, Chenjia; Yan, Daoliang; Zheng, Bingsong
2016-11-17
Hickory (Carya cathayensis), a woody plant with high nutritional and economic value, is widely planted in China. Due to its long juvenile phase, grafting is a useful technique for large-scale cultivation of hickory. To reveal the molecular mechanism of the graft process, we sequenced the transcriptomes of the graft union in hickory. In our study, six RNA-seq libraries yielded a total of 83,676,860 clean short reads comprising 4.19 Gb of sequence data. A large number of differentially expressed genes (DEGs) at three time points during the graft process were identified. In detail, 777 DEGs in the 7 d vs 0 d (day after grafting) comparison were classified into 11 enriched Gene Ontology (GO) categories, and 262 DEGs in the 14 d vs 0 d comparison were classified into 15 enriched GO categories. Furthermore, an overview of the PPI network was constructed from these DEGs. In addition, 20 genes related to the auxin- and cytokinin-signaling pathways were identified, and some were validated by qRT-PCR analysis. Our comprehensive analysis provides basic information on the candidate genes and hormone signaling pathways involved in the graft process in hickory and other woody plants.
Singh, Uma M.; Chandra, Muktesh; Shankhdhar, Shailesh C.; Kumar, Anil
2014-01-01
Background In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. Principal Finding In this study, transcriptome sequencing of spike tissues of two finger millet genotypes differing in grain calcium content was performed for the first time. Of 109,218 contigs in GP-1 (low-Ca genotype), 78 were identified as calcium sensor genes, as were 76 of 120,130 contigs in GP-45 (high-Ca genotype). Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz., CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Of the 82 genes, 12 were found to be divergent from the rice orthologs. Differential expression analysis based on FPKM values identified 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, selected calcium sensor responder genes were validated by qPCR in developing spikes of both genotypes grown on different concentrations of exogenous calcium. Conclusion Through de novo transcriptome assembly and analysis, we report the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding of the molecular basis of calcium accumulation and the development of calcium-biofortified crops. Moreover, this study supports the view that identification and characterization of gene families through Illumina paired-end sequencing is a potential tool for generating genomic information on gene families in non-model species. PMID:25157851
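As a rough illustration of the FPKM-based comparison mentioned in the abstract above, the sketch below computes FPKM from fragment counts and a log2 ratio between two genotypes. The function names, the pseudocount, and the example numbers are assumptions made for illustration only; the abstract does not state the thresholds or software used in the study.

```python
import math


def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length_bp and total_fragments must be positive")
    return fragments * 1e9 / (length_bp * total_fragments)


def log2_ratio(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2(A/B) with a pseudocount (an assumed choice) to avoid division by zero."""
    return math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))


# Hypothetical contig: 500 fragments on a 2,000 bp contig in a library of
# 10 million mapped fragments.
example = fpkm(500, 2000, 10_000_000)
```

A positive `log2_ratio` indicates higher expression in the first genotype; any cut-off applied to it would be a separate analysis choice.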
Ayyappan, Vasudevan; Kalavacharla, Venu; Thimmapuram, Jyothi; Bhide, Ketaki P.; Sripathi, Venkateswara R.; Smolinski, Tomasz G.; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce
2015-01-01
Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress. PMID:26167691
Liu, Juanxu; Li, Jingyu; Wang, Huinan; Fu, Zhaodi; Liu, Juan; Yu, Yixun
2011-01-01
Ethylene-responsive element-binding factor (ERF) genes constitute one of the largest transcription factor gene families in plants. In Arabidopsis and rice, only a few ERF genes have been characterized so far. Flower senescence is associated with increased ethylene production in many flowers. However, the characterization of ERF genes in flower senescence has not been reported. In this study, 13 ERF cDNAs were cloned from petunia. Based on the sequence characterization, these PhERFs could be classified into four of the 12 known ERF families. Their predicted amino acid sequences exhibited similarities to ERFs from other plant species. Expression analyses of PhERF mRNAs were performed in corollas and gynoecia of petunia flower. The 13 PhERF genes displayed differential expression patterns and levels during natural flower senescence. Exogenous ethylene accelerates the transcription of the various PhERF genes, and silver thiosulphate (STS) decreased the transcription of several PhERF genes in corollas and gynoecia. PhERF genes of group VII showed a strong association with the rise in ethylene production in both petals and gynoecia, and might be associated particularly with flower senescence in petunia. The effect of sugar, methyl jasmonate, and the plant hormones abscisic acid, salicylic acid, and 6-benzyladenine in regulating the different PhERF transcripts was investigated. Functional nuclear localization signal analyses of two PhERF proteins (PhERF2 and PhERF3) were carried out using fluorescence microscopy. These results supported a role for petunia PhERF genes in transcriptional regulation of petunia flower senescence processes.
Sample entropy analysis of cervical neoplasia gene-expression signatures
Botting, Shaleen K; Trzeciakowski, Jerome P; Benoit, Michelle F; Salama, Salama A; Diaz-Arrastia, Concepcion R
2009-01-01
Background We introduce Approximate Entropy as a mathematical method of analysis for microarray data. Approximate entropy is applied here as a method to classify the complex gene expression patterns resulting from a clinical sample set. Since entropy is a measure of disorder in a system, we believe that by choosing genes which display minimum entropy in normal controls and maximum entropy in the cancerous sample set we will be able to distinguish those genes which display the greatest variability in the cancerous set. Here we describe a method of utilizing Approximate Sample Entropy (ApSE) analysis to identify genes of interest with the highest probability of producing an accurate, predictive classification model from our data set. Results In the development of a diagnostic gene-expression profile for cervical intraepithelial neoplasia (CIN) and squamous cell carcinoma of the cervix, we identified 208 genes which are unchanging in all normal tissue samples, yet exhibit a random pattern indicative of the genetic instability and heterogeneity of malignant cells. This may be measured in terms of the ApSE when compared to normal tissue. We have validated 10 of these genes on 10 normal and 20 cancer and CIN3 samples. We report that the predictive value of the sample entropy calculation for these 10 genes of interest is promising (75% sensitivity, 80% specificity for prediction of cervical cancer over CIN3). Conclusion The success of the Approximate Sample Entropy approach in discerning alterations in complexity from a biological system with such a relatively small sample set, and in extracting biologically relevant genes of interest, holds great promise. PMID:19232110
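The abstract above does not spell out how ApSE is computed, so the following is only a minimal sketch of the commonly used sample-entropy definition, SampEn(m, r) = -ln(A/B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance, self-matches excluded) and A counts the same for length m + 1. The function name, the default parameters, and the use of an absolute tolerance are assumptions, not details from the study.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy of a numeric sequence; r is an absolute tolerance here."""
    n = len(series)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for a in range(len(templates)):
            for b in range(a + 1, len(templates)):
                dist = max(abs(x - y) for x, y in zip(templates[a], templates[b]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        # No matching templates: the estimate is undefined, reported as infinity.
        return math.inf
    return -math.log(a / b)
```

Under this definition a gene whose values barely change across samples gives an entropy near zero, while an erratic pattern gives a larger value, which matches the selection logic described in the abstract (low entropy in normal controls, high entropy in the cancerous set).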
Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.
Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge
2016-01-01
The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation of pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY genes were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed from WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768
Calla, Bernarda; Blahut-Beatty, Laureen; Koziol, Lisa; Zhang, Yunfang; Neece, David J; Carbajulca, Doris; Garcia, Alexandre; Simmonds, Daina H; Clough, Steven J
2014-08-01
Oxalate oxidases (OxO) catalyse the degradation of oxalic acid (OA). Highly resistant transgenic soybean carrying an OxO gene and its susceptible parent soybean line, AC Colibri, were tested for genome-wide gene expression in response to the necrotrophic, OA-producing pathogen Sclerotinia sclerotiorum using soybean cDNA microarrays. The genes with changed expression at statistically significant levels (overall F-test P-value cut-off of 0.0001) were classified into functional categories and pathways, and were analysed to evaluate the differences in transcriptome profiles. Although many genes and pathways were found to be similarly activated or repressed in both genotypes after inoculation with S. sclerotiorum, the OxO genotype displayed a measurably faster induction of basal defence responses, as observed by the differential changes in defence-related and secondary metabolite genes compared with its susceptible parent AC Colibri. In addition, the experiment presented provides data on several other transcripts that support the hypothesis that S. sclerotiorum at least partially elicits the hypersensitive response, induces lignin synthesis (cinnamoyl CoA reductase) and elicits as yet unstudied signalling pathways (G-protein-coupled receptor and related). Of the nine genes showing the most extreme opposite directions of expression between genotypes, eight were related to photosynthesis and/or oxidation, highlighting the importance of redox in the control of this pathogen. © 2014 BSPP AND JOHN WILEY & SONS LTD.
Dt2 Is a Gain-of-Function MADS-Domain Factor Gene That Specifies Semideterminacy in Soybean
Ping, Jieqing; Liu, Yunfeng; Sun, Lianjun; Zhao, Meixia; Li, Yinghui; She, Maoyun; Sui, Yi; Lin, Feng; Liu, Xiaodong; Tang, Zongxiang; Nguyen, Hanh; Tian, Zhixi; Qiu, Lijuan; Nelson, Randall L.; Clemente, Thomas E.; Specht, James E.; Ma, Jianxin
2014-01-01
Similar to Arabidopsis thaliana, the wild soybeans (Glycine soja) and many cultivars exhibit indeterminate stem growth specified by the shoot identity gene Dt1, the functional counterpart of Arabidopsis TERMINAL FLOWER1 (TFL1). Mutations in TFL1 and Dt1 both result in the shoot apical meristem (SAM) switching from vegetative to reproductive state to initiate terminal flowering and thus produce determinate stems. A second soybean gene (Dt2) regulating stem growth was identified, which, in the presence of Dt1, produces semideterminate plants with terminal racemes similar to those observed in determinate plants. Here, we report positional cloning and characterization of Dt2, a dominant MADS domain factor gene classified into the APETALA1/SQUAMOSA (AP1/SQUA) subfamily that includes floral meristem (FM) identity genes AP1, FUL, and CAL in Arabidopsis. Unlike AP1, whose expression is limited to FMs in which the expression of TFL1 is repressed, Dt2 appears to repress the expression of Dt1 in the SAMs to promote early conversion of the SAMs into reproductive inflorescences. Given that Dt2 is not the gene most closely related to AP1 and that semideterminacy is rarely seen in wild soybeans, Dt2 appears to be a recent gain-of-function mutation, which has modified the genetic pathways determining the stem growth habit in soybean. PMID:25005919
Yang, Qinsong; Niu, Qingfeng; Li, Jianzhao; Zheng, Xiaoyan; Ma, Yunjing; Bai, Songling; Teng, Yuanwen
2018-06-01
Homeodomain-leucine zipper (HD-Zip) proteins, which form one of the largest and most diverse families, regulate many biological processes in plants, including differentiation, flowering, vascular development, and stress signaling. Abscisic acid (ABA) has been shown to be one of the key regulators of bud dormancy and to influence the expression of several HD-Zip genes. However, the role of HD-Zip genes in regulating bud dormancy remains unclear. We identified 47 pear (P. pyrifolia White Pear Group) HD-Zip genes, which were classified into four subfamilies (HD-Zip I-IV). We further revealed that the expression levels of some HD-Zip members were closely related to ABA concentrations in flower buds during the dormancy transition. Exogenous ABA treatment confirmed that PpHB22 and several other HD-Zip genes responded to ABA. Yeast one-hybrid and dual luciferase assay results, combined with subcellular localization, showed that PpHB22 was present in the nucleus and directly induced PpDAM1 (dormancy associated MADS-box 1) expression. Thus, PpHB22 is a negative regulator of plant growth associated with the ABA response pathway and functions upstream of PpDAM1. These findings enrich our understanding of the function of HD-Zip genes related to the bud dormancy transition. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Mao, Ke; Dong, Qinglong; Li, Chao; Liu, Changhai; Ma, Fengwang
2017-01-01
The bHLH (basic helix-loop-helix) transcription factor family is the second largest in plants. It occurs in all three eukaryotic kingdoms, and plays important roles in regulating growth and development. However, family members have not previously been studied in apple. Here, we identified 188 MdbHLH proteins in apple “Golden Delicious” (Malus × domestica Borkh.), which could be classified into 18 groups. We also investigated the gene structures and 12 conserved motifs in these MdbHLHs. Coupled with expression analysis and protein interaction network prediction, we identified several genes that might be responsible for abiotic stress responses. This study provides insight and rich resources for subsequent investigations of such proteins in apple. PMID:28443104
He, Chunmei; Teixeira da Silva, Jaime A; Tan, Jianwen; Zhang, Jianxia; Pan, Xiaoping; Li, Mingzhi; Luo, Jianping; Duan, Jun
2017-08-23
The WRKY family, one of the largest families of transcription factors, plays important roles in the regulation of various biological processes, including growth, development and stress responses in plants. In the present study, 63 DoWRKY genes were identified from the Dendrobium officinale genome. These were classified into groups I, II, III and a non-group, each with 14, 28, 10 and 11 members, respectively. ABA-responsive, sulfur-responsive and low temperature-responsive elements were identified in the 1-k upstream regulatory region of DoWRKY genes. Subsequently, the expression of the 63 DoWRKY genes under cold stress was assessed, and the expression profiles of a large number of these genes were regulated by low temperature in roots and stems. To further understand the regulatory mechanism of DoWRKY genes in biological processes, potential WRKY target genes were investigated. Among them, most stress-related genes contained multiple W-box elements in their promoters. In addition, the genes involved in polysaccharide synthesis and hydrolysis contained W-box elements in their 1-k upstream regulatory regions, suggesting that DoWRKY genes may play a role in polysaccharide metabolism. These results provide a basis for investigating the function of WRKY genes and help to understand the downstream regulation network in plants within the Orchidaceae.
Hu, Wei; Wang, Lianzhe; Tie, Weiwei; Yan, Yan; Ding, Zehong; Liu, Juhua; Li, Meiying; Peng, Ming; Xu, Biyu; Jin, Zhiqiang
2016-01-01
The basic leucine zipper (bZIP) transcription factors play important roles in multiple biological processes. However, little information is available regarding the bZIP family in the important fruit crop banana. In this study, 121 bZIP transcription factor genes were identified in the banana genome. Phylogenetic analysis showed that MabZIPs were classified into 11 subfamilies. The majority of MabZIP genes in the same subfamily shared similar gene structures and conserved motifs. The comprehensive transcriptome analysis of two banana genotypes revealed the differential expression patterns of MabZIP genes in different organs, in various stages of fruit development and ripening, and in responses to abiotic stresses, including drought, cold, and salt. Interaction networks and co-expression assays showed that group A MabZIP-mediated networks participated in various stress signaling pathways, which were strongly activated in Musa ABB Pisang Awak. This study provided new insights into the complicated transcriptional control of MabZIP genes and provided robust tissue-specific, development-dependent, and abiotic stress-responsive candidate MabZIP genes for potential applications in the genetic improvement of banana cultivars. PMID:27445085
Gene Expression Profiling of Acute Lymphoblastic Leukemia in Children with Very Early Relapse.
Núñez-Enríquez, Juan Carlos; Bárcenas-López, Diego Alberto; Hidalgo-Miranda, Alfredo; Jiménez-Hernández, Elva; Bekker-Méndez, Vilma Carolina; Flores-Lujano, Janet; Solis-Labastida, Karina Anastacia; Martínez-Morales, Gabriela Bibiana; Sánchez-Muñoz, Fausto; Espinoza-Hernández, Laura Eugenia; Velázquez-Aviña, Martha Margarita; Merino-Pasaye, Laura Elizabeth; García Velázquez, Alejandra Jimena; Pérez-Saldívar, María Luisa; Mojica-Espinoza, Raúl; Ramírez-Bello, Julián; Jiménez-Morales, Silvia; Mejía-Aranguré, Juan Manuel
2016-11-01
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer worldwide. Mexican patients have high mortality rates and a low frequency of good-prognosis biomarkers (i.e., ETV6-RUNX1), and a high proportion are classified at the time of diagnosis as being at high risk of relapse according to clinical features. In addition, very early relapses are more frequently observed than in other populations. The aim of the study was to identify new potential biomarkers associated with very early relapse in Mexican children with ALL through transcriptome analysis. Microarray gene expression profiling was performed on bone marrow samples of 54 pediatric ALL patients, collected at the time of diagnosis and/or at relapse. Eleven patients presented relapse within the first 18 months after diagnosis. The Affymetrix Human Transcriptome Array 2.0 (HTA 2.0) was used to perform gene expression analysis. Annotation and functional enrichment analyses were carried out using Gene Ontology, KEGG pathway analysis and Ingenuity Pathway Analysis tools. BLVRB, ZCCHC7, PAX5, EBF1, TMOD1 and BLNK were differentially expressed (fold-change >2.0 and p value <0.01) between relapsed and non-relapsed patients. Functional analysis of abnormally expressed genes revealed their important role in cellular processes related to the development of hematological diseases, cancer, cell death and survival, and in cell-to-cell signaling and interaction. Our data support previous findings showing the relevance of PAX5, EBF1 and ZCCHC7 as potential biomarkers to identify a subgroup of children with ALL at high risk of relapse. Copyright © 2016 IMSS. Published by Elsevier Inc. All rights reserved.
Mutch, David M; Pers, Tune H; Temanni, M Ramzi; Pelloux, Veronique; Marquez-Quiñones, Adriana; Holst, Claus; Martinez, J Alfredo; Babalis, Dimitris; van Baak, Marleen A; Handjieva-Darlenska, Teodora; Walker, Celia G; Astrup, Arne; Saris, Wim H M; Langin, Dominique; Viguerie, Nathalie; Zucker, Jean-Daniel; Clément, Karine
2011-12-01
Weight loss has been shown to reduce risk factors associated with cardiovascular disease and diabetes; however, successful maintenance of weight loss continues to pose a challenge. The present study was designed to assess whether changes in subcutaneous adipose tissue (scAT) gene expression during a low-calorie diet (LCD) could be used to differentiate and predict subjects who experience successful short-term weight maintenance from subjects who experience weight regain. Forty white women followed a dietary protocol consisting of an 8-wk LCD phase followed by a 6-mo weight-maintenance phase. Participants were classified as weight maintainers (WMs; 0-10% weight regain) and weight regainers (WRs; 50-100% weight regain) by considering changes in body weight during the 2 phases. Anthropometric measurements, bioclinical variables, and scAT gene expression were studied in all individuals before and after the LCD. Energy intake was estimated by using 3-d dietary records. No differences in body weight and fasting insulin were observed between WMs and WRs at baseline or after the LCD period. The LCD resulted in significant decreases in body weight and in several plasma variables in both groups. WMs experienced a significant reduction in insulin secretion in response to an oral-glucose-tolerance test after the LCD; in contrast, no changes in insulin secretion were observed in WRs after the LCD. An ANOVA of scAT gene expression showed that genes regulating fatty acid metabolism, citric acid cycle, oxidative phosphorylation, and apoptosis were regulated differently by the LCD in WM and WR subjects. This study suggests that LCD-induced changes in insulin secretion and scAT gene expression may have the potential to predict successful short-term weight maintenance. This trial was registered at clinicaltrials.gov as NCT00390637.
Christie, Lyndsay; van Aerle, Ronny; Paley, Richard K; Verner-Jeffreys, David W; Tidbury, Hannah; Green, Matthew; Feist, Stephen W; Cano, Irene
2018-07-01
Puffy skin disease (PSD) is an emerging skin condition which affects rainbow trout, Oncorhynchus mykiss (Walbaum). The transmission pattern of PSD suggests an infectious aetiology; however, the actual causative infectious agent(s) remain(s) unknown. In the present study, the rainbow trout epidermal immune response to PSD was characterised. Skin samples from infected fish were analysed and classified as mild, moderate or severe PSD by gross pathology and histological assessment. The level of expression of 26 immune-associated genes, including cytokines, immunoglobulins and cell markers, was examined by TaqMan qPCR assays. A significant up-regulation of the gene expression of C3, lysozyme, IL-1β and T-bet and down-regulation of TGFβ and TLR3 was observed in PSD fish compared to control fish. MHCI gene expression was up-regulated only in severe PSD lesions. Histological examinations of the epidermis showed a significant increase in the number of eosinophil cells and dendritic melanocytes in PSD fish. In severe lesions, mild diffuse lymphocyte infiltration was observed. IgT and CD8 positive cells were detected locally in the skin of PSD fish by in situ hybridisation (ISH); however, the gene expression of those genes was not different from control fish. Total IgM in serum of diseased animals, measured by a sandwich ELISA, was not different from control fish, nor was significant up-regulation of IgM gene expression in PSD lesions observed. Taken together, these results show activation of the complement pathway, up-regulation of a Th17 type response and eosinophilia during PSD. This is typical of a response to extracellular pathogens (i.e. bacteria and parasites) and allergens, commonly associated with acute dermatitis. Copyright © 2018. Published by Elsevier Ltd.
Molecular Subtypes of Glioblastoma Are Relevant to Lower Grade Glioma
Sloan, Andrew E.; Chen, Yanwen; Brat, Daniel J.; O’Neill, Brian Patrick; de Groot, John; Yust-Katz, Shlomit; Yung, Wai-Kwan Alfred; Cohen, Mark L.; Aldape, Kenneth D.; Rosenfeld, Steven; Verhaak, Roeland G. W.; Barnholtz-Sloan, Jill S.
2014-01-01
Background Gliomas are the most common primary malignant brain tumors in adults with great heterogeneity in histopathology and clinical course. The intent was to evaluate the relevance of known glioblastoma (GBM) expression and methylation based subtypes to grade II and III gliomas (i.e. lower grade gliomas). Methods Gene expression array, single nucleotide polymorphism (SNP) array and clinical data were obtained for 228 GBMs and 176 grade II/III gliomas (GII/III) from the publicly available Rembrandt dataset. Two additional datasets with IDH1 mutation status were utilized as validation datasets (one publicly available dataset and one newly generated dataset from MD Anderson). Unsupervised clustering was performed and compared to gene expression subtypes assigned using the Verhaak et al 840-gene classifier. The glioma-CpG Island Methylator Phenotype (G-CIMP) was assigned using prediction models by Fine et al. Results Unsupervised clustering by gene expression aligned with the Verhaak 840-gene subtype group assignments. GII/IIIs were preferentially assigned to the proneural subtype with IDH1 mutation and G-CIMP. GBMs were evenly distributed among the four subtypes. Proneural, IDH1 mutant, G-CIMP GII/IIIs had significantly better survival than other molecular subtypes. Only 6% of GBMs were proneural and had either IDH1 mutation or G-CIMP but these tumors had significantly better survival than other GBMs. Copy number changes in chromosomes 1p and 19q were associated with GII/IIIs, while these changes in CDKN2A, PTEN and EGFR were more commonly associated with GBMs. Conclusions GBM gene-expression and methylation based subtypes are relevant for GII/IIIs and associate with overall survival differences. A better understanding of the association between these subtypes and GII/IIIs could further knowledge regarding prognosis and mechanisms of glioma progression. PMID:24614622
microRNA-133: expression, function and therapeutic potential in muscle diseases and cancer.
Yu, Hao; Lu, Yinhui; Li, Zhaofa; Wang, Qizhao
2014-01-01
microRNAs (miRNAs) are a class of small non-coding RNAs that are 18-25 nucleotides (nt) in length and negatively regulate gene expression post-transcriptionally. miRNAs are known to mediate myriad processes and pathways. While many miRNAs are expressed ubiquitously, some are expressed in a tissue specific manner. miR-133 is one of the most studied and best characterized miRNAs to date. Specifically expressed in muscles, it has been classified as myomiRNAs and is necessary for proper skeletal and cardiac muscle development and function. Genes encoding miR-133 (miR-133a-1, miR-133a-2 and miR-133b) are transcribed as bicistronic transcripts together with miR-1-2, miR-1-1, and miR-206, respectively. However, they exhibit opposing impacts on muscle development. miR-133 gets involved in muscle development by targeting a lot of genes, including SFR, HDAC4, cyclin D2 and so on. Its aberrant expression has been linked to many diseases in skeletal muscle and cardiac muscle such as cardiac hypertrophy, muscular dystrophy, heart failure, cardiac arrhythmia. Beyond the study in muscle, miR-133 has been implicated in cancer and identified as a key factor in cancer development, including bladder cancer, prostate cancer and so on. Much more attention has been drawn to the versatile molecular functions of miR-133, making it a truly valuable therapeutic gene in miRNA-based gene therapy. In this review, we identified and summarized the results of studies of miR-133 with emphasis on its function in human diseases in muscle and cancer, and highlighted its therapeutic value. It might provide researchers a new insight into the biological significance of miR-133.
Aboushousha, Tarek; Mamdouh, Samah; Hamdy, Hussam; Helal, Noha; Khorshed, Fatma; Safwat, Gehan; Seleem, Mohamed
2018-01-01
Objective: To investigate the expression of TTF-1, RAGE, GLUT1 and SOX2 in HCV-associated HCCs and in surrounding non-tumorous liver tissue. Material and Methods: Tissue material from partial hepatectomy cases for HCC along with corresponding serum samples and 30 control serum samples from healthy volunteers were studied. Biopsies were classified into: non-tumor hepatic tissue (36 sections); HCC (33 sections) and liver cell dysplasia (LCD) (15 sections). All cases were positive for HCV. Immunohistochemistry (IHC), gene extraction and quantitative real-time reverse-transcription assays (qRT-PCR) were applied. Results: By IHC, LCD and HCC showed significantly high percentages of positive cases with all markers. SOX2 showed significant increase with higher HCC grades, while RAGE demonstrated an inverse relation and GLUT-1 and TTF-1 lacked any correlation. In nontumorous-HCV tissue, we found significantly high TTF-1, low RAGE and negative SOX2 expression. RAGE, GLUT-1 and SOX2 show non-significant elevation positivity in high grade HCV compared to low grade lesions. TTF-1, RAGE and SOX2 exhibited low expression in cirrhosis compared to fibrosis. Biochemical studies on serum and tissue extracts revealed significant down-regulation of RAGE, GLUT-1 and SOX2 genes, as well as significant up-regulation of the TTF-1 gene in HCC cases compared to controls. All studied genes show significant correlation with HCC grade. In non-tumor tissue, only TTF-1 gene expression had a significant correlation with the fibrosis score. Conclusion: Higher expression of TTF-1, RAGE, GLUT-1 and SOX2 in HCC and dysplasia compared to non-tumor tissues indicates up-regulation of these markers as early events during the development of HCV-associated HCC. PMID:29373917
Functional and evolution characterization of SWEET sugar transporters in Ananas comosus.
Guo, Chengying; Li, Huayang; Xia, Xinyao; Liu, Xiuyuan; Yang, Long
2018-02-05
Sugars will eventually be exported transporters (SWEETs) are a group of recently identified sugar transporters in plants that play important roles in diverse physiological processes. However, currently, limited information about this gene family is available in pineapple (Ananas comosus). The availability of the recently released pineapple genome sequence provides the opportunity to identify SWEET genes in a Bromeliaceae family member at the genome level. In this study, 39 pineapple SWEET genes were identified in two pineapple cultivars (18 AnfSWEET and 21 AnmSWEET) and further phylogenetically classified into five clades. A phylogenetic analysis revealed distinct evolutionary paths for the SWEET genes of the two pineapple cultivars. The MD2 cultivar might have experienced a different expansion than the F153 cultivar because two additional duplications exist, which separately gave rise to clades III and IV. A gene exon/intron structure analysis showed that the pineapple SWEET genes contained highly conserved exon/intron numbers. An analysis of public RNA-seq data and expression profiling showed that SWEET genes may be involved in fruit development and ripening processes. AnmSWEET5 and AnmSWEET11 were highly expressed in the early stages of pineapple fruit development and then decreased. The study increases the understanding of the roles of SWEET genes in pineapple. Copyright © 2018 Elsevier Inc. All rights reserved.
MicroRNA signature of the human developing pancreas.
Rosero, Samuel; Bravo-Egana, Valia; Jiang, Zhijie; Khuri, Sawsan; Tsinoremas, Nicholas; Klein, Dagmar; Sabates, Eduardo; Correa-Medina, Mayrin; Ricordi, Camillo; Domínguez-Bendala, Juan; Diez, Juan; Pastori, Ricardo L
2010-09-22
MicroRNAs are non-coding RNAs that regulate gene expression including differentiation and development by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in genome-wide RNA microarray study. The microRNA signature of human fetal pancreatic samples 10-22 weeks of gestational age (wga), was obtained by PCR-based high throughput screening with Taqman Low Density Arrays. This method led to identification of 212 microRNAs. The microRNAs were classified in 3 groups: Group number I contains 4 microRNAs with the increasing profile; II, 35 microRNAs with decreasing profile and III with 173 microRNAs, which remain unchanged. We calculated Pearson correlations between the expression profile of microRNAs and target mRNAs, predicted by TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are the genes and transcription factors known to play an essential role in pancreatic development. We have determined specific groups of microRNAs in human fetal pancreas that change the degree of their expression throughout the development. A negative correlative analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other. This study provides information leading to potential two-way level of combinatorial control regulating gene expression through microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs as well. 
This study may further the understanding of gene expression regulation in the human developing pancreas.
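The correlation step described in this abstract can be illustrated with a minimal sketch. It computes a Pearson coefficient between one microRNA profile and one predicted target mRNA profile across developmental time points. The expression values and variable names are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values at four gestational time points.
mirna_profile = [1.0, 2.0, 3.0, 4.0]   # microRNA with an increasing profile
mrna_profile = [8.0, 6.0, 4.0, 2.0]    # predicted target with a decreasing profile

r = pearson(mirna_profile, mrna_profile)
print(f"r = {r:.2f}")  # a strongly negative r is what a repressed target would show
```

A negative coefficient of this kind is the pattern the abstract reports for Group I microRNAs and their decreasing targets; a real analysis would also need a significance threshold and multiple-testing control, which this sketch omits.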
MicroRNA signature of the human developing pancreas
2010-01-01
Background MicroRNAs are non-coding RNAs that regulate gene expression including differentiation and development by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in genome-wide RNA microarray study. Results The microRNA signature of human fetal pancreatic samples 10-22 weeks of gestational age (wga), was obtained by PCR-based high throughput screening with Taqman Low Density Arrays. This method led to identification of 212 microRNAs. The microRNAs were classified in 3 groups: Group number I contains 4 microRNAs with the increasing profile; II, 35 microRNAs with decreasing profile and III with 173 microRNAs, which remain unchanged. We calculated Pearson correlations between the expression profile of microRNAs and target mRNAs, predicted by TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are the genes and transcription factors known to play an essential role in pancreatic development. Conclusions We have determined specific groups of microRNAs in human fetal pancreas that change the degree of their expression throughout the development. A negative correlative analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other. 
This study provides information leading to potential two-way level of combinatorial control regulating gene expression through microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs as well. This study may further the understanding of gene expression regulation in the human developing pancreas. PMID:20860821
Jiang, Xiangying; Ringwald, Martin; Blake, Judith; Shatkay, Hagit
2017-01-01
The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD while the other half as irrelevant to GXD. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification. Moreover, using information obtained from image captions clearly improves performance, compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. www.informatics.jax.org. © The Author(s) 2017. Published by Oxford University Press.
Classification based upon gene expression data: bias and precision of error rates.
Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L
2007-06-01
Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
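The recommendation to report class error rates and a conditional risk can be made concrete with a small sketch. It takes a confusion matrix, assumed prior class probabilities and an assumed misclassification cost matrix, and computes per-class error rates and the expected cost. All numbers below are hypothetical, and the function names are illustrative rather than taken from the authors' R code.

```python
def class_error_rates(confusion):
    """Per-class error rates; confusion[i][j] counts class-i samples predicted as j."""
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        rates.append((total - row[i]) / total)
    return rates

def conditional_risk(confusion, priors, costs):
    """Expected cost: sum_i prior_i * sum_j P(predict j | true i) * cost[i][j]."""
    risk = 0.0
    for i, row in enumerate(confusion):
        total = sum(row)
        for j, count in enumerate(row):
            risk += priors[i] * (count / total) * costs[i][j]
    return risk

# Hypothetical two-class example (row 0: non-diseased, row 1: diseased).
confusion = [[40, 10],
             [5, 45]]
priors = [0.9, 0.1]          # assumed population class probabilities
costs = [[0, 1],             # cost of calling a non-diseased sample diseased
         [5, 0]]             # assumed higher cost of missing a diseased sample

rates = class_error_rates(confusion)
risk = conditional_risk(confusion, priors, costs)
print(rates, risk)
```

Here the simple overall error rate is 15/100, while the per-class rates and the risk weight the two kinds of mistake differently, which is the distinction the abstract draws.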
Molecular Mechanisms Related to Parturition-Induced Stress Urinary Incontinence
Lin, Guiting; Shindel, Alan W.; Banie, Lia; Deng, Donna; Wang, Guifang; Hayashi, Narihiko; Lin, Ching-Shwun; Lue, Tom F.
2010-01-01
Background The molecular mechanisms underlying stress urinary incontinence (SUI) at the tissue level are poorly understood. Objective To study genetic and molecular alterations in the urethra of animals with experimentally induced SUI. Design/Setting/Participants Cohort analysis of primiparous 2-month-old female Sprague-Dawley rats with experimentally induced SUI versus those who did not develop SUI in a university research laboratory setting Intervention Within 24 h of parturition, rats underwent intravaginal balloon dilation and bilateral ovariectomy. Transvesical cystometry was performed 12 wk after parturition. Rats were classified as continent (C) or incontinent (I) according to the results of cystometry. Measurements The expression of over 22,000 genes in urethral tissue from the two groups was assessed with the use of an oligo microarray. The expression of relevant genes was confirmed by real-time polymerase chain reaction. Protein expression of small mothers against decapentaplegic 2 (Smad2), one of the differentially expressed genes, was extensively studied by immunohistochemistry and Western blot analysis. Regulation of Smad2 activity by transforming growth factor-β (Tgf-β) was assessed in cultured urethral smooth muscle cells (USMCs). Results & Limitations After intervention, 14 (58.3%) rats remained continent and 10 (41.7%) became incontinent. There were significant differences in the expression of 42 urethral genes between continent and incontinent rats. The expression of genes involved in the TGF cellular signaling pathway (Smad2), collagen breakdown (matrix metalloproteinase 13 [Mmp13]), and smooth muscle inhibition (regulator of G-protein signaling 2 [Rgs2]) was significantly increased in the incontinent group. Smad2 protein expression was significantly upregulated in the incontinent rats. In cultured USMCs, Smad2 phosphorylation and nuclear translocation increased after Tgf-β treatment. 
Conclusions Genes important in inflammation, collagen breakdown, and smooth muscle inhibition are upregulated in the urethras of female rats with parturition-associated incontinence. PMID:18372098
Cochain, Clément; Vafadarnejad, Ehsan; Arampatzi, Panagiota; Jaroslav, Pelisek; Winkels, Holger; Ley, Klaus; Wolf, Dennis; Saliba, Antoine-Emmanuel; Zernecke, Alma
2018-03-15
Rationale: It is assumed that atherosclerotic arteries contain several macrophage subsets endowed with specific functions. The precise identity of these subsets is poorly characterized as they have been defined by the expression of a restricted number of markers. Objective: We have applied single-cell RNA-seq as an unbiased profiling strategy to interrogate and classify aortic macrophage heterogeneity at the single-cell level in atherosclerosis. Methods and Results: We performed single-cell RNA sequencing of total aortic CD45+ cells extracted from the non-diseased (chow fed) and atherosclerotic (11 weeks of high fat diet) aorta of Ldlr-/- mice. Unsupervised clustering singled out 13 distinct aortic cell clusters. Among the myeloid cell populations, Resident-like macrophages with a gene expression profile similar to aortic resident macrophages were found in healthy and diseased aortae, whereas monocytes, monocyte-derived dendritic cells (MoDC), and two populations of macrophages were almost exclusively detectable in atherosclerotic aortae, comprising Inflammatory macrophages showing enrichment in Il1b, and previously undescribed TREM2hi macrophages. Differential gene expression and gene ontology enrichment analyses revealed specific gene expression patterns distinguishing these three macrophage subsets and MoDC, and uncovered putative functions of each cell type. Notably, TREM2hi macrophages appeared to be endowed with specialized functions in lipid metabolism and catabolism, and presented a gene expression signature reminiscent of osteoclasts, suggesting a role in lesion calcification. TREM2 expression was moreover detected in human lesional macrophages. Importantly, these macrophage populations were present also in advanced atherosclerosis and in Apoe-/- aortae, indicating relevance of our findings in different stages of atherosclerosis and mouse models. 
Conclusions: These data unprecedentedly uncovered the transcriptional landscape and phenotypic heterogeneity of aortic macrophages and MoDCs in atherosclerosis and identified previously unrecognized macrophage populations and their gene expression signature, suggesting specialized functions. Our findings will open up novel opportunities to explore distinct myeloid cell populations and their functions in atherosclerosis.
Sheng, Sheng; Liao, Cheng-Wu; Zheng, Yu; Zhou, Yu; Xu, Yan; Song, Wen-Miao; He, Peng; Zhang, Jian; Wu, Fu-An
2017-06-01
Meteorus pulchricornis is an endoparasitoid wasp which attacks the larvae of various lepidopteran pests. We present the first antennal transcriptome dataset for M. pulchricornis. A total of 48,845,072 clean reads were obtained and 34,967 unigenes were assembled. Of these, 15,458 unigenes showed a significant similarity (E-value < 10^-5) to known proteins in the NCBI non-redundant protein database. Gene ontology (GO) and cluster of orthologous groups (COG) analyses were used to classify the functions of M. pulchricornis antennae genes. We identified 16 putative odorant-binding protein (OBP) genes, eight chemosensory protein (CSP) genes, 99 olfactory receptor (OR) genes, 19 ionotropic receptor (IR) genes and one sensory neuron membrane protein (SNMP) gene. BLASTx best hit results and phylogenetic analysis both indicated that these chemosensory genes were most closely related to those found in other hymenopteran species. Real-time quantitative PCR assays showed that 14 MpulOBP genes were antennae-specific. Of these, MpulOBP6, MpulOBP9, MpulOBP10, MpulOBP12, MpulOBP15 and MpulOBP16 were found to have greater expression in the antennae than in other body parts, while MpulOBP2 and MpulOBP3 were expressed predominately in the legs and abdomens, respectively. These results might provide a foundation for future studies of olfactory genes and chemoreception in M. pulchricornis. Copyright © 2017 Elsevier Inc. All rights reserved.
Biologic Phenotyping of the Human Small Airway Epithelial Response to Cigarette Smoking
Tilley, Ann E.; O'Connor, Timothy P.; Hackett, Neil R.; Strulovici-Barel, Yael; Salit, Jacqueline; Amoroso, Nancy; Zhou, Xi Kathy; Raman, Tina; Omberg, Larsson; Clark, Andrew; Mezey, Jason; Crystal, Ronald G.
2011-01-01
Background The first changes associated with smoking are in the small airway epithelium (SAE). Given that smoking alters SAE gene expression, but only a fraction of smokers develop chronic obstructive pulmonary disease (COPD), we hypothesized that assessment of SAE genome-wide gene expression would permit biologic phenotyping of the smoking response, and that a subset of healthy smokers would have a “COPD-like” SAE transcriptome. Methodology/Principal Findings SAE (10th–12th generation) was obtained via bronchoscopy of healthy nonsmokers, healthy smokers and COPD smokers and microarray analysis was used to identify differentially expressed genes. Individual responsiveness to smoking was quantified with an index representing the % of smoking-responsive genes abnormally expressed (ISAE), with healthy smokers grouped into “high” and “low” responders based on the proportion of smoking-responsive genes up- or down-regulated in each smoker. Smokers demonstrated significant variability in SAE transcriptome with ISAE ranging from 2.9 to 51.5%. While the SAE transcriptome of “low” responder healthy smokers differed from both “high” responders and smokers with COPD, the transcriptome of the “high” responder healthy smokers was indistinguishable from COPD smokers. Conclusion/Significance The SAE transcriptome can be used to classify clinically healthy smokers into subgroups with lesser and greater responses to cigarette smoking, even though these subgroups are indistinguishable by clinical criteria. This identifies a group of smokers with a “COPD-like” SAE transcriptome. PMID:21829517
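The index described above, the percentage of smoking-responsive genes abnormally expressed in one smoker, can be sketched as follows. The abstract does not state how "abnormally expressed" was decided, so this sketch assumes a gene counts as abnormal when its value falls outside a reference range derived from nonsmokers. The gene names, ranges and values are hypothetical placeholders.

```python
def responsiveness_index(smoker_expr, normal_ranges):
    """Percent of smoking-responsive genes outside the (assumed) reference range.

    smoker_expr:   gene -> expression value for one smoker
    normal_ranges: gene -> (low, high) range assumed to come from healthy nonsmokers
    """
    abnormal = sum(
        1
        for gene, (low, high) in normal_ranges.items()
        if not (low <= smoker_expr[gene] <= high)
    )
    return 100.0 * abnormal / len(normal_ranges)

# Hypothetical reference ranges and one smoker's values.
normal_ranges = {
    "geneA": (1.0, 2.0),
    "geneB": (0.5, 1.5),
    "geneC": (3.0, 4.0),
    "geneD": (2.0, 2.5),
}
smoker_expr = {"geneA": 1.5, "geneB": 2.4, "geneC": 3.2, "geneD": 2.1}

index = responsiveness_index(smoker_expr, normal_ranges)
print(index)  # one of four genes is out of range
```

Smokers could then be split into "high" and "low" responders by a cut-off on this index; the cut-off used in the study is not given in the abstract.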
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the L1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
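The link between shrinkage and an L1 penalty can be illustrated with the soft-thresholding operator, which is the closed-form minimizer of 0.5*(d - b)^2 + lam*|b| over b and is also the shrinkage rule used by the nearest shrunken centroid method. The sketch below is a generic illustration, not the paper's penalized t/F-statistic. The statistics and the threshold are hypothetical.

```python
def soft_threshold(d, lam):
    """Shrink d toward zero by lam; values within [-lam, lam] become exactly zero."""
    if d > lam:
        return d - lam
    if d < -lam:
        return d + lam
    return 0.0

# Hypothetical per-gene standardized differences between two classes.
gene_stats = [2.5, -0.25, 0.75, -2.0]
lam = 1.0  # assumed shrinkage amount; in practice chosen by cross-validation

shrunken = [soft_threshold(d, lam) for d in gene_stats]
print(shrunken)  # [1.5, 0.0, 0.0, -1.0]
```

Genes whose statistic is shrunk to exactly zero drop out of the classifier, which is how this kind of penalty performs gene selection and limits over-fitting when p > n.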
Comparative study of classification algorithms for immunosignaturing data
2012-01-01
Background High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear as being completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear what is the optimal classification algorithm for analyzing this new type of data. Results We characterized several classification algorithms to analyze immunosignaturing data. 
We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy. Conclusions ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties. PMID:22720696
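For readers unfamiliar with the method, a naive Bayes classifier can be sketched in a few lines. The version below assumes Gaussian class-conditional distributions for each feature, treated as independent. The abstract does not say which variant was used, so this is an illustrative assumption. The peptide intensities and labels are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels, var_floor=1e-6):
    """Estimate per-class log prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n = len(samples)
    model = {}
    for y, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # var_floor keeps the variance positive when a feature is constant in a class
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + var_floor
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(len(rows) / n), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda y: log_posterior(model[y]))

# Hypothetical intensities for two peptides across four training samples.
train = [[1.0, 5.0], [1.2, 4.8], [3.0, 1.0], [3.2, 1.1]]
labels = ["control", "control", "case", "case"]

model = fit_gaussian_nb(train, labels)
pred_a = predict_gaussian_nb(model, [1.1, 4.9])
pred_b = predict_gaussian_nb(model, [3.1, 1.0])
print(pred_a, pred_b)
```

The per-feature independence assumption keeps training to a single pass over the data, which is consistent with the simplicity and speed the abstract credits to the method.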
Sojikul, Punchapat; Kongsawadworakul, Panida; Viboonjun, Unchera; Thaiprasit, Jittrawan; Intawong, Burapat; Narangajavana, Jarunya; Svasti, Mom Rajawong Jisnuson
2010-10-01
Cassava (Manihot esculenta Crantz) is a root crop that accumulates large quantities of starch, and it is an important source of carbohydrate. Studying gene expression during storage root development provides important information on storage root formation and starch accumulation and may unlock new traits for improving starch yield. cDNA-Amplified Fragment Length Polymorphism (AFLP) was used to compare gene expression profiles in fibrous and storage roots of cassava cultivar Kasetsart 50. A total of 155 differentially expressed transcript-derived fragments with undetectable or low expression in leaves were characterized and classified into 11 groups according to their functions. The four major groups were no similarity (20%), hypothetical or unknown proteins (17%), cellular metabolism and biosynthesis (17%) and cellular communication and signaling (14%). Interestingly, sulfite reductase (MeKD82), calcium-dependent protein kinase (CDPK) (MeKD83), ent-kaurene synthase (KS) (MeKD106) and hexose transporter (HT) (MeKD154) showed root-specific expression patterns. This finding is consistent with previously reported genes involved in potato tuber initiation. Semi-quantitative reverse transcription polymerase chain reaction of early-developed root samples confirmed that those four genes exhibited significant expression with a similar pattern in the storage root initiation and early developmental stages. We propose that KS and HT may be involved in transient induction of CDPK expression, which may play an important role in the signaling pathway of storage root initiation. Sulfite reductase, on the other hand, may be involved in storage root development by facilitating sulfur-containing protein biosynthesis or detoxifying the cyanogenic glucoside content through aspartate biosynthesis. Copyright © Physiologia Plantarum 2010.
The Zur regulon of Corynebacterium glutamicum ATCC 13032
2010-01-01
Background Zinc is considered an essential element for all living organisms, but it can be toxic at high concentrations. Bacteria therefore tightly regulate zinc metabolism. The Cg2502 protein of Corynebacterium glutamicum was a candidate to control zinc metabolism in this species, since it was classified as metalloregulator of the zinc uptake regulator (Zur) subgroup of the ferric uptake regulator (Fur) family of DNA-binding transcription regulators. Results The cg2502 (zur) gene was deleted in the chromosome of C. glutamicum ATCC 13032 by an allelic exchange procedure to generate the zur-deficient mutant C. glutamicum JS2502. Whole-genome DNA microarray hybridizations and real-time RT-PCR assays comparing the gene expression in C. glutamicum JS2502 with that of the wild-type strain detected 18 genes with enhanced expression in the zur mutant. The expression data were combined with results from cross-genome comparisons of shared regulatory sites, revealing the presence of candidate Zur-binding sites in the mapped promoter regions of five transcription units encoding components of potential zinc ABC-type transporters (cg0041-cg0042/cg0043; cg2911-cg2912-cg2913), a putative secreted protein (cg0040), a putative oxidoreductase (cg0795), and a putative P-loop GTPase of the COG0523 protein family (cg0794). Enhanced transcript levels of the respective genes in C. glutamicum JS2502 were verified by real-time RT-PCR, and complementation of the mutant with a wild-type zur gene reversed the effect of differential gene expression. The zinc-dependent expression of the putative cg0042 and cg2911 operons was detected in vivo with a gfp reporter system. Moreover, the zinc-dependent binding of purified Zur protein to double-stranded 40-mer oligonucleotides containing candidate Zur-binding sites was demonstrated in vitro by DNA band shift assays. 
Conclusion Whole-genome expression profiling and DNA band shift assays demonstrated that Zur directly represses in a zinc-dependent manner the expression of nine genes organized in five transcription units. Accordingly, the Zur (Cg2502) protein is the key transcription regulator for genes involved in zinc homeostasis in C. glutamicum. PMID:20055984
Intraclonal Cell Expansion and Selection Driven by B Cell Receptor in Chronic Lymphocytic Leukemia
Colombo, Monica; Cutrona, Giovanna; Reverberi, Daniele; Fabris, Sonia; Neri, Antonino; Fabbi, Marina; Quintana, Giovanni; Quarta, Giovanni; Ghiotto, Fabio; Fais, Franco; Ferrarini, Manlio
2011-01-01
The mutational status of the immunoglobulin heavy-chain variable region (IGHV) genes utilized by chronic lymphocytic leukemia (CLL) clones defines two disease subgroups. Patients with unmutated IGHV have a more aggressive disease and a worse outcome than patients with cells having somatic IGHV gene mutations. Moreover, up to 30% of the unmutated CLL clones exhibit very similar or identical B cell receptors (BcR), often encoded by the same IG genes. These “stereotyped” BcRs have been classified into defined subsets. The presence of an IGHV gene somatic mutation and the utilization of a skewed gene repertoire compared with normal B cells together with the expression of stereotyped receptors by unmutated CLL clones may indicate stimulation/selection by antigenic epitopes. This antigenic stimulation may occur prior to or during neoplastic transformation, but it is unknown whether this stimulation/selection continues after leukemogenesis has ceased. In this study, we focused on seven CLL cases with stereotyped BcR Subset #8 found among a cohort of 700 patients; in six, the cells expressed IgG and utilized IGHV4-39 and IGKV1-39/IGKV1D-39 genes, as reported for Subset #8 BcR. One case exhibited special features, including expression of IgM or IgG by different subclones consequent to an isotype switch, allelic inclusion at the IGH locus in the IgM-expressing cells and a particular pattern of cytogenetic lesions. Collectively, the data indicate a process of antigenic stimulation/selection of the fully transformed CLL cells leading to the expansion of the Subset #8 IgG-bearing subclone. PMID:21541442
Ethylene-induced differential gene expression during abscission of citrus leaves
Merelo, Paz; Cercós, Manuel; Tadeo, Francisco R.; Talón, Manuel
2008-01-01
The main objective of this work was to identify and classify genes involved in the process of leaf abscission in Clementina de Nules (Citrus clementina Hort. Ex Tan.). A 7 K unigene citrus cDNA microarray containing 12 K spots was used to characterize the transcriptome of the ethylene-induced abscission process in laminar abscission zone-enriched tissues and the petiole of debladed leaf explants. In these conditions, ethylene induced 100% leaf explant abscission in 72 h while, in air-treated samples, the abscission period started later and took 240 h. Gene expression monitored during the first 36 h of ethylene treatment showed that out of the 12 672 cDNA microarray probes, ethylene differentially induced 725 probes distributed as follows: 216 (29.8%) probes in the laminar abscission zone and 509 (70.2%) in the petiole. Functional MIPS classification and manual annotation of differentially expressed genes highlighted key processes regulating the activation and progress of the cell separation that brings about abscission. These included cell-wall modification, lipid transport, protein biosynthesis and degradation, and differential activation of signal transduction and transcription control pathways. Expression data associated with the petiole indicated the occurrence of a double defensive strategy mediated by the activation of a biochemical programme including scavenging ROS, defence and PR genes, and a physical response mostly based on lignin biosynthesis and deposition. This work identifies new genes probably involved in the onset and development of the leaf abscission process and suggests a different but co-ordinated and complementary role for the laminar abscission zone and the petiole during the process of abscission. PMID:18515267
Yoon, Dukyong; Kim, Hyosil; Suh-Kim, Haeyoung; Park, Rae Woong; Lee, KiYoung
2011-01-01
Microarray analyses based on differentially expressed genes (DEGs) have been widely used to distinguish samples across different cellular conditions. However, studies based on DEGs have not been able to clearly determine significant differences between samples of pathophysiologically similar HIV-1 stages, e.g., between acute and chronic progressive (or AIDS) or between uninfected and clinically latent stages. We here suggest a novel approach to allow such discrimination based on stage-specific genetic features of HIV-1 infection. Our approach is based on co-expression changes of genes known to interact. The method can identify a genetic signature for a single sample, in contrast to existing protein-protein-based analyses with correlational designs. Our approach distinguishes each sample using differentially co-expressed interacting protein pairs (DEPs) based on co-expression scores of individual interacting pairs within a sample. The co-expression score has a positive value if two genes in a sample are simultaneously up-regulated or down-regulated, and it has a higher absolute value if the expression-changing ratios of the two genes are similar. We compared the characteristics of DEPs with those of DEGs by evaluating their usefulness in separating HIV-1 stages. We also identified DEP-based network modules and their gene-ontology enrichment to find the HIV-1 stage-specific gene signature. Based on the DEP approach, we observed clear separation among samples from distinct HIV-1 stages using clustering and principal component analyses. Moreover, the discrimination power of DEPs on the samples (70-100% accuracy) was much higher than that of DEGs (35-45%) using several well-known classifiers. DEP-based network analysis also revealed the HIV-1 stage-specific network modules; the main biological processes were related to "translation," "RNA splicing," "mRNA, RNA, and nucleic acid transport," and "DNA metabolism." 
Through the HIV-1 stage-related modules, changing stage-specific patterns of protein interactions could be observed. The DEP-based method discriminated the HIV-1 infection stages clearly and revealed an HIV-1 stage-specific gene signature. The proposed DEP-based method might complement existing DEG-based approaches in various microarray expression analyses.
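The abstract describes the co-expression score only qualitatively, so the following is a hypothetical sketch of one score with the two stated properties: it is positive when both genes change in the same direction, and its absolute value is larger when their expression-change ratios are similar. The formula, the function names (`coexpression_score`, `sample_signature`), and the gene names and values are assumptions for illustration, not the authors' definition.

```python
def coexpression_score(log_ratio_a, log_ratio_b):
    """Hypothetical per-sample co-expression score for one interacting pair.

    Inputs are log expression-change ratios (positive = up-regulated,
    negative = down-regulated). The sign encodes agreement in direction;
    the magnitude (0 to 1) encodes how similar the two ratios are.
    """
    if log_ratio_a == 0 or log_ratio_b == 0:
        return 0.0  # No change in one gene: treat the pair as uninformative.
    same_direction = (log_ratio_a > 0) == (log_ratio_b > 0)
    sign = 1.0 if same_direction else -1.0
    small, large = sorted((abs(log_ratio_a), abs(log_ratio_b)))
    return sign * small / large


def sample_signature(log_ratios, interacting_pairs):
    """Score every known interacting pair measured in a single sample."""
    return {
        (a, b): coexpression_score(log_ratios[a], log_ratios[b])
        for a, b in interacting_pairs
        if a in log_ratios and b in log_ratios
    }


# Toy sample (invented values) and a toy interaction list.
log_ratios = {"GENE_A": 1.0, "GENE_B": 0.8, "GENE_C": -2.0, "GENE_D": -0.5}
pairs = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_C", "GENE_D")]

signature = sample_signature(log_ratios, pairs)
```

Because each score depends only on values within one sample, a signature of this kind can be computed per sample, which is the property the abstract contrasts with correlation-based designs that need many samples.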