Sample records for gene expression-based classification

  1. Gene selection for tumor classification using neighborhood rough sets and entropy measures.

    PubMed

    Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu

    2017-03-01

    With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since a gene expression data set often contains thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy measurements are usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Using this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.
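
The neighborhood idea in this abstract can be illustrated with a minimal sketch. It computes the standard neighborhood rough set dependency (the fraction of samples whose neighborhood contains a single class) for a candidate gene subset. The Euclidean distance, the radius `delta`, the function names, and the toy data are assumptions for illustration; the paper's own entropy measure is not reproduced here.

```python
from math import sqrt

def neighborhood(X, i, genes, delta):
    """Indices of samples within Euclidean distance delta of sample i on the chosen genes."""
    def dist(a, b):
        return sqrt(sum((X[a][g] - X[b][g]) ** 2 for g in genes))
    return [j for j in range(len(X)) if dist(i, j) <= delta]

def dependency(X, y, genes, delta):
    """Fraction of samples whose whole neighborhood shares their class label."""
    consistent = 0
    for i in range(len(X)):
        if all(y[j] == y[i] for j in neighborhood(X, i, genes, delta)):
            consistent += 1
    return consistent / len(X)

# Toy data (invented): 4 samples x 2 genes, real-valued expression in [0, 1].
X = [[0.10, 0.9], [0.15, 0.1], [0.80, 0.8], [0.85, 0.2]]
y = ["normal", "normal", "tumor", "tumor"]

dep_gene0 = dependency(X, y, genes=[0], delta=0.2)  # gene 0 separates the classes
dep_gene1 = dependency(X, y, genes=[1], delta=0.2)  # gene 1 mixes the classes
print(dep_gene0, dep_gene1)
```

A gene subset with higher dependency keeps more samples in class-pure neighborhoods, so a greedy selector could add genes while this value keeps increasing.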

  2. Minimising Immunohistochemical False Negative ER Classification Using a Complementary 23 Gene Expression Signature of ER Status

    PubMed Central

    Li, Qiyuan; Eklund, Aron C.; Juul, Nicolai; Haibe-Kains, Benjamin; Workman, Christopher T.; Richardson, Andrea L.; Szallasi, Zoltan; Swanton, Charles

    2010-01-01

    Background Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold[1], [2]. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance with clinical outcome. Methodology/Principal Findings Firstly, ER status was discriminated by fitting the bimodal expression of ESR1 to a mixed Gaussian model. The discriminative power of ESR1 suggested bimodal expression as an efficient way to stratify breast cancer; therefore we identified a set of genes whose expression was both strongly bimodal, mimicking ESR expression status, and highly expressed in breast epithelial cell lines, to derive a 23-gene ER expression signature-based classifier. We assessed our classifiers in seven published breast cancer cohorts by comparing the gene expression-based ER status to IHC-based ER status as a predictor of clinical outcome in both untreated and tamoxifen treated cohorts. In untreated breast cancer cohorts, the 23 gene signature-based ER status provided significantly improved prognostic power compared to IHC-based ER status (P = 0.006). In tamoxifen-treated cohorts, the 23 gene ER expression signature predicted clinical outcome (HR = 2.20, P = 0.00035). 
    These complementary ER signature-based strategies estimated that between 15.1% and 21.8% of patients with IHC-based negative ER status would be classified as having ER positive breast cancer. Conclusion/Significance Expression-based ER status classification may complement IHC to minimise false negative ER status classification and optimise patient stratification for endocrine therapies. PMID:21152022
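
The first step described in this abstract, fitting bimodal ESR1 expression to a Gaussian mixture, can be sketched as follows. This is a plain two-component EM fit written for illustration; the choice of EM, the initialisation, the function names, and the expression values are assumptions and are not taken from the paper.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def fit_two_gaussians(values, n_iter=100):
    """EM for a 1-D two-component Gaussian mixture; returns (weights, means, sigmas)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                       # crude initialisation at the extremes
    sigmas = [(hi - lo) / 4 or 1.0] * 2
    weights = [0.5, 0.5]
    n = len(values)
    for _ in range(n_iter):
        # E-step: responsibility of the high-expression component for each value
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break                          # one component vanished; keep last estimate
        # M-step: weighted means, standard deviations (with a floor) and weights
        means = [sum((1 - r) * x for r, x in zip(resp, values)) / n0,
                 sum(r * x for r, x in zip(resp, values)) / n1]
        sigmas = [max(1e-3, sqrt(sum((1 - r) * (x - means[0]) ** 2
                                     for r, x in zip(resp, values)) / n0)),
                  max(1e-3, sqrt(sum(r * (x - means[1]) ** 2
                                     for r, x in zip(resp, values)) / n1))]
        weights = [n0 / n, n1 / n]
    return weights, means, sigmas

def er_status(x, weights, means, sigmas):
    """Assign the more probable mixture component to an expression value."""
    p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
    p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
    return "ER+" if p1 > p0 else "ER-"

# Invented log-scale ESR1 values with a low and a high mode.
esr1 = [6.1, 6.4, 6.8, 7.0, 6.5, 11.2, 11.8, 12.1, 11.5, 12.4]
weights, means, sigmas = fit_two_gaussians(esr1)
print(means, er_status(6.3, weights, means, sigmas), er_status(12.0, weights, means, sigmas))
```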

  3. AUCTSP: an improved biomarker gene pair class predictor.

    PubMed

    Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T

    2018-06-26

    The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large b-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. 
    In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
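
For orientation, the original TSP rule that this abstract builds on can be sketched as below: each gene pair is scored by how strongly the within-sample ordering of the two genes differs between the classes. Only the classic TSP score is shown; the AUC-based estimator of AUCTSP is not reproduced, and the function names and data are illustrative assumptions.

```python
from itertools import combinations

def tsp_score(X, y, i, j, classes):
    """|P(gene i < gene j | class 1) - P(gene i < gene j | class 2)|."""
    def p_less(c):
        rows = [x for x, label in zip(X, y) if label == c]
        return sum(x[i] < x[j] for x in rows) / len(rows)
    return abs(p_less(classes[0]) - p_less(classes[1]))

def top_scoring_pair(X, y):
    """Return the gene pair with the largest TSP score (first pair wins ties)."""
    classes = sorted(set(y))
    pairs = combinations(range(len(X[0])), 2)
    return max(pairs, key=lambda p: tsp_score(X, y, p[0], p[1], classes))

# Invented data: 4 samples x 3 genes; genes 0 and 1 swap order between classes.
X = [[1.0, 2.0, 5.0], [1.2, 2.5, 5.1], [3.0, 1.0, 5.2], [2.8, 0.9, 4.9]]
y = ["A", "A", "B", "B"]

best = top_scoring_pair(X, y)
print(best, tsp_score(X, y, best[0], best[1], ["A", "B"]))
```

Gene 2 is always the largest value in every sample, so no pair involving it shows a ranking reversal and it scores 0 here.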

  4. iPcc: a novel feature extraction method for accurate disease class discovery and prediction

    PubMed Central

    Ren, Xianwen; Wang, Yong; Zhang, Xiang-Sun; Jin, Qi

    2013-01-01

    Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles. PMID:23761440
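
The "iterative employment of Pearson's correlation coefficient" can be pictured with a small sketch: each sample is re-described by its correlations with all samples, and the step is repeated on the resulting matrix. The fixed number of iterations, the zero-variance guard, the function names, and the data are assumptions; the published iPcc procedure may differ in its details.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def correlation_features(profiles):
    """Describe each sample by its correlation with every sample."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

def iterate_correlation(profiles, n_iter=3):
    """Apply the correlation transform repeatedly, starting from expression profiles."""
    features = profiles
    for _ in range(n_iter):
        features = correlation_features(features)
    return features

# Invented data: samples 0-1 follow one pattern, samples 2-3 the opposite one.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.2, 2.9, 4.3, 4.8],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [4.7, 4.1, 3.2, 1.8, 1.2],
]
features = iterate_correlation(profiles, n_iter=3)
for row in features:
    print([round(v, 3) for v in row])
```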

  5. Validation of the Lung Subtyping Panel in Multiple Fresh-Frozen and Formalin-Fixed, Paraffin-Embedded Lung Tumor Gene Expression Data Sets.

    PubMed

    Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla

    2016-06-01

    Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets.
    The LSP panel can supplement morphologic assessment of lung cancers, particularly when classification by standard methods is challenging.
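
The centroid step mentioned in the Design section can be sketched as a nearest-centroid classifier: one mean expression vector per subtype, and a new sample is assigned to the closest one. The squared Euclidean distance, the three-gene toy panel, and the function names are assumptions; the abstract does not state which similarity measure the LSP uses.

```python
def centroids(X, y):
    """Mean expression vector per subtype label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for g, v in enumerate(x):
            acc[g] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(x, cents):
    """Assign the subtype whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(cents, key=lambda label: sqdist(cents[label]))

# Invented training data: 6 samples x 3 genes, two samples per subtype.
X = [[5, 1, 1], [6, 1, 2], [1, 5, 1], [2, 6, 1], [1, 1, 5], [1, 2, 6]]
y = ["adenocarcinoma", "adenocarcinoma", "squamous", "squamous",
     "neuroendocrine", "neuroendocrine"]

cents = centroids(X, y)
print(classify([5, 2, 1], cents), classify([1, 1, 6], cents))
```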

  6. Cloud-scale genomic signals processing classification analysis for gene expression microarray data.

    PubMed

    Harvey, Benjamin; Ji, Soo-Yeon

    2014-01-01

    As the microarray data available to scientists continue to increase in size and complexity, it has become increasingly important to find multiple ways to draw useful inferences from the analysis of DNA/mRNA sequence data. Although there have been many attempts to draw biological inferences by means of wavelet preprocessing and classification, no research effort has focused on cloud-scale classification analysis of microarray data that uses wavelet thresholding in a cloud environment to identify significantly expressed features. This paper proposes a novel methodology that uses wavelet-based denoising to initialize a threshold for determining significantly expressed genes for classification. Additionally, this research was implemented within a cloud-based distributed processing environment. Cloud computing and wavelet thresholding were used for the classification of 14 tumor classes from the Global Cancer Map (GCM). The results proved to be more accurate than using a predefined p-value for differential expression classification. This novel methodology analyzed wavelet-based threshold features of gene expression in a cloud environment and classified the expression of samples by analyzing gene patterns, which inform us of biological processes. Moreover, it enables researchers to face the present and forthcoming challenges that may arise in the analysis of large microarray datasets in functional genomics.

  7. Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications.

    PubMed

    Barat, Ana; Ruskin, Heather J; Byrne, Annette T; Prehn, Jochen H M

    2015-11-23

    Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype.

  8. Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications

    PubMed Central

    Barat, Ana; Ruskin, Heather J.; Byrne, Annette T.; Prehn, Jochen H. M.

    2015-01-01

    Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype. PMID:27600244

  9. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients.
    In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
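
The empirical p-value of a classifier's error rate can be sketched with a label-permutation scheme: the error observed on the real labels is compared with the errors obtained after shuffling the labels. In this sketch a leave-one-out 1-nearest-neighbour error stands in for the Random Forest OOB error, and the permutation scheme, the `+1` correction, the function names, and the data are all assumptions rather than the paper's procedure.

```python
import random

def loo_1nn_error(X, y):
    """Stand-in for the OOB error: leave-one-out 1-nearest-neighbour error rate."""
    errors = 0
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])))
        errors += y[nearest] != y[i]
    return errors / len(X)

def permutation_p_value(X, y, error_fn, n_perm=200, seed=0):
    """Share of label permutations whose error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = error_fn(X, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        as_good += error_fn(X, shuffled) <= observed
    return observed, (as_good + 1) / (n_perm + 1)

# Invented gene-set score per sample (one feature), two well-separated groups.
X = [[1.0], [1.1], [3.0], [3.2], [10.0], [10.3], [13.0], [13.4]]
y = ["ctrl"] * 4 + ["case"] * 4

observed, p = permutation_p_value(X, y, loo_1nn_error)
print(observed, p)
```

Any error estimate with the same `(X, y)` signature could be passed as `error_fn`, which is where an OOB error from a random forest implementation would plug in.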

  10. Gene Selection and Cancer Classification: A Rough Sets Based Approach

    NASA Astrophysics Data System (ADS)

    Sun, Lijun; Miao, Duoqian; Zhang, Hongyun

    Identification of informative gene subsets responsible for discerning between available samples of gene expression data is an important task in bioinformatics. Reducts, from rough set theory, correspond to minimal sets of essential genes for discerning samples and are an efficient tool for gene selection. Due to the computational complexity of the existing reduct algorithms, feature ranking is usually used to narrow down the gene space as the first step, and the top-ranked genes are selected. In this paper, we define a novel criterion for scoring genes, based on the expression level difference between classes and the contribution of the gene to classification, and present an algorithm for generating all possible reducts from informative genes. The algorithm takes the whole attribute set into account and finds short reducts with a significant reduction in computational complexity. An exploration of this approach on benchmark gene expression data sets demonstrates that it is successful in selecting highly discriminative genes and that the classification accuracy is impressive.

  11. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion.
    We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
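
A count-based classifier of the PLDA kind can be sketched as follows, assuming counts follow a Poisson model with mean s_i * g_j * d_kj (sample size factor, gene total, class effect). The total-count size factors, the smoothing constant `beta`, the absence of any power transformation or shrinkage, the function names, and the data are assumptions made for this sketch; it is not the MLSeq implementation.

```python
from math import log

def fit_plda(X, y, beta=1.0):
    """Estimate class-specific gene effects d[k][j] from a count matrix."""
    total = sum(sum(row) for row in X)
    n_genes = len(X[0])
    gene_totals = [sum(row[j] for row in X) for j in range(n_genes)]   # g_j
    size = [sum(row) / total for row in X]                             # s_i
    model = {"gene_totals": gene_totals, "total": total, "d": {}, "prior": {}}
    for k in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == k]
        s_k = sum(size[i] for i in idx)
        # Smoothed ratio of observed to expected counts for class k
        model["d"][k] = [
            (sum(X[i][j] for i in idx) + beta) / (s_k * gene_totals[j] + beta)
            for j in range(n_genes)
        ]
        model["prior"][k] = len(idx) / len(y)
    return model

def predict_plda(model, x):
    """Pick the class with the largest Poisson log-likelihood score."""
    s_new = sum(x) / model["total"]
    def score(k):
        d = model["d"][k]
        fit = sum(xj * log(dj) for xj, dj in zip(x, d))
        expected = s_new * sum(g * dj for g, dj in zip(model["gene_totals"], d))
        return fit - expected + log(model["prior"][k])
    return max(model["d"], key=score)

# Invented read counts: 4 samples x 3 genes.
X = [[10, 2, 5], [12, 3, 6], [3, 9, 5], [2, 11, 4]]
y = ["A", "A", "B", "B"]

model = fit_plda(X, y)
print(predict_plda(model, [11, 2, 5]), predict_plda(model, [2, 10, 5]))
```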

  12. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.

    PubMed

    Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo

    2018-05-17

    Gene expression profiles are high-dimensional, have few samples, and are continuous-valued, which makes it a great challenge to use them for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are applied to the microarray data in order to obtain a plurality of pre-selected feature subsets with different classification abilities. The top N genes with the highest rank in each subset are integrated so as to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, the wrapper method, which is based on forward feature selection, is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy with fewer characteristic genes.
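
The first stage of this method, ranking genes with several filters and pooling the top N of each, can be sketched as below. The two filters (absolute mean difference and a signal-to-noise ratio), the function names, and the data are illustrative assumptions; the cross-entropy redundancy removal and the wrapper step are not shown.

```python
from statistics import mean, pstdev

def split_by_class(values, labels):
    classes = sorted(set(labels))
    return [[v for v, l in zip(values, labels) if l == c] for c in classes]

def mean_diff(values, labels):
    """Filter 1: absolute difference of the two class means."""
    a, b = split_by_class(values, labels)
    return abs(mean(a) - mean(b))

def signal_to_noise(values, labels):
    """Filter 2: mean difference scaled by the summed class standard deviations."""
    a, b = split_by_class(values, labels)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)

def top_n(X, y, score_fn, n):
    """Indices of the n best-scoring genes under one filter."""
    n_genes = len(X[0])
    scores = [score_fn([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:n]

def merge_filters(X, y, filters, n):
    """Union of the top-n genes selected by each filter."""
    selected = set()
    for score_fn in filters:
        selected.update(top_n(X, y, score_fn, n))
    return sorted(selected)

# Invented data: 4 samples x 4 genes, two classes.
X = [[1.0, 3.0, 0.0, 2.0],
     [1.2, 3.1, 4.0, 2.1],
     [5.0, 3.0, 6.0, 2.5],
     [5.2, 3.2, 10.0, 2.6]]
y = [0, 0, 1, 1]

pooled = merge_filters(X, y, [mean_diff, signal_to_noise], n=1)
print(pooled)
```

Here the two filters disagree on the best gene (gene 2 has the largest mean difference, gene 0 the best signal-to-noise ratio), which is the situation a multi-filter ensemble is meant to cover.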

  13. Grouped gene selection and multi-classification of acute leukemia via new regularized multinomial regression.

    PubMed

    Li, Juntao; Wang, Yanyan; Jiang, Tao; Xiao, Huimin; Song, Xuekun

    2018-05-09

    Diagnosing acute leukemia is the necessary prerequisite to treating it. Multi-classification on gene expression data of acute leukemia, which comprises B-cell acute lymphoblastic leukemia (BALL), T-cell acute lymphoblastic leukemia (TALL) and acute myeloid leukemia (AML), is helpful for diagnosis. However, selecting cancer-causing genes is a challenging problem in performing multi-classification. In this paper, weighted gene co-expression networks are employed to divide the genes into groups. Based on the resulting groups, a new regularized multinomial regression with overlapping group lasso penalty (MROGL) is presented to simultaneously perform multi-classification and select gene groups. By implementing this method on three-class acute leukemia data, the grouped genes which work synergistically are identified, and the overlapped genes shared by different groups are also highlighted. Moreover, MROGL outperforms five other methods in multi-classification accuracy. Copyright © 2017. Published by Elsevier B.V.

  14. Biological classification with RNA-Seq data: Can alternatively spliced transcript expression enhance machine learning classifier?

    PubMed

    Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry

    2018-06-25

    The extent to which the genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging the RNA-Seq data requires development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis, and have recently gained attention for their applications to the RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that the isoform-level expression data is more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment is done through utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for the samples from the cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods.
    The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  15. Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

    PubMed Central

    Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni

    2016-01-01

    We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170

  16. Grouping patients for masseter muscle genotype-phenotype studies.

    PubMed

    Moawad, Hadwah Abdelmatloub; Sinanan, Andrea C M; Lewis, Mark P; Hunt, Nigel P

    2012-03-01

    To use various facial classifications, including either/both vertical and horizontal facial criteria, to assess their effects on the interpretation of masseter muscle (MM) gene expression. Fresh MM biopsies were obtained from 29 patients (age, 16-36 years) with various facial phenotypes. Based on clinical and cephalometric analysis, patients were grouped using three different classifications: (1) basic vertical, (2) basic horizontal, and (3) combined vertical and horizontal. Gene expression levels of the myosin heavy chain genes MYH1, MYH2, MYH3, MYH6, MYH7, and MYH8 were recorded using quantitative reverse transcriptase polymerase chain reaction (RT-PCR) and were related to the various classifications. The significance level for statistical analysis was set at P ≤ .05. Using classification 1, none of the MYH genes were found to be significantly different between long face (LF) patients and the average vertical group. Using classification 2, MYH3, MYH6, and MYH7 genes were found to be significantly upregulated in retrognathic patients compared with prognathic and average horizontal groups. Using classification 3, only the MYH7 gene was found to be significantly upregulated in retrognathic LF compared with prognathic LF, prognathic average vertical faces, and average vertical and horizontal groups. The use of basic vertical or basic horizontal facial classifications may not be sufficient for genetics-based studies of facial phenotypes. Prognathic and retrognathic facial phenotypes have different MM gene expressions; therefore, it is not recommended to combine them into one single group, even though they may have a similar vertical facial phenotype.

  17. A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco

    2011-01-01

    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm that is based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

  18. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    PubMed

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable more accurate classifiers to be learned. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
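
The conversion from gene-level to set-level features can be sketched in a few lines. Averaging the member genes is only one possible aggregation and is an assumption here, as are the gene and set names; the abstract does not say which aggregation the authors use.

```python
def set_level_features(sample, gene_sets):
    """Map a gene -> expression dict to a set -> mean expression dict."""
    features = {}
    for name, genes in gene_sets.items():
        measured = [sample[g] for g in genes if g in sample]  # skip unmeasured genes
        if measured:
            features[name] = sum(measured) / len(measured)
    return features

# Hypothetical sample and gene sets (names invented for illustration).
sample = {"g1": 2.0, "g2": 4.0, "g3": 9.0, "g4": 1.0}
gene_sets = {"regulon_A": ["g1", "g2"], "regulon_B": ["g3", "g4", "g5"]}

features = set_level_features(sample, gene_sets)
print(features)
```

The four gene-level values collapse into two set-level features, which is the dimensionality reduction the abstract refers to.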

  19. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-quality source of discrimination knowledge that can substantially impact the prediction power of SVM and associative classification techniques. They also provide users with greater convenience in terms of understandability and interpretability. We have used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. 
From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.

  20. Importance of correlation between gene expression levels: application to the type I interferon signature in rheumatoid arthritis.

    PubMed

    Reynier, Frédéric; Petit, Fabien; Paye, Malick; Turrel-Davin, Fanny; Imbert, Pierre-Emmanuel; Hot, Arnaud; Mougin, Bruno; Miossec, Pierre

    2011-01-01

    The analysis of gene expression data shows that many genes display similarity in their expression profiles suggesting some co-regulation. Here, we investigated the co-expression patterns in gene expression data and proposed a correlation-based research method to stratify individuals. Using blood from rheumatoid arthritis (RA) patients, we investigated the gene expression profiles from whole blood using Affymetrix microarray technology. Co-expressed genes were analyzed by a biclustering method, followed by gene ontology analysis of the relevant biclusters. Taking the type I interferon (IFN) pathway as an example, a classification algorithm was developed from the 102 RA patients and extended to 10 systemic lupus erythematosus (SLE) patients and 100 healthy volunteers to further characterize individuals. We developed a correlation-based algorithm referred to as Classification Algorithm Based on a Biological Signature (CABS), an alternative to other approaches focused specifically on the expression levels. This algorithm applied to the expression of 35 IFN-related genes showed that the IFN signature presented a heterogeneous expression between RA, SLE and healthy controls which could reflect the level of global IFN signature activation. Moreover, the monitoring of the IFN-related genes during the anti-TNF treatment identified changes in type I IFN gene activity induced in RA patients. In conclusion, we have proposed an original method to analyze genes sharing an expression pattern and a biological function showing that the activation levels of a biological signature could be characterized by its overall state of correlation.

  1. Multiclass cancer diagnosis using tumor gene expression signatures

    DOE PAGES

    Ramaswamy, S.; Tamayo, P.; Rifkin, R.; ...

    2001-12-11

    The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.

  2. MO-DE-207B-03: Improved Cancer Classification Using Patient-Specific Biological Pathway Information Via Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Young, M; Craft, D

    Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods. 
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.

  3. Identification of suitable genes contributes to lung adenocarcinoma clustering by multiple meta-analysis methods.

    PubMed

    Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang

    2016-09-01

    With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the suitable differentially expressed (DE) genes that contributed to lung adenocarcinoma (ADC) clustering based on seven popular multiple meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. Bioconductor was used to perform preliminary data preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance for microarray data samples. The classification efficiency was compared based on accuracy, sensitivity and specificity. Across seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes remained for further analysis. The classification efficiency analysis showed that DE genes identified by the sum of ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluation of DE genes identified by different meta-analysis methods in terms of classification efficiency provided a new perspective on the choice of a suitable method in a given application. Different meta-analysis methods present different abilities, so they should be considered together when choosing meta-analysis methods for particular research. © 2015 John Wiley & Sons Ltd.
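
    The three efficiency measures named in this abstract can be computed from confusion-matrix counts. The following is a minimal sketch only: the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # fraction of true cases recovered
    specificity = tn / (tn + fp)  # fraction of true controls recovered
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration; these are NOT the study's data.
accuracy, sensitivity, specificity = classification_metrics(tp=90, fn=10, tn=45, fp=5)
print(accuracy, sensitivity, specificity)
```

    Reporting all three together matters when cases and controls are unbalanced, since accuracy alone can hide poor performance on the smaller group.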

  4. Challenges in projecting clustering results across gene expression-profiling datasets.

    PubMed

    Lusa, Lara; McShane, Lisa M; Reid, James F; De Cecco, Loris; Ambrogi, Federico; Biganzoli, Elia; Gariboldi, Manuela; Pierotti, Marco A

    2007-11-21

    Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes. Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor-positive (ER+) samples, to five breast cancer subtypes based on microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression is equal to 0) on subtype assignment by method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method of centroids classifications of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and for which the proportion of ER+ and ER- samples were systematically varied. When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. 
When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered. Simple corrections such as mean centering of genes aimed at microarray platform or batch effect correction can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.
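
    The prevalence effect described in this abstract can be reproduced with a toy example. This is a sketch under stated assumptions: a single hypothetical marker gene, invented expression values, and absolute distance to the centroid in place of whatever similarity measure the original centroid rule used.

```python
from statistics import mean


def nearest_centroid(value, centroids):
    """Return the label whose centroid lies closest to a one-gene expression value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def classify(test_values, train, center):
    """Method of centroids; if center is True, each data set is mean centered on its own."""
    train_shift = mean(v for vals in train.values() for v in vals) if center else 0.0
    test_shift = mean(test_values) if center else 0.0
    centroids = {label: mean(vals) - train_shift for label, vals in train.items()}
    return [nearest_centroid(v - test_shift, centroids) for v in test_values]


# Hypothetical log-expression of one ER marker gene (illustrative values only).
train = {"ER+": [9.0, 10.0, 11.0], "ER-": [1.0, 2.0, 3.0]}
er_positive_only = [8.0, 10.0, 12.0, 11.0]  # a test set containing only ER+ samples

centered = classify(er_positive_only, train, center=True)
uncentered = classify(er_positive_only, train, center=False)
print(centered)
print(uncentered)
```

    In this toy setting, centering the all-ER+ test set pushes its mean to zero, so roughly half of the samples fall on the ER- side of the centroids, whereas the uncentered data keep every sample nearest the ER+ centroid.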

  5. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

    PubMed

    Doungpan, Narumol; Engchuan, Worrawat; Chan, Jonathan H; Meechai, Asawin

    2016-12-05

    Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationship among those genes. Network-based methods have thus been considered for inferring the interaction within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using an activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interaction data as network data, in order to consider the interaction among significant genes, is discussed. 
Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms. The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The identified lung cancer subnetwork using the proposed searching algorithm resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method. The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified from the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using different searching algorithms are genes known to play a role in lung cancer. The improvement of classification performance and the gene/gene-set level agreement, and the biological relevance indicated the effectiveness of the GSNFS method for gene subnetwork identification using expression data.

  6. Genic insights from integrated human proteomics in GeneCards.

    PubMed

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publically available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. 
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.

  7. Lung tumor diagnosis and subtype discovery by gene expression profiling.

    PubMed

    Wang, Lu-yong; Tu, Zhuowen

    2006-01-01

    The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, diagnosis becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential molecular subtypes that differ remarkably in their response to therapies. It is critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, applied to gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.

  8. Gene expression-based molecular diagnostic system for malignant gliomas is superior to histological diagnosis.

    PubMed

    Shirahata, Mitsuaki; Iwao-Koizumi, Kyoko; Saito, Sakae; Ueno, Noriko; Oda, Masashi; Hashimoto, Nobuo; Takahashi, Jun A; Kato, Kikuya

    2007-12-15

    Current morphology-based glioma classification methods do not adequately reflect the complex biology of gliomas, thus limiting their prognostic ability. In this study, we focused on anaplastic oligodendroglioma and glioblastoma, which typically follow distinct clinical courses. Our goal was to construct a clinically useful molecular diagnostic system based on gene expression profiling. The expression of 3,456 genes in 32 patients, 12 and 20 of whom had prognostically distinct anaplastic oligodendroglioma and glioblastoma, respectively, was measured by PCR array. In addition to unsupervised methods, we performed supervised analysis using a weighted voting algorithm to construct a diagnostic system discriminating anaplastic oligodendroglioma from glioblastoma. The diagnostic accuracy of this system was evaluated by leave-one-out cross-validation. The clinical utility was tested on a microarray-based data set of 50 malignant gliomas from a previous study. Unsupervised analysis showed divergent global gene expression patterns between the two tumor classes. A supervised binary classification model showed 100% (95% confidence interval, 89.4-100%) diagnostic accuracy by leave-one-out cross-validation using 168 diagnostic genes. Applied to a gene expression data set from a previous study, our model correlated better with outcome than histologic diagnosis, and also displayed 96.6% (28 of 29) consistency with the molecular classification scheme used for these histologically controversial gliomas in the original article. Furthermore, we observed that histologically diagnosed glioblastoma samples that shared anaplastic oligodendroglioma molecular characteristics tended to be associated with longer survival. Our molecular diagnostic system showed reproducible clinical utility and prognostic ability superior to traditional histopathologic diagnosis for malignant glioma.
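
    A weighted voting classifier evaluated by leave-one-out cross-validation can be sketched as follows. The abstract does not give the exact formulation, so this sketch assumes the commonly used signal-to-noise variant (per-gene weight equal to the difference of class means divided by the sum of class standard deviations, with the decision boundary at the midpoint of the class means). The two-gene data set and the labels "AO" and "GBM" are invented for illustration.

```python
from statistics import mean, stdev


def train_weighted_voting(samples, labels, class_a, class_b):
    """Return one (weight, boundary) pair per gene, using signal-to-noise weights."""
    model = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == class_a]
        b = [s[g] for s, y in zip(samples, labels) if y == class_b]
        weight = (mean(a) - mean(b)) / (stdev(a) + stdev(b))
        boundary = (mean(a) + mean(b)) / 2
        model.append((weight, boundary))
    return model


def predict(model, sample, class_a, class_b):
    """Sum the per-gene votes; a positive total favours class_a."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return class_a if total > 0 else class_b


def leave_one_out_accuracy(samples, labels, class_a, class_b):
    """Hold out each sample in turn, retrain on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(samples)):
        model = train_weighted_voting(
            samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], class_a, class_b
        )
        if predict(model, samples[i], class_a, class_b) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical two-gene expression profiles (illustrative values only).
samples = [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5], [1.0, 4.0], [1.5, 5.0], [0.5, 4.5]]
labels = ["AO", "AO", "AO", "GBM", "GBM", "GBM"]
loo_accuracy = leave_one_out_accuracy(samples, labels, "AO", "GBM")
print(loo_accuracy)
```

    Retraining inside every fold, including any gene selection step, is what keeps the held-out sample from leaking into the model.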

  9. Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework.

    PubMed

    Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G

    2014-12-05

    Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches in which the expression of each gene was treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. 
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
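
    The composite feature described in this abstract, a weighted linear summation over a pathway's member genes, can be sketched as below. The gene names, weights and expression values are hypothetical, and the weights are simply given here: the mathematical programming model that the authors use to choose them is not reproduced.

```python
def pathway_activity(expression, weights):
    """Weighted linear sum of expression over the genes that carry a weight."""
    return sum(weights[gene] * expression[gene] for gene in weights)


# Hypothetical weights for one pathway; in the abstract these come from an
# optimisation model, which this sketch does not implement.
weights = {"geneA": 0.5, "geneB": -1.0, "geneC": 2.0}

# Hypothetical expression profile of one sample; geneD is not in the pathway.
sample = {"geneA": 4.0, "geneB": 1.0, "geneC": 0.5, "geneD": 9.0}

activity = pathway_activity(sample, weights)
print(activity)
```

    Each sample is thereby reduced to one value per pathway, which is the low-dimensional profile on which classification is then performed.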

  10. A Pathway Based Classification Method for Analyzing Gene Expression for Alzheimer's Disease Diagnosis.

    PubMed

    Voyle, Nicola; Keohane, Aoife; Newhouse, Stephen; Lunnon, Katie; Johnston, Caroline; Soininen, Hilkka; Kloszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon; Hodges, Angela; Kiddle, Steven; Dobson, Richard Jb

    2016-01-01

    Recent studies indicate that gene expression levels in blood may be able to differentiate subjects with Alzheimer's disease (AD) from normal elderly controls and mild cognitively impaired (MCI) subjects. However, there is limited replicability at the single marker level. A pathway-based interpretation of gene expression may prove more robust. This study aimed to investigate whether a case/control classification model built on pathway level data was more robust than a gene level model and may consequently perform better in test data. The study used two batches of gene expression data from the AddNeuroMed (ANM) and Dementia Case Registry (DCR) cohorts. Our study used Illumina Human HT-12 Expression BeadChips to collect gene expression from blood samples. Random forest modeling with recursive feature elimination was used to predict case/control status. Age and APOE ɛ4 status were used as covariates for all analyses. Gene and pathway level models performed similarly to each other and to a model based on demographic information only. Any potential increase in concordance from the novel pathway level approach used here has not led to a greater predictive ability in these datasets. However, we have only tested one method for creating pathway level scores. Further, we have been able to benchmark pathways against genes in datasets that had been extensively harmonized. Further work should focus on the use of alternative methods for creating pathway level scores, in particular those that incorporate pathway topology, and the use of an endophenotype based approach.

  11. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on a combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. The fuzzy support vector machine, as a new classification model with high generalization power, robustness, and good interpretability, seems to be a promising tool for gene expression microarray classification.

  12. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

  13. SoFoCles: feature filtering for microarray classification based on gene ontology.

    PubMed

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  14. Twenty-four signature genes predict the prognosis of oral squamous cell carcinoma with high accuracy and repeatability

    PubMed Central

    Gao, Jianyong; Tian, Gang; Han, Xu; Zhu, Qiang

    2018-01-01

    Oral squamous cell carcinoma (OSCC) is the sixth most common type of cancer worldwide, with poor prognosis. The present study aimed to identify gene signatures that could classify OSCC and predict prognosis in different stages. A training data set (GSE41613) and two validation data sets (GSE42743 and GSE26549) were acquired from the online Gene Expression Omnibus database. In the training data set, patients were classified based on the tumor-node-metastasis staging system, and subsequently grouped into low stage (L) or high stage (H). Signature genes between L and H stages were selected by disparity index analysis, and classification was performed by the expression of these signature genes. The established classification was compared with the L and H classification, and fivefold cross validation was used to evaluate the stability. Enrichment analysis for the signature genes was implemented by the Database for Annotation, Visualization and Integrated Discovery. Two validation data sets were used to determine the precision of the classification. Survival analysis was conducted following each classification using the package ‘survival’ in R software. A set of 24 signature genes was identified based on the classification model with the Fi value of 0.47, which was used to distinguish OSCC samples in two different stages. Overall survival of patients in the H stage was higher than those in the L stage. Signature genes were primarily enriched in ‘ether lipid metabolism’ pathway and biological processes such as ‘positive regulation of adaptive immune response’ and ‘apoptotic cell clearance’. The results provided a novel 24-gene set that may be used as biomarkers to predict OSCC prognosis with high accuracy, which may be used to determine an appropriate treatment program for patients with OSCC in addition to the traditional evaluation index. PMID:29257303
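
    The fivefold cross validation mentioned in this abstract can be illustrated with a minimal index-splitting sketch. The function names, the shuffling seed, and the sample count in the test are assumptions made for illustration; the abstract does not describe how the folds were actually built.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices 0..n_samples-1 into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

    Each sample lands in exactly one test fold, so stability can be judged by comparing the classification obtained on the five held-out parts.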

  15. Statistical approach for selection of biologically informative genes.

    PubMed

    Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N

    2018-05-20

    Selection of informative genes from high dimensional gene expression data has emerged as an important research area in genomics. Many gene selection techniques proposed so far are based on either a relevancy or a redundancy measure. Further, the performance of these techniques has been adjudged through post selection classification accuracy computed through a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, i.e. Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficient criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique with 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes which are more biologically relevant. The proposed technique is also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R package, i.e. BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to select statistical techniques for selecting informative genes from high dimensional expression data for breeding and systems biology studies. Published by Elsevier B.V.

  16. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    PubMed Central

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from a microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
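
    The minimum redundancy maximum relevance idea named in this abstract can be sketched as a greedy loop. This is an illustration only, not the authors' mRMR-ABC: absolute Pearson correlation stands in for the mutual-information measure that mRMR is usually defined with, the difference form (relevance minus mean redundancy) is one of several common variants, and the names `pearson` and `mrmr_select` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(genes, labels, n_select):
    """Greedy mRMR sketch; genes maps a gene name to its values across samples."""
    # Relevance: |correlation| with the class labels (assumed stand-in measure).
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    selected = []
    candidates = set(genes)
    while candidates and len(selected) < n_select:
        def score(g):
            if not selected:
                return relevance[g]
            # Redundancy: mean |correlation| with the genes already chosen.
            redundancy = sum(abs(pearson(genes[g], genes[s])) for s in selected) / len(selected)
            return relevance[g] - redundancy
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

    With this scoring, a gene that nearly duplicates an already selected gene is penalized even when it is individually predictive, which is the behaviour that keeps the selected subset small.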

  17. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    PubMed

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm to the analysis of a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from a microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  18. Microarray Meta-Analysis Identifies Acute Lung Injury Biomarkers in Donor Lungs That Predict Development of Primary Graft Failure in Recipients

    PubMed Central

    Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia

    2012-01-01

    Objectives To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially-expressed genes than by using single study-based differential analysis. Conclusion Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
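
    The step of integrating per-experiment effect sizes with a random effects model can be sketched as follows. The abstract does not say which estimator was used; the DerSimonian-Laird estimator below is an assumed, commonly used choice, and the function name `random_effects_pool` is hypothetical.

```python
def random_effects_pool(effects, variances):
    """Pool per-study effect sizes for one gene (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures how much the studies disagree.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2
```

    When the experiments agree, the between-study variance is zero and the result reduces to the usual inverse-variance average; when they disagree, the weights flatten and the standard error grows.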

  19. Circular RNA and gene expression profiles in gastric cancer based on microarray chip technology.

    PubMed

    Sui, Weiguo; Shi, Zhoufang; Xue, Wen; Ou, Minglin; Zhu, Ying; Chen, Jiejing; Lin, Hua; Liu, Fuhua; Dai, Yong

    2017-03-01

    The aim of the present study was to screen gastric cancer (GC) tissue and adjacent tissue for differences in mRNA and circular RNA (circRNA) expression, to analyze the differences in circRNA and mRNA expression, and to investigate the circRNA expression in gastric carcinoma and its mechanism. circRNA and mRNA differential expression profiles generated using Agilent microarray technology were analyzed in the GC tissues and adjacent tissues. qRT-PCR was used to verify the differential expression of circRNAs and mRNAs according to the interactions between circRNAs and miRNAs as well as the possible existence of miRNA and mRNA interactions. We found that: i) the circRNA expression profile revealed 1,285 significant differences in circRNA expression, with circRNA expression downregulated in 594 samples and upregulated in 691 samples via interactions with miRNAs. The qRT-PCR validation experiments showed that hsa_circRNA_400071, hsa_circRNA_000543 and hsa_circRNA_001959 expression was consistent with the microarray analysis results. ii) 29,112 genes were found in the GC tissues and adjacent tissues, including 5,460 differentially expressed genes. Among them, 2,390 differentially expressed genes were upregulated and 3,070 genes were downregulated. Gene Ontology (GO) analysis of the differentially expressed genes revealed that these genes were involved in biological process classification, cellular component classification and molecular function classification. Pathway analysis of the differentially expressed genes identified 83 significantly enriched genes, including 28 upregulated genes and 55 downregulated genes. iii) 69 differentially expressed circRNAs were found that might adsorb specific miRNAs to regulate the expression of their target gene mRNAs. The conclusions are: i) differentially expressed circRNAs had corresponding miRNA binding sites. These circRNAs regulated the expression of target genes through interactions with miRNAs and might become new molecular biomarkers for GC in the future. ii) Differentially expressed genes may be involved in the occurrence of GC via a variety of mechanisms. iii) CD44, CXXC5, MYH9, MALAT1 and other genes may have important implications for the occurrence and development of GC through the regulation, interaction, and mutual influence of circRNA-miRNA-mRNA via different mechanisms.

  20. Classification of ductal carcinoma in situ by gene expression profiling.

    PubMed

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes B G; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight in the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Using supervised classification, we identified a gene expression classifier of 35 genes, which differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified separating between well- and poorly differentiated DCIS samples.

  1. Classification of ductal carcinoma in situ by gene expression profiling

    PubMed Central

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes BG; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Introduction Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight in the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Methods Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. Results DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Conclusion Using supervised classification, we identified a gene expression classifier of 35 genes, which differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified separating between well- and poorly differentiated DCIS samples. PMID:17069663

  2. Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures

    PubMed Central

    Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt

    2005-01-01

    A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class. PMID:15867433
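
    The "short, weighted gene list" form of a drug signature described in this abstract amounts to a linear decision function. The sketch below is a generic illustration under that reading; the gene names, weights, and the function name `signature_score` are hypothetical and are not taken from the study.

```python
def signature_score(expression, weights, bias=0.0):
    """Linear signature: positive weights act as rewards, negative ones as penalties."""
    total = bias
    for gene, weight in weights.items():
        # Genes missing from the expression profile contribute nothing.
        total += weight * expression.get(gene, 0.0)
    return total

def in_class(expression, weights, bias=0.0):
    """Assign the class of interest when the weighted sum is positive."""
    return signature_score(expression, weights, bias) > 0
```

    A treatment with high expression of a penalty gene is pushed below the decision threshold, which is how such genes help keep false positives low.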

  3. Integrated computational biology analysis to evaluate target genes for chronic myelogenous leukemia.

    PubMed

    Zheng, Yu; Wang, Yu-Ping; Cao, Hongbao; Chen, Qiusheng; Zhang, Xi

    2018-06-05

    Although hundreds of genes have been linked to chronic myelogenous leukemia (CML), many of the results lack reproducibility. In the present study, data across multiple modalities were integrated to evaluate 579 CML candidate genes, including literature‑based CML‑gene relation data, Gene Expression Omnibus RNA expression data and pathway‑based gene‑gene interaction data. The expression data included samples from 76 patients with CML and 73 healthy controls. For each target gene, four metrics were proposed and tested with case/control classification. The effectiveness of the four metrics presented was demonstrated by the high classification accuracy (94.63%; P<2×10⁻⁴). Cross metric analysis suggested nine top candidate genes for CML: Epidermal growth factor receptor, tumor protein p53, catenin β 1, janus kinase 2, tumor necrosis factor, abelson murine leukemia viral oncogene homolog 1, vascular endothelial growth factor A, B‑cell lymphoma 2 and proto‑oncogene tyrosine‑protein kinase. In addition, 145 CML candidate pathways enriched with 485 out of 579 genes were identified (P<8.2×10⁻¹¹; q=0.005). In conclusion, weighted genetic networks generated using computational biology may be complementary to biological experiments for the evaluation of known or novel CML target genes.

  4. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.
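
    The transformation of a multiclass problem into several binary problems, mentioned at the start of this abstract, can be sketched with a one-vs-rest decomposition. This is an assumed decomposition scheme chosen for illustration (the abstract does not name the one used), and it does not reproduce SVM-RFE or Twin SVM; the function names and the class label "other" are hypothetical.

```python
def one_vs_rest_problems(labels):
    """Map each class to a binary label vector: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def combine_binary_scores(scores_by_class):
    """Per sample, pick the class whose binary classifier gave the highest score."""
    classes = sorted(scores_by_class)
    n_samples = len(scores_by_class[classes[0]])
    return [max(classes, key=lambda c: scores_by_class[c][i]) for i in range(n_samples)]
```

    Each binary label vector would be handed to its own binary classifier, and the per-class decision scores are then combined into a single multiclass prediction.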

  5. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

    PubMed Central

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.

    2013-01-01

    Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
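
    The link between soft-thresholding penalties and variable selection can be seen in the basic soft-thresholding operator, sketched below. This shows only the generic operator, not the authors' penalized multi-class SVM; the function names and the threshold value used in the test are assumptions.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; any |w| <= lam becomes exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def selected_genes(weights, lam):
    """Names of genes whose weight survives soft-thresholding at level lam."""
    return [g for g, w in weights.items() if soft_threshold(w, lam) != 0.0]
```

    Because small weights are set exactly to zero rather than merely reduced, the genes with nonzero weight form the selected subset.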

  6. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes.

  7. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323

  8. Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification.

    PubMed

    Oberthuer, André; Berthold, Frank; Warnat, Patrick; Hero, Barbara; Kahlert, Yvonne; Spitz, Rüdiger; Ernestus, Karen; König, Rainer; Haas, Stefan; Eils, Roland; Schwab, Manfred; Brors, Benedikt; Westermann, Frank; Fischer, Matthias

    2006-11-01

    To develop a gene expression-based classifier for neuroblastoma patients that reliably predicts courses of the disease. Two hundred fifty-one neuroblastoma specimens were analyzed using a customized oligonucleotide microarray comprising 10,163 probes for transcripts with differential expression in clinical subgroups of the disease. Subsequently, the prediction analysis for microarrays (PAM) was applied to a first set of patients with maximally divergent clinical courses (n = 77). The classification accuracy was estimated by a complete 10-times-repeated 10-fold cross validation, and a 144-gene predictor was constructed from this set. This classifier's predictive power was evaluated in an independent second set (n = 174) by comparing results of the gene expression-based classification with those of risk stratification systems of current trials from Germany, Japan, and the United States. The first set of patients was accurately predicted by PAM (cross-validated accuracy, 99%). Within the second set, the PAM classifier significantly separated cohorts with distinct courses (3-year event-free survival [EFS] 0.86 +/- 0.03 [favorable; n = 115] v 0.52 +/- 0.07 [unfavorable; n = 59] and 3-year overall survival 0.99 +/- 0.01 v 0.84 +/- 0.05; both P < .0001) and separated risk groups of current neuroblastoma trials into subgroups with divergent outcome (NB2004: low-risk 3-year EFS 0.86 +/- 0.04 v 0.25 +/- 0.15, P < .0001; intermediate-risk 1.00 v 0.57 +/- 0.19, P = .018; high-risk 0.81 +/- 0.10 v 0.56 +/- 0.08, P = .06). In a multivariate Cox regression model, the PAM predictor classified patients of the second set more accurately than risk stratification of current trials from Germany, Japan, and the United States (P < .001; hazard ratio, 4.756 [95% CI, 2.544 to 8.893]). Integration of gene expression-based class prediction of neuroblastoma patients may improve risk estimation of current neuroblastoma trials.

  9. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    PubMed

    Baur, Brittany; Bozdag, Serdar

    2016-01-01

    DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
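
    Sequential forward selection wrapped around a nearest-neighbour classifier can be sketched as below. The details are assumptions made for illustration: a 1-nearest-neighbour rule, leave-one-out accuracy as the selection criterion, and a stopping rule that halts when no probe improves accuracy. The abstract does not specify these choices, and the function names are hypothetical.

```python
def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on chosen features."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue  # never use the held-out sample as its own neighbour
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

def sequential_forward_selection(rows, labels, n_features, max_selected):
    """Add one feature (probe) at a time, keeping the one that helps accuracy most."""
    selected, best_score = [], 0.0
    while len(selected) < max_selected:
        remaining = [f for f in range(n_features) if f not in selected]
        if not remaining:
            break
        scored = [(loo_1nn_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, f = max(scored, key=lambda t: (t[0], -t[1]))  # ties go to the lowest index
        if selected and score <= best_score:
            break  # no remaining probe improves accuracy
        selected.append(f)
        best_score = score
    return selected, best_score
```

    Here each row would hold the probe-level methylation values of one sample, and the labels would be a discretized expression class for the gene in question.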

  10. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It's Application to Tumor Gene Expression Data Analysis.

    PubMed

    Li, Jiangeng; Su, Lei; Pang, Zenan

    2015-12-01

    Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
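
    The Fisher score used as a baseline in this abstract can be sketched for a single gene as the ratio of between-class to within-class scatter. This is the standard univariate form, written as an illustration; it is not the MFA score or MFA score+ proposed by the authors, and the function name is hypothetical.

```python
def fisher_score(values, labels):
    """Fisher score of one gene: between-class scatter over within-class scatter."""
    overall_mean = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(group) / len(group)
        between += len(group) * (mean_c - overall_mean) ** 2
        within += sum((v - mean_c) ** 2 for v in group)
    # A gene that is constant within every class separates the classes perfectly.
    return between / within if within > 0 else float("inf")
```

    Genes are then ranked by this score, and the top-ranked ones are passed to the classifier.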

  11. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification

    PubMed Central

    Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661
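
    The use of Spearman's rank correlation coefficient to remove coexpressed genes can be sketched as a simple filter over an already ranked gene list. The threshold of 0.9, the greedy keep-or-drop rule, and the function names are assumptions; the abstract does not give the authors' exact procedure.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def drop_coexpressed(genes, ranked_names, threshold=0.9):
    """Keep a gene only if it is not strongly rank-correlated with any kept gene."""
    kept = []
    for name in ranked_names:
        if all(abs(spearman(genes[name], genes[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

    Because the coefficient works on ranks, two genes that rise and fall together are treated as coexpressed even when their relationship is not linear.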

  12. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    PubMed

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  13. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  14. Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling.

    PubMed

    Hoefnagel, Juliette J; Dijkman, Remco; Basso, Katia; Jansen, Patty M; Hallermann, Christian; Willemze, Rein; Tensen, Cornelis P; Vermeer, Maarten H

    2005-05-01

    In the European Organization for Research and Treatment of Cancer (EORTC) classification 2 types of primary cutaneous large B-cell lymphoma (PCLBCL) are distinguished: primary cutaneous follicle center cell lymphomas (PCFCCL) and PCLBCL of the leg (PCLBCL-leg). Distinction between both groups is considered important because of differences in prognosis (5-year survival > 95% and 52%, respectively) and the first choice of treatment (radiotherapy or systemic chemotherapy, respectively), but is not generally accepted. To establish a molecular basis for this subdivision in the EORTC classification, we investigated the gene expression profiles of 21 PCLBCLs by oligonucleotide microarray analysis. Hierarchical clustering based on a B-cell signature (7450 genes) classified PCLBCL into 2 distinct subgroups consisting of, respectively, 8 PCFCCLs and 13 PCLBCLs-leg. PCLBCLs-leg showed increased expression of genes associated with cell proliferation; the proto-oncogenes Pim-1, Pim-2, and c-Myc; and the transcription factors Mum1/IRF4 and Oct-2. In the group of PCFCCL high expression of SPINK2 was observed. Further analysis suggested that PCFCCLs and PCLBCLs-leg have expression profiles similar to that of germinal center B-cell-like and activated B-cell-like diffuse large B-cell lymphoma, respectively. The results of this study suggest that different pathogenetic mechanisms are involved in the development of PCFCCLs and PCLBCLs-leg and provide molecular support for the subdivision used in the EORTC classification.

  15. Inferring gene dependency network specific to phenotypic alteration based on gene expression data and clinical information of breast cancer.

    PubMed

    Zhou, Xionghui; Liu, Juan

    2014-01-01

    Although many methods have been proposed to reconstruct gene regulatory network, most of them, when applied in the sample-based data, can not reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, while former researches either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely gene dependency network. By this way, we have constructed gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network scale free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specially related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. 
    In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for phenotypic change.
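
    The conditional mutual information used to score a dependency pair (A,B) can be estimated from counts once expression values are discretized. The sketch below assumes already-discretized inputs and a plug-in (frequency) estimator; the discretization scheme, the function name, and any significance threshold are not given in the abstract and are left out.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits from three discrete sequences.

    Uses I(X;Y|Z) = sum_{x,y,z} p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ).
    """
    n = len(x)
    xyz = Counter(zip(x, y, z))
    xz = Counter(zip(x, z))
    yz = Counter(zip(y, z))
    zc = Counter(z)
    total = 0.0
    for (a, b, c), count in xyz.items():
        # The 1/n factors cancel inside the logarithm, leaving raw counts.
        total += (count / n) * log2(count * zc[c] / (xz[(a, c)] * yz[(b, c)]))
    return total
```

    For example, with x as discretized gene A, y as the phenotype, and z as discretized gene B, a large value would indicate that A is informative about the phenotype once B is known. When y is the exclusive-or of uniformly mixed x and z, the estimate is exactly 1 bit.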

  16. A new approach to enhance the performance of decision tree for classifying gene expression data.

    PubMed

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
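
    To illustrate why a linear combination of genes can make a better split than any single gene, the sketch below scores candidate features by AUC. The weights are supplied by hand here; how the authors choose genes and coefficients is not described in the abstract, so `composite` and its arguments are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def composite(sample, weights):
    """Derived feature: weighted sum of the listed genes for one sample (a dict)."""
    return sum(w * sample[g] for g, w in weights.items())


samples = [
    {"g1": 3, "g2": 1}, {"g1": 1, "g2": 3}, {"g1": 2, "g2": 3},  # class 1
    {"g1": 1, "g2": 1}, {"g1": 2, "g2": 1}, {"g1": 1, "g2": 2},  # class 0
]
labels = [1, 1, 1, 0, 0, 0]

single = auc([s["g1"] for s in samples], labels)
combined = auc([composite(s, {"g1": 1.0, "g2": 1.0}) for s in samples], labels)
```

    On this toy data neither gene separates the classes alone, while their sum does, so `combined` exceeds `single`.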

  17. An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features.

    PubMed

    Nandi, Sutanu; Subramanian, Abhishek; Sarkar, Ram Rup

    2017-07-25

    Prediction of essential genes helps to identify a minimal set of genes that are absolutely required for the appropriate functioning and survival of a cell. The available machine learning techniques for essential gene prediction have inherent problems, like imbalanced provision of training datasets, biased choice of the best model for a given balanced dataset, choice of a complex machine learning algorithm, and data-based automated selection of biologically relevant features for classification. Here, we propose a simple support vector machine-based learning strategy for the prediction of essential genes in Escherichia coli K-12 MG1655 metabolism that integrates a non-conventional combination of an appropriate sample balanced training set, a unique organism-specific genotype, phenotype attributes that characterize essential genes, and optimal parameters of the learning algorithm to generate the best machine learning model (the model with the highest accuracy among all the models trained for different sample training sets). For the first time, we also introduce flux-coupled metabolic subnetwork-based features for enhancing the classification performance. Our strategy proves to be superior as compared to previous SVM-based strategies in obtaining a biologically relevant classification of genes with high sensitivity and specificity. This methodology was also trained with datasets of other recent supervised classification techniques for essential gene classification and tested using reported test datasets. The testing accuracy was always high as compared to the known techniques, proving that our method outperforms known methods. Observations from our study indicate that essential genes are conserved among homologous bacterial species, demonstrate high codon usage bias, GC content and gene expression, and predominantly possess a tendency to form physiological flux modules in metabolism.

  18. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks | Center for Cancer Research

    Cancer.gov

    The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in

  19. Early Detection of Breast Cancer Using Molecular Beacons

    DTIC Science & Technology

    2008-01-01

    a molecular beacon (MB)-based approach for direct examination of gene expression in viable and fixed cells (2, 3). This objective of proposed study ...can be distinguished from normal cells (dark) (Figure 1) (2, 3, 8). Recently, a class of new fluorescent emitting nanoparticles, semiconductor ...morphological classification. This method may offer a simple and fast procedure to detect biomarker gene expression in clinical samples. Our study results

  20. CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

    PubMed

    Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel

    2016-03-01

    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. Availability: dmb.iasi.cnr.it/camur.php. Contact: emanuel@iasi.cnr.it. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  1. Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles.

    PubMed

    Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin

    2009-01-01

    Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
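
    A small sketch of how per-classifier weights can change the outcome when binary outputs are aggregated through a code matrix. The weights are fixed by hand here; the data-driven tuning proposed in the abstract is not reproduced, and the `decode` function, the one-versus-one code matrix, and the example outputs are illustrative assumptions.

```python
def decode(outputs, code_matrix, weights):
    """Pick the class whose code row best agrees with the weighted binary outputs.

    outputs[j]     : signed score of binary classifier j
    code_matrix[c] : row of +1 / -1 / 0 saying how class c is coded for each classifier
    weights[j]     : non-negative weight of classifier j
    """
    scores = {
        cls: sum(w * c * o for w, c, o in zip(weights, row, outputs))
        for cls, row in code_matrix.items()
    }
    return max(scores, key=scores.get), scores


# One-versus-one coding for three classes; columns are (A vs B), (A vs C), (B vs C).
one_vs_one = {"A": [+1, +1, 0], "B": [-1, 0, +1], "C": [0, -1, -1]}
outputs = [0.2, -0.6, 0.9]

uniform_choice, _ = decode(outputs, one_vs_one, [1.0, 1.0, 1.0])
weighted_choice, _ = decode(outputs, one_vs_one, [1.0, 3.0, 0.2])
```

    With uniform weights the aggregate favours class B; giving more weight to the second classifier and less to the third switches the decision to class C, which is the kind of sensitivity that motivates tuning the weights.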

  2. An enhancement of binary particle swarm optimization for gene selection in classifying cancer classes

    PubMed Central

    2013-01-01

    Background: Gene expression data could likely be a momentous help in the progress of proficient cancer diagnoses and classification platforms. Lately, many researchers analyze gene expression data using diverse computational intelligence methods, for selecting a small subset of informative genes from the data for cancer classification. Many computational methods face difficulties in selecting small subsets due to the small number of samples compared to the huge number of genes (high-dimension), irrelevant genes, and noisy genes. Methods: We propose an enhanced binary particle swarm optimization to perform the selection of small subsets of informative genes which is significant for cancer classification. Particle speed, rule, and modified sigmoid function are introduced in this proposed method to increase the probability of the bits in a particle’s position to be zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results: The performance of the proposed method proved to be superior to other previous related works, including the conventional version of binary particle swarm optimization (BPSO) in terms of classification accuracy and the number of selected genes. The proposed method also requires lower computational time compared to BPSO. PMID:23617960
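
    For orientation, the conventional BPSO update that this work modifies looks roughly like the sketch below: a real-valued velocity per bit, squashed by a sigmoid into the probability that the bit (gene) is switched on. The enhanced speed rule and modified sigmoid from the abstract are not reproduced, and the parameter values (`w`, `c1`, `c2`, `vmax`) are common defaults assumed here for illustration.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def update_velocity(v, x, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, vmax=6.0):
    """Standard PSO velocity update per bit, clamped to [-vmax, vmax]."""
    new_v = []
    for vi, xi, pi, gi in zip(v, x, pbest, gbest):
        vi = w * vi + c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        new_v.append(max(-vmax, min(vmax, vi)))
    return new_v


def update_position(v, rng):
    """Each bit becomes 1 with probability sigmoid(velocity), else 0."""
    return [1 if rng.random() < sigmoid(vi) else 0 for vi in v]


rng = random.Random(0)
position = [1, 0, 1, 0]          # 1 = gene selected
velocity = update_velocity([0.0] * 4, position, [1, 1, 0, 0], [1, 0, 0, 0], rng)
position = update_position(velocity, rng)
```

    Because a velocity of zero gives a 50% chance of selecting each gene, plain BPSO tends to keep many genes; shifting the sigmoid toward zero, as the abstract describes, is one way to favour smaller subsets.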

  3. geneCommittee: a web-based tool for extensively testing the discriminatory power of biologically relevant gene sets in microarray data classification.

    PubMed

    Reboiro-Jato, Miguel; Arrais, Joel P; Oliveira, José Luis; Fdez-Riverola, Florentino

    2014-01-30

    The diagnosis and prognosis of several diseases can be shortened through the use of different large-scale genome experiments. In this context, microarrays can generate expression data for a huge set of genes. However, to obtain solid statistical evidence from the resulting data, it is necessary to train and to validate many classification techniques in order to find the best discriminative method. This is a time-consuming process that normally depends on intricate statistical tools. geneCommittee is a web-based interactive tool for routinely evaluating the discriminative classification power of custom hypothesis in the form of biologically relevant gene sets. While the user can work with different gene set collections and several microarray data files to configure specific classification experiments, the tool is able to run several tests in parallel. Provided with a straightforward and intuitive interface, geneCommittee is able to render valuable information for diagnostic analyses and clinical management decisions based on systematically evaluating custom hypothesis over different data sets using complementary classifiers, a key aspect in clinical research. geneCommittee allows the enrichment of microarrays raw data with gene functional annotations, producing integrated datasets that simplify the construction of better discriminative hypothesis, and allows the creation of a set of complementary classifiers. The trained committees can then be used for clinical research and diagnosis. Full documentation including common use cases and guided analysis workflows is freely available at http://sing.ei.uvigo.es/GC/.

  4. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    PubMed

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. Cell-of-Origin in Diffuse Large B-Cell Lymphoma: Are the Assays Ready for the Clinic?

    PubMed

    Scott, David W

    2015-01-01

    Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoma worldwide and consists of a heterogeneous group of cancers classified together on the basis of shared morphology, immunophenotype, and aggressive clinical behavior. It is now recognized that this malignancy comprises at least two distinct molecular subtypes identified by gene expression profiling: the activated B-cell-like (ABC) and the germinal center B-cell-like (GCB) groups, which together form the cell-of-origin (COO) classification. These two groups have different genetic mutation landscapes, pathobiology, and outcomes following treatment. Evidence is accumulating that novel agents have selective activity in one or the other COO group, making COO a predictive biomarker. Thus, there is now a pressing need for accurate and robust methods to assign COO, to support clinical trials, and ultimately guide treatment decisions for patients. The "gold standard" methods for COO are based on gene expression profiling (GEP) of RNA from fresh frozen tissue using microarray technology, which is an impractical solution when formalin-fixed paraffin-embedded tissue (FFPET) biopsies are the standard diagnostic material. This review outlines the history of the COO classification before examining the practical implementation of COO assays applicable to FFPET biopsies. The immunohistochemistry (IHC)-based algorithms and gene expression-based assays suitable for the highly degraded RNA from FFPET are discussed. Finally, the technical and practical challenges that still need to be addressed are outlined before robust gene expression-based assays are used in the routine management of patients with DLBCL.

  6. Robust diagnosis of non-Hodgkin lymphoma phenotypes validated on gene expression data from different laboratories.

    PubMed

    Bhanot, Gyan; Alexe, Gabriela; Levine, Arnold J; Stolovitzky, Gustavo

    2005-01-01

    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate, classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.

  7. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

    PubMed

    Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar

    2016-04-01

    Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
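
    The map/reduce decomposition of K-nearest-neighbor classification can be sketched in a single process as below. This is not the authors' Hadoop implementation: the function names, the squared Euclidean distance, and the in-memory "partitions" are assumptions chosen to show the data flow only.

```python
from collections import Counter
from functools import reduce
import heapq


def map_partition(partition, query, k):
    """Map step: the k nearest (squared distance, label) pairs within one partition."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)), y) for x, y in partition]
    return heapq.nsmallest(k, dists)


def reduce_neighbors(left, right, k):
    """Reduce step: merge two candidate lists and keep the k nearest overall."""
    return heapq.nsmallest(k, left + right)


def mr_knn(partitions, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    mapped = [map_partition(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: reduce_neighbors(a, b, k), mapped)
    return Counter(label for _, label in nearest).most_common(1)[0][0]


partitions = [
    [((1.0, 1.0), "ALL"), ((1.2, 0.9), "ALL")],
    [((5.0, 5.0), "AML"), ((4.8, 5.1), "AML"), ((1.1, 1.1), "ALL")],
]
```

    Because each partition only has to return its own k best candidates, the map step can run independently on each data block, and the reduce step stays cheap regardless of how many samples there are.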

  8. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  9. Network-Induced Classification Kernels for Gene Expression Profile Analysis

    PubMed Central

    Dror, Gideon; Shamir, Ron

    2012-01-01

    Computational classification of gene expression profiles into distinct disease phenotypes has been highly successful to date. Still, robustness, accuracy, and biological interpretation of the results have been limited, and it was suggested that use of protein interaction information jointly with the expression profiles can improve the results. Here, we study three aspects of this problem. First, we show that interactions are indeed relevant by showing that co-expressed genes tend to be closer in the network of interactions. Second, we show that the improved performance of one extant method utilizing expression and interactions is not really due to the biological information in the network, while in another method this is not the case. Finally, we develop a new kernel method, called NICK, that integrates network and expression data for SVM classification, and demonstrate that overall it achieves better results than extant methods while running two orders of magnitude faster. PMID:22697242

  10. Classification of early-stage non-small cell lung cancer by weighing gene expression profiles with connectivity information.

    PubMed

    Zhang, Ao; Tian, Suyan

    2018-05-01

    Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different genes' connectivity information-based weights for each gene and then conducted feature selection upon the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only once this issue has been resolved will wide utilization of the weighting methods be possible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Regularization strategies for hyperplane classifiers: application to cancer classification with gene expression data.

    PubMed

    Andries, Erik; Hagstrom, Thomas; Atlas, Susan R; Willman, Cheryl

    2007-02-01

    Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
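
    For readers unfamiliar with the term, the standard filter factor form of a regularized least-squares solution is written out below. The notation is an assumption made for illustration (A for the samples-by-genes expression matrix, b for the class labels, x for the hyperplane weights, λ for the regularization parameter, k for the truncation level); the abstract does not state which filters the authors actually use.

```latex
% SVD of the data matrix, with rank r
A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top}

% Regularized solution of A x \approx b written with filter factors f_i
x_{\mathrm{reg}} = \sum_{i=1}^{r} f_i \, \frac{u_i^{\top} b}{\sigma_i} \, v_i

% Two common choices
f_i^{\mathrm{Tikhonov}} = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}},
\qquad
f_i^{\mathrm{TSVD}} =
\begin{cases}
1, & i \le k \\
0, & i > k
\end{cases}
```

    Setting every f_i to 1 recovers the unregularized solution, in which small singular values amplify noise in b through the 1/σ_i factor; the filters damp exactly those terms.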

  12. ROKU: a novel method for identification of tissue-specific genes.

    PubMed

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-06-12

    One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene if any exist using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression pattern are specific only to objective tissues. ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes.
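
    The entropy-based ranking step can be illustrated with a few lines of code: a gene expressed evenly across tissues has high Shannon entropy, while a gene concentrated in one tissue has entropy near zero. This sketch covers only that ranking idea on non-negative values; ROKU's own data processing and its outlier-based detection of the specific tissues are not reproduced, and the function names are hypothetical.

```python
from math import log2


def tissue_entropy(expression):
    """Shannon entropy (bits) of one gene's expression profile across tissues."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must be non-negative with a positive sum")
    probs = [x / total for x in expression if x > 0]
    return -sum(p * log2(p) for p in probs)


def rank_by_specificity(profiles):
    """Gene names ordered from most tissue-specific (lowest entropy) to least."""
    return sorted(profiles, key=lambda g: tissue_entropy(profiles[g]))


profiles = {
    "housekeeping": [5, 5, 5, 5],
    "liver_only": [8, 0, 0, 0],
    "two_tissues": [4, 4, 0, 0],
}
ranking = rank_by_specificity(profiles)
```

    Here the ranking places the single-tissue gene first and the uniformly expressed gene last, with entropies of 0, 1, and 2 bits respectively.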

  13. First Generation Gene Expression Signature for Early Prediction of Late Occurring Hematological Acute Radiation Syndrome in Baboons.

    PubMed

    Port, M; Herodin, F; Valente, M; Drouet, M; Lamkowski, A; Majewski, M; Abend, M

    2016-07-01

    We implemented a two-stage study to predict late occurring hematologic acute radiation syndrome (HARS) in a baboon model based on gene expression changes measured in peripheral blood within the first two days after irradiation. Eighteen baboons were irradiated to simulate different patterns of partial-body and total-body exposure, which corresponded to an equivalent dose of 2.5 or 5 Gy. According to changes in blood cell counts the surviving baboons (n = 17) exhibited mild (H1-2, n = 4) or more severe (H2-3, n = 13) HARS. Blood samples taken before irradiation served as unexposed control (H0, n = 17). For stage I of this study, a whole genome screen (mRNA microarrays) was performed using a portion of the samples (H0, n = 5; H1-2, n = 4; H2-3, n = 5). For stage II, using the remaining samples and the more sensitive methodology, qRT-PCR, validation was performed on candidate genes that were differentially up- or down-regulated during the first two days after irradiation. Differential gene expression was defined as significant (P < 0.05) and greater than or equal to a twofold difference above a H0 classification. From approximately 20,000 genes, on average 46% appeared to be expressed. On day 1 postirradiation for H2-3, approximately 2-3 times more genes appeared up-regulated (1,418 vs. 550) or down-regulated (1,603 vs. 735) compared to H1-2. This pattern became more pronounced at day 2 while the number of differentially expressed genes decreased. The specific genes showed an enrichment of biological processes coding for immune system processes, natural killer cell activation and immune response (P = 1 × E-06 up to 9 × E-14). Based on the P values, magnitude and sustained differential gene expression over time, we selected 89 candidate genes for validation using qRT-PCR. Ultimately, 22 genes were confirmed for identification of H1-3 classifications and seven genes for identification of H2-3 classifications using qRT-PCR. 
    For H1-3 classifications, most genes were constantly three to fivefold down-regulated relative to H0 over both days, but some genes appeared 10.3-fold (VSIG4) or even 30.7-fold up-regulated (CD177) over H0. For H2-3, some genes appeared four to sevenfold up-regulated relative to H0 (RNASE3, DAGLA, ARG2), but other genes showed a strong 14- to 33-fold down-regulation relative to H0 (WNT3, POU2AF1, CCR7). All of these genes allowed an almost completely identifiable separation among each of the HARS categories. In summary, clinically relevant HARS can be independently predicted with all 29 irradiated genes examined in the peripheral blood of baboons within the first two days postirradiation. While further studies are needed to confirm these findings, this model shows potential relevance in the prediction of clinical outcomes in exposed humans and as an aid in the prioritizing of medical treatment.

  14. Familial or Sporadic Idiopathic Scoliosis – classification based on artificial neural network and GAPDH and ACTB transcription profile

    PubMed Central

    2013-01-01

    Background Importance of hereditary factors in the etiology of Idiopathic Scoliosis is widely accepted. In clinical practice some of the IS patients present with positive familial history of the deformity and some do not. Traditionally about 90% of patients have been considered as sporadic cases without familial recurrence. However the exact proportion of Familial and Sporadic Idiopathic Scoliosis is still unknown. Housekeeping genes encode proteins that are usually essential for the maintenance of basic cellular functions. ACTB and GAPDH are two housekeeping genes encoding respectively a cytoskeletal protein β-actin, and glyceraldehyde-3-phosphate dehydrogenase, an enzyme of glycolysis. Although their expression levels can fluctuate between different tissues and persons, human housekeeping genes seem to exhibit a preserved tissue-wide expression ranking order. It was hypothesized that expression ranking order of two representative housekeeping genes ACTB and GAPDH might be disturbed in the tissues of patients with Familial Idiopathic Scoliosis (with positive family history of idiopathic scoliosis) opposed to the patients with no family members affected (Sporadic Idiopathic Scoliosis). An artificial neural network (ANN) was developed that could serve to differentiate between familial and sporadic cases of idiopathic scoliosis based on the expression levels of ACTB and GAPDH in different tissues of scoliotic patients. The aim of the study was to investigate whether the expression levels of ACTB and GAPDH in different tissues of idiopathic scoliosis patients could be used as a source of data for specially developed artificial neural network in order to predict the positive family history of index patient. Results The comparison of developed models showed, that the most satisfactory classification accuracy was achieved for ANN model with 18 nodes in the first hidden layer and 16 nodes in the second hidden layer. 
    Using only the expression measurements of ACTB and GAPDH, the ANN with a 6-18-16-1 architecture classified positive Idiopathic Scoliosis anamnesis correctly in 8 of 9 cases (88%). The prediction was ambiguous in only one case. Conclusions A specially designed artificial neural network model demonstrated a possible association between the expression levels of ACTB and GAPDH and a positive familial history of Idiopathic Scoliosis. PMID:23289769

  15. A 16-Gene Signature Distinguishes Anaplastic Astrocytoma from Glioblastoma

    PubMed Central

    Rao, Soumya Alige Mahabala; Srinivasan, Sujaya; Patric, Irene Rosita Pia; Hegde, Alangar Sathyaranjandas; Chandramouli, Bangalore Ashwathnarayanara; Arimappamagan, Arivazhagan; Santosh, Vani; Kondaiah, Paturu; Rao, Manchanahalli R. Sathyanarayana; Somasundaram, Kumaravel

    2014-01-01

    Anaplastic astrocytoma (AA; Grade III) and glioblastoma (GBM; Grade IV) are diffusely infiltrating tumors and are called malignant astrocytomas. The treatment regimen and prognosis are distinctly different between anaplastic astrocytoma and glioblastoma patients. Although the current histopathology-based grading system is well accepted and largely reproducible, intratumoral histologic variations often lead to difficulties in classification of malignant astrocytoma samples. In order to obtain a more robust molecular classifier, we analysed RT-qPCR expression data of 175 differentially regulated genes across astrocytoma using Prediction Analysis of Microarrays (PAM) and found the most discriminatory 16-gene expression signature for the classification of anaplastic astrocytoma and glioblastoma. The 16-gene signature obtained in the training set was validated in the test set with diagnostic accuracy of 89%. Additionally, validation of the 16-gene signature in multiple independent cohorts revealed that the signature predicted anaplastic astrocytoma and glioblastoma samples with accuracy rates of 99%, 88%, and 92% in TCGA, GSE1993 and GSE4422 datasets, respectively. The protein-protein interaction network and pathway analysis suggested that the 16 genes of the signature identified the epithelial-mesenchymal transition (EMT) pathway as the most differentially regulated pathway in glioblastoma compared to anaplastic astrocytoma. In addition to identifying a 16-gene classification signature, we also demonstrated that genes involved in epithelial-mesenchymal transition may play an important role in distinguishing glioblastoma from anaplastic astrocytoma. PMID:24475040

  16. Identifying gnostic predictors of the vaccine response

    PubMed Central

    Haining, W. Nicholas; Pulendran, Bali

    2012-01-01

    Molecular predictors of the response to vaccination could transform vaccine development. They would allow larger numbers of vaccine candidates to be rapidly screened, shortening the development time for new vaccines. Gene-expression based predictors of vaccine response have shown early promise. However, a limitation of gene-expression based predictors is that they often fail to reveal the mechanistic basis for their ability to classify response. Linking predictive signatures to the function of their component genes would advance basic understanding of vaccine immunity and also improve the robustness of outcome classification. New analytic tools now allow more biological meaning to be extracted from predictive signatures. Functional genomic approaches to perturb gene expression in mammalian cells permit the function of predictive genes to be surveyed in highly parallel experiments. The challenge for vaccinologists is therefore to use these tools to embed mechanistic insights into predictors of vaccine response. PMID:22633886

  17. CnidBase: The Cnidarian Evolutionary Genomics Database

    PubMed Central

    Ryan, Joseph F.; Finnerty, John R.

    2003-01-01

    CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972

  18. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

    PubMed Central

    2012-01-01

    Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
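
    The two aggregation steps described above, ranking genes by how often they occur in the selected subsets and combining individual classifiers by majority vote, can be sketched as follows. This is an illustration only: the breadth-first subset search itself is not reproduced, and the function names, gene names, and tie-breaking rules are assumptions.

```python
from collections import Counter


def rank_genes_by_frequency(gene_subsets):
    """Rank genes by the number of selected subsets they appear in.

    Returns (gene, count) pairs, most frequent first; ties are broken
    alphabetically (an arbitrary choice made for this sketch).
    """
    counts = Counter(gene for subset in gene_subsets for gene in set(subset))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Ties go to the label that sorts first (again an arbitrary choice).
    """
    counts = Counter(predictions)
    return max(sorted(counts), key=lambda label: counts[label])


# Hypothetical gene subsets, one per individual classifier.
subsets = [["TP53", "BRCA1"], ["TP53", "MYC"], ["TP53", "BRCA1", "EGFR"]]
gene_ranking = rank_genes_by_frequency(subsets)

# Hypothetical predictions of the three classifiers for one sample.
ensemble_label = majority_vote(["tumor", "normal", "tumor"])
```

    In this toy input `TP53` appears in every subset and therefore ranks first, which mirrors the idea that a few frequently selected genes are the candidate biomarkers.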

  19. Gene-expression signatures can distinguish gastric cancer grades and stages.

    PubMed

    Cui, Juan; Li, Fan; Wang, Guoqing; Fang, Xuedong; Puett, J David; Xu, Ying

    2011-03-18

    Microarray gene-expression data of 54 paired gastric cancer and adjacent noncancerous gastric tissues were analyzed, with the aim to establish gene signatures for cancer grades (well-, moderately-, poorly- or un-differentiated) and stages (I, II, III and IV), which have been determined by pathologists. Our statistical analysis led to the identification of a number of gene combinations whose expression patterns serve well as signatures of different grades and different stages of gastric cancer. A 19-gene signature was found to have discerning power between high- and low-grade gastric cancers in general, with overall classification accuracy at 79.6%. An expanded 198-gene panel allows the stratification of cancers into four grades and control, giving rise to an overall classification agreement of 74.2% between each grade designated by the pathologists and our prediction. Two signatures for cancer staging, consisting of 10 genes and 9 genes, respectively, provide high classification accuracies at 90.0% and 84.0%, among early-, advanced-stage cancer and control. Functional and pathway analyses on these signature genes reveal the significant relevance of the derived signatures to cancer grades and progression. To the best of our knowledge, this represents the first study on identification of genes whose expression patterns can serve as markers for cancer grades and stages.

  20. Similarity-balanced discriminant neighbor embedding and its application to cancer classification based on gene expression data.

    PubMed

    Zhang, Li; Qian, Liqiang; Ding, Chuntao; Zhou, Weida; Li, Fanzhang

    2015-09-01

    Discriminant neighborhood embedding (DNE) methods are typical graph-based methods for dimension reduction and have been successfully applied to face recognition. This paper proposes a new variant of DNE, called similarity-balanced discriminant neighborhood embedding (SBDNE), and applies it to cancer classification using gene expression data. By introducing a novel similarity function, SBDNE treats pairs of data points from the same class differently from pairs from different classes. The homogeneous and heterogeneous neighbors are selected according to the new similarity function instead of the Euclidean distance. SBDNE constructs two adjacency graphs, a between-class adjacency graph and a within-class adjacency graph, using the new similarity function. From these two graphs, we can generate the local between-class scatter and the local within-class scatter, respectively. Thus, SBDNE can maximize the between-class scatter and simultaneously minimize the within-class scatter to find the optimal projection matrix. Experimental results on six microarray datasets show that SBDNE is a promising method for cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. CrossLink: a novel method for cross-condition classification of cancer subtypes.

    PubMed

    Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

    2016-08-22

    We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. Instead, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.

  2. Correlation of Biomarker Expression in Colonic Mucosa with Disease Phenotype in Crohn's Disease and Ulcerative Colitis.

    PubMed

    Bruno, Maria E C; Rogier, Eric W; Arsenescu, Razvan I; Flomenhoft, Deborah R; Kurkjian, Cathryn J; Ellis, Gavin I; Kaetzel, Charlotte S

    2015-10-01

    Inflammatory bowel diseases (IBD), including Crohn's disease (CD) and ulcerative colitis (UC), are characterized by chronic intestinal inflammation due to immunological, microbial, and environmental factors in genetically predisposed individuals. Advances in the diagnosis, prognosis, and treatment of IBD require the identification of robust biomarkers that can be used for molecular classification of diverse disease presentations. We previously identified five genes, RELA, TNFAIP3 (A20), PIGR, TNF, and IL8, whose mRNA levels in colonic mucosal biopsies could be used in a multivariate analysis to classify patients with CD based on disease behavior and responses to therapy. We compared expression of these five biomarkers in IBD patients classified as having CD or UC, and in healthy controls. Patients with CD were characterized as having decreased median expression of TNFAIP3, PIGR, and TNF in non-inflamed colonic mucosa as compared to healthy controls. By contrast, UC patients exhibited decreased expression of PIGR and elevated expression of IL8 in colonic mucosa compared to healthy controls. A multivariate analysis combining mRNA levels for all five genes resulted in segregation of individuals based on disease presentation (CD vs. UC) as well as severity, i.e., patients in remission versus those with acute colitis at the time of biopsy. We propose that this approach could be used as a model for molecular classification of IBD patients, which could further be enhanced by the inclusion of additional genes that are identified by functional studies, global gene expression analyses, and genome-wide association studies.

  3. Evaluation of gene expression classification studies: factors associated with classification performance.

    PubMed

    Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C

    2014-01-01

    Classification methods used in microarray studies for gene expression are diverse in the way they deal with the underlying complexity of the data, as well as in the technique used to build the classification model. The MAQC II study on cancer classification problems has found that performance was affected by factors such as the classification algorithm, cross validation method, number of genes, and gene selection method. In this paper, we study the hypothesis that the disease under study significantly determines which method is optimal, and that additionally sample size, class imbalance, type of medical question (diagnostic, prognostic or treatment response), and microarray platform are potentially influential. A systematic literature review was used to extract the information from 48 published articles on non-cancer microarray classification studies. The impact of the various factors on the reported classification accuracy was analyzed through random-intercept logistic regression. The type of medical question and method of cross validation dominated the explained variation in accuracy among studies, followed by disease category and microarray platform. In total, 42% of the between study variation was explained by all the study specific and problem specific factors that we studied together.

  4. Applying Cost-Sensitive Extreme Learning Machine and Dissimilarity Integration to Gene Expression Data Classification.

    PubMed

    Liu, Yanqiu; Lu, Huijuan; Yan, Ke; Xia, Haixia; An, Chunlin

    2016-01-01

    Embedding cost-sensitive factors into the classifiers increases the classification stability and reduces the classification costs for classifying high-scale, redundant, and imbalanced datasets, such as the gene expression data. In this study, we extend our previous work, that is, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm as the cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed rejection cost into the CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the rejection cost embedded CS-D-ELM algorithm effectively reduces the average and overall cost of the classification process, while the classification accuracy still remains competitive. The proposed method can be extended to classification problems of other redundant and imbalanced data.
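
    The role of misclassification and rejection costs can be illustrated with a generic minimum-expected-cost decision rule. This is not the CS-D-ELM algorithm: the extreme learning machine and the dissimilarity integration are not reproduced, and the function name, the cost values, and the class posteriors below are hypothetical.

```python
def min_cost_decision(posteriors, cost, reject_cost):
    """Pick the label with the lowest expected cost, or reject.

    posteriors: dict mapping each class label to its estimated probability.
    cost: nested dict, cost[true_label][predicted_label].
    reject_cost: fixed cost of refusing to classify the sample.
    Returns the chosen label, or None when rejecting is cheaper.
    """
    expected = {
        predicted: sum(posteriors[true] * cost[true][predicted] for true in posteriors)
        for predicted in posteriors
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    best = min(sorted(expected), key=lambda label: expected[label])
    if reject_cost < expected[best]:
        return None
    return best


# Hypothetical costs: missing a tumor is ten times worse than a false alarm.
cost = {
    "tumor": {"tumor": 0, "normal": 10},
    "normal": {"tumor": 1, "normal": 0},
}
posteriors = {"tumor": 0.3, "normal": 0.7}

cheap_reject = min_cost_decision(posteriors, cost, reject_cost=0.5)
costly_reject = min_cost_decision(posteriors, cost, reject_cost=1.0)
```

    With these numbers the expected cost of predicting "tumor" is 0.7 and of predicting "normal" is 3.0, so the rule prefers "tumor" even though "normal" is more probable, and it rejects only when the rejection cost is below 0.7.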

  5. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.

    PubMed

    Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R

    2017-01-01

    Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is the model-based study to examine the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.

  6. Inference of combinatorial Boolean rules of synergistic gene sets from cancer microarray datasets.

    PubMed

    Park, Inho; Lee, Kwang H; Lee, Doheon

    2010-06-15

    Gene set analysis has become an important tool for the functional interpretation of high-throughput gene expression datasets. Moreover, pattern analyses based on inferred gene set activities of individual samples have shown the ability to identify more robust disease signatures than individual gene-based pattern analyses. Although a number of approaches have been proposed for gene set-based pattern analysis, the combinatorial influence of deregulated gene sets on disease phenotype classification has not been studied sufficiently. We propose a new approach for inferring combinatorial Boolean rules of gene sets for a better understanding of cancer transcriptome and cancer classification. To reduce the search space of the possible Boolean rules, we identify small groups of gene sets that synergistically contribute to the classification of samples into their corresponding phenotypic groups (such as normal and cancer). We then measure the significance of the candidate Boolean rules derived from each group of gene sets; the level of significance is based on the class entropy of the samples selected in accordance with the rules. By applying the present approach to publicly available prostate cancer datasets, we identified 72 significant Boolean rules. Finally, we discuss several identified Boolean rules, such as the rule of glutathione metabolism (down) and prostaglandin synthesis regulation (down), which are consistent with known prostate cancer biology. Scripts written in Python and R are available at http://biosoft.kaist.ac.kr/~ihpark/. The refined gene sets and the full list of the identified Boolean rules are provided in the Supplementary Material. Supplementary data are available at Bioinformatics online.
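
    The idea of scoring a Boolean rule by the class entropy of the samples it selects can be sketched as below. The abstract does not give the exact significance measure, so this only computes the Shannon entropy of class labels among the selected samples; the sample encoding, gene set names, and the example rule are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def score_rule(samples, labels, rule):
    """Return (class entropy, number of samples) for samples where rule(sample) is true."""
    selected = [label for sample, label in zip(samples, labels) if rule(sample)]
    return class_entropy(selected), len(selected)


# Hypothetical per-sample gene set activity states.
samples = [
    {"glutathione": "down", "prostaglandin": "down"},
    {"glutathione": "down", "prostaglandin": "down"},
    {"glutathione": "up", "prostaglandin": "down"},
    {"glutathione": "up", "prostaglandin": "up"},
]
labels = ["cancer", "cancer", "normal", "normal"]


def both_down(sample):
    """Example conjunctive rule: both gene sets are down-regulated."""
    return sample["glutathione"] == "down" and sample["prostaglandin"] == "down"


entropy, support = score_rule(samples, labels, both_down)
```

    A rule whose selected samples all share one class has entropy 0, so lower entropy at reasonable support indicates a more discriminative rule in this simplified reading.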

  7. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    PubMed

    Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.

  8. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  9. Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes

    PubMed Central

    Marko, Nicholas F.; Weil, Robert J.

    2012-01-01

    Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863
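
    The central moments analysis mentioned above rests on skewness and kurtosis, which can be computed as in this minimal sketch. It uses the simple population (biased) estimators as an assumption; the estimators, significance tests, and distribution fitting used in the study are not specified in the abstract and are not reproduced.

```python
def central_moment(values, k):
    """k-th central moment using the population (divide-by-n) estimator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third standardized moment; 0 for a symmetric sample."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical expression values: one symmetric sample, one with a heavy right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]

sym_skew = skewness(symmetric)
tail_skew = skewness(right_tailed)
```

    Values far from zero for either statistic indicate a departure from normality of the kind the abstract reports; whether a departure is statistically significant requires a formal test, which this sketch omits.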

  10. ROKU: a novel method for identification of tissue-specific genes

    PubMed Central

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-01-01

    Background One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. Results We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene if any exist using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression patterns are specific only to objective tissues. Conclusion ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes. PMID:16764735

  11. Heterogeneous data fusion for brain tumor classification.

    PubMed

    Metsis, Vangelis; Huang, Heng; Andronesi, Ovidiu C; Makedon, Fillia; Tzika, Aria

    2012-10-01

    Current research in biomedical informatics involves analysis of multiple heterogeneous data sets. This includes patient demographics, clinical and pathology data, treatment history, patient outcomes as well as gene expression, DNA sequences and other information sources such as gene ontology. Analysis of these data sets could lead to better disease diagnosis, prognosis, treatment and drug discovery. In this report, we present a novel machine learning framework for brain tumor classification based on heterogeneous data fusion of metabolic and molecular datasets, including state-of-the-art high-resolution magic angle spinning (HRMAS) proton (1H) magnetic resonance spectroscopy and gene transcriptome profiling, obtained from intact brain tumor biopsies. Our experimental results show that our novel framework outperforms any analysis using an individual dataset.

  12. Impact of missing data imputation methods on gene expression clustering and classification.

    PubMed

    de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G

    2015-02-26

    Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by mean or the median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/ .
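
    The simple imputation strategies named in this abstract (replacing missing values by the mean or the median) can be sketched as below. The sketch assumes that missing entries are marked with None and that imputation is done per gene (row-wise); the abstract does not state either detail, and the data are hypothetical.

```python
from statistics import mean, median


def impute(matrix, strategy="mean"):
    """Fill missing entries (None) in each gene row with that row's mean or median.

    matrix: list of rows, one row per gene, one column per sample.
    """
    summarise = {"mean": mean, "median": median}[strategy]
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if not observed:
            raise ValueError("a row with no observed values cannot be imputed")
        fill = summarise(observed)
        filled.append([fill if v is None else v for v in row])
    return filled


# Hypothetical expression matrix: two genes, four samples.
data = [
    [1.0, None, 3.0, 8.0],
    [None, 2.0, 2.0, None],
]
print(impute(data, "mean"))
print(impute(data, "median"))
```

    The median is less sensitive to the outlying value 8.0 in the first row, which is why the two strategies fill that gap differently.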

  13. Characterization and Expression of Drug Resistance Genes in MDROs Originating from Combat Wound Infections

    DTIC Science & Technology

    2016-09-01

    assigned a classification. MLST analysis MLST was determined using an in-house automated pipeline that first searches for homologs of each gene of...and virulence mechanism contributing to their success as pathogens in the wound environment. A novel bioinformatics pipeline was used to incorporate...monitored in two ways: read-based genome QC and assembly based metrics. The JCVI Genome QC pipeline samples sequence reads and performs BLAST

  14. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages

    PubMed Central

    Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats. PMID:28800357

  15. Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages.

    PubMed

    Lin, Yaqiu; Zhu, Jiangjiang; Wang, Yong; Li, Qian; Lin, Sen

    2017-01-01

    Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike cattle, little information is available on the transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult period were mapped to the goat genome. Results showed that out of total 24 689 Unigenes, 20 435 Unigenes were annotated. Based on expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used for selecting the genes showing highest expression related to lipid metabolism. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes having functional roles in regulation of goat muscle development and lipid metabolism during the various growth stages in goats.
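
    The FPKM normalization used in this study follows a standard formula, which can be written as a short sketch. The function name and the example numbers are hypothetical and are not values from the paper.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript in a
# library of 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))
```

    Dividing by both transcript length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.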

  16. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.

    PubMed

    Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong

    2017-01-01

    To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
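
    The first stage of this algorithm, filtering genes with the K-S test, can be sketched as below. The sketch assumes a two-class problem and ranks genes by the two-sample K-S statistic between the classes; this is an illustrative reading of the abstract, not the authors' implementation, and the subsequent CFS step is not shown. Gene names and values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value equal to x (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def top_genes_by_ks(expression, labels, k):
    """Rank genes by K-S statistic between the two classes and keep the top k.

    expression: dict gene -> list of values, one per sample.
    labels: list of 0/1 class labels aligned with the samples.
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = ks_statistic(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.0, 5.5, 4.8],  # classes well separated
    "gene_b": [2.0, 3.0, 2.5, 2.2, 2.9, 2.6],  # classes overlap
}
print(top_genes_by_ks(expression, labels, 1))
```

    A statistic near 1 means the two class distributions barely overlap, so the gene is a strong candidate for the second selection stage.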

  17. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset.

    PubMed

    Mallik, Saurav; Maulik, Ujjwal

    2015-10-01

    Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. First, genes that are both differentially expressed and methylated are identified using the Limma statistical test. A network, comprising these genes, corresponding TFs from the TRANSFAC and ITFP databases, and targeter miRNAs from the miRWalk database, is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classifications than other methods. Furthermore, pre-ranked gene set enrichment analysis is applied to the pathway and GO-term databases of the Molecular Signatures Database, using a pre-ranked gene list based on different centrality values, to compare the ranking methods. Finally, top novel potential gene markers for uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.
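
    Ranking nodes by eigenvector centrality, as this abstract describes, can be sketched with a plain power iteration. The sketch assumes a connected, undirected, unweighted network; the node names and edges are hypothetical and the code is not the authors' pipeline.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Power iteration on an undirected graph given as dict node -> set of neighbours."""
    nodes = sorted(adjacency)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Adding the node's own score (iterating on A + I) keeps the same
        # eigenvectors but avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes}
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical network: one TF linked to two genes and one miRNA,
# with the two genes also linked to each other.
network = {
    "TF1": {"geneA", "geneB", "miR1"},
    "geneA": {"TF1", "geneB"},
    "geneB": {"TF1", "geneA"},
    "miR1": {"TF1"},
}
centrality = eigenvector_centrality(network)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking)
```

    A node scores highly when it is connected to other high-scoring nodes, so the ranking reflects more than the raw number of neighbours.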

  18. PlantTribes: a gene and gene family resource for comparative genomics in plants

    PubMed Central

    Wall, P. Kerr; Leebens-Mack, Jim; Müller, Kai F.; Field, Dawn; Altman, Naomi S.; dePamphilis, Claude W.

    2008-01-01

    The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575–1584)] to classify all of these species’ protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ∼4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study. PMID:18073194

  19. Blood-Based Gene Expression Profiles Models for Classification of Subsyndromal Symptomatic Depression and Major Depressive Disorder

    PubMed Central

    Yu, Shunying; Yuan, Chengmei; Hong, Wu; Wang, Zuowei; Cui, Jian; Shi, Tieliu; Fang, Yiru

    2012-01-01

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression that, like major depressive disorder (MDD), leads to significant psychosocial functional impairment. Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure and studies on SSD are limited. The present study compared leukocyte expression profiles and performed classification using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were trained and tested on candidate signature expression profiles from the signature selection step. First, we identified 63 differentially expressed SSD signatures in contrast to control (P <= 5.0E-4) and 30 differentially expressed MDD signatures in contrast to control. In addition, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Second, in order to prioritize biomarkers for SSD and MDD together, we selected the top gene signatures from each pair-wise comparison and merged them to generate profiles that clearly classify SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pair-wise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and that blood cell–derived RNA of these 48 gene models may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls. PMID:22348066

  20. Use of Attribute Driven Incremental Discretization and Logic Learning Machine to build a prognostic classifier for neuroblastoma patients.

    PubMed

    Cangelosi, Davide; Muselli, Marco; Parodi, Stefano; Blengio, Fabiola; Becherini, Pamela; Versteeg, Rogier; Conte, Massimo; Varesio, Luigi

    2014-01-01

    A cancer patient's outcome is written, in part, in the gene expression profile of the tumor. We previously identified a 62-probe-set signature (NB-hypo) to identify tissue hypoxia in neuroblastoma tumors and showed that NB-hypo stratified neuroblastoma patients into good and poor outcome groups. It was important to develop a prognostic classifier to cluster patients into risk groups benefiting from defined therapeutic approaches. Novel classification and data discretization approaches can be instrumental for the generation of accurate predictors and robust tools for clinical decision support. We explored the application to gene expression data of Rulex, a novel software suite including the Attribute Driven Incremental Discretization technique for transforming continuous variables into simplified discrete ones and the Logic Learning Machine (LLM) model for intelligible rule generation. We applied Rulex components to the problem of predicting the outcome of neuroblastoma patients on the basis of the 62-probe-set NB-hypo gene expression signature. The resulting classifier consisted of 9 rules utilizing mainly two conditions of the relative expression of 11 probe sets. These rules were very effective predictors, as shown in an independent validation set, demonstrating the validity of the LLM algorithm applied to microarray data and patients' classification. The LLM performed as efficiently as Prediction Analysis of Microarray and Support Vector Machine, and outperformed other learning algorithms such as C4.5. Rulex carried out a feature selection by selecting a new signature (NB-hypo-II) of 11 probe sets that turned out to be the most relevant in predicting outcome among the 62 of the NB-hypo signature. Rules are easily interpretable as they involve only a few conditions. Our findings provided evidence that the application of Rulex to the expression values of the NB-hypo signature created a set of accurate, high quality, consistent and interpretable rules for the prediction of neuroblastoma patients' outcome. We identified the Rulex weighted classification as a flexible tool that can support clinical decisions. For these reasons, we consider Rulex to be a useful tool for cancer classification from microarray gene expression data.

  1. GTA: a game theoretic approach to identifying cancer subnetwork markers.

    PubMed

    Farahmand, S; Goliaei, S; Ansari-Pour, N; Razaghi-Moghadam, Z

    2016-03-01

    The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years. A subset of these studies attempt to analyze genome-wide expression profiles to identify markers with high reliability and reusability across independent whole-transcriptome microarray datasets. Therefore, the functional relationships of genes are integrated with their expression data. However, for a more accurate representation of the functional relationships among genes, utilization of the protein-protein interaction network (PPIN) seems to be necessary. Herein, a novel game theoretic approach (GTA) is proposed for the identification of cancer subnetwork markers by integrating genome-wide expression profiles and PPIN. The GTA method was applied to three distinct whole-transcriptome breast cancer datasets to identify the subnetwork markers associated with metastasis. To evaluate the performance of our approach, the identified subnetwork markers were compared with gene-based, pathway-based and network-based markers. We show that GTA is not only capable of identifying robust metastatic markers but also provides a higher classification performance. In addition, based on these GTA-based subnetworks, we identified a new bona fide candidate gene for breast cancer susceptibility.

  2. Genome-wide identification, classification, and expression analysis of the arabinogalactan protein gene family in rice (Oryza sativa L.)

    PubMed Central

    Zhao, Jie

    2010-01-01

    Arabinogalactan proteins (AGPs) comprise a family of hydroxyproline-rich glycoproteins that are implicated in plant growth and development. In this study, 69 AGPs are identified from the rice genome, including 13 classical AGPs, 15 arabinogalactan (AG) peptides, three non-classical AGPs, three early nodulin-like AGPs (eNod-like AGPs), eight non-specific lipid transfer protein-like AGPs (nsLTP-like AGPs), and 27 fasciclin-like AGPs (FLAs). The results from expressed sequence tags, microarrays, and massively parallel signature sequencing tags are used to analyse the expression of AGP-encoding genes, which is confirmed by real-time PCR. The results reveal that several rice AGP-encoding genes are predominantly expressed in anthers and display differential expression patterns in response to abscisic acid, gibberellic acid, and abiotic stresses. Based on the results obtained from this analysis, an attempt has been made to link the protein structures and expression patterns of rice AGP-encoding genes to their functions. Taken together, the genome-wide identification and expression analysis of the rice AGP gene family might facilitate further functional studies of rice AGPs. PMID:20423940

  3. Multi-label literature classification based on the Gene Ontology graph.

    PubMed

    Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

    2008-12-08

    The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.

  4. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
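
    The idea of a parameter that trades relevance against redundancy can be illustrated with a simplified greedy sketch. The scoring rule used here (relevance raised to alpha, multiplied by antiredundancy raised to 1 - alpha, with antiredundancy taken as the mean of 1 - |Pearson correlation| against already selected features) is an assumption made for illustration, as are the precomputed relevance scores and feature names; the paper's exact definitions may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(data, relevance, size, alpha):
    """Greedy forward selection balancing relevance against redundancy.

    data: dict feature -> list of values; relevance: dict feature -> score in (0, 1].
    alpha near 1 favours relevance; alpha near 0 favours low redundancy.
    """
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < size:
        best, best_score = None, -1.0
        for f in data:
            if f in selected:
                continue
            # Antiredundancy: mean of 1 - |correlation| with the chosen features,
            # clamped so rounding error cannot push it below zero.
            anti = sum(
                1 - min(1.0, abs(pearson(data[f], data[s]))) for s in selected
            ) / len(selected)
            score = relevance[f] ** alpha * anti ** (1 - alpha)
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


# Hypothetical data: g2 is an exact multiple of g1 (fully redundant).
data = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [1, -1, 1, -1, 1, -1],
}
relevance = {"g1": 0.9, "g2": 0.85, "g3": 0.5}
print(select_features(data, relevance, 2, alpha=1.0))
print(select_features(data, relevance, 2, alpha=0.5))
```

    With alpha set to 1 the redundant but highly relevant feature is picked second; lowering alpha shifts the choice to the less relevant but weakly correlated feature.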

  5. Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series.

    PubMed

    Gálvez, Juan Manuel; Castillo, Daniel; Herrera, Luis Javier; San Román, Belén; Valenzuela, Olga; Ortuño, Francisco Manuel; Rojas, Ignacio

    2018-01-01

    Most research studies that apply microarray technology to characterize different pathological states of a disease may fail to reach statistically significant results. This is largely due to the small repertoire of analysed samples, and to the limitation in the number of states or pathologies usually addressed. Moreover, the influence of potential deviations on the gene expression quantification is usually disregarded. In spite of the continuous changes in omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing-related technologies, the existing availability of a vast amount of gene expression microarray datasets should be properly exploited. Therefore, this work proposes a novel methodological approach involving the integration of several heterogeneous skin cancer series, and a later multiclass classifier design. This approach is thus a way to provide clinicians with an intelligent diagnosis support tool based on the use of a robust set of selected biomarkers, which simultaneously distinguishes among different cancer-related skin states. To achieve this, a multi-platform combination of microarray datasets from Affymetrix and Illumina manufacturers was carried out. This integration is expected to strengthen the statistical robustness of the study as well as the finding of highly reliable skin cancer biomarkers. Specifically, the designed operation pipeline has allowed the identification of a small subset of 17 differentially expressed genes (DEGs) from which to distinguish among the 7 skin states involved. These genes were obtained from the assessment of a number of potential batch effects on the gene expression data. The biological interpretation of these genes was inspected in the specific literature to understand their underlying information in relation to skin cancer. Finally, in order to assess their possible effectiveness in cancer diagnosis, a cross-validation Support Vector Machines (SVM)-based classification including feature ranking was performed. The accuracy attained exceeded 92% in overall recognition of the 7 different cancer-related skin states. The proposed integration scheme is expected to allow co-integration with other state-of-the-art technologies such as RNA-seq.

  6. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown to be a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver a transparent decision process to the user. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem. A shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

  7. Cell-Type–Specific Transcriptional Profiles of the Dimorphic Pathogen Penicillium marneffei Reflect Distinct Reproductive, Morphological, and Environmental Demands

    PubMed Central

    Pasricha, Shivani; Payne, Michael; Canovas, David; Pase, Luke; Ngaosuwankul, Nathamon; Beard, Sally; Oshlack, Alicia; Smyth, Gordon K.; Chaiyaroj, Sansanee C.; Boyce, Kylie J.; Andrianopoulos, Alex

    2013-01-01

    Penicillium marneffei is an opportunistic human pathogen endemic to Southeast Asia. At 25° P. marneffei grows in a filamentous hyphal form and can undergo asexual development (conidiation) to produce spores (conidia), the infectious agent. At 37° P. marneffei grows in the pathogenic yeast cell form that replicates by fission. Switching between these growth forms, known as dimorphic switching, is dependent on temperature. To understand the process of dimorphic switching and the physiological capacity of the different cell types, two microarray-based profiling experiments covering approximately 42% of the genome were performed. The first experiment compared cells from the hyphal, yeast, and conidiation phases to identify “phase or cell-state–specific” gene expression. The second experiment examined gene expression during the dimorphic switch from one morphological state to another. The data identified a variety of differentially expressed genes that have been organized into metabolic clusters based on predicted function and expression patterns. In particular, C-14 sterol reductase–encoding gene ergM of the ergosterol biosynthesis pathway showed high-level expression throughout yeast morphogenesis compared to hyphal. Deletion of ergM resulted in severe growth defects with increased sensitivity to azole-type antifungal agents but not amphotericin B. The data defined gene classes based on spatio-temporal expression such as those expressed early in the dimorphic switch but not in the terminal cell types and those expressed late. Such classifications have been helpful in linking a given gene of interest to its expression pattern throughout the P. marneffei dimorphic life cycle and its likely role in pathogenicity. PMID:24062530

  8. Lex-SVM: exploring the potential of exon expression profiling for disease classification.

    PubMed

    Yuan, Xiongying; Zhao, Yi; Liu, Changning; Bu, Dongbo

    2011-04-01

    Exon expression profiling technologies, including exon arrays and RNA-Seq, measure the abundance of every exon in a gene. Compared with gene expression profiling technologies like 3' array, exon expression profiling technologies can detect alterations in both transcription and alternative splicing, and are therefore expected to be more sensitive in diagnosis. However, exon expression profiling also brings higher dimension, more redundancy, and significant correlation among features. Because it ignores the correlation structure among exons of a gene, a popular classification method like L1-SVM selects exons individually from each gene and is thus vulnerable to noise. To overcome this limitation, we present in this paper a new variant of SVM named Lex-SVM, which incorporates the correlation structure among exons and known splicing patterns to improve classification performance. Specifically, we construct a new norm, ex-norm, which includes our prior knowledge of the exon correlation structure, to regularize the coefficients of a linear SVM. Lex-SVM can be solved efficiently using standard linear programming techniques. The advantage of Lex-SVM is that it can select features group-wise, force features in a subgroup to take equal weights and exclude the features that contradict the majority in the subgroup. Experimental results suggest that on exon expression profiles, Lex-SVM is more accurate than existing methods. Lex-SVM also generates a more compact model and selects genes more consistently in cross-validation. Unlike L1-SVM, which selects only one exon in a gene, Lex-SVM assigns equal weights to as many exons in a gene as possible, making the model easier to interpret.

  9. Expression profiling in canine osteosarcoma: identification of biomarkers and pathways associated with outcome

    PubMed Central

    2010-01-01

    Background Osteosarcoma (OSA) spontaneously arises in the appendicular skeleton of large breed dogs and shares many physiological and molecular biological characteristics with human OSA. The standard treatment for OSA in both species is amputation or limb-sparing surgery, followed by chemotherapy. Unfortunately, OSA is an aggressive cancer with a high metastatic rate. Characterization of OSA with regard to its metastatic potential and chemotherapeutic resistance will improve both prognostic capabilities and treatment modalities. Methods We analyzed archived primary OSA tissue from dogs treated with limb amputation followed by doxorubicin or platinum-based drug chemotherapy. Samples were selected from two groups: dogs with disease free intervals (DFI) of less than 100 days (n = 8) and greater than 300 days (n = 7). Gene expression was assessed with Affymetrix Canine 2.0 microarrays and analyzed with a two-tailed t-test. A subset of genes was confirmed using qRT-PCR and used in classification analysis to predict prognosis. Systems-based gene ontology analysis was conducted on genes selected using a standard J5 metric. The genes identified using this approach were converted to their human homologues and assigned to functional pathways using the GeneGo MetaCore platform. Results Potential biomarkers were identified using gene expression microarray analysis and 11 differentially expressed (p < 0.05) genes were validated with qRT-PCR (n = 10/group). Statistical classification models using the qRT-PCR profiles predicted patient outcomes with 100% accuracy in the training set and up to 90% accuracy upon stratified cross validation. Pathway analysis revealed alterations in pathways associated with oxidative phosphorylation, hedgehog and parathyroid hormone signaling, cAMP/Protein Kinase A (PKA) signaling, immune responses, cytoskeletal remodeling and focal adhesion. Conclusions This profiling study has identified potential new biomarkers to predict patient outcome in OSA and new pathways that may be targeted for therapeutic intervention. PMID:20860831

  10. Dexamethasone Stimulated Gene Expression in Peripheral Blood is a Sensitive Marker for Glucocorticoid Receptor Resistance in Depressed Patients

    PubMed Central

    Menke, Andreas; Arloth, Janine; Pütz, Benno; Weber, Peter; Klengel, Torsten; Mehta, Divya; Gonik, Mariya; Rex-Haffner, Monika; Rubel, Jennifer; Uhr, Manfred; Lucae, Susanne; Deussing, Jan M; Müller-Myhsok, Bertram; Holsboer, Florian; Binder, Elisabeth B

    2012-01-01

    Although gene expression profiles in peripheral blood in major depression are not likely to identify genes directly involved in the pathomechanism of affective disorders, they may serve as biomarkers for this disorder. As previous studies using baseline gene expression profiles have provided mixed results, our approach was to use an in vivo dexamethasone challenge test and to compare glucocorticoid receptor (GR)-mediated changes in gene expression between depressed patients and healthy controls. Whole genome gene expression data (baseline and following GR-stimulation with 1.5 mg dexamethasone p.o.) from two independent cohorts were analyzed to identify gene expression patterns that would predict case and control status using a training (N=18 cases/18 controls) and a test cohort (N=11/13). Dexamethasone led to reproducible regulation of 2670 genes in controls and 1151 transcripts in cases. Several genes, including FKBP5 and DUSP1, previously associated with the pathophysiology of major depression, were found to be reliable markers of GR-activation. Using random forest analyses for classification, GR-stimulated gene expression outperformed baseline gene expression as a classifier for case and control status with a correct classification of 79.1 vs 41.6% in the test cohort. GR-stimulated gene expression performed best in dexamethasone non-suppressor patients (88.7% correctly classified with 100% sensitivity), but also correctly classified 77.3% of the suppressor patients (76.7% sensitivity), when using a refined set of 19 genes. Our study suggests that in vivo stimulated gene expression in peripheral blood cells could be a promising molecular marker of altered GR-functioning, an important component of the underlying pathology, in patients suffering from depressive episodes. PMID:22237309

  11. Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data.

    PubMed

    Chatterjee, Sankhadeep; Dey, Nilanjan; Shi, Fuqian; Ashour, Amira S; Fong, Simon James; Sen, Soumya

    2018-04-01

    Dengue fever detection and classification play a vital role given the recent outbreaks of different kinds of dengue fever. Recent advances in microarray technology can be employed for such classification. Several studies have established that the gene selection phase plays a significant role in classifier performance. Accordingly, the current study focused on detecting two different variations, namely, dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method has been proposed to select the most promising genes for the classification process. Afterward, a modified cuckoo search optimization algorithm has been used to support the artificial neural network (ANN-MCS) in classifying unknown subjects into three different classes, namely, DF, DHF, and another class containing convalescent and normal cases. The proposed method has been compared with three other well-known classifiers, namely, multilayer perceptron feed-forward network (MLP-FFN), artificial neural network (ANN) trained with cuckoo search (ANN-CS), and ANN trained with PSO (ANN-PSO). Experiments have been carried out with different numbers of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model has been employed for the classification process. The results have been compared in terms of confusion matrix-based performance metrics. The experimental results indicated a highly statistically significant improvement with the proposed classifier over the traditional ANN-CS model.

  12. Heterogeneous activation of the TGFβ pathway in glioblastomas identified by gene expression-based classification using TGFβ-responsive genes

    PubMed Central

    Xu, Xie L; Kapoun, Ann M

    2009-01-01

    Background TGFβ has emerged as an attractive target for the therapeutic intervention of glioblastomas. Aberrant TGFβ overproduction in glioblastoma and other high-grade gliomas has been reported; however, to date, none of these reports has systematically examined the components of TGFβ signaling to gain a comprehensive view of TGFβ activation in large cohorts of human glioma patients. Methods TGFβ activation in mammalian cells leads to a transcriptional program that typically affects 5–10% of the genes in the genome. To systematically examine the status of TGFβ activation in high-grade glial tumors, we compiled a gene set of transcriptional response to TGFβ stimulation from tissue culture and in vivo animal studies. These genes were used to examine the status of TGFβ activation in high-grade gliomas including a large cohort of glioblastomas. Unsupervised and supervised classification analysis was performed in two independent, publicly available glioma microarray datasets. Results Unsupervised and supervised classification using the TGFβ-responsive gene list in two independent glial tumor gene expression data sets revealed various levels of TGFβ activation in these tumors. Among glioblastomas, one of the most devastating human cancers, two subgroups were identified that showed distinct TGFβ activation patterns as measured from transcriptional responses. Approximately 62% of glioblastoma samples analyzed showed strong TGFβ activation, while the rest showed a weak TGFβ transcriptional response. Conclusion Our findings suggest heterogeneous TGFβ activation in glioblastomas, which may cause potential differences in responses to anti-TGFβ therapies in these two distinct subgroups of glioblastoma patients. PMID:19192267

  13. FlyBase: genes and gene models

    PubMed Central

    Drysdale, Rachel A.; Crosby, Madeline A.

    2005-01-01

    FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species. PMID:15608223

  14. Exceptions to the rule: case studies in the prediction of pathogenicity for genetic variants in hereditary cancer genes.

    PubMed

    Rosenthal, E T; Bowles, K R; Pruss, D; van Kan, A; Vail, P J; McElroy, H; Wenstrup, R J

    2015-12-01

    Based on current consensus guidelines and standard practice, many genetic variants detected in clinical testing are classified as disease causing based on their predicted impact on the normal expression or function of the gene in the absence of additional data. However, our laboratory has identified a subset of such variants in hereditary cancer genes for which compelling contradictory evidence emerged after the initial evaluation following the first observation of the variant. Three representative examples of variants in BRCA1, BRCA2 and MSH2 that are predicted to disrupt splicing, prematurely truncate the protein, or remove the start codon were evaluated for pathogenicity by analyzing clinical data with multiple classification algorithms. Available clinical data for all three variants contradicts the expected pathogenic classification. These variants illustrate potential pitfalls associated with standard approaches to variant classification as well as the challenges associated with monitoring data, updating classifications, and reporting potentially contradictory interpretations to the clinicians responsible for translating test outcomes to appropriate clinical action. It is important to address these challenges now as the model for clinical testing moves toward the use of large multi-gene panels and whole exome/genome analysis, which will dramatically increase the number of genetic variants identified. © 2015 The Authors. Clinical Genetics published by John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. The Regulation of Gene Expression in Cnidarian-Algal Associations.

    DTIC Science & Technology

    1998-07-13

    symbiotic cnidarians, Aiptasia pallida, Anthopleura elegantissima, symbiosis-specific proteins, cDNA libraries, O. SECURITY CLASSIFICATION OF REPORT...gene expression in cnidarian-algal associations Award Period: 1 July 1995-30 June 1998 Objectives: A. To identify and characterize heat shock...Exploring Symbiosis-Specific Gene Expression in Cnidarian/Algal Associations. In: Molecular Approaches to the Study of the Ocean. Ed. K. Cooksey, Chapman

  16. Population Level Purifying Selection and Gene Expression Shape Subgenome Evolution in Maize.

    PubMed

    Pophaly, Saurabh D; Tellier, Aurélien

    2015-12-01

    The maize ancestor experienced a recent whole-genome duplication (WGD) followed by gene erosion which generated two subgenomes, the dominant subgenome (maize1) experiencing fewer deletions than maize2. We take advantage of available extensive polymorphism and gene expression data in maize to study purifying selection and gene expression divergence between WGD retained paralog pairs. We first report a strong correlation in nucleotide diversity between duplicate pairs, except for upstream regions. We then show that maize1 genes are under stronger purifying selection than maize2. WGD retained genes have higher gene dosage and biased Gene Ontologies consistent with previous studies. The relative gene expression of paralogs across tissues demonstrates that 98% of duplicate pairs have either subfunctionalized in a tissuewise manner or have diverged consistently in their expression thereby preventing functional complementation. Tissuewise subfunctionalization seems to be a hallmark of transcription factors, whereas consistent repression occurs for macromolecular complexes. We show that dominant gene expression is a strong determinant of the strength of purifying selection, explaining the inferred stronger negative selection on maize1 genes. We propose a novel expression-based classification of duplicates which is more robust to explain observed polymorphism patterns than the subgenome location. Finally, upstream regions of repressed genes exhibit an enrichment in transposable elements which indicates a possible mechanism for expression divergence. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits the design of appropriate classifiers. In this paper, we present a transductive learning method that combines a filtering strategy within the transductive framework with a progressive labeling strategy. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and can effectively solve the problem of estimating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.

  18. Spectral biclustering of microarray data: coclustering genes and conditions.

    PubMed

    Kluger, Yuval; Basri, Ronen; Chang, Joseph T; Gerstein, Mark

    2003-04-01

    Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find "marker genes" that are differentially expressed in particular sets of "conditions." We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data).
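
    The core idea in the abstract above (rescale the expression matrix, then read a checkerboard partition off singular vectors) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest normalization (independent scaling by row and column sums), replaces a library SVD with power iteration so that it needs only the standard library, and splits genes by the sign of the second left singular vector instead of clustering. The function name and the toy matrix are hypothetical.

```python
from math import sqrt

def second_left_singular_vector(A, iters=200):
    """Scale A by row/column sums, then approximate the second left
    singular vector of the scaled matrix by power iteration with deflation.
    Assumes all entries are positive and the scaled matrix has rank >= 2."""
    n, m = len(A), len(A[0])
    r = [sum(row) for row in A]                             # row sums
    c = [sum(A[i][j] for i in range(n)) for j in range(m)]  # column sums
    An = [[A[i][j] / sqrt(r[i] * c[j]) for j in range(m)] for i in range(n)]
    # The leading left singular vector of An is proportional to sqrt(r)
    # (singular value 1); it carries no partition information.
    norm = sqrt(sum(r))
    u1 = [sqrt(x) / norm for x in r]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        # Deflate: remove the component along the trivial vector u1.
        dot = sum(a * b for a, b in zip(v, u1))
        v = [a - dot * b for a, b in zip(v, u1)]
        # Multiply by An * An^T.
        t = [sum(An[i][j] * v[i] for i in range(n)) for j in range(m)]
        v = [sum(An[i][j] * t[j] for j in range(m)) for i in range(n)]
        s = sqrt(sum(a * a for a in v))
        v = [a / s for a in v]
    return v

# Toy checkerboard: genes 0-1 are high in conditions 0-1, genes 2-3 in 2-3.
A = [[9, 9, 1, 1],
     [9, 9, 1, 1],
     [1, 1, 9, 9],
     [1, 1, 9, 9]]
gene_groups = [x > 0 for x in second_left_singular_vector(A)]
print(gene_groups)
```

    The same procedure applied to the transposed matrix gives a partition of the conditions. The overall sign of a singular vector is arbitrary, so only the grouping is meaningful, not which group is labelled True.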

  19. BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature.

    PubMed

    Sen Sarma, Moushumi; Arcoleo, David; Khetani, Radhika S; Chee, Brant; Ling, Xu; He, Xin; Jiang, Jing; Mei, Qiaozhu; Zhai, ChengXiang; Schatz, Bruce

    2011-07-01

    With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php.

  20. Characterization of distinct classes of differential gene expression in osteoblast cultures from non-syndromic craniosynostosis bone.

    PubMed

    Rojas-Peña, Monica L; Olivares-Navarrete, Rene; Hyzy, Sharon; Arafat, Dalia; Schwartz, Zvi; Boyan, Barbara D; Williams, Joseph; Gibson, Greg

    2014-01-01

    Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention.

  1. Characterization of Distinct Classes of Differential Gene Expression in Osteoblast Cultures from Non-Syndromic Craniosynostosis Bone

    PubMed Central

    Rojas-Peña, Monica L.; Olivares-Navarrete, Rene; Hyzy, Sharon; Arafat, Dalia; Schwartz, Zvi; Boyan, Barbara D.; Williams, Joseph; Gibson, Greg

    2014-01-01

    Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention. PMID:25184005

  2. Pilot Comparison of Stromal Gene Expression Among Normal Prostate Tissues and Primary Prostate Cancer Tissues in White and Black Men

    DTIC Science & Technology

    2007-09-01

    AD_________________ Award Number: W81XWH-04-1-0817 TITLE: Pilot Comparison of Stromal Gene ...COVERED 30 Sep 2006 – 31 Aug 2007 4. TITLE AND SUBTITLE Pilot Comparison of Stromal Gene Expression among Normal Prostate Tissues and 5a. CONTRACT...subject to formal hypothesis testing. 15. SUBJECT TERMS Prostate Stromal Gene Expression 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF

  3. Prediction of gene expression in embryonic structures of Drosophila melanogaster.

    PubMed

    Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

    2007-07-01

    Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.

  4. Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster

    PubMed Central

    Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis

    2007-01-01

    Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945

  5. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus)

    PubMed Central

    Pan, Lei; Liu, Yan; Wei, Qiang; Xiao, Chenwen; Ji, Quanan; Bao, Guolian; Wu, Xinsheng

    2015-01-01

    Background Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. Methodology/Principal Findings Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10^-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interaction, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. We also chose a key gene for SNP analysis to identify differences between plaice and un-plaice phenotype rabbits. Conclusions/Significance The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development. PMID:25955442

  6. Integrated Proteomic and Transcriptomic-Based Approaches to Identifying Signature Biomarkers and Pathways for Elucidation of Daoy and UW228 Subtypes.

    PubMed

    Higdon, Roger; Kala, Jessie; Wilkins, Devan; Yan, Julia Fangfei; Sethi, Manveen K; Lin, Liang; Liu, Siqi; Montague, Elizabeth; Janko, Imre; Choiniere, John; Kolker, Natali; Hancock, William S; Kolker, Eugene; Fanayan, Susan

    2017-02-03

    Medulloblastoma (MB) is the most common malignant pediatric brain tumor. Patient survival has remained largely the same for the past 20 years, with therapies causing significant health, cognitive, behavioral and developmental complications for those who survive the tumor. In this study, we profiled the total transcriptome and proteome of two established MB cell lines, Daoy and UW228, using high-throughput RNA sequencing (RNA-Seq) and label-free nano-LC-MS/MS-based quantitative proteomics, coupled with advanced pathway analysis. While Daoy has been suggested to belong to the sonic hedgehog (SHH) subtype, the exact UW228 subtype is not yet clearly established. Thus, a goal of this study was to identify protein markers and pathways that would help elucidate their subtype classification. A number of differentially expressed genes and proteins, including a number of adhesion, cytoskeletal and signaling molecules, were observed between the two cell lines. While several cancer-associated genes/proteins exhibited similar expression across the two cell lines, upregulation of a number of signature proteins and enrichment of key components of SHH and WNT signaling pathways were uniquely observed in Daoy and UW228, respectively. The novel information on differentially expressed genes/proteins and enriched pathways provides insights into the biology of MB, which could help elucidate their subtype classification.

  7. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  8. The Application of Gene Expression Profiling in Predictions of Occult Lymph Node Metastasis in Colorectal Cancer Patients

    PubMed Central

    Peyravian, Noshad; Larki, Pegah; Gharib, Ehsan; Nazemalhosseini-Mojarad, Ehsan; Anaraki, Fakhrosadate; Young, Chris; McClellan, James; Ashrafian Bonab, Maziar; Asadzadeh-Aghdaei, Hamid; Zali, Mohammad Reza

    2018-01-01

    A key factor in determining the likely outcome for a patient with colorectal cancer is whether or not the tumour has metastasised to the lymph nodes—information which is also important in assessing any possibilities of lymph node resection so as to improve survival. In this review we perform a wide-range assessment of literature relating to recent developments in gene expression profiling (GEP) of the primary tumour, to determine their utility in assessing node status. A set of characteristic genes seems to be involved in the prediction of lymph node metastasis (LNM) in colorectal patients. Hence, GEP is applicable in personalised/individualised/tailored therapies and provides insights into developing novel therapeutic targets. Not only is GEP useful in prediction of LNM, but it also allows classification based on differences such as sample size, target gene expression, and examination method. PMID:29498671

  9. Classification based upon gene expression data: bias and precision of error rates.

    PubMed

    Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L

    2007-06-01

    Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
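
    The abstract above distributes R code built on the PAMR package; the sketch below is not that code. It is a minimal standard-library Python illustration of two-level external cross-validation under stated assumptions: a nearest-centroid classifier, gene ranking by absolute difference of class means, and the number of selected genes as the only tuning parameter. All function names and the toy data are hypothetical. The key point is that both gene ranking and parameter tuning see only the outer training samples, so the outer test fold never influences the model it evaluates.

```python
from statistics import mean

def folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def rank_genes(X, y):
    """Rank genes by |mean(class 0) - mean(class 1)| on the given samples."""
    def score(g):
        a = [x[g] for x, c in zip(X, y) if c == 0]
        b = [x[g] for x, c in zip(X, y) if c == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)

def fit_predict(Xtr, ytr, Xte, n_top):
    """Select n_top genes on the training data, then classify test samples
    by nearest class centroid. Assumes both classes occur in Xtr."""
    genes = rank_genes(Xtr, ytr)[:n_top]
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(Xtr, ytr) if lab == c]
        cents[c] = [mean(r[g] for r in rows) for g in genes]
    def dist(x, c):
        return sum((x[g] - m) ** 2 for g, m in zip(genes, cents[c]))
    return [min((0, 1), key=lambda c: dist(x, c)) for x in Xte]

def split(X, y, test):
    """Return training and test subsets for one fold."""
    held = set(test)
    train = [i for i in range(len(X)) if i not in held]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def cv_error(X, y, n_top, k):
    """Single-level cross-validated error for a fixed n_top."""
    wrong = 0
    for test in folds(len(X), k):
        Xtr, ytr, Xte, yte = split(X, y, test)
        pred = fit_predict(Xtr, ytr, Xte, n_top)
        wrong += sum(p != t for p, t in zip(pred, yte))
    return wrong / len(X)

def nested_cv_error(X, y, candidates=(1, 2), outer_k=4, inner_k=3):
    """Two-level CV: tune n_top by inner CV on each outer training set."""
    wrong = 0
    for test in folds(len(X), outer_k):
        Xtr, ytr, Xte, yte = split(X, y, test)
        best = min(candidates, key=lambda m: cv_error(Xtr, ytr, m, inner_k))
        pred = fit_predict(Xtr, ytr, Xte, best)
        wrong += sum(p != t for p, t in zip(pred, yte))
    return wrong / len(X)

# Made-up toy data: gene 0 separates the classes, genes 1-2 are noise.
y = [0] * 6 + [1] * 6
X = [[10.0 * y[i] + 0.1 * (i % 3), float(i * 7 % 5), float(i * 3 % 4)]
     for i in range(12)]
print(nested_cv_error(X, y))
```

    To probe for bias in the way the abstract describes, the same function could be rerun on copies of the data with shuffled labels; an honest procedure should then report an error close to the baseline rate rather than an optimistic one.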

  10. Bioinformatics, interaction network analysis, and neural networks to characterize gene expression of radicular cyst and periapical granuloma.

    PubMed

    Poswar, Fabiano de Oliveira; Farias, Lucyana Conceição; Fraga, Carlos Alberto de Carvalho; Bambirra, Wilson; Brito-Júnior, Manoel; Sousa-Neto, Manoel Damião; Santos, Sérgio Henrique Souza; de Paula, Alfredo Maurício Batista; D'Angelo, Marcos Flávio Silveira Vasconcelos; Guimarães, André Luiz Sena

    2015-06-01

    Bioinformatics has emerged as an important tool to analyze the large amount of data generated by research in different diseases. In this study, gene expression for radicular cysts (RCs) and periapical granulomas (PGs) was characterized based on a leader gene approach. A validated bioinformatics algorithm was applied to identify leader genes for RCs and PGs. Genes related to RCs and PGs were first identified in PubMed, GenBank, GeneAtlas, and GeneCards databases. The Web-available STRING software (The European Molecular Biology Laboratory [EMBL], Heidelberg, Baden-Württemberg, Germany) was used in order to build the interaction map among the identified genes by a significance score named weighted number of links. Based on the weighted number of links, genes were clustered using k-means. The genes in the highest cluster were considered leader genes. Multilayer perceptron neural network analysis was used as a complementary supplement for gene classification. For RCs, the suggested leader genes were TP53 and EP300, whereas PGs were associated with IL2RG, CCL2, CCL4, CCL5, CCR1, CCR3, and CCR5 genes. Our data revealed different gene expression for RCs and PGs, suggesting that not only the inflammatory nature but also other biological processes might differentiate RCs and PGs. Copyright © 2015 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
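
    The clustering step described above (k-means on the weighted number of links, with the highest cluster taken as leader genes) can be sketched as follows. This is a minimal one-dimensional Lloyd's algorithm, not the authors' validated implementation; the gene labels G1 to G6, their scores, the choice of k = 3 and the evenly spaced starting centroids are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars (k >= 2); centroids start evenly spread."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def leader_genes(scores, k=3):
    """Genes falling in the cluster with the highest centroid."""
    genes = list(scores)
    labels, centroids = kmeans_1d([scores[g] for g in genes], k)
    top = max(range(k), key=lambda c: centroids[c])
    return sorted(g for g, lab in zip(genes, labels) if lab == top)


# Hypothetical weighted-number-of-links scores for six placeholder genes.
scores = {"G1": 9.5, "G2": 9.0, "G3": 4.2, "G4": 3.8, "G5": 1.0, "G6": 0.5}
print(leader_genes(scores))
```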

  11. Cloud-Scale Genomic Signals Processing for Robust Large-Scale Cancer Genomic Microarray Data Analysis.

    PubMed

    Harvey, Benjamin Simeon; Ji, Soo-Yeon

    2017-01-01

    As microarray data available to scientists continues to increase in size and complexity, it has become overwhelmingly important to find multiple ways to bring forth oncological inference to the bioinformatics community through the analysis of large-scale cancer genomic (LSCG) DNA and mRNA microarray data that is useful to scientists. Though there have been many attempts to elucidate the issue of bringing forth biological interpretation by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale distributed parallel (CSDP) separable 1-D wavelet decomposition technique for denoising through differential expression thresholding and classification of LSCG microarray data. This research presents a novel methodology that utilizes a CSDP separable 1-D method for wavelet-based transformation in order to initialize a threshold which will retain significantly expressed genes through the denoising process for robust classification of cancer patients. Additionally, the overall study was implemented within a CSDP environment. Cloud computing and wavelet-based thresholding for denoising were used for the classification of samples within the Global Cancer Map, Cancer Cell Line Encyclopedia, and The Cancer Genome Atlas. The results proved that separable 1-D parallel distributed wavelet denoising in the cloud and differential expression thresholding increased the computational performance and enabled the generation of higher quality LSCG microarray datasets, which led to more accurate classification results.
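
    The wavelet thresholding idea in this abstract can be illustrated on a single machine. The sketch below assumes a one-level Haar transform and soft thresholding of the detail coefficients; the abstract specifies neither the wavelet family nor the thresholding rule, and the distributed (CSDP) part is not modelled here.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, t))


# Hypothetical expression profile; a large threshold averages each pair.
profile = [1.0, 3.0, 5.0, 5.0]
print(denoise(profile, 10.0))
```

    With a threshold of zero the transform round-trips the input, so the threshold alone controls how much fine-scale variation is discarded.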

  12. TrSDB: a proteome database of transcription factors

    PubMed Central

    Hermoso, Antoni; Aguilar, Daniel; Aviles, Francesc X.; Querol, Enrique

    2004-01-01

    TrSDB—TranScout Database—(http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression. PMID:14681387

  13. Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD).

    PubMed

    Jiang, Xiangying; Ringwald, Martin; Blake, Judith; Shatkay, Hagit

    2017-01-01

    The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD while the other half as irrelevant to GXD. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification. Moreover, using information obtained from image captions clearly improves performance, compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. www.informatics.jax.org. © The Author(s) 2017. Published by Oxford University Press.

  14. Soybean kinome: functional classification and gene expression patterns

    PubMed Central

    Liu, Jinyi; Chen, Nana; Grant, Joshua N.; Cheng, Zong-Ming (Max); Stewart, C. Neal; Hewezi, Tarek

    2015-01-01

    The protein kinase (PK) gene family is one of the largest and most highly conserved gene families in plants and plays a role in nearly all biological functions. While a large number of genes have been predicted to encode PKs in soybean, a comprehensive functional classification and global analysis of expression patterns of this large gene family is lacking. In this study, we identified the entire soybean PK repertoire or kinome, which comprised 2166 putative PK genes, representing 4.67% of all soybean protein-coding genes. The soybean kinome was classified into 19 groups, 81 families, and 122 subfamilies. The receptor-like kinase (RLK) group was remarkably large, containing 1418 genes. Collinearity analysis indicated that whole-genome segmental duplication events may have played a key role in the expansion of the soybean kinome, whereas tandem duplications might have contributed to the expansion of specific subfamilies. Gene structure, subcellular localization prediction, and gene expression patterns indicated extensive functional divergence of PK subfamilies. Global gene expression analysis of soybean PK subfamilies revealed tissue- and stress-specific expression patterns, implying regulatory functions over a wide range of developmental and physiological processes. In addition, tissue and stress co-expression network analysis uncovered specific subfamilies with narrow or wide interconnected relationships, indicative of their association with particular or broad signalling pathways, respectively. Taken together, our analyses provide a foundation for further functional studies to reveal the biological and molecular functions of PKs in soybean. PMID:25614662

  15. Combining multiple decisions: applications to bioinformatics

    NASA Astrophysics Data System (ADS)

    Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
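
    As a rough illustration of the ECOC framework reviewed above, the sketch below decodes the outputs of several binary classifiers by weighted Hamming distance to each class codeword. The 7-bit exhaustive code for four classes, the class names and the fixed weights are assumptions for illustration; the article's first approach tunes the weights from observed data and its second uses a probabilistic bit-inversion model, neither of which is reproduced here.

```python
def ecoc_decode(code_matrix, bits, weights=None):
    """Return the class whose codeword is closest to the predicted bits.

    code_matrix maps class name -> list of 0/1 codeword bits;
    bits[b] is the output of binary classifier b;
    weights[b] is how much a disagreement on bit b counts (default 1).
    """
    if weights is None:
        weights = [1.0] * len(bits)

    def distance(codeword):
        return sum(w for w, cw, bit in zip(weights, codeword, bits) if cw != bit)

    return min(code_matrix, key=lambda cls: distance(code_matrix[cls]))


# Exhaustive 7-bit code for four hypothetical classes (row distance 4).
CODE = {
    "class0": [1, 1, 1, 1, 1, 1, 1],
    "class1": [0, 0, 0, 0, 1, 1, 1],
    "class2": [0, 0, 1, 1, 0, 0, 1],
    "class3": [0, 1, 0, 1, 0, 1, 0],
}

# class2's codeword with the first bit inverted by one faulty classifier.
print(ecoc_decode(CODE, [1, 0, 1, 1, 0, 0, 1]))
```

    Since any two codewords here differ in four positions, a single inverted bit still decodes to the correct class; weights matter when two or more classifiers disagree with every codeword.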

  16. Regulation of IAP (Inhibitor of Apoptosis) Gene Expression by the p53 Tumor Suppressor Protein

    DTIC Science & Technology

    2005-05-01

    adenovirus, gene therapy, polymorphism ... averaged results of three independent experiments, with standard error. Right panel: Level of p53 in infected cells using the antibody Ab-6 (Calbiochem ... with highly purified mitochondria as described in (2). The arrow marks oligomerized BAK. The right panel depicts the purity of BMH Crosslinked Mito

  17. Challenging the Cancer Molecular Stratification Dogma: Intratumoral Heterogeneity Undermines Consensus Molecular Subtypes and Potential Diagnostic Value in Colorectal Cancer.

    PubMed

    Dunne, Philip D; McArt, Darragh G; Bradley, Conor A; O'Reilly, Paul G; Barrett, Helen L; Cummins, Robert; O'Grady, Tony; Arthur, Ken; Loughrey, Maurice B; Allen, Wendy L; McDade, Simon S; Waugh, David J; Hamilton, Peter W; Longley, Daniel B; Kay, Elaine W; Johnston, Patrick G; Lawler, Mark; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra

    2016-08-15

    A number of independent gene expression profiling studies have identified transcriptional subtypes in colorectal cancer with potential diagnostic utility, culminating in publication of a colorectal cancer Consensus Molecular Subtype classification. The worst prognostic subtype has been defined by genes associated with stem-like biology. Recently, it has been shown that the majority of genes associated with this poor prognostic group are stromal derived. We investigated the potential for tumor misclassification into multiple diagnostic subgroups based on tumoral region sampled. We performed multiregion tissue RNA extraction/transcriptomic analysis using colorectal-specific arrays on invasive front, central tumor, and lymph node regions selected from tissue samples from 25 colorectal cancer patients. We identified a consensus 30-gene list, which represents the intratumoral heterogeneity within a cohort of primary colorectal cancer tumors. Using a series of online datasets, we showed that this gene list displays prognostic potential HR = 2.914 (confidence interval 0.9286-9.162) in stage II/III colorectal cancer patients, but in addition, we demonstrated that these genes are stromal derived, challenging the assumption that poor prognosis tumors with stem-like biology have undergone a widespread epithelial-mesenchymal transition. Most importantly, we showed that patients can be simultaneously classified into multiple diagnostically relevant subgroups based purely on the tumoral region analyzed. Gene expression profiles derived from the nonmalignant stromal region can influence assignment of colorectal cancer transcriptional subtypes, questioning the current molecular classification dogma and highlighting the need to consider pathology sampling region and degree of stromal infiltration when employing transcription-based classifiers to underpin clinical decision making in colorectal cancer. Clin Cancer Res; 22(16); 4095-104. 
    ©2016 AACR. See related commentary by Morris and Kopetz, p. 3989. ©2016 American Association for Cancer Research.

  18. Low expression of MN1 associates with better treatment response in older patients with de novo cytogenetically normal acute myeloid leukemia

    PubMed Central

    Schwind, Sebastian; Marcucci, Guido; Kohlschmidt, Jessica; Radmacher, Michael D.; Mrózek, Krzysztof; Maharry, Kati; Becker, Heiko; Metzeler, Klaus H.; Whitman, Susan P.; Wu, Yue-Zhong; Powell, Bayard L.; Baer, Maria R.; Kolitz, Jonathan E.; Carroll, Andrew J.; Larson, Richard A.

    2011-01-01

    Low MN1 expression bestows favorable prognosis in younger adults with cytogenetically normal acute myeloid leukemia (CN-AML), but its prognostic significance in older patients is unknown. We analyzed pretherapy MN1 expression in 140 older (≥ 60 years) de novo CN-AML patients treated on cytarabine/daunorubicin-based protocols. Low MN1 expressers had higher complete remission (CR) rates (P = .001), and longer overall survival (P = .03) and event-free survival (EFS; P = .004). In multivariable models, low MN1 expression was associated with better CR rates and EFS. The impact of MN1 expression on overall survival and EFS was predominantly in patients 70 years of age or older, with low MN1 expressers with mutated NPM1 having the best outcome. The impact of MN1 expression was also observed in the Intermediate-I, but not the Favorable group of the European LeukemiaNet classification, where low MN1 expressers had CR rates and EFS similar to those of Favorable group patients. MN1 expresser-status-associated gene- and microRNA-expression signatures revealed underexpression of drug resistance and adverse outcome predictors, and overexpression of HOX genes and HOX-gene–embedded microRNAs in low MN1 expressers. We conclude that low MN1 expression confers better prognosis in older CN-AML patients and may refine the European LeukemiaNet classification. Biologic features associated with MN1 expression may help identify new treatment targets. PMID:21828125

  19. Extending bicluster analysis to annotate unclassified ORFs and predict novel functional modules using expression data

    PubMed Central

    Bryan, Kenneth; Cunningham, Pádraig

    2008-01-01

    Background: Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results: The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion: In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786

  20. System for selecting relevant information for decision support.

    PubMed

    Kalina, Jan; Seidl, Libor; Zvára, Karel; Grünfeldová, Hana; Slovák, Dalibor; Zvárová, Jana

    2013-01-01

    We implemented a prototype of a decision support system called SIR, which takes the form of a web-based classification service for diagnostic decision support. The system can select the most relevant variables and learn a classification rule, which is guaranteed to be suitable also for high-dimensional measurements. The classification system can be useful for clinicians in primary care to support their decision-making tasks with relevant information extracted from any available clinical study. The implemented prototype was tested on a sample of patients in a cardiological study and performs information extraction from a high-dimensional set containing both clinical and gene expression data.

  1. EgoNet: identification of human disease ego-network modules

    PubMed Central

    2014-01-01

    Background: Mining novel biomarkers from gene expression profiles for accurate disease classification is challenging due to small sample size and high noise in gene expression measurements. Several studies have proposed integrated analyses of microarray data and protein-protein interaction (PPI) networks to find diagnostic subnetwork markers. However, the neighborhood relationship among network member genes has not been fully considered by those methods, leaving many potential gene markers unidentified. The main idea of this study is to take full advantage of the biological observation that genes associated with the same or similar diseases commonly reside in the same neighborhood of molecular networks. Results: We present EgoNet, a novel method based on egocentric network-analysis techniques, to exhaustively search and prioritize disease subnetworks and gene markers from a large-scale biological network. When applied to a triple-negative breast cancer (TNBC) microarray dataset, the top selected modules contain both known gene markers in TNBC and novel candidates, such as RAD51 and DOK1, which play a central role in their respective ego-networks by connecting many differentially expressed genes. Conclusions: Our results suggest that EgoNet, which is based on the ego network concept, allows the identification of novel biomarkers and provides a deeper understanding of their roles in complex diseases. PMID:24773628
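
    The ego-network concept underlying this record can be sketched in a few lines: the one-step ego-network of a gene is that gene, its direct interaction partners, and every edge among them. The edge list, node names and the simple count of differentially expressed (DE) neighbours below are illustrative assumptions; EgoNet's actual search and prioritization procedure is not described in enough detail here to reproduce.

```python
def ego_network(edges, ego):
    """Return (nodes, edges) of the 1-step ego-network of `ego`.

    edges is a list of undirected (u, v) pairs from an interaction network.
    """
    neighbors = set()
    for u, v in edges:
        if u == ego:
            neighbors.add(v)
        elif v == ego:
            neighbors.add(u)
    nodes = neighbors | {ego}
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sub_edges


def de_neighbor_count(edges, ego, de_genes):
    """Hypothetical score: how many DE genes the ego connects to directly."""
    nodes, _ = ego_network(edges, ego)
    return len((nodes - {ego}) & set(de_genes))


# Hypothetical interaction network with placeholder node names.
ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nodes, sub_edges = ego_network(ppi, "A")
print(sorted(nodes), sub_edges)
print(de_neighbor_count(ppi, "C", ["A", "D", "E"]))
```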

  2. Global identification and expression analysis of stress-responsive genes of the Argonaute family in apple.

    PubMed

    Xu, Ruirui; Liu, Caiyun; Li, Ning; Zhang, Shizhong

    2016-12-01

    Argonaute (AGO) proteins, which are found in yeast, animals, and plants, are the core molecules of the RNA-induced silencing complex. These proteins play important roles in plant growth, development, and responses to biotic stresses. The complete analysis and classification of the AGO gene family have been recently reported in different plants. Nevertheless, systematic analysis and expression profiling of these genes have not been performed in apple (Malus domestica). Approximately 15 AGO genes were identified in the apple genome. The phylogenetic tree, chromosome location, conserved protein motifs, gene structure, and expression of the AGO gene family in apple were analyzed for gene prediction. All AGO genes were phylogenetically clustered into four groups (i.e., AGO1, AGO4, MEL1/AGO5, and ZIPPY/AGO7) with the AGO genes of Arabidopsis. These groups of the AGO gene family were statistically analyzed and compared among 31 plant species. The predicted apple AGO genes are distributed across nine chromosomes at different densities and include three segment duplications. Expression studies indicated that 15 AGO genes exhibit different expression patterns in at least one of the tissues tested. Additionally, analysis of gene expression levels indicated that the genes are mostly involved in responses to NaCl, PEG, heat, and low-temperature stresses. Hence, several candidate AGO genes are involved in different aspects of physiological and developmental processes and may play an important role in abiotic stress responses in apple. To the best of our knowledge, this study is the first to report a comprehensive analysis of the apple AGO gene family. Our results provide useful information to understand the classification and putative functions of these proteins, especially for gene members that may play important roles in abiotic stress responses in M. hupehensis.

  3. Molecular diagnostics in the management of rhabdomyosarcoma.

    PubMed

    Arnold, Michael A; Barr, Fredric G

    2017-02-01

    A classification of rhabdomyosarcoma (RMS) with prognostic relevance has primarily relied on clinical features and histologic classification as either embryonal or alveolar RMS. The PAX3-FOXO1 and PAX7-FOXO1 gene fusions occur in 80% of cases with the alveolar subtype and are more predictive of outcome than histologic classification. Identifying additional molecular hallmarks that further subclassify RMS is an active area of research. Areas Covered: The authors review the current state of the PAX3-FOXO1 and PAX7-FOXO1 fusions as prognostic biomarkers. Emerging biomarkers, including mRNA expression profiling, MYOD1 mutations, RAS pathway mutations and gene fusions involving NCOA2 or VGLL2 are also reviewed. Expert commentary: Strategies for modifying RMS risk stratification based on molecular biomarkers are emerging with the potential to transform the clinical management of RMS, ultimately improving patient outcomes by tailoring therapy to predicted patient risk and identifying targets for novel therapies.

  4. Ion Channel Gene Expression in Lung Adenocarcinoma: Potential Role in Prognosis and Diagnosis

    PubMed Central

    Ko, Jae-Hong; Gu, Wanjun; Lim, Inja; Bang, Hyoweon; Ko, Eun A.; Zhou, Tong

    2014-01-01

    Ion channels are known to regulate cancer processes at all stages. The roles of ion channels in cancer pathology are extremely diverse. We systematically analyzed the expression patterns of ion channel genes in lung adenocarcinoma. First, we compared the expression of ion channel genes between normal and tumor tissues in patients with lung adenocarcinoma. Thirty-seven ion channel genes were identified as being differentially expressed between the two groups. Next, we investigated the prognostic power of ion channel genes in lung adenocarcinoma. We assigned a risk score to each lung adenocarcinoma patient based on the expression of the differentially expressed ion channel genes. We demonstrated that the risk score effectively predicted overall survival and recurrence-free survival in lung adenocarcinoma. We also found that the risk scores for ever-smokers were higher than those for never-smokers. Multivariate analysis indicated that the risk score was a significant prognostic factor for survival, which is independent of patient age, gender, stage, smoking history, Myc level, and EGFR/KRAS/ALK gene mutation status. Finally, we investigated the difference in ion channel gene expression between the two major subtypes of non-small cell lung cancer: adenocarcinoma and squamous-cell carcinoma. Thirty ion channel genes were identified as being differentially expressed between the two groups. We suggest that ion channel gene expression can be used to improve the subtype classification in non-small cell lung cancer at the molecular level. The findings in this study have been validated in several independent lung cancer cohorts. PMID:24466154

  5. Ontology based molecular signatures for immune cell types via gene expression analysis

    PubMed Central

    2013-01-01

    Background: New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results: We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every stratum of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified gene sets that represent genes overexpressed in germinal center B cells and identified genes uniquely expressed in these B cells compared to other B cell types. Conclusions: This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649

  6. Comprehensive analysis of MGMT promoter methylation: correlation with MGMT expression and clinical response in GBM.

    PubMed

    Shah, Nameeta; Lin, Biaoyang; Sibenaller, Zita; Ryken, Timothy; Lee, Hwahyung; Yoon, Jae-Geun; Rostad, Steven; Foltz, Greg

    2011-01-07

    O⁶-methylguanine DNA-methyltransferase (MGMT) promoter methylation has been identified as a potential prognostic marker for glioblastoma patients. The relationship between the exact site of promoter methylation and its effect on gene silencing, and the patient's subsequent response to therapy, is still being defined. The aim of this study was to comprehensively characterize cytosine-guanine (CpG) dinucleotide methylation across the entire MGMT promoter and to correlate individual CpG site methylation patterns to mRNA expression, protein expression, and progression-free survival. To best identify the specific MGMT promoter region most predictive of gene silencing and response to therapy, we determined the methylation status of all 97 CpG sites in the MGMT promoter in tumor samples from 70 GBM patients using quantitative bisulfite sequencing. We next identified the CpG site specific and regional methylation patterns most predictive of gene silencing and improved progression-free survival. Using this data, we propose a new classification scheme utilizing methylation data from across the entire promoter and show that an analysis based on this approach, which we call 3R classification, is predictive of progression-free survival (HR  = 5.23, 95% CI [2.089-13.097], p<0.0001). To adapt this approach to the clinical setting, we used a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) test based on the 3R classification and show that this test is both feasible in the clinical setting and predictive of progression free survival (HR  = 3.076, 95% CI [1.301-7.27], p = 0.007). We discuss the potential advantages of a test based on this promoter-wide analysis and compare it to the commonly used methylation-specific PCR test. Further prospective validation of these two methods in a large independent patient cohort will be needed to confirm the added value of promoter wide analysis of MGMT methylation in the clinical setting.

  7. Comprehensive Analysis of MGMT Promoter Methylation: Correlation with MGMT Expression and Clinical Response in GBM

    PubMed Central

    Shah, Nameeta; Lin, Biaoyang; Sibenaller, Zita; Ryken, Timothy; Lee, Hwahyung; Yoon, Jae-Geun; Rostad, Steven; Foltz, Greg

    2011-01-01

    O6-methylguanine DNA-methyltransferase (MGMT) promoter methylation has been identified as a potential prognostic marker for glioblastoma patients. The relationship between the exact site of promoter methylation and its effect on gene silencing, and the patient's subsequent response to therapy, is still being defined. The aim of this study was to comprehensively characterize cytosine-guanine (CpG) dinucleotide methylation across the entire MGMT promoter and to correlate individual CpG site methylation patterns to mRNA expression, protein expression, and progression-free survival. To best identify the specific MGMT promoter region most predictive of gene silencing and response to therapy, we determined the methylation status of all 97 CpG sites in the MGMT promoter in tumor samples from 70 GBM patients using quantitative bisulfite sequencing. We next identified the CpG site specific and regional methylation patterns most predictive of gene silencing and improved progression-free survival. Using this data, we propose a new classification scheme utilizing methylation data from across the entire promoter and show that an analysis based on this approach, which we call 3R classification, is predictive of progression-free survival (HR  = 5.23, 95% CI [2.089–13.097], p<0.0001). To adapt this approach to the clinical setting, we used a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) test based on the 3R classification and show that this test is both feasible in the clinical setting and predictive of progression free survival (HR  = 3.076, 95% CI [1.301–7.27], p = 0.007). We discuss the potential advantages of a test based on this promoter-wide analysis and compare it to the commonly used methylation-specific PCR test. Further prospective validation of these two methods in a large independent patient cohort will be needed to confirm the added value of promoter wide analysis of MGMT methylation in the clinical setting. PMID:21249131

  8. Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles

    PubMed Central

    Huang, Hung-Chung; Jupiter, Daniel; VanBuren, Vincent

    2010-01-01

    Background: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. Methodology/Principal Findings: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as ‘brain group’ and ‘non-brain group’; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. Conclusions/Significance: The methodology employed here may be used to facilitate disease-specific biomarker discovery. PMID:20140228
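
    The four distribution metrics and the Kolmogorov-Smirnov distance named above can be computed as in the following sketch. It assumes population (biased) moment estimators and non-excess kurtosis, which may differ from the estimators the authors used; the cut-offs that map metrics to the four gene categories are not given in the abstract and are not guessed at here. The sample values are invented.

```python
import bisect
import math


def distribution_metrics(x):
    """Mean, SD, skewness and kurtosis from population central moments.

    Assumes x is not constant (otherwise the variance is zero).
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # a normal distribution gives 3
    }


def ks_distance(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical expression values for one gene in two tissue groups.
brain = [0.0, 0.0, 0.0, 0.0]
non_brain = [10.0, 10.0, 10.0, 10.0]
print(distribution_metrics(brain + non_brain))
print(ks_distance(brain, non_brain))
```

    A strongly bimodal profile such as the combined example produces low kurtosis, while a KS distance near 1 indicates that the two tissue groups barely overlap.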

  9. Differential gene expression in patients with subsyndromal symptomatic depression and major depressive disorder.

    PubMed

    Yang, Chengqing; Hu, Guoqin; Li, Zezhi; Wang, Qingzhong; Wang, Xuemei; Yuan, Chengmei; Wang, Zuowei; Hong, Wu; Lu, Weihong; Cao, Lan; Chen, Jun; Wang, Yong; Yu, Shunying; Zhou, Yimin; Yi, Zhenghui; Fang, Yiru

    2017-01-01

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression and can lead to significant psychosocial functional impairment. Although the pathogenesis of major depressive disorder (MDD) and SSD remains poorly understood, a number of studies have found that many of the same genetic factors play important roles in the etiology of the two disorders. However, differential gene expression between MDD and SSD is still unknown. In our previous study, we compared leukocyte expression profiles and performed classification using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD and matched healthy controls (8 subjects in each group), and identified 48 gene expression signatures. Based on these findings, we further examined whether the mRNA of these genes was differentially expressed in peripheral blood in patients with SSD, patients with MDD and healthy controls (60 subjects in each group). Using quantitative real-time reverse transcription-polymerase chain reaction (RT-qPCR), we measured relative gene expression levels in the three groups. Three of the forty-eight co-regulated genes, CD84, STRN and CTNS, were differentially expressed in peripheral blood among the three groups (F = 3.528, p = 0.034; F = 3.382, p = 0.039; F = 3.801, p = 0.026, respectively), while there were no significant differences for the other genes. CD84, STRN and CTNS may have significant value for diagnosis and for distinguishing SSD, MDD and healthy controls.

  10. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    PubMed

    Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

    2018-04-11

    The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks for learning hyper-representations by using cascaded ensembles of decision trees. The deep forest model has been shown to perform competitively with, or to some extent better than, deep neural networks. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data. In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed that trains multiple binary classifiers to encourage ensemble diversity, while the fitting quality of each classifier is taken into account in representation learning. Second, we propose a boosting strategy that emphasizes the more important features in the cascade forests, propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods for cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective way to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

  11. Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

    PubMed Central

    Zhao, Xin; Cheung, Leo Wang-Kit

    2007-01-01

    Background: Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential to achieve this goal. Results: A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability for microarray data analysis problems, especially those for which linear methods work poorly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods did in all of these cases. Conclusion: Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811

  12. Integrated genome-wide Alu methylation and transcriptome profiling analyses reveal novel epigenetic regulatory networks associated with autism spectrum disorder.

    PubMed

    Saeliw, Thanit; Tangsuwansri, Chayanin; Thongkorn, Surangrat; Chonchaiya, Weerasak; Suphapeetiporn, Kanya; Mutirangura, Apiwat; Tencomnao, Tewin; Hu, Valerie W; Sarachana, Tewarit

    2018-01-01

    Alu elements are a group of repetitive elements that can influence gene expression through CpG residues and transcription factor binding. Altered gene expression and methylation profiles have been reported in various tissues and cell lines from individuals with autism spectrum disorder (ASD). However, the role of Alu elements in ASD remains unclear. We thus investigated whether Alu elements are associated with altered gene expression profiles in ASD. We obtained five blood-based gene expression profiles from the Gene Expression Omnibus database and human Alu-inserted gene lists from the TranspoGene database. Differentially expressed genes (DEGs) in ASD were identified from each study and overlapped with the human Alu-inserted genes. The biological functions and networks of Alu-inserted DEGs were then predicted by Ingenuity Pathway Analysis (IPA). A combined bisulfite restriction analysis of lymphoblastoid cell lines (LCLs) derived from 36 ASD and 20 sex- and age-matched unaffected individuals was performed to assess the global DNA methylation levels within Alu elements, and the Alu expression levels were determined by quantitative RT-PCR. In ASD blood or blood-derived cells, 320 Alu-inserted genes were reproducibly differentially expressed. Biological function and pathway analysis showed that these genes were significantly associated with neurodevelopmental disorders and neurological functions involved in ASD etiology. Interestingly, estrogen receptor and androgen signaling pathways implicated in the sex bias of ASD, as well as IL-6 signaling and neuroinflammation signaling pathways, were also highlighted. Alu methylation was not significantly different between the ASD and sex- and age-matched control groups. However, significantly altered Alu methylation patterns were observed in ASD cases sub-grouped based on Autism Diagnostic Interview-Revised scores compared with matched controls. 
    Quantitative RT-PCR analysis of Alu expression also showed significant differences between ASD subgroups. Interestingly, Alu expression was correlated with methylation status in one phenotypic ASD subgroup. Alu methylation and expression were altered in LCLs from ASD subgroups. Our findings highlight the association of Alu elements with gene dysregulation in ASD blood samples and warrant further investigation. Moreover, the classification of ASD individuals into subgroups based on phenotypes may be beneficial and could provide insights into the still unknown etiology and the underlying mechanisms of ASD.

  13. Robustness of equations that define molecular subtypes of glioblastoma tumors based on five transcripts measured by RT-PCR.

    PubMed

    Castells, Xavier; Acebes, Juan José; Majós, Carles; Boluda, Susana; Julià-Sapé, Margarida; Candiota, Ana Paula; Ariño, Joaquín; Barceló, Anna; Arús, Carles

    2015-01-01

    Glioblastoma (Gb) is one of the most deadly tumors. Its molecular subtypes are yet to be fully characterized while the attendant efforts for personalized medicine need to be intensified in relation to glioblastoma diagnosis, treatment, and prognosis. Several molecular signatures based on gene expression microarrays were reported, but the use of microarrays for routine clinical practice is challenged by attendant economic costs. Several authors have proposed discriminant equations based on RT-PCR. Still, the discriminant threshold is often incompletely described, which makes proper validation difficult. In a previous work, we have reported two Gb subtypes based on the expression levels of four genes: CHI3L1, LDHA, LGALS1, and IGFBP3. One Gb subtype presented with low expression of the four genes mentioned, and of MGMT in a large portion of the patients (with anticipated high methylation of its promoter), and mutated IDH1. Here, we evaluate the robustness of the equations fitted with these genes using RT-PCR values in a set of 64 cases and importantly, define an unequivocal discriminant threshold with a view to prognostic implications. We developed two approaches to generate the discriminant equations: 1) using the expression level of the four genes mentioned above, and 2) using those genes displaying the highest correlation with survival among the aforementioned four ones, plus MGMT, as an attempt to further reduce the number of genes. The ease of equations' applicability, reduction in cost for raw data, and robustness in terms of resampling-based classification accuracy warrant further evaluation of these equations to discern Gb tumor biopsy heterogeneity at molecular level, diagnose potential malignancy, and prognosis of individual patients with glioblastomas.

  14. Scoring clustering solutions by their biological relevance.

    PubMed

    Gat-Viks, I; Sharan, R; Shamir, R

    2003-12-12

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
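
    The criterion at the heart of this abstract, the ratio of between-group to within-group variance of projected values, can be sketched as follows. This is only the ratio for an already projected, one-dimensional set of values; finding the maximizing projection, the non-parametric analysis of variance test, and the confidence evaluation described by the authors are not reproduced. The function name and the degrees-of-freedom normalization are assumptions.

```python
def variance_ratio(projected, labels):
    """Between-group over within-group variance for 1-D projected values."""
    groups = {}
    for value, label in zip(projected, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(projected), len(groups)
    grand_mean = sum(projected) / n
    between = 0.0
    within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        # Spread of the cluster means around the overall mean.
        between += len(members) * (group_mean - grand_mean) ** 2
        # Spread of the elements around their own cluster mean.
        within += sum((v - group_mean) ** 2 for v in members)
    between /= (k - 1)
    within /= (n - k)
    return between / within if within > 0 else float("inf")

values = [1, 2, 3, 7, 8, 9]
coherent = variance_ratio(values, ["a", "a", "a", "b", "b", "b"])
scrambled = variance_ratio(values, ["a", "b", "a", "b", "a", "b"])
```

    A clustering that groups similar attribute values together (`coherent`) scores far higher than one that mixes them (`scrambled`), which is the intuition behind using the ratio to compare clustering solutions.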

  15. A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.

    PubMed

    Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan

    2014-01-01

    In the past, cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability. The arrival of DNA microarray technology has made it possible to monitor the expression of thousands of genes concurrently on a single chip, which has stimulated progress in cancer classification. In this paper, we propose a hybrid approach for microarray data classification based on k-nearest neighbor (KNN), naive Bayes, and support vector machine (SVM) classifiers. Feature selection prior to classification plays a vital role, and a feature selection technique that combines the discrete wavelet transform (DWT) and the moving window technique (MWT) is used. The performance of the proposed method is compared with that of the conventional classifiers: support vector machine, nearest neighbor, and naive Bayes. Experiments were conducted on both real and benchmark datasets, and the results indicate that the ensemble approach produces higher classification accuracy than the conventional classifiers. The approach can serve as an automated system for cancer classification that doctors can apply to real cases, and it further reduces the misclassification of cancers, which is highly undesirable in cancer detection.
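
    To make the DWT step concrete, here is a minimal sketch of a single-level discrete wavelet transform applied to one expression vector. The Haar wavelet is an assumption chosen for brevity; the abstract does not state which wavelet family, decomposition level, or window size the authors used, and the moving window technique is not shown.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    # Scaled pairwise sums capture the smooth trend of the profile.
    approx = [(a + b) / scale for a, b in pairs]
    # Scaled pairwise differences capture local fluctuations.
    detail = [(a - b) / scale for a, b in pairs]
    return approx, detail

approx, detail = haar_dwt([4.0, 4.0, 2.0, 2.0])
```

    The approximation coefficients halve the number of features while preserving the signal energy of a locally constant profile, which is why such coefficients are a common choice of reduced feature set.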

  16. Genome-wide analysis of TCP family in tobacco.

    PubMed

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-05-23

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in the tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic relationships, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that the majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco.

  17. 21 CFR 866.6040 - Gene expression profiling test system for breast cancer prognosis.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... cancer prognosis. 866.6040 Section 866.6040 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... cancer prognosis. (a) Identification. A gene expression profiling test system for breast cancer prognosis... previously diagnosed breast cancer. (b) Classification. Class II (special controls). The special control is...

  18. 21 CFR 866.6040 - Gene expression profiling test system for breast cancer prognosis.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... cancer prognosis. 866.6040 Section 866.6040 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... cancer prognosis. (a) Identification. A gene expression profiling test system for breast cancer prognosis... previously diagnosed breast cancer. (b) Classification. Class II (special controls). The special control is...

  19. 21 CFR 866.6040 - Gene expression profiling test system for breast cancer prognosis.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... cancer prognosis. 866.6040 Section 866.6040 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... cancer prognosis. (a) Identification. A gene expression profiling test system for breast cancer prognosis... previously diagnosed breast cancer. (b) Classification. Class II (special controls). The special control is...

  20. 21 CFR 866.6040 - Gene expression profiling test system for breast cancer prognosis.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... cancer prognosis. 866.6040 Section 866.6040 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... cancer prognosis. (a) Identification. A gene expression profiling test system for breast cancer prognosis... previously diagnosed breast cancer. (b) Classification. Class II (special controls). The special control is...

  1. 21 CFR 866.6040 - Gene expression profiling test system for breast cancer prognosis.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... cancer prognosis. 866.6040 Section 866.6040 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... cancer prognosis. (a) Identification. A gene expression profiling test system for breast cancer prognosis... previously diagnosed breast cancer. (b) Classification. Class II (special controls). The special control is...

  2. A signature inferred from Drosophila mitotic genes predicts survival of breast cancer patients.

    PubMed

    Damasco, Christian; Lembo, Antonio; Somma, Maria Patrizia; Gatti, Maurizio; Di Cunto, Ferdinando; Provero, Paolo

    2011-02-28

    The classification of breast cancer patients into risk groups provides a powerful tool for the identification of patients who will benefit from aggressive systemic therapy. The analysis of microarray data has generated several gene expression signatures that improve diagnosis and allow risk assessment. There is also evidence that cell proliferation-related genes have a high predictive power within these signatures. We thus constructed a gene expression signature (the DM signature) using the human orthologues of 108 Drosophila melanogaster genes required for either the maintenance of chromosome integrity (36 genes) or mitotic division (72 genes). The DM signature has minimal overlap with the extant signatures and is highly predictive of survival in 5 large breast cancer datasets. In addition, we show that the DM signature outperforms many widely used breast cancer signatures in predictive power, and performs comparably to other proliferation-based signatures. For most genes of the DM signature, an increased expression is negatively correlated with patient survival. The genes that provide the highest contribution to the predictive power of the DM signature are those involved in cytokinesis. This finding highlights cytokinesis as an important marker in breast cancer prognosis and as a possible target for antimitotic therapies.

  3. A comparison of machine learning techniques for survival prediction in breast cancer

    PubMed Central

    2011-01-01

    Background: The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many "gene expression signatures" have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well established 70-gene signature. Results: We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayered Perceptrons and Random Forests in classifying patients from the NKI breast cancer dataset, and comparably to the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming is able to perform an automatic feature selection. Conclusions: Since the performance of Genetic Programming is likely to be improvable compared to the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data. PMID:21569330

  4. Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions

    PubMed Central

    Kluger, Yuval; Basri, Ronen; Chang, Joseph T.; Gerstein, Mark

    2003-01-01

    Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find “marker genes” that are differentially expressed in particular sets of “conditions.” We have developed a method that simultaneously clusters genes and conditions, finding distinctive “checkerboard” patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data). PMID:12671006

  5. An integrated method for cancer classification and rule extraction from microarray data

    PubMed Central

    Huang, Liang-Tsung

    2009-01-01

    Different microarray techniques recently have been successfully used to investigate useful information for cancer diagnosis at the gene expression level due to their ability to measure thousands of gene expression levels in a massively parallel way. One important issue is to improve classification performance of microarray data. However, it would be ideal that influential genes and even interpretable rules can be explored at the same time to offer biological insight. Introducing the concepts of system design in software engineering, this paper has presented an integrated and effective method (named X-AI) for accurate cancer classification and the acquisition of knowledge from DNA microarray data. This method included a feature selector to systematically extract the relative important genes so as to reduce the dimension and retain as much as possible of the class discriminatory information. Next, diagonal quadratic discriminant analysis (DQDA) was combined to classify tumors, and generalized rule induction (GRI) was integrated to establish association rules which can give an understanding of the relationships between cancer classes and related genes. Two non-redundant datasets of acute leukemia were used to validate the proposed X-AI, showing significantly high accuracy for discriminating different classes. On the other hand, I have presented the abilities of X-AI to extract relevant genes, as well as to develop interpretable rules. Further, a web server has been established for cancer classification and it is freely available at . PMID:19272192

  6. Gene expression profiling in multiple myeloma--reporting of entities, risk, and targets in clinical routine.

    PubMed

    Meissner, Tobias; Seckinger, Anja; Rème, Thierry; Hielscher, Thomas; Möhler, Thomas; Neben, Kai; Goldschmidt, Hartmut; Klein, Bernard; Hose, Dirk

    2011-12-01

    Multiple myeloma is an incurable malignant plasma cell disease characterized by survival ranging from several months to more than 15 years. Assessment of risk and underlying molecular heterogeneity can be excellently done by gene expression profiling (GEP), but its way into clinical routine is hampered by the lack of an appropriate reporting tool and the integration with other prognostic factors into a single "meta" risk stratification. The GEP-report (GEP-R) was built as an open-source software developed in R for gene expression reporting in clinical practice using Affymetrix microarrays. GEP-R processes new samples by applying a documentation-by-value strategy to the raw data to be able to assign thresholds and grouping algorithms defined on a reference cohort of 262 patients with multiple myeloma. Furthermore, we integrated expression-based and conventional prognostic factors within one risk stratification (HM-metascore). The GEP-R comprises (i) quality control, (ii) sample identity control, (iii) biologic classification, (iv) risk stratification, and (v) assessment of target genes. The resulting HM-metascore is defined as the sum over the weighted factors gene expression-based risk-assessment (UAMS-, IFM-score), proliferation, International Staging System (ISS) stage, t(4;14), and expression of prognostic target genes (AURKA, IGF1R) for which clinical grade inhibitors exist. The HM-score delineates three significantly different groups of 13.1%, 72.1%, and 14.7% of patients with a 6-year survival rate of 89.3%, 60.6%, and 18.6%, respectively. GEP reporting allows prospective assessment of risk and target gene expression and integration of current prognostic factors in clinical routine, being customizable about novel parameters or other cancer entities. ©2011 AACR.

  7. Molecular cancer classification using a meta-sample-based regularized robust coding method.

    PubMed

    Wang, Shu-Lin; Sun, Liuchao; Fang, Jianwen

    2014-01-01

    Previous studies have demonstrated that machine learning based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data. In this paper we present the meta-sample-based regularized robust coding classification (MRRCC), a novel effective cancer classification technique that combines the idea of the meta-sample-based cluster method with the regularized robust coding (RRC) method. It assumes that the coding residual and the coding coefficient are respectively independent and identically distributed. Similar to meta-sample-based SR classification (MSRC), MRRCC extracts a set of meta-samples from the training samples, and then encodes a testing sample as the sparse linear combination of these meta-samples. The representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Extensive experiments on publicly available GEP datasets demonstrate that the proposed method is more efficient while its prediction accuracy is equivalent to existing MSRC-based methods and better than other state-of-the-art dimension reduction based methods.

  8. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple and intuitive, and have proved useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
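
    The shrinkage idea described in this abstract can be illustrated with the soft-thresholding operator, assuming the penalty in question is the L1 (lasso) penalty. For a single coefficient under an orthonormal design, the L1-penalized least-squares solution is the soft-thresholded ordinary estimate, and the same operator is what shrinks per-gene centroid differences in the nearest shrunken centroid method. The function name and example values are illustrative only and are not taken from the paper.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; small values become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

# Hypothetical per-gene standardized differences between two classes.
raw_differences = [2.5, -0.4, 0.9, -3.0]
shrunken = [soft_threshold(d, 1.0) for d in raw_differences]
```

    Genes whose difference is smaller than the penalty are set to zero and drop out of the classifier, so the operator performs shrinkage and gene selection at the same time.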

  9. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Prediction of in vivo hepatotoxicity effects using in vitro ...

    EPA Pesticide Factsheets

    High-throughput in vitro transcriptomics data support molecular understanding of chemical-induced toxicity. Here, we evaluated the utility of such data to predict liver toxicity. First, in vitro gene expression data for 93 genes was generated following exposure of metabolically competent HepaRG cells to 1060 environmental chemicals from the US EPA ToxCast library. The empirical relationship between these data and rat chronic liver endpoints from animal studies in the Toxicity Reference Database (ToxRefDB) was then evaluated using machine learning techniques. Chemicals were classified as positive (242) or negative (135) based on observed hepatic histopathologic effects, and divided into three categories: hypertrophy (183), injury (112) and proliferative lesions (101). Hepatotoxicants were classified on the basis of the bioactivity of 93 genes (descriptors) using six machine learning algorithms: linear discriminant analysis, naïve Bayes, support vector classification, classification and regression trees, k-nearest neighbors, and an ensemble of classifiers. Classification performance was evaluated using 10-fold cross-validation testing, and in-loop, filter-based, feature subset selection. The best balanced accuracy for prediction of hypertrophy, injury and proliferative lesions were 0.81 ± 0.07, 0.79 ± 0.08 and 0.77 ± 0.09, respectively. Gene specific perturbation of xenobiotic metabolism enzymes (CYP7A1/2E1/4A11/1A1/4A22) and transporters (ABCG2, ABCB11, SLC22
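    The abstract reports balanced accuracy, which is a natural choice when the positive and negative classes differ in size. A minimal Python sketch of the metric follows; the function name and the toy labels are assumptions made for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of positives recovered
    specificity = tn / (tn + fp)  # fraction of negatives recovered
    return (sensitivity + specificity) / 2


# Hypothetical labels: four positives, two negatives.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

    Because each class contributes equally to the average, a classifier cannot score well simply by favoring the larger class.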

  11. Gene expression profiles in whole blood and associations with metabolic dysregulation in obesity.

    PubMed

    Cox, Amanda J; Zhang, Ping; Evans, Tiffany J; Scott, Rodney J; Cripps, Allan W; West, Nicholas P

    Gene expression data provides one tool to gain further insight into the complex biological interactions linking obesity and metabolic disease. This study examined associations between blood gene expression profiles and metabolic disease in obesity. Whole blood gene expression profiles, performed using the Illumina HT-12v4 Human Expression Beadchip, were compared between (i) individuals with obesity (O) or lean (L) individuals (n=21 each), (ii) individuals with (M) or without (H) Metabolic Syndrome (n=11 each) matched on age and gender. Enrichment of differentially expressed genes (DEG) into biological pathways was assessed using Ingenuity Pathway Analysis. Association between sets of genes from biological pathways considered functionally relevant and Metabolic Syndrome were further assessed using an area under the curve (AUC) and cross-validated classification rate (CR). For OvL, only 50 genes were significantly differentially expressed based on the selected differential expression threshold (1.2-fold, p<0.05). For MvH, 582 genes were significantly differentially expressed (1.2-fold, p<0.05) and pathway analysis revealed enrichment of DEG into a diverse set of pathways including immune/inflammatory control, insulin signalling and mitochondrial function pathways. Gene sets from the mTOR signalling pathways demonstrated the strongest association with Metabolic Syndrome (p=8.1×10^-8; AUC: 0.909, CR: 72.7%). These results support the use of expression profiling in whole blood in the absence of more specific tissue types for investigations of metabolic disease. Using a pathway analysis approach it was possible to identify an enrichment of DEG into biological pathways that could be targeted for in vitro follow-up. Copyright © 2017 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
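    The area under the curve (AUC) mentioned in this abstract can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The Python sketch below computes it by direct pairwise comparison; the function name and the toy scores are assumptions for illustration and do not reproduce the study's data.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical gene-set scores for two small groups.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
value = auc(pos, neg)
```

    The pairwise form is quadratic in the sample count, which is acceptable for small groups; rank-based formulas give the same value more efficiently on larger data.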

  12. Molecular classification of gastric cancer: a new paradigm.

    PubMed

    Shah, Manish A; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y; Klimstra, David S; Gerdes, Hans; Kelsen, David P

    2011-05-01

    Gastric cancer may be subdivided into 3 distinct subtypes--proximal, diffuse, and distal gastric cancer--based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (National Cancer Institute, NCI #5917) underwent endoscopic biopsy for fresh tumor procurement. Four to 6 targeted biopsies of the primary tumor were obtained. Macrodissection was carried out to ensure more than 80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the 3 gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross-validation error was 0.14, suggesting that more than 85% of samples were classified correctly. Gene set analysis with the false discovery rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Subtypes of gastric cancer that have epidemiologic and histologic distinctions are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype. ©2011 AACR.
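    The leave-one-out cross-validation error reported here comes from holding out each sample in turn, training on the rest, and counting misclassifications. The Python sketch below shows that loop with a simple nearest-centroid classifier, which is a stand-in chosen for brevity and not the classifier used in the study; the class labels and two-feature toy data are likewise hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(rows)
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], sample))


def loocv_error(xs, ys):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Hypothetical data: three well-separated classes, three samples each.
xs = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
      [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
      [1.0, 5.0], [1.1, 5.2], [0.9, 4.8]]
ys = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
error = loocv_error(xs, ys)
```

    An error rate e from this loop corresponds to a fraction 1 - e of samples classified correctly, which is how an error of 0.14 translates to more than 85% correct.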

  13. Molecular Classification of Gastric Cancer: A new paradigm

    PubMed Central

    Shah, Manish A.; Khanin, Raya; Tang, Laura; Janjigian, Yelena Y.; Klimstra, David S.; Gerdes, Hans; Kelsen, David P.

    2011-01-01

    Purpose: Gastric cancer may be subdivided into three distinct subtypes –proximal, diffuse, and distal gastric cancer– based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology. Our aim is to test the hypothesis that these distinct gastric cancer subtypes may also be distinguished by gene expression analysis. Experimental Design: Patients with localized gastric adenocarcinoma being screened for a phase II preoperative clinical trial (NCI 5917) underwent endoscopic biopsy for fresh tumor procurement. 4–6 targeted biopsies of the primary tumor were obtained. Macrodissection was performed to ensure >80% carcinoma in the sample. HG-U133A GeneChip (Affymetrix) was used for cDNA expression analysis, and all arrays were processed and analyzed using the Bioconductor R-package. Results: Between November 2003 and January 2006, 57 patients were screened to identify 36 patients with localized gastric cancer who had adequate RNA for expression analysis. Using supervised analysis, we built a classifier to distinguish the three gastric cancer subtypes, successfully classifying each into tightly grouped clusters. Leave-one-out cross validation error was 0.14, suggesting that >85% of samples were classified correctly. Gene set analysis with the False Discovery Rate set at 0.25 identified several pathways that were differentially regulated when comparing each gastric cancer subtype to adjacent normal stomach. Conclusions: Subtypes of gastric cancer that have epidemiologic and histologic distinction are also distinguished by gene expression data. These preliminary data suggest a new classification of gastric cancer with implications for improving our understanding of disease biology and identification of unique molecular drivers for each gastric cancer subtype. PMID:21430069

  14. Alteration of gene expression by zinc oxide nanoparticles or zinc sulfate in vivo and comparison with in vitro data: A harmonious case.

    PubMed

    Zhang, Wei-Dong; Zhao, Yong; Zhang, Hong-Fu; Wang, Shu-Kun; Hao, Zhi-Hui; Liu, Jing; Yuan, Yu-Qing; Zhang, Peng-Fei; Yang, Hong-Di; Shen, Wei; Li, Lan

    2016-08-01

    Granulosa cells (GCs) are those somatic cells closest to the female germ cell. GCs play a vital role in oocyte growth and development, and the oocyte is necessary for multiplication of a species. Zinc oxide (ZnO) nanoparticles (NPs) readily cross biologic barriers to be absorbed into biologic systems that make them promising candidates as food additives. The objective of the present investigation was to explore the impact of intact NPs on gene expression and the functional classification of altered genes in hen GCs in vivo, to compare the data from in vivo and in vitro studies, and finally to point out the adverse effects of ZnO NPs on the reproductive system. After a 24-week treatment, hen GCs were isolated and gene expression was quantified. Intact NPs were found in the ovary and other organs. Zn levels were similar in ZnO-NP-100 mg/kg- and ZnSO4-100 mg/kg-treated hen ovaries. ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg regulated the expression of the same sets of genes, and they also altered the expression of different sets of genes individually. The number of genes altered by the ZnO-NP-100 mg/kg and ZnSO4-100 mg/kg treatments was different. Gene Ontology (GO) functional analysis reported different results for the two treatments and, in Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, 12 pathways (out of the top 20 pathways) in each treatment were different. These results suggested that intact NPs and Zn(2+) had different effects on gene expression in GCs in vivo. In our recent publication, we noted that intact NPs and Zn(2+) differentially altered gene expression in GCs in vitro. However, GO functional classification and KEGG pathway enrichment analyses revealed close similarities for the changed genes in vivo and in vitro after ZnO NP treatment. Furthermore, close similarities were observed for the changed genes after ZnSO4 treatments in vivo and in vitro by GO functional classification and KEGG pathway enrichment analyses. Therefore, the effects of ZnO NPs on gene expression in vitro might represent their effects on gene expression in vivo. The results from this study and our earlier studies support previous findings indicating ZnO NPs promote adverse effects on organisms. Therefore, precautions should be taken when ZnO NPs are used as diet additives for hens because they might cause reproductive issues. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Simple and Flexible Classification of Gene Expression Microarrays Via Swirls and Ripples | Division of Cancer Prevention

    Cancer.gov

    By Stuart G. Baker. The program requires Mathematica 7.01.0. The key function is Classify[datalist, options], where datalist = {data, genename, dataname}; data = {matrix for class 0, matrix for class 1}, with each matrix giving gene expression by specimen; genename is a list of names of genes; and dataname = {name of data set, name of class 0, name of class 1}.

  16. iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators and protein kinases

    USDA-ARS's Scientific Manuscript database

    Transcription factors (TFs) are proteins that regulate the expression of target genes by binding to specific elements in their regulatory regions. Transcriptional regulators (TRs) also regulate the expression of target genes; however, they operate indirectly via interaction with the basal transcript...

  17. Gene expression signature in urine for diagnosing and assessing aggressiveness of bladder urothelial carcinoma.

    PubMed

    Mengual, Lourdes; Burset, Moisès; Ribal, María José; Ars, Elisabet; Marín-Aguilera, Mercedes; Fernández, Manuel; Ingelmo-Torres, Mercedes; Villavicencio, Humberto; Alcaraz, Antonio

    2010-05-01

    To develop an accurate and noninvasive method for bladder cancer diagnosis and prediction of disease aggressiveness based on the gene expression patterns of urine samples. Gene expression patterns of 341 urine samples from bladder urothelial cell carcinoma (UCC) patients and 235 controls were analyzed via TaqMan Arrays. In a first phase of the study, three consecutive gene selection steps were done to identify a gene set expression signature to detect and stratify UCC in urine. Subsequently, those genes more informative for UCC diagnosis and prediction of tumor aggressiveness were combined to obtain a classification system of bladder cancer samples. In a second phase, the obtained gene set signature was evaluated in a routine clinical scenario analyzing only voided urine samples. We have identified a 12+2 gene expression signature for UCC diagnosis and prediction of tumor aggressiveness on urine samples. Overall, this gene set panel had 98% sensitivity (SN) and 99% specificity (SP) in discriminating between UCC and control samples and 79% SN and 92% SP in predicting tumor aggressiveness. The translation of the model to the clinically applicable format corroborates that the 12+2 gene set panel described maintains a high accuracy for UCC diagnosis (SN = 89% and SP = 95%) and tumor aggressiveness prediction (SN = 79% and SP = 91%) in voided urine samples. The 12+2 gene expression signature described in urine is able to identify patients suffering from UCC and predict tumor aggressiveness. We show that a panel of molecular markers may improve the schedule for diagnosis and follow-up in UCC patients. Copyright 2010 AACR.

  18. Cloning and Expression of Genes for Dengue Virus Type-2 Encoded-Antigens for Rapid Diagnosis and Vaccine Development

    DTIC Science & Technology

    1988-10-31

    Cloning and Expression of Genes for Dengue Virus Type-2 Encoded Antigens for Rapid Diagnosis and Vaccine Development. ANNUAL PROGRESS REPORT...11. TITLE (include Security Classification) Cloning and Expression of Genes for Dengue Virus Type 2 Encoded Antigens for Rapid Diagnosis and Vaccine ...epidemics in Central and South Americas and the Caribbean is a cause of major concern. An effective vaccine is not available to protect individuals

  19. Identification, Classification, and Expression Analysis of GRAS Gene Family in Malus domestica

    PubMed Central

    Fan, Sheng; Zhang, Dong; Gao, Cai; Zhao, Ming; Wu, Haiqin; Li, Youmei; Shen, Yawen; Han, Mingyu

    2017-01-01

    GRAS genes encode plant-specific transcription factors that play important roles in plant growth and development. However, little is known about the GRAS gene family in apple. In this study, 127 GRAS genes were identified in the apple (Malus domestica Borkh.) genome and named MdGRAS1 to MdGRAS127 according to their chromosomal locations. The chemical characteristics, gene structures and evolutionary relationships of the MdGRAS genes were investigated. The 127 MdGRAS genes could be grouped into eight subfamilies based on their structural features and phylogenetic relationships. Further analysis of gene structures, segmental and tandem duplication, gene phylogeny and tissue-specific expression with ArrayExpress database indicated their diversification in quantity, structure and function. We further examined the expression pattern of MdGRAS genes during apple flower induction with transcriptome sequencing. Eight higher MdGRAS (MdGRAS6, 26, 28, 44, 53, 64, 107, and 122) genes were surfaced. Further quantitative reverse transcription PCR indicated that the candidate eight genes showed distinct expression patterns among different tissues (leaves, stems, flowers, buds, and fruits). The transcription levels of eight genes were also investigated with various flowering related treatments (GA3, 6-BA, and sucrose) and different flowering varieties (Yanfu No. 6 and Nagafu No. 2). They all were affected by flowering-related circumstance and showed different expression level. Changes in response to these hormone or sugar related treatments indicated their potential involvement during apple flower induction. Taken together, our results provide rich resources for studying GRAS genes and their potential clues in genetic improvement of apple flowering, which enriches biological theories of GRAS genes in apple and their involvement in flower induction of fruit trees. PMID:28503152

  20. Identification, Classification, and Expression Analysis of GRAS Gene Family in Malus domestica.

    PubMed

    Fan, Sheng; Zhang, Dong; Gao, Cai; Zhao, Ming; Wu, Haiqin; Li, Youmei; Shen, Yawen; Han, Mingyu

    2017-01-01

    GRAS genes encode plant-specific transcription factors that play important roles in plant growth and development. However, little is known about the GRAS gene family in apple. In this study, 127 GRAS genes were identified in the apple (Malus domestica Borkh.) genome and named MdGRAS1 to MdGRAS127 according to their chromosomal locations. The chemical characteristics, gene structures and evolutionary relationships of the MdGRAS genes were investigated. The 127 MdGRAS genes could be grouped into eight subfamilies based on their structural features and phylogenetic relationships. Further analysis of gene structures, segmental and tandem duplication, gene phylogeny and tissue-specific expression with ArrayExpress database indicated their diversification in quantity, structure and function. We further examined the expression pattern of MdGRAS genes during apple flower induction with transcriptome sequencing. Eight higher MdGRAS (MdGRAS6, 26, 28, 44, 53, 64, 107, and 122) genes were surfaced. Further quantitative reverse transcription PCR indicated that the candidate eight genes showed distinct expression patterns among different tissues (leaves, stems, flowers, buds, and fruits). The transcription levels of eight genes were also investigated with various flowering related treatments (GA3, 6-BA, and sucrose) and different flowering varieties (Yanfu No. 6 and Nagafu No. 2). They all were affected by flowering-related circumstance and showed different expression level. Changes in response to these hormone or sugar related treatments indicated their potential involvement during apple flower induction. Taken together, our results provide rich resources for studying GRAS genes and their potential clues in genetic improvement of apple flowering, which enriches biological theories of GRAS genes in apple and their involvement in flower induction of fruit trees.

  1. Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

    PubMed

    Liu, L L; Liu, M J; Ma, M

    2015-09-28

    The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.

  2. Genome-wide identification, phylogeny and expression analyses of SCARECROW-LIKE(SCL) genes in millet (Setaria italica).

    PubMed

    Liu, Hongyun; Qin, Jiajia; Fan, Hui; Cheng, Jinjin; Li, Lin; Liu, Zheng

    2017-07-01

    As a member of the GRAS gene family, SCARECROW-LIKE (SCL) genes encode transcriptional regulators that are involved in plant information transmission and signal transduction. In this study, 44 SCL genes including two SCARECROW genes in millet were identified to be distributed on eight chromosomes, except chromosome 6. All the millet genes contain motifs 6-8, indicating that these motifs are conserved during the evolution. SCL genes of millet were divided into eight groups based on the phylogenetic relationship and classification of Arabidopsis SCL genes. Several putative millet orthologous genes in Arabidopsis, maize and rice were identified. High throughput RNA sequencing revealed that the expressions of millet SCL genes in root, stem, leaf, spica, and along leaf gradient varied greatly. Analyses combining the gene expression patterns, gene structures, motif compositions, promoter cis-elements identification, alternative splicing of transcripts and phylogenetic relationship of SCL genes indicate that these genes may play diverse functions. Functionally characterized SCL genes in maize, rice and Arabidopsis would provide us some clues for future characterization of their homologues in millet. To the best of our knowledge, this is the first study of millet SCL genes at the genome wide level. Our work provides a useful platform for functional analysis of SCL genes in millet, a model crop for C4 photosynthesis and bioenergy studies.

  3. Allelic Imbalance Is a Prevalent and Tissue-Specific Feature of the Mouse Transcriptome

    PubMed Central

    Pinter, Stefan F.; Colognori, David; Beliveau, Brian J.; Sadreyev, Ruslan I.; Payer, Bernhard; Yildirim, Eda; Wu, Chao-ting; Lee, Jeannie T.

    2015-01-01

    In mammals, several classes of monoallelic genes have been identified, including those subject to X-chromosome inactivation (XCI), genomic imprinting, and random monoallelic expression (RMAE). However, the extent to which these epigenetic phenomena are influenced by underlying genetic variation is unknown. Here we perform a systematic classification of allelic imbalance in mouse hybrids derived from reciprocal crosses of divergent strains. We observe that deviation from balanced biallelic expression is common, occurring in ∼20% of the mouse transcriptome in a given tissue. Allelic imbalance attributed to genotypic variation is by far the most prevalent class and typically is tissue-specific. However, some genotype-based imbalance is maintained across tissues and is associated with greater genetic variation, especially in 5′ and 3′ termini of transcripts. We further identify novel random monoallelic and imprinted genes and find that genotype can modify penetrance of parental origin even in the setting of large imprinted regions. Examination of nascent transcripts in single cells from inbred parental strains reveals that genes showing genotype-based imbalance in hybrids can also exhibit monoallelic expression in isogenic backgrounds. This surprising observation may suggest a competition between alleles and/or reflect the combined impact of cis- and trans-acting variation on expression of a given gene. Our findings provide novel insights into gene regulation and may be relevant to human genetic variation and disease. PMID:25858912

  4. Identifying Candidate Reprogramming Genes in Mouse Induced Pluripotent Stem Cells.

    PubMed

    Gao, Fang; Li, Jingyu; Zhang, Heng; Yang, Xu; An, Tiezhu

    2017-08-01

    Factor-based induced reprogramming approaches have tremendous potential for human regenerative medicine, but the efficiencies of these approaches are still low. In this study, we analyzed the global transcriptional profiles of mouse induced pluripotent stem cells (miPSCs) and mouse embryonic stem cells (mESCs) from seven different labs and present here the first successful clustering according to cell type, not by lab of origin. We identified 2131 different expression genes (DEs) as candidate pluripotency-associated genes by comparing mESCs/miPSCs with somatic cells and 720 DEs between miPSCs and mESCs. Interestingly, there was a significant overlap between the two DE sets. Therefore, we defined the overlap DEs as "consensus DEs" including 313 miPSC-specific genes expressed at a higher level in miPSCs versus mESCs and 184 mESC-specific genes in total and reasoned that these may contribute to the differences in pluripotency between mESCs and miPSCs. A classification of "consensus DEs" according to their different expression levels between somatic cells and mESCs/miPSCs shows that 86% of the miPSC-specific genes are more highly expressed in somatic cells, while 73% of mESC-specific genes are highly expressed in mESCs/miPSCs, indicating that the miPSCs have not efficiently silenced the expression pattern of the somatic cells from which they are derived and failed to completely induce the genes with high expression levels in mESCs. We further revealed a strong correlation between oocyte-enriched factors and insufficiently induced mESC-specific genes and identified 11 hub genes via network analysis. In light of these findings, we postulated that these key hub genes might not only drive somatic cell nuclear transfer (SCNT) reprogramming but also augment the efficiency and quality of miPSC reprogramming.

  5. Gene selection and cancer type classification of diffuse large-B-cell lymphoma using a bivariate mixture model for two-species data.

    PubMed

    Su, Yuhua; Nielsen, Dahlia; Zhu, Lei; Richards, Kristy; Suter, Steven; Breen, Matthew; Motsinger-Reif, Alison; Osborne, Jason

    2013-01-05

    A bivariate mixture model utilizing information across two species was proposed to solve the fundamental problem of identifying differentially expressed genes in microarray experiments. The model utility was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as being differentially expressed in both species and the human genes in this cluster serve as a good predictor for classifying diffuse large-B-cell lymphoma (DLBCL) patients into two subgroups, the germinal center B-cell-like diffuse large B-cell lymphoma and the activated B-cell-like diffuse large B-cell lymphoma. The number of human genes that were observed to be significantly differentially expressed (21) from the two-species analysis was very small compared to the number of human genes (190) identified with only one-species analysis (human data). The genes may be clinically relevant/important, as this small set achieved low misclassification rates of DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that the stratification based on gene-expression profiling using the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.

  6. GECKO: a complete large-scale gene expression analysis platform.

    PubMed

    Theilhaber, Joachim; Ulyanov, Anatoly; Malanthara, Anish; Cole, Jack; Xu, Dapeng; Nahf, Robert; Heuer, Michael; Brockel, Christoph; Bushnell, Steven

    2004-12-10

    Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a data base, a computational engine implementing approximately 50 different analysis tools, and a client application. Among available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (approximately 100 users) and distributed community to share results, and to repeatedly return over a span of years to older and potentially very complex analyses of gene expression data. The Gecko system is being made publicly available as free software http://sourceforge.net/projects/geckoe. In totality or in parts, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.

  7. An ensemble predictive modeling framework for breast cancer classification.

    PubMed

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2017-12-01

    Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process often resulting in sparse high-dimensional projection of the samples often comparable to that of the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
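    Two building blocks named in this abstract, majority voting across base classifiers and cosine distance between vectors, can be sketched in a few lines of Python. This is only an illustration of those primitives under assumed inputs; it does not reproduce the authors' selection procedure or graph modeling, and the function names and toy predictions are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions[k][i] is the label classifier k assigns to sample i.
    Using an odd number of classifiers avoids ties for two-class problems.
    """
    voted = []
    for sample_preds in zip(*predictions):
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on four samples.
clf_a = ["good", "poor", "poor", "good"]
clf_b = ["good", "good", "poor", "poor"]
clf_c = ["poor", "poor", "poor", "good"]
ensemble = majority_vote([clf_a, clf_b, clf_c])
```

    Each base classifier errs on a different sample in this toy case, so the vote can recover a label that any single classifier would miss.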

  8. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

  9. Implementation of spectral clustering with partitioning around medoids (PAM) algorithm on microarray data of carcinoma

    NASA Astrophysics Data System (ADS)

    Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Microarray technology has become one of the essential tools in the life sciences for observing gene expression levels, including the expression of genes in people with carcinoma. Carcinoma is a cancer that forms in epithelial tissue. Such data can be analyzed, for example, to identify the expression of hereditary genes and to build classifications that can be used to improve the diagnosis of carcinoma. Microarray data are usually high-dimensional, so most methods require long computing times to perform the grouping. Therefore, this study uses the spectral clustering method, which can work with any kind of object and reduces the dimension of the data. Spectral clustering is based on the spectral decomposition of a matrix that represents the data in the form of a graph. After the data dimensions are reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes the objective function by iteratively exchanging non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering and the PAM partitioning algorithm to obtain groups of 7457 genes with carcinoma based on their similarity values. The result of this study is two groups of genes with carcinoma.
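
    The swap step of PAM described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a precomputed dissimilarity matrix `dist`, initializes the medoids naively with the first `k` points, and omits the spectral embedding step entirely. The names `total_cost` and `pam` are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum of each point's dissimilarity to its nearest medoid (the PAM objective)."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Greedy swap-based k-medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive initialization (assumption, not the BUILD phase)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try exchanging one medoid for a non-medoid point.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

    The loop stops once no single exchange lowers the objective, which corresponds to the "until convergence" condition in the abstract.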

  10. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  11. CMS-dependent prognostic impact of KRAS and BRAFV600E mutations in primary colorectal cancer.

    PubMed

    Smeby, J; Sveen, A; Merok, M A; Danielsen, S A; Eilertsen, I A; Guren, M G; Dienstmann, R; Nesbakken, A; Lothe, R A

    2018-05-01

    The prognostic impact of KRAS and BRAFV600E mutations in primary colorectal cancer (CRC) varies with microsatellite instability (MSI) status. The gene expression-based consensus molecular subtypes (CMSs) of CRC define molecularly and clinically distinct subgroups, and represent a novel stratification framework in biomarker analysis. We investigated the prognostic value of these mutations within the CMS groups. In total, 1197 primary tumors from a Norwegian series of CRC stage I-IV were analyzed for MSI and mutation status in hotspots in KRAS (codons 12, 13 and 61) and BRAF (codon 600). A subset was analyzed for gene expression and confident CMS classification was obtained for 317 samples. This cohort was expanded with clinical and molecular data, including CMS classification, from 514 patients in the publicly available dataset GSE39582. Gene expression signatures associated with KRAS and BRAFV600E mutations were used to evaluate differential impact of mutations on gene expression among the CMS groups. BRAFV600E and KRAS mutations were both associated with inferior 5-year overall survival (OS) exclusively in MSS tumors (BRAFV600E mutation versus KRAS/BRAF wild-type: Hazard ratio (HR) 2.85, P < 0.001; KRAS mutation versus KRAS/BRAF wild-type: HR 1.30, P = 0.013). BRAFV600E-mutated MSS tumors were strongly enriched and associated with metastatic disease in CMS1, leading to negative prognostic impact in this subtype (OS: BRAFV600E mutation versus wild-type: HR 7.73, P = 0.001). In contrast, the poor prognosis of KRAS mutations was limited to MSS tumors with CMS2/CMS3 epithelial-like gene expression profiles (OS: KRAS mutation versus wild-type: HR 1.51, P = 0.011). The subtype-specific prognostic associations were substantiated by differential effects of BRAFV600E and KRAS mutations on gene expression signatures according to the MSI status and CMS group.
BRAFV600E mutations are enriched and associated with metastatic disease in CMS1 MSS tumors, leading to poor prognosis in this subtype. KRAS mutations are associated with adverse outcome in epithelial (CMS2/CMS3) MSS tumors.

  12. Rough set soft computing cancer classification and network: one stone, two birds.

    PubMed

    Zhang, Yue

    2010-07-15

    Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article.

  13. A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

    PubMed

    Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

    2017-05-10

    Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
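
    The idea of weighting each database hit by its similarity to the query can be illustrated with a rough sketch. This is a deliberate simplification and not BLCA's actual method: the weights here are plain normalized similarities rather than Bayesian posterior probabilities, and no bootstrap confidence is computed. The names `RANKS` and `weighted_assignment`, and the three-rank lineage, are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, shortened list of ranks, ordered from broad to specific.
RANKS = ("phylum", "genus", "species")


def weighted_assignment(hits):
    """Pick, per rank, the taxon with the largest similarity-weighted support.

    hits: list of (similarity, {rank: taxon}) tuples for one query sequence.
    Returns {rank: (taxon, support)} with support in [0, 1].
    """
    total = sum(sim for sim, _ in hits)
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for sim, lineage in hits:
            # Each hit contributes its normalized similarity to its own taxon.
            support[lineage[rank]] += sim / total
        taxon = max(sorted(support), key=support.get)
        result[rank] = (taxon, support[taxon])
    return result
```

    Support typically shrinks toward the species level, where close hits disagree, which is why a per-rank confidence score is useful.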

  14. A Model-Free Machine Learning Method for Risk Classification and Survival Probability Prediction.

    PubMed

    Geng, Yuan; Lu, Wenbin; Zhang, Hao Helen

    2014-01-01

    Risk classification and survival probability prediction are two major goals in survival data analysis since they play an important role in patients' risk stratification, long-term diagnosis, and treatment selection. In this article, we propose a new model-free machine learning framework for risk classification and survival probability prediction based on weighted support vector machines. The new procedure does not require any specific parametric or semiparametric model assumption on data, and is therefore capable of capturing nonlinear covariate effects. We use numerous simulation examples to demonstrate finite sample performance of the proposed method under various settings. Applications to a glioma tumor data and a breast cancer gene expression survival data are shown to illustrate the new methodology in real data analysis.

  15. Comparative study of classification algorithms for immunosignaturing data

    PubMed Central

    2012-01-01

    Background High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear as being completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear what is the optimal classification algorithm for analyzing this new type of data. Results We characterized several classification algorithms to analyze immunosignaturing data. 
We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy. Conclusions ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties. PMID:22720696

  16. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807

  17. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.

  18. Pathway analyses and understanding disease associations

    PubMed Central

    Liu, Yu; Chance, Mark R

    2013-01-01

    High throughput technologies have been applied to investigate the underlying mechanisms of complex diseases, identify disease-associations and help to improve treatment. However it is challenging to derive biological insight from conventional single gene based analysis of “omics” data from high throughput experiments due to sample and patient heterogeneity. To address these challenges, many novel pathway and network based approaches were developed to integrate various “omics” data, such as gene expression, copy number alteration, Genome Wide Association Studies, and interaction data. This review will cover recent methodological developments in pathway analysis for the detection of dysregulated interactions and disease-associated subnetworks, prioritization of candidate disease genes, and disease classifications. For each application, we will also discuss the associated challenges and potential future directions. PMID:24319650

  19. Gene selection for microarray data classification via subspace learning and manifold regularization.

    PubMed

    Tang, Chang; Cao, Lijuan; Zheng, Xiao; Wang, Minhui

    2017-12-19

    With the rapid development of DNA microarray technology, large amounts of genomic data have been generated. Classification of these microarray data is a challenging task since gene expression data often contain thousands of genes but a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data with the irrelevant and redundant genes removed. Compared with original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high dimensional microarray data into a lower dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better when compared with other state-of-the-art methods in terms of microarray data classification.

  20. Distinct iris gene expression profiles of primary angle closure glaucoma and primary open angle glaucoma and their interaction with ocular biometric parameters.

    PubMed

    Seet, Li-Fong; Narayanaswamy, Arun; Finger, Sharon N; Htoon, Hla M; Nongpiur, Monisha E; Toh, Li Zhen; Ho, Henrietta; Perera, Shamira A; Wong, Tina T

    2016-11-01

    This study aimed to evaluate differences in iris gene expression profiles between primary angle closure glaucoma (PACG) and primary open angle glaucoma (POAG) and their interaction with biometric characteristics. Prospective study. Thirty-five subjects with PACG and thirty-three subjects with POAG who required trabeculectomy were enrolled at the Singapore National Eye Centre, Singapore. Iris specimens, obtained by iridectomy, were analysed by real-time polymerase chain reaction for expression of type I collagen, vascular endothelial growth factor (VEGF)-A, -B and -C, as well as VEGF receptors (VEGFRs) 1 and 2. Anterior segment optical coherence tomography (ASOCT) imaging for biometric parameters, including anterior chamber depth (ACD), anterior chamber volume (ACV) and lens vault (LV), was also performed pre-operatively. Relative mRNA levels between PACG and POAG irises, biometric measurements, discriminant analyses using genes and biometric parameters. COL1A1, VEGFB, VEGFC and VEGFR2 mRNA expression was higher in PACG compared to POAG irises. LV, ACD and ACV were significantly different between the two subgroups. Discriminant analyses based on gene expression, biometric parameters or a combination of both gene expression and biometrics (LV and ACV), correctly classified 94.1%, 85.3% and 94.1% of the original PACG and POAG cases, respectively. The discriminant function combining genes and biometrics demonstrated the highest accuracy in cross-validated classification of the two glaucoma subtypes. Distinct iris gene expression supports the pathophysiological differences that exist between PACG and POAG. Biometric parameters can combine with iris gene expression to more accurately define PACG from POAG. © 2016 The Authors. Clinical & Experimental Ophthalmology published by John Wiley & Sons Australia, Ltd on behalf of Royal Australian and New Zealand College of Ophthalmologists.

  1. On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification

    NASA Astrophysics Data System (ADS)

    Aygün, Eser; Oommen, B. John; Cataltepe, Zehra

    Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve peptide classification using the information residing in their syntactic representations. The latter has traditionally been achieved using the edit distances required in the respective peptide comparisons. We advocate that one can model the differences between compared strings as a mutation model consisting of random Substitutions, Insertions and Deletions (SID) obeying the OIT model. Thus, in this paper, we show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, using which a Support Vector Machine (SVM)-based peptide classifier, referred to as OIT_SVM, can be devised.
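
    The traditional edit-distance comparison that this abstract contrasts with the OIT model can be sketched as follows. The sketch shows only the distance-based baseline (unit-cost substitutions, insertions and deletions), not the OIT probability measure itself; the function names and the length-normalized similarity are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs for substitution, insertion, deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the distance to [0, 1]; identical strings score 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

    A pairwise score of this kind is what a kernel-based classifier such as an SVM would consume; the abstract's proposal is to replace it with a probability derived from the mutation model.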

  2. Differentiating disease subtypes by using pathway patterns constructed from gene expressions and protein networks.

    PubMed

    Hung, Fei-Hung; Chiu, Hung-Wen

    2015-01-01

    Gene expression profiles differ in different diseases. Even if diseases are at the same stage, such diseases exhibit different gene expressions, not to mention the different subtypes at a single lesion site. Distinguishing different disease subtypes at a single lesion site is difficult. In early cases, subtypes were initially distinguished by doctors. Subsequently, further differences were found through pathological experiments. For example, a brain tumor can be classified according to its origin, its cell-type origin, or the tumor site. Because of the advancements in bioinformatics and the techniques for accumulating gene expressions, researchers can use gene expression data to classify disease subtypes. Because the operation of a biopathway is closely related to the disease mechanism, gene expression profiles alone are insufficient for clustering disease subtypes. In this study, we collected gene expression data of healthy samples and four myelodysplastic syndrome subtypes and applied a method that integrated protein-protein interaction and gene expression data to identify different patterns of disease subtypes. We hope this method will prove efficient for classifying disease subtypes in the future.

  3. Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data.

    PubMed

    Gomez-Pulido, Juan A; Cerrada-Barrios, Jose L; Trinidad-Amado, Sebastian; Lanza-Gutierrez, Jose M; Fernandez-Diaz, Ramon A; Crawford, Broderick; Soto, Ricardo

    2016-08-31

    Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, as well as other non-linear techniques, apply a fitness function to each possible solution with a size-limited population, and that step involves higher latencies than other parts of the algorithms, which is the reason why the execution time of the applications will mainly depend on the execution time of the fitness function. In addition, it is usual to find floating-point arithmetic formulations for the fitness functions. This way, a careful parallelization of these functions using the reconfigurable hardware technology will accelerate the computation, especially if they are applied in parallel to several solutions of the population. A fine-grained parallelization of two floating-point fitness functions of different complexities and features involved in biclustering of gene expression data and gene selection for cancer classification allowed for obtaining higher speedups and power-reduced computation with regard to usual microprocessors. The results show better performances using reconfigurable hardware technology instead of usual microprocessors, in computing time and power consumption terms, not only because of the parallelization of the arithmetic operations, but also thanks to the concurrent fitness evaluation for several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.

  4. Chronomics of pressure overload-induced cardiac hypertrophy in mice reveals altered day/night gene expression and biomarkers of heart disease.

    PubMed

    Tsimakouridze, Elena V; Straume, Marty; Podobed, Peter S; Chin, Heather; LaMarre, Jonathan; Johnson, Ron; Antenos, Monica; Kirby, Gordon M; Mackay, Allison; Huether, Patsy; Simpson, Jeremy A; Sole, Michael; Gadal, Gerard; Martino, Tami A

    2012-08-01

    There is critical demand in contemporary medicine for gene expression markers in all areas of human disease, for early detection of disease, classification, prognosis, and response to therapy. The integrity of circadian gene expression underlies cardiovascular health and disease; however time-of-day profiling in heart disease has never been examined. We hypothesized that a time-of-day chronomic approach using samples collected across 24-h cycles and analyzed by microarrays and bioinformatics advances contemporary approaches, because it includes sleep-time and/or wake-time molecular responses. As proof of concept, we demonstrate the value of this approach in cardiovascular disease using a murine Transverse Aortic Constriction (TAC) model of pressure overload-induced cardiac hypertrophy in mice. First, microarrays and a novel algorithm termed DeltaGene were used to identify time-of-day differences in gene expression in cardiac hypertrophy 8 wks post-TAC. The top 300 candidates were further analyzed using knowledge-based platforms, paring the list to 20 candidates, which were then validated by real-time polymerase chain reaction (RTPCR). Next, we tested whether the time-of-day gene expression profiles could be indicative of disease progression by comparing the 1- vs. 8-wk TAC. Lastly, since protein expression is functionally relevant, we monitored time-of-day cycling for the analogous cardiac proteins. This approach is generally applicable and can lead to new understanding of disease.

  5. Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.

    PubMed

    Tong, Dong Ling; Schierz, Amanda C

    2011-09-01

    Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed to a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data. The GANN is a hybrid model where the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated by using two benchmark microarray datasets with different array platforms and differing number of classes (a 2-class oligonucleotide microarray data for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for SRBCTs (small round blue cell tumours)). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time. The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method not only can detect genes that are exclusively associated with a single cancer type but can also explore the genes that are differentially expressed in multiple cancer types. 
The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data as well as extracting known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading, and though the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality. Copyright © 2011 Elsevier B.V. All rights reserved.
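
    The fitness idea described above (score a candidate gene subset by the number of samples a classifier labels correctly) can be sketched as follows. This is a minimal illustration, not the authors' GANN: a leave-one-out nearest-centroid classifier stands in for the feedforward ANN, and the names count_correct and select_genes, the crossover and mutation rules, and the toy data are all assumptions.

```python
import random

def count_correct(X, y, genes):
    """Fitness: number of samples labelled correctly using only `genes`.
    Assumption: a leave-one-out nearest-centroid classifier replaces the ANN."""
    correct = 0
    for i, sample in enumerate(X):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if not rows:
                continue  # class has no other samples to build a centroid from
            dist = 0.0
            for g in genes:
                centroid_g = sum(r[g] for r in rows) / len(rows)
                dist += (sample[g] - centroid_g) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct

def select_genes(X, y, n_select=2, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over gene subsets (hypothetical operators)."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(genes):
        return count_correct(X, y, genes)

    pop = [rng.sample(range(n_genes), n_select) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(dict.fromkeys(a + b))    # union of both parents' genes
            child = rng.sample(pool, n_select)   # crossover
            if rng.random() < 0.2:               # mutation: swap in an unused gene
                unused = [g for g in range(n_genes) if g not in child]
                if unused:
                    child[rng.randrange(n_select)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    return sorted(max(pop, key=fitness))

# Toy data: 6 samples x 4 genes; only gene 0 separates the two classes.
X = [[1.0, 3, 7, 2], [1.2, 9, 1, 2], [0.9, 4, 4, 8],
     [5.0, 3, 6, 2], [5.1, 8, 2, 3], [4.8, 5, 4, 7]]
y = [0, 0, 0, 1, 1, 1]
print(count_correct(X, y, [0]), select_genes(X, y))
```

    Because the fitter half of the population is carried over unchanged, the best fitness never decreases between generations in this sketch.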

  6. A Comparison of RNA-Seq Results from Paired Formalin-Fixed Paraffin-Embedded and Fresh-Frozen Glioblastoma Tissue Samples

    PubMed Central

    Esteve-Codina, Anna; Arpi, Oriol; Martinez-García, Maria; Pineda, Estela; Mallo, Mar; Gut, Marta; Carrato, Cristina; Rovira, Anna; Lopez, Raquel; Tortosa, Avelina; Dabad, Marc; Del Barco, Sonia; Heath, Simon; Bagué, Silvia; Ribalta, Teresa; Alameda, Francesc; de la Iglesia, Nuria

    2017-01-01

    The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Only three FFPE out of eleven FFPE-FF matched samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved. PMID:28122052

  7. Gene expression profiles reveal key genes for early diagnosis and treatment of adamantinomatous craniopharyngioma.

    PubMed

    Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing

    2018-04-23

    Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapies cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built by the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. A PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracies of the SVM model for the training, testing, and validation data were 94, 85, and 74%, respectively. The expression of CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 was significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which was consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP.
The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and benefit the therapy improvement.

  8. Genome-wide analysis of the R2R3-MYB transcription factor gene family in sweet orange (Citrus sinensis).

    PubMed

    Liu, Chaoyang; Wang, Xia; Xu, Yuantao; Deng, Xiuxin; Xu, Qiang

    2014-10-01

    MYB transcription factors represent one of the largest gene families in plant genomes. Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide, and its genome has recently been sequenced. This provides an opportunity to investigate the organization and evolutionary characteristics of sweet orange MYB genes from a whole-genome view. In the present study, we identified 100 R2R3-MYB genes in the sweet orange genome. A comprehensive analysis of this gene family was performed, including the phylogeny, gene structure, chromosomal localization and expression pattern analyses. The 100 genes were divided into 29 subfamilies based on the sequence similarity and phylogeny, and the classification was also well supported by the highly conserved exon/intron structures and motif composition. The phylogenomic comparison of the MYB gene family among sweet orange and related plant species (Arabidopsis, cacao and papaya) suggested the existence of functional divergence during evolution. Expression profiling indicated that sweet orange R2R3-MYB genes exhibited distinct temporal and spatial expression patterns. Our analysis suggested that the sweet orange MYB genes may play important roles in different plant biological processes, some of which may be potentially involved in citrus fruit quality. These results will be useful for future functional analysis of the MYB gene family in sweet orange.

  9. Transcriptomic markers meet the real world: finding diagnostic signatures of corticosteroid treatment in commercial beef samples

    PubMed Central

    2012-01-01

    Background The use of growth-promoters in beef cattle, despite the EU ban, remains a frequent practice. The use of transcriptomic markers has already been proposed to identify indirect evidence of anabolic hormone treatment. So far, such an approach has been tested in experimentally treated animals. Here, for the first time, commercial samples were analyzed. Results Quantitative determination of Dexamethasone (DEX) residues in the urine collected at the slaughterhouse was performed by Liquid Chromatography-Mass Spectrometry (LC-MS). DNA-microarray technology was used to obtain transcriptomic profiles of skeletal muscle in commercial samples and negative controls. LC-MS confirmed the presence of low levels of DEX residues in the urine of the commercial samples suspect for histological classification. Principal Component Analysis (PCA) on microarray data identified two clusters of samples. One cluster included negative controls and a subset of commercial samples, while a second cluster included part of the specimens collected at the slaughterhouse together with positives for corticosteroid treatment based on thymus histology and LC-MS. Functional analysis of the differentially expressed genes (3961) between the two groups provided further evidence that animals clustering with positive samples might have been treated with corticosteroids. These suspect samples could be reliably classified with a specific classification tool (Prediction Analysis of Microarray) using just two genes. Conclusions Despite the broad variation observed in gene expression profiles, the present study showed that DNA-microarrays can be used to find transcriptomic signatures of putative anabolic treatments and that gene expression markers could represent a useful screening tool. PMID:23110699
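
    To illustrate how PCA can split samples into two groups as described above, here is a minimal sketch that finds the first principal component by power iteration and projects each sample onto it. The function name, the power-iteration approach, and the toy two-gene data are assumptions for illustration; the study's own software and settings are not stated in the abstract.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.
    Assumes the covariance matrix is non-zero and the all-ones start
    vector is not orthogonal to the leading eigenvector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy expression values for two genes: three "control-like" samples
# followed by three "treated-like" samples (made-up numbers).
samples = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
           [9.0, 10.0], [10.0, 9.0], [9.5, 9.5]]
direction, scores = first_principal_component(samples)
print(direction, scores)
```

    In this toy case the sign of each score separates the two groups; real microarray data would have thousands of genes and would normally be handled with a linear-algebra library.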

  10. MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants.

    PubMed

    Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren

    2018-01-01

    Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.

  11. IBTK Differently Modulates Gene Expression and RNA Splicing in HeLa and K562 Cells.

    PubMed

    Fiume, Giuseppe; Scialdone, Annarita; Rizzo, Francesca; De Filippo, Maria Rosaria; Laudanna, Carmelo; Albano, Francesco; Golino, Gaetanina; Vecchio, Eleonora; Pontoriero, Marilena; Mimmi, Selena; Ceglia, Simona; Pisano, Antonio; Iaccino, Enrico; Palmieri, Camillo; Paduano, Sergio; Viglietto, Giuseppe; Weisz, Alessandro; Scala, Giuseppe; Quinto, Ileana

    2016-11-07

    The IBTK gene encodes the major protein isoform IBTKα that was recently characterized as substrate receptor of Cul3-dependent E3 ligase, regulating ubiquitination coupled to proteasomal degradation of Pdcd4, an inhibitor of translation. Due to the presence of Ankyrin-BTB-RCC1 domains that mediate several protein-protein interactions, IBTKα could exert expanded regulatory roles, including interaction with transcription regulators. To verify the effects of IBTKα on gene expression, we analyzed HeLa and K562 cell transcriptomes by RNA-Sequencing before and after IBTK knock-down by shRNA transduction. In HeLa cells, 1285 (2.03%) of 63,128 mapped transcripts were differentially expressed in IBTK -shRNA-transduced cells, as compared to cells treated with control-shRNA, with 587 upregulated (45.7%) and 698 downregulated (54.3%) RNAs. In K562 cells, 1959 (3.1%) of 63128 mapped RNAs were differentially expressed in IBTK -shRNA-transduced cells, including 1053 upregulated (53.7%) and 906 downregulated (46.3%). Only 137 transcripts (0.22%) were commonly deregulated by IBTK silencing in both HeLa and K562 cells, indicating that most IBTKα effects on gene expression are cell type-specific. Based on gene ontology classification, the genes responsive to IBTK are involved in different biological processes, including in particular chromatin and nucleosomal organization, gene expression regulation, and cellular traffic and migration. In addition, IBTK RNA interference affected RNA maturation in both cell lines, as shown by the evidence of alternative 3'- and 5'-splicing, mutually exclusive exons, retained introns, and skipped exons. Altogether, these results indicate that IBTK differently modulates gene expression and RNA splicing in HeLa and K562 cells, demonstrating a novel biological role of this protein.

  12. IBTK Differently Modulates Gene Expression and RNA Splicing in HeLa and K562 Cells

    PubMed Central

    Fiume, Giuseppe; Scialdone, Annarita; Rizzo, Francesca; De Filippo, Maria Rosaria; Laudanna, Carmelo; Albano, Francesco; Golino, Gaetanina; Vecchio, Eleonora; Pontoriero, Marilena; Mimmi, Selena; Ceglia, Simona; Pisano, Antonio; Iaccino, Enrico; Palmieri, Camillo; Paduano, Sergio; Viglietto, Giuseppe; Weisz, Alessandro; Scala, Giuseppe; Quinto, Ileana

    2016-01-01

    The IBTK gene encodes the major protein isoform IBTKα that was recently characterized as substrate receptor of Cul3-dependent E3 ligase, regulating ubiquitination coupled to proteasomal degradation of Pdcd4, an inhibitor of translation. Due to the presence of Ankyrin-BTB-RCC1 domains that mediate several protein-protein interactions, IBTKα could exert expanded regulatory roles, including interaction with transcription regulators. To verify the effects of IBTKα on gene expression, we analyzed HeLa and K562 cell transcriptomes by RNA-Sequencing before and after IBTK knock-down by shRNA transduction. In HeLa cells, 1285 (2.03%) of 63,128 mapped transcripts were differentially expressed in IBTK-shRNA-transduced cells, as compared to cells treated with control-shRNA, with 587 upregulated (45.7%) and 698 downregulated (54.3%) RNAs. In K562 cells, 1959 (3.1%) of 63128 mapped RNAs were differentially expressed in IBTK-shRNA-transduced cells, including 1053 upregulated (53.7%) and 906 downregulated (46.3%). Only 137 transcripts (0.22%) were commonly deregulated by IBTK silencing in both HeLa and K562 cells, indicating that most IBTKα effects on gene expression are cell type-specific. Based on gene ontology classification, the genes responsive to IBTK are involved in different biological processes, including in particular chromatin and nucleosomal organization, gene expression regulation, and cellular traffic and migration. In addition, IBTK RNA interference affected RNA maturation in both cell lines, as shown by the evidence of alternative 3′- and 5′-splicing, mutually exclusive exons, retained introns, and skipped exons. Altogether, these results indicate that IBTK differently modulates gene expression and RNA splicing in HeLa and K562 cells, demonstrating a novel biological role of this protein. PMID:27827994

  13. Rough Set Soft Computing Cancer Classification and Network: One Stone, Two Birds

    PubMed Central

    Zhang, Yue

    2010-01-01

    Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article. PMID:20706619

  14. Cloning and Characterization of a Cell Senescence Gene for Breast Cancer Cells

    DTIC Science & Technology

    2004-07-01

    have already established the inducible expression system in a retroviral vector for these studies. F. References 1. Hayflick, L. (1965). The limited ... Annual report A. Introduction Normal diploid mammalian cells display a limited proliferative life span in culture (1-3

  15. Classification of Time Series Gene Expression in Clinical Studies via Integration of Biological Network

    PubMed Central

    Qian, Liwei; Zheng, Haoran; Zhou, Hong; Qin, Ruibin; Li, Jinlong

    2013-01-01

    The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on 2 different datasets and showed that the use of a hidden Markov model/Gaussian mixture models hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction. PMID:23516469

  16. Global gene expression profiling of oral cavity cancers suggests molecular heterogeneity within anatomic subsites

    PubMed Central

    Severino, Patricia; Alvares, Adriana M; Michaluart, Pedro; Okamoto, Oswaldo K; Nunes, Fabio D; Moreira-Filho, Carlos A; Tajara, Eloiza H

    2008-01-01

    Background Oral squamous cell carcinoma (OSCC) is a frequent neoplasm, which is usually aggressive and has unpredictable biological behavior and unfavorable prognosis. The comprehension of the molecular basis of this variability should lead to the development of targeted therapies as well as to improvements in specificity and sensitivity of diagnosis. Results Samples of primary OSCCs and their corresponding surgical margins were obtained from male patients during surgery and their gene expression profiles were screened using whole-genome microarray technology. Hierarchical clustering and Principal Components Analysis were used for data visualization and One-way Analysis of Variance was used to identify differentially expressed genes. Samples clustered mostly according to disease subsite, suggesting molecular heterogeneity within tumor stages. In order to corroborate our results, two publicly available datasets of microarray experiments were assessed. We found significant molecular differences between OSCC anatomic subsites concerning groups of genes presently or potentially important for drug development, including mRNA processing, cytoskeleton organization and biogenesis, metabolic process, cell cycle and apoptosis. Conclusion Our results corroborate literature data on molecular heterogeneity of OSCCs. Differences between disease subsites and among samples belonging to the same TNM class highlight the importance of gene expression-based classification and challenge the development of targeted therapies. PMID:19014556
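
    The one-way analysis of variance used above to flag differentially expressed genes compares between-group and within-group variability for each gene. A minimal sketch of the F statistic for a single gene follows; the function name and the toy expression values are assumptions, and the p-value lookup against the F distribution is omitted.

```python
def one_way_anova_f(groups):
    """F statistic for one gene measured in several groups of samples."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up expression values for one gene in three hypothetical subsites.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

    A large F value indicates that the group means differ more than the within-group scatter would suggest; in practice the test is repeated per gene and corrected for multiple testing.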

  17. Transcriptome Analysis of Capsicum Chlorosis Virus-Induced Hypersensitive Resistance Response in Bell Capsicum.

    PubMed

    Widana Gamage, Shirani M K; McGrath, Desmond J; Persley, Denis M; Dietzgen, Ralf G

    2016-01-01

    Capsicum chlorosis virus (CaCV) is an emerging pathogen of capsicum, tomato and peanut crops in Australia and South-East Asia. Commercial capsicum cultivars with CaCV resistance are not yet available, but CaCV resistance identified in Capsicum chinense is being introgressed into commercial Bell capsicum. However, our knowledge of the molecular mechanisms leading to the resistance response to CaCV infection is limited. Therefore, transcriptome and expression profiling data provide an important resource to better understand CaCV resistance mechanisms. We assembled capsicum transcriptomes and analysed gene expression using Illumina HiSeq platform combined with a tag-based digital gene expression system. Total RNA extracted from CaCV/mock inoculated CaCV resistant (R) and susceptible (S) capsicum at the time point when R line showed a strong hypersensitive response to CaCV infection was used in transcriptome assembly. Gene expression profiles of R and S capsicum in CaCV- and buffer-inoculated conditions were compared. None of the genes were differentially expressed (DE) between R and S cultivars when mock-inoculated, while 2484 genes were DE when inoculated with CaCV. Functional classification revealed that the most highly up-regulated DE genes in R capsicum included pathogenesis-related genes, cell death-associated genes, genes associated with hormone-mediated signalling pathways and genes encoding enzymes involved in synthesis of defense-related secondary metabolites. We selected 15 genes to confirm DE expression levels by real-time quantitative PCR. DE transcript profiling data provided comprehensive gene expression information to gain an understanding of the underlying CaCV resistance mechanisms. Further, we identified candidate CaCV resistance genes in the CaCV-resistant C. annuum x C. chinense breeding line. 
This knowledge will be useful in future for fine mapping of the CaCV resistance locus and potential genetic engineering of resistance into CaCV-susceptible crops.

  18. Transcriptome Analysis of Capsicum Chlorosis Virus-Induced Hypersensitive Resistance Response in Bell Capsicum

    PubMed Central

    Widana Gamage, Shirani M. K.; McGrath, Desmond J.; Persley, Denis M.

    2016-01-01

    Background Capsicum chlorosis virus (CaCV) is an emerging pathogen of capsicum, tomato and peanut crops in Australia and South-East Asia. Commercial capsicum cultivars with CaCV resistance are not yet available, but CaCV resistance identified in Capsicum chinense is being introgressed into commercial Bell capsicum. However, our knowledge of the molecular mechanisms leading to the resistance response to CaCV infection is limited. Therefore, transcriptome and expression profiling data provide an important resource to better understand CaCV resistance mechanisms. Methodology/Principal Findings We assembled capsicum transcriptomes and analysed gene expression using Illumina HiSeq platform combined with a tag-based digital gene expression system. Total RNA extracted from CaCV/mock inoculated CaCV resistant (R) and susceptible (S) capsicum at the time point when R line showed a strong hypersensitive response to CaCV infection was used in transcriptome assembly. Gene expression profiles of R and S capsicum in CaCV- and buffer-inoculated conditions were compared. None of the genes were differentially expressed (DE) between R and S cultivars when mock-inoculated, while 2484 genes were DE when inoculated with CaCV. Functional classification revealed that the most highly up-regulated DE genes in R capsicum included pathogenesis-related genes, cell death-associated genes, genes associated with hormone-mediated signalling pathways and genes encoding enzymes involved in synthesis of defense-related secondary metabolites. We selected 15 genes to confirm DE expression levels by real-time quantitative PCR. Conclusion/Significance DE transcript profiling data provided comprehensive gene expression information to gain an understanding of the underlying CaCV resistance mechanisms. Further, we identified candidate CaCV resistance genes in the CaCV-resistant C. annuum x C. chinense breeding line. 
This knowledge will be useful in future for fine mapping of the CaCV resistance locus and potential genetic engineering of resistance into CaCV-susceptible crops. PMID:27398596

  19. MicroRNAs in Prostate Cancer

    DTIC Science & Technology

    2008-11-01

    microarray, gene expression, androgen ...Anindya Dutta; Yong Sun Lee; Hak Kyun Kim MicroRNAs are short single-stranded RNAs of 18-22 bases length that are produced by the processing of...we hope to go to xenograft assays to demonstrate that microRNA alterations can suppress metastasis. REFERENCES: 1. Mattie, M.D., et al

  20. Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verhaak, Roel GW; Hoadley, Katherine A; Purdom, Elizabeth

    The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.

  1. Pathological Bases for a Robust Application of Cancer Molecular Classification

    PubMed Central

    Diaz-Cano, Salvador J.

    2015-01-01

    Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification) and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according to the preservation protocol; its transcription reflects the adaptation of the tumor cells to the microenvironment, it can be passed through mechanisms of intercellular transference of genetic information (exosomes), and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, represented at the genetic level by DNA, to improve reliability, and their analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors. PMID:25898411

  2. Dynamic Visualization of Co-expression in Systems Genetics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    New, Joshua Ryan; Huang, Jian; Chesler, Elissa J

    2008-01-01

    Biologists hope to address grand scientific challenges by exploring the abundance of data made available through modern microarray technology and other high-throughput techniques. The impact of this data, however, is limited unless researchers can effectively assimilate such complex information and integrate it into their daily research; interactive visualization tools are called for to support the effort. Specifically, typical studies of gene co-expression require novel visualization tools that enable the dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These tools should allow biologists to develop an intuitive understanding of the structure of biological networks and discover genes which reside in critical positions in networks and pathways. By using a graph as a universal data representation of correlation in gene expression data, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool for interacting with gene co-expression data integrates techniques such as: graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized b-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using a real-world workflow from a large-scale, systems genetics study of mammalian gene co-expression.
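
    The graph representation of correlation mentioned above can be sketched in a few lines: genes become nodes, and an edge joins two genes whose expression profiles are strongly correlated. The use of Pearson correlation, the 0.9 cutoff, the function names, and the toy profiles are assumptions for illustration and are not taken from the tool described in this record.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_graph(expression, threshold=0.9):
    """Adjacency sets linking genes with |correlation| >= threshold."""
    graph = {gene: set() for gene in expression}
    for a, b in combinations(expression, 2):
        if abs(pearson(expression[a], expression[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Made-up expression profiles across four conditions.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],   # rises with g1
    "g3": [4, 3, 2, 1],     # mirrors g1 (strong negative correlation)
    "g4": [1, 3, 2, 3],     # only weakly related to the others
}
print(coexpression_graph(profiles))
```

    Raising or lowering the threshold changes how dense the graph is, which is the kind of parameter sensitivity the abstract says interactive tools should help explore.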

  3. Genotype-based gene signature of glioma risk.

    PubMed

    Huang, Yen-Tsung; Zhang, Yi; Wu, Zhijin; Michaud, Dominique S

    2017-07-01

    Glioma accounts for 80% of malignant brain tumors, but its etiologic determinants remain elusive. Despite genetic susceptibility loci identified by genome-wide association study (GWAS), the agnostic approach leaves open the possibility that other susceptibility genes remain to be discovered. Here we conduct a gene-centric integrative GWAS (iGWAS) of glioma risk that combines transcriptomics and genetics. We synthesized a brain transcriptomics dataset (n = 354), a GWAS dataset (n = 4203), and an advanced glioma tumor transcriptomic dataset (n = 483) to conduct an iGWAS. Using the expression quantitative trait loci (eQTL) dataset, we built models to predict gene expression for the GWAS data, based on eQTL genotypes. With the predicted gene expression, iGWAS analyses were performed using a novel statistical method. Gene signature risk score was constructed using a penalized logistic regression model. A total of 30527 transcripts were analyzed using the iGWAS approach. Four novel glioma susceptibility genes were identified with internal and external validation, including DRD5 (P = 3.0 × 10^-79), WDR1 (P = 8.4 × 10^-77), NOMO1 (P = 1.3 × 10^-25), and PDXDC1 (P = 8.3 × 10^-24). The genotype-predicted transcription pattern between cases and controls is consistent with that between tumor and its matched normal tissue. The genotype-based 4-gene signature improved the classification between glioma cases and controls based on age, gender, and population stratification, with area under the receiver operating characteristic curve increasing from 0.77 to 0.85 (P = 8.1 × 10^-23). A new genotype-based gene signature of glioma was identified using a novel iGWAS approach, which integrates multiplatform genomic data as well as different genetic association studies. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
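
    The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. A minimal sketch of that pairwise computation follows; the function name and the toy scores are assumptions, and the study's own risk scores come from its penalized logistic regression model, which is not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.
    Ties count as half a correct pair. Labels: 1 = case, 0 = control."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Made-up risk scores for two controls followed by two cases.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs correct: 0.75
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported change from 0.77 to 0.85 should be read.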

  4. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    PubMed

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, which we call DBBO, is proposed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are evaluated with the 10-fold cross-validation method. In order to show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. In comparison with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
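
    The 10-fold cross-validation used above to score each candidate gene subset can be sketched as an index-splitting routine. The function name, the shuffling seed, and the strided fold assignment are assumptions for illustration; the abstract does not say how folds were formed or whether they were stratified by class.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Each sample is held out exactly once across the 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

    A classifier would be trained on each train split and scored on the matching test split, and the average score would serve as the quality of the gene subset under evaluation.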

  5. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    DTIC Science & Technology

    2002-01-01

    their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested

  6. Comparative analyses identify molecular signature of MRI-classified SVZ-associated glioblastoma

    PubMed Central

    Lin, Chin-Hsing Annie; Rhodes, Christopher T.; Lin, ChenWei; Phillips, Joanna J.; Berger, Mitchel S.

    2017-01-01

    ABSTRACT Glioblastoma (GBM) is a highly aggressive brain cancer with limited therapeutic options. While efforts to identify genes responsible for GBM have revealed mutations and aberrant gene expression associated with distinct types of GBM, patients with GBM are often diagnosed and classified based on MRI features. Therefore, we seek to identify molecular representatives in parallel with MRI classification for group I and group II primary GBM associated with the subventricular zone (SVZ). As group I and II GBM contain stem-like signature, we compared gene expression profiles between these 2 groups of primary GBM and endogenous neural stem progenitor cells to reveal dysregulation of cell cycle, chromatin status, cellular morphogenesis, and signaling pathways in these 2 types of MRI-classified GBM. In the absence of IDH mutation, several genes associated with metabolism are differentially expressed in these subtypes of primary GBM, implicating metabolic reprogramming occurs in tumor microenvironment. Furthermore, histone lysine methyltransferase EZH2 was upregulated while histone lysine demethylases KDM2 and KDM4 were downregulated in both group I and II primary GBM. Lastly, we identified 9 common genes across large data sets of gene expression profiles among MRI-classified group I/II GBM, a large cohort of GBM subtypes from TCGA, and glioma stem cells by unsupervised clustering comparison. These commonly upregulated genes have known functions in cell cycle, centromere assembly, chromosome segregation, and mitotic progression. Our findings highlight altered expression of genes important in chromosome integrity across all GBM, suggesting a common mechanism of disrupted fidelity of chromosome structure in GBM. PMID:28278055

  7. Transferring genomics to the clinic: distinguishing Burkitt and diffuse large B cell lymphomas.

    PubMed

    Sha, Chulin; Barrans, Sharon; Care, Matthew A; Cunningham, David; Tooze, Reuben M; Jack, Andrew; Westhead, David R

    2015-01-01

    Classifiers based on molecular criteria such as gene expression signatures have been developed to distinguish Burkitt lymphoma and diffuse large B cell lymphoma, which help to explore the intermediate cases where traditional diagnosis is difficult. Transfer of these research classifiers into a clinical setting is challenging because there are competing classifiers in the literature based on different methodology and gene sets with no clear best choice; classifiers based on one expression measurement platform may not transfer effectively to another; and, classifiers developed using fresh frozen samples may not work effectively with the commonly used and more convenient formalin fixed paraffin-embedded samples used in routine diagnosis. Here we thoroughly compared two published high profile classifiers developed on data from different Affymetrix array platforms and fresh-frozen tissue, examining their transferability and concordance. Based on this analysis, a new Burkitt and diffuse large B cell lymphoma classifier (BDC) was developed and employed on Illumina DASL data from our own paraffin-embedded samples, allowing comparison with the diagnosis made in a central haematopathology laboratory and evaluation of clinical relevance. We show that both previous classifiers can be recapitulated using very much smaller gene sets than originally employed, and that the classification result is closely dependent on the Burkitt lymphoma criteria applied in the training set. The BDC classification on our data exhibits high agreement (~95 %) with the original diagnosis. A simple outcome comparison in the patients presenting intermediate features on conventional criteria suggests that the cases classified as Burkitt lymphoma by BDC have worse response to standard diffuse large B cell lymphoma treatment than those classified as diffuse large B cell lymphoma. 
In this study, we comprehensively investigate two previous Burkitt lymphoma molecular classifiers, and implement a new gene expression classifier, BDC, that works effectively on paraffin-embedded samples and provides useful information for treatment decisions. The classifier is available as a free software package under the GNU public licence within the R statistical software environment through the link http://www.bioinformatics.leeds.ac.uk/labpages/softwares/ or on github https://github.com/Sharlene/BDC.

  8. Clinical Value of Prognosis Gene Expression Signatures in Colorectal Cancer: A Systematic Review

    PubMed Central

    Cordero, David; Riccadonna, Samantha; Solé, Xavier; Crous-Bou, Marta; Guinó, Elisabet; Sanjuan, Xavier; Biondo, Sebastiano; Soriano, Antonio; Jurman, Giuseppe; Capella, Gabriel; Furlanello, Cesare; Moreno, Victor

    2012-01-01

    Introduction The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. Random Forest classifier was used to build predictors from the gene lists. Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-tests probabilities were computed in stage II and III samples. Results Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic. PMID:23145004
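
    The review above uses the Matthews correlation coefficient (MCC) as its accuracy measure. A minimal sketch of the standard binary MCC formula follows, assuming labels are coded as 0 and 1. The function name is illustrative, and returning 0.0 when a marginal count is zero is a common convention chosen here, not something stated in the abstract.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumption): undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike raw accuracy it is not inflated when one outcome class dominates.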

  9. The Early Innate Response of Chickens to Salmonella enterica Is Dependent on the Presence of O-Antigen but Not on Serovar Classification

    PubMed Central

    Varmuzova, Karolina; Matulova, Marta Elsheimer; Sebkova, Alena; Sekelova, Zuzana; Havlickova, Hana; Sisak, Frantisek; Babak, Vladimir; Rychlik, Ivan

    2014-01-01

    Salmonella vaccines used in poultry in the EU are based on attenuated strains of either Salmonella serovar Enteritidis or Typhimurium which results in a decrease in S. Enteritidis and S. Typhimurium but may allow other Salmonella serovars to fill an empty ecological niche. In this study we were therefore interested in the early interactions of chicken immune system with S. Infantis compared to S. Enteritidis and S. Typhimurium, and a role of O-antigen in these interactions. To reach this aim, we orally infected newly hatched chickens with 7 wild type strains of Salmonella serovars Enteritidis, Typhimurium and Infantis as well as with their rfaL mutants and characterized the early Salmonella-chicken interactions. Inflammation was characterized in the cecum 4 days post-infection by measuring expression of 43 different genes. All wild type strains stimulated a greater inflammatory response than any of the rfaL mutants. However, there were large differences in chicken responses to different wild type strains not reflecting their serovar classification. The initial interaction between newly-hatched chickens and Salmonella was found to be dependent on the presence of O-antigen but not on its structure, i.e. not on serovar classification. In addition, we observed that the expression of calbindin or aquaporin 8 in the cecum did not change if inflammatory gene expression remained within a 10 fold fluctuation, indicating the buffering capacity of the cecum, preserving normal gut functions even in the presence of minor inflammatory stimuli. PMID:24763249

  10. Identification of high-risk cutaneous melanoma tumors is improved when combining the online American Joint Committee on Cancer Individualized Melanoma Patient Outcome Prediction Tool with a 31-gene expression profile-based classification.

    PubMed

    Ferris, Laura K; Farberg, Aaron S; Middlebrook, Brooke; Johnson, Clare E; Lassen, Natalie; Oelschlager, Kristen M; Maetzold, Derek J; Cook, Robert W; Rigel, Darrell S; Gerami, Pedram

    2017-05-01

    A significant proportion of patients with American Joint Committee on Cancer (AJCC)-defined early-stage cutaneous melanoma have disease recurrence and die. A 31-gene expression profile (GEP) that accurately assesses metastatic risk associated with primary cutaneous melanomas has been described. We sought to compare accuracy of the GEP in combination with risk determined using the web-based AJCC Individualized Melanoma Patient Outcome Prediction Tool. GEP results from 205 stage I/II cutaneous melanomas with sufficient clinical data for prognostication using the AJCC tool were classified as low (class 1) or high (class 2) risk. Two 5-year overall survival cutoffs (AJCC 79% and 68%), reflecting survival for patients with stage IIA or IIB disease, respectively, were assigned for binary AJCC risk. Cox univariate analysis revealed significant risk classification of distant metastasis-free and overall survival (hazard ratio range 3.2-9.4, P < .001) for both tools. In all, 43 (21%) cases had discordant GEP and AJCC classification (using 79% cutoff). Eleven of 13 (85%) deaths in that group were predicted as high risk by GEP but low risk by AJCC. Specimens reflect tertiary care center referrals; more effective therapies have been approved for clinical use after accrual. The GEP provides valuable prognostic information and improves identification of high-risk melanomas when used together with the AJCC online prediction tool. Copyright © 2016 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.

  11. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919

  12. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. 
Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.

  13. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. 
Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  14. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability with respect to various classification algorithms. PMID:24625071
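
    For orientation, the sketch below implements the classical RELIEF weighting that the abstract names as the starting point. It is not the LHR algorithm itself, whose local approximation step is not specified here. The sketch assumes numeric features, at least two classes, and at least two samples per class. The function name and the range-scaled Manhattan distance are illustrative choices.

```python
def relief_weights(X, y):
    """Classical RELIEF feature weights (not LHR).

    For every sample, find its nearest neighbor of the same class (hit)
    and its nearest neighbor of a different class (miss). A feature gains
    weight when it differs on the miss and loses weight when it differs
    on the hit.
    """
    n, d = len(X), len(X[0])
    # Scale each feature by its range so that no feature dominates;
    # constant features get a range of 1.0 to avoid division by zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*X)]

    def dist(a, b):
        return sum(abs(ai - bi) / r for ai, bi, r in zip(a, b, ranges))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f]) / ranges[f]
            diff_hit = abs(X[i][f] - X[hit][f]) / ranges[f]
            weights[f] += (diff_miss - diff_hit) / n
    return weights
```

    Because each update depends on a single nearest hit and miss, one outlier can change the chosen neighbors and therefore the weights, which is the instability the abstract refers to.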

  15. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    PubMed

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
The software also provides for a choice between three different initial rough imputation methods.
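
    The imputation method above relies on Dynamic Time Warping distance to rank candidate profiles. The following is a minimal sketch of the textbook DTW recurrence with an absolute-difference local cost, plus a helper that ranks candidates by that distance. It is not the DTWimpute implementation. The function names are illustrative, and profiles are assumed to be complete numeric lists (handling of missing entries is omitted).

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def nearest_profiles(target, candidates, k=2):
    """Names of the k candidate profiles closest to target under DTW.

    candidates maps a profile name to its list of expression values.
    """
    ranked = sorted(candidates, key=lambda name: dtw_distance(target, candidates[name]))
    return ranked[:k]
```

    Since DTW permits one time point to align with several, a profile that is merely shifted or stretched in time can still be at distance zero from the target, which Euclidean distance would not allow.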

  16. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proven to be an effective approach to the diagnosis of diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  17. Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes

    PubMed Central

    Hopp, Lydia; Lembcke, Kathrin; Binder, Hans; Wirth, Henry

    2013-01-01

    We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes which are characterized by differential functional and clinical characteristics. PMID:24833231

  18. Tabu search and binary particle swarm optimization for feature selection using microarray data.

    PubMed

    Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong

    2009-12-01

    Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In cancer type classification research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied to 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.
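
    The abstract above uses K-nearest neighbor with leave-one-out cross-validation to score candidate gene subsets. A minimal sketch of such an evaluator is shown below. The function name, the squared Euclidean distance, and the tie-breaking rule are assumptions made for illustration and are not the authors' implementation.

```python
def loocv_knn_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of k-NN restricted to the given feature indices.

    Each sample is held out in turn and classified by majority vote among
    its k nearest remaining samples, using only the selected features.
    """
    correct = 0
    for i in range(len(X)):
        neighbors = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(len(X))
            if j != i
        )
        votes = [y[j] for _, j in neighbors[:k]]
        # Majority vote; ties go to the smallest label (assumption).
        prediction = max(sorted(set(votes)), key=votes.count)
        if prediction == y[i]:
            correct += 1
    return correct / len(X)
```

    A search procedure such as TS or BPSO would call an evaluator like this once per candidate subset and keep the subsets with higher accuracy or fewer features.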

  19. Promising personalized therapeutic options for diffuse large B-cell Lymphoma Subtypes with oncogene addictions.

    PubMed

    Steinhardt, James J; Gartenhaus, Ronald B

    2012-09-01

    Currently, two major classification systems segregate diffuse large B-cell lymphoma (DLBCL) into subtypes based on gene expression profiles and provide great insights about the oncogenic mechanisms that may be crucial for lymphomagenesis as well as prognostic information regarding response to current therapies. However, these current classification systems primarily look at expression and not dependency and are thus limited to inductive or probabilistic reasoning when evaluating alternative therapeutic options. The development of a deductive classification system that identifies subtypes in which all patients with a given phenotype require the same oncogenic drivers, and would therefore have a similar response to a rational therapy targeting the essential drivers, would significantly advance the treatment of DLBCL. This review highlights the putative drivers identified as well as the work done to identify potentially dependent populations. These studies integrated genomic analysis and functional screens to provide a rationale for targeted therapies within defined populations. Personalizing treatments by identifying patients with oncogenic dependencies via genotyping and specifically targeting the responsible drivers may constitute a novel approach for the treatment of DLBCL. ©2012 AACR.

  20. Reliable pre-eclampsia pathways based on multiple independent microarray data sets.

    PubMed

    Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo

    2015-02-01

    Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placenta have provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using data sets, GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically-validated protocol was executed using five data sets including two independent other validation data sets, GSE30186, GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways were found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. 
A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. Gene Expression Ratios Lead to Accurate and Translatable Predictors of DR5 Agonism across Multiple Tumor Lineages.

    PubMed

    Reddy, Anupama; Growney, Joseph D; Wilson, Nick S; Emery, Caroline M; Johnson, Jennifer A; Ward, Rebecca; Monaco, Kelli A; Korn, Joshua; Monahan, John E; Stump, Mark D; Mapa, Felipa A; Wilson, Christopher J; Steiger, Janine; Ledell, Jebediah; Rickles, Richard J; Myer, Vic E; Ettenberg, Seth A; Schlegel, Robert; Sellers, William R; Huet, Heather A; Lehár, Joseph

    2015-01-01

    Death Receptor 5 (DR5) agonists demonstrate anti-tumor activity in preclinical models but have yet to demonstrate robust clinical responses. A key limitation may be the lack of patient selection strategies to identify those most likely to respond to treatment. To overcome this limitation, we screened a DR5 agonist Nanobody across >600 cell lines representing 21 tumor lineages and assessed molecular features associated with response. High expression of DR5 and Casp8 were significantly associated with sensitivity, but their expression thresholds were difficult to translate due to low dynamic ranges. To address the translational challenge of establishing thresholds of gene expression, we developed a classifier based on ratios of genes that predicted response across lineages. The ratio classifier outperformed the DR5+Casp8 classifier, as well as standard approaches for feature selection and classification using genes, instead of ratios. This classifier was independently validated using 11 primary patient-derived pancreatic xenograft models showing perfect predictions as well as a striking linearity between prediction probability and anti-tumor response. A network analysis of the genes in the ratio classifier captured important biological relationships mediating drug response, specifically identifying key positive and negative regulators of DR5 mediated apoptosis, including DR5, CASP8, BID, cFLIP, XIAP and PEA15. Importantly, the ratio classifier shows translatability across gene expression platforms (from Affymetrix microarrays to RNA-seq) and across model systems (in vitro to in vivo). Our approach of using gene expression ratios presents a robust and novel method for constructing translatable biomarkers of compound response, which can also probe the underlying biology of treatment response.
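
    To make the idea of ratio features concrete, the sketch below converts one sample's expression values into pairwise log2 ratios. The function name and the expression values are hypothetical, and only the gene names come from the abstract. The invariance check illustrates one plausible reason ratios transfer between platforms: a sample-wide multiplicative scale factor cancels out of every ratio. The abstract does not state that this is the authors' rationale.

```python
import math
from itertools import combinations


def log_ratio_features(expression):
    """Pairwise log2 expression ratios for one sample.

    expression maps a gene name to a positive expression value. The result
    maps each alphabetically ordered gene pair (a, b) to log2(a / b).
    """
    genes = sorted(expression)
    return {
        (a, b): math.log2(expression[a] / expression[b])
        for a, b in combinations(genes, 2)
    }


# Hypothetical expression values, for illustration only.
sample = {"DR5": 8.0, "CASP8": 4.0, "XIAP": 2.0}
rescaled = {gene: 3.0 * value for gene, value in sample.items()}

features = log_ratio_features(sample)
features_rescaled = log_ratio_features(rescaled)
```

    Here `features` and `features_rescaled` agree for every gene pair, whereas a threshold on any single gene's raw value would have to be recalibrated after the rescaling.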

  2. Gene Expression Ratios Lead to Accurate and Translatable Predictors of DR5 Agonism across Multiple Tumor Lineages

    PubMed Central

    Reddy, Anupama; Growney, Joseph D.; Wilson, Nick S.; Emery, Caroline M.; Johnson, Jennifer A.; Ward, Rebecca; Monaco, Kelli A.; Korn, Joshua; Monahan, John E.; Stump, Mark D.; Mapa, Felipa A.; Wilson, Christopher J.; Steiger, Janine; Ledell, Jebediah; Rickles, Richard J.; Myer, Vic E.; Ettenberg, Seth A.; Schlegel, Robert; Sellers, William R.

    2015-01-01

    Death Receptor 5 (DR5) agonists demonstrate anti-tumor activity in preclinical models but have yet to demonstrate robust clinical responses. A key limitation may be the lack of patient selection strategies to identify those most likely to respond to treatment. To overcome this limitation, we screened a DR5 agonist Nanobody across >600 cell lines representing 21 tumor lineages and assessed molecular features associated with response. High expression of DR5 and Casp8 were significantly associated with sensitivity, but their expression thresholds were difficult to translate due to low dynamic ranges. To address the translational challenge of establishing thresholds of gene expression, we developed a classifier based on ratios of genes that predicted response across lineages. The ratio classifier outperformed the DR5+Casp8 classifier, as well as standard approaches for feature selection and classification using genes, instead of ratios. This classifier was independently validated using 11 primary patient-derived pancreatic xenograft models showing perfect predictions as well as a striking linearity between prediction probability and anti-tumor response. A network analysis of the genes in the ratio classifier captured important biological relationships mediating drug response, specifically identifying key positive and negative regulators of DR5 mediated apoptosis, including DR5, CASP8, BID, cFLIP, XIAP and PEA15. Importantly, the ratio classifier shows translatability across gene expression platforms (from Affymetrix microarrays to RNA-seq) and across model systems (in vitro to in vivo). Our approach of using gene expression ratios presents a robust and novel method for constructing translatable biomarkers of compound response, which can also probe the underlying biology of treatment response. PMID:26378449
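    The idea of classifying on gene expression ratios rather than single-gene thresholds can be illustrated with a minimal sketch. The abstract does not say how the ratios were computed, so the pairwise log2 ratio used here, the function name log_ratio_features and the expression values are all assumptions made for illustration. The sketch shows one plausible reason ratios carry across platforms: a sample-wide multiplicative scale factor cancels out.

```python
import math
from itertools import combinations


def log_ratio_features(expression):
    """Return log2 ratios for every gene pair in one sample.

    `expression` maps gene name -> strictly positive expression value.
    The pairwise log2 form is an assumption; the abstract does not
    specify how its ratios were defined.
    """
    features = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        features[f"{gene_a}/{gene_b}"] = (
            math.log2(expression[gene_a]) - math.log2(expression[gene_b])
        )
    return features


# Hypothetical expression values, not taken from the study.
sample = {"DR5": 8.0, "CASP8": 4.0, "XIAP": 2.0}
# The same sample measured on a platform with a 10x larger scale.
rescaled = {gene: value * 10 for gene, value in sample.items()}

original_features = log_ratio_features(sample)
rescaled_features = log_ratio_features(rescaled)
# The ratio features agree (up to floating-point error) despite the rescaling.
print(all(
    math.isclose(original_features[name], rescaled_features[name])
    for name in original_features
))
```

    Any classifier could then be trained on these features; which one the authors used is not stated in the abstract.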

  3. Macrophage Responses to Epithelial Dysfunction Promote Lung Fibrosis in Aging

    DTIC Science & Technology

    2017-10-01

    alveolar macrophages based on single cell molecular classification in patients with pulmonary fibrosis. We have recruited a planned number of patients...biomarkers expressed by human tissue-resident and monocyte-derived alveolar macrophages based on single cell molecular classification in patients with...identify novel biomarkers expressed by human tissue-resident and monocyte- derived alveolar macrophages based on single cell molecular classification

  4. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    PubMed

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
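    The key idea stated above (initial centroids drawn from dense regions and adequately separated in feature space) can be sketched as follows. This is not the authors' algorithm: the neighbor-count density, the radius parameter and the 2 * radius separation rule are assumptions chosen only to show how such a selection becomes deterministic.

```python
import math


def dense_separated_centroids(points, k, radius):
    """Deterministically pick up to k initial centroids.

    Density is taken to be the number of points within `radius`
    (an assumed definition). Candidates are visited from densest to
    sparsest, with ties broken by input order, and a candidate is kept
    only if it lies at least 2 * radius from every centroid chosen so far.
    """
    density = [
        sum(1 for other in points if math.dist(point, other) <= radius)
        for point in points
    ]
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    min_separation = 2 * radius  # assumed separation rule
    chosen = []
    for i in order:
        if all(math.dist(points[i], points[j]) >= min_separation for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [points[i] for i in chosen]


# Two well-separated toy groups (illustrative values only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(dense_separated_centroids(points, k=2, radius=1.5))
```

    Because no random choice is involved, repeated runs on the same data return the same centroids. The function may return fewer than k points if the separation rule is too strict for the data.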

  5. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia

    PubMed Central

    LI, CHENGLONG; ZHU, BIAO; CHEN, JIAO; HUANG, XIAOBING

    2016-01-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML. PMID:27177049

  6. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia.

    PubMed

    Li, Chenglong; Zhu, Biao; Chen, Jiao; Huang, Xiaobing

    2016-07-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML.
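    The evaluation parameters named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function name diagnostic_metrics and the counts are illustrative assumptions; the abstract reports only summary thresholds, not a confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: mutation-positive samples called positive/negative.
    tn/fp: mutation-negative samples called negative/positive.
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only; not results from the study.
for name, value in diagnostic_metrics(tp=8, fp=2, tn=18, fn=2).items():
    print(f"{name}: {value:.3f}")
```

    With imbalanced classes, as in the datasets described here, PPV and NPV depend on the class proportions as well as on sensitivity and specificity, which is why all four are usually reported together.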

  7. Structure, Expression, Chromosomal Location and Product of the Gene Encoding Adh2 in Petunia

    PubMed Central

    Gregerson, R. G.; Cameron, L.; McLean, M.; Dennis, P.; Strommer, J.

    1993-01-01

    In most higher plants the genes encoding alcohol dehydrogenase comprise a small gene family, usually with two members. The Adh1 gene of Petunia has been cloned and analyzed, but a second identifiable gene was not recovered from any of three genomic libraries. We have therefore employed the polymerase chain reaction to obtain the major portion of a second Adh gene. From sequence, mapping and northern data we conclude this gene encodes ADH2, the major anaerobically inducible Adh gene of Petunia. The availability of both Adh1 and Adh2 from Petunia has permitted us to compare their structures and patterns of expression to those of the well-studied Adh genes of maize, of which one is highly expressed developmentally, while both are induced in response to hypoxia. Despite their evolutionary distance, evidenced by deduced amino acid sequence as well as taxonomic classification, the pairs of genes are regulated in strikingly similar ways in maize and Petunia. Our findings suggest a significant biological basis for the regulatory strategy employed by these distant species for differential expression of multiple Adh genes. PMID:8096485

  8. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal-to-noise ratio, are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional, low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection over the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach is therefore useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice. PMID:25823003

  9. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal-to-noise ratio, are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional, low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection over the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach is therefore useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
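    One of the five filter methods listed, the signal-to-noise ratio, can be sketched in a few lines. The abstract does not define the statistic, so the form used here (difference of class means divided by the sum of class standard deviations, a definition commonly used for microarray gene ranking) is an assumption, as are the function names and the toy data. The AHP step that combines the five rankings is not shown.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Signal-to-noise score for one gene across two sample classes.

    Assumed definition: (mean_a - mean_b) / (sd_a + sd_b).
    Each class needs at least two values and non-zero total spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expr_a, expr_b):
    """Rank genes by absolute signal-to-noise score, most informative first.

    expr_a and expr_b map gene name -> list of expression values
    in class A and class B respectively.
    """
    scores = {gene: signal_to_noise(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Toy data with hypothetical gene names g1 and g2.
expr_a = {"g1": [1.0, 2.0, 3.0], "g2": [5.0, 6.0, 7.0]}
expr_b = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.0]}
print(rank_genes(expr_a, expr_b))
```

    Here g2 ranks first because its class means differ while g1 is identical in both classes.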

  10. Comparative transcriptome analyses of three medicinal Forsythia species and prediction of candidate genes involved in secondary metabolisms.

    PubMed

    Sun, Luchao; Rai, Amit; Rai, Megha; Nakamura, Michimi; Kawano, Noriaki; Yoshimatsu, Kayo; Suzuki, Hideyuki; Kawahara, Nobuo; Saito, Kazuki; Yamazaki, Mami

    2018-05-07

    The three Forsythia species, F. suspensa, F. viridissima and F. koreana, have been used as herbal medicines in China, Japan and Korea for centuries and are known to be rich sources of numerous pharmaceutical metabolites, including forsythin, forsythoside A, arctigenin, rutin and other phenolic compounds. In this study, de novo transcriptome sequencing and assembly was performed on these species. Using leaf and flower tissues of F. suspensa, F. viridissima and F. koreana, 1.28-2.45 Gbp of Illumina-based paired-end reads were obtained and assembled into 81,913, 88,491 and 69,458 unigenes, respectively. Classification of the annotated unigenes into gene ontology terms and KEGG pathways was used to compare the transcriptomes of the three Forsythia species. Expression analysis of orthologous genes across all three species showed that expression in leaf tissues was highly correlated. The candidate genes presumably involved in the biosynthetic pathway of lignans and phenylethanoid glycosides were screened as co-expressed genes. They are highly expressed in the leaves of F. viridissima and F. koreana. Furthermore, three unigenes annotated as acyltransferase were predicted, from expression patterns and phylogenetic analysis, to be associated with the biosynthesis of acteoside and forsythoside A. This study is the first report on comparative transcriptome analyses of the medicinally important Forsythia genus and will serve as an important resource to facilitate further studies on biosynthesis and regulation of therapeutic compounds in Forsythia species.

  11. Gateways to the FANTOM5 promoter level mammalian expression atlas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lizio, Marina; Harshbarger, Jayson; Shimoji, Hisashi

    The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). In conclusion, this resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

  12. Gateways to the FANTOM5 promoter level mammalian expression atlas

    DOE PAGES

    Lizio, Marina; Harshbarger, Jayson; Shimoji, Hisashi; ...

    2015-01-05

    The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). In conclusion, this resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.

  13. Mechanism-based risk assessment strategy for drug-induced cholestasis using the transcriptional benchmark dose derived by toxicogenomics.

    PubMed

    Kawamoto, Taisuke; Ito, Yuichi; Morita, Osamu; Honda, Hiroshi

    2017-01-01

    Cholestasis is one of the major causes of drug-induced liver injury (DILI), which can result in withdrawal of approved drugs from the market. Early identification of cholestatic drugs is difficult due to the complex mechanisms involved. In order to develop a strategy for mechanism-based risk assessment of cholestatic drugs, we analyzed gene expression data obtained from the livers of rats that had been orally administered 12 known cholestatic compounds repeatedly for 28 days at three dose levels. Qualitative analyses were performed using two statistical approaches (hierarchical clustering and principal component analysis), in addition to pathway analysis. The transcriptional benchmark dose (tBMD) and tBMD 95% lower limit (tBMDL) were used for quantitative analyses, which revealed three compound sub-groups that produced different types of differential gene expression; these groups of genes were mainly involved in inflammation, cholesterol biosynthesis, and oxidative stress. Furthermore, the tBMDL values for each test compound were in good agreement with the relevant no observed adverse effect level. These results indicate that our novel strategy for drug safety evaluation using mechanism-based classification and tBMDL would facilitate the application of toxicogenomics for risk assessment of cholestatic DILI.

  14. Global Transcriptional Response of Human Liver Cells to Ethanol Stress of Different Strength Reveals Hormetic Behavior.

    PubMed

    Schmidt-Heck, Wolfgang; Wönne, Eva C; Hiller, Thomas; Menzel, Uwe; Koczan, Dirk; Damm, Georg; Seehofer, Daniel; Knöspel, Fanny; Freyer, Nora; Guthke, Reinhard; Dooley, Steven; Zeilinger, Katrin

    2017-05-01

    The liver is the major site for alcohol metabolism in the body and therefore the primary target organ for ethanol (EtOH)-induced toxicity. In this study, we investigated the in vitro response of human liver cells to different EtOH concentrations in a perfused bioartificial liver device that mimics the complex architecture of the natural organ. Primary human liver cells were cultured in the bioartificial liver device and treated for 24 hours with medium containing 150 mM (low), 300 mM (medium), or 600 mM (high) EtOH, while a control culture was kept untreated. Gene expression patterns for each EtOH concentration were monitored using Affymetrix Human Gene 1.0 ST Gene chips. Scaled expression profiles of differentially expressed genes (DEGs) were clustered using the fuzzy c-means algorithm. In addition, functional classification methods, KEGG pathway mapping and also a machine learning approach (Random Forest) were utilized. A total of 966 (150 mM EtOH), 1,334 (300 mM EtOH), or 4,132 (600 mM EtOH) genes were found to be differentially expressed. Dose-response relationships of the identified clusters of co-expressed genes showed a monotonic, threshold, or nonmonotonic (hormetic) behavior. Functional classification of DEGs revealed that low or medium EtOH concentrations trigger adaptation processes, while alterations observed for the high EtOH concentration reflect the response to cellular damage. The genes displaying a hormetic response were functionally characterized by overrepresented "cellular ketone metabolism" and "carboxylic acid metabolism." Altered expression of the genes BAHD1 and H3F3B was identified as sufficient to classify the samples according to the applied EtOH doses. Different pathways of metabolic and epigenetic regulation are affected by EtOH exposure and partly undergo hormetic regulation in the bioartificial liver device. Gene expression changes observed at high EtOH concentrations reflect in some aspects the situation of alcoholic hepatitis in humans. Copyright © 2017 by the Research Society on Alcoholism.

  15. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

    PubMed Central

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.

    2017-01-01

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623

  16. MicroRNA Expression-Based Model Indicates Event-Free Survival in Pediatric Acute Myeloid Leukemia

    PubMed Central

    Lim, Emilia L.; Trinh, Diane L.; Ries, Rhonda E.; Wang, Jim; Gerbing, Robert B.; Ma, Yussanne; Topham, James; Hughes, Maya; Pleasance, Erin; Mungall, Andrew J.; Moore, Richard; Zhao, Yongjun; Aplenc, Richard; Sung, Lillian; Kolb, E. Anders; Gamis, Alan; Smith, Malcolm; Gerhard, Daniela S.; Alonzo, Todd A.; Meshinchi, Soheil; Marra, Marco A.

    2017-01-01

    Purpose: Children with acute myeloid leukemia (AML) whose disease is refractory to standard induction chemotherapy or who experience relapse after initial response have dismal outcomes. We sought to comprehensively profile pediatric AML microRNA (miRNA) samples to identify dysregulated genes and assess the utility of miRNAs for improved outcome prediction. Patients and Methods: To identify miRNA biomarkers that are associated with treatment failure, we performed a comprehensive sequence-based characterization of the pediatric AML miRNA landscape. miRNA sequencing was performed on 1,362 samples (1,303 primary, 22 refractory, and 37 relapse samples). One hundred sixty-four matched samples (127 primary and 37 relapse samples) were analyzed by using RNA sequencing. Results: By using penalized lasso Cox proportional hazards regression, we identified 36 miRNAs whose expression levels at diagnosis were highly associated with event-free survival. Combined expression of the 36 miRNAs was used to create a novel miRNA-based risk classification scheme (AMLmiR36). This new miRNA-based risk classifier identifies those patients who are at high risk (hazard ratio, 2.830; P ≤ .001) or low risk (hazard ratio, 0.323; P ≤ .001) of experiencing treatment failure, independent of conventional karyotype or mutation status. The performance of AMLmiR36 was independently assessed by using 878 patients from two different clinical trials (AAML0531 and AAML1031). Our analysis also revealed that miR-106a-363 was abundantly expressed in relapse and refractory samples, and several candidate targets of miR-106a-5p were involved in oxidative phosphorylation, a process that is suppressed in treatment-resistant leukemic cells. Conclusion: To assess the utility of miRNAs for outcome prediction in patients with pediatric AML, we designed and validated a miRNA-based risk classification scheme. We also hypothesized that the abundant expression of miR-106a could increase treatment resistance via modulation of genes that are involved in oxidative phosphorylation. PMID:29068783

  17. De Novo Transcriptome Assembly and Characterization of Lithospermum officinale to Discover Putative Genes Involved in Specialized Metabolites Biosynthesis.

    PubMed

    Rai, Amit; Nakaya, Taiki; Shimizu, Yohei; Rai, Megha; Nakamura, Michimi; Suzuki, Hideyuki; Saito, Kazuki; Yamazaki, Mami

    2018-05-29

    Lithospermum officinale is a valuable source of bioactive metabolites with medicinal and industrial value. However, little is known about genes involved in the biosynthesis of these metabolites, primarily due to the lack of genome or transcriptome resources. This study presents the first effort to establish and characterize a de novo transcriptome assembly resource for L. officinale and expression analysis for three of its tissues, namely leaf, stem, and root. Using over 4 Gbp of RNA-sequencing datasets, we obtained a de novo transcriptome assembly of L. officinale, consisting of 77,047 unigenes with an assembly N50 value of 1524 bp. Based on transcriptome annotation and functional classification, 52,766 unigenes were assigned putative gene functions, gene ontology terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. KEGG pathway and gene ontology enrichment analysis using highly expressed unigenes across three tissues and targeted metabolome analysis showed active secondary metabolic processes enriched specifically in the root of L. officinale. Using co-expression analysis, we also identified 20 and 48 unigenes representing different enzymes of lithospermic/chlorogenic acid and shikonin biosynthesis pathways, respectively. We further identified 15 candidate unigenes annotated as cytochrome P450 with the highest expression in the root of L. officinale as novel genes with a role in key biochemical reactions toward shikonin biosynthesis. Thus, through this study, we not only generated a high-quality genomic resource for L. officinale but also propose candidate genes involved in shikonin biosynthesis pathways for further functional characterization. Georg Thieme Verlag KG Stuttgart · New York.
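    The N50 statistic quoted for this assembly has a simple definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled bases. A minimal sketch follows. The function name n50 and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy unigene lengths in bp (illustrative only).
print(n50([2, 3, 4, 5, 6]))
```

    For the toy input the total is 20 bases. The two longest sequences (6 and 5) together cover 11, which is the first running total to reach half, so the N50 is 5.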

  18. Integrative topological analysis of mass spectrometry data reveals molecular features with clinical relevance in esophageal squamous cell carcinoma

    PubMed Central

    Gao, She-Gan; Liu, Rui-Min; Zhao, Yun-Gang; Wang, Pei; Ward, Douglas G.; Wang, Guang-Chao; Guo, Xiang-Qian; Gu, Juan; Niu, Wan-Bin; Zhang, Tian; Martin, Ashley; Guo, Zhi-Peng; Feng, Xiao-Shan; Qi, Yi-Jun; Ma, Yuan-Fang

    2016-01-01

    Combining MS-based proteomic data with network and topological features of such network would identify more clinically relevant molecules and meaningfully expand the repertoire of proteins derived from MS analysis. The integrative topological indexes representing 95.96% information of seven individual topological measures of node proteins were calculated within a protein-protein interaction (PPI) network, built using 244 differentially expressed proteins (DEPs) identified by iTRAQ 2D-LC-MS/MS. Compared with DEPs, differentially expressed genes (DEGs) and comprehensive features (CFs), structurally dominant nodes (SDNs) based on integrative topological index distribution produced comparable classification performance in three different clinical settings using five independent gene expression data sets. The signature molecules of SDN-based classifier for distinction of early from late clinical TNM stages were enriched in biological traits of protein synthesis, intracellular localization and ribosome biogenesis, which suggests that ribosome biogenesis represents a promising therapeutic target for treating ESCC. In addition, ITGB1 expression selected exclusively by integrative topological measures correlated with clinical stages and prognosis, which was further validated with two independent cohorts of ESCC samples. Thus the integrative topological analysis of PPI networks proposed in this study provides an alternative approach to identify potential biomarkers and therapeutic targets from MS/MS data with functional insights in ESCC. PMID:26898710

  19. Blood gene expression profiling of an early acetaminophen response.

    PubMed

    Bushel, P R; Fannin, R D; Gerrish, K; Watkins, P B; Paules, R S

    2017-06-01

    Acetaminophen can adversely affect the liver especially when overdosed. We used whole blood as a surrogate to identify genes as potential early indicators of an acetaminophen-induced response. In a clinical study, healthy human subjects were dosed daily with 4 g of either acetaminophen or placebo pills for 7 days and evaluated over the course of 14 days. Alanine aminotransferase (ALT) levels for responders to acetaminophen increased between days 4 and 9 after dosing, and 12 genes were detected with expression profiles significantly altered within 24 h. The early responsive genes separated the subjects by class and dose period. In addition, the genes clustered patients who overdosed on acetaminophen apart from controls and also predicted the exposure classifications with 100% accuracy. The responsive genes serve as early indicators of an acetaminophen exposure, and their gene expression profiles can potentially be evaluated as molecular indicators for further consideration.

  20. Blood Gene Expression Profiling of an Early Acetaminophen Response

    PubMed Central

    Bushel, Pierre R.; Fannin, Rick D.; Gerrish, Kevin; Watkins, Paul B.; Paules, Richard S.

    2018-01-01

    Acetaminophen can adversely affect the liver especially when overdosed. We used whole blood as a surrogate to identify genes as potential early indicators of an acetaminophen-induced response. In a clinical study, healthy human subjects were dosed daily with 4g of either acetaminophen or placebo pills for 7 days and evaluated over the course of 14 days. Alanine aminotransferase (ALT) levels for responders to acetaminophen increased between days 4 and 9 after dosing and 12 genes were detected with expression profiles significantly altered within 24 hrs. The early responsive genes separated the subjects by class and dose period. In addition, the genes clustered patients who overdosed on acetaminophen apart from controls and also predicted the exposure classifications with 100% accuracy. The responsive genes serve as early indicators of an acetaminophen exposure and their gene expression profiles can potentially be evaluated as molecular indicators for further consideration. PMID:26927286

  1. brain-coX: investigating and visualising gene co-expression in seven human brain transcriptomic datasets.

    PubMed

    Freytag, Saskia; Burgess, Rosemary; Oliver, Karen L; Bahlo, Melanie

    2017-06-08

    The pathogenesis of neurological and mental health disorders often involves multiple genes, complex interactions, as well as brain- and development-specific biological mechanisms. These characteristics make identification of disease genes for such disorders challenging, as conventional prioritisation tools are not specifically tailored to deal with the complexity of the human brain. Thus, we developed a novel web application, brain-coX, that offers gene prioritisation with accompanying visualisations based on seven gene expression datasets in the post-mortem human brain, the largest such resource ever assembled. We tested whether our tool can correctly prioritise known genes from 37 brain-specific KEGG pathways and 17 psychiatric conditions. We achieved an average sensitivity of nearly 50%, at the same time reaching a specificity of approximately 75%. We also compared brain-coX's performance to that of its main competitors, Endeavour and ToppGene, focusing on the ability to discover novel associations. Using a subset of the curated SFARI autism gene collection, we show that brain-coX's prioritisations are most similar to SFARI's own curated gene classifications. brain-coX is the first prioritisation and visualisation web-tool targeted to the human brain and can be freely accessed via http://shiny.bioinf.wehi.edu.au/freytag.s/.

  2. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer

    PubMed Central

    2015-01-01

    Background: microRNA (miRNA) expression plays an influential role in cancer classification and malignancy, and miRNAs are feasible as alternative diagnostic markers for pancreatic cancer, a highly aggressive neoplasm with silent early symptoms, high metastatic potential, and resistance to conventional therapies. Methods: In this study, we evaluated the benefits of multi-omics data analysis by integrating miRNA and mRNA expression data in pancreatic cancer. Using support vector machine (SVM) modelling and leave-one-out cross validation (LOOCV), we evaluated the diagnostic performance of single- or multi-markers based on miRNA and mRNA expression profiles from 104 PDAC tissues and 17 benign pancreatic tissues. To select even more reliable and robust markers, we performed validation with independent datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) data depositories. For validation, miRNA activity was estimated from miRNA-target gene interactions and mRNA expression datasets in pancreatic cancer. Results: Using a comprehensive identification approach, we successfully identified 705 multi-markers having powerful diagnostic performance for PDAC. In addition, these marker candidates were annotated with cancer pathways using gene ontology analysis. Conclusions: Our prediction models have strong potential for the diagnosis of pancreatic cancer. PMID:26328610
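
    The leave-one-out cross validation (LOOCV) step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a simple nearest-centroid classifier stands in for the SVM so the example needs only the Python standard library, and the function names and toy expression values are hypothetical.

```python
from math import dist


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`.

    Stand-in classifier (assumption): the study used an SVM, not this rule.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], sample))


def loocv_accuracy(features, labels):
    """Hold out each sample once, train on the rest, score the held-out call."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy values for two hypothetical markers per sample; not study data.
toy_features = [[5.1, 0.9], [4.8, 1.2], [5.3, 1.0],
                [1.0, 4.9], [1.3, 5.2], [0.8, 4.7]]
toy_labels = ["tumor", "tumor", "tumor", "benign", "benign", "benign"]
toy_accuracy = loocv_accuracy(toy_features, toy_labels)
```

    With strongly imbalanced classes such as 104 versus 17 samples, accuracy alone can be misleading, so a per-class summary would normally accompany this figure.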

  3. Intrinsic subtypes from PAM50 gene expression assay in a population-based breast cancer cohort: differences by age, race, and tumor characteristics.

    PubMed

    Sweeney, Carol; Bernard, Philip S; Factor, Rachel E; Kwan, Marilyn L; Habel, Laurel A; Quesenberry, Charles P; Shakespear, Kaylynn; Weltzien, Erin K; Stijleman, Inge J; Davis, Carole A; Ebbert, Mark T W; Castillo, Adrienne; Kushi, Lawrence H; Caan, Bette J

    2014-05-01

    Data are lacking to describe gene expression-based breast cancer intrinsic subtype patterns for population-based patient groups. We studied a diverse cohort of women with breast cancer from the Life After Cancer Epidemiology and Pathways studies. RNA was extracted from 1 mm punches from fixed tumor tissue. Quantitative reverse-transcriptase PCR was conducted for the 50 genes that comprise the PAM50 intrinsic subtype classifier. In a subcohort of 1,319 women, the overall subtype distribution based on PAM50 was 53.1% luminal A, 20.5% luminal B, 13.0% HER2-enriched, 9.8% basal-like, and 3.6% normal-like. Among low-risk endocrine-positive tumors (i.e., estrogen and progesterone receptor positive by immunohistochemistry, HER2 negative, and low histologic grade), only 76.5% were categorized as luminal A by PAM50. Continuous-scale luminal A, luminal B, HER2-enriched, and normal-like scores from PAM50 were mutually positively correlated. Basal-like score was inversely correlated with other subtypes. The proportion with non-luminal A subtype decreased with older age at diagnosis, P Trend < 0.0001. Compared with non-Hispanic Whites, African American women were more likely to have basal-like tumors, age-adjusted OR = 4.4 [95% confidence intervals (CI), 2.3-8.4], whereas Asian and Pacific Islander women had reduced odds of basal-like subtype, OR = 0.5 (95% CI, 0.3-0.9). Our data indicate that over 50% of breast cancers treated in the community have luminal A subtype. Gene expression-based classification shifted some tumors categorized as low risk by surrogate clinicopathologic criteria to higher-risk subtypes. Subtyping in a population-based cohort revealed distinct profiles by age and race. ©2014 AACR.

  4. Phenotypic and genotypic expression of self-incompatibility haplotypes in Arabidopsis lyrata suggests unique origin of alleles in different dominance classes.

    PubMed

    Prigoda, Nadia L; Nassuth, Annette; Mable, Barbara K

    2005-07-01

    The highly divergent alleles of the SRK gene in outcrossing Arabidopsis lyrata have provided important insights into the evolutionary history of self-incompatibility (SI) alleles and serve as an ideal model for studies of the evolutionary and molecular interactions between alleles in cell-cell recognition systems in general. One tantalizing question is how new specificities arise in systems that require coordination between male and female components. Allelic recruitment via gene conversion has been proposed as one possibility, based on the division of DNA sequences at the SRK locus into two distinctive groups: (1) sequences whose relationships are not well resolved and display the long branch lengths expected for a gene under balancing selection (Class A); and (2) sequences falling into a well-supported group with shorter branch lengths (Class B) that are closely related to an unlinked paralogous locus. The purpose of this study was to determine if differences in phenotype (site of expression assayed using allele-specific reverse transcription-polymerase chain reaction) or function (dominance relationships assayed through controlled pollinations) accompany the sequence-based classification. Expression of Class A alleles was restricted to floral tissues, as predicted for genes involved in the SI response. In contrast, Class B alleles, despite being tightly linked to the SI phenotype, were unexpectedly expressed in both leaves and floral tissues; the same pattern found for a related unlinked paralogous sequence. Whereas Class A included haplotypes in three different dominance classes, all Class B haplotypes were found to be recessive to all except one Class A haplotype. In addition, mapping of expression and dominance patterns onto an S-domain-based genealogy suggested that allelic dominance may be determined more by evolutionary history than by frequency-dependent selection for lowered dominance as some theories suggest. 
    The possibility that interlocus gene conversion might have contributed to allelic diversity is discussed.

  5. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm.

    PubMed

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder; Murphy, Denis J

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops.

  6. Comparative genomic and transcriptomic analysis of selected fatty acid biosynthesis genes and CNL disease resistance genes in oil palm

    PubMed Central

    Rosli, Rozana; Amiruddin, Nadzirah; Ab Halim, Mohd Amin; Chan, Pek-Lan; Chan, Kuang-Lim; Azizi, Norazah; Morris, Priscilla E.; Leslie Low, Eng-Ti; Ong-Abdullah, Meilina; Sambanthamurthi, Ravigadevi; Singh, Rajinder

    2018-01-01

    Comparative genomics and transcriptomic analyses were performed on two agronomically important groups of genes from oil palm versus other major crop species and the model organism, Arabidopsis thaliana. The first analysis was of two gene families with key roles in regulation of oil quality and in particular the accumulation of oleic acid, namely stearoyl ACP desaturases (SAD) and acyl-acyl carrier protein (ACP) thioesterases (FAT). In both cases, these were found to be large gene families with complex expression profiles across a wide range of tissue types and developmental stages. The detailed classification of the oil palm SAD and FAT genes has enabled the updating of the latest version of the oil palm gene model. The second analysis focused on disease resistance (R) genes in order to elucidate possible candidates for breeding of pathogen tolerance/resistance. Ortholog analysis showed that 141 out of the 210 putative oil palm R genes had homologs in banana and rice. These genes formed 37 clusters with 634 orthologous genes. Classification of the 141 oil palm R genes showed that the genes belong to the Kinase (7), CNL (95), MLO-like (8), RLK (3) and Others (28) categories. The CNL R genes formed eight clusters. Expression data for selected R genes also identified potential candidates for breeding of disease resistance traits. Furthermore, these findings can provide information about the species evolution as well as the identification of agronomically important genes in oil palm and other major crops. PMID:29672525

  7. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification.

    PubMed

    Shimoni, Yishai

    2018-02-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.

  8. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification

    PubMed Central

    2018-01-01

    One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes. PMID:29470520

  9. Gene Expression Profiling Specifies Chemokine, Mitochondrial and Lipid Metabolism Signatures in Leprosy

    PubMed Central

    Guerreiro, Luana Tatiana Albuquerque; Robottom-Ferreira, Anna Beatriz; Ribeiro-Alves, Marcelo; Toledo-Pinto, Thiago Gomes; Rosa Brito, Tiana; Rosa, Patrícia Sammarco; Sandoval, Felipe Galvan; Jardim, Márcia Rodrigues; Antunes, Sérgio Gomes; Shannon, Edward J.; Sarno, Euzenir Nunes; Pessolani, Maria Cristina Vidal; Williams, Diana Lynn; Moraes, Milton Ozório

    2013-01-01

    Herein, we performed microarray experiments in Schwann cells infected with live M. leprae and identified novel differentially expressed genes (DEG) in M. leprae-infected cells. Also, we selected candidate genes associated with or implicated in leprosy in genetic studies and biological experiments. Forty-seven genes were selected for validation in two independent types of samples by multiplex qPCR. First, an in vitro model using THP-1 cells was infected with live Mycobacterium leprae and M. bovis bacillus Calmette-Guérin (BCG). In a second situation, mRNA obtained from nerve biopsies from patients with leprosy or other peripheral neuropathies was tested. We detected DEGs that discriminate M. bovis BCG from M. leprae infection. Specific signatures of susceptible responses after M. leprae infection, when compared to BCG, led to repression of genes including CCL2, CCL3, IL8 and SOD2. The same 47-gene set was screened in nerve biopsies, which corroborated the down-regulation of CCL2 and CCL3 in leprosy, but also evidenced the down-regulation of genes involved in mitochondrial metabolism, and the up-regulation of genes involved in lipid metabolism and ubiquitination. Finally, a gene expression signature from DEG was identified in patients confirmed as having leprosy. A classification tree was able to ascertain 80% of the cases as leprosy or non-leprous peripheral neuropathy based on the expression of only LDLR and CCL4. A general immune and mitochondrial hypo-responsive state occurs in response to M. leprae infection. Also, the most important genes and pathways have been highlighted, providing new tools for early diagnosis and treatment of leprosy. PMID:23798993

  10. Whole Blood Gene Expression Profiling Predicts Severe Morbidity and Mortality in Cystic Fibrosis: A 5-Year Follow-Up Study.

    PubMed

    Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A

    2018-05-01

    Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
    The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.

  11. Probabilistic classifiers with high-dimensional data

    PubMed Central

    Kim, Kyung In; Simon, Richard

    2011-01-01

    For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
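
    As a rough illustration of what checking calibration involves, the sketch below computes a Brier score and a binned reliability table for predicted class probabilities. These are generic, widely used summaries chosen here as an assumption; they are not the evaluation measures developed in the paper, and the function names and toy inputs are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Compare mean predicted probability with observed frequency per bin.

    Returns (mean predicted, observed frequency, count) for each non-empty
    equal-width probability bin. A well-calibrated classifier shows similar
    values in the first two columns.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((round(mean_p, 3), round(observed, 3), len(members)))
    return table


# Toy predictions and outcomes; not data from the paper.
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_outcomes = [0, 0, 1, 1]
toy_brier = brier_score(toy_probs, toy_outcomes)
toy_table = reliability_table(toy_probs, toy_outcomes)
```

    In a small n, large p setting the probabilities passed to such a check would need to come from held-out samples, for example through cross-validation, since in-sample probabilities tend to look better calibrated than they are.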

  12. Functional classification of rice flanking sequence tagged genes using MapMan terms and global understanding on metabolic and regulatory pathways affected by dxr mutant having defects in light response.

    PubMed

    Chandran, Anil Kumar Nalini; Lee, Gang-Seob; Yoo, Yo-Han; Yoon, Ung-Han; Ahn, Byung-Ohg; Yun, Doh-Won; Kim, Jin-Hyun; Choi, Hong-Kyu; An, GynHeung; Kim, Tae-Ho; Jung, Ki-Hong

    2016-12-01

    Rice is one of the most important food crops for humans. To improve the agronomical traits of rice, the functions of more than 1,000 rice genes have been recently characterized and summarized. The completed, map-based sequence of the rice genome has significantly accelerated the functional characterization of rice genes, but progress remains limited in assigning functions to all predicted non-transposable element (non-TE) genes, estimated to number 37,000-41,000. The International Rice Functional Genomics Consortium (IRFGC) has generated a huge number of gene-indexed mutants by using mutagens such as T-DNA, Tos17 and Ds/dSpm. These mutants have been identified by 246,566 flanking sequence tags (FSTs) and cover 65% (25,275 of 38,869) of the non-TE genes in rice, while the mutation ratio of TE genes is 25.7%. In addition, almost 80% of highly expressed non-TE genes have insertion mutations, indicating that highly expressed genes in rice chromosomes are more likely to have mutations by mutagens such as T-DNA, Ds, dSpm and Tos17. The functions of around 2.5% of rice genes have been characterized, and studies have mainly focused on transcriptional and post-transcriptional regulation. Slow progress in characterizing the function of rice genes is mainly due to a lack of clues to guide functional studies or functional redundancy. These limitations can be partially solved by a well-categorized functional classification of FST genes. To create this classification, we used the diverse overviews installed in the MapMan toolkit. Gene Ontology (GO) assignment to FST genes supplemented the limitation of MapMan overviews. The functions of 863 of 1,022 known genes can be evaluated by current FST lines, indicating that FST genes are useful resources for functional genomic studies. We assigned 16,169 out of 29,624 FST genes to 34 MapMan classes, including three major categories such as DNA, RNA and protein. To demonstrate the MapMan application on FST genes, transcriptome analysis was done from a rice mutant of the 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) gene with FST. Mapping of 756 down-regulated genes in dxr mutants and their annotation in terms of various MapMan overviews revealed candidate genes downstream of the DXR-mediated light signaling pathway in diverse functional classes such as the methyl-D-erythritol 4-phosphate (MEP) pathway overview, photosynthesis, secondary metabolism and regulatory overview. This report provides a useful guide for systematic phenomics and further applications to enhance the key agronomic traits of rice.

  13. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    PubMed

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to seek an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray data sets and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily implementable linear classifier, AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.
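
    To make the quantity being optimized concrete, the sketch below evaluates the empirical AUC of a fixed linear combination of markers, counting the fraction of case-control pairs that the combined score ranks correctly. It does not implement AucPR or any penalized fit; the weights, marker values, and function names are hypothetical.

```python
def linear_score(weights, markers):
    """Combine marker values into one score using fixed weights."""
    return sum(w * m for w, m in zip(weights, markers))


def empirical_auc(case_scores, control_scores):
    """Fraction of case-control pairs ranked correctly; ties count one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical weights and two-marker profiles; not data from the paper.
toy_weights = [1.0, -0.5]
toy_cases = [[2.0, 1.0], [3.0, 0.0], [1.0, 2.0]]
toy_controls = [[0.5, 1.0], [1.0, 1.0], [0.0, 0.0]]

case_scores = [linear_score(toy_weights, m) for m in toy_cases]
control_scores = [linear_score(toy_weights, m) for m in toy_controls]
toy_auc = empirical_auc(case_scores, control_scores)
```

    Because this pairwise count is a step function of the weights, it cannot be maximized directly by gradient methods, which is the difficulty that smooth approximations and the parametric reformulation described above are meant to address.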

  14. Near-isogenic cotton germplasm lines that differ in fiber-bundle strength have temporal differences in fiber gene expression patterns as revealed by comparative high-throughput profiling.

    PubMed

    Hinchliffe, Doug J; Meredith, William R; Yeater, Kathleen M; Kim, Hee Jin; Woodward, Andrew W; Chen, Z Jeffrey; Triplett, Barbara A

    2010-05-01

    Gene expression profiles of developing cotton (Gossypium hirsutum L.) fibers from two near-isogenic lines (NILs) that differ in fiber-bundle strength, short-fiber content, and in fewer than two genetic loci were compared using an oligonucleotide microarray. Fiber gene expression was compared at five time points spanning fiber elongation and secondary cell wall (SCW) biosynthesis. Fiber samples were collected from field plots in a randomized, complete block design, with three spatially distinct biological replications for each NIL at each time point. Microarray hybridizations were performed in a loop experimental design that allowed comparisons of fiber gene expression profiles as a function of time between the two NILs. Overall, developmental expression patterns revealed by the microarray experiment agreed with previously reported cotton fiber gene expression patterns for specific genes. Additionally, genes expressed coordinately with the onset of SCW biosynthesis in cotton fiber correlated with gene expression patterns of other SCW-producing plant tissues. Functional classification and enrichment analysis of differentially expressed genes between the two NILs revealed that genes associated with SCW biosynthesis were significantly up-regulated in fibers of the high-fiber quality line at the transition stage of cotton fiber development. For independent corroboration of the microarray results, 15 genes were selected for quantitative reverse transcription PCR analysis of fiber gene expression. These analyses, conducted over multiple field years, confirmed the temporal difference in fiber gene expression between the two NILs. We hypothesize that the loci conferring temporal differences in fiber gene expression between the NILs are important regulatory sequences that offer the potential for more targeted manipulation of cotton fiber quality.

  15. Integrated analysis of DNA methylation, immunohistochemistry and mRNA expression, data identifies a Methylation Expression Index (MEI) robustly associated with survival of ER-positive breast cancer patients

    PubMed Central

    Garcia-Closas, Montserrat; Davis, Sean; Meltzer, Paul; Lissowska, Jolanta; Horne, Hisani N.; Sherman, Mark E.; Lee, Maxwell

    2015-01-01

    Identification of prognostic gene expression signatures may enable improved decisions about management of breast cancer. To identify a prognostic signature for breast cancer, we performed DNA methylation profiling and identified methylation markers that were associated with expression of ER, PR, HER2, CK5/6 and EGFR proteins. Methylation markers that were correlated with corresponding mRNA expression levels were identified using 208 invasive tumors from a population-based case-control study conducted in Poland. Using this approach, we defined the Methylation Expression Index (MEI) signature, which was based on a weighted sum of mRNA levels of 57 genes. Classification of cases as having low or high MEI scores was related to survival using Cox regression models. In the Polish study, women with ER-positive low MEI cancers had reduced survival at a median of 5.20 years of follow-up (HR = 2.85, 95% CI = 1.25-6.47). Low MEI was also related to decreased survival in four independent datasets totaling over 2500 ER-positive breast cancers. These results suggest that integrated analysis of tumor expression markers, DNA methylation, and mRNA data can be an important approach for identifying breast cancer prognostic signatures. Prospective assessment of MEI along with other prognostic signatures should be evaluated in future studies. PMID:25773928
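
    A weighted-sum signature of the kind described here can be sketched in a few lines. The gene names, weights, expression values, and the median cutoff used to split cases into low and high groups are all assumptions for illustration; the abstract does not state how the MEI weights or the cutoff were chosen.

```python
from statistics import median


def weighted_index(expression, weights):
    """Weighted sum of mRNA levels over the genes in the signature."""
    return sum(weights[gene] * expression[gene] for gene in weights)


def split_low_high(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    The median cutoff is an assumption, not the published rule.
    """
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in scores.items()}


# Hypothetical two-gene signature and toy expression profiles.
toy_weights = {"GENE_A": 0.5, "GENE_B": -1.0}
toy_profiles = {
    "s1": {"GENE_A": 4.0, "GENE_B": 1.0},
    "s2": {"GENE_A": 2.0, "GENE_B": 2.0},
    "s3": {"GENE_A": 6.0, "GENE_B": 0.5},
    "s4": {"GENE_A": 1.0, "GENE_B": 1.0},
}
toy_scores = {s: weighted_index(p, toy_weights) for s, p in toy_profiles.items()}
toy_groups = split_low_high(toy_scores)
```

    The resulting group labels are what a survival model such as Cox regression would then take as a covariate.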

  16. Gene expression profiling of choline-deprived neural precursor cells isolated from mouse brain.

    PubMed

    Niculescu, Mihai D; Craciunescu, Corneliu N; Zeisel, Steven H

    2005-04-04

    Choline is an essential nutrient and an important methyl donor. Choline deficiency alters fetal development of the hippocampus in rodents, and these changes are associated with decreased memory function lasting throughout life. Also, choline deficiency alters global and gene-specific DNA methylation in several models. This gene expression profiling study describes changes in cortical neural precursor cells from embryonic day 14 mice after 48 h of exposure to a choline-deficient medium. Using Significance Analysis of Microarrays, we found the expression of 1003 genes to be significantly changed (of 16,000 genes spotted on the array), with a false discovery rate below 5%. A total of 846 genes were overexpressed while 157 were underexpressed. Classification by gene ontology revealed that 331 of these genes modulate cell proliferation, apoptosis, neuronal and glial differentiation, methyl metabolism, and calcium-binding protein classes. Twenty-seven genes that had changed expression have previously been reported to be regulated by promoter or intron methylation. These findings support our previous work suggesting that choline deficiency decreases the proliferation of neural precursors and possibly increases premature neuronal differentiation and apoptosis.

  17. Gene selection for cancer classification with the help of bees.

    PubMed

    Moosa, Johra Muhammad; Shakur, Rameen; Kaykobad, Mohammad; Rahman, Mohammad Sohel

    2016-08-10

    Development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in the fields of bioinformatics, clinical genetics, and oncology. Only a small fraction of the genes explored show expression that correlates significantly with a given phenotype. Gene selection enables researchers to obtain substantial insight into the genetic nature of the disease and the mechanisms responsible for it. Besides improving the performance of cancer classification, it can also cut down the time and cost of medical diagnoses. This study presents a modified Artificial Bee Colony (ABC) algorithm to select the minimum number of genes that are deemed to be significant for cancer while improving predictive accuracy. The search equation of ABC is believed to be good at exploration but poor at exploitation. To overcome this limitation, we have modified the ABC algorithm by incorporating the concept of pheromones, one of the major components of the Ant Colony Optimization (ACO) algorithm, and a new operation in which successive bees communicate to share their findings. The proposed algorithm is evaluated using a suite of ten publicly available datasets after the parameters are tuned scientifically with one of the datasets. The results obtained are compared to other works that used the same datasets. The performance of the proposed method proved to be superior. The method presented in this paper can provide a subset of genes leading to more accurate classification results while the number of selected genes is smaller. Additionally, the proposed modified Artificial Bee Colony algorithm could conceivably be applied to problems in other areas as well.

  18. MicroRNA-integrated and network-embedded gene selection with diffusion distance.

    PubMed

    Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C

    2010-10-29

    Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (critical in certain cases) but ignored-in-almost-all-existing-studies limitation, we design a new strategy to integrate together microarray data with the information of microRNA, the major post-transcriptional regulatory factor. We also handle the challenges led by gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely due to signal transduction and that two genes may be functionally connected through multi paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly-connected, according to multi paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
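
    The diffusion-distance idea mentioned above can be illustrated with the standard random-walk formulation on a graph: two nodes are close if random walks started from them spread over the network in similar ways after t steps, even when the nodes are not directly connected. The sketch below assumes an unweighted, undirected gene network with no isolated nodes; the function name, the toy network, and the choice of t are hypothetical, and the authors' exact formulation may differ.

```python
import numpy as np

def diffusion_distance(adjacency, t=2):
    """Pairwise diffusion distances after t random-walk steps.

    Uses D_t(i, j)^2 = sum_k (P^t[i, k] - P^t[j, k])^2 / pi[k], where P is the
    row-normalized adjacency matrix and pi is the stationary distribution.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)                      # assumes every node has degree > 0
    P = A / degree[:, None]                     # random-walk transition matrix
    Pt = np.linalg.matrix_power(P, t)           # t-step transition probabilities
    stationary = degree / degree.sum()          # stationary distribution of the walk
    diff = Pt[:, None, :] - Pt[None, :, :]      # all pairwise row differences
    return np.sqrt((diff ** 2 / stationary).sum(axis=2))

# Toy network with edges 0-2, 1-2, 2-3, 3-4.
# Genes 0 and 1 have no direct edge but share their only neighbor (gene 2).
A = np.array([[0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
D = diffusion_distance(A, t=2)
```

    In this toy network, genes 0 and 1 have identical random-walk behavior, so their diffusion distance is zero despite the absence of a direct interaction, while gene 4 lies farther from both.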

  19. Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling.

    PubMed

    Ando, Tatsuya; Suguro, Miyuki; Kobayashi, Takeshi; Seto, Masao; Honda, Hiroyuki

    2003-10-01

    A fuzzy neural network (FNN) using gene expression profile data can select combinations of genes from thousands of genes, and is applicable to predict outcome for cancer patients after chemotherapy. However, wide clinical heterogeneity reduces the accuracy of prediction. To overcome this problem, we have proposed an FNN system based on majoritarian decision using multiple noninferior models. We used transcriptional profiling data, which were obtained from "Lymphochip" DNA microarrays (http://llmpp.nih.gov/DLBCL), reported by Rosenwald (N Engl J Med 2002; 346: 1937-47). When the data were analyzed by our FNN system, accuracy (73.4%) of outcome prediction using only 1 FNN model with 4 genes was higher than that (68.5%) of the Cox model using 17 genes. Higher accuracy (91%) was obtained when an FNN system with 9 noninferior models, consisting of 35 independent genes, was used. The genes selected by the system included genes that are informative in the prognosis of Diffuse large B-cell lymphoma (DLBCL), such as genes showing an expression pattern similar to that of CD10 and BCL-6 or similar to that of IRF-4 and BCL-4. We classified 220 DLBCL patients into 5 groups using the prediction results of 9 FNN models. These groups may correspond to DLBCL subtypes. In group A containing half of the 220 patients, patients with poor outcome were found to satisfy 2 rules, i.e., high expression of MAX dimerization with high expression of unknown A (LC_26146), or high expression of MAX dimerization with low expression of unknown B (LC_33144). The present paper is the first to describe the multiple noninferior FNN modeling system. This system is a powerful tool for predicting outcome and classifying patients, and is applicable to other heterogeneous diseases.

  20. Gene expression profiling of acute myeloid leukemia samples from adult patients with AML-M1 and -M2 through boutique microarrays, real-time PCR and droplet digital PCR.

    PubMed

    Handschuh, Luiza; Kaźmierczak, Maciej; Milewski, Marek C; Góralski, Michał; Łuczak, Magdalena; Wojtaszewska, Marzena; Uszczyńska-Ratajczak, Barbara; Lewandowski, Krzysztof; Komarnicki, Mieczysław; Figlerowicz, Marek

    2018-03-01

    Acute myeloid leukemia (AML) is the most common and severe form of acute leukemia diagnosed in adults. Owing to its heterogeneity, AML is divided into classes associated with different treatment outcomes and specific gene expression profiles. Based on previous studies on AML, in this study, we designed and generated an AML-array containing 900 oligonucleotide probes complementary to human genes implicated in hematopoietic cell differentiation and maturation, proliferation, apoptosis and leukemic transformation. The AML-array was used to hybridize 118 samples from 33 patients with AML of the M1 and M2 subtypes of the French-American‑British (FAB) classification and 15 healthy volunteers (HV). Rigorous analysis of the microarray data revealed that 83 genes were differentially expressed between the patients with AML and the HV, including genes not yet discussed in the context of AML pathogenesis. The most overexpressed genes in AML were STMN1, KITLG, CDK6, MCM5, KRAS, CEBPA, MYC, ANGPT1, SRGN, RPLP0, ENO1 and SET, whereas the most underexpressed genes were IFITM1, LTB, FCN1, BIRC3, LYZ, ADD3, S100A9, FCER1G, PTRPE, CD74 and TMSB4X. The overexpression of the CPA3 gene was specific for AML with mutated NPM1 and FLT3. Although the microarray-based method was insufficient to differentiate between any other AML subgroups, quantitative PCR approaches enabled us to identify 3 genes (ANXA3, S100A9 and WT1) whose expression can be used to discriminate between the 2 studied AML FAB subtypes. The expression levels of the ANXA3 and S100A9 genes were increased, whereas those of WT1 were decreased in the AML-M2 compared to the AML-M1 group. We also examined the association between the STMN1, CAT and ABL1 genes, and the FLT3 and NPM1 mutation status. FLT3+/NPM1- AML was associated with the highest expression of STMN1, and ABL1 was upregulated in FLT3+ AML and CAT in FLT3- AML, irrespectively of the NPM1 mutation status. 
    Moreover, our results indicated that CAT and WT1 gene expression levels correlated with the response to therapy. CAT expression was highest in patients who remained longer under complete remission, whereas WT1 expression increased with treatment resistance. On the whole, this study demonstrates that the AML-array can potentially serve as a first-line screening tool, and may be helpful for the diagnosis of AML, whereas the differentiation between AML subgroups can be more successfully performed with PCR-based analysis of a few marker genes.

  1. PCR and RFLP analyses based on the ribosomal protein operon

    USDA-ARS's Scientific Manuscript database

    Differentiation and classification of phytoplasmas have been primarily based on the highly conserved 16S rRNA gene. RFLP analysis of 16S rRNA gene sequences has identified 31 16S rRNA (16Sr) groups and more than 100 16Sr subgroups. Classification of phytoplasma strains can, however, become more refin...

  2. Microarray and network-based identification of functional modules and pathways of active tuberculosis.

    PubMed

    Bian, Zhong-Rui; Yin, Juan; Sun, Wen; Lin, Dian-Jie

    2017-04-01

    Diagnosis of active tuberculosis (TB) is challenging, and treatment response is also difficult to monitor efficiently. The aim of this study was to apply an integrated microarray and network-based analysis to samples from publicly available datasets to obtain a diagnostic module set and pathways in active TB. Towards this goal, a background protein-protein interaction (PPI) network was generated based on global PPI information and gene expression data, followed by identification of a differential expression network (DEN) from the background PPI network. Then, ego genes were extracted according to the degree features in the DEN. Next, module collection was conducted by ego gene expansion based on the EgoNet algorithm. After that, differential expression of modules between active TB and controls was evaluated using a random permutation test. Finally, biological significance of differential modules was detected by pathway enrichment analysis based on the Reactome database, and Fisher's exact test was implemented to extract differential pathways for active TB. In total, 47 ego genes and 47 candidate modules were identified from the DEN. By setting the cutoff criteria of gene size >5 and classification accuracy ≥0.9, 7 ego modules (Module 4, Module 7, Module 9, Module 19, Module 25, Module 38 and Module 43) were extracted, and all of them differed significantly between active TB and controls. Then, Fisher's exact test was conducted to capture differential pathways for active TB. Interestingly, genes in Module 4, Module 25, Module 38, and Module 43 were enriched in the same pathway, formation of a pool of free 40S subunits. The significant pathway for Module 7 and Module 9 was eukaryotic translation termination, and for Module 19 it was nonsense mediated decay enhanced by the exon junction complex (EJC).
    Accordingly, differential modules and pathways might be potential biomarkers for treating active TB, and provide valuable clues for better understanding of molecular mechanism of active TB. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Differential Gene Expression (DEX) and Alternative Splicing Events (ASE) for Temporal Dynamic Processes Using HMMs and Hierarchical Bayesian Modeling Approaches.

    PubMed

    Oh, Sunghee; Song, Seongho

    2017-01-01

    In gene expression profiling, the data analysis pipeline is organized into four major downstream tasks: (1) identification of differential expression; (2) clustering of co-expression patterns; (3) classification of sample subtypes; and (4) detection of genetic regulatory networks. These tasks are performed after preprocessing procedures such as normalization. Temporal dynamic gene expression data have an inherent feature: two neighboring time points (previous and current state) are highly correlated with each other, in contrast to static expression data, in which samples are assumed to be independent. In this chapter, we demonstrate how HMMs and hierarchical Bayesian modeling methods capture the horizontal time dependency structures in time series expression profiles by focusing on the identification of differential expression. In addition, the differentially expressed genes and transcript variant isoforms detected over time in these core prerequisite steps can be further applied to the detection of genetic regulatory networks, to comprehensively uncover dynamic repertoires from a systems biology perspective within a coupled framework.

  4. Characterization of 1,577 Primary Prostate Cancers Reveals Novel Biological and Clinicopathological Insights into Molecular Subtypes

    PubMed Central

    Tomlins, Scott A.; Alshalalfa, Mohammed; Davicioni, Elai; Erho, Nicholas; Yousefi, Kasra; Zhao, Shuang; Haddad, Zaid; Den, Robert B.; Dicker, Adam P.; Trock, Bruce; DeMarzo, Angelo; Ross, Ashley; Schaeffer, Edward M.; Klein, Eric A.; Magi-Galluzzi, Cristina; Karnes, Jeffery R.; Jenkins, Robert B.; Feng, Felix Y.

    2015-01-01

    Background Prostate cancer (PCa) molecular subtypes have been defined by essentially mutually exclusive events, including ETS gene fusions (most commonly involving ERG) and SPINK1 over-expression. Clinical assessment may aid in disease stratification, complementing available prognostic tests. Objective To determine the analytical validity and clinicopathological associations of microarray-based molecular subtyping. Design, Setting and Participants We analyzed Affymetrix GeneChip expression profiles for 1,577 patients from eight radical prostatectomy (RP) cohorts, including 1,351 cases assessed using the Decipher prognostic assay (performed in a CLIA-certified laboratory). A microarray-based (m-) random forest ERG classification model was trained and validated. Outlier expression analysis was used to predict other mutually exclusive non-ERG ETS gene rearrangements (ETS+) or SPINK1 over-expression (SPINK1+). Outcome Measurements Associations with clinical features and outcomes by multivariable logistic regression analysis and receiver operating curves. Results and Limitations The m-ERG classifier showed 95% accuracy in an independent validation subset (n=155 samples). Across cohorts, 45%, 9%, 8% and 38% of PCa were classified as m-ERG+, m-ETS+, m-SPINK1+, and triple negative (m-ERG−/m-ETS−/m-SPINK1−), respectively. Gene expression profiling supports three underlying molecularly defined groups (m-ERG+, m-ETS+ and m-SPINK1+/triple negative). On multivariable analysis, m-ERG+ tumors were associated with lower preoperative serum PSA and Gleason scores, but enriched for extraprostatic extension (p<0.001). m-ETS+ tumors were associated with seminal vesicle invasion (p=0.01), while m-SPINK1+/triple negative tumors had higher Gleason scores and were more frequent in Black/African American patients (p<0.001). Clinical outcomes were not significantly different between subtypes.
    Conclusions A clinically available prognostic test (Decipher) can also assess PCa molecular subtypes, obviating the need for additional testing. Clinicopathological differences were found among subtypes based on global expression patterns. PMID:25964175

  5. Autosomal Dominant Cataract: Intrafamilial Phenotypic Variability, Interocular Asymmetry, and Variable Progression in Four Chilean Families

    PubMed Central

    Shafie, Suraiya M.; Barria von-Bischhoffshausen, Fernando R.; Bateman, J. Bronwyn

    2006-01-01

    PURPOSE To document intrafamilial and interocular phenotypic variability of autosomal dominant cataract (ADC). DESIGN Prospective observational case series. METHODS We performed ophthalmologic examination in four Chilean ADC families. RESULTS The families exhibited variability with respect to morphology, location with the lens, color and density of cataracts among affected members. We documented asymmetry between eyes in the morphology, location within the lens, color and density of cataracts, and a variable rate of progression. CONCLUSIONS The cataracts in these families exhibit wide intrafamilial and interocular phenotypic variability, supporting the premise that the mutated genes are expressed differentially in individuals and between eyes; other genes or environmental factors may be the bases for this variability. Marked progression among some family members underscores the variable clinical course of a common mutation within a family. Like retinitis pigmentosa, classification of ADC will be most useful if based on the gene and specific mutation. PMID:16564818

  6. Distinction between asymptomatic monoclonal B-cell lymphocytosis with cyclin D1 overexpression and mantle cell lymphoma: from molecular profiling to flow cytometry.

    PubMed

    Espinet, Blanca; Ferrer, Ana; Bellosillo, Beatriz; Nonell, Lara; Salar, Antonio; Fernández-Rodríguez, Concepción; Puigdecanet, Eulàlia; Gimeno, Javier; Garcia-Garcia, Mar; Vela, Maria Carmen; Luño, Elisa; Collado, Rosa; Navarro, José Tomás; de la Banda, Esmeralda; Abrisqueta, Pau; Arenillas, Leonor; Serrano, Cristina; Lloreta, Josep; Miñana, Belén; Cerutti, Andrea; Florensa, Lourdes; Orfao, Alberto; Sanz, Ferran; Solé, Francesc; Dominguez-Sola, David; Serrano, Sergio

    2014-02-15

    According to current diagnostic criteria, mantle cell lymphoma (MCL) encompasses the usual, aggressive variants and rare, nonnodal cases with monoclonal asymptomatic lymphocytosis, cyclin D1-positive (MALD1). We aimed to understand the biology behind this clinical heterogeneity and to identify markers for adequate identification of MALD1 cases. We compared 17 typical MCL cases with a homogeneous group of 13 untreated MALD1 cases (median follow-up, 71 months). We conducted gene expression profiling with functional analysis in five MCL and five MALD1. Results were validated in 12 MCL and 8 MALD1 additional cases by quantitative reverse transcription polymerase chain reaction (qRT-PCR) and in 24 MCL and 13 MALD1 cases by flow cytometry. Classification and regression trees strategy was used to generate an algorithm based on CD38 and CD200 expression by flow cytometry. We found 171 differentially expressed genes with enrichment of neoplastic behavior and cell proliferation signatures in MCL. Conversely, MALD1 was enriched in gene sets related to immune activation and inflammatory responses. CD38 and CD200 were differentially expressed between MCL and MALD1 and confirmed by flow cytometry (median CD38, 89% vs. 14%; median CD200, 0% vs. 24%, respectively). Assessment of both proteins allowed classifying 85% (11 of 13) of MALD1 cases whereas 15% remained unclassified. SOX11 expression by qRT-PCR was significantly different between MCL and MALD1 groups but did not improve the classification. We show for the first time that MALD1, in contrast to MCL, is characterized by immune activation and driven by inflammatory cues. Assessment of CD38/CD200 by flow cytometry is useful to distinguish most cases of MALD1 from MCL in the clinical setting. MALD1 should be identified and segregated from the current MCL category to avoid overdiagnosis and unnecessary treatment. ©2013 AACR

  7. Distinction between Asymptomatic Monoclonal B-cell Lymphocytosis with Cyclin D1 Overexpression and Mantle Cell Lymphoma: From Molecular Profiling to Flow Cytometry

    PubMed Central

    Espinet, Blanca; Ferrer, Ana; Bellosillo, Beatriz; Nonell, Lara; Salar, Antonio; Fernández-Rodríguez, Concepción; Puigdecanet, Eulàlia; Gimeno, Javier; Garcia-Garcia, Mar; Carmen Vela, Maria; Luño, Elisa; Collado, Rosa; Navarro, José Tomás; de la Banda, Esmeralda; Abrisqueta, Pau; Arenillas, Leonor; Serrano, Cristina; Lloreta, Josep; Miñana, Belén; Cerutti, Andrea; Florensa, Lourdes; Orfao, Alberto; Sanz, Ferran; Solé, Francesc; Dominguez-Sola, David; Serrano, Sergio

    2015-01-01

    Purpose According to current diagnostic criteria, mantle cell lymphoma (MCL) encompasses the usual, aggressive variants and rare, nonnodal cases with monoclonal asymptomatic lymphocytosis, cyclin D1–positive (MALD1). We aimed to understand the biology behind this clinical heterogeneity and to identify markers for adequate identification of MALD1 cases. Experimental Design We compared 17 typical MCL cases with a homogeneous group of 13 untreated MALD1 cases (median follow-up, 71 months). We conducted gene expression profiling with functional analysis in five MCL and five MALD1. Results were validated in 12 MCL and 8 MALD1 additional cases by quantitative reverse transcription polymerase chain reaction (qRT-PCR) and in 24 MCL and 13 MALD1 cases by flow cytometry. Classification and regression trees strategy was used to generate an algorithm based on CD38 and CD200 expression by flow cytometry. Results We found 171 differentially expressed genes with enrichment of neoplastic behavior and cell proliferation signatures in MCL. Conversely, MALD1 was enriched in gene sets related to immune activation and inflammatory responses. CD38 and CD200 were differentially expressed between MCL and MALD1 and confirmed by flow cytometry (median CD38, 89% vs. 14%; median CD200, 0% vs. 24%, respectively). Assessment of both proteins allowed classifying 85% (11 of 13) of MALD1 cases whereas 15% remained unclassified. SOX11 expression by qRT-PCR was significantly different between MCL and MALD1 groups but did not improve the classification. Conclusion We show for the first time that MALD1, in contrast to MCL, is characterized by immune activation and driven by inflammatory cues. Assessment of CD38/CD200 by flow cytometry is useful to distinguish most cases of MALD1 from MCL in the clinical setting. MALD1 should be identified and segregated from the current MCL category to avoid overdiagnosis and unnecessary treatment. PMID:24352646

  8. Prediction of cancer class with majority voting genetic programming classifier using gene expression data.

    PubMed

    Paul, Topon Kumar; Iba, Hitoshi

    2009-01-01

    In order to get a better understanding of different types of cancers and to find the possible biomarkers for diseases, recently, many researchers are analyzing the gene expression data using various machine learning techniques. However, due to a very small number of training samples compared to the huge number of genes and class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels with majority voting technique. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancers being studied in this paper.
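
    The majority-voting step described above can be sketched as follows: each evolved rule casts one vote for a class label, and the most frequent label wins. The three rules and the gene names here are hypothetical stand-ins for rules that genetic programming would evolve; only the voting logic is illustrated, not the evolution of the rules.

```python
from collections import Counter

def majority_vote(rules, sample):
    """Apply every rule to the sample and return the most common predicted label."""
    votes = [rule(sample) for rule in rules]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical rules over expression values; real MVGPC rules are evolved, not hand-written.
rules = [
    lambda s: "tumor" if s["g1"] - s["g2"] > 0 else "normal",
    lambda s: "tumor" if s["g3"] > 1.0 else "normal",
    lambda s: "tumor" if s["g1"] * s["g4"] > 2.0 else "normal",
]

sample_a = {"g1": 2.0, "g2": 1.5, "g3": 0.4, "g4": 1.2}
sample_b = {"g1": 0.5, "g2": 1.5, "g3": 0.4, "g4": 1.2}

prediction_a = majority_vote(rules, sample_a)
prediction_b = majority_vote(rules, sample_b)
```

    Using an odd number of rules avoids ties in the two-class case; for multiclass data a tie-breaking policy would have to be chosen.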

  9. Integrative Chemical-Biological Read-Across Approach for Chemical Hazard Classification

    PubMed Central

    Low, Yen; Sedykh, Alexander; Fourches, Denis; Golbraikh, Alexander; Whelan, Maurice; Rusyn, Ivan; Tropsha, Alexander

    2013-01-01

    Traditional read-across approaches typically rely on the chemical similarity principle to predict chemical toxicity; however, the accuracy of such predictions is often inadequate due to the underlying complex mechanisms of toxicity. Here we report on the development of a hazard classification and visualization method that draws upon both chemical structural similarity and comparisons of biological responses to chemicals measured in multiple short-term assays ("biological" similarity). The Chemical-Biological Read-Across (CBRA) approach infers each compound's toxicity from those of both chemical and biological analogs whose similarities are determined by the Tanimoto coefficient. Classification accuracy of CBRA was compared to that of classical RA and other methods using chemical descriptors alone, or in combination with biological data. Different types of adverse effects (hepatotoxicity, hepatocarcinogenicity, mutagenicity, and acute lethality) were classified using several biological data types (gene expression profiling and cytotoxicity screening). CBRA-based hazard classification exhibited consistently high external classification accuracy and applicability to diverse chemicals. Transparency of the CBRA approach is aided by the use of radial plots that show the relative contribution of analogous chemical and biological neighbors. Identification of both chemical and biological features that give rise to the high accuracy of CBRA-based toxicity prediction facilitates mechanistic interpretation of the models. PMID:23848138
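
    As a minimal sketch of similarity-based read-across, the Tanimoto coefficient between two sets of binary features is the size of their intersection divided by the size of their union, and a compound's toxicity can be estimated as a similarity-weighted average over its analogs. The set-based fingerprints, the function names, and the weighting scheme below are assumptions for illustration; the published CBRA method combines chemical and biological neighbors in its own way.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def read_across(query, neighbors):
    """Similarity-weighted mean activity over (features, activity) neighbors.

    Returns None when the query shares no features with any neighbor.
    """
    weights = [tanimoto(query, features) for features, _ in neighbors]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * activity for w, (_, activity) in zip(weights, neighbors)) / total

# Hypothetical fingerprints (sets of feature indices) with binary toxicity labels.
query = {1, 2, 3}
neighbors = [({1, 2, 3}, 1), ({2, 3, 4}, 0)]

similarity = tanimoto({1, 2, 3}, {2, 3, 4})
score = read_across(query, neighbors)
```

    A score above a chosen threshold (0.5 would be a natural but arbitrary choice) would classify the query compound as toxic.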

  10. A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus.

    PubMed

    Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh

    2013-12-13

    Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes: DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.

  11. THP-1 monocytes but not macrophages as a potential alternative for CD34+ dendritic cells to identify chemical skin sensitizers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lambrechts, Nathalie; Verstraelen, Sandra; Lodewyckx, Hanne

    2009-04-15

    Early detection of the sensitizing potential of chemicals is an emerging issue for chemical, pharmaceutical and cosmetic industries. In our institute, an in vitro classification model for prediction of chemical-induced skin sensitization based on gene expression signatures in human CD34+ progenitor-derived dendritic cells (DC) has been developed. This primary cell model is able to closely mimic the induction phase of sensitization by Langerhans cells in the skin, but it has drawbacks, such as the availability of cord blood. The aim of this study was to investigate whether human in vitro cultured THP-1 monocytes or macrophages display a similar expression profile for 13 predictive gene markers previously identified in DC and whether they also possess a discriminating capacity towards skin sensitizers and non-sensitizers based on these marker genes. To this end, the cell models were exposed to 5 skin sensitizers (ammonium hexachloroplatinate IV, 1-chloro-2,4-dinitrobenzene, eugenol, para-phenylenediamine, and tetramethylthiuram disulfide) and 5 non-sensitizers (L-glutamic acid, methyl salicylate, sodium dodecyl sulfate, tributyltin chloride, and zinc sulfate) for 6, 10, and 24 h, and mRNA expression of the 13 genes was analyzed using real-time RT-PCR. The transcriptional response of 7 out of 13 genes in THP-1 monocytes was significantly correlated with DC, whereas only 2 out of 13 genes in THP-1 macrophages. After a cross-validation of a discriminant analysis of the gene expression profiles in the THP-1 monocytes, this cell model demonstrated to also have a capacity to distinguish skin sensitizers from non-sensitizers. However, the DC model was superior to the monocyte model for discrimination of (non-)sensitizing chemicals.

  12. Classification of intramural metastases and lymph node metastases of esophageal cancer from gene expression based on boosting and projective adaptive resonance theory.

    PubMed

    Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki

    2006-07-01

    Esophageal cancer is a well-known cancer with poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage a large amount of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed a method, denoted PART-BFCS, that combines the projective adaptive resonance theory with a boosted fuzzy classifier with the SWEEP operator. This method is superior to other methods, and has four features, namely fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to analyze microarray data obtained from esophageal cancer patients. A combination of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely, BFCS-1,2, because the esophageal cancer data were very complex. The PART-BFCS and PART-BFCS with the U-test models showed higher performance than two conventional methods, namely, k-nearest neighbor (kNN) and weighted voting (WV). Genes including CDK6 could be found by our methods and excellent IF-THEN rules could be extracted. The genes selected in this study have a high potential as new diagnostic markers for esophageal cancer. These results indicate that the new methods can be used in marker gene selection for the diagnosis of cancer patients.

  13. Genomic profiling using array comparative genomic hybridization define distinct subtypes of diffuse large b-cell lymphoma: a review of the literature

    PubMed Central

    2012-01-01

    Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin lymphoma, comprising more than 30% of adult non-Hodgkin lymphomas. DLBCL represents a diverse set of lymphomas, defined as diffuse proliferation of large B lymphoid cells. Numerous cytogenetic studies including karyotypes and fluorescent in situ hybridization (FISH), as well as morphological, biological, clinical, microarray and sequencing technologies have attempted to categorize DLBCL into morphological variants, molecular and immunophenotypic subgroups, as well as distinct disease entities. Despite such efforts, most lymphomas remain indistinguishable and fall into DLBCL, not otherwise specified (DLBCL-NOS). The advent of microarray-based studies (chromosome, RNA, gene expression, etc.) has provided a plethora of high-resolution data that could potentially facilitate the finer classification of DLBCL. This review covers the microarray data currently published for DLBCL. We will focus on these types of data: 1) array-based CGH; 2) classical CGH; and 3) gene expression profiling studies. The aims of this review were three-fold: (1) to catalog chromosome loci that are present in at least 20% of distinct DLBCL subtypes; a detailed list of gains and losses for different subtypes was generated in table form to illustrate specific chromosome loci affected in selected subtypes; (2) to determine common and distinct copy number alterations among the different subtypes and, based on this information, to depict characteristic and similar chromosome loci for the different subtypes in two separate chromosome ideograms; and (3) to list re-classified subtypes and those that remained indistinguishable after review of the microarray data. To the best of our knowledge, this is the first effort to compile and review the available literature on microarray analysis data and their practical utility in classifying DLBCL subtypes.
Although conventional cytogenetic methods such as Karyotypes and FISH have played a major role in classification schemes of lymphomas, better classification models are clearly needed to further understanding the biology, disease outcome and therapeutic management of DLBCL. In summary, microarray data reviewed here can provide better subtype specific classifications models for DLBCL. PMID:22967872

  14. The functional therapeutic chemical classification system.

    PubMed

    Croset, Samuel; Overington, John P; Rebholz-Schuhmann, Dietrich

    2014-03-15

    Drug repositioning is the discovery of new indications for compounds that have already been approved and used in a clinical setting. Recently, some computational approaches have been suggested to unveil new opportunities in a systematic fashion, for instance by taking into consideration gene expression signatures or chemical features. We present here a novel method based on knowledge integration using semantic technologies, to capture the functional role of approved chemical compounds. In order to computationally generate repositioning hypotheses, we used the Web Ontology Language to formally define the semantics of over 20 000 terms with axioms to correctly denote various modes of action (MoA). Based on an integration of public data, we have automatically assigned over a thousand approved drugs into these MoA categories. The resulting new resource is called the Functional Therapeutic Chemical Classification System and was further evaluated against the content of the traditional Anatomical Therapeutic Chemical Classification System. We illustrate how the new classification can be used to generate drug repurposing hypotheses, using Alzheimer's disease as a use-case. Availability: https://www.ebi.ac.uk/chembl/ftc; https://github.com/loopasam/ftc. Contact: croset@ebi.ac.uk. Supplementary data are available at Bioinformatics online.

  15. Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

    PubMed Central

    Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

    2003-01-01

    Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
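
    The Adjusted Rand Index named in this abstract measures agreement between two partitions of the same samples. The following is a minimal sketch of the standard pair-counting formula, not the authors' evaluation code; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels: the same grouping under different cluster names.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Hypothetical labels: groupings that cut across each other.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    The index is 1 for identical partitions regardless of how clusters are named, and it can be negative when agreement is worse than expected by chance.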

  16. Expression of HOXB genes is significantly different in acute myeloid leukemia with a partial tandem duplication of MLL vs. a MLL translocation: a cross-laboratory study.

    PubMed

    Liu, Hsi-Che; Shih, Lee-Yung; May Chen, Mei-Ju; Wang, Chien-Chih; Yeh, Ting-Chi; Lin, Tung-Huei; Chen, Chien-Yu; Lin, Chih-Jen; Liang, Der-Cherng

    2011-05-01

    In acute myeloid leukemia (AML), the mixed lineage leukemia (MLL) gene may be rearranged to generate a partial tandem duplication (PTD), or fused to partner genes through a chromosomal translocation (tMLL). In this study, we first explored the differentially expressed genes between MLL-PTD and tMLL using gene expression profiling of our cohort (15 MLL-PTD and 10 tMLL) and one published data set. The top 250 probes were chosen from each set, resulting in 29 probes (21 unique genes) common to both sets. The selected genes include four HOXB genes, HOXB2, B3, B5, and B6. The expression values of these HOXB genes significantly differ between MLL-PTD and tMLL cases. Clustering and classification analyses were thoroughly conducted to support our gene selection results. Second, as MLL-PTD, FLT3-ITD, and NPM1 mutations are identified in AML with normal karyotypes, we briefly studied their impact on the HOXB genes. Another contribution of this study is to demonstrate that using public data from other studies enriches samples for analysis and yields more conclusive results. Copyright © 2011 Elsevier Inc. All rights reserved.
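
    The cross-data-set selection step (take the top probes from each set, keep those common to both) can be sketched as follows. The abstract does not state the ranking statistic, so ranking by absolute score is an assumption, and the probe IDs and scores below are hypothetical placeholders.

```python
def top_k_probes(scores, k):
    """Return the k probe IDs with the largest absolute score.

    `scores` maps probe ID -> differential-expression statistic.
    Ranking by absolute value is an assumption made for illustration.
    """
    ranked = sorted(scores, key=lambda probe: abs(scores[probe]), reverse=True)
    return set(ranked[:k])


def common_probes(scores_a, scores_b, k):
    """Probes that appear in the top k of both data sets."""
    return top_k_probes(scores_a, k) & top_k_probes(scores_b, k)


# Hypothetical scores for two cohorts measured on the same probe set.
cohort_scores = {"p1": 3.0, "p2": -2.5, "p3": 0.1, "p4": 1.0}
public_scores = {"p1": -4.0, "p2": 0.2, "p3": 0.3, "p4": 2.0}
shared = common_probes(cohort_scores, public_scores, k=2)
```

    In the study the same idea was applied with k = 250 per set; mapping the shared probes to unique gene symbols would be a separate annotation step.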

  17. Rate of Amino Acid Substitution Is Influenced by the Degree and Conservation of Male-Biased Transcription Over 50 Myr of Drosophila Evolution

    PubMed Central

    Grath, Sonja; Parsch, John

    2012-01-01

    Sex-biased gene expression (i.e., the differential expression of genes between males and females) is common among sexually reproducing species. However, genes often differ in their sex-bias classification or degree of sex bias between species. There is also an unequal distribution of sex-biased genes (especially male-biased genes) between the X chromosome and the autosomes. We used whole-genome expression data and evolutionary rate estimates for two different Drosophilid lineages, melanogaster and obscura, spanning an evolutionary time scale of around 50 Myr to investigate the influence of sex-biased gene expression and chromosomal location on the rate of molecular evolution. In both lineages, the rate of protein evolution correlated positively with the male/female expression ratio. Genes with highly male-biased expression, genes expressed specifically in male reproductive tissues, and genes with conserved male-biased expression over long evolutionary time scales showed the fastest rates of evolution. An analysis of sex-biased gene evolution in both lineages revealed evidence for a “fast-X” effect in which the rate of evolution was greater for X-linked than for autosomal genes. This pattern was particularly pronounced for male-biased genes. Genes located on the obscura “neo-X” chromosome, which originated from a recent X-autosome fusion, showed rates of evolution that were intermediate between genes located on the ancestral X-chromosome and the autosomes. This suggests that the shift to X-linkage led to an increase in the rate of molecular evolution. PMID:22321769

  18. On the classification techniques in data mining for microarray data classification

    NASA Astrophysics Data System (ADS)

    Aydadenta, Husna; Adiwijaya

    2018-03-01

    Cancer is one of the deadliest diseases: according to WHO data, cancer caused 8.8 million deaths in 2015, and this number will increase every year if not addressed earlier. Microarray data has become one of the most popular subjects of cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples, allowing thousands of genes to be analyzed simultaneously. By using data mining techniques, we can classify microarray samples and thus identify whether they are cancerous or not. In this paper we discuss research applying data mining techniques to microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, as well as a simulation of the Random Forest algorithm with dimension reduction using Relief. The results show the performance measure (accuracy) of each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest). The accuracy of the Random Forest algorithm is higher than that of the other classification algorithms (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique based on microarray data.

  19. Analysis of Gene Expression Profiles of Soft Tissue Sarcoma Using a Combination of Knowledge-Based Filtering with Integration of Multiple Statistics

    PubMed Central

    Doi, Ayano; Ichinohe, Risa; Ikuyo, Yoriko; Takahashi, Teruyoshi; Marui, Shigetaka; Yasuhara, Koji; Nakamura, Tetsuro; Sugita, Shintaro; Sakamoto, Hiromi; Yoshida, Teruhiko; Hasegawa, Tadashi

    2014-01-01

    The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (logrank p = 1.84×10−6 and adjusted p value 2.99×10−3 after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation. PMID:25188299

  20. Gene profiling, biomarkers and pathways characterizing HCV-related hepatocellular carcinoma

    PubMed Central

    De Giorgi, Valeria; Monaco, Alessandro; Worchech, Andrea; Tornesello, MariaLina; Izzo, Francesco; Buonaguro, Luigi; Marincola, Francesco M; Wang, Ena; Buonaguro, Franco M

    2009-01-01

    Background Hepatitis C virus (HCV) infection is a major cause of hepatocellular carcinoma (HCC) worldwide. The molecular mechanisms of HCV-induced hepatocarcinogenesis are not yet fully elucidated. Besides indirect effects such as tissue inflammation and regeneration, a more direct oncogenic activity of HCV can be postulated, leading to an altered expression of cellular genes by early HCV viral proteins. In the present study, a comparison of gene expression patterns has been performed by microarray analysis on liver biopsies from HCV-positive HCC patients and HCV-negative controls. Methods Gene expression profiling of liver tissues has been performed using a high-density microarray containing 36,000 oligos, representing 90% of the human genes. Samples were obtained from 14 patients affected by HCV-related HCC and 7 HCV-negative non-liver-cancer patients, enrolled at INT in Naples. Transcriptional profiles identified in liver biopsies from HCC nodules and paired non-adjacent non-HCC liver tissue of the same HCV-positive patients were compared to those from HCV-negative controls by the Cluster program. The pathway analysis was performed using BRB-ArrayTools based on the "Ingenuity System Database". The significance threshold of the t-test was set at 0.001. Results Significant differences were found between the expression patterns of several genes falling into different metabolic and inflammation/immunity pathways in HCV-related HCC tissues as well as the non-HCC counterpart compared to normal liver tissues. Only a few genes were found differentially expressed between HCV-related HCC tissues and the paired non-HCC counterpart. Conclusion In this study, informative data on the global gene expression pattern of HCV-related HCC and the non-HCC counterpart, as well as on their difference from the one observed in normal liver tissues, have been obtained. These results may lead to the identification of specific biomarkers relevant for developing tools for detection, diagnosis, and classification of HCV-related HCC. PMID:19821982

  1. A machine-learned computational functional genomics-based approach to drug classification.

    PubMed

    Lötsch, Jörn; Ultsch, Alfred

    2016-12-01

    The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion to the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance (or cluster) structure of the high-dimensional feature space of the matrices was visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics and the traditional target-based criterion. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for non-opioid analgesics that comprised many NSAIDs and actions on neuronal signal transmission for opioid analgesics. Using machine-learned techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be similarly suitable for drug classification as the traditional target-based criterion. This supports the utility of functional genomics-based approaches to computational system pharmacology for drug discovery and repurposing.
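
    One plausible reading of combining a "drug versus genes" matrix with a "genes versus biological processes" matrix is a Boolean matrix product, sketched below. The abstract does not specify the exact combination rule, so this is an assumption; the matrices are hypothetical toy data, not DrugBank or Gene Ontology content.

```python
def boolean_product(drug_gene, gene_process):
    """Boolean matrix product of two 0/1 matrices given as lists of rows.

    out[i][j] is 1 when drug i targets at least one gene that is
    annotated to biological process j.
    """
    n_genes = len(gene_process)
    if any(len(row) != n_genes for row in drug_gene):
        raise ValueError("inner dimensions (number of genes) do not match")
    n_processes = len(gene_process[0]) if gene_process else 0
    return [
        [
            int(any(row[g] and gene_process[g][j] for g in range(n_genes)))
            for j in range(n_processes)
        ]
        for row in drug_gene
    ]


# Hypothetical example: 2 drugs x 3 genes, and 3 genes x 2 processes.
drug_gene = [
    [1, 0, 0],  # drug 0 targets gene 0
    [0, 1, 1],  # drug 1 targets genes 1 and 2
]
gene_process = [
    [1, 0],  # gene 0 -> process 0
    [0, 1],  # gene 1 -> process 1
    [0, 1],  # gene 2 -> process 1
]
drug_process = boolean_product(drug_gene, gene_process)
```

    Each row of the resulting matrix is a feature vector for one drug, which is the kind of input a self-organizing map could then project onto its grid.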

  2. Constrained clusters of gene expression profiles with pathological features.

    PubMed

    Sese, Jun; Kurokawa, Yukinori; Monden, Morito; Kato, Kikuya; Morishita, Shinichi

    2004-11-22

    Gene expression profiles should be useful in distinguishing variations in disease, since they reflect accurately the status of cells. The primary clustering of gene expression reveals the genotypes that are responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, since the first clustering process and the second classification step, in which the features are associated with clusters, are performed independently, the initial set of clusters may omit genes that are associated with pathologically meaningful features. Therefore, it is important to devise a way of identifying gene expression clusters that are associated with pathological features. We present the novel technique of 'itemset constrained clustering' (IC-Clustering), which computes the optimal cluster that maximizes the interclass variance of gene expression between groups, which are divided according to the restriction that only divisions that can be expressed using common features are allowed. This constraint automatically labels each cluster with a set of pathological features which characterize that cluster. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters, which could be annotated with various pathological features, such as 'tumor' and 'man', or 'except tumor' and 'normal liver function'. In contrast, the k-means method overlooked these clusters.
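
    The objective described here, choosing the feature-defined split of samples that maximizes the interclass variance of expression, can be illustrated with a brute-force search. This is a sketch under stated assumptions, not the IC-Clustering algorithm itself: itemsets are enumerated exhaustively up to a small size, and the sample features and expression values are hypothetical.

```python
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def interclass_variance(inside, outside):
    """Sum over both groups of size * squared distance of group mean to overall mean."""
    overall = mean_vector(inside + outside)
    total = 0.0
    for group in (inside, outside):
        centre = mean_vector(group)
        total += len(group) * sum((c - o) ** 2 for c, o in zip(centre, overall))
    return total


def best_itemset_split(samples, max_items=2):
    """Exhaustively score splits 'has all items' vs 'does not'.

    `samples` is a list of (feature_set, expression_vector) pairs.
    Returns (itemset, score) for the split with the largest interclass variance.
    """
    all_items = sorted(set().union(*(features for features, _ in samples)))
    best = (None, float("-inf"))
    for size in range(1, max_items + 1):
        for itemset in combinations(all_items, size):
            wanted = set(itemset)
            inside = [x for features, x in samples if wanted <= features]
            outside = [x for features, x in samples if not wanted <= features]
            if not inside or not outside:
                continue  # a split must leave samples on both sides
            score = interclass_variance(inside, outside)
            if score > best[1]:
                best = (itemset, score)
    return best


# Hypothetical samples: 'tumor' tracks the first gene, 'male' tracks nothing much.
samples = [
    ({"tumor", "male"}, [5.0, 1.0]),
    ({"tumor"}, [5.0, 1.2]),
    ({"male"}, [1.0, 1.0]),
    (set(), [1.0, 1.2]),
]
itemset, score = best_itemset_split(samples)
```

    Because every candidate split is defined by an itemset, the winning cluster comes with its feature label attached, which is the property the abstract emphasizes.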

  3. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  4. Biallelic and Triallelic 5-Hydroxytyramine Transporter Gene-Linked Polymorphic Region (5-HTTLPR) Polymorphisms and Their Relationship with Lifelong Premature Ejaculation: A Case-Control Study in a Chinese Population

    PubMed Central

    Huang, Yuanyuan; Zhang, Xiansheng; Gao, Jingjing; Tang, Dongdong; Gao, Pan; Li, Chao; Liu, Weiqun; Liang, Chaozhao

    2016-01-01

    Background This study aimed to explore the relationship between premature ejaculation (PE) and the serotonin transporter gene-linked polymorphic region (5-HTTLPR) with respect to the biallelic and triallelic classifications. Material/Methods A total of 115 outpatients who complained of ejaculating prematurely and who were diagnosed as having lifelong premature ejaculation (LPE) and 101 controls without PE complaint were recruited. All subjects completed a detailed questionnaire and were genotyped for 5-HTTLPR polymorphism using PCR-based technology. We evaluated the associations between 5-HTTLPR allelic and genotypic frequencies and their association with LPE, as well as the intravaginal ejaculation latency time (IELT) of different 5-HTTLPR genotypes among LPE patients. Results The patients and controls did not differ significantly in terms of any characteristic except age. The results showed no significant difference regarding biallelic 5-HTTLPR. According to the triallelic classification, no significant difference was found when comparing the genotypic distribution (P=0.091). However, the distribution of the S, LG, and LA alleles in the cases was significantly different from the controls (P=0.018). We found a significantly lower frequency of LA allele and higher frequency of LG allele in patients. Based on another classification by expression, we found a significantly lower frequency of the L’L’ genotype (OR=0.37; 95%CI=0.15–0.91, P=0.025) in patients with LPE. No significant association was detected between IELT of LPE and different genotypes. Conclusions Contrary to the general classification based on S/L alleles, triallelic 5-HTTLPR was associated with LPE. Triallelic 5-HTTLPR may be a promising field for genetic research in PE to avoid false-negative results in future studies. PMID:27311544

  5. Transcriptome analysis of WRKY gene family in Oryza officinalis Wall ex Watt and WRKY genes involved in responses to Xanthomonas oryzae pv. oryzae stress

    PubMed Central

    Jiang, Chunmiao; Shen, Qingxi J.; Wang, Bo; He, Bin; Xiao, Suqin; Chen, Ling; Yu, Tengqiong; Ke, Xue; Zhong, Qiaofang; Fu, Jian; Chen, Yue; Wang, Lingxian; Yin, Fuyou; Zhang, Dunyu; Ghidan, Walid; Huang, Xingqi; Cheng, Zaiquan

    2017-01-01

    Oryza officinalis Wall ex Watt, a very important and special wild rice species, shows abundant genetic diversity and disease resistance features, especially high resistance to bacterial blight. The molecular mechanisms of bacterial blight resistance in O. officinalis have not yet been elucidated. The WRKY transcription factor family is one of the largest gene families involved in plant growth, development and stress response. However, little is known about the numbers, structure, molecular phylogenetics, and expression of the WRKY genes under Xanthomonas oryzae pv. oryzae (Xoo) stress in O. officinalis, owing to the lack of an O. officinalis genome. Therefore, based on the RNA-sequencing data of O. officinalis, we performed a comprehensive study of WRKY genes in O. officinalis and identified 89 OoWRKY genes. These 89 OoWRKY genes were then classified into three groups based on the WRKY domains and zinc finger motifs. Phylogenetic analysis strongly supported that the evolution of OoWRKY genes was consistent with previous studies of WRKYs, and that subgroup IIc OoWRKY genes were the original ancestors of some group II and group III OoWRKYs. Among the 89 OoWRKY genes, eight OoWRKYs displayed significantly different expression (>2-fold, p<0.01) in the O. officinalis transcriptome under Xoo strains PXO99 and C5 stress at 48 h, suggesting these genes might play important roles in PXO99 and C5 stress responses in O. officinalis. QRT-PCR analysis and confirmation of the expression patterns of the eight OoWRKYs revealed that they responded strongly to PXO99 and C5 stress at 24 h, 48 h, and 72 h, and the trends of these genes displaying marked changes were consistent with the 48 h RNA-sequencing data, demonstrating that these genes played important roles in response to biotic stress and might even be involved in bacterial blight resistance. Tissue expression profiles of the eight OoWRKY genes revealed that they were highly expressed in root, stem, leaf, and flower, especially in leaf (except OoWRKY71), suggesting these genes might also be important for plant growth and organ development. In this study, we analyzed the WRKY family of transcription factors in O. officinalis. Insight was gained into the classification, evolution, and function of the OoWRKY genes, revealing the putative roles of the eight significantly differentially expressed OoWRKYs in Xoo strains PXO99 and C5 stress responses in O. officinalis. This study provided a better understanding of the evolution and functions of O. officinalis WRKY genes, and suggested that manipulating the eight significantly differentially expressed OoWRKYs would enhance resistance to bacterial blight. PMID:29190793

  6. Transcriptome analysis of WRKY gene family in Oryza officinalis Wall ex Watt and WRKY genes involved in responses to Xanthomonas oryzae pv. oryzae stress.

    PubMed

    Jiang, Chunmiao; Shen, Qingxi J; Wang, Bo; He, Bin; Xiao, Suqin; Chen, Ling; Yu, Tengqiong; Ke, Xue; Zhong, Qiaofang; Fu, Jian; Chen, Yue; Wang, Lingxian; Yin, Fuyou; Zhang, Dunyu; Ghidan, Walid; Huang, Xingqi; Cheng, Zaiquan

    2017-01-01

    Oryza officinalis Wall ex Watt, a very important and special wild rice species, shows abundant genetic diversity and disease resistance features, especially high resistance to bacterial blight. The molecular mechanisms of bacterial blight resistance in O. officinalis have not yet been elucidated. The WRKY transcription factor family is one of the largest gene families involved in plant growth, development and stress response. However, little is known about the numbers, structure, molecular phylogenetics, and expression of the WRKY genes under Xanthomonas oryzae pv. oryzae (Xoo) stress in O. officinalis, owing to the lack of an O. officinalis genome. Therefore, based on the RNA-sequencing data of O. officinalis, we performed a comprehensive study of WRKY genes in O. officinalis and identified 89 OoWRKY genes. These 89 OoWRKY genes were then classified into three groups based on the WRKY domains and zinc finger motifs. Phylogenetic analysis strongly supported that the evolution of OoWRKY genes was consistent with previous studies of WRKYs, and that subgroup IIc OoWRKY genes were the original ancestors of some group II and group III OoWRKYs. Among the 89 OoWRKY genes, eight OoWRKYs displayed significantly different expression (>2-fold, p<0.01) in the O. officinalis transcriptome under Xoo strains PXO99 and C5 stress at 48 h, suggesting these genes might play important roles in PXO99 and C5 stress responses in O. officinalis. QRT-PCR analysis and confirmation of the expression patterns of the eight OoWRKYs revealed that they responded strongly to PXO99 and C5 stress at 24 h, 48 h, and 72 h, and the trends of these genes displaying marked changes were consistent with the 48 h RNA-sequencing data, demonstrating that these genes played important roles in response to biotic stress and might even be involved in bacterial blight resistance. Tissue expression profiles of the eight OoWRKY genes revealed that they were highly expressed in root, stem, leaf, and flower, especially in leaf (except OoWRKY71), suggesting these genes might also be important for plant growth and organ development. In this study, we analyzed the WRKY family of transcription factors in O. officinalis. Insight was gained into the classification, evolution, and function of the OoWRKY genes, revealing the putative roles of the eight significantly differentially expressed OoWRKYs in Xoo strains PXO99 and C5 stress responses in O. officinalis. This study provided a better understanding of the evolution and functions of O. officinalis WRKY genes, and suggested that manipulating the eight significantly differentially expressed OoWRKYs would enhance resistance to bacterial blight.

  7. nRC: non-coding RNA Classifier based on structural features.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

    2017-01-01

    Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

  8. Gene Expression in Accumbens GABA Neurons from Inbred Rats with Different Drug-Taking Behavior

    PubMed Central

    Sharp, B.M.; Chen, H.; Gong, S.; Wu, X.; Liu, Z.; Hiler, K.; Taylor, W.L.; Matta, S.G.

    2011-01-01

    Inbred Lewis and Fisher 344 rat strains differ greatly in drug self-administration; Lewis rats operantly self-administer drugs of abuse including nicotine, whereas Fisher self-administer poorly. As shown herein, operant food self-administration is similar. Based on their pivotal role in drug reward, we hypothesized that differences in basal gene expression in GABAergic neurons projecting from nucleus accumbens (NAcc) to ventral pallidum (VP) play a role in vulnerability to drug-taking behavior. The transcriptomes of NAcc shell-VP GABAergic neurons from these two strains were analyzed in adolescents, using a multidisciplinary approach that combined stereotaxic iontophoretic brain microinjections, laser-capture microdissection (LCM) and microarray measurement of transcripts. LCM enriched the gene transcripts detected in GABA neurons compared to the residual NAcc tissue: a ratio of neuron/residual > 1 and false discovery rate (FDR) <5% yielded 6,623 transcripts, whereas a ratio of >3 yielded 3,514. Strain-dependent differences in gene expression within GABA neurons were identified; 322 vs. 60 transcripts showed 1.5-fold vs. 2-fold differences in expression (FDR<5%). Classification by gene ontology showed these 322 transcripts were widely distributed, without categorical enrichment. This is most consistent with a global change in GABA neuron function. Literature-mining by Chilibot found 38 genes related to synaptic plasticity, signaling and gene transcription, all of which determine drug-abuse; 33 genes have no known association with addiction or nicotine. In Lewis rats, upregulation of Mint-1, Cask, CamkIIδ, Ncam1, Vsnl1, Hpcal1 and Car8 indicates these transcripts likely contribute to altered signaling and synaptic function in NAcc GABA projection neurons to VP. PMID:21745336
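
    Selecting transcripts by a fold-change cutoff together with a false discovery rate below 5% can be sketched as below. The abstract does not say which FDR procedure was used; Benjamini-Hochberg is assumed here, and the transcript names, expression means and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_transcripts(records, min_fold=1.5, max_fdr=0.05):
    """Keep transcripts with fold change >= min_fold and adjusted p < max_fdr.

    `records` holds (name, mean_strain_a, mean_strain_b, p_value) tuples with
    positive, linear-scale expression means (an assumption of this sketch).
    """
    q_values = benjamini_hochberg([record[3] for record in records])
    selected = []
    for (name, mean_a, mean_b, _), q in zip(records, q_values):
        fold = max(mean_a, mean_b) / min(mean_a, mean_b)
        if fold >= min_fold and q < max_fdr:
            selected.append(name)
    return selected


# Hypothetical measurements for four transcripts.
records = [
    ("t1", 3.0, 1.0, 0.01),
    ("t2", 2.0, 1.0, 0.04),
    ("t3", 1.1, 1.0, 0.03),
    ("t4", 4.0, 1.0, 0.20),
]
q_values = benjamini_hochberg([r[3] for r in records])
kept = select_transcripts(records)
```

    Raising `min_fold` from 1.5 to 2 with the FDR threshold unchanged can only shrink the selected set, which matches the direction of the 322 versus 60 transcript counts reported above.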

  9. Genomics of Mature and Immature Olfactory Sensory Neurons

    PubMed Central

    Nickell, Melissa D.; Breheny, Patrick; Stromberg, Arnold J.; McClintock, Timothy S.

    2014-01-01

    The continuous replacement of neurons in the olfactory epithelium provides an advantageous model for investigating neuronal differentiation and maturation. By calculating the relative enrichment of every mRNA detected in samples of mature mouse olfactory sensory neurons (OSNs), immature OSNs, and the residual population of neighboring cell types, and then comparing these ratios against the known expression patterns of >300 genes, enrichment criteria that accurately predicted the OSN expression patterns of nearly all genes were determined. We identified 847 immature OSN-specific and 691 mature OSN-specific genes. The control of gene expression by chromatin modification and transcription factors, and neurite growth, protein transport, RNA processing, cholesterol biosynthesis, and apoptosis via death domain receptors, were overrepresented biological processes in immature OSNs. Ion transport (ion channels), presynaptic functions, and cilia-specific processes were overrepresented in mature OSNs. Processes overrepresented among the genes expressed by all OSNs were protein and ion transport, ER overload response, protein catabolism, and the electron transport chain. To more accurately represent gradations in mRNA abundance and identify all genes expressed in each cell type, classification methods were used to produce probabilities of expression in each cell type for every gene. These probabilities, which identified 9,300 genes expressed in OSNs, were 96% accurate at identifying genes expressed in OSNs and 86% accurate at discriminating genes specific to mature and immature OSNs. This OSN gene database not only predicts the genes responsible for the major biological processes active in OSNs, but also identifies thousands of never before studied genes that support OSN phenotypes. PMID:22252456

  10. Identification of Disease Critical Genes Using Collective Meta-heuristic Approaches: An Application to Preeclampsia.

    PubMed

    Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar

    2017-12-01

    Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.

  11. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
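
    As an illustration of the indexed sparsity reduction (ISR) step described above, the following minimal sketch converts a sparse mutation vector into the indexes of its non-zero elements. The function name, the 1-based indexing, the zero padding, and the truncation to a fixed width are assumptions made for this example; the abstract does not specify these details.

```python
def indexed_sparsity_reduction(mutations, width):
    """Return 1-based indexes of non-zero entries, zero-padded to `width`.

    Hypothetical helper: padding and truncation are assumptions of this
    sketch, not details taken from the DeepGene abstract.
    """
    indexes = [i + 1 for i, value in enumerate(mutations) if value != 0]
    indexes = indexes[:width]  # drop extra indexes if the sample is too dense
    return indexes + [0] * (width - len(indexes))


# One sample with mutations in genes 3, 7 and 8 out of 10 genes.
sample = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
print(indexed_sparsity_reduction(sample, width=4))  # [3, 7, 8, 0]
```

    The dense index vector is much shorter than the original gene vector when mutations are rare, which is the effect the abstract attributes to ISR.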

  12. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    PubMed

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Ethylene-induced differential gene expression during abscission of citrus leaves

    PubMed Central

    Merelo, Paz; Cercós, Manuel; Tadeo, Francisco R.; Talón, Manuel

    2008-01-01

    The main objective of this work was to identify and classify genes involved in the process of leaf abscission in Clementina de Nules (Citrus clementina Hort. Ex Tan.). A 7 K unigene citrus cDNA microarray containing 12 K spots was used to characterize the transcriptome of the ethylene-induced abscission process in laminar abscission zone-enriched tissues and the petiole of debladed leaf explants. In these conditions, ethylene induced 100% leaf explant abscission in 72 h while, in air-treated samples, the abscission period started later and took 240 h. Gene expression monitored during the first 36 h of ethylene treatment showed that out of the 12 672 cDNA microarray probes, ethylene differentially induced 725 probes distributed as follows: 216 (29.8%) probes in the laminar abscission zone and 509 (70.2%) in the petiole. Functional MIPS classification and manual annotation of differentially expressed genes highlighted key processes regulating the activation and progress of the cell separation that brings about abscission. These included cell-wall modification, lipid transport, protein biosynthesis and degradation, and differential activation of signal transduction and transcription control pathways. Expression data associated with the petiole indicated the occurrence of a double defensive strategy mediated by the activation of a biochemical programme including scavenging ROS, defence and PR genes, and a physical response mostly based on lignin biosynthesis and deposition. This work identifies new genes probably involved in the onset and development of the leaf abscission process and suggests a different but co-ordinated and complementary role for the laminar abscission zone and the petiole during the process of abscission. PMID:18515267

  14. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

    PubMed

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H

    2017-01-09

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    PubMed Central

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836
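
    The singularity mentioned above can be seen in a toy case: with n samples and p > n - 1 features, the sample covariance matrix has rank at most n - 1, so its determinant is zero. The sketch below uses made-up numbers (2 samples, 3 features) and hypothetical helper names; it only illustrates the degeneracy and is not the estimator proposed in the paper.

```python
def covariance(samples):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Two samples (n = 2) measured on three genes (p = 3): illustrative values only.
samples = [[1.0, 2.0, 3.0], [3.0, 1.0, 5.0]]
S = covariance(samples)
print(det3(S))  # 0.0, so S cannot be inverted
```

    Because such a matrix has no inverse, methods that rely on the inverse covariance need a regularized or smoothed estimator, which is the motivation given in the abstract.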

  16. A Robust Unified Approach to Analyzing Methylation and Gene Expression Data

    PubMed Central

    Khalili, Abbas; Huang, Tim; Lin, Shili

    2009-01-01

    Microarray technology has made it possible to investigate expression levels, and more recently methylation signatures, of thousands of genes simultaneously, in a biological sample. Since more and more data from different biological systems or technological platforms are being generated at an incredible rate, there is an increasing need to develop statistical methods that are applicable to multiple data types and platforms. Motivated by such a need, a flexible finite mixture model that is applicable to methylation, gene expression, and potentially data from other biological systems, is proposed. Two major thrusts of this approach are to allow for a variable number of components in the mixture to capture non-biological variation and small biases, and to use a robust procedure for parameter estimation and probe classification. The method was applied to the analysis of methylation signatures of three breast cancer cell lines. It was also tested on three sets of expression microarray data to study its power and type I error rates. Comparison with a number of existing methods in the literature yielded very encouraging results; lower type I error rates and comparable/better power were achieved based on the limited study. Furthermore, the method also leads to more biologically interpretable results for the three breast cancer cell lines. PMID:20161265

  17. Cloning, annotation and expression analysis of mycoparasitism-related genes in Trichoderma harzianum 88.

    PubMed

    Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun

    2013-04-01

    Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.

  18. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.

    PubMed

    Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet

    2017-01-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.

  19. Behçet's: A Disease or a Syndrome? Answer from an Expression Profiling Study.

    PubMed

    Oğuz, Ali Kemal; Yılmaz, Seda Taşır; Oygür, Çağdaş Şahap; Çandar, Tuba; Sayın, Irmak; Kılıçoğlu, Sibel Serin; Ergün, İhsan; Ateş, Aşkın; Özdağ, Hilal; Akar, Nejat

    2016-01-01

    Behçet's disease (BD) is a chronic, relapsing, multisystemic inflammatory disorder with unanswered questions regarding its etiology/pathogenesis and classification. Distinct manifestation based subsets, pronounced geographical variations in expression, and discrepant immunological abnormalities raised the question whether Behçet's is "a disease or a syndrome". To answer the preceding question we aimed to display and compare the molecular mechanisms underlying distinct subsets of BD. For this purpose, the expression data of the gene expression profiling and association study on BD by Xavier et al (2013) was retrieved from GEO database and reanalysed by gene expression data analysis/visualization and bioinformatics enrichment tools. There were 15 BD patients (B) and 14 controls (C). Three subsets of BD patients were generated: MB (isolated mucocutaneous manifestations, n = 7), OB (ocular involvement, n = 4), and VB (large vein thrombosis, n = 4). Class comparison analyses yielded the following numbers of differentially expressed genes (DEGs); B vs C: 4, MB vs C: 5, OB vs C: 151, VB vs C: 274, MB vs OB: 215, MB vs VB: 760, OB vs VB: 984. Venn diagram analysis showed that there were no common DEGs in the intersection "MB vs C" ∩ "OB vs C" ∩ "VB vs C". Cluster analyses successfully clustered distinct expressions of BD. During gene ontology term enrichment analyses, categories with relevance to IL-8 production (MB vs C) and immune response to microorganisms (OB vs C) were differentially enriched. Distinct subsets of BD display distinct expression profiles and different disease associated pathways. Based on these clear discrepancies, the designation as "Behçet's syndrome" (BS) should be encouraged and future research should take into consideration the immunogenetic heterogeneity of BS subsets. 
Four gene groups, namely, negative regulators of inflammation (CD69, CLEC12A, CLEC12B, TNFAIP3), neutrophil granule proteins (LTF, OLFM4, AZU1, MMP8, DEFA4, CAMP), antigen processing and presentation proteins (CTSS, ERAP1), and regulators of immune response (LGALS2, BCL10, ITCH, CEACAM8, CD36, IL8, CCL4, EREG, NFKBIZ, CCR2, CD180, KLRC4, NFAT5) appear to be instrumental in BS immunopathogenesis.

  20. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

    This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

  1. An Integrated Analysis of miRNA and mRNA Expressions in Non-Small Cell Lung Cancers

    PubMed Central

    Ma, Lina; Huang, Yanyan; Zhu, Wangyu; Zhou, Shiquan; Zhou, Jihang; Zeng, Fang; Liu, Xiaoguang; Zhang, Yongkui; Yu, Jun

    2011-01-01

    Using DNA microarrays, we generated both mRNA and miRNA expression data from 6 non-small cell lung cancer (NSCLC) tissues and their matching normal controls from adjacent tissues to identify potential miRNA markers for diagnostics. We demonstrated that hsa-miR-96 is significantly and consistently up-regulated in all 6 NSCLCs. We validated this result in an independent set of 35 paired tumors and their adjacent normal tissues, as well as their sera collected before surgical resection or chemotherapy, and the results suggested that hsa-miR-96 may play an important role in NSCLC development and has great potential to be used as a noninvasive marker for diagnosing NSCLC. We predicted potential miRNA target mRNAs based on different methods (TargetScan and miRanda). Further classification of miRNA regulated genes based on their relationship with miRNAs revealed that hsa-miR-96 and certain other miRNAs tend to down-regulate their target mRNAs in NSCLC development, which have expression levels permissive to direct interaction between miRNAs and their target mRNAs. In addition, we identified a significant correlation of miRNA regulation with genes that coincide with a high density of CpG islands, which suggests that miRNA may represent a primary regulatory mechanism governing basic cellular functions and cell differentiations, and that such a mechanism may be complementary to DNA methylation in repressing or activating gene expression. PMID:22046296

  2. Genome-wide classification, evolutionary analysis and gene expression patterns of the kinome in Gossypium

    PubMed Central

    Yan, Jun; Li, Guilin; Guo, Xingqi; Li, Yang; Cao, Xuecheng

    2018-01-01

    The protein kinase (PK, kinome) family is one of the largest families in plants and regulates almost all aspects of plant processes, including plant development and stress responses. Despite their important functions, a comprehensive functional classification, evolutionary analysis and expression pattern analysis of the cotton PK gene family has yet to be performed. In this study, we identified the cotton kinomes in the Gossypium raimondii, Gossypium arboreum, Gossypium hirsutum and Gossypium barbadense genomes and classified them into 7 groups and 122–24 subfamilies using software HMMER v3.0 scanning and neighbor-joining (NJ) phylogenetic analysis. Some conserved exon-intron structures were identified not only in cotton species but also in primitive plants, ferns and moss, suggesting the significant function and ancient origination of these PK genes. Collinearity analysis revealed that 16.6 million years ago (Mya) cotton-specific whole genome duplication (WGD) events may have played a partial role in the expansion of the cotton kinomes, whereas tandem duplication (TD) events mainly contributed to the expansion of the cotton RLK group. Synteny analysis revealed that tetraploidization of G. hirsutum and G. barbadense contributed to the expansion of G. hirsutum and G. barbadense PKs. Global expression analysis of cotton PKs revealed stress-specific and fiber development-related expression patterns, suggesting that many cotton PKs might be involved in the regulation of the stress response and fiber development processes. This study provides foundational information for further studies on the evolution and molecular function of cotton PKs. PMID:29768506

  3. Response of Human Skin to Aesthetic Scarification

    PubMed Central

    Gabriel, Vincent A.; McClellan, Elizabeth A.; Scheuermann, Richard H.

    2014-01-01

    This study was undertaken to investigate changes in RNA expression in previously healthy adult human skin following thermal injury induced by contact with hot metal that was undertaken as part of aesthetic scarification, a body modification practice. Subjects were recruited to have pre-injury skin and serial wound biopsies performed. 4 mm punch biopsies were taken prior to branding and 1 hour, 1 week, and 1, 2 and 3 months post injury. RNA was extracted and quality assured prior to the use of a whole-genome based bead array platform to describe expression changes in the samples using the pre-injury skin as a comparator. Analysis of the array data was performed using k-means clustering and a hypergeometric probability distribution without replacement, and corrections for multiple comparisons were done. Confirmatory q-PCR was performed. Using a k of 10, several clusters of genes were shown to co-cluster based on Gene Ontology classification with probabilities unlikely to occur by chance alone. Of particular interest were clusters relating to cell cycle, proteinaceous extracellular matrix and keratinization. Given the consistent expression changes at one week following injury in the cell cycle cluster, there is an opportunity to intervene early following burn injury to influence scar development. PMID:24582755
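
    The hypergeometric test mentioned above can be sketched as follows: given a cluster of genes drawn without replacement from the array, it gives the probability of seeing at least the observed number of genes from one Gene Ontology category. The function names and the toy numbers are assumptions for illustration, and the Bonferroni adjustment is only one possible multiple-comparison correction; the abstract does not say which correction was used.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k): N genes drawn without replacement from M, n of them annotated."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value (an assumed choice of correction)."""
    return min(1.0, p_value * n_tests)


# Toy example: 10 genes on the array, 4 in the category, cluster of 3 genes,
# 2 of which are in the category.
p = hypergeom_sf(2, M=10, n=4, N=3)
print(round(p, 4))        # 0.3333
print(bonferroni(p, 10))  # 1.0 after correcting for 10 tested categories
```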

  4. Discovering biclusters in gene expression data based on high-dimensional linear geometries

    PubMed Central

    Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong

    2008-01-01

    Background: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. Results: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. Conclusion: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis. PMID:18433477

  5. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    PubMed

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing a real data from an experiment aimed to link the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
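
    To make the Expectation-Maximization idea above concrete, the sketch below computes the E-step of a much simpler model: a univariate Poisson mixture, where each cluster is described by a mixing weight and a single rate. This is a deliberate simplification and an assumption of the example; the model in the abstract is multivariate and adds a first-order autoregressive structure over time, neither of which is shown here. All names and numbers are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson distribution with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)


def responsibilities(count, weights, rates):
    """E-step: posterior probability that `count` came from each cluster."""
    joint = [w * poisson_pmf(count, lam) for w, lam in zip(weights, rates)]
    total = sum(joint)
    return [j / total for j in joint]


# A read count of 10 is far more plausible under the high-rate cluster.
r = responsibilities(10, weights=[0.5, 0.5], rates=[2.0, 10.0])
print([round(x, 4) for x in r])
```

    In a full EM loop, these responsibilities would be used in the M-step to re-estimate the weights and rates, and the two steps would alternate until the likelihood stops improving.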

  6. A novel approach for dimension reduction of microarray.

    PubMed

    Aziz, Rabia; Verma, C K; Srivastava, Namita

    2017-12-01

    This paper proposes a new hybrid search technique for feature (gene) selection (FS) using Independent Component Analysis (ICA) and Artificial Bee Colony (ABC), called ICA+ABC, to select informative genes based on a Naïve Bayes (NB) algorithm. An important trait of this technique is the optimization of the ICA feature vector using ABC. ICA+ABC is a hybrid search algorithm that combines the benefits of the extraction approach, to reduce the size of the data, and the wrapper approach, to optimize the reduced feature vectors. The performance of ICA+ABC is evaluated on six standard gene expression classification datasets. Extensive experiments were conducted to compare the performance of ICA+ABC with the results obtained from the recently published Minimum Redundancy Maximum Relevance (mRMR)+ABC algorithm for the NB classifier. To further assess how ICA+ABC performs as a feature selection method with the NB classifier, we also compared it with combinations of ICA with popular filter techniques and with other similar bio-inspired algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The results show that ICA+ABC has a significant ability to generate small subsets of genes from the ICA feature vector that significantly improve the classification accuracy of the NB classifier compared to other previously suggested methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Comprehensive gene expression profiling and immunohistochemical studies support application of immunophenotypic algorithm for molecular subtype classification in diffuse large B-cell lymphoma: A report from the International DLBCL Rituximab-CHOP Consortium Program Study

    PubMed Central

    Visco, Carlo; Li, Yan; Xu-Monette, Zijun Y.; Miranda, Roberto N.; Green, Tina M.; Li, Yong; Tzankov, Alexander; Wen, Wei; Liu, Wei-min; Kahl, Brad S.; d’Amore, Emanuele S. G.; Montes-Moreno, Santiago; Dybkær, Karen; Chiu, April; Tam, Wayne; Orazi, Attilio; Zu, Youli; Bhagat, Govind; Winter, Jane N.; Wang, Huan-You; O’Neill, Stacey; Dunphy, Cherie H.; Hsi, Eric D.; Zhao, X. Frank; Go, Ronald S.; Choi, William W. L.; Zhou, Fan; Czader, Magdalena; Tong, Jiefeng; Zhao, Xiaoying; van Krieken, J. Han; Huang, Qing; Ai, Weiyun; Etzell, Joan; Ponzoni, Maurilio; Ferreri, Andres J. M.; Piris, Miguel A.; Møller, Michael B.; Bueso-Ramos, Carlos E.; Medeiros, L. Jeffrey; Wu, Lin; Young, Ken H.

    2013-01-01

    Gene expression profiling (GEP) has stratified diffuse large B-cell lymphoma (DLBCL) into molecular subgroups that correspond to different stages of lymphocyte development, namely germinal center B-cell-like and activated B-cell-like. This classification has prognostic significance, but GEP is expensive and not readily applicable in daily practice, which has led to immunohistochemical algorithms being proposed as a surrogate for GEP analysis. We assembled tissue microarrays from 475 de novo DLBCL patients who were treated with rituximab-CHOP chemotherapy. All cases were successfully profiled by GEP on formalin-fixed, paraffin-embedded tissue samples. Sections were stained with antibodies reactive with CD10, GCET1, FOXP1, MUM1, and BCL6, and cases were classified following a rationale of sequential steps of differentiation of B-cells. Cutoffs for each marker were obtained using receiver operating characteristic curves, obviating the need for any arbitrary method. An algorithm based on the expression of CD10, FOXP1, and BCL6 was developed that had a simpler structure than other recently proposed algorithms and 92.6% concordance with GEP. In multivariate analysis, both the International Prognostic Index and our proposed algorithm were significant independent predictors of progression-free and overall survival. In conclusion, this algorithm effectively predicts prognosis of DLBCL patients matching GEP subgroups in the era of rituximab therapy. PMID:22437443

  8. Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

    PubMed

    Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

    2003-11-06

    We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
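
The chunk-elimination idea in the abstract above can be illustrated with a minimal sketch. It assumes the feature weights come from an already trained linear classifier (no SVM is trained here), and the bin count, the entropy threshold, and the rule "drop the whole lowest bin when entropy is low" are illustrative assumptions, not the published E-RFE decision rule.

```python
import math

def weight_entropy(weights, n_bins=10):
    """Shannon entropy (in bits) of a histogram of the absolute weights."""
    mags = [abs(w) for w in weights]
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    counts = [0] * n_bins
    for m in mags:
        idx = min(int((m - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(mags)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def genes_to_drop(weights, n_bins=10, entropy_threshold=None):
    """Return the indices of the genes to eliminate in one step.

    Assumed rule (hypothetical): if the weight histogram has low entropy,
    i.e. most weights pile up in a few bins, drop every gene in the lowest
    bin at once; otherwise fall back to dropping the single weakest gene,
    as in standard RFE.
    """
    if entropy_threshold is None:
        entropy_threshold = 0.5 * math.log2(n_bins)
    mags = [abs(w) for w in weights]
    if weight_entropy(weights, n_bins) < entropy_threshold:
        lo, hi = min(mags), max(mags)
        cutoff = lo + (hi - lo) / n_bins
        chunk = [i for i, m in enumerate(mags) if m < cutoff]
        if 0 < len(chunk) < len(mags):  # never drop every gene
            return chunk
    return [min(range(len(mags)), key=mags.__getitem__)]
```

Calling `genes_to_drop` repeatedly, retraining the classifier between calls, would give the recursive loop; the speed-up comes from the steps in which a whole chunk is removed instead of one gene.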

  9. Learning a single-hidden layer feedforward neural network using a rank correlation-based strategy with application to high dimensional gene expression and proteomic spectra datasets in cancer detection.

    PubMed

    Belciug, Smaranda; Gorunescu, Florin

    2018-06-08

    Methods based on microarrays (MA), mass spectrometry (MS), and machine learning (ML) algorithms have evolved rapidly in recent years, allowing for early detection of several types of cancer. A pitfall of these approaches, however, is the overfitting of data due to large number of attributes and small number of instances -- a phenomenon known as the 'curse of dimensionality'. A potentially fruitful idea to avoid this drawback is to develop algorithms that combine fast computation with a filtering module for the attributes. The goal of this paper is to propose a statistical strategy to initiate the hidden nodes of a single-hidden layer feedforward neural network (SLFN) by using both the knowledge embedded in data and a filtering mechanism for attribute relevance. In order to attest its feasibility, the proposed model has been tested on five publicly available high-dimensional datasets: breast, lung, colon, and ovarian cancer regarding gene expression and proteomic spectra provided by cDNA arrays, DNA microarray, and MS. The novel algorithm, called adaptive SLFN (aSLFN), has been compared with four major classification algorithms: traditional ELM, radial basis function network (RBF), single-hidden layer feedforward neural network trained by backpropagation algorithm (BP-SLFN), and support vector-machine (SVM). Experimental results showed that the classification performance of aSLFN is competitive with the comparison models. Copyright © 2018. Published by Elsevier Inc.

  10. Large-scale gene function analysis with the PANTHER classification system.

    PubMed

    Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

    2013-08-01

    The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.

  11. RNA-seq transcriptome analysis of formalin fixed, paraffin-embedded canine meningioma

    PubMed Central

    Grenier, Jennifer K.; Foureman, Polly A.; Sloma, Erica A.

    2017-01-01

    Meningiomas are the most commonly reported primary intracranial tumor in dogs and humans and between the two species there are similarities in histology and biologic behavior. Due to these similarities, dogs have been proposed as models for meningioma pathobiology. However, little is known about specific pathways and individual genes that are involved in the development and progression of canine meningioma. In addition, studies are lacking that utilize RNAseq to characterize gene expression in clinical cases of canine meningioma. The primary objective of this study was to develop a technique for which high quality RNA can be extracted from formalin-fixed, paraffin embedded tissue and then used for transcriptome analysis to determine patterns of gene expression. RNA was extracted from thirteen canine meningiomas–eleven from formalin fixed and two flash-frozen. These represented six grade I and seven grade II meningiomas based on the World Health Organization classification system for human meningioma. RNA was also extracted from fresh frozen leptomeninges from three control dogs for comparison. RNAseq libraries made from formalin fixed tissue were of sufficient quality to successfully identify 125 significantly differentially expressed genes, the majority of which were related to oncogenic processes. Twelve genes (AQP1, BMPER, FBLN2, FRZB, MEDAG, MYC, PAMR1, PDGFRL, PDPN, PECAM1, PERP, ZC2HC1C) were validated using qPCR. Among the differentially expressed genes were oncogenes, tumor suppressors, transcription factors, VEGF-related genes, and members of the WNT pathway. Our work demonstrates that RNA of sufficient quality can be extracted from FFPE canine meningioma samples to provide biologically relevant transcriptome analyses using a next-generation sequencing technique, such as RNA-seq. PMID:29073243

  12. Transcriptomic analysis of Ruditapes philippinarum hemocytes reveals cytoskeleton disruption after in vitro Vibrio tapetis challenge.

    PubMed

    Brulle, Franck; Jeffroy, Fanny; Madec, Stéphanie; Nicolas, Jean-Louis; Paillard, Christine

    2012-10-01

    The Manila clam, Ruditapes philippinarum, is an economically important commercial shellfish; harvests are diminished in some European waters by a pathogenic bacterium, Vibrio tapetis, that causes Brown Ring disease. To identify molecular characteristics associated with susceptibility or resistance to Brown Ring disease, Suppression Subtractive Hybridization (SSH) analyses were performed to construct cDNA libraries enriched in up- or down-regulated transcripts from clam immune cells, hemocytes, after a 3-h in vitro challenge with cultured V. tapetis. Nine hundred and ninety-eight sequences from the two libraries were sequenced, and an in silico analysis identified 235 unique genes. BLAST and "Gene ontology" classification analyses revealed that 60.4% of the Expressed Sequence Tags (ESTs) have high similarities with genes involved in various physiological functions, such as immunity, apoptosis and cytoskeleton organization, whereas 39.6% remain unidentified. From the 235 unique genes, we selected 22 candidates based upon physiological function and redundancy in the libraries. Then, Real-Time PCR analysis identified 3 genes related to cytoskeleton organization showing significant variation in expression attributable to V. tapetis exposure. Disruption in regulation of these genes is consistent with the etiologic agent of Brown Ring disease in Manila clams. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Stromal-Based Signatures for the Classification of Gastric Cancer.

    PubMed

    Uhlik, Mark T; Liu, Jiangang; Falcon, Beverly L; Iyer, Seema; Stewart, Julie; Celikkaya, Hilal; O'Mahony, Marguerita; Sevinsky, Christopher; Lowes, Christina; Douglass, Larry; Jeffries, Cynthia; Bodenmiller, Diane; Chintharlapalli, Sudhakar; Fischl, Anthony; Gerald, Damien; Xue, Qi; Lee, Jee-Yun; Santamaria-Pang, Alberto; Al-Kofahi, Yousef; Sui, Yunxia; Desai, Keyur; Doman, Thompson; Aggarwal, Amit; Carter, Julia H; Pytowski, Bronislaw; Jaminet, Shou-Ching; Ginty, Fiona; Nasir, Aejaz; Nagy, Janice A; Dvorak, Harold F; Benjamin, Laura E

    2016-05-01

    Treatment of metastatic gastric cancer typically involves chemotherapy and monoclonal antibodies targeting HER2 (ERBB2) and VEGFR2 (KDR). However, reliable methods to identify patients who would benefit most from a combination of treatment modalities targeting the tumor stroma, including new immunotherapy approaches, are still lacking. Therefore, we integrated a mouse model of stromal activation and gastric cancer genomic information to identify gene expression signatures that may inform treatment strategies. We generated a mouse model in which VEGF-A is expressed via adenovirus, enabling a stromal response marked by immune infiltration and angiogenesis at the injection site, and identified distinct stromal gene expression signatures. With these data, we designed multiplexed IHC assays that were applied to human primary gastric tumors and classified each tumor to a dominant stromal phenotype representative of the vascular and immune diversity found in gastric cancer. We also refined the stromal gene signatures and explored their relation to the dominant patient phenotypes identified by recent large-scale studies of gastric cancer genomics (The Cancer Genome Atlas and Asian Cancer Research Group), revealing four distinct stromal phenotypes. Collectively, these findings suggest that a genomics-based systems approach focused on the tumor stroma can be used to discover putative predictive biomarkers of treatment response, especially to antiangiogenesis agents and immunotherapy, thus offering an opportunity to improve patient stratification. Cancer Res; 76(9); 2573-86. ©2016 AACR.

  14. Synaptic genes are extensively downregulated across multiple brain regions in normal human aging and Alzheimer’s disease

    PubMed Central

    Berchtold, Nicole C.; Coleman, Paul D.; Cribbs, David H.; Rogers, Joseph; Gillen, Daniel L.; Cotman, Carl W.

    2014-01-01

    Synapses are essential for transmitting, processing, and storing information, all of which decline in aging and Alzheimer’s disease (AD). Because synapse loss only partially accounts for the cognitive declines seen in aging and AD, we hypothesized that existing synapses might undergo molecular changes that reduce their functional capacity. Microarrays were used to evaluate expression profiles of 340 synaptic genes in aging (20–99 years) and AD across 4 brain regions from 81 cases. The analysis revealed an unexpectedly large number of significant expression changes in synapse-related genes in aging, with many undergoing progressive downregulation across aging and AD. Functional classification of the genes showing altered expression revealed that multiple aspects of synaptic function are affected, notably synaptic vesicle trafficking and release, neurotransmitter receptors and receptor trafficking, postsynaptic density scaffolding, cell adhesion regulating synaptic stability, and neuromodulatory systems. The widespread declines in synaptic gene expression in normal aging suggests that function of existing synapses might be impaired, and that a common set of synaptic genes are vulnerable to change in aging and AD. PMID:23273601

  15. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

    PubMed

    Cawley, Gavin C; Talbot, Nicola L C

    2006-10-01

    Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
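
The nested cross-validation mentioned in the abstract above can be sketched as an index-splitting scheme. This is a generic illustration, not the authors' MATLAB code; the function names and fold counts are hypothetical. The point it shows is that the inner folds, used for tuning or gene selection, are drawn only from the outer training indices, so the outer test samples never influence model selection.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]

def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs built
    from outer_train only, which is what keeps the outer error estimate
    free of selection bias.
    """
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        yield outer_train, outer_test, k_fold_indices(outer_train, inner_k)
```

Any regularization parameter or gene subset would be chosen using `inner_splits`, and the resulting model scored once on `outer_test`.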

  16. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    PubMed

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  17. Genome-wide analysis of the Solanum tuberosum (potato) trehalose-6-phosphate synthase (TPS) gene family: evolution and differential expression during development and stress.

    PubMed

    Xu, Yingchun; Wang, Yanjie; Mattson, Neil; Yang, Liu; Jin, Qijiang

    2017-12-01

    Trehalose-6-phosphate synthase (TPS) serves important functions in plant desiccation tolerance and response to environmental stimuli. At present, a comprehensive analysis of this gene family, i.e. functional classification, molecular evolution, and expression patterns, is still lacking in Solanum tuberosum (potato). In this study, a comprehensive analysis of the TPS gene family was conducted in potato. A total of eight putative potato TPS genes (StTPSs) were identified by searching the latest potato genome sequence. The amino acid identity among the eight StTPSs varied from 59.91 to 89.54%. Analysis of dN/dS ratios suggested that regions in the TPP (trehalose-6-phosphate phosphatase) domains evolved faster than the TPS domains. Although the sequences of the eight StTPSs showed high similarity (2571-2796 bp), their gene length is highly differentiated (3189-8406 bp). Many regulatory elements possibly related to phytohormones, abiotic stress and development were identified in different TPS genes. Based on the phylogenetic tree constructed using TPS genes of potato and four other Solanaceae plants, TPS genes could be categorized into 6 distinct groups. Analysis revealed that purifying selection most likely played a major role during the evolution of this family. Amino acid changes detected in specific branches of the phylogenetic tree suggest that relaxed constraints might have contributed to functional divergence among groups. Moreover, StTPSs were found to exhibit tissue- and treatment-specific expression patterns upon analysis of transcriptome data and qRT-PCR. This study provides a reference for genome-wide identification of the potato TPS gene family and sets a framework for further functional studies of this important gene family in development and stress response.

  18. Growth condition dependency is the major cause of non-responsiveness upon genetic perturbation

    PubMed Central

    Amini, Saman; Holstege, Frank C. P.

    2017-01-01

    Investigating the role and interplay between individual proteins in biological processes is often performed by assessing the functional consequences of gene inactivation or removal. Depending on the sensitivity of the assay used for determining phenotype, between 66% (growth) and 53% (gene expression) of Saccharomyces cerevisiae gene deletion strains show no defect when analyzed under a single condition. Although it is well known that this non-responsive behavior is caused by different types of redundancy mechanisms or by growth condition/cell type dependency, it is not known what the relative contribution of these different causes is. Understanding the underlying causes of and their relative contribution to non-responsive behavior upon genetic perturbation is extremely important for designing efficient strategies aimed at elucidating gene function and unraveling complex cellular systems. Here, we provide a systematic classification of the underlying causes of and their relative contribution to non-responsive behavior upon gene deletion. The overall contribution of redundancy to non-responsive behavior is estimated at 29%, of which approximately 17% is due to homology-based redundancy and 12% is due to pathway-based redundancy. The major determinant of non-responsiveness is condition dependency (71%). For approximately 14% of protein complexes, just-in-time assembly can be put forward as a potential mechanistic explanation for how proteins can be regulated in a condition dependent manner. Taken together, the results underscore the large contribution of growth condition requirement to non-responsive behavior, which needs to be taken into account for strategies aimed at determining gene function. The classification provided here, can also be further harnessed in systematic analyses of complex cellular systems. PMID:28257504

  19. Molecular Biology of Archaebacteria.

    DTIC Science & Technology

    1988-03-31

    Molecular Biology of Archaebacteria. Personal author(s): Patrick P. Dennis. ...Escherichia coli. Research objectives: (i) to characterize the principles of gene organization and regulation of gene expression in archaebacteria; (ii) to...biophysical and molecular terms some of the mechanisms that allow archaebacteria to inhabit extreme environments. Progress - Year I: Ribosomal protein

  20. ALE: automated label extraction from GEO metadata.

    PubMed

    Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D

    2017-12-28

    NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because it ensures standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the data available do not contain explicit text informing the researcher about the age and gender of the subjects with the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels, based upon gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve accuracy of label assignment to GEO samples.
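
The text-based half of the approach described above, pulling labels out of free-text metadata with heuristics, might look roughly like the following. The regular expressions and the `extract_labels` function are hypothetical illustrations and are not taken from the ALE tool; real GEO metadata is far less regular than these patterns assume.

```python
import re

# Hypothetical patterns for "age: 63 years", "age=7", "sex: F", "gender: male".
AGE_RE = re.compile(
    r"\bage\s*[:=]?\s*(\d{1,3})(?:\s*(?:years?|yrs?|y))?\b", re.IGNORECASE
)
GENDER_RE = re.compile(
    r"\b(?:sex|gender)\s*[:=]?\s*(male|female|m|f)\b", re.IGNORECASE
)

def extract_labels(text):
    """Return a dict with whichever of 'age' and 'gender' the text states."""
    labels = {}
    age = AGE_RE.search(text)
    if age:
        labels["age"] = int(age.group(1))
    gender = GENDER_RE.search(text)
    if gender:
        first = gender.group(1).lower()
        labels["gender"] = "female" if first.startswith("f") else "male"
    return labels
```

Samples for which this returns an empty dict are the ones the abstract proposes to label from expression data instead, using a model trained on the samples where text-derived labels exist.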

  1. Identifying differentially expressed genes in cancer patients using a non-parameter Ising model.

    PubMed

    Li, Xumeng; Feltus, Frank A; Sun, Xiaoqian; Wang, James Z; Luo, Feng

    2011-10-01

    Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we developed a novel non-parameter Ising model to integrate protein-protein interaction network and microarray data for identifying differentially expressed (DE) genes. We also proposed a simulated annealing algorithm to find the optimal configuration of the Ising model. The Ising model was applied to two breast cancer microarray data sets. The results showed that more cancer-related DE sub-networks and genes were identified by the Ising model than those by the Markov random field model. Furthermore, cross-validation experiments showed that DE genes identified by Ising model can improve classification performance compared with DE genes identified by Markov random field model. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.

    PubMed

    Lee, Sangseon; Park, Youngjune; Kim, Sun

    2017-07-15

    Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway consists of multiple functions, the recent approach is to determine condition-specific sub-pathways or subpaths. However, there are several challenges. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually an average of statistical scores, e.g., correlations, of edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS fully utilizes gene expression quantity information and network centrality information to determine condition-specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes. Thirty-six differentially activated subpaths were determined. The utility of our method, MIDAS, was demonstrated in four ways. All 36 subpaths are well supported by the literature. Subsequently, we showed that these subpaths had good discriminant power for classification of the five cancer subtypes and also had prognostic power in terms of survival analysis. Finally, in a performance comparison of MIDAS with a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Microarray gene expression profiling using core biopsies of renal neoplasia.

    PubMed

    Rogers, Craig G; Ditlev, Jonathon A; Tan, Min-Han; Sugimura, Jun; Qian, Chao-Nan; Cooper, Jeff; Lane, Brian; Jewett, Michael A; Kahnoski, Richard J; Kort, Eric J; Teh, Bin T

    2009-01-01

    We investigate the feasibility of using microarray gene expression profiling technology to analyze core biopsies of renal tumors for classification of tumor histology. Core biopsies were obtained ex vivo from 7 renal tumors, comprising four histological subtypes, following radical nephrectomy using 18-gauge biopsy needles. RNA was isolated from these samples and, in the case of biopsy samples, amplified by in vitro transcription. Microarray analysis was then used to quantify the mRNA expression patterns in these samples relative to non-diseased renal tissue mRNA. Genes with significant variation across all non-biopsy tumor samples were identified, and the relationship between tumor and biopsy samples in terms of expression levels of these genes was then quantified in terms of Euclidean distance, and visualized by complete linkage clustering. Final pathologic assessment of kidney tumors demonstrated clear cell renal cell carcinoma (4), oncocytoma (1), angiomyolipoma (1) and adrenocortical carcinoma (1). Five of the seven biopsy samples were most similar in terms of gene expression to the resected tumors from which they were derived in terms of Euclidean distance. All seven biopsies were assigned to the correct histological class by hierarchical clustering. We demonstrate the feasibility of gene expression profiling of core biopsies of renal tumors to classify tumor histology.
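
The biopsy-to-tumor matching by Euclidean distance described above reduces to a nearest-neighbour lookup over expression profiles. The sketch below is illustrative only: the function names are hypothetical, and each profile is assumed to be a list of expression values over the same ordered set of selected genes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_tumor(biopsy_profile, tumor_profiles):
    """Return the name of the resected tumor closest to the biopsy.

    tumor_profiles maps a tumor name to its expression profile.
    """
    return min(
        tumor_profiles,
        key=lambda name: euclidean(biopsy_profile, tumor_profiles[name]),
    )
```

A biopsy counts as matched when `nearest_tumor` returns the tumor it was taken from; the complete linkage clustering in the abstract builds on the same pairwise distances.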

  4. Microarray gene expression profiling using core biopsies of renal neoplasia

    PubMed Central

    Rogers, Craig G.; Ditlev, Jonathon A.; Tan, Min-Han; Sugimura, Jun; Qian, Chao-Nan; Cooper, Jeff; Lane, Brian; Jewett, Michael A.; Kahnoski, Richard J.; Kort, Eric J.; Teh, Bin T.

    2009-01-01

    We investigate the feasibility of using microarray gene expression profiling technology to analyze core biopsies of renal tumors for classification of tumor histology. Core biopsies were obtained ex vivo from 7 renal tumors, comprising four histological subtypes, following radical nephrectomy using 18-gauge biopsy needles. RNA was isolated from these samples and, in the case of biopsy samples, amplified by in vitro transcription. Microarray analysis was then used to quantify the mRNA expression patterns in these samples relative to non-diseased renal tissue mRNA. Genes with significant variation across all non-biopsy tumor samples were identified, and the relationship between tumor and biopsy samples in terms of expression levels of these genes was then quantified in terms of Euclidean distance, and visualized by complete linkage clustering. Final pathologic assessment of kidney tumors demonstrated clear cell renal cell carcinoma (4), oncocytoma (1), angiomyolipoma (1) and adrenocortical carcinoma (1). Five of the seven biopsy samples were most similar in terms of gene expression to the resected tumors from which they were derived in terms of Euclidean distance. All seven biopsies were assigned to the correct histological class by hierarchical clustering. We demonstrate the feasibility of gene expression profiling of core biopsies of renal tumors to classify tumor histology. PMID:19966938

  5. Concurrent Validity and Classification Accuracy of Curriculum-Based Measurement for Written Expression

    ERIC Educational Resources Information Center

    Furey, William M.; Marcotte, Amanda M.; Hintze, John M.; Shackett, Caroline M.

    2016-01-01

    The study presents a critical analysis of written expression curriculum-based measurement (WE-CBM) metrics derived from 3- and 10-min test lengths. Criterion validity and classification accuracy were examined for Total Words Written (TWW), Correct Writing Sequences (CWS), Percent Correct Writing Sequences (%CWS), and Correct Minus Incorrect…

  6. Cancer survival analysis using semi-supervised learning method based on Cox and AFT models with L1/2 regularization.

    PubMed

    Liang, Yong; Chai, Hua; Liu, Xiao-Ying; Xu, Zong-Ben; Zhang, Hai; Leung, Kwong-Sak

    2016-03-01

    One of the most important objectives of clinical cancer research is to diagnose cancer more accurately based on patients' gene expression profiles. Both the Cox proportional hazards model (Cox) and the accelerated failure time model (AFT) have been widely adopted for high-risk and low-risk classification or survival time prediction in patients' clinical treatment. Nevertheless, two main dilemmas limit the accuracy of these prediction methods. One is that the small sample size and censored data remain a bottleneck for training a robust and accurate Cox classification model. The other is that tumours with similar phenotypes and prognoses are actually completely different diseases at the genotype and molecular level. Thus, the utility of the AFT model for survival time prediction is limited when such biological differences of the diseases have not been previously identified. To try to overcome these two main dilemmas, we proposed a novel semi-supervised learning method based on the Cox and AFT models to accurately predict the treatment risk and the survival time of the patients. Moreover, we adopted the efficient L1/2 regularization approach in the semi-supervised learning method to select the relevant genes, which are significantly associated with the disease. The results of the simulation experiments show that the semi-supervised learning model can significantly improve the predictive performance of Cox and AFT models in survival analysis. The proposed procedures have been successfully applied to four real microarray gene expression and artificial evaluation datasets. The advantages of our proposed semi-supervised learning method include: 1) a significant increase in the available training samples from censored data; 2) high capability for identifying the survival risk classes of patients in the Cox model; 3) high predictive accuracy for patients' survival time in the AFT model; 4) strong capability for relevant biomarker selection. Consequently, our proposed semi-supervised learning model is a more appropriate tool for survival analysis in clinical cancer research.

  7. Cancer-cell intrinsic gene expression signatures overcome intratumoural heterogeneity bias in colorectal cancer patient classification

    PubMed Central

    Dunne, Philip D.; Alderdice, Matthew; O'Reilly, Paul G.; Roddy, Aideen C.; McCorry, Amy M. B.; Richman, Susan; Maughan, Tim; McDade, Simon S.; Johnston, Patrick G.; Longley, Daniel B.; Kay, Elaine; McArt, Darragh G.; Lawler, Mark

    2017-01-01

    Stromal-derived intratumoural heterogeneity (ITH) has been shown to undermine molecular stratification of patients into appropriate prognostic/predictive subgroups. Here, using several clinically relevant colorectal cancer (CRC) gene expression signatures, we assessed the susceptibility of these signatures to the confounding effects of ITH using gene expression microarray data obtained from multiple tumour regions of a cohort of 24 patients, including central tumour, the tumour invasive front and lymph node metastasis. Sample clustering alongside correlative assessment revealed variation in the ability of each signature to cluster samples according to patient-of-origin rather than region-of-origin within the multi-region dataset. Signatures focused on cancer-cell intrinsic gene expression were found to produce more clinically useful, patient-centred classifiers, as exemplified by the CRC intrinsic signature (CRIS), which robustly clustered samples by patient-of-origin rather than region-of-origin. These findings highlight the potential of cancer-cell intrinsic signatures to reliably stratify CRC patients by minimising the confounding effects of stromal-derived ITH. PMID:28561046

  8. Advances in metaheuristics for gene selection and classification of microarray data.

    PubMed

    Duval, Béatrice; Hao, Jin-Kao

    2010-01-01

    Gene selection aims at identifying a (small) subset of informative genes from the initial data in order to obtain high predictive accuracy for classification. Gene selection can be considered as a combinatorial search problem and thus be conveniently handled with optimization methods. In this article, we summarize some recent developments of using metaheuristic-based methods within an embedded approach for gene selection. In particular, we put forward the importance and usefulness of integrating problem-specific knowledge into the search operators of such a method. To illustrate the point, we explain how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.

  9. Transcriptome analysis of stem development in the tumourous stem mustard Brassica juncea var. tumida Tsen et Lee by RNA sequencing.

    PubMed

    Sun, Quan; Zhou, Guanfan; Cai, Yingfan; Fan, Yonghong; Zhu, Xiaoyan; Liu, Yihua; He, Xiaohong; Shen, Jinjuan; Jiang, Huaizhong; Hu, Daiwen; Pan, Zheng; Xiang, Liuxin; He, Guanghua; Dong, Daiwen; Yang, Jianping

    2012-04-21

    Tumourous stem mustard (Brassica juncea var. tumida Tsen et Lee) is an economically and nutritionally important vegetable crop of the Cruciferae family that also provides the raw material for Fuling mustard. The genetics, breeding, physiology, biochemistry and classification of mustards have been extensively studied, but little information is available on tumourous stem mustard at the molecular level. To gain greater insight into the molecular mechanisms underlying stem swelling in this vegetable and to provide additional information for molecular research and breeding, we sequenced the transcriptome of tumourous stem mustard at various stem developmental stages and compared it with that of a mutant variety lacking swollen stems. Using Illumina short-read technology with a tag-based digital gene expression (DGE) system, we performed de novo transcriptome assembly and gene expression analysis. In our analysis, we assembled genetic information for tumourous stem mustard at various stem developmental stages. In addition, we constructed five DGE libraries, which covered the strains Yong'an and Dayejie at various developmental stages. Illumina sequencing identified 146,265 unigenes, including 11,245 clusters and 135,020 singletons. The unigenes were subjected to a BLAST search and annotated using the GO and KO databases. We also compared the gene expression profiles of three swollen stem samples with those of two non-swollen stem samples. A total of 1,042 genes with significantly different expression levels occurring simultaneously in the six comparison groups were screened out. Finally, the altered expression levels of a number of randomly selected genes were confirmed by quantitative real-time PCR. Our data provide comprehensive gene expression information at the transcriptional level and the first insight into the molecular mechanisms and regulatory pathways of stem swelling and development in this plant, and will help define new mechanisms of stem development in non-model plant organisms.

  10. Stem Cell-Like Gene Expression in Ovarian Cancer Predicts Type II Subtype and Prognosis

    PubMed Central

    Schwede, Matthew; Spentzos, Dimitrios; Bentink, Stefan; Hofmann, Oliver; Haibe-Kains, Benjamin; Harrington, David; Quackenbush, John; Culhane, Aedín C.

    2013-01-01

    Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and a more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer: high grade, serous tumors, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse. The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and identifies potential gene targets which, following further validation, may be valuable in the clinical management or treatment of ovarian cancer. PMID:23536770

  11. An improved method for functional similarity analysis of genes based on Gene Ontology.

    PubMed

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia

    2016-12-23

    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS takes fully into account the specificity of terms and the weighted inherited semantics of terms between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/ .

  12. Distinct transcriptome responses to water limitation in isohydric and anisohydric grapevine cultivars.

    PubMed

    Dal Santo, Silvia; Palliotti, Alberto; Zenoni, Sara; Tornielli, Giovanni Battista; Fasoli, Marianna; Paci, Paola; Tombesi, Sergio; Frioni, Tommaso; Silvestroni, Oriana; Bellincontro, Andrea; d'Onofrio, Claudio; Matarese, Fabiola; Gatti, Matteo; Poni, Stefano; Pezzotti, Mario

    2016-10-20

    Grapevine (Vitis vinifera L.) is an economically important crop with a wide geographical distribution, reflecting its ability to grow successfully in a range of climates. However, many vineyards are located in regions with seasonal drought, and these are often predicted to be global climate change hotspots. Climate change affects the entire physiology of grapevine, with strong effects on yield, wine quality and typicity, making it difficult to produce berries of optimal enological quality and consistent stability over the forthcoming decades. Here we investigated the reactions of two grapevine cultivars to water stress, the isohydric variety Montepulciano and the anisohydric variety Sangiovese, by examining physiological and molecular perturbations in the leaf and berry. A multidisciplinary approach was used to characterize the distinct stomatal behavior of the two cultivars and its impact on leaf and berry gene expression. Positive associations were found among the photosynthetic, physiological and transcriptional modifications, and candidate genes encoding master regulators of the water stress response were identified using an integrated approach based on the analysis of topological co-expression network properties. In particular, the genome-wide transcriptional study indicated that the isohydric behavior relies upon the following responses: i) faster transcriptome response after stress imposition; ii) faster abscisic acid-related gene modulation; iii) more rapid expression of heat shock protein (HSP) genes; and iv) reversion of the gene-expression profile at rewatering. Conversely, reactive oxygen species (ROS)-scavenging enzymes, molecular chaperones and abiotic stress-related genes were induced earlier and more strongly in the anisohydric cultivar. Overall, the present work found original evidence of a molecular basis for the proposed classification between isohydric and anisohydric grapevine genotypes.

  13. Intrinsic subtypes from PAM50 gene expression assay in a population-based breast cancer cohort: Differences by age, race, and tumor characteristics

    PubMed Central

    Sweeney, Carol; Bernard, Philip S.; Factor, Rachel E.; Kwan, Marilyn L.; Habel, Laurel A.; Quesenberry, Charles P.; Shakespear, Kaylynn; Weltzien, Erin K.; Stijleman, Inge J.; Davis, Carole A.; Ebbert, Mark T.W.; Castillo, Adrienne; Kushi, Lawrence H.; Caan, Bette J.

    2014-01-01

    Background: Data are lacking to describe gene expression-based breast cancer intrinsic subtype patterns for population-based patient groups. Methods: We studied a diverse cohort of women with breast cancer from the Life After Cancer Epidemiology (LACE) and Pathways studies. RNA was extracted from 1 mm punches from fixed tumor tissue. Quantitative reverse-transcriptase polymerase chain reaction (RT-qPCR) was conducted for the 50 genes that comprise the PAM50 intrinsic subtype classifier. Results: In a subcohort of 1,319 women, the overall subtype distribution based on PAM50 was 53.1% Luminal A, 20.5% Luminal B, 13.0% HER2-enriched, 9.8% Basal-like, and 3.6% Normal-like. Among low-risk endocrine positive tumors (i.e. estrogen and progesterone receptor positive by immunohistochemistry, HER2 negative, and low histologic grade), only 76.5% were categorized as Luminal A by PAM50. Continuous-scale Luminal A, Luminal B, HER2-enriched, and Normal-like scores from PAM50 were mutually positively correlated; Basal-like score was inversely correlated with other subtypes. The proportion with non-Luminal A subtype decreased with older age at diagnosis, p trend < 0.0001. Compared with non-Hispanic whites, African-American women were more likely to have Basal-like tumors, age-adjusted odds ratio (OR) 4.4 (95% CI 2.3,8.4), whereas Asian and Pacific Islander women had reduced odds of Basal-like subtype, OR 0.5 (95% CI 0.3,0.9). Conclusions: Our data indicate that over 50% of breast cancers treated in the community have Luminal A subtype. Gene expression-based classification shifted some tumors categorized as low risk by surrogate clinicopathological criteria to higher-risk subtypes. Impact: Subtyping in a population-based cohort revealed distinct profiles by age and race. PMID:24521995

  14. Genome-Wide Analysis of NBS-LRR Genes in Sorghum Genome Revealed Several Events Contributing to NBS-LRR Gene Evolution in Grass Species

    PubMed Central

    Yang, Xiping; Wang, Jianping

    2016-01-01

    The nucleotide-binding site (NBS)–leucine-rich repeat (LRR) gene family is crucially important for offering resistance to pathogens. To explore evolutionary conservation and variability of NBS-LRR genes across grass species, we identified 88, 107, 24, and 44 full-length NBS-LRR genes in sorghum, rice, maize, and Brachypodium, respectively. A comprehensive analysis was performed on classification, genome organization, evolution, expression, and regulation of these NBS-LRR genes using sorghum as a representative of grass species. In general, the full-length NBS-LRR genes are highly clustered and duplicated in the sorghum genome, mainly due to local duplications. NBS-LRR genes have basal expression levels and are highly likely to be targeted by miRNA. The number of NBS-LRR genes in the four grass species is positively correlated with the gene clustering rate. The results provide a valuable genomic resource and insights for functional and evolutionary studies of NBS-LRR genes in grass species. PMID:26792976

  15. Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

    2016-03-01

    Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.

  16. [Evaluation of traditional pathological classification at molecular classification era for gastric cancer].

    PubMed

    Yu, Yingyan

    2014-01-01

    Histopathological classification holds a pivotal position in both basic research and the clinical diagnosis and treatment of gastric cancer. Currently, different classification systems are used in basic science and clinical application. In the medical literature, different classifications are used, including the Lauren and WHO systems, which has confused many researchers. The Lauren classification was proposed half a century ago but is still used worldwide. It has the advantages of being simple and easy to apply, and it carries prognostic significance. The WHO classification scheme is better than the Lauren classification in that it is continuously revised as understanding of gastric cancer progresses, and it is routinely used in clinical and pathological diagnosis in common scenarios. With progress in genomics, transcriptomics, proteomics and metabolomics research, molecular classification of gastric cancer has become a current hot topic. The traditional therapeutic approach based on phenotypic characteristics of gastric cancer will most likely be replaced by one based on gene variation. Gene-targeted therapy against the same molecular variation seems more reasonable than traditional chemical treatment based on the same morphological change.

  17. Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

    PubMed Central

    Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin

    2013-01-01

    One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of state-of-the-art learning machines, Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant the samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method, namely Kernel Matrix Gene Selection (KMGS), is proposed. The rationale of the method, which is rooted in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing t-like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS), which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110

  18. Divisional role of quantitative HER2 testing in breast cancer.

    PubMed

    Yamamoto-Ibusuki, Mutsuko; Yamamoto, Yutaka; Fu, Peifen; Yamamoto, Satoko; Fujiwara, Saori; Honda, Yumi; Iyama, Ken-ichi; Iwase, Hirotaka

    2015-03-01

    Human epidermal growth factor receptor 2 (HER2) is amplified in human breast cancers in which therapy targeted to HER2 significantly improves patient outcome. We re-visited the use of real-time quantitative polymerase chain reaction (qPCR)-based assays using formalin-fixed paraffin-embedded (FFPE) tissues as alternative methods and investigated their particular clinical relevance. DNA and RNA were isolated from FFPE specimens and HER2 status was assessed by qPCR in 249 consecutive patients with primary breast cancer. Concordance with results for immunohistochemistry (IHC) and in situ hybridization (ISH), clinical characteristics and survival was assessed. HER2 gene copy number had a stronger correlation with clinicopathological characteristics and excellent concordance with IHC/ISH results (Sensitivity: 96.7 %; concordance: 99.2 %). HER2 gene expression showed inadequate sensitivity, rendering it unsuitable to determine HER2 status (Sensitivity: 46.7 %; concordance: 92.1 %), but lower HER2 gene expression, leading to the classification of many cases as "false negative", contributed to a prediction of better prognosis within the HER2-amplified subpopulation. Quantitative HER2 assessments appear to have improved in accuracy over the past decade and can be a potential alternative for HER2 diagnosis in line with the in situ method, while HER2 gene expression levels could provide additional information regarding prognosis or therapeutic strategy within a HER2-amplified subpopulation.

  19. Divergence between motoneurons: gene expression profiling provides a molecular characterization of functionally discrete somatic and autonomic motoneurons

    PubMed Central

    Cui, Dapeng; Dougherty, Kimberly J.; Machacek, David W.; Sawchuk, Michael; Hochman, Shawn; Baro, Deborah J.

    2009-01-01

    Studies in the developing spinal cord suggest that different motoneuron (MN) cell types express very different genetic programs, but the degree to which adult programs differ is unknown. To compare genetic programs between adult MN columnar cell types, we used laser capture micro-dissection (LCM) and Affymetrix microarrays to create expression profiles for three columnar cell types: lateral and medial MNs from lumbar segments and sympathetic preganglionic motoneurons located in the thoracic intermediolateral nucleus. A comparison of the three expression profiles indicated that ~7% (813/11,552) of the genes showed significant differences in their expression levels. The largest differences were observed between sympathetic preganglionic MNs and the lateral motor column, with 6% (706/11,552) of the genes being differentially expressed. Significant differences in expression were observed for 1.8% (207/11,552) of the genes when comparing sympathetic preganglionic MNs with the medial motor column. Lateral and medial MNs showed the least divergence, with 1.3% (150/11,552) of the genes being differentially expressed. These data indicate that the amount of divergence in expression profiles between identified columnar MNs does not strictly correlate with divergence of function as defined by innervation patterns (somatic/muscle vs. autonomic/viscera). Classification of the differentially expressed genes with regard to function showed that they underpin all fundamental cell systems and processes, although most differentially expressed genes encode proteins involved in signal transduction. Mining the expression profiles to examine transcription factors essential for MN development suggested that many of the same transcription factors participate in combinatorial codes in embryonic and adult neurons, but patterns of expression change significantly. PMID:16317082
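
    As a quick arithmetic check, the percentages quoted in this abstract can be recomputed from the gene counts it reports. The short sketch below uses only the numbers given above; the dictionary labels are informal descriptions added for readability.

```python
TOTAL_GENES = 11_552  # genes on the array, as reported in the abstract

# Differentially expressed gene counts reported for each comparison.
counts = {
    "any comparison": 813,
    "sympathetic preganglionic vs. lateral": 706,
    "sympathetic preganglionic vs. medial": 207,
    "lateral vs. medial": 150,
}

# Percentage of all genes, rounded to one decimal place.
percentages = {label: round(100 * n / TOTAL_GENES, 1) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

    The recomputed values (7.0%, 6.1%, 1.8% and 1.3%) agree with the rounded figures in the abstract.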

  20. Analysis of gene expression profile induced by EMP-1 in esophageal cancer cells using cDNA Microarray

    PubMed Central

    Wang, Hai-Tao; Kong, Jian-Ping; Ding, Fang; Wang, Xiu-Qin; Wang, Ming-Rong; Liu, Lian-Xin; Wu, Min; Liu, Zhi-Hua

    2003-01-01

    AIM: To obtain human esophageal cancer EC9706 cells stably expressing epithelial membrane protein-1 (EMP-1) from an integrated eukaryotic plasmid harboring the open reading frame (ORF) of human EMP-1, and then to study the mechanism by which EMP-1 exerts its diverse cellular action on cell proliferation and altered gene profile by exploring the effect of EMP-1. METHODS: The authors first constructed a pcDNA3.1/myc-his expression vector harboring the ORF of EMP-1 and then transfected it into the human esophageal carcinoma cell line EC9706. The positive clones were analyzed by Western blot and RT-PCR. Moreover, the cell growth curve was observed and the cell cycle was checked by FACS. Using cDNA microarray technology, the authors compared the gene expression pattern in positive clones with that in controls. To confirm the gene expression profile, semi-quantitative RT-PCR was carried out for 4 randomly picked differentially expressed genes. The differentially expressed genes were classified according to their function and cellular component. RESULTS: The human EMP-1 gene can be stably expressed in the EC9706 cell line transfected with human EMP-1. The authors found that cell growth decreased, with S phase arrested and G1 phase prolonged, in the transfected positive clones. By cDNA microarray analysis, 35 genes showed an over 2.0 fold change in expression level after transfection, with 28 genes being consistently up-regulated and 7 genes being down-regulated. Among the classified genes, almost half of the induced genes (13 out of 28 genes) were related to cell signaling, cell communication and particularly to adhesion. CONCLUSION: Overexpression of the human EMP-1 gene can inhibit the proliferation of EC9706 cells, with S phase arrested and G1 phase prolonged. The cDNA microarray analysis suggested that EMP-1 may be one of the regulators involved in cell signaling, cell communication and adhesion. PMID:12632483
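
    The "over 2.0 fold change" criterion in this abstract can be sketched as a simple ratio test. The abstract does not state how ratios were computed or normalized, so the function below assumes a plain transfected-to-control intensity ratio with a symmetric cut-off; the gene names and intensities are hypothetical.

```python
def classify_fold_change(treated, control, threshold=2.0):
    """Label a gene 'up', 'down' or 'unchanged' from two positive intensities.

    Assumption: a gene counts as changed when its ratio exceeds `threshold`
    (up) or falls below 1/threshold (down).
    """
    ratio = treated / control
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"

# Hypothetical spot intensities: gene -> (transfected clone, control).
intensities = {
    "gene_1": (900.0, 300.0),  # ratio 3.0
    "gene_2": (120.0, 480.0),  # ratio 0.25
    "gene_3": (510.0, 500.0),  # ratio 1.02
}

calls = {gene: classify_fold_change(t, c) for gene, (t, c) in intensities.items()}
```

    Counting the "up" and "down" labels over all spots would give totals of the kind reported above (28 up-regulated and 7 down-regulated genes among the 35 changed genes).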

  1. Analysis of gene expression profile induced by EMP-1 in esophageal cancer cells using cDNA Microarray.

    PubMed

    Wang, Hai-Tao; Kong, Jian-Ping; Ding, Fang; Wang, Xiu-Qin; Wang, Ming-Rong; Liu, Lian-Xin; Wu, Min; Liu, Zhi-Hua

    2003-03-01

    To obtain human esophageal cancer EC9706 cells stably expressing epithelial membrane protein-1 (EMP-1) from an integrated eukaryotic plasmid harboring the open reading frame (ORF) of human EMP-1, and then to study the mechanism by which EMP-1 exerts its diverse cellular action on cell proliferation and altered gene profile by exploring the effect of EMP-1. The authors first constructed a pcDNA3.1/myc-his expression vector harboring the ORF of EMP-1 and then transfected it into the human esophageal carcinoma cell line EC9706. The positive clones were analyzed by Western blot and RT-PCR. Moreover, the cell growth curve was observed and the cell cycle was checked by FACS. Using cDNA microarray technology, the authors compared the gene expression pattern in positive clones with that in controls. To confirm the gene expression profile, semi-quantitative RT-PCR was carried out for 4 randomly picked differentially expressed genes. The differentially expressed genes were classified according to their function and cellular component. The human EMP-1 gene can be stably expressed in the EC9706 cell line transfected with human EMP-1. The authors found that cell growth decreased, with S phase arrested and G1 phase prolonged, in the transfected positive clones. By cDNA microarray analysis, 35 genes showed an over 2.0 fold change in expression level after transfection, with 28 genes being consistently up-regulated and 7 genes being down-regulated. Among the classified genes, almost half of the induced genes (13 out of 28 genes) were related to cell signaling, cell communication and particularly to adhesion. Overexpression of the human EMP-1 gene can inhibit the proliferation of EC9706 cells, with S phase arrested and G1 phase prolonged. The cDNA microarray analysis suggested that EMP-1 may be one of the regulators involved in cell signaling, cell communication and adhesion.

  2. Genome-wide identification, classification, and analysis of NADP-ME family members from 12 crucifer species.

    PubMed

    Tao, Peng; Guo, Weiling; Li, Biyuan; Wang, Wuhong; Yue, Zhichen; Lei, Juanli; Zhao, Yanting; Zhong, Xinmin

    2016-06-01

    NADP-dependent malic enzymes (NADP-MEs) play essential roles in both normal development and stress responses in plants. Here, genome-wide analysis was performed to identify 65 putative NADP-ME genes from 12 crucifer species. These NADP-ME genes were grouped into five categories of syntenic orthologous genes and were divided into three clades of a phylogenetic tree. Promoter motif analysis showed that NADP-ME1 genes in Group IV were more conserved with each other than the other NADP-ME genes in Groups I and II. A nucleotide motif involved in ABA responses, desiccation and seed development was found in the promoters of most NADP-ME1 genes. Generally, the NADP-ME genes of Brassica rapa, B. oleracea and B. napus had fewer introns than their corresponding Arabidopsis orthologs. In these three Brassica species, the NADP-ME genes derived from the least fractionated subgenome have lost fewer introns than those from the medium fractionated and most fractionated subgenomes. BrNADP-ME1 showed the highest expression in petals and mature embryos. Two paralogous NADP-ME2 genes (BrNADP-ME2a and BrNADP-ME2b) shared similar expression profiles and differential expression levels. BrNADP-ME3 showed down-regulation during embryogenesis and reached its lowest expression in early cotyledonary embryos. BrNADP-ME4 was expressed widely in multiple organs and showed high expression during the whole embryogenesis process. Different NADP-ME genes of B. rapa showed differential gene expression profiles in young leaves after ABA treatment or cold stress. Our genome-wide identification and characterization of NADP-ME genes extend our understanding of the evolution or function of this family in Brassicaceae.

  3. Exploratory biomarker analysis for treatment response in KRAS wild type metastatic colorectal cancer patients who received cetuximab plus irinotecan.

    PubMed

    Kim, Seung Tae; Ahn, Tae Jin; Lee, Eunjin; Do, In-Gu; Lee, Su Jin; Park, Se Hoon; Park, Joon Oh; Park, Young Suk; Lim, Ho Yeong; Kang, Won Ki; Kim, Suk Hyeong; Lee, Jeeyun; Kim, Hee Cheol

    2015-10-20

    More than half of the patients selected based on KRAS mutation status fail to respond to treatment with cetuximab in metastatic colorectal cancer (mCRC). We designed a study to identify additional biomarkers that could act as indicators for cetuximab treatment in mCRC. We investigated 58 tumor samples from wild type KRAS CRC patients treated with cetuximab plus irinotecan (CI). We genotyped the samples for mutations in either BRAF or PIK3CA and comprehensively profiled the expression of 522 kinase genes. BRAF mutation was detected in 5.1% (3/58) of patients. All 50 patients showed wild type PIK3CA. Gene expression patterns that categorized patients with or without disease control on CI were compared by supervised classification analysis. PSKH1, TLK2 and PHKG2 were significantly overexpressed in patients with disease control on CI. Higher expression of PSKH1 (r = 0.462, p < 0.001) and TLK2 (r = 0.361, p = 0.005) correlated significantly with prolonged PFS. The results of this work demonstrated that the expression of kinase genes such as PSKH1, TLK2 and PHKG2 may be informative for predicting the efficacy of CI in wild type KRAS CRC. Mutations in either BRAF or PIK3CA were rare in wild type KRAS CRC.

  4. De novo Transcriptome Assembly of Chinese Kale and Global Expression Analysis of Genes Involved in Glucosinolate Metabolism in Multiple Tissues

    PubMed Central

    Wu, Shuanghua; Lei, Jianjun; Chen, Guoju; Chen, Hancai; Cao, Bihao; Chen, Changming

    2017-01-01

    Chinese kale, a vegetable of the cruciferous family, is a popular crop in southern China and Southeast Asia due to its high glucosinolate content and nutritional qualities. However, there is little research on the molecular genetics and genes involved in glucosinolate metabolism and its regulation in Chinese kale. In this study, we sequenced and characterized the transcriptomes and expression profiles of genes expressed in 11 tissues of Chinese kale. A total of 216 million 150-bp clean reads were generated using RNA-sequencing technology. From the sequences, 98,180 unigenes were assembled for the whole plant, and 49,582 to 98,423 unigenes were assembled for each tissue. Blast analysis indicated that a total of 80,688 (82.18%) unigenes exhibited similarity to known proteins. The functional annotation and classification tools used in this study suggested that genes principally expressed in Chinese kale were mostly involved in fundamental processes, such as cellular and molecular functions, signal transduction, and biosynthesis of secondary metabolites. The expression levels of all unigenes were analyzed in various tissues of Chinese kale. A large number of candidate genes involved in glucosinolate metabolism and its regulation were identified, and the expression patterns of these genes were analyzed. We found that most of the genes involved in glucosinolate biosynthesis were highly expressed in the root, petiole, and senescent leaves. The expression patterns of ten glucosinolate biosynthetic genes from RNA-seq were validated by quantitative RT-PCR in different tissues. These results provided an initial and global overview of Chinese kale gene functions and expression activities in different tissues. PMID:28228764

  5. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm.

    PubMed

    Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C

    2005-10-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm when it is implemented with Gaussian kernel SVMs: instead of the common practice of picking the apparently best parameters, a genetic algorithm is used to search for an optimal pair of parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.

  6. High levels of PROM1 (CD133) transcript are a potential predictor of poor prognosis in medulloblastoma

    PubMed Central

    Raso, Alessandro; Mascelli, Samantha; Biassoni, Roberto; Nozza, Paolo; Kool, Marcel; Pistorio, Angela; Ugolotti, Elisabetta; Milanaccio, Claudia; Pignatelli, Sara; Ferraro, Manuela; Pavanello, Marco; Ravegnani, Marcello; Cama, Armando; Garrè, Maria Luisa; Capra, Valeria

    2011-01-01

    The surface marker PROM1 is considered one of the most important markers of tumor-initiating cells, and its expression is believed to be an adverse prognostic factor in gliomas and in other malignancies. To date, to our knowledge, no specific studies of its expression in medulloblastoma series have been performed. The aims of our study were to evaluate the expression profile of the PROM1 gene in medulloblastoma and to assess its possible role as a prognostic factor. The PROM1 gene expression was evaluated by quantitative polymerase chain reaction on 45 medulloblastoma samples by using specific dye-labeled probe systems. A significantly higher expression of PROM1 was found both in patients with poorer prognosis (P = .007) and in those with metastasis (P = .03). Kaplan–Meier analysis showed that both overall survival (OS) and progression-free survival (PFS) were shorter in patients with higher PROM1 mRNA levels than in patients with lower expression, even when the desmoplastic cases were excluded (P = .0004 and P = .002, for OS and PFS for all cases, respectively; P = .002 and P = .008 for OS and PFS for nondesmoplastic cases, respectively). Cox regression model demonstrated that PROM1 expression is an independent prognostic factor (hazard ratio, 4.56; P = .008). The result was validated on an independent cohort of 42 cases by microarray-based analysis (P = .019). This work suggests that high mRNA levels of PROM1 are associated with poor outcome in pediatric medulloblastoma. Furthermore, high PROM1 expression levels seem to increase the likelihood of metastases. Such results need to be confirmed in larger prospective series to possibly incorporate PROM1 gene expression into risk classification systems to be used in the clinical setting. PMID:21486962

  7. Gene expression during different periods of the handling-stress response in Pampus argenteus

    NASA Astrophysics Data System (ADS)

    Sun, Peng; Tang, Baojun; Yin, Fei

    2017-11-01

    Common aquaculture practices subject fish to a variety of acute and chronic stressors. Such stressors are inherent in aquaculture production but can adversely affect survival, growth, immune response, reproductive capacity, and behavior. Understanding the biological mechanisms underlying stress responses helps with methods to alleviate the negative effects through better aquaculture practices, resulting in improved animal welfare and production efficiency. In the present study, transcriptome sequencing of liver and kidney was performed in silver pomfret (Pampus argenteus) subjected to handling stress versus controls. A total of 162.19 million clean reads were assembled to 30 339 unigenes. The quality of the assembly was high, with an N50 length of 2 472 bases. For function classification and pathway assignment, the unigenes were categorized into three GO (gene ontology) categories, twenty-six clusters of eggNOG (evolutionary genealogy of genes: non-supervised orthologous groups) function categories, and thirty-eight KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. Stress affected different functional groups of genes in the tissues studied. Differentially expressed genes were mainly involved in metabolic pathways (carbohydrate metabolism, lipid metabolism, amino-acid metabolism, uptake of cofactors and vitamins, and biosynthesis of other secondary metabolites), environmental information processing (signaling molecules and their interactions), organismal systems (endocrine system, digestive system), and disease (immune, neurodegenerative, endocrine and metabolic diseases). This is the first reported analysis of genome-wide transcriptome in P. argenteus, and the findings expand our understanding of the silver pomfret genome and gene expression in association with stress. The results will be useful to future analyses of functional genes and studies of healthy artificial breeding in P. argenteus and other related fish species.
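
    The N50 statistic quoted above as an assembly quality measure can be illustrated with a minimal sketch. It assumes the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases); the function name and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths, not data from the abstract.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # 8
```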

  8. Multicenter validation of the diagnostic accuracy of a blood-based gene expression test for assessing obstructive coronary artery disease in nondiabetic patients.

    PubMed

    Rosenberg, Steven; Elashoff, Michael R; Beineke, Philip; Daniels, Susan E; Wingrove, James A; Tingley, Whittemore G; Sager, Philip T; Sehnert, Amy J; Yau, May; Kraus, William E; Newby, L Kristin; Schwartz, Robert S; Voros, Szilard; Ellis, Stephen G; Tahirkheli, Naeem; Waksman, Ron; McPherson, John; Lansky, Alexandra; Winn, Mary E; Schork, Nicholas J; Topol, Eric J

    2010-10-05

    Diagnosing obstructive coronary artery disease (CAD) in at-risk patients can be challenging and typically requires both noninvasive imaging methods and coronary angiography, the gold standard. Previous studies have suggested that peripheral blood gene expression can indicate the presence of CAD. The objective was to validate a previously developed 23-gene, expression-based classification test for diagnosis of obstructive CAD in nondiabetic patients. The study was a multicenter prospective trial with blood samples obtained before coronary angiography (ClinicalTrials.gov registration number: NCT00500617), set in 39 centers in the United States. The patients were an independent validation cohort of 526 nondiabetic patients with a clinical indication for coronary angiography. Measurements were receiver-operating characteristic (ROC) analysis of classifier score measured by real-time polymerase chain reaction, additivity to clinical factors, and reclassification of patient disease likelihood versus disease status defined by quantitative coronary angiography. Obstructive CAD was defined as 50% or greater stenosis in 1 or more major coronary arteries by quantitative coronary angiography. The area under the ROC curve (AUC) was 0.70 ± 0.02 (P < 0.001); the test added to clinical variables (Diamond-Forrester method) (AUC, 0.72 with the test vs. 0.66 without; P = 0.003) and added somewhat to an expanded clinical model (AUC, 0.745 with the test vs. 0.732 without; P = 0.089). The test improved net reclassification over both the Diamond-Forrester method and the expanded clinical model (P < 0.001). At a score threshold that corresponded to a 20% likelihood of obstructive CAD (14.75), the sensitivity and specificity were 85% and 43% (yielding a negative predictive value of 83% and a positive predictive value of 46%), with 33% of patient scores below this threshold. As a limitation, patients with chronic inflammatory disorders, elevated levels of leukocytes or cardiac protein markers, or diabetes were excluded. A noninvasive whole-blood test based on gene expression and demographic characteristics may be useful for assessing obstructive CAD in nondiabetic patients without known CAD. Funding source: CardioDx.
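
    The relation between the reported sensitivity, specificity and predictive values can be checked with a short worked sketch. The abstract does not state the prevalence of obstructive CAD in the cohort; the value 0.364 used below is an assumption, back-calculated so that the standard formulas reproduce the reported figures. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, fraction of patients scoring below the threshold)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    below_threshold = true_neg + false_neg  # all patients with a negative test
    return ppv, npv, below_threshold


# ASSUMPTION: prevalence 0.364 is not given in the abstract; it is
# back-calculated from the reported sensitivity, specificity and PPV.
ppv, npv, below = predictive_values(0.85, 0.43, 0.364)
print(f"PPV={ppv:.2f} NPV={npv:.2f} below-threshold={below:.2f}")
# PPV=0.46 NPV=0.83 below-threshold=0.33
```

    Under that assumed prevalence, the sketch is consistent with the reported 46% positive predictive value, 83% negative predictive value, and 33% of scores below the threshold.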

  9. Gene expression profiles of metabolic aggressiveness and tumor recurrence in benign meningioma.

    PubMed

    Serna, Eva; Morales, José Manuel; Mata, Manuel; Gonzalez-Darder, José; San Miguel, Teresa; Gil-Benso, Rosario; Lopez-Gines, Concha; Cerda-Nicolas, Miguel; Monleon, Daniel

    2013-01-01

    Around 20% of histologically benign meningiomas may be clinically aggressive and recur. This strongly affects the management of meningioma patients. There is a need to evaluate the potential aggressiveness of an individual meningioma. Additional criteria for better classification of meningiomas will improve clinical decisions as well as patient follow-up strategy after surgery. The aim of this study was to determine the relationship between gene expression profiles and new metabolic subgroups of benign meningioma with potential clinical relevance. Forty benign and fourteen atypical meningioma tissue samples were included in the study. We obtained metabolic profiles by NMR and information on recurrence after surgery for all of them. We measured gene expression by oligonucleotide microarray on 19 of them. To our knowledge, this is the first time that distinct gene expression profiles are reported for benign meningioma molecular subgroups with clinical correlation. Our results show that metabolic aggressiveness in otherwise histologically benign meningioma proceeds mostly through alterations in the expression of genes involved in the regulation of transcription, mainly the LMO3 gene. Genes involved in tumor metabolism, like IGF1R, are also differentially expressed in those meningioma subgroups with higher rates of membrane turnover, higher energy demand and increased resistance to apoptosis. These new subgroups of benign meningiomas exhibit different rates of recurrence. This work shows that benign meningiomas with metabolic aggressiveness constitute a subgroup of potentially recurrent tumors in which alterations in genes regulating critical features of aggressiveness, like increased angiogenesis or cell invasion, are not yet predominant. The determination of these gene expression biosignatures may allow the early detection of clinically aggressive tumors.

  10. A methodology to migrate the gene ontology to a description logic environment using DAML+OIL.

    PubMed

    Wroe, C J; Stevens, R; Goble, C A; Ashburner, M

    2003-01-01

    The Gene Ontology Next Generation Project (GONG) is developing a staged methodology to evolve the current representation of the Gene Ontology into DAML+OIL in order to take advantage of the richer formal expressiveness and the reasoning capabilities of the underlying description logic. Each stage provides a step level increase in formal explicit semantic content with a view to supporting validation, extension and multiple classification of the Gene Ontology. The paper introduces DAML+OIL and demonstrates the activity within each stage of the methodology and the functionality gained.

  11. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    PubMed

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and is used to select the informative genes from the thousands of genes on a microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. The SI techniques particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection, which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied to 10 different benchmark datasets and examined with SI techniques. The experimental results show that the k-NN classifier with SFLLF feature selection outperforms PSO, CS, and SFL.
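
    The ranking step that precedes the swarm search can be sketched as follows. The abstract does not give its SNR formula; this sketch assumes the commonly used two-class definition, (mean_a - mean_b) / (sd_a + sd_b), and ranks genes by its absolute value. The function names and the toy expression values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Two-class signal-to-noise ratio of one gene (assumed definition)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def top_m_genes(expression, m):
    """Rank genes by |SNR| and return the names of the m best-ranked genes.

    `expression` maps a gene name to a pair of lists holding the expression
    values of that gene in class A and class B samples.
    """
    scores = {
        gene: abs(signal_to_noise(a, b)) for gene, (a, b) in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:m]


# Hypothetical toy data: geneA and geneC separate the classes, geneB does not.
toy = {
    "geneA": ([5.0, 5.2, 4.9], [1.0, 1.1, 0.9]),
    "geneB": ([2.0, 3.0, 1.0], [2.1, 2.9, 1.2]),
    "geneC": ([3.0, 3.5, 2.5], [1.0, 1.5, 0.5]),
}
print(top_m_genes(toy, 2))  # ['geneA', 'geneC']
```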

  12. The human disease network in terms of dysfunctional regulatory mechanisms.

    PubMed

    Yang, Jing; Wu, Su-Juan; Dai, Wen-Tao; Li, Yi-Xue; Li, Yuan-Yuan

    2015-10-08

    Elucidation of human disease similarities has emerged as an active research area, which is highly relevant to etiology, disease classification, and drug repositioning. In pioneer studies, disease similarity was commonly estimated according to clinical manifestation. Subsequently, scientists started to investigate disease similarity based on gene-phenotype knowledge, which was inevitably biased toward well-studied diseases. In recent years, estimating disease similarity according to transcriptomic behavior has significantly enhanced the probability of finding novel disease relationships, while the currently available studies usually mine expression data through differential expression analysis, which has been considered to have little chance of unraveling dysfunctional regulatory relationships, the causal pathogenesis of diseases. We developed a computational approach to measure human disease similarity based on expression data. Differential coexpression analysis, instead of differential expression analysis, was employed to calculate the differential coexpression level of every gene for each disease, which was then summarized to the pathway level. Disease similarity was eventually calculated as the partial correlation coefficients of pathways' differential coexpression values between any two diseases. The significance of disease relationships was evaluated by permutation test. Based on mRNA expression data and a differential coexpression analysis based method, we built a human disease network involving 1326 significant Disease-Disease links among 108 diseases. Compared with disease relationships captured by a differential expression analysis based method, our disease links shared known disease genes and drugs more significantly. Some novel disease relationships were discovered, for example, Obesity and cancer, Obesity and Psoriasis, lung adenocarcinoma and S. pneumonia, which had been commonly regarded as unrelated to each other, but recently found to share similar molecular mechanisms. Additionally, it was found that both the type of disease and the type of affected tissue influenced the degree of disease similarity. A sub-network including Allergic asthma, Type 2 diabetes and Chronic kidney disease was extracted to demonstrate the exploration of their common pathogenesis. The present study produces a global view of human diseasome for the first time from the viewpoint of regulation mechanisms, which therefore could provide insightful clues to etiology and pathogenesis, and help to perform drug repositioning and design novel therapeutic interventions.
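
    The permutation test mentioned above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: the study uses partial correlation coefficients of pathway-level differential coexpression values, whereas this sketch substitutes a plain Pearson correlation between two hypothetical vectors of pathway scores. All names, the number of permutations and the toy values are illustrative only.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(x, y, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical pathway-level scores for two diseases (not study data).
disease_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
disease_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9, 9.1, 9.8]
print(permutation_p_value(disease_a, disease_b))
```

    A small p-value here means that a correlation at least as strong as the observed one rarely arises when the pairing between the two vectors is shuffled.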

  13. Expression signature as a biomarker for prenatal diagnosis of trisomy 21.

    PubMed

    Volk, Marija; Maver, Aleš; Lovrečić, Luca; Juvan, Peter; Peterlin, Borut

    2013-01-01

    A universal biomarker panel with the potential to predict high-risk pregnancies or adverse pregnancy outcome does not exist. Transcriptome analysis is a powerful tool to capture differentially expressed genes (DEG), which can be used as a biomarker-diagnostic-predictive tool for various conditions in the prenatal setting. In search of a biomarker set for predicting high-risk pregnancies, we performed global expression profiling to find DEG in Ts21. Subsequently, we performed targeted validation and diagnostic performance evaluation on a larger group of case and control samples. Initially, transcriptomic profiles of 10 cultivated amniocyte samples with Ts21 and 9 with normal euploid constitution were determined using expression microarrays. Datasets from Ts21 transcriptomic studies from the GEO repository were incorporated. DEG were discovered using linear regression modelling and validated using RT-PCR quantification on an independent sample of 16 cases with Ts21 and 32 controls. Classification of Ts21 status based on expression profiling was performed using a supervised machine learning algorithm and evaluated using a leave-one-out cross validation approach. Global gene expression profiling revealed significant expression changes between normal and Ts21 samples, which, in combination with data from previously performed Ts21 transcriptomic studies, were used to generate a multi-gene biomarker for Ts21, comprising 9 gene expression profiles. In addition to the biomarker's high performance in discriminating samples from global expression profiling, we were also able to show its discriminatory performance on the larger sample set 2, validated by RT-PCR (AUC=0.97), while its performance on data from previously published studies reached discriminatory AUC values of 1.00. Our results show that transcriptomic changes might potentially be used to discriminate trisomy of chromosome 21 in the prenatal setting. As expressional alterations reflect both causal and reactive cellular mechanisms, transcriptomic changes may thus have future potential in the diagnosis of a wide array of heterogeneous diseases that result from genetic disturbances.
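
    Leave-one-out cross validation, the evaluation scheme named above, can be sketched in a few lines. The abstract does not say which supervised algorithm was used, so a nearest-centroid classifier stands in here purely as an assumption; the function names and the toy two-gene profiles are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid lies closest to `sample`."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        # Column-wise mean of all training profiles with this label.
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab], sample))


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical, well-separated two-gene expression profiles (not study data).
toy_features = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0],
                [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
toy_labels = ["euploid", "euploid", "euploid", "Ts21", "Ts21", "Ts21"]
print(leave_one_out_accuracy(toy_features, toy_labels))  # 1.0
```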

  14. Molecular profiling identifies prognostic markers of stage IA lung adenocarcinoma.

    PubMed

    Zhang, Jie; Shao, Jinchen; Zhu, Lei; Zhao, Ruiying; Xing, Jie; Wang, Jun; Guo, Xiaohui; Tu, Shichun; Han, Baohui; Yu, Keke

    2017-09-26

    We previously showed that different pathologic subtypes were associated with different prognostic values in patients with stage IA lung adenocarcinoma (AC). We hypothesize that differential gene expression profiles of different subtypes may be valuable factors for prognosis in stage IA lung adenocarcinoma. We performed microarray gene expression profiling on tumor tissues micro-dissected from patients with acinar and solid predominant subtypes of stage IA lung adenocarcinoma. These patients had undergone a lobectomy and mediastinal lymph node dissection at the Shanghai Chest Hospital, Shanghai, China in 2012. No patient had preoperative treatment. We performed the Gene Set Enrichment Analysis (GSEA) analysis to look for gene expression signatures associated with tumor subtypes. The histologic subtypes of all patients were classified according to the 2015 WHO lung Adenocarcinoma classification. We found that patients with the solid predominant subtype are enriched for genes involved in RNA polymerase activity as well as inactivation of the p53 pathway. Further, we identified a list of genes that may serve as prognostic markers for stage IA lung adenocarcinoma. Validation in the TCGA database shows that these genes are correlated with survival, suggesting that they are novel prognostic factors for stage IA lung adenocarcinoma. In conclusion, we have uncovered novel prognostic factors for stage IA lung adenocarcinoma using gene expression profiling in combination with histopathology subtyping.

  15. Optimal number of features as a function of sample size for various classification rules.

    PubMed

    Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R

    2005-04-15

    Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. For the companion website, please visit http://public.tgen.org/tamu/ofs/. Contact: e-dougherty@ee.tamu.edu.

  16. Molecular Cloning and Characterization of G Alpha Proteins from the Western Tarnished Plant Bug, Lygus hesperus

    PubMed Central

    Hull, J. Joe; Wang, Meixian

    2014-01-01

    The Gα subunits of heterotrimeric G proteins play critical roles in the activation of diverse signal transduction cascades. However, the role of these genes in chemosensation remains to be fully elucidated. To initiate a comprehensive survey of signal transduction genes, we used homology-based cloning methods and transcriptome data mining to identify Gα subunits in the western tarnished plant bug (Lygus hesperus Knight). Among the nine sequences identified were single variants of the Gαi, Gαo, Gαs, and Gα12 subfamilies and five alternative splice variants of the Gαq subfamily. Sequence alignment and phylogenetic analyses of the putative L. hesperus Gα subunits support initial classifications and are consistent with established evolutionary relationships. End-point PCR-based profiling of the transcripts indicated head-specific expression for LhGαq4, and largely ubiquitous expression, albeit at varying levels, for the other LhGα transcripts. All subfamilies were amplified from L. hesperus chemosensory tissues, suggesting potential roles in olfaction and/or gustation. Immunohistochemical staining of cultured insect cells transiently expressing recombinant His-tagged LhGαi, LhGαs, and LhGαq1 revealed plasma membrane targeting, suggesting the respective sequences encode functional G protein subunits. PMID:26463065

  17. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
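
    The position specific scoring matrix on which the PLS procedure relies can be sketched as below. This covers only the PSSM step, not the PLS classification itself, and it rests on assumptions not stated in the abstract: log-odds scores in base 2, a pseudocount of 1, and a uniform background of 0.25 per nucleotide. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def build_pssm(sequences, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")
    total = len(sequences) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        # Smoothed frequency of each base, compared with the background.
        pssm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in alphabet
        })
    return pssm


def score(pssm, sequence):
    """Sum the per-position log-odds scores of `sequence`."""
    return sum(column[base] for column, base in zip(pssm, sequence))


# Hypothetical aligned upstream fragments (not data from the study).
toy_upstream = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
pssm = build_pssm(toy_upstream)
print(score(pssm, "TATAAT") > score(pssm, "GCGCGC"))  # True
```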

  18. Genomewide analysis of TCP transcription factor gene family in Malus domestica.

    PubMed

    Xu, Ruirui; Sun, Peng; Jia, Fengjuan; Lu, Longtao; Li, Yuanyuan; Zhang, Shizhong; Huang, Jinguang

    2014-12-01

    Teosinte branched 1/cycloidea/proliferating cell factor 1 (TCP) proteins are a large family of transcriptional regulators in angiosperms. They are involved in various biological processes, including development and plant metabolism pathways. In this study, a total of 52 TCP genes were identified in the apple (Malus domestica) genome. Bioinformatic methods were employed to predict and analyse the gene classification, gene structure, chromosome location, sequence alignment and conserved domains of MdTCP proteins. Expression analysis from microarray data showed that the expression levels of 28 and 51 MdTCP genes changed during the ripening and rootstock-scion interaction processes, respectively. The expression patterns of 12 selected MdTCP genes were analysed in different tissues and in response to abiotic stresses. All of the selected genes were detected in at least one of the tissues tested, and most of them were modulated by adverse treatments, indicating that the MdTCPs were involved in various developmental and physiological processes. To the best of our knowledge, this is the first genomewide analysis of the apple TCP gene family. These results provide valuable information for studies on functions of the TCP transcription factor genes in apple.

  19. Behçet's: A Disease or a Syndrome? Answer from an Expression Profiling Study

    PubMed Central

    Oğuz, Ali Kemal; Yılmaz, Seda Taşır; Oygür, Çağdaş Şahap; Çandar, Tuba; Sayın, Irmak; Kılıçoğlu, Sibel Serin; Ergün, İhsan; Ateş, Aşkın; Özdağ, Hilal; Akar, Nejat

    2016-01-01

    Behçet’s disease (BD) is a chronic, relapsing, multisystemic inflammatory disorder with unanswered questions regarding its etiology/pathogenesis and classification. Distinct manifestation based subsets, pronounced geographical variations in expression, and discrepant immunological abnormalities raised the question whether Behçet’s is “a disease or a syndrome”. To answer the preceding question we aimed to display and compare the molecular mechanisms underlying distinct subsets of BD. For this purpose, the expression data of the gene expression profiling and association study on BD by Xavier et al (2013) was retrieved from GEO database and reanalysed by gene expression data analysis/visualization and bioinformatics enrichment tools. There were 15 BD patients (B) and 14 controls (C). Three subsets of BD patients were generated: MB (isolated mucocutaneous manifestations, n = 7), OB (ocular involvement, n = 4), and VB (large vein thrombosis, n = 4). Class comparison analyses yielded the following numbers of differentially expressed genes (DEGs); B vs C: 4, MB vs C: 5, OB vs C: 151, VB vs C: 274, MB vs OB: 215, MB vs VB: 760, OB vs VB: 984. Venn diagram analysis showed that there were no common DEGs in the intersection “MB vs C” ∩ “OB vs C” ∩ “VB vs C”. Cluster analyses successfully clustered distinct expressions of BD. During gene ontology term enrichment analyses, categories with relevance to IL-8 production (MB vs C) and immune response to microorganisms (OB vs C) were differentially enriched. Distinct subsets of BD display distinct expression profiles and different disease associated pathways. Based on these clear discrepancies, the designation as “Behçet’s syndrome” (BS) should be encouraged and future research should take into consideration the immunogenetic heterogeneity of BS subsets. 
    Four gene groups, namely, negative regulators of inflammation (CD69, CLEC12A, CLEC12B, TNFAIP3), neutrophil granule proteins (LTF, OLFM4, AZU1, MMP8, DEFA4, CAMP), antigen processing and presentation proteins (CTSS, ERAP1), and regulators of immune response (LGALS2, BCL10, ITCH, CEACAM8, CD36, IL8, CCL4, EREG, NFKBIZ, CCR2, CD180, KLRC4, NFAT5) appear to be instrumental in BS immunopathogenesis. PMID:26890122
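    The Venn diagram step described in this record, which checks the intersection “MB vs C” ∩ “OB vs C” ∩ “VB vs C” for shared DEGs, amounts to a set intersection. A minimal sketch follows; the function name `common_degs` and the placeholder gene identifiers are hypothetical and are not results from the study.

```python
def common_degs(*deg_lists):
    """Return the genes present in every differentially-expressed-gene list."""
    return set.intersection(*(set(lst) for lst in deg_lists))


# Placeholder identifiers only (hypothetical, not study data).
mb_vs_c = ["geneA", "geneB"]
ob_vs_c = ["geneB", "geneC"]
vb_vs_c = ["geneD", "geneE"]

shared = common_degs(mb_vs_c, ob_vs_c, vb_vs_c)
print(sorted(shared))  # an empty list means no DEG is common to all three contrasts
```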

  20. Bacterial reference genes for gene expression studies by RT-qPCR: survey and analysis.

    PubMed

    Rocha, Danilo J P; Santos, Carolina S; Pacheco, Luis G C

    2015-09-01

    The appropriate choice of reference genes is essential for accurate normalization of gene expression data obtained by the method of reverse transcription quantitative real-time PCR (RT-qPCR). In 2009, a guideline called the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) highlighted the importance of the selection and validation of more than one suitable reference gene for obtaining reliable RT-qPCR results. Herein, we searched the recent literature in order to identify the bacterial reference genes that have been most commonly validated in gene expression studies by RT-qPCR (in the first 5 years following publication of the MIQE guidelines). Through a combination of different search parameters with the text mining tool MedlineRanker, we identified 145 unique bacterial genes that were recently tested as candidate reference genes. Of these, 45 genes were experimentally validated and, in most of the cases, their expression stabilities were verified using the software tools geNorm and NormFinder. It is noteworthy that only 10 of these reference genes had been validated in two or more of the studies evaluated. An enrichment analysis using Gene Ontology classifications demonstrated that genes belonging to the functional categories of DNA Replication (GO: 0006260) and Transcription (GO: 0006351) rendered a proportionally higher number of validated reference genes. Three genes in the former functional class were also among the top five most stable genes identified through an analysis of gene expression data obtained from the Pathosystems Resource Integration Center. These results may provide a guideline for the initial selection of candidate reference genes for RT-qPCR studies in several different bacterial species.
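    The abstract reports that expression stability was mostly checked with geNorm and NormFinder. As a rough illustration of what a geNorm-style stability value measures, the sketch below computes M for each candidate gene as the mean, over all other candidates, of the standard deviation of their log2 expression ratios across samples; a lower M indicates a more stable gene. The function name `genorm_m`, the input layout, the use of the sample standard deviation, and the toy values are assumptions for illustration, not data or code from the study or from the geNorm software.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Compute a geNorm-style stability value M for each candidate gene.

    expr maps gene name -> list of positive relative quantities (linear scale),
    with all lists in the same sample order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Pairwise variation: SD of the log2 ratio of the two genes across samples.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


# Toy values (hypothetical): A and B keep a constant ratio, C does not track them.
toy = {
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [2.0, 4.0, 8.0, 16.0],
    "C": [1.0, 1.0, 1.0, 1.0],
}
m = genorm_m(toy)
print(sorted(m, key=m.get))  # genes ordered from most to least stable
```

    In this toy case C receives the largest M because its ratio to the other two candidates varies most across samples.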

  1. Gene Set−Based Integrative Analysis Revealing Two Distinct Functional Regulation Patterns in Four Common Subtypes of Epithelial Ovarian Cancer

    PubMed Central

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang

    2016-01-01

    Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis. PMID:27527159

  2. Design and evaluation of Actichip, a thematic microarray for the study of the actin cytoskeleton

    PubMed Central

    Muller, Jean; Mehlen, André; Vetter, Guillaume; Yatskou, Mikalai; Muller, Arnaud; Chalmel, Frédéric; Poch, Olivier; Friederich, Evelyne; Vallar, Laurent

    2007-01-01

    Background The actin cytoskeleton plays a crucial role in supporting and regulating numerous cellular processes. Mutations or alterations in the expression levels affecting the actin cytoskeleton system or related regulatory mechanisms are often associated with complex diseases such as cancer. Understanding how qualitative or quantitative changes in expression of the set of actin cytoskeleton genes are integrated to control actin dynamics and organisation is currently a challenge and should provide insights in identifying potential targets for drug discovery. Here we report the development of a dedicated microarray, the Actichip, containing 60-mer oligonucleotide probes for 327 genes selected for transcriptome analysis of the human actin cytoskeleton. Results Genomic data and sequence analysis features were retrieved from GenBank and stored in an integrative database called Actinome. From these data, probes were designed using a home-made program (CADO4MI) allowing sequence refinement and improved probe specificity by combining the complementary information recovered from the UniGene and RefSeq databases. Actichip performance was analysed by hybridisation with RNAs extracted from epithelial MCF-7 cells and human skeletal muscle. Using thoroughly standardised procedures, we obtained microarray images with excellent quality resulting in high data reproducibility. Actichip displayed a large dynamic range extending over three logs with a limit of sensitivity between one and ten copies of transcript per cell. The array allowed accurate detection of small changes in gene expression and reliable classification of samples based on the expression profiles of tissue-specific genes. When compared to two other oligonucleotide microarray platforms, Actichip showed similar sensitivity and concordant expression ratios. Moreover, Actichip was able to discriminate the highly similar actin isoforms whereas the two other platforms did not. 
    Conclusion Our data demonstrate that Actichip is a powerful alternative to commercial high density microarrays for cytoskeleton gene profiling in normal or pathological samples. Actichip is available upon request. PMID:17727702

  3. Genome-wide identification and expression analysis of the ClTCP transcription factors in Citrullus lanatus.

    PubMed

    Shi, Pibiao; Guy, Kateta Malangisha; Wu, Weifang; Fang, Bingsheng; Yang, Jinghua; Zhang, Mingfang; Hu, Zhongyuan

    2016-04-12

    The plant-specific TCP transcription factor family, which is involved in the regulation of cell growth and proliferation, performs diverse functions in multiple aspects of plant growth and development. However, no comprehensive analysis of the TCP family in watermelon (Citrullus lanatus) has been undertaken previously. A total of 27 watermelon TCP encoding genes distributed on nine chromosomes were identified. Phylogenetic analysis clustered the genes into 11 distinct subgroups. Furthermore, phylogenetic and structural analyses distinguished two homology classes within the ClTCP family, designated Class I and Class II. The Class II genes were differentiated into two subclasses, the CIN subclass and the CYC/TB1 subclass. The expression patterns of all members were determined by semi-quantitative PCR. The functions of two ClTCP genes, ClTCP14a and ClTCP15, in regulating plant height were confirmed by ectopic expression in Arabidopsis wild-type and ortholog mutants. This study represents the first genome-wide analysis of the watermelon TCP gene family, which provides valuable information for understanding the classification and functions of the TCP genes in watermelon.

  4. Prognostic stratification improvement by integrating ID1/ID3/IGJ gene expression signature and immunophenotypic profile in adult patients with B-ALL.

    PubMed

    Cruz-Rodriguez, Nataly; Combita, Alba L; Enciso, Leonardo J; Raney, Lauren F; Pinzon, Paula L; Lozano, Olga C; Campos, Alba M; Peñaloza, Niyireth; Solano, Julio; Herrera, Maria V; Zabaleta, Jovanny; Quijano, Sandra

    2017-02-28

    Survival of adults with B-Acute Lymphoblastic Leukemia requires accurate risk stratification of patients in order to provide the appropriate therapy. Contemporary techniques using clinical and cytogenetic variables are incomplete for prognosis prediction. To improve the classification of adult patients diagnosed with B-ALL into prognosis groups, two strategies were examined and combined: the expression of the ID1/ID3/IGJ gene signature by RT-PCR and the immunophenotypic profile of 19 markers proposed in the EuroFlow protocol by Flow Cytometry in bone marrow samples. Both techniques were correlated to stratify patients into prognostic groups. An inverse relationship between survival and expression of the three-gene signature was observed and an immunophenotypic profile associated with clinical outcome was identified. Markers CD10 and CD20 were correlated with simultaneous overexpression of ID1, ID3 and IGJ. Patients with simultaneous expression of the poor prognosis gene signature and overexpression of CD10 or CD20 had worse Event Free Survival and Overall Survival than patients who had either the poor prognosis gene expression signature or only CD20 or CD10 overexpressed. By utilizing the combined evaluation of these two immunophenotypic markers along with the poor prognosis gene expression signature, the risk stratification can be significantly strengthened. Further studies including a large number of patients are needed to confirm these findings.

  5. Changes in Gene Expression Predicting Local Control in Cervical Cancer: Results from Radiation Therapy Oncology Group 0128

    PubMed Central

    Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David

    2009-01-01

    Purpose To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. Two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate are 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
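    The validation scheme in this record (a two-group t test to pick genes, followed by leave-one-out cross-validation) can be sketched as below. This is a minimal illustration under stated assumptions: the nearest-centroid classifier, the Welch form of the t statistic, the function names, and the toy matrix are all hypothetical, and the abstract does not say whether gene selection was repeated inside each fold as is done here.

```python
import math
from statistics import mean, variance


def t_stat(a, b):
    """Welch two-sample t statistic; returns 0.0 if both groups are constant."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Indices of the k genes with the largest |t| between class 1 and class 0."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(t_stat(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def predict(X, y, genes, x_new):
    """Nearest-centroid prediction restricted to the selected genes."""
    def sq_dist(label):
        rows = [row for row, lab in zip(X, y) if lab == label]
        return sum((x_new[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
    return 1 if sq_dist(1) < sq_dist(0) else 0


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy with gene selection redone on each training fold."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, k)  # the held-out sample is never seen here
        hits += predict(X_tr, y_tr, genes, X[i]) == y[i]
    return hits / len(X)


# Toy matrix (hypothetical): rows are samples, columns are genes; gene 0 separates the classes.
X = [[5.0, 1.0, 2.0], [5.5, 2.0, 1.0], [6.0, 1.5, 2.5],
     [1.0, 1.2, 2.2], [1.5, 2.1, 1.1], [0.5, 1.4, 2.4]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_accuracy(X, y, k=1))
```

    Selecting genes on the full data set before cross-validating would let the held-out sample influence the signature, which tends to inflate the estimated prediction rate, so the sketch reselects within each fold.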

  6. Analysis of clock-regulated genes in Neurospora reveals widespread posttranscriptional control of metabolic potential

    PubMed Central

    Hurley, Jennifer M.; Dasgupta, Arko; Emerson, Jillian M.; Zhou, Xiaoying; Ringelberg, Carol S.; Knabe, Nicole; Lipzen, Anna M.; Lindquist, Erika A.; Daum, Christopher G.; Barry, Kerrie W.; Grigoriev, Igor V.; Smith, Kristina M.; Galagan, James E.; Bell-Pedersen, Deborah; Freitag, Michael; Cheng, Chao; Loros, Jennifer J.; Dunlap, Jay C.

    2014-01-01

    Neurospora crassa has been for decades a principal model for filamentous fungal genetics and physiology as well as for understanding the mechanism of circadian clocks. Eukaryotic fungal and animal clocks comprise transcription-translation–based feedback loops that control rhythmic transcription of a substantial fraction of these transcriptomes, yielding the changes in protein abundance that mediate circadian regulation of physiology and metabolism: Understanding circadian control of gene expression is key to understanding eukaryotic, including fungal, physiology. Indeed, the isolation of clock-controlled genes (ccgs) was pioneered in Neurospora where circadian output begins with binding of the core circadian transcription factor WCC to a subset of ccg promoters, including those of many transcription factors. High temporal resolution (2-h) sampling over 48 h using RNA sequencing (RNA-Seq) identified circadianly expressed genes in Neurospora, revealing that from ∼10% to as much as 40% of the transcriptome can be expressed under circadian control. Functional classifications of these genes revealed strong enrichment in pathways involving metabolism, protein synthesis, and stress responses; in broad terms, daytime metabolic potential favors catabolism, energy production, and precursor assembly, whereas night activities favor biosynthesis of cellular components and growth. Discriminative regular expression motif elicitation (DREME) identified key promoter motifs highly correlated with the temporal regulation of ccgs. Correlations between ccg abundance from RNA-Seq, the degree of ccg-promoter activation as reported by ccg-promoter–luciferase fusions, and binding of WCC as measured by ChIP-Seq, are not strong. Therefore, although circadian activation is critical to ccg rhythmicity, posttranscriptional regulation plays a major role in determining rhythmicity at the mRNA level. PMID:25362047

  7. Robust gene selection methods using weighting schemes for microarray data analysis.

    PubMed

    Kang, Suyeon; Song, Jongwoo

    2017-09-02

    A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states. Owing to the high-dimensional nature of microarray data, identification of significant genes has been essential in analyzing the data. However, the performances of many gene selection techniques are highly dependent on the experimental conditions, such as the presence of measurement error or a limited number of sample replicates. We have proposed new filter-based gene selection techniques, by applying a simple modification to significance analysis of microarrays (SAM). To prove the effectiveness of the proposed method, we considered a series of synthetic datasets with different noise levels and sample sizes along with two real datasets. The following findings were made. First, our proposed methods outperform conventional methods for all simulation set-ups. In particular, our methods are much better when the given data are noisy and sample size is small. They showed relatively robust performance regardless of noise level and sample size, whereas the performance of SAM became significantly worse as the noise level became high or sample size decreased. When sufficient sample replicates were available, SAM and our methods showed similar performance. Finally, our proposed methods are competitive with traditional methods in classification tasks for microarrays. The results of simulation study and real data analysis have demonstrated that our proposed methods are effective for detecting significant genes and classification tasks, especially when the given data are noisy or have few sample replicates. By employing weighting schemes, we can obtain robust and reliable results for microarray data analysis.
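    For orientation, the baseline SAM statistic that the proposed methods modify can be sketched as follows: the difference in group means divided by a pooled standard error plus a small positive constant s0 that damps genes with very low variance. The function name `sam_d` and the toy values are hypothetical, s0 is passed in directly here rather than tuned from the data as SAM does, and the authors' weighting modification is not specified in the abstract and is not implemented.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """SAM-style relative difference d for one gene measured in two groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled within-group sum of squares.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Toy values (hypothetical): a larger s0 shrinks the statistic toward zero.
a, b = [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]
print(sam_d(a, b, s0=0.1), sam_d(a, b, s0=1.0))
```

    With s0 = 0 this reduces to the ordinary equal-variance two-sample t statistic.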

  8. Genome-Wide Analysis of the RAV Family in Soybean and Functional Identification of GmRAV-03 Involvement in Salt and Drought Stresses and Exogenous ABA Treatment

    PubMed Central

    Zhao, Shu-Ping; Xu, Zhao-Shi; Zheng, Wei-Jun; Zhao, Wan; Wang, Yan-Xia; Yu, Tai-Fei; Chen, Ming; Zhou, Yong-Bin; Min, Dong-Hong; Ma, You-Zhi; Chai, Shou-Cheng; Zhang, Xiao-Hong

    2017-01-01

    Transcription factors play vital roles in plant growth and in plant responses to abiotic stresses. The RAV transcription factors contain a B3 DNA binding domain and/or an APETALA2 (AP2) DNA binding domain. Although genome-wide analyses of RAV family genes have been performed in several species, little is known about the family in soybean (Glycine max L.). In this study, a total of 13 RAV genes, named GmRAVs, were identified in the soybean genome. We predicted and analyzed the amino acid compositions, phylogenetic relationships, and folding states of conserved domain sequences of soybean RAV transcription factors. These soybean RAV transcription factors were phylogenetically clustered into three classes based on their amino acid sequences. Subcellular localization analysis revealed that the soybean RAV proteins were located in the nucleus. The expression patterns of 13 RAV genes were analyzed by quantitative real-time PCR. Under drought stress, the RAV genes were expressed diversely, being up- or down-regulated. Following NaCl treatments, all RAV genes were down-regulated except GmRAV-03, which was up-regulated. Under abscisic acid (ABA) treatment, the expression of all of the soybean RAV genes increased dramatically. These results suggested that the soybean RAV genes may be involved in diverse signaling pathways and may be responsive to abiotic stresses and exogenous ABA. Further analysis indicated that GmRAV-03 could increase the resistance of transgenic lines to high salt and drought and render the transgenic plants insensitive to exogenous ABA. The present study provides valuable information for understanding the classification and putative functions of the RAV transcription factors in soybean. PMID:28634481

  9. Fuzzy support vector machine for microarray imbalanced data classification

    NASA Astrophysics Data System (ADS)

    Ladayya, Faroh; Purnami, Santi Wulan; Irhamah

    2017-11-01

    DNA microarray data contain gene expression measurements with small sample sizes and a large number of features. Furthermore, class imbalance is a common problem in microarray data. It occurs when a dataset is dominated by a class that has significantly more instances than the minority classes. A classification method is therefore needed that can handle high-dimensional and imbalanced data. Support Vector Machine (SVM) is a classification method capable of handling large or small samples, nonlinearity, high dimensionality, overfitting and local-minimum issues. SVM has been widely applied to DNA microarray data classification, and it has been shown to provide the best performance among machine learning methods. However, imbalanced data are a problem because SVM treats all samples as equally important, so the results are biased against the minority class. To overcome the imbalance, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM such that different input points make different contributions to the classifier. The minority classes are given large fuzzy memberships, so FSVM pays more attention to the samples with larger fuzzy membership. Because DNA microarray data are high dimensional with a very large number of features, feature selection is first performed using the Fast Correlation-Based Filter (FCBF). In this study, the data are analyzed by SVM and FSVM, each with and without FCBF, and their classification performance is compared. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
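    The reformulation described in this record is commonly written as a soft-margin SVM whose slack penalties are scaled by per-sample memberships. The notation below ($s_i$ for the fuzzy membership of sample $i$, $C$ for the penalty parameter, $\xi_i$ for the slack variables, $\phi$ for the feature map) follows the usual textbook form and is an assumption here; the abstract does not state which membership function was used.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} s_i\,\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n .
\end{aligned}
```

    With every $s_i = 1$ this reduces to the standard soft-margin SVM; assigning larger $s_i$ to minority-class samples makes their margin violations more costly.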

  10. Spleen transcriptome response to infection with avian pathogenic Escherichia coli in broiler chickens

    PubMed Central

    2011-01-01

    Background Avian pathogenic Escherichia coli (APEC) is detrimental to poultry health and its zoonotic potential is a food safety concern. Regulation of antimicrobials in food-production animals has put greater focus on enhancing host resistance to bacterial infections through genetics. To better define effective mechanism of host resistance, global gene expression in the spleen of chickens, harvested at two times post-infection (PI) with APEC, was measured using microarray technology, in a design that will enable investigation of effects of vaccination, challenge, and pathology level. Results There were 1,101 genes significantly differentially expressed between severely infected and non-infected groups on day 1 PI and 1,723 on day 5 PI. Very little difference was seen between mildly infected and non-infected groups on either time point. Between birds exhibiting mild and severe pathology, there were 2 significantly differentially expressed genes on day 1 PI and 799 on day 5 PI. Groups with greater pathology had more genes with increased expression than decreased expression levels. Several predominate immune pathways, Toll-like receptor, Jak-STAT, and cytokine signaling, were represented between challenged and non-challenged groups. Vaccination had, surprisingly, no detectible effect on gene expression, although it significantly protected the birds from observable gross lesions. Functional characterization of significantly expressed genes revealed unique gene ontology classifications during each time point, with many unique to a particular treatment or class contrast. Conclusions More severe pathology caused by APEC infection was associated with a high level of gene expression differences and increase in gene expression levels. Many of the significantly differentially expressed genes were unique to a particular treatment, pathology level or time point. 
    The present study investigates not only the transcriptomic regulation of APEC infection but also the degree of pathology associated with that infection. This study will allow for greater discovery into host mechanisms for disease resistance, providing targets for marker assisted selection and advanced drug development. PMID:21951686

  11. Spleen transcriptome response to infection with avian pathogenic Escherichia coli in broiler chickens.

    PubMed

    Sandford, Erin E; Orr, Megan; Balfanz, Emma; Bowerman, Nate; Li, Xianyao; Zhou, Huaijun; Johnson, Timothy J; Kariyawasam, Subhashinie; Liu, Peng; Nolan, Lisa K; Lamont, Susan J

    2011-09-27

    Avian pathogenic Escherichia coli (APEC) is detrimental to poultry health and its zoonotic potential is a food safety concern. Regulation of antimicrobials in food-production animals has put greater focus on enhancing host resistance to bacterial infections through genetics. To better define effective mechanism of host resistance, global gene expression in the spleen of chickens, harvested at two times post-infection (PI) with APEC, was measured using microarray technology, in a design that will enable investigation of effects of vaccination, challenge, and pathology level. There were 1,101 genes significantly differentially expressed between severely infected and non-infected groups on day 1 PI and 1,723 on day 5 PI. Very little difference was seen between mildly infected and non-infected groups on either time point. Between birds exhibiting mild and severe pathology, there were 2 significantly differentially expressed genes on day 1 PI and 799 on day 5 PI. Groups with greater pathology had more genes with increased expression than decreased expression levels. Several predominate immune pathways, Toll-like receptor, Jak-STAT, and cytokine signaling, were represented between challenged and non-challenged groups. Vaccination had, surprisingly, no detectible effect on gene expression, although it significantly protected the birds from observable gross lesions. Functional characterization of significantly expressed genes revealed unique gene ontology classifications during each time point, with many unique to a particular treatment or class contrast. More severe pathology caused by APEC infection was associated with a high level of gene expression differences and increase in gene expression levels. Many of the significantly differentially expressed genes were unique to a particular treatment, pathology level or time point. 
    The present study investigates not only the transcriptomic regulation of APEC infection but also the degree of pathology associated with that infection. This study will allow for greater discovery into host mechanisms for disease resistance, providing targets for marker assisted selection and advanced drug development.

  12. Expression Profiling of Transcriptome and Its Associated Disease Risk in Yang Deficiency Constitution of Healthy Subjects

    PubMed Central

    Yu, Ruoxi; Yang, Yin; Han, Yuanyuan; Hou, Pengwei; Li, Yingshuai; Li, Siqi

    2016-01-01

    Objectives. Differences among healthy subjects and associated disease risks are of substantial interest in clinical medicine. According to the theory of “constitution-disease correlation” in traditional Chinese medicine, we try to find out if there is any connection between intolerance of cold in Yang deficiency constitution and molecular evidence and if there is any gene expression basis in specific disorders. Methods. Peripheral blood mononuclear cells were collected from Chinese Han individuals with Yang deficiency constitution (n = 20) and balanced constitution (n = 8) (aged 18–28) and global gene expression profiles were determined between them using the Affymetrix HG-U133 Plus 2.0 array. Results. The results showed that when the fold change was ≥1.2 and q ≤ 0.05, 909 genes were upregulated in the Yang deficiency constitution, while 1189 genes were downregulated. According to our research differential genes found in Yang deficiency constitution were usually related to lower immunity, metabolic disorders, and cancer tendency. Conclusion. Gene expression disturbance exists in Yang deficiency constitution, which corresponds to the concept of constitution and gene classification. It also suggests people with Yang deficiency constitution are susceptible to autoimmune diseases, enteritis, arthritis, metabolism disorders, and cancer, which provides molecular evidence for the theory of “constitution-disease correlation.” PMID:28484499

  13. Artificial neural network classifier predicts neuroblastoma patients' outcome.

    PubMed

    Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi

    2016-11-08

    More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the aggressive tumor phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting the NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized the Gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop an MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent set of 82 tumors. The NB-hypo classifier predicted the patients' outcome with the remarkable accuracy of 87 %. NB-hypo classifier prediction resulted in 2 % classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100 % accurate in assessing the death of five low/intermediate risk patients.
    GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.

  14. Missing-value estimation using linear and non-linear regression with Bayesian gene selection.

    PubMed

    Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R

    2003-11-22

    Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
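    As a simplified illustration of the second part of the problem (the estimation rule), the sketch below fills a missing expression value in a target gene by ordinary least-squares regression on a single predictor gene. It is deliberately reduced: the paper selects several predictor genes by Bayesian variable selection and also uses nonlinear regression, neither of which is reproduced here. The function names and toy values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def impute(target, predictor):
    """Replace None entries in target using a line fitted on the observed pairs."""
    observed = [(p, t) for p, t in zip(predictor, target) if t is not None]
    slope, intercept = fit_line([p for p, _ in observed], [t for _, t in observed])
    return [t if t is not None else slope * p + intercept
            for p, t in zip(predictor, target)]


# Toy values (hypothetical): the target is missing in the third experiment.
predictor_gene = [1.0, 2.0, 3.0, 4.0]
target_gene = [2.0, 4.0, None, 8.0]
filled = impute(target_gene, predictor_gene)
print(filled)
```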

  15. The 2016 revision of the WHO Classification of Central Nervous System Tumours: retrospective application to a cohort of diffuse gliomas.

    PubMed

    Rogers, Te Whiti; Toor, Gurvinder; Drummond, Katharine; Love, Craig; Field, Kathryn; Asher, Rebecca; Tsui, Alpha; Buckland, Michael; Gonzales, Michael

    2018-03-01

    The classification of central nervous system tumours has more recently been shaped by a focus on molecular pathology rather than histopathology. We re-classified 82 glial tumours according to the molecular-genetic criteria of the 2016 revision of the World Health Organization (WHO) Classification of Tumours of the Central Nervous System. Initial diagnoses and grading were based on the morphological criteria of the 2007 WHO scheme. Because of the impression of an oligodendroglial component on initial histological assessment, each tumour was tested for co-deletion of chromosomes 1p and 19q and mutations of isocitrate dehydrogenase (IDH-1 and 2) genes. Additionally, expression of proteins encoded by alpha-thalassemia X-linked mental retardation (ATRX) and TP53 genes was assessed by immunohistochemistry. We found that all but two tumours could be assigned to a specific category in the 2016 revision. The most common change in diagnosis was from oligoastrocytoma to specifically astrocytoma or oligodendroglioma. Analysis of progression free survival (PFS) for WHO grade II and III tumours showed that the objective criteria of the 2016 revision separated diffuse gliomas into three distinct molecular categories: chromosome 1p/19q co-deleted/IDH mutant, intact 1p/19q/IDH mutant and IDH wild type. No significant difference in PFS was found when comparing IDH mutant grade II and III tumours suggesting that IDH status is more informative than tumour grade. The segregation into distinct molecular sub-types that is achieved by the 2016 revision provides an objective evidence base for managing patients with grade II and III diffuse gliomas based on prognosis.
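    The three molecular categories named in this record can be read as a small decision rule over two test results. The sketch below encodes only that rule; the function name and boolean inputs are hypothetical, and it ignores everything else the 2016 revision considers (histology, grade, ATRX and TP53 status), so it is an illustration rather than a diagnostic algorithm.

```python
def molecular_category(idh_mutant: bool, codeleted_1p19q: bool) -> str:
    """Map IDH and 1p/19q status to the three categories named in the abstract."""
    if not idh_mutant:
        return "IDH wild type"
    if codeleted_1p19q:
        return "1p/19q co-deleted, IDH mutant"
    return "1p/19q intact, IDH mutant"


print(molecular_category(idh_mutant=True, codeleted_1p19q=False))
```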

  16. Ternary particles for effective vaccine delivery to the pulmonary system

    NASA Astrophysics Data System (ADS)

    Terry, Treniece La'shay

    Progress in the fields of molecular biology and genomics has provided great insight into the pathogenesis of disease and the defense mechanisms of the immune system. This knowledge has led to the classification of an array of abnormal genes, for which treatment relies on cellular expression of proteins. DNA-based vaccines hold great promise for the treatment of genetically based and infectious diseases, ranging from hemophilia and cystic fibrosis to HIV. Synthetic delivery systems consisting of cationic polymers, such as polyethylenimine (PEI), are capable of condensing DNA into compact structures, maximizing cellular uptake of DNA and yielding high levels of protein expression. To date, short term expression is a major obstacle in the development of gene therapies and has halted their expansion in clinical applications. This study intends to develop a sustained release vaccine delivery system using PLA-PEG block copolymers encapsulating PEI:DNA polyplexes. To enhance the effectiveness of such DNA-based vaccines, resident antigen presenting cells, macrophages and dendritic cells, will be targeted within the alveoli regions of the lungs. Porous microspheres will be engineered with aerodynamic properties capable of achieving deep lung deposition. A fabrication technique using concentric nozzles will be developed to produce porous microspheres. It was observed that modifications in the dispersed to continuous phase ratios have the largest influence on particle size distributions, release rates and encapsulation efficiency, which ranged from 80--95% with fourteen days of release. Amphiphilic block copolymers were also used to fabricate porous microspheres. The confirmation of PEG within the biodegradable polymer backbone was found to have a tremendous impact on the microsphere morphology and encapsulation efficiency, which varied from 50--90%. Porous microspheres were capable of providing sustained gene expression when tested in vitro using the luciferase reporter gene plasmid DNA. Prolonged expression was obtained for 9 days. PLGA and PLA-PEG microspheres were administered in vivo by intra-tracheal instillation and produced an acute inflammatory response, as observed from the large presence of neutrophils. The response using PLA-PEG microspheres yielded a lower total cell count, signifying that the incorporation of PEG into the copolymer backbone enhances the biocompatibility of the delivery system.

  17. Using gene chips to identify organ-specific, smooth muscle responses to experimental diabetes: potential applications to urological diseases.

    PubMed

    Hipp, Jason D; Davies, Kelvin P; Tar, Moses; Valcic, Mira; Knoll, Abraham; Melman, Arnold; Christ, George J

    2007-02-01

    To identify early diabetes-related alterations in gene expression in bladder and erectile tissue that would provide novel diagnostic and therapeutic treatment targets to prevent, delay or ameliorate the ensuing bladder and erectile dysfunction. The RG-U34A rat GeneChip (Affymetrix Inc., Sunnyvale, CA, USA) oligonucleotide microarray (containing approximately 8799 genes) was used to evaluate gene expression in corporal and male bladder tissue excised from rats 1 week after confirmation of a diabetic state, but before demonstrable changes in organ function in vivo. A conservative analytical approach was used to detect alterations in gene expression, and gene ontology (GO) classifications were used to identify biological themes/pathways involved in the aetiology of the organ dysfunction. In all, 320 and 313 genes were differentially expressed in bladder and corporal tissue, respectively. GO analysis in bladder tissue showed prominent increases in biological pathways involved in cell proliferation, metabolism, actin cytoskeleton and myosin, as well as decreases in cell motility, and regulation of muscle contraction. GO analysis in corpora showed increases in pathways related to ion channel transport and ion channel activity, while there were decreases in collagen I and actin genes. The changes in gene expression in these initial experiments are consistent with the pathophysiological characteristics of the bladder and erectile dysfunction seen later in the diabetic disease process. Thus, the observed changes in gene expression might be harbingers or biomarkers of impending organ dysfunction, and could provide useful diagnostic and therapeutic targets for a variety of progressive urological diseases/conditions (i.e. lower urinary tract symptoms related to benign prostatic hyperplasia, erectile dysfunction, etc.).

  18. Genome-Wide Identification, Phylogenetic and Expression Analyses of the Ubiquitin-Conjugating Enzyme Gene Family in Maize.

    PubMed

    Jue, Dengwei; Sang, Xuelian; Lu, Shengqiao; Dong, Chen; Zhao, Qiufang; Chen, Hongliang; Jia, Liqiang

    2015-01-01

    Ubiquitination is a post-translation modification where ubiquitin is attached to a substrate. Ubiquitin-conjugating enzymes (E2s) play a major role in the ubiquitin transfer pathway, as well as a variety of functions in plant biological processes. To date, no genome-wide characterization of this gene family has been conducted in maize (Zea mays). In the present study, a total of 75 putative ZmUBC genes have been identified and located in the maize genome. Phylogenetic analysis revealed that ZmUBC proteins could be divided into 15 subfamilies, which include 13 ubiquitin-conjugating enzymes (ZmE2s) and two independent ubiquitin-conjugating enzyme variant (UEV) groups. The predicted ZmUBC genes were distributed across 10 chromosomes at different densities. In addition, analysis of exon-intron junctions and sequence motifs in each candidate gene has revealed high levels of conservation within and between phylogenetic groups. Tissue expression analysis indicated that most ZmUBC genes were expressed in at least one of the tissues, indicating that these are involved in various physiological and developmental processes in maize. Moreover, expression profile analyses of ZmUBC genes under different stress treatments (4°C, 20% PEG6000, and 200 mM NaCl) and various expression patterns indicated that these may play crucial roles in the response of plants to stress. Genome-wide identification, chromosome organization, gene structure, evolutionary and expression analyses of ZmUBC genes have facilitated in the characterization of this gene family, as well as determined its potential involvement in growth, development, and stress responses. This study provides valuable information for better understanding the classification and putative functions of the UBC-encoding genes of maize.

  19. Genome-Wide Identification, Phylogenetic and Expression Analyses of the Ubiquitin-Conjugating Enzyme Gene Family in Maize

    PubMed Central

    Jue, Dengwei; Sang, Xuelian; Lu, Shengqiao; Dong, Chen; Zhao, Qiufang; Chen, Hongliang; Jia, Liqiang

    2015-01-01

    Background: Ubiquitination is a post-translation modification where ubiquitin is attached to a substrate. Ubiquitin-conjugating enzymes (E2s) play a major role in the ubiquitin transfer pathway, as well as a variety of functions in plant biological processes. To date, no genome-wide characterization of this gene family has been conducted in maize (Zea mays). Methodology/Principal Findings: In the present study, a total of 75 putative ZmUBC genes have been identified and located in the maize genome. Phylogenetic analysis revealed that ZmUBC proteins could be divided into 15 subfamilies, which include 13 ubiquitin-conjugating enzymes (ZmE2s) and two independent ubiquitin-conjugating enzyme variant (UEV) groups. The predicted ZmUBC genes were distributed across 10 chromosomes at different densities. In addition, analysis of exon-intron junctions and sequence motifs in each candidate gene has revealed high levels of conservation within and between phylogenetic groups. Tissue expression analysis indicated that most ZmUBC genes were expressed in at least one of the tissues, indicating that these are involved in various physiological and developmental processes in maize. Moreover, expression profile analyses of ZmUBC genes under different stress treatments (4°C, 20% PEG6000, and 200 mM NaCl) and various expression patterns indicated that these may play crucial roles in the response of plants to stress. Conclusions: Genome-wide identification, chromosome organization, gene structure, evolutionary and expression analyses of ZmUBC genes have facilitated in the characterization of this gene family, as well as determined its potential involvement in growth, development, and stress responses. This study provides valuable information for better understanding the classification and putative functions of the UBC-encoding genes of maize. PMID:26606743

  20. Determining the Molecular and Genetic Basis for Diabetes in Navy Bottlenose Dolphins (Tursiops truncatus)

    DTIC Science & Technology

    2015-01-12

    thereby reduces hepatic glucose production. 15. SUBJECT TERMS Gluconeogenesis , CREB ZF, Fasting, Diabetes 16. SECURITY CLASSIFICATION OF: a...Dolphin PEPCK transcription increased in the face of increasing cAMP, supporting that this enzyme induces gluconeogenesis during the fasting state...D. Test effects of CREB-ZF over-expression on gluconeogenic gene expression • Dolphin CREB-ZF is a novel, negative regulator of gluconeogenesis

  1. Inspection of the grapevine BURP superfamily highlights an expansion of RD22 genes with distinctive expression features in berry development and ABA-mediated stress responses.

    PubMed

    Matus, José Tomás; Aquea, Felipe; Espinoza, Carmen; Vega, Andrea; Cavallini, Erika; Dal Santo, Silvia; Cañón, Paola; Rodríguez-Hoces de la Guardia, Amparo; Serrano, Jennifer; Tornielli, Giovanni Battista; Arce-Johnson, Patricio

    2014-01-01

    The RESPONSIVE TO DEHYDRATION 22 (RD22) gene is a molecular link between abscisic acid (ABA) signalling and abiotic stress responses. Its expression has been used as a reliable ABA early response marker. In Arabidopsis, the single copy RD22 gene possesses a BURP domain also located at the C-terminus of USP embryonic proteins and the beta subunit of polygalacturonases. In grapevine, a RD22 gene has been identified but putative paralogs are also found in the grape genome, possibly forming a large RD22 family in this species. In this work, we searched for annotations containing BURP domains in the Vitis vinifera genome. Nineteen proteins were defined by a comparative analysis between the two genome predictions and RNA-Seq data. These sequences were compared to other plant BURPs identified in previous genome surveys allowing us to reconceive group classifications based on phylogenetic relationships and protein motif occurrence. We observed a lineage-specific evolution of the RD22 family, with the biggest expansion in grapevine and poplar. In contrast, rice, sorghum and maize presented highly expanded monocot-specific groups. The Vitis RD22 group may have expanded from segmental duplications as most of its members are confined to a region in chromosome 4. The inspection of transcriptomic data revealed variable expression of BURP genes in vegetative and reproductive organs. Many genes were induced in specific tissues or by abiotic and biotic stresses. Three RD22 genes were further studied showing that they responded oppositely to ABA and to stress conditions. Our results show that the inclusion of RNA-Seq data is essential while describing gene families and improving gene annotations. Robust phylogenetic analyses including all BURP members from other sequenced species helped us redefine previous relationships that were erroneously established. This work provides additional evidence for RD22 genes serving as marker genes for different organs or stresses in grapevine.

  2. Analysis of bHLH coding genes using gene co-expression network approach.

    PubMed

    Srivastava, Swati; Sanchita; Singh, Garima; Singh, Noopur; Srivastava, Gaurava; Sharma, Ashok

    2016-07-01

    Network analysis provides a powerful framework for the interpretation of data. It uses novel reference network-based metrics for module evolution. These could be used to identify modules of highly connected genes showing variation in the co-expression network. In this study, a co-expression network-based approach was used for analyzing the genes from microarray data. Our approach consists of a simple but robust rank-based network construction. The publicly available gene expression data of Solanum tuberosum under cold and heat stresses were considered to create and analyze a gene co-expression network. The analysis provides a highly co-expressed module of bHLH coding genes based on correlation values. Our approach was to analyze the variation of gene expression according to the time period of stress through a co-expression network approach. As a result, seed genes were identified that showed multiple connections with other genes in the same cluster. Seed genes were found to vary across different time periods of stress. These seed genes may be utilized further as marker genes for developing stress-tolerant plant species.

  3. Immunohistochemistry as a surrogate for molecular testing: a review.

    PubMed

    Swanson, Paul E

    2015-02-01

    Despite the myriad of genetic and epigenetic alterations in human neoplasms that seem to demand specific molecular probes for their identification and practical application to diagnostic pathology, immunohistochemistry (IHC) remains a vital component of laboratory testing in the emerging molecular era. The development and proper application of sensitive and specific antibodies raised against cryptic proteins only expressed in quantity after gene translocation, translocation-specific chimeric fusion peptides, and gene products overexpressed because of gene amplification demonstrate that IHC is a legitimate surrogate for traditional cytogenetic and in situ hybridization-based identification of chromosomal abnormalities, if not a viable molecular technique in its own right. Similarly, the detection of mutational events, through the reliable demonstration of protein loss, the identification of proteins overexpressed because of activating mutations, the specific visualization of mutant gene products, and the localization of splice variant gene products emphasizes the potential value of IHC as a surrogate for mutational analyses of genes important to both diagnosis and prediction of therapeutic response. In the latter setting IHC also provides a means of approximating gene expression profiles in the molecular classification and risk stratification of human neoplasms. For the time being, the application of appropriately targeted sensitive and specific antibodies provides a cost-effective screening modality, if not replacement, for selected molecular techniques, but IHC will lose its value if the development of companion tests for emerging novel biomarkers does not keep pace with molecular techniques, particularly as the costs and time constraints of genomic sequencing diminish over time.

  4. Unsupervised clustering of gene expression data points at hypoxia as possible trigger for metabolic syndrome.

    PubMed

    Ptitsyn, Andrey; Hulver, Matthew; Cefalu, William; York, David; Smith, Steven R

    2006-12-19

    Classification of large volumes of data produced in a microarray experiment allows for the extraction of important clues as to the nature of a disease. Using multi-dimensional unsupervised FOREL (FORmal ELement) algorithm we have re-analyzed three public datasets of skeletal muscle gene expression in connection with insulin resistance and type 2 diabetes (DM2). Our analysis revealed the major line of variation between expression profiles of normal, insulin resistant, and diabetic skeletal muscle. A cluster of most "metabolically sound" samples occupied one end of this line. The distance along this line coincided with the classic markers of diabetes risk, namely obesity and insulin resistance, but did not follow the accepted clinical diagnosis of DM2 as defined by the presence or absence of hyperglycemia. Genes implicated in this expression pattern are those controlling skeletal muscle fiber type and glycolytic metabolism. Additionally myoglobin and hemoglobin were upregulated and ribosomal genes deregulated in insulin resistant patients. Our findings are concordant with the changes seen in skeletal muscle with altitude hypoxia. This suggests that hypoxia and shift to glycolytic metabolism may also drive insulin resistance.

  5. Cell of origin associated classification of B-cell malignancies by gene signatures of the normal B-cell hierarchy.

    PubMed

    Johnsen, Hans Erik; Bergkvist, Kim Steve; Schmitz, Alexander; Kjeldsen, Malene Krag; Hansen, Steen Møller; Gaihede, Michael; Nørgaard, Martin Agge; Bæch, John; Grønholdt, Marie-Louise; Jensen, Frank Svendsen; Johansen, Preben; Bødker, Julie Støve; Bøgsted, Martin; Dybkær, Karen

    2014-06-01

    Recent findings have suggested biological classification of B-cell malignancies as exemplified by the "activated B-cell-like" (ABC), the "germinal-center B-cell-like" (GCB) and primary mediastinal B-cell lymphoma (PMBL) subtypes of diffuse large B-cell lymphoma and "recurrent translocation and cyclin D" (TC) classification of multiple myeloma. Biological classification of B-cell derived cancers may be refined by a direct and systematic strategy where identification and characterization of normal B-cell differentiation subsets are used to define the cancer cell of origin phenotype. Here we propose a strategy combining multiparametric flow cytometry, global gene expression profiling and biostatistical modeling to generate B-cell subset specific gene signatures from sorted normal human immature, naive, germinal centrocytes and centroblasts, post-germinal memory B-cells, plasmablasts and plasma cells from available lymphoid tissues including lymph nodes, tonsils, thymus, peripheral blood and bone marrow. This strategy will provide an accurate image of the stage of differentiation, which prospectively can be used to classify any B-cell malignancy and eventually purify tumor cells. This report briefly describes the current models of the normal B-cell subset differentiation in multiple tissues and the pathogenesis of malignancies originating from the normal germinal B-cell hierarchy.

  6. Genome-Wide Identification and Analysis of TCP Transcription Factors Involved in the Formation of Leafy Head in Chinese Cabbage.

    PubMed

    Liu, Yan; Guan, Xiaoyu; Liu, Shengnan; Yang, Meng; Ren, Junhui; Guo, Meng; Huang, Zhihui; Zhang, Yaowei

    2018-03-14

    Chinese cabbage (Brassica rapa L. ssp. pekinensis) is a widely cultivated and economically important vegetable crop with typical leaf curvature. The TCP (Teosinte branched1, Cycloidea, Proliferating cell factor) family proteins are plant-specific transcription factors (TFs) and play important roles in many plant biological processes, especially in the regulation of leaf curvature. In this study, 39 genes encoding TCP TFs are detected on the whole genome of B. rapa. Based on the phylogenetic analysis of TCPs between Arabidopsis thaliana and Brassica rapa, TCP genes of Chinese cabbage are named from BrTCP1a to BrTCP24b. Moreover, the chromosomal location; phylogenetic relationships among B. rapa, A. thaliana, and rice; gene structures and protein conserved sequence alignment; and conserved domains are analyzed. The expression profiles of BrTCPs are analyzed in different tissues. To understand the role of Chinese cabbage TCP members in regulating the curvature of leaves, the expression patterns of all BrTCP genes are detected at three development stages essential for leafy head formation. Our results provide information on the classification and details of BrTCPs and allow us to better understand the function of TCPs involved in leaf curvature of Chinese cabbage.

  7. Genome-Wide Identification and Analysis of TCP Transcription Factors Involved in the Formation of Leafy Head in Chinese Cabbage

    PubMed Central

    Liu, Yan; Guan, Xiaoyu; Liu, Shengnan; Yang, Meng; Ren, Junhui; Guo, Meng; Huang, Zhihui

    2018-01-01

    Chinese cabbage (Brassica rapa L. ssp. pekinensis) is a widely cultivated and economically important vegetable crop with typical leaf curvature. The TCP (Teosinte branched1, Cycloidea, Proliferating cell factor) family proteins are plant-specific transcription factors (TFs) and play important roles in many plant biological processes, especially in the regulation of leaf curvature. In this study, 39 genes encoding TCP TFs are detected on the whole genome of B. rapa. Based on the phylogenetic analysis of TCPs between Arabidopsis thaliana and Brassica rapa, TCP genes of Chinese cabbage are named from BrTCP1a to BrTCP24b. Moreover, the chromosomal location; phylogenetic relationships among B. rapa, A. thaliana, and rice; gene structures and protein conserved sequence alignment; and conserved domains are analyzed. The expression profiles of BrTCPs are analyzed in different tissues. To understand the role of Chinese cabbage TCP members in regulating the curvature of leaves, the expression patterns of all BrTCP genes are detected at three development stages essential for leafy head formation. Our results provide information on the classification and details of BrTCPs and allow us to better understand the function of TCPs involved in leaf curvature of Chinese cabbage. PMID:29538304

  8. Impact of a novel protein meal on the gastrointestinal microbiota and the host transcriptome of larval zebrafish Danio rerio

    PubMed Central

    Rurangwa, Eugene; Sipkema, Detmer; Kals, Jeroen; ter Veld, Menno; Forlenza, Maria; Bacanu, Gianina M.; Smidt, Hauke; Palstra, Arjan P.

    2015-01-01

    Larval zebrafish was subjected to a methodological exploration of the gastrointestinal microbiota and transcriptome. Assessed was the impact of two dietary inclusion levels of a novel protein meal (NPM) of animal origin (ragworm Nereis virens) on the gastrointestinal tract (GIT). Microbial development was assessed over the first 21 days post egg fertilization (dpf) through 16S rRNA gene-based microbial composition profiling by pyrosequencing. Differentially expressed genes in the GIT were demonstrated at 21 dpf by whole transcriptome sequencing (mRNAseq). Larval zebrafish showed rapid temporal changes in microbial colonization but domination occurred by one to three bacterial species generally belonging to Proteobacteria and Firmicutes. The high iron content of NPM may have led to an increased relative abundance of bacteria that were related to potential pathogens and bacteria with an increased iron metabolism. Functional classification of the 328 differentially expressed genes indicated that the GIT of larvae fed at higher NPM level was more active in transmembrane ion transport and protein synthesis. mRNAseq analysis did not reveal a major activation of genes involved in the immune response or indicating differences in iron uptake and homeostasis in zebrafish fed at the high inclusion level of NPM. PMID:25983694

  9. Whole genome expression and biochemical correlates of extreme constitutional types defined in Ayurveda.

    PubMed

    Prasher, Bhavana; Negi, Sapna; Aggarwal, Shilpi; Mandal, Amit K; Sethi, Tav P; Deshmukh, Shailaja R; Purohit, Sudha G; Sengupta, Shantanu; Khanna, Sangeeta; Mohammad, Farhan; Garg, Gaurav; Brahmachari, Samir K; Mukerji, Mitali

    2008-09-09

    Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system an individual's basic constitution to a large extent determines predisposition and prognosis to diseases as well as therapy and life-style regime. Ayurveda describes seven broad constitution types (Prakritis) each with a varying degree of predisposition to different diseases. Amongst these, three most contrasting types, Vata, Pitta, Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision for successful identification of markers for prospective disease conditions. In this study, we explore whether the different constitution types as described in Ayurveda have molecular correlates. Normal individuals of the three most contrasting constitutional types were identified following phenotyping criteria described in Ayurveda in Indian population of Indo-European origin. The peripheral blood samples of these individuals were analysed for genome wide expression levels, biochemical and hematological parameters. Gene Ontology (GO) and pathway based analysis was carried out on differentially expressed genes to explore if there were significant enrichments of functional categories among Prakriti types. Individuals from the three most contrasting constitutional types exhibit striking differences with respect to biochemical and hematological parameters and at genome wide expression levels. Biochemical profiles like liver function tests, lipid profiles, and hematological parameters like haemoglobin exhibited differences between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes like transport, regulation of cyclin dependent protein kinase activity, immune response and regulation of blood coagulation. A significant enrichment of housekeeping, disease related and hub genes was observed in these extreme constitution types. Ayurveda based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system level differences in normal individuals which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine.

  10. Whole genome expression and biochemical correlates of extreme constitutional types defined in Ayurveda

    PubMed Central

    Prasher, Bhavana; Negi, Sapna; Aggarwal, Shilpi; Mandal, Amit K; Sethi, Tav P; Deshmukh, Shailaja R; Purohit, Sudha G; Sengupta, Shantanu; Khanna, Sangeeta; Mohammad, Farhan; Garg, Gaurav; Brahmachari, Samir K; Mukerji, Mitali

    2008-01-01

    Background: Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system an individual's basic constitution to a large extent determines predisposition and prognosis to diseases as well as therapy and life-style regime. Ayurveda describes seven broad constitution types (Prakritis) each with a varying degree of predisposition to different diseases. Amongst these, three most contrasting types, Vata, Pitta, Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision for successful identification of markers for prospective disease conditions. In this study, we explore whether the different constitution types as described in Ayurveda have molecular correlates. Methods: Normal individuals of the three most contrasting constitutional types were identified following phenotyping criteria described in Ayurveda in Indian population of Indo-European origin. The peripheral blood samples of these individuals were analysed for genome wide expression levels, biochemical and hematological parameters. Gene Ontology (GO) and pathway based analysis was carried out on differentially expressed genes to explore if there were significant enrichments of functional categories among Prakriti types. Results: Individuals from the three most contrasting constitutional types exhibit striking differences with respect to biochemical and hematological parameters and at genome wide expression levels. Biochemical profiles like liver function tests, lipid profiles, and hematological parameters like haemoglobin exhibited differences between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes like transport, regulation of cyclin dependent protein kinase activity, immune response and regulation of blood coagulation. A significant enrichment of housekeeping, disease related and hub genes was observed in these extreme constitution types. Conclusion: Ayurveda based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system level differences in normal individuals which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine. PMID:18782426

  11. CoReCG: a comprehensive database of genes associated with colon-rectal cancer

    PubMed Central

    Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh

    2016-01-01

    Cancer of the large intestine is commonly referred to as colorectal cancer, which is also the third most frequently prevailing neoplasm across the globe. Although much work is being carried out to understand the mechanism of carcinogenesis and advancement of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells, such as genes, mutations, expression changes, epigenetic alterations, post-translational modifications and genetic heterogeneity. Earlier findings were mostly focused on understanding the etiology of colorectal carcinogenesis, but less emphasis was given to a comprehensive review of the existing findings of individual studies, which can provide better diagnostics based on the markers suggested in discrete studies. The Colon Rectal Cancer Gene Database (CoReCG) contains information on 2056 colon-rectal cancer genes involved in distinct colorectal cancer stages, sourced from published literature, with an effective knowledge-based information retrieval system. Additionally, an interactive web interface enriched with various browsing sections and augmented with an advanced search facility for querying the database is provided for user-friendly browsing; online tools for sequence similarity searches and a knowledge-based schema ensure a researcher-friendly information retrieval mechanism. The Colorectal cancer gene database (CoReCG) is expected to be a single point source for identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494

  12. Differential expression of the TWEAK receptor Fn14 in IDH1 wild-type and mutant gliomas.

    PubMed

    Hersh, David S; Peng, Sen; Dancy, Jimena G; Galisteo, Rebeca; Eschbacher, Jennifer M; Castellani, Rudy J; Heath, Jonathan E; Legesse, Teklu; Kim, Anthony J; Woodworth, Graeme F; Tran, Nhan L; Winkles, Jeffrey A

    2018-06-01

    The TNF receptor superfamily member Fn14 is overexpressed by many solid tumor types, including glioblastoma (GBM), the most common and lethal form of adult brain cancer. GBM is notable for a highly infiltrative growth pattern and several groups have reported that high Fn14 expression levels can increase tumor cell invasiveness. We reported previously that the mesenchymal and proneural GBM transcriptomic subtypes expressed the highest and lowest levels of Fn14 mRNA, respectively. Given the recent histopathological re-classification of human gliomas by the World Health Organization based on isocitrate dehydrogenase 1 (IDH1) gene mutation status, we extended this work by comparing Fn14 gene expression in IDH1 wild-type (WT) and mutant (R132H) gliomas and in cell lines engineered to overexpress the IDH1 R132H enzyme. We found that both low-grade and high-grade (i.e., GBM) IDH1 R132H gliomas exhibit low Fn14 mRNA and protein levels compared to IDH1 WT gliomas. Forced overexpression of the IDH1 R132H protein in glioma cells reduced Fn14 expression, while treatment of IDH1 R132H-overexpressing cells with the IDH1 R132H inhibitor AGI-5198 or the DNA demethylating agent 5-aza-2'-deoxycytidine increased Fn14 expression. These results support a role for Fn14 in the more aggressive and invasive phenotype associated with IDH1 WT tumors and indicate that the low levels of Fn14 gene expression noted in IDH1 R132H mutant gliomas may be due to epigenetic regulation via changes in DNA methylation.

  13. [GST genes expression as prognostic factor in papillary thyroid cancer].

    PubMed

    Gonçalves, Antonio Jose; Monte, Osmar; Morari, Eliane Cristina; Ward, Laura Sterian; Nakasako, Diana Shimoda; Nieto, Juliana; Nakai, Marianne Yumi

    2009-01-01

    The aim was to analyze the relationship between the AMES classification and molecular factors from the glutathione S-transferase (GST) system, specifically GSTT1 and GSTM1, in patients with well-differentiated thyroid cancer. Samples of thyroid tissue were obtained from 66 patients with papillary thyroid carcinoma (53 women and 13 men). Patients were divided into two groups (high and low risk) according to the AMES classification. In each group, the presence of the null genotype for both GST genes was studied, and the results were compared with the AMES classification. Samples were obtained in the operating room immediately after thyroidectomy, placed in cryotubes, immersed in liquid nitrogen and stored in a freezer at -80 °C. DNA was extracted by the phenol-chloroform method. There were 17 high-risk patients and 49 low-risk patients. The null genotype was found in 5.8% of the high-risk group and 6.1% of the low-risk group. There was no relationship between the absence of the GSTT1 and GSTM1 genes and the prognosis of papillary thyroid carcinoma as assessed by the AMES classification.

  14. Molecular Diagnostics of Gliomas Using Next Generation Sequencing of a Glioma-Tailored Gene Panel.

    PubMed

    Zacher, Angela; Kaulich, Kerstin; Stepanow, Stefanie; Wolter, Marietta; Köhrer, Karl; Felsberg, Jörg; Malzkorn, Bastian; Reifenberger, Guido

    2017-03-01

    Current classification of gliomas is based on histological criteria according to the World Health Organization (WHO) classification of tumors of the central nervous system. Over the past years, characteristic genetic profiles have been identified in various glioma types. These can refine tumor diagnostics and provide important prognostic and predictive information. We report on the establishment and validation of gene panel next generation sequencing (NGS) for the molecular diagnostics of gliomas. We designed a glioma-tailored gene panel covering 660 amplicons derived from 20 genes frequently aberrant in different glioma types. Sensitivity and specificity of glioma gene panel NGS for detection of DNA sequence variants and copy number changes were validated by single gene analyses. NGS-based mutation detection was optimized for application on formalin-fixed paraffin-embedded tissue specimens including small stereotactic biopsy samples. NGS data obtained in a retrospective analysis of 121 gliomas allowed for their molecular classification into distinct biological groups, including (i) isocitrate dehydrogenase gene (IDH) 1 or 2 mutant astrocytic gliomas with frequent α-thalassemia/mental retardation syndrome X-linked (ATRX) and tumor protein p53 (TP53) gene mutations, (ii) IDH mutant oligodendroglial tumors with 1p/19q codeletion, telomerase reverse transcriptase (TERT) promoter mutation and frequent Drosophila homolog of capicua (CIC) gene mutation, as well as (iii) IDH wildtype glioblastomas with frequent TERT promoter mutation, phosphatase and tensin homolog (PTEN) mutation and/or epidermal growth factor receptor (EGFR) amplification. Oligoastrocytic gliomas were genetically assigned to either of these groups. Our findings implicate gene panel NGS as a promising diagnostic technique that may facilitate integrated histological and molecular glioma classification. © 2016 International Society of Neuropathology.

  15. ImmunemiR - A Database of Prioritized Immune miRNA Disease Associations and its Interactome.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are the key regulators of gene expression and their abnormal expression in the immune system may be associated with several human diseases such as inflammation, cancer and autoimmune diseases. Elucidation of miRNA disease association through the interactome will deepen the understanding of its disease mechanisms. A specialized database for immune miRNAs is highly desirable to demonstrate the immune miRNA disease associations in the interactome. miRNAs specific to immune related diseases were retrieved from curated databases such as HMDD, miR2disease and PubMed literature based on MeSH classification of immune system diseases. The additional data such as miRNA target genes, genes coding protein-protein interaction information were compiled from related resources. Further, miRNAs were prioritized to specific immune diseases using random walk ranking algorithm. In total 245 immune miRNAs associated with 92 OMIM disease categories were identified from external databases. The resultant data were compiled as ImmunemiR, a database of prioritized immune miRNA disease associations. This database provides both text based annotation information and network visualization of its interactome. To our knowledge, ImmunemiR is the first available database to provide a comprehensive repository of human immune disease associated miRNAs with network visualization options of its target genes, protein-protein interactions (PPI) and its disease associations. It is freely available at http://www.biominingbu.org/immunemir/. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
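
    The abstract above says only that miRNAs were prioritized with a "random walk ranking algorithm" and does not give the variant or its parameters. As a rough illustration of the general idea, the sketch below runs a random walk with restart on a tiny toy network. The node names, the edges, the restart probability of 0.3 and the function name `random_walk_with_restart` are all assumptions made for this example; none of them come from ImmunemiR.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities for a walk that restarts at the seeds.

    adjacency: dict mapping each node to a list of neighbouring nodes.
    seeds: set of nodes the walker jumps back to with probability `restart`.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # A node without neighbours hands its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * prob[u] * start[n]
                continue
            share = (1.0 - restart) * prob[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: one disease node, three genes, three miRNAs.
toy_network = {
    "disease_X": ["gene_A", "gene_B"],
    "gene_A": ["disease_X", "miR_1", "miR_2"],
    "gene_B": ["disease_X", "miR_1"],
    "gene_C": ["miR_2", "miR_3"],
    "miR_1": ["gene_A", "gene_B"],
    "miR_2": ["gene_A", "gene_C"],
    "miR_3": ["gene_C"],
}

scores = random_walk_with_restart(toy_network, seeds={"disease_X"})
ranked_mirnas = sorted(
    (n for n in scores if n.startswith("miR_")),
    key=lambda n: scores[n],
    reverse=True,
)
print(ranked_mirnas)
```

    In this toy network the miRNA that reaches the disease node through two genes ranks first and the one that is three steps away ranks last, which is the kind of proximity-based prioritization such walks are typically used for.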

  16. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  17. Molecular events of apical bud formation in white spruce, Picea glauca.

    PubMed

    El Kayal, Walid; Allen, Carmen C G; Ju, Chelsea J-T; Adams, Eri; King-Jones, Susanne; Zaharia, L Irina; Abrams, Suzanne R; Cooke, Janice E K

    2011-03-01

    Bud formation is an adaptive trait that temperate forest trees have acquired to facilitate seasonal synchronization. We have characterized transcriptome-level changes that occur during bud formation of white spruce [Picea glauca (Moench) Voss], a primarily determinate species in which preformed stem units contained within the apical bud constitute most of next season's growth. Microarray analysis identified 4460 differentially expressed sequences in shoot tips during short day-induced bud formation. Cluster analysis revealed distinct temporal patterns of expression, and functional classification of genes in these clusters implied molecular processes that coincide with anatomical changes occurring in the developing bud. Comparing expression profiles in developing buds under long day and short day conditions identified possible photoperiod-responsive genes that may not be essential for bud development. Several genes putatively associated with hormone signalling were identified, and hormone quantification revealed distinct profiles for abscisic acid (ABA), cytokinins, auxin and their metabolites that can be related to morphological changes to the bud. Comparison of gene expression profiles during bud formation in different tissues revealed 108 genes that are differentially expressed only in developing buds and show greater transcript abundance in developing buds than other tissues. These findings provide a temporal roadmap of bud formation in white spruce. © 2011 Blackwell Publishing Ltd.

  18. Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.

    PubMed

    Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A

    2018-05-25

    We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.

  19. Customizing chemotherapy for colon cancer: the potential of gene expression profiling.

    PubMed

    Mariadason, John M; Arango, Diego; Augenlicht, Leonard H

    2004-06-01

    The value of gene expression profiling, or microarray analysis, for the classification and prognosis of multiple forms of cancer is now clearly established. For colon cancer, expression profiling can readily discriminate between normal and tumor tissue, and to some extent between tumors of different histopathological stage and prognosis. While a definitive in vivo study demonstrating the potential of this methodology for predicting response to chemotherapy is presently lacking, the ability of microarrays to distinguish other subtleties of colon cancer phenotype, as well as recent in vitro proof-of-principle experiments utilizing colon cancer cell lines, illustrate the potential of this methodology for predicting the probability of response to specific chemotherapeutic agents. This review discusses some of the recent advances in the use of microarray analysis for understanding and distinguishing colon cancer subtypes, and attempts to identify challenges that need to be overcome in order to achieve the goal of using gene expression profiling for customizing chemotherapy in colon cancer.

  20. Molecular impact of juvenile hormone agonists on neonatal Daphnia magna.

    PubMed

    Toyota, Kenji; Kato, Yasuhiko; Miyakawa, Hitoshi; Yatsu, Ryohei; Mizutani, Takeshi; Ogino, Yukiko; Miyagawa, Shinichi; Watanabe, Hajime; Nishide, Hiroyo; Uchiyama, Ikuo; Tatarazako, Norihisa; Iguchi, Taisen

    2014-05-01

    Daphnia magna has been used extensively to evaluate organism- and population-level responses to pollutants in acute toxicity and reproductive toxicity tests. We have previously reported that exposure to juvenile hormone (JH) agonists results in a reduction of reproductive function and production of male offspring in a cyclic parthenogenesis, D. magna. Recent advances in molecular techniques have provided tools to understand better the responses to pollutants in aquatic organisms, including D. magna. DNA microarray was used to evaluate gene expression profiles of neonatal daphnids exposed to JH agonists: methoprene (125, 250 and 500 ppb), fenoxycarb (0.5, 1 and 2 ppb) and epofenonane (50, 100 and 200 ppb). Exposure to these JH analogs resulted in chemical-specific patterns of gene expression. The heat map analyses based on hierarchical clustering revealed a similar pattern between treatments with a high dose of methoprene and with epofenonane. In contrast, treatment with low to middle doses of methoprene resulted in similar profiles to fenoxycarb treatments. Hemoglobin and JH epoxide hydrolase genes were clustered as JH-responsive genes. These data suggest that fenoxycarb has high activity as a JH agonist, methoprene shows high toxicity and epofenonane works through a different mechanism compared with other JH analogs, agreeing with data of previously reported toxicity tests. In conclusion, D. magna DNA microarray is useful for the classification of JH analogs and identification of JH-responsive genes. Copyright © 2013 John Wiley & Sons, Ltd.

  1. Scanning of Transposable Elements and Analyzing Expression of Transposase Genes of Sweet Potato [Ipomoea batatas]

    PubMed Central

    Tao, Xiang; Lai, Xian-Jun; Zhang, Yi-Zheng; Tan, Xue-Mei; Wang, Haiyan

    2014-01-01

    Background: Transposable elements (TEs) are the most abundant genomic components in eukaryotes and affect the genome by their replications and movements to generate genetic plasticity. Sweet potato generally reproduces asexually, and TEs may be an important genetic factor for genome reorganization. Complete identification of TEs is essential for the study of genome evolution. However, the TEs of sweet potato are still poorly understood because of its complex hexaploid genome and the difficulty of genome sequencing. The recent availability of the sweet potato transcriptome databases provides an opportunity for discovering and characterizing the expressed TEs. Methodology/Principal Findings: We first established the integrated-transcriptome database by de novo assembling four published sweet potato transcriptome databases from three cultivars in China. Using sequence-similarity search and analysis, a total of 1,405 TEs including 883 retrotransposons and 522 DNA transposons were predicted and categorized. By mapping sets of RNA-Seq raw short reads to the predicted TEs, we compared the quantities, classifications and expression activities of TEs between and within cultivars. Moreover, the differential expression of TEs in seven tissues of the Xushu 18 cultivar was analyzed using Illumina digital gene expression (DGE) tag profiling. It was found that 417 TEs were expressed in one or more tissues and 107 in all seven tissues. Furthermore, the copy number of 11 transposase genes was determined to be 1–3 copies in the genome of sweet potato by real-time PCR-based absolute quantification. Conclusions/Significance: Our result provides a new method for TE searching in species with transcriptome sequences but lacking genome information. The searching, identification and expression analysis of TEs will provide useful TE information in sweet potato, which is valuable for further studies of TE-mediated gene mutation and optimization in asexual reproduction. It contributes to elucidating the roles of TEs in genome evolution. PMID:24608103

  2. Serrated colorectal cancer: Molecular classification, prognosis, and response to chemotherapy

    PubMed Central

    Murcia, Oscar; Juárez, Miriam; Hernández-Illán, Eva; Egoavil, Cecilia; Giner-Calabuig, Mar; Rodríguez-Soler, María; Jover, Rodrigo

    2016-01-01

    Molecular advances support the existence of an alternative pathway of colorectal carcinogenesis that is based on the hypermethylation of specific DNA regions that silences tumor suppressor genes. This alternative pathway has been called the serrated pathway due to the serrated appearance of tumors in histological analysis. New classifications for colorectal cancer (CRC) were proposed recently based on genetic profiles that show four types of molecular alterations: BRAF gene mutations, KRAS gene mutations, microsatellite instability, and hypermethylation of CpG islands. This review summarizes what is known about the serrated pathway of CRC, including CRC molecular and clinical features, prognosis, and response to chemotherapy. PMID:27053844

  3. Stroma-associated master regulators of molecular subtypes predict patient prognosis in ovarian cancer.

    PubMed

    Zhang, Shengzhe; Jing, Ying; Zhang, Meiying; Zhang, Zhenfeng; Ma, Pengfei; Peng, Huixin; Shi, Kaixuan; Gao, Wei-Qiang; Zhuang, Guanglei

    2015-11-04

    High-grade serous ovarian carcinoma (HGS-OvCa) has the lowest survival rate among all gynecologic cancers and is hallmarked by a high degree of heterogeneity. The Cancer Genome Atlas network has described a gene expression-based molecular classification of HGS-OvCa into Differentiated, Mesenchymal, Immunoreactive and Proliferative subtypes. However, the biological underpinnings and regulatory mechanisms underlying the distinct molecular subtypes are largely unknown. Here we showed that tumor-infiltrating stromal cells significantly contributed to the assignments of Mesenchymal and Immunoreactive clusters. Using reverse engineering and an unbiased interrogation of subtype regulatory networks, we identified the transcriptional modules containing master regulators that drive gene expression of Mesenchymal and Immunoreactive HGS-OvCa. Mesenchymal master regulators were associated with poor prognosis, while Immunoreactive master regulators positively correlated with overall survival. Meta-analysis of 749 HGS-OvCa expression profiles confirmed that master regulators as a prognostic signature were able to predict patient outcome. Our data unraveled master regulatory programs of HGS-OvCa subtypes with prognostic and potentially therapeutic relevance, and suggested that the unique transcriptional and clinical characteristics of ovarian Mesenchymal and Immunoreactive subtypes could be, at least partially, ascribed to tumor microenvironment.

  4. Gene expression profile of collagen types, osteopontin in the tympanic membrane of patients with tympanosclerosis.

    PubMed

    Sakowicz-Burkiewicz, Monika; Kuczkowski, Jerzy; Przybyła, Tomasz; Grdeń, Marzena; Starzyńska, Anna; Pawełczyk, Tadeusz

    2017-09-01

    Tympanosclerosis is a pathological process involving the middle ear. The hallmark of this disease is the formation of calcium deposits. In the submucosal layer, as well as in the right layer of the tympanic membrane, the calcium deposits result in a significant increase in the activity of fibroblasts and deposition of collagen fibers. The aim of our study was to examine the expression level of genes encoding collagen type I, II, III and IV (COL1A1, COL2A1, COL3A1, COL4A1) and osteopontin (SPP1) in the tympanic membrane of patients with tympanosclerosis. The total RNA was isolated from middle ear tissues with tympanosclerosis, received from 25 patients and from 19 normal tympanic membranes. The gene expression level was determined by real-time RT-PCR. The gene expression levels were correlated with clinical Tos classification of tympanosclerosis. We observed that in the tympanic membrane of patients with tympanosclerosis, the expression of type I collagen is decreased, while the expression of type II and IV collagen and osteopontin is increased. Moreover, mRNA levels of the investigated genes strongly correlated with the clinical stages of tympanosclerosis. The strong correlations between the expression of type I, II, IV collagen and osteopontin and the clinical stage of tympanosclerosis indicate the involvement of these proteins in excessive fibrosis and pathological remodeling of the tympanic membrane. In the future, a treatment aiming to modulate these gene expressions and/or regulation of the degradation of their protein products could be used as a new medical approach for patients with tympanosclerosis.

  5. Genome-Wide Analysis of bZIP-Encoding Genes in Maize

    PubMed Central

    Wei, Kaifa; Chen, Juan; Wang, Yanmei; Chen, Yanhui; Chen, Shaoxiang; Lin, Yina; Pan, Si; Zhong, Xiaojun; Xie, Daoxin

    2012-01-01

    In plants, basic leucine zipper (bZIP) proteins regulate numerous biological processes such as seed maturation, flower and vascular development, stress signalling and pathogen defence. We have carried out a genome-wide identification and analysis of 125 bZIP genes that exist in the maize genome, encoding 170 distinct bZIP proteins. This family can be divided into 11 groups according to the phylogenetic relationship among the maize bZIP proteins and those in Arabidopsis and rice. Six kinds of intron patterns (a–f) within the basic and hinge regions are defined. The additional conserved motifs have been identified and present the group specificity. Detailed three-dimensional structure analysis has been done to display the sequence conservation and potential distribution of the bZIP domain. Further, we predict the DNA-binding pattern and the dimerization property on the basis of the characteristic features in the basic and hinge regions and the leucine zipper, respectively, which supports our classification greatly and helps to classify 26 distinct subfamilies. The chromosome distribution and the genetic analysis reveal that 58 ZmbZIP genes are located in the segmental duplicate regions in the maize genome, suggesting that the segment chromosomal duplications contribute greatly to the expansion of the maize bZIP family. Across the 60 different developmental stages of 11 organs, three apparent clusters formed represent three kinds of different expression patterns among the ZmbZIP gene family in maize development. A similar but slightly different expression pattern of bZIPs in two inbred lines displays that 22 detected ZmbZIP genes might be involved in drought stress. Thirteen pairs and 143 pairs of ZmbZIP genes show strongly negative and positive correlations in the four distinct fungal infections, respectively, based on the expression profile and Pearson's correlation coefficient analysis. PMID:23103471
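
    The abstract above reports gene pairs with strongly positive or negative correlation across four fungal infections, based on Pearson's correlation coefficient. A minimal sketch of that kind of pairwise screen follows. The four-value toy profiles, the gene names, the cut-off of 0.9 and the helper `strongly_correlated_pairs` are assumptions for illustration only; the abstract does not state the threshold or the data layout the authors used.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongly_correlated_pairs(profiles, threshold=0.9):
    """Split all gene pairs into strongly positive and strongly negative lists."""
    positive, negative = [], []
    for gene_1, gene_2 in combinations(sorted(profiles), 2):
        r = pearson_r(profiles[gene_1], profiles[gene_2])
        if r >= threshold:
            positive.append((gene_1, gene_2))
        elif r <= -threshold:
            negative.append((gene_1, gene_2))
    return positive, negative


# Hypothetical expression values, one per infection condition.
toy_profiles = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [2.0, 4.0, 6.0, 8.0],
    "gene_3": [8.0, 6.0, 4.0, 2.0],
    "gene_4": [1.0, 3.0, 2.0, 4.0],
}

positive_pairs, negative_pairs = strongly_correlated_pairs(toy_profiles)
print(positive_pairs)
print(negative_pairs)
```

    With only four conditions per gene, a correlation coefficient is a noisy estimate, so a strict cut-off such as the one assumed here is a common precaution.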

  6. Divergent evolution of arrested development in the dauer stage of Caenorhabditis elegans and the infective stage of Heterodera glycines

    PubMed Central

    Elling, Axel A; Mitreva, Makedonka; Recknor, Justin; Gai, Xiaowu; Martin, John; Maier, Thomas R; McDermott, Jeffrey P; Hewezi, Tarek; McK Bird, David; Davis, Eric L; Hussey, Richard S; Nettleton, Dan; McCarter, James P; Baum, Thomas J

    2007-01-01

    Background: The soybean cyst nematode Heterodera glycines is the most important parasite in soybean production worldwide. A comprehensive analysis of large-scale gene expression changes throughout the development of plant-parasitic nematodes has been lacking to date. Results: We report an extensive genomic analysis of H. glycines, beginning with the generation of 20,100 expressed sequence tags (ESTs). In-depth analysis of these ESTs plus approximately 1,900 previously published sequences predicted 6,860 unique H. glycines genes and allowed a classification by function using InterProScan. Expression profiling of all 6,860 genes throughout the H. glycines life cycle was undertaken using the Affymetrix Soybean Genome Array GeneChip. Our data sets and results represent a comprehensive resource for molecular studies of H. glycines. Demonstrating the power of this resource, we were able to address whether arrested development in the Caenorhabditis elegans dauer larva and the H. glycines infective second-stage juvenile (J2) exhibits shared gene expression profiles. We determined that the gene expression profiles associated with the C. elegans dauer pathway are not uniformly conserved in H. glycines and that the expression profiles of genes for metabolic enzymes of C. elegans dauer larvae and H. glycines infective J2 are dissimilar. Conclusion: Our results indicate that hallmark gene expression patterns and metabolism features are not shared in the developmentally arrested life stages of C. elegans and H. glycines, suggesting that developmental arrest in these two nematode species has undergone more divergent evolution than previously thought and pointing to the need for detailed genomic analyses of individual parasite species. PMID:17919324

  7. Differential gene expression profiles of peripheral blood mononuclear cells in childhood asthma.

    PubMed

    Kong, Qian; Li, Wen-Jing; Huang, Hua-Rong; Zhong, Ying-Qiang; Fang, Jian-Pei

    2015-05-01

    Asthma is a common childhood disease with strong genetic components. This study compared whole-genome expression differences between asthmatic young children and healthy controls to identify gene signatures of childhood asthma. Total RNA extracted from peripheral blood mononuclear cells (PBMC) was subjected to microarray analysis. QRT-PCR was performed to verify the microarray results. Classification and functional characterization of differential genes were illustrated by hierarchical clustering and gene ontology analysis. Multiple logistic regression (MLR) analysis, receiver operating characteristic (ROC) curve analysis, and discriminant power were used to screen for asthma-specific diagnostic markers. At fold-change > 2 and p < 0.05, there were 758 named differential genes. The QRT-PCR results confirmed the array data. Hierarchical clustering divided 29 highly probable candidate genes into seven categories, and genes in the same cluster were likely to share similar expression patterns or functions. Gene ontology analysis showed that the differential genes were primarily enriched in the biological processes of immune response, response to stress or stimulus, and regulation of apoptosis. MLR and ROC curve analysis revealed that the combination of ADAM33, Smad7, and LIGHT possessed excellent discriminating power. The combination of ADAM33, Smad7, and LIGHT would be a reliable and useful childhood asthma model for prediction and diagnosis.
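
    The abstract above judges a marker combination by ROC curve analysis. One common summary of a ROC curve is the area under it (AUC), which equals the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below computes it directly from that definition. The scores are invented for illustration and stand in for the output of some fitted model; they are not values from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model scores for four cases and four controls.
toy_case_scores = [0.9, 0.8, 0.7, 0.4]
toy_control_scores = [0.6, 0.3, 0.2, 0.1]

auc = roc_auc(toy_case_scores, toy_control_scores)
print(auc)  # 15 of the 16 case/control pairs are ordered correctly: 0.9375
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls.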

  8. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019) respectively. Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. It degrades progressively, reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis is discussed. Conclusions: The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required. PMID:16919171
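
    The abstract above estimates a leave-K-out cross-validation error and attaches a p-value to it with a permutation test. The sketch below shows one way such a procedure can be put together. It is not the authors' implementation: a nearest-centroid rule stands in for their WVA, RLS and SVM classifiers, the twenty two-feature samples are synthetic, and the function names and the settings (k = 4, 20 random splits, 200 permutations) are assumptions chosen to keep the example small.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_k_out_error(xs, ys, k, n_splits, rng):
    """Mean test error over random splits that each hold out k samples."""
    n = len(xs)
    mistakes = 0
    for _ in range(n_splits):
        held_out = set(rng.sample(range(n), k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for i in held_out:
            if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
                mistakes += 1
    return mistakes / (n_splits * k)


def permutation_p_value(xs, ys, k, n_splits, n_perm, seed=0):
    """Observed error, plus the share of label shufflings that do at least as well."""
    rng = random.Random(seed)
    observed = leave_k_out_error(xs, ys, k, n_splits, rng)
    as_good = 0
    for _ in range(n_perm):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        if leave_k_out_error(xs, shuffled, k, n_splits, rng) <= observed:
            as_good += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (as_good + 1) / (n_perm + 1)


# Synthetic data: two well-separated groups of ten samples, two features each.
toy_x = [[0.1 * i, -0.1 * i] for i in range(10)] + \
        [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(10)]
toy_y = [0] * 10 + [1] * 10

error, p_value = permutation_p_value(toy_x, toy_y, k=4, n_splits=20, n_perm=200)
print(error, p_value)
```

    Shuffling the labels breaks any real link between expression and class, so the shuffled error rates approximate what a classifier achieves by chance on data of the same size; a small p-value indicates that the observed error is unlikely under that null.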

  9. Viral proliferation and expression of tumor-related gene in different chicken embryo fibroblasts infected with different tumorigenic phenotypes of avian leukosis virus subgroup J.

    PubMed

    Qu, Yajin; Liu, Litao; Niu, Yujuan; Qu, Yue; Li, Ning; Sun, Wei; Lv, Chuanwei; Wang, Pengfei; Zhang, Guihua; Liu, Sidang

    2016-10-01

    Subgroup J avian leukosis virus (ALV-J) causes a neoplastic disease in infected chickens. The ALV-J strain NX0101, which was isolated from broiler breeders in 2001, mainly induced formation of myeloid cell tumors. However, strain HN10PY01, which was recently isolated from laying hens, mainly induces formation of myeloid cell tumors and hemangioma. To identify the molecular pathological mechanism underlying changes in host susceptibility and tumor classification induced by these two types of ALV-J strains, chicken embryo fibroblasts derived from chickens with different genetic backgrounds (broiler breeders and laying hens) and an immortalized chicken embryo fibroblast line (DF-1) were prepared and infected with strain NX0101 or HN10PY01, respectively. The 50% tissue culture infective dose (TCID50) and levels of ALV group-specific antigen p27 and heat shock protein 70 in the supernatant collected from the ALV-J-infected cells were measured. Moreover, mRNA expression levels of the tumor-related genes p53, c-myc, and Bcl-2 in ALV-J-infected cells were quantified. The results indicated that ALV-J infection could significantly increase mRNA expression levels of p53, c-myc, and Bcl-2. Strain HN10PY01 exhibited a greater influence on the three tumor-related genes in each of the three types of cells when compared with strain NX0101, and the TCID50 and p27 levels in the supernatant collected from HN10PY01-infected cells were higher than those collected from NX0101-infected cells. These results indicate that infection with either ALV-J strain influenced gene expression levels in the infected cells, while the newly isolated strain HN10PY01 showed higher replication ability in cells and induced higher expression levels of tumor-related genes in infected cells. Furthermore, virus titers, expression levels of tumor-related genes and cellular stress responses differed between cells with different genetic backgrounds when infected with each of the two ALV-J strains, indicating that genetic background influenced the capability of the virus to infect and proliferate. The findings of this study provide useful data to further elucidate the mechanism underlying host susceptibility and tumor classification in ALV-J-infected chickens and cells. © 2016 Poultry Science Association Inc.

  10. Genome-wide identification of novel expression signatures reveal distinct patterns and prevalence of binding motifs for p53, nuclear factor-κB and other signal transcription factors in head and neck squamous cell carcinoma

    PubMed Central

    Yan, Bin; Yang, Xinping; Lee, Tin-Lap; Friedman, Jay; Tang, Jun; Van Waes, Carter; Chen, Zhong

    2007-01-01

    Background Differentially expressed gene profiles have previously been observed among pathologically defined cancers by microarray technologies, including head and neck squamous cell carcinomas (HNSCCs). However, the molecular expression signatures and transcriptional regulatory controls that underlie the heterogeneity in HNSCCs are not well defined. Results Genome-wide cDNA microarray profiling of ten HNSCC cell lines revealed novel gene expression signatures that distinguished cancer cell subsets associated with p53 status. Three major clusters of over-expressed genes (A to C) were defined through hierarchical clustering, Gene Ontology, and statistical modeling. The promoters of genes in these clusters exhibited different patterns and prevalence of transcription factor binding sites for p53, nuclear factor-κB (NF-κB), activator protein (AP)-1, signal transducer and activator of transcription (STAT)3 and early growth response (EGR)1, as compared with the frequency in vertebrate promoters. Cluster A genes involved in chromatin structure and function exhibited enrichment for p53 and decreased AP-1 binding sites, whereas clusters B and C, containing cytokine and antiapoptotic genes, exhibited a significant increase in prevalence of NF-κB binding sites. An increase in STAT3 and EGR1 binding sites was distributed among the over-expressed clusters. Novel regulatory modules containing p53 or NF-κB concomitant with other transcription factor binding motifs were identified, and experimental data supported the predicted transcriptional regulation and binding activity. Conclusion The transcription factors p53, NF-κB, and AP-1 may be important determinants of the heterogeneous pattern of gene expression, whereas STAT3 and EGR1 may broadly enhance gene expression in HNSCCs. 
Defining these novel gene signatures and regulatory mechanisms will be important for establishing new molecular classifications and subtyping, which in turn will promote development of targeted therapeutics for HNSCC. PMID:17498291

  11. The family structure of the Mucorales: a synoptic revision based on comprehensive multigene-genealogies.

    PubMed

    Hoffmann, K; Pawłowska, J; Walther, G; Wrzosek, M; de Hoog, G S; Benny, G L; Kirk, P M; Voigt, K

    2013-06-01

    The Mucorales (Mucoromycotina) are one of the most ancient groups of fungi comprising ubiquitous, mostly saprotrophic organisms. The first comprehensive molecular studies 11 yr ago revealed the traditional classification scheme, mainly based on morphology, as highly artificial. Since then only single clades have been investigated in detail but a robust classification of the higher levels based on DNA data has not been published yet. Therefore we provide a classification based on a phylogenetic analysis of four molecular markers including the large and the small subunit of the ribosomal DNA, the partial actin gene and the partial gene for the translation elongation factor 1-alpha. The dataset comprises 201 isolates in 103 species and represents about one half of the currently accepted species in this order. Previous family concepts are reviewed and the family structure inferred from the multilocus phylogeny is introduced and discussed. The main differences between the current classification and preceding concepts affect the existing families Lichtheimiaceae and Cunninghamellaceae, as well as the genera Backusella and Lentamyces, which recently obtained the status of families along with the Rhizopodaceae comprising Rhizopus, Sporodiniella and Syzygites. Compensatory base change analyses in the Lichtheimiaceae confirmed the lower level classification of Lichtheimia and Rhizomucor while genera such as Circinella or Syncephalastrum completely lacked compensatory base changes.

  12. Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

    PubMed

    Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

    2014-02-03

    Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.

  13. Biochemical and transcriptomic analyses reveal different metabolite biosynthesis profiles among three color and developmental stages in 'Anji Baicha' (Camellia sinensis).

    PubMed

    Li, Chun-Fang; Xu, Yan-Xia; Ma, Jian-Qiang; Jin, Ji-Qiang; Huang, Dan-Juan; Yao, Ming-Zhe; Ma, Chun-Lei; Chen, Liang

    2016-09-08

    The new shoots of the albino tea cultivar 'Anji Baicha' are yellow or white at low temperatures and turn green as the environmental temperatures increase during the early spring. 'Anji Baicha' metabolite profiles exhibit considerable variability over three color and developmental stages, especially regarding the carotenoid, chlorophyll, and theanine concentrations. Previous studies focused on physiological characteristics, gene expression differences, and variations in metabolite abundances in albino tea plant leaves at specific growth stages. However, the molecular mechanisms regulating metabolite biosynthesis in various color and developmental stages in albino tea leaves have not been fully characterized. We used RNA-sequencing to analyze 'Anji Baicha' leaves at the yellow-green, albescent, and re-greening stages. The leaf transcriptomes differed considerably among the three stages. Functional classifications based on Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes enrichment analyses revealed that differentially expressed unigenes were mainly related to metabolic pathways, biosynthesis of secondary metabolites, phenylpropanoid biosynthesis, and carbon fixation in photosynthetic organisms. Chemical analyses revealed higher β-carotene and theanine levels, but lower chlorophyll a levels, in the albescent stage than in the green stage. Furthermore, unigenes involved in carotenoid, chlorophyll, and theanine biosyntheses were identified, and the expression patterns of the differentially expressed unigenes in these biosynthesis pathways were characterized. Through co-expression analyses, we identified the key genes in these pathways. These genes may be responsible for the metabolite biosynthesis differences among the different leaf color and developmental stages of 'Anji Baicha' tea plants. Our study presents the results of transcriptomic and biochemical analyses of 'Anji Baicha' tea plants at various stages. 
The distinct transcriptome profiles for each color and developmental stage enabled us to identify changes to biosynthesis pathways and revealed the contributions of such variations to the albino phenotype of tea plants. Furthermore, comparisons of the transcriptomes and related metabolites helped clarify the molecular regulatory mechanisms underlying the secondary metabolic pathways in different stages.

  14. Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification.

    PubMed

    Lu, Huijuan; Wei, Shasha; Zhou, Zili; Miao, Yanzi; Lu, Yi

    2015-01-01

    The main purpose of traditional classification algorithms in bioinformatics applications is to achieve better classification accuracy. However, these algorithms cannot meet the requirement of minimising the average misclassification cost. In this paper, a new cost-sensitive regularised extreme learning machine (CS-RELM) algorithm is proposed that uses probability estimation and misclassification cost to reconstruct the classification results. By improving the classification accuracy of small-sample groups that carry a higher misclassification cost, the new CS-RELM can minimise the classification cost. The 'rejection cost' was integrated into the CS-RELM algorithm to further reduce the average misclassification cost. Using the Colon Tumour dataset and the SRBCT (Small Round Blue Cells Tumour) dataset, CS-RELM was compared with other cost-sensitive algorithms such as extreme learning machine (ELM), cost-sensitive extreme learning machine, regularised extreme learning machine, and cost-sensitive support vector machine (SVM). The experimental results show that CS-RELM with embedded rejection cost could reduce the average cost of misclassification and make more credible classification decisions than the others.
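
    The abstract does not give the CS-RELM decision rule itself, so the following is only a minimal sketch of the general idea it describes: combine class-probability estimates with a misclassification cost matrix, predict the class with the lowest expected cost, and abstain when even that cost exceeds a fixed rejection cost. This is the textbook minimum-expected-cost rule with a reject option, not the authors' implementation; the function name `min_cost_decision`, the cost matrix, and the probabilities are all hypothetical.

```python
from typing import Optional, Sequence


def min_cost_decision(
    probs: Sequence[float],
    cost: Sequence[Sequence[float]],
    rejection_cost: float,
) -> Optional[int]:
    """Return the class index with the lowest expected cost, or None to reject.

    probs[i]   -- estimated probability that the true class is i
    cost[i][j] -- cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    # Expected cost of predicting j, averaged over the possible true classes.
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    # Abstain when the cheapest prediction is still dearer than rejecting.
    if expected[best] > rejection_cost:
        return None
    return best


# Hypothetical two-class example: missing class 1 costs five times more
# than a false alarm, so a 30% probability of class 1 is enough to predict it.
COST = [[0.0, 1.0],
        [5.0, 0.0]]
print(min_cost_decision([0.7, 0.3], COST, rejection_cost=1.0))    # 1
print(min_cost_decision([0.7, 0.3], COST, rejection_cost=0.5))    # None
print(min_cost_decision([0.95, 0.05], COST, rejection_cost=1.0))  # 0
```

    Lowering the rejection cost makes the rule abstain more often, which trades coverage for a lower average cost on the samples that are actually classified.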

  15. Molecular Testing for miRNA, mRNA, and DNA on Fine-Needle Aspiration Improves the Preoperative Diagnosis of Thyroid Nodules With Indeterminate Cytology.

    PubMed

    Labourier, Emmanuel; Shifrin, Alexander; Busseniers, Anne E; Lupo, Mark A; Manganelli, Monique L; Andruss, Bernard; Wylie, Dennis; Beaudenon-Huibregtse, Sylvie

    2015-07-01

    Molecular testing for oncogenic mutations or gene expression in fine-needle aspirations (FNAs) from thyroid nodules with indeterminate cytology identifies a subset of benign or malignant lesions with high predictive value. This study aimed to evaluate a novel diagnostic algorithm combining mutation detection and miRNA expression to improve the diagnostic yield of molecular cytology. Surgical specimens and preoperative FNAs (n = 638) were tested for 17 validated gene alterations using the miRInform Thyroid test and with a 10-miRNA gene expression classifier generating positive (malignant) or negative (benign) results. Cross-sectional sampling of thyroid nodules with atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) or follicular neoplasm/suspicious for a follicular neoplasm (FN/SFN) cytology (n = 109) was conducted at 12 endocrinology centers across the United States. Qualitative molecular results were compared with surgical histopathology to determine diagnostic performance and model clinical effect. Mutations were detected in 69% of nodules with malignant outcome. Among mutation-negative specimens, miRNA testing correctly identified 64% of malignant cases and 98% of benign cases. The diagnostic sensitivity and specificity of the combined algorithm were 89% (95% confidence interval [CI], 73-97%) and 85% (95% CI, 75-92%), respectively. At 32% cancer prevalence, 61% of the molecular results were benign with a negative predictive value of 94% (95% CI, 85-98%). Independently of variations in cancer prevalence, the test increased the yield of true benign results by 65% relative to mRNA-based gene expression classification and decreased the rate of avoidable diagnostic surgeries by 69%.
Multiplatform testing for DNA, mRNA, and miRNA can accurately classify benign and malignant thyroid nodules, increase the diagnostic yield of molecular cytology, and further improve the preoperative risk-based management of benign nodules with AUS/FLUS or FN/SFN cytology.
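
    The prevalence-dependent figures quoted in this abstract follow from Bayes' rule applied to the reported sensitivity and specificity. The sketch below is not taken from the paper; it only plugs the reported point estimates (sensitivity 89%, specificity 85%, prevalence 32%) into the standard formulas to show where a negative predictive value near 94% and a benign-result share near 61% come from. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Derive predictive values from test accuracy and disease prevalence."""
    # Fractions of the whole tested population falling in each cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        # Share of all results that come back negative (benign).
        "negative_rate": true_neg + false_neg,
    }


result = predictive_values(sensitivity=0.89, specificity=0.85, prevalence=0.32)
print(round(result["npv"], 2))            # 0.94
print(round(result["negative_rate"], 2))  # 0.61
```

    Re-running the function with a different `prevalence` shows how strongly the negative predictive value depends on the population being tested, while sensitivity and specificity stay fixed.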

  16. Functional genomic responses to cystic fibrosis transmembrane conductance regulator (CFTR) and CFTR(delta508) in the lung.

    PubMed

    Xu, Yan; Liu, Cong; Clark, Jean C; Whitsett, Jeffrey A

    2006-04-21

    Cystic fibrosis (CF), a common lethal pulmonary disorder in Caucasians, is caused by mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR) that disturb fluid homeostasis and host defense in target organs. The effects of CFTR and delta508-CFTR were assessed in transgenic mice that 1) lack CFTR expression (Cftr-/-); 2) express the human delta508 CFTR (CFTR(delta508)); 3) overexpress the normal human CFTR (CFTR(tg)) in respiratory epithelial cells. Genes were selected from Affymetrix Murine Gene-Chips analysis and subjected to functional classification, k-means clustering, promoter cis-elements/modules searching, literature mining, and pathway exploring. Genomic responses to Cftr-/- were not corrected by expression of CFTR(delta508). Genes regulating host defense, inflammation, fluid and electrolyte transport were similarly altered in Cftr-/- and CFTR(delta508) mice. CFTR(delta508) induced a primary disturbance in expression of genes regulating redox and antioxidant systems. Genomic responses to CFTR(tg) were modest and were not associated with lung pathology. CFTR(tg) and CFTR(delta508) induced genes encoding heat shock proteins and other chaperones but did not activate the endoplasmic reticulum-associated degradation pathway. RNAs encoding proteins that directly interact with CFTR were identified in each of the CFTR mouse models, supporting the hypothesis that CFTR functions within a multiprotein complex whose members interact at the level of protein-protein interactions and gene expression. Promoters of genes influenced by CFTR shared common regulatory elements, suggesting that their co-expression may be mediated by shared regulatory mechanisms. Genes and pathways involved in the response to CFTR may be of interest as modifiers of CF.

  17. Predictive model for inflammation grades of chronic hepatitis B: Large-scale analysis of clinical parameters and gene expressions.

    PubMed

    Zhou, Weichen; Ma, Yanyun; Zhang, Jun; Hu, Jingyi; Zhang, Menghan; Wang, Yi; Li, Yi; Wu, Lijun; Pan, Yida; Zhang, Yitong; Zhang, Xiaonan; Zhang, Xinxin; Zhang, Zhanqing; Zhang, Jiming; Li, Hai; Lu, Lungen; Jin, Li; Wang, Jiucun; Yuan, Zhenghong; Liu, Jie

    2017-11-01

    Liver biopsy is the gold standard to assess pathological features (e.g. inflammation grades) for hepatitis B virus-infected patients although it is invasive and traumatic; meanwhile, several gene profiles of chronic hepatitis B (CHB) have been separately described in relatively small hepatitis B virus (HBV)-infected samples. We aimed to analyse correlations among inflammation grades, gene expressions and clinical parameters (serum alanine amino transaminase, aspartate amino transaminase and HBV-DNA) in large-scale CHB samples and to predict inflammation grades by using clinical parameters and/or gene expressions. We analysed gene expressions with three clinical parameters in 122 CHB samples by an improved regression model. Principal component analysis and machine-learning methods including Random Forest, K-nearest neighbour and support vector machine were used for analysis and further diagnosis models. Six normal samples were used to validate the predictive model. Significant genes related to clinical parameters were found to be enriched in the immune system, interferon-stimulated genes, regulation of cytokine production and anti-apoptosis, among others. A panel of these genes with clinical parameters can effectively predict binary classifications of inflammation grade (area under the ROC curve [AUC]: 0.88, 95% confidence interval [CI]: 0.77-0.93), validated by normal samples. A panel with only clinical parameters was also valuable (AUC: 0.78, 95% CI: 0.65-0.86), indicating that a liquid biopsy method for detecting the pathology of CHB is possible. This is the first study to systematically elucidate the relationships among gene expressions, clinical parameters and pathological inflammation grades in CHB, and to build models predicting inflammation grades by gene expressions and/or clinical parameters as well. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
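
    The models in this abstract are compared by area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a generic illustration and not the authors' pipeline; the function name `roc_auc` and the toy labels and scores are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

    This pairwise form is quadratic in the number of samples, which is acceptable for a cohort of about a hundred samples; a rank-based computation would be preferable for much larger datasets.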

  18. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    PubMed Central

    2011-01-01

    Background Bioinformatics data analysis often uses a linear mixture model that represents samples as an additive mixture of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness constrained factorization on a sample-by-sample basis. As opposed to that, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary. Yet, they will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods.
Moreover, the component selected by proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. As opposed to standard matrix factorization methods this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally, it increases prediction accuracy. PMID:22208882

  19. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database.

    PubMed

    Thompson, Bryony A; Spurdle, Amanda B; Plazzer, John-Paul; Greenblatt, Marc S; Akagi, Kiwamu; Al-Mulla, Fahd; Bapat, Bharati; Bernstein, Inge; Capellá, Gabriel; den Dunnen, Johan T; du Sart, Desiree; Fabre, Aurelie; Farrell, Michael P; Farrington, Susan M; Frayling, Ian M; Frebourg, Thierry; Goldgar, David E; Heinen, Christopher D; Holinski-Feder, Elke; Kohonen-Corish, Maija; Robinson, Kristina Lagerstedt; Leung, Suet Yi; Martins, Alexandra; Moller, Pal; Morak, Monika; Nystrom, Minna; Peltomaki, Paivi; Pineda, Marta; Qi, Ming; Ramesar, Rajkumar; Rasmussen, Lene Juel; Royer-Pokora, Brigitte; Scott, Rodney J; Sijmons, Rolf; Tavtigian, Sean V; Tops, Carli M; Weber, Thomas; Wijnen, Juul; Woods, Michael O; Macrae, Finlay; Genuardi, Maurizio

    2014-02-01

    The clinical classification of hereditary sequence variants identified in disease-related genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and apply a standardized classification scheme to constitutional variants in the Lynch syndrome-associated genes MLH1, MSH2, MSH6 and PMS2. Unpublished data submission was encouraged to assist in variant classification and was recognized through microattribution. The scheme was refined by multidisciplinary expert committee review of the clinical and functional data available for variants, applied to 2,360 sequence alterations, and disseminated online. Assessment using validated criteria altered classifications for 66% of 12,006 database entries. Clinical recommendations based on transparent evaluation are now possible for 1,370 variants that were not obviously protein truncating from nomenclature. This large-scale endeavor will facilitate the consistent management of families suspected to have Lynch syndrome and demonstrates the value of multidisciplinary collaboration in the curation and classification of variants in public locus-specific databases.

  20. Application of a five-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants lodged on the InSiGHT locus-specific database

    PubMed Central

    Plazzer, John-Paul; Greenblatt, Marc S.; Akagi, Kiwamu; Al-Mulla, Fahd; Bapat, Bharati; Bernstein, Inge; Capellá, Gabriel; den Dunnen, Johan T.; du Sart, Desiree; Fabre, Aurelie; Farrell, Michael P.; Farrington, Susan M.; Frayling, Ian M.; Frebourg, Thierry; Goldgar, David E.; Heinen, Christopher D.; Holinski-Feder, Elke; Kohonen-Corish, Maija; Robinson, Kristina Lagerstedt; Leung, Suet Yi; Martins, Alexandra; Moller, Pal; Morak, Monika; Nystrom, Minna; Peltomaki, Paivi; Pineda, Marta; Qi, Ming; Ramesar, Rajkumar; Rasmussen, Lene Juel; Royer-Pokora, Brigitte; Scott, Rodney J.; Sijmons, Rolf; Tavtigian, Sean V.; Tops, Carli M.; Weber, Thomas; Wijnen, Juul; Woods, Michael O.; Macrae, Finlay; Genuardi, Maurizio

    2015-01-01

    Clinical classification of sequence variants identified in hereditary disease genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and apply a standardized classification scheme to constitutional variants in the Lynch Syndrome genes MLH1, MSH2, MSH6 and PMS2. Unpublished data submission was encouraged to assist variant classification, and recognized by microattribution. The scheme was refined by multidisciplinary expert committee review of clinical and functional data available for variants, applied to 2,360 sequence alterations, and disseminated online. Assessment using validated criteria altered classifications for 66% of 12,006 database entries. Clinical recommendations based on transparent evaluation are now possible for 1,370 variants not obviously protein-truncating from nomenclature. This large-scale endeavor will facilitate consistent management of suspected Lynch Syndrome families, and demonstrates the value of multidisciplinary collaboration for curation and classification of variants in public locus-specific databases. PMID:24362816

  1. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data.

    PubMed

    Ji, Jiadong; He, Di; Feng, Yang; He, Yong; Xue, Fuzhong; Xie, Lei

    2017-10-01

    A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. They are restrictive in real application. We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust confounding factors, without the need of the assumption of a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal sample with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. R scripts available at https://github.com/jijiadong/JDINAC. lxie@iscb.org. Supplementary data are available at Bioinformatics online. © The Author (2017). 
Published by Oxford University Press. All rights reserved.

  2. Genome-wide analysis of WRKY gene family in the sesame genome and identification of the WRKY genes involved in responses to abiotic stresses.

    PubMed

    Li, Donghua; Liu, Pan; Yu, Jingyin; Wang, Linhai; Dossa, Komivi; Zhang, Yanxin; Zhou, Rong; Wei, Xin; Zhang, Xiurong

    2017-09-11

    Sesame (Sesamum indicum L.) is one of the world's most important oil crops. However, it is susceptible to abiotic stresses in general, and to waterlogging and drought stresses in particular. The molecular mechanisms of abiotic stress tolerance in sesame have not yet been elucidated. The WRKY domain transcription factors play significant roles in plant growth, development, and responses to stresses. However, little is known about the number, location, structure, molecular phylogenetics, and expression of the WRKY genes in sesame. We performed a comprehensive study of the WRKY gene family in sesame and identified 71 SiWRKYs. In total, 65 of these genes were mapped to 15 linkage groups within the sesame genome. A phylogenetic analysis was performed using a related species (Arabidopsis thaliana) to investigate the evolution of the sesame WRKY genes. Tissue expression profiles of the WRKY genes demonstrated that six SiWRKY genes were highly expressed in all organs, suggesting that these genes may be important for plant growth and organ development in sesame. Analysis of the SiWRKY gene expression patterns revealed that 33 and 26 SiWRKYs respond strongly to waterlogging and drought stresses, respectively. Changes in the expression of 12 SiWRKY genes were observed at different times after the waterlogging and drought treatments had begun, demonstrating that sesame gene expression patterns vary in response to abiotic stresses. In this study, we analyzed the WRKY family of transcription factors encoded by the sesame genome. Insight was gained into the classification, evolution, and function of the SiWRKY genes, revealing their putative roles in a variety of tissues. Responses to abiotic stresses in different sesame cultivars were also investigated. The results of our study provide a better understanding of the structures and functions of sesame WRKY genes and suggest that manipulating these WRKYs could enhance resistance to waterlogging and drought.

  3. Epigenetically induced ectopic expression of UNCX impairs the proliferation and differentiation of myeloid cells

    PubMed Central

    Daniele, Giulia; Simonetti, Giorgia; Fusilli, Caterina; Iacobucci, Ilaria; Lonoce, Angelo; Palazzo, Antonio; Lomiento, Mariana; Mammoli, Fabiana; Marsano, Renè Massimiliano; Marasco, Elena; Mantovani, Vilma; Quentmeier, Hilmar; Drexler, Hans G; Ding, Jie; Palumbo, Orazio; Carella, Massimo; Nadarajah, Niroshan; Perricone, Margherita; Ottaviani, Emanuela; Baldazzi, Carmen; Testoni, Nicoletta; Papayannidis, Cristina; Ferrari, Sergio; Mazza, Tommaso; Martinelli, Giovanni; Storlazzi, Clelia Tiziana

    2017-01-01

    We here describe a leukemogenic role of the homeobox gene UNCX, activated by epigenetic modifications in acute myeloid leukemia (AML). We found the ectopic activation of UNCX in a leukemia patient harboring a t(7;10)(p22;p14) translocation, in 22 of 61 of additional cases [a total of 23 positive patients out of 62 (37.1%)], and in 6 of 75 (8%) of AML cell lines. UNCX is embedded within a low-methylation region (canyon) and encodes for a transcription factor involved in somitogenesis and neurogenesis, with specific expression in the eye, brain, and kidney. UNCX expression turned out to be associated, and significantly correlated, with DNA methylation increase at its canyon borders based on data in our patients and in archived data of patients from The Cancer Genome Atlas. UNCX-positive and -negative patients displayed significant differences in their gene expression profiles. An enrichment of genes involved in cell proliferation and differentiation, such as MAP2K1 and CCNA1, was revealed. Similar results were obtained in UNCX-transduced CD34+ cells, associated with low proliferation and differentiation arrest. Accordingly, we showed that UNCX expression characterizes leukemia cells at their early stage of differentiation, mainly M2 and M3 subtypes carrying wild-type NPM1. We also observed that UNCX expression significantly associates with an increased frequency of acute promyelocytic leukemia with PML-RARA and AML with t(8;21)(q22;q22.1); RUNX1-RUNX1T1 classes, according to the World Health Organization disease classification. In summary, our findings suggest a novel leukemogenic role of UNCX, associated with epigenetic modifications and with impaired cell proliferation and differentiation in AML. PMID:28411256

  4. Epigenetically induced ectopic expression of UNCX impairs the proliferation and differentiation of myeloid cells.

    PubMed

    Daniele, Giulia; Simonetti, Giorgia; Fusilli, Caterina; Iacobucci, Ilaria; Lonoce, Angelo; Palazzo, Antonio; Lomiento, Mariana; Mammoli, Fabiana; Marsano, Renè Massimiliano; Marasco, Elena; Mantovani, Vilma; Quentmeier, Hilmar; Drexler, Hans G; Ding, Jie; Palumbo, Orazio; Carella, Massimo; Nadarajah, Niroshan; Perricone, Margherita; Ottaviani, Emanuela; Baldazzi, Carmen; Testoni, Nicoletta; Papayannidis, Cristina; Ferrari, Sergio; Mazza, Tommaso; Martinelli, Giovanni; Storlazzi, Clelia Tiziana

    2017-07-01

    We here describe a leukemogenic role of the homeobox gene UNCX, activated by epigenetic modifications in acute myeloid leukemia (AML). We found the ectopic activation of UNCX in a leukemia patient harboring a t(7;10)(p22;p14) translocation, in 22 of 61 of additional cases [a total of 23 positive patients out of 62 (37.1%)], and in 6 of 75 (8%) of AML cell lines. UNCX is embedded within a low-methylation region (canyon) and encodes for a transcription factor involved in somitogenesis and neurogenesis, with specific expression in the eye, brain, and kidney. UNCX expression turned out to be associated, and significantly correlated, with DNA methylation increase at its canyon borders based on data in our patients and in archived data of patients from The Cancer Genome Atlas. UNCX-positive and -negative patients displayed significant differences in their gene expression profiles. An enrichment of genes involved in cell proliferation and differentiation, such as MAP2K1 and CCNA1, was revealed. Similar results were obtained in UNCX-transduced CD34+ cells, associated with low proliferation and differentiation arrest. Accordingly, we showed that UNCX expression characterizes leukemia cells at their early stage of differentiation, mainly M2 and M3 subtypes carrying wild-type NPM1. We also observed that UNCX expression significantly associates with an increased frequency of acute promyelocytic leukemia with PML-RARA and AML with t(8;21)(q22;q22.1); RUNX1-RUNX1T1 classes, according to the World Health Organization disease classification. In summary, our findings suggest a novel leukemogenic role of UNCX, associated with epigenetic modifications and with impaired cell proliferation and differentiation in AML. Copyright © 2017 Ferrata Storti Foundation.

  5. Comparative transcriptional profiling analysis of developing melon (Cucumis melo L.) fruit from climacteric and non-climacteric varieties.

    PubMed

    Saladié, Montserrat; Cañizares, Joaquin; Phillips, Michael A; Rodriguez-Concepcion, Manuel; Larrigaudière, Christian; Gibon, Yves; Stitt, Mark; Lunn, John Edward; Garcia-Mas, Jordi

    2015-06-09

    In climacteric fruit-bearing species, the onset of fruit ripening is marked by a transient rise in respiration rate and autocatalytic ethylene production, followed by rapid deterioration in fruit quality. In non-climacteric species, there is no increase in respiration or ethylene production at the beginning or during fruit ripening. Melon is unusual in having climacteric and non-climacteric varieties, providing an interesting model system to compare both ripening types. Transcriptomic analysis of developing melon fruits from Védrantais and Dulce (climacteric) and Piel de sapo and PI 161375 (non-climacteric) varieties was performed to understand the molecular mechanisms that differentiate the two fruit ripening types. Fruits were harvested at 15, 25, 35 days after pollination and at fruit maturity. Transcript profiling was performed using an oligo-based microarray with 75 K probes. Genes linked to characteristic traits of fruit ripening were differentially expressed between climacteric and non-climacteric types, as well as several transcription factor genes and genes encoding enzymes involved in sucrose catabolism. The expression patterns of some genes in PI 161375 fruits were either intermediate between Piel de sapo and the climacteric varieties, or more similar to the latter. PI 161375 fruits also accumulated some carotenoids, a characteristic trait of climacteric varieties. Simultaneous changes in transcript abundance indicate that there is coordinated reprogramming of gene expression during fruit development and at the onset of ripening in both climacteric and non-climacteric fruits. The expression patterns of genes related to ethylene metabolism, carotenoid accumulation, cell wall integrity and transcriptional regulation varied between genotypes and were consistent with the differences in their fruit ripening characteristics. 
There were differences between climacteric and non-climacteric varieties in the expression of genes related to sugar metabolism suggesting that they may be potential determinants of sucrose content and post-harvest stability of sucrose levels in fruit. Several transcription factor genes were also identified that were differentially expressed in both types, implicating them in regulation of ripening behaviour. The intermediate nature of PI 161375 suggested that classification of melon fruit ripening behaviour into just two distinct types is an over-simplification, and that in reality there is a continuous spectrum of fruit ripening behaviour.

  6. Global analysis of gene expression in mineralizing fish vertebra-derived cell lines: new insights into anti-mineralogenic effect of vanadate

    PubMed Central

    2011-01-01

    Background: Fish have been deemed suitable to study the complex mechanisms of vertebrate skeletogenesis, and gilthead seabream (Sparus aurata), a marine teleost with acellular bone, has been successfully used in recent years to study the function and regulation of bone and cartilage related genes during development and in adult animals. Tools recently developed for gilthead seabream, e.g. mineralogenic cell lines and a 4 × 44K Agilent oligo-array, were used to identify molecular determinants of in vitro mineralization and genes involved in anti-mineralogenic action of vanadate. Results: Global analysis of gene expression identified 4,223 and 4,147 genes differentially expressed (fold change, FC > 1.5) during in vitro mineralization of VSa13 (pre-chondrocyte) and VSa16 (pre-osteoblast) cells, respectively. Comparative analysis indicated that nearly 45% of these genes are common to both cell lines and gene ontology (GO) classification is also similar for both cell types. Up-regulated genes (FC > 10) were mainly associated with transport, matrix/membrane, metabolism and signaling, while down-regulated genes were mainly associated with metabolism, calcium binding, transport and signaling. Analysis of gene expression in proliferative and mineralizing cells exposed to vanadate revealed 1,779 and 1,136 differentially expressed genes, respectively. Of these genes, 67 exhibited reverse patterns of expression upon vanadate treatment during proliferation or mineralization. Conclusions: Comparative analysis of expression data from fish and data available in the literature for mammalian cell systems (bone-derived cells undergoing differentiation) indicates that the same type of genes, and in some cases the same orthologs, are involved in mechanisms of in vitro mineralization, suggesting their conservation throughout vertebrate evolution and across cell types. 
Array technology also allowed identification of genes differentially expressed upon exposure of fish cell lines to vanadate and likely involved in its anti-mineralogenic activity. Many were unknown or had never previously been associated with bone homeostasis, thus providing a set of potential candidates whose study will likely bring insights into the complex mechanisms of tissue mineralization and bone formation. PMID:21668972
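
    The fold-change cutoff mentioned in the abstract (FC > 1.5) can be illustrated with a minimal sketch. The gene names and expression values below are hypothetical, and the signed fold-change convention (negative for down-regulation) is an assumption made for illustration; the abstract does not state which convention the authors used.

```python
# Hypothetical expression values (arbitrary units): (control, mineralizing).
expression = {
    "gene_a": (100.0, 180.0),   # up, 1.8-fold
    "gene_b": (200.0, 210.0),   # essentially unchanged
    "gene_c": (150.0, 60.0),    # down, 2.5-fold
}

def fold_change(control, treated):
    """Signed fold change: positive if up-regulated, negative if down-regulated."""
    ratio = treated / control
    return ratio if ratio >= 1.0 else -1.0 / ratio

def differentially_expressed(data, threshold=1.5):
    """Keep genes whose absolute fold change exceeds the threshold."""
    result = {}
    for gene, (control, treated) in data.items():
        fc = fold_change(control, treated)
        if abs(fc) > threshold:
            result[gene] = fc
    return result

hits = differentially_expressed(expression)
```

    With these toy values only gene_a and gene_c pass the filter. A real microarray analysis would normally also require a statistical test across replicates, which this sketch omits.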

  7. Identification, Classification and Differential Expression of Oleosin Genes in Tung Tree (Vernicia fordii)

    PubMed Central

    Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M.

    2014-01-01

    Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the “proline knot” motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1–3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants. PMID:24516650
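
    The "proline knot" motif PX5SPX3P can be searched for with a regular expression. The sketch below assumes the usual motif notation, in which X5 and X3 stand for exactly five and three arbitrary residues; the test sequence is made up for illustration and is not a real oleosin.

```python
import re

# PX5SPX3P: P, any 5 residues, S, P, any 3 residues, P.
PROLINE_KNOT = re.compile(r"P[A-Z]{5}SP[A-Z]{3}P")

def find_proline_knots(protein_sequence):
    """Return (start index, matched text) for each non-overlapping motif hit."""
    sequence = protein_sequence.upper()
    return [(m.start(), m.group()) for m in PROLINE_KNOT.finditer(sequence)]

# Made-up sequence for illustration only.
toy_sequence = "MAGG" + "PLAGLSSPVLVP" + "AAK"
hits = find_proline_knots(toy_sequence)
```

    Here the single hit starts at index 4 (0-based). Scanning a set of predicted protein sequences this way is one simple check that candidate OLE genes carry the conserved motif.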

  8. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii).

    PubMed

    Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M

    2014-01-01

    Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.

  9. Genome-wide analysis of the basic leucine zipper (bZIP) transcription factor gene family in six legume genomes.

    PubMed

    Wang, Zhihui; Cheng, Ke; Wan, Liyun; Yan, Liying; Jiang, Huifang; Liu, Shengyi; Lei, Yong; Liao, Boshou

    2015-12-10

    Plant bZIP proteins characteristically harbor a highly conserved bZIP domain with two structural features: a DNA-binding basic region and a leucine (Leu) zipper dimerization region. They have been shown to be diverse transcriptional regulators, playing crucial roles in plant development, physiological processes, and biotic/abiotic stress responses. Despite the availability of six completely sequenced legume genomes, a comprehensive investigation of bZIP family members in legumes has yet to be presented. In this study, we identified 428 bZIP genes encoding 585 distinct proteins in six legumes, Glycine max, Medicago truncatula, Phaseolus vulgaris, Cicer arietinum, Cajanus cajan, and Lotus japonicus. The legume bZIP genes were categorized into 11 groups according to their phylogenetic relationships with genes from Arabidopsis. Four kinds of intron patterns (a-d) within the basic and hinge regions were defined and additional conserved motifs were identified, both presenting high group specificity and supporting the group classification. We predicted the DNA-binding patterns and the dimerization properties, based on the characteristic features in the basic and hinge regions and the Leu zipper, respectively, which indicated that some highly conserved amino acid residues existed across each major group. The chromosome distribution and analysis for WGD-derived duplicated blocks revealed that the legume bZIP genes have expanded mainly by segmental duplication rather than tandem duplication. Expression data further revealed that the legume bZIP genes were expressed constitutively or in an organ-specific, development-dependent manner playing roles in multiple seed developmental stages and tissues. We also detected several key legume bZIP genes involved in drought- and salt-responses by comparing fold changes of expression values in drought-stressed or salt-stressed roots and leaves. 
In summary, this genome-wide identification, characterization and expression analysis of legume bZIP genes provides valuable information for understanding the molecular functions and evolution of the legume bZIP transcription factor family, and highlights potential legume bZIP genes involved in regulating tissue development and abiotic stress responses.

  10. Transcriptome analysis of phosphorus stress responsiveness in the seedlings of Dongxiang wild rice (Oryza rufipogon Griff.).

    PubMed

    Deng, Qian-Wen; Luo, Xiang-Dong; Chen, Ya-Ling; Zhou, Yi; Zhang, Fan-Tao; Hu, Biao-Lin; Xie, Jian-Kun

    2018-03-15

    Low phosphorus availability is a major factor restricting rice growth. Dongxiang wild rice (Oryza rufipogon Griff.) has many useful genes lacking in cultivated rice, including stress resistance to phosphorus deficiency, cold, salt and drought, which is considered to be a precious germplasm resource for rice breeding. However, the molecular mechanism of regulation of phosphorus deficiency tolerance is not clear. In this study, cDNA libraries were constructed from the leaf and root tissues of phosphorus stressed and untreated Dongxiang wild rice seedlings, and transcriptome sequencing was performed with the goal of elucidating the molecular mechanisms involved in phosphorus stress response. The results indicated that 1184 transcripts were differentially expressed in the leaves (323 up-regulated and 861 down-regulated) and 986 transcripts were differentially expressed in the roots (756 up-regulated and 230 down-regulated). 43 genes were up-regulated both in leaves and roots, 38 genes were up-regulated in roots but down-regulated in leaves, and only 2 genes were down-regulated in roots but up-regulated in leaves. Among these differentially expressed genes, the detection of many transcription factors and functional genes demonstrated that multiple regulatory pathways were involved in phosphorus deficiency tolerance. Meanwhile, the differentially expressed genes were also annotated with gene ontology terms and key pathways via functional classification and Kyoto Encyclopedia of Gene and Genomes pathway mapping, respectively. A set of the most important candidate genes was then identified by combining the differentially expressed genes found in the present study with previously identified phosphorus deficiency tolerance quantitative trait loci. 
The present work provides abundant genomic information for functional dissection of the phosphorus deficiency resistance of Dongxiang wild rice, which will help to understand the biological regulatory mechanisms of phosphorus deficiency tolerance in Dongxiang wild rice.
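
    The tissue comparison in this abstract (transcripts up-regulated in both tissues, or regulated in opposite directions) amounts to set intersections over the lists of differentially expressed transcripts. A minimal sketch follows; the transcript IDs are hypothetical placeholders, and only the totals in the final lines come from the abstract.

```python
# Hypothetical transcript IDs; a real analysis would load the study's DEG tables.
leaf_up = {"t1", "t2", "t3"}
leaf_down = {"t4", "t5"}
root_up = {"t1", "t4", "t6"}
root_down = {"t3", "t7"}

up_in_both = leaf_up & root_up              # up-regulated in leaves and roots
root_up_leaf_down = root_up & leaf_down     # up in roots, down in leaves
root_down_leaf_up = root_down & leaf_up     # down in roots, up in leaves

# Consistency check on the totals reported in the abstract.
leaf_total = 323 + 861   # up + down in leaves
root_total = 756 + 230   # up + down in roots
```

    The reported totals are internally consistent: 323 + 861 = 1184 for leaves and 756 + 230 = 986 for roots.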

  11. Supervised group Lasso with applications to microarray data analysis

    PubMed Central

    Ma, Shuangge; Song, Xiao; Huang, Jian

    2007-01-01

    Background: A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
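
    The two penalties named in the abstract differ in what they shrink: the Lasso acts on individual coefficients, while the group Lasso acts on whole clusters. The abstract does not say how the penalized problems are solved, so the sketch below is not the authors' implementation. It only shows the standard shrinkage (proximal) operators for the two penalties, applied to hypothetical per-cluster coefficients.

```python
import math

def soft_threshold(value, lam):
    """Lasso shrinkage: move a single coefficient toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def group_soft_threshold(coefs, lam):
    """Group Lasso shrinkage: scale a whole group by its Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= lam:
        return [0.0] * len(coefs)      # the entire cluster is dropped
    scale = 1.0 - lam / norm
    return [scale * c for c in coefs]

# Hypothetical coefficients for two gene clusters.
clusters = {"cluster_1": [3.0, 4.0], "cluster_2": [0.3, -0.4]}

# Gene-level shrinkage within each cluster, then cluster-level selection.
within = {k: [soft_threshold(c, 0.1) for c in v] for k, v in clusters.items()}
selected = {k: group_soft_threshold(v, 1.0) for k, v in clusters.items()}
```

    With a group penalty of 1.0, cluster_2 (norm 0.5) is removed as a block while cluster_1 (norm 5) is kept and scaled by 0.8, which is the "select important clusters" behaviour the second step relies on.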

  12. Genome-Wide Classification and Evolutionary and Expression Analyses of Citrus MYB Transcription Factor Families in Sweet Orange

    PubMed Central

    Hou, Xiao-Jin; Li, Si-Bei; Liu, Sheng-Rui; Hu, Chun-Gen; Zhang, Jin-Zhi

    2014-01-01

    MYB family genes are widely distributed in plants and comprise one of the largest transcription factor families involved in various developmental processes and defense responses of plants. To date, few MYB genes and little expression profiling have been reported for citrus. Here, we describe and classify 177 members of the sweet orange MYB gene (CsMYB) family in terms of their genomic gene structures and similarity to their putative Arabidopsis orthologs. According to these analyses, these CsMYBs were categorized into four groups (4R-MYB, 3R-MYB, 2R-MYB and 1R-MYB). Gene structure analysis revealed that 1R-MYB genes possess relatively more introns as compared with 2R-MYB genes. Investigation of their chromosomal localizations revealed that these CsMYBs are distributed across nine chromosomes. Sweet orange includes a relatively small number of MYB genes compared with the 198 members in Arabidopsis, presumably due to a paralog reduction related to repetitive sequence insertion into promoter and non-coding transcribed regions of the genes. Comparative studies of CsMYBs and Arabidopsis showed that CsMYBs had fewer gene duplication events. Expression analysis revealed that the MYB gene family has a wide expression profile in sweet orange development and plays important roles in development and stress responses. In addition, 337 new putative microsatellites with flanking sequences sufficient for primer design were also identified from the 177 CsMYBs. These results provide a useful reference for the selection of candidate MYB genes for cloning and further functional analysis for citrus. PMID:25375352

  13. The endogenous and reactive depression subtypes revisited: integrative animal and human studies implicate multiple distinct molecular mechanisms underlying major depressive disorder.

    PubMed

    Malki, Karim; Keers, Robert; Tosto, Maria Grazia; Lourdusamy, Anbarasu; Carboni, Lucia; Domenici, Enrico; Uher, Rudolf; McGuffin, Peter; Schalkwyk, Leonard C

    2014-05-07

    Traditional diagnoses of major depressive disorder (MDD) suggested that the presence or absence of stress prior to onset results in either 'reactive' or 'endogenous' subtypes of the disorder, respectively. Several lines of research suggest that the biological underpinnings of 'reactive' or 'endogenous' subtypes may also differ, resulting in differential response to treatment. We investigated this hypothesis by comparing the gene-expression profiles of three animal models of 'reactive' and 'endogenous' depression. We then translated these findings to clinical samples using a human post-mortem mRNA study. Affymetrix mouse whole-genome oligonucleotide arrays were used to measure gene expression from hippocampal tissues of 144 mice from the Genome-based Therapeutic Drugs for Depression (GENDEP) project. The study used four inbred mouse strains and two depressogenic 'stress' protocols (maternal separation and Unpredictable Chronic Mild Stress) to model 'reactive' depression. Stress-related mRNA differences in mouse were compared with a parallel mRNA study using Flinders Sensitive and Resistant rat lines as a model of 'endogenous' depression. Convergent genes differentially expressed across the animal studies were used to inform candidate gene selection in a human mRNA post-mortem case control study from the Stanley Brain Consortium. In the mouse 'reactive' model, the expression of 350 genes changed in response to early stresses and 370 in response to late stresses. A minimal genetic overlap (less than 8.8%) was detected in response to both stress protocols, but 30% of these genes (21) were also differentially regulated in the 'endogenous' rat study. This overlap is significantly greater than expected by chance. The VAMP-2 gene, differentially expressed across the rodent studies, was also significantly altered in the human study after correcting for multiple testing. 
Our results suggest that 'endogenous' and 'reactive' subtypes of depression are associated with largely distinct changes in gene-expression. However, they also suggest that the molecular signature of 'reactive' depression caused by early stressors differs considerably from that of 'reactive' depression caused by late stressors. A small set of genes was consistently dysregulated across each paradigm and in post-mortem brain tissue of depressed patients suggesting a final common pathway to the disorder. These genes included the VAMP-2 gene, which has previously been associated with Axis-I disorders including MDD, bipolar depression, schizophrenia and with antidepressant treatment response. We also discuss the implications of our findings for disease classification, personalized medicine and case-control studies of MDD.
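
    A claim that an overlap between two gene lists is "greater than expected by chance" is commonly assessed with a hypergeometric tail probability. The abstract does not state which test the authors used, so the sketch below is only one plausible approach, and all the numbers in the example call (including the size of the gene universe) are hypothetical.

```python
from math import comb

def overlap_p_value(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn at random from the universe,
    of which set_a genes belong to the first list (hypergeometric tail)."""
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 70 and 100 genes from a universe of 1000, sharing 21.
p = overlap_p_value(universe=1000, set_a=70, set_b=100, overlap=21)
```

    In this toy setting the expected overlap is 7 genes, so observing 21 gives a very small tail probability.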

  14. Nonlinear programming for classification problems in machine learning

    NASA Astrophysics Data System (ADS)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In recent years this field has become more and more relevant due to many practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling, etc. Classification deals with the separation of sets by means of appropriate separation surfaces, which are generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), the use of nonlinear separating surfaces has received some attention in recent years. The objective of this work is to recall some of these proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  15. Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH-Mutant Molecular Profiles.

    PubMed

    Farshidfar, Farshad; Zheng, Siyuan; Gingras, Marie-Claude; Newton, Yulia; Shih, Juliann; Robertson, A Gordon; Hinoue, Toshinori; Hoadley, Katherine A; Gibb, Ewan A; Roszik, Jason; Covington, Kyle R; Wu, Chia-Chin; Shinbrot, Eve; Stransky, Nicolas; Hegde, Apurva; Yang, Ju Dong; Reznik, Ed; Sadeghi, Sara; Pedamallu, Chandra Sekhar; Ojesina, Akinyemi I; Hess, Julian M; Auman, J Todd; Rhie, Suhn K; Bowlby, Reanne; Borad, Mitesh J; Zhu, Andrew X; Stuart, Josh M; Sander, Chris; Akbani, Rehan; Cherniack, Andrew D; Deshpande, Vikram; Mounajjed, Taofic; Foo, Wai Chin; Torbenson, Michael S; Kleiner, David E; Laird, Peter W; Wheeler, David A; McRee, Autumn J; Bathe, Oliver F; Andersen, Jesper B; Bardeesy, Nabeel; Roberts, Lewis R; Kwong, Lawrence N

    2017-03-14

    Cholangiocarcinoma (CCA) is an aggressive malignancy of the bile ducts, with poor prognosis and limited treatment options. Here, we describe the integrated analysis of somatic mutations, RNA expression, copy number, and DNA methylation by The Cancer Genome Atlas of a set of predominantly intrahepatic CCA cases and propose a molecular classification scheme. We identified an IDH mutant-enriched subtype with distinct molecular features including low expression of chromatin modifiers, elevated expression of mitochondrial genes, and increased mitochondrial DNA copy number. Leveraging the multi-platform data, we observed that ARID1A exhibited DNA hypermethylation and decreased expression in the IDH mutant subtype. More broadly, we found that IDH mutations are associated with an expanded histological spectrum of liver tumors with molecular features that stratify with CCA. Our studies reveal insights into the molecular pathogenesis and heterogeneity of cholangiocarcinoma and provide classification information of potential therapeutic significance. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Molecular Diagnostics in Colorectal Carcinoma: Advances and Applications for 2018.

    PubMed

    Bhalla, Amarpreet; Zulfiqar, Muhammad; Bluth, Martin H

    2018-06-01

    The molecular pathogenesis and classification of colorectal carcinoma are based on the traditional adenoma-carcinoma sequence, serrated polyp pathway, and microsatellite instability (MSI). The genetic basis for hereditary nonpolyposis colorectal cancer is the detection of mutations in the MLH1, MSH2, MSH6, PMS2, and EPCAM genes. Genetic testing for Lynch syndrome includes MSI testing, methylator phenotype testing, BRAF mutation testing, and molecular testing for germline mutations in MMR genes. Molecular markers with predictive and prognostic implications include quantitative multigene reverse transcriptase polymerase chain reaction assay and KRAS and BRAF mutation analysis. Mismatch repair-deficient tumors have higher rates of programmed death-ligand 1 expression. Cell-free DNA analysis in fluids is proving beneficial for diagnosis and prognosis in these disease states towards effective patient management. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. The chemiluminescence based Ziplex automated workstation focus array reproduces ovarian cancer Affymetrix GeneChip expression profiles.

    PubMed

    Quinn, Michael C J; Wilson, Daniel J; Young, Fiona; Dempsey, Adam A; Arcand, Suzanna L; Birch, Ashley H; Wojnarowicz, Paulina M; Provencher, Diane; Mes-Masson, Anne-Marie; Englert, David; Tonin, Patricia N

    2009-07-06

    As gene expression signatures may serve as biomarkers, there is a need to develop technologies based on mRNA expression patterns that are adaptable for translational research. Xceed Molecular has recently developed a Ziplex technology, that can assay for gene expression of a discrete number of genes as a focused array. The present study has evaluated the reproducibility of the Ziplex system as applied to ovarian cancer research of genes shown to exhibit distinct expression profiles initially assessed by Affymetrix GeneChip analyses. The new chemiluminescence-based Ziplex gene expression array technology was evaluated for the expression of 93 genes selected based on their Affymetrix GeneChip profiles as applied to ovarian cancer research. Probe design was based on the Affymetrix target sequence that favors the 3' UTR of transcripts in order to maximize reproducibility across platforms. Gene expression analysis was performed using the Ziplex Automated Workstation. Statistical analyses were performed to evaluate reproducibility of both the magnitude of expression and differences between normal and tumor samples by correlation analyses, fold change differences and statistical significance testing. Expressions of 82 of 93 (88.2%) genes were highly correlated (p < 0.01) in a comparison of the two platforms. Overall, 75 of 93 (80.6%) genes exhibited consistent results in normal versus tumor tissue comparisons for both platforms (p < 0.001). The fold change differences were concordant for 87 of 93 (94%) genes, where there was agreement between the platforms regarding statistical significance for 71 (76%) of 87 genes. There was a strong agreement between the two platforms as shown by comparisons of log2 fold differences of gene expression between tumor versus normal samples (R = 0.93) and by Bland-Altman analysis, where greater than 90% of expression values fell within the 95% limits of agreement. 
Overall concordance of gene expression patterns based on correlations, statistical significance between tumor and normal ovary data, and fold changes was consistent between the Ziplex and Affymetrix platforms. The reproducibility and ease of use of the technology suggest that the Ziplex array is a suitable platform for translational research.
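
    The Bland-Altman analysis mentioned in the abstract compares two measurement platforms through the differences of paired values. The sketch below computes the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The paired log2 fold changes are hypothetical, not data from the study.

```python
from statistics import mean, stdev

def bland_altman(platform_a, platform_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(platform_a, platform_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)        # sample standard deviation
    return bias, bias - spread, bias + spread

def fraction_within_limits(platform_a, platform_b):
    """Share of paired differences that fall inside the limits of agreement."""
    _, lower, upper = bland_altman(platform_a, platform_b)
    diffs = [a - b for a, b in zip(platform_a, platform_b)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)

# Hypothetical log2 fold changes (tumor vs. normal) for six genes.
ziplex = [1.2, -0.8, 2.1, 0.3, -1.5, 0.9]
affymetrix = [1.0, -1.0, 2.4, 0.2, -1.3, 1.1]

bias, lower, upper = bland_altman(ziplex, affymetrix)
inside = fraction_within_limits(ziplex, affymetrix)
```

    A bias near zero and a high fraction of differences inside the limits are the kind of evidence the abstract summarizes as "greater than 90% of expression values fell within the 95% limits of agreement".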

  18. Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data.

    PubMed

    Shah, M; Marchand, M; Corbeil, J

    2012-01-01

    One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of the well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.
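
    A conjunction of decision stumps is a classifier that thresholds a few individual attributes and predicts the positive class only if every threshold test passes. The sketch below illustrates the prediction rule only; it does not implement the learning algorithms or the risk bounds discussed in the abstract, and the stumps and sample values are hypothetical.

```python
from typing import List, NamedTuple, Sequence

class Stump(NamedTuple):
    gene_index: int      # which expression value to inspect
    threshold: float
    above: bool          # True: fires when value > threshold; False: when value <= threshold

    def fires(self, sample: Sequence[float]) -> bool:
        value = sample[self.gene_index]
        return value > self.threshold if self.above else value <= self.threshold

def predict_conjunction(stumps: List[Stump], sample: Sequence[float]) -> bool:
    """Positive only if every stump fires (logical AND)."""
    return all(stump.fires(sample) for stump in stumps)

def predict_disjunction(stumps: List[Stump], sample: Sequence[float]) -> bool:
    """Positive if at least one stump fires (logical OR)."""
    return any(stump.fires(sample) for stump in stumps)

# Hypothetical rule: gene 0 highly expressed AND gene 2 weakly expressed.
rule = [Stump(0, 5.0, True), Stump(2, 1.0, False)]
sample_pos = [6.2, 0.0, 0.4]
sample_neg = [6.2, 0.0, 3.0]
```

    Because the rule here depends on only two genes, it shows why such classifiers are attractive for microarray data: the selected attributes are few and directly interpretable.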

  19. Gene function prediction based on the Gene Ontology hierarchical structure.

    PubMed

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile, the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7%, recall: 48.9%). The experimental results demonstrate that when the size of the training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
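
    The reported F-value can be checked against the stated precision and recall. The sketch below assumes the F-value is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_measure(0.527, 0.489)
print(f"{f1:.1%}")  # prints 50.7%
```

    Under that assumption the computed value agrees with the 50.7% given in the abstract.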

  20. Upregulation of the ESR1 Gene and ESR Ratio (ESR1/ESR2) is Associated with a Worse Prognosis in Papillary Thyroid Carcinoma: The Impact of the Estrogen Receptor α/β Expression on Clinical Outcomes in Papillary Thyroid Carcinoma Patients.

    PubMed

    Yi, Jin Wook; Kim, Su-Jin; Kim, Jong Kyu; Seong, Chan Yong; Yu, Hyeong Won; Chai, Young Jun; Choi, June Young; Lee, Kyu Eun

    2017-11-01

    A gender disparity exists with respect to the incidence of papillary thyroid cancer (PTC), suggesting that sex hormones such as estrogen play a role in PTC development and progression. In this study, we compared estrogen receptor gene expression patterns in PTCs to determine the clinical significance of estrogen gene expression in PTC. We analyzed ESR1 and ESR2 messenger RNA expression counts using data from The Cancer Genome Atlas (TCGA). To validate the results of TCGA analysis, we analyzed microarray data (GSE 54958) from the Gene Expression Omnibus. ESR1 gene expression and ESR ratio (ESR1/ESR2) were significantly higher in PTC tissues than in paired normal thyroid tissues (mean 659.427 vs. 264.045 for ESR1, 92.017 vs. 19.064 for ESR ratio). Among female patients, ESR1 expression and ESR ratio were negatively correlated with increased age. ESR1 expression and ESR ratio were higher in patients with classic PTC, lymphovascular invasion, BRAF V600E mutation, and radioiodine therapy. Classification analysis demonstrated that patients with higher ESR1 expression and a higher ESR ratio faced worse overall survival (hazard ratio 6.348 for ESR1, 4.031 for ESR ratio). Validation microarray analysis demonstrated that ESR1 expression and ESR ratio were higher in tumor tissues, classic PTC, and BRAF V600E. Higher ESR1 expression and a higher ESR ratio were associated with aggressive prognostic factors and worse overall survival in female PTC patients. Our results suggest that ESR1 and ESR ratio can be used as prognostic markers to predict female patient survival and have potential as a therapeutic target.

  1. Identification of light-harvesting chlorophyll a/b-binding protein genes of Zostera marina L. and their expression under different environmental conditions

    NASA Astrophysics Data System (ADS)

    Kong, Fanna; Zhou, Yang; Sun, Peipei; Cao, Min; Li, Hong; Mao, Yunxiang

    2016-02-01

    Photosynthesis includes the collection of light and the transfer of solar energy using light-harvesting chlorophyll a/b-binding (LHC) proteins. In higher plants, the LHC gene family includes LHCA and LHCB sub-families, which encode proteins constituting the light-harvesting complex of photosystems I and II. Zostera marina L. is a monocotyledonous angiosperm and inhabits submerged marine environments rather than land environments. We characterized the Lhca and Lhcb gene families of Z. marina from the expressed sequence tags (EST) database. In total, 13 unigenes were annotated as ZmLhc, 6 in the Lhca family and 7 in the ZmLhcb family. ZmLHCA and ZmLHCB contained the conserved LHC motifs and amino acid residues binding chlorophyll. The average similarity among mature ZmLHCA and ZmLHCB was 48.91% and 48.66%, respectively, which indicated a high degree of divergence within the ZmLhc gene family. The reconstructed phylogenetic tree showed that the tree topology and phylogenetic relationship were similar to those reported in other higher plants, suggesting that the Lhc genes were highly conserved and the classification of ZmLhc genes was consistent with the evolutionary position of Z. marina. Real-time reverse transcription (RT) PCR analysis showed that different members of ZmLhca and ZmLhcb responded to stress with different expression patterns. Salinity, temperature, light intensity and light quality may affect the expression of most ZmLhca and ZmLhcb genes. Inorganic carbon concentration and acidity had no obvious effect on ZmLhca and ZmLhcb gene expression, except for ZmLhca6.

  2. New workflow for classification of genetic variants' pathogenicity applied to hereditary recurrent fevers by the International Study Group for Systemic Autoinflammatory Diseases (INSAID).

    PubMed

    Van Gijn, Marielle E; Ceccherini, Isabella; Shinar, Yael; Carbo, Ellen C; Slofstra, Mariska; Arostegui, Juan I; Sarrabay, Guillaume; Rowczenio, Dorota; Omoyımnı, Ebun; Balci-Peynircioglu, Banu; Hoffman, Hal M; Milhavet, Florian; Swertz, Morris A; Touitou, Isabelle

    2018-03-29

    Hereditary recurrent fevers (HRFs) are rare inflammatory diseases sharing similar clinical symptoms and effectively treated with anti-inflammatory biological drugs. Accurate diagnosis of HRF relies heavily on genetic testing. This study aimed to obtain an experts' consensus on the clinical significance of gene variants in four well-known HRF genes: MEFV, TNFRSF1A, NLRP3 and MVK. We configured a MOLGENIS web platform to share and analyse pathogenicity classifications of the variants and to manage a consensus-based classification process. Four experts in HRF genetics submitted independent classifications of 858 variants. Classifications were driven to consensus by recruiting four more expert opinions and by targeting discordant classifications in five iterative rounds. Consensus classification was reached for 804/858 variants (94%). None of the unsolved variants (6%) remained with opposite classifications (eg, pathogenic vs benign). New mutational hotspots were found in all genes. We noted a lower pathogenic variant load and a higher fraction of variants with unknown or unsolved clinical significance in the MEFV gene. Applying a consensus-driven process on the pathogenicity assessment of experts yielded rapid classification of almost all variants of four HRF genes. The high-throughput database will profoundly assist clinicians and geneticists in the diagnosis of HRFs. The configured MOLGENIS platform and consensus evolution protocol are usable for assembly of other variant pathogenicity databases. The MOLGENIS software is available for reuse at http://github.com/molgenis/molgenis; the specific HRF configuration is available at http://molgenis.org/said/. The HRF pathogenicity classifications will be published on the INFEVERS database at https://fmf.igh.cnrs.fr/ISSAID/infevers/. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  3. Targeting Tumor Oct4 to Deplete Prostate Tumor- and Metastasis-Initiating Cells

    DTIC Science & Technology

    2015-10-01

    and stem cell To investigate whether POU5F1B overexpression can induce cancer stem cell-related genes expression, we did cancer stem cell ...future 15. SUBJECT TERMS OCT4, cancer stem cells, prostate cancer, metastasis, tumor formation 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT...described in last report. Here we describe some findings previously not reported. 1.1 POU5F1B expression in prostatic tissue As cancer stem cell marker

  4. ICBP90 Regulation of DNA Methylation, Histone Ubiquitination, and Tumor Suppressor Gene Expression in Breast Cancer Cells

    DTIC Science & Technology

    2013-09-01

    accomplishments include creation of relevant plant lines, development of in vitro assays, and profiling of mRNA expression in null mutants. 15. SUBJECT TERMS...DNA methylation, UHRF1, VIM1, ubiquitination, epigenetics, chromatin 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF...Molecular Basis of Human Disease,” which covered several weeks’ worth of material specifically related to the molecular and epigenetic basis of cancer

  5. Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons.

    PubMed

    Smid, Marcel; Coebergh van den Braak, Robert R J; van de Werken, Harmen J G; van Riet, Job; van Galen, Anne; de Weerd, Vanja; van der Vlugt-Daane, Michelle; Bril, Sandra I; Lalmahomed, Zarina S; Kloosterman, Wigard P; Wilting, Saskia M; Foekens, John A; IJzermans, Jan N M; Martens, John W M; Sieuwerts, Anieta M

    2018-06-22

    Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e. TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype-classification, recurrence score, recall of DE genes and correlation to RT-qPCR data. We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison, while GeTMM performed similarly to TMM and RLE normalized data in intersample comparisons. Regarding DE genes, recall was found comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality. We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalently with regard to intersample normalization using the same normalized data. These combined properties enhance the general usefulness of RNA-seq but also the comparability to the many array-based gene expression data in the public domain.
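
    The gene-length correction that underlies intrasample comparison can be illustrated with TPM, one of the baseline methods named in this record. The sketch below covers only the reads-per-kilobase and TPM steps; the TMM scaling that GeTMM adds is not reproduced here, and the counts and gene lengths are hypothetical.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Divide each gene's read count by its length in kilobases."""
    return [c / (length / 1000) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-corrected counts rescaled to sum to 1e6."""
    rpk = reads_per_kilobase(counts, lengths_bp)
    scale = sum(rpk) / 1_000_000
    return [r / scale for r in rpk]

# Hypothetical sample: longer genes collect proportionally more reads.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]
values = tpm(counts, lengths_bp)
```

    After length correction the three genes receive equal values, which is the behaviour that makes within-sample comparison between genes meaningful.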

  6. A novel method to identify pathways associated with renal cell carcinoma based on a gene co-expression network

    PubMed Central

    RUAN, XIYUN; LI, HONGYUN; LIU, BO; CHEN, JIE; ZHANG, SHIBAO; SUN, ZEQIANG; LIU, SHUANGQING; SUN, FAHAI; LIU, QINGYONG

    2015-01-01

    The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842; 371; 2,883; and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient at identifying pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425
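
    Of the approaches listed in this record, the Pearson step is the simplest to sketch. The example below links gene pairs whose absolute correlation reaches a cutoff. It illustrates plain co-expression thresholding only, not the authors' differential co-expression test or their rank-based merging; the 0.9 cutoff, function names, and expression values are assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def coexpressed_pairs(expr, cutoff=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= cutoff."""
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= cutoff:
            pairs.append((g1, g2, r))
    return pairs

# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
links = coexpressed_pairs(expr)
```

    Only the geneA and geneB pair passes the cutoff in this toy data; each surviving pair would become one edge in the co-expression network.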

  7. Role of CACNA1C gene polymorphisms and protein expressions in the pathogenesis of schizophrenia: a case-control study in a Chinese population.

    PubMed

    Zhang, Sheng-Yu; Hu, Qiang; Tang, Tao; Liu, Chao; Li, Cheng-Chong; Yang, Xiao-Guang; Zang, Yin-Yin; Cai, Wei-Xiong

    2017-08-01

    The study aimed to investigate the correlations of CACNA1C genetic polymorphisms and protein expression with the pathogenesis of schizophrenia in a Chinese population. This research included 139 patients diagnosed with schizophrenia (case group) and 141 healthy volunteers (control group). Case and control samples were genotyped using denaturing high-performance liquid chromatography (DHPLC). Haplotypes of rs10848683, rs2238032, and rs2299661 were analyzed using the Shesis software. A mouse model of schizophrenia was established and assigned to test and blank groups. Western blotting was used to detect CACNA1C protein expression. The genotype and allele distribution of rs2238032 and rs2299661 differed between the case and control groups. TT genotype of rs2238032 and G allele of rs2299661 could potentially reduce the risk of schizophrenia. The distribution of rs2238032 genotype has a close connection with cognitive disturbance and the results of the general psychopathology classification exam. The distribution of rs2299661 genotypes was closely related to sensory and perceptual disorders, negative symptom subscales, and the results of the general psychopathology classification exam. CTC haplotype increased and CTG decreased the risk of schizophrenia in healthy people. In the brain tissues of mice with schizophrenia, the CACNA1C protein expression was higher in the test group than in the blank group. Our study demonstrated that CACNA1C gene polymorphisms and CACNA1C protein expression were associated with schizophrenia and its clinical phenotypes.

  8. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data.

    PubMed

    Lee, Hyeonjeong; Shin, Miyoung

    2017-01-01

    The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) shown each in case and control samples which can be observed from gene expression and/or methylation data. The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. 
Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
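
    The idea of pathway-sets that are often active together can be sketched as frequent-itemset counting, the first stage of typical association rule mining. This is a simplified illustration under that assumption, not the authors' procedure; the support threshold, function name, and pathway labels are hypothetical.

```python
from itertools import combinations

def frequent_pathway_sets(samples, min_support=0.6, max_size=2):
    """Return {pathway_set: support} for sets active together in enough samples.

    `samples` is a list of sets, each holding the pathways found active in one
    sample of a single class (case or control).
    """
    n = len(samples)
    pathways = sorted(set().union(*samples))
    result = {}
    for size in range(2, max_size + 1):
        for combo in combinations(pathways, size):
            support = sum(set(combo) <= s for s in samples) / n
            if support >= min_support:
                result[combo] = support
    return result

# Hypothetical active pathways in four case samples.
case_samples = [
    {"p1", "p2", "p3"},
    {"p1", "p2"},
    {"p2", "p3"},
    {"p1", "p2", "p4"},
]
case_sets = frequent_pathway_sets(case_samples)
```

    Running the same function on control samples and comparing the two results gives a rough picture of which associations are distinctive for each class.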

  9. The ESR1 and GPX1 gene expression level in human malignant and non-malignant breast tissues.

    PubMed

    Król, Magdalena B; Galicki, Michał; Grešner, Peter; Wieczorek, Edyta; Jabłońska, Ewa; Reszka, Edyta; Morawiec, Zbigniew; Wąsowicz, Wojciech; Gromadzińska, Jolanta

    2018-01-01

    The aim of this study was to establish whether the gene expression of estrogen receptor alpha (encoded by ESR1) correlates with the expression of glutathione peroxidase 1 (encoded by GPX1) in the tumor and adjacent tumor-free breast tissue, and whether this correlation is affected by breast cancer. Such relationships may give further insights into breast cancer pathology with respect to the status of estrogen receptor. We used the quantitative real-time PCR technique to analyze differences in the expression levels of the ESR1 and GPX1 genes in paired malignant and non-malignant tissues from breast cancer patients. ESR1 and GPX1 expression levels were found to be significantly down-regulated by 14.7% and 7.4% (respectively) in the tumorous breast tissue when compared to the non-malignant one. Down-regulation of these genes was independent of the tumor histopathology classification and clinicopathological factors, while the ESR1 mRNA level was reduced with increasing tumor grade (G1: 103% vs. G2: 85.8% vs. G3: 84.5%; p<0.05). In the non-malignant and malignant breast tissues, the expression levels of ESR1 and GPX1 were significantly correlated with each other (Rs=0.450 and Rs=0.360; respectively). Our data suggest that down-regulation of ESR1 and GPX1 was independent of clinicopathological factors. Down-regulation of ESR1 gene expression was enhanced by the development of the disease. Moreover, GPX1 and ESR1 gene expression was interdependent in the malignant breast tissue and further work is needed to determine the mechanism underlying this relationship.

  10. Functional dissection of drought-responsive gene expression patterns in Cynodon dactylon L.

    PubMed

    Kim, Changsoo; Lemke, Cornelia; Paterson, Andrew H

    2009-05-01

    Water deficit is one of the main abiotic factors that affect plant productivity in subtropical regions. To identify genes induced during the water stress response in Bermudagrass (Cynodon dactylon), cDNA macroarrays were used. The macroarray analysis identified 189 drought-responsive candidate genes from C. dactylon, of which 120 were up-regulated and 69 were down-regulated. The candidate genes were classified into seven groups by cluster analysis of expression levels across two intensities and three durations of imposed stress. Annotation using BLASTX suggested that up-regulated genes may be involved in proline biosynthesis, signal transduction pathways, protein repair systems, and removal of toxins, while down-regulated genes were mostly related to basic plant metabolism such as photosynthesis and glycolysis. The functional classification of gene ontology (GO) was consistent with the BLASTX results, also suggesting some crosstalk between abiotic and biotic stress. Comparative analysis of cis-regulatory elements from the candidate genes implicated specific elements in drought response in Bermudagrass. Although only a subset of genes was studied, Bermudagrass shared many drought-responsive genes and cis-regulatory elements with other botanical models, supporting a strategy of cross-taxon application of drought-responsive genes, regulatory cues, and physiological-genetic information.

  11. Prognostication in eye cancer: the latest tumor, node, metastasis classification and beyond

    PubMed Central

    Kivelä, T; Kujala, E

    2013-01-01

    The tumour, node, metastasis (TNM) classification is a universal cancer staging system, which has been used for five decades. The current seventh edition became effective in 2010 and covers six ophthalmic sites: eyelids, conjunctiva, uvea, retina, orbit, and lacrimal gland; and five cancer types: carcinoma, sarcoma, melanoma, retinoblastoma, and lymphoma. The TNM categories are based on the anatomic extent of the primary tumour (T), regional lymph node metastases (N), and systemic metastases (M). The T categories of ophthalmic cancers are based on the size of the primary tumour and any invasion of periocular structures. The anatomic category is used to determine the TNM stage that correlates with survival. Such staging is currently implemented only for carcinoma of the eyelid and melanoma of the uvea. The classification of ciliary body and choroidal melanoma is the only one based on clinical evidence so far: a database of 7369 patients analysed by the European Ophthalmic Oncology Group. It spans a prognosis from 96% 5-year survival for stage I to 97% 5-year mortality for stage IV. The most accurate criterion for prognostication in uveal melanoma is, however, analysis of chromosomal alterations and gene expression. When such data are available, the TNM stage may be used for further stratification. Prognosis in retinoblastoma is frequently assigned by using an international classification, which predicts conservation of the eye and vision, and an international staging separate from the TNM system, which predicts survival. The TNM cancer staging manual is a useful tool for all ophthalmologists managing eye cancer. PMID:23258307

  12. Nutritional and reproductive signaling revealed by comparative gene expression analysis in Chrysopa pallens (Rambur) at different nutritional statuses

    PubMed Central

    Han, Benfeng; Zhang, Shen; Zeng, Fanrong; Mao, Jianjun

    2017-01-01

    Background The green lacewing, Chrysopa pallens Rambur, is one of the most important natural predators because of its extensive spectrum of prey and wide distribution. However, little is known about the nutritional and reproductive physiology of this species. Results By cDNA amplification and Illumina short-read sequencing, we analyzed transcriptomes of C. pallens female adults under starved and fed conditions. In total, 71236 unigenes were obtained with an average length of 833 bp. Four vitellogenins, three insulin-like peptides and two insulin receptors were annotated. Comparison of gene expression profiles suggested that a total of 1501 genes were differentially expressed between the two nutritional statuses. KEGG orthology classification showed that these differentially expressed genes (DEGs) were mapped to 241 pathways. In turn, the top 4 are ribosome, protein processing in endoplasmic reticulum, biosynthesis of amino acids and carbon metabolism, indicating a distinct difference in nutritional and reproductive signaling between the two feeding conditions. Conclusions Our study yielded large-scale molecular information relevant to C. pallens nutritional and reproductive signaling, which will contribute to mass rearing and commercial use of this predaceous insect species. PMID:28683101

  13. Nutritional and reproductive signaling revealed by comparative gene expression analysis in Chrysopa pallens (Rambur) at different nutritional statuses.

    PubMed

    Han, Benfeng; Zhang, Shen; Zeng, Fanrong; Mao, Jianjun

    2017-01-01

    The green lacewing, Chrysopa pallens Rambur, is one of the most important natural predators because of its extensive spectrum of prey and wide distribution. However, little is known about the nutritional and reproductive physiology of this species. By cDNA amplification and Illumina short-read sequencing, we analyzed transcriptomes of C. pallens female adults under starved and fed conditions. In total, 71236 unigenes were obtained with an average length of 833 bp. Four vitellogenins, three insulin-like peptides and two insulin receptors were annotated. Comparison of gene expression profiles suggested that a total of 1501 genes were differentially expressed between the two nutritional statuses. KEGG orthology classification showed that these differentially expressed genes (DEGs) were mapped to 241 pathways. In turn, the top 4 are ribosome, protein processing in endoplasmic reticulum, biosynthesis of amino acids and carbon metabolism, indicating a distinct difference in nutritional and reproductive signaling between the two feeding conditions. Our study yielded large-scale molecular information relevant to C. pallens nutritional and reproductive signaling, which will contribute to mass rearing and commercial use of this predaceous insect species.

  14. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii)

    USDA-ARS's Scientific Manuscript database

    Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii),...

  15. Interpretable Early Classification of Multivariate Time Series

    ERIC Educational Resources Information Center

    Ghalwash, Mohamed F.

    2013-01-01

    Recent advances in technology have led to an explosion in data collection over time rather than in a single snapshot. For example, microarray technology allows us to measure gene expression levels in different conditions over time. Such temporal data grants the opportunity for data miners to develop algorithms to address domain-related problems,…

  16. A mathematical model of in vivo bovine blastocyst developmental to gestational Day 15.

    PubMed

    Shorten, P R; Donnison, M; McDonald, R M; Meier, S; Ledgard, A M; Berg, D

    2018-06-20

    Bovine embryo growth involves a complex interaction between the developing embryo and the growth-promoting potential of the uterine environment. We have previously established links between embryonic factors (embryo stage, embryo gene expression), maternal factors (progesterone, body condition score), and embryonic growth to 8 d after bulk transfer of Day 7 in vitro-produced blastocysts. In this study we recovered blastocysts on Days 7 and 15 after artificial insemination to test the hypothesis that in vivo and in vitro embryos follow a similar growth program. We conducted our study using 4 commercial farms and repeated our study over 2 yr (2014, 2015), with data available from 2 of the 4 farms in the second year. Morphological and gene expression measurements (196 candidate genes) of the Day 7 embryos were taken and the progesterone concentration of the cows was measured throughout the reproductive cycle as a reflection of the state of the uterine environment. These data were also used to assess the interaction between the uterine environment and the developing embryo and to examine how well Day 7 embryo stage can be predicted from the Day 7 gene expression profile. Progesterone was not a strong predictor of in vivo embryo growth to Day 15. This contrasts with a range of Day 7 embryo transfer studies which demonstrated that progesterone is a very good predictor of embryo growth to Day 15. Our analysis demonstrates that in vivo embryos are 3 times less sensitive to progesterone than in vitro-transferred embryos (up to Day 15). This highlights that caution must be applied when extrapolating the results of in vitro embryo transfer studies to the in vivo situation. The similar variance in measured and predicted (based on Day 15 length) Day 7 embryo stage indicates low stochastic perturbations for in vivo embryo growth (large stochastic growth effects would generate a significantly larger standard deviation in measured embryo length on Day 15).
We also identified that Day 7 embryo stage could be predicted based on the Day 7 gene expression profile (58% overall success rate for classification of 5 embryo stages). Our analysis also associated genes with each developmental stage and demonstrates the high level of temporal regulation of genes that occurs during early embryonic development. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  17. Expression of SLP-2 gene and CCBE1 are associated with prognosis of rectal cancer.

    PubMed

    Zhang, L; Liu, F-J

    2017-03-01

    This study aims to investigate the clinical significance of the SLP-2 gene for patients with rectal cancer and to analyze the effect of CCBE1 (Collagen and calcium-binding EGF domain-containing protein 1) on rectal cancer tissue and lymph vessels of para-carcinoma tissue. A total of 50 samples of rectal cancer tissues were enrolled in the experimental group, confirmed by pathological examination. 50 samples of para-carcinoma normal tissues were collected as control group. Protein expression of SLP-2 and CCBE1 was examined with immunohistochemical staining. mRNA expression of SLP-2 was examined with RT-PCR. Lymphatic vessel density (LVD) was evaluated with LYVE-1 immunohistochemical staining. Correlation analysis was performed to assess the relationship between patient survival data and clinical pathological features of rectal cancer. Immunohistochemical staining showed that, compared with the control group, the positive expression rate of SLP-2 in the experimental group was significantly higher (68.0% vs. 24.0%, p<0.05), and mRNA of SLP-2 was also significantly increased (p<0.05). Compared with the control group, protein expression of CCBE1 in the experimental group was significantly higher (p<0.05). Moreover, the expression level of SLP-2 was remarkably associated with TNM classification and lymphatic metastasis. Further analysis demonstrated that a positive expression of CCBE1 was associated with lymphatic metastasis, LVD and Dukes classification, and had a negative correlation with survival rate. Increased expression of SLP-2 promoted the formation of lymph vessels and exacerbated lymphatic metastasis of rectal cancer via up-regulating CCBE1. As a risk factor related to lymphatic metastasis, CCBE1 could be a novel biomarker for diagnosis and prognosis of rectal cancer.

  18. Regulatory Role of Circular RNAs and Neurological Disorders.

    PubMed

    Floris, Gabriele; Zhang, Longbin; Follesa, Paolo; Sun, Tao

    2017-09-01

    Circular RNAs (circRNAs) are a class of long noncoding RNAs that are characterized by the presence of covalently linked ends and have been found in all kingdoms of life. Exciting studies on the regulatory roles of circRNAs are emerging. Here, we summarize the classification, characteristics, biogenesis, and regulatory functions of circRNAs. CircRNAs are found to be preferentially expressed along neural genes and in neural tissues. We thus highlight the association of circRNA dysregulation with neurodegenerative diseases such as Alzheimer's disease. Investigation of the regulatory roles of circRNAs will shed new light on gene expression mechanisms during development and under disease conditions and may identify circRNAs as new biomarkers for aging and neurodegenerative disorders.

  19. Visual gene-network analysis reveals the cancer gene co-expression in human endometrial cancer

    PubMed Central

    2014-01-01

    Background Endometrial cancers (ECs) are the most common form of gynecologic malignancy. Recent studies have reported that ECs reveal distinct markers for molecular pathogenesis, which in turn is linked to the various histological types of ECs. To understand further the molecular events contributing to ECs and endometrial tumorigenesis in general, a more precise identification of cancer-associated molecules and signaling networks would be useful for the detection and monitoring of malignancy, improving clinical cancer therapy, and personalization of treatments. Results EC-specific gene co-expression networks were constructed by differential expression analysis and weighted gene co-expression network analysis (WGCNA). Important pathways and putative cancer hub genes contributing to the tumorigenesis of ECs were identified. An elastic-net regularized classification model was built using the cancer hub gene signatures to predict the phenotypic characteristics of ECs. The 19 cancer hub gene signatures had high predictive power to distinguish among three key principal features of ECs: grade, type, and stage. Intriguingly, these hub gene networks seem to contribute to EC progression and malignancy via cell-cycle regulation, antigen processing and the citric acid (TCA) cycle. Conclusions The results of this study provide a powerful biomarker discovery platform to better understand the progression of ECs and to uncover potential therapeutic targets in the treatment of ECs. This information might lead to improved monitoring and, in turn, improved treatment of ECs, the fourth most common cancer in women. PMID:24758163

  20. Lissencephaly: expanded imaging and clinical classification

    PubMed Central

    Di Donato, Nataliya; Chiari, Sara; Mirzaa, Ghayda M.; Aldinger, Kimberly; Parrini, Elena; Olds, Carissa; Barkovich, A. James; Guerrini, Renzo; Dobyns, William B.

    2017-01-01

    Lissencephaly (“smooth brain”, LIS) is a malformation of cortical development associated with deficient neuronal migration and abnormal formation of cerebral convolutions or gyri. The LIS spectrum includes agyria, pachygyria, and subcortical band heterotopia. Our first classification of LIS and subcortical band heterotopia (SBH) was developed to distinguish between the first two genetic causes of LIS – LIS1 (PAFAH1B1) and DCX. However, progress in molecular genetics has led to identification of 19 LIS-associated genes, leaving the existing classification system insufficient to distinguish the increasingly diverse patterns of LIS. To address this challenge, we reviewed clinical, imaging and molecular data on 188 patients with LIS-SBH ascertained during the last five years, and reviewed selected archival data on another ~1,400 patients. Using these data plus published reports, we constructed a new imaging based classification system with 21 recognizable patterns that reliably predict the most likely causative genes. These patterns do not correlate consistently with the clinical outcome, leading us to also develop a new scale useful for predicting clinical severity and outcome. Taken together, our work provides new tools that should prove useful for clinical management and genetic counselling of patients with LIS-SBH (imaging and severity based classifications), and guidance for prioritizing and interpreting genetic testing results (imaging based classification). PMID:28440899

  1. Genome-wide identification, classification and expression analysis in fungal-plant interactions of cutinase gene family and functional analysis of a putative ClCUT7 in Curvularia lunata.

    PubMed

    Liu, Tong; Hou, Jumei; Wang, Yuying; Jin, Yazhong; Borth, Wayne; Zhao, Fengzhou; Liu, Zheng; Hu, John; Zuo, Yuhu

    2016-06-01

    Cutinase is described as playing various roles in fungal-plant pathogen interactions, such as eliciting host-derived signals, fungal spore attachment and carbon acquisition during saprophytic growth. However, the characteristics of the cutinase genes, their expression in compatible interactions and their roles in pathogenesis have not been reported in Curvularia lunata, an important leaf spot pathogen of maize in China. Therefore, a cutinase gene family analysis could have profound significance. In this study, we identified 13 cutinase genes (ClCUT1 to ClCUT13) in the C. lunata genome. Multiple sequence alignment showed that most fungal cutinase proteins had one highly conserved GYSQG motif and a similar DxVCxG[ST]-[LIVMF](3)-x(3)H motif. Gene structure analyses of the cutinases revealed a complex intron-exon pattern with differences in the position and number of introns and exons. Based on phylogenetic relationship analysis, C. lunata cutinases and 78 known cutinase proteins from other fungi were classified into four groups with subgroups, but the C. lunata cutinases clustered in only three of the four groups. Motif analyses showed that each group of cutinases from C. lunata had a common motif. Real-time PCR indicated that transcript levels of the cutinase genes in a compatible interaction between pathogen and host had varied expression patterns. Interestingly, the transcript levels of ClCUT7 gradually increased during early pathogenesis with the most significant up-regulation at 3 h post-inoculation. When ClCUT7 was deleted, pathogenicity of the mutant decreased on unwounded maize (Zea mays) leaves. On wounded maize leaves, however, the mutant caused symptoms similar to the wild-type strain. Moreover, the ClCUT7 mutant had an approximately 10 % reduction in growth rate when cutin was the sole carbon source. In conclusion, we identified and characterized the cutinase family genes of C. lunata, analyzed their expression patterns in a compatible host-pathogen interaction, and explored the role of ClCUT7 in pathogenicity. This work will increase our understanding of cutinase genes in other fungal-plant pathogens.

  2. Genomewide identification and expression analysis of the ARF gene family in apple.

    PubMed

    Luo, Xiao-Cui; Sun, Mei-Hong; Xu, Rui-Rui; Shu, Huai-Rui; Wang, Jia-Wei; Zhang, Shi-Zhong

    2014-12-01

    Auxin response factors (ARF) are transcription factors that regulate auxin responses in plants. Although the genomewide analysis of this family has been performed in some species, little is known regarding ARF genes in apple (Malus domestica). In this study, 31 putative apple ARF genes have been identified and located within the apple genome. The phylogenetic analysis revealed that MdARFs could be divided into three subfamilies (groups I, II and III). The predicted MdARFs were distributed across 15 of 17 chromosomes with different densities. In addition, the analysis of exon-intron junctions and of the intron phase inside the predicted coding region of each candidate gene has revealed high levels of conservation within and between phylogenetic groups. Expression profile analyses of MdARF genes were performed in different tissues (root, stem, leaf, flower and fruit), and all the selected genes were expressed in at least one of the tissues that were tested, which indicated that MdARFs are involved in various aspects of physiological and developmental processes of apple. To our knowledge, this report is the first to provide a genomewide analysis of the apple ARF gene family. This study provides valuable information for understanding the classification and putative functions of the ARF signal in apple.

  3. Robust nuclear lamina-based cell classification of aging and senescent cells

    PubMed Central

    Righolt, Christiaan H.; van 't Hoff, Merel L.R.; Vermolen, Bart J.; Young, Ian T.; Raz, Vered

    2011-01-01

    Changes in the shape of the nuclear lamina are exhibited in senescent cells, as well as in cells expressing mutations in lamina genes. To identify cells with defects in the nuclear lamina we developed an imaging method that quantifies the intensity and curvature of the nuclear lamina. We show that this method accurately describes changes in the nuclear lamina. Spatial changes in the nuclear lamina coincide with redistribution of lamin A proteins and local reduction in protein mobility in senescent cells. We suggest that local accumulation of lamin A in the nuclear envelope leads to bending of the structure. A quantitative distinction of the nuclear lamina shape in cell populations was found between fresh and senescent cells, and between primary myoblasts from young and old donors. Moreover, with this method cells with mutations in lamina genes were significantly distinct from cells with wild-type genes. We suggest that this method can be applied to identify abnormal cells during aging, in in vitro propagation, and in lamina disorders. PMID:22199022

  4. Robust nuclear lamina-based cell classification of aging and senescent cells.

    PubMed

    Righolt, Christiaan H; van 't Hoff, Merel L R; Vermolen, Bart J; Young, Ian T; Raz, Vered

    2011-12-01

    Changes in the shape of the nuclear lamina are exhibited in senescent cells, as well as in cells expressing mutations in lamina genes. To identify cells with defects in the nuclear lamina we developed an imaging method that quantifies the intensity and curvature of the nuclear lamina. We show that this method accurately describes changes in the nuclear lamina. Spatial changes in the nuclear lamina coincide with redistribution of lamin A proteins and local reduction in protein mobility in senescent cells. We suggest that local accumulation of lamin A in the nuclear envelope leads to bending of the structure. A quantitative distinction of the nuclear lamina shape in cell populations was found between fresh and senescent cells, and between primary myoblasts from young and old donors. Moreover, with this method cells with mutations in lamina genes were significantly distinct from cells with wild-type genes. We suggest that this method can be applied to identify abnormal cells during aging, in in vitro propagation, and in lamina disorders.

  5. Drug discovery using very large numbers of patents. General strategy with extensive use of match and edit operations

    NASA Astrophysics Data System (ADS)

    Robson, Barry; Li, Jin; Dettinger, Richard; Peters, Amanda; Boyer, Stephen K.

    2011-05-01

    A patent data base of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes, and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by "natural selection" as failure of partial match against the patent data base and their ability to bind to the protein target appropriately, by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs in high throughput, we note that chemical similarity and novelty are human concepts that largely have meaning by utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.
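    The match-and-edit idea described above, regular expressions applied to SMILES strings, can be sketched in a few lines. This is only an illustration and not the authors' workflow: the pattern, the function names and the choice of aspirin as input are all assumptions made for the example, and a text pattern is at best a rough filter because the same chemical group can be written in several ways in SMILES.

```python
import re

# Hypothetical "match" rule: a carboxylic acid group written as C(=O)O at the
# very end of a SMILES string. Other spellings of the same group are missed.
ACID_AT_END = re.compile(r"C\(=O\)O$")


def has_terminal_acid(smiles: str) -> bool:
    """Return True if the SMILES text ends with the acid pattern."""
    return ACID_AT_END.search(smiles) is not None


def acid_to_methyl_ester(smiles: str) -> str:
    """Hypothetical "edit" rule: rewrite the matched acid as a methyl ester."""
    return ACID_AT_END.sub("C(=O)OC", smiles)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"  # example input chosen for illustration
edited = acid_to_methyl_ester(aspirin)
print(has_terminal_acid(aspirin), edited)
```

    Strings that do not match are returned unchanged, so an edit rule of this kind can be applied blindly across a large collection and the changed entries picked out afterwards.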

  6. Functional Assessment of the Role of BORIS in Ovarian Cancer Using a Novel in Vivo Model System

    DTIC Science & Technology

    2015-12-01

    iv) we obtained founder BORIS-Tg mice and crossed into the FVB/N strain to fully characterize the transgenic gene configuration, v) we conducted...models, transgenic mice 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME OF RESPONSIBLE PERSON USAMRMC a...genes, and wildtype p53 is a negative regulator of BORIS expression. To test these hypotheses, we will develop and utilize a murine transgenic model

  7. Candidate gene database and transcript map for peach, a model species for fruit trees.

    PubMed

    Horn, Renate; Lecouls, Anne-Claire; Callahan, Ann; Dandekar, Abhaya; Garay, Lilibeth; McCord, Per; Howad, Werner; Chan, Helen; Verde, Ignazio; Main, Doreen; Jung, Sook; Georgi, Laura; Forrest, Sam; Mook, Jennifer; Zhebentyayeva, Tatyana; Yu, Yeisoo; Kim, Hye Ran; Jesudurai, Christopher; Sosinski, Bryon; Arús, Pere; Baird, Vance; Parfitt, Dan; Reighard, Gregory; Scorza, Ralph; Tomkins, Jeffrey; Wing, Rod; Abbott, Albert Glenn

    2005-05-01

    Peach (Prunus persica) is a model species for the Rosaceae, which includes a number of economically important fruit tree species. To develop an extensive Prunus expressed sequence tag (EST) database for identifying and cloning the genes important to fruit and tree development, we generated 9,984 high-quality ESTs from a peach cDNA library of developing fruit mesocarp. After assembly and annotation, a putative peach unigene set consisting of 3,842 ESTs was defined. Gene ontology (GO) classification was assigned based on the annotation of the single "best hit" match against the Swiss-Prot database. No significant homology could be found in the GenBank nr databases for 24.3% of the sequences. Using core markers from the general Prunus genetic map, we anchored bacterial artificial chromosome (BAC) clones on the genetic map, thereby providing a framework for the construction of a physical and transcript map. A transcript map was developed by hybridizing 1,236 ESTs from the putative peach unigene set and an additional 68 peach cDNA clones against the peach BAC library. Hybridizing ESTs to genetically anchored BACs immediately localized 11.2% of the ESTs on the genetic map. ESTs showed a clustering of expressed genes in defined regions of the linkage groups. [The data were built into a regularly updated Genome Database for Rosaceae (GDR), available at (http://www.genome.clemson.edu/gdr/).].

  8. Transcription factor AtTCP14 regulates embryonic growth potential during seed germination in Arabidopsis thaliana.

    PubMed

    Tatematsu, Kiyoshi; Nakabayashi, Kazumi; Kamiya, Yuji; Nambara, Eiji

    2008-01-01

    To understand the molecular mechanisms underlying regulation of seed germination, we searched for enriched cis elements in the upstream regions of Arabidopsis genes whose transcript levels increased during seed germination. Using available published microarray data, we found that two cis elements, Up1 and Up2, which regulate outgrowth of Arabidopsis axillary shoots, were significantly over-represented. Classification of Up1- and Up2-containing genes by gene ontology revealed that protein synthesis-related genes, especially ribosomal protein genes, were highly over-represented. Expression analysis using a reporter gene driven by a synthetic promoter regulated by these elements showed that Up1 is necessary and sufficient for germination-associated gene induction, whereas Up2 acts as an enhancer of Up1. Up1-mediated gene expression was suppressed by treatments that blocked germination. Up1 is almost identical to the site II motif, which is the predicted target of TCP transcription factors. Of 24 AtTCP genes, AtTCP14, which showed the highest transcript level just prior to germination, was functionally characterized to test its involvement in the regulation of seed germination. Transposon-tagged lines for AtTCP14 showed delayed germination. In addition, germination of attcp14 mutants exhibited hypersensitivity to exogenously applied abscisic acid and paclobutrazol, an inhibitor of gibberellin biosynthesis. AtTCP14 was predominantly expressed in the vascular tissues of the embryo, and affected gene expression in radicles in a non-cell-autonomous manner. Taken together, these results indicate that AtTCP14 regulates the activation of embryonic growth potential in Arabidopsis seeds.

  9. Derivation of an artificial gene to improve classification accuracy upon gene selection.

    PubMed

    Seo, Minseok; Oh, Sejong

    2012-02-01

    Classification analysis has been developed continuously since 1936. This research field has advanced through the development of classifiers such as KNN, ANN, and SVM, as well as through advances in data preprocessing. Feature (gene) selection is required for very high dimensional data such as microarray data before classification. The goal of feature selection is to choose a subset of informative features that reduces processing time and provides higher classification accuracy. In this study, we devised a method of artificial gene making (AGM) for microarray data to improve classification accuracy. Our artificial gene was derived from a whole microarray dataset and combined with the result of gene selection for classification analysis. We experimentally confirmed a clear improvement in classification accuracy after inserting the artificial gene. Our artificial gene worked well for popular feature (gene) selection algorithms and classifiers. The proposed approach can be applied to any type of high dimensional dataset. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Identification and Analysis of Mitogen-Activated Protein Kinase (MAPK) Cascades in Fragaria vesca.

    PubMed

    Zhou, Heying; Ren, Suyue; Han, Yuanfang; Zhang, Qing; Qin, Ling; Xing, Yu

    2017-08-13

    Mitogen-activated protein kinase (MAPK) cascades are highly conserved signaling modules in eukaryotes, including yeasts, plants and animals. MAPK cascades are responsible for protein phosphorylation during signal transduction events, and typically consist of three protein kinases: MAPK, MAPK kinase, and MAPK kinase kinase. In the current study, we identified a total of 12 FvMAPK, 7 FvMAPKK, 73 FvMAPKKK, and one FvMAPKKKK genes in the recently published Fragaria vesca genome sequence. We report the classification, annotation and phylogenetic evaluation of these genes, together with an assessment of conserved motifs and expression profiling of members of the gene family. The expression profiles of the MAPK and MAPKK genes in different organs and fruit developmental stages were further investigated using quantitative real-time reverse transcription PCR (qRT-PCR). Finally, the MAPK and MAPKK expression patterns in response to hormone and abiotic stresses (salt, drought, and high and low temperature) were investigated in fruit and leaves of F. vesca. The results provide a platform for further characterization of the physiological and biochemical functions of MAPK cascades in strawberry.

  11. Identification of Differentially Expressed Genes Associated with Apple Fruit Ripening and Softening by Suppression Subtractive Hybridization

    PubMed Central

    Zhang, Zongying; Jiang, Shenghui; Wang, Nan; Li, Min; Ji, Xiaohao; Sun, Shasha; Liu, Jingxuan; Wang, Deyun; Xu, Haifeng; Qi, Sumin; Wu, Shujing; Fei, Zhangjun; Feng, Shouqian; Chen, Xuesen

    2015-01-01

    Apple is one of the most economically important horticultural fruit crops worldwide. It is critical to gain insights into fruit ripening and softening to improve apple fruit quality and extend shelf life. In this study, forward and reverse suppression subtractive hybridization libraries were generated from ‘Taishanzaoxia’ apple fruits sampled around the ethylene climacteric to isolate ripening- and softening-related genes. A set of 648 unigenes were derived from sequence alignment and cluster assembly of 918 expressed sequence tags. According to gene ontology functional classification, 390 out of 443 unigenes (88%) were assigned to the biological process category, 356 unigenes (80%) were classified in the molecular function category, and 381 unigenes (86%) were allocated to the cellular component category. A total of 26 unigenes differentially expressed during fruit development period were analyzed by quantitative RT-PCR. These genes were involved in cell wall modification, anthocyanin biosynthesis, aroma production, stress response, metabolism, transcription, or were non-annotated. Some genes associated with cell wall modification, anthocyanin biosynthesis and aroma production were up-regulated and significantly correlated with ethylene production, suggesting that fruit texture, coloration and aroma may be regulated by ethylene in ‘Taishanzaoxia’. Some of the identified unigenes associated with fruit ripening and softening have not been characterized in public databases. The results contribute to an improved characterization of changes in gene expression during apple fruit ripening and softening. PMID:26719904
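    The gene ontology percentages quoted above can be rechecked from the counts given in the abstract, taking the 443 unigenes as the denominator. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# Counts as stated in the abstract: 443 unigenes, each of which may fall into
# more than one gene ontology category.
annotated = 443
counts = {
    "biological process": 390,
    "molecular function": 356,
    "cellular component": 381,
}

# Share of the 443 unigenes in each category, rounded to a whole percent.
percentages = {name: round(100 * n / annotated) for name, n in counts.items()}
print(percentages)
```

    The rounded values agree with the 88%, 80% and 86% reported in the abstract. They sum to more than 100% because one unigene can be assigned to several categories.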

  12. Gene expression variations during Drosophila metamorphosis in space: The GENE experiment in the Spanish cervantes missions to the ISS

    NASA Astrophysics Data System (ADS)

    Herranz, Raul; Benguria, Alberto; Medina, Javier; Gasset, Gilbert; van Loon, Jack J.; Zaballos, Angel; Marco, Roberto

    2005-08-01

    ISS Expedition 8, a Soyuz mission, flew to the International Space Station (ISS) in October 2003 to replace the two-member ISS crew. The Spanish Cervantes Scientific Mission took place during this crew-exchange flight. Several biological experiments were performed during the mission, three of them proposed by our team. The third member of the expedition, the Spanish-born ESA astronaut Pedro Duque, returned in the Soyuz 7 capsule carrying the transport box containing the experiment after almost 11 days in microgravity. In one of the three experiments, the GENE experiment, we intended to determine how microgravity affects the gene expression pattern of Drosophila using one of the most powerful current technologies, a complete Drosophila melanogaster genome microarray (Affymetrix™, version 1.0). Owing to the constraints on current ISS experiments, we decided to limit our experiment to the organism-rebuilding processes that occur during Drosophila metamorphosis. In addition to the ISS samples, several control experiments were performed, including a 1g ground control parallel to the ISS flight samples, a Random Position Machine simulated-microgravity control, and a parallel hypergravity (10g) experiment. RNA extracted from the samples was used to test for differences in gene expression during Drosophila development. A preliminary analysis of the results indicates that around five hundred genes change their expression profiles, many of them belonging to particular ontology classification groups.

  13. Analyzing gene expression time-courses based on multi-resolution shape mixture model.

    PubMed

    Li, Ying; He, Ye; Zhang, Yu

    2016-11-01

    Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biological processes and disease. The analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. The multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of gene expression change over time at different resolutions. Our proposed algorithm is a probabilistic framework that offers a more natural and robust way of clustering time-course gene expression. We assessed its performance on yeast time-course gene expression profiles in comparison with several popular clustering methods for gene expression profiles. The gene groups identified by the different methods were evaluated by enrichment analysis of biological pathways and of known protein-protein interactions from experimental evidence. The gene groups identified by our proposed algorithm have stronger biological significance. In summary, a novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. Our model provides a new perspective and an alternative tool for the visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
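    To make "wavelet decomposition at different resolutions" concrete, the sketch below splits a short time course into an overall mean plus detail coefficients at successively coarser scales. It rests on assumptions the abstract does not state: the Haar wavelet is used because it is the simplest, the eight expression values are invented for illustration, and neither the fractal features nor the mixture model of the paper is implemented.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (trend) and half-differences (detail)."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_decompose(signal):
    """Return (overall mean, detail coefficients ordered fine to coarse).

    The length of the input must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = list(signal)
    levels = []
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)
    return current[0], levels


# Invented expression values for one gene at eight time points.
profile = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 4.0, 2.0]
mean, levels = haar_decompose(profile)
print(mean, levels)
```

    The first list in `levels` captures changes between neighbouring time points and the last captures the difference between the two halves of the series, so two profiles can be compared at each resolution separately.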

  14. Statistical inference for remote sensing-based estimates of net deforestation

    Treesearch

    Ronald E. McRoberts; Brian F. Walters

    2012-01-01

    Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...

  15. Decision trees for the analysis of genes involved in Alzheimer's disease pathology.

    PubMed

    Mestizo Gutiérrez, Sonia L; Herrera Rivero, Marisol; Cruz Ramírez, Nicandro; Hernández, Elena; Aranda-Abreu, Gonzalo E

    2014-09-21

    Alzheimer's disease (AD) is characterized by a gradual loss of memory, orientation, judgement and language. There is still no cure for this disorder. AD pathogenesis remains largely unknown and its underlying molecular mechanisms are not yet fully understood. Several studies have shown that the abnormal accumulation of beta-amyloid and tau proteins occurs 10 to 20 years before the onset of symptoms of the disease, so it is extremely important to identify changes in the brain before the first symptoms. We used decision trees to classify 31 individuals (9 healthy controls and 22 AD patients in three different stages of disease) according to the expression of 69 genes previously reported in a meta-analysis, plus the expression levels of APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. We also included in our analysis the MMSE (Mini-Mental State Examination) scores and the number of NFT (neurofibrillary tangles). The results allowed us to generate a model of classification values for different stages of AD severity, according to MMSE scores, and to identify the expression level of the protein tau that may determine the onset (incipient stage) of AD. We used decision trees to model the different stages of AD (severe, moderate, incipient and control) based on the meta-analysis of gene expression levels plus MMSE and NFT scores. Both classifiers reported the variable MMSE as the most informative; however, we found that the protein tau also plays an important role in the onset of AD. Copyright © 2014 Elsevier Ltd. All rights reserved.
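    A decision tree is built by repeatedly choosing the variable and threshold that best separate the classes. The sketch below shows a single such split on one variable, scored by Gini impurity. It is a minimal illustration under stated assumptions, not the study's model: the abstract does not say which tree algorithm or split criterion was used, and the scores and labels are invented toy values rather than data from the 31 individuals.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Invented MMSE-like scores and labels, for illustration only.
scores = [29, 28, 27, 22, 20, 15, 12, 9]
labels = ["control", "control", "control", "AD", "AD", "AD", "AD", "AD"]
threshold, impurity = best_split(scores, labels)
print(threshold, impurity)
```

    A full tree applies the same search recursively to each resulting group and across all variables, which is how one variable can emerge as the most informative at the root while another matters further down.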

  16. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  17. Pathogenic Germline Variants in 10,389 Adult Cancers.

    PubMed

    Huang, Kuan-Lin; Mashl, R Jay; Wu, Yige; Ritter, Deborah I; Wang, Jiayin; Oh, Clara; Paczkowska, Marta; Reynolds, Sheila; Wyczalkowski, Matthew A; Oak, Ninad; Scott, Adam D; Krassowski, Michal; Cherniack, Andrew D; Houlahan, Kathleen E; Jayasinghe, Reyka; Wang, Liang-Bo; Zhou, Daniel Cui; Liu, Di; Cao, Song; Kim, Young Won; Koire, Amanda; McMichael, Joshua F; Hucthagowder, Vishwanathan; Kim, Tae-Beom; Hahn, Abigail; Wang, Chen; McLellan, Michael D; Al-Mulla, Fahd; Johnson, Kimberly J; Lichtarge, Olivier; Boutros, Paul C; Raphael, Benjamin; Lazar, Alexander J; Zhang, Wei; Wendl, Michael C; Govindan, Ramaswamy; Jain, Sanjay; Wheeler, David; Kulkarni, Shashikant; Dipersio, John F; Reimand, Jüri; Meric-Bernstam, Funda; Chen, Ken; Shmulevich, Ilya; Plon, Sharon E; Chen, Feng; Ding, Li

    2018-04-05

    We conducted the largest investigation of predisposition variants in cancer to date, discovering 853 pathogenic or likely pathogenic variants in 8% of 10,389 cases from 33 cancer types. Twenty-one genes showed single or cross-cancer associations, including novel associations of SDHA in melanoma and PALB2 in stomach adenocarcinoma. The 659 predisposition variants and 18 additional large deletions in tumor suppressors, including ATM, BRCA1, and NF1, showed low gene expression and frequent (43%) loss of heterozygosity or biallelic two-hit events. We also discovered 33 such variants in oncogenes, including missenses in MET, RET, and PTPN11 associated with high gene expression. We nominated 47 additional predisposition variants from prioritized VUSs supported by multiple evidences involving case-control frequency, loss of heterozygosity, expression effect, and co-localization with mutations and modified residues. Our integrative approach links rare predisposition variants to functional consequences, informing future guidelines of variant classification and germline genetic testing in cancer. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Common and Distant Structural Characteristics of Feruloyl Esterase Families from Aspergillus oryzae

    PubMed Central

    Udatha, D. B. R. K. Gupta; Mapelli, Valeria; Panagiotou, Gianni; Olsson, Lisbeth

    2012-01-01

    Background Feruloyl esterases (FAEs) are important biomass degrading accessory enzymes due to their capability of cleaving the ester links between hemicellulose and pectin to aromatic compounds of lignin, thus enhancing the accessibility of plant tissues to cellulolytic and hemicellulolytic enzymes. FAEs have gained increased attention in the area of biocatalytic transformations for the synthesis of value added compounds with medicinal and nutritional applications. Following the increasing attention on these enzymes, a novel descriptor based classification system has been proposed for FAEs resulting in 12 distinct families, and pharmacophore models for three FAE sub-families have been developed. Methodology/Principal Findings The feruloylome of Aspergillus oryzae contains 13 predicted FAEs belonging to six sub-families based on our recently developed descriptor-based classification system. The three-dimensional structures of the 13 FAEs were modeled for structural analysis of the feruloylome. The three genes coding for three enzymes, viz., A.O.2, A.O.8 and A.O.10 from the feruloylome of A. oryzae, representing sub-families with unknown functional features, were heterologously expressed in Pichia pastoris, and the enzymes were characterized for substrate specificity and, structurally, through CD spectroscopy. Common feature-based pharmacophore models were developed according to the substrate specificity characteristics of the three enzymes. The active site residues were identified for the three expressed FAEs by determining the titration curves of amino acid residues as a function of pH by applying molecular simulations. Conclusions/Significance Our findings on the structure-function relationships and substrate specificity of the FAEs of A. oryzae will be instrumental for further understanding of the FAE families in the novel classification system. The developed pharmacophore models could be applied for virtual screening of compound databases for short listing the putative substrates prior to docking studies or for post-processing docking results to remove false positives. Our study exemplifies how computational predictions can complement the information obtained through experimental methods. PMID:22745763

  19. Common and distant structural characteristics of feruloyl esterase families from Aspergillus oryzae.

    PubMed

    Udatha, D B R K Gupta; Mapelli, Valeria; Panagiotou, Gianni; Olsson, Lisbeth

    2012-01-01

    Feruloyl esterases (FAEs) are important biomass-degrading accessory enzymes due to their capability of cleaving the ester links between hemicellulose and pectin to aromatic compounds of lignin, thus enhancing the accessibility of plant tissues to cellulolytic and hemicellulolytic enzymes. FAEs have gained increased attention in the area of biocatalytic transformations for the synthesis of value-added compounds with medicinal and nutritional applications. Following the increasing attention on these enzymes, a novel descriptor-based classification system has been proposed for FAEs, resulting in 12 distinct families, and pharmacophore models for three FAE sub-families have been developed. The feruloylome of Aspergillus oryzae contains 13 predicted FAEs belonging to six sub-families based on our recently developed descriptor-based classification system. The three-dimensional structures of the 13 FAEs were modeled for structural analysis of the feruloylome. The three genes coding for three enzymes, viz., A.O.2, A.O.8 and A.O.10 from the feruloylome of A. oryzae, representing sub-families with unknown functional features, were heterologously expressed in Pichia pastoris and characterized for substrate specificity and, structurally, by CD spectroscopy. Common feature-based pharmacophore models were developed according to the substrate specificity characteristics of the three enzymes. The active site residues were identified for the three expressed FAEs by determining the titration curves of amino acid residues as a function of pH using molecular simulations. Our findings on the structure-function relationships and substrate specificity of the FAEs of A. oryzae will be instrumental for further understanding of the FAE families in the novel classification system. The developed pharmacophore models could be applied for virtual screening of compound databases to shortlist putative substrates prior to docking studies, or for post-processing docking results to remove false positives. Our study exemplifies how computational predictions can complement the information obtained through experimental methods.

  20. Di-codon Usage for Gene Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Minh N.; Ma, Jianmin; Fogel, Gary B.; Rajapakse, Jagath C.

    Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near-perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences, which are classified into two major classes, HLA-I and HLA-II. Major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes, and a multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification and for sub-class classification of HLA-I and HLA-II genes, respectively.
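
    The codon and di-codon usage features described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the features were computed, so everything here is an assumption: a di-codon is taken to be a pair of adjacent in-frame codons, pairs are counted with a one-codon sliding step, and usage is expressed as relative frequency. The function names (split_codons, usage_vector, codon_features, dicodon_features) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # 64 codons
DICODONS = [a + b for a in CODONS for b in CODONS]       # 4096 codon pairs


def split_codons(seq):
    """Split an in-frame coding sequence into codons.

    Any trailing partial codon is dropped (an assumption of this sketch).
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def usage_vector(tokens, vocabulary):
    """Relative frequency of each vocabulary item among the tokens.

    Tokens outside the vocabulary (e.g. codons containing N) are ignored.
    """
    allowed = set(vocabulary)
    counts = Counter(t for t in tokens if t in allowed)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[v] / total for v in vocabulary]


def codon_features(seq):
    """64-dimensional codon usage vector."""
    return usage_vector(split_codons(seq), CODONS)


def dicodon_features(seq):
    """4096-dimensional usage vector over adjacent codon pairs."""
    codons = split_codons(seq)
    pairs = [codons[i] + codons[i + 1] for i in range(len(codons) - 1)]
    return usage_vector(pairs, DICODONS)


if __name__ == "__main__":
    example = "ATGGCCATGGCC"  # toy sequence, not an HLA gene
    combined = codon_features(example) + dicodon_features(example)
    print(len(combined))  # 64 + 4096 features
```

    Concatenating the two vectors, as in the usage example, gives one fixed-length feature vector per gene that could be passed to any SVM implementation. Whether the authors combined the features this way is not stated in the abstract.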
