RUAN, XIYUN; LI, HONGYUN; LIU, BO; CHEN, JIE; ZHANG, SHIBAO; SUN, ZEQIANG; LIU, SHUANGQING; SUN, FAHAI; LIU, QINGYONG
2015-01-01
The aim of the present study was to develop a novel method for identifying pathways associated with renal cell carcinoma (RCC) based on a gene co-expression network. A framework was established where a co-expression network was derived from the database as well as various co-expression approaches. First, the backbone of the network based on differentially expressed (DE) genes between RCC patients and normal controls was constructed by the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database. The differentially co-expressed links were detected by Pearson’s correlation, the empirical Bayesian (EB) approach and Weighted Gene Co-expression Network Analysis (WGCNA). The co-expressed gene pairs were merged by a rank-based algorithm. We obtained 842, 371, 2,883 and 1,595 co-expressed gene pairs from the co-expression networks of the STRING database, Pearson’s correlation, the EB method and WGCNA, respectively. Two hundred and eighty-one differentially co-expressed (DC) gene pairs were obtained from the merged network using this novel method. Pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the network enrichment analysis (NEA) method were performed to verify the feasibility of the merged method. Results of the KEGG and NEA pathway analyses showed that the network was associated with RCC. The suggested method was computationally efficient to identify pathways associated with RCC and has been identified as a useful complement to traditional co-expression analysis. PMID:26058425
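The Pearson step of the abstract above can be illustrated with a minimal sketch. It only thresholds the absolute correlation between gene pairs; the toy gene names, the expression values and the 0.8 cutoff are assumptions for illustration and are not taken from the study, which additionally uses STRING, the EB approach, WGCNA and a rank-based merge.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed_pairs(expr, threshold=0.8):
    """Return (gene1, gene2, r) for pairs with |r| >= threshold.

    expr maps a gene name to its expression values across samples.
    The threshold is a hypothetical cutoff, not one from the paper.
    """
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Toy expression matrix (hypothetical values, four samples per gene).
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
pairs = coexpressed_pairs(toy, threshold=0.8)
```

With these toy values only the GENE_A/GENE_B pair passes the cutoff. A differential co-expression analysis would compute such correlations separately in patients and controls and compare them.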
Automated Video Based Facial Expression Analysis of Neuropsychiatric Disorders
Wang, Peng; Barrett, Frederick; Martin, Elizabeth; Milanova, Marina; Gur, Raquel E.; Gur, Ruben C.; Kohler, Christian; Verma, Ragini
2008-01-01
Deficits in emotional expression are prominent in several neuropsychiatric disorders, including schizophrenia. Available clinical facial expression evaluations provide subjective and qualitative measurements, which are based on static 2D images that do not capture the temporal dynamics and subtleties of expression changes. Therefore, there is a need for automated, objective and quantitative measurements of facial expressions captured using videos. This paper presents a computational framework that creates probabilistic expression profiles for video data and can potentially help to automatically quantify emotional expression differences between patients with neuropsychiatric disorders and healthy controls. Our method automatically detects and tracks facial landmarks in videos, and then extracts geometric features to characterize facial expression changes. To analyze temporal facial expression changes, we employ probabilistic classifiers that analyze facial expressions in individual frames, and then propagate the probabilities throughout the video to capture the temporal characteristics of facial expressions. The applications of our method to healthy controls and case studies of patients with schizophrenia and Asperger’s syndrome demonstrate the capability of the video-based expression analysis method in capturing subtleties of facial expression. Such results can pave the way for a video based method for quantitative analysis of facial expressions in clinical research of disorders that cause affective deficits. PMID:18045693
2016-01-01
Microarray gene expression data sets are jointly analyzed to increase statistical power. They could either be merged together or analyzed by meta-analysis. For a given ensemble of data sets, it cannot be foreseen which of these paradigms, merging or meta-analysis, works better. In this article, three joint analysis methods, Z-score normalization, ComBat and the inverse normal method (meta-analysis) were selected for survival prognosis and risk assessment of breast cancer patients. The methods were applied to eight microarray gene expression data sets, totaling 1324 patients with two clinical endpoints, overall survival and relapse-free survival. The performance derived from the joint analysis methods was evaluated using Cox regression for survival analysis and independent validation used as bias estimation. Overall, Z-score normalization had a better performance than ComBat and meta-analysis. Higher Area Under the Receiver Operating Characteristic curve and hazard ratio were also obtained when independent validation was used as bias estimation. With a lower time and memory complexity, Z-score normalization is a simple method for joint analysis of microarray gene expression data sets. The derived findings suggest further assessment of this method in future survival prediction and cancer classification applications. PMID:26504096
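Two of the joint analysis paradigms named above can be sketched in a few lines. This is an illustration under assumptions, not the study's pipeline: `merge_by_zscore` standardizes one gene within each data set before pooling, and `inverse_normal` combines one-sided p-values in the usual inverse normal (Stouffer) form. Function names, equal default weights and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def merge_by_zscore(datasets):
    """Z-score one gene's values within each data set, then pool them."""
    merged = []
    for values in datasets:
        merged.extend(zscore(values))
    return merged


def inverse_normal(p_values, weights=None):
    """Combine one-sided p-values; each p must lie strictly between 0 and 1."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights are an assumption
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, p_values))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)


# Two toy data sets measuring the same gene on very different scales.
merged = merge_by_zscore([[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]])
combined_p = inverse_normal([0.05, 0.05])
```

After standardization the two toy data sets contribute identical values, which is the point of merging on the Z-score scale. Two moderately small p-values combine to a smaller one under the inverse normal method.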
A Self-Directed Method for Cell-Type Identification and Separation of Gene Expression Microarrays
Zuckerman, Neta S.; Noam, Yair; Goldsmith, Andrea J.; Lee, Peter P.
2013-01-01
Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures - these are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures and proportions per sample without need for a priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type specific gene expression using existing large pools of publicly available microarray datasets. PMID:23990767
Methods for Genome-Wide Analysis of Gene Expression Changes in Polyploids
Wang, Jianlin; Lee, Jinsuk J.; Tian, Lu; Lee, Hyeon-Se; Chen, Meng; Rao, Sheetal; Wei, Edward N.; Doerge, R. W.; Comai, Luca; Jeffrey Chen, Z.
2007-01-01
Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)–complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts. PMID:15865985
Rode, Tone Mari; Berget, Ingunn; Langsrud, Solveig; Møretrø, Trond; Holck, Askild
2009-07-01
Microorganisms are constantly exposed to new and altered growth conditions, and respond by changing gene expression patterns. Several methods for studying gene expression exist. During the last decade, the analysis of microarrays has been one of the most common approaches applied for large scale gene expression studies. A relatively new method for gene expression analysis is MassARRAY, which combines real competitive-PCR and MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. In contrast to microarray methods, MassARRAY technology is suitable for analysing a larger number of samples, though for a smaller set of genes. In this study we compare the results from MassARRAY with microarrays on gene expression responses of Staphylococcus aureus exposed to acid stress at pH 4.5. RNA isolated from the same stress experiments was analysed using both the MassARRAY and the microarray methods. The MassARRAY and microarray methods showed good correlation. Both MassARRAY and microarray estimated somewhat lower fold changes compared with quantitative real-time PCR (qRT-PCR). The results confirmed the up-regulation of the urease genes in acidic environments, and also indicated the importance of metal ion regulation. This study shows that the MassARRAY technology is suitable for gene expression analysis in prokaryotes, and has advantages when a set of genes is being analysed for an organism exposed to many different environmental conditions.
Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun
2014-06-01
Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expressions at multiple time points. However, there are several technical difficulties for analyzing such whole genome expression data. In addition, these days gene expression data is often measured by using RNA-sequencing rather than microarray technologies and then analysis of expression data is much more complicated since the analysis process should start with mapping short reads and produce differentially activated pathways and also possibly interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable for the RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimated statistical values for the combined dataset by an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods on a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress. The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.
Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.
Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara
2017-01-01
The sequencing of the transcriptomes of single-cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.
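Precision and recall, the two criteria used above to compare tools, can be computed from a tool's list of called genes when the truly differentially expressed genes are known, as in a synthetic dataset. The sketch below is generic; the gene identifiers are hypothetical and no values from the study are reproduced.

```python
def precision_recall(called, truth):
    """Precision and recall of a set of called DE genes against known truth."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Convention assumed here: an empty set yields 0.0 rather than an error.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical output of one tool and the simulated ground truth.
called_genes = {"g1", "g2", "g3", "g4"}
true_genes = {"g1", "g2", "g5"}
precision, recall = precision_recall(called_genes, true_genes)
```

Here two of the four calls are correct (precision 0.5) and two of the three true genes are recovered (recall about 0.67).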
Soul, Jamie; Hardingham, Timothy E; Boot-Handford, Raymond P; Schwartz, Jean-Marc
2015-01-29
We describe a new method, PhenomeExpress, for the analysis of transcriptomic datasets to identify pathogenic disease mechanisms. Our analysis method includes input from both protein-protein interaction and phenotype similarity networks. This introduces valuable information from disease relevant phenotypes, which aids the identification of sub-networks that are significantly enriched in differentially expressed genes and are related to the disease relevant phenotypes. This contrasts with many active sub-network detection methods, which rely solely on protein-protein interaction networks derived from compounded data of many unrelated biological conditions and which are therefore not specific to the context of the experiment. PhenomeExpress thus exploits readily available animal model and human disease phenotype information. It combines this prior evidence of disease phenotypes with the experimentally derived disease data sets to provide a more targeted analysis. Two case studies, in subchondral bone in osteoarthritis and in Pax5 in acute lymphoblastic leukaemia, demonstrate that PhenomeExpress identifies core disease pathways in both mouse and human disease expression datasets derived from different technologies. We also validate the approach by comparison to state-of-the-art active sub-network detection methods, which reveals how it may enhance the detection of molecular phenotypes and provide a more detailed context to those previously identified as possible candidates.
Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi
2016-01-01
Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Wang, Ya-Xuan; Gao, Ying-Lian; Liu, Jin-Xing; Kong, Xiang-Zhen; Li, Hai-Jun
2017-09-01
Identifying differentially expressed genes from the thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method in the identification of differentially expressed genes. RPCA method uses nuclear norm to approximate the rank function. However, theoretical studies showed that the nuclear norm minimizes all singular values, so it may not be the best solution to approximate the rank function. The truncated nuclear norm is defined as the sum of some smaller singular values, which may achieve a better approximation of the rank function than nuclear norm. In this paper, a novel method is proposed by replacing nuclear norm of RPCA with the truncated nuclear norm, which is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Thus, the differentially expressed genes can be identified according to the sparse matrix. The experimental results on The Cancer Genome Atlas data illustrate that the TRPCA method outperforms other state-of-the-art methods in the identification of differentially expressed genes.
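The norms contrasted in the abstract above can be written out explicitly. The notation is an assumption chosen for illustration (X is the m-by-n observation matrix, sigma_i its singular values in decreasing order, r the truncation parameter, lambda a trade-off weight); the paper's exact formulation and solver may differ.

```latex
% Nuclear norm: sum of all singular values (used by standard RPCA).
\|X\|_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(X)

% Truncated nuclear norm: sum of the \min(m,n) - r smallest singular values.
\|X\|_{r} = \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)

% Low-rank plus sparse decomposition with the truncated norm in place of \|L\|_{*}.
\min_{L,\,S} \; \|L\|_{r} + \lambda \|S\|_{1}
\quad \text{subject to} \quad X = L + S
```

Leaving the r largest singular values unpenalized is what lets the truncated norm follow the rank function more closely; differentially expressed genes are then read from the large entries of the sparse matrix S.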
Multiplex cDNA quantification method that facilitates the standardization of gene expression data
Gotoh, Osamu; Murakami, Yasufumi; Suyama, Akira
2011-01-01
Microarray-based gene expression measurement is one of the major methods for transcriptome analysis. However, current microarray data are substantially affected by microarray platforms and RNA references because the microarray method can provide only the relative amounts of gene expression levels. Therefore, valid comparisons of the microarray data require standardized platforms, internal and/or external controls and complicated normalizations. These requirements impose limitations on the extensive comparison of gene expression data. Here, we report an effective approach to removing the unfavorable limitations by measuring the absolute amounts of gene expression levels on common DNA microarrays. We have developed a multiplex cDNA quantification method called GEP-DEAN (Gene expression profiling by DCN-encoding-based analysis). The method was validated by using chemically synthesized DNA strands of known quantities and cDNA samples prepared from mouse liver, demonstrating that the absolute amounts of cDNA strands were successfully measured with a sensitivity of 18 zmol in a highly multiplexed manner in 7 h. PMID:21415008
Brodsky, Leonid; Leontovich, Andrei; Shtutman, Michael; Feinstein, Elena
2004-01-01
Mathematical methods of analysis of microarray hybridizations deal with gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response, but rather stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identification of such artifactual profiles. Our diagnostics are based on detection of deviations from uniformity, which is assumed as the main underlying principle of microarray design. Method 1 is based on detection of non-uniformity of microarray distribution of printed genes that are clustered based on the similarity of their expression profiles. Method 2 is based on evaluation of the presence of gene-specific microarray spots within the slides’ areas characterized by an abnormal concentration of low/high differential expression values, which we define as ‘patterns of differentials’. Applying two novel algorithms, for nested clustering (method 1) and for pattern detection (method 2), we can make a dual estimation of the profile’s quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of patterns that cover them, thus diminishing their input in any further data analysis. PMID:14999086
Similarity of markers identified from cancer gene expression studies: observations from GEO.
Shi, Xingjie; Shen, Shihao; Liu, Jin; Huang, Jian; Zhou, Yong; Ma, Shuangge
2014-09-01
Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Wang, Tianyu; Nabavi, Sheida
2018-04-24
Differential gene expression analysis is one of the significant efforts in single cell RNA sequencing (scRNAseq) analysis to discover the specific changes in expression levels of individual cell types. Since scRNAseq exhibits multimodality, large amounts of zero counts, and sparsity, it is different from the traditional bulk RNA sequencing (RNAseq) data. The new challenges of scRNAseq data promote the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance, to precisely and efficiently identify DE genes in scRNAseq data. The regression model and data imputation are used to reduce the impact of large amounts of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce the false positives of calling DE genes. We used simulated datasets and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with those of other differential expression analysis methods. Results indicate that the proposed method has an overall powerful performance in terms of precision in detection, sensitivity, and specificity. Copyright © 2018 Elsevier Inc. All rights reserved.
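The nonparametric ingredient named above, the Earth Mover's Distance, has a simple form for one-dimensional samples: the area between the two empirical cumulative distribution functions. The sketch below shows only that distance on toy values; it is not the SigEMD implementation, which also involves imputation, a logistic regression model and network-based adjustment. The function name and inputs are hypothetical.

```python
from bisect import bisect_right


def emd_1d(u, v):
    """Earth Mover's Distance between two non-empty 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical cumulative distribution functions.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on the interval [left, right).
        f_u = bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


# Toy zero-inflated expression values for one gene in two cell groups.
group_a = [0.0, 0.0, 0.0, 4.0]
group_b = [0.0, 0.0, 2.0, 2.0]
distance = emd_1d(group_a, group_b)
```

Because the distance compares whole distributions, it can respond to a shift in one mode of a bimodal gene even when the group means are the same, as in this toy case where both means equal 1.0.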
Optimal consistency in microRNA expression analysis using reference-gene-based normalization.
Wang, Xi; Gardiner, Erin J; Cairns, Murray J
2015-05-01
Normalization of high-throughput molecular expression profiles secures differential expression analysis between samples of different phenotypes or biological conditions, and facilitates comparison between experimental batches. While the same general principles apply to microRNA (miRNA) normalization, there is mounting evidence that global shifts in their expression patterns occur in specific circumstances, which pose a challenge for normalizing miRNA expression data. As an alternative to global normalization, which has the propensity to flatten large trends, normalization against constitutively expressed reference genes presents an advantage through their relative independence. Here we investigated the performance of reference-gene-based (RGB) normalization for differential miRNA expression analysis of microarray expression data, and compared the results with other normalization methods, including: quantile, variance stabilization, robust spline, simple scaling, rank invariant, and Loess regression. The comparative analyses were executed using miRNA expression in tissue samples derived from subjects with schizophrenia and non-psychiatric controls. We proposed a consistency criterion for evaluating methods by examining the overlapping of differentially expressed miRNAs detected using different partitions of the whole data. Based on this criterion, we found that RGB normalization generally outperformed global normalization methods. Thus we recommend the application of RGB normalization for miRNA expression data sets, and believe that this will yield a more consistent and useful readout of differentially expressed miRNAs, particularly in biological conditions characterized by large shifts in miRNA expression.
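The two ideas in the abstract above, normalizing against reference genes and scoring a method by the overlap of its calls across data partitions, can be sketched as follows. Both functions are illustrative assumptions: the normalization subtracts the mean log2 signal of the reference genes, and the overlap is scored as a Jaccard index, which may not match the authors' exact criterion. All names and values are hypothetical.

```python
from statistics import mean


def rgb_normalize(sample, reference_genes):
    """Subtract the mean log2 signal of the reference genes from every feature.

    sample maps a feature name to its log2 expression value.
    """
    offset = mean(sample[g] for g in reference_genes)
    return {name: value - offset for name, value in sample.items()}


def overlap_consistency(de_a, de_b):
    """Jaccard overlap of DE calls from two partitions (assumed criterion)."""
    a, b = set(de_a), set(de_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two toy samples that differ only by a global technical offset of 2.0.
refs = ["REF1", "REF2"]
sample_1 = {"miR-1": 10.0, "miR-2": 8.0, "REF1": 6.0, "REF2": 4.0}
sample_2 = {name: value + 2.0 for name, value in sample_1.items()}
norm_1 = rgb_normalize(sample_1, refs)
norm_2 = rgb_normalize(sample_2, refs)
consistency = overlap_consistency({"miR-1", "miR-2", "miR-3"},
                                  {"miR-2", "miR-3", "miR-4"})
```

The offset shared by all features in the second sample is removed because it also shifts the reference genes, whereas a genuine shift confined to the miRNAs would be preserved.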
Tissue Non-Specific Genes and Pathways Associated with Diabetes: An Expression Meta-Analysis.
Mei, Hao; Li, Lianna; Liu, Shijian; Jiang, Fan; Griswold, Michael; Mosley, Thomas
2017-01-21
We performed expression studies to identify tissue non-specific genes and pathways of diabetes by meta-analysis. We searched curated datasets of the Gene Expression Omnibus (GEO) database and identified 13 and five expression studies of diabetes and insulin responses at various tissues, respectively. We tested differential gene expression by empirical Bayes-based linear method and investigated gene set expression association by knowledge-based enrichment analysis. Meta-analysis by different methods was applied to identify tissue non-specific genes and gene sets. We also proposed pathway mapping analysis to infer functions of the identified gene sets, and correlation and independent analysis to evaluate expression association profile of genes and gene sets between studies and tissues. Our analysis showed that PGRMC1 and HADH genes were significant over diabetes studies, while IRS1 and MPST genes were significant over insulin response studies, and joint analysis showed that HADH and MPST genes were significant over all combined data sets. The pathway analysis identified six significant gene sets over all studies. The KEGG pathway mapping indicated that the significant gene sets are related to diabetes pathogenesis. The results also presented that 12.8% and 59.0% pairwise studies had significantly correlated expression association for genes and gene sets, respectively; moreover, 12.8% pairwise studies had independent expression association for genes, but no studies were observed significantly different for expression association of gene sets. Our analysis indicated that there are both tissue specific and non-specific genes and pathways associated with diabetes pathogenesis. Compared to the gene expression, pathway association tends to be tissue non-specific, and a common pathway influencing diabetes development is activated through different genes at different tissues.
Time-Course Gene Set Analysis for Longitudinal Gene Expression Data
Hejblum, Boris P.; Skinner, Jason; Thiébaut, Rodolphe
2015-01-01
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets finally found to be linked with the influenza vaccine too although they were found to be associated to the pneumococcal vaccine only in previous analyses. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package. PMID:26111374
Analysis of Facial Expression by Taste Stimulation
NASA Astrophysics Data System (ADS)
Tobitani, Kensuke; Kato, Kunihito; Yamamoto, Kazuhiko
In this study, we focused on the basic taste stimulation for the analysis of real facial expressions. We considered that the expressions caused by taste stimulation were unaffected by individuality or emotion, that is, such expressions were involuntary. We analyzed the movement of facial muscles by taste stimulation and compared real expressions with artificial expressions. From the result, we identified an obvious difference between real and artificial expressions. Thus, our method would be a new approach for facial expression recognition.
Yang, Ze-Hui; Zheng, Rui; Gao, Yuan; Zhang, Qiang
2016-09-01
With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies. We identified the suitable differentially expressed (DE) genes that contributed to lung adenocarcinoma (ADC) clustering based on seven popular multiple meta-analysis methods. Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. Bioconductor was used to perform the preliminary data preprocessing. Then, DE genes across multiple studies were identified. Hierarchical clustering was applied to compare the classification performance for microarray data samples. The classification efficiency was compared based on accuracy, sensitivity and specificity. Across seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes remained for further analysis. The classification efficiency analysis showed that DE genes identified by the sum of ranks method separated ADC from normal controls with the best accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01). Evaluation of DE genes identified by different meta-analysis methods in classification efficiency provided a new perspective to the choice of the suitable method in a given application. Varying meta-analysis methods always present varying abilities, so synthetic consideration should be taken when providing meta-analysis methods for particular research. © 2015 John Wiley & Sons Ltd.
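A minimal sketch of the sum of ranks idea and of the three classification measures mentioned above follows. It is a generic illustration, not the study's code: genes are ranked by p-value within each study, ties are broken by gene name (an assumption), and the confusion-matrix counts in the example are hypothetical rather than derived from the reported results.

```python
def sum_of_ranks(pvalues_by_study):
    """Sum each gene's within-study rank (1 = smallest p-value) across studies.

    pvalues_by_study is a list of dicts mapping gene name to p-value.
    Only genes present in every study are ranked.
    """
    genes = set.intersection(*(set(study) for study in pvalues_by_study))
    totals = {gene: 0 for gene in genes}
    for study in pvalues_by_study:
        ordered = sorted(genes, key=lambda g: (study[g], g))
        for rank, gene in enumerate(ordered, start=1):
            totals[gene] += rank
    return totals


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical p-values for three genes in two studies.
studies = [
    {"A": 0.01, "B": 0.20, "C": 0.50},
    {"A": 0.03, "B": 0.60, "C": 0.10},
]
rank_sums = sum_of_ranks(studies)
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

Genes with the smallest rank sums are the most consistently significant across studies; in the toy input gene A ranks first in both studies and therefore has the lowest total.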
Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments
Maza, Elie; Frasse, Pierre; Senin, Pavel; Bouzayen, Mondher; Zouine, Mohamed
2013-01-01
In recent years, RNA-Seq technologies have become a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one that aims to remove an inherent bias of the studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Analyses of real and simulated data sets give similar results, showing 3 different groups of procedures with the same behavior. The group including the novel method, named “Median Ratio Normalization” (MRN), gives the lowest number of false discoveries. Within this group, the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from the relative size of the studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods. PMID:26442135
puma: a Bioconductor package for propagating uncertainty in microarray analysis.
Pearson, Richard D; Liu, Xuejun; Sanguinetti, Guido; Milo, Marta; Lawrence, Neil D; Rattray, Magnus
2009-07-09
Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied. puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. 
puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and that their clustering accuracy is higher than that of other related clustering algorithms.
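The abstract does not give the kernel's exact form or parameters, so the following is only a minimal sketch of the standard Gaussian (RBF) kernel and the feature-space distance it induces, which kernel clustering methods commonly rely on. The function names and the bandwidth `sigma` are assumptions for illustration, not the authors' implementation.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Standard Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_distance_sq(x, y, sigma=1.0):
    """Squared distance in the kernel-induced feature space.

    For any kernel: k(x, x) - 2 k(x, y) + k(y, y).
    For the Gaussian kernel k(x, x) = 1, so this reduces to 2 - 2 k(x, y).
    """
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


# Hypothetical expression profiles (three conditions each).
profile_a = [1.0, 2.0, 3.0]
profile_b = [1.1, 2.1, 2.9]
profile_c = [5.0, 0.0, 9.0]
near = kernel_distance_sq(profile_a, profile_b)
far = kernel_distance_sq(profile_a, profile_c)
```

Similar profiles give a kernel distance close to 0, while very different profiles approach the maximum value of 2.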
Dynamic association rules for gene expression data analysis.
Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung
2015-10-14
The purpose of gene expression analysis is to look for associations between the regulation of gene expression levels and phenotypic variations. Such associations, based on gene expression profiles, have been used to determine whether the induction or repression of genes corresponds to phenotypic variations, with applications in cell regulation, clinical diagnosis and drug development. Statistical analyses of microarray data have been developed to address the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical approach, based on constructing a one-sided confidence interval and hypothesis testing, to determine whether an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of leukemia patients was conducted. We developed a statistical approach, based on the concept of a confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in a single step. The DAR algorithm was then developed for gene expression data analysis. 
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
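The support and confidence that the abstract refers to are the standard quantities from association rule mining. The sketch below shows only those textbook definitions on a toy data set; it does not reproduce the DAR algorithm or its confidence-interval-based thresholds, and the item names and samples are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical samples: each set lists discretized gene states and a phenotype.
samples = [
    {"geneA_up", "disease"},
    {"geneA_up", "disease"},
    {"geneA_up"},
    {"geneB_up", "disease"},
    {"geneB_up"},
]
rule_support = support(samples, {"geneA_up", "disease"})
rule_confidence = confidence(samples, {"geneA_up"}, {"disease"})
```

Here the rule "geneA_up implies disease" has support 2/5 and confidence 2/3; a rule would be kept only if both values exceed the chosen minimum thresholds.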
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
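For context, the silhouette measure named above as a comparison baseline can be sketched as follows. This is the standard definition applied to one-dimensional points, not the authors' scoring method; the function names and data are illustrative assumptions.

```python
def mean_distance(point, members):
    """Mean absolute distance from `point` to each value in `members`."""
    return sum(abs(point - q) for q in members) / len(members)


def silhouette_scores(points, labels):
    """Silhouette value s = (b - a) / max(a, b) for each 1-D point.

    a: mean distance to the other members of the point's own cluster,
    b: smallest mean distance to the members of any other cluster.
    Requires at least two distinct cluster labels.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [q for j, q in enumerate(points) if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_distance(p, same)
        b = min(
            mean_distance(p, [q for q, lab in zip(points, labels) if lab == other])
            for other in clusters
            if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical expression values forming two obvious groups.
values = [0.0, 0.1, 5.0, 5.1]
good = silhouette_scores(values, [0, 0, 1, 1])  # sensible clustering
bad = silhouette_scores(values, [0, 1, 0, 1])   # groups mixed together
```

Values near 1 indicate well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.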
ERIC Educational Resources Information Center
Fisher, Evelyn L.
2017-01-01
Purpose: The purpose of this study was to explore the literature on predictors of outcomes among late talkers using systematic review and meta-analysis methods. We sought to answer the question: What factors predict preschool-age expressive-language outcomes among late-talking toddlers? Method: We entered carefully selected search terms into the…
Instrumentation for noninvasive express-diagnostics bacteriophages and viruses by optical method
NASA Astrophysics Data System (ADS)
Moguilnaia, Tatiana A.; Andreev, Gleb I.; Agibalov, Andrey A.; Botikov, Andrey G.; Kosenkov, Evgeniy; Saguitova, Elena
2004-03-01
Theoretical and experimental studies of the spectra of scattered radiation in media containing viruses were carried out. Luminescence spectra were recorded for 31 viruses. A new method for the express analysis of viruses in the human body was developed. The proposed method of express diagnostics allows detection of an infectious agent in the body several hours after infection. This makes it suitable for highly efficient testing in blood services to detect and reject potential donors infected with viruses such as hepatitis, herpes, Epstein-Barr, cytomegalovirus, and immunodeficiency viruses. Serum diagnostic methods used for that purpose can detect antibodies to a virus only 1-3 months after the person has been infected. A device for the express analysis of 31 human viruses was created.
Moon, Myungjin; Nakai, Kenta
2018-04-01
Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
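The Box-Cox transformation used for normalization above has a simple closed form. The sketch below shows only that standard formula; the function name, the sample values and the choice of the parameter `lam` are assumptions, since the abstract does not state how the parameter was selected.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0.

    The transform is defined only for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical right-skewed expression values.
raw = [1.0, 4.0, 100.0]
sqrt_like = box_cox(raw, 0.5)  # lam = 0.5 behaves like a square-root transform
log_like = box_cox(raw, 0)     # lam = 0 is the natural logarithm
```

In practice the parameter is usually chosen per data set, for example by maximum likelihood, so that the transformed values are closer to a normal distribution.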
Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.
Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M
2017-02-27
RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment. © The Author 2017. Published by Oxford University Press. All rights reserved.
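As a toy illustration of how a violated assumption distorts results, consider total-count (library-size) scaling, which implicitly assumes that total RNA output is the same in every sample. The sketch below is not taken from the article; the gene names and counts are hypothetical, and the counts are treated as proportional to true transcript abundance.

```python
def total_count_normalize(counts):
    """Scale raw counts to counts per million of the sample's total."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


# Hypothetical samples: only g4 truly changes (7-fold up); g1..g3 are unchanged.
control = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
treated = {"g1": 100, "g2": 100, "g3": 100, "g4": 700}

cpm_control = total_count_normalize(control)
cpm_treated = total_count_normalize(treated)

# Apparent fold changes after normalization.
fold_g1 = cpm_treated["g1"] / cpm_control["g1"]  # unchanged gene looks repressed
fold_g4 = cpm_treated["g4"] / cpm_control["g4"]  # true change is underestimated
```

Because the treated sample's total is inflated by g4, the unchanged gene g1 appears to drop to 0.4 of its control level and the 7-fold change in g4 appears as 2.8-fold, which is the kind of false positive the abstract warns about.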
Chen, Xiao-Min; Feng, Ming-Jun; Shen, Cai-Jie; He, Bin; Du, Xian-Feng; Yu, Yi-Bo; Liu, Jing; Chu, Hui-Min
2017-07-01
The present study was designed to develop a novel method for identifying significant pathways associated with human hypertrophic cardiomyopathy (HCM), based on gene co‑expression analysis. The microarray dataset associated with HCM (E‑GEOD‑36961) was obtained from the European Molecular Biology Laboratory‑European Bioinformatics Institute database. Informative pathways were selected based on the Reactome pathway database and screening treatments. An empirical Bayes method was utilized to construct co‑expression networks for informative pathways, and a weight value was assigned to each pathway. Differential pathways were extracted based on weight threshold, which was calculated using a random model. In order to assess whether the co‑expression method was feasible, it was compared with traditional pathway enrichment analysis of differentially expressed genes, which were identified using the significance analysis of microarrays package. A total of 1,074 informative pathways were screened out for subsequent investigations and their weight values were also obtained. According to the threshold of weight value of 0.01057, 447 differential pathways, including folding of actin by chaperonin containing T‑complex protein 1 (CCT)/T‑complex protein 1 ring complex (TRiC), purine ribonucleoside monophosphate biosynthesis and ubiquinol biosynthesis, were obtained. Compared with traditional pathway enrichment analysis, the number of pathways obtained from the co‑expression approach was increased. The results of the present study demonstrated that this method may be useful to predict marker pathways for HCM. The pathways of folding of actin by CCT/TRiC and purine ribonucleoside monophosphate biosynthesis may provide evidence of the underlying molecular mechanisms of HCM, and offer novel therapeutic directions for HCM.
Adamski, Mateusz G; Gumann, Patryk; Baird, Alison E
2014-01-01
Over the past decade rapid advances have occurred in the understanding of RNA expression and its regulation. Quantitative polymerase chain reactions (qPCR) have become the gold standard for quantifying gene expression. Microfluidic next generation, high throughput qPCR now permits the detection of transcript copy number in thousands of reactions simultaneously, dramatically increasing the sensitivity over standard qPCR. Here we present a gene expression analysis method applicable to both standard polymerase chain reactions (qPCR) and high throughput qPCR. This technique is adjusted to the input sample quantity (e.g., the number of cells) and is independent of control gene expression. It is efficiency-corrected and with the use of a universal reference sample (commercial complementary DNA (cDNA)) permits the normalization of results between different batches and between different instruments--regardless of potential differences in transcript amplification efficiency. Modifications of the input quantity method include (1) the achievement of absolute quantification and (2) a non-efficiency corrected analysis. When compared to other commonly used algorithms the input quantity method proved to be valid. This method is of particular value for clinical studies of whole blood and circulating leukocytes where cell counts are readily available.
Kakati, Tulika; Kashyap, Hirak; Bhattacharyya, Dhruba K.
2016-01-01
There exist many tools and methods for the construction of co-expression networks from gene expression data and for the extraction of densely connected gene modules. In this paper, a method is introduced to construct a co-expression network and to extract co-expressed modules having high biological significance. The proposed method has been validated on several well-known microarray datasets extracted from a diverse set of species, using statistical measures such as p and q values. The modules obtained in these studies are found to be biologically significant based on Gene Ontology enrichment analysis, pathway analysis, and KEGG enrichment analysis. Further, the method was applied to an Alzheimer’s disease dataset, and some interesting genes were found that have high semantic similarity among them but are not significantly correlated in terms of expression similarity. Some of these interesting genes, such as MAPT, CASP2, and PSEN2, are linked with important aspects of Alzheimer’s disease, such as dementia, increased cell death, and deposition of amyloid-beta proteins in Alzheimer’s disease brains. The biological pathways associated with Alzheimer’s disease, such as Wnt signaling, apoptosis, p53 signaling, and Notch signaling, incorporate these interesting genes. The proposed method is evaluated with regard to the existing literature. PMID:27901073
2013-01-01
Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first identifying gene-gene relationships under the studied phenotype and then integrating them with gene expression changes to prioritize signature genes, or vice versa. A method is warranted that can simultaneously consider gene-gene co-expression strength and the corresponding expression level changes so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that the top differentially regulated genes identified by the rank sum test in different sets are not consistent, while the top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma. 
Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432
Kao, Chi H.J.; Bishop, Karen S.; Xu, Yuanye; Han, Dug Yeo; Murray, Pamela M.; Marlow, Gareth J.; Ferguson, Lynnette R.
2016-01-01
Ganoderma lucidum (lingzhi) has been used for the general promotion of health in Asia for many centuries. The common method of consumption is to boil lingzhi in water and then drink the liquid. In this study, we examined the potential anticancer activities of G. lucidum submerged in two commonly consumed forms of alcohol in East Asia: malt whiskey and rice wine. The anticancer effect of G. lucidum, using whiskey and rice wine-based extraction methods, has not been previously reported. The growth inhibition of G. lucidum whiskey and rice wine extracts on the prostate cancer cell lines, PC3 and DU145, was determined. Using Affymetrix gene expression assays, several biologically active pathways associated with the anticancer activities of G. lucidum extracts were identified. Using gene expression analysis (real-time polymerase chain reaction [RT-PCR]) and protein analysis (Western blotting), we confirmed the expression of key genes and their associated proteins that were initially identified with Affymetrix gene expression analysis. PMID:27006591
Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria
2003-01-01
Background Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Results Expression of sex-chromosome genes was used as a true internal biological control to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. Conclusion In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting for multiple statistical tests of differentially expressed genes. 
Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects. PMID:12962547
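The contrast drawn above between false discovery rate control and the Bonferroni method can be illustrated with a short sketch. The abstract does not name the FDR procedure that was used, so the Benjamini-Hochberg step-up procedure is assumed here; the function names and p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a hypothesis when p <= alpha / m (controls family-wise error)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Find the largest rank k with p_(k) <= k / m * alpha and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five probesets.
pvals = [0.001, 0.012, 0.025, 0.035, 0.6]
strict = bonferroni(pvals)           # only the smallest p-value survives
lenient = benjamini_hochberg(pvals)  # the four smallest p-values survive
```

On this toy input Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects four, which shows why FDR control tends to produce fewer false negatives when many genes are tested.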
Weniger, Markus; Engelmann, Julia C; Schultz, Jörg
2007-01-01
Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived from microarray experiments are high-dimensional and often noisy, and interpretation of the results can become intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data in a genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data; afterwards, data annotation is performed. After import, GEPAT offers various statistical data analysis methods, such as hierarchical, k-means and PCA clustering, a linear model based t-test, or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included to give various information for each probe on the chip. GEPAT does not impose a linear workflow, but allows any subset of probes and samples to be used as the starting point for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . 
Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125
Global Gene Expression Analysis of Yeast Cells during Sake Brewing
Wu, Hong; Zheng, Xiaohong; Araki, Yoshio; Sahara, Hiroshi; Takagi, Hiroshi; Shimoi, Hitoshi
2006-01-01
During the brewing of Japanese sake, Saccharomyces cerevisiae cells produce a high concentration of ethanol compared with other ethanol fermentation methods. We analyzed the gene expression profiles of yeast cells during sake brewing using DNA microarray analysis. This analysis revealed some characteristics of yeast gene expression during sake brewing and provided a scaffold for a molecular level understanding of the sake brewing process. PMID:16997994
Getting the most out of RNA-seq data analysis.
Khang, Tsung Fei; Lau, Ching Yee
2015-01-01
Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets (one representing strong, and another mild, biological effect size), we simulated different replicate size scenarios, and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods'. Conclusion. 
When biological effect size is weak, systems level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.
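The abstract above compares methods by positive predictive value (PPV) and sensitivity. As a minimal sketch of how these two quantities are computed from a list of called genes and a reference list of truly differentially expressed genes, the following Python snippet may help; the function names and the toy gene identifiers are hypothetical and are not taken from the study.

```python
def positive_predictive_value(called, truly_de):
    """Fraction of called genes that are truly differentially expressed."""
    called = set(called)
    if not called:
        return float("nan")  # PPV is undefined when no gene is called
    return len(called & set(truly_de)) / len(called)


def sensitivity(called, truly_de):
    """Fraction of truly differentially expressed genes that were called."""
    truly_de = set(truly_de)
    if not truly_de:
        return float("nan")  # undefined when there is nothing to detect
    return len(set(called) & truly_de) / len(truly_de)


# Hypothetical toy identifiers, for illustration only.
called_genes = ["g1", "g2", "g3", "g4"]
reference_genes = ["g1", "g2", "g5", "g6", "g7", "g8"]

print(positive_predictive_value(called_genes, reference_genes))  # 0.5
print(sensitivity(called_genes, reference_genes))
```

A method with high PPV but low sensitivity returns a short list that validates well, which is the trade-off the abstract describes for targeted qPCR validation versus systems level analysis.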
Xie, Xin-Ping; Xie, Yu-Feng; Wang, Hong-Qiang
2017-08-23
Large-scale accumulation of omics data poses a pressing challenge of integrative analysis of multiple data sets in bioinformatics. An open question of such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Study heterogeneity needs to be addressed carefully for this goal. This paper proposes a regulation probability model-based meta-analysis, jGRP, for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulatory space instead of in a gene expression space, which makes it easy to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a united gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating their occurring probabilities in a sample. Finally, a novel differential expression statistic is established based on the gene regulation profiles, realizing accurate and flexible identification of DEGs in gene regulation space. We evaluated the proposed method on simulation data and real-world cancer datasets and showed the effectiveness and efficiency of jGRP in identifying DEGs in the context of meta-analysis. Data heterogeneity largely influences the performance of meta-analysis for DEG identification. Existing meta-analysis methods were found to exhibit very different degrees of sensitivity to study heterogeneity. The proposed method, jGRP, can be a standalone tool due to its united framework and controllable way to deal with study heterogeneity.
USDA-ARS's Scientific Manuscript database
Large-scale, gene expression methods allow for high throughput analysis of physiological pathways at a fraction of the cost of individual gene expression analysis. Systems, such as the Fluidigm quantitative PCR array described here, can provide powerful assessments of the effects of diet, environme...
Isoform-level gene expression patterns in single-cell RNA-sequencing data.
Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias
2018-02-27
RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16,562 isoform-pairs from 4,929 genes. Among those, 26% of the discovered patterns were significant (p<0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. The effect of drop-out events, mean expression level, and properties of the expression distribution on the performance of ISOP was also investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Contact: mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.
Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol; Scalcinati, Gionata; Fagerberg, Linn; Uhlén, Matthias; Nielsen, Jens
2012-01-01
RNA-seq has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on the S288c genome and the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq), and we explore the consistency between RNA-seq analysis using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The results from differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data. PMID:22965124
2013-01-01
Background Analysis of global gene expression by DNA microarrays is widely used in experimental molecular biology. However, the complexity of such high-dimensional data sets makes it difficult to fully understand the underlying biological features present in the data. The aim of this study is to introduce a method for DNA microarray analysis that provides an intuitive interpretation of data through dimension reduction and pattern recognition. We present the first “Archetypal Analysis” of global gene expression. The analysis is based on microarray data from five integrated studies of Pseudomonas aeruginosa isolated from the airways of cystic fibrosis patients. Results Our analysis clustered samples into distinct groups with comprehensible characteristics since the archetypes representing the individual groups are closely related to samples present in the data set. Significant changes in gene expression between different groups identified adaptive changes of the bacteria residing in the cystic fibrosis lung. The analysis suggests a similar gene expression pattern between isolates with a high mutation rate (hypermutators) despite accumulation of different mutations for these isolates. This suggests positive selection in the cystic fibrosis lung environment, and changes in gene expression for these isolates are therefore most likely related to adaptation of the bacteria. Conclusions Archetypal analysis succeeded in identifying adaptive changes of P. aeruginosa. The combination of clustering and matrix factorization made it possible to reveal minor similarities among different groups of data, which other analytical methods failed to identify. We suggest that this analysis could be used to supplement current methods used to analyze DNA microarray data. PMID:24059747
NASA Technical Reports Server (NTRS)
Nebenfuhr, A.; Lomax, T. L.
1998-01-01
We have developed an improved method for determination of gene expression levels with RT-PCR. The procedure is rapid and does not require extensive optimization or densitometric analysis. Since the detection of individual transcripts is PCR-based, small amounts of tissue samples are sufficient for the analysis of expression patterns in large gene families. Using this method, we were able to rapidly screen nine members of the Aux/IAA family of auxin-responsive genes and identify those genes which vary in message abundance in a tissue- and light-specific manner. While not offering the accuracy of conventional semi-quantitative or competitive RT-PCR, our method allows quick screening of large numbers of genes in a wide range of RNA samples with just a thermal cycler and standard gel analysis equipment.
[Selection of reference genes of Siraitia grosvenorii by real-time PCR].
Tu, Dong-ping; Mo, Chang-ming; Ma, Xiao-jun; Zhao, Huan; Tang, Qi; Huang, Jie; Pan, Li-mei; Wei, Rong-chang
2015-01-01
Siraitia grosvenorii is used both as a traditional Chinese medicine and as an edible food. In this study, six candidate reference genes were evaluated by real-time quantitative PCR, and their expression stability across different samples was analyzed using geNorm, NormFinder, BestKeeper, the Delta CT method and RefFinder; this is the first selection of reference genes for S. grosvenorii. The results showed that 18SrRNA was the most stably expressed gene in all samples and was the best reference gene for the analysis. The study provides guidance for gene expression analysis by qRT-PCR and supplies a suitable reference gene for studies of differentially expressed genes in synthesis and biological pathways, as well as of other genes of S. grosvenorii.
Li, Jiangeng; Su, Lei; Pang, Zenan
2015-12-01
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score), which is based on graph embedding, has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we propose a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy exclusion. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used a support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
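The abstract above uses Fisher score as one of its baselines. The following is a minimal sketch of the classical Fisher score for filter feature selection (between-class scatter of a feature divided by its within-class scatter); it is not the MFA score or MFA score+ of the paper, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def fisher_scores(samples, labels):
    """Return one classical Fisher score per feature (column).

    samples: list of rows, each a list of feature values.
    labels:  class label for each row.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        rows_by_class[label].append(row)

    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        # A feature with zero within-class variance separates classes perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
toy_samples = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 4.2]]
toy_labels = [0, 0, 1, 1]
print(fisher_scores(toy_samples, toy_labels))
```

Features are then ranked by score and the top ones are passed to a classifier. A score of this kind evaluates each feature independently, which is why redundancy among top-ranked genes is a concern.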
Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I
2015-11-03
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for the analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on the standardized Z-statistic based on the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework with computational software tool for comparative quantitative proteomics analysis. It features various extensions of QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Mollah, Mohammad Manir Hossain; Jamal, Rahman; Mokhtar, Norfilza Mohd; Harun, Roslan; Mollah, Md. Nurul Haque
2015-01-01
Background Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression. Results The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA. Conclusion Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and proposed) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. 
In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, where the BetaEB was not extended for this condition. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression. PMID:26413858
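The abstract above describes a β-weight function that takes values between 0 and 1, with β = 0.2, and downweights outlying expressions. The sketch below assumes the Gaussian form of the weight that is commonly paired with minimum β-divergence estimation, w = exp(-(β/2)·((x − μ)/σ)²); this form, the function name and the example values are assumptions for illustration and are not confirmed by the abstract.

```python
import math


def beta_weight(x, mu, sigma, beta=0.2):
    """Assumed Gaussian-type beta-weight: 1 at the centre, near 0 for outliers."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * beta * z * z)


# Hypothetical expression values around a centre of 5.0 with unit scale.
for value in (5.0, 6.0, 8.0, 15.0):
    print(value, round(beta_weight(value, mu=5.0, sigma=1.0), 5))
```

Under this assumed form, a value ten standard deviations from the centre receives a weight of exp(-10), so it contributes almost nothing to weighted estimates of the mean and variance.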
2009-01-01
Background A central task in contemporary biosciences is the identification of biological processes showing response in genome-wide differential gene expression experiments. Two types of analysis are common. Either, one generates an ordered list based on the differential expression values of the probed genes and examines the tail areas of the list for over-representation of various functional classes. Alternatively, one monitors the average differential expression level of genes belonging to a given functional class. So far these two types of method have not been combined. Results We introduce a scoring function, Gene Set Z-score (GSZ), for the analysis of functional class over-representation that combines two previous analysis methods. GSZ encompasses popular functions such as correlation, hypergeometric test, Max-Mean and Random Sets as limiting cases. GSZ is stable against changes in class size as well as across different positions of the analysed gene list in tests with randomized data. GSZ shows the best overall performance in a detailed comparison to popular functions using artificial data. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which clearly reports better p-values for top classes than the other methods. Furthermore, GSZ detects relevant biological themes that are missed by the other methods. These observations also hold when comparing GSZ with popular program packages. Conclusion GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html. PMID:19775443
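The abstract above names the hypergeometric test as one of the limiting cases of GSZ. As a minimal sketch of that classical over-representation test only (not of the GSZ score itself), the upper-tail probability can be computed with the standard library; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genes, n_in_class, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_genes, n_in_class, n_selected).

    n_genes:    all genes on the list
    n_in_class: genes annotated to the functional class
    n_selected: genes in the examined tail of the ordered list
    n_overlap:  selected genes that belong to the class
    """
    total = comb(n_genes, n_selected)
    upper = min(n_in_class, n_selected)
    tail = sum(
        comb(n_in_class, k) * comb(n_genes - n_in_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy case: 10 genes, 3 in the class, top 3 selected, all 3 in the class.
print(hypergeom_enrichment_pvalue(10, 3, 3, 3))  # 1/120
```

The test depends on where the ordered list is cut, which is the kind of threshold sensitivity that a score evaluated across list positions aims to avoid.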
A cross-species analysis method to analyze animal models' similarity to human's disease state
2012-01-01
Background Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics the human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess drug molecules' efficiency in drug research. We thereby hoped to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms. Results We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and mined meaningful information to help drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated by our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained useful information to help drug research. We found that trichostatin A and some other HDAC inhibitors could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide ideas for drug discovery and drug repositioning. Conclusions We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyse the effectiveness of animal models in drug research. 
Moreover, through data integration, our method could be applied for drug research, such as potential drug candidates, possible drug repositioning, side effects and information about pharmacology. PMID:23282076
A cross-species analysis method to analyze animal models' similarity to human's disease state.
Yu, Shuhao; Zheng, Lulu; Li, Yun; Li, Chunyan; Ma, Chenchen; Li, Yixue; Li, Xuan; Hao, Pei
2012-01-01
Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics the human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess drug molecules' efficiency in drug research. We thereby hoped to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms. We developed a cross-species analysis method to analyze animal models' similarity to human diseases and their effectiveness in drug research by utilizing existing animal gene expression data in public databases, and mined meaningful information to help drug research, such as potential drug candidates, possible drug repositioning, side effects and pharmacological analysis. New animal models could be evaluated by our method before they are used in drug discovery. We applied the method to several cases of known animal model expression profiles and obtained useful information to help drug research. We found that trichostatin A and some other HDAC inhibitors could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide ideas for drug discovery and drug repositioning. We developed a new cross-species gene expression module comparison method that uses animal models' expression data to analyse the effectiveness of animal models in drug research. 
Moreover, through data integration, our method could be applied for drug research, such as potential drug candidates, possible drug repositioning, side effects and information about pharmacology.
A method for generating new datasets based on copy number for cancer analysis.
Kim, Shinuk; Kon, Mark; Kang, Hyunsik
2015-01-01
New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer.
Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D
2013-01-01
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the fold change criteria of the Significance Analysis of Microarrays method are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to the fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. 
The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
Characterizing differential gene expression in polyploid grasses lacking a reference transcriptome
USDA-ARS's Scientific Manuscript database
Basal transcriptome characterization and differential gene expression in response to varying conditions are often addressed through next generation sequencing (NGS) and data analysis techniques. While these strategies are commonly used, there are countless tools, pipelines, data analysis methods an...
Image-based Analysis of Emotional Facial Expressions in Full Face Transplants.
Bedeloglu, Merve; Topcu, Çagdas; Akgul, Arzu; Döger, Ela Naz; Sever, Refik; Ozkan, Ozlenen; Ozkan, Omer; Uysal, Hilmi; Polat, Ovunc; Çolak, Omer Halil
2018-01-20
This study aims to determine, from photographs, the degree of development of emotional expression in full face transplant patients, so that a rehabilitation process can be planned according to these degrees in later work. As envisaged, in full face transplant cases, expressions can be confused or may not be achieved as well as in the healthy control group. For the image-based analysis, 9 healthy male controls and 2 full face transplant patients participated in the study. Appearance-based Gabor Wavelet Transform (GWT) and Local Binary Pattern (LBP) methods were adopted for recognizing neutral and 6 emotional expressions: angry, scared, happy, hate, confused and sad. Feature extraction was carried out using each method separately and using the two methods combined serially. For the performed expressions, features extracted from the most distinctive zones of the face, the eye and mouth regions, were used to classify the emotions. The combination of these region features was also used to improve classifier performance. The ability of control subjects and transplant patients to perform emotional expressions was determined with a K-nearest neighbor (KNN) classifier with region-specific and method-specific decision stages. The patients' results were compared with those of the healthy group. It was observed that transplant patients do not reflect some emotional expressions. There were also confusions among expressions.
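The abstract above uses Local Binary Pattern (LBP) features. As a minimal sketch of the basic 3×3 LBP operator (each pixel is encoded by thresholding its eight neighbours against the centre, and a region is described by the histogram of codes), the following pure-Python example may help. The neighbour ordering is one common convention chosen here as an assumption, and nothing in this snippet reflects the exact feature pipeline of the study.

```python
def lbp_code(patch):
    """Basic LBP code (0..255) of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise starting at the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2D list."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


# Hypothetical 4x4 grey-level region, e.g. a crop around the mouth.
region = [
    [10, 12, 11, 9],
    [13, 12, 10, 8],
    [14, 15, 12, 7],
    [13, 16, 11, 6],
]
print(sum(lbp_histogram(region)))  # 4 interior pixels
```

Histograms of this kind, computed per facial region, can serve as feature vectors for a nearest-neighbour classifier.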
Analyzing gene expression time-courses based on multi-resolution shape mixture model.
Li, Ying; He, Ye; Zhang, Yu
2016-11-01
Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. The multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of gene expression change over time at different resolutions. The proposed algorithm is a probabilistic framework that offers a more natural and robust way of clustering time-course gene expression. We assessed its performance on yeast time-course gene expression profiles in comparison with several popular clustering methods for gene expression profiles. The gene groups identified by the different methods were evaluated by enrichment analysis of biological pathways and of known protein-protein interactions supported by experimental evidence. The gene groups identified by the proposed algorithm have stronger biological significance. A novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. The proposed model provides a new perspective and an alternative tool for visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard
2015-03-09
Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38×10⁻⁵). These phenotypes are more heritable (h² = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. Copyright © 2015 Brown et al.
Anand Brown, Andrew; Ding, Zhihao; Viñuela, Ana; Glass, Dan; Parts, Leopold; Spector, Tim; Winn, John; Durbin, Richard
2015-01-01
Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10^-5). These phenotypes are more heritable (h^2 = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors. PMID:25758824
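The Bonferroni threshold quoted in the abstract above appears consistent with dividing a family-wise error level by the number of pathway phenotypes tested. The short check below works through that arithmetic; the family-wise level of 0.05 is an assumption, since the abstract does not state it.

```python
n_pathways = 186            # KEGG pathways (from the abstract)
phenotypes_per_pathway = 5  # phenotypes per pathway (from the abstract)
n_tests = n_pathways * phenotypes_per_pathway

alpha = 0.05                # assumed family-wise error level
threshold = alpha / n_tests

print(f"{n_tests} tests, per-test threshold = {threshold:.2e}")
```

Under that assumption the per-test threshold rounds to the value reported in the abstract.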
Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2017-01-25
With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that increases linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection. We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e., more expressed in non-tumor tissue) to a positive value (i.e., more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (i.e., more expressed in non-tumor tissue) to a positive value (i.e., more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.
Melo, C H; Sousa, F C; Batista, R I P T; Sanchez, D J D; Souza-Fabjan, J M G; Freitas, V J F; Melo, L M; Teixeira, D I A
2015-07-31
The present study aimed to compare laparoscopic (LP) and ultrasound-guided (US) biopsy methods to obtain either liver or splenic tissue samples for ectopic gene expression analysis in transgenic goats. Tissue samples were collected from human granulocyte colony stimulating factor (hG-CSF)-transgenic bucks and submitted to real-time PCR for the endogenous genes (Sp1, Baff, and Gapdh) and the transgene (hG-CSF). Both LP and US biopsy methods were successful in obtaining liver and splenic samples that could be analyzed by PCR (i.e., sufficient sample sizes and RNA yield were obtained). Although the number of attempts made to obtain the tissue samples was similar (P > 0.05), LP procedures took considerably longer than the US method (P = 0.03). Finally, transgene transcripts were not detected in spleen or liver samples. Thus, for the phenotypic characterization of a transgenic goat line, investigation of ectopic gene expression can be made successfully by LP or US biopsy, avoiding the traditional approach of euthanasia.
Purification of cardiac myocytes from human heart biopsies for gene expression analysis.
Kosloski, L M; Bales, I K; Allen, K B; Walker, B L; Borkon, A M; Stuart, R S; Pak, A F; Wacker, M J
2009-09-01
The collection of gene expression data from human heart biopsies is important for understanding the cellular mechanisms of arrhythmias and diseases such as cardiac hypertrophy and heart failure. Many clinical and basic research laboratories conduct gene expression analysis using RNA from whole cardiac biopsies. This allows for the analysis of global changes in gene expression in areas of the heart, while eliminating the need for more complex and technically difficult single-cell isolation procedures (such as flow cytometry, laser capture microdissection, etc.) that require expensive equipment and specialized training. The abundance of fibroblasts and other cell types in whole biopsies, however, can complicate gene expression analysis and the interpretation of results. Therefore, we have designed a technique to quickly and easily purify cardiac myocytes from whole cardiac biopsies for RNA extraction. Human heart tissue samples were collected, and our purification method was compared with the standard nonpurification method. Cell imaging using acridine orange staining of the purified sample demonstrated that >98% of total RNA was contained within identifiable cardiac myocytes. Real-time RT-PCR was performed comparing nonpurified and purified samples for the expression of troponin T (myocyte marker), vimentin (fibroblast marker), and alpha-smooth muscle actin (smooth muscle marker). Troponin T expression was significantly increased, and vimentin and alpha-smooth muscle actin were significantly decreased in the purified sample (n = 8; P < 0.05). Extracted RNA was analyzed during each step of the purification, and no significant degradation occurred. These results demonstrate that this isolation method yields a more purified cardiac myocyte RNA sample suitable for downstream applications, such as real-time RT-PCR, and allows for more accurate measurement of gene expression changes in cardiac myocytes from heart biopsies.
Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne
2004-06-01
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
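The contrast between the two approaches in the abstract above can be sketched on toy data. In this minimal example, ORA is modelled as a one-sided hypergeometric test on a thresholded gene list, and FCS as a permutation test on the mean gene score of the class with no gene selection step. The function names, the choice of test statistics, the toy scores, and the selection cutoff are all assumptions for illustration, not the authors' implementation.

```python
import random
from math import comb


def ora_pvalue(n_total, n_class, n_selected, n_overlap):
    """Hypergeometric P(overlap >= n_overlap) for a thresholded gene list."""
    upper = min(n_class, n_selected)
    hits = sum(
        comb(n_class, k) * comb(n_total - n_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_total, n_selected)


def fcs_pvalue(scores, gene_class, n_perm=2000, seed=0):
    """Permutation p-value for the mean score of a class (no gene selection)."""
    rng = random.Random(seed)
    all_scores = list(scores.values())
    size = len(gene_class)
    observed = sum(scores[g] for g in gene_class) / size
    exceed = sum(
        sum(rng.sample(all_scores, size)) / size >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


# Hypothetical per-gene scores (higher = stronger age association).
scores = {f"g{i}": float(20 - i) for i in range(20)}
gene_class = ["g0", "g1", "g2"]  # hypothetical gene ontology class

# ORA needs a cutoff; changing it changes the selected list and the result.
selected = [g for g, s in scores.items() if s >= 16.0]
overlap = len(set(selected) & set(gene_class))
p_ora = ora_pvalue(len(scores), len(gene_class), len(selected), overlap)

# FCS uses every gene score and has no cutoff to tune.
p_fcs = fcs_pvalue(scores, gene_class)
print(f"ORA p = {p_ora:.4f}, FCS p = {p_fcs:.4f}")
```

Moving the cutoff in `selected` changes the ORA inputs directly, which mirrors the threshold sensitivity the abstract reports, whereas `fcs_pvalue` depends only on the full score distribution.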
Global gene expression analysis by combinatorial optimization.
Ameur, Adam; Aurell, Erik; Carlsson, Mats; Westholm, Jakub Orzechowski
2004-01-01
Generally, there is a trade-off between methods of gene expression analysis that are precise but labor-intensive, e.g. RT-PCR, and methods that scale up to global coverage but are not quite as quantitative, e.g. microarrays. In the present paper, we show how a known method of gene expression profiling (K. Kato, Nucleic Acids Res. 23, 3685-3690 (1995)), which relies on a fairly small number of steps, can be turned into a global gene expression measurement by advanced data post-processing, with potentially little loss of accuracy. Post-processing here entails solving an ancillary combinatorial optimization problem. Validation is performed on in silico experiments generated from the FANTOM database of full-length mouse cDNA. We present two variants of the method. One uses state-of-the-art commercial software for solving problems of this kind, the other a code developed by us specifically for this purpose, released in the public domain under GPL license.
2013-01-01
Background Triglyceride deposit cardiomyovasculopathy (TGCV) is a rare disease, characterized by the massive accumulation of triglyceride (TG) in multiple tissues, especially skeletal muscle, heart muscle and the coronary artery. TGCV is caused by mutation of adipose triglyceride lipase, which is an essential molecule for the hydrolysis of TG. TGCV is at high risk for skeletal myopathy and heart dysfunction, and therefore premature death. Development of therapeutic methods for TGCV is highly desirable. This study aims to discover specific molecules responsible for TGCV pathogenesis. Methods To identify differentially expressed proteins in TGCV patient cells, the stable isotope labeling with amino acids in cell culture (SILAC) method coupled with LC-MS/MS was performed using skin fibroblast cells derived from two TGCV patients and three healthy volunteers. Altered protein expression in TGCV cells was confirmed using the selected reaction monitoring (SRM) method. Microarray-based transcriptome analysis was simultaneously performed to identify changes in gene expression in TGCV cells. Results Using SILAC proteomics, 4033 proteins were quantified, 53 of which showed significantly altered expression in both TGCV patient cells. Twenty altered proteins were chosen and confirmed using SRM. SRM analysis successfully quantified 14 proteins, 13 of which showed the same trend as SILAC proteomics. The altered protein expression data set was used in Ingenuity Pathway Analysis (IPA), and significant networks were identified. Several of these proteins have been previously implicated in lipid metabolism, while others represent new therapeutic targets or markers for TGCV. Microarray analysis quantified 20743 transcripts, and 252 genes showed significantly altered expression in both TGCV patient cells. Ten altered genes were chosen, 9 of which were successfully confirmed using quantitative RT-PCR. Biological networks of altered genes were analyzed using an IPA search. 
Conclusions We performed the SILAC- and SRM-based identification-through-confirmation study using skin fibroblast cells derived from TGCV patients, and first identified altered proteins specific for TGCV. Microarray analysis also identified changes in gene expression. The functional networks of the altered proteins and genes are discussed. Our findings will be exploited to elucidate the pathogenesis of TGCV and discover clinically relevant molecules for TGCV in the near future. PMID:24360150
Campbell, Kieran R.
2016-01-01
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. PMID:27870852
Time Series Expression Analyses Using RNA-seq: A Statistical Approach
Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.
2013-01-01
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
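To make the AR(1) idea mentioned in the abstract above concrete, the sketch below fits x_t = c + phi * x_(t-1) to a single toy expression series by ordinary least squares. The function name `fit_ar1`, the least-squares estimator, and the noise-free toy series are assumptions for illustration; the paper's own formulation for RNA-seq counts may differ.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = intercept + phi * x_(t-1) (illustrative)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    intercept = mean_next - phi * mean_prev
    return intercept, phi


# Hypothetical series generated exactly by x_t = 1 + 0.5 * x_(t-1).
series = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
intercept, phi = fit_ar1(series)
print(f"intercept = {intercept:.3f}, phi = {phi:.3f}")
```

The coefficient phi measures how strongly each time point depends on the previous one, which is the kind of cross-time dependency the abstract says these methods should model explicitly.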
Bryan, Kenneth; Cunningham, Pádraig
2008-01-01
Background Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended through the use of available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786
Facial Affect Recognition Using Regularized Discriminant Analysis-Based Algorithms
NASA Astrophysics Data System (ADS)
Lee, Chien-Cheng; Huang, Shin-Sheng; Shih, Cheng-Yuan
2010-12-01
This paper presents a novel and effective method for recognizing facial expressions, including happiness, disgust, fear, anger, sadness, surprise, and the neutral state. The proposed method utilizes a regularized discriminant analysis-based boosting algorithm (RDAB) with effective Gabor features to recognize the facial expressions. An entropy criterion is applied to select the effective Gabor features, a subset of informative and nonredundant Gabor features. The proposed RDAB algorithm uses RDA as a learner in the boosting algorithm. RDA combines the strengths of linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). It addresses the small-sample-size and ill-posed problems that affect QDA and LDA through a regularization technique. Additionally, this study uses the particle swarm optimization (PSO) algorithm to estimate optimal parameters in RDA. Experimental results demonstrate that our approach can accurately and robustly recognize facial expressions.
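One common way to express how RDA sits between LDA and QDA is to blend each class covariance with the pooled covariance and then shrink the result toward a scaled identity. The sketch below shows a simplified form of that blend on 2×2 toy matrices; the function name `rda_covariance`, the parameter names `lam` and `gamma`, and the toy values are assumptions, and the abstract does not specify the exact regularization used in RDAB.

```python
def rda_covariance(class_cov, pooled_cov, lam, gamma):
    """Simplified regularized covariance (illustrative sketch).

    lam = 0, gamma = 0 keeps the class covariance (QDA-like);
    lam = 1, gamma = 0 uses the pooled covariance (LDA-like);
    gamma > 0 shrinks toward a scaled identity for stability.
    """
    p = len(class_cov)
    mixed = [
        [(1 - lam) * class_cov[i][j] + lam * pooled_cov[i][j] for j in range(p)]
        for i in range(p)
    ]
    avg_var = sum(mixed[i][i] for i in range(p)) / p
    return [
        [
            (1 - gamma) * mixed[i][j] + (gamma * avg_var if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical covariance estimates for one class and for all classes pooled.
class_cov = [[2.0, 0.5], [0.5, 1.0]]
pooled_cov = [[1.0, 0.0], [0.0, 1.0]]

print(rda_covariance(class_cov, pooled_cov, lam=0.5, gamma=0.25))
```

The two parameters are what an optimizer such as PSO would tune in this kind of setup; here they are simply passed in by hand.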
Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K
2015-06-04
Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. 
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.
Huang, Shi-Ming; Zhao, Xia; Zhao, Xue-Mei; Wang, Xiao-Ying; Li, Shan-Shan; Zhu, Yu-Hui
2014-01-01
Objectives: Renal transplantation is the preferred method for most patients with end-stage renal disease; however, acute renal allograft rejection is still a major risk factor for recipients, leading to renal injury. To improve the early diagnosis and treatment of acute rejection, the study of its molecular mechanism is urgent. Methods: MicroRNA (miRNA) expression profiles and mRNA expression profiles of acute renal allograft rejection and well-functioning allografts, downloaded from the ArrayExpress database, were used to identify differentially expressed (DE) miRNAs and DE mRNAs. DE miRNA targets were predicted by combining five algorithms. By overlapping the DE mRNAs and DE miRNA targets, common genes were obtained. Differentially co-expressed genes (DCGs) were identified by the differential co-expression profile (DCp) and differential co-expression enrichment (DCe) methods in the Differentially Co-expressed Genes and Links (DCGL) package. Then, the co-expression network of DCGs was constructed and cluster analysis was performed. Functional enrichment analysis for DCGs was performed. Results: A total of 1270 miRNA targets were predicted and 698 DE mRNAs were obtained. By overlapping miRNA targets and DE mRNAs, 59 common genes were obtained. We obtained 103 DCGs and 5 transcription factors (TFs) based on regulatory impact factors (RIF), then built the regulation network of miRNA targets and DE mRNAs. By clustering the co-expression network, 5 modules were obtained. Among these, module 1 had the highest degree and module 2 showed the largest number of DCGs and common genes. TF CEBPB and several common genes, such as RXRA, BASP1 and AKAP10, were mapped on the co-expression network. C1R showed the highest degree in the network. These genes might be associated with human acute renal allograft rejection. 
Conclusions: We conducted biological analysis on integration of DE mRNA and DE miRNA in acute renal allograft rejection, displayed gene expression patterns and screened out genes and TFs that may be related to acute renal allograft rejection. PMID:25664019
Demidenko, Natalia V; Penin, Aleksey A
2012-01-01
qRT-PCR is a generally acknowledged method for gene expression analysis due to its precision and reproducibility. However, it is well known that the accuracy of qRT-PCR data varies greatly depending on the experimental design and data analysis. Recently, a set of guidelines has been proposed that aims to improve the reliability of qRT-PCR. However, there are additional factors that have not been taken into consideration in these guidelines that can seriously affect the data obtained using this method. In this study, we report the influence that object morphology can have on qRT-PCR data. We have used a number of Arabidopsis thaliana mutants with altered floral morphology as models for this study. These mutants have been well characterised (including in terms of gene expression levels and patterns) by other techniques. This allows us to compare the results from the qRT-PCR with the results inferred from other methods. We demonstrate that the comparison of gene expression levels in objects that differ greatly in their morphology can lead to erroneous results.
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample individually, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.
Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang
2015-01-01
RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A
2014-01-01
Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. 
This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
Yoshida, Tsuyoshi; Kobayashi, Takumi; Itoda, Masaya; Muto, Taika; Miyaguchi, Ken; Mogushi, Kaoru; Shoji, Satoshi; Shimokawa, Kazuro; Iida, Satoru; Uetake, Hiroyuki; Ishikawa, Toshiaki; Sugihara, Kenichi; Mizushima, Hiroshi; Tanaka, Hiroshi
2010-07-29
Colorectal cancer (CRC) is one of the most frequently occurring cancers in Japan, and thus a wide range of methods have been deployed to study the molecular mechanisms of CRC. In this study, we performed a comprehensive analysis of CRC, incorporating copy number aberration (CNA) and gene expression data. For the last four years, we have been collecting data from CRC cases and organizing the information as an "omics" study by integrating many kinds of analysis into a single comprehensive investigation. In our previous studies, we had experienced difficulty in finding genes related to CRC, as we observed higher noise levels in the expression data than in the data for other cancers. Because chromosomal aberrations are often observed in CRC, here, we have performed a combination of CNA analysis and expression analysis in order to identify some new genes responsible for CRC. This study was performed as part of the Clinical Omics Database Project at Tokyo Medical and Dental University. The purpose of this study was to investigate the mechanism of genetic instability in CRC by this combination of expression analysis and CNA, and to establish a new method for the diagnosis and treatment of CRC. Comprehensive gene expression analysis was performed on 79 CRC cases using an Affymetrix Gene Chip, and comprehensive CNA analysis was performed using an Affymetrix DNA Sty array. To avoid the contamination of cancer tissue with normal cells, laser micro-dissection was performed before DNA/RNA extraction. Data analysis was performed using original software written in the R language. We observed a high percentage of CNA in colorectal cancer, including copy number gains at 7, 8q, 13 and 20q, and copy number losses at 8p, 17p and 18. Gene expression analysis provided many candidates for CRC-related genes, but their association with CRC did not reach the level of statistical significance. 
The combination of CNA and gene expression analysis, together with the clinical information, suggested UGT2B28, LOC440995, CXCL6, SULT1B1, RALBP1, TYMS, RAB12, RNMT, ARHGDIB, S100A2, ABHD2, OIT3 and ABHD12 as genes that are possibly associated with CRC. Some of these genes have already been reported as being related to CRC. TYMS has been reported as being associated with resistance to the anti-cancer drug 5-fluorouracil, and we observed a copy number increase for this gene. RALBP1, ARHGDIB and S100A2 have been reported as oncogenes, and we observed copy number increases in each. ARHGDIB has been reported as a metastasis-related gene, and our data also showed copy number increases of this gene in cases with metastasis. The combination of CNA analysis and gene expression analysis was a more effective method for finding genes associated with the clinicopathological classification of CRC than either analysis alone. Using this combination of methods, we were able to detect genes that have already been associated with CRC. We also identified additional candidate genes that may be new markers or targets for this form of cancer.
Matsumoto, Hirotaka; Kiryu, Hisanori
2016-06-08
Single-cell technologies make it possible to quantify the comprehensive states of individual cells, and have the power to shed light on cellular differentiation in particular. Although several methods have been developed to analyze single-cell expression data, there is still room for improvement in the analysis of differentiation. In this paper, we propose a novel method, SCOUP, to elucidate the differentiation process. Unlike previous dimension reduction-based approaches, SCOUP describes the dynamics of gene expression throughout differentiation directly, including the degree of differentiation of a cell (in pseudo-time) and cell fate. SCOUP is superior to previous methods with respect to pseudo-time estimation, especially for single-cell RNA-seq. SCOUP also estimates cell lineage more accurately than previous methods, especially for cells at an early stage of bifurcation. In addition, SCOUP can be applied to various downstream analyses. As an example, we propose a novel correlation calculation method for elucidating regulatory relationships among genes. We apply this method to single-cell RNA-seq data and detect a candidate key regulator of differentiation, as well as clusters in a correlation network that are not detected with conventional correlation analysis. We develop a stochastic process-based method, SCOUP, to analyze single-cell expression data throughout differentiation. SCOUP can estimate pseudo-time and cell lineage more accurately than previous methods. We also propose a novel correlation calculation method based on SCOUP. SCOUP is a promising approach for further single-cell analysis and is available at https://github.com/hmatsu1226/SCOUP.
Kaneko, Tomoatsu; Okiji, Takashi; Kaneko, Reika; Suda, Hideaki; Nör, Jacques E
2009-12-01
Laser capture microdissection (LCM) allows microscopic procurement of specific cell types from tissue sections that can then be used for gene expression analysis. In conventional LCM, frozen tissues stained with hematoxylin are normally used for molecular analysis. Recent studies suggested that it is possible to carry out gene expression analysis of formaldehyde-fixed, paraffin-embedded (FFPE) tissues that were stained with hematoxylin. However, it is still unclear whether quantitative gene expression analyses can be performed on LCM cells from FFPE tissues that were subjected to immunostaining to enhance identification of target cells. In this proof-of-principle study, we used reverse transcription-PCR (RT-PCR) and real-time PCR to analyze the expression of genes in factor VIII-immunostained human endothelial cells that were dissected from FFPE tissues by LCM. We observed that immunostaining should be performed at 4 °C to preserve the mRNA from the cells. The expression of Bcl-2 in the endothelial cells was evaluated by RT-PCR and by real-time PCR. Glyceraldehyde-3-phosphate dehydrogenase and 18S were used as housekeeping genes for RT-PCR and real-time PCR, respectively. This report describes a method for quantitative gene expression analysis in cells that were identified by immunostaining and retrieved by LCM from FFPE tissues. This method is ideally suited for the analysis of relatively rare cell types within a tissue, and should improve our ability to perform differential diagnosis of pathologies as compared to conventional LCM.
Murine epithelial cells: isolation and culture.
Davidson, Donald J; Gray, Michael A; Kilanowski, Fiona M; Tarran, Robert; Randell, Scott H; Sheppard, David N; Argent, Barry E; Dorin, Julia R
2004-08-01
We describe an air-liquid interface primary culture method for murine tracheal epithelial cells on semi-permeable membranes, forming polarized epithelia with a high transepithelial resistance, differentiation to ciliated and secretory cells, and physiologically appropriate expression of key genes and ion channels. We also describe the isolation of primary murine nasal epithelial cells for patch-clamp analysis, generating polarised cells with physiologically appropriate distribution and ion channel expression. These methods enable more physiologically relevant analysis of murine airway epithelial cells in vitro and ex vivo, better utilisation of transgenic mouse models of human pulmonary diseases, and have been approved by the European Working Group on CFTR expression.
Analysis of gene expression levels in individual bacterial cells without image segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwak, In Hae; Son, Minjun; Hagen, Stephen J., E-mail: sjhagen@ufl.edu
2012-05-11
Highlights: ► We present a method for extracting gene expression data from images of bacterial cells. ► The method does not employ cell segmentation and does not require high magnification. ► Fluorescence and phase contrast images of the cells are correlated through the physics of phase contrast. ► We demonstrate the method by characterizing noisy expression of comX in Streptococcus mutans. Abstract: Studies of stochasticity in gene expression typically make use of fluorescent protein reporters, which permit the measurement of expression levels within individual cells by fluorescence microscopy. Analysis of such microscopy images is almost invariably based on a segmentation algorithm, where the image of a cell or cluster is analyzed mathematically to delineate individual cell boundaries. However, segmentation can be ineffective for studying bacterial cells or clusters, especially at lower magnification, where outlines of individual cells are poorly resolved. Here we demonstrate an alternative method for analyzing such images without segmentation. The method employs a comparison between the pixel brightness in phase contrast vs fluorescence microscopy images. By fitting the correlation between phase contrast and fluorescence intensity to a physical model, we obtain well-defined estimates for the different levels of gene expression that are present in the cell or cluster. The method reveals the boundaries of the individual cells, even if the source images lack the resolution to show these boundaries clearly.
Crombach, Anton; Cicin-Sain, Damjan; Wotton, Karl R; Jaeger, Johannes
2012-01-01
Understanding the function and evolution of developmental regulatory networks requires the characterisation and quantification of spatio-temporal gene expression patterns across a range of systems and species. However, most high-throughput methods to measure the dynamics of gene expression do not preserve the detailed spatial information needed in this context. For this reason, quantification methods based on image bioinformatics have become increasingly important over the past few years. Most available approaches in this field either focus on the detailed and accurate quantification of a small set of gene expression patterns, or attempt high-throughput analysis of spatial expression through binary pattern extraction and large-scale analysis of the resulting datasets. Here we present a robust, "medium-throughput" pipeline to process in situ hybridisation patterns from embryos of different species of flies. It bridges the gap between high-resolution, and high-throughput image processing methods, enabling us to quantify graded expression patterns along the antero-posterior axis of the embryo in an efficient and straightforward manner. Our method is based on a robust enzymatic (colorimetric) in situ hybridisation protocol and rapid data acquisition through wide-field microscopy. Data processing consists of image segmentation, profile extraction, and determination of expression domain boundary positions using a spline approximation. It results in sets of measured boundaries sorted by gene and developmental time point, which are analysed in terms of expression variability or spatio-temporal dynamics. Our method yields integrated time series of spatial gene expression, which can be used to reverse-engineer developmental gene regulatory networks across species. It is easily adaptable to other processes and species, enabling the in silico reconstitution of gene regulatory networks in a wide range of developmental contexts.
A systematic evaluation of normalization methods in quantitative label-free proteomics.
Välikangas, Tommi; Suomi, Tomi; Elo, Laura L
2018-01-01
To date, mass spectrometry (MS) data remain inherently biased as a result of reasons ranging from sample handling to differences caused by the instrumentation. Normalization is the process that aims to account for the bias and make samples more comparable. The selection of a proper normalization method is a pivotal task for the reliability of the downstream analysis and results. Many normalization methods commonly used in proteomics have been adapted from the DNA microarray techniques. Previous studies comparing normalization methods in proteomics have focused mainly on intragroup variation. In this study, several popular and widely used normalization methods representing different strategies in normalization are evaluated using three spike-in and one experimental mouse label-free proteomic data sets. The normalization methods are evaluated in terms of their ability to reduce variation between technical replicates, their effect on differential expression analysis and their effect on the estimation of logarithmic fold changes. Additionally, we examined whether normalizing the whole data globally or in segments for the differential expression analysis has an effect on the performance of the normalization methods. We found that variance stabilization normalization (Vsn) reduced variation the most between technical replicates in all examined data sets. Vsn also performed consistently well in the differential expression analysis. Linear regression normalization and local regression normalization performed also systematically well. Finally, we discuss the choice of a normalization method and some qualities of a suitable normalization method in the light of the results of our evaluation. © The Author 2016. Published by Oxford University Press.
Oakley, Todd H; Gu, Zhenglong; Abouheif, Ehab; Patel, Nipam H; Li, Wen-Hsiung
2005-01-01
Understanding the evolution of gene function is a primary challenge of modern evolutionary biology. Despite an expanding database from genomic and developmental studies, we are lacking quantitative methods for analyzing the evolution of some important measures of gene function, such as gene-expression patterns. Here, we introduce phylogenetic comparative methods to compare different models of gene-expression evolution in a maximum-likelihood framework. We find that expression of duplicated genes has evolved according to a nonphylogenetic model, where closely related genes are no more likely than more distantly related genes to share common expression patterns. These results are consistent with previous studies that found rapid evolution of gene expression during the history of yeast. The comparative methods presented here are general enough to test a wide range of evolutionary hypotheses using genomic-scale data from any organism.
WGCNA: an R package for weighted correlation network analysis.
Langfelder, Peter; Horvath, Steve
2008-12-29
Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. 
The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
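As a rough illustration of the weighted network construction described in this abstract, the sketch below builds an unsigned, soft-thresholded adjacency from pairwise Pearson correlations, a_ij = |cor(x_i, x_j)|^beta. It is plain Python, not the WGCNA R package; the function names, the toy expression profiles and the choice beta = 6 are assumptions made only for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes for h in genes if g != h
    }

# Hypothetical expression profiles (one value per sample).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(profiles)
```

Raising the absolute correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, which is why the resulting network is weighted rather than cut at a hard threshold.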
WGCNA: an R package for weighted correlation network analysis
Langfelder, Peter; Horvath, Steve
2008-01-01
Background: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. Results: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. Conclusion: The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at . PMID:19114008
Reynier, Frédéric; Petit, Fabien; Paye, Malick; Turrel-Davin, Fanny; Imbert, Pierre-Emmanuel; Hot, Arnaud; Mougin, Bruno; Miossec, Pierre
2011-01-01
The analysis of gene expression data shows that many genes display similarity in their expression profiles suggesting some co-regulation. Here, we investigated the co-expression patterns in gene expression data and proposed a correlation-based research method to stratify individuals. Using blood from rheumatoid arthritis (RA) patients, we investigated the gene expression profiles from whole blood using Affymetrix microarray technology. Co-expressed genes were analyzed by a biclustering method, followed by gene ontology analysis of the relevant biclusters. Taking the type I interferon (IFN) pathway as an example, a classification algorithm was developed from the 102 RA patients and extended to 10 systemic lupus erythematosus (SLE) patients and 100 healthy volunteers to further characterize individuals. We developed a correlation-based algorithm referred to as Classification Algorithm Based on a Biological Signature (CABS), an alternative to other approaches focused specifically on the expression levels. This algorithm applied to the expression of 35 IFN-related genes showed that the IFN signature presented a heterogeneous expression between RA, SLE and healthy controls which could reflect the level of global IFN signature activation. Moreover, the monitoring of the IFN-related genes during the anti-TNF treatment identified changes in type I IFN gene activity induced in RA patients. In conclusion, we have proposed an original method to analyze genes sharing an expression pattern and a biological function showing that the activation levels of a biological signature could be characterized by its overall state of correlation.
NASA Astrophysics Data System (ADS)
Benitez-Garcia, Gibran; Nakamura, Tomoaki; Kaneko, Masahide
2017-01-01
Darwin was the first to assert that facial expressions are innate and universal, and are recognized across all cultures. However, some recent cross-cultural studies have questioned this assumed universality. Therefore, this paper presents an analysis of the differences between Western and East-Asian faces for the six basic expressions (anger, disgust, fear, happiness, sadness and surprise), focused on three individual facial regions: eyes-eyebrows, nose and mouth. The analysis is conducted by applying PCA to two feature extraction methods: appearance-based, using the pixel intensities of facial parts, and geometric-based, handling 125 feature points from the face. Both methods are evaluated using 4 standard databases for both racial groups and the results are compared with a cross-cultural human study applied to 20 participants. Our analysis reveals that differences between Westerners and East-Asians exist mainly in the regions of eyes-eyebrows and mouth for expressions of fear and disgust, respectively. This work presents important findings for a better design of automatic facial expression recognition systems based on the difference between two racial groups.
Martini, Paolo; Risso, Davide; Sales, Gabriele; Romualdi, Chiara; Lanfranchi, Gerolamo; Cagnin, Stefano
2011-04-11
In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis offers the possibility to compare different studies addressing the same biological question, to fully exploit public gene expression datasets. We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach).
In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies. STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.
Analysis of facial expressions in parkinson's disease through video-based automatic methods.
Bandini, Andrea; Orlandi, Silvia; Escalante, Hugo Jair; Giovannelli, Fabio; Cincotta, Massimo; Reyes-Garcia, Carlos A; Vanni, Paola; Zaccara, Gaetano; Manfredi, Claudia
2017-04-01
The automatic analysis of facial expressions is an evolving field that finds several clinical applications. One of these applications is the study of facial bradykinesia in Parkinson's disease (PD), which is a major motor sign of this neurodegenerative illness. Facial bradykinesia consists in the reduction/loss of facial movements and emotional facial expressions called hypomimia. In this work we propose an automatic method for studying facial expressions in PD patients relying on video-based methods. Methods: 17 Parkinsonian patients and 17 healthy control subjects were asked to show basic facial expressions, upon request of the clinician and after the imitation of a visual cue on a screen. Through an existing face tracker, the Euclidean distance of the facial model from a neutral baseline was computed in order to quantify the changes in facial expressivity during the tasks. Moreover, an automatic facial expression recognition algorithm was trained in order to study how PD expressions differed from the standard expressions. Results show that control subjects had, on average, larger distances than PD patients across the tasks. This confirms that control subjects show larger movements during both posed and imitated facial expressions. Moreover, our results demonstrate that anger and disgust are the two most impaired expressions in PD patients. Contactless video-based systems can be important techniques for analyzing facial expressions also in rehabilitation, in particular speech therapy, where patients could get a definite advantage from real-time feedback about the proper facial expressions/movements to perform. Copyright © 2017 Elsevier B.V. All rights reserved.
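The distance measure mentioned in this abstract can be sketched as follows. The abstract does not say how the facial model is represented, so the sketch assumes each frame is a list of (x, y) landmark coordinates and takes the Euclidean distance between the flattened landmark vectors; the function name and the toy coordinates are hypothetical.

```python
from math import dist

def expressivity_distance(frame, neutral):
    """Euclidean distance between two flattened landmark vectors.

    Assumes both arguments list the same landmarks in the same order.
    """
    flat_frame = [c for point in frame for c in point]
    flat_neutral = [c for point in neutral for c in point]
    return dist(flat_frame, flat_neutral)

# Hypothetical landmark sets: only the third point moves, by (3, 4).
neutral = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
posed = [(0.0, 0.0), (1.0, 0.0), (3.5, 5.0)]
d = expressivity_distance(posed, neutral)
```

A larger value means the face moved further from its neutral configuration, which is the sense in which the study compares control subjects with PD patients.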
Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco
2005-01-01
A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204
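One possible reading of the bootstrapped, intensity-stratified error model is sketched below. It is not the authors' implementation: the binning scheme, the use of a 95th-percentile threshold, and all names and numbers are assumptions chosen to show the general idea of learning noise levels from a replicated pilot and applying them to unreplicated measurements.

```python
import random

def bootstrap_thresholds(pilot, n_bins=2, n_boot=500, quantile=0.95, seed=0):
    """Estimate a noise threshold per intensity stratum.

    pilot: list of (mean_intensity, replicate_difference) pairs.
    Returns a list of (upper_intensity_bound, threshold), one per stratum.
    """
    rng = random.Random(seed)
    ordered = sorted(pilot)
    size = -(-len(ordered) // n_bins)  # ceiling division
    model = []
    for start in range(0, len(ordered), size):
        stratum = ordered[start:start + size]
        diffs = [abs(d) for _, d in stratum]
        estimates = []
        for _ in range(n_boot):
            # Resample replicate differences with replacement.
            sample = sorted(rng.choice(diffs) for _ in diffs)
            estimates.append(sample[int(quantile * (len(sample) - 1))])
        model.append((stratum[-1][0], sum(estimates) / n_boot))
    return model

def is_differential(intensity, log_ratio, model):
    """Flag a single unreplicated measurement using the stratum threshold."""
    for upper, threshold in model:
        if intensity <= upper:
            return abs(log_ratio) > threshold
    return abs(log_ratio) > model[-1][1]

# Hypothetical pilot data: noisy at low intensity, tight at high intensity.
pilot = [(1, 0.9), (2, -1.1), (3, 1.0), (4, -0.8),
         (10, 0.1), (11, -0.2), (12, 0.15), (13, -0.1)]
model = bootstrap_thresholds(pilot)
```

With this toy model, the same log ratio of 0.5 is called differential for a bright feature but not for a dim one, reflecting the dependence of error on signal intensity that the abstract describes.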
Lee, Mikyung; Kim, Yangseok
2009-12-16
Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes due to altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, the published software is of limited use for comprehensive integrated analysis because of restrictions in the implemented algorithms and visualization modules. To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alteration by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific or not. On the other hand, the integrative analysis module measures the genomic alteration's influence on gene expression. It is divided into two separate parts. The first part calculates overall correlation between comparative genomic hybridization ratio and gene expression level by applying the following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation.
In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and Chi square test. By successive operations of two modules, users can clarify how gene expression levels are affected by the phenotype specific genomic alterations. As CHESS was developed in both Java application and web environments, it can be run on a web browser or a local machine. It also supports all experimental platforms if a properly formatted text file is provided to include the chromosomal position of probes and their gene identifiers. CHESS is a user-friendly tool for investigating disease specific genomic alterations and quantitative relationships between those genomic alterations and genome-wide gene expression profiling.
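Two of the correlation measures named in this abstract can be sketched in a few lines. This is not CHESS code (CHESS is a Java program); the function names and the toy comparative genomic hybridization ratios and expression values are assumptions for illustration. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values sharing their mean rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-gene values: CGH log ratio and expression level.
cgh_log_ratio = [-0.8, -0.1, 0.0, 0.4, 1.2]
expression = [2.0, 3.1, 3.0, 4.5, 9.0]
r = pearson(cgh_log_ratio, expression)
rho = spearman(cgh_log_ratio, expression)
```

Comparing the two is useful because Pearson's coefficient is sensitive to the outlying high-copy gene, whereas the rank-based coefficient depends only on the ordering.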
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.
Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; Liu, Jun S; Zhong, Wenxuan; Ma, Ping
2016-08-26
Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.
Nilsson, Björn; Håkansson, Petra; Johansson, Mikael; Nelander, Sven; Fioretos, Thoas
2007-01-01
Ontological analysis facilitates the interpretation of microarray data. Here we describe new ontological analysis methods which, unlike existing approaches, are threshold-free and statistically powerful. We perform extensive evaluations and introduce a new concept, detection spectra, to characterize methods. We show that different ontological analysis methods exhibit distinct detection spectra, and that it is critical to account for this diversity. Our results argue strongly against the continued use of existing methods, and provide directions towards an enhanced approach. PMID:17488501
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.
Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T
2017-12-28
In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge
Wagner, Florian
2015-01-01
Method Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. Results I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets. PMID:26575370
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored owing to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-based gene expression analysis, is highly unlikely to account sufficiently for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. 
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. The proposed FRGM for epistasis analysis of RNA-seq data can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential of the FRGM as a good choice for epistasis analysis with sequencing data.
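The Bonferroni step referred to above can be illustrated with a minimal sketch. The helper names and the example p-values are hypothetical and are not taken from the study. The sketch shows only how the number of pairwise tests grows with the number of units (genes or SNPs) and how the per-test cutoff shrinks accordingly.

```python
def n_pairs(n_units):
    """Number of unordered pairs among n_units genes (or SNPs)."""
    return n_units * (n_units - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_indices(p_values, alpha=0.05):
    """Indices of tests whose p-value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]


if __name__ == "__main__":
    # Hypothetical example: 4 genes give 6 pairwise interaction tests.
    print(n_pairs(4))
    print(significant_indices([0.001, 0.02, 0.04, 0.5]))
```

Because the number of pairs grows quadratically, testing gene pairs rather than SNP pairs reduces the number of tests and therefore relaxes the per-test cutoff.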
Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich
2015-12-16
Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, the Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods show AmpliSeq to be a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
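As a reminder of what the Matthews correlation coefficient measures when differential-expression calls are scored against a reference, here is a minimal sketch. The gene calls in the example are invented for illustration and are unrelated to the study's data.

```python
from math import sqrt


def confusion_counts(truth, calls):
    """Count TP, FP, TN, FN for two equal-length boolean sequences."""
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical reference labels (differentially expressed or not)
    # and hypothetical platform calls for ten genes.
    truth = [True] * 5 + [False] * 5
    calls = [True, True, True, True, False, False, False, False, False, True]
    print(matthews_corrcoef(*confusion_counts(truth, calls)))
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion table, which makes it informative when differentially expressed genes are a small minority.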
New Tools for Comparing Microscopy Images: Quantitative Analysis of Cell Types in Bacillus subtilis
van Gestel, Jordi; Vlamakis, Hera
2014-01-01
Fluorescence microscopy is a method commonly used to examine individual differences between bacterial cells, yet many studies still lack a quantitative analysis of fluorescence microscopy data. Here we introduce some simple tools that microbiologists can use to analyze and compare their microscopy images. We show how image data can be converted to distribution data. These data can be subjected to a cluster analysis that makes it possible to objectively compare microscopy images. The distribution data can further be analyzed using distribution fitting. We illustrate our methods by scrutinizing two independently acquired data sets, each containing microscopy images of a doubly labeled Bacillus subtilis strain. For the first data set, we examined the expression of srfA and tapA, two genes which are expressed in surfactin-producing and matrix-producing cells, respectively. For the second data set, we examined the expression of eps and tapA; these genes are expressed in matrix-producing cells. We show that srfA is expressed by all cells in the population, a finding which contrasts with a previously reported bimodal distribution of srfA expression. In addition, we show that eps and tapA do not always have the same expression profiles, despite being expressed in the same cell type: both operons are expressed in cell chains, while single cells mainly express eps. These findings exemplify that the quantification and comparison of microscopy data can yield insights that otherwise would go unnoticed. PMID:25448819
New tools for comparing microscopy images: quantitative analysis of cell types in Bacillus subtilis.
van Gestel, Jordi; Vlamakis, Hera; Kolter, Roberto
2015-02-15
Fluorescence microscopy is a method commonly used to examine individual differences between bacterial cells, yet many studies still lack a quantitative analysis of fluorescence microscopy data. Here we introduce some simple tools that microbiologists can use to analyze and compare their microscopy images. We show how image data can be converted to distribution data. These data can be subjected to a cluster analysis that makes it possible to objectively compare microscopy images. The distribution data can further be analyzed using distribution fitting. We illustrate our methods by scrutinizing two independently acquired data sets, each containing microscopy images of a doubly labeled Bacillus subtilis strain. For the first data set, we examined the expression of srfA and tapA, two genes which are expressed in surfactin-producing and matrix-producing cells, respectively. For the second data set, we examined the expression of eps and tapA; these genes are expressed in matrix-producing cells. We show that srfA is expressed by all cells in the population, a finding which contrasts with a previously reported bimodal distribution of srfA expression. In addition, we show that eps and tapA do not always have the same expression profiles, despite being expressed in the same cell type: both operons are expressed in cell chains, while single cells mainly express eps. These findings exemplify that the quantification and comparison of microscopy data can yield insights that otherwise would go unnoticed. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Clustering gene expression data based on predicted differential effects of GV interaction.
Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu
2005-02-01
Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may introduce biased interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches is proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.
2013-01-01
Background Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. PMID:24373308
Østvik, Ann E.; Drozdov, Ignat; Gustafsson, Bjørn I.; Kidd, Mark; Beisvag, Vidar; Torp, Sverre H.; Waldum, Helge L.; Martinsen, Tom Christian; Damås, Jan Kristian; Espevik, Terje; Sandvik, Arne K.
2013-01-01
Background In inflammatory bowel disease (IBD), genetic susceptibility together with environmental factors disturbs gut homeostasis, producing chronic inflammation. The two main IBD subtypes are ulcerative colitis (UC) and Crohn’s disease (CD). We present the largest microarray gene expression study on IBD to date, encompassing both inflamed and un-inflamed colonic tissue. A meta-analysis including all available, comparable data was used to explore important aspects of IBD inflammation, thereby validating consistent gene expression patterns. Methods Colon pinch biopsies from IBD patients were analysed using Illumina whole genome gene expression technology. Differential expression (DE) was identified using the LIMMA linear model in the R statistical computing environment. Results were enriched for gene ontology (GO) categories. Sets of genes encoding antimicrobial proteins (AMP) and proteins involved in T helper (Th) cell differentiation were used in the interpretation of the results. All available data sets were analysed using the same methods, and results were compared on a global and focused level as t-scores. Results Gene expression in inflamed mucosa from UC and CD is remarkably similar. The meta-analysis confirmed this. The patterns of AMP and Th cell-related gene expression were also very similar, except for IL23A, which was consistently more highly expressed in UC than in CD. Un-inflamed tissue from patients demonstrated minimal differences from healthy controls. Conclusions There is no difference in the Th subgroup involvement between UC and CD. Th1/Th17 related expression, with little Th2 differentiation, dominated both diseases. The different IL23A expression between UC and CD suggests an IBD subtype specific role. AMPs, previously little studied, are strongly overexpressed in IBD. The presented meta-analysis provides a sound background for further research on IBD pathobiology. PMID:23468882
Validation of MIMGO: a method to identify differentially expressed GO terms in a microarray dataset
2012-01-01
Background We previously proposed an algorithm for the identification of GO terms that commonly annotate genes whose expression is upregulated or downregulated in some microarray data compared with in other microarray data. We call these “differentially expressed GO terms” and have named the algorithm “matrix-assisted identification method of differentially expressed GO terms” (MIMGO). MIMGO can also identify microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. However, MIMGO has not yet been validated on a real microarray dataset using all available GO terms. Findings We combined Gene Set Enrichment Analysis (GSEA) with MIMGO to identify differentially expressed GO terms in a yeast cell cycle microarray dataset. GSEA followed by MIMGO (GSEA + MIMGO) correctly identified (p < 0.05) microarray data in which genes annotated to differentially expressed GO terms are upregulated. We found that GSEA + MIMGO was slightly less effective than, or comparable to, GSEA (Pearson), a method that uses Pearson’s correlation as a metric, at detecting true differentially expressed GO terms. However, unlike other methods including GSEA (Pearson), GSEA + MIMGO can comprehensively identify the microarray data in which genes annotated with a differentially expressed GO term are upregulated or downregulated. Conclusions MIMGO is a reliable method to identify differentially expressed GO terms comprehensively. PMID:23232071
Klink, Vincent P.; Overall, Christopher C.; Alkharouf, Nadim W.; MacDonald, Margaret H.; Matthews, Benjamin F.
2010-01-01
Background. A comparative microarray investigation was done using detection call methodology (DCM) and differential expression analyses. The goal was to identify genes found in specific cell populations that were eliminated by differential expression analysis due to the nature of differential expression methods. Laser capture microdissection (LCM) was used to isolate nearly homogeneous populations of plant root cells. Results. The analyses identified the presence of 13,291 transcripts between the 4 different sample types. The transcripts filtered down into a total of 6,267 that were detected as being present in one or more sample types. A comparative analysis of DCM and differential expression methods showed a group of genes that were not differentially expressed, but were expressed at detectable amounts within specific cell types. Conclusion. The DCM has identified patterns of gene expression not shown by differential expression analyses. DCM has identified genes that are possibly cell-type specific and/or involved in important aspects of plant nematode interactions during the resistance response, revealing the uniqueness of a particular cell population at a particular point during its differentiation process. PMID:20508855
Analysis of gene expression levels in individual bacterial cells without image segmentation.
Kwak, In Hae; Son, Minjun; Hagen, Stephen J
2012-05-11
Studies of stochasticity in gene expression typically make use of fluorescent protein reporters, which permit the measurement of expression levels within individual cells by fluorescence microscopy. Analysis of such microscopy images is almost invariably based on a segmentation algorithm, where the image of a cell or cluster is analyzed mathematically to delineate individual cell boundaries. However segmentation can be ineffective for studying bacterial cells or clusters, especially at lower magnification, where outlines of individual cells are poorly resolved. Here we demonstrate an alternative method for analyzing such images without segmentation. The method employs a comparison between the pixel brightness in phase contrast vs fluorescence microscopy images. By fitting the correlation between phase contrast and fluorescence intensity to a physical model, we obtain well-defined estimates for the different levels of gene expression that are present in the cell or cluster. The method reveals the boundaries of the individual cells, even if the source images lack the resolution to show these boundaries clearly. Copyright © 2012 Elsevier Inc. All rights reserved.
Zhang, Qingyang
2018-05-16
Differential co-expression analysis, as a complement to differential expression analysis, offers significant insights into the changes in molecular mechanisms across different phenotypes. A prevailing approach to detecting differentially co-expressed genes is to compare Pearson's correlation coefficients in two phenotypes. However, due to the limitations of Pearson's correlation measure, this approach lacks the power to detect nonlinear changes in gene co-expression, which are common in gene regulatory networks. In this work, a new nonparametric procedure is proposed to search for differentially co-expressed gene pairs in different phenotypes from large-scale data. Our computational pipeline consists of two main steps, a screening step and a testing step. The screening step reduces the search space by filtering out all the independent gene pairs using the distance correlation measure. In the testing step, we compare the gene co-expression patterns in different phenotypes by a recently developed edge-count test. Both steps are distribution-free and target nonlinear relations. We illustrate the promise of the new approach by analyzing the Cancer Genome Atlas data and the METABRIC data for breast cancer subtypes. Compared with some existing methods, the new method is more powerful in detecting nonlinear types of differential co-expression. The distance correlation screening can greatly improve computational efficiency, facilitating its application to large data sets.
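Why distance correlation can screen for dependence that Pearson's coefficient misses is easiest to see on a toy gene pair. The sketch below implements the standard biased (V-statistic) sample distance correlation for one-dimensional data. The function names and the five-sample expression vectors are hypothetical, and the edge-count test used in the second step of the pipeline is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Biased sample distance correlation, a value in [0, 1]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar2_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar2_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0
    return sqrt(max(dcov2, 0.0) / sqrt(dvar2_x * dvar2_y))


if __name__ == "__main__":
    gene_a = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical expression values
    gene_b = [v * v for v in gene_a]       # purely nonlinear dependence
    print(pearson(gene_a, gene_b), distance_correlation(gene_a, gene_b))
```

For this quadratic relationship Pearson's coefficient is zero while the distance correlation is clearly positive, so a Pearson-based screen would discard the pair and a distance-correlation screen would keep it. The double loop costs O(n^2) per gene pair, which is acceptable for a sketch but would need a faster implementation at genome scale.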
Zhou, Wei; Song, Xiang-gang; Chen, Chao; Wang, Shu-mei; Liang, Sheng-wang
2015-08-01
The action mechanism and material basis of compound Danshen dripping pills in the treatment of carotid atherosclerosis are discussed in this paper on the basis of gene expression profiles and molecular fingerprints. First, gene expression profiles of atherosclerotic carotid artery tissues and histologically normal human tissues were collected and screened using significance analysis of microarrays (SAM) to identify differentially expressed genes. The differential genes were then analyzed by Gene Ontology (GO) analysis and KEGG pathway analysis. To avoid missing genes whose differential expression is not pronounced but which are biologically important, Gene Set Enrichment Analysis (GSEA) was performed, and 7 chemical ingredients with higher negative enrichment scores were obtained by the Cmap method, implying that they could reversely regulate the gene expression profiles of pathological tissues. Last, based on the hypothesis that similar structures have similar activities, 336 ingredients of compound Danshen dripping pills were compared with the 7 drug molecules using 2D molecular fingerprints. The results showed that 147 differential genes, including 60 up-regulated genes and 87 down-regulated genes, were screened out by SAM. In the GO analysis, Biological Process (BP) terms mainly concerned biological adhesion, response to wounding and inflammatory response; Cellular Component (CC) terms mainly concerned extracellular region, extracellular space and plasma membrane; and Molecular Function (MF) terms mainly concerned antigen binding, metalloendopeptidase activity and peptide binding. KEGG pathway analysis mainly pointed to the JAK-STAT, RIG-I-like receptor and PPAR signaling pathways. Ten compounds, such as hexadecane, had Tanimoto coefficients greater than 0.85, which implies that they may be the active ingredients (AIs) of compound Danshen dripping pills in the treatment of carotid atherosclerosis (CAs). 
The present method can be applied to the research on material base and molecular action mechanism of TCM.
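The fingerprint comparison behind the 0.85 cutoff reduces to a set-overlap ratio. The sketch below represents each 2D fingerprint as the set of its "on" bit positions. The molecule names and bit positions are made up for illustration, and only the 0.85 threshold comes from the abstract.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(candidates, references, threshold=0.85):
    """Return (candidate, reference, score) triples with score > threshold."""
    hits = []
    for cand_name, cand_bits in candidates.items():
        for ref_name, ref_bits in references.items():
            score = tanimoto(cand_bits, ref_bits)
            if score > threshold:
                hits.append((cand_name, ref_name, score))
    return hits


if __name__ == "__main__":
    # Hypothetical fingerprints: bit positions that are set.
    references = {"drug_1": set(range(0, 19))}          # bits 0..18
    candidates = {
        "ingredient_A": set(range(1, 20)),              # shares 18 of 20 bits
        "ingredient_B": {0, 1, 2, 40, 41, 42},          # mostly different
    }
    print(similar_pairs(candidates, references))
```

Here `ingredient_A` shares 18 bits out of a union of 20, giving a score of 0.9 and passing the cutoff, while `ingredient_B` does not.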
Choi, Youngmin; Lee, Hyung-Sik; Hur, Won-Joo; Sung, Ki-Han; Kim, Ki-Uk; Choi, Sun-Seob; Kim, Su-Jin; Kim, Dae-Cheol
2013-01-01
Purpose There are conflicting results surrounding the prognostic significance of epidermal growth factor receptor (EGFR) status in glioblastoma (GBM) patients. Accordingly, we attempted to assess the influence of EGFR expression on the survival of GBM patients receiving postoperative radiotherapy. Materials and Methods Thirty-three GBM patients who had received surgery and postoperative radiotherapy at our institute between March 1997 and February 2006 were included. The evaluation of EGFR expression with immunohistochemistry was available for 30 patients. Kaplan-Meier survival analysis and Cox regression were used for statistical analysis. Results EGFR was expressed in 23 patients (76.7%), and not expressed in seven (23.3%). Survival in EGFR-expressing GBM patients was significantly shorter than that in non-expressing patients (median survival: 12.5 versus 17.5 months, p=0.013). Patients who received more than 60 Gy showed improved survival over those who received up to 60 Gy (median survival: 17.0 versus 9.0 months, p=0.000). Negative EGFR expression and a higher radiation dose were significantly correlated with improved survival on multivariate analysis. Survival rates showed no differences according to age, sex, and surgical extent. Conclusion The expression of EGFR demonstrated a significantly deleterious effect on the survival of GBM patients. Therefore, approaches targeting EGFR should be considered in potential treatment methods for GBM patients, in addition to current management strategies. PMID:23225805
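Median survival figures of the kind quoted above are typically read off a Kaplan-Meier curve. A minimal product-limit sketch follows. The follow-up times and censoring flags are invented and do not reproduce the study's data, and the Cox regression step is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First event time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


if __name__ == "__main__":
    # Hypothetical follow-up in months; 0 marks a censored patient.
    months = [6, 7, 10, 15, 19, 25]
    died = [1, 0, 1, 1, 0, 1]
    curve = kaplan_meier(months, died)
    print(curve, median_survival(curve))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple fraction of patients still alive.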
Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W
2006-03-01
Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
COLLECTING URINE SAMPLES FROM YOUNG CHILDREN FOR PESTICIDE STUDIES
To estimate pesticide exposure for young children wearing diapers, a method for collecting urine samples for analysis of pesticide metabolites is needed. To find a practical method, two possibilities were investigated: (1) analysis of expressed urine from cotton diaper inserts ...
Vartanian, Kristina; Slottke, Rachel; Johnstone, Timothy; Casale, Amanda; Planck, Stephen R; Choi, Dongseok; Smith, Justine R; Rosenbaum, James T; Harrington, Christina A
2009-01-01
Background Peripheral blood is an accessible and informative source of transcriptomal information for many human disease and pharmacogenomic studies. While there can be significant advantages to analyzing RNA isolated from whole blood, particularly in clinical studies, the preparation of samples for microarray analysis is complicated by the need to minimize artifacts associated with highly abundant globin RNA transcripts. The impact of globin RNA transcripts on expression profiling data can potentially be reduced by using RNA preparation and labeling methods that remove or block globin RNA during the microarray assay. We compared four different methods for preparing microarray hybridization targets from human whole blood collected in PAXGene tubes. Three of the methods utilized the Affymetrix one-cycle cDNA synthesis/in vitro transcription protocol but varied treatment of input RNA as follows: i. no treatment; ii. treatment with GLOBINclear; or iii. treatment with globin PNA oligos. In the fourth method cDNA targets were prepared with the Ovation amplification and labeling system. Results We find that microarray targets generated with labeling methods that reduce globin mRNA levels or minimize the impact of globin transcripts during hybridization detect more transcripts in the microarray assay compared with the standard Affymetrix method. Comparison of microarray results with quantitative PCR analysis of a panel of genes from the NF-kappa B pathway shows good correlation of transcript measurements produced with all four target preparation methods, although method-specific differences in overall correlation were observed. The impact of freezing blood collected in PAXGene tubes on data reproducibility was also examined. Expression profiles show little or no difference when RNA is extracted from either fresh or frozen blood samples. 
Conclusion RNA preparation and labeling methods designed to reduce the impact of globin mRNA transcripts can significantly improve the sensitivity of the DNA microarray expression profiling assay for whole blood samples. While blockage of globin transcripts during first strand cDNA synthesis with globin PNAs resulted in the best overall performance in this study, we conclude that selection of a protocol for expression profiling studies in blood should depend on several factors, including implementation requirements of the method and study design. RNA isolated from either freshly collected or frozen blood samples stored in PAXGene tubes can be used without altering gene expression profiles. PMID:19123946
NASA Astrophysics Data System (ADS)
Starikova, M. K.; Bulanova, A. A.; Bukreeva, E. B.; Karapuzikov, A. A.; Karapuzikov, A. I.; Kistenev, Y. V.; Klementyev, V. M.; Kolker, D. B.; Kuzmin, D. A.; Nikiforova, O. Y.; Ponomarev, Yu. N.; Sherstov, I. V.; Boyko, A. A.
2013-11-01
Diagnostics of pulmonary diseases occupy a key position in medical practice. A large variety of high-technology methods are used today, but none of them can be used for early screening of pulmonary diseases. We discuss the capabilities of IR and terahertz laser spectroscopy methods for noninvasive express diagnostics of pulmonary diseases based on analysis of the absorption spectra of a patient's gas emissions, in particular exhaled air. Experience with approaches to experimental data analysis and with the hardware realization of gas analyzers for medical applications is also discussed.
Jiang, Zhiquan; Gui, Songbo; Zhang, Yazhuo
2010-09-01
Growth-hormone-secreting pituitary adenomas (GHomas) account for approximately 20% of all pituitary neoplasms. However, the pathogenesis of GHomas remains to be elucidated. To explore the possible pathogenesis of GHomas, we used bead-based fiber-optic arrays to examine the gene expression in five GHomas and compared them to three healthy pituitaries. Four differentially expressed genes were chosen randomly for validation by quantitative real-time reverse transcription-polymerase chain reaction. We then performed pathway analysis on the identified differentially expressed genes using the Kyoto Encyclopedia of Genes and Genomes. Array analysis showed significant increases in the expression of 353 genes and 206 expressed sequence tags (ESTs) and decreases in 565 genes and 29 ESTs. Bioinformatic analysis showed that the genes HIGD1B, HOXB2, ANGPT2, HPGD and BTG2 may play an important role in the tumorigenesis and progression of GHomas. Pathway analysis showed that the wingless-type signaling pathway and extracellular-matrix receptor interactions may play a key role in the tumorigenesis and progression of GHomas. Our data suggested that there are numerous aberrantly expressed genes and pathways involved in the pathogenesis of GHomas. Bead-based fiber-optic arrays combined with pathway analysis of differentially expressed genes appear to be a valid method for investigating the pathogenesis of tumors.
JIANG, ZHIQUAN; GUI, SONGBO; ZHANG, YAZHUO
2010-01-01
Growth-hormone-secreting pituitary adenomas (GHomas) account for approximately 20% of all pituitary neoplasms. However, the pathogenesis of GHomas remains to be elucidated. To explore the possible pathogenesis of GHomas, we used bead-based fiber-optic arrays to examine the gene expression in five GHomas and compared them to three healthy pituitaries. Four differentially expressed genes were chosen randomly for validation by quantitative real-time reverse transcription-polymerase chain reaction. We then performed pathway analysis on the identified differentially expressed genes using the Kyoto Encyclopedia of Genes and Genomes. Array analysis showed significant increases in the expression of 353 genes and 206 expressed sequence tags (ESTs) and decreases in 565 genes and 29 ESTs. Bioinformatic analysis showed that the genes HIGD1B, HOXB2, ANGPT2, HPGD and BTG2 may play an important role in the tumorigenesis and progression of GHomas. Pathway analysis showed that the wingless-type signaling pathway and extracellular-matrix receptor interactions may play a key role in the tumorigenesis and progression of GHomas. Our data suggested that there are numerous aberrantly expressed genes and pathways involved in the pathogenesis of GHomas. Bead-based fiber-optic arrays combined with pathway analysis of differentially expressed genes appear to be a valid method for investigating the pathogenesis of tumors. PMID:22993617
Wang, Hongyang; Owens, James D; Shih, Joanna H; Li, Ming-Chung; Bonner, Robert F; Mushinski, J Frederic
2006-04-27
Gene expression profiling by microarray analysis of cells enriched by laser capture microdissection (LCM) faces several technical challenges. Frozen sections yield higher quality RNA than paraffin-imbedded sections, but even with frozen sections, the staining methods used for histological identification of cells of interest could still damage the mRNA in the cells. To study the contribution of staining methods to degradation of results from gene expression profiling of LCM samples, we subjected pellets of the mouse plasma cell tumor cell line TEPC 1165 to direct RNA extraction and to parallel frozen sectioning for LCM and subsequent RNA extraction. We used microarray hybridization analysis to compare gene expression profiles of RNA from cell pellets with gene expression profiles of RNA from frozen sections that had been stained with hematoxylin and eosin (H&E), Nissl Stain (NS), and for immunofluorescence (IF) as well as with the plasma cell-revealing methyl green pyronin (MGP) stain. All RNAs were amplified with two rounds of T7-based in vitro transcription and analyzed by two-color expression analysis on 10-K cDNA microarrays. The MGP-stained samples showed the least introduction of mRNA loss, followed by H&E and immunofluorescence. Nissl staining was significantly more detrimental to gene expression profiles, presumably owing to an aqueous step in which RNA may have been damaged by endogenous or exogenous RNAases. RNA damage can occur during the staining steps preparatory to laser capture microdissection, with the consequence of loss of representation of certain genes in microarray hybridization analysis. Inclusion of RNAase inhibitor in aqueous staining solutions appears to be important in protecting RNA from loss of gene transcripts.
Wang, Hongyang; Owens, James D; Shih, Joanna H; Li, Ming-Chung; Bonner, Robert F; Mushinski, J Frederic
2006-01-01
Background: Gene expression profiling by microarray analysis of cells enriched by laser capture microdissection (LCM) faces several technical challenges. Frozen sections yield higher quality RNA than paraffin-imbedded sections, but even with frozen sections, the staining methods used for histological identification of cells of interest could still damage the mRNA in the cells. To study the contribution of staining methods to degradation of results from gene expression profiling of LCM samples, we subjected pellets of the mouse plasma cell tumor cell line TEPC 1165 to direct RNA extraction and to parallel frozen sectioning for LCM and subsequent RNA extraction. We used microarray hybridization analysis to compare gene expression profiles of RNA from cell pellets with gene expression profiles of RNA from frozen sections that had been stained with hematoxylin and eosin (H&E), Nissl Stain (NS), and for immunofluorescence (IF) as well as with the plasma cell-revealing methyl green pyronin (MGP) stain. All RNAs were amplified with two rounds of T7-based in vitro transcription and analyzed by two-color expression analysis on 10-K cDNA microarrays. Results: The MGP-stained samples showed the least introduction of mRNA loss, followed by H&E and immunofluorescence. Nissl staining was significantly more detrimental to gene expression profiles, presumably owing to an aqueous step in which RNA may have been damaged by endogenous or exogenous RNAases. Conclusion: RNA damage can occur during the staining steps preparatory to laser capture microdissection, with the consequence of loss of representation of certain genes in microarray hybridization analysis. Inclusion of RNAase inhibitor in aqueous staining solutions appears to be important in protecting RNA from loss of gene transcripts. PMID:16643667
An efficient method to identify differentially expressed genes in microarray experiments
Qin, Huaizhen; Feng, Tao; Harding, Scott A.; Tsai, Chung-Jui; Zhang, Shuanglin
2013-01-01
Motivation: Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. The fact that genes are normally expressed in functionally relevant patterns suggests that gene-expression data can be stratified and clustered into relatively homogenous groups. Cluster-wise dimensionality reduction should make it feasible to improve screening power while minimizing information loss. Results: We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method incorporates a novel stratification-based tight clustering algorithm, principal component analysis and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates; and two from public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives for identification of differentially expressed genes. Availability: The C++ code to implement the proposed method is available upon request for academic use. PMID:18453554
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.
Wagner, Florian
2015-01-01
Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
Ju, Jin Hyun; Crystal, Ronald G.
2017-01-01
Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. 
In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL. PMID:28505156
Ju, Jin Hyun; Shenoy, Sushila A; Crystal, Ronald G; Mezey, Jason G
2017-05-01
Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. 
In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.
Cohen, C D; Kretzler, M
2009-03-01
Histological analysis of kidney biopsies is an essential part of our current diagnostic workup of patients with renal disease. Besides the already established diagnostic tools, new methods allow extensive analysis of the sample tissue's gene expression. Using results from a European multicenter study on gene expression analysis of renal biopsies, in this review we demonstrate that this novel approach not only expands the scope of so-called basic research but also might supplement future biopsy diagnostics. The goals are improved diagnosis and more specific therapy choice and prognosis estimates.
Network-Induced Classification Kernels for Gene Expression Profile Analysis
Dror, Gideon; Shamir, Ron
2012-01-01
Computational classification of gene expression profiles into distinct disease phenotypes has been highly successful to date. Still, robustness, accuracy, and biological interpretation of the results have been limited, and it was suggested that use of protein interaction information jointly with the expression profiles can improve the results. Here, we study three aspects of this problem. First, we show that interactions are indeed relevant by showing that co-expressed genes tend to be closer in the network of interactions. Second, we show that the improved performance of one extant method utilizing expression and interactions is not really due to the biological information in the network, while in another method this is not the case. Finally, we develop a new kernel method, called NICK, that integrates network and expression data for SVM classification, and demonstrate that overall it achieves better results than extant methods while running two orders of magnitude faster. PMID:22697242
Application of laser-capture microdissection to analysis of gene expression in the testis.
Sluka, Pavel; O'Donnell, Liza; McLachlan, Robert I; Stanton, Peter G
2008-01-01
The isolation and molecular analysis of highly purified cell populations from complex, heterogeneous tissues has been a challenge for many years. Spermatogenesis in the testis is a particularly difficult process to study given the unique multiple cellular associations within the seminiferous epithelium, making the isolation of specific cell types difficult. Laser-capture microdissection (LCM) is a recently developed technique that enables the isolation of individual cell populations from complex tissues. This technology has enhanced our ability to directly examine gene expression in enriched testicular cell populations by routine methods of gene expression analysis, such as real-time RT-PCR, differential display, and gene microarrays. The application of LCM has however introduced methodological hurdles that have not been encountered with more conventional molecular analyses of whole tissue. In particular, tissue handling (i.e. fixation, storage, and staining), consumables (e.g. slide choice), staining reagents (conventional H&E vs. fluorescence), extraction methods, and downstream applications have all required re-optimisation to facilitate differential gene expression analysis using the small amounts of material obtained using LCM. This review will discuss three critical issues that are essential for successful procurement of cells from testicular tissue sections; tissue morphology, capture success, and maintenance of molecular integrity. The importance of these issues will be discussed with specific reference to the two most commonly used LCM systems; the Arcturus PixCell IIe and PALM systems. The rat testis will be used as a model, and emphasis will be placed on issues of tissue handling, processing, and staining methods, including the application of fluorescence techniques to assist in the identification of cells of interest for the purposes of mRNA expression analysis.
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. 
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
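The resampling idea in the abstract above (an empirical p-value for a classifier's error rate on one gene set) can be illustrated with a small sketch. This is not the authors' implementation: a leave-one-out nearest-centroid classifier stands in for the Random Forest and its out-of-bag error so that the example runs with the standard library alone, and the function names, the toy data and the choice of 200 label permutations are all assumptions made for illustration.

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_error(samples, labels):
    """Leave-one-out error of a nearest-centroid classifier.

    Stand-in (assumption) for the out-of-bag error of a Random Forest.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        by_class = {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j != i:  # hold out sample i
                by_class.setdefault(yj, []).append(xj)
        predicted = min(by_class, key=lambda c: sq_dist(x, centroid(by_class[c])))
        errors += predicted != y
    return errors / len(samples)

def empirical_p_value(samples, labels, n_perm=200, seed=0):
    """Share of label permutations whose error is at most the observed error."""
    rng = random.Random(seed)
    observed = loo_error(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(samples, shuffled) <= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical gene set of two genes measured in 10 + 10 patients.
toy_samples = [[0.1 * i, 0.2 + 0.05 * i] for i in range(10)] + \
              [[3.0 + 0.1 * i, 2.5 + 0.05 * i] for i in range(10)]
toy_labels = [0] * 10 + [1] * 10

if __name__ == "__main__":
    error, p = empirical_p_value(toy_samples, toy_labels)
    print(f"observed error = {error:.2f}, empirical p = {p:.4f}")
```

A small p-value here means that few random relabellings of the patients classify as well as the real labels do, which is the sense in which a gene set would be called differentially expressed under this scheme.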
Hauser, Péter; Hanzély, Zoltán; Jakab, Zsuzsanna; Oláh, Lászlóné; Szabó, Erika; Jeney, András; Schuler, Dezso; Fekete, György; Bognár, László; Garami, Miklós
2006-07-01
Expression of heat shock proteins (HSPs) is of prognostic significance in several tumor types. HSP expression levels were determined in medulloblastomas, and their association with prognostic parameters was tested. Expression of antiapoptotic HSP 27, HSP 70, and HSP 90 was investigated by immunohistochemistry on paraffin-embedded sections from 65 patients. Expression of HSPs was validated on internal vascular controls and by Western blotting analysis. Sample evaluation was based on the estimated percentage of HSP-positive tumor cells. The Kaplan-Meier method was used for survival analysis; the chi-square test, univariate analysis, and the log-rank test were applied for statistical analysis. Expression of HSPs varied in medulloblastomas. On the basis of the average expression rate of HSPs, 2 groups were created using a 10% cutoff for HSP 27 and HSP 90 and a 70% cutoff for HSP 70. The amount of expression of any of the HSP types was not significantly associated with known prognostic factors (age of patient, extent of resection, presence of metastasis) or histologic subtype. After an average follow-up period of 4.30 years, no significant difference was observed in survival depending on the expression of HSP 27, HSP 70, or HSP 90. The high expression of HSPs indicates that these proteins are potential therapeutic targets.
Hashimoto, Masakazu; Bogdanovic, Nenad; Nakagawa, Hiroyuki; Volkmann, Inga; Aoki, Mikio; Winblad, Bengt; Sakai, Jun; Tjernberg, Lars O
2012-01-01
It is evident that the symptoms of Alzheimer's disease (AD) are derived from severe neuronal damage, and especially pyramidal neurons in the hippocampus are affected pathologically. Here, we analysed the proteome of hippocampal neurons, isolated from post-mortem brains by laser capture microdissection. By using 18O labelling and mass spectrometry, the relative expression levels of 150 proteins in AD and controls were estimated. Many of the identified proteins are involved in transcription and nucleotide binding, glycolysis, heat-shock response, microtubule stabilization, axonal transport or inflammation. The proteins showing the most altered expression in AD were selected for immunohistochemical analysis. These analyses confirmed the altered expression levels, and showed in many AD cases a pathological pattern. For comparison, we also analysed hippocampal sections by Western blot. The expression levels found by this method showed poor correlation with the neuron-specific analysis. Hence, we conclude that cell-specific proteome analysis reveals differences in the proteome that cannot be detected by bulk analysis. PMID:21883897
Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong
2013-10-18
Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. 
The proposed method is general and can potentially be used to identify genes which have the same patterns or biological processes, and to help address the present and forthcoming challenges of data analysis in functional genomics.
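The first two elements listed above (representing a profile in a Fourier basis, then screening on the coefficients) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: it computes plain discrete Fourier coefficients on equally spaced time points and screens with a fixed, hypothetical `threshold` on the summed squared coefficients, whereas the abstract describes screening that accounts for autocorrelation and controls the FDR. All function and gene names are invented for the example.

```python
import math

def fourier_coefficients(profile, n_harmonics):
    """Cosine and sine coefficients (a_k, b_k) for harmonics k = 1..n_harmonics.

    Assumes equally spaced time points covering one full period.
    """
    n = len(profile)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(y * math.cos(2 * math.pi * k * t / n)
                            for t, y in enumerate(profile))
        b_k = 2.0 / n * sum(y * math.sin(2 * math.pi * k * t / n)
                            for t, y in enumerate(profile))
        coeffs.append((a_k, b_k))
    return coeffs

def oscillation_energy(profile, n_harmonics=3):
    """Sum of squared non-constant coefficients; zero for a flat profile."""
    return sum(a * a + b * b for a, b in fourier_coefficients(profile, n_harmonics))

def screen(profiles, threshold):
    """Keep genes whose oscillation energy exceeds a fixed cut-off (assumption)."""
    return [gene for gene, profile in profiles.items()
            if oscillation_energy(profile) > threshold]

n_points = 16
cycling = [math.sin(2 * math.pi * t / n_points) for t in range(n_points)]
flat = [1.5] * n_points

if __name__ == "__main__":
    print(screen({"g_cycling": cycling, "g_flat": flat}, threshold=0.5))
```

The constant term is left out of the energy on purpose, so that a gene expressed at a high but unchanging level is not mistaken for a time-varying one.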
Application of meta-analysis methods for identifying proteomic expression level differences.
Amess, Bob; Kluge, Wolfgang; Schwarz, Emanuel; Haenisch, Frieder; Alsaif, Murtada; Yolken, Robert H; Leweke, F Markus; Guest, Paul C; Bahn, Sabine
2013-07-01
We present new statistical approaches for identification of proteins with expression levels that are significantly changed when applying meta-analysis to two or more independent experiments. We showed that the Euclidean distance measure has reduced risk of false positives compared to the rank product method. Our Ψ-ranking method has advantages over the traditional fold-change approach by incorporating both the fold-change direction as well as the p-value. In addition, the second novel method, Π-ranking, considers the ratio of the fold-change and thus integrates all three parameters. We further improved the latter by introducing our third technique, Σ-ranking, which combines all three parameters in a balanced nonparametric approach. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Zhao, W; Busto, R; Truettner, J; Ginsberg, M D
2001-07-30
The analysis of pixel-based relationships between local cerebral blood flow (LCBF) and mRNA expression can reveal important insights into brain function. Traditionally, LCBF and in situ hybridization studies for genes of interest have been analyzed in separate series. To overcome this limitation and to increase the power of statistical analysis, this study focused on developing a double-label method to measure local cerebral blood flow (LCBF) and gene expressions simultaneously by means of a dual-autoradiography procedure. A 14C-iodoantipyrine autoradiographic LCBF study was first performed. Serial brain sections (12 in this study) were obtained at multiple coronal levels and were processed in the conventional manner to yield quantitative LCBF images. Two replicate sections at each bregma level were then used for in situ hybridization. To eliminate the 14C-iodoantipyrine from these sections, a chloroform-washout procedure was first performed. The sections were then processed for in situ hybridization autoradiography for the probes of interest. This method was tested in Wistar rats subjected to 12 min of global forebrain ischemia by two-vessel occlusion plus hypotension, followed by 2 or 6 h of reperfusion (n=4-6 per group). LCBF and in situ hybridization images for heat shock protein 70 (HSP70) were generated for each rat, aligned by disparity analysis, and analyzed on a pixel-by-pixel basis. This method yielded detailed inter-modality correlation between LCBF and HSP70 mRNA expressions. The advantages of this method include reducing the number of experimental animals by one-half; and providing accurate pixel-based correlations between different modalities in the same animals, thus enabling paired statistical analyses. This method can be extended to permit correlation of LCBF with the expression of multiple genes of interest.
Psychometric challenges and proposed solutions when scoring facial emotion expression codes.
Olderbak, Sally; Hildebrandt, Andrea; Pinkpank, Thomas; Sommer, Werner; Wilhelm, Oliver
2014-12-01
Coding of facial emotion expressions is increasingly performed by automated emotion expression scoring software; however, there is limited discussion on how best to score the resulting codes. We present a discussion of facial emotion expression theories and a review of contemporary emotion expression coding methodology. We highlight methodological challenges pertinent to scoring software-coded facial emotion expression codes and present important psychometric research questions centered on comparing competing scoring procedures of these codes. Then, on the basis of a time series data set collected to assess individual differences in facial emotion expression ability, we derive, apply, and evaluate several statistical procedures, including four scoring methods and four data treatments, to score software-coded emotion expression data. These scoring procedures are illustrated to inform analysis decisions pertaining to the scoring and data treatment of other emotion expression questions and under different experimental circumstances. Overall, we found applying loess smoothing and controlling for baseline facial emotion expression and facial plasticity are recommended methods of data treatment. When scoring facial emotion expression ability, maximum score is preferred. Finally, we discuss the scoring methods and data treatments in the larger context of emotion expression research.
High-Throughput RT-PCR for small-molecule screening assays
Bittker, Joshua A.
2012-01-01
Quantitative measurement of the levels of mRNA expression using real-time reverse transcription polymerase chain reaction (RT-PCR) has long been used for analyzing expression differences in tissue or cell lines of interest. This method has been used somewhat less frequently to measure the changes in gene expression due to perturbagens such as small molecules or siRNA. The availability of new instrumentation for liquid handling and real-time PCR analysis as well as the commercial availability of start-to-finish kits for RT-PCR has enabled the use of this method for high-throughput small-molecule screening on a scale comparable to traditional high-throughput screening (HTS) assays. This protocol focuses on the special considerations necessary for using quantitative RT-PCR as a primary small-molecule screening assay, including the different methods available for mRNA isolation and analysis. PMID:23487248
Ozerov, Ivan V; Lezhnina, Ksenia V; Izumchenko, Evgeny; Artemov, Artem V; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N; Labat, Ivan; West, Michael D; Buzdin, Anton; Cantor, Charles R; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex
2016-11-16
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
Ozerov, Ivan V.; Lezhnina, Ksenia V.; Izumchenko, Evgeny; Artemov, Artem V.; Medintsev, Sergey; Vanhaelen, Quentin; Aliper, Alexander; Vijg, Jan; Osipov, Andreyan N.; Labat, Ivan; West, Michael D.; Buzdin, Anton; Cantor, Charles R.; Nikolsky, Yuri; Borisov, Nikolay; Irincheeva, Irina; Khokhlovich, Edward; Sidransky, David; Camargo, Miguel Luiz; Zhavoronkov, Alex
2016-01-01
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy. PMID:27848968
Normal uniform mixture differential gene expression detection for cDNA microarrays
Dean, Nema; Raftery, Adrian E
2005-01-01
Background: One of the primary tasks in analysing gene expression data is finding genes that are differentially expressed in different samples. Multiple testing issues due to the thousands of tests run make some of the more popular methods for doing this problematic. Results: We propose a simple method, Normal Uniform Differential Gene Expression (NUDGE) detection for finding differentially expressed genes in cDNA microarrays. The method uses a simple univariate normal-uniform mixture model, in combination with new normalization methods for spread as well as mean that extend the lowess normalization of Dudoit, Yang, Callow and Speed (2002) [1]. It takes account of multiple testing, and gives probabilities of differential expression as part of its output. It can be applied to either single-slide or replicated experiments, and it is very fast. Three datasets are analyzed using NUDGE, and the results are compared to those given by other popular methods: unadjusted and Bonferroni-adjusted t tests, Significance Analysis of Microarrays (SAM), and Empirical Bayes for microarrays (EBarrays) with both Gamma-Gamma and Lognormal-Normal models. Conclusion: The method gives a high probability of differential expression to genes known/suspected a priori to be differentially expressed and a low probability to the others. In terms of known false positives and false negatives, the method outperforms all multiple-replicate methods except for the Gamma-Gamma EBarrays method to which it offers comparable results with the added advantages of greater simplicity, speed, fewer assumptions and applicability to the single replicate case. An R package called nudge to implement the methods in this paper will be made available soon at . PMID:16011807
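A univariate normal-uniform mixture of the kind named in the abstract above can be fitted with a few lines of expectation-maximisation. The sketch below is an assumption-laden illustration, not the nudge package: it takes already normalised log ratios, models unchanged genes as a normal component and changed genes as a uniform component spanning the observed range, and skips the normalisation steps described in the abstract. The function names, the starting weight of 0.9 and the toy data are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def probability_differential(values, n_iter=50, start_weight=0.9):
    """Posterior probability that each value comes from the uniform component.

    EM for the mixture  w * Normal(mu, sigma) + (1 - w) * Uniform(min, max).
    The uniform range is fixed to the observed range (an assumption).
    """
    lo, hi = min(values), max(values)
    uniform_density = 1.0 / (hi - lo)
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    weight = start_weight

    def responsibilities():
        # E-step: probability that each value belongs to the normal component.
        resp = []
        for v in values:
            null_part = weight * normal_pdf(v, mu, sigma)
            alt_part = (1.0 - weight) * uniform_density
            resp.append(null_part / (null_part + alt_part))
        return resp

    for _ in range(n_iter):
        resp = responsibilities()
        # M-step: re-estimate mixing weight, mean and spread of the normal part.
        total = sum(resp)
        weight = total / n
        mu = sum(r * v for r, v in zip(resp, values)) / total
        var = sum(r * (v - mu) ** 2 for r, v in zip(resp, values)) / total
        sigma = max(math.sqrt(var), 1e-6)  # guard against a collapsing variance

    return [1.0 - r for r in responsibilities()]

# Hypothetical log ratios: 41 unchanged genes near zero plus two strong outliers.
log_ratios = [-0.3 + 0.015 * i for i in range(41)] + [3.0, -2.8]

if __name__ == "__main__":
    probs = probability_differential(log_ratios)
    for value, prob in zip(log_ratios[-3:], probs[-3:]):
        print(f"log ratio {value:+.2f} -> P(differential) = {prob:.3f}")
```

Because the output is a probability per gene rather than a p-value, genes can be ranked or thresholded directly, which matches the abstract's description of the method's output.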
Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.
Yip, Shun H; Sham, Pak Chung; Wang, Junwen
2018-02-21
Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
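To make the notion of a highly variable gene concrete, the sketch below ranks genes by the squared coefficient of variation of their expression across cells. This is a deliberately simple stand-in and is not the algorithm of any of the tools named in the abstract, which typically model the dependence of variance on mean expression. The function names and the toy data layout (a dict mapping gene name to per-cell values) are assumptions.

```python
from statistics import mean, pvariance


def squared_cv(values):
    """Squared coefficient of variation (variance / mean^2) across cells."""
    m = mean(values)
    if m == 0:
        return 0.0  # a gene with no expression carries no variability signal
    return pvariance(values) / (m * m)


def rank_variable_genes(expression, top_n=2):
    """Return the top_n gene names ordered by decreasing squared CV.

    expression: dict mapping gene name -> list of per-cell expression values
    (hypothetical layout chosen for this sketch).
    """
    scores = {gene: squared_cv(vals) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A real analysis would also need to account for technical noise and sample size, which the abstract identifies as a key factor for reproducibility.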
Scavuzzo-Duggan, Tess R.; Chaves, Arielle M.; Roberts, Alison W.
2015-07-14
Here, a method for rapid in vivo functional analysis of engineered proteins was developed using Physcomitrella patens. A complementation assay was designed for testing structure/function relationships in cellulose synthase (CESA) proteins. The components of the assay include (1) construction of test vectors that drive expression of epitope-tagged PpCESA5 carrying engineered mutations, (2) transformation of a ppcesa5 knockout line that fails to produce gametophores with test and control vectors, (3) scoring the stable transformants for gametophore production, (4) statistical analysis comparing complementation rates for test vectors to positive and negative control vectors, and (5) analysis of transgenic protein expression by Western blotting. The assay distinguished mutations that generate fully functional, nonfunctional, and partially functional proteins. In conclusion, compared with existing methods for in vivo testing of protein function, this complementation assay provides a rapid method for investigating protein structure/function relationships in plants.
Jia, Zhilong; Liu, Ying; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang; Barnes, Michael R
2016-05-27
Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the "black box" nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena), for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway-driven disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome as an exemplar, and recover two widely used psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or https://github.com/zhilongjia/cogena .
In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and repositioning, allowing the grouping and prioritisation of drug repositioning candidates on the basis of putative mode of action.
ERIC Educational Resources Information Center
Vives, Robert
1983-01-01
Based on a literature review and analysis of teaching methods and objectives, it is proposed that the emphasis on communicative competence ascendant in French foreign language instruction is closely related to, and borrows from, expressive techniques taught in French native language instruction in the 1960s. (MSE)
Systemic bioinformatics analysis of skeletal muscle gene expression profiles of sepsis
Yang, Fang; Wang, Yumei
2018-01-01
Sepsis is a type of systemic inflammatory response syndrome with high morbidity and mortality. Skeletal muscle dysfunction is one of the major complications of sepsis that may also influence the outcome of sepsis. The aim of the present study was to explore and identify potential mechanisms and therapeutic targets of sepsis. Systemic bioinformatics analysis of skeletal muscle gene expression profiles from the Gene Expression Omnibus was performed. Differentially expressed genes (DEGs) in samples from patients with sepsis and control samples were screened out using the limma package. Differential co-expression and coregulation (DCE and DCR, respectively) analysis was performed based on the Differential Co-expression Analysis package to identify differences in gene co-expression and coregulation patterns between the control and sepsis groups. Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways of DEGs were identified using the Database for Annotation, Visualization and Integrated Discovery, and inflammatory, cancer and skeletal muscle development-associated biological processes and pathways were identified. DCE and DCR analysis revealed several potential therapeutic targets for sepsis, including genes and transcription factors. The results of the present study may provide a basis for the development of novel therapeutic targets and treatment methods for sepsis. PMID:29805480
Shi, Xingjie; Zhao, Qing; Huang, Jian; Xie, Yang; Ma, Shuangge
2015-01-01
Motivation: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. Results: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. Availability and implementation: R code is available at http://works.bepress.com/shuangge/49/ Contact: shuangge.ma@yale.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342102
Computerised analysis of facial emotion expression in eating disorders
2017-01-01
Background Problems with social-emotional processing are known to be an important contributor to the development and maintenance of eating disorders (EDs). Diminished facial communication of emotion has been frequently reported in individuals with anorexia nervosa (AN). Less is known about facial expressivity in bulimia nervosa (BN) and in people who have recovered from AN (RecAN). This study aimed to pilot the use of computerised facial expression analysis software to investigate emotion expression across the ED spectrum and recovery in a large sample of participants. Method 297 participants with AN, BN, RecAN, and healthy controls were recruited. Participants watched film clips designed to elicit happy or sad emotions, and facial expressions were then analysed using FaceReader. Results The findings mirrored those from previous work showing that healthy control and RecAN participants expressed significantly more positive emotions during the positive clip compared to the AN group. There were no differences in emotion expression during the sad film clip. Discussion These findings support the use of computerised methods to analyse emotion expression in EDs. The findings also demonstrate that reduced positive emotion expression is likely to be associated with the acute stage of AN illness, with individuals with BN showing an intermediate profile. PMID:28575109
Rapin, Nicolas; Bagger, Frederik Otzen; Jendholm, Johan; Mora-Jensen, Helena; Krogh, Anders; Kohlmann, Alexander; Thiede, Christian; Borregaard, Niels; Bullinger, Lars; Winther, Ole; Theilgaard-Mönch, Kim; Porse, Bo T
2014-02-06
Gene expression profiling has been used extensively to characterize cancer, identify novel subtypes, and improve patient stratification. However, it has largely failed to identify transcriptional programs that differ between cancer and corresponding normal cells and has not been efficient in identifying expression changes fundamental to disease etiology. Here we present a method that facilitates the comparison of any cancer sample to its nearest normal cellular counterpart, using acute myeloid leukemia (AML) as a model. We first generated a gene expression-based landscape of the normal hematopoietic hierarchy, using expression profiles from normal stem/progenitor cells, and next mapped the AML patient samples to this landscape. This allowed us to identify the closest normal counterpart of individual AML samples and determine gene expression changes between cancer and normal. We find the cancer vs normal method (CvN method) to be superior to conventional methods in stratifying AML patients with aberrant karyotype and in identifying common aberrant transcriptional programs with potential importance for AML etiology. Moreover, the CvN method uncovered a novel poor-outcome subtype of normal-karyotype AML, which allowed for the generation of a highly prognostic survival signature. Collectively, our CvN method holds great potential as a tool for the analysis of gene expression profiles of cancer patients.
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis, and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data are more informative for biological classification tasks than gene-level expression data. Our large-scale assessment utilizes multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods.
The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Alahakoon, Thushari I; Zhang, Weiyi; Arbuckle, Susan; Zhang, Kewei; Lee, Vincent
2018-05-01
To localize, quantify and compare angiogenic factors, vascular endothelial growth factor (VEGF), placental growth factor (PlGF), as well as their receptors fms-like tyrosine kinase receptor (Flt-1) and kinase insert domain receptor (KDR) in the placentas of normal pregnancy and complications of preeclampsia (PE), intrauterine fetal growth restriction (IUGR) and PE + IUGR. In a prospective cross-sectional case-control study, 30 pregnant women between 24-40 weeks of gestation, were recruited into four clinical groups. Representative placental samples were stained for VEGF, PlGF, Flt-1 and KDR. Analysis was performed using semiquantitative methods and digital image analysis. The overall VEGF and Flt-1 were strongly expressed and did not show any conclusive difference in the expression between study groups. PlGF and KDR were significantly reduced in expression in the placentas from pregnancies complicated by IUGR compared with normal and preeclamptic pregnancies. The lack of PlGF and KDR may be a cause for the development of IUGR and may explain the loss of vasculature and villous architecture in IUGR. Automated digital image analysis software is a viable alternative method to the manual reading of placental immunohistochemical staining. © 2018 Japan Society of Obstetrics and Gynecology.
Adhikari, Kiran; Otaki, Joji M
2016-02-01
It is often desirable but difficult to retrieve information on the mature phenotype of an immature tissue sample that has been subjected to gene expression analysis. This problem cannot be ignored when individual variation within a species is large. To circumvent this problem in the butterfly wing system, we developed a new surgical method for removing a single forewing from a pupa using Junonia orithya; the operated pupa was left to develop to an adult without eclosion. The removed right forewing was subjected to gene expression analysis, whereas the non-removed left forewing was examined for color patterns. As a test case, we focused on Distal-less (Dll), which likely plays an active role in inducing elemental patterns, including eyespots. The Dll expression level in forewings was paired with eyespot size data from the same individual. One third of the operated pupae survived and developed wing color patterns. Dll expression levels were significantly higher in males than in females, although male eyespots were smaller in size than female eyespots. Eyespot size data showed weak but significant correlations with the Dll expression level in females. These results demonstrate that a single-wing removal method was successfully applied to the butterfly wing system and suggest the weak and non-exclusive contribution of Dll to eyespot size determination in this butterfly. Our novel methodology for establishing correspondence between gene expression and phenotype can be applied to other candidate genes for color pattern development in butterflies. Conceptually similar methods may also be applicable in other developmental systems.
Chai, Xiaoqiang; Han, Yanan; Yang, Jian; Zhao, Xianxian; Liu, Yewang; Hou, Xugang; Tang, Yiheng; Zhao, Shirong; Li, Xiao
2016-02-01
The molecular pathogenesis of hepatitis B virus infection in humans is extremely complex and heterogeneous. To date, the molecular details have not been clearly defined despite intensive research efforts. Thus, studies of transcription and regulation during virus infection, or studies that combine research approaches already known to be beneficial, are needed. To identify transcriptional regulators related to hepatitis B virus infection at the gene level, we analyzed gene expression profiles from normal individuals and hepatitis B patients. First, the differentially expressed genes were selected. Several of these genes were validated in an independent set by qRT-PCR. Differential co-expression analysis was then conducted to identify differentially co-expressed links and differentially co-expressed genes. Next, the regulatory impact factors were analyzed by mapping the links to regulatory data. To gain further insight into these regulators, the co-expression gene modules were identified using a threshold-based hierarchical clustering method. The regulatory network was constructed using computer software. A total of 137,284 differentially co-expressed links and 780 differentially co-expressed genes were identified. These co-expressed genes were significantly enriched in the inflammatory response. The regulatory impact factor results revealed several crucial regulators related to hepatocellular carcinoma and other high-rank regulators. Meanwhile, more than one hundred co-expression gene modules were identified using the clustering method. In our study, some important transcriptional regulators were identified using a computational method, which may enhance the understanding of disease mechanisms and lead to improved treatment of hepatitis B. However, further experimental studies are required to confirm these findings.
Copyright © 2015 Elsevier Masson SAS. All rights reserved.
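The idea of a differentially co-expressed link, used in the abstract above and in several other records in this list, can be sketched as follows. The example computes the Pearson correlation of every gene pair separately in control and disease samples and reports pairs whose correlation shifts by at least a chosen amount. The abstract does not state the actual criterion or software used, so the threshold rule, the function names and the data layout here are assumptions made for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def differential_links(control, disease, min_shift=1.0):
    """List gene pairs whose correlation differs between the two conditions.

    control, disease: dicts mapping gene name -> list of expression values
    (same genes in both; hypothetical layout). A pair is reported when
    |r_control - r_disease| >= min_shift, an arbitrary illustrative rule.
    """
    links = []
    for g1, g2 in combinations(sorted(control), 2):
        r_control = pearson(control[g1], control[g2])
        r_disease = pearson(disease[g1], disease[g2])
        if abs(r_control - r_disease) >= min_shift:
            links.append((g1, g2, r_control, r_disease))
    return links
```

Genes that take part in many such links would be candidates for the "differentially co-expressed genes" the abstract refers to. A full analysis would also need a significance assessment, which this sketch leaves out.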
2012-01-01
Background Many women are unable to practice exclusive breastfeeding because they are separated from their infants while working. Expressing their breast milk helps them to continue breastfeeding. This study explores the perception and experiences related to the feasibility, acceptability and safety of breast milk expression among formally employed women in Kelantan, Malaysia. Methods A qualitative method using in-depth interviews was conducted from December 2008 to December 2009 among Malay women from urban and rural areas. A snowball sampling method was used to recruit the informants, and the interviews, which were facilitated by an interview guide, were audio-recorded and transcribed verbatim. Thematic analysis was conducted, with construction of codes and themes from each interview. Results Analysis of the interviews with 20 informants identified three themes related to breast milk expression. The themes were as follows: (i) lack of feasibility of expressing breast milk, (ii) negative feelings about expressing breast milk, and (iii) doubts about the safety and hygiene of expressed breast milk. The informants who did not practice exclusive breastfeeding believed that expressing their breast milk was not feasible, commonly because they felt there were not enough facilities for them. They also had negative feelings such as embarrassment. The safety and hygiene of the expressed breast milk was also their main concern. Conclusion More practical and focused education, as well as provision of facilities, is needed for women to effectively and safely express and store their breast milk. The issue of inadequate milk production should be emphasized, especially by encouraging them to express their breast milk as a way to improve milk production. PMID:22929649
Karsten, Stanislav L.; Van Deerlin, Vivianna M. D.; Sabatti, Chiara; Gill, Lisa H.; Geschwind, Daniel H.
2002-01-01
Archival formalin-fixed, paraffin-embedded and ethanol-fixed tissues represent a potentially invaluable resource for gene expression analysis, as they are the most widely available material for studies of human disease. Little data are available evaluating whether RNA obtained from fixed (archival) tissues could produce reliable and reproducible microarray expression data. Here we compare the use of RNA isolated from human archival tissues fixed in ethanol and formalin to frozen tissue in cDNA microarray experiments. Since an additional factor that can limit the utility of archival tissue is the often small quantities available, we also evaluate the use of the tyramide signal amplification method (TSA), which allows the use of small amounts of RNA. Detailed analysis indicates that TSA provides a consistent and reproducible signal amplification method for cDNA microarray analysis, across both arrays and the genes tested. Analysis of this method also highlights the importance of performing non-linear channel normalization and dye switching. Furthermore, archived, fixed specimens can perform well, but not surprisingly, produce more variable results than frozen tissues. Consistent results are more easily obtainable using ethanol-fixed tissues, whereas formalin-fixed tissue does not typically provide a useful substrate for cDNA synthesis and labeling. PMID:11788730
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
Validating internal controls for quantitative plant gene expression studies.
Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H
2004-08-18
Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments.
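A regression-based stability summary of the kind alluded to in the abstract above can be sketched as follows. The sketch assumes a design loosely modelled on stability analysis in plant breeding: each candidate gene's expression is regressed on a per-condition index, taken here as the mean over all candidate genes, and the slope and residual mean square are reported. The abstract does not give the exact formulas, so the index definition, the function name and the data layout are all assumptions, and how the two numbers should be interpreted is left open.

```python
def stability_summary(expression):
    """Regress each gene on the per-condition mean of all genes.

    expression: dict mapping gene name -> list of expression values, one per
    experimental condition (hypothetical layout; needs >= 3 conditions and
    a condition index that is not constant).
    Returns a dict mapping gene name -> (slope, residual_mean_square).
    """
    genes = list(expression)
    n = len(expression[genes[0]])
    # Condition index: average expression of all candidate genes.
    index = [sum(expression[g][i] for g in genes) / len(genes) for i in range(n)]
    mean_index = sum(index) / n
    sxx = sum((v - mean_index) ** 2 for v in index)
    summary = {}
    for g in genes:
        y = expression[g]
        mean_y = sum(y) / n
        sxy = sum((a - mean_index) * (b - mean_y) for a, b in zip(index, y))
        slope = sxy / sxx
        # Residuals from the fitted line through (mean_index, mean_y).
        residuals = [b - (mean_y + slope * (a - mean_index)) for a, b in zip(index, y)]
        rms = sum(r * r for r in residuals) / (n - 2)
        summary[g] = (slope, rms)
    return summary
```

Because the index is the mean of the same genes, the slopes always average to 1 in this sketch, so a gene's slope describes its response relative to the group rather than in absolute terms.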
Kayano, Mitsunori; Matsui, Hidetoshi; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2016-04-01
High-throughput time course expression profiles have been available in the last decade due to developments in measurement techniques and devices. Functional data analysis, which treats smoothed curves instead of originally observed discrete data, is effective for the time course expression profiles in terms of dimension reduction, robustness, and applicability to data measured at small and irregularly spaced time points. However, the statistical method of differential analysis for time course expression profiles has not been well established. We propose a functional logistic model based on elastic net regularization (F-Logistic) in order to identify the genes with dynamic alterations in case/control study. We employ a mixed model as a smoothing method to obtain functional data; then F-Logistic is applied to time course profiles measured at small and irregularly spaced time points. We evaluate the performance of F-Logistic in comparison with another functional data approach, i.e. functional ANOVA test (F-ANOVA), by applying the methods to real and synthetic time course data sets. The real data sets consist of the time course gene expression profiles for long-term effects of recombinant interferon β on disease progression in multiple sclerosis. F-Logistic distinguishes dynamic alterations, which cannot be found by competitive approaches such as F-ANOVA, in case/control study based on time course expression profiles. F-Logistic is effective for time-dependent biomarker detection, diagnosis, and therapy. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Liao, Zhijun; Wang, Xinrui; Zeng, Yeting; Zou, Quan
2016-12-01
The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on the amino acid sequences, remains unknown. Here, we describe a computational method to segregate DEPDCs and non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequences for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental verification methods of human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDCs superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domain was common in human DEPDCs, human HCC and the adjacent tissues that widely expressed DEPDCs. However, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
Koponen, Jonna K; Turunen, Anna-Mari; Ylä-Herttuala, Seppo
2002-03-01
Real-time PCR is a powerful method for the quantification of gene expression in biological samples. This method uses TaqMan chemistry based on the 5'-exonuclease activity of the AmpliTaq Gold DNA polymerase which releases fluorescence from hybridized probes during synthesis of each new PCR product. Many gene therapy studies use lacZ, encoding Escherichia coli beta-galactosidase, as a marker gene. Our results demonstrate that E. coli DNA contamination in AmpliTaq Gold polymerase interferes with TaqMan analysis of lacZ gene expression and decreases sensitivity of the method below the level required for biodistribution and long-term gene expression studies. In biodistribution analyses the contamination can lead to false-negative results by masking low-level lacZ expression in target and ectopic tissues, and false-positive results if sufficient controls are not used. We conclude that, to get reliable TaqMan results with lacZ, adequate controls should be included in each run to rule out contamination from AmpliTaq Gold polymerase.
Zhang, Wensheng; Edwards, Andrea; Fan, Wei; Zhu, Dongxiao; Zhang, Kun
2010-06-22
Comparative analysis of gene expression profiling of multiple biological categories, such as different species of organisms or different kinds of tissue, promises to enhance the fundamental understanding of the universality as well as the specialization of mechanisms and related biological themes. Grouping genes with a similar expression pattern or exhibiting co-expression together is a starting point in understanding and analyzing gene expression data. In recent literature, gene module level analysis is advocated in order to understand biological network design and system behaviors in disease and life processes; however, practical difficulties often lie in the implementation of existing methods. Using the singular value decomposition (SVD) technique, we developed a new computational tool, named svdPPCS (SVD-based Pattern Pairing and Chart Splitting), to identify conserved and divergent co-expression modules of two sets of microarray experiments. In the proposed methods, gene modules are identified by splitting the two-way chart coordinated with a pair of left singular vectors factorized from the gene expression matrices of the two biological categories. Importantly, the cutoffs are determined by a data-driven algorithm using the well-defined statistic, SVD-p. The implementation was illustrated on two time series microarray data sets generated from the samples of accessory gland (ACG) and malpighian tubule (MT) tissues of the line W118 of M. drosophila. Two conserved modules and six divergent modules, each of which has a unique characteristic profile across tissue kinds and aging processes, were identified. The number of genes contained in these models ranged from five to a few hundred. Three to over a hundred GO terms were over-represented in individual modules with FDR < 0.1. One divergent module suggested the tissue-specific relationship between the expressions of mitochondrion-related genes and the aging process. 
This finding, together with others, may be of biological significance. The validity of the proposed SVD-based method was further verified by a simulation study, as well as the comparisons with regression analysis and cubic spline regression analysis plus PAM based clustering. svdPPCS is a novel computational tool for the comparative analysis of transcriptional profiling. It especially fits the comparison of time series data of related organisms or different tissues of the same organism under equivalent or similar experimental conditions. The general scheme can be directly extended to the comparisons of multiple data sets. It also can be applied to the integration of data sets from different platforms and of different sources.
Zucchetto, Antonella; Bomben, Riccardo; Bo, Michele Dal; Nanni, Paola; Bulian, Pietro; Rossi, Francesca Maria; Del Principe, Maria Ilaria; Santini, Simone; Del Poeta, Giovanni; Degan, Massimo; Gattei, Valter
2006-07-15
Expression of T cell specific zeta-associated protein 70 (ZAP-70) by B-cell chronic lymphocytic leukemia (B-CLL) cells, as investigated by flow cytometry, has both prognostic relevance and predictive power as a surrogate for immunoglobulin heavy chain variable region (IgV(H)) mutations, although a standardization of the cytometric protocol is still lacking. Flow cytometric analyses for ZAP-70 were performed in peripheral blood samples from 145 B-CLL cases (124 with IgV(H) mutations) by a standard three-color protocol. Identification of the ZAP-70(+) cell population was based on an external negative control, i.e., the isotypic control (ISO method), or an internal positive control, i.e., the population of residual normal T/NK cells (TNK method). A comparison between these two approaches was performed. While 86/145 cases were concordant for ZAP-70 expression according to the two methods (ISO(+)TNK(+) or ISO(-)TNK(-)), 59/145 cases had discordant ZAP-70 expression, mainly (56/59) showing an ISO(+)TNK(-) profile. These latter cases express higher levels of ZAP-70 in their normal T cell component. Moreover, discordant ISO(+)TNK(-) cases had an IgV(H) gene mutation profile similar to that of concordantly positive cases and different from ZAP-70 concordantly negative B-CLL. Analysis of ZAP-70 expression by B-CLL cells using the ISO method makes it possible to overcome the variability in the expression of ZAP-70 by residual T cells and yields a better correlation with IgV(H) gene mutations. A receiver operating characteristic analysis suggests employing a higher cut-off than the commonly used 20%. A parallel evaluation of the prognostic value of ZAP-70 expression, as determined according to the ISO and TNK methods, is still needed. (c) 2006 International Society for Analytical Cytology.
Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng
2014-01-01
Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA) and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used a ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154
Acoustic correlates of Japanese expressions associated with voice quality of male adults
NASA Astrophysics Data System (ADS)
Kido, Hiroshi; Kasuya, Hideki
2004-05-01
Japanese expressions associated with the voice quality of male adults were extracted by a series of questionnaire surveys and statistical multivariate analysis. One hundred and thirty-seven Japanese expressions were collected through the first questionnaire and careful investigations of well-established Japanese dictionaries and articles. From the second questionnaire about familiarity with each of the expressions and synonymity that were addressed to 249 subjects, 25 expressions were extracted. The third questionnaire was about an evaluation of their own voice quality. By applying a statistical clustering method and a correlation analysis to the results of the questionnaires, eight bipolar expressions and one unipolar expression were obtained. They constituted high-pitched/low-pitched, masculine/feminine, hoarse/clear, calm/excited, powerful/weak, youthful/elderly, thick/thin, tense/lax, and nasal, respectively. Acoustic correlates of each of the eight bipolar expressions were extracted by means of perceptual evaluation experiments that were made with sentence utterances of 36 males and by a statistical decision tree method. They included an average of the fundamental frequency (F0) of the utterance, speaking rate, spectral tilt, formant frequency parameter, standard deviation of F0 values, and glottal noise, when SPL of each of the stimuli was maintained identical in the perceptual experiments.
ADAGE signature analysis: differential expression analysis with data-defined gene sets.
Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S
2017-11-22
Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server ( http://adage.greenelab.com ) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. 
We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
Huang, Shi-Ming; Zhao, Xia; Zhao, Xue-Mei; Wang, Xiao-Ying; Li, Shan-Shan; Zhu, Yu-Hui
2014-01-01
Renal transplantation is the preferred treatment for most patients with end-stage renal disease; however, acute renal allograft rejection is still a major risk factor for renal injury in recipients. To improve the early diagnosis and treatment of acute rejection, study of its molecular mechanism is urgently needed. MicroRNA (miRNA) and mRNA expression profiles of acute renal allograft rejection and well-functioning allografts, downloaded from the ArrayExpress database, were used to identify differentially expressed (DE) miRNAs and DE mRNAs. Targets of the DE miRNAs were predicted by combining five algorithms. By overlapping the DE mRNAs and DE miRNA targets, common genes were obtained. Differentially co-expressed genes (DCGs) were identified by the differential co-expression profile (DCp) and differential co-expression enrichment (DCe) methods in the Differentially Co-expressed Genes and Links (DCGL) package. Then, a co-expression network of the DCGs was constructed and cluster analysis was performed. Functional enrichment analysis of the DCGs was carried out. A total of 1270 miRNA targets were predicted and 698 DE mRNAs were obtained. Overlapping the miRNA targets and DE mRNAs yielded 59 common genes. We obtained 103 DCGs and 5 transcription factors (TFs) based on regulatory impact factors (RIF), then built the regulation network of miRNA targets and DE mRNAs. By clustering the co-expression network, 5 modules were obtained. Among them, module 1 had the highest degree and module 2 contained the largest number of DCGs and common genes. TF CEBPB and several common genes, such as RXRA, BASP1 and AKAP10, were mapped onto the co-expression network. C1R showed the highest degree in the network. These genes might be associated with human acute renal allograft rejection.
We conducted biological analysis on integration of DE mRNA and DE miRNA in acute renal allograft rejection, displayed gene expression patterns and screened out genes and TFs that may be related to acute renal allograft rejection.
Thematic Analysis of the Children's Drawings on Museum Visit: Adaptation of the Kuhn's Method
ERIC Educational Resources Information Center
Kisovar-Ivanda, Tamara
2014-01-01
Researchers are using techniques that allow children to express their perspectives. In 2003, Kuhn developed a method of data collection and analysis that combined thematic drawing and a focused, episodic interview. In this article, Kuhn's method is adjusted using the draw and write technique as a research methodology. Reflections on the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rades, Dirk, E-mail: Rades.Dirk@gmx.net; Setter, Cornelia; Dahl, Olav
2012-01-01
Purpose: The prognostic value of the tumor cell expression of the fibroblast growth factor 2 (FGF-2) in patients with non-small-cell lung cancer (NSCLC) is unclear. The present study investigated the effect of tumor cell expression of FGF-2 on the outcome of 60 patients irradiated for Stage II-III NSCLC. Methods and Materials: The effect of FGF-2 expression and 13 additional factors on locoregional control (LRC), metastasis-free survival (MFS), and overall survival (OS) were retrospectively evaluated. These additional factors included age, gender, Karnofsky performance status, histologic type, histologic grade, T and N category, American Joint Committee on Cancer stage, surgery, chemotherapy, pack-years, smoking during radiotherapy, and hemoglobin during radiotherapy. Locoregional failure was identified by endoscopy or computed tomography. Univariate analyses were performed with the Kaplan-Meier method and the Wilcoxon test and multivariate analyses with the Cox proportional hazard model. Results: On univariate analysis, improved LRC was associated with surgery (p = .017), greater hemoglobin levels (p = .036), and FGF-2 negativity (p <.001). On multivariate analysis of LRC, surgery (relative risk [RR], 2.44; p = .037), and FGF-2 expression (RR, 5.06; p <.001) maintained significance. On univariate analysis, improved MFS was associated with squamous cell carcinoma (p = .020), greater hemoglobin levels (p = .007), and FGF-2 negativity (p = .001). On multivariate analysis of MFS, the hemoglobin levels (RR, 2.65; p = .019) and FGF-2 expression (RR, 3.05; p = .004) were significant. On univariate analysis, improved OS was associated with a lower N category (p = .048), greater hemoglobin levels (p <.001), and FGF-2 negativity (p <.001). On multivariate analysis of OS, greater hemoglobin levels (RR, 4.62; p = .002) and FGF-2 expression (RR, 3.25; p = .002) maintained significance.
Conclusions: Tumor cell expression of FGF-2 appeared to be an independent negative predictor of LRC, MFS, and OS.
Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro
2015-12-01
Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method to predict carcinogenicity of chemicals which target the liver with high accuracy. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. The gene expression data obtained were applied to CARCINOscreen®. Predictive scores obtained by CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggest that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.
Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J
2017-12-01
qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (Cq) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the log scale as long as possible by working with log10(E) · Cq, which we call the efficiency-weighted Cq value; subsequent statistical analyses are then applied in the log scale. We show how efficiency-weighted Cq values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
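The efficiency-weighted Cq idea can be illustrated with a minimal sketch. The function names, the sign convention (reference minus gene of interest) and the well values below are assumptions made for illustration and are not taken from the paper; the sketch only shows that staying in the log10 scale and exponentiating at the very end reproduces a conventional efficiency-corrected expression ratio.

```python
import math


def efficiency_weighted_cq(efficiency, cq):
    """Return log10(E) * Cq, the efficiency-weighted Cq value."""
    return math.log10(efficiency) * cq


def mean(values):
    return sum(values) / len(values)


def log10_relative_expression(control, treated):
    """Unpaired comparison in the log10 scale.

    Each well is a tuple (E_goi, Cq_goi, E_ref, Cq_ref); the layout is
    hypothetical. Returns log10 of the treated/control expression ratio.
    """
    def delta(well):
        e_goi, cq_goi, e_ref, cq_ref = well
        # Assumed convention: reference minus gene of interest,
        # so a larger value means higher relative expression.
        return (efficiency_weighted_cq(e_ref, cq_ref)
                - efficiency_weighted_cq(e_goi, cq_goi))

    return mean([delta(w) for w in treated]) - mean([delta(w) for w in control])


# Hypothetical wells with perfect doubling (E = 2) for both genes.
control = [(2.0, 25.0, 2.0, 20.0), (2.0, 25.2, 2.0, 20.2)]
treated = [(2.0, 23.0, 2.0, 20.0), (2.0, 23.1, 2.0, 20.1)]

log_ratio = log10_relative_expression(control, treated)
fold_change = 10 ** log_ratio  # leave the log scale only at the end
print(round(fold_change, 3))
```

With these made-up wells the gene of interest crosses threshold two cycles earlier after treatment while the reference is unchanged, so the ratio comes out at about four-fold. Well-specific efficiencies can be supplied simply by giving each tuple its own E values.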
2010-01-01
Background Recent developments in high-throughput methods of analyzing transcriptomic profiles are promising for many areas of biology, including ecophysiology. However, although commercial microarrays are available for most common laboratory models, transcriptome analysis in non-traditional model species still remains a challenge. Indeed, the signal resulting from heterologous hybridization is low and difficult to interpret because of the weak complementarity between probe and target sequences, especially when no microarray dedicated to a genetically close species is available. Results We show here that transcriptome analysis in a species genetically distant from laboratory models is made possible by using MAXRS, a new method of analyzing heterologous hybridization on microarrays. This method takes advantage of the design of several commercial microarrays, with different probes targeting the same transcript. To illustrate and test this method, we analyzed the transcriptome of king penguin pectoralis muscle hybridized to Affymetrix chicken microarrays, two organisms separated by an evolutionary distance of approximately 100 million years. The differential gene expression observed between different physiological situations computed by MAXRS was confirmed by real-time PCR on 10 genes out of 11 tested. Conclusions MAXRS appears to be an appropriate method for gene expression analysis under heterologous hybridization conditions. PMID:20509979
Detecting complexes from edge-weighted PPI networks via genes expression analysis.
Zhang, Zehua; Song, Jian; Tang, Jijun; Xu, Xinying; Guo, Fei
2018-04-24
Identifying complexes from PPI networks has become a key problem in elucidating protein functions and identifying signaling and biological processes in a cell. Proteins that bind as complexes play important roles in life activity. Accurate determination of complexes in PPI networks is crucial for understanding principles of cellular organization. We propose a novel method to identify complexes in PPI networks, based on different co-expression information. First, we use the Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes in PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify the complexes predicted by the Markov Cluster Algorithm. To evaluate our method, we test it on two experimental yeast PPI networks. On the DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On the MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared with existing methods, our method improves the Precision value by at least 0.1752, the F-Measure value by at least 0.0448, and the Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes in PPI networks, with the prediction quality improved in terms of evaluation criteria.
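For readers unfamiliar with the Markov Cluster Algorithm, the following is a minimal sketch of its expansion and inflation loop on a small edge-weighted graph. It is a generic textbook-style version, not the authors' implementation: the edge weights, the added self-loops, the inflation value and the cluster-extraction rule are all assumptions chosen for illustration, and the filtering step based on graph and expression features is not shown.

```python
def build_weights(n, edges):
    """Symmetric weight matrix from (i, j, weight) triples (hypothetical input)."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = w[j][i] = weight
    return w


def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion step: matrix squaring (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: element-wise power followed by column normalization."""
    return normalize_columns([[v ** r for v in row] for row in m])


def mcl(weights, inflation=2.0, iterations=50, tol=1e-6):
    n = len(weights)
    # Self-loops of weight 1 are an assumption; they are a common MCL convention.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each non-empty row lists the nodes attracted to that row's node.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two hypothetical triangles joined by one weak (low co-expression) edge.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
print(mcl(build_weights(6, edges)))
```

In this sketch the edge weight stands in for whatever co-expression score is attached to an interaction; lowering the weight of an edge makes it less likely that random-walk flow crosses it, which is the intuition behind weighting the PPI network before clustering.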
Ma, Chuang; Wang, Xiangfeng
2012-09-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey's biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.
Ma, Chuang; Wang, Xiangfeng
2012-01-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey’s biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses. PMID:22797655
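As a small illustration of the Gini correlation discussed above, the sketch below uses the standard covariance-based definition, in which the values of one variable are paired with the ranks of the other. It is not the rsgcc implementation; the function names are hypothetical and ties are assumed to be absent.

```python
def ranks(values):
    """1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def covariance(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def gini_correlation(x, y):
    """Gini correlation of x with respect to y: cov(x, rank(y)) / cov(x, rank(x))."""
    return covariance(x, ranks(y)) / covariance(x, ranks(x))


# Hypothetical expression values; the last x value is an outlier.
x = [1.0, 2.0, 3.0, 4.0, 100.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(gini_correlation(x, y), gini_correlation(y, x))
```

Because the statistic mixes value information from one gene with rank information from the other, it is not symmetric: swapping the arguments generally gives a different number, so both directions are usually computed for a gene pair.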
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods make it possible to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma
Yan, Deng; Yi, Song; Chiu, Wang Chi; Qin, Liu Gui; Kin, Wong Hoi; Kwok Hung, Chung Tony; Linxiao, Han; Wai, Choy Kwong; Yi, Sui; Tao, Yang; Tao, Tang
2017-01-01
Objective This study was conducted to explore chromosomal copy number variations (CNV) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high resolution microarrays. Methods Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines by Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Results Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation or down-regulation is significantly correlated with genomic amplification (P=0.009) or deletion (P=0.006) events. Expression profiles were identified through cluster analysis. Gene annotation analysis indicated that cell cycle pathways were significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Conclusion Chromosomal CNVs may contribute to their transcript expression in cervical cancer. PMID:29312578
Course 10: Three Lectures on Biological Networks
NASA Astrophysics Data System (ADS)
Magnasco, M. O.
1 Enzymatic networks. Proofreading knots: How DNA topoisomerases disentangle DNA 1.1 Length scales and energy scales 1.2 DNA topology 1.3 Topoisomerases 1.4 Knots and supercoils 1.5 Topological equilibrium 1.6 Can topoisomerases recognize topology? 1.7 Proposal: Kinetic proofreading 1.8 How to do it twice 1.9 The care and proofreading of knots 1.10 Suppression of supercoils 1.11 Problems and outlook 1.12 Disquisition 2 Gene expression networks. Methods for analysis of DNA chip experiments 2.1 The regulation of gene expression 2.2 Gene expression arrays 2.3 Analysis of array data 2.4 Some simplifying assumptions 2.5 Probeset analysis 2.6 Discussion 3 Neural and gene expression networks: Song-induced gene expression in the canary brain 3.1 The study of songbirds 3.2 Canary song 3.3 ZENK 3.4 The blush 3.5 Histological analysis 3.6 Natural vs. artificial 3.7 The Blush II: gAP 3.8 Meditation
Choi, Ted; Eskin, Eleazar
2013-01-01
Gene expression data, in conjunction with information on genetic variants, have enabled studies to identify expression quantitative trait loci (eQTLs) or polymorphic locations in the genome that are associated with expression levels. Moreover, recent technological developments and cost decreases have further enabled studies to collect expression data in multiple tissues. One advantage of multiple tissue datasets is that studies can combine results from different tissues to identify eQTLs more accurately than examining each tissue separately. The idea of aggregating results of multiple tissues is closely related to the idea of meta-analysis which aggregates results of multiple genome-wide association studies to improve the power to detect associations. In principle, meta-analysis methods can be used to combine results from multiple tissues. However, eQTLs may have effects in only a single tissue, in all tissues, or in a subset of tissues with possibly different effect sizes. This heterogeneity in terms of effects across multiple tissues presents a key challenge to detect eQTLs. In this paper, we develop a framework that leverages two popular meta-analysis methods that address effect size heterogeneity to detect eQTLs across multiple tissues. We show by using simulations and multiple tissue data from mouse that our approach detects many eQTLs undetected by traditional eQTL methods. Additionally, our method provides an interpretation framework that accurately predicts whether an eQTL has an effect in a particular tissue. PMID:23785294
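To make the idea of combining per-tissue results concrete, here is a minimal sketch of a generic inverse-variance (fixed-effects) meta-analysis together with Cochran's Q as a heterogeneity statistic. This is not the framework developed in the paper, which relies on methods designed to handle effect-size heterogeneity; the effect sizes and standard errors are hypothetical and only illustrate why a tissue-specific effect is a problem for the simplest pooling scheme.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted estimate, its standard error and z-score."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the pooled effect."""
    pooled, _, _ = fixed_effects(betas, ses)
    return sum(((b - pooled) / se) ** 2 for b, se in zip(betas, ses))


# Hypothetical eQTL effect estimates for one SNP-gene pair in four tissues.
shared_betas = [0.50, 0.45, 0.55, 0.50]    # effect present in all tissues
specific_betas = [0.80, 0.00, 0.05, -0.02]  # effect in a single tissue
ses = [0.2, 0.2, 0.2, 0.2]

for name, betas in (("shared", shared_betas), ("tissue-specific", specific_betas)):
    beta, se, z = fixed_effects(betas, ses)
    print(name, round(beta, 3), round(z, 2), round(cochran_q(betas, ses), 2))
```

When the effect is shared, pooling sharpens the estimate; when it is confined to one tissue, the pooled effect is diluted and Q becomes large, which is the situation that heterogeneity-aware methods are meant to address.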
Gong, Zu-Kang; Wang, Shuang-Jie; Huang, Yong-Qi; Zhao, Rui-Qiang; Zhu, Qi-Fang; Lin, Wen-Zhen
2014-12-01
RT-qPCR is a commonly used method for evaluating gene expression; however, its accuracy and reliability are dependent upon the choice of appropriate reference gene(s), and there is limited information available on suitable reference gene(s) that can be used in mouse testis at different stages. In this study, using the RT-qPCR method, we investigated the expression variations of six reference genes representing different functional classes (Actb, Gapdh, Ppia, Tbp, Rps29, Hprt1) in mouse testis during embryonic and postnatal development. The expression stabilities of putative reference genes were evaluated using five algorithms: geNorm, NormFinder, Bestkeeper, the comparative delta C(t) method and the integrated tool RefFinder. Analysis of the results identified Ppia, Gapdh and Actb as the most stable genes, and the geometric mean of Ppia, Gapdh and Actb constitutes an appropriate normalization factor for gene expression studies. The mRNA expression of AT1 as a test gene of interest varied depending upon which of the reference gene(s) was used as an internal control(s). This study suggests that Ppia, Gapdh and Actb are suitable reference genes among the six genes used for RT-qPCR normalization and provides crucial information for transcriptional analyses in future studies of gene expression in the developing mouse testis.
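The normalization factor described above, a geometric mean over several reference genes, can be sketched in a few lines. The relative quantities below are invented for illustration, and the function names are hypothetical; only the use of the geometric mean of Ppia, Gapdh and Actb comes from the abstract.

```python
import math


def geometric_mean(values):
    """Geometric mean computed in the log domain; values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalization_factor(reference_quantities):
    """Geometric mean of the reference-gene relative quantities in one sample."""
    return geometric_mean(list(reference_quantities.values()))


def normalize(target_quantity, reference_quantities):
    """Target-gene quantity divided by the sample's normalization factor."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical relative quantities for one sample.
sample_refs = {"Ppia": 2.0, "Gapdh": 4.0, "Actb": 8.0}
nf = normalization_factor(sample_refs)
print(round(nf, 3), round(normalize(12.0, sample_refs), 3))
```

The geometric mean is preferred over the arithmetic mean here because it dampens the influence of a single reference gene with an unusually large quantity.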
A Review of Feature Extraction Software for Microarray Gene Expression Data
Tan, Ching Siang; Ting, Wai Soon; Mohamad, Mohd Saberi; Chan, Weng Howe; Deris, Safaai; Ali Shah, Zuraini
2014-01-01
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method. PMID:25250315
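Of the feature extraction methods listed above, PCA is the simplest to sketch. The example below finds the first principal component of a tiny samples-by-genes matrix with power iteration on the covariance matrix. It is a teaching sketch with hypothetical data and function names, not a substitute for the reviewed software, which would normally rely on an SVD routine and return all components.

```python
import math


def center(data):
    """Subtract the per-gene (column) mean from every sample (row)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance_matrix(centered):
    """Sample covariance between genes (columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Scores of each sample on the component (the reduced representation)."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Hypothetical matrix: 4 samples x 3 genes; gene 2 is twice gene 1, gene 3 is flat.
expression = [[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [3.0, 6.0, 0.0],
              [4.0, 8.0, 0.0]]
centered = center(expression)
pc1 = first_component(covariance_matrix(centered))
scores = project(centered, pc1)
print([round(v, 4) for v in pc1], [round(s, 4) for s in scores])
```

Here three gene columns collapse to a single score per sample without losing any variation, because the two varying genes are perfectly correlated and the third carries no signal.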
2009-01-01
Background Whole genome transcriptomic analysis is a powerful approach to elucidate the molecular mechanisms controlling the pathogenesis of obligate intracellular bacteria. However, the major hurdle resides in the low quantity of prokaryotic mRNAs extracted from host cells. Our model Ehrlichia ruminantium (ER), the causative agent of heartwater, is transmitted by tick Amblyomma variegatum. This bacterium affects wild and domestic ruminants and is present in Sub-Saharan Africa and the Caribbean islands. Because of its strictly intracellular location, which constitutes a limitation for its extensive study, the molecular mechanisms involved in its pathogenicity are still poorly understood. Results We successfully adapted the SCOTS method (Selective Capture of Transcribed Sequences) on the model Rickettsiales ER to capture mRNAs. Southern Blots and RT-PCR revealed an enrichment of ER's cDNAs and a diminution of ribosomal contaminants after three rounds of capture. qRT-PCR and whole-genome ER microarrays hybridizations demonstrated that SCOTS method introduced only a limited bias on gene expression. Indeed, we confirmed the differential gene expression between poorly and highly expressed genes before and after SCOTS captures. The comparative gene expression obtained from ER microarrays data, on samples before and after SCOTS at 96 hpi was significantly correlated (R2 = 0.7). Moreover, SCOTS method is crucial for microarrays analysis of ER, especially for early time points post-infection. There was low detection of transcripts for untreated samples whereas 24% and 70.7% were revealed for SCOTS samples at 24 and 96 hpi respectively. Conclusions We conclude that this SCOTS method has a key importance for the transcriptomic analysis of ER and can be potentially used for other Rickettsiales. 
This study constitutes the first step for further gene expression analyses that will lead to a better understanding of both ER pathogenicity and the adaptation of obligate intracellular bacteria to their environment. PMID:20034374
Gripsrud, Birgitta Haga; Brassil, Kelly J; Summers, Barbara; Søiland, Håvard; Kronowitz, Steven; Lode, Kirsten
2015-01-01
Background Expressive writing has been shown to improve quality of life, fatigue, and post-traumatic stress among breast cancer patients across cultures. Understanding how and why the method may be beneficial to patients can increase awareness of the psychosocial impact of breast cancer and enhance interventional work within this population. Qualitative research on experiential aspects of interventions may inform the theoretical understanding, and generate hypotheses for future studies. Aim To explore and describe the experience and feasibility of expressive writing among women with breast cancer following mastectomy and immediate or delayed reconstructive surgery. Methods Seven participants enrolled to undertake 4 episodes of expressive writing at home, with semi-structured interviews conducted afterwards and analyzed using experiential thematic analysis. Results Three themes emerged through analysis: writing as process, writing as therapeutic, and writing as a means to help others. Implications for practice This study augments existing evidence to support the appropriateness of expressive writing as an intervention after a breast cancer diagnosis. Further studies should evaluate its feasibility at different time points in survivorship. Conclusions Findings illuminate experiential variations in expressive writing and how storytelling encourages a release of cognitive and emotional strains, surrendering these to reside in the text. The method was said to process feelings and capture experiences tied to a new and overwhelming illness situation, as impressions became expressions through writing. Expressive writing, therefore, is a valuable tool for health care providers to introduce into the plan of care for patients with breast cancer, and potentially other cancer patient groups. PMID:26390074
Identifying stochastic oscillations in single-cell live imaging time series using Gaussian processes
Manning, Cerys; Rattray, Magnus
2017-01-01
Multiple biological processes are driven by oscillatory gene expression at different time scales. Pulsatile dynamics are thought to be widespread, and single-cell live imaging of gene expression has led to a surge of dynamic, possibly oscillatory, data for different gene networks. However, the regulation of gene expression at the level of an individual cell involves reactions between finite numbers of molecules, and this can result in inherent randomness in expression dynamics, which blurs the boundaries between aperiodic fluctuations and noisy oscillators. This underlies a new challenge to the experimentalist because neither intuition nor pre-existing methods work well for identifying oscillatory activity in noisy biological time series. Thus, there is an acute need for an objective statistical method for classifying whether an experimentally derived noisy time series is periodic. Here, we present a new data analysis method that combines mechanistic stochastic modelling with the powerful methods of non-parametric regression with Gaussian processes. Our method can distinguish oscillatory gene expression from random fluctuations of non-oscillatory expression in single-cell time series, despite peak-to-peak variability in period and amplitude of single-cell oscillations. We show that our method outperforms the Lomb-Scargle periodogram in successfully classifying cells as oscillatory or non-oscillatory in data simulated from a simple genetic oscillator model and in experimental data. Analysis of bioluminescent live-cell imaging shows a significantly greater number of oscillatory cells when luciferase is driven by a Hes1 promoter (10/19), which has previously been reported to oscillate, than the constitutive MoMuLV 5’ LTR (MMLV) promoter (0/25). The method can be applied to data from any gene network to both quantify the proportion of oscillating cells within a population and to measure the period and quality of oscillations.
It is publicly available as a MATLAB package. PMID:28493880
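As a rough illustration of the idea in the abstract above, the sketch below compares two Gaussian process covariance functions on one time series: an aperiodic Ornstein-Uhlenbeck kernel and a damped-cosine (quasi-periodic) kernel. The kernel forms, the grid search, the fixed noise level and all function names are assumptions made for illustration. The abstract does not specify them, and this is not the authors' MATLAB package.

```python
import numpy as np

def ou_osc_kernel(t, sigma2, alpha, beta):
    """Damped-cosine covariance; beta = 0 reduces to the aperiodic OU kernel."""
    tau = np.abs(t[:, None] - t[None, :])
    return sigma2 * np.exp(-alpha * tau) * np.cos(beta * tau)

def log_marginal_likelihood(y, K, noise_var):
    """Standard zero-mean GP log marginal likelihood via a Cholesky factor."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def best_fit(t, y, alphas, betas, noise_var=0.1):
    """Grid-search the kernel parameters (signal variance fixed at var(y))."""
    sigma2 = np.var(y)
    return max(
        log_marginal_likelihood(y, ou_osc_kernel(t, sigma2, a, b), noise_var)
        for a in alphas for b in betas
    )

def log_likelihood_ratio(t, y, alphas, betas, noise_var=0.1):
    """Oscillatory fit minus aperiodic fit; larger values favour oscillation."""
    osc = best_fit(t, y, alphas, [0.0] + list(betas), noise_var)
    ou = best_fit(t, y, alphas, [0.0], noise_var)
    return osc - ou

# Hypothetical example: a noisy sinusoid with a period of 2.5 time units.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 60)
y = np.sin(2 * np.pi * t / 2.5) + 0.3 * rng.standard_normal(t.size)
y = y - y.mean()
llr = log_likelihood_ratio(t, y, alphas=[0.05, 0.2, 1.0], betas=[1.0, 2.5, 4.0])
```

By construction the ratio is never negative here, because the oscillatory grid also contains beta = 0. Deciding how large it must be before a cell is called oscillatory needs a calibration step that this sketch leaves out.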
Validating internal controls for quantitative plant gene expression studies
Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H
2004-01-01
Background Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Results Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Conclusion Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments. PMID:15317655
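One way to picture a regression-based stability measure of the kind described above is a plant-breeding style joint regression: each candidate gene is regressed on a per-sample index (here, the mean over all candidate genes), and a gene with a slope near 1 and small residual scatter tracks the index closely. The data layout, the choice of index and the function name are assumptions for illustration; the abstract does not give the exact formulas.

```python
import numpy as np

def stability_measures(expr):
    """expr: genes x samples array of expression values (e.g. Ct values).

    Returns (slopes, residual_mean_squares), one value per gene, from
    regressing each gene on the per-sample mean of all genes.
    """
    expr = np.asarray(expr, dtype=float)
    index = expr.mean(axis=0)                 # per-sample index
    x = index - index.mean()                  # centred regressor
    yc = expr - expr.mean(axis=1, keepdims=True)
    slopes = yc @ x / (x @ x)
    resid = yc - slopes[:, None] * x[None, :]
    dof = expr.shape[1] - 2                   # intercept and slope estimated
    return slopes, (resid ** 2).sum(axis=1) / dof

# Hypothetical Ct values: 3 genes measured in 5 samples.
ct = np.array([
    [20.0, 21.0, 22.0, 23.0, 24.0],
    [18.0, 19.0, 20.0, 21.0, 22.0],
    [25.0, 24.0, 27.0, 25.0, 29.0],
])
slopes, rms = stability_measures(ct)
```

Because the index is the mean of the genes themselves, the slopes always average to 1; the per-gene residual mean square is the quantity that separates steady candidates from erratic ones in this sketch.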
Transcriptome assembly and digital gene expression atlas of the rainbow trout
USDA-ARS's Scientific Manuscript database
Background: Transcriptome analysis is a preferred method for gene discovery, marker development and gene expression profiling in non-model organisms. Previously, we sequenced a transcriptome reference using Sanger-based and 454-pyrosequencing, however, a transcriptome assembly is still incomplete an...
Super-delta: a new differential gene expression analysis procedure with robust data normalization.
Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing
2017-12-21
Normalization is an important data preparation step in gene expression analyses, designed to remove various systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of the global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived based on asymptotic theory for hypothesis testing that suitably pairs with the proposed robust normalization. We first compared super-delta with four commonly used normalization methods: global, median-IQR, quantile, and cyclic loess normalization in simulation studies. Super-delta was shown to have better statistical power with tighter control of type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap of the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations on the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than those identified by other methods.
As a new pipeline, super-delta provides new insights to the area of differential gene expression analysis. Solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential of expansion to be incorporated with other data types and/or more general between-group comparison problems.
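The bias mentioned in the abstract above can be seen in a toy example: when the per-sample centre is computed over all genes, a block of strongly up-regulated genes drags the centre along and makes unchanged genes look shifted. The sketch below contrasts mean-based global normalization with a median-based centre. It is not the super-delta procedure itself; the data, the choice of the median as the robust centre and the function names are assumptions for illustration.

```python
import numpy as np

def normalize(log_expr, center="mean"):
    """Subtract a per-sample centre (column-wise) from a genes x samples matrix."""
    if center == "mean":
        offsets = log_expr.mean(axis=0)
    else:
        offsets = np.median(log_expr, axis=0)
    return log_expr - offsets

# Hypothetical data: 100 genes, 3 + 3 samples, with sample-specific offsets.
baseline = np.arange(100.0) / 10.0
sample_effect = np.array([0.0, 0.5, -0.3, 0.2, -0.1, 0.4])
log_expr = baseline[:, None] + sample_effect[None, :]
log_expr[90:, 3:] += 4.0          # the 10 highest genes are up-regulated in group 2

non_de = slice(0, 90)

def group_shift(x):
    """Mean group-2 minus group-1 difference over genes that are truly unchanged."""
    return (x[non_de, 3:].mean(axis=1) - x[non_de, :3].mean(axis=1)).mean()

shift_mean = group_shift(normalize(log_expr, "mean"))      # biased by the DEGs
shift_median = group_shift(normalize(log_expr, "median"))  # unaffected in this toy case
```

In this constructed case the mean-centred data show a spurious shift of -0.4 in the unchanged genes (4.0 times the 10% DEG fraction), while the median-centred data show none. The median escapes only because the up-regulated genes already sit above it, so real data will be less clean.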
Namkung, Junghyun; Nam, Jin-Wu; Park, Taesung
2007-01-01
Many genes with major effects on quantitative traits have been reported to interact with other genes. However, finding a group of interacting genes from thousands of SNPs is challenging. Hence, an efficient and robust algorithm is needed. The genetic algorithm (GA) is useful in searching for the optimal solution from a very large searchable space. In this study, we show that genome-wide interaction analysis using GA and a statistical interaction model can provide a practical method to detect biologically interacting loci. We focus our search on transcriptional regulators by analyzing gene × gene interactions for cancer-related genes. The expression values of three cancer-related genes were selected from the expression data of the Genetic Analysis Workshop 15 Problem 1 data set. We implemented a GA to identify the expression quantitative trait loci that are significantly associated with expression levels of the cancer-related genes. The time complexity of the GA was compared with that of an exhaustive search algorithm. As a result, our GA, which included heuristic methods, such as archive, elitism, and local search, has greatly reduced computational time in a genome-wide search for gene × gene interactions. In general, the GA took one-fifth the computation time of an exhaustive search for the most significant pair of single-nucleotide polymorphisms. PMID:18466570
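A compact sketch of a genetic algorithm over SNP pairs, in the spirit of the abstract above, might look as follows. The fitness function (absolute correlation between the product of two genotype vectors and the expression trait), the population settings and all names are assumptions for illustration. The authors' statistical interaction model and their archive and local-search heuristics are not reproduced here; only elitism is shown.

```python
import numpy as np

def pair_fitness(genotypes, expression, i, j):
    """Illustrative score: |correlation| between the SNP product term and expression."""
    term = genotypes[:, i] * genotypes[:, j]
    if term.std() == 0:
        return 0.0
    return abs(np.corrcoef(term, expression)[0, 1])

def ga_search(genotypes, expression, pop_size=30, generations=40, seed=0):
    """Search SNP pairs with selection, crossover, mutation and elitism."""
    rng = np.random.default_rng(seed)
    n_snps = genotypes.shape[1]
    pop = [tuple(int(s) for s in rng.choice(n_snps, size=2, replace=False))
           for _ in range(pop_size)]
    score = lambda p: pair_fitness(genotypes, expression, p[0], p[1])
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        history.append(score(ranked[0]))
        elite = ranked[: pop_size // 5]            # elitism: keep the best fifth unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a = elite[rng.integers(len(elite))]
            b = ranked[rng.integers(pop_size)]
            child = [a[0], b[1]]                   # crossover: one SNP from each parent
            if rng.random() < 0.3:                 # mutation: replace one SNP at random
                child[rng.integers(2)] = int(rng.integers(n_snps))
            if child[0] != child[1]:
                children.append((int(child[0]), int(child[1])))
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best), history

# Hypothetical data: 200 individuals, 50 SNPs coded 0/1/2, one interacting pair.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(200, 50)).astype(float)
expression = genotypes[:, 3] * genotypes[:, 17] + rng.standard_normal(200)
best_pair, best_score, history = ga_search(genotypes, expression)
```

Because the elite individuals are carried over unchanged, the best score recorded in `history` can never decrease from one generation to the next. An exhaustive search over the same 50 SNPs would score all 1,225 pairs.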
contamDE: differential expression analysis of RNA-seq data for contaminated tumor samples.
Shen, Qi; Hu, Jiyuan; Jiang, Ning; Hu, Xiaohua; Luo, Zewei; Zhang, Hong
2016-03-01
Accurate detection of differentially expressed genes between tumor and normal samples is a primary approach to cancer-related biomarker identification. Because normal cells surrounding the tumor infiltrate it, expression data derived from tumor samples are always contaminated with normal cells. Ignoring such cellular contamination would reduce the power to detect DE genes and further confound the biological interpretation of the analysis results. At present, no differential expression analysis approach for RNA-seq data in the literature properly accounts for the contamination of tumor samples. Without appealing to any extra information, we develop a new method, 'contamDE', based on a novel statistical model that associates RNA-seq expression levels with cell types. Simulation studies demonstrate that contamDE could be much more powerful than existing methods that ignore the contamination. In the application to two cancer studies, contamDE uniquely found several potential therapy and prognostic biomarkers of prostate cancer and non-small cell lung cancer. An R package contamDE is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. Contact: zhanghfd@fudan.edu.cn. Supplementary data are available at Bioinformatics online.
David N. Bengston; David P. Fan
1999-01-01
Public attitudes, beliefs, and underlying values about roads on the U.S. national forests expressed in more than 4,000 on-line news stories during a 3-year period are analyzed by using computer methods. The belief that forest roads provide access for recreation was expressed most frequently, accounting for about 40% of all beliefs expressed. The belief that roads cause...
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling.
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
Analysis of intracellular cytokines using flowcytometry.
Arora, Sunil K
2002-01-01
Characterization of T-cell clones and identification of functional subsets of helper T-cells with polarized cytokine production are based on testing of cytokine expression. Several methods have been developed to measure cytokine expression, including ELISA, RT-PCR, ELISPOT, ISH and flowcytometry. Among these methods, monitoring of cytokine production by flowcytometric analysis has its own advantages and disadvantages. Its main features are multi-parametric characterization of cytokine production on a single-cell basis, without long-term culture and cloning, along with high sample throughput. Interpretation may be difficult at times because of changes in the phenotype of the cells. Cells with a similar surface phenotype but synthesizing different cytokines and having different functional characteristics can be analyzed with this technique.
Takamura, Ayari; Watanabe, Ken; Akutsu, Tomoko
2017-07-01
Identification of human semen is indispensable for the investigation of sexual assaults. Fluorescence staining methods using commercial kits, such as the series of SPERM HY-LITER™ kits, have been useful to detect human sperm via strong fluorescence. These kits have been examined from various forensic aspects. However, because of a lack of evaluation methods, these studies did not provide objective, or quantitative, descriptions of the results nor clear criteria for the decisions reached. In addition, the variety of validations was considerably limited. In this study, we conducted more advanced validations of SPERM HY-LITER™ Express using our established image analysis method. Use of this method enabled objective and specific identification of fluorescent sperm's spots and quantitative comparisons of the sperm detection performance under complex experimental conditions. For body fluid mixtures, we examined interference with the fluorescence staining from other body fluid components. Effects of sample decomposition were simulated in high humidity and high temperature conditions. Semen with quite low sperm concentrations, such as azoospermia and oligospermia samples, represented the most challenging cases in application of the kit. Finally, the tolerance of the kit against various acidic and basic environments was analyzed. The validations herein provide useful information for the practical applications of the SPERM HY-LITER™ Express kit, which were previously unobtainable. Moreover, the versatility of our image analysis method toward various complex cases was demonstrated.
Carcone, April Idalski; Barton, Ellen; Eggly, Susan; Brogan Hartlieb, Kathryn E.; Thominet, Luke; Naar, Sylvie
2016-01-01
Objective We conducted an exploratory mixed methods study to describe the ambivalence African-American adolescents and their caregivers expressed during motivational interviewing sessions targeting weight loss. Methods We extracted ambivalence statements from 37 previously coded counseling sessions. We used directed content analysis to categorize ambivalence related to the target behaviors of nutrition, activity, or weight. We compared adolescent-caregiver dyads’ ambivalence using the paired sample t-test and Wilcoxon signed-rank test. We then used conventional content analysis to compare the specific content of adolescents’ and caregivers’ ambivalence statements. Results Adolescents and caregivers expressed the same number of ambivalence statements overall, related to activity and weight, but caregivers expressed more statements about nutrition. Content analysis revealed convergences and divergences in caregivers’ and adolescents’ ambivalence about weight loss. Conclusion Understanding divergences in adolescent-caregiver ambivalence about the specific behaviors to target may partially explain the limited success of family-based weight loss interventions targeting African American families and provides a unique opportunity for providers to enhance family communication, foster teamwork, and build self-efficacy to promote behavior change. Practice implications Clinicians working in family contexts should explore how adolescents and caregivers converge and diverge in their ambivalence in order to recommend weight loss strategies that best meet families’ needs. PMID:26916012
RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
Holik, Aliaksei Z.; Law, Charity W.; Liu, Ruijie; Wang, Zeya; Wang, Wenyi; Ahn, Jaeil; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.
2017-01-01
Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, pure technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix outperformed IsoPure convincingly. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring any conclusions are widely applicable. PMID:27899618
ADGO: analysis of differentially expressed gene sets using composite GO annotation.
Nam, Dougu; Kim, Sang-Bae; Kim, Seon-Kyu; Yang, Sungjin; Kim, Seon-Young; Chu, In-Sun
2006-09-15
Genes are typically expressed in modular manners in biological processes. Recent studies reflect such features in analyzing gene expression patterns by directly scoring gene sets. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories providing only unary information for the gene sets. Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets where we found many 'filtered' composite terms the number of which reached approximately 34% (a strong criterion, 5% significance) of the number of significant unary terms on average. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. Contact: chu@kribb.re.kr.
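The composite-annotation idea above can be sketched with plain Python sets: intersect a gene set from one GO category with a gene set from another, then score the intersection alongside the single sets. The mean-shift z-score used here, the minimum set size, the gene names and the term labels are all assumptions chosen for illustration; the abstract does not state the scoring statistic in detail.

```python
import math

def z_score(gene_set, scores, mean_all, sd_all):
    """Mean-shift z-score of a gene set against the background of all genes."""
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        return 0.0
    set_mean = sum(members) / len(members)
    return (set_mean - mean_all) * math.sqrt(len(members)) / sd_all

def background(scores):
    """Mean and sample standard deviation of all per-gene scores."""
    values = list(scores.values())
    mean_all = sum(values) / len(values)
    var = sum((v - mean_all) ** 2 for v in values) / (len(values) - 1)
    return mean_all, math.sqrt(var)

def composite_scores(function_sets, component_sets, scores, min_size=2):
    """Score every intersection of a molecular-function set with a cellular-component set."""
    mean_all, sd_all = background(scores)
    results = {}
    for f_name, f_genes in function_sets.items():
        for c_name, c_genes in component_sets.items():
            both = f_genes & c_genes
            if len(both) >= min_size:
                results[(f_name, c_name)] = z_score(both, scores, mean_all, sd_all)
    return results

# Hypothetical per-gene expression-change scores and annotations.
scores = {"g1": 2.0, "g2": 1.8, "g3": -0.1, "g4": 0.2,
          "g5": -0.3, "g6": 0.1, "g7": -0.4, "g8": 0.3}
function_sets = {"function_F": {"g1", "g2", "g3", "g4"}}
component_sets = {"component_C": {"g1", "g2", "g5", "g6"}}
composite = composite_scores(function_sets, component_sets, scores)
mean_all, sd_all = background(scores)
z_f = z_score(function_sets["function_F"], scores, mean_all, sd_all)
z_c = z_score(component_sets["component_C"], scores, mean_all, sd_all)
```

In this toy data the intersected set scores higher than either single set, which is the situation the abstract describes: the signal sits in the genes that share both annotations.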
Counteracting Misconceptions About the Socratic Method.
ERIC Educational Resources Information Center
Fishman, Ethan M.
1985-01-01
The Socratic method, while utilizing student participation, emphasizes self-knowledge, not self-expression. This is accomplished on the basis of successive stages of issue analysis and self-examination. The Socratic method strives to get at the root of belief by studying assumptions. (MLW)
Corneanu, Ciprian Adrian; Simon, Marc Oliu; Cohn, Jeffrey F; Guerrero, Sergio Escalera
2016-08-01
Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state of the art methods accordingly. We also present the important datasets and the bench-marking of most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.
Clinical Significance of SASH1 Expression in Glioma
Yang, Liu; Zhang, Haitao; Yao, Qi; Yan, Yingying; Wu, Ronghua; Liu, Mei
2015-01-01
Objective. SAM and SH3 domain containing 1 (SASH1) is a recently discovered tumor suppressor gene. The role of SASH1 in glioma has not yet been described. We investigated SASH1 expression in glioma cases to determine its clinical significance on glioma pathogenesis and prognosis. Methods. We produced tissue microarrays using 121 patient-derived glioma samples and 30 patient-derived nontumor cerebral samples. Immunohistochemistry and Western blotting were used to evaluate SASH1 expression. We used Fisher's exact tests to determine relationships between SASH1 expression and clinicopathological characteristics; Cox regression analysis to evaluate the independency of different SASH1 expression; Kaplan-Meier analysis to determine any correlation of SASH1 expression with survival rate. Results. SASH1 expression was closely correlated with the WHO glioma grade. Of the 121 cases, 66.9% with low SASH1 expression were mostly grade III-IV cases, whereas 33.1% with high SASH1 expression were mostly grades I-II. Kaplan-Meier analysis revealed a significant positive correlation between SASH1 expression and postoperative survival. Conclusions. SASH1 was widely expressed in normal and low-grade glioma tissues. SASH1 expression strongly correlated with glioma grades, showing higher expression at a lower grade, which decreased significantly as grade increased. Furthermore, SASH1 expression was positively correlated with better postoperative survival in patients with glioma. PMID:26424902
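For readers unfamiliar with the survival analysis named above, the Kaplan-Meier (product-limit) estimate can be computed in a few lines. The follow-up times and group labels below are hypothetical and are not data from the study; the function name is likewise an illustration.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival_probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, e in data if tt == t)
        if deaths > 0:
            # Multiply by the fraction of at-risk patients surviving this time point.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
        i += removed
    return curve

# Hypothetical follow-up in months for two expression groups.
high_expr = kaplan_meier([12, 20, 24, 30, 36], [1, 0, 1, 0, 0])
low_expr = kaplan_meier([3, 6, 8, 12, 15], [1, 1, 0, 1, 1])
```

Censored patients leave the at-risk count without lowering the curve, which is what distinguishes this estimate from a simple fraction of survivors. Comparing two such curves formally would need a test such as the log-rank test, which is not shown.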
Measuring single-cell gene expression dynamics in bacteria using fluorescence time-lapse microscopy
Young, Jonathan W; Locke, James C W; Altinok, Alphan; Rosenfeld, Nitzan; Bacarian, Tigran; Swain, Peter S; Mjolsness, Eric; Elowitz, Michael B
2014-01-01
Quantitative single-cell time-lapse microscopy is a powerful method for analyzing gene circuit dynamics and heterogeneous cell behavior. We describe the application of this method to imaging bacteria by using an automated microscopy system. This protocol has been used to analyze sporulation and competence differentiation in Bacillus subtilis, and to quantify gene regulation and its fluctuations in individual Escherichia coli cells. The protocol involves seeding and growing bacteria on small agarose pads and imaging the resulting microcolonies. Images are then reviewed and analyzed using our laboratory's custom MATLAB analysis code, which segments and tracks cells in a frame-to-frame method. This process yields quantitative expression data on cell lineages, which can illustrate dynamic expression profiles and facilitate mathematical models of gene circuits. With fast-growing bacteria, such as E. coli or B. subtilis, image acquisition can be completed in 1 d, with an additional 1–2 d for progressing through the analysis procedure. PMID:22179594
Learning challenges and sustainable development: A methodological perspective.
Seppänen, Laura
2017-01-01
Sustainable development requires learning, but the contents of learning are often complex and ambiguous. This requires new integrated approaches from research. It is argued that investigation of people's learning challenges in every-day work is beneficial for research on sustainable development. The aim of the paper is to describe a research method for examining learning challenges in promoting sustainable development. This method is illustrated with a case example from organic vegetable farming in Finland. The method, based on Activity Theory, combines historical analysis with qualitative analysis of need expressions in discourse data. The method linking local and subjective need expressions with general historical analysis is a promising way to overcome the gap between the individual and society, so much needed in research for sustainable development. Dialectically informed historical frameworks have practical value as tools in collaborative negotiations and participatory designs for sustainable development. The simultaneous use of systemic and subjective perspectives allows researchers to manage the complexity of practical work activities and to avoid too simplistic presumptions about sustainable development.
2013-01-01
Background Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features such as autocorrelation between successive points should be incorporated into the analysis. Results This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions Using this method, we identified a set of cell-cycle-regulated time-course yeast genes.
The proposed method is general and can potentially be used to identify genes that share the same patterns or biological processes, and to help in facing the present and forthcoming challenges of data analysis in functional genomics. PMID:24134721
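The first two elements of the procedure above can be sketched as follows: a least-squares projection of each profile onto a small Fourier basis, and a Benjamini-Hochberg step for controlling the FDR over per-gene p-values. The number of harmonics, the default period and the function names are assumptions for illustration. The authors' screening statistic, which accounts for autocorrelation, and their model-based clustering are not reproduced here.

```python
import numpy as np

def fourier_design(t, n_harmonics, period):
    """Design matrix with an intercept plus cosine/sine pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def fourier_coefficients(profiles, t, n_harmonics=2, period=None):
    """Least-squares Fourier coefficients, one row per gene profile."""
    period = period if period is not None else t[-1] - t[0]
    X = fourier_design(t, n_harmonics, period)
    coef, *_ = np.linalg.lstsq(X, profiles.T, rcond=None)
    return coef.T

def benjamini_hochberg(pvalues, fdr=0.05):
    """Boolean mask of hypotheses rejected at the given false discovery rate."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, p.size + 1) / p.size
    passed = p[order] <= thresholds
    keep = np.zeros(p.size, dtype=bool)
    if passed.any():
        last = np.max(np.nonzero(passed)[0])   # largest rank meeting its threshold
        keep[order[: last + 1]] = True
    return keep

# Hypothetical profile: constant 3, plus a first and a second harmonic.
t = np.linspace(0.0, 1.0, 20, endpoint=False)
profile = 3.0 + 2.0 * np.cos(2 * np.pi * t) - np.sin(4 * np.pi * t)
coefs = fourier_coefficients(profile[None, :], t, n_harmonics=2, period=1.0)
```

Each gene is then summarized by a short coefficient vector (ordered as intercept, cos 1, sin 1, cos 2, sin 2), which is the representation in which screening and clustering would take place.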
Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R.; del Río-Navarro, Blanca E.; Mendoza-Vargas, Alfredo; Sánchez, Filiberto
2017-01-01
Background In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. The absolute gene expression analysis allows the transcriptome profiling of all expressed genes under a specific biological condition without the need of a reference sample. However, the background fluorescence represents a challenge to determine the absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. Methods We extracted the RNA from 16 children leukocyte samples (nine males and seven females, ages 6–10 years). An Affymetrix Gene Chip Human Gene 1.0 ST Array was carried out for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). Results From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. 
Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. Interestingly, these pathways were related to the typical functions of leukocytes, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in previously published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing transcriptome profiling of microarray data without the need for an additional reference experiment. Discussion Our approach, based on establishing a threshold for absolute gene expression analysis, offers a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need for additional samples for relative expression experiments. PMID:29230367
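As a rough illustration of the thresholding idea in the abstract above, the following Python sketch derives a background cut-off from Y-chromosome probe intensities measured in female samples and then labels genes as expressed or non-expressed. The cut-off rule (mean plus two standard deviations), the function names, the gene labels and the toy intensities are all assumptions made for illustration; the abstract does not state how the threshold was computed from the three selected genes.

```python
from statistics import mean, stdev

def background_threshold(female_y_intensities, n_sd=2.0):
    """Cut-off derived from Y-gene signal in female samples (background only).

    ASSUMPTION: mean + n_sd * SD is an illustrative rule, not the
    formula used in the study.
    """
    return mean(female_y_intensities) + n_sd * stdev(female_y_intensities)

def call_expressed(gene_intensities, threshold):
    """Label each gene True (expressed) if its signal exceeds the cut-off."""
    return {gene: value > threshold for gene, value in gene_intensities.items()}

# Hypothetical log2 intensities of Y-linked probes in female samples.
female_y = [3.1, 2.9, 3.4, 3.0, 3.2, 2.8, 3.3]
threshold = background_threshold(female_y)

# Hypothetical mean log2 intensities for three placeholder genes.
genes = {"GENE_A": 9.7, "GENE_B": 3.0, "GENE_C": 5.2}
calls = call_expressed(genes, threshold)

print(f"threshold = {threshold:.2f}")
print(calls)
```

Any rule that summarizes the background distribution could be substituted for the mean-plus-SD cut-off; the point of the sketch is only that the threshold comes from probes known to have no true signal.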
NASA Astrophysics Data System (ADS)
Motegi, Kohei
2018-05-01
We present a method to analyze the wavefunctions of six-vertex models by extending the Izergin-Korepin analysis originally developed for domain wall boundary partition functions. First, we apply the method to the case of the basic wavefunctions of the XXZ-type six-vertex model. By giving the Izergin-Korepin characterization of the wavefunctions, we show that these wavefunctions can be expressed as multiparameter deformations of the quantum group deformed Grothendieck polynomials. As a second example, we show that the Izergin-Korepin analysis is effective for analysis of the wavefunctions for a triangular boundary and present the explicit forms of the symmetric functions representing these wavefunctions. As a third example, we apply the method to the elliptic Felderhof model which is a face-type version and an elliptic extension of the trigonometric Felderhof model. We show that the wavefunctions can be expressed as one-parameter deformations of an elliptic analog of the Vandermonde determinant and elliptic symmetric functions.
Comparison of software packages for detecting differential expression in RNA-seq studies
Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L
2015-01-01
RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press. PMID:24300110
Microarray characterization of gene expression changes in blood during acute ethanol exposure
2013-01-01
Background As part of the civil aviation safety program to define the adverse effects of ethanol on flying performance, we performed a DNA microarray analysis of human whole blood samples from a five-time point study of subjects administered ethanol orally, followed by breathalyzer analysis, to monitor blood alcohol concentration (BAC) to discover significant gene expression changes in response to the ethanol exposure. Methods Subjects were administered either orange juice or orange juice with ethanol. Blood samples were taken based on BAC and total RNA was isolated from PaxGene™ blood tubes. The amplified cDNA was used in microarray and quantitative real-time polymerase chain reaction (RT-qPCR) analyses to evaluate differential gene expression. Microarray data was analyzed in a pipeline fashion to summarize and normalize and the results evaluated for relative expression across time points with multiple methods. Candidate genes showing distinctive expression patterns in response to ethanol were clustered by pattern and further analyzed for related function, pathway membership and common transcription factor binding within and across clusters. RT-qPCR was used with representative genes to confirm relative transcript levels across time to those detected in microarrays. Results Microarray analysis of samples representing 0%, 0.04%, 0.08%, return to 0.04%, and 0.02% wt/vol BAC showed that changes in gene expression could be detected across the time course. The expression changes were verified by qRT-PCR. The candidate genes of interest (GOI) identified from the microarray analysis and clustered by expression pattern across the five BAC points showed seven coordinately expressed groups. Analysis showed function-based networks, shared transcription factor binding sites and signaling pathways for members of the clusters. 
These include hematological functions, innate immunity and inflammation functions, metabolic functions expected of ethanol metabolism, and pancreatic and hepatic function. Five of the seven clusters showed links to the p38 MAPK pathway. Conclusions The results of this study provide a first look at changing gene expression patterns in human blood during an acute rise in blood ethanol concentration and its depletion because of metabolism and excretion, and demonstrate that it is possible to detect changes in gene expression using total RNA isolated from whole blood. The analysis approach for this study serves as a workflow to investigate the biology linked to expression changes across a time course and from these changes, to identify target genes that could serve as biomarkers linked to pilot performance. PMID:23883607
Recombinational Cloning Using Gateway and In-Fusion Cloning Schemes
Throop, Andrea L.; LaBaer, Joshua
2015-01-01
The comprehensive study of protein structure and function, or proteomics, depends on the obtainability of full-length cDNAs in species-specific expression vectors and subsequent functional analysis of the expressed protein. Recombinational cloning is a universal cloning technique based on site-specific recombination that is independent of the insert DNA sequence of interest, which differentiates this method from the classical restriction enzyme-based cloning methods. Recombinational cloning enables rapid and efficient parallel transfer of DNA inserts into multiple expression systems. This unit summarizes strategies for generating expression-ready clones using the most popular recombinational cloning technologies, including the commercially available Gateway® (Life Technologies) and In-Fusion® (Clontech) cloning technologies. PMID:25827088
Dong, Chao; Wang, Xiao-li; Ma, Bin-lin
2015-01-01
Aim. Spindle and kinetochore-associated protein 1 (SKA1) is one subtype of SKA; its protein enables spindle microtubules to attach stably to the kinetochore in the middle of mitosis. At present, there are few studies on the relationship between SKA1 expression and tumor development. Methods. In this study, immunohistochemical analysis was used to determine the expression of SKA1 in papillary thyroid carcinoma (PTC) and adjacent tissues. We used quantitative real-time polymerase chain reaction (qRT-PCR) and Western blot analysis to further verify the results. Results. We found that SKA1 expression was significantly higher in PTC tissues than in normal adjacent tissues (P < 0.05). Higher SKA1 expression was significantly correlated with lymph node status (P = 0.005), clinical stage (P = 0.015), and extrathyroid invasion (P = 0.004). Survival analysis showed that PTC patients with high SKA1 expression were more likely to relapse after surgery. Conclusion. High SKA1 expression is predictive of poor prognosis of PTC, implying that SKA1 may be a promising new target for targeted therapies for PTC. PMID:26063960
Evaluation of RNA from human trabecular bone and identification of stable reference genes.
Cepollaro, Simona; Della Bella, Elena; de Biase, Dario; Visani, Michela; Fini, Milena
2018-06-01
The isolation of good quality RNA from tissues is an essential prerequisite for gene expression analysis to study pathophysiological processes. This study evaluated the RNA isolated from human trabecular bone and defined a set of stable reference genes. After pulverization, RNA was extracted with a phenol/chloroform method and then purified using silica columns. The A260/280 ratio, A260/230 ratio, RIN, and ribosomal ratio were measured to evaluate RNA quality and integrity. Moreover, the expression of six candidates was analyzed by qPCR and different algorithms were applied to assess reference gene stability. A good purity and quality of RNA was achieved according to A260/280 and A260/230 ratios, and RIN values. TBP, YWHAZ, and PGK1 were the most stable reference genes that should be used for gene expression analysis. In summary, the method proposed is suitable for gene expression evaluation in human bone and a set of reliable reference genes has been identified. © 2017 Wiley Periodicals, Inc.
Computerized system for recognition of autism on the basis of gene expression microarray data.
Latkowski, Tomasz; Osowski, Stanislaw
2015-01-01
The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes which are strictly associated with autism. The paper presents an application of different methods of gene selection, to select the most representative input attributes for an ensemble of classifiers. The set of classifiers is responsible for distinguishing autism data from the reference class. Simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection combined with a genetic algorithm and SVM classifier have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in early recognition of autism on the basis of gene expression microarray analysis. Copyright © 2014 Elsevier Ltd. All rights reserved.
Darbani, Behrooz; Stewart, C Neal; Noeparvar, Shahin; Borg, Søren
2014-10-20
This report investigates for the first time the potential inter-treatment bias source of cell number for gene expression studies. Cell-number bias can affect gene expression analysis when comparing samples with unequal total cellular RNA content or with different RNA extraction efficiencies. For maximal reliability of analysis, therefore, comparisons should be performed at the cellular level. This could be accomplished using an appropriate correction method that can detect and remove the inter-treatment bias for cell-number. Based on inter-treatment variations of reference genes, we introduce an analytical approach to examine the suitability of correction methods by considering the inter-treatment bias as well as the inter-replicate variance, which allows use of the best correction method with minimum residual bias. Analyses of RNA sequencing and microarray data showed that the efficiencies of correction methods are influenced by the inter-treatment bias as well as the inter-replicate variance. Therefore, we recommend inspecting both of the bias sources in order to apply the most efficient correction method. As an alternative correction strategy, sequential application of different correction approaches is also advised. Copyright © 2014 Elsevier B.V. All rights reserved.
Baldrian, Petr; López-Mondéjar, Rubén
2014-02-01
Molecular methods for the analysis of biomolecules have undergone rapid technological development in the last decade. The advent of next-generation sequencing methods and improvements in instrumental resolution enabled the analysis of complex transcriptome, proteome and metabolome data, as well as a detailed annotation of microbial genomes. The mechanisms of decomposition by model fungi have been described in unprecedented detail by the combination of genome sequencing, transcriptomics and proteomics. The increasing number of available genomes for fungi and bacteria shows that the genetic potential for decomposition of organic matter is widespread among taxonomically diverse microbial taxa, while expression studies document the importance of the regulation of expression in decomposition efficiency. Importantly, high-throughput methods of nucleic acid analysis used for the analysis of metagenomes and metatranscriptomes indicate the high diversity of decomposer communities in natural habitats and their taxonomic composition. Today, the metaproteomics of natural habitats is of interest. In combination with advanced analytical techniques to explore the products of decomposition and the accumulation of information on the genomes of environmentally relevant microorganisms, advanced methods in microbial ecophysiology should increase our understanding of the complex processes of organic matter transformation.
NASA Astrophysics Data System (ADS)
Alekhin, Artem A.; Gorbunova, Elena V.; Chertov, Aleksandr N.; Petuhova, Darya B.
2013-04-01
Owing to the depletion of solid mineral ore reserves and the increasing use of poor and refractory ores in production, minerals are continuously rising in price. Optical sorters from various manufacturers are now well represented on the market for enrichment equipment. These sorters differ substantially from one another in throughput, in the particle size classes of the raw material they process, in the details of their decision algorithms, and in the color model (RGB, YUV, HSB, etc.) chosen to describe the color of the mineral samples being separated. At the same time, there is no method for estimating the dressability of mineral raw materials without a direct semi-industrial test on an existing type of optical sorter, nor is there equipment that implements such a method. Criteria for choosing a particular manufacturer (or type) of optical sorter are also lacking. A direct consequence of this situation is the "opacity" of the color sorting method and its rejection by potential customers. The proposed solution to these problems is to develop a dressability estimation method and to create an optical-electronic system for express analysis of the dressability of mineral raw materials by the color sorting method. This paper describes the structure and operating principles of an experimental model of an optical-electronic system for express analysis of mineral raw materials. It also presents results comparing the proposed optical-electronic system with a real color sorter.
Fox, Bridget C; Devonshire, Alison S; Baradez, Marc-Olivier; Marshall, Damian; Foy, Carole A
2012-08-15
Single cell gene expression analysis can provide insights into development and disease progression by profiling individual cellular responses as opposed to reporting the global average of a population. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is the "gold standard" for the quantification of gene expression levels; however, the technical performance of kits and platforms aimed at single cell analysis has not been fully defined in terms of sensitivity and assay comparability. We compared three kits using purification columns (PicoPure) or direct lysis (CellsDirect and Cells-to-CT) combined with a one- or two-step RT-qPCR approach using dilutions of cells and RNA standards to the single cell level. Single cell-level messenger RNA (mRNA) analysis was possible using all three methods, although the precision, linearity, and effect of lysis buffer and cell background differed depending on the approach used. The impact of using a microfluidic qPCR platform versus a standard instrument was investigated for potential variability introduced by preamplification of template or scaling down of the qPCR to nanoliter volumes using laser-dissected single cell samples. The two approaches were found to be comparable. These studies show that accurate gene expression analysis is achievable at the single cell level and highlight the importance of well-validated experimental procedures for low-level mRNA analysis. Copyright © 2012 Elsevier Inc. All rights reserved.
ProbFAST: Probabilistic functional analysis system tool.
Silva, Israel T; Vêncio, Ricardo Z N; Oliveira, Thiago Y K; Molfetta, Greice A; Silva, Wilson A
2010-03-30
The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges are centered on the meaning of differential gene regulation under distinct biological conditions and can be performed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently MDE analyses are limited to usual methods of differential expression initially designed for paired analysis. We proposed a web platform named ProbFAST for MDE analysis which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulated study revealed that our method gives a better performance when compared to other approaches and when applied to public expression data, we demonstrated its flexibility to obtain relevant genes biologically associated with normal and abnormal biological processes. ProbFAST is a free accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off related to functional information during the evolution of a tumor or tissue differentiation. ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast. PMID:20353576
2012-01-01
Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three-dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic curves and tested on 11 data sets from the Gene Expression Omnibus database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best; on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.
Conclusions The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055
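The "simplest method" that the abstract above contrasts with the DFC test, a plain fold-change cut-off, can be sketched in a few lines of Python. This is not the DFC test itself; the function name, the cut-off of 2.0 and the toy intensities are assumptions for illustration.

```python
from statistics import mean

def fold_change_filter(cond_a, cond_b, cutoff=2.0):
    """Select genes whose ratio of mean intensities (b / a) is at least
    `cutoff` or at most 1 / `cutoff`.

    cond_a, cond_b: dicts mapping gene -> list of replicate intensities.
    Returns a dict gene -> fold change for the selected genes only.
    """
    selected = {}
    for gene in cond_a:
        ratio = mean(cond_b[gene]) / mean(cond_a[gene])
        if ratio >= cutoff or ratio <= 1.0 / cutoff:
            selected[gene] = ratio
    return selected

# Hypothetical replicate intensities for three placeholder genes.
control = {"g1": [100, 110, 90], "g2": [200, 210, 190], "g3": [400, 380, 420]}
treated = {"g1": [310, 290, 300], "g2": [205, 195, 200], "g3": [100, 95, 105]}

selected = fold_change_filter(control, treated)
print(selected)
```

The filter ignores replicate variance entirely, so it returns no p-value or confidence level, which is the limitation the abstract points out.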
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ilgu, Muslum
A detailed study was done of the neomycin-B RNA aptamer to determine its selectivity and binding ability to both neomycin- and kanamycin-class aminoglycosides. A novel method to increase drug concentrations in cells for more efficient killing is described. To test the method, a bacterial model system was adopted and several small RNA molecules that interact with aminoglycosides were cloned downstream of the T7 RNA polymerase promoter in an expression vector. The growth of E. coli expressing the aptamers was then monitored over a 12-hour period. Our analysis indicated that the aptamers helped to increase the intracellular concentration of aminoglycosides, thereby increasing their efficacy.
Sample-space-based feature extraction and class preserving projection for gene expression data.
Wang, Wenjun
2013-01-01
To overcome the high computational complexity and severe matrix singularity that arise when Principal Component Analysis (PCA) and Fisher's Linear Discriminant Analysis (LDA) are used for feature extraction in high-dimensional data, sample-space-based feature extraction is presented. It transfers the computation of feature extraction from gene space to sample space by representing the optimal transformation vector as a weighted sum of samples. The technique is used to implement PCA, LDA, and Class Preserving Projection (CPP), a newly proposed method for discriminant feature extraction, and experimental results on gene expression data demonstrate the effectiveness of the method.
A catalog of automated analysis methods for enterprise models.
Florez, Hector; Sánchez, Mario; Villalobos, Jorge
2016-01-01
Enterprise models are created for documenting and communicating the structure and state of Business and Information Technologies elements of an enterprise. After models are completed, they are mainly used to support analysis. Model analysis is an activity typically based on human skills and due to the size and complexity of the models, this process can be complicated and omissions or miscalculations are very likely. This situation has fostered the research of automated analysis methods, for supporting analysts in enterprise analysis processes. By reviewing the literature, we found several analysis methods; nevertheless, they are based on specific situations and different metamodels; then, some analysis methods might not be applicable to all enterprise models. This paper presents the work of compilation (literature review), classification, structuring, and characterization of automated analysis methods for enterprise models, expressing them in a standardized modeling language. In addition, we have implemented the analysis methods in our modeling tool.
The antagonistic effect between STAT1 and Survivin and its clinical significance in gastric cancer.
Deng, Hao; Zhen, Hongyan; Fu, Zhengqi; Huang, Xuan; Zhou, Hongyan; Liu, Lijiang
2012-01-01
In previous studies, we observed that STAT1 and Survivin correlated negatively with gastric cancer tissues, and that the functions of the IFN-γ-STAT1 pathway and Survivin in gastric cancer are the same as those reported for other types of cancer. In this study, the SGC7901 gastric cancer cell line and 83 gastric cancer specimens were used to confirm the relationship between STAT1 and Survivin, as well as the clinical significance of this relationship in gastric cancer. IFN-γ and STAT1 and Survivin antisense oligonucleotides (ASONs) were used to knock down the expression in SGC7901 cells. The protein expression of STAT1 and Survivin was tested by immunocytochemical and image analysis methods. A gastric cancer tissue microarray was prepared and tested by immunohistochemical methods. Data were analyzed by Spearman's rank correlation analysis, the χ² test and Cox's multivariate regression analysis. Upon knockdown of IFN-γ, STAT1 and Survivin expression by ASON in the SGC7901 cell line, an antagonistic effect was observed between STAT1 and Survivin. In gastric cancer tissues, STAT1 showed a negative correlation with depth of invasion (p<0.05) in gastric cancer tissues exhibiting a negative Survivin protein expression. Furthermore, in tissues exhibiting a negative STAT1 protein expression, Survivin correlated negatively with N stage (p<0.05). Pathological and molecular markers were used to conduct Cox's multivariate regression analysis, and depth of invasion and N stage were found to be prognostic factors (p<0.05). On the other hand, in tissues exhibiting a negative Survivin protein expression, Cox's multivariate regression analysis revealed that the differentiation type and STAT1 protein expression were prognostic factors (p<0.05). There is an antagonistic effect between STAT1 and Survivin in gastric cancer, and this antagonistic effect is of clinical significance in gastric cancer.
GC-Content Normalization for RNA-Seq Data
2011-01-01
Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
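To make the idea of within-lane, gene-level GC-content normalization more concrete, here is a minimal Python sketch of one simple variant: genes are grouped into GC-content strata and the counts in each stratum are rescaled so that the stratum median matches the overall median. This is an illustrative assumption about what such a procedure can look like, not the EDASeq implementation; the function name, the number of strata and the toy counts are hypothetical.

```python
from statistics import median

def gc_median_normalize(counts, gc, n_bins=2):
    """Within-sample GC normalization by median scaling (illustrative).

    counts: dict gene -> read count in one lane/sample.
    gc:     dict gene -> GC fraction of the gene.
    Genes are split into `n_bins` strata by GC rank; each stratum is
    rescaled so its median count equals the global median count.
    """
    genes = sorted(counts, key=lambda g: gc[g])    # order genes by GC content
    global_median = median(counts.values())
    bin_size = -(-len(genes) // n_bins)            # ceiling division
    normalized = {}
    for start in range(0, len(genes), bin_size):
        stratum = genes[start:start + bin_size]
        stratum_median = median(counts[g] for g in stratum)
        scale = global_median / stratum_median if stratum_median > 0 else 1.0
        for g in stratum:
            normalized[g] = counts[g] * scale
    return normalized

# Hypothetical lane in which GC-rich genes receive ten times more reads.
counts = {"a": 10, "b": 20, "c": 30, "d": 100, "e": 200, "f": 300}
gc = {"a": 0.30, "b": 0.35, "c": 0.40, "d": 0.55, "e": 0.60, "f": 0.65}

normalized = gc_median_normalize(counts, gc)
print(normalized)
```

In the toy data the low-GC and high-GC strata end up with the same median after scaling, so the GC-dependent offset no longer separates them. Between-lane normalization, which the abstract applies afterwards, is a separate step not shown here.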
2012-01-01
Background Identification of active causal regulators is a crucial problem in understanding the mechanisms of disease or finding drug targets. Methods that infer causal regulators directly from primary data have been proposed and successfully validated in some cases. These methods necessarily require very large sample sizes or a mix of different data types. Recent studies have shown that prior biological knowledge can successfully boost a method's ability to find regulators. Results We present a simple data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis and a specific type of literature-derived causal relationships. Instead of investigating the co-expression level between regulators and their regulatees, we focus on the coherence of the regulatees of a regulator. Using simulated datasets we show that our method performs very well at recovering even weak regulatory relationships with a low false discovery rate. Using three separate real biological datasets we were able to recover well-known, as well as yet undescribed, active regulators for each disease population. The results are presented as a rank-ordered list of regulators and reveal both single and higher-order regulatory relationships. Conclusions CSA is an intuitive data-driven way of selecting directed perturbation experiments that are relevant to a disease population of interest and represent a starting point for further investigation. Our findings demonstrate that combining co-expression analysis on regulatee sets with a literature-derived network can successfully identify causal regulators and help develop possible hypotheses to explain disease progression. PMID:22443377
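The central quantity in the abstract above, the coherence of a regulator's regulatees, can be illustrated with a short Python sketch. Here coherence is taken to be the mean absolute pairwise Pearson correlation among the regulatees; that definition, the regulator names and the expression values are assumptions for illustration and may differ from the score used by CSA.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coherence(expr, regulatees):
    """Mean absolute pairwise correlation among a regulator's regulatees.

    ASSUMPTION: absolute values are used so that targets moving in
    opposite directions still count as coherent.
    """
    pairs = list(combinations(regulatees, 2))
    return sum(abs(pearson(expr[a], expr[b])) for a, b in pairs) / len(pairs)

# Hypothetical expression of six genes across five samples.
expr = {
    "g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5], "g5": [2, 7, 1, 8, 2], "g6": [1, 3, 2, 5, 4],
}
# Hypothetical literature-derived regulator -> regulatee sets.
regulators = {"REG_X": ["g1", "g2", "g3"], "REG_Y": ["g4", "g5", "g6"]}

scores = {reg: coherence(expr, targets) for reg, targets in regulators.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

To judge whether a score is unusually high, one would still need a null distribution, for example the coherence of randomly drawn gene sets of the same size; that step is omitted from the sketch.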
Zheng, Tingting; Ni, Yueqiong; Li, Jun; Chow, Billy K. C.; Panagiotou, Gianni
2017-01-01
Background: A range of computational methods that rely on the analysis of genome-wide expression datasets have been developed and successfully used for drug repositioning. The success of these methods is based on the hypothesis that introducing a factor (in this case, a drug molecule) that could reverse the disease gene expression signature will lead to a therapeutic effect. However, it has also been shown that globally reversing the disease expression signature is not a prerequisite for drug activity. On the other hand, the basic idea of significant anti-correlation in expression profiles could have great value for establishing diet-disease associations and could provide new insights into the role of dietary interventions in disease. Methods: We performed an integrated analysis of publicly available gene expression profiles for foods, diseases and drugs, by calculating pairwise similarity scores for diet and disease gene expression signatures and characterizing their topological features in protein-protein interaction networks. Results: We identified 485 diet-disease pairs where diet could positively influence disease development and 472 pairs where specific diets should be avoided in a disease state. Multiple lines of evidence suggest that orange, whey and coconut fat could be beneficial for psoriasis, lung adenocarcinoma and macular degeneration, respectively. On the other hand, a fructose-rich diet should be restricted in patients with chronic intermittent hypoxia and ovarian cancer. Since humans normally do not consume foods in isolation, we also applied different algorithms to predict synergism; as a result, 58 food pairs were predicted. Interestingly, the diets identified as anti-correlated with diseases showed a topological proximity to the disease proteins similar to that of the corresponding drugs.
Conclusions: In conclusion, we provide a computational framework for establishing diet-disease associations and additional information on the role of diet in disease development. Due to the complexity of analyzing the food composition and eating patterns of individuals our in silico analysis, using large-scale gene expression datasets and network-based topological features, may serve as a proof-of-concept in nutritional systems biology for identifying diet-disease relationships and subsequently designing dietary recommendations. PMID:29033850
Xiao, Xiaolin; Moreno-Moral, Aida; Rotival, Maxime; Bottolo, Leonardo; Petretto, Enrico
2014-01-01
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.
Soneson, Charlotte; Lilljebjörn, Henrik; Fioretos, Thoas; Fontes, Magnus
2010-04-15
With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia. Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. 
Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA. We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowski and Czerwinski, Blanchard et al. and Ritchie were fitted to the experimental kinetic data for the adsorption of malachite green onto activated carbon by non-linear and linear methods. The non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowski and Czerwinski and the Ritchie pseudo second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of the experimental data to Ho's pseudo second order expression by linear and non-linear regression showed that Ho's pseudo second order model was a better kinetic expression than the other pseudo second order kinetic expressions. The amounts of dye adsorbed at equilibrium, q(e), were predicted from Ho's pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich-Peterson expressions by both linear and non-linear methods to obtain the pseudo isotherms. The best fitting pseudo isotherms were found to be the Langmuir and Redlich-Peterson isotherms. The Redlich-Peterson isotherm reduces to the Langmuir isotherm when the constant g equals unity.
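For orientation, Ho's pseudo second order expression is commonly written as q_t = k·q_e²·t / (1 + k·q_e·t), which can be rearranged to the linear form t/q_t = 1/(k·q_e²) + t/q_e. The sketch below fits the linear form by ordinary least squares. The data are synthetic and noise-free (generated from the model itself, not taken from the paper), and the function names are hypothetical.

```python
def ho_qt(t, qe, k):
    """Ho pseudo second order uptake q_t at time t (t > 0)."""
    return k * qe ** 2 * t / (1 + k * qe * t)


def fit_ho_linear(times, uptakes):
    """Fit t/q_t = 1/(k*qe^2) + t/qe by ordinary least squares.

    Returns (qe, k). The slope of the line is 1/qe and the intercept is
    1/(k*qe^2), so k = slope^2 / intercept.
    """
    ys = [t / q for t, q in zip(times, uptakes)]
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_y - slope * mean_t
    return 1 / slope, slope ** 2 / intercept


def sum_squared_error(times, uptakes, qe, k):
    """Objective that a non-linear fit would minimise directly in q_t."""
    return sum((q - ho_qt(t, qe, k)) ** 2 for t, q in zip(times, uptakes))


# Synthetic example with assumed parameters qe = 50 and k = 0.002.
times = [5, 10, 20, 40, 80]
uptakes = [ho_qt(t, 50.0, 0.002) for t in times]
qe_hat, k_hat = fit_ho_linear(times, uptakes)
print(qe_hat, k_hat, sum_squared_error(times, uptakes, qe_hat, k_hat))
```

On noise-free data both routes recover the same parameters. With real, noisy measurements the linearisation distorts the error structure, which is one plausible reason a direct non-linear fit of `sum_squared_error` can perform better, as the abstract reports.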
Fundamental limits on dynamic inference from single-cell snapshots
Weinreb, Caleb; Tusi, Betsabeh K.; Socolovsky, Merav
2018-01-01
Single-cell expression profiling reveals the molecular states of individual cells with unprecedented detail. Because these methods destroy cells in the process of analysis, they cannot measure how gene expression changes over time. However, some information on dynamics is present in the data: the continuum of molecular states in the population can reflect the trajectory of a typical cell. Many methods for extracting single-cell dynamics from population data have been proposed. However, all such attempts face a common limitation: for any measured distribution of cell states, there are multiple dynamics that could give rise to it, and by extension, multiple possibilities for underlying mechanisms of gene regulation. Here, we describe the aspects of gene expression dynamics that cannot be inferred from a static snapshot alone and identify assumptions necessary to constrain a unique solution for cell dynamics from static snapshots. We translate these constraints into a practical algorithmic approach, population balance analysis (PBA), which makes use of a method from spectral graph theory to solve a class of high-dimensional differential equations. We use simulations to show the strengths and limitations of PBA, and then apply it to single-cell profiles of hematopoietic progenitor cells (HPCs). Cell state predictions from this analysis agree with HPC fate assays reported in several papers over the past two decades. By highlighting the fundamental limits on dynamic inference faced by any method, our framework provides a rigorous basis for dynamic interpretation of a gene expression continuum and clarifies best experimental designs for trajectory reconstruction from static snapshot measurements. PMID:29463712
2014-12-26
additive value function, which assumes mutual preferential independence (Gregory S. Parnell, 2013). In other words, this method can be used if the... additive value function method to calculate the aggregate value of multiple objectives. Step 9 : Sensitivity Analysis Once the global values are...gravity metric, the additive method will be applied using equal weights for each axis value function. Pilot Satisfaction (Usability) As expressed
NASA Astrophysics Data System (ADS)
Irshad, Humayun; Oh, Eun-Yeong; Schmolze, Daniel; Quintana, Liza M.; Collins, Laura; Tamimi, Rulla M.; Beck, Andrew H.
2017-02-01
The assessment of protein expression in immunohistochemistry (IHC) images provides important diagnostic, prognostic and predictive information for guiding cancer diagnosis and therapy. Manual scoring of IHC images represents a logistical challenge, as the process is labor intensive and time consuming. Over the last decade, computational methods have been developed to enable the application of quantitative methods for the analysis and interpretation of protein expression in IHC images. These methods have not yet replaced manual scoring for the assessment of IHC in the majority of diagnostic laboratories and in many large-scale research studies. An alternative approach is crowdsourcing the quantification of IHC images to an undefined crowd. The aim of this study is to quantify IHC images for labeling of ER status with two different crowdsourcing approaches, image-labeling and nuclei-labeling, and compare their performance with automated methods. Crowdsourcing-derived scores obtained greater concordance with the pathologist interpretations for both image-labeling and nuclei-labeling tasks (83% and 87%), as compared to the pathologist concordance achieved by the automated method (81%) on 5,338 TMA images from 1,853 breast cancer patients. This analysis shows that crowdsourcing the scoring of protein expression in IHC images is a promising new approach for large scale cancer molecular pathology studies.
Correlation of EGFR expression, gene copy number and clinicopathological status in NSCLC.
Gaber, Rania; Watermann, Iris; Kugler, Christian; Reinmuth, Nils; Huber, Rudolf M; Schnabel, Philipp A; Vollmer, Ekkehard; Reck, Martin; Goldmann, Torsten
2014-09-17
Epidermal Growth Factor Receptor (EGFR) targeting therapies are currently of great relevance for the treatment of lung cancer. For this reason, in addition to mutational analysis, immunohistochemistry (IHC) of EGFR in lung cancer has been discussed as a basis for deciding on appropriate therapeutic strategies. The aim of this study was to standardize EGFR-expression methods for the selection of patients who might benefit from EGFR targeting therapies. As a starting point of a broad investigation aimed at elucidating the expression of EGFR on different biological levels, four EGFR-specific antibodies were analyzed for potential differences in expression levels by IHC and correlated with fluorescence in situ hybridization (FISH) analysis and clinicopathological data. 206 tumor tissues were analyzed in a tissue microarray format employing immunohistochemistry with four different antibodies, including Dako PharmDx kit (clone 2-18C9), clone 31G7, clone 2.1E1 and clone SP84, using three different scoring methods. Protein expression was compared to FISH utilizing two different probes. EGFR protein expression determined by IHC with Dako PharmDx kit, clone 31G7 and clone 2.1E1 (p ≤ 0.05) correlated significantly with both FISH probes independently of the three scoring methods; the best correlation was shown for 31G7 using the scoring method that defined EGFR positivity when ≥ 10% of the tumor cells show membranous staining of moderate and severe intensity (p=0.001). Overall, our data show differences in EGFR expression determined by IHC, depending on the applied antibody. The highest concordance with FISH is shown for antibody clone 31G7, evaluated with score B (p=0.001). On this account, this antibody clone might be utilized for standard evaluation of EGFR expression by IHC. The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/13000_2014_165.
QuASAR: quantitative allele-specific analysis of reads
Harvey, Chris T.; Moyerbrailean, Gregory A.; Davis, Gordon O.; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger
2015-01-01
Motivation: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. Results: We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability and implementation: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu Supplementary information: Supplementary Material is available at Bioinformatics online. PMID:25480375
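The conventional ASE test mentioned above (read counts at a heterozygous site tested against a 1:1 allelic ratio) can be sketched as an exact two-sided binomial test. This is only the baseline the abstract describes, not QuASAR itself: it assumes the genotype is known to be heterozygous and ignores base-call error and over-dispersion, which QuASAR models. The function name is hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial p-value for the null of a 1:1 allelic ratio.

    ref_reads, alt_reads -- RNA-seq read counts for the two alleles at one
    heterozygous site (heterozygosity is assumed, not inferred).
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    # Under p = 0.5 the binomial distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


print(ase_binomial_pvalue(5, 5))    # balanced counts, no evidence of ASE
print(ase_binomial_pvalue(10, 0))   # fully imbalanced counts
```

Real read counts are usually more variable than a binomial allows, so a plain binomial test tends to overstate significance. That is the motivation for the over-dispersion parameters the abstract mentions.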
Hu, Pingsha; Maiti, Tapabrata
2011-01-01
Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181
Global Study of the Simple Pendulum by the Homotopy Analysis Method
ERIC Educational Resources Information Center
Bel, A.; Reartes, W.; Torresi, A.
2012-01-01
Techniques are developed to find all periodic solutions in the simple pendulum by means of the homotopy analysis method (HAM). This involves the solution of the equations of motion in two different coordinate representations. Expressions are obtained for the cycles and periods of oscillations with a high degree of accuracy in the whole range of…
Using Peptide-Level Proteomics Data for Detecting Differentially Expressed Proteins.
Suomi, Tomi; Corthals, Garry L; Nevalainen, Olli S; Elo, Laura L
2015-11-06
The expression of proteins can be quantified in a high-throughput manner using different types of mass spectrometers. In recent years, label-free methods for determining protein abundance have emerged. Although the expression is initially measured at the peptide level, a common approach is to combine the peptide-level measurements into protein-level values before differential expression analysis. However, this simple combination is prone to inconsistencies between peptides and may lose valuable information. To this end, we introduce here a method for detecting differentially expressed proteins by combining peptide-level expression-change statistics. Using controlled spike-in experiments, we show that the approach of averaging peptide-level expression changes yields more accurate lists of differentially expressed proteins than does the conventional protein-level approach. This is particularly true when there are only a few replicate samples or the differences between the sample groups are small. The proposed technique is implemented in the Bioconductor package PECA, and it can be downloaded from http://www.bioconductor.org.
[Diagnosis of tropical malaria by express-methods].
Popov, A F; Nikiforov, N D; Ivanis, V A; Barkun, S P; Sanin, B I; Fed'kina, L I
2004-01-01
Examination of a thick blood film and of a blood smear for the presence of plasmodia is a classic and indisputable diagnostic test for tropical malaria. However, express methods based on enzyme immunoassay have been introduced into health-care practice, primarily in developing and underdeveloped countries. Diagnosis of tropical malaria with these methods enables rapid and reliable detection of P. falciparum in blood in non-laboratory settings. This is important because delayed diagnosis of tropical malaria increases the risk of a lethal outcome.
Expression of SLP-2 Was Associated with Invasion of Esophageal Squamous Cell Carcinoma
Cao, Wenfeng; Zhang, Bin; Ding, Fang; Zhang, Weiran; Sun, Baocun; Liu, Zhihua
2013-01-01
Introduction Stomatin-like protein 2 (SLP-2), a member of the Stomatin superfamily, has been identified as an oncogenic-related protein and found to be up-regulated in multiple cancers. Nonetheless, the expression pattern and regulation of SLP-2 in human esophageal squamous cell carcinoma (ESCC) remain unexplored. Methods Immunohistochemistry and immunofluorescence staining analyses were performed to show SLP-2 expression and location. RNAi was used to inhibit specific protein expression. A Transwell assay was performed to investigate the invasive capability of cells. RT-PCR and Western blot analysis were used to detect mRNA and protein expression levels. Results Immunohistochemical analysis showed that SLP-2 was up-regulated in the invasive front compared with central cancer tissue in ESCC. Inhibition of SLP-2 by SLP-2 siRNA decreased the invasive capability of ESCC cells in an MMP-2-dependent manner. Up-regulation of SLP-2 was effectively abrogated by either of the ERK1/2 inhibitors PD98059 or U0126, whereas treatment with either of the AKT inhibitors LY294002 or MK-2206 showed no effect. Thus, the regulation of SLP-2 involved activation of the MAPK/ERK pathway. Conclusions We found that PMA/EGF could induce the up-regulated expression of SLP-2, probably through activating ERK signalling. The current study suggests that SLP-2 may represent an important molecular hallmark that is clinically relevant to the invasion of ESCC. PMID:23667687
PD-L1 expression as poor prognostic factor in patients with non-squamous non-small cell lung cancer
Zheng, Xiaobin; Li, Zhanyu; Sun, Tiantian; Li, Jie; Wang, Shuncong; Zhou, Xiuling; Sun, Hongliu; Cheng, Zhibin; Zhang, Hongyu; Ma, Haiqing
2017-01-01
Objectives The role of programmed cell death ligand 1 (PD-L1) in non-small cell lung cancer (NSCLC), especially according to histologic type, remains controversial. The purpose of this study was to assess PD-L1 expression and its association with overall survival (OS) and clinicopathologic characteristics in NSCLC. Materials and methods Formalin-fixed paraffin-embedded specimens were obtained from 108 patients with surgically resected primary NSCLC. PD-L1 expression was assessed via immunohistochemistry using a histochemistry score system. The relationship between OS or clinicopathologic characteristics and PD-L1 expression was evaluated via the Kaplan-Meier method and Cox proportional hazards model, respectively. Results Of 108 NSCLC specimens, 44 had high PD-L1 expression, which was highly associated with histologic type (p = 0.003). Patients without PD-L1 expression had remarkably longer OS than those with PD-L1 expression (median OS: 96 months vs. 33 months, p < 0.001). In the subgroup analysis of non-squamous cell carcinoma, OS was more favorable in those without PD-L1 expression than in those with PD-L1 expression (median OS: 113 months vs. 37 months, p < 0.001). Multivariate analysis revealed that PD-L1 expression (95% confidence interval 1.459-4.520, p < 0.001), male sex and higher tumor-node-metastasis stage were significantly correlated with shorter OS. Conclusions This study demonstrated that PD-L1 expression is an independent prognostic factor for poor survival in NSCLC patients, especially those with non-squamous NSCLC. PMID:28938570
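The Kaplan-Meier method used above estimates survival as a running product over observed event times, with censored patients leaving the risk set without counting as deaths. The sketch below shows that estimator only. The follow-up data are invented for illustration and are not from the study, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    survival, curve = 1.0, []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for tm in times if tm >= t)  # still under observation at t
        deaths = sum(1 for tm, e in zip(times, events) if tm == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data: five patients, two of them censored.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Median OS, as reported in the abstract, would be read off such a curve as the first time at which S(t) falls to 0.5 or below. Comparing two groups additionally requires a test such as the log-rank test, which is not shown here.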
Qasim, Ban J.; Ali, Hussam H.; Hussein, Alaa G.
2013-01-01
Background/Aim: To evaluate the immunohistochemical expression of matrix metalloproteinase-7 (MMP-7) in colorectal adenomas, and to correlate this expression with different clinicopathological parameters. Patients and Methods: The study was retrospectively designed. Thirty-three paraffin blocks from patients with colorectal adenoma and 20 samples of non-tumorous colonic tissue taken as a control group were included in the study. MMP-7 expression was assessed by immunohistochemistry. The scoring of immunohistochemical staining was conducted utilizing a specified automated cellular image analysis system (Digimizer). Results: The frequency of positive immunohistochemical expression of MMP-7 was significantly higher in the adenoma group than in the control group (45.45% versus 10%) (P value < 0.001). Strong MMP-7 staining was mainly seen in adenoma cases (30.30%) compared with control (0%); the difference was significant (P < 0.001). The three digital parameters of MMP-7 immunohistochemical expression (Area (A), Number of objects (N), and intensity (I)) were significantly higher in adenoma than in control. Mean A and I of MMP-7 showed a significant correlation with large adenoma size (≥ 1 cm) (P < 0.05); a significant positive correlation of the three digital parameters (A, N, and I) of MMP-7 expression with villous configuration and severe dysplasia in colorectal adenoma was also identified (P < 0.05). Conclusion: MMP-7 plays an important role in the growth and malignant conversion of colorectal adenomas, as it is more likely to be expressed in advanced colorectal adenomatous polyps with large size, severe dysplasia and villous histology. The use of an automated cellular image analysis system (Digimizer) to quantify immunohistochemical staining yields more consistent assay results, converts a semi-quantitative assay to a truly quantitative assay, and improves assay objectivity and reproducibility. PMID:23319034
Marko, Nicholas F.; Weil, Robert J.
2012-01-01
Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863
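A central moments analysis of the kind described above comes down to computing skewness and kurtosis from the third and fourth central moments. The sketch below uses the simple population (biased) estimators, which is an assumption, since the abstract does not say which estimators or tests the authors applied. The function names are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardised moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical expression values: a symmetric set and a right-skewed set.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

For heavy-tailed expression data of the kind the abstract reports, one would expect both statistics to sit well away from zero. Whether a given departure is statistically significant needs a formal test, which this sketch does not include.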
ExAtlas: An interactive online tool for meta-analysis of gene expression data.
Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H
2015-12-01
We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
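Two of the standard meta-analysis options listed above, Fisher's method and the fixed-effects model, are compact enough to sketch. This is a textbook illustration under the usual independence assumptions and makes no claim about how ExAtlas implements them. The function names and inputs are hypothetical.

```python
from math import exp, factorial, log


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # this is X / 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))


def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect and its variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical inputs: two studies of the same gene.
print(fisher_combined_pvalue([0.05, 0.05]))
print(fixed_effect([1.0, 3.0], [1.0, 1.0]))
```

Fisher's method combines evidence without regard to the direction of effect, whereas the fixed-effects estimate pools signed effect sizes. The two can therefore disagree when studies point in opposite directions.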
BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis.
Wang, Duolin; Wang, Juexin; Jiang, Yuexu; Liang, Yanchun; Xu, Dong
2017-02-03
Comparing the gene-expression profiles between biological conditions is useful for understanding gene regulation underlying complex phenotypes. Along this line, analysis of differential co-expression (DC) has gained attention in the recent years, where genes under one condition have different co-expression patterns compared with another. We developed an R package Bayes Factor approach for Differential Co-expression Analysis (BFDCA) for DC analysis. BFDCA is unique in integrating various aspects of DC patterns (including Shift, Cross, and Re-wiring) into one uniform Bayes factor. We tested BFDCA using simulation data and experimental data. Simulation results indicate that BFDCA outperforms existing methods in accuracy and robustness of detecting DC pairs and DC modules. Results of using experimental data suggest that BFDCA can cluster disease-related genes into functional DC subunits and estimate the regulatory impact of disease-related genes well. BFDCA also achieves high accuracy in predicting case-control phenotypes by using significant DC gene pairs as markers. BFDCA is publicly available at http://dx.doi.org/10.17632/jdz4vtvnm3.1. Copyright © 2016 Elsevier Ltd. All rights reserved.
The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison
Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth
2006-01-01
Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. 
In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497
Multiscale Embedded Gene Co-expression Network Analysis
Song, Won-Min; Zhang, Bin
2015-01-01
Gene co-expression network analysis has been shown effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computation complexity O(|V|^3), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma. PMID:26618778
A robust two-way semi-linear model for normalization of cDNA microarray data
Wang, Deli; Huang, Jian; Xie, Hehuang; Manzella, Liliana; Soares, Marcelo Bento
2005-01-01
Background Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values. Methods We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach. Results The simulation results show that the proposed method performs better than the LOWESS normalization method in terms of mean square errors for estimated gene effects. The results of analysis of the real data set also show that the proposed method yields more consistent results between the direct and the indirect comparisons and also can detect more differentially expressed genes than the LOWESS method. Conclusions Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods. PMID:15663789
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
Flow cytometric monitoring of hormone receptor expression in human solid tumors
NASA Astrophysics Data System (ADS)
Krishan, Awtar
2002-05-01
Hormone receptor expression in human breast and prostate tumors is of diagnostic and therapeutic importance. With the availability of anti-estrogen, androgen and progesterone antibodies, immunohistochemistry has become a standard tool for determination of receptor expression in human tumor biopsies. However, this method is dependent on examination of a small number of cells under a microscope and the data obtained in most cases is not quantitative. As most of the commercially used anti-hormone antibodies have nuclear specificity, we have developed methods for isolation and antigen unmasking of nuclei from formalin fixed/paraffin embedded archival human tumors. After immunostaining with the antibodies and propidium iodide (for DNA content and cell cycle analysis), nuclei are analyzed by multiparametric laser flow cytometry for hormone receptor expression, DNA content, aneuploidy and cell cycle determination. These multiparametric methods are especially important for retrospective studies seeking to correlate hormone receptor expression with clinical response to anti-hormonal therapy of human breast and prostate tumors.
Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho
2015-10-28
Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation, but for RNA-Seq of a 76-nucleotide sequence, it showed lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than with RSEM, which in turn was better than not using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without applying abundance estimation methods, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
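The evaluation described above (normalize read counts, then rank-correlate them with qRT-PCR values) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names `rpkm`, `ranks` and `spearman` are hypothetical, and the counts, transcript lengths and qRT-PCR values are invented toy numbers. RPKM is taken in its usual form, reads per kilobase of transcript per million mapped reads.

```python
from statistics import mean

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy data (assumed, not from the MAQC project): four genes.
counts = [100, 400, 250, 250]       # raw read counts (RC)
lengths = [1000, 2000, 500, 5000]   # transcript lengths in bp
qpcr = [1.0, 2.5, 6.0, 0.4]         # hypothetical qRT-PCR values

rpkm_values = rpkm(counts, lengths)
rho_raw = spearman(counts, qpcr)
rho_rpkm = spearman(rpkm_values, qpcr)
print(rho_raw, rho_rpkm)
```

In this toy case the length correction changes the gene ranking, so the two correlations differ; with real data the direction and size of the difference depend on read length and alignment accuracy, as the abstract reports.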
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian
2015-01-01
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize a boosting-type method to handle the high dimensionality as well as to model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
RNA sample preparation applied to gene expression profiling for the horse biological passport.
Bailly-Chouriberry, Ludovic; Baudoin, Florent; Cormant, Florence; Glavieux, Yohan; Loup, Benoit; Garcia, Patrice; Popot, Marie-Agnès; Bonnaire, Yves
2017-09-01
The improvement of doping control is an ongoing race. Techniques to fight doping are usually based on the direct detection of drugs or their metabolites by analytical methods such as chromatography hyphenated to mass spectrometry after ad hoc sample preparation. Nowadays, omic methods constitute an attractive development and advances have been achieved particularly by application of molecular biology tools for detection of anabolic androgenic steroids (AAS), erythropoiesis-stimulating agent (ESA), or to control human growth hormone misuse. These interesting results across different animal species have suggested that modification of gene expression offers promising new methods of improving the window of detection of banned substances by targeting their effects on blood cell gene expression. In this context, the present study describes the possibility of using a modified version of the dedicated Human IVD (in vitro Diagnostics) PAXgene® Blood RNA Kit for horse gene expression analysis in blood collected on PAXgene® tubes applied to the horse biological passport. The commercial kit was only approved for human blood samples and has required an optimization of specific technical requirements for equine blood samples. Improvements and recommendations were achieved for sample collection, storage and RNA extraction procedure. Following these developments, RNA yield and quality were demonstrated to be suitable for downstream gene expression analysis by qPCR techniques. Copyright © 2017 John Wiley & Sons, Ltd.
Mining the archives: a cross-platform analysis of gene ...
Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to the pseudo-second order models of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. The non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data to Ho's pseudo-second order expression by the linear and non-linear regression methods showed that Ho's pseudo-second order model was a better kinetic expression when compared to the other pseudo-second order kinetic expressions.
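The contrast between linear and non-linear fitting can be illustrated with the commonly used form of Ho's pseudo-second order model, q_t = k·q_e²·t / (1 + k·q_e·t), whose linearized form is t/q_t = 1/(k·q_e²) + t/q_e. The sketch below is an assumption-laden illustration, not the paper's procedure: the function names are hypothetical, the data are synthetic (not the safranin measurements), and the non-linear step is a simple damped Gauss-Newton iteration chosen to avoid external libraries.

```python
def pso(t, qe, k):
    """Ho's pseudo-second order uptake q_t for equilibrium uptake qe and rate constant k."""
    return k * qe ** 2 * t / (1 + k * qe * t)

def sse(ts, qs, qe, k):
    """Sum of squared errors in q_t (the untransformed scale)."""
    return sum((q - pso(t, qe, k)) ** 2 for t, q in zip(ts, qs))

def fit_linear(ts, qs):
    """Least squares on t/q_t versus t: slope = 1/qe, intercept = 1/(k*qe^2)."""
    ys = [t / q for t, q in zip(ts, qs)]
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    intercept = my - slope * mt
    return 1 / slope, slope ** 2 / intercept  # (qe, k)

def fit_nonlinear(ts, qs, qe, k, iterations=50):
    """Damped Gauss-Newton on the untransformed residuals, started at (qe, k)."""
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for t, q in zip(ts, qs):
            d = 1 + k * qe * t
            j1 = k * t * qe * (2 + k * qe * t) / d ** 2  # d q_t / d qe
            j2 = qe ** 2 * t / d ** 2                    # d q_t / d k
            r = q - pso(t, qe, k)
            a11 += j1 * j1; a12 += j1 * j2; a22 += j2 * j2
            b1 += j1 * r; b2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0:
            break
        d_qe = (a22 * b1 - a12 * b2) / det
        d_k = (a11 * b2 - a12 * b1) / det
        current, step = sse(ts, qs, qe, k), 1.0
        while step > 1e-8:  # halve the step until the error decreases
            new_qe, new_k = qe + step * d_qe, k + step * d_k
            if new_qe > 0 and new_k > 0 and sse(ts, qs, new_qe, new_k) < current:
                qe, k = new_qe, new_k
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
    return qe, k

# Synthetic data: true qe = 50, k = 0.002, plus small fixed perturbations.
ts = [5, 10, 20, 40, 80, 160]
noise = [0.4, -0.3, 0.5, -0.4, 0.3, -0.2]
qs = [pso(t, 50.0, 0.002) + e for t, e in zip(ts, noise)]

qe_lin, k_lin = fit_linear(ts, qs)
qe_nl, k_nl = fit_nonlinear(ts, qs, qe_lin, k_lin)
print("linear:    ", qe_lin, k_lin, sse(ts, qs, qe_lin, k_lin))
print("non-linear:", qe_nl, k_nl, sse(ts, qs, qe_nl, k_nl))
```

The linear fit minimizes error in t/q_t, which reweights the observations, whereas the non-linear fit minimizes error in q_t directly. Because the non-linear routine only accepts steps that lower the squared error, its error on the original scale is never worse than that of the linear starting point.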
Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai
2017-01-01
Abstract Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. PMID:27899656
Zhou, Xiao-liang; Shi, Pei-ji; Wang, Hao
2011-01-01
To prepare the RGD4CβL fusion protein using a prokaryotic expression system and evaluate its biological activity. The RGD4CβL gene was cloned into pColdII to construct a β-lactamase prokaryotic expression vector. After transformation, the recombinant vector was induced by IPTG to express the recombinant protein RGD4CβL in E. coli BL(DE3). The recombinant protein was purified by Ni-NTA resin under denaturing conditions and then dialyzed to renature. The tumor cell targeting ability of the recombinant protein was analyzed by flow cytometric analysis. After cleavage and purification, the β-lactamase moiety showed the expected size of 42 000 on Tricine-SDS-PAGE, and was further confirmed by Western blotting. Based on flow cytometric analysis, the purified protein specifically targeted the breast cancer cell line MCF-7. This research successfully established a method for prokaryotic expression and purification of β-lactamase. These results suggest the potential use of the protein as an agent for ADEPT.
Li, Qi-Gang; He, Yong-Han; Wu, Huan; Yang, Cui-Ping; Pu, Shao-Yan; Fan, Song-Qing; Jiang, Li-Ping; Shen, Qiu-Shuo; Wang, Xiao-Xiong; Chen, Xiao-Qiong; Yu, Qin; Li, Ying; Sun, Chang; Wang, Xiangting; Zhou, Jumin; Li, Hai-Peng; Chen, Yong-Bin; Kong, Qing-Peng
2017-01-01
Heterogeneity in transcriptional data hampers the identification of differentially expressed genes (DEGs) and understanding of cancer, essentially because current methods rely on cross-sample normalization and/or distribution assumptions, both of which are sensitive to heterogeneous values. Here, we developed a new method, Cross-Value Association Analysis (CVAA), which overcomes this limitation and is more robust to heterogeneous data than the other methods. Applying CVAA to a more complex pan-cancer dataset containing 5,540 transcriptomes discovered numerous new DEGs and many previously rarely explored pathways/processes; some of them were validated, both in vitro and in vivo, to be crucial in tumorigenesis, e.g., alcohol metabolism (ADH1B), chromosome remodeling (NCAPH) and complement system (Adipsin). Together, we present a sharper tool to navigate large-scale expression data and gain new mechanistic insights into tumorigenesis.
A Simple Exact Error Rate Analysis for DS-CDMA with Arbitrary Pulse Shape in Flat Nakagami Fading
NASA Astrophysics Data System (ADS)
Rahman, Mohammad Azizur; Sasaki, Shigenobu; Kikuchi, Hisakazu; Harada, Hiroshi; Kato, Shuzo
A simple exact error rate analysis is presented for random binary direct sequence code division multiple access (DS-CDMA) considering a general pulse shape and flat Nakagami fading channel. First of all, a simple model is developed for the multiple access interference (MAI). Based on this, a simple exact expression of the characteristic function (CF) of MAI is developed in a straightforward manner. Finally, an exact expression of error rate is obtained following the CF method of error rate analysis. The exact error rate so obtained can be evaluated much more easily than the only reliable approximate error rate expression currently available, which is based on the Improved Gaussian Approximation (IGA).
Unsupervised Outlier Profile Analysis
Ghosh, Debashis; Li, Song
2014-01-01
In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study. PMID:25452686
Knowles, David W; Biggin, Mark D
2013-01-01
Animals comprise dynamic three-dimensional arrays of cells that express gene products in intricate spatial and temporal patterns that determine cellular differentiation and morphogenesis. A rigorous understanding of these developmental processes requires automated methods that quantitatively record and analyze complex morphologies and their associated patterns of gene expression at cellular resolution. Here we summarize light microscopy-based approaches to establish permanent, quantitative datasets (atlases) that record this information. We focus on experiments that capture data for whole embryos or large areas of tissue in three dimensions, often at multiple time points. We compare and contrast the advantages and limitations of different methods and highlight some of the discoveries made. We emphasize the need for interdisciplinary collaborations and integrated experimental pipelines that link sample preparation, image acquisition, image analysis, database design, visualization, and quantitative analysis. Copyright © 2013 Wiley Periodicals, Inc.
Hopp, Lydia; Löffler-Wirth, Henry; Galle, Jörg; Binder, Hans
2018-06-11
We present here a novel method that enables unraveling the interplay between gene expression and DNA methylation in complex diseases such as cancer. The method is based on self-organizing maps and allows for analysis of data landscapes from 'governed by methylation' to 'governed by expression'. We identified regulatory modules of coexpressed and comethylated genes in high-grade gliomas: two modes are governed by genes hypermethylated and underexpressed in IDH-mutated cases, while two other modes reflect immune and stromal signatures in the classical and mesenchymal subtypes. A fifth mode with proneural characteristics comprises genes of repressed and poised chromatin states active in healthy brain. Two additional modes enrich genes either in active or repressed chromatin states. The method disentangles the interplay between gene expression and methylation. It has the potential to integrate also mutation and copy number data and to apply to large sample cohorts.
Matsunaga, Hiroko; Goto, Mari; Arikawa, Koji; Shirai, Masataka; Tsunoda, Hiroyuki; Huang, Huan; Kambara, Hideki
2015-02-15
Analyses of gene expressions in single cells are important for understanding detailed biological phenomena. Here, a highly sensitive and accurate method by sequencing (called "bead-seq") to obtain a whole gene expression profile for a single cell is proposed. A key feature of the method is to use a complementary DNA (cDNA) library on magnetic beads, which enables adding washing steps to remove residual reagents in a sample preparation process. By adding the washing steps, the next steps can be carried out under the optimal conditions without losing cDNAs. Error sources were carefully evaluated to conclude that the first several steps were the key steps. It is demonstrated that bead-seq is superior to the conventional methods for single-cell gene expression analyses in terms of reproducibility, quantitative accuracy, and biases caused during sample preparation and sequencing processes. Copyright © 2014 Elsevier Inc. All rights reserved.
Li, Li; Zhang, Jiangyu; Deng, Qingshan; Li, Jieming; Li, Zhengfen; Xiao, Yao; Hu, Shuiwang; Li, Tiantian; Tan, Qiuxiao; Li, Xiaofang; Luo, Bingshu; Mo, Hui
2016-01-01
Objectives To identify differential protein expression pattern associated with polycystic ovary syndrome (PCOS). Methods Twenty women were recruited for the study, ten with PCOS as a test group and ten without PCOS as a control group. Differential in-gel electrophoresis (DIGE) analysis and mass spectroscopy were employed to identify proteins that were differentially expressed between the PCOS and normal ovaries. The differentially expressed proteins were further validated by western blot (WB) and immunohistochemistry (IHC). Results DIGE analysis revealed eighteen differentially expressed proteins in the PCOS ovaries of which thirteen were upregulated, and five downregulated. WB and IHC confirmed the differential expression of membrane-associated progesterone receptor component 1 (PGRMC1), retinol-binding protein 1 (RBP1), heat shock protein 90B1, calmodulin 1, annexin A6, and tropomyosin 2. Also, WB analysis revealed significantly (P<0.05) higher expression of PGRMC1 and RBP1 in PCOS ovaries as compared to the normal ovaries. The differential expression of the proteins was also validated by IHC. Conclusions The present study identified novel differentially expressed proteins in the ovarian tissues of women with PCOS that can serve as potential biomarkers for the diagnosis and development of novel therapeutics for the treatment of PCOS using molecular interventions. PMID:27846214
Liu, Y T; Li, S R; Wang, Z; Xiao, J Z
2016-09-13
Objective: To profile the gene expression changes associated with endoplasmic reticulum stress in INS-1-3 cells induced by thapsigargin (TG) and tunicamycin (TM). Methods: Normally cultured INS-1-3 cells were used as a control. TG and TM were used to induce endoplasmic reticulum stress in INS-1-3 cells. A digital gene expression profiling technique was used to detect differentially expressed genes. The changes in gene expression were characterized by expression pattern clustering analysis, gene ontology (GO) function analysis and pathway enrichment analysis. Real time polymerase chain reaction (RT-PCR) was used to verify the key changes in gene expression. Results: Compared with the control group, there were 57 (45 up-regulated, 12 down-regulated) and 135 (99 up-regulated, 36 down-regulated) differentially expressed genes in the TG and TM groups, respectively. GO function enrichment analyses indicated that the main enrichment was in the endoplasmic reticulum. In signaling pathway analysis, the identified pathways were related to endoplasmic reticulum stress, antigen processing and presentation, protein export, and most of all, the maturity onset diabetes of the young (MODY) pathway. Conclusion: Under the condition of endoplasmic reticulum stress, the expression changes of transcriptional factors in the MODY signaling pathway may be related to the impaired function of islet beta cells.
He, F-Y; Liu, H-J; Guo, Q; Sheng, J-L
2017-02-01
miR-300 has been demonstrated to play an important role in the progression of several tumors, but its role in tumorigenesis of laryngeal squamous cell carcinoma (LSCC) is still unclear. The purpose of this study was to explore miR-300 expression in LSCC patients and analyze its association with clinicopathological factors and prognosis. In the present study, we measured the expression level of miR-300 in LSCC tissues by RT-PCR. Associations between miR-300 expression and various clinicopathological characteristics were analyzed. Patient survival and survival differences were determined by the Kaplan-Meier method and log-rank test. Univariate and multivariate analyses were performed using Cox proportional hazards analysis. miR-300 expression was significantly increased in LSCC tissues compared with that in adjacent non-cancerous tissues (p < 0.01). In addition, lymph node metastasis (p = 0.004) and TNM stage (p = 0.001) were significant influencing factors for the expression of miR-300. More importantly, Kaplan-Meier analysis showed that LSCC patients with low miR-300 expression tended to have shorter overall survival (p < 0.001). Finally, multivariate analysis revealed that miR-300 expression was an independent prognostic factor for LSCC patients. Our results pointed to miR-300 as a powerful prognostic marker in LSCC and as a novel target for tumor-suppressive therapy.
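For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimate can be sketched as follows. This is a generic illustration under stated assumptions: the function name `kaplan_meier` is hypothetical, the follow-up times and censoring flags are invented and do not come from the LSCC cohort, and the log-rank test and Cox model are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve

# Hypothetical follow-up in months for one expression group.
months = [6, 7, 10, 15, 19, 25]
observed = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(months, observed)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but do not lower the curve themselves, which is why the estimate drops only at observed death times.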
Single-feature polymorphism discovery in the barley transcriptome
Rostoks, Nils; Borevitz, Justin O; Hedley, Peter E; Russell, Joanne; Mudie, Sharon; Morris, Jenny; Cardle, Linda; Marshall, David F; Waugh, Robbie
2005-01-01
A probe-level model for analysis of GeneChip gene-expression data is presented which identified more than 10,000 single-feature polymorphisms (SFP) between two barley genotypes. The method has good sensitivity, as 67% of known single-nucleotide polymorphisms (SNP) were called as SFPs. This method is applicable to all oligonucleotide microarray data, accounts for SNP effects in gene-expression data and represents an efficient and versatile approach for highly parallel marker identification in large genomes. PMID:15960806
The Use of EST Expression Matrixes for the Quality Control of Gene Expression Data
Milnthorpe, Andrew T.; Soloviev, Mikhail
2012-01-01
EST expression profiling provides an attractive tool for studying differential gene expression, but cDNA libraries' origins and EST data quality are not always known or reported. Libraries may originate from pooled or mixed tissues; EST clustering, EST counts, library annotations and analysis algorithms may contain errors. Traditional data analysis methods, including research into tissue-specific gene expression, assume EST counts to be correct and libraries to be correctly annotated, which is not always the case. Therefore, a method capable of assessing the quality of expression data based on that data alone would be invaluable for assessing the quality of EST data and determining their suitability for mRNA expression analysis. Here we report an approach to the selection of a small generic subset of 244 UniGene clusters suitable for identification of the tissue of origin for EST libraries and quality control of the expression data using EST expression information alone. We created a small expression matrix of UniGene IDs using two rounds of selection followed by two rounds of optimisation. Our selection procedures differ from traditional approaches to finding “tissue-specific” genes and our matrix yields consistently high positive correlation values for libraries with confirmed tissues of origin and can be applied for tissue typing and quality control of libraries as small as just a few hundred total ESTs. Furthermore, we can pick up tissue correlations between related tissues e.g. brain and peripheral nervous tissue, heart and muscle tissues and identify tissue origins for a few libraries of uncharacterised tissue identity. It was possible to confirm tissue identity for some libraries which have been derived from cancer tissues or have been normalised.
Tissue matching is affected strongly by cancer progression or library normalisation and our approach may potentially be applied for elucidating the stage of normalisation in normalised libraries or for cancer staging. PMID:22412959
Ahrens, Maike; Turewicz, Michael; Casjens, Swaantje; May, Caroline; Pesch, Beate; Stephan, Christian; Woitalla, Dirk; Gold, Ralf; Brüning, Thomas; Meyer, Helmut E.
2013-01-01
Detection of yet unknown subgroups showing differential gene or protein expression is a frequent goal in the analysis of modern molecular data. Applications range from cancer biology through developmental biology to toxicology. Often a control and an experimental group are compared, and subgroups can be characterized by differential expression for only a subgroup-specific set of genes or proteins. Finding such genes and corresponding patient subgroups can help in understanding pathological pathways, diagnosis and defining drug targets. The size of the subgroup and the type of differential expression determine the optimal strategy for subgroup identification. To date, commonly used software packages rarely provide statistical tests and methods for the detection of such subgroups. Different univariate methods for subgroup detection are characterized and compared, both on simulated and on real data. We present an advanced design for simulation studies: Data is simulated under different distributional assumptions for the expression of the subgroup, and performance results are compared against theoretical upper bounds. For each distribution, different degrees of deviation from the majority of observations are considered for the subgroup. We evaluate classical approaches as well as various new suggestions in the context of omics data, including outlier sum, PADGE, and kurtosis. We also propose the new FisherSum score. ROC curve analysis and AUC values are used to quantify the ability of the methods to distinguish between genes or proteins with and without certain subgroup patterns. In general, FisherSum for small subgroups and t-test for large subgroups achieve best results. We apply each method to a case-control study on Parkinson's disease and underline the biological benefit of the new method. PMID:24278130
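One of the univariate scores named in the abstract above is kurtosis. The following is a minimal sketch of how a kurtosis-based score can flag a small deviating subgroup; it uses the plain moment-based excess kurtosis, which is an assumption for illustration and may differ from the exact score evaluated in the study. The function name `excess_kurtosis` is hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution).

    A small subgroup with strongly deviating expression produces heavy
    tails and therefore a large positive value.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 * m2) - 3.0


# Toy expression values: 2 of 20 samples deviate vs. 10 of 20 samples deviate.
small_subgroup = [0.0] * 18 + [10.0] * 2
large_subgroup = [0.0] * 10 + [10.0] * 10
print(excess_kurtosis(small_subgroup) > excess_kurtosis(large_subgroup))
```

In this toy case the score is high for the small subgroup and negative for the evenly split one, which is in line with the abstract's remark that the best method depends on subgroup size.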
HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.
Song, Chi; Tseng, George C
2014-01-01
Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
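The rOP statistic described above can be sketched in a few lines. The sketch assumes K independent studies whose p-values are Uniform(0,1) under the null, in which case the rth order statistic follows a Beta(r, K - r + 1) distribution; the function name `rop_pvalue` is hypothetical, and the one-sided correction and the estimation of r mentioned in the abstract are not covered.

```python
from math import comb


def rop_pvalue(pvalues, r):
    """Combined p-value based on the r-th smallest of K study p-values."""
    k = len(pvalues)
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and the number of studies")
    x = sorted(pvalues)[r - 1]  # the r-th ordered p-value (rOP statistic)
    # CDF of Beta(r, K - r + 1) at x, written as a binomial upper tail:
    # probability that at least r of K uniform p-values fall below x.
    return sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))


# Five studies, asking for evidence in the majority (r = 3).
print(rop_pvalue([0.001, 0.004, 0.02, 0.3, 0.7], r=3))
```

With r = 1 this reduces to the minimum p-value method and with r = K to the maximum p-value method, the two comparators listed in the abstract.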
[Combined fat products: methodological possibilities for their identification].
Viktorova, E V; Kulakova, S N; Mikhaĭlov, N A
2006-01-01
Falsification of milk fat is currently a pressing problem. A number of methods for verifying the authenticity of milk fat and distinguishing it from combined fat products were considered. Analysis of modern approaches to assessing milk fat authenticity showed that gas chromatography is the main method for determining the nature of a fat. A computer-based method for rapid identification of fat products is proposed for quickly determining whether a fat under examination is natural milk fat or a combined fat product.
Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data.
Tekwe, Carmen D; Carroll, Raymond J; Dabney, Alan R
2012-08-01
Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Various standard statistical techniques, including non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and parametric survival and accelerated failure time (AFT) models with log-normal, log-logistic and Weibull distributions, were used to detect any differentially expressed proteins. The statistical operating characteristics of each method are explored using both real and simulated datasets. Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the discrepancy widening as the proportion of missing data increases. The testing procedures discussed in this article can all be performed using readily available software such as R. The R codes are provided as supplemental materials.
Han, Tianci; Shu, Tianci; Dong, Siyuan; Li, Peiwen; Li, Weinan; Liu, Dali; Qi, Ruiqun; Zhang, Shuguang; Zhang, Lin
2017-05-01
Decreased expression of human chemokine-like factor-like MARVEL transmembrane domain-containing 3 (CMTM3) has been identified in a number of human tumors and tumor cell lines, including gastric and testicular cancer, and PC3, CAL27 and Tca-83 cell lines. However, the association between CMTM3 expression and the clinicopathological features and prognosis of esophageal squamous cell carcinoma (ESCC) patients remains unclear. The aim of the present study was to investigate the correlation between CMTM3 expression and clinicopathological parameters and prognosis in ESCC. CMTM3 mRNA and protein expression was analyzed in ESCC and paired non-tumor tissues by quantitative real-time polymerase chain reaction, western blotting and immunohistochemical analysis. The Kaplan-Meier method was used to plot survival curves and the Cox proportional hazards regression model was also used for univariate and multivariate survival analysis. The results revealed that CMTM3 mRNA and protein expression levels were lower in 82.5% (30/40) and 75% (30/40) of ESCC tissues, respectively, when compared with matched non-tumor tissues. Statistical analysis demonstrated that CMTM3 expression was significantly correlated with lymph node metastasis (P=0.002) and clinical stage (P<0.001) in ESCC tissues. Furthermore, the survival time of ESCC patients exhibiting low CMTM3 expression was significantly shorter than that of ESCC patients exhibiting high CMTM3 expression (P=0.01). In addition, Kaplan-Meier survival analysis revealed that the overall survival time of patients exhibiting low CMTM3 expression was significantly decreased compared with patients exhibiting high CMTM3 expression (P=0.010). Cox multivariate analysis indicated that CMTM3 protein expression was an independent prognostic predictor for ESCC after resection. 
This study indicated that CMTM3 expression is significantly decreased in ESCC tissues and CMTM3 protein expression in resected tumors may present an effective prognostic biomarker.
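The survival curves mentioned in this abstract are built with the Kaplan-Meier product-limit estimator. The sketch below shows the estimator on made-up follow-up data; the function name `kaplan_meier` and the toy values are illustrative assumptions and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time of each patient
    events : True if the patient died at that time, False if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up in months; the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Running the estimator separately on a low-expression and a high-expression group gives the two curves that a log-rank test would then compare.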
Evaluation of predictive capacities of biomarkers based on research synthesis.
Hattori, Satoshi; Zhou, Xiao-Hua
2016-11-10
The objective of diagnostic studies or prognostic studies is to evaluate and compare predictive capacities of biomarkers. Suppose we are interested in evaluation and comparison of predictive capacities of continuous biomarkers for a binary outcome based on research synthesis. In analysis of each study, subjects are often classified into high-expression and low-expression groups according to a cut-off value, and statistical analysis is based on a 2 × 2 table defined by the response and the high expression or low expression of the biomarker. Because the cut-off is study specific, it is difficult to interpret a combined summary measure such as an odds ratio based on the standard meta-analysis techniques. The summary receiver operating characteristic curve is a useful method for meta-analysis of diagnostic studies in the presence of heterogeneity of cut-off values to examine discriminative capacities of biomarkers. We develop a method to estimate positive or negative predictive curves, which are alternatives to the receiver operating characteristic curve, based on information reported in published papers of each study. These predictive curves provide a useful graphical presentation of pairs of positive and negative predictive values and allow us to compare predictive capacities of biomarkers of different scales in the presence of heterogeneity in cut-off values among studies. Copyright © 2016 John Wiley & Sons, Ltd.
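For a single cut-off, the positive and negative predictive values that make up such curves follow from Bayes' theorem. The sketch below computes one (PPV, NPV) pair from sensitivity, specificity and prevalence; it illustrates the quantities only and is not the authors' estimation method for pooling across studies. The function name `predictive_values` is hypothetical, and all inputs are assumed to lie strictly between 0 and 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for one cut-off via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical biomarker: 80% sensitivity, 90% specificity, 50% prevalence.
ppv, npv = predictive_values(0.8, 0.9, 0.5)
print(round(ppv, 3), round(npv, 3))
```

Evaluating the function over a range of cut-offs (each with its own sensitivity and specificity) traces out the pairs of predictive values that the abstract presents graphically.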
Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung
2015-01-01
Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610
[Oral and written affective expression in children of low socioeconomic status].
Larraguibel, M; Lolas Stepke, F
1991-06-01
Descriptive data on affective expression of 58 children (33 girls and 25 boys) of low socioeconomic status (Graffar index), with ages between 8 and 12, are presented. Intelligence was assessed by means of the Raven Progressive Matrices Test, with all subjects scoring at the average level. The six forms of anxiety and the four forms of hostility defined by the Gottschalk method of verbal content analysis were evaluated. Hope scores, positive and negative, were also obtained from the same verbal samples. The oral sample consisted of speech produced spontaneously during 5 minutes, in response to a standard instruction, and the written sample consisted of brief stories produced under standardized conditions during 15 minutes. The most frequently expressed form of anxiety was separation anxiety, while the most frequently expressed form of hostility was covert hostility directed outwards. "Positive" hope was expressed more frequently than "negative" hope. Data are discussed in terms of their contribution to the establishment of population norms in Spanish-speaking populations for the psychological constructs explored. It is concluded that the method of content analysis of verbal behavior may represent a useful tool for the study of child psychology in different contexts.
Capurro, Alberto; Bodea, Liviu-Gabriel; Schaefer, Patrick; Luthi-Carter, Ruth; Perreau, Victoria M.
2015-01-01
The characterization of molecular changes in diseased tissues gives insight into pathophysiological mechanisms and is important for therapeutic development. Genome-wide gene expression analysis has proven valuable for identifying biological processes in neurodegenerative diseases using post mortem human brain tissue, and numerous datasets are publicly available. However, many studies utilize heterogeneous tissue samples consisting of multiple cell types, all of which contribute to global gene expression values, confounding biological interpretation of the data. In particular, changes in numbers of neuronal and glial cells occurring in neurodegeneration confound transcriptomic analyses, particularly in human brain tissues where sample availability and controls are limited. To identify cell specific gene expression changes in neurodegenerative disease, we have applied our recently published computational deconvolution method, population specific expression analysis (PSEA). PSEA estimates cell-type-specific expression values using reference expression measures, which in the case of brain tissue comprise mRNAs with cell-type-specific expression in neurons, astrocytes, oligodendrocytes and microglia. As an exercise in PSEA implementation and hypothesis development regarding neurodegenerative diseases, we applied PSEA to Parkinson's and Huntington's disease (PD, HD) datasets. Genes identified as differentially expressed in substantia nigra pars compacta neurons by PSEA were validated using external laser capture microdissection data. Network analysis and Annotation Clustering (DAVID) identified molecular processes implicated by differential gene expression in specific cell types. The results of these analyses provided new insights into the implementation of PSEA in brain tissues and additional refinement of molecular signatures in human HD and PD. PMID:25620908
Expression of Stanniocalcin 1 in Thyroid Side Population Cells and Thyroid Cancer Cells
Hayase, Suguru; Sasaki, Yoshihito; Matsubara, Tsutomu; Seo, Daekwan; Miyakoshi, Masaaki; Murata, Tsubasa; Ozaki, Takashi; Kakudo, Kennichi; Kumamoto, Kensuke; Ylaya, Kris; Cheng, Sheue-yann; Thorgeirsson, Snorri S.; Hewitt, Stephen M.; Ward, Jerrold M.
2015-01-01
Background: Mouse thyroid side population (SP) cells consist of a minor population of mouse thyroid cells that may have multipotent thyroid stem cell characteristics. However the nature of thyroid SP cells remains elusive, particularly in relation to thyroid cancer. Stanniocalcin (STC) 1 and 2 are secreted glycoproteins known to regulate serum calcium and phosphate homeostasis. In recent years, the relationship of STC1/2 expression to cancer has been described in various tissues. Method: Microarray analysis was carried out to determine genes up- and down-regulated in thyroid SP cells as compared with non-SP cells. Among genes up-regulated, stanniocalcin 1 (STC1) was chosen for study because of its expression in various thyroid cells by Western blotting and immunohistochemistry. Results: Gene expression analysis revealed that genes known to be highly expressed in cancer cells and/or involved in cancer invasion/metastasis were markedly up-regulated in SP cells from both intact as well as partial thyroidectomized thyroids. Among these genes, expression of STC1 was found in five human thyroid carcinoma–derived cell lines as revealed by analysis of mRNA and protein, and its expression was inversely correlated with the differentiation status of the cells. Immunohistochemical analysis demonstrated higher expression of STC1 in the thyroid tumor cell line and thyroid tumor tissues from humans and mice. Conclusion: These results suggest that SP cells contain a population of cells that express genes also highly expressed in cancer cells including Stc1, which warrants further study on the role of SP cells and/or STC1 expression in thyroid cancer. PMID:25647164
Brell, Marta; Ibáñez, Javier; Tortosa, Avelina
2011-01-26
The DNA repair protein O6-Methylguanine-DNA methyltransferase (MGMT) confers resistance to alkylating agents. Several methods have been applied to its analysis, with methylation-specific polymerase chain reaction (MSP) the most commonly used for promoter methylation study, while immunohistochemistry (IHC) has become the most frequently used for the detection of MGMT protein expression. Agreement on the best and most reliable technique for evaluating MGMT status remains unsettled. The aim of this study was to perform a systematic review and meta-analysis of the correlation between IHC and MSP. A computer-aided search of MEDLINE (1950-October 2009), EBSCO (1966-October 2009) and EMBASE (1974-October 2009) was performed for relevant publications. Studies meeting inclusion criteria were those comparing MGMT protein expression by IHC with MGMT promoter methylation by MSP in the same cohort of patients. Methodological quality was assessed by using the QUADAS and STARD instruments. Previously published guidelines were followed for meta-analysis performance. Of 254 studies identified as eligible for full-text review, 52 (20.5%) met the inclusion criteria. The review showed that results of MGMT protein expression by IHC are not in close agreement with those obtained with MSP. Moreover, type of tumour (primary brain tumour vs others) was an independent covariate of accuracy estimates in the meta-regression analysis beyond the cut-off value. Protein expression assessed by IHC alone fails to reflect the promoter methylation status of MGMT. Thus, in attempts at clinical diagnosis the two methods seem to select different groups of patients and should not be used interchangeably.
Momose, Haruka; Mizukami, Takuo; Kuramitsu, Madoka; Takizawa, Kazuya; Masumi, Atsuko; Araki, Kumiko; Furuhata, Keiko; Yamaguchi, Kazunari; Hamaguchi, Isao
2015-01-01
We have previously identified 17 biomarker genes which were upregulated by whole virion influenza vaccines, and reported that gene expression profiles of these biomarker genes had a good correlation with conventional animal safety tests checking body weight and leukocyte counts. In this study, we have shown that conventional animal tests showed varied and no dose-dependent results in serially diluted bulk materials of influenza HA vaccines. In contrast, dose dependency was clearly shown in the expression profiles of biomarker genes, demonstrating higher sensitivity of gene expression analysis than the current animal safety tests of influenza vaccines. The introduction of branched DNA based-concurrent expression analysis could simplify the complexity of multiple gene expression approach, and could shorten the test period from 7 days to 3 days. Furthermore, upregulation of 10 genes, Zbp1, Mx2, Irf7, Lgals9, Ifi47, Tapbp, Timp1, Trafd1, Psmb9, and Tap2, was seen upon virosomal-adjuvanted vaccine treatment, indicating that these biomarkers could be useful for the safety control of virosomal-adjuvanted vaccines. In summary, profiling biomarker gene expression could be a useful, rapid, and highly sensitive method of animal safety testing compared with conventional methods, and could be used to evaluate the safety of various types of influenza vaccines, including adjuvanted vaccine. PMID:25909814
Image analysis tools and emerging algorithms for expression proteomics
English, Jane A.; Lisacek, Frederique; Morris, Jeffrey S.; Yang, Guang-Zhong; Dunn, Michael J.
2012-01-01
Since their origins in academic endeavours in the 1970s, computational analysis tools have matured into a number of established commercial packages that underpin research in expression proteomics. In this paper we describe the image analysis pipeline for the established 2-D Gel Electrophoresis (2-DE) technique of protein separation, and by first covering signal analysis for Mass Spectrometry (MS), we also explain the current image analysis workflow for the emerging high-throughput ‘shotgun’ proteomics platform of Liquid Chromatography coupled to MS (LC/MS). The bioinformatics challenges for both methods are illustrated and compared, whilst existing commercial and academic packages and their workflows are described from both a user’s and a technical perspective. Attention is given to the importance of sound statistical treatment of the resultant quantifications in the search for differential expression. Despite wide availability of proteomics software, a number of challenges have yet to be overcome regarding algorithm accuracy, objectivity and automation, generally due to deterministic spot-centric approaches that discard information early in the pipeline, propagating errors. We review recent advances in signal and image analysis algorithms in 2-DE, MS, LC/MS and Imaging MS. Particular attention is given to wavelet techniques, automated image-based alignment and differential analysis in 2-DE, Bayesian peak mixture models and functional mixed modelling in MS, and group-wise consensus alignment methods for LC/MS. PMID:21046614
Genome-Wide Analysis of Long Noncoding RNA (lncRNA) Expression in Hepatoblastoma Tissues
Xue, Ping; Cui, Ximao; Li, Kai; Zheng, Shan; He, Xianghuo; Dong, Kuiran
2014-01-01
Long noncoding RNAs (lncRNAs) have crucial roles in cancer biology. We performed a genome-wide analysis of lncRNA expression in hepatoblastoma tissues to identify novel targets for further study of hepatoblastoma. Hepatoblastoma and normal liver tissue samples were obtained from hepatoblastoma patients. The genome-wide analysis of lncRNA expression in these tissues was performed using a 4×180 K lncRNA microarray and Sureprint G3 Human lncRNA Chips. Quantitative RT-PCR (qRT-PCR) was performed to confirm these results. The differential expressions of lncRNAs and mRNAs were identified through fold-change filtering. Gene Ontology (GO) and pathway analyses were performed using the standard enrichment computation method. Associations between lncRNAs and adjacent protein-coding genes were determined through complex transcriptional loci analysis. We found that 2736 lncRNAs were differentially expressed in hepatoblastoma tissues. Among these, 1757 lncRNAs were upregulated more than two-fold relative to normal tissues and 979 lncRNAs were downregulated. Moreover, in hepatoblastoma there were 420 matched lncRNA-mRNA pairs for 120 differentially expressed lncRNAs, and 167 differentially expressed mRNAs. The co-expression network analysis predicted 252 network nodes and 420 connections between 120 lncRNAs and 132 coding genes. Within this co-expression network, 369 pairs were positive, and 51 pairs were negative. Lastly, qRT-PCR data verified six upregulated and downregulated lncRNAs in hepatoblastoma, plus endothelial cell-specific molecule 1 (ESM1) mRNA. Our results demonstrated that expression of these aberrant lncRNAs could respond to hepatoblastoma development. Further study of these lncRNAs could provide useful insight into hepatoblastoma biology. PMID:24465615
Ceol, M; Forino, M; Gambaro, G; Sauer, U; Schleicher, E D; D'Angelo, A; Anglani, F
2001-01-01
Gene expression can be examined with different techniques including ribonuclease protection assay (RPA), in situ hybridisation (ISH), and quantitative reverse transcription-polymerase chain reaction (RT/PCR). These methods differ considerably in their sensitivity and precision in detecting and quantifying low abundance mRNA. Although there is evidence that RT/PCR can be performed in a quantitative manner, the quantitative capacity of this method is generally underestimated. To demonstrate that the comparative kinetic RT/PCR strategy-which uses a housekeeping gene as internal standard-is a quantitative method to detect significant differences in mRNA levels between different samples, the inhibitory effect of heparin on phorbol 12-myristate 13-acetate (PMA)-induced-TGF-beta1 mRNA expression was evaluated by RT/PCR and RPA, the standard method of mRNA quantification, and the results were compared. The reproducibility of RT/PCR amplification was calculated by comparing the quantity of G3PDH and TGF-beta1 PCR products, generated during the exponential phases, estimated from two different RT/PCR (G3PDH, r = 0.968, P = 0.0000; TGF-beta1, r = 0.966, P = 0.0000). The quantitative capacity of comparative kinetic RT/PCR was demonstrated by comparing the results obtained from RPA and RT/PCR using linear regression analysis. Starting from the same RNA extraction, but using only 1% of the RNA for the RT/PCR compared to RPA, significant correlation was observed (r = 0.984, P = 0.0004). Moreover the morphometric analysis of ISH signal was applied for the semi-quantitative evaluation of the expression and localisation of TGF-beta1 mRNA in the entire cell population. Our results demonstrate the close similarity of the RT/PCR and RPA methods in giving quantitative information on mRNA expression and indicate the possibility to adopt the comparative kinetic RT/PCR as reliable quantitative method of mRNA analysis. Copyright 2001 Wiley-Liss, Inc.
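The comparison described above rests on two simple computations: expressing the target signal relative to a housekeeping gene, and correlating the results of two methods. The sketch below shows both on made-up numbers; the function names and values are illustrative assumptions and do not reproduce the study's data or its exact quantification procedure.

```python
from math import sqrt


def normalize_to_housekeeping(target, housekeeping):
    """Express each target signal relative to the internal standard."""
    return [t / h for t, h in zip(target, housekeeping)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical band intensities for four samples.
tgf_pcr = [120.0, 90.0, 60.0, 30.0]       # target PCR product
g3pdh_pcr = [100.0, 100.0, 95.0, 105.0]   # housekeeping PCR product
rpa_signal = [1.25, 0.95, 0.60, 0.30]     # same samples measured by RPA

rt_pcr_ratio = normalize_to_housekeeping(tgf_pcr, g3pdh_pcr)
print(pearson_r(rt_pcr_ratio, rpa_signal))
```

A correlation close to 1 between the normalized RT/PCR ratios and the RPA signal is the kind of agreement the abstract reports with its linear regression analysis.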
Dagnino, Lina; Crawford, Melissa
2018-03-27
In this article, we provide a method to isolate primary epidermal melanocytes from reporter mice, which also allow targeted gene inactivation. The mice from which these cells are isolated are bred into a Rosa26 mT/mG reporter background, which results in GFP expression in the targeted melanocytic cell population. These cells are isolated and cultured to >95% purity. The cells can be used for gene expression studies, clonogenic experiments, and biological assays, such as capacity for migration. Melanocytes are slow moving cells, and we also provide a method to measure motility using individual cell tracking and data analysis.
Stevens, David Cole; Conway, Kyle R.; Pearce, Nelson; Villegas-Peñaranda, Luis Roberto; Garza, Anthony G.; Boddy, Christopher N.
2013-01-01
Background: Heterologous expression of bacterial biosynthetic gene clusters is currently an indispensable tool for characterizing biosynthetic pathways. Development of an effective, general heterologous expression system that can be applied to bioprospecting from metagenomic DNA will enable the discovery of a wealth of new natural products. Methodology: We have developed a new Escherichia coli-based heterologous expression system for polyketide biosynthetic gene clusters. We have demonstrated that over-expression of the alternative sigma factor σ54 directly and positively regulates heterologous expression of the oxytetracycline biosynthetic gene cluster in E. coli. Bioinformatics analysis indicates that σ54 promoters are present in nearly 70% of polyketide and non-ribosomal peptide biosynthetic pathways. Conclusions: We have demonstrated a new mechanism for heterologous expression of the oxytetracycline polyketide biosynthetic pathway, where high-level pleiotropic sigma factors from the heterologous host directly and positively regulate transcription of the non-native biosynthetic gene cluster. Our bioinformatics analysis is consistent with the hypothesis that heterologous expression mediated by the alternative sigma factor σ54 may be a viable method for the production of additional polyketide products. PMID:23724102
Turewicz, Michael; Kohl, Michael; Ahrens, Maike; Mayer, Gerhard; Uszkoreit, Julian; Naboulsi, Wael; Bracht, Thilo; Megger, Dominik A; Sitek, Barbara; Marcus, Katrin; Eisenacher, Martin
2017-11-10
The analysis of high-throughput mass spectrometry-based proteomics data must address the specific challenges of this technology. To this end, the comprehensive proteomics workflow offered by the de.NBI service center BioInfra.Prot provides indispensable components for the computational and statistical analysis of this kind of data. These components include tools and methods for spectrum identification and protein inference, protein quantification, expression analysis as well as data standardization and data publication. All particular methods of the workflow which address these tasks are state-of-the-art or cutting edge. As has been shown in previous publications, each of these methods is adequate to solve its specific task and gives competitive results. However, the methods included in the workflow are continuously reviewed, updated and improved to adapt to new scientific developments. All of these particular components and methods are available as stand-alone BioInfra.Prot services or as a complete workflow. Since BioInfra.Prot provides manifold fast communication channels to get access to all components of the workflow (e.g., via the BioInfra.Prot ticket system: bioinfraprot@rub.de) users can easily benefit from this service and get support by experts. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Parallel human genome analysis: microarray-based expression monitoring of 1000 genes.
Schena, M; Shalon, D; Heller, R; Chai, A; Brown, P O; Davis, R W
1996-01-01
Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery. PMID:8855227
Li, Xiaohong; Brock, Guy N; Rouchka, Eric C; Cooper, Nigel G F; Wu, Dongfeng; O'Toole, Timothy E; Gill, Ryan S; Eteleeb, Abdallah M; O'Brien, Liz; Rai, Shesh N
2017-01-01
Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, it remains a challenge to choose an optimal method due to multiple factors contributing to read count variability that affects the overall sensitivity and specificity. In order to properly determine the most appropriate normalization methods, it is critical to compare the performance and shortcomings of a representative set of normalization routines based on different dataset characteristics. Therefore, we set out to evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows for comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the Receiver Operating Characteristic Curve (AUC), a specificity rate > 85%, the detection power > 92% and an actual false discovery rate (FDR) under 0.06 given the nominal FDR (≤0.05). Although the top commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, they trade off with a reduced specificity (<70%) and a slightly higher actual FDR than our proposed methods. In addition, the results from an analysis based on the qualitative characteristics of sample distribution for MAQC2 and human breast cancer datasets show that only our gene-wise normalization methods corrected data skewed towards lower read counts. However, when we evaluated MAQC3 with less variation in five replicates, all methods performed similarly. 
Thus, our proposed Med-pgQ2 and UQ-pgQ2 methods perform slightly better for differential gene analysis of RNA-seq data skewed towards lowly expressed read counts with high variation by improving specificity while maintaining a good detection power with a control of the nominal FDR level.
Li, Xiaohong; Brock, Guy N.; Rouchka, Eric C.; Cooper, Nigel G. F.; Wu, Dongfeng; O’Toole, Timothy E.; Gill, Ryan S.; Eteleeb, Abdallah M.; O’Brien, Liz
2017-01-01
Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, it remains a challenge to choose an optimal method because multiple factors contribute to read count variability, which affects the overall sensitivity and specificity. To determine the most appropriate normalization methods, it is critical to compare the performance and shortcomings of a representative set of normalization routines based on different dataset characteristics. Therefore, we set out to evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows for comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the Receiver Operating Characteristic Curve (AUC), a specificity rate > 85%, a detection power > 92% and an actual false discovery rate (FDR) under 0.06 given the nominal FDR (≤0.05). Although the top commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, they trade off with a reduced specificity (<70%) and a slightly higher actual FDR than our proposed methods. In addition, the results from an analysis based on the qualitative characteristics of sample distribution for MAQC2 and human breast cancer datasets show that only our gene-wise normalization methods corrected data skewed towards lower read counts. However, when we evaluated MAQC3 with less variation in five replicates, all methods performed similarly.
Thus, our proposed Med-pgQ2 and UQ-pgQ2 methods perform slightly better for differential gene analysis of RNA-seq data skewed towards lowly expressed read counts with high variation by improving specificity while maintaining a good detection power with a control of the nominal FDR level. PMID:28459823
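The two-stage idea described in this abstract (per-sample upper-quartile global scaling followed by a per-gene step) can be sketched roughly as follows. This is only an illustration: the function names and toy counts are hypothetical, and the per-gene step shown here (dividing each gene by its own median across samples) is an assumption, since the abstract does not give the exact Med-pgQ2 or UQ-pgQ2 formulas.

```python
from statistics import median, quantiles

def upper_quartile(values):
    """75th percentile of the non-zero counts in one sample."""
    nonzero = [v for v in values if v > 0]
    return quantiles(nonzero, n=4, method="inclusive")[2]

def uq_scale(counts):
    """Per-sample global scaling: divide each column by its upper quartile,
    then rescale by the mean factor so values stay on a count-like scale."""
    n_samples = len(counts[0])
    factors = [upper_quartile([row[j] for row in counts]) for j in range(n_samples)]
    mean_factor = sum(factors) / len(factors)
    return [[row[j] / factors[j] * mean_factor for j in range(n_samples)]
            for row in counts]

def per_gene_scale(scaled):
    """Assumed per-gene step: divide each gene (row) by its median across samples.
    Genes with a zero median are left unchanged."""
    out = []
    for row in scaled:
        m = median(row)
        out.append([v / m for v in row] if m > 0 else list(row))
    return out

# Hypothetical counts: rows are genes, columns are samples.
# Sample 2 was sequenced at twice the depth of sample 1.
counts = [[10, 20], [0, 0], [30, 60], [50, 100], [70, 140]]
scaled = uq_scale(counts)
normalized = per_gene_scale(scaled)
```

In this toy input the depth difference between the two samples disappears after the global scaling, and each expressed gene is then centred on 1 by the per-gene step.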
Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
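Two of the gene-specific sorting statistics named above can be sketched in a few lines. This is not CLUSFAVOR code: the gene names and values are hypothetical, and the entropy definition (treating a gene's non-negative expression values as a distribution over arrays) is an assumption, since the abstract does not define it.

```python
from math import log2
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def shannon_entropy(values):
    """Entropy in bits of a gene's expression, normalised to sum to 1 across arrays."""
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * log2(p) for p in probs)

# Hypothetical non-negative expression values: one value per array for each gene.
profiles = {
    "geneA": [5.0, 5.0, 5.0, 5.0],
    "geneB": [1.0, 2.0, 8.0, 9.0],
    "geneC": [20.0, 0.0, 0.0, 0.0],
}

# Sort genes from most to least variable.
by_cv = sorted(profiles,
               key=lambda g: coefficient_of_variation(profiles[g]),
               reverse=True)
```

A gene expressed evenly across arrays has maximal entropy and zero coefficient of variation, while a gene expressed in a single array has zero entropy.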
Gripsrud, Birgitta Haga; Brassil, Kelly J; Summers, Barbara; Søiland, Håvard; Kronowitz, Steven; Lode, Kirsten
2016-01-01
Expressive writing has been shown to improve quality of life, fatigue, and posttraumatic stress among breast cancer patients across cultures. Understanding how and why the method may be beneficial to patients can increase awareness of the psychosocial impact of breast cancer and enhance interventional work within this population. Qualitative research on experiential aspects of interventions may inform the theoretical understanding and generate hypotheses for future studies. The aim of the study was to explore and describe the experience and feasibility of expressive writing among women with breast cancer following mastectomy and immediate or delayed reconstructive surgery. Seven participants enrolled to undertake 4 episodes of expressive writing at home, with semistructured interviews conducted afterward and analyzed using experiential thematic analysis. Three themes emerged through analysis: writing as process, writing as therapeutic, and writing as a means to help others. Findings illuminate experiential variations in expressive writing and how storytelling encourages a release of cognitive and emotional strains, surrendering these to reside in the text. The method was said to process feelings and capture experiences tied to a new and overwhelming illness situation, as impressions became expressions through writing. Expressive writing, therefore, is a valuable tool for healthcare providers to introduce into the plan of care for patients with breast cancer and potentially other cancer patient groups. This study augments existing evidence to support the appropriateness of expressive writing as an intervention after a breast cancer diagnosis. Further studies should evaluate its feasibility at different time points in survivorship.
BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data
Gonçalves, Joana P; Madeira, Sara C; Oliveira, Arlindo L
2009-01-01
Background The ability to monitor changes in expression patterns over time, and to observe the emergence of coherent temporal responses using expression time series, is critical to advance our understanding of complex biological processes. Biclustering has been recognized as an effective method for discovering local temporal expression patterns and unraveling potential regulatory mechanisms. The general biclustering problem is NP-hard. In the case of time series this problem is tractable, and efficient algorithms can be used. However, there is still a need for specialized applications able to take advantage of the temporal properties inherent to expression time series, both from a computational and a biological perspective. Findings BiGGEsTS makes available state-of-the-art biclustering algorithms for analyzing expression time series. Gene Ontology (GO) annotations are used to assess the biological relevance of the biclusters. Methods for preprocessing expression time series and post-processing results are also included. The analysis is additionally supported by a visualization module capable of displaying informative representations of the data, including heatmaps, dendrograms, expression charts and graphs of enriched GO terms. Conclusion BiGGEsTS is a free open source graphical software tool for revealing local coexpression of genes in specific intervals of time, while integrating meaningful information on gene annotations. It is freely available at: . We present a case study on the discovery of transcriptional regulatory modules in the response of Saccharomyces cerevisiae to heat stress. PMID:19583847
Rangan, Aaditya V; McGrouther, Caroline C; Kelsoe, John; Schork, Nicholas; Stahl, Eli; Zhu, Qian; Krishnan, Arjun; Yao, Vicky; Troyanskaya, Olga; Bilaloglu, Seda; Raghavan, Preeti; Bergen, Sarah; Jureus, Anders; Landen, Mikael
2018-05-14
A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., 'loops') within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).
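The loop-counting idea can be illustrated with a small sketch. It assumes the data are first binarized by sign, in which case a 2-by-2 submatrix of ±1 entries has rank 1 exactly when the product of its four entries is +1. The binarization, the function names, and the shortcut through row inner products are assumptions of this illustration; the iterative removal of low-scoring rows and columns and the covariate corrections mentioned in the abstract are omitted.

```python
def binarize(matrix):
    """Replace each entry by its sign (+1 for positive, -1 otherwise)."""
    return [[1 if v > 0 else -1 for v in row] for row in matrix]

def row_loop_scores(signs):
    """For each row j, return (#rank-1 loops) minus (#rank-2 loops) over all
    loops formed with another row j2 and an ordered pair of distinct columns.

    For rows j and j2 with inner product g, the sum of four-entry products
    over all column pairs (k, k2) is g*g; the k == k2 terms contribute
    n_cols, so distinct column pairs contribute g*g - n_cols.
    """
    n_rows, n_cols = len(signs), len(signs[0])
    scores = []
    for j in range(n_rows):
        total = 0
        for j2 in range(n_rows):
            if j2 == j:
                continue
            g = sum(a * b for a, b in zip(signs[j], signs[j2]))
            total += g * g - n_cols
        scores.append(total)
    return scores

# Hypothetical sign matrix: rows 0-2 share one pattern up to a sign flip
# (a planted rank-1 block); rows 3 and 4 are unrelated to it.
signs = [
    [1, -1, 1, -1, 1, -1],
    [-1, 1, -1, 1, -1, 1],
    [1, -1, 1, -1, 1, -1],
    [1, 1, 1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1],
]
scores = row_loop_scores(signs)
```

Rows that take part in the planted low-rank block receive clearly higher scores than the unrelated rows; column scores could be obtained the same way from the transposed matrix.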
Maximizing RNA yield from archival renal tumors and optimizing gene expression analysis.
Glenn, Sean T; Head, Karen L; Teh, Bin T; Gross, Kenneth W; Kim, Hyung L
2010-01-01
Formalin-fixed, paraffin-embedded tissues are widely available for gene expression analysis using TaqMan PCR. Five methods, including 4 commercial kits, for recovering RNA from paraffin-embedded renal tumor tissue were compared. The MasterPure kit from Epicentre produced the highest RNA yield. However, the difference in RNA yield between the kit from Epicentre and Invitrogen's TRIzol method was not significant. Using the top 3 RNA isolation methods, the manufacturers' protocols were modified to include an overnight Proteinase K digestion. Overnight protein digestion resulted in a significant increase in RNA yield. To optimize the reverse transcription reaction, conventional reverse transcription with random oligonucleotide primers was compared to reverse transcription using primers specific for genes of interest. Reverse transcription using gene-specific primers significantly increased the quantity of cDNA detectable by TaqMan PCR. Therefore, expression profiling of formalin-fixed, paraffin-embedded tissue using TaqMan qPCR can be optimized by using the MasterPure RNA isolation kit modified to include an overnight Proteinase K digestion and gene-specific primers during the reverse transcription.
NASA Technical Reports Server (NTRS)
Arnold, Steven M.; Trowbridge, D.
2001-01-01
A critical issue in the micromechanics-based analysis of composite structures is the availability of a computationally efficient homogenization technique: one that is 1) capable of handling the sophisticated, physically based, viscoelastoplastic constitutive and life models for each constituent; 2) able to generate accurate displacement and stress fields at both the macro and the micro levels; and 3) compatible with the finite element method. The Generalized Method of Cells (GMC) developed by Paley and Aboudi is one such micromechanical model that has been shown to predict accurately the overall macro behavior of various types of composites given the required constituent properties. Specifically, the method provides "closed-form" expressions for the macroscopic composite response in terms of the properties, size, shape, distribution, and response of the individual constituents or phases that make up the material. Furthermore, expressions relating the internal stress and strain fields in the individual constituents to the macroscopically applied stresses and strains are available through strain or stress concentration matrices. These expressions make possible the investigation of failure processes at the microscopic level at each step of an applied load history.
PmiRExAt: plant miRNA expression atlas database and web applications
Gurjar, Anoop Kishor Singh; Panwar, Abhijeet Singh; Gupta, Rajinder; Mantri, Shrikant S.
2016-01-01
High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining in the publicly available rich resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that provides a plant miRNA expression atlas. The web-based repository comprises miRNA expression profiles and a query tool for 1859 wheat, 2330 rice and 283 maize miRNAs. The database interface offers open and easy access to miRNA expression profiles and helps in identifying tissue-preferential, differential and constitutively expressed miRNAs. A feature enabling expression study of conserved miRNAs across multiple species is also implemented. A custom expression analysis feature enables expression analysis of novel miRNAs in a total of 117 datasets. New sRNA datasets can also be uploaded for analysing miRNA expression profiles for 73 plant species. The PmiRExAt application program interface, a simple object access protocol web service, allows other programmers to remotely invoke the methods written for doing programmatic search operations on the PmiRExAt database. Database URL: http://pmirexat.nabi.res.in. PMID:27081157
Wang, Xiao-Fei; Zhu, Yi-Tong; Wang, Jia-Jia; Zeng, Da-Xiong; Mu, Chuan-Yong; Chen, Yan-Bin; Lei, Wei; Zhu, Ye-Han; Huang, Jian-An
2017-01-01
Interleukin-17 (IL-17) plays an important role in cancer progression. Previous studies have reported conflicting results regarding the correlation between IL-17 expression and lung cancer (LC) prognosis. To comprehensively and quantitatively summarize the prognostic value of IL-17 expression in LC patients, a systematic review and meta-analysis were performed. We identified the relevant literature by searching the PubMed, EMBASE, Cochrane Library, SinoMed, China National Knowledge Infrastructure (CNKI) and Wanfang Data databases, up until April 1, 2017. Overall survival (OS), disease-free survival (DFS) and clinicopathological characteristics were collected from relevant studies. Pooled hazard ratios (HR) and corresponding 95% confidence intervals (CI) were calculated to estimate the effect of IL-17 expression on clinical outcomes. Six studies containing 479 Chinese LC patients were included in this meta-analysis. The results indicated that high IL-17 expression was independently correlated with poorer OS (HR = 1.82, 95% CI 1.44-2.29, P < 0.00001) and shorter DFS (HR = 2.41, 95% CI 1.42-4.08, P = 0.001) in LC patients. Further, when stratified by LC histological type (non-small cell lung cancer and small cell lung cancer), tumor stage (Ⅰ-Ⅲ, Ⅰ-Ⅳ and Ⅳ), detection specimen (serum, intratumoral tissue and pleural effusion), test method (immunohistochemistry and enzyme-linked immunosorbent assay), and HR estimation method (reported and estimated), all of the results were statistically significant. These data indicated that elevated IL-17 expression is correlated with poor clinical outcomes in LC. The meta-analysis did not show heterogeneity or publication bias. The present meta-analysis revealed that high IL-17 expression was an indicator of poor prognosis for Chinese patients with LC. It could potentially help to assess patients' prognosis and estimate treatment efficacy in therapeutic interventions.
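As a rough illustration of how hazard ratios and their confidence intervals are commonly pooled, the sketch below applies inverse-variance (fixed-effect) weighting on the log scale, recovering each study's standard error from the width of its 95% CI. The abstract does not state which pooling model the authors used, so the choice of a fixed-effect model is an assumption, and the function name and input values are hypothetical rather than taken from the six included studies.

```python
from math import exp, log, sqrt

Z95 = 1.96  # normal quantile for a two-sided 95% confidence interval

def pool_hazard_ratios(studies):
    """Inverse-variance pooling of hazard ratios.

    studies: list of (hr, ci_lower, ci_upper) tuples with 95% CIs.
    Returns the pooled (hr, ci_lower, ci_upper).
    """
    weights, weighted_logs = [], []
    for hr, lower, upper in studies:
        # Standard error of log(HR), recovered from the CI width on the log scale.
        se = (log(upper) - log(lower)) / (2 * Z95)
        w = 1.0 / se ** 2
        weights.append(w)
        weighted_logs.append(w * log(hr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return (exp(pooled_log),
            exp(pooled_log - Z95 * pooled_se),
            exp(pooled_log + Z95 * pooled_se))

# Hypothetical study results, for illustration only.
example = [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
pooled_hr, pooled_lower, pooled_upper = pool_hazard_ratios(example)
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the behaviour expected of inverse-variance weighting.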
Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia
2012-01-01
Objectives To perform a meta-analysis of gene expression microarray data from animal studies of lung injury, and to identify an injury-specific gene expression signature capable of predicting the development of lung injury in humans. Methods We performed a microarray meta-analysis using 77 microarray chips across six platforms, two species and different animal lung injury models exposed to lung injury with and/or without mechanical ventilation. Individual gene chips were classified and grouped based on the strategy used to induce lung injury. Effect size (change in gene expression) was calculated between non-injurious and injurious conditions comparing two main strategies to pool chips: (1) one-hit and (2) two-hit lung injury models. A random effects model was used to integrate individual effect sizes calculated from each experiment. Classification models were built using the gene expression signatures generated by the meta-analysis to predict the development of lung injury in human lung transplant recipients. Results Two injury-specific lists of differentially expressed genes generated from our meta-analysis of lung injury models were validated using external data sets and prospective data from animal models of ventilator-induced lung injury (VILI). Pathway analysis of gene sets revealed that both new and previously implicated VILI-related pathways are enriched with differentially regulated genes. A classification model based on gene expression signatures identified in animal models of lung injury predicted development of primary graft failure (PGF) in lung transplant recipients with greater than 80% accuracy based upon injury profiles from transplant donors. We also found that better classifier performance can be achieved by using meta-analysis to identify differentially expressed genes than by using single study-based differential analysis.
Conclusion Taken together, our data suggest that microarray analysis of gene expression data allows for the detection of “injury” gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521
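The step in which a random effects model integrates per-experiment effect sizes could look roughly like the sketch below. The abstract does not say which estimator was used; the DerSimonian-Laird estimator of the between-study variance is assumed here as one common choice, and the function name and numbers are hypothetical.

```python
from math import sqrt

def dersimonian_laird(effects, variances):
    """Random-effects pooling of one gene's effect sizes across experiments.

    effects:   per-experiment effect sizes
    variances: their within-experiment variances
    Returns (pooled_effect, standard_error, tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures the spread of effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-experiment variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical effect sizes for one gene in two experiments.
pooled, se, tau2 = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
```

When the experiments disagree more than their within-experiment variances would explain, tau2 becomes positive and the pooled standard error widens accordingly.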
Wong, Linda; Hill, Beth L; Hunsberger, Benjamin C; Bagwell, C Bruce; Curtis, Adam D; Davis, Bruce H
2015-01-01
Leuko64™ (Trillium Diagnostics) is a flow cytometric assay that measures neutrophil CD64 expression and serves as an in vitro indicator of infection/sepsis or the presence of a systemic acute inflammatory response. Leuko64 assay currently utilizes QuantiCALC, a semiautomated software that employs cluster algorithms to define cell populations. The software reduces subjective gating decisions, resulting in interanalyst variability of <5%. We evaluated a completely automated approach to measuring neutrophil CD64 expression using GemStone™ (Verity Software House) and probability state modeling (PSM). Four hundred and fifty-seven human blood samples were processed using the Leuko64 assay. Samples were analyzed on four different flow cytometer models: BD FACSCanto II, BD FACScan, BC Gallios/Navios, and BC FC500. A probability state model was designed to identify calibration beads and three leukocyte subpopulations based on differences in intensity levels of several parameters. PSM automatically calculates CD64 index values for each cell population using equations programmed into the model. GemStone software uses PSM that requires no operator intervention, thus totally automating data analysis and internal quality control flagging. Expert analysis with the predicate method (QuantiCALC) was performed. Interanalyst precision was evaluated for both methods of data analysis. PSM with GemStone correlates well with the expert manual analysis, r² = 0.99675 for the neutrophil CD64 index values with no intermethod bias detected. The average interanalyst imprecision for the QuantiCALC method was 1.06% (range 0.00-7.94%), which was reduced to 0.00% with the GemStone PSM. The operator-to-operator agreement in GemStone was a perfect correlation, r² = 1.000. Automated quantification of CD64 index values produced results that strongly correlate with expert analysis using a standard gate-based data analysis method.
PSM successfully evaluated flow cytometric data generated by multiple instruments across multiple lots of the Leuko64 kit in all 457 cases. The probability-based method provides greater objectivity, higher data analysis speed, and allows for greater precision for in vitro diagnostic flow cytometric assays. © 2015 International Clinical Cytometry Society.
ERIC Educational Resources Information Center
Bantum, Erin O'Carroll; Owen, Jason E.
2009-01-01
Psychological interventions provide linguistic data that are particularly useful for testing mechanisms of action and improving intervention methodologies. For this study, emotional expression in an Internet-based intervention for women with breast cancer (n = 63) was analyzed via rater coding and 2 computerized coding methods (Linguistic Inquiry…
Semi-supervised prediction of gene regulatory networks using machine learning algorithms.
Patel, Nihir; Wang, Jason T L
2015-10-01
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
Modelling gene expression profiles related to prostate tumor progression using binary states
2013-01-01
Background Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made for differential expression detection and biomarker discovery, few methods have been designed to model the relationship between gene expression level and tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods We have modeled an on-off state of gene activation per sample and then per stage to select gene expression profiles associated with tumor progression. The selection is guided by the statistical significance of profiles based on randomly permuted datasets. Results We show that our method identifies expected profiles corresponding to oncogenes and tumor suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is found neither by other statistical tests commonly used to detect differential expression between tumor stages nor by other tailored methods. Ontology and pathway analysis concurred with these findings. Conclusions Results suggest that our methodology may be a valuable tool to study tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350
Zhu, Ying; Zhang, Yun-Xia; Liu, Wen-Wen; Ma, Yan; Fang, Qun; Yao, Bo
2015-04-01
This paper describes a nanoliter droplet array-based single-cell reverse transcription quantitative PCR (RT-qPCR) assay method for quantifying gene expression in individual cells. By sequentially printing nanoliter-scale droplets on a microchip using a microfluidic robot, all liquid-handling operations, including cell encapsulation, lysis, reverse transcription, and quantitative PCR with real-time fluorescence detection, can be automatically achieved. The inhibition effect of cell suspension buffer on the RT-PCR assay was comprehensively studied to achieve high-sensitivity gene quantification. The present system was applied in the quantitative measurement of the expression level of mir-122 in single Huh-7 cells. A wide distribution of mir-122 expression in single cells from 3061 copies/cell to 79998 copies/cell was observed, showing a high level of cell heterogeneity. With the advantages of full automation in liquid handling, simple system structure, and flexibility in achieving multi-step operations, the present method provides a novel liquid-handling mode for single cell gene expression analysis, and has significant potential in transcriptional identification and rare cell analysis.
Zhu, Ying; Zhang, Yun-Xia; Liu, Wen-Wen; Ma, Yan; Fang, Qun; Yao, Bo
2015-01-01
This paper describes a nanoliter droplet array-based single-cell reverse transcription quantitative PCR (RT-qPCR) assay method for quantifying gene expression in individual cells. By sequentially printing nanoliter-scale droplets on a microchip using a microfluidic robot, all liquid-handling operations, including cell encapsulation, lysis, reverse transcription, and quantitative PCR with real-time fluorescence detection, can be automatically achieved. The inhibition effect of cell suspension buffer on the RT-PCR assay was comprehensively studied to achieve high-sensitivity gene quantification. The present system was applied in the quantitative measurement of the expression level of mir-122 in single Huh-7 cells. A wide distribution of mir-122 expression in single cells from 3061 copies/cell to 79998 copies/cell was observed, showing a high level of cell heterogeneity. With the advantages of full automation in liquid handling, simple system structure, and flexibility in achieving multi-step operations, the present method provides a novel liquid-handling mode for single cell gene expression analysis, and has significant potential in transcriptional identification and rare cell analysis. PMID:25828383
Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou
2013-08-01
Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects that damage wooden structures. The current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the differentially expressed genes mediated by chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites induced by a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron) for 24 h, by using the RNA ligase-mediated Rapid Amplification of cDNA Ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.
Enhanced expression of G-protein coupled estrogen receptor (GPER/GPR30) in lung cancer
2012-01-01
Background G-protein-coupled estrogen receptor (GPER/GPR30) was reported to bind 17β-estradiol (E2), tamoxifen, and ICI 182,780 (fulvestrant) and to promote activation of epidermal growth factor receptor (EGFR)-mediated signaling in breast, endometrial and thyroid cancer cells. Although lung adenocarcinomas express estrogen receptors α and β (ERα and ERβ), the expression of GPER in lung cancer has not been investigated. The purpose of this study was to examine the expression of GPER in lung cancer. Methods The expression patterns of GPER in various lung cancer lines and lung tumors were investigated using standard quantitative real-time PCR (at the mRNA level), Western blot and immunohistochemistry (IHC) methods (at the protein level). The expression of GPER was scored and pairwise comparisons (cancer vs adjacent tissues as well as cancer vs normal lung tissues) were performed. Results Analysis by real-time PCR and Western blotting revealed a significantly higher expression of GPER at both mRNA and protein levels in human non-small cell lung cancer (NSCLC) cell lines relative to immortalized normal lung bronchial epithelial cells (HBECs). The virally immortalized human small airway epithelial cell line HPL1D showed higher expression than HBECs and similar expression to NSCLC cells. Immunohistochemical analysis of tissue sections of murine lung adenomas as well as human lung adenocarcinomas, squamous cell carcinomas and non-small cell lung carcinomas showed consistently higher expression of GPER in the tumor relative to the surrounding non-tumor tissue. Conclusion The results from this study demonstrate increased GPER expression in lung cancer cells and tumors compared to normal lung. Further evaluation of the function and regulation of GPER will be necessary to determine if GPER is a marker of lung cancer progression. PMID:23273253
Kanai, Masatake; Mano, Shoji; Nishimura, Mikio
2017-01-11
Plant seeds accumulate large amounts of storage reserves comprising biodegradable organic matter. Humans rely on seed storage reserves for food and as industrial materials. Gene expression profiles are powerful tools for investigating metabolic regulation in plant cells. Therefore, detailed, accurate gene expression profiles during seed development are required for crop breeding. Acquiring highly purified RNA is essential for producing these profiles. Efficient methods are needed to isolate highly purified RNA from seeds. Here, we describe a method for isolating RNA from seeds containing large amounts of oils, proteins, and polyphenols, which have inhibitory effects on high-purity RNA isolation. Our method enables highly purified RNA to be obtained from seeds without the use of phenol, chloroform, or additional processes for RNA purification. This method is applicable to Arabidopsis, rapeseed, and soybean seeds. Our method will be useful for monitoring the expression patterns of low level transcripts in developing and mature seeds.
Shape sensitivity analysis of flutter response of a laminated wing
NASA Technical Reports Server (NTRS)
Bergen, Fred D.; Kapania, Rakesh K.
1988-01-01
A method is presented for calculating the shape sensitivity of a wing aeroelastic response with respect to changes in geometric shape. Yates' modified strip method is used in conjunction with Giles' equivalent plate analysis to predict the flutter speed, frequency, and reduced frequency of the wing. Three methods are used to calculate the sensitivity of the eigenvalue. The first method is purely a finite difference calculation of the eigenvalue derivative directly from the solution of the flutter problem corresponding to the two different values of the shape parameters. The second method uses an analytic expression for the eigenvalue sensitivities of a general complex matrix, where the derivatives of the aerodynamic, mass, and stiffness matrices are computed using a finite difference approximation. The third method also uses an analytic expression for the eigenvalue sensitivities, but the aerodynamic matrix is computed analytically. All three methods are found to be in good agreement with each other. The sensitivities of the eigenvalues were used to predict the flutter speed, frequency, and reduced frequency. These approximations were found to be in good agreement with those obtained using a complete reanalysis.
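The comparison between a finite-difference eigenvalue derivative and the analytic expression for a general complex matrix can be illustrated on a toy problem. The analytic expression used is the standard result dλ/dp = (yᵀ A′ x) / (yᵀ x), where x and y are the right and left eigenvectors of A. The 2-by-2 matrix below is purely hypothetical and has nothing to do with the wing model; it is chosen only because its eigenpairs have closed forms.

```python
import cmath

def matrix(p):
    """Hypothetical complex 2x2 matrix depending on a shape-like parameter p."""
    return [[p, 1 + 1j], [2, p * p]]

def d_matrix(p):
    """Analytic derivative of matrix(p) with respect to p."""
    return [[1, 0], [0, 2 * p]]

def eigenvalue(a):
    """One eigenvalue of a 2x2 matrix (the '+' root of the characteristic polynomial)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (tr + cmath.sqrt(tr * tr - 4 * det)) / 2

def sensitivity_fd(p, h=1e-5):
    """Central finite difference of the eigenvalue with respect to p."""
    return (eigenvalue(matrix(p + h)) - eigenvalue(matrix(p - h))) / (2 * h)

def sensitivity_analytic(p):
    """d(lambda)/dp = (y^T A' x) / (y^T x) with right/left eigenvectors x, y."""
    a, da = matrix(p), d_matrix(p)
    lam = eigenvalue(a)
    x = [a[0][1], lam - a[0][0]]  # right eigenvector: A x = lam x
    y = [a[1][0], lam - a[0][0]]  # left eigenvector:  y^T A = lam y^T
    num = sum(y[i] * da[i][j] * x[j] for i in range(2) for j in range(2))
    den = y[0] * x[0] + y[1] * x[1]
    return num / den

fd = sensitivity_fd(1.5)
analytic = sensitivity_analytic(1.5)
```

For this toy matrix the two estimates agree closely, which mirrors the kind of cross-check between methods that the abstract reports.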
2013-01-01
Background Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis. Results Here we have developed a web application called SVAw (Surrogate variable analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software has been developed based on the open source Bioconductor SVA package. In our software, we have extended the SVA program functionality in three aspects: (i) SVAw performs a fully automated and user-friendly analysis workflow; (ii) it calculates probe/gene statistics for both pre- and post-SVA analysis and provides a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file, including a graphical comparison of the outcome for the user. Conclusions SVAw is a freely accessible web server solution for the surrogate variable analysis of high-throughput datasets and facilitates removing unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. The executable packages for both the web and standalone applications and the instructions for installation can be downloaded from our web site. PMID:23497726
QuASAR: quantitative allele-specific analysis of reads.
Harvey, Chris T; Moyerbrailean, Gregory A; Davis, Gordon O; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger
2015-04-15
Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while considering uncertainty in the genotype calls. We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. Availability: http://github.com/piquelab/QuASAR. Contact: fluca@wayne.edu or rpique@wayne.edu. Supplementary Material is available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
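The "typical" ASE test mentioned in the abstract, counting reads per allele at a heterozygous site and testing a 1:1 allelic ratio, can be sketched as an exact two-sided binomial test. This is only the naive baseline, not QuASAR itself: it assumes the genotype is known to be heterozygous and ignores base-call errors and over-dispersion, which are exactly what QuASAR adds. The function name is hypothetical.

```python
from math import comb

def ase_binomial_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial test of a 1:1 allelic ratio at one heterozygous site.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-read count under Binomial(n, 0.5).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against the null
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so that ties in probability are counted.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

if __name__ == "__main__":
    print(ase_binomial_pvalue(5, 5))   # balanced counts
    print(ase_binomial_pvalue(10, 0))  # strongly imbalanced counts
```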
Biggar, Kyle K; Wu, Cheng-Wei; Storey, Kenneth B
2014-10-01
This study makes a significant advancement on a microRNA amplification technique previously used for expression analysis and sequencing in animal models without annotated mature microRNA sequences. As research progresses into the post-genomic era of microRNA prediction and analysis, the need for a rapid and cost-effective method for microRNA amplification is critical to facilitate wide-scale analysis of microRNA expression. To facilitate this requirement, we have reoptimized the design of amplification primers and introduced a polyadenylation step to allow amplification of all mature microRNAs from a single RNA sample. Importantly, this method retains the ability to sequence reverse transcription polymerase chain reaction (RT-PCR) products, validating microRNA-specific amplification. Copyright © 2014 Elsevier Inc. All rights reserved.
Integrative Analysis Reveals Regulatory Programs in Endometriosis
Yang, Huan; Kang, Kai; Cheng, Chao; Mamillapalli, Ramanaiah; Taylor, Hugh S.
2015-01-01
Endometriosis is a common gynecological disease found in approximately 10% of reproductive-age women. Gene expression analysis has been performed to explore alterations in gene expression associated with endometriosis; however, the underlying transcription factors (TFs) governing such expression changes have not been investigated in a systematic way. In this study, we propose a method to integrate gene expression with TF binding data and protein–protein interactions to construct an integrated regulatory network (IRN) for endometriosis. The IRN has shown that the most regulated gene in endometriosis is RUNX1, which is targeted by 14 of 26 TFs also involved in endometriosis. Using 2 published cohorts, GSE7305 (Hover, n = 20) and GSE7307 (Roth, n = 36) from the Gene Expression Omnibus database, we identified a network of TFs, which bind to target genes that are differentially expressed in endometriosis. Enrichment analysis based on the hypergeometric distribution allowed us to predict the TFs involved in endometriosis (n = 40). This included known TFs such as androgen receptor (AR) and critical factors in the pathology of endometriosis, estrogen receptor α, and estrogen receptor β. We also identified several new ones from which we selected FOXA2 and TFAP2C, and their regulation was confirmed by quantitative real-time polymerase chain reaction and immunohistochemistry (IHC). Further, our analysis revealed that the function of AR and p53 in endometriosis is regulated by posttranscriptional changes and not by differential gene expression. Our integrative analysis provides new insights into the regulatory programs involved in endometriosis. PMID:26134036
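The enrichment analysis based on the hypergeometric distribution can be sketched as an upper-tail probability: given a background of genes, the targets of one TF, and a list of differentially expressed genes, how surprising is the observed overlap? The function and argument names below are hypothetical, and the choice of gene universe is an assumption that the abstract does not specify.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_genes, n_targets, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn without replacement
    from n_genes, of which n_targets are targets of the TF."""
    upper = min(n_targets, n_de)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Toy numbers only: 10 genes, 5 TF targets, 5 DE genes, all 5 DE genes are targets.
    print(hypergeom_enrichment_pvalue(10, 5, 5, 5))
```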
Ding, Liang-Hao; Xie, Yang; Park, Seongmi; Xiao, Guanghua; Story, Michael D.
2008-01-01
Despite the tremendous growth of microarray usage in scientific studies, there is a lack of standards for background correction methodologies, especially in single-color microarray platforms. Traditional background subtraction methods often generate negative signals and thus cause large amounts of data loss. Hence, some researchers prefer to avoid background corrections, which typically result in the underestimation of differential expression. Here, by utilizing nonspecific negative control features integrated into Illumina whole genome expression arrays, we have developed a method of model-based background correction for BeadArrays (MBCB). We compared the MBCB with a method adapted from the Affymetrix robust multi-array analysis algorithm and with no background subtraction, using a mouse acute myeloid leukemia (AML) dataset. We demonstrated that differential expression ratios obtained by using the MBCB had the best correlation with quantitative RT–PCR. MBCB also achieved better sensitivity in detecting differentially expressed genes with biological significance. For example, we demonstrated that the differential regulation of Tnfr2, Ikk and NF-kappaB, the death receptor pathway, in the AML samples, could only be detected by using data after MBCB implementation. We conclude that MBCB is a robust background correction method that will lead to more precise determination of gene expression and better biological interpretation of Illumina BeadArray data. PMID:18450815
A robust method for RNA extraction and purification from a single adult mouse tendon.
Grinstein, Mor; Dingwall, Heather L; Shah, Rishita R; Capellini, Terence D; Galloway, Jenna L
2018-01-01
Mechanistic understanding of tendon molecular and cellular biology is crucial toward furthering our abilities to design new therapies for tendon and ligament injuries and disease. Recent transcriptomic and epigenomic studies in the field have harnessed the power of mouse genetics to reveal new insights into tendon biology. However, many mouse studies pool tendon tissues or use amplification methods to perform RNA analysis, which can significantly increase the experimental costs and limit the ability to detect changes in expression of low copy transcripts. Single Achilles tendons were harvested from uninjured, contralateral injured, and wild type mice between three and five months of age, and RNA was extracted. RNA Integrity Number (RIN) and concentration were determined, and RT-qPCR gene expression analysis was performed. After testing several RNA extraction approaches on single adult mouse Achilles tendons, we developed a protocol that was successful at obtaining high RIN and sufficient concentrations suitable for RNA analysis. We found that the RNA quality was sensitive to the time between tendon harvest and homogenization, and that the RNA quality and concentration were dependent on the duration of homogenization. Using this method, we demonstrate that analysis of Scx gene expression in single mouse tendons reduces the biological variation caused by pooling tendons from multiple mice. We also show successful use of this approach to analyze Sox9 and Col1a2 gene expression changes in injured compared with uninjured control tendons. Our work presents a robust, cost-effective, and straightforward method to extract high quality RNA from a single adult mouse Achilles tendon at sufficient amounts for RT-qPCR as well as RNA-seq. We show this can reduce variation and decrease the overall costs associated with experiments. This approach can also be applied to other skeletal tissues, as well as precious human samples.
Dual oxidase 1: A predictive tool for the prognosis of hepatocellular carcinoma patients.
Chen, Shengsen; Ling, Qingxia; Yu, Kangkang; Huang, Chong; Li, Ning; Zheng, Jianming; Bao, Suxia; Cheng, Qi; Zhu, Mengqi; Chen, Mingquan
2016-06-01
Dual oxidase 1 (DUOX1), which is the main source of reactive oxygen species (ROS) production in the airway, can be silenced in human lung cancer and hepatocellular carcinomas. However, the prognostic value of DUOX1 expression in hepatocellular carcinoma patients is still unclear. We investigated the prognostic value of DUOX1 expression in liver cancer patients. DUOX1 mRNA expression was determined in tumor tissues and non-tumor tissues by real‑time PCR. For evaluation of the prognostic value of DUOX1 expression, Kaplan-Meier method and Cox's proportional hazards model (univariate analysis and multivariate analysis) were employed. A simple risk score was devised by using significant variables obtained from the Cox's regression analysis to further predict the HCC patient prognosis. We observed a reduced DUOX1 mRNA level in the cancer tissues in comparison to the non‑cancer tissues. More importantly, Kaplan-Meier analysis showed that patients with high DUOX1 expression had longer disease-free survival and overall survival compared with those with low expression of DUOX1. Cox's regression analysis indicated that DUOX1 expression, age, and intrahepatic metastasis may be significant prognostic factors for disease-free survival and overall survival. Finally, we found that patients with total scores of >2 and >1 were more likely to relapse and succumb to the disease than patients whose total scores were ≤2 and ≤1. In conclusion, DUOX1 expression in liver tumors is a potential prognostic tool for patients. The risk scoring system is useful for predicting the survival of liver cancer patients after tumor resection.
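The Kaplan-Meier method named in the abstract estimates a survival curve as a running product over event times. The sketch below is a generic product-limit estimator with made-up follow-up data; it does not reproduce the study's data, its risk score, or its Cox regression, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (relapse/death) occurred at that time, 0 if censored
    Returns [(t, S(t))] for each distinct time at which an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

if __name__ == "__main__":
    # Hypothetical data: four patients, the second one censored at t = 2.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Comparing two such curves (for example high versus low expression groups) would additionally need a test such as the log-rank test, which is not shown here.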
Role of SRC-3delta4 in the Progression and Metastasis of Castration-Resistant Prostate Cancer
2013-10-01
Expression of SRC-3∆4, GAPDH, and AR target genes including PSA, KLK2, IGFBP5, Cyclin A2, and UBE2C was determined by RT-qPCR analysis. Data are...Expression of AR (B), GAPDH, and TMPRSS2-ERG (C) was determined by RT-qPCR analysis. Data are presented using the comparative Ct method, in which GAPDH...input. An irrelevant region (1800 bp downstream of transcription start site) was served as a negative control. (E) and (F). ChIP analysis of SRC-3∆4’s
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.
2015-01-01
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053
ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis
Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia
2015-01-01
Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116
TCC: an R package for comparing tag count data with robust normalization strategies
2013-01-01
Background Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. Results TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. Conclusion DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. 
TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13. PMID:23837715
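The idea behind the DEG elimination strategy, removing potential DEGs before computing normalization factors, can be sketched in a few lines. This is a deliberately simplified stand-in and not the TCC implementation: it uses total-count scaling and a fold-change ranking in place of the edgeR, DESeq, or baySeq steps that TCC actually combines, handles only one sample per group, and the function names and the `deg_fraction` parameter are hypothetical.

```python
from math import log2

def size_factor(counts, keep):
    """Ratio of total counts (sample 2 / sample 1) over the genes marked True in keep."""
    total1 = sum(c1 for (c1, _), k in zip(counts, keep) if k)
    total2 = sum(c2 for (_, c2), k in zip(counts, keep) if k)
    return total2 / total1

def deges_like_factor(counts, deg_fraction=0.3):
    """Three-step sketch: normalize, flag potential DEGs, renormalize without them.

    counts -- list of (count_in_sample1, count_in_sample2) per gene
    Returns (initial_factor, refined_factor).
    """
    n = len(counts)
    # Step 1: initial normalization using every gene.
    f0 = size_factor(counts, [True] * n)
    # Step 2: rank genes by absolute normalized log fold change (pseudocount of 1).
    lfc = [abs(log2(((c2 + 1) / f0) / (c1 + 1))) for c1, c2 in counts]
    ranked = sorted(range(n), key=lambda g: lfc[g], reverse=True)
    flagged = set(ranked[: int(n * deg_fraction)])
    # Step 3: recompute the factor using only genes not flagged as potential DEGs.
    keep = [g not in flagged for g in range(n)]
    return f0, size_factor(counts, keep)

if __name__ == "__main__":
    # Toy data: sample 2 is sequenced twice as deeply; 3 of 10 genes are 4-fold up.
    toy = [(100, 200)] * 7 + [(100, 800)] * 3
    print(deges_like_factor(toy))
```

In the toy data the initial factor is inflated by the up-regulated genes, while the refined factor recovers the true two-fold depth difference, which is the kind of bias the abstract describes for one-sided differential expression.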
Mori, Yoshifumi; Chung, Ung-Il; Tanaka, Sakae; Saito, Taku
2014-01-01
Superficial zone (SFZ) cells, which are morphologically and functionally distinct from chondrocytes in deeper zones, play important roles in the maintenance of articular cartilage. Here, we established an easy and reliable method for performance of laser microdissection (LMD) on cryosections of mature rat articular cartilage using an adhesive membrane. We further examined gene expression profiles in the SFZ and the deeper zones of articular cartilage by performing RNA sequencing (RNA-seq). We validated sample collection methods, RNA amplification and the RNA-seq data using real-time RT-PCR. The combined data provide comprehensive information regarding genes specifically expressed in the SFZ or deeper zones, as well as a useful protocol for expression analysis of microsamples of hard tissues.
Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun
2013-01-01
The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of the transcriptome. Many human tissue/cell transcriptome datasets generated by RNA-Seq technology are available in public data resources. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is more closely related to physiological condition than to tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in terms of expression level and variation, we first apply an improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (genes that qualify as HK genes in the cancer group but not in the normal group) are expressed at higher levels and more variably in the cancer condition than in the normal condition. Cancer-associated HK genes tend to be AT-rich, and they are enriched in functions related to cell cycle regulation and constitute some cancer signatures. The expression of large genes is also avoided in the cancer group. These studies will help us understand how patterns of gene expression differ among cell types, particularly in cancer. PMID:23382867
Duan, Fenghai; Xu, Ye
2017-01-01
The objective was to analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data set publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 were expressed at lower levels after the turning points and 713 were expressed at higher levels after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data set and successfully identified genes with expressions varying before and after breast cancer diagnosis.
Expression of E-cadherin in canine anal sac gland carcinoma and its association with survival.
Polton, G A; Brearley, M J; Green, L M; Scase, T J
2007-12-01
The objective of this study was to determine whether an association could be demonstrated between survival and the expression of the adhesion molecule E-cadherin by the neoplastic cells in a group of dogs with anal sac gland carcinomas (ASGCs). Archived formalin-fixed, paraffin wax-embedded primary tumour specimens were obtained for 36 cases of canine ASGC with known clinical management and survival data. Immunohistochemical methods were used to evaluate E-cadherin expression by the neoplastic cells and data were evaluated for an association between E-cadherin expression and survival. On univariate analysis, the median survival time for cases with tumours expressing E-cadherin in more than 75% of cells was significantly greater than that for cases with tumours expressing E-cadherin in fewer than 75% of cells (1168 versus 448 days, P = 0.0246). Both E-cadherin expression and presence or absence of distant metastases were significantly associated with survival on multivariate analysis. This study demonstrates that expression of E-cadherin at the cytoplasmic membrane in canine ASGCs is variable and potentially predictive of survival.
Facial recognition in education system
NASA Astrophysics Data System (ADS)
Krithika, L. B.; Venkatesh, K.; Rathore, S.; Kumar, M. Harish
2017-11-01
Human beings use emotions extensively to convey messages and to resolve them. Emotion detection and face recognition can provide an interface between individuals and technologies. Face recognition is among the most successful applications of recognition analysis. Many different techniques have been used to recognize facial expressions and to detect emotion under varying poses. In this paper, we present an efficient method that recognizes facial expressions by tracking face points and the distances between them. It can automatically identify an observer's face movements and facial expression in an image, and it can capture different aspects of emotion and facial expression.
Murine cell glycolipids customization by modular expression of glycosyltransferases.
Cid, Emili; Yamamoto, Miyako; Buschbeck, Marcus; Yamamoto, Fumiichiro
2013-01-01
Functional analysis of glycolipids has been hampered by their complex nature and combinatorial expression in cells and tissues. We report an efficient and easy method to generate cells with specific glycolipids. In our proof of principle experiments we have demonstrated the customized expression of two relevant glycosphingolipids on murine fibroblasts, stage-specific embryonic antigen 3 (SSEA-3), a marker for stem cells, and Forssman glycolipid, a xenoantigen. Sets of genes encoding glycosyltransferases were transduced by viral infection followed by multi-color cell sorting based on coupled expression of fluorescent proteins.
Parameterized Facial Expression Synthesis Based on MPEG-4
NASA Astrophysics Data System (ADS)
Raouzaiou, Amaryllis; Tsapatsoulis, Nicolas; Karpouzis, Kostas; Kollias, Stefanos
2002-12-01
In the framework of MPEG-4, one can include applications where virtual agents, utilizing both textual and multisensory data, including facial expressions and nonverbal speech help systems become accustomed to the actual feelings of the user. Applications of this technology are expected in educational environments, virtual collaborative workplaces, communities, and interactive entertainment. Facial animation has gained much interest within the MPEG-4 framework; with implementation details being an open research area (Tekalp, 1999). In this paper, we describe a method for enriching human computer interaction, focusing on analysis and synthesis of primary and intermediate facial expressions (Ekman and Friesen (1978)). To achieve this goal, we utilize facial animation parameters (FAPs) to model primary expressions and describe a rule-based technique for handling intermediate ones. A relation between FAPs and the activation parameter proposed in classical psychological studies is established, leading to parameterized facial expression analysis and synthesis notions, compatible with the MPEG-4 standard.
Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis
Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill
2016-01-01
Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no reliable markers for the identification of cancer stem cells, and comparative methods examining adherent and sphere cells are widely used to investigate the mechanisms underlying cancer stem cells, because sphere cells are known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets, such as cytokine-cytokine receptor interaction, that demonstrated significance in both datasets. Our observations demonstrated that among the genes of the four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. An additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. 
Distinct from other meta-analyses, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956
Fisher, Evelyn L
2017-10-17
The purpose of this study was to explore the literature on predictors of outcomes among late talkers using systematic review and meta-analysis methods. We sought to answer the question: What factors predict preschool-age expressive-language outcomes among late-talking toddlers? We entered carefully selected search terms into the following electronic databases: Communication & Mass Media Complete, ERIC, Medline, PsycEXTRA, Psychological and Behavioral Sciences, and PsycINFO. We conducted a separate, random-effects model meta-analysis for each individual predictor that was used in a minimum of 5 studies. We also tested potential moderators of the relationship between predictors and outcomes using metaregression and subgroup analysis. Last, we conducted publication-bias and sensitivity analyses. We identified 20 samples, comprising 2,134 children, in a systematic review. According to the results of the meta-analyses, significant predictors of expressive-language outcomes included toddlerhood expressive-vocabulary size, receptive language, and socioeconomic status. Nonsignificant predictors included phrase speech, gender, and family history. To our knowledge this is the first synthesis of the literature on predictors of outcomes among late talkers using meta-analysis. Our findings clarify the contributions of several constructs to outcomes and highlight the importance of early receptive language to expressive-language development. https://doi.org/10.23641/asha.5313454.
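A random-effects meta-analysis pools per-study effect sizes while allowing the true effect to vary between studies. The sketch below uses the DerSimonian-Laird estimator of between-study variance, which is a common choice but an assumption here, since the abstract does not state which estimator was used. The input numbers are invented and the function name is hypothetical.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    effects   -- per-study effect sizes
    variances -- per-study sampling variances
    Returns (pooled_effect, tau_squared, standard_error).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight each study by its total (within + between) variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return pooled, tau2, se

if __name__ == "__main__":
    print(random_effects_pool([0.0, 2.0], [1.0, 1.0]))
```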
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.
Lee, Sangseon; Park, Youngjune; Kim, Sun
2017-07-15
Pathway based analysis of high throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway consists of multiple functions, the recent approach is to determine condition specific sub-pathways or subpaths. However, there are several challenges. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually an average of statistical scores, e.g., correlations, of edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition specific subpaths, each of which has different activities across multiple phenotypes. MIDAS utilizes gene expression quantity information fully and the network centrality information to determine condition specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes. 36 differentially activated subpaths were determined. The utility of our method, MIDAS, was demonstrated in four ways. All 36 subpaths are well supported by the literature information. Subsequently, we showed that these subpaths had a good discriminant power for five cancer subtype classification and also had a prognostic power in terms of survival analysis. Finally, in a performance comparison of MIDAS to a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature information. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
2013-01-01
Background As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. Results We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. Conclusions The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. 
All source files for simulation and real data are available on the author’s publication website. PMID:24359104
Chang, Lun-Ching; Lin, Hui-Min; Sibille, Etienne; Tseng, George C
2013-12-21
As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. We applied 12 microarray meta-analysis methods to combine multiple simulated expression profiles; such methods can be categorized by the hypothesis setting they target: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE genes with non-zero effect sizes in the "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated the hypothesis settings behind the methods and further applied multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website.
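The difference between the HS(A) and HS(B) settings can be illustrated with two classical ways of combining per-study p-values. The sketch below is only an illustration: the abstract does not say which 12 methods were compared, so Fisher's combined test (sensitive to a signal in any one study, in the spirit of HS(B)) and the maximum p-value rule (requiring a signal in every study, in the spirit of HS(A)) are chosen here as assumed, familiar examples, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2K df."""
    # Assumes every p-value is strictly positive (log(0) is undefined).
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def max_p_combined(pvalues):
    """Maximum p-value rule: the max of K independent uniforms has CDF x**K."""
    return max(pvalues) ** len(pvalues)


# Hypothetical gene that is strongly DE in one of two studies only.
study_pvalues = [0.01, 0.5]
print(fisher_combined_p(study_pvalues))  # small: one strong study suffices
print(max_p_combined(study_pvalues))     # large: the weak study dominates
```

A gene with a strong signal in a single study receives a small combined p-value from Fisher's method but not from the maximum p-value rule, which is the practical consequence of choosing one hypothesis setting over the other.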
The opportunities and challenges of large-scale molecular approaches to songbird neurobiology
Mello, C.V.; Clayton, D.F.
2014-01-01
High-throughput methods for analyzing genome structure and function are having a large impact in songbird neurobiology. Methods include genome sequencing and annotation, comparative genomics, DNA microarrays and transcriptomics, and the development of a brain atlas of gene expression. Key emerging findings include the identification of complex transcriptional programs active during singing, the robust brain expression of non-coding RNAs, evidence of profound variations in gene expression across brain regions, and the identification of molecular specializations within song production and learning circuits. Current challenges include the statistical analysis of large datasets, effective genome curation, the efficient localization of gene expression changes to specific neuronal circuits and cells, and the dissection of behavioral and environmental factors that influence brain gene expression. The field requires efficient methods for comparisons with organisms like chicken, which offer important anatomical, functional and behavioral contrasts. As sequencing costs plummet, opportunities emerge for comparative approaches that may help reveal evolutionary transitions contributing to vocal learning, social behavior and other properties that make songbirds such compelling research subjects. PMID:25280907
Developing molecular tools for Chlamydomonas reinhardtii
NASA Astrophysics Data System (ADS)
Noor-Mohammadi, Samaneh
Microalgae have garnered increasing interest over the years for their ability to produce compounds ranging from biofuels to nutraceuticals. A main focus of researchers has been to use microalgae as a natural bioreactor for the production of valuable and complex compounds. Recombinant protein expression in the chloroplasts of green algae has recently become more routine; however, the heterologous expression of multiple proteins or complete biosynthetic pathways remains a significant challenge. To take full advantage of these organisms' natural abilities, sophisticated molecular tools are needed to introduce and functionally express multigene biosynthetic pathways in their genomes. To achieve this objective, we have sought to establish a method to construct, integrate and express multigene operons in the chloroplast and nuclear genomes of the model microalga Chlamydomonas reinhardtii. Here we show that a modified DNA Assembler approach can be used to rapidly assemble multiple-gene biosynthetic pathways in yeast and then integrate these assembled pathways at a site-specific location in the chloroplast, or by random integration in the nuclear genome of C. reinhardtii. As a proof of concept, this method was used to successfully integrate and functionally express up to three reporter proteins (AphA6, AadA, and GFP) in the chloroplast of C. reinhardtii and up to three reporter proteins (Ble, AphVIII, and GFP) in its nuclear genome. An analysis of the relative gene expression of the engineered strains showed significant differences in the mRNA expression levels of the reporter genes and thus highlights the importance of proper promoter/untranslated-region selection when constructing a target pathway. In addition, this work focuses on expressing the cofactor regeneration enzyme phosphite dehydrogenase (PTDH) in the chloroplast and nuclear genomes of C. reinhardtii. The PTDH enzyme converts phosphite into phosphate and NAD(P)+ into NAD(P)H. 
The reduced nicotinamide cofactor NAD(P)H plays a pivotal role in many biochemical oxidation and reduction reactions, thus this enzyme would allow regeneration of NAD(P)H in a microalgae strain over-expressing a NAD(P)H-dependent oxidoreductase. A phosphite dehydrogenase gene was introduced into the chloroplast genome (codon optimized) and nuclear genome of C. reinhardtii by biolistic transformation and electroporation in separate events, respectively. Successful expression of the heterologous protein was confirmed by transcript analysis and protein analysis. In conclusion, this new method represents a useful genetic tool in the construction and integration of complex biochemical pathways into the chloroplast or nuclear genome of microalgae, and this should aid current efforts to engineer algae for recombinant protein expression, biofuels production and production of other desirable natural products.
Estimation of gene induction enables a relevance-based ranking of gene sets.
Bartholomé, Kilian; Kreutz, Clemens; Timmer, Jens
2009-07-01
In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single-gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods, which focus on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R package is available.
Reprogramming Methods Do Not Affect Gene Expression Profile of Human Induced Pluripotent Stem Cells.
Trevisan, Marta; Desole, Giovanna; Costanzi, Giulia; Lavezzo, Enrico; Palù, Giorgio; Barzon, Luisa
2017-01-20
Induced pluripotent stem cells (iPSCs) are pluripotent cells derived from adult somatic cells. After the pioneering work by Yamanaka, who first generated iPSCs by retroviral transduction of four reprogramming factors, several alternative methods to obtain iPSCs have been developed in order to increase the yield and safety of the process. However, whether the different reprogramming methods can influence the pluripotency features of the derived lines remains an open question. In this study, three different strategies, based on retroviral vectors, episomal vectors, and Sendai virus vectors, were applied to derive iPSCs from human fibroblasts. The reprogramming efficiency of the methods based on episomal and Sendai virus vectors was higher than that of the retroviral vector-based approach. All human iPSC clones derived with the different methods showed the typical features of pluripotent stem cells, including the expression of alkaline phosphatase and stemness marker genes, and could give rise to derivatives of the three germ layers in the embryoid body assay. Microarray analysis confirmed the presence of typical stem cell gene expression profiles in all iPSC clones and did not identify any significant difference among reprogramming methods. In conclusion, the different reprogramming methods are equivalent and do not affect the gene expression profile of the derived human iPSCs.
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-01
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448
Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong
2016-01-11
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
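For reference, the baseline that LEGO is compared against can be written down compactly. The sketch below shows conventional over-representation analysis as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. It treats every gene in a pathway as equivalent, which is exactly the limitation the abstract describes; it does not implement LEGO's network-based gene weights. The function name and the example counts are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_overlap), where X is the number of pathway genes
    found when n_selected genes are drawn without replacement from a
    universe of n_universe genes that contains n_pathway pathway genes.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# 200 genes of interest, 8 of which fall in the pathway.
print(ora_p_value(20000, 100, 200, 8))
```

Because only the four counts enter the calculation, two gene lists with the same overlap size receive the same p-value regardless of which pathway genes they hit or how those genes interact.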
ERIC Educational Resources Information Center
Eckes, Suzanne E.
2017-01-01
This article examines an education policy matter that involves homophobic speech in public schools. Using legal research methods, two federal circuit court opinions that have examined the tension surrounding anti-LGBTQ student expression are analyzed. This legal analysis provides non-lawyers some insight into the current realities of student…
Analysis of Written Expression Revision Skills of the Students in Faculty of Education
ERIC Educational Resources Information Center
Can, Remzi
2017-01-01
This study aims to analyze written expression revision skills of students in Turkish Education Department, Education Faculty. This study was done using qualitative research method. The study group of the research consisted of 3rd grade students. The research data were collected by means of document review, a qualitative research technique. The…
Franco, Érika Mendonça Fernandes; Valarelli, Fabrício Pinelli; Fernandes, João Batista; Cançado, Rodrigo Hermont; de Freitas, Karina Maria Salvatore
2015-01-01
Abstract Objective: The aim of this study was to compare torque expression in active and passive self-ligating and conventional brackets. Methods: A total of 300 segments of stainless steel wire 0.019 x 0.025-in and six different brands of brackets (Damon 3MX, Portia, In-Ovation R, Bioquick, Roth SLI and Roth Max) were used. Torque moments were measured at 12°, 24°, 36° and 48°, using a wire torsion device associated with a universal testing machine. The data obtained were compared by analysis of variance followed by Tukey test for multiple comparisons. Regression analysis was performed by the least-squares method to generate the mathematical equation of the optimal curve for each brand of bracket. Results: Statistically significant differences were observed in the expression of torque among all evaluated bracket brands in all evaluated torsions (p < 0.05). It was found that Bioquick presented the lowest torque expression in all tested torsions; in contrast, Damon 3MX bracket presented the highest torque expression up to 36° torsion. Conclusions: The connection system between wire/bracket (active, passive self-ligating or conventional with elastic ligature) seems not to interfere in the final torque expression, the latter being probably dependent on the interaction between the wire and the bracket chosen for orthodontic mechanics. PMID:26691972
Gene expression in the rectus abdominus muscle of patients with and without pelvic organ prolapse.
Hundley, Andrew F; Yuan, Lingwen; Visco, Anthony G
2008-02-01
The objective of the study was to compare gene expression in a group of actin and myosin-related proteins in the rectus muscle of 15 patients with pelvic organ prolapse and 13 controls. Six genes previously identified by microarray GeneChip analysis were examined using real-time quantitative reverse transcriptase-polymerase chain reaction analysis, including 2 genes showing differential expression in pubococcygeus muscle. Samples and controls were run in triplicate in multiplexed wells, and levels of gene expression were analyzed using the comparative critical threshold method. One gene, MYH3, was 3.2 times overexpressed in patients with prolapse (P = .032), but no significant differences in expression were seen for the other genes examined. An age-matched subset of 9 patients and controls showed that MYH3 gene expression was no longer significantly different (P = .058). Differential messenger ribonucleic acid levels of actin and myosin-related genes in patients with pelvic organ prolapse and controls may be limited to skeletal muscle from the pelvic floor.
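The comparative critical threshold calculation mentioned above is commonly written as fold change = 2^(-ΔΔCt). A minimal sketch follows, assuming roughly 100% amplification efficiency and a single reference (housekeeping) gene; the Ct values in the example are invented for illustration and are not data from the study.

```python
from statistics import mean


def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a sequence of replicate Ct values (e.g. triplicates),
    which are averaged before the calculation.
    """
    # Delta Ct: target gene normalized to the reference gene in each group.
    d_case = mean(ct_target_case) - mean(ct_ref_case)
    d_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    # Delta-delta Ct: case group relative to the control group.
    dd_ct = d_case - d_ctrl
    # Assumes product doubles each cycle (amplification efficiency ~100%).
    return 2.0 ** (-dd_ct)


# Hypothetical triplicate Ct values.
print(fold_change_ddct([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                       [26.0, 25.9, 26.1], [18.0, 18.0, 18.0]))
```

Under this formula, a 3.2-fold overexpression such as the one reported for MYH3 corresponds to a ΔΔCt of about -1.7 cycles.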
2014-01-01
Background Torque Teno Virus (TTV) is a DNA virus with high rate of prevalence globally. Since its discovery in 1997, several studies have questioned the role of this virus in causing disease. However, it still remains an enigma. Although methods are available for detection of TTV infection, there is still a need for simple, rapid and reliable method for screening of this virus in human population. Present investigation describes the cloning and expression of N22 region of TTV-genome and the use of expressed peptide in development of immunoassay to detect anti-TTV antibodies in serum. Since TTV genotype-1 is more common in India, the serum positive for genotype-1 was used as source of N22 for expression purpose. Methods Full length N22 region of ORF1 from TTV genotype-1 was amplified and cloned in pGEM®-T Easy vector. After cloning, the amplicon was transformed and expressed as a fusion protein containing hexa-histidine tag in pET-28a(+) vector using BL21 E. coli cells as host. Expression was conducted both in LB medium as well as ZYP-5052 auto-induction medium. The expressed peptide was purified using metal-chelate affinity chromatography and used as antigen in developing a blot immunoassay. Results Analysis of translated product by SDS-PAGE and western blotting demonstrated the presence of 25 kDa polypeptide produced after expression. Solubility studies showed the polypeptide to be associated with insoluble fraction. The use of this peptide as antigen in blot assay produced prominent spot on membrane treated with sera from TTV-infected patients. Analysis of sera from 75 patients with liver and renal diseases demonstrated a successful implication of N22 polypeptide based immunoassay in screening sera for anti-TTV antibodies. Comparison of the immunoassay developed using expressed N22 peptide with established PCR method for TTV-DNA detection showed good coherence between TTV-DNA and presence of anti-TTV antibodies in the sera analysed. 
Conclusions The TTV N22 region can be expressed and safely used as an antigen in a blot assay to detect anti-TTV antibodies in sera. PMID:24884576
Methods to increase reproducibility in differential gene expression via meta-analysis
Sweeney, Timothy E.; Haynes, Winston A.; Vallania, Francesco; Ioannidis, John P.; Khatri, Purvesh
2017-01-01
Findings from clinical and biological studies are often not reproducible when tested in independent cohorts. Due to the testing of a large number of hypotheses and relatively small sample sizes, results from whole-genome expression studies in particular are often not reproducible. Compared to single-study analysis, gene expression meta-analysis can improve reproducibility by integrating data from multiple studies. However, there are multiple choices in designing and carrying out a meta-analysis. Yet, clear guidelines on best practices are scarce. Here, we hypothesized that studying subsets of very large meta-analyses would allow for systematic identification of best practices to improve reproducibility. We therefore constructed three very large gene expression meta-analyses from clinical samples, and then examined meta-analyses of subsets of the datasets (all combinations of datasets with up to N/2 samples and K/2 datasets) compared to a ‘silver standard’ of differentially expressed genes found in the entire cohort. We tested three random-effects meta-analysis models using this procedure. We showed relatively greater reproducibility with more-stringent effect size thresholds with relaxed significance thresholds; relatively lower reproducibility when imposing extraneous constraints on residual heterogeneity; and an underestimation of actual false positive rate by Benjamini–Hochberg correction. In addition, multivariate regression showed that the accuracy of a meta-analysis increased significantly with more included datasets even when controlling for sample size. PMID:27634930
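To make the notion of a random-effects model and its residual heterogeneity concrete, here is a minimal sketch of the DerSimonian-Laird estimator for one gene across several datasets. This is an assumption for illustration: the abstract tests three random-effects models without naming them, so this should not be read as the authors' exact procedure. The effect sizes and variances in the example are made up.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling of per-study effect sizes (DerSimonian-Laird).

    Returns (pooled_effect, standard_error, tau_squared), where tau_squared
    is the estimated between-study (residual) heterogeneity.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect estimate, needed for Cochran's Q statistic.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Method-of-moments estimate, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each study by its within- plus between-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2


# Hypothetical per-dataset effect sizes (e.g. Hedges' g) and variances.
pooled, se, tau2 = dersimonian_laird([0.8, 0.3, 1.1], [0.04, 0.09, 0.06])
print(pooled, se, tau2)
```

A larger tau_squared widens the standard error of the pooled effect, which is one reason thresholds on heterogeneity and on effect size interact when selecting differentially expressed genes.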
Valkonen, Mari; Mojzita, Dominik; Penttilä, Merja
2013-01-01
The ability of cells to maintain pH homeostasis in response to environmental changes has elicited interest in basic and applied research and has prompted the development of methods for intracellular pH measurements. Many traditional methods provide information at population level and thus the average values of the studied cell physiological phenomena, excluding the fact that cell cultures are very heterogeneous. Single-cell analysis, on the other hand, offers more detailed insight into population variability, thereby facilitating a considerably deeper understanding of cell physiology. Although microscopy methods can address this issue, they suffer from limitations in terms of the small number of individual cells that can be studied and complicated image processing. We developed a noninvasive high-throughput method that employs flow cytometry to analyze large populations of cells that express pHluorin, a genetically encoded ratiometric fluorescent probe that is sensitive to pH. The method described here enables measurement of the intracellular pH of single cells with high sensitivity and speed, which is a clear improvement compared to previously published methods that either require pretreatment of the cells, measure cell populations, or require complex data analysis. The ratios of fluorescence intensities, which correlate to the intracellular pH, are independent of the expression levels of the pH probe, making the use of transiently or extrachromosomally expressed probes possible. We conducted an experiment on the kinetics of the pH homeostasis of Saccharomyces cerevisiae cultures grown to a stationary phase after ethanol or glucose addition and after exposure to weak acid stress and glucose pulse. Minor populations with pH homeostasis behaving differently upon treatments were identified. PMID:24038689
Valkonen, Mari; Mojzita, Dominik; Penttilä, Merja; Bencina, Mojca
2013-12-01
The ability of cells to maintain pH homeostasis in response to environmental changes has elicited interest in basic and applied research and has prompted the development of methods for intracellular pH measurements. Many traditional methods provide information at population level and thus the average values of the studied cell physiological phenomena, excluding the fact that cell cultures are very heterogeneous. Single-cell analysis, on the other hand, offers more detailed insight into population variability, thereby facilitating a considerably deeper understanding of cell physiology. Although microscopy methods can address this issue, they suffer from limitations in terms of the small number of individual cells that can be studied and complicated image processing. We developed a noninvasive high-throughput method that employs flow cytometry to analyze large populations of cells that express pHluorin, a genetically encoded ratiometric fluorescent probe that is sensitive to pH. The method described here enables measurement of the intracellular pH of single cells with high sensitivity and speed, which is a clear improvement compared to previously published methods that either require pretreatment of the cells, measure cell populations, or require complex data analysis. The ratios of fluorescence intensities, which correlate to the intracellular pH, are independent of the expression levels of the pH probe, making the use of transiently or extrachromosomally expressed probes possible. We conducted an experiment on the kinetics of the pH homeostasis of Saccharomyces cerevisiae cultures grown to a stationary phase after ethanol or glucose addition and after exposure to weak acid stress and glucose pulse. Minor populations with pH homeostasis behaving differently upon treatments were identified.
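The key property of a ratiometric probe, that the readout does not depend on how much probe a cell expresses, can be seen in a few lines. The sketch below is illustrative only: it assumes two fluorescence channels per cell and a calibration curve of (ratio, pH) pairs measured separately; the calibration values, the linear interpolation, and the clamping at the curve ends are all hypothetical choices rather than the procedure used in the paper.

```python
from bisect import bisect_left


def intensity_ratio(channel_a, channel_b):
    """Ratio of two fluorescence intensities measured for one cell."""
    return channel_a / channel_b


def ratio_to_ph(ratio, calibration):
    """Map a ratio to pH by linear interpolation on a calibration curve.

    calibration is a list of (ratio, pH) pairs with distinct ratios.
    Ratios outside the calibrated range are clamped to the nearest end.
    """
    points = sorted(calibration)
    ratios = [r for r, _ in points]
    if ratio <= ratios[0]:
        return points[0][1]
    if ratio >= ratios[-1]:
        return points[-1][1]
    j = bisect_left(ratios, ratio)
    (r0, p0), (r1, p1) = points[j - 1], points[j]
    return p0 + (p1 - p0) * (ratio - r0) / (r1 - r0)


# Hypothetical calibration points (ratio, pH).
calibration = [(0.5, 5.0), (1.0, 6.0), (2.0, 7.0)]

# Two cells at the same pH, one expressing ten times more probe.
dim_cell = intensity_ratio(150.0, 100.0)
bright_cell = intensity_ratio(1500.0, 1000.0)
print(ratio_to_ph(dim_cell, calibration), ratio_to_ph(bright_cell, calibration))
```

Both cells map to the same pH because the expression level cancels in the ratio, which is why transiently or extrachromosomally expressed probes remain usable.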
Wan, B; Yarbrough, J W; Schultz, T W
2008-01-01
This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours, and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles were more similar to each other than to those outside the cluster. 1-Methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes, such as cell cycle, metabolism, and protein binding, and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
Automation of fluorescent differential display with digital readout.
Meade, Jonathan D; Cho, Yong-Jig; Fisher, Jeffrey S; Walden, Jamie C; Guo, Zhen; Liang, Peng
2006-01-01
Since its invention in 1992, differential display (DD) has become the most commonly used technique for identifying differentially expressed genes because of its many advantages over competing technologies such as DNA microarray, serial analysis of gene expression (SAGE), and subtractive hybridization. Despite the great impact of the method on biomedical research, there has been a lack of automation of DD technology to increase its throughput and accuracy for systematic gene expression analysis. Most previous DD work has taken a "shot-gun" approach of identifying one gene at a time, with a limited number of polymerase chain reactions (PCR) set up manually, giving DD a low-tech and low-throughput image. We have optimized the DD process with a new platform that incorporates fluorescent digital readout, automated liquid handling, and large-format gels capable of running entire 96-well plates. The resulting streamlined fluorescent DD (FDD) technology offers unprecedented accuracy, sensitivity, and throughput in comprehensive and quantitative analysis of gene expression. These major improvements will allow researchers to find differentially expressed genes of interest, both known and novel, quickly and easily.
High throughput protein production screening
Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA
2009-09-08
Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs, using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and IVT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.
Literature-based condition-specific miRNA-mRNA target prediction.
Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun
2017-01-01
miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. 
In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.
Ye, Bingyuan; Wang, Ruihua; Wang, Jianbo
2016-01-01
Raphanobrassica is an allopolyploid species derived from inter-generic hybridization that combines the R genome from Raphanus sativus and the C genome from Brassica oleracea var. alboglabra. In the present study, we used a high-throughput sequencing method to identify the mRNA and miRNA profiles in Raphanobrassica and its parents. A total of 33,561 mRNAs and 283 miRNAs were detected; 9,209 mRNAs and 134 miRNAs were differentially expressed, 7,633 mRNAs and 39 miRNAs showed ELD expression, and 5,219 mRNAs and 57 miRNAs were non-additively expressed in Raphanobrassica. Remarkably, differentially expressed genes (DEGs) were up-regulated and maternal bias was detected in Raphanobrassica. In addition, a miRNA-mRNA interaction network was constructed based on reverse regulated miRNA-mRNAs, which included 75 miRNAs and 178 mRNAs, 31 miRNAs were non-additively expressed target by 13 miRNAs. The related target genes were significantly enriched in the GO term ‘metabolic processes’. Regulation of non-additively expressed target genes is involved in a range of biological pathways and may provide a driving force for variation and adaptation in this allopolyploid. The integrative analysis of mRNA and miRNA profiling provides more information to elucidate gene expression mechanisms and may supply a comprehensive method to study genetic and transcriptional variation in allopolyploids. PMID:27874043
Sewer, Alain; Gubian, Sylvain; Kogel, Ulrike; Veljkovic, Emilija; Han, Wanjiang; Hengstermann, Arnd; Peitsch, Manuel C; Hoeng, Julia
2014-05-17
High-quality expression data are required to investigate the biological effects of microRNAs (miRNAs). The goal of this study was, first, to assess the quality of miRNA expression data based on microarray technologies and, second, to consolidate it by applying a novel normalization method. Indeed, because of significant differences in platform designs, miRNA raw data cannot be normalized blindly with standard methods developed for gene expression. This fundamental observation motivated the development of a novel multi-array normalization method based on controllable assumptions, which uses the spike-in control probes to adjust the measured intensities across arrays. Raw expression data were obtained with the Exiqon dual-channel miRCURY LNA™ platform in the "common reference design" and processed as "pseudo-single-channel". They were used to apply several quality metrics based on the coefficient of variation and to test the novel spike-in controls based normalization method. Most of the considerations presented here could be applied to raw data obtained with other platforms. To assess the normalization method, it was compared with 13 other available approaches from both data quality and biological outcome perspectives. The results showed that the novel multi-array normalization method reduced the data variability in the most consistent way. Further, the reliability of the obtained differential expression values was confirmed based on a quantitative reverse transcription-polymerase chain reaction experiment performed for a subset of miRNAs. The results reported here support the applicability of the novel normalization method, in particular to datasets that display global decreases in miRNA expression similarly to the cigarette smoke-exposed mouse lung dataset considered in this study. 
Quality metrics to assess between-array variability were used to confirm that the novel spike-in controls based normalization method provided high-quality miRNA expression data suitable for reliable downstream analysis. The multi-array miRNA raw data normalization method was implemented in an R software package called ExiMiR and deposited in the Bioconductor repository.
2014-01-01
Background High-quality expression data are required to investigate the biological effects of microRNAs (miRNAs). The goal of this study was, first, to assess the quality of miRNA expression data based on microarray technologies and, second, to consolidate it by applying a novel normalization method. Indeed, because of significant differences in platform designs, miRNA raw data cannot be normalized blindly with standard methods developed for gene expression. This fundamental observation motivated the development of a novel multi-array normalization method based on controllable assumptions, which uses the spike-in control probes to adjust the measured intensities across arrays. Results Raw expression data were obtained with the Exiqon dual-channel miRCURY LNA™ platform in the “common reference design” and processed as “pseudo-single-channel”. They were used to apply several quality metrics based on the coefficient of variation and to test the novel spike-in controls based normalization method. Most of the considerations presented here could be applied to raw data obtained with other platforms. To assess the normalization method, it was compared with 13 other available approaches from both data quality and biological outcome perspectives. The results showed that the novel multi-array normalization method reduced the data variability in the most consistent way. Further, the reliability of the obtained differential expression values was confirmed based on a quantitative reverse transcription–polymerase chain reaction experiment performed for a subset of miRNAs. The results reported here support the applicability of the novel normalization method, in particular to datasets that display global decreases in miRNA expression similarly to the cigarette smoke-exposed mouse lung dataset considered in this study. 
Conclusions Quality metrics to assess between-array variability were used to confirm that the novel spike-in controls based normalization method provided high-quality miRNA expression data suitable for reliable downstream analysis. The multi-array miRNA raw data normalization method was implemented in an R software package called ExiMiR and deposited in the Bioconductor repository. PMID:24886675
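As a rough illustration of the idea of using spike-in control probes to adjust measured intensities across arrays, the following minimal sketch shifts each array so that its mean spike-in log-intensity matches the across-array mean. This is a deliberately simplified assumption and not the ExiMiR algorithm; the function name `spike_in_normalize`, the probe names and the toy values are all hypothetical.

```python
from statistics import mean

def spike_in_normalize(arrays, spike_ids):
    """Shift each array so its mean spike-in log-intensity matches the
    across-array mean spike-in level (hypothetical, simplified correction)."""
    # Mean log2 intensity of the spike-in probes on each array
    per_array = [mean(arr[p] for p in spike_ids) for arr in arrays]
    target = mean(per_array)
    # Subtract the array-specific offset from every probe on that array
    return [
        {probe: value - (level - target) for probe, value in arr.items()}
        for arr, level in zip(arrays, per_array)
    ]

# Toy log2 intensities for two arrays (made-up numbers)
arrays = [
    {"spike1": 10.0, "spike2": 12.0, "miR-a": 8.0},
    {"spike1": 11.0, "spike2": 13.0, "miR-a": 7.5},
]
normalized = spike_in_normalize(arrays, ["spike1", "spike2"])
```

A constant per-array shift is the simplest possible correction; because it does not force the miRNA distributions themselves to match, a global decrease in miRNA expression would survive it, which is the kind of dataset the abstract highlights.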
A prior-based integrative framework for functional transcriptional regulatory network inference
Siahpirani, Alireza F.
2017-01-01
Abstract Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550
Kuhn-Tucker optimization based reliability analysis for probabilistic finite elements
NASA Technical Reports Server (NTRS)
Liu, W. K.; Besterfield, G.; Lawrence, M.; Belytschko, T.
1988-01-01
The fusion of probability finite element method (PFEM) and reliability analysis for fracture mechanics is considered. Reliability analysis with specific application to fracture mechanics is presented, and computational procedures are discussed. Explicit expressions for the optimization procedure with regard to fracture mechanics are given. The results show the PFEM is a very powerful tool in determining the second-moment statistics. The method can determine the probability of failure or fracture subject to randomness in load, material properties and crack length, orientation, and location.
Impact of missing data imputation methods on gene expression clustering and classification.
de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G
2015-02-26
Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by mean or the median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/ .
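The simple strategies mentioned above (replacing missing values by the mean or the median) can be sketched as follows. This is a minimal illustration under the assumption that imputation is done per gene (per row) and that missing entries are encoded as `None`; the function name `impute_rows` and the toy matrix are hypothetical and not taken from the study.

```python
from statistics import mean, median

def impute_rows(matrix, strategy="mean"):
    """Fill missing values (None) in each gene row with that row's
    mean or median of the observed values."""
    fill = mean if strategy == "mean" else median
    result = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # A gene with no observed values cannot be imputed this way
        value = fill(observed) if observed else None
        result.append([value if v is None else v for v in row])
    return result

# Toy expression matrix: rows are genes, columns are samples
expression = [
    [1.0, None, 3.0, 8.0],
    [2.0, 4.0, None, 9.0],
]
mean_filled = impute_rows(expression, "mean")
median_filled = impute_rows(expression, "median")
```

With skewed rows such as these, the two strategies fill in different values, which is one reason an evaluation based on downstream clustering or classification can be more informative than imputation error alone.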
Linnorm: improved statistical analysis for single cell RNA-seq expression data
Yip, Shun H.; Wang, Panwen; Kocher, Jean-Pierre A.; Sham, Pak Chung
2017-01-01
Abstract Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. PMID:28981748
Analysis of high-throughput biological data using their rank values.
Dembélé, Doulaye
2018-01-01
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra, the Perron-Frobenius theorem and also extend a method presented earlier for searching differentially expressed genes for the detection of recurrent copy number aberration. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros .
Partitioning of functional gene expression data using principal points.
Kim, Jaehee; Kim, Haseong
2017-10-12
DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data since they are considered as independent realizations of a stochastic process. This process requires appropriate models to identify patterns of gene functions. The partitioning of the functional data can find homogeneous subgroups of entities for the massive genes within the inherent biological networks. Therefore, it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent partitioning method of functional coefficients for individual expression profiles based on the orthonormal basis system. A principal points based functional partitioning method is proposed for time-course gene expression data. The method explores the relationship between genes using Legendre coefficients as principal points to extract the features of gene functions. Our proposed method provides high connectivity in connectedness after clustering for simulated data and finds significant subsets of genes with the increased connectivity. Our approach has the comparative advantages that fewer coefficients are used from the functional data and that the principal points are self-consistent for partitioning. As real data applications, we are able to find partitioned genes through the gene expressions found in budding yeast data and Escherichia coli data. The proposed method benefits from the use of principal points, dimension reduction, and the choice of an orthogonal basis system, and provides appropriately connected genes in the resulting subsets. We illustrate our method by applying it to sets of cell-cycle-regulated time-course yeast genes and E. coli genes.
The proposed method is able to identify highly connected genes and to explore the complex dynamics of biological systems in functional genomics.
Enrichment and isolation of neurons from adult mouse brain for ex vivo analysis.
Berl, Sabina; Karram, Khalad; Scheller, Anja; Jungblut, Melanie; Kirchhoff, Frank; Waisman, Ari
2017-05-01
Isolation of neurons from the adult mouse CNS is important in order to study their gene expression during development or the course of different diseases. Here we present two different methods for the enrichment or isolation of neurons from adult mouse CNS. These methods are based either on flow cytometry sorting of eYFP-expressing neurons or on depletion of non-neuronal cells by sorting with magnetic beads. Enrichment by FACS sorting of eYFP-positive neurons results in a population of 62.4% NeuN-positive living neurons. qPCR data show a 3- to 5-fold upregulation of neuronal markers. The isolation of neurons based on depletion of non-neuronal cells using the Miltenyi Neuron Isolation Kit reaches a purity of up to 86.5%. qPCR data of these isolated neurons show an increase in neuronal markers and an absence of glial markers, proving pure neuronal RNA isolation. Former data related to neuronal gene expression are mainly based on histology, which does not allow for high-throughput transcriptome analysis to examine differential gene expression. These protocols can be used to study cell type specific gene expression of neurons to unravel their function in the process of damage to the CNS. Copyright © 2017 Elsevier B.V. All rights reserved.
Kunnath-Velayudhan, Shajo; Goldberg, Michael F; Saini, Neeraj K; Johndrow, Christopher T; Ng, Tony W; Johnson, Alison J; Xu, Jiayong; Chan, John; Jacobs, William R; Porcelli, Steven A
2017-10-01
Analysis of Ag-specific CD4+ T cells in mycobacterial infections at the transcriptome level is informative but technically challenging. Although several methods exist for identifying Ag-specific T cells, including intracellular cytokine staining, cell surface cytokine-capture assays, and staining with peptide:MHC class II multimers, all of these have significant technical constraints that limit their usefulness. Measurement of activation-induced expression of CD154 has been reported to detect live Ag-specific CD4+ T cells, but this approach remains underexplored and, to our knowledge, has not previously been applied in mycobacteria-infected animals. In this article, we show that CD154 expression identifies adoptively transferred or endogenous Ag-specific CD4+ T cells induced by Mycobacterium bovis bacillus Calmette-Guérin vaccination. We confirmed that Ag-specific cytokine production was positively correlated with CD154 expression by CD4+ T cells from bacillus Calmette-Guérin-vaccinated mice and show that high-quality microarrays can be performed from RNA isolated from CD154+ cells purified by cell sorting. Analysis of microarray data demonstrated that the transcriptome of CD4+ CD154+ cells was distinct from that of CD154- cells and showed major enrichment of transcripts encoding multiple cytokines and pathways of cellular activation. One notable finding was the identification of a previously unrecognized subset of mycobacteria-specific CD4+ T cells that is characterized by the production of IL-3. Our results support the use of CD154 expression as a practical and reliable method to isolate live Ag-specific CD4+ T cells for transcriptomic analysis and potentially for a range of other studies in infected or previously immunized hosts. Copyright © 2017 by The American Association of Immunologists, Inc.
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Comparative analysis of gene regulatory networks: from network reconstruction to evolution.
Thompson, Dawn; Regev, Aviv; Roy, Sushmita
2015-01-01
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
NASA Astrophysics Data System (ADS)
Song, Sutao; Huang, Yuxia; Long, Zhiying; Zhang, Jiacai; Chen, Gongxiang; Wang, Shuqing
2016-03-01
Recently, several studies have successfully applied multivariate pattern analysis methods to predict the categories of emotions. These studies are mainly focused on self-experienced emotions, such as the emotional states elicited by music or movies. In fact, most of our social interactions involve perception of emotional information from the expressions of other people, and it is an important basic skill for humans to recognize the emotional facial expressions of other people in a short time. In this study, we aimed to determine the discriminability of perceived emotional facial expressions. In a rapid event-related fMRI design, subjects were instructed to classify four categories of facial expressions (happy, disgust, angry and neutral) by pressing different buttons, and each facial expression stimulus lasted for 2 s. All participants performed 5 fMRI runs. One multivariate pattern analysis method, the support vector machine, was trained to predict the categories of facial expressions. For feature selection, ninety masks defined from the anatomical automatic labeling (AAL) atlas were first generated and each was treated as the input of the classifier; then, the most stable AAL areas were selected according to prediction accuracies and comprised the final feature sets. Results showed that, for the 6 pair-wise classification conditions, the accuracy, sensitivity and specificity were all above chance prediction, among which happy vs. neutral and angry vs. disgust achieved the lowest results. These results suggested that specific neural signatures of perceived emotional facial expressions may exist, and that happy vs. neutral and angry vs. disgust might be more similar in information representation in the brain.
Granlund, Atle van Beelen; Flatberg, Arnar; Østvik, Ann E; Drozdov, Ignat; Gustafsson, Bjørn I; Kidd, Mark; Beisvag, Vidar; Torp, Sverre H; Waldum, Helge L; Martinsen, Tom Christian; Damås, Jan Kristian; Espevik, Terje; Sandvik, Arne K
2013-01-01
In inflammatory bowel disease (IBD), genetic susceptibility together with environmental factors disturbs gut homeostasis, producing chronic inflammation. The two main IBD subtypes are ulcerative colitis (UC) and Crohn's disease (CD). We present the largest microarray gene expression study on IBD to date, encompassing both inflamed and un-inflamed colonic tissue. A meta-analysis including all available, comparable data was used to explore important aspects of IBD inflammation, thereby validating consistent gene expression patterns. Colon pinch biopsies from IBD patients were analysed using Illumina whole genome gene expression technology. Differential expression (DE) was identified using the LIMMA linear model in the R statistical computing environment. Results were enriched for gene ontology (GO) categories. Sets of genes encoding antimicrobial proteins (AMP) and proteins involved in T helper (Th) cell differentiation were used in the interpretation of the results. All available data sets were analysed using the same methods, and results were compared on a global and focused level as t-scores. Gene expression in inflamed mucosa from UC and CD is remarkably similar. The meta-analysis confirmed this. The patterns of AMP and Th cell-related gene expression were also very similar, except for IL23A, which was consistently more highly expressed in UC than in CD. Un-inflamed tissue from patients demonstrated minimal differences from healthy controls. There is no difference in the Th subgroup involvement between UC and CD. Th1/Th17 related expression, with little Th2 differentiation, dominated both diseases. The different IL23A expression between UC and CD suggests an IBD subtype specific role. AMPs, previously little studied, are strongly overexpressed in IBD. The presented meta-analysis provides a sound background for further research on IBD pathobiology.
Employing conservation of co-expression to improve functional inference
Daub, Carsten O; Sonnhammer, Erik LL
2008-01-01
Background Observing co-expression between genes suggests that they are functionally coupled. Co-expression of orthologous gene pairs across species may improve function prediction beyond the level achieved in a single species. Results We used orthology between genes of the three different species S. cerevisiae, D. melanogaster, and C. elegans to combine co-expression across two species at a time. This led to increased function prediction accuracy when we incorporated expression data from either of the other two species, and accuracy increased even further when conservation across both of the other two species was considered at the same time. Employing the conservation across species to incorporate abundant model organism data for the prediction of protein interactions in poorly characterized species constitutes a very powerful annotation method. Conclusion To be able to employ the most suitable co-expression distance measure for our analysis, we evaluated the ability of four popular gene co-expression distance measures to detect biologically relevant interactions between pairs of genes. For the expression datasets employed in our co-expression conservation analysis above, we used the GO and the KEGG PATHWAY databases as gold standards. While the differences between distance measures were small, Spearman correlation gave the most robust results. PMID:18808668
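For readers unfamiliar with the Spearman-based co-expression measure mentioned above, the following minimal sketch computes Spearman correlation as the Pearson correlation of tie-averaged ranks. Turning it into a distance as one minus the correlation is an assumption made here for illustration, not necessarily the definition used in the study; all function names and the toy profiles are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def coexpression_distance(x, y):
    """Assumed distance: 0 for perfectly co-expressed, 2 for anti-correlated."""
    return 1.0 - spearman(x, y)

# Toy expression profiles across four conditions (made-up numbers)
gene_a = [1.0, 2.0, 4.0, 8.0]
gene_b = [0.1, 0.5, 0.6, 3.0]   # same ordering as gene_a
gene_c = [9.0, 7.0, 5.0, 1.0]   # reversed ordering
```

Because only the ordering of the values matters, the measure is insensitive to monotone transformations and to outlying intensities, which is consistent with the robustness reported in the abstract.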
Computerised analysis of facial emotion expression in eating disorders.
Leppanen, Jenni; Dapelo, Marcela Marin; Davies, Helen; Lang, Katie; Treasure, Janet; Tchanturia, Kate
2017-01-01
Problems with social-emotional processing are known to be an important contributor to the development and maintenance of eating disorders (EDs). Diminished facial communication of emotion has been frequently reported in individuals with anorexia nervosa (AN). Less is known about facial expressivity in bulimia nervosa (BN) and in people who have recovered from AN (RecAN). This study aimed to pilot the use of computerised facial expression analysis software to investigate emotion expression across the ED spectrum and recovery in a large sample of participants. 297 participants with AN, BN, RecAN, and healthy controls were recruited. Participants watched film clips designed to elicit happy or sad emotions, and facial expressions were then analysed using FaceReader. The findings mirrored those from previous work, showing that healthy control and RecAN participants expressed significantly more positive emotions during the positive clip compared to the AN group. There were no differences in emotion expression during the sad film clip. These findings support the use of computerised methods to analyse emotion expression in EDs. The findings also demonstrate that reduced positive emotion expression is likely to be associated with the acute stage of AN illness, with individuals with BN showing an intermediate profile.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
SPIRE: Systematic protein investigative research environment.
Kolker, Eugene; Higdon, Roger; Morgan, Phil; Sedensky, Margaret; Welch, Dean; Bauman, Andrew; Stewart, Elizabeth; Haynes, Winston; Broomall, William; Kolker, Natali
2011-12-10
The SPIRE (Systematic Protein Investigative Research Environment) provides web-based experiment-specific mass spectrometry (MS) proteomics analysis (https://www.proteinspire.org). Its emphasis is on usability and integration of the best analytic tools. SPIRE provides an easy-to-use web interface and generates results in both interactive and simple data formats. In contrast to run-based approaches, SPIRE conducts the analysis based on the experimental design. It employs novel methods to generate false discovery rates and local false discovery rates (FDR, LFDR) and integrates the best and complementary open-source search and data analysis methods. The SPIRE approach of integrating X!Tandem, OMSSA and SpectraST can produce an increase in protein IDs (52-88%) over current combinations of scoring and single search engines while also providing accurate multi-faceted error estimation. One of SPIRE's primary assets is combining the results with data on protein function, pathways and protein expression from model organisms. We demonstrate some of SPIRE's capabilities by analyzing mitochondrial proteins from the wild type and 3 mutants of C. elegans. SPIRE also connects results to publicly available proteomics data through its Model Organism Protein Expression Database (MOPED). SPIRE can also provide analysis and annotation for user-supplied protein ID and expression data. Copyright © 2011. Published by Elsevier B.V.
2012-01-01
Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M
2012-09-17
RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
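The kind of simulation described above can be illustrated with a minimal sketch that draws negative binomial read counts for one gene as a gamma-Poisson mixture, with a hypothetical `depth_fraction` parameter scaling the expected count to mimic reduced sequencing depth. This is an assumption-laden toy and not the authors' simulation: it uses a fixed mean instead of means drawn from real data, omits the exponential component, and all names and parameter values are invented for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method
    (adequate for the modest rates used in this sketch)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def negative_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    if dispersion == 0:
        return poisson(mean, rng)
    # Gamma rate with expectation `mean` and variance dispersion * mean**2
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(rate, rng)

def simulate_gene(mean, dispersion, fold_change, n_reps, depth_fraction, rng):
    """Simulate replicate counts for a control and a treated condition.
    depth_fraction scales the expected counts (e.g. 0.15 for 15% depth)."""
    control = [negative_binomial(mean * depth_fraction, dispersion, rng)
               for _ in range(n_reps)]
    treated = [negative_binomial(mean * fold_change * depth_fraction,
                                 dispersion, rng)
               for _ in range(n_reps)]
    return control, treated

rng = random.Random(0)
control, treated = simulate_gene(mean=100.0, dispersion=0.1, fold_change=2.0,
                                 n_reps=3, depth_fraction=0.15, rng=rng)
```

In a power study of this type, many such genes would be simulated with and without a fold change, passed to each differential expression tool, and the calls compared with the known truth to estimate true and false positive rates.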
Nohata, Nijiro; Abba, Martin C.; Gutkind, J. Silvio
2017-01-01
Objectives The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed at establishing the onco-lncRNAome profiling of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. Materials and Methods The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis were performed. Results Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer associated molecular events. With consideration of differential expression of lncRNAs in a HNSCC cell lines panel (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. Conclusion LncRNAs profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. PMID:27424183
GECKO: a complete large-scale gene expression analysis platform.
Theilhaber, Joachim; Ulyanov, Anatoly; Malanthara, Anish; Cole, Jack; Xu, Dapeng; Nahf, Robert; Heuer, Michael; Brockel, Christoph; Bushnell, Steven
2004-12-10
Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a data base, a computational engine implementing approximately 50 different analysis tools, and a client application. Among available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (approximately 100 users) and distributed community to share results, and to repeatedly return over a span of years to older and potentially very complex analyses of gene expression data. The Gecko system is being made publicly available as free software http://sourceforge.net/projects/geckoe. In totality or in parts, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.
Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K
2016-01-01
RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.
Debey-Pascher, Svenja; Hofmann, Andrea; Kreusch, Fatima; Schuler, Gerold; Schuler-Thurner, Beatrice; Schultze, Joachim L.; Staratschek-Jox, Andrea
2011-01-01
Microarray-based transcriptome analysis of peripheral blood as surrogate tissue has become an important approach in clinical implementations. However, application of gene expression profiling in routine clinical settings requires careful consideration of the influence of sample handling and RNA isolation methods on gene expression profile outcome. We evaluated the effect of different sample preservation strategies (eg, cryopreservation of peripheral blood mononuclear cells or freezing of PAXgene-stabilized whole blood samples) on gene expression profiles. Expression profiles obtained from cryopreserved peripheral blood mononuclear cells differed substantially from those of their nonfrozen counterpart samples. Furthermore, expression profiles in cryopreserved peripheral blood mononuclear cell samples were found to undergo significant alterations with increasing storage period, whereas long-term freezing of PAXgene RNA stabilized whole blood samples did not significantly affect stability of gene expression profiles. This report describes important technical aspects contributing toward the establishment of robust and reliable guidance for gene expression studies using peripheral blood and provides a promising strategy for reliable implementation in routine handling for diagnostic purposes. PMID:21704280
Quantification of differential gene expression by multiplexed targeted resequencing of cDNA
Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.
2017-01-01
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
Assessment of sexual orientation using the hemodynamic brain response to visual sexual stimuli.
Ponseti, Jorge; Granert, Oliver; Jansen, Olav; Wolff, Stephan; Mehdorn, Hubertus; Bosinski, Hartmut; Siebner, Hartwig
2009-06-01
The assessment of sexual orientation is of importance to the diagnosis and treatment of sex offenders and paraphilic disorders. Phallometry is considered the gold standard in objectifying sexual orientation, yet this measurement has been criticized because of its intrusiveness and limited reliability. The aim was to evaluate whether the spatial response pattern to sexual stimuli as revealed by a change in blood oxygen level-dependent (BOLD) signal can be used for individual classification of sexual orientation. We used a preexisting functional MRI (fMRI) data set that had been acquired in a nonclinical sample of 12 heterosexual men and 14 homosexual men. During fMRI, participants were briefly exposed to pictures of same-sex and opposite-sex genitals. Data analysis involved four steps: (i) differences in the BOLD response to female and male sexual stimuli were calculated for each subject; (ii) these contrast images were entered into a group analysis to calculate whole-brain difference maps between homosexual and heterosexual participants; (iii) a single expression value was computed for each subject expressing its correspondence to the group result; and (iv) based on these expression values, Fisher's linear discriminant analysis and the k-nearest neighbor classification method were used to predict the sexual orientation of each subject. The outcome measures were the sensitivity and specificity of the two classification methods in predicting individual sexual orientation. Both classification methods performed well in predicting individual sexual orientation with a mean accuracy of >85% (Fisher's linear discriminant analysis: 92% sensitivity, 85% specificity; k-nearest neighbor classification: 88% sensitivity, 92% specificity). Despite the small sample size, the functional response patterns of the brain to sexual stimuli contained sufficient information to predict individual sexual orientation with high accuracy.
These results suggest that fMRI-based classification methods hold promise for the diagnosis of paraphilic disorders (e.g., pedophilia).
Goldblatt, Hadass; Cohen, Miri; Azaiza, Faisal
2016-12-01
Researchers have suggested that older adults express fewer negative emotions. Yet emotional expression patterns in older and younger breast cancer survivors have barely been examined. This study aimed to explore types and intensity of negative and positive emotional expression related to the breast cancer experience by younger and older Arab breast cancer survivors. Participants were 20 younger (aged 32-50) and 20 older (aged 51-75) Muslim and Christian Arab breast cancer survivors (stages I-III), currently free of disease. Data were gathered through in-depth semi-structured interviews. Mixed methods analyses were conducted, including: (1) frequency analysis of participants' emotional expressions; (2) content analysis of emotional expressions, categorized according to negative and positive emotions. Three emotional expression modalities were revealed: (1) succinct versus comprehensive accounts; (2) expression of emotions versus avoidance of emotions; (3) patterns of expression of positive emotions and a sense of personal growth. Younger women provided more detailed accounts about their illness experiences than older women. Older women's accounts were succinct, action-focused, and included more emotion-avoiding expressions than those of younger women. Understanding the relationships between emotional expression, emotional experience, and cancer survivors' quality of life, specifically of those from traditional communities, is necessary for developing effective psycho-social interventions.
pcr: an R package for quality assessment, analysis and testing of qPCR data
Ahmed, Mahmoud
2018-01-01
Background Real-time quantitative PCR (qPCR) is a broadly used technique in biomedical research. Currently, a few different analysis models are used to determine the quality of data and to quantify the mRNA level across the experimental conditions. Methods We developed an R package to implement methods for quality assessment, analysis and testing of qPCR data for statistical significance. Double Delta CT and standard curve models were implemented to quantify the relative expression of target genes from CT in standard qPCR control-group experiments. In addition, calculation of amplification efficiency and curves from serial dilution qPCR experiments are used to assess the quality of the data. Finally, two-group testing and linear models were used to test for significance of the difference in expression between control groups and conditions of interest. Results Using two datasets from qPCR experiments, we applied different quality assessment, analysis and statistical testing in the pcr package and compared the results to the original published articles. The final relative expression values from the different models, as well as the intermediary outputs, were checked against the expected results in the original papers and were found to be accurate and reliable. Conclusion The pcr package provides an intuitive and unified interface for its main functions to allow biologists to perform all necessary steps of qPCR analysis and produce graphs in a uniform way. PMID:29576953
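The Double Delta CT model named in this abstract reduces to a short calculation, sketched below. The function name and the CT values are hypothetical, the sketch assumes 100% amplification efficiency (a doubling per cycle), and it does not reproduce the interface of the pcr R package.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression by the double delta CT model (assumes perfect doubling)."""
    # Normalize the target gene to the reference gene within each group.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated group against the control group.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical mean CT values, for illustration only.
fold_change = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
```

With these made-up inputs the target crosses threshold two cycles earlier in the treated group after normalization, which corresponds to a fourfold higher relative expression.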
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer.
Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi
2013-10-04
Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC. Copyright © 2013 Elsevier Inc. All rights reserved.
Szelinger, Szabolcs; Malenica, Ivana; Corneveaux, Jason J.; Siniard, Ashley L.; Kurdoglu, Ahmet A.; Ramsey, Keri M.; Schrauwen, Isabelle; Trent, Jeffrey M.; Narayanan, Vinodh; Huentelman, Matthew J.; Craig, David W.
2014-01-01
In females, X chromosome inactivation (XCI) is an epigenetic, gene dosage compensatory mechanism by inactivation of one copy of X in cells. Random XCI of one of the parental chromosomes results in an approximately equal proportion of cells expressing alleles from either the maternally or paternally inherited active X, and is defined by the XCI ratio. Skewed XCI ratio is suggestive of non-random inactivation, which can play an important role in X-linked genetic conditions. Current methods rely on indirect, semi-quantitative DNA methylation-based assays to estimate XCI ratio. Here we report a direct approach to estimate XCI ratio by integrated, family-trio based whole-exome and mRNA sequencing using phase-by-transmission of alleles coupled with allele-specific expression analysis. We applied this method to in silico data and to a clinical patient with mild cognitive impairment but no clear diagnosis or understanding of the molecular mechanism underlying the phenotype. Simulation showed that phased and unphased heterozygous allele expression can be used to estimate XCI ratio. Segregation analysis of the patient's exome uncovered a de novo, interstitial, 1.7 Mb deletion on Xp22.31 that originated on the paternally inherited X and has previously been associated with a heterogeneous neurological phenotype. Phased, allelic expression data suggested an 83∶20 moderately skewed XCI that favored the expression of the maternally inherited, cytogenetically normal X and suggested that the deleterious effect of the de novo event on the paternal copy may be offset by skewed XCI that favors expression of the wild-type X. This study shows the utility of an integrated sequencing approach in XCI ratio estimation. PMID:25503791
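One simple way to turn phased allele-specific expression into an XCI ratio is to pool reads by parent of origin across heterozygous X-linked sites. The sketch below does only that. The abstract does not spell out the study's estimator, so the pooled read fraction, the function name and the read counts here are assumptions for illustration.

```python
def xci_ratio(phased_sites):
    """Pooled maternal and paternal read fractions over phased heterozygous sites.

    `phased_sites` is a list of (maternal_reads, paternal_reads) tuples.
    """
    maternal = sum(m for m, _ in phased_sites)
    paternal = sum(p for _, p in phased_sites)
    total = maternal + paternal
    if total == 0:
        raise ValueError("no informative reads at phased sites")
    return maternal / total, paternal / total


# Hypothetical read counts at three phased X-linked sites.
sites = [(80, 20), (45, 5), (40, 10)]
maternal_fraction, paternal_fraction = xci_ratio(sites)
```

Pooling weights each site by its coverage; averaging per-site fractions instead would give every site equal weight, which can matter when coverage is uneven.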
Zhou, Weichen; Ma, Yanyun; Zhang, Jun; Hu, Jingyi; Zhang, Menghan; Wang, Yi; Li, Yi; Wu, Lijun; Pan, Yida; Zhang, Yitong; Zhang, Xiaonan; Zhang, Xinxin; Zhang, Zhanqing; Zhang, Jiming; Li, Hai; Lu, Lungen; Jin, Li; Wang, Jiucun; Yuan, Zhenghong; Liu, Jie
2017-11-01
Liver biopsy is the gold standard to assess pathological features (e.g. inflammation grades) for hepatitis B virus-infected patients although it is invasive and traumatic; meanwhile, several gene profiles of chronic hepatitis B (CHB) have been separately described in relatively small hepatitis B virus (HBV)-infected samples. We aimed to analyse correlations among inflammation grades, gene expressions and clinical parameters (serum alanine amino transaminase, aspartate amino transaminase and HBV-DNA) in large-scale CHB samples and to predict inflammation grades by using clinical parameters and/or gene expressions. We analysed gene expressions with three clinical parameters in 122 CHB samples by an improved regression model. Principal component analysis and machine-learning methods including Random Forest, K-nearest neighbour and support vector machine were used for analysis and further diagnosis models. Six normal samples were used to validate the predictive model. Significant genes related to clinical parameters were found to be enriched in immune system, interferon-stimulated, regulation of cytokine production and anti-apoptosis categories, among others. A panel of these genes with clinical parameters can effectively predict binary classifications of inflammation grade (area under the ROC curve [AUC]: 0.88, 95% confidence interval [CI]: 0.77-0.93), validated by normal samples. A panel with only clinical parameters was also valuable (AUC: 0.78, 95% CI: 0.65-0.86), indicating that a liquid biopsy method for detecting the pathology of CHB is possible. This is the first study to systematically elucidate the relationships among gene expressions, clinical parameters and pathological inflammation grades in CHB, and to build models predicting inflammation grades by gene expressions and/or clinical parameters as well. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Eickelberg, Garrett J; Fisher, Alison J
2013-01-01
We present a novel laboratory project employing "real-time" RT-qPCR to measure the effect of environment on the expression of the FLOWERING LOCUS C gene, a key regulator of floral timing in Arabidopsis thaliana plants. The project requires four 3-hr laboratory sessions and is aimed at upper-level undergraduate students in biochemistry or molecular biology courses. The project provides students with hands-on experience with RT-qPCR, the current "gold standard" for gene expression analysis, including detailed data analysis using the common 2^(-ΔΔCT) method. Moreover, it provides a convenient starting point for many inquiry-driven projects addressing diverse questions concerning ecological biochemistry, naturally occurring genetic variation, developmental biology, and the regulation of gene expression in nature. Copyright © 2013 Wiley Periodicals, Inc.
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion.
We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
Cha, Kihoon; Hwang, Taeho; Oh, Kimin; Yi, Gwan-Su
2015-01-01
Background It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular level commonality among those brain diseases has been largely unexplored. Gene expression analyses of human brain have been used to find genes associated with brain diseases but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, identifying significant genes in such brain diseases mostly failed when it used typical methods depending on differentially expressed genes. Results In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We could find 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets exclusively specified in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific for those eight diseases. Conclusions This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also showed the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation. PMID:26043779
Lu, Minxun; Liu, Yang; Zheng, Tianying; Feng, Shijian; Hao, Meiqin; Shi, Huashan
2015-01-01
Objective To evaluate the predictive value of MUC1 expression in lymph node and distant metastasis of colorectal cancer (CRC). Methods PubMed/MEDLINE and EMBASE were searched to identify eligible studies that evaluated the correlation between MUC1 and CRC. A meta-analysis was conducted to evaluate the impact of MUC1 expression on CRC metastasis. Results A total of 18 studies (n = 3271) met inclusion criteria and the mean Newcastle-Ottawa Scale (NOS) score was 6.3 with a range from 4 to 8. The pooled OR in the meta-analysis of 15 studies indicated that positive MUC1 expression correlated with more CRC node metastasis (OR = 2.32, 95% CI = 1.63–3.29). The data synthesis of 6 studies suggested that MUC1 expression predicted a higher likelihood of CRC distant metastasis (OR = 2.22, 95% CI = 1.23–4.00). In addition, the combined OR of 7 studies showed that MUC1 expression indicated higher Dukes’ stage (OR = 3.02, 95% CI = 2.11–4.33). No publication bias was found in the meta-analysis by Begg’s test or Egger’s test with the exception of the meta-analysis of MUC1 with CRC node metastasis (Begg’s test p = 0.729, Egger’s test p = 0.000). Conclusions Despite some modest bias, the pooled evidence suggested that MUC1 expression was significantly correlated with CRC metastasis. PMID:26367866
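Pooled odds ratios such as those reported above are commonly obtained by inverse-variance weighting on the log scale. The sketch below shows the fixed-effect version of that calculation. The abstract does not say whether a fixed-effect or random-effects model was used, so the model choice is an assumption, and the study inputs are hypothetical rather than taken from the 18 included studies.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_or_standard_error(ci_lower, ci_upper):
    """Recover the standard error of log(OR) from a reported 95% CI."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_95)


def pool_fixed_effect(studies):
    """Inverse-variance pooling of (odds_ratio, ci_lower, ci_upper) tuples.

    Returns the pooled odds ratio and its 95% confidence limits.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_lower, ci_upper in studies:
        se = log_or_standard_error(ci_lower, ci_upper)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Two hypothetical studies with identical estimates, for illustration only.
hypothetical_studies = [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
pooled_or, pooled_lower, pooled_upper = pool_fixed_effect(hypothetical_studies)
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the basic gain a meta-analysis offers over any single study.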
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.
Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben
2017-06-06
Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health.
Zhu, Yuerong; Zhu, Yuelin; Xu, Wei
2008-01-01
Background Though microarray experiments are very popular in life science research, managing and analyzing microarray data are still challenging tasks for many biologists. Most microarray programs require users to have sophisticated knowledge of mathematics, statistics and computer skills for usage. With accumulating microarray data deposited in public databases, easy-to-use programs to re-analyze previously published microarray data are in high demand. Results EzArray is a web-based Affymetrix expression array data management and analysis system for researchers who need to organize microarray data efficiently and get data analyzed instantly. EzArray organizes microarray data into projects that can be analyzed online with predefined or custom procedures. EzArray performs data preprocessing and detection of differentially expressed genes with statistical methods. All analysis procedures are optimized and highly automated so that even novice users with limited pre-knowledge of microarray data analysis can complete initial analysis quickly. Since all input files, analysis parameters, and executed scripts can be downloaded, EzArray provides maximum reproducibility for each analysis. In addition, EzArray integrates with Gene Expression Omnibus (GEO) and allows instantaneous re-analysis of published array data. Conclusion EzArray is a novel Affymetrix expression array data analysis and sharing system. EzArray provides easy-to-use tools for re-analyzing published microarray data and will help both novice and experienced users perform initial analysis of their microarray data from the location of data storage. We believe EzArray will be a useful system for facilities with microarray services and laboratories with multiple members involved in microarray data analysis. EzArray is freely available from . PMID:18218103
Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages
NASA Astrophysics Data System (ADS)
Djordjevic, Marko
2005-11-01
Due to rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding of the regulatory circuits that control expression of genes. We propose a novel, biophysics based algorithm, for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information theory based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance---and determination---of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects rice pathogen Xanthomonas oryzae. 
Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely resembles temperate phages, such as lambda. It, however, encodes its own T7-like RNA polymerase (characteristic of virulent phages), whose role in gene expression was unclear. Our analysis resulted in quantitative understanding of the role of both host and phage RNA polymerase, and in the identification of the previously unknown promoter sequence for Xp10 RNA polymerase. More generally, an increasing number of phage genomes are being sequenced every year, and we expect that methods of quantitative data analysis that we introduced will provide an efficient way to study gene expression strategies of novel bacterial viruses.
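The binding-site classification described in this abstract rests on a sequence-specific binding energy, a chemical potential and a saturating binding probability. The sketch below shows one way these pieces can fit together. The Fermi-Dirac-type form of the probability, the additive energy model and the toy energy matrix are all assumptions for illustration; the abstract itself gives no explicit formula.

```python
import math


def site_energy(sequence, energy_matrix):
    """Additive binding energy (in units of k_B*T): sum of per-position terms."""
    return sum(energy_matrix[i][base] for i, base in enumerate(sequence))


def binding_probability(energy, chemical_potential):
    """Saturating occupancy: approaches 1 for strong sites instead of growing without bound."""
    return 1.0 / (1.0 + math.exp(energy - chemical_potential))


# Hypothetical 3-bp energy matrix: the consensus base costs 0, any mismatch costs 2.
toy_matrix = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
]
mu = 1.0  # hypothetical chemical potential, also in units of k_B*T

p_consensus = binding_probability(site_energy("AAA", toy_matrix), mu)
p_mismatch = binding_probability(site_energy("CCC", toy_matrix), mu)
```

In this form the chemical potential acts as the binding threshold: a site whose energy equals it is occupied half the time, and sites well below it are nearly saturated.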
Dehne, T.; Lindahl, A.; Brittberg, M.; Pruss, A.; Ringe, J.; Sittinger, M.; Karlsson, C.
2012-01-01
Objective: It is well known that expression of markers for WNT signaling is dysregulated in osteoarthritic (OA) bone. However, it is still not fully known if the expression of these markers also is affected in OA cartilage. The aim of this study was therefore to examine this issue. Methods: Human cartilage biopsies from OA and control donors were subjected to genome-wide oligonucleotide microarrays. Genes involved in WNT signaling were selected using the BioRetis database, KEGG pathway analysis was searched using DAVID software tools, and cluster analysis was performed using Genesis software. Results from the microarray analysis were verified using quantitative real-time PCR and immunohistochemistry. In order to study the impact of cytokines for the dysregulated WNT signaling, OA and control chondrocytes were stimulated with interleukin-1 and analyzed with real-time PCR for their expression of WNT-related genes. Results: Several WNT markers displayed a significantly altered expression in OA compared to normal cartilage. Interestingly, inhibitors of the canonical and planar cell polarity WNT signaling pathways displayed significantly increased expression in OA cartilage, while the Ca2+/WNT signaling pathway was activated. Both real-time PCR and immunohistochemistry verified the microarray results. Real-time PCR analysis demonstrated that interleukin-1 upregulated expression of important WNT markers. Conclusions: WNT signaling is significantly affected in OA cartilage. The result suggests that both the canonical and planar cell polarity WNT signaling pathways were partly inhibited while the Ca2+/WNT pathway was activated in OA cartilage. PMID:26069618
Pulp Inflammation Diagnosis from Clinical to Inflammatory Mediators: A Systematic Review.
Zanini, Marjorie; Meyer, Elisabeth; Simon, Stéphane
2017-07-01
Similar to other tissues, the dental pulp mounts an inflammatory reaction as a way to eliminate pathogens and stimulate repair. Pulp inflammation is prerequisite for dentin pulp complex repair and regeneration; otherwise, chronic disease or pulp necrosis occurs. Evaluation of pulp inflammation severity is necessary to predict the clinical success of maintaining pulp vitality. Clinical limitations to evaluating in situ inflammatory status are well-described. A molecular approach that aids clinical distinction between reversible and irreversible pulpitis could improve the success rate of vital pulp therapy. The aim of this article is to review inflammatory mediator expression in the context of clinical diagnosis. We searched PubMed and Cochrane databases for articles published between 1970 and December 2016. Only published studies of inflammatory mediator expression related to clinical diagnosis were eligible for inclusion and analysis. Thirty-two articles were analyzed. Two molecular approaches were described by study methods, protein expression analysis and gene expression analysis. Our review indicates that interleukin-8, matrix metalloproteinase 9, tumor necrosis factor-α, and receptor for advanced glycation end products expression increase at both the gene and protein levels during inflammation. Clinical irreversible pulpitis is related to specific levels of inflammatory mediator expression. The difference in expression between reversible and irreversible disease is both quantitative and qualitative. On the basis of our analysis, in situ quantification of inflammatory mediators may aid in the clinical distinction between reversible and irreversible pulpitis. Copyright © 2017 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen
2016-09-15
Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data aiming at improving the step of hypothesis abduction and assessment. We focus on the adaptation to human cognition of data interpretation and visualization of the output of EDA. First, we give a proper theoretical background to bi-clustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression we obtain different sequences of hierarchical bi-clusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases-for instance, Gene Ontology (GO)-thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example confirming results previously published. 
The GED analysis problem gets transformed into the exploration of a sequence of lattices enabling the visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based bi-clustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, to look for relevant biclusters-by observing their genes and what their persistence is-to infer, for instance, hypotheses on their function.
2009-01-01
Background New, third-generation aromatase inhibitors (AIs) have proven comparable or superior to the anti-estrogen tamoxifen for treatment of estrogen receptor (ER) and/or progesterone receptor (PR) positive breast cancer. AIs suppress total body and intratumoral estrogen levels. It is unclear whether in situ carcinoma cell aromatization is the primary source of estrogen production for tumor growth and whether the aromatase expression is predictive of response to endocrine therapy. Due to methodological difficulties in the determination of the aromatase protein, COX-2, an enzyme involved in the synthesis of aromatase, has been suggested as a surrogate marker for aromatase expression. Methods Primary tumor material was retrospectively collected from 88 patients who participated in a randomized clinical trial comparing the AI letrozole to the anti-estrogen tamoxifen for first-line treatment of advanced breast cancer. Semi-quantitative immunohistochemical (IHC) analysis was performed for ER, PR, COX-2 and aromatase using Tissue Microarrays (TMAs). Aromatase was also analyzed using whole sections (WS). Kappa analysis was applied to compare association of protein expression levels. Univariate Wilcoxon analysis and the Cox-analysis were performed to evaluate time to progression (TTP) in relation to marker expression. Results Aromatase expression was associated with ER, but not with PR or COX-2 expression in carcinoma cells. Measurements of aromatase in WS were not comparable to results from TMAs. Expression of COX-2 and aromatase did not predict response to endocrine therapy. Aromatase in combination with high PR expression may select letrozole treated patients with a longer TTP. Conclusion TMAs are not suitable for IHC analysis of in situ aromatase expression and we did not find COX-2 expression in carcinoma cells to be a surrogate marker for aromatase. 
In situ aromatase expression in tumor cells is associated with ER expression and may thus point towards good prognosis. Aromatase expression in cancer cells is not predictive of response to endocrine therapy, indicating that in situ estrogen synthesis may not be the major source of intratumoral estrogen. However, aromatase expression in combination with high PR expression may select letrozole treated patients with longer TTP. Trial registration Sub-study of trial P025 for advanced breast cancer. PMID:19531212
Peptidome analysis of human skim milk in term and preterm milk
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wan, Jun; Cui, Xian-wei; Zhang, Jun
Highlights: •A method was developed for preparation of peptide extracts from human milk. •Analysis of the extracts by LC–MS/MS resulted in the detection of 1000–3000 peptide-like features. •419 Peptides were identified by LC–MS/MS from 34 proteins. •Isotope dimethyl labeling analysis revealed 41 peptides differentially expressed. -- Abstract: The abundant proteins in human milk have been well characterized and are known to provide nutritional, protective, and developmental advantages to both term and preterm infants. Due to the difficulties associated with detection technology of the peptides, the expression of the peptides present in human milk is not known widely. In recent years, peptidome analysis has received increasing attention. In this report, the analysis of endogenous peptides in human milk was done by mass spectrometry. A method was also developed by our researchers, which can be used in the extraction of peptide from human milk. Analysis of the extracts by LC–MS/MS resulted in the detection of 1000–3000 Da peptide-like features. Out of these, 419 peptides were identified by MS/MS. The identified peptides were found to originate from 34 proteins, of which several have been reported. Analysis of the peptides’ cleavage sites showed that the peptides are cleaved with regulations. This may reflect the protease activity and distribution in human body, and also represent the biological state of the tissue and provide a fresh source for biomarker discovery. Isotope dimethyl labeling analysis was also used to test the effects of premature delivery on milk protein composition in this study. Differences in peptides expression between breast milk in term milk (38–41 weeks gestation) and preterm milk (28–32 weeks gestation) were investigated in this study. 41 Peptides in these two groups were found expressed differently. 23 Peptides were present at higher levels in preterm milk, and 18 were present at higher levels in term milk.
USDA-ARS's Scientific Manuscript database
This study was conducted to clone and analyze the expression pattern of a C4H gene encoding cinnamate 4-hydroxylase from kenaf (Hibiscus cannabinus L.). A full-length C4H ortholog was cloned using degenerate primers and the RACE (rapid amplification of cDNA ends) method. The full-length C4H ortholog...
Microarray Data Mining for Potential Selenium Targets in Chemoprevention of Prostate Cancer
ZHANG, HAITAO; DONG, YAN; ZHAO, HONGJUAN; BROOKS, JAMES D.; HAWTHORN, LESLEYANN; NOWAK, NORMA; MARSHALL, JAMES R.; GAO, ALLEN C.; IP, CLEMENT
2008-01-01
Background A previous clinical trial showed that selenium supplementation significantly reduced the incidence of prostate cancer. We report here a bioinformatics approach to gain new insights into selenium molecular targets that might be relevant to prostate cancer chemoprevention. Materials and Methods We first performed data mining analysis to identify genes which are consistently dysregulated in prostate cancer using published datasets from gene expression profiling of clinical prostate specimens. We then devised a method to systematically analyze three selenium microarray datasets from the LNCaP human prostate cancer cells, and to match the analysis to the cohort of genes implicated in prostate carcinogenesis. Moreover, we compared the selenium datasets with two datasets obtained from expression profiling of androgen-stimulated LNCaP cells. Results We found that selenium reverses the expression of genes implicated in prostate carcinogenesis. In addition, we found that selenium could counteract the effect of androgen on the expression of a subset obtained from androgen-regulated genes. Conclusions The above information provides us with a treasure of new clues to investigate the mechanism of selenium chemoprevention of prostate cancer. Furthermore, these selenium target genes could also serve as biomarkers in future clinical trials to gauge the efficacy of selenium intervention. PMID:18548127
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.
Jiang, Xue; Zhang, Han; Quan, Xiongwen
2016-01-01
Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
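The final step of the abstract above combines several differential-coexpression scores per gene with the rank-product method. A minimal sketch of that idea follows; it is not the DCGN implementation. The function name, the toy scores, and the convention that rank 1 means the largest score are assumptions made for illustration.

```python
import math

def rank_product(score_lists):
    """Geometric mean of per-list ranks for each gene.

    score_lists: one list of scores per metric (or dataset), all indexed
    by the same genes. Rank 1 is assigned to the largest score. Ties are
    broken by gene index, which is a simplification.
    """
    n_genes = len(score_lists[0])
    log_sum = [0.0] * n_genes
    for scores in score_lists:
        # Gene indices ordered from highest to lowest score.
        order = sorted(range(n_genes), key=lambda i: -scores[i])
        for rank, gene in enumerate(order, start=1):
            log_sum[gene] += math.log(rank)
    k = len(score_lists)
    return [math.exp(s / k) for s in log_sum]

# Hypothetical scores for three genes under two metrics.
metric_a = [0.9, 0.1, 0.5]
metric_b = [0.8, 0.3, 0.2]
rp = rank_product([metric_a, metric_b])
```

A small rank product means the gene ranks near the top under every metric, which is why the method is convenient for merging scores measured on different scales.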
Expression of masculine identity in individuals with a traumatic brain injury.
Keegan, Louise C; Togher, Leanne; Murdock, Macy; Hendry, Emma
2017-01-01
This research seeks to examine and describe how four males with a traumatic brain injury (TBI) use language to negotiate their masculine identities. Qualitative research methods were employed with a 'case study' design that allows for a detailed description of the cases, and the interactions examined. The tools of inquiry applied included a topic analysis, as well as linguistic analysis methods that incorporated the theory of Systemic Functional Linguistics. Such tools were employed in the analysis of 12, two-hour group treatment sessions in order to describe how linguistic choices contributed to the construction of a masculine identity in communicative interactions. Although all participants had significant difficulties with cognitive communication, they all demonstrated an ability to use language to assert their masculine identities. Results revealed that prominent topics used to assert masculinity included confidence, women, risk-taking behaviour and interests and that expressions of masculinity often occurred in giving information roles and involved appraisal and modality. The results have implications for the development of rehabilitation interventions for social communication that provide individuals with TBI with the linguistic tools and communication opportunities necessary in order to successfully express identity and reveal masculinity.
Analysis of gene expression in single live neurons.
Eberwine, J; Yeh, H; Miyashiro, K; Cao, Y; Nair, S; Finnell, R; Zettel, M; Coleman, P
1992-01-01
We present here a method for broadly characterizing single cells at the molecular level beyond the more common morphological and transmitter/receptor classifications. The RNA from defined single cells is amplified by microinjecting primer, nucleotides, and enzyme into acutely dissociated cells from a defined region of rat brain. Further processing yields amplified antisense RNA. A second round of amplification results in greater than 10(6)-fold amplification of the original starting material, which is adequate for analysis--e.g., use as a probe, making of cDNA libraries, etc. We demonstrate this method by constructing expression profiles of single live cells from rat hippocampus. This profiling suggests that cells that appear to be morphologically similar may show marked differences in patterns of expression. In addition, we characterize several mRNAs from a single cell, some of which were previously undescribed, perhaps due to "rarity" when averaged over many cell types. Electrophysiological analysis coupled with molecular biology within the same cell will facilitate a better understanding of how changes at the molecular level are manifested in functional properties. This approach should be applicable to a wide variety of studies, including development, mutant models, aging, and neurodegenerative disease. Images PMID:1557406
Tran, Cassie M.; Markova, Dessislava; Smith, Harvey E.; Susarla, Bala; Ponnappan, Ravi Kumar; Anderson, D Greg; Symes, Aviva; Shapiro, Irving M.; Risbud, Makarand V.
2011-01-01
Objective To investigate TGFβ regulation of CTGF expression in cells of the nucleus pulposus. Methods Real Time RT-PCR and Western blot analysis were used to measure CTGF expression in the nucleus pulposus. Transfections were used to measure the effect of Smad2/3/7 and AP1 on TGFβ mediated CTGF promoter activity. Results CTGF expression was lower in the neonatal disc compared with the skeletally mature rat disc. An increase in CTGF expression and promoter activity was observed in nucleus pulposus cells after TGFβ treatment. Deletion analysis indicated that promoter constructs lacking smad and AP1 motifs were unresponsive to treatment. Analysis showed that full-length Smad3 and the Smad3-MH2 domain alone increased CTGF activity. Further evidence of Smad3 and AP1 involvement was seen when DN-Smad3, SiRNA-Smad3, smad7 and DN-AP1 suppressed TGFβ mediated activation of the CTGF promoter. When either Smad3 or AP1 sites were mutated, CTGF promoter induction by TGFβ was suppressed. We also observed a decrease in expression of CTGF in discs of Smad3 null mice compared to the wild type. Analysis of human nucleus pulposus indicated a trend of increasing CTGF and TGFβ expression in the degenerate state. Conclusion TGFβ, through Smad3 and AP1, serves as a positive regulator of CTGF expression in the nucleus pulposus. We propose that CTGF is a part of the limited reparative response of the degenerate disc. PMID:20222112
Bleul, Tim; Rühl, Ralph; Bulashevska, Svetlana; Karakhanova, Svetlana; Werner, Jens; Bazhin, Alexandr V
2015-09-01
Pancreatic ductal adenocarcinoma (PDAC) represents one of the deadliest cancers in the world. All-trans retinoic acid (ATRA) is the major physiologically active form of vitamin A, regulating expression of many genes. Disturbances of vitamin A metabolism are prevalent in some cancer cells. The main aim of this work was to investigate deeply the components of retinoid signaling in PDAC compared to in the normal pancreas and to prove the clinical importance of retinoid receptor expression. For the study, human tumor tissues obtained from PDAC patients and murine tumors from the orthotopic Panc02 model were used for the analysis of retinoids, using high performance liquid chromatography mass spectrometry and real-time RT-PCR gene expression analysis. Survival probabilities in univariate analysis were estimated using the Kaplan-Meier method and the Cox proportional hazards model was used for the multivariate analysis. In this work, we showed for the first time that the ATRA and all-trans retinol concentration is reduced in PDAC tissue compared to their normal counterparts. The expression of RARα and β as well as RXRα and β are down-regulated in PDAC tissue. This reduced expression of retinoid receptors correlates with the expression of some markers of differentiation and epithelial-to-mesenchymal transition as well as of cancer stem cell markers. Importantly, the expression of RARα and RXRβ is associated with better overall survival of PDAC patients. Thus, reduction of retinoids and their receptors is an important feature of PDAC and is associated with worse patient survival outcomes. © 2014 Wiley Periodicals, Inc.
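The univariate survival analysis mentioned above relies on the Kaplan-Meier estimator. The sketch below shows the product-limit calculation in plain Python. The follow-up times and event flags are invented toy values, not data from the study, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve

# Toy cohort: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is the property that distinguishes this estimator from a simple fraction of survivors.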
Separate-channel analysis of two-channel microarrays: recovering inter-spot information.
Smyth, Gordon K; Altman, Naomi S
2013-05-26
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intra-spot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses.
The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
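The M-value and A-value transformation at the centre of this abstract can be written in a few lines. The sketch below assumes base-2 logarithms, which is the usual microarray convention but is not stated in the abstract. The function names and the example intensities are illustrative only.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from two positive channel intensities."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g            # M-value: log-ratio of the two channels
    a = (log_r + log_g) / 2.0    # A-value: average log-expression of the spot
    return m, a

def channels_from_ma(m, a):
    """Invert the transform: recover the two log2 channel intensities."""
    return a + m / 2.0, a - m / 2.0

m, a = ma_values(1024.0, 256.0)
print(m, a)  # 2.0 9.0
```

Because the pair (M, A) can be inverted back to the two log-intensities, the transform loses nothing. An analysis that keeps only M discards whatever the A-values carry, which is the inefficiency the abstract describes.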
Methods of DNA methylation analysis.
USDA-ARS's Scientific Manuscript database
The purpose of this review was to provide guidance for investigators who are new to the field of DNA methylation analysis. Epigenetics is the study of mitotically heritable alterations in gene expression potential that are not mediated by changes in DNA sequence. Recently, it has become clear that n...
Integrative sparse principal component analysis of gene expression data.
Liu, Mengque; Fan, Xinyan; Fang, Kuangnan; Zhang, Qingzhao; Ma, Shuangge
2017-12-01
In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps the PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the "small sample size, high dimensionality" characteristic of gene expression data, the analysis results generated from a single dataset are often unsatisfactory. Under contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform "classic" meta-analysis and other multidatasets techniques and single-dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance. © 2017 WILEY PERIODICALS, INC.
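To illustrate how a group penalty can select a gene in all datasets or in none, the sketch below applies a group soft-thresholding step to one gene's loadings across datasets. This assumes a group-lasso-type penalty, which the abstract does not specify, so it should be read as a generic illustration and not as the iSPCA update. The function name and the numbers are hypothetical.

```python
import math

def group_soft_threshold(loadings, lam):
    """Shrink one gene's loadings (one value per dataset) as a group.

    If the Euclidean norm of the group is at most lam, the whole group is
    set to zero, so the gene is dropped from every dataset at once.
    Otherwise all loadings are scaled down by the same factor.
    """
    norm = math.sqrt(sum(x * x for x in loadings))
    if norm <= lam:
        return [0.0] * len(loadings)
    scale = 1.0 - lam / norm
    return [scale * x for x in loadings]

strong_gene = group_soft_threshold([3.0, 4.0], lam=1.0)  # kept, shrunk
weak_gene = group_soft_threshold([0.3, 0.4], lam=1.0)    # removed everywhere
```

Treating the loadings of one gene as a single group is what ties the selection decision together across datasets, which matches the motivation for integrative analysis given in the abstract.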
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
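The expectation step of a model-based clustering algorithm can be sketched as follows. This simplified version assumes independent Poisson counts with a known mean profile per cluster. It is not the model implemented in MBCluster.Seq, and every name and number below is hypothetical.

```python
import math

def poisson_logpmf(k, mu):
    """Log probability of observing count k under Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def responsibilities(counts, cluster_means, weights):
    """E-step for one gene: posterior probability of each cluster.

    counts:        read counts of the gene across treatments
    cluster_means: one mean profile (same length as counts) per cluster
    weights:       mixing proportions of the clusters
    """
    logs = [
        math.log(w) + sum(poisson_logpmf(c, mu) for c, mu in zip(counts, means))
        for means, w in zip(cluster_means, weights)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# A gene low in treatment 1 and high in treatment 2, scored against two profiles.
post = responsibilities([10, 50], [[10.0, 50.0], [50.0, 10.0]], [0.5, 0.5])
```

In a full EM loop these posterior probabilities would be used in the maximization step to re-estimate the cluster profiles and mixing proportions, and the two steps would alternate until the likelihood stops improving.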
Ji, Kun; Zhang, Liyan; Zhang, Mingxuan; Chu, Qi; Li, Xin; Wang, Wei
2016-02-01
The overexpression of phosphorylated signal transducer and activator of transcription 3 (p-stat3) has been detected in a variety of human tumors. The published studies on p-stat3 expression among gastric carcinoma patients remain controversial. In order to clarify the prognostic value of p-stat3 for overall survival and its association with clinicopathological characteristics in gastric carcinoma, we performed a systematic review and meta-analysis. Eligible studies were retrieved by searching PubMed, Embase, Cochrane library, and Chinese biomedical literature service system databases. Included studies described the association between p-stat3 expression and clinicopathological characteristics and overall survival in gastric carcinoma patients, with p-stat3 expression detected by immunohistochemistry (IHC). Odds ratio (OR) and hazard ratio (HR) were used as measures of association in the meta-analysis; the I² statistic was used to assess the heterogeneity across studies; publication bias was assessed with funnel plot, Egger test, and Begg test. Twenty-three studies including 2872 patients which evaluated the p-stat3 expression by IHC in gastric carcinoma were included. The pooled HR (HR = 2.02, 95% CI: 1.49-2.73, P < 0.00001) indicated that the increased p-stat3 expression was significantly associated with poor overall survival.
In addition, when we investigated the association between p-stat3 overexpression and clinicopathological characteristics of gastric carcinoma, we found that the increased p-stat3 expression was significantly associated with tumor differentiation (poorly vs well-moderately: OR = 3.70, 95% CI: 1.98-6.93, P < 0.0001) and lymph node metastasis (present vs absent: OR = 2.40, 95% CI: 1.28-4.50, P = 0.007). Different types of primary antibody were used; the assessment methods of p-stat3 positive expression were defined differently; the locations of p-stat3 expression were different; the method of extrapolating HR from Kaplan-Meier survival curves seemed to be less reliable than extracting HR directly from the literature; and sample sizes, patient ages, and follow-up durations differed. In conclusion, our meta-analysis indicates that increased p-stat3 expression may not only predict poor prognosis but also be associated with worse tumor differentiation and lymph node metastasis in patients with gastric carcinoma.
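Pooling hazard ratios and measuring heterogeneity, as described in this abstract, can be sketched with inverse-variance weighting on the log scale. The abstract does not say whether a fixed-effect or random-effects model was used, so the fixed-effect model below is an assumption, as is the use of I² as the heterogeneity statistic. The study values are invented for illustration.

```python
import math

def pool_hazard_ratios(studies):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    studies: list of (hr, ci_low, ci_high) tuples with 95% confidence intervals.
    Returns (pooled_hr, i_squared_percent).
    """
    ys, ws = [], []
    for hr, lo, hi in studies:
        # Standard error of log(HR) recovered from the 95% CI width.
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        ys.append(math.log(hr))
        ws.append(1.0 / se ** 2)
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    # Cochran's Q and the derived I^2 (share of variation due to heterogeneity).
    q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return math.exp(pooled), i2

# Two hypothetical studies that disagree strongly.
pooled_hr, i2 = pool_hazard_ratios([(1.0, 0.8, 1.25), (4.0, 3.2, 5.0)])
```

When I² is high, as in this toy example, a random-effects model is usually preferred, because a single pooled estimate hides real differences between studies.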
Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.
Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li
2015-10-16
The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. 
Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.
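For reference, the direct RPKM calculation that the abstract contrasts with model-based estimation is shown below. This is the standard reads-per-kilobase-per-million formula and has no connection to the NLDMseq code. The function name and the example numbers are illustrative.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

# 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value)  # 25.0
```

The formula divides by transcript length alone, so it implicitly treats every position as equally likely to produce a read. That is the uniformity assumption which isoform- and exon-specific bias models are designed to relax.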
High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
Daye, Z. John; Chen, Jinbo; Li, Hongzhe
2011-01-01
Summary We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833
Influence of 2 cryopreservation methods to induce CCL-13 from dental pulp cells.
Ahn, Su-Jin; Jang, Ji-Hyun; Seo, Ji-Sung; Cho, Kyu Min; Jung, Su-Hee; Lee, Hyeon-Woo; Kim, Eun-Cheol; Park, Sang Hyuk
2013-12-01
Cryopreservation preserves periodontal ligament cells but has a lower success rate with dental pulp cells (DPCs) because it causes inflammation. There are 2 well-known cryopreservation methods that reduce inflammation, slow freezing and rapid freezing, but the effects of the 2 methods on inflammation are not well-established. The purpose of this study was to compare the effects of the 2 different cryopreservation methods on CCL-13 induction from DPCs by using microarrays, real-time polymerase chain reaction (PCR), Western blotting, enzyme-linked immunosorbent assay, and confocal laser scanning microscopy (CLSM). In this study, the concentration of cryoprotectant was fixed, and the methods compared differed with respect to freezing speed. Initially we screened the DPCs of cryopreserved teeth with expression microarrays, and CCL-13 was identified as a differentially expressed gene involved in generalized inflammation. We then compared the expression of CCL-13 after exposing teeth to the 2 cryopreservation methods by using real-time PCR, Western blot, enzyme-linked immunosorbent assay, and CLSM. Expression of CCL-13 was up-regulated significantly only in the rapid freezing group, except in measurements made by real-time PCR. CLSM analysis also confirmed this up-regulation visually. Rapid freezing increased the expression of CCL-13 in DPCs compared with slow freezing. Understanding the inflammatory effect of cryopreservation should help to establish an optimal cryoprofile to minimize inflammation of DPCs and reduce the need for endodontic treatment. Copyright © 2013 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
ALE: automated label extraction from GEO metadata.
Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D
2017-12-28
NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because it ensures standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the data available do not contain explicit text informing the researcher about the age and gender of the subjects with the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels, based upon gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve accuracy of label assignment to GEO samples.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
A comparative analysis of biclustering algorithms for gene expression data
Eren, Kemal; Deveci, Mehmet; Küçüktunç, Onur; Çatalyürek, Ümit V.
2013-01-01
The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters. PMID:22772837
Bradford, Emily M; Vairamani, Kanimozhi; Shull, Gary E
2016-01-01
AIM: To investigate the intestinal functions of the NKCC1 Na+-K+-2Cl cotransporter (SLC12a2 gene), differential mRNA expression changes in NKCC1-null intestine were analyzed. METHODS: Microarray analysis of mRNA from intestines of adult wild-type mice and gene-targeted NKCC1-null mice (n = 6 of each genotype) was performed to identify patterns of differential gene expression changes. Differential expression patterns were further examined by Gene Ontology analysis using the online Gorilla program, and expression changes of selected genes were verified using northern blot analysis and quantitative real time-polymerase chain reaction. Histological staining and immunofluorescence were performed to identify cell types in which upregulated pancreatic digestive enzymes were expressed. RESULTS: Genes typically associated with pancreatic function were upregulated. These included lipase, amylase, elastase, and serine proteases indicative of pancreatic exocrine function, as well as insulin and regenerating islet genes, representative of endocrine function. Northern blot analysis and immunohistochemistry showed that differential expression of exocrine pancreas mRNAs was specific to the duodenum and localized to a subset of goblet cells. In addition, a major pattern of changes involving differential expression of olfactory receptors that function in chemical sensing, as well as other chemosensing G-protein coupled receptors, was observed. These changes in chemosensory receptor expression may be related to the failure of intestinal function and dependency on parenteral nutrition observed in humans with SLC12a2 mutations. CONCLUSION: The results suggest that loss of NKCC1 affects not only secretion, but also goblet cell function and chemosensing of intestinal contents via G-protein coupled chemosensory receptors. PMID:26909237
Linnorm: improved statistical analysis for single cell RNA-seq expression data.
Yip, Shun H; Wang, Panwen; Kocher, Jean-Pierre A; Sham, Pak Chung; Wang, Junwen
2017-12-15
Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Adaptation of video game UVW mapping to 3D visualization of gene expression patterns
NASA Astrophysics Data System (ADS)
Vize, Peter D.; Gerth, Victor E.
2007-01-01
Analysis of gene expression patterns within an organism plays a critical role in associating genes with biological processes in both health and disease. During embryonic development the analysis and comparison of different gene expression patterns allows biologists to identify candidate genes that may regulate the formation of normal tissues and organs and to search for genes associated with congenital diseases. No two individual embryos, or organs, are exactly the same shape or size so comparing spatial gene expression in one embryo to that in another is difficult. We will present our efforts in comparing gene expression data collected using both volumetric and projection approaches. Volumetric data is highly accurate but difficult to process and compare. Projection methods use UV mapping to align texture maps to standardized spatial frameworks. This approach is less accurate but is very rapid and requires very little processing. We have built a database of over 180 3D models depicting gene expression patterns mapped onto the surface of spline based embryo models. Gene expression data in different models can easily be compared to determine common regions of activity. Visualization software, both Java and OpenGL optimized for viewing 3D gene expression data will also be demonstrated.
Multivariate analysis of flow cytometric data using decision trees.
Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver
2012-01-01
Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers at the single-cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine depends on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
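The stratified fivefold cross validation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name stratified_kfold, the toy labels and the round-robin assignment are assumptions made only for illustration. The idea is that each fold keeps roughly the same class proportions as the full data set, so quality estimates are not distorted by folds that happen to contain few cytokine-positive cells.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of sample indices with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels (hypothetical): 20 cytokine-positive and 80 cytokine-negative cells.
labels = ["pos"] * 20 + ["neg"] * 80
folds = stratified_kfold(labels, k=5)

for i, test_idx in enumerate(folds):
    # Every fold serves once as the held-out test set; the rest is training data.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    n_pos = sum(labels[j] == "pos" for j in test_idx)
    print(f"fold {i}: {len(test_idx)} test, {n_pos} positive, {len(train_idx)} train")
```

In practice a tree would be fitted on the training indices of each fold and scored on the held-out indices, and the scores averaged over the five folds.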
A whole-mount in situ hybridization method for microRNA detection in Caenorhabditis elegans
Andachi, Yoshiki; Kohara, Yuji
2016-01-01
Whole-mount in situ hybridization (WISH) is an outstanding method to decipher the spatiotemporal expression patterns of microRNAs (miRNAs) and provides important clues for elucidating their functions. The first WISH method for miRNA detection was developed in zebrafish. Although this method was quickly adapted for other vertebrates and fruit flies, WISH analysis has not been successfully used to detect miRNAs in Caenorhabditis elegans. Here, we show a novel WISH method for miRNA detection in C. elegans. Using this method, mir-1 miRNA was detected in the body-wall muscle where the expression and roles of mir-1 miRNA have been previously elucidated. Application of the method to let-7 family miRNAs, let-7, mir-48, mir-84, and mir-241, revealed their distinct but partially overlapping expression patterns, indicating that miRNAs sharing a short common sequence were distinguishably detected. In pash-1 mutants that were depleted of mature miRNAs, signals of mir-48 miRNA were greatly reduced, suggesting that mature miRNAs were detected by the method. These results demonstrate the validity of WISH to detect mature miRNAs in C. elegans. PMID:27154969
Claus, Rainer; Lucas, David M.; Stilgenbauer, Stephan; Ruppert, Amy S.; Yu, Lianbo; Zucknick, Manuela; Mertens, Daniel; Bühler, Andreas; Oakes, Christopher C.; Larson, Richard A.; Kay, Neil E.; Jelinek, Diane F.; Kipps, Thomas J.; Rassenti, Laura Z.; Gribben, John G.; Döhner, Hartmut; Heerema, Nyla A.; Marcucci, Guido; Plass, Christoph; Byrd, John C.
2012-01-01
Purpose Increased ZAP-70 expression predicts poor prognosis in chronic lymphocytic leukemia (CLL). Current methods for accurately measuring ZAP-70 expression are problematic, preventing widespread application of these tests in clinical decision making. We therefore used comprehensive DNA methylation profiling of the ZAP-70 regulatory region to identify sites important for transcriptional control. Patients and Methods High-resolution quantitative DNA methylation analysis of the entire ZAP-70 gene regulatory regions was conducted on 247 samples from patients with CLL from four independent clinical studies. Results Through this comprehensive analysis, we identified a small area in the 5′ regulatory region of ZAP-70 that showed large variability in methylation in CLL samples but was universally methylated in normal B cells. High correlation with mRNA and protein expression, as well as activity in promoter reporter assays, revealed that within this differentially methylated region, a single CpG dinucleotide and neighboring nucleotides are particularly important in ZAP-70 transcriptional regulation. Furthermore, by using clustering approaches, we identified a prognostic role for this site in four independent data sets of patients with CLL using time to treatment, progression-free survival, and overall survival as clinical end points. Conclusion Comprehensive quantitative DNA methylation analysis of the ZAP-70 gene in CLL identified important regions responsible for transcriptional regulation. In addition, loss of methylation at a specific single CpG dinucleotide in the ZAP-70 5′ regulatory sequence is a highly predictive and reproducible biomarker of poor prognosis in this disease. This work demonstrates the feasibility of using quantitative specific ZAP-70 methylation analysis as a relevant clinically applicable prognostic test in CLL. PMID:22564988
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Gene expression inference with deep learning.
Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui
2016-06-15
Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationship between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX CONTACT: xhx@ics.uci.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Gene expression inference with deep learning
Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui
2016-01-01
Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationship between expressions of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability and implementation: D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26873929
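The comparison reported above rests on mean absolute error and a "relative improvement" figure. A small sketch of how such numbers could be computed follows. The abstract does not spell out the formula, so the definition used here (error reduction as a percentage of the baseline error) is an assumption, and the function names and toy values are hypothetical and not taken from D-GEX.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and inferred expression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy expression values for one target gene across four samples (not real data).
y_true = [1.0, 2.0, 3.0, 4.0]
lr_pred = [1.5, 2.5, 2.5, 4.5]      # stand-in for the linear regression baseline
dl_pred = [1.25, 2.25, 2.75, 4.25]  # stand-in for the deep learning model

lr_mae = mean_absolute_error(y_true, lr_pred)
dl_mae = mean_absolute_error(y_true, dl_pred)
print(lr_mae, dl_mae, relative_improvement(lr_mae, dl_mae))
```

Under this definition, a gene counts toward the "lower error" percentage whenever its model error is below its baseline error, regardless of the size of the gap.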
Optimization of cDNA microarrays procedures using criteria that do not rely on external standards.
Bruland, Torunn; Anderssen, Endre; Doseth, Berit; Bergum, Hallgeir; Beisvag, Vidar; Laegreid, Astrid
2007-10-18
The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Due to the lack of standard calibration samples such as are used in traditional chemical analysis it may be a problem to evaluate whether changes made to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process both in laboratory practices and in data processing using criteria that do not rely on external standards. We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression termed "high contrasts" (rat cell lines AR42J and NRK52E) compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background correction methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards. Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish.
Optimization of cDNA microarrays procedures using criteria that do not rely on external standards
Bruland, Torunn; Anderssen, Endre; Doseth, Berit; Bergum, Hallgeir; Beisvag, Vidar; Lægreid, Astrid
2007-01-01
Background The measurement of gene expression using microarray technology is a complicated process in which a large number of factors can be varied. Due to the lack of standard calibration samples such as are used in traditional chemical analysis it may be a problem to evaluate whether changes made to the microarray procedure actually improve the identification of truly differentially expressed genes. The purpose of the present work is to report the optimization of several steps in the microarray process both in laboratory practices and in data processing using criteria that do not rely on external standards. Results We performed a cDNA microarray experiment including RNA from samples with high expected differential gene expression termed "high contrasts" (rat cell lines AR42J and NRK52E) compared to self-self hybridization, and optimized a pipeline to maximize the number of genes found to be differentially expressed in the "high contrasts" RNA samples by estimating the false discovery rate (FDR) using a null distribution obtained from the self-self experiment. The proposed high-contrast versus self-self method (HCSSM) requires only four microarrays per evaluation. The effects of blocking reagent dose, filtering, and background correction methodologies were investigated. In our experiments a dose of 250 ng LNA (locked nucleic acid) dT blocker, no background correction and weight based filtering gave the largest number of differentially expressed genes. The choice of background correction method had a stronger impact on the estimated number of differentially expressed genes than the choice of filtering method. Cross platform microarray (Illumina) analysis was used to validate that the increase in the number of differentially expressed genes found by HCSSM was real. Conclusion The results show that HCSSM can be a useful and simple approach to optimize microarray procedures without including external standards. Our optimizing method is highly applicable to both long oligo-probe microarrays which have become commonly used for well characterized organisms such as man, mouse and rat, as well as to cDNA microarrays which are still of importance for organisms with incomplete genome sequence information such as many bacteria, plants and fish. PMID:17949480
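The idea of estimating the false discovery rate from a self-self null distribution can be sketched as follows. This is a generic empirical-null estimator written under stated assumptions and is not necessarily the procedure used in the paper: the function name estimate_fdr, the use of absolute scores with a single threshold, and the toy scores are all hypothetical.

```python
def estimate_fdr(contrast_scores, null_scores, threshold):
    """Estimate the FDR among genes called at |score| >= threshold.

    The expected number of false positives is derived from the fraction of
    null (self-self) scores that pass the same threshold.
    """
    called = sum(abs(s) >= threshold for s in contrast_scores)
    if called == 0:
        return 0.0
    null_fraction = sum(abs(s) >= threshold for s in null_scores) / len(null_scores)
    expected_false = null_fraction * len(contrast_scores)
    return min(1.0, expected_false / called)


# Toy per-gene scores (e.g. log-ratios or test statistics), not real data.
contrast_scores = [3.1, -2.8, 0.2, 2.5, -0.1, 0.4, 4.0, -3.3, 0.3, 0.0]
null_scores = [0.1, -0.3, 0.2, 2.1, -0.2, 0.0, 0.4, -0.1, 0.3, -0.4]

fdr = estimate_fdr(contrast_scores, null_scores, threshold=2.0)
print(f"estimated FDR at |score| >= 2.0: {fdr:.2f}")
```

With an estimator of this kind, two variants of a microarray protocol could be compared by counting how many genes each variant calls at a fixed estimated FDR.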
The human disease network in terms of dysfunctional regulatory mechanisms.
Yang, Jing; Wu, Su-Juan; Dai, Wen-Tao; Li, Yi-Xue; Li, Yuan-Yuan
2015-10-08
Elucidation of human disease similarities has emerged as an active research area, which is highly relevant to etiology, disease classification, and drug repositioning. In pioneer studies, disease similarity was commonly estimated according to clinical manifestation. Subsequently, scientists started to investigate disease similarity based on gene-phenotype knowledge, which was inevitably biased toward well-studied diseases. In recent years, estimating disease similarity according to transcriptomic behavior has significantly enhanced the probability of finding novel disease relationships, while the currently available studies usually mine expression data through differential expression analysis, which has been considered to have little chance of unraveling dysfunctional regulatory relationships, the causal pathogenesis of diseases. We developed a computational approach to measure human disease similarity based on expression data. Differential coexpression analysis, instead of differential expression analysis, was employed to calculate the differential coexpression level of every gene for each disease, which was then summarized to the pathway level. Disease similarity was eventually calculated as the partial correlation coefficients of pathways' differential coexpression values between any two diseases. The significance of disease relationships was evaluated by a permutation test. Based on mRNA expression data and a differential coexpression analysis based method, we built a human disease network involving 1326 significant Disease-Disease links among 108 diseases. Compared with disease relationships captured by the differential expression analysis based method, our disease links shared known disease genes and drugs more significantly. Some novel disease relationships were discovered, for example, Obesity and cancer, Obesity and Psoriasis, lung adenocarcinoma and S. pneumonia, which had been commonly regarded as unrelated to each other, but recently found to share similar molecular mechanisms. Additionally, it was found that both the type of disease and the type of affected tissue influenced the degree of disease similarity. A sub-network including Allergic asthma, Type 2 diabetes and Chronic kidney disease was extracted to demonstrate the exploration of their common pathogenesis. The present study produces a global view of human diseasome for the first time from the viewpoint of regulation mechanisms, which therefore could provide insightful clues to etiology and pathogenesis, and help to perform drug repositioning and design novel therapeutic interventions.
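The permutation test mentioned in this abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it uses plain Pearson correlation between two vectors of pathway-level scores rather than the partial correlation the authors describe, and the function names, the number of permutations, and the toy data are all hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y.

    The pathway labels of y are shuffled to build a null distribution;
    the +1 terms keep the estimate away from an exact zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical pathway-level differential coexpression scores for two diseases.
disease_a = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.75, 0.05, 0.5]
disease_b = [0.2, 0.5, 0.30, 0.7, 1.0, 0.1, 0.7, 0.80, 0.10, 0.4]
print(pearson(disease_a, disease_b), permutation_pvalue(disease_a, disease_b))
```

A disease pair would be kept as a link only if its p-value falls below a chosen threshold; the threshold used in the study is not stated in the abstract.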
Evaluation of isolation methods for bacterial RNA quantitation in Dickeya dadantii
USDA-ARS's Scientific Manuscript database
Dickeya dadantii is a difficult source for RNA of a sufficient quality for real-time qRT-PCR analysis of gene expression. Three RNA isolation methods were evaluated for their ability to produce high-quality RNA from this bacterium. Bacterial lysis with Trizol using standard protocols consistently ga...
Complexity reduction of biochemical rate expressions.
Schmidt, Henning; Madsen, Mads F; Danø, Sune; Cedersund, Gunnar
2008-03-15
The current trend in dynamical modelling of biochemical systems is to construct more and more mechanistically detailed and thus complex models. The complexity is reflected in the number of dynamic state variables and parameters, as well as in the complexity of the kinetic rate expressions. However, a greater level of complexity, or level of detail, does not necessarily imply better models, or a better understanding of the underlying processes. Data often does not contain enough information to discriminate between different model hypotheses, and such overparameterization makes it hard to establish the validity of the various parts of the model. Consequently, there is an increasing demand for model reduction methods. We present a new reduction method that reduces complex rational rate expressions, such as those often used to describe enzymatic reactions. The method is a novel term-based identifiability analysis, which is easy to use and allows for user-specified reductions of individual rate expressions in complete models. The method is one of the first methods to meet the classical engineering objective of improved parameter identifiability without losing the systems biology demand of preserved biochemical interpretation. The method has been implemented in the Systems Biology Toolbox 2 for MATLAB, which is freely available from http://www.sbtoolbox2.org. The Supplementary Material contains scripts that show how to use it by applying the method to the example models, discussed in this article.
Ontology based molecular signatures for immune cell types via gene expression analysis
2013-01-01
Background New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity. Results We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every strata of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified genes sets that represent genes overexpressed in germinal center B cells and identify genes uniquely expressed in these B cells compared to other B cell types. Conclusions This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights. PMID:24004649
Design component method for sensitivity analysis of built-up structures
NASA Technical Reports Server (NTRS)
Choi, Kyung K.; Seong, Hwai G.
1986-01-01
A 'design component method' that provides a unified and systematic organization of design sensitivity analysis for built-up structures is developed and implemented. Both conventional design variables, such as thickness and cross-sectional area, and shape design variables of components of built-up structures are considered. It is shown that design of components of built-up structures can be characterized and system design sensitivity expressions obtained by simply adding contributions from each component. The method leads to a systematic organization of computations for design sensitivity analysis that is similar to the way in which computations are organized within a finite element code.
oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes
Ho Sui, Shannan J.; Mortimer, James R.; Arenillas, David J.; Brumm, Jochen; Walsh, Christopher J.; Kennedy, Brian P.; Wasserman, Wyeth W.
2005-01-01
Targeted transcript profiling studies can identify sets of co-expressed genes; however, identification of the underlying functional mechanism(s) is a significant challenge. Established methods for the analysis of gene annotations, particularly those based on the Gene Ontology, can identify functional linkages between genes. Similar methods for the identification of over-represented transcription factor binding sites (TFBSs) have been successful in yeast, but extension to human genomics has largely proved ineffective. Creation of a system for the efficient identification of common regulatory mechanisms in a subset of co-expressed human genes promises to break a roadblock in functional genomics research. We have developed an integrated system that searches for evidence of co-regulation by one or more transcription factors (TFs). oPOSSUM combines a pre-computed database of conserved TFBSs in human and mouse promoters with statistical methods for identification of sites over-represented in a set of co-expressed genes. The algorithm successfully identified mediating TFs in control sets of tissue-specific genes and in sets of co-expressed genes from three transcript profiling studies. Simulation studies indicate that oPOSSUM produces few false positives using empirically defined thresholds and can tolerate up to 50% noise in a set of co-expressed genes. PMID:15933209
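To make the idea of over-representation concrete, here is a minimal sketch of a one-sided hypergeometric test, which is one common way to ask whether a binding site appears in a gene set more often than expected from a background. It is not presented as oPOSSUM's exact statistic or thresholds; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the site.

    k: genes in the co-expressed set that carry the site
    n: size of the co-expressed set
    K: background genes that carry the site
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical counts: 50 of 1000 background genes carry the site,
# and 6 of the 20 co-expressed genes carry it (1 would be expected by chance).
p_value = hypergeom_tail(6, 20, 50, 1000)
print(f"{p_value:.3g}")
```

In practice one such test is run per transcription factor profile, so the resulting p-values would need a multiple-testing correction before any site is called over-represented.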
Supported Lipid Bilayer Technology for the Study of Cellular Interfaces
Crites, Travis J.; Maddox, Michael; Padhan, Kartika; Muller, James; Eigsti, Calvin; Varma, Rajat
2015-01-01
Glass-supported lipid bilayers presenting freely diffusing proteins have served as a powerful tool for studying cell-cell interfaces, in particular, T cell–antigen presenting cell (APC) interactions, using optical microscopy. Here we expand upon existing protocols and describe the preparation of liposomes by an extrusion method, and describe how this system can be used to study immune synapse formation by Jurkat cells. We also present a method for forming such lipid bilayers on silica beads for the study of signaling responses by population methods, such as western blotting, flow cytometry, and gene-expression analysis. Finally, we describe how to design and prepare transmembrane-anchored protein-laden liposomes, following expression in suspension CHO (CHOs) cells, a mammalian expression system alternative to insect and bacterial cell lines, which do not produce mammalian glycosylation patterns. Such transmembrane-anchored proteins may have many novel applications in cell biology and immunology. PMID:26331983
Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.
Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L
2016-02-19
Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73 % of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those that didn't (p = 0.0006, Log Rank), and this predictor significantly associated with longer disease free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.
Validation of Symbolic Expressions in Circuit Analysis E-Learning
ERIC Educational Resources Information Center
Weyten, L.; Rombouts, P.; Catteau, B.; De Bock, M.
2011-01-01
Symbolic circuit analysis is a cornerstone of electrical engineering education. Solving a suitable set of selected problems is essential to developing professional skills in the field. A new method is presented for automatic validation of circuit equations representing a student's intermediate steps in the solving process. Providing this immediate…
Bengtsson, Henrik; Hössjer, Ola
2006-03-01
Low-level processing and normalization of microarray data are most important steps in microarray analysis, which have profound impact on downstream analysis. Multiple methods have been suggested to date, but it is not clear which is the best. It is therefore important to further study the different normalization methods in detail and the nature of microarray data in general. A methodological study of affine models for gene expression data is carried out. Focus is on two-channel comparative studies, but the findings generalize also to single- and multi-channel data. The discussion applies to spotted as well as in-situ synthesized microarray data. Existing normalization methods such as curve-fit ("lowess") normalization, parallel and perpendicular translation normalization, and quantile normalization, but also dye-swap normalization are revisited in the light of the affine model and their strengths and weaknesses are investigated in this context. As a direct result from this study, we propose a robust non-parametric multi-dimensional affine normalization method, which can be applied to any number of microarrays with any number of channels either individually or all at once. A high-quality cDNA microarray data set with spike-in controls is used to demonstrate the power of the affine model and the proposed normalization method. We find that an affine model can explain non-linear intensity-dependent systematic effects in observed log-ratios. Affine normalization removes such artifacts for non-differentially expressed genes and assures that symmetry between negative and positive log-ratios is obtained, which is fundamental when identifying differentially expressed genes. In addition, affine normalization makes the empirical distributions in different channels more equal, which is the purpose of quantile normalization, and may also explain why dye-swap normalization works or fails. 
All methods are made available in the aroma package, which is a platform-independent package for R.
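The claim that an affine model can explain intensity-dependent effects in log-ratios can be checked with a small worked sketch. It assumes each channel is observed as y = a + b * x, with offset a and scale b, which is a generic affine form; the offsets below are hypothetical and this is not code from the aroma package.

```python
from math import log2

def log_ratio(x_r, x_g, a_r=0.0, a_g=0.0, b_r=1.0, b_g=1.0):
    """M = log2(y_R / y_G) for observed signals y_c = a_c + b_c * x_c."""
    return log2((a_r + b_r * x_r) / (a_g + b_g * x_g))

# A non-differentially expressed gene has the same true signal in both channels,
# so its log-ratio should be 0. With unequal channel offsets (hypothetical values)
# the log-ratio is biased at low intensity and approaches 0 at high intensity.
for x in (50, 500, 5000, 50000):
    print(x, round(log_ratio(x, x, a_r=20.0, a_g=100.0), 3))
```

Because the bias depends on intensity only through the offsets, removing the offsets (rather than fitting a curve to the log-ratios) is enough to flatten it in this toy setting.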
van Dam, Jesse C J; Schaap, Peter J; Martins dos Santos, Vitor A P; Suárez-Diez, María
2014-09-26
Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a show case, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom made methodologies. As show cases we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers. 
Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different networks. By simultaneously exploring these networks and metadata, we gained insights into regulatory mechanisms in M. tuberculosis that could not be obtained through the separate analysis of each data type.
2009-01-01
Background Large discrepancies in signature composition and outcome concordance have been observed between different microarray breast cancer expression profiling studies. This is often ascribed to differences in array platform as well as biological variability. We conjecture that other reasons for the observed discrepancies are the measurement error associated with each feature and the choice of preprocessing method. Microarray data are known to be subject to technical variation and the confidence intervals around individual point estimates of expression levels can be wide. Furthermore, the estimated expression values also vary depending on the selected preprocessing scheme. In microarray breast cancer classification studies, however, these two forms of feature variability are almost always ignored and hence their exact role is unclear. Results We have performed a comprehensive sensitivity analysis of microarray breast cancer classification under the two types of feature variability mentioned above. We used data from six state of the art preprocessing methods, using a compendium consisting of eight different datasets, involving 1131 hybridizations, containing data from both one and two-color array technology. For a wide range of classifiers, we performed a joint study on performance, concordance and stability. In the stability analysis we explicitly tested classifiers for their noise tolerance by using perturbed expression profiles that are based on uncertainty information directly related to the preprocessing methods. Our results indicate that signature composition is strongly influenced by feature variability, even if the array platform and the stratification of patient samples are identical. In addition, we show that there is often a high level of discordance between individual class assignments for signatures constructed on data coming from different preprocessing schemes, even if the actual signature composition is identical. 
Conclusion Feature variability can have a strong impact on breast cancer signature composition, as well as the classification of individual patient samples. We therefore strongly recommend that feature variability is considered in analyzing data from microarray breast cancer expression profiling experiments. PMID:19941644
Jia, Peilin; Chen, Xiangning; Xie, Wei; Kendler, Kenneth S; Zhao, Zhongming
2018-06-20
Numerous high-throughput omics studies have been conducted in schizophrenia, providing an accumulated catalog of susceptible variants and genes. The results from these studies, however, are highly heterogeneous. The variants and genes nominated by different omics studies often have limited overlap with each other. There is thus a pressing need for integrative analysis to unify the different types of data and provide a convergent view of schizophrenia candidate genes (SZgenes). In this study, we collected a comprehensive, multidimensional dataset, including 7819 brain-expressed genes. The data hosted genome-wide association evidence in genetics (eg, genotyping data, copy number variations, de novo mutations), epigenetics, transcriptomics, and literature mining. We developed a method named mega-analysis of odds ratio (MegaOR) to prioritize SZgenes. Application of MegaOR in the multidimensional data resulted in consensus sets of SZgenes (up to 530), each enriched with dense, multidimensional evidence. We proved that these SZgenes had highly tissue-specific expression in brain and nerve and had intensive interactions that were significantly stronger than chance expectation. Furthermore, we found these SZgenes were involved in human brain development by showing strong spatiotemporal expression patterns; these characteristics were replicated in independent brain expression datasets. Finally, we found the SZgenes were enriched in critical functional gene sets involved in neuronal activities, ligand gated ion signaling, and fragile X mental retardation protein targets. In summary, MegaOR analysis reported consensus sets of SZgenes with enriched association evidence to schizophrenia, providing insights into the pathophysiology underlying schizophrenia.
TransAtlasDB: an integrated database connecting expression data, metadata and variants
Adetunji, Modupeore O; Lamont, Susan J; Schmidt, Carl J
2018-01-01
Abstract High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. The constraint of accessing such data and interpreting results can be a major impediment in postulating suitable hypothesis, thus an innovative storage solution that addresses these limitations, such as hard disk storage requirements, efficiency and reproducibility are paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of both relational and NoSQL databases for fast and efficient data storage, processing and querying of large datasets from transcript expression analysis with corresponding metadata, as well as gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides the data model of accurate storage of the large amount of data derived from RNAseq analysis and also methods of interacting with the database, either via the command-line data management workflows, written in Perl, with useful functionalities that simplifies the complexity of data storage and possibly manipulation of the massive amounts of data generated from RNAseq analysis or through the web interface. The database application is currently modeled to handle analyses data from agricultural species, and will be expanded to include more species groups. Overall TransAtlasDB aims to serve as an accessible repository for the large complex results data files derived from RNAseq gene expression profiling and variant analysis. Database URL: https://modupeore.github.io/TransAtlasDB/ PMID:29688361
NASA Astrophysics Data System (ADS)
Suparmi, A.; Cari, C.; Lilis Elviyanti, Isnaini
2018-04-01
The relativistic energy and wave function of zero-spin particles were analyzed using the Klein-Gordon equation under the influence of a separable noncentral cylindrical potential, which was solved by the asymptotic iteration method (AIM). Using cylindrical coordinates, the Klein-Gordon equation for the case of spin symmetry was reduced to three one-dimensional Schrödinger-like equations that were solvable by the variable separation method. The relativistic energy was calculated numerically with Matlab software, and the general unnormalized wave function was expressed in hypergeometric terms.
2015-01-01
Background Transgenerational epigenetics (TGE) are currently considered important in disease, but the mechanisms involved are not yet fully understood. TGE abnormalities expected to cause disease are likely to be initiated during development and to be mediated by aberrant gene expression associated with aberrant promoter methylation that is heritable between generations. However, because methylation is removed and then re-established during development, it is not easy to identify promoter methylation abnormalities by comparing normal lineages with those expected to exhibit TGE abnormalities. Methods This study applied the recently proposed principal component analysis (PCA)-based unsupervised feature extraction to previously reported and publically available gene expression/promoter methylation profiles of rat primordial germ cells, between E13 and E16 of the F3 generation vinclozolin lineage that are expected to exhibit TGE abnormalities, to identify multiple genes that exhibited aberrant gene expression/promoter methylation during development. Results The biological feasibility of the identified genes were tested via enrichment analyses of various biological concepts including pathway analysis, gene ontology terms and protein-protein interactions. All validations suggested superiority of the proposed method over three conventional and popular supervised methods that employed t test, limma and significance analysis of microarrays, respectively. The identified genes were globally related to tumors, the prostate, kidney, testis and the immune system and were previously reported to be related to various diseases caused by TGE. Conclusions Among the genes reported by PCA-based unsupervised feature extraction, we propose that chemokine signaling pathways and leucine rich repeat proteins are key factors that initiate transgenerational epigenetic-mediated diseases, because multiple genes included in these two categories were identified in this study. PMID:26677731
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tholouli, Eleni; MacDermott, Sarah; Hoyland, Judith
2012-08-24
Highlights: Development of a quantitative high throughput in situ expression profiling method. Application to a tissue microarray of 242 AML bone marrow samples. Identification of HOXA4, HOXA9, Meis1 and DNMT3A as prognostic markers in AML. Abstract: Measurement and validation of microarray gene signatures in routine clinical samples is problematic and a rate limiting step in translational research. In order to facilitate measurement of microarray identified gene signatures in routine clinical tissue a novel method combining quantum dot based oligonucleotide in situ hybridisation (QD-ISH) and post-hybridisation spectral image analysis was used for multiplex in-situ transcript detection in archival bone marrow trephine samples from patients with acute myeloid leukaemia (AML). Tissue-microarrays were prepared into which white cell pellets were spiked as a standard. Tissue microarrays were made using routinely processed bone marrow trephines from 242 patients with AML. QD-ISH was performed for six candidate prognostic genes using triplex QD-ISH for DNMT1, DNMT3A, DNMT3B, and for HOXA4, HOXA9, Meis1. Scrambled oligonucleotides were used to correct for background staining followed by normalisation of expression against the expression values for the white cell pellet standard. Survival analysis demonstrated that low expression of HOXA4 was associated with poorer overall survival (p = 0.009), whilst high expression of HOXA9 (p < 0.0001), Meis1 (p = 0.005) and DNMT3A (p = 0.04) were associated with early treatment failure. These results demonstrate application of a standardised, quantitative multiplex QD-ISH method for identification of prognostic markers in formalin-fixed paraffin-embedded clinical samples, facilitating measurement of gene expression signatures in routine clinical samples.
Qiu, Wei-Hai; Chen, Gui-Yan; Cui, Lu; Zhang, Ting-Ming; Wei, Feng; Yang, Yong
2016-01-01
This study aimed to identify differential pathways between papillary thyroid carcinoma (PTC) patients and normal controls using a novel method that combined pathways with a co-expression network. The proposed method included three steps. In the first step, we conducted pretreatments for background pathways and gained representative pathways in PTC. Subsequently, a co-expression network for representative pathways was constructed using the empirical Bayes (EB) approach to assign a weight value for each pathway. Finally, a random model was extracted to set the thresholds for identifying differential pathways. We obtained 1267 representative pathways and their weight values based on the co-expressed pathway network, and then by meeting the criterion (Weight > 0.0296), 87 differential pathways in total across PTC patients and normal controls were identified. The top three ranked differential pathways were CREB phosphorylation, attachment of GPI anchor to urokinase plasminogen activator receptor (uPAR) and loss of function of SMAD2/3 in cancer. In conclusion, we successfully identified differential pathways (such as CREB phosphorylation, attachment of GPI anchor to uPAR and post-translational modification: synthesis of GPI-anchored proteins) for PTC using the proposed pathway co-expression method, and these pathways might be potential biomarkers for target therapy and detection of PTC.
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Robinson, Mark D; McCarthy, Davis J; Smyth, Gordon K
2010-01-01
It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).
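The "overdispersed Poisson" idea can be made concrete with the usual negative binomial mean-variance relationship, in which a dispersion parameter adds a quadratic term to the Poisson variance. The sketch below is plain Python for illustration only; it does not call edgeR, and the function names and example numbers are hypothetical.

```python
from math import sqrt

def nb_variance(mean_count, dispersion):
    """Variance of a negative binomial count: mu + dispersion * mu**2.

    With dispersion = 0 this reduces to the Poisson case (variance = mean),
    which corresponds to technical variation only.
    """
    return mean_count + dispersion * mean_count ** 2

def biological_cv(dispersion):
    """Square root of the dispersion, read as a coefficient of variation."""
    return sqrt(dispersion)

# Hypothetical transcript with a mean of 100 reads.
for dispersion in (0.0, 0.01, 0.1):
    print(dispersion, nb_variance(100, dispersion), biological_cv(dispersion))
```

The quadratic term is why replicate-to-replicate variability dominates for highly expressed transcripts, while low counts stay close to Poisson noise.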
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver
2009-11-20
Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the increasing number of data dimensions and data objects is presenting tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework MATLAB and the visualization have been integrated, making advanced analysis tools accessible to biologists and enabling bioinformatic researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. 
To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
Peng, Hao; Yang, Yifan; Zhe, Shandian; Wang, Jian; Gribskov, Michael; Qi, Yuan
2017-01-01
Abstract Motivation High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy. Results We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples an efficient variational inference and a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to an hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC are supported by principal component analysis (PCA), read coverage visualization, and the biological literature. 
Availability and implementation The software is available at https://github.com/hao-peng/DEIsoM Contact pengh@alumni.purdue.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28595376
Dimensionless Numbers Expressed in Terms of Common CVD Process Parameters
NASA Technical Reports Server (NTRS)
Kuczmarski, Maria A.
1999-01-01
A variety of dimensionless numbers related to momentum and heat transfer are useful in Chemical Vapor Deposition (CVD) analysis. These numbers are not traditionally calculated by directly using reactor operating parameters, such as temperature and pressure. In this paper, these numbers have been expressed in a form that explicitly shows their dependence upon the carrier gas, reactor geometry, and reactor operation conditions. These expressions were derived for both monatomic and diatomic gases using estimation techniques for viscosity, thermal conductivity, and heat capacity. Values calculated from these expressions compared well to previously published values. These expressions provide a relatively quick method for predicting changes in the flow patterns resulting from changes in the reactor operating conditions.
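As an illustration of how a dimensionless number can be written directly in terms of operating parameters, the sketch below computes a Reynolds number from pressure and temperature by substituting the ideal-gas density. This is not the set of expressions derived in the paper: the function names are hypothetical, the viscosity is supplied by the caller rather than estimated, and ideal-gas behaviour is an assumption.

```python
R_GAS = 8.314462618  # universal gas constant, J/(mol*K)


def gas_density(pressure_pa, temperature_k, molar_mass_kg_per_mol):
    """Ideal-gas density rho = P*M / (R*T), in kg/m^3 (assumes ideal-gas behaviour)."""
    return pressure_pa * molar_mass_kg_per_mol / (R_GAS * temperature_k)


def reynolds_number(pressure_pa, temperature_k, molar_mass_kg_per_mol,
                    velocity_m_s, length_m, viscosity_pa_s):
    """Re = rho*v*L / mu, with rho written in terms of pressure and temperature.

    The viscosity is an input here; the paper uses estimation techniques
    for it, which this sketch does not reproduce.
    """
    rho = gas_density(pressure_pa, temperature_k, molar_mass_kg_per_mol)
    return rho * velocity_m_s * length_m / viscosity_pa_s
```

At fixed velocity, geometry, temperature, and viscosity, this form makes the linear dependence of Re on reactor pressure explicit.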
Biomarker discovery and transcriptomic responses in Daphnia magna exposed to munitions constituents.
Garcia-Reyero, Natalia; Poynton, Helen C; Kennedy, Alan J; Guan, Xin; Escalon, B Lynn; Chang, Bonnie; Varshavsky, Julia; Loguinov, Alex V; Vulpe, Chris D; Perkins, Edward J
2009-06-01
Ecotoxicogenomic approaches are emerging as alternative methods in environmental monitoring because they allow insight into pollutant modes of action and help assess the causal agents and potential toxicity beyond the traditional end points of death, growth, and reproduction. Gene expression analysis has shown particular promise for identifying gene expression biomarkers of chemical exposure that can be further used to monitor specific chemical exposures in the environment. We focused on the development of gene expression markers to detect and discriminate between chemical exposures. Using a custom cDNA microarray for Daphnia magna, we identified distinct expression fingerprints in response to exposure at sublethal concentrations of Cu, Zn, Pb, and munitions constituents. Using the results obtained from microarray analysis, we chose a suite of potential biomarkers for each of the specific exposures. The selected potential biomarkers were tested in independent chemical exposures for specificity using quantitative reverse transcription polymerase chain reaction. Six genes were confirmed as differentially regulated by the selected chemical exposures. Furthermore, each exposure was identified by response of a unique combination (suite) of individual gene expression biomarkers. These results demonstrate the potential for discovery and validation of novel biomarkers of chemical exposures using gene expression analysis, which could have broad applicability in environmental monitoring.
The Omics Dashboard for interactive exploration of gene-expression data.
Paley, Suzanne; Parker, Karen; Spaulding, Aaron; Tomb, Jean-Francois; O'Maille, Paul; Karp, Peter D
2017-12-01
The Omics Dashboard is a software tool for interactive exploration and analysis of gene-expression datasets. The Omics Dashboard is organized as a hierarchy of cellular systems. At the highest level of the hierarchy the Dashboard contains graphical panels depicting systems such as biosynthesis, energy metabolism, regulation and central dogma. Each of those panels contains a series of X-Y plots depicting expression levels of subsystems of that panel, e.g. subsystems within the central dogma panel include transcription, translation and protein maturation and folding. The Dashboard presents a visual read-out of the expression status of cellular systems to facilitate a rapid top-down user survey of how all cellular systems are responding to a given stimulus, and to enable the user to quickly view the responses of genes within specific systems of interest. Although the Dashboard is complementary to traditional statistical methods for analysis of gene-expression data, we show how it can detect changes in gene expression that statistical techniques may overlook. We present the capabilities of the Dashboard using two case studies: the analysis of lipid production for the marine alga Thalassiosira pseudonana, and an investigation of a shift from anaerobic to aerobic growth for the bacterium Escherichia coli. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
The Omics Dashboard for interactive exploration of gene-expression data
Paley, Suzanne; Parker, Karen; Spaulding, Aaron; Tomb, Jean-Francois; O’Maille, Paul
2017-01-01
Abstract The Omics Dashboard is a software tool for interactive exploration and analysis of gene-expression datasets. The Omics Dashboard is organized as a hierarchy of cellular systems. At the highest level of the hierarchy the Dashboard contains graphical panels depicting systems such as biosynthesis, energy metabolism, regulation and central dogma. Each of those panels contains a series of X–Y plots depicting expression levels of subsystems of that panel, e.g. subsystems within the central dogma panel include transcription, translation and protein maturation and folding. The Dashboard presents a visual read-out of the expression status of cellular systems to facilitate a rapid top-down user survey of how all cellular systems are responding to a given stimulus, and to enable the user to quickly view the responses of genes within specific systems of interest. Although the Dashboard is complementary to traditional statistical methods for analysis of gene-expression data, we show how it can detect changes in gene expression that statistical techniques may overlook. We present the capabilities of the Dashboard using two case studies: the analysis of lipid production for the marine alga Thalassiosira pseudonana, and an investigation of a shift from anaerobic to aerobic growth for the bacterium Escherichia coli. PMID:29040755
A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes
Seo, Minseok; Shin, Su-kyung; Kwon, Eun-Young; Kim, Sung-Eun; Bae, Yun-Jung; Lee, Seungyeoun; Sung, Mi-Kyung; Choi, Myung-Sook; Park, Taesung
2016-01-01
Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naïve approach that detects DEGs and PAGs separately and then identifies the genes in the intersection of the lists of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method simultaneously identifies genes that are both DEGs and PAGs. This method uses standard regression models but adopts a different null hypothesis from ordinary regression models, which allows us to perform joint identification in one step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes, which are both DEGs and PAGs, than other methods. Simulation studies showed that they have more power than other methods.
Through analysis of data from experimental microarrays and simulation studies, the proposed model-based approach was shown to provide a more powerful result than the naïve approach and the hierarchical approach. Since our approach is model-based, it is very flexible and can easily handle different types of covariates. PMID:26964035
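The two baseline strategies described in this abstract (the naïve intersection and the hierarchical DEG-then-PAG selection) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the p-value dictionaries, and the fixed threshold `alpha` are assumptions, and no multiple-testing correction is applied. The proposed model-based method itself is not reproduced here.

```python
def naive_joint(deg_pvalues, pag_pvalues, alpha=0.05):
    """Naive approach: call DEGs and PAGs separately, then intersect the lists.

    Both arguments are hypothetical dicts mapping gene name -> p-value.
    """
    degs = {gene for gene, p in deg_pvalues.items() if p < alpha}
    pags = {gene for gene, p in pag_pvalues.items() if p < alpha}
    return degs & pags


def hierarchical_joint(deg_pvalues, pag_test, alpha=0.05):
    """Hierarchical approach: detect DEGs first, then test only those for
    phenotype association.

    pag_test is a hypothetical callable returning the association p-value
    of a single gene.
    """
    degs = [gene for gene, p in deg_pvalues.items() if p < alpha]
    return {gene for gene in degs if pag_test(gene) < alpha}
```

With a shared fixed threshold the two functions return the same set; in practice they differ once the second stage adjusts for the smaller number of tests.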
Breast Angiosarcoma: Case Series and Expression of Vascular Endothelial Growth Factor
Brar, Rondeep; West, Robert; Witten, Daniela; Raman, Bhargav; Jacobs, Charlotte; Ganjoo, Kristen
2009-01-01
Purpose Angiosarcoma of the breast is a rare, malignant tumor for which little is known regarding prognostic indicators and optimal therapeutic regimens. To address this issue, we performed a retrospective analysis of breast angiosarcoma cases seen at Stanford University along with immunohistochemical analysis for markers of angiogenesis. Methods Breast angiosarcoma cases seen between 1980 and 2008 were examined. Viable tissue blocks were analyzed for expression of vascular endothelial growth factor and its receptors. Results A total of 16 cases were identified. Data was collected regarding epidemiology, treatment, response rates, disease-free survival, and the use of various imaging modalities. Five tissue blocks remained viable for immunohistochemical analysis. Vascular endothelial growth factor-A was positively expressed in 3 of these samples. Conclusion Angiosarcoma of the breast is an aggressive malignancy with a propensity for both local recurrence and distant metastases. Angiogenesis inhibition may represent a novel therapeutic modality in this rare, vascular malignancy. PMID:20737044
Cloning of rat MLH1 and expression analysis of MSH2, MSH3, MSH6, and MLH1 during spermatogenesis.
Geeta Vani, R; Varghese, C M; Rao, M R
1999-12-15
The mismatch repair system has been highly conserved in various species. In eukaryotic cells, the Mut S and Mut L homologues play crucial roles in both DNA mismatch repair and meiotic recombination. A full-length rat cDNA clone for rat MLH1 has been constructed using the RT-PCR method. The cDNA has an open reading frame of 2274 nucleotides for a protein of 757 amino acids. We have also obtained partial cDNA clones for MSH3 and MSH6. Northern blot analysis of rat MLH1, MSH2, MSH3, and MSH6 in the testes of rats of different ages showed differential expression of these genes as a function of developmental maturation of the testes. The expression analysis suggests that MSH3 may have a more predominant role in the meiotic recombination process. Copyright 1999 Academic Press.
Ruijter, Jan M; Pfaffl, Michael W; Zhao, Sheng; Spiess, Andrej N; Boggy, Gregory; Blom, Jochen; Rutledge, Robert G; Sisti, Davide; Lievens, Antoon; De Preter, Katleen; Derveaux, Stefaan; Hellemans, Jan; Vandesompele, Jo
2013-01-01
RNA transcripts such as mRNA or microRNA are frequently used as biomarkers to determine disease state or response to therapy. Reverse transcription (RT) in combination with quantitative PCR (qPCR) has become the method of choice to quantify small amounts of such RNA molecules. In parallel with the democratization of RT-qPCR and its increasing use in biomedical research or biomarker discovery, we witnessed a growth in the number of gene expression data analysis methods. Most of these methods are based on the principle that the position of the amplification curve with respect to the cycle-axis is a measure for the initial target quantity: the later the curve, the lower the target quantity. However, most methods differ in the mathematical algorithms used to determine this position, as well as in the way the efficiency of the PCR reaction (the fold increase of product per cycle) is determined and applied in the calculations. Moreover, there is dispute about whether the PCR efficiency is constant or continuously decreasing. Together this has led to the development of different methods to analyze amplification curves. In published comparisons of these methods, available algorithms were typically applied in a restricted or outdated way, which does not do them justice. Therefore, we aimed at development of a framework for robust and unbiased assessment of curve analysis performance whereby various publicly available curve analysis methods were thoroughly compared using a previously published large clinical data set (Vermeulen et al., 2009) [11]. The original developers of these methods applied their algorithms and are co-authors of this study. We assessed the curve analysis methods' impact on transcriptional biomarker identification in terms of expression level, statistical significance, and patient-classification accuracy.
The concentration series per gene, together with data sets from unpublished technical performance experiments, were analyzed in order to assess the algorithms' precision, bias, and resolution. While large differences exist between methods when considering the technical performance experiments, most methods perform relatively well on the biomarker data. The data and the analysis results per method are made available to serve as benchmark for further development and evaluation of qPCR curve analysis methods (http://qPCRDataMethods.hfrc.nl). Copyright © 2012 Elsevier Inc. All rights reserved.
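The principle stated in this abstract, that a later amplification curve implies a lower starting quantity, follows from the standard exponential-amplification relation N0 = Nq / E^Cq, where E is the fold increase per cycle. The sketch below assumes a constant efficiency (which the abstract notes is itself disputed); the function names are hypothetical and it does not correspond to any of the specific curve analysis methods compared in the study.

```python
def initial_quantity(threshold_quantity, efficiency, cq):
    """Starting quantity N0 = Nq / E**Cq.

    efficiency is the fold increase of product per cycle (2.0 means perfect
    doubling); cq is the cycle at which the curve reaches threshold_quantity.
    Assumes the efficiency is constant over all cycles up to cq.
    """
    return threshold_quantity / efficiency ** cq


def fold_difference(cq_a, cq_b, efficiency=2.0):
    """Ratio N0(A) / N0(B) for two samples measured at the same threshold
    with the same (assumed constant) efficiency."""
    return efficiency ** (cq_b - cq_a)
```

For example, `fold_difference(20, 23)` compares a curve crossing the threshold three cycles earlier than another; with perfect doubling that corresponds to an eightfold higher starting quantity.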
Preprocessing of gene expression data by optimally robust estimators
2010-01-01
Background The preprocessing of gene expression data obtained from several platforms routinely includes the aggregation of multiple raw signal intensities to one expression value. Examples are the computation of a single expression measure based on the perfect match (PM) and mismatch (MM) probes for the Affymetrix technology, the summarization of bead level values to bead summary values for the Illumina technology or the aggregation of replicated measurements in the case of other technologies including real-time quantitative polymerase chain reaction (RT-qPCR) platforms. The summarization of technical replicates is also performed in other "-omics" disciplines like proteomics or metabolomics. Preprocessing methods like MAS 5.0, Illumina's default summarization method, RMA, or VSN show that the use of robust estimators is widely accepted in gene expression analysis. However, the selection of robust methods seems to be mainly driven by their high breakdown point and not by efficiency. Results We describe how optimally robust radius-minimax (rmx) estimators, i.e. estimators that minimize an asymptotic maximum risk on shrinking neighborhoods about an ideal model, can be used for the aggregation of multiple raw signal intensities to one expression value for Affymetrix and Illumina data. With regard to the Affymetrix data, we have implemented an algorithm which is a variant of MAS 5.0. Using datasets from the literature and Monte-Carlo simulations we provide some reasoning for assuming approximate log-normal distributions of the raw signal intensities by means of the Kolmogorov distance, at least for the discussed datasets, and compare the results of our preprocessing algorithms with the results of Affymetrix's MAS 5.0 and Illumina's default method. The numerical results indicate that when using rmx estimators an accuracy improvement of about 10-20% is obtained compared to Affymetrix's MAS 5.0 and about 1-5% compared to Illumina's default method. 
The improvement is also visible in the analysis of technical replicates where the reproducibility of the values (in terms of Pearson and Spearman correlation) is increased for all Affymetrix and almost all Illumina examples considered. Our algorithms are implemented in the R package named RobLoxBioC which is publicly available via CRAN, The Comprehensive R Archive Network (http://cran.r-project.org/web/packages/RobLoxBioC/). Conclusions Optimally robust rmx estimators have a high breakdown point and are computationally feasible. They can lead to a considerable gain in efficiency for well-established bioinformatics procedures and thus, can increase the reproducibility and power of subsequent statistical analysis. PMID:21118506
Gene expression profiling of single cells on large-scale oligonucleotide arrays
Hartmann, Claudia H.; Klein, Christoph A.
2006-01-01
Over the last decade, important insights into the regulation of cellular responses to various stimuli were gained by global gene expression analyses of cell populations. More recently, specific cell functions and underlying regulatory networks of rare cells isolated from their natural environment moved to the center of attention. However, low cell numbers still hinder gene expression profiling of rare ex vivo material in biomedical research. Therefore, we developed a robust method for gene expression profiling of single cells on high-density oligonucleotide arrays with excellent coverage of low abundance transcripts. The protocol was extensively tested with freshly isolated single cells of very low mRNA content including single epithelial, mature and immature dendritic cells and hematopoietic stem cells. Quantitative PCR confirmed that the PCR-based global amplification method did not change the relative ratios of transcript abundance and unsupervised hierarchical cluster analysis revealed that the histogenetic origin of an individual cell is correctly reflected by the gene expression profile. Moreover, the gene expression data from dendritic cells demonstrate that cellular differentiation and pathway activation can be monitored in individual cells. PMID:17071717
Nagarajan, R; Hariharan, M; Satiyan, M
2012-08-01
Developing tools to assist physically disabled and immobilized people through facial expression is a challenging area of research and has attracted many researchers recently. In this paper, luminance stickers based facial expression recognition is proposed. Recognition of facial expression is carried out by employing Discrete Wavelet Transform (DWT) as a feature extraction method. Different wavelet families with their different orders (db1 to db20, Coif1 to Coif 5 and Sym2 to Sym8) are utilized to investigate their performance in recognizing facial expression and to evaluate their computational time. Standard deviation is computed for the coefficients of first level of wavelet decomposition for every order of wavelet family. This standard deviation is used to form a set of feature vectors for classification. In this study, conventional validation and cross validation are performed to evaluate the efficiency of the suggested feature vectors. Three different classifiers namely Artificial Neural Network (ANN), k-Nearest Neighborhood (kNN) and Linear Discriminant Analysis (LDA) are used to classify a set of eight facial expressions. The experimental results demonstrate that the proposed method gives very promising classification accuracies.
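The feature described in this abstract, the standard deviation of first-level wavelet coefficients, can be sketched for the simplest wavelet (db1, i.e. Haar) in plain Python. This is an illustration under stated assumptions: the input is treated as a 1-D signal of even length, both approximation and detail coefficients are used, and the population standard deviation is chosen. The abstract does not specify these details, and the function names are hypothetical.

```python
import math
import statistics


def haar_dwt_level1(signal):
    """One level of the orthonormal Haar (db1) transform.

    Returns (approximation, detail) coefficient lists. The sign convention
    of the detail coefficients varies between libraries; it does not affect
    their standard deviation.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def std_features(signal):
    """Feature vector: standard deviation of each first-level coefficient set."""
    approx, detail = haar_dwt_level1(signal)
    return [statistics.pstdev(approx), statistics.pstdev(detail)]
```

Higher-order wavelets such as db2 to db20 use longer filters and would normally come from a wavelet library rather than being hand-written.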
Person-independent facial expression analysis by fusing multiscale cell features
NASA Astrophysics Data System (ADS)
Zhou, Lubing; Wang, Han
2013-03-01
Automatic facial expression recognition is an interesting and challenging task. To achieve satisfactory accuracy, deriving a robust facial representation is especially important. A novel appearance-based feature, the multiscale cell local intensity increasing patterns (MC-LIIP), to represent facial images and conduct person-independent facial expression analysis is presented. The LIIP uses a decimal number to encode the texture or intensity distribution around each pixel via pixel-to-pixel intensity comparison. To boost noise resistance, MC-LIIP carries out comparison computation on the average values of scalable cells instead of individual pixels. The facial descriptor fuses region-based histograms of MC-LIIP features from various scales, so as to encode not only textural microstructures but also the macrostructures of facial images. Finally, a support vector machine classifier is applied for expression recognition. Experimental results on the CK+ and Karolinska directed emotional faces databases show the superiority of the proposed method.
Wang, Yonghong; Yang, Xukui; Yang, Yuanyuan; Wang, Wenjun; Zhao, Meiling; Liu, Huiqiang; Li, Dongyan; Hao, Min
2016-01-01
Objective: To identify specific microRNA (miRNA) biomarkers of preeclampsia (PE), miRNA profile analysis was performed. Study Design: Blood samples were obtained from five PE patients and five normal healthy pregnant women. The small RNA profiles were analyzed to determine miRNA expression levels and identify miRNAs that may be associated with PE. The quantitative reverse transcriptase–PCR (qRT-PCR) assay was used to validate differentially expressed peripheral leucocyte miRNAs in a new cohort. Result: The data analysis showed that 10 peripheral leucocyte miRNAs were significantly differentially expressed in severe PE patients. Four differentially expressed miRNAs were successfully validated using the qRT-PCR method. Conclusion: We successfully constructed a model with high accuracy to predict PE. A combination of four peripheral leucocyte miRNAs has great potential to serve as diagnostic biomarkers of PE. PMID:26675000
Kent, Clement; Azanchi, Reza; Smith, Ben; Chu, Adrienne; Levine, Joel
2007-01-01
Drosophila Cuticular Hydrocarbons (CH) influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD) conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions. PMID:17896002
Huet, S; Marie, J P; Gualde, N; Robert, J
1998-12-15
Multidrug resistance (MDR) associated with overexpression of the MDR1 gene and of its product, P-glycoprotein (Pgp), plays an important role in limiting cancer treatment efficacy. Many studies have investigated Pgp expression in clinical samples of hematological malignancies but failed to reach a definitive conclusion on its usefulness. One convenient method for fluorescent detection of Pgp in malignant cells is flow cytometry, which however gives variable results from one laboratory to another, partly due to the lack of a rigorously tested reference method. The purpose of this technical note is to describe each step of a reference flow cytometric method. The guidelines for sample handling, staining and analysis have been established both for Pgp detection with monoclonal antibodies directed against extracellular epitopes (MRK16, UIC2 and 4E3), and for Pgp functional activity measurement with Rhodamine 123 as a fluorescent probe. Both methods have been validated on cultured cell lines and clinical samples by 12 laboratories of the French Drug Resistance Network. This cross-validated multicentric study points out crucial steps for the accuracy and reproducibility of the results, such as cell viability, data analysis and expression.
Expression Profile of Long Noncoding RNAs in Human Earlobe Keloids: A Microarray Analysis
Guo, Liang; Xu, Kai; Yan, Hongbo; Feng, Haifeng
2016-01-01
Background. Long noncoding RNAs (lncRNAs) play key roles in a wide range of biological processes and their deregulation results in human disease, including keloids. Earlobe keloid is a type of pathological skin scar, and the molecular pathogenesis of this disease remains largely unknown. Methods. In this study, microarray analysis was used to determine the expression profiles of lncRNAs and mRNAs between 3 pairs of earlobe keloid and normal specimens. Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to identify the main functions of the differentially expressed genes and earlobe keloid-related pathways. Results. A total of 2068 lncRNAs and 1511 mRNAs were differentially expressed between earlobe keloid and normal tissues. Among them, 1290 lncRNAs and 1092 mRNAs were upregulated, and 778 lncRNAs and 419 mRNAs were downregulated. Pathway analysis revealed that 24 pathways were correlated to the upregulated transcripts, while 11 pathways were associated with the downregulated transcripts. Conclusion. We characterized the expression profiles of lncRNA and mRNA in earlobe keloids and suggest that lncRNAs may serve as diagnostic biomarkers for the therapy of earlobe keloid. PMID:28101509
Tascilar, Oge; Cakmak, Güldeniz Karadeniz; Tekin, Ishak Ozel; Emre, Ali Ugur; Ucan, Bulent Hamdi; Irkorucu, Oktay; Karakaya, Kemal; Gül, Mesut; Engin, Hüseyin Bülent; Comert, Mustafa
2007-01-01
AIM: To evaluate the frequency of neural cell adhesion molecule (NCAM)-180 expression in fresh tumor tissue samples and to discuss the prognostic value of NCAM-180 in routine clinical practice. METHODS: Twenty-six patients (16 men, 10 women) with colorectal cancer were included in the study. Fresh tumor tissue samples and macroscopically healthy proximal margins of each specimen were subjected to flow-cytometric analysis for NCAM-180 expression. RESULTS: Flow-cytometric analysis determined NCAM-180 expression in whole tissue samples of macroscopically healthy colorectal tissues. However, NCAM-180 expression was positive in only one case (3.84%) with well-differentiated Stage II disease who experienced no active disease at 30-month follow-up. CONCLUSION: As a consequence of the limited number of cases in our series, it might not be possible to generalise; nevertheless, the routine use of NCAM-180 expression as a prognostic marker for colorectal carcinoma seems to be unfeasible and not cost-effective in clinical practice due to its very low incidence. PMID:17907291
Toll like receptors gene expression of human keratinocytes cultured of severe burn injury.
Cornick, Sarita Mac; Noronha, Silvana Aparecida Alves Corrêa de; Noronha, Samuel Marcos Ribeiro de; Cezillo, Marcus V B; Ferreira, Lydia Masako; Gragnani, Alfredo
2014-01-01
To evaluate the expression profile of genes related to Toll-like receptor (TLR) pathways in human primary epidermal keratinocytes of patients with severe burns. After obtaining viable fragments of skin with and without burn injury, culture of hKEP was initiated by the enzymatic method using Dispase (Sigma-Aldrich). These cells were treated with TRIzol® (Life Technologies) for extraction of total RNA. The RNA was quantified and analyzed for purity, and cDNA was obtained for the analysis of gene expression using TLR pathway-specific PCR array plates (SA Biosciences). Gene expression analysis showed that 21% of these genes were differentially expressed, all of which (100%) were repressed or hyporegulated. Among these were the following genes (fold decrease): HSPA1A (-58), HRAS (-36), MAP2K3 (-23), TOLLIP (-23), RELA (-18), FOS (-16), and TLR1 (-6.0). This study contributes to the understanding of the molecular mechanisms related to TLR pathways and underlying wound infection caused by the burn. Furthermore, it may provide new strategies to restore normal expression of these genes and thereby change the healing process and improve clinical outcome.
Chen, Si; Weddell, Jared; Gupta, Pavan; Conard, Grace; Parkin, James; Imoukhuede, Princess I
2017-01-01
Nanosensor-based detection of biomarkers can improve medical diagnosis; however, a critical factor in nanosensor development is deciding which biomarker to target, as most diseases present several biomarkers. Biomarker-targeting decisions can be informed via an understanding of biomarker expression. Currently, immunohistochemistry (IHC) is the accepted standard for profiling biomarker expression. While IHC provides a relative mapping of biomarker expression, it does not provide cell-by-cell readouts of biomarker expression or absolute biomarker quantification. Flow cytometry overcomes both these IHC challenges by offering biomarker expression on a cell-by-cell basis, and when combined with calibration standards, providing quantitation of biomarker concentrations: this is known as qFlow cytometry. Here, we outline the key components for applying qFlow cytometry to detect biomarkers within the angiogenic vascular endothelial growth factor receptor family. The key aspects of the qFlow cytometry methodology include: antibody specificity testing, immunofluorescent cell labeling, saturation analysis, fluorescent microsphere calibration, and quantitative analysis of both ensemble and cell-by-cell data. Together, these methods enable high-throughput quantification of biomarker expression.
The prognostic implications of growth-related gene product β in laryngeal squamous cell carcinoma.
Tang, Mingming; Xu, Xinjiang; Chen, Juanjuan; Huang, Jiangfei; Jiang, Bin; Han, Liang
2017-09-01
Growth-related gene product β (GROβ) is an angiogenic chemokine that belongs to the CXC chemokine family, and a number of studies have suggested that GROβ is associated with tumor development and progression. However, a number of studies have investigated the association between GROβ expression and the clinical attributes of laryngeal squamous cell carcinoma (LSCC). In the present study, one-step quantitative polymerase chain reaction and immunohistochemistry analysis were used to detect GROβ expression and evaluate the association between its expression and the clinicopathological characteristics of LSCC. The results demonstrated that the GROβ mRNA and protein expression levels were significantly increased in LSCC compared with the corresponding non-cancerous tissues. GROβ protein expression in LSCC was associated with tumor-node-metastasis stage, lymph node metastasis and histopathological grade. The Kaplan-Meier method and Cox multi-factor analysis indicated that high GROβ expression, lymph node metastasis and histopathological grade were significantly associated with poor survival of patients with LSCC. These data indicated that GROβ may be a novel prognostic biomarker of LSCC.
Determination of absolute expression profiles using multiplexed miRNA analysis
Song, Jee Hoon; Cheng, Yulan; Saeui, Christopher T.; Cheung, Douglas G.; Croce, Carlo M.; Yarema, Kevin J.; Meltzer, Stephen J.; Liu, Kelvin J.; Wang, Tza-Huei
2017-01-01
Accurate measurement of miRNA expression is critical to understanding their role in gene expression as well as their application as disease biomarkers. Correct identification of changes in miRNA expression rests on reliable normalization to account for biological and technological variance between samples. Ligo-miR is a multiplex assay designed to rapidly measure absolute miRNA copy numbers, thus reducing dependence on biological controls. It uses a simple 2-step ligation process to generate length coded products that can be quantified using a variety of DNA sizing methods. We demonstrate Ligo-miR’s ability to quantify miRNA expression down to 20 copies per cell sensitivity, accurately discriminate between closely related miRNA, and reliably measure differential changes as small as 1.2-fold. Then, benchmarking studies were performed to show the high correlation between Ligo-miR, microarray, and TaqMan qRT-PCR. Finally, Ligo-miR was used to determine copy number profiles in a number of breast, esophageal, and pancreatic cell lines and to demonstrate the utility of copy number analysis for providing layered insight into expression profile changes. PMID:28704432
Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara
2017-01-01
Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material. PMID:28574441
Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara
2017-06-02
Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material.
Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola
2014-12-01
We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named "fight-club hubs" characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named "switch genes" was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. © 2014 American Society of Plant Biologists. All rights reserved.
Palumbo, Maria Concetta; Zenoni, Sara; Fasoli, Marianna; Massonnet, Mélanie; Farina, Lorenzo; Castiglione, Filippo; Pezzotti, Mario; Paci, Paola
2014-01-01
We developed an approach that integrates different network-based methods to analyze the correlation network arising from large-scale gene expression data. By studying grapevine (Vitis vinifera) and tomato (Solanum lycopersicum) gene expression atlases and a grapevine berry transcriptomic data set during the transition from immature to mature growth, we identified a category named “fight-club hubs” characterized by a marked negative correlation with the expression profiles of neighboring genes in the network. A special subset named “switch genes” was identified, with the additional property of many significant negative correlations outside their own group in the network. Switch genes are involved in multiple processes and include transcription factors that may be considered master regulators of the previously reported transcriptome remodeling that marks the developmental shift from immature to mature growth. All switch genes, expressed at low levels in vegetative/green tissues, showed a significant increase in mature/woody organs, suggesting a potential regulatory role during the developmental transition. Finally, our analysis of tomato gene expression data sets showed that wild-type switch genes are downregulated in ripening-deficient mutants. The identification of known master regulators of tomato fruit maturation suggests our method is suitable for the detection of key regulators of organ development in different fleshy fruit crops. PMID:25490918
2017-01-01
Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
In vivo Magnetic Resonance Imaging of Tumor Protease Activity
Haris, Mohammad; Singh, Anup; Mohammed, Imran; Ittyerah, Ranjit; Nath, Kavindra; Nanga, Ravi Prakash Reddy; Debrosse, Catherine; Kogan, Feliks; Cai, Kejia; Poptani, Harish; Reddy, Damodar; Hariharan, Hari; Reddy, Ravinder
2014-01-01
Increased expression of cathepsins has diagnostic as well as prognostic value in several types of cancer. Here, we demonstrate a novel magnetic resonance imaging (MRI) method, which uses poly-L-glutamate (PLG) as an MRI probe to map cathepsin expression in vivo, in a rat brain tumor model. This noninvasive, high-resolution and non-radioactive method exploits the differences in the CEST signals of PLG in the native form and cathepsin mediated cleaved form. The method was validated in phantoms with known physiological concentrations, in tumor cells and in an animal model of brain tumor along with immunohistochemical analysis. Potential applications in tumor diagnosis and evaluation of therapeutic response are outlined. PMID:25124082
Correspondence regarding Zhong et al., BMC Bioinformatics 2013 Mar 7;14:89.
Kuhn, Alexandre
2014-11-28
Computational expression deconvolution aims to estimate the contribution of individual cell populations to expression profiles measured in samples of heterogeneous composition. Zhong et al. recently proposed Digital Sorting Algorithm (BMC Bioinformatics 2013 Mar 7;14:89) and showed that they could accurately estimate population-specific expression levels and expression differences between two populations. They compared DSA with Population-Specific Expression Analysis (PSEA), a previous deconvolution method that we developed to detect expression changes occurring within the same population between two conditions (e.g. disease versus non-disease). However, Zhong et al. compared PSEA-derived specific expression levels across different cell populations. Specific expression levels obtained with PSEA cannot be directly compared across different populations as they are on a relative scale. They are accurate as we demonstrate by deconvolving the same dataset used by Zhong et al. and, importantly, allow for comparison of population-specific expression across conditions.
Gene expression variability in human hepatic drug metabolizing enzymes and transporters.
Yang, Lun; Price, Elvin T; Chang, Ching-Wei; Li, Yan; Huang, Ying; Guo, Li-Wu; Guo, Yongli; Kaput, Jim; Shi, Leming; Ning, Baitang
2013-01-01
Interindividual variability in the expression of drug-metabolizing enzymes and transporters (DMETs) in human liver may contribute to interindividual differences in drug efficacy and adverse reactions. Published studies that analyzed variability in the expression of DMET genes were limited by sample sizes and the number of genes profiled. We systematically analyzed the expression of 374 DMETs from a microarray data set consisting of gene expression profiles derived from 427 human liver samples. The standard deviation of interindividual expression for DMET genes was much higher than that for non-DMET genes. The 20 DMET genes with the largest variability in the expression provided examples of the interindividual variation. Gene expression data were also analyzed using network analysis methods, which delineates the similarities of biological functionalities and regulation mechanisms for these highly variable DMET genes. Expression variability of human hepatic DMET genes may affect drug-gene interactions and disease susceptibility, with concomitant clinical implications.
Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco
2008-01-01
Background In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. 
We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways. PMID:18764936
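As an illustration of the Shapley value that CASh builds on, the following is a minimal sketch, not the authors' implementation. It computes exact Shapley values by averaging marginal contributions over all player orderings. The toy game, the gene labels A, B, C and the per-sample gene sets in `SUPPORTS` are assumptions made up for this example: the value of a coalition is taken to be the fraction of samples whose non-empty set of abnormally expressed genes lies entirely inside the coalition. The bootstrap test described in the abstract is not reproduced here.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += after - before
    return {p: totals[p] / len(orders) for p in players}


# Hypothetical data: for each sample, the genes flagged as abnormally expressed.
SUPPORTS = [{"A"}, {"A", "B"}, {"B", "C"}, set()]


def toy_value(coalition):
    """Fraction of samples whose non-empty flagged gene set is inside the coalition."""
    hits = sum(1 for s in SUPPORTS if s and s <= coalition)
    return Fraction(hits, len(SUPPORTS))


phi = shapley_values("ABC", toy_value)
for gene in "ABC":
    print(gene, phi[gene])
```

Enumerating all orderings is only feasible for a handful of players; it is used here because it follows the definition directly. The values always sum to the value of the full coalition, which is a convenient sanity check.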
Fu, Shijie; Pan, Xufeng; Fang, Wentao
2014-08-01
Lung cancer severely reduces the quality of life worldwide and causes high socioeconomic burdens. However, key genes leading to the generation of pulmonary adenocarcinoma remain elusive despite intensive research efforts. The present study aimed to identify the potential associations between transcription factors (TFs) and differentially co‑expressed genes (DCGs) in the regulation of transcription in pulmonary adenocarcinoma. Gene expression profiles of pulmonary adenocarcinoma were downloaded from the Gene Expression Omnibus, and gene expression was analyzed using a computational method. A total of 37,094 differentially co‑expressed links (DCLs) and 251 DCGs were identified, which were significantly enriched in 10 pathways. The construction of the regulatory network and the analysis of the regulatory impact factors revealed eight crucial TFs in the regulatory network. These TFs regulated the expression of DCGs by promoting or inhibiting their expression. In addition, certain TFs and target genes associated with DCGs did not appear in the DCLs, which indicated that those TFs could be synergistic with other factors. This is likely to provide novel insights for research into pulmonary adenocarcinoma. In conclusion, the present study may enhance the understanding of disease mechanisms and lead to an improved diagnosis of lung cancer. However, further studies are required to confirm these observations.
Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R; Del Río-Navarro, Blanca E; Mendoza-Vargas, Alfredo; Sánchez, Filiberto; Ochoa-Leyva, Adrian
2017-01-01
In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. The absolute gene expression analysis allows the transcriptome profiling of all expressed genes under a specific biological condition without the need of a reference sample. However, the background fluorescence represents a challenge to determine the absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. We extracted the RNA from 16 children leukocyte samples (nine males and seven females, ages 6-10 years). An Affymetrix Gene Chip Human Gene 1.0 ST Array was carried out for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. 
Interestingly, these pathways were related to the typical functions of leukocytes, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in already published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing transcriptome profiling of microarray data without the need for an additional reference experiment. Our approach, based on the establishment of a threshold for absolute gene expression analysis, offers a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need for additional samples for relative expression experiments.
Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B
2005-04-06
Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an Html report summarizing data analysis. HDBStat! is a platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.
Kim, Bo-Bae; Kim, Minji; Park, Yun-Hee; Ko, Youngkyung; Park, Jun-Beom
2017-06-01
Objective Next-generation sequencing was performed to evaluate the effects of short-term application of dexamethasone on human gingiva-derived mesenchymal stem cells. Methods Human gingiva-derived stem cells were treated with a final concentration of 10⁻⁷ M dexamethasone and the same concentration of vehicle control. This was followed by mRNA sequencing and data analysis, gene ontology and pathway analysis, quantitative real-time polymerase chain reaction of mRNA, and western blot analysis of RUNX2 and β-catenin. Results In total, 26,364 mRNAs were differentially expressed. Comparison of the results of dexamethasone versus control at 2 hours revealed that 7 mRNAs were upregulated and 25 mRNAs were downregulated. The application of dexamethasone reduced the expression of RUNX2 and β-catenin in human gingiva-derived mesenchymal stem cells. Conclusion The effects of dexamethasone on stem cells were evaluated with mRNA sequencing, and validation of the expression was performed with quantitative real-time polymerase chain reaction and western blot analysis. The results of this study can provide new insights into the role of mRNA sequencing in maxillofacial areas.
A statistical method for the conservative adjustment of false discovery rate (q-value).
Lai, Yinglei
2017-03-14
q-value is a widely used statistical method for estimating false discovery rate (FDR), which is a conventional significance measure in the analysis of genome-wide expression data. q-value is a random variable and it may underestimate FDR in practice. An underestimated FDR can lead to unexpected false discoveries in the follow-up validation experiments. This issue has not been well addressed in literature, especially in the situation when the permutation procedure is necessary for p-value calculation. We proposed a statistical method for the conservative adjustment of q-value. In practice, it is usually necessary to calculate p-value by a permutation procedure. This was also considered in our adjustment method. We used simulation data as well as experimental microarray or sequencing data to illustrate the usefulness of our method. The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of q-value, particularly in the situation that the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
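For orientation, the sketch below shows the Benjamini-Hochberg step-up adjustment, which corresponds to the q-value when the proportion of true null hypotheses is fixed at 1. It is a baseline under that assumption only. It is not the conservative adjustment proposed in the abstract, and it does not include a permutation procedure. The function name `bh_adjust` and the example p-values are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values
adj = bh_adjust(example)
print(adj)
```

Estimating the null proportion from the data, as the q-value does, multiplies these values by a factor of at most 1. That is one way an estimate can fall below the true FDR, which is the concern the abstract raises.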
Li, Meng-Yao; Song, Xiong; Wang, Feng; Xiong, Ai-Sheng
2016-01-01
Parsley, one of the most important vegetables in the Apiaceae family, is widely used in the food, medicinal, and cosmetic industries. Recent studies on parsley mainly focus on its chemical composition, and further research involving the analysis of the plant's gene functions and expressions is required. qPCR is a powerful method for detecting very low quantities of target transcript levels and is widely used to study gene expression. To ensure the accuracy of results, a suitable reference gene is necessary for expression normalization. In this study, four software tools, namely geNorm, NormFinder, BestKeeper, and RefFinder, were used to evaluate the expression stabilities of eight candidate reference genes of parsley (GAPDH, ACTIN, eIF-4α, SAND, UBC, TIP41, EF-1α, and TUB) under various conditions, including abiotic stresses (heat, cold, salt, and drought) and hormone stimuli treatments (GA, SA, MeJA, and ABA). Results showed that EF-1α and TUB were the most stable genes for abiotic stresses, whereas EF-1α, GAPDH, and TUB were the top three choices for hormone stimuli treatments. Moreover, EF-1α and TUB were the most stable reference genes among all tested samples, and UBC was the least stable one. Expression analysis of PcDREB1 and PcDREB2 further verified that the selected stable reference genes were suitable for gene expression normalization. This study can guide the selection of suitable reference genes in gene expression in parsley. PMID:27746803
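To make the idea of expression stability concrete, here is a minimal sketch of a geNorm-style stability measure M as it is commonly described: for each candidate gene, M is the average, over all other candidates, of the standard deviation across samples of the log2 expression ratio. Lower M means more stable. This is an assumption-laden illustration, not the geNorm software itself. The gene names REF1 to REF3 and the relative quantities are invented, and the other three tools named in the abstract use different criteria.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """expr maps gene name -> relative quantities (linear scale), one per sample."""
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical data: REF1 and REF2 keep a constant ratio; REF3 fluctuates.
toy = {
    "REF1": [1.0, 2.0, 4.0, 8.0],
    "REF2": [3.0, 6.0, 12.0, 24.0],
    "REF3": [5.0, 1.0, 7.0, 2.0],
}
m = genorm_m(toy)
for gene, value in sorted(m.items(), key=lambda item: item[1]):
    print(gene, round(value, 3))
```

In this toy set the two proportional genes obtain the same, lower M, and the fluctuating gene ranks last, which mirrors how a least stable candidate would be flagged.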
Li, Meng-Yao; Song, Xiong; Wang, Feng; Xiong, Ai-Sheng
2016-01-01
Parsley, one of the most important vegetables in the Apiaceae family, is widely used in the food, medicinal, and cosmetic industries. Recent studies on parsley mainly focus on its chemical composition, and further research involving the analysis of the plant's gene functions and expressions is required. qPCR is a powerful method for detecting very low quantities of target transcript levels and is widely used to study gene expression. To ensure the accuracy of results, a suitable reference gene is necessary for expression normalization. In this study, four software tools, namely geNorm, NormFinder, BestKeeper, and RefFinder, were used to evaluate the expression stabilities of eight candidate reference genes of parsley (GAPDH, ACTIN, eIF-4α, SAND, UBC, TIP41, EF-1α, and TUB) under various conditions, including abiotic stresses (heat, cold, salt, and drought) and hormone stimuli treatments (GA, SA, MeJA, and ABA). Results showed that EF-1α and TUB were the most stable genes for abiotic stresses, whereas EF-1α, GAPDH, and TUB were the top three choices for hormone stimuli treatments. Moreover, EF-1α and TUB were the most stable reference genes among all tested samples, and UBC was the least stable one. Expression analysis of PcDREB1 and PcDREB2 further verified that the selected stable reference genes were suitable for gene expression normalization. This study can guide the selection of suitable reference genes in gene expression in parsley.
Parallel gene analysis with allele-specific padlock probes and tag microarrays
Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats
2003-01-01
Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977
NASA Astrophysics Data System (ADS)
Moloshnikov, I. A.; Sboev, A. G.; Rybka, R. B.; Gydovskikh, D. V.
2016-02-01
A composite algorithm is presented that integrates an algorithm for finding documents on a given topic with a method for evaluating the emotiveness of topical texts. This method is convenient for analyzing people's opinions expressed in social media and, as a result, for automated analysis of how events evolve in social media. Some examples of such analysis are demonstrated and discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rades, Dirk, E-mail: Rades.Dirk@gmx.ne; Setter, Cornelia; Dahl, Olav
Purpose: Prognostic factors can guide the physician in selecting the optimal treatment for an individual patient. This study investigates the prognostic value of erythropoietin (EPO) and EPO receptor (EPO-R) expression of tumor cells for locoregional control and survival in non-small-cell lung cancer (NSCLC) patients. Methods and Materials: Fourteen factors were investigated in 62 patients irradiated for stage II/III NSCLC, as follows: age, gender, Karnofsky performance score (KPS), histology, grading, TNM/American Joint Committee on Cancer (AJCC) stage, surgery, chemotherapy, pack years (average number of packages of cigarettes smoked per day multiplied by the number of years smoked), smoking during radiotherapy, hemoglobin levels during radiotherapy, EPO expression, and EPO-R expression. Additionally, patients with tumors expressing both EPO and EPO-R were compared to those expressing either EPO or EPO-R and to those expressing neither EPO nor EPO-R. Results: On univariate analysis, improved locoregional control was associated with AJCC stage II cancer (p < 0.048), surgery (p < 0.042), no smoking during radiotherapy (p = 0.024), and no EPO expression (p = 0.001). A trend was observed for a KPS of >70 (p = 0.08), an N stage of 0 to 1 (p = 0.07), and no EPO-R expression (p = 0.10). On multivariate analysis, AJCC stage II and no EPO expression remained significant. No smoking during radiotherapy was almost significant. On univariate analysis, improved survival was associated with N stage 0 to 1 (p = 0.009), surgery (p = 0.039), hemoglobin levels of ≥12 g/d (p = 0.016), and no EPO expression (p = 0.001). On multivariate analysis, N stage 0 to 1 and no EPO expression maintained significance. Hemoglobin levels of ≥12 g/d were almost significant. On subgroup analyses, patients with tumors expressing both EPO and EPO-R had worse outcomes than those expressing either EPO or EPO-R and those expressing neither EPO nor EPO-R. 
Conclusions: EPO expression of tumor cells was an independent prognostic factor for locoregional control and survival in patients irradiated for NSCLC. EPO-R expression showed a trend. Patients with tumors expressing both EPO and EPO-R have an unfavorable prognosis.
In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandi, Narayanan Sathiya, E-mail: sathiyapandi@gmail.com; Suganya, Sivagurunathan; Rajendran, Suriliyandi
Highlights: • Identified stomach lineage specific gene set (SLSGS) was found to be under expressed in gastric tumors. • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. • In silico pathway scanning identified estrogen-α signaling is a putative regulator of SLSGS in gastric cancer. • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. -- Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be under expressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the under expression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and prognostic factor in GC.
Korkolopoulou, P; Levidou, G; El-Habr, E A; Adamopoulos, C; Fragkou, P; Boviatsis, E; Themistocleous, M S; Petraki, K; Vrettakos, G; Sakalidou, M; Samaras, V; Zisakis, A; Saetta, A; Chatziandreou, I; Patsouris, E; Piperi, C
2013-01-01
Background: Sox11 is a transcription factor expressed in foetal and neoplastic brain tissue, including gliomas. It has been shown to suppress the tumourigenicity of glioma stem cells in vivo, thereby being hypothesised to function as a tumour suppressor. Methods: We investigated the expression of Sox11 in 132 diffuse astrocytomas in relation to the regulator cell marker nestin, c-Met and IDH1-R132H, which have been shown to be differentially expressed among the molecular subgroups of malignant gliomas, as well as to an inducer of astrocytic differentiation, that is, signal transducer and activator of transcription (p-STAT-3), clinicopathological features and survival. Results: Sox11 immunoreactivity was identified in all tumours irrespective of grade, and was correlated with p-STAT-3. Three out of seven cases showed partial Sox11 promoter methylation. In >50% of our cases neoplastic cells coexpressed Sox11 and nestin, a finding further confirmed in primary glioblastoma cell cultures. Furthermore, nestin, c-Met and IDH1-R132H expression differed among grade categories. Cluster analysis identified four groups of patients according to c-Met, nestin and IDH1-R132H expression. The c-Met/nestin high-expressor group displayed a higher Sox11 expression. Sox11 expression was an indicator of favourable prognosis in glioblastomas, which remained significant in multivariate analysis and was validated in an independent set of 72 cases. The c-Met/nestin high-expressor group was marginally associated with shorter survival in univariate analysis. Conclusions: We highlight the importance of Sox11 expression as a favourable prognosticator in glioblastomas. c-Met/nestin/IDH1-R132H expression phenotypes recapitulate the molecular subgroups of malignant glioma. PMID:23619925
Han, Chao-Dong; Ge, Wen-Sheng
2016-11-01
BACKGROUND The angiotensin-converting enzyme (ACE, CD143) gene plays a crucial role in the pathology of many cancers. Previous studies mostly focused on the gene polymorphism, but the other functions of ACE have rarely been reported. The purpose of this study was to investigate the expression of ACE and its biological function, as well as its prognostic value, in laryngeal cancer. MATERIAL AND METHODS The expression of ACE was detected by quantitative real-time polymerase chain reaction (qRT-PCR) analysis in 106 patients with laryngeal cancer and 85 healthy people. Cell proliferation was then estimated after the Hep-2 cell line was transfected with pGL3-ACE or empty vector. In addition, the relationship between ACE expression and clinicopathologic characteristics was analyzed. Finally, Kaplan-Meier analysis was used to evaluate the overall survival of patients with different ACE expression, while Cox regression analysis was conducted to reveal the prognostic value of ACE in laryngeal cancer. RESULTS Our results demonstrate that ACE is over-expressed in laryngeal cancer and thus promotes cell proliferation. The up-regulation of ACE was significantly influenced by tumor stage and lymph node metastasis. Patients with high ACE expression had a shorter overall survival compared with those with low ACE expression according to Kaplan-Meier analysis. The ACE gene was also found to be an important factor in the prognosis of laryngeal cancer. CONCLUSIONS Our study shows that the ACE gene was up-regulated, which promoted cell proliferation, and that it could be an independent prognostic marker in laryngeal cancer.
Computerized image analysis for quantitative neuronal phenotyping in zebrafish.
Liu, Tianming; Lu, Jianfeng; Wang, Ye; Campbell, William A; Huang, Ling; Zhu, Jinmin; Xia, Weiming; Wong, Stephen T C
2006-06-15
An integrated microscope image analysis pipeline is developed for automatic analysis and quantification of phenotypes in zebrafish with altered expression of Alzheimer's disease (AD)-linked genes. We hypothesize that a slight impairment of neuronal integrity in a large number of zebrafish carrying the mutant genotype can be detected through the computerized image analysis method. Key functionalities of our zebrafish image processing pipeline include quantification of neuron loss in zebrafish embryos due to knockdown of AD-linked genes, automatic detection of defective somites, and quantitative measurement of gene expression levels in zebrafish with altered expression of AD-linked genes or treatment with a chemical compound. These quantitative measurements enable the archival of analyzed results and relevant meta-data. The structured database is organized for statistical analysis and data modeling to better understand neuronal integrity and phenotypic changes of zebrafish under different perturbations. Our results show that the computerized analysis is comparable to manual counting with equivalent accuracy and improved efficacy and consistency. Development of such an automated data analysis pipeline represents a significant step forward to achieve accurate and reproducible quantification of neuronal phenotypes in large scale or high-throughput zebrafish imaging studies.
Chan, Dessy; Tsoi, Miriam Yuen-Tung; Liu, Christina Di; Chan, Sau-Hing; Law, Simon Ying-Kit; Chan, Kwok-Wah; Chan, Yuen-Piu; Gopalan, Vinod; Lam, Alfred King-Yin; Tang, Johnny Cheuk-On
2013-01-01
AIM: To identify the downstream regulated genes of GAEC1 oncogene in esophageal squamous cell carcinoma and their clinicopathological significance. METHODS: The anti-proliferative effect of knocking down the expression of GAEC1 oncogene was studied by using the RNA interference (RNAi) approach through transfecting the GAEC1-overexpressed esophageal carcinoma cell line KYSE150 with the pSilencer vector cloned with a GAEC1-targeted sequence, followed by MTS cell proliferation assay and cell cycle analysis using flow cytometry. RNA was then extracted from the parental, pSilencer-GAEC1-targeted sequence transfected and pSilencer negative control vector transfected KYSE150 cells for further analysis of different patterns in gene expression. Genes differentially expressed with suppressed GAEC1 expression were then determined using Human Genome U133 Plus 2.0 cDNA microarray analysis by comparing with the parental cells and normalized with the pSilencer negative control vector transfected cells. The most prominently regulated genes were then studied by immunohistochemical staining using tissue microarrays to determine their clinicopathological correlations in esophageal squamous cell carcinoma by statistical analyses. RESULTS: The RNAi approach of knocking down gene expression showed the effective suppression of GAEC1 expression in esophageal squamous cell carcinoma cell line KYSE150 that resulted in the inhibition of cell proliferation and increase of apoptotic population. cDNA microarray analysis for identifying differentially expressed genes detected the greatest levels of downregulation of calpain 10 (CAPN10) and upregulation of trinucleotide repeat containing 6C (TNRC6C) transcripts when GAEC1 expression was suppressed. 
At the tissue level, high expression of calpain 10 protein was significantly associated with longer patient survival (months) in esophageal squamous cell carcinoma compared to patients with a low level of calpain 10 expression (37.73 ± 16.33 vs 12.62 ± 12.44, P = 0.032). No significant correlation was observed between the TNRC6C protein expression level and the clinicopathological features of esophageal squamous cell carcinoma. CONCLUSION: GAEC1 regulates the expression of CAPN10 and TNRC6C downstream. Calpain 10 expression is a potential prognostic marker in patients with esophageal squamous cell carcinoma. PMID:23687414
Non-biased and efficient global amplification of a single-cell cDNA library
Huang, Huan; Goto, Mari; Tsunoda, Hiroyuki; Sun, Lizhou; Taniguchi, Kiyomi; Matsunaga, Hiroko; Kambara, Hideki
2014-01-01
Analysis of single-cell gene expression promises a more precise understanding of molecular mechanisms of a living system. Most techniques only allow studies of the expression of a limited number of gene species. When amplification of cDNA was carried out for analysing more genes, amplification biases were frequently reported. A non-biased and efficient global-amplification method, which uses a single-cell cDNA library immobilized on beads, was developed for analysing the entire gene expression of single cells. Every step in this analysis from reverse transcription to cDNA amplification was optimized. By removing degrading excess primers, the bias due to the digestion of cDNA was prevented. Since the residual reagents, which affect the efficiency of each subsequent reaction, could be removed by washing beads, the conditions for uniform and maximized amplification of cDNAs were achieved. The differences in the amplification rates for eight randomly selected genes were within 1.5-fold, which is negligible for most applications of single-cell analysis. The global amplification gives a large amount of amplified cDNA (>100 μg) from a single cell (2-pg mRNA), and that amount is enough for downstream analysis. The proposed global-amplification method was used to analyse transcript ratios of multiple cDNA targets (from several copies to several thousand copies) quantitatively. PMID:24141095
Differential co-expression analysis of rheumatoid arthritis with microarray data.
Wang, Kunpeng; Zhao, Liqiang; Liu, Xuefeng; Hao, Zhenyong; Zhou, Yong; Yang, Chuandong; Li, Hongqiang
2014-11-01
The aim of the present study was to investigate the underlying molecular mechanisms of rheumatoid arthritis (RA) using microarray expression profiles from osteoarthritis and RA patients, to improve diagnosis and treatment strategies for the condition. The gene expression profile of GSE27390 was downloaded from Gene Expression Omnibus, including 19 samples from patients with RA (n=9) or osteoarthritis (n=10). Firstly, the differentially expressed genes (DEGs) were obtained with the thresholds of |logFC|>1.0 and P<0.05, using the t‑test method in the LIMMA package. Then, differentially co-expressed genes (DCGs) and differentially co-expressed links (DCLs) were screened with q<0.25 by the differential coexpression analysis and differential regulation analysis of gene expression microarray data package. Secondly, pathway enrichment analysis for DCGs was performed by the Database for Annotation, Visualization and Integrated Discovery and the DCLs associated with RA were selected by comparing the obtained DCLs with known transcription factor (TF)-targets in the TRANSFAC database. Finally, the obtained TFs were mapped to the known TF-targets to construct the network using Cytoscape software. A total of 1755 DEGs, 457 DCGs and 101988 DCLs were obtained and there were 20 TFs in the obtained six TF-target relations (STAT3-TNF, PBX1‑PLAU, SOCS3-STAT3, GATA1-ETS2, ETS1-ICAM4 and CEBPE‑GATA1) and 457 DCGs. A number of TF-target relations in the constructed network were not within DCLs when the TF and target gene were DCGs. The identified TFs may have an important role in the pathogenesis of RA and have the potential to be used as biomarkers for the development of novel diagnostic and therapeutic strategies for RA.
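The DEG thresholds quoted in this abstract (|logFC| > 1.0 and P < 0.05) can be illustrated with a minimal sketch. It is not the LIMMA workflow the authors used: the gene names and statistics below are hypothetical placeholders, and the function `select_degs` is an illustrative helper, not part of any package.

```python
def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return the genes whose |logFC| and P value pass both thresholds.

    `results` maps a gene name to a (log fold change, P value) pair.
    The default cutoffs mirror the thresholds quoted in the abstract.
    """
    return [
        gene
        for gene, (log_fc, p_value) in results.items()
        if abs(log_fc) > lfc_cutoff and p_value < p_cutoff
    ]


# Hypothetical per-gene statistics, for illustration only.
results = {
    "GENE_A": (1.8, 0.001),   # passes both thresholds
    "GENE_B": (-1.2, 0.03),   # passes: down-regulated, |logFC| > 1
    "GENE_C": (0.4, 0.001),   # fails the fold-change threshold
    "GENE_D": (2.5, 0.20),    # fails the P-value threshold
}

degs = select_degs(results)
print(degs)
```

Taking the absolute value keeps both up-regulated and down-regulated genes, which is what a two-sided |logFC| threshold implies.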
2011-01-01
Introduction Microtubule associated proteins (MAPs) endogenously regulate microtubule stabilization and have been reported as prognostic and predictive markers for taxane response. The microtubule stabilizer, MAP-tau, has shown conflicting results. We quantitatively assessed MAP-tau expression in two independent breast cancer cohorts to determine prognostic and predictive value of this biomarker. Methods MAP-tau expression was evaluated in the retrospective Yale University breast cancer cohort (n = 651) using tissue microarrays and also in the TAX 307 cohort, a clinical trial randomized for TAC versus FAC chemotherapy (n = 140), using conventional whole tissue sections. Expression was measured using the AQUA method for quantitative immunofluorescence. Scores were correlated with clinicopathologic variables, survival, and response to therapy. Results Assessment of the Yale cohort using Cox univariate analysis indicated improved overall survival (OS) in tumors with high MAP-tau expression (HR = 0.691, 95% CI = 0.489-0.974; P = 0.004). Kaplan Meier analysis showed 10-year survival for 65% of patients with high MAP-tau expression compared to 52% with low expression (P = .006). In TAX 307, high expression was associated with significantly longer median time to tumor progression (TTP) regardless of treatment arm (33.0 versus 23.4 months, P = 0.010) with mean TTP of 31.2 months. Response rates did not differ by MAP-tau expression (P = 0.518) or by treatment arm (P = 0.584). Conclusions Quantitative measurement of MAP-tau expression has prognostic value in both cohorts, with high expression associated with longer TTP and OS. Differences by treatment arm or response rate in low versus high MAP-tau groups were not observed, indicating that MAP-tau is not associated with response to taxanes and is not a useful predictive marker for taxane-based chemotherapy. PMID:21888627
Highly Multiplexed, Single Cell Transcriptomic Analysis of T-Cells by Microfluidic PCR.
Dominguez, Maria; Roederer, Mario; Chattopadhyay, Pratip K
2017-01-01
Recently, technologies have been developed to measure expression of 96 (or more) mRNA transcripts at once from a single cell. Here we describe methods and important considerations for use of Fluidigm's BioMark platform for multiplexed single cell gene expression. We describe how to qualify primer/probes, select genes to examine in 96-parameter panels, perform the reverse transcription/cDNA synthesis step, and operate the instrument. In addition, we describe data analysis considerations. This technology has enormous value for characterizing the heterogeneity of T-cells, thereby providing a useful tool for immune monitoring.
A Dual-Color Reporter Assay of Cohesin-Mediated Gene Regulation in Budding Yeast Meiosis.
Fan, Jinbo; Jin, Hui; Yu, Hong-Guo
2017-01-01
In this chapter, we describe a quantitative fluorescence-based assay of gene expression using the ratio of the reporter green fluorescence protein (GFP) to the internal red fluorescence protein (RFP) control. With this dual-color heterologous reporter assay, we have revealed cohesin-regulated genes and discovered a cis-acting DNA element, the Ty1-LTR, which interacts with cohesin and regulates gene expression during yeast meiosis. The method described here provides an effective cytological approach for quantitative analysis of global gene expression in budding yeast meiosis.
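The core quantity of a dual-color reporter assay, the ratio of reporter GFP signal to the internal RFP control, can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's protocol: the optional background subtraction, the function name `reporter_ratio`, and the intensity values are all hypothetical.

```python
def reporter_ratio(gfp, rfp, background=0.0):
    """Return the GFP/RFP intensity ratio for one cell.

    `background` is subtracted from both channels first; this step is an
    assumption of the sketch, not something stated in the abstract.
    """
    gfp_signal = gfp - background
    rfp_signal = rfp - background
    if rfp_signal <= 0:
        # Without a positive internal-control signal the ratio is meaningless.
        raise ValueError("RFP control signal must be positive after background subtraction")
    return gfp_signal / rfp_signal


# Hypothetical fluorescence intensities in arbitrary units.
ratio = reporter_ratio(gfp=1200.0, rfp=600.0, background=100.0)
print(round(ratio, 2))
```

Dividing by the internal control normalizes away cell-to-cell differences in overall expression capacity or imaging conditions, so changes in the ratio reflect the reporter itself.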
Methods for Force Analysis of Overconstrained Parallel Mechanisms: A Review
NASA Astrophysics Data System (ADS)
Liu, Wen-Lan; Xu, Yun-Dou; Yao, Jian-Tao; Zhao, Yong-Sheng
2017-11-01
The force analysis of overconstrained parallel mechanisms (PMs) is relatively complex and difficult, and methods for it have long been a research hotspot. However, few publications analyze the characteristics and application scopes of the various methods, which makes it inconvenient for researchers and engineers to master and adopt them properly. A review of the methods for force analysis of both passive and active overconstrained PMs is presented. The existing force analysis methods for these two kinds of overconstrained PMs are classified according to their main ideas. Each category is briefly demonstrated and evaluated from such aspects as the calculation amount, the comprehensiveness of considering limbs' deformation, and the existence of explicit expressions of the solutions, which provides an important reference for researchers and engineers to quickly find a suitable method. The similarities and differences between the statically indeterminate problem of passive overconstrained PMs and that of active overconstrained PMs are discussed, and a universal method for these two kinds of overconstrained PMs is pointed out. The existing deficiencies and development directions of the force analysis methods for overconstrained systems are indicated based on the overview.
Taguchi, Y-H
2018-05-08
Even though coexistence of multiple phenotypes sharing the same genomic background is interesting, it remains incompletely understood. Epigenomic profiles may represent key factors, with unknown contributions to the development of multiple phenotypes, and social-insect castes are a good model for elucidation of the underlying mechanisms. Nonetheless, previous studies have failed to identify genes associated with aberrant gene expression and methylation profiles because of the lack of suitable methodology that can address this problem properly. A recently proposed principal component analysis (PCA)-based and tensor decomposition (TD)-based unsupervised feature extraction (FE) can solve this problem because these two approaches can deal with gene expression and methylation profiles even when a small number of samples is available. PCA-based and TD-based unsupervised FE methods were applied to the analysis of gene expression and methylation profiles in the brains of two social insects, Polistes canadensis and Dinoponera quadriceps. Genes associated with differential expression and methylation between castes were identified, and analysis of enrichment of Gene Ontology terms confirmed reliability of the obtained sets of genes from the biological standpoint. Biologically relevant genes, shown to be associated with significant differential gene expression and methylation between castes, were identified here for the first time. The identification of these genes may help understand the mechanisms underlying epigenetic control of development of multiple phenotypes under the same genomic conditions.
Technique for quantitative RT-PCR analysis directly from single muscle fibers.
Wacker, Michael J; Tehel, Michelle M; Gallagher, Philip M
2008-07-01
The use of single-cell quantitative RT-PCR has greatly aided the study of gene expression in fields such as muscle physiology. For this study, we hypothesized that single muscle fibers from a biopsy can be placed directly into the reverse transcription buffer and that gene expression data can be obtained without having to first extract the RNA. To test this hypothesis, biopsies were taken from the vastus lateralis of five male subjects. Single muscle fibers were isolated and underwent RNA isolation (technique 1) or were placed directly into reverse transcription buffer (technique 2). After cDNA conversion, individual fiber cDNA was pooled and quantitative PCR was performed using primer-probes for beta(2)-microglobulin, glyceraldehyde-3-phosphate dehydrogenase, insulin-like growth factor I receptor, and glucose transporter subtype 4. The no RNA extraction method provided quantitative PCR data similar to those of the RNA extraction method. A third technique was also tested in which we used one-quarter of an individual fiber's cDNA for PCR (not pooled) and the average coefficient of variation between fibers was <8% (cycle threshold value) for all genes studied. The no RNA extraction technique was tested on isolated muscle fibers using a gene known to increase after exercise (pyruvate dehydrogenase kinase 4). We observed a 13.9-fold change in expression after resistance exercise, which is consistent with what has been previously observed. These results demonstrate a successful method for gene expression analysis directly from single muscle fibers.
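The between-fiber coefficient of variation reported for cycle threshold (Ct) values can be computed as in the sketch below. The Ct values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption of the sketch.
    """
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100.0


# Hypothetical Ct values for one gene measured in three single fibers.
ct_values = [24.0, 25.0, 26.0]
cv_percent = coefficient_of_variation(ct_values)
print(f"{cv_percent:.1f}%")
```

A CV computed on Ct values describes variation on the logarithmic cycle scale, so it should not be read as the variation in transcript abundance itself.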
AKPINAR, GURLER; KASAP, MURAT; CANTURK, NUH ZAFER; ZULFIGAROVA, MEHIN; ISLEK, EYLÜL ECE; GULER, SERTAC ATA; SIMSEK, TURGAY; CANTURK, ZEYNEP
2017-01-01
Background/Aim: To unveil the pathophysiology of primary hyperparathyroidism, molecular details of parathyroid hyperplasia and adenoma have to be revealed. Such details will provide the tools necessary for differentiation of these two look-alike diseases. Therefore, in the present study, a comparative proteomic study using postoperative tissue samples from parathyroid adenoma and parathyroid hyperplasia patients was performed. Materials and Methods: Protein extracts were prepared from tissue samples (n=8 per group). Protein pools were created for each group and subjected to DIGE and conventional 2DE. Following image analysis, spots representing the differentially regulated proteins were excised from the gels and used for identification via MALDI-TOF/TOF analysis. Results: The identities of 40 differentially-expressed proteins were revealed. Fourteen of these proteins were over-expressed in the hyperplasia while 26 of them were over-expressed in the adenoma. Conclusion: Most proteins found to be over-expressed in the hyperplasia samples were mitochondrial, underlining the importance of mitochondrial activity as a potential biomarker for differentiation of parathyroid hyperplasia from adenoma. PMID:28446534
Xu, Peng; Wang, Junhua; Sun, Bo; Xiao, Zhongdang
2018-06-15
Investigating the potential biological function of differentially changed genes by integrating multiple omics data, including miRNA and mRNA expression profiles, remains a hot topic. However, how to evaluate the repression effect on target genes by integrating miRNA and mRNA expression profiles is not fully solved. In this study, we provide an analysis method that integrates miRNA and mRNA expression data simultaneously. Difference analysis was performed based on the repression score, and significantly repressed mRNAs were then screened out by DEGseq. Pathway analysis of the significantly repressed mRNAs shows that multiple pathways, such as the MAPK signaling pathway and the TGF-beta signaling pathway, may be correlated with colorectal cancer (CRC). Focusing on the MAPK signaling pathway, a miRNA-mRNA network centered on the cell fate genes was constructed. Finally, the miRNA-mRNA pairs potentially important in CRC carcinogenesis were screened out and scored by impact index. Copyright © 2018 Elsevier B.V. All rights reserved.
MicroRNA-34c-5p is related to recurrence in laryngeal squamous cell carcinoma.
Re, Massimo; Çeka, Artan; Rubini, Corrado; Ferrante, Luigi; Zizzi, Antonio; Gioacchini, Federico M; Tulli, Michele; Spazzafumo, Liana; Sellari-Franceschini, Stefano; Procopio, Antonio D; Olivieri, Fabiola
2015-09-01
Altered microRNA expression has been found in many cancer types, including laryngeal squamous cell carcinoma (LSCC). We investigated the association of LSCC-related miR-34c-5p with disease-free survival and overall survival. Retrospective cohort study. Expression levels of miR-34c-5p were detected in 90 LSCC formalin-fixed paraffin-embedded tissues by reverse-transcription quantitative polymerase chain reaction. Overall survival and disease-free survival were evaluated using the Kaplan-Meier method, and multivariate analysis was performed using Cox proportional hazard analysis. A downregulation of miR-34c-5p expression significantly correlated with worse disease-free and overall survival. In the multivariate analysis, low miR-34c-5p expression was associated with an increased risk of recurrence. A downregulation of miR-34c-5p in LSCC is independently associated with unfavorable disease-free survival, suggesting that miR-34c-5p might be a promising marker for evaluating the risk of recurrences. NA. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.
Tezval, Hossein; Merseburger, Axel S; Matuschek, Ira; Machtens, Stefan; Kuczyk, Markus A; Serth, Jürgen
2008-01-01
Background Epigenetic silencing of the RAS association family 1A (RASSF1A) tumor suppressor gene occurs in various histological subtypes of renal cell carcinoma (RCC), but RASSF1A protein expression in clear cell RCC, as well as a possible correlation with clinicopathological parameters of patients, has not yet been analyzed. Methods 318 primary clear cell carcinomas were analyzed using tissue microarray analysis and immunohistochemistry. Survival analysis was carried out for 187 patients considering a follow-up period of 2–240 months. Results Expression of RASSF1A was found to be significantly decreased in tumoral cells when compared to normal tubular epithelial cells. RASSF1A immunopositivity was significantly associated with pT stage, group stage and histological grade of tumors and showed a tendency for impaired survival in Kaplan-Meier analysis. Conclusion While most tumors demonstrate a loss of RASSF1A protein, a subset of tumors was identified to exhibit substantial RASSF1A protein expression and show increased tumor progression. Thus RCC tumorigenesis without depletion of RASSF1A may be associated with an adverse clinical outcome. PMID:18822131