Science.gov

Sample records for gene set statistics

  1. Self-Contained Statistical Analysis of Gene Sets

    PubMed Central

    Cannon, Judy L.; Ricoy, Ulises M.; Johnson, Christopher

    2016-01-01

    Microarrays are a powerful tool for studying differential gene expression. However, lists of many differentially expressed genes are often generated, and unraveling meaningful biological processes from the lists can be challenging. For this reason, investigators have sought to quantify the statistical probability of compiled gene sets rather than individual genes. The gene sets typically are organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher’s self-contained method for gene set analysis. We improve Fisher’s differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set to prevent a small percentage of genes from determining the statistical significance of the entire set. In addition, we also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients and T-ALL and AML (Acute Myeloid Leukemia) patients. PMID:27711232
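
    As a rough illustration of the idea in this abstract, the sketch below combines per-gene p-values with Fisher's method and raises any p-value below a floor so that one extreme gene cannot decide the result alone. The function names, the `p_floor` parameter, and the example p-values are assumptions made for illustration; the abstract does not say how the authors calibrate the floored statistic, and Fisher's method itself assumes independent p-values. Referring the floored statistic to the usual chi-square distribution, as done here, can only make the combined p-value larger.

```python
import math


def chi2_sf_even_df(x, k):
    """P(X > x) for a chi-square variable with 2*k degrees of freedom."""
    half = x / 2.0
    term = 1.0   # running value of (x/2)^j / j!, starting at j = 0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined(pvalues, p_floor=0.0):
    """Fisher's combined test; p-values below p_floor are raised to p_floor."""
    clipped = [max(p, p_floor) for p in pvalues]
    if any(p <= 0.0 for p in clipped):
        raise ValueError("p-values must be positive")
    stat = -2.0 * sum(math.log(p) for p in clipped)
    return stat, chi2_sf_even_df(stat, len(clipped))


# Hypothetical gene set: one extreme gene, three unremarkable ones.
gene_pvalues = [1e-12, 0.60, 0.70, 0.80]
_, p_raw = fisher_combined(gene_pvalues)
_, p_floored = fisher_combined(gene_pvalues, p_floor=0.01)
print(p_raw, p_floored)
```

    With the floor in place, the single extreme gene contributes no more than a gene with p = 0.01 would, so the set-level p-value is larger than the unfloored one.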

  2. Gene set analysis for GWAS: assessing the use of modified Kolmogorov-Smirnov statistics.

    PubMed

    Debrabant, Birgit; Soerensen, Mette

    2014-10-01

    We discuss the use of modified Kolmogorov-Smirnov (KS) statistics in the context of gene set analysis and review corresponding null and alternative hypotheses. In particular, we show that, when enhancing the impact of highly significant genes in the calculation of the test statistic, the corresponding test can be considered to infer the classical self-contained null hypothesis. We use simulations to estimate the power for different kinds of alternatives, and to assess the impact of the weight parameter of the modified KS statistic on the power. Finally, we show the analogy between the weight parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example.
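
    To make the role of a weight parameter concrete, the sketch below computes a weighted running-sum KS statistic in the style popularized by Gene Set Enrichment Analysis. It is an assumption that this is the form of modified statistic the abstract refers to; the function name `weighted_ks` and its arguments are hypothetical. With `weight=0` every gene in the set contributes equally (the classical KS form); larger weights let genes with large absolute scores dominate.

```python
def weighted_ks(scores, in_set, weight=1.0):
    """Signed maximum deviation of a weighted running sum over ranked genes.

    scores : gene-level statistics (larger means stronger evidence)
    in_set : booleans, True where the gene belongs to the gene set
    weight : exponent applied to |score| for genes in the set
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hit_total = sum(abs(scores[i]) ** weight for i in order if in_set[i])
    n_miss = sum(1 for flag in in_set if not flag)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one informative hit and one miss")
    running = 0.0
    best = 0.0
    for i in order:
        if in_set[i]:
            running += abs(scores[i]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up example: the two top-ranked genes form the set.
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
members = [True, True, False, False, False, False]
print(weighted_ks(scores, members, weight=0.0))
print(weighted_ks(scores, members, weight=1.0))
```

    A p-value would still have to come from a reference distribution, for example by permutation; that step is omitted here.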

  3. Gene set enrichment analysis.

    PubMed

    Tilford, Charles A; Siemers, Nathan O

    2009-01-01

    Set enrichment analytical methods have become commonplace tools applied to the analysis and interpretation of biological data. The statistical techniques are used to identify categorical biases within lists of genes, proteins, or metabolites. The goal is to discover the shared functions or properties of the biological items represented within the lists. Application of these methods can provide great biological insight, including the discovery of participation in the same biological activity or pathway, shared interacting genes or regulators, common cellular compartmentalization, or association with disease. The methods require ordered or unordered lists of biological items as input, understanding of the reference set from which the lists were selected, categorical classifiers describing the items, and a statistical algorithm to assess bias of each classifier. Due to the complexity of most algorithms and the number of calculations performed, computer software is almost always used for execution of the algorithm, as well as for presentation of the results. This chapter will provide an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembly of the requisite information will be presented, with a focus on careful definition of the sets used by the statistical algorithms. The need for multiple test correction when working with large libraries of classifiers is emphasized, and we outline several options for performing the corrections. Finally, interpreting the results of such analysis will be discussed along with examples of recent research utilizing the techniques.
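
    One common choice for the "statistical algorithm to assess bias of each classifier" on an unordered list is the one-sided hypergeometric test, sketched below together with a Bonferroni adjustment for testing many classifiers. The chapter's own algorithms are not given in this abstract, so the choice of test, the function names, and the example counts are assumptions. Note how the reference set size enters the calculation directly, which is why its careful definition matters.

```python
from math import comb


def enrichment_pvalue(n_reference, n_category, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list items are drawn without
    replacement from a reference set containing n_category annotated items."""
    upper = min(n_category, n_list)
    total = comb(n_reference, n_list)
    tail = sum(
        comb(n_category, i) * comb(n_reference - n_category, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical counts: 20,000 reference genes, 150 in the category,
# a list of 300 genes of which 12 carry the category label.
p = enrichment_pvalue(20000, 150, 300, 12)
print(p, bonferroni([p, 0.2, 0.03]))
```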

  4. A statistical approach towards the derivation of predictive gene sets for potency ranking of chemicals in the mouse embryonic stem cell test.

    PubMed

    Schulpen, Sjors H W; Pennings, Jeroen L A; Tonk, Elisa C M; Piersma, Aldert H

    2014-03-21

    The embryonic stem cell test (EST) is applied as a model system for detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalatic monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes responding with the same potency ranking in vitro and in vivo embryotoxicity were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in EST was similar to the predicted concentration using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle related genes was observed, reminiscent of the differentiation inhibition in EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.

  5. Probabilities for separating sets of order statistics.

    PubMed

    Glueck, D H; Karimpour-Fard, A; Mandel, J; Muller, K E

    2010-04-01

    Consider a set of order statistics that arise from sorting samples from two different populations, each with their own, possibly different distribution functions. The probability that these order statistics fall in disjoint, ordered intervals, and that a certain number of the smallest statistics come from the first population, is given in terms of the two distribution functions. The result is applied to computing the joint probability of the number of rejections and the number of false rejections for the Benjamini-Hochberg false discovery rate procedure.
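
    For readers unfamiliar with the procedure named at the end of this abstract, the following is a minimal sketch of the Benjamini-Hochberg step-up rule, which depends on the p-values only through their order statistics. It illustrates the procedure itself, not the joint-probability result of the paper; the function name and the example values are illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05))
```

    The number of True entries is the "number of rejections" whose distribution, jointly with the number of false rejections, the paper studies.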

  6. Statistical considerations in setting product specifications.

    PubMed

    Dong, Xiaoyu; Tsong, Yi; Shen, Meiyu

    2015-01-01

    According to ICH Q6A (1999), a specification is defined as a list of tests, references to analytical procedures, and appropriate acceptance criteria, which are numerical limits, ranges, or other criteria for the tests described. For drug products, specifications usually consist of test methods and acceptance criteria for assay, impurities, pH, dissolution, moisture, and microbial limits, depending on the dosage forms. They are usually proposed by the manufacturers and subject to regulatory approval. When the acceptance criteria in product specifications cannot be pre-defined based on prior knowledge, the conventional approach is to use data from a limited number of clinical batches during the clinical development phases. Often, such an acceptance criterion is set as an interval bounded by the sample mean plus and minus two to four standard deviations. This interval may be revised with the accumulated data collected from released batches after drug approval. In this article, we describe and discuss the statistical issues of commonly used approaches in setting or revising specifications (usually tightening the limits), including reference interval, (Min, Max) method, tolerance interval, and confidence limit of percentiles. We also compare their performance in terms of the interval width and the intended coverage. Based on our study results and review experiences, we make some recommendations on how to select the appropriate statistical methods in setting product specifications to better ensure the product quality.
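
    The "sample mean plus and minus two to four standard deviations" interval mentioned above can be sketched as follows, alongside the (Min, Max) method. The function names, the multiplier `k`, and the batch values are assumptions for illustration; the tolerance-interval and percentile approaches discussed in the article need additional factors that are not given in this abstract and are not attempted here.

```python
import statistics


def reference_interval(batch_results, k=3.0):
    """Acceptance limits as sample mean +/- k sample standard deviations."""
    mean = statistics.mean(batch_results)
    sd = statistics.stdev(batch_results)  # n - 1 in the denominator
    return mean - k * sd, mean + k * sd


def min_max_interval(batch_results):
    """Acceptance limits as the observed minimum and maximum."""
    return min(batch_results), max(batch_results)


# Hypothetical assay results (% of label claim) from a few clinical batches.
assay = [98.0, 100.0, 102.0]
print(reference_interval(assay, k=3.0))
print(min_max_interval(assay))
```

    With so few batches the sample standard deviation is itself very uncertain, which is one reason the interval width and the achieved coverage can differ from what was intended.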

  7. Statistical mechanics of the hitting set problem.

    PubMed

    Mézard, Marc; Tarzia, Marco

    2007-10-01

    In this paper we present a detailed study of the hitting set (HS) problem. This problem is a generalization of the standard vertex cover to hypergraphs: one seeks a configuration of particles with minimal density such that every hyperedge of the hypergraph contains at least one particle. It can also be used in important practical tasks, such as the group testing procedures where one wants to detect defective items in a large group by pool testing. Using a statistical mechanics approach based on the cavity method, we study the phase diagram of the HS problem, in the case of random regular hypergraphs. Depending on the values of the variables and tests degrees different situations can occur: The HS problem can be either in a replica symmetric phase, or in a one-step replica symmetry breaking one. In these two cases, we give explicit results on the minimal density of particles, and the structure of the phase space. These problems are thus in some sense simpler than the original vertex cover problem, where the need for a full replica symmetry breaking has prevented the derivation of exact results so far. Finally, we show that decimation procedures based on the belief propagation and the survey propagation algorithms provide very efficient strategies to solve large individual instances of the hitting set problem.
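
    The problem definition above can be made concrete with a few lines of code: a checker that tests whether a chosen set of vertices hits every hyperedge, and a simple greedy heuristic. The greedy rule is a stand-in chosen for brevity; it is not the cavity-method analysis or the belief/survey propagation decimation studied in the paper, and it does not guarantee a minimum-size solution. All names are hypothetical.

```python
def is_hitting_set(chosen, hyperedges):
    """True if every hyperedge contains at least one chosen vertex."""
    chosen = set(chosen)
    return all(chosen & set(edge) for edge in hyperedges)


def greedy_hitting_set(hyperedges):
    """Repeatedly pick the vertex that occurs in the most unhit hyperedges."""
    remaining = [set(edge) for edge in hyperedges]
    if any(not edge for edge in remaining):
        raise ValueError("an empty hyperedge cannot be hit")
    chosen = set()
    while remaining:
        counts = {}
        for edge in remaining:
            for vertex in edge:
                counts[vertex] = counts.get(vertex, 0) + 1
        # Highest count first; ties broken by the smaller vertex label.
        best = min(counts, key=lambda v: (-counts[v], v))
        chosen.add(best)
        remaining = [edge for edge in remaining if best not in edge]
    return chosen


pools = [{1, 2}, {2, 3}, {3, 4}]
solution = greedy_hitting_set(pools)
print(solution, is_hitting_set(solution, pools))
```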

  8. Patient-oriented gene set analysis for cancer mutation data.

    PubMed

    Boca, Simina M; Kinzler, Kenneth W; Velculescu, Victor E; Vogelstein, Bert; Parmigiani, Giovanni

    2010-01-01

    Recent research has revealed complex heterogeneous genomic landscapes in human cancers. However, mutations tend to occur within a core group of pathways and biological processes that can be grouped into gene sets. To better understand the significance of these pathways, we have developed an approach that initially scores each gene set at the patient rather than the gene level. In mutation analysis, these patient-oriented methods are more transparent, interpretable, and statistically powerful than traditional gene-oriented methods.

  9. MAGMA: generalized gene-set analysis of GWAS data.

    PubMed

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

  10. Importance of data management with statistical analysis set division.

    PubMed

    Wang, Ling; Li, Chan-juan; Jiang, Zhi-wei; Xia, Jie-lai

    2015-11-01

    Hypothesis testing is affected by the division of statistical analysis sets, an important data management task before database lock. Objective division of the statistical analysis sets under blinding underpins a scientifically sound trial conclusion. All subjects who received at least one trial treatment after randomization should be included in the safety set. The full analysis set should be as close as possible to the intention-to-treat principle. Division of the per-protocol set is the most difficult to control in blinded review because it involves more subjectivity than the other two. The objectivity of statistical analysis set division must be guaranteed by accurate raw data, comprehensive data checks, and scientific discussion, all of which are strict requirements of data management. Dividing the statistical analysis sets objectively and scientifically is an important way to improve data management quality. PMID:26911044

  12. Transformations on Data Sets and Their Effects on Descriptive Statistics

    ERIC Educational Resources Information Center

    Fox, Thomas B.

    2005-01-01

    The activity asks students to examine the effects on the descriptive statistics of a data set that has undergone either a translation or a scale change. They make conjectures relative to the effects on the statistics of a transformation on a data set and then they defend their conjectures and deductively verify several of them.
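
    The conjectures students are asked to form can be checked numerically. The sketch below applies a translation and a scale change to a small data set and compares the mean, median, and population standard deviation before and after; the data values and function names are made up for illustration and are not taken from the activity.

```python
import statistics


def summarize(data):
    """Return (mean, median, population standard deviation)."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.pstdev(data),
    )


def transform(data, scale=1.0, shift=0.0):
    """Apply x -> scale * x + shift to every value."""
    return [scale * x + shift for x in data]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print("original  ", summarize(data))
print("translated", summarize(transform(data, shift=10.0)))
print("scaled    ", summarize(transform(data, scale=3.0)))
```

    A translation moves the mean and median by the shift and leaves the spread unchanged; a scale change multiplies the mean and median by the factor and the standard deviation by its absolute value.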

  13. YGA: identifying distinct biological features between yeast gene sets.

    PubMed

    Chang, Darby Tien-Hao; Li, Wen-Si; Bai, Yi-Han; Wu, Wei-Sheng

    2013-04-10

    The advance of high-throughput experimental technologies generates many gene sets with different biological meanings, where many important insights can only be extracted by identifying the biological (regulatory/functional) features that are distinct between different gene sets (e.g. essential vs. non-essential genes, TATA box-containing vs. TATA box-less genes, induced vs. repressed genes under certain biological conditions). Although many servers have been developed to identify enriched features in a gene set, most of them were designed to analyze one gene set at a time but cannot compare two gene sets. Moreover, the features used in existing servers were mainly focused on functional annotations (GO terms), pathways, transcription factor binding sites (TFBSs) and/or protein-protein interactions (PPIs). In yeast, various important regulatory features, including promoter bendability, nucleosome occupancy, 5'-UTR length, and TF-gene regulation evidence, are available but have not been used in any enrichment analysis servers. This motivates us to develop the Yeast Genes Analyzer (YGA), a web server that simultaneously analyzes various biological (regulatory/functional) features of two gene sets and performs statistical tests to identify the distinct features between them. Many well-studied gene sets such as essential, stress-response, TATA box-containing and cell cycle genes were pre-compiled in YGA for users, if they have only one gene set, to compare with. In comparison with the existing enrichment analysis servers, YGA tests more comprehensive regulatory features (e.g. promoter bendability, nucleosome occupancy, 5'-UTR length, experimental evidence of TF-gene binding and TF-gene regulation) and functional features (e.g. PPI, GO terms, pathways and functional groups of genes, including essential/non-essential genes, stress-induced/-repressed genes, TATA box-containing/-less genes, occupied/depleted proximal-nucleosome genes and cell cycle genes). Furthermore, YGA…

  14. An Independent Filter for Gene Set Testing Based on Spectral Enrichment.

    PubMed

    Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H

    2015-01-01

    Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.

  15. Gene set analyses for interpreting microarray experiments on prokaryotic organisms.

    SciTech Connect

    Tintle, Nathan; Best, Aaron; Dejongh, Matthew; VanBruggen, Dirk; Heffron, Fred; Porwollik, Steffen; Taylor, Ronald C.

    2008-11-05

    Background: Recent advances in microarray technology have brought with them the need for enhanced methods of biologically interpreting gene expression data. Recently, methods like Gene Set Enrichment Analysis (GSEA) and variants of Fisher’s exact test have been proposed which utilize a priori biological information. Typically, these methods are demonstrated with a priori biological information from the Gene Ontology. Results: Alternative gene set definitions are presented based on gene sets inferred from the SEED: open-source software environment for comparative genome annotation and analysis of microbial organisms. Many of these gene sets are then shown to provide consistent expression across a series of experiments involving Salmonella Typhimurium. Implementation of the gene sets in an analysis of microarray data is then presented for the Salmonella Typhimurium data. Conclusions: SEED-inferred gene sets can be naturally defined based on subsystems in the SEED. The consistent expression values of these SEED-inferred gene sets suggest their utility for statistical analyses of gene expression data based on a priori biological information.

  16. MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

    DOE PAGESBeta

    Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; Chen, James J.

    2014-01-01

    Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions, up- and downregulation, for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or are associated with continuous phenotypes. MAVTgsa computes the P-values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.

  17. Multivariate gene-set testing based on graphical models.

    PubMed

    Städler, Nicolas; Mukherjee, Sach

    2015-01-01

    The identification of predefined groups of genes ("gene-sets") which are differentially expressed between two conditions ("gene-set analysis", or GSA) is a very popular analysis in bioinformatics. GSA incorporates biological knowledge by aggregating over genes that are believed to be functionally related. This can enhance statistical power over analyses that consider only one gene at a time. However, currently available GSA approaches are based on univariate two-sample comparison of single genes. This means that they cannot test for multivariate hypotheses such as differences in covariance structure between the two conditions. Yet interplay between genes is a central aspect of biological investigation and it is likely that such interplay may differ between conditions. This paper proposes a novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions. Testing hypotheses concerning networks is challenging due to the nature of the underlying estimation problem. Our starting point is a recent, general approach for high-dimensional two-sample testing. We refine the approach and show how it can be used to perform multivariate, network-based gene-set testing. We validate the approach in simulated examples and show results using high-throughput data from several studies in cancer biology.

  18. Sets, Probability and Statistics: The Mathematics of Life Insurance.

    ERIC Educational Resources Information Center

    Clifford, Paul C.; And Others

    The practical use of such concepts as sets, probability and statistics is considered by many to be vital and necessary to our everyday life. This student manual is intended to familiarize students with these concepts and to provide practice using real life examples. It also attempts to illustrate how the insurance industry uses such mathematic…

  19. Establishment of an attentional set via statistical learning.

    PubMed

    Cosman, Joshua D; Vecera, Shaun P

    2014-02-01

    The ability to overcome attentional capture and attend goal-relevant information is typically viewed as a volitional, effortful process that relies on the maintenance of current task priorities or "attentional sets" in working memory. However, the visual system possesses statistical learning mechanisms that can incidentally encode probabilistic associations between goal-relevant objects and the attributes likely to define them. Thus, it is possible that statistical learning may contribute to the establishment of a given attentional set and modulate the effects of attentional capture. Here we provide evidence for such a mechanism, showing that implicitly learned associations between a search target and its likely color directly influence the ability of a salient color precue to capture attention in a classic attentional capture task. This indicates a novel role for statistical learning in the modulation of attentional capture, and emphasizes the role that this learning may play in goal-directed attentional control more generally. PMID:24099589

  20. Comparing Data Sets: Implicit Summaries of the Statistical Properties of Number Sets

    ERIC Educational Resources Information Center

    Morris, Bradley J.; Masnick, Amy M.

    2015-01-01

    Comparing datasets, that is, sets of numbers in context, is a critical skill in higher order cognition. Although much is known about how people compare single numbers, little is known about how number sets are represented and compared. We investigated how subjects compared datasets that varied in their statistical properties, including ratio of…

  1. Statistical Software for spatial analysis of stratigraphic data sets

    2003-04-08

    Stratistics is a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. It is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods, drawn from multiple disciplines, than are currently available in other packages. These include techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models.

  2. Time-Course Gene Set Analysis for Longitudinal Gene Expression Data.

    PubMed

    Hejblum, Boris P; Skinner, Jason; Thiébaut, Rodolphe

    2015-06-01

    Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets that were found to be linked with the influenza vaccine as well, although previous analyses had found them to be associated with the pneumococcal vaccine only. In our simulation study TcGSA exhibits good statistical properties, and an increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available for the community through an R package.

  3. STATISTICS OF DARK MATTER HALOS FROM THE EXCURSION SET APPROACH

    SciTech Connect

    Lapi, A.; Salucci, P.; Danese, L.

    2013-08-01

    We exploit the excursion set approach in integral formulation to derive novel, accurate analytic approximations of the unconditional and conditional first crossing distributions for random walks with uncorrelated steps and general shapes of the moving barrier; we find the corresponding approximations of the unconditional and conditional halo mass functions for cold dark matter (DM) power spectra to represent very well the outcomes of state-of-the-art cosmological N-body simulations. In addition, we apply these results to derive, and confront with simulations, other quantities of interest in halo statistics, including the rates of halo formation and creation, the average halo growth history, and the halo bias. Finally, we discuss how our approach and main results change when considering random walks with correlated instead of uncorrelated steps, and warm instead of cold DM power spectra.

  4. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
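
    The test statistic described above can be sketched in a few lines. Under the assumption that the K studies are independent and each p-value is uniform on (0, 1) under the null, the rth smallest p-value follows a Beta(r, K - r + 1) distribution, so its tail probability is a binomial sum. The function name and the example values are illustrative, and the paper's procedures for choosing r and for one-sided correction are not reproduced here.

```python
from math import comb


def rop_pvalue(study_pvalues, r):
    """Return (rOP statistic, its p-value under independent uniform nulls).

    The p-value is P(at least r of K uniform p-values are <= observed rOP),
    which is the Beta(r, K - r + 1) distribution function at the statistic.
    """
    k = len(study_pvalues)
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and the number of studies")
    stat = sorted(study_pvalues)[r - 1]
    pval = sum(
        comb(k, j) * stat ** j * (1.0 - stat) ** (k - j)
        for j in range(r, k + 1)
    )
    return stat, pval


# Hypothetical gene measured in five studies; r = 3 asks for "a majority".
print(rop_pvalue([0.001, 0.004, 0.02, 0.40, 0.75], r=3))
```

    Setting r = 1 recovers the minimum p-value method and r = K the maximum p-value method, the two extremes the abstract compares against.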

  5. A statistical mechanics analysis of the set covering problem

    NASA Astrophysics Data System (ADS)

    Fontanari, J. F.

    1996-02-01

    The dependence of the optimal solution average cost of the set covering problem on the density of 1's of the incidence matrix and on the number of constraints (P) is investigated in the limit where the number of items (N) goes to infinity. The annealed approximation is employed to study two stochastic models: the constant density model, where the elements of the incidence matrix are statistically independent random variables, and the Karp model, where the rows of the incidence matrix possess the same number of 1's. Lower bounds for the average cost are presented in the case that P scales with ln N and the density is of order 1, as well as in the case that P scales linearly with N and the density is of order 1/N. It is shown that in the case that P scales with exp N and the density is of order 1 the annealed approximation yields exact results for both models.

  6. Applying Statistical Process Quality Control Methodology to Educational Settings.

    ERIC Educational Resources Information Center

    Blumberg, Carol Joyce

    A subset of Statistical Process Control (SPC) methodology known as Control Charting is introduced. SPC methodology is a collection of graphical and inferential statistics techniques used to study the progress of phenomena over time. The types of control charts covered are the X-bar (mean), R (Range), X (individual observations), MR (moving…

  7. Simple Data Sets for Distinct Basic Summary Statistics

    ERIC Educational Resources Information Center

    Lesser, Lawrence M.

    2011-01-01

    It is important to avoid ambiguity with numbers because unfortunate choices of numbers can inadvertently make it possible for students to form misconceptions or make it difficult for teachers to tell if students obtained the right answer for the right reason. Therefore, it is important to make sure when introducing basic summary statistics that…

  8. Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq.

    PubMed

    Yang, Tae Young; Jeong, Seongmun

    2013-01-01

    In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a gene-level bias such that the proportion of significantly differentially expressed genes increases with the transcript length, whereas such bias is not present in microarray data. Gene-set analysis seeks to identify the gene sets that are enriched in the list of the identified significant genes. In the gene-set analysis of RNA-seq, the gene-level bias subsequently yields the gene-set-level bias that a gene set with genes of long length will be more likely to show up as enriched than will a gene set with genes of shorter length. Because gene expression is not related to its transcript length, any gene set containing long genes is not of biologically greater interest than gene sets with shorter genes. Accordingly the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in the RNA-seq. We present a new gene set analysis method of RNA-seq, called FDRseq, which can accurately calculate the statistical significance of a gene-set enrichment score by the grouped false-discovery rate. Numerical examples indicated that FDRseq is appropriate for controlling the transcript length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed the R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang.

  9. Chronic periodontitis genome-wide association studies: gene-centric and gene set enrichment analyses.

    PubMed

    Rhodin, K; Divaris, K; North, K E; Barros, S P; Moss, K; Beck, J D; Offenbacher, S

    2014-09-01

    Recent genome-wide association studies (GWAS) of chronic periodontitis (CP) offer rich data sources for the investigation of candidate genes, functional elements, and pathways. We used GWAS data of CP (n = 4,504) and periodontal pathogen colonization (n = 1,020) from a cohort of adult Americans of European descent participating in the Atherosclerosis Risk in Communities study and employed a MAGENTA approach (i.e., meta-analysis gene set enrichment of variant associations) to obtain gene-centric and gene set association results corrected for gene size, number of single-nucleotide polymorphisms, and local linkage disequilibrium characteristics based on the human genome build 18 (National Center for Biotechnology Information build 36). We used the Gene Ontology, Ingenuity, KEGG, Panther, Reactome, and Biocarta databases for gene set enrichment analyses. Six genes showed evidence of statistically significant association: 4 with severe CP (NIN, p = 1.6 × 10^-7; ABHD12B, p = 3.6 × 10^-7; WHAMM, p = 1.7 × 10^-6; AP3B2, p = 2.2 × 10^-6) and 2 with high periodontal pathogen colonization (red complex-KCNK1, p = 3.4 × 10^-7; Porphyromonas gingivalis-DAB2IP, p = 1.0 × 10^-6). Top-ranked genes for moderate CP were HGD (p = 1.4 × 10^-5), ZNF675 (p = 1.5 × 10^-5), TNFRSF10C (p = 2.0 × 10^-5), and EMR1 (p = 2.0 × 10^-5). Loci containing NIN, EMR1, KCNK1, and DAB2IP had shown suggestive evidence of association in the earlier single-nucleotide polymorphism-based analyses, whereas WHAMM and AP3B2 emerged as novel candidates. The top gene sets included severe CP ("endoplasmic reticulum membrane," "cytochrome P450," "microsome," and "oxidation reduction") and moderate CP ("regulation of gene expression," "zinc ion binding," "BMP signaling pathway," and "ruffle"). Gene-centric analyses offer a promising avenue for efficient interrogation of large-scale GWAS data. These results highlight genes in previously identified loci and new candidate genes and pathways

  10. The Effect of Distributed Practice in Undergraduate Statistics Homework Sets: A Randomized Trial

    ERIC Educational Resources Information Center

    Crissinger, Bryan R.

    2015-01-01

    Most homework sets in statistics courses are constructed so that students concentrate or "mass" their practice on a certain topic in one problem set. Distributed practice homework sets include review problems in each set so that practice on a topic is distributed across problem sets. There is a body of research that points to the…

  11. Excursion sets and non-Gaussian void statistics

    SciTech Connect

    D'Amico, Guido; Musso, Marcello; Paranjape, Aseem; Norena, Jorge

    2011-01-15

    Primordial non-Gaussianity (NG) affects the large scale structure (LSS) of the Universe by leaving an imprint on the distribution of matter at late times. Much attention has been focused on using the distribution of collapsed objects (i.e. dark matter halos and the galaxies and galaxy clusters that reside in them) to probe primordial NG. An equally interesting and complementary probe however is the abundance of extended underdense regions or voids in the LSS. The calculation of the abundance of voids using the excursion set formalism in the presence of primordial NG is subject to the same technical issues as the one for halos, which were discussed e.g. in Ref. [51][G. D'Amico, M. Musso, J. Norena, and A. Paranjape, arXiv:1005.1203.]. However, unlike the excursion set problem for halos which involved random walks in the presence of one barrier δ_c, the void excursion set problem involves two barriers δ_v and δ_c. This leads to a new complication introduced by what is called the 'void-in-cloud' effect discussed in the literature, which is unique to the case of voids. We explore a path integral approach which allows us to carefully account for all these issues, leading to a rigorous derivation of the effects of primordial NG on void abundances. The void-in-cloud issue, in particular, makes the calculation conceptually rather different from the one for halos. However, we show that its final effect can be described by a simple yet accurate approximation. Our final void abundance function is valid on larger scales than the expressions of other authors, while being broadly in agreement with those expressions on smaller scales.

  12. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify gene sets relevant to autism that could not be found by Fisher.
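
    For reference, the unweighted baseline that this abstract contrasts with LEGO can be sketched as a one-sided Fisher's exact (hypergeometric) test. This is not LEGO itself: every gene counts equally and no network weights are used. The function and parameter names are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_hits, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_universe: number of genes in the background
    n_set:      number of background genes that belong to the gene set
    n_hits:     number of genes in the list of interest
    n_overlap:  number of listed genes that fall in the gene set
    X is hypergeometric: draw n_hits genes without replacement and count
    how many land in the gene set.
    """
    upper = min(n_set, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

    Because the test sees only membership counts, two gene sets with the same overlap and size receive the same p-value regardless of which genes overlap, which is the limitation the weighted approach is meant to address.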

  13. Rotation gene set testing for longitudinal expression data.

    PubMed

    Dørum, Guro; Snipen, Lars; Solheim, Margrete; Saebø, Solve

    2014-11-01

    Gene set analysis methods are popular tools for identifying differentially expressed gene sets in microarray data. Most existing methods use a permutation test to assess significance for each gene set. The permutation test's assumption of exchangeable samples is often not satisfied for time-series data and complex experimental designs, and in addition it requires a certain number of samples to compute p-values accurately. The method presented here uses a rotation test rather than a permutation test to assess significance. The rotation test can compute accurate p-values also for very small sample sizes. The method can handle complex designs and is particularly suited for longitudinal microarray data where the samples may have complex correlation structures. Dependencies between genes, modeled with the use of gene networks, are incorporated in the estimation of correlations between samples. In addition, the method can test for both gene sets that are differentially expressed and gene sets that show strong time trends. We show on simulated longitudinal data that the ability to identify important gene sets may be improved by taking the correlation structure between samples into account. Applied to real data, the method identifies both gene sets with constant expression and gene sets with strong time trends.

  14. Gene coexpression measures in large heterogeneous samples using count statistics.

    PubMed

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance. PMID:25288767
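
    The idea of counting local rank patterns can be illustrated with a simplified sketch. It is not the statistics defined by the authors: it assumes ordered samples, slides a window of k consecutive samples, and reports the fraction of windows in which two genes show the same rank ordering. The names rank_pattern and local_match_fraction are hypothetical.

```python
def rank_pattern(values):
    """Indices of the values in increasing order (ties broken by position)."""
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))


def local_match_fraction(x, y, k=3):
    """Fraction of length-k windows where x and y share the same rank pattern."""
    if len(x) != len(y):
        raise ValueError("expression profiles must have the same length")
    if len(x) < k:
        raise ValueError("need at least k samples")
    windows = len(x) - k + 1
    matches = sum(
        rank_pattern(x[s:s + k]) == rank_pattern(y[s:s + k])
        for s in range(windows)
    )
    return matches / windows
```

    For x = [1, 2, 3, 4, 5, 6] and y = [10, 20, 30, 5, 4, 3] only the first of four windows matches, so the score is 0.25; the two profiles agree locally over the first three samples even though they diverge afterwards.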

  15. Gene coexpression measures in large heterogeneous samples using count statistics

    PubMed Central

    Wang, Y. X. Rachel; Waterman, Michael S.; Huang, Haiyan

    2014-01-01

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the “big data” challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance. PMID:25288767

  17. Principles for the organization of gene-sets.

    PubMed

    Li, Wentian; Freudenberg, Jan; Oswald, Michaela

    2015-12-01

    A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often determined by practical reasons (such as the types of technology in obtaining the gene-sets or a balanced number of gene-sets under a header). Here we aim at providing a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared label connection denotes shared membership in a group. Some extensions of the framework are also addressed such as overlapping of gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs.

  18. Sets, Probability and Statistics: The Mathematics of Life Insurance. [Computer Program.] Second Edition.

    ERIC Educational Resources Information Center

    King, James M.; And Others

    The materials described here represent the conversion of a highly popular student workbook "Sets, Probability and Statistics: The Mathematics of Life Insurance" into a computer program. The program is designed to familiarize students with the concepts of sets, probability, and statistics, and to provide practice using real life examples. It also…

  19. Statistical aspect of trait mapping using a dense set of markers: A partial review

    SciTech Connect

    Dupuis, J.

    1996-12-31

    This paper presents a review of statistical methods used to locate trait loci using maps of markers spanning the whole genome. Such maps are becoming readily available and can be especially useful in mapping traits that are non Mendelian. Genome-wide search for a trait locus is often called a "global search". Global search methods include, but are not restricted to, identifying disease susceptibility genes using affected relative pairs, finding quantitative trait loci in experimental organisms and locating quantitative trait loci in humans. For human linkage, we concentrate on methods using pairs of affected relatives rather than pedigree analysis. We begin in the next section with a review of work on the use of affected pairs of relatives to identify gene loci that increase susceptibility to a particular disease. We first review Risch's 1990 series of papers. Risch's method can be used to search the entire genome for such susceptibility genes. Using Risch's idea Elston explored the issue of how many pairs and markers are necessary to reach a certain probability of detecting a locus if there exists one. He proposed a more economical two stage design that uses few markers at the first stage but adds markers around the "promising" area of the genome at the second stage. However, Risch and Elston do not use multipoint linkage analysis, which takes into account all markers at once (rather than one at a time) in the calculation of the test statistic. Such multipoint methods for affected relatives have been developed by Feingold and Feingold et al. The last authors' multipoint method is based on a continuous specification of identity by descent between the affected relatives but can also be used for a set of linked markers spanning the genome. A brief description of their method and treatment of more complex issues such as combining relative pairs is included. 29 refs., 4 tabs.

  20. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.

  1. Curated eutherian third party data gene data sets

    PubMed Central

    Premzl, Marko

    2015-01-01

    The freely available eutherian genomic sequence data sets have advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets. PMID:26862561

  2. Identification of pleiotropic genes and gene sets underlying growth and immunity traits: a case study on Meishan pigs.

    PubMed

    Zhang, Z; Wang, Z; Yang, Y; Zhao, J; Chen, Q; Liao, R; Chen, Z; Zhang, X; Xue, M; Yang, H; Zheng, Y; Wang, Q; Pan, Y

    2016-04-01

    Both growth and immune capacity are important traits in animal breeding. The animal quantitative trait loci (QTL) database is a valuable resource and can be used for interpreting the genetic mechanisms that underlie growth and immune traits. However, QTL intervals often involve too many candidate genes to find the true causal genes. Therefore, the aim of this study was to provide an effective annotation pipeline that can make full use of the information of Gene Ontology term annotation, linkage gene blocks and pathways to further identify pleiotropic genes and gene sets in the overlapping intervals of growth-related and immunity-related QTLs. In total, 55 non-redundant QTL overlapping intervals were identified, 1893 growth-related genes and 713 immunity-related genes were further classified into overlapping intervals and 405 pleiotropic genes shared by the two gene sets were determined. In addition, 19 pleiotropic gene linkage blocks and 67 pathways related to immunity and growth traits were discovered. A total of 343 growth-related genes and 144 immunity-related genes involved in pleiotropic pathways were also identified. We also sequenced and genotyped 284 individuals from Chinese Meishan pigs and European pigs and mapped the single nucleotide polymorphisms (SNPs) to the pleiotropic genes and gene sets that we identified. A total of 971 high-confidence SNPs were mapped to the pleiotropic genes and gene sets that we identified, and among them 743 SNPs differed significantly in allele frequency between Meishan and European pigs. This study explores the relationship between growth and immunity traits from the view of QTL overlapping intervals and can be generalized to explore the relationships between other traits.
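
    The notion of QTL overlapping intervals can be illustrated with a minimal sketch. The abstract does not give the pipeline's actual procedure, so the following makes its own assumptions: each QTL is a hypothetical (chromosome, start, end) tuple with inclusive coordinates, and an overlap is the shared sub-interval of one growth-related and one immunity-related QTL on the same chromosome.

```python
def overlap_intervals(growth_qtls, immunity_qtls):
    """Return sorted, de-duplicated overlaps between two QTL interval lists.

    Each QTL is a (chromosome, start, end) tuple with start <= end, and the
    coordinates are treated as inclusive (an assumption of this sketch).
    """
    overlaps = set()
    for chrom_g, start_g, end_g in growth_qtls:
        for chrom_i, start_i, end_i in immunity_qtls:
            if chrom_g != chrom_i:
                continue
            start = max(start_g, start_i)
            end = min(end_g, end_i)
            if start <= end:  # the two intervals share at least one position
                overlaps.add((chrom_g, start, end))
    return sorted(overlaps)
```

    Genes can then be assigned to an overlap when their coordinates fall inside it; the pairwise loop is quadratic and is kept only for clarity.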

  3. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering

    PubMed Central

    Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect disease severity and provide gene set expression profiles for individuals. We developed a software application called IGSA that offers strong analytical capacity in gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy to let the samples gather into the set according to the progress of disease from mild to severe. We evaluate the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the results of IGSA in KEGG pathway enrichment with DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering under different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate pathway changes related to the severity of disease. In addition, IGSA provides a profile of significant gene sets for each sample. PMID:27764138

  4. Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line.

    PubMed

    Blayney, Jaine K; Davison, Timothy; McCabe, Nuala; Walker, Steven; Keating, Karen; Delaney, Thomas; Greenan, Caroline; Williams, Alistair R; McCluggage, W Glenn; Capes-Davis, Amanda; Harkin, D Paul; Gourley, Charlie; Kennedy, Richard D

    2016-09-30

    Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.

  5. Prior knowledge transfer across transcriptional data sets and technologies using compositional statistics yields new mislabelled ovarian cell line

    PubMed Central

    Blayney, Jaine K.; Davison, Timothy; McCabe, Nuala; Walker, Steven; Keating, Karen; Delaney, Thomas; Greenan, Caroline; Williams, Alistair R.; McCluggage, W. Glenn; Capes-Davis, Amanda; Harkin, D. Paul; Gourley, Charlie; Kennedy, Richard D.

    2016-01-01

    Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package. PMID:27353327

  6. Identifying the Genetic Variation of Gene Expression Using Gene Sets: Application of Novel Gene Set eQTL Approach to PharmGKB and KEGG

    PubMed Central

    Abo, Ryan; Jenkins, Gregory D.; Wang, Liewei; Fridley, Brooke L.

    2012-01-01

    Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set – expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations. PMID:22905253

  7. From Biophysics to Evolutionary Genetics: Statistical Aspects of Gene Regulation

    NASA Astrophysics Data System (ADS)

    Lässig, Michael

    Genomic functions often cannot be understood at the level of single genes but require the study of gene networks. This systems biology credo is nearly commonplace by now. Evidence comes from the comparative analysis of entire genomes: current estimates put, for example, the number of human genes at around 22,000, hardly more than the 14,000 of the fruit fly, and not even an order of magnitude higher than the 6,000 of baker's yeast. The complexity and diversity of higher animals, therefore, cannot be explained in terms of their gene numbers. If, however, a biological function requires the concerted action of several genes, and conversely, a gene takes part in several functional contexts, an organism may be defined less by its individual genes but by their interactions. The emerging picture of the genome as a strongly interacting system with many degrees of freedom brings new challenges for experiment and theory, many of which are of a statistical nature. And indeed, this picture continues to make the subject attractive to a growing number of statistical physicists.

  8. Functional-network-based gene set analysis using gene-ontology.

    PubMed

    Chang, Billy; Kustra, Rafal; Tian, Weidong

    2013-01-01

    To account for the functional non-equivalence among a set of genes within a biological pathway when performing gene set analysis, we introduce GOGANPA, a network-based gene set analysis method, which up-weights genes with functions relevant to the gene set of interest. Each gene is weighted according to its degree within a genome-scale functional network constructed using the functional annotations available from the gene ontology database. By benchmarking GOGANPA using a well-studied P53 data set and three breast cancer data sets, we will demonstrate the power and reproducibility of our proposed method over traditional unweighted approaches and a competing network-based approach that involves a complex integrated network. GOGANPA's sole reliance on gene ontology further allows GOGANPA to be widely applicable to the analysis of any gene-ontology-annotated genome. PMID:23418449

  9. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, R.C.; Taylor, H.E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value for censored data.
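
    As a minimal sketch of one of the substitution methods named above (assigning one-half the detection limit to censored values), the following Python snippet computes summary statistics from a small made-up data set. The function name `substitute_half_dl`, the use of `None` to mark censored observations, and the single shared detection limit are assumptions for illustration; they are not taken from the study.

```python
from statistics import mean, median


def substitute_half_dl(values, detection_limit):
    """Replace censored observations (marked None) with half the detection limit."""
    return [detection_limit / 2 if v is None else v for v in values]


# Made-up concentrations; None marks a result reported as "below detection".
observations = [None, 0.8, None, 1.5, 2.1, 0.6]
detection_limit = 0.5  # assumed to be the same for every observation

filled = substitute_half_dl(observations, detection_limit)
censored_fraction = observations.count(None) / len(observations)
print(filled, mean(filled), median(filled), censored_fraction)
```

    The censored fraction is worth reporting alongside the statistics, since the abstract finds that no treatment gives good estimates above about 70% censoring.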

  10. A Bayesian variable selection procedure to rank overlapping gene sets

    PubMed Central

    2012-01-01

    Background: Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows. Results: We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations. Conclusions: Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize. PMID:22554182

  11. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and makes no assumption about the distribution of the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarrays (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set.

  12. Zebrafish Expression Ontology of Gene Sets (ZEOGS): a tool to analyze enrichment of zebrafish anatomical terms in large gene sets.

    PubMed

    Prykhozhij, Sergey V; Marsico, Annalisa; Meijsing, Sebastiaan H

    2013-09-01

    The zebrafish (Danio rerio) is an established model organism for developmental and biomedical research. It is frequently used for high-throughput functional genomics experiments, such as genome-wide gene expression measurements, to systematically analyze molecular mechanisms. However, the use of whole embryos or larvae in such experiments leads to a loss of the spatial information. To address this problem, we have developed a tool called Zebrafish Expression Ontology of Gene Sets (ZEOGS) to assess the enrichment of anatomical terms in large gene sets. ZEOGS uses gene expression pattern data from several sources: first, in situ hybridization experiments from the Zebrafish Model Organism Database (ZFIN); second, it uses the Zebrafish Anatomical Ontology, a controlled vocabulary that describes connected anatomical structures; and third, the available connections between expression patterns and anatomical terms contained in ZFIN. Upon input of a gene set, ZEOGS determines which anatomical structures are overrepresented in the input gene set. ZEOGS allows one for the first time to look at groups of genes and to describe them in terms of shared anatomical structures. To establish ZEOGS, we first tested it on random gene selections and on two public microarray datasets with known tissue-specific gene expression changes. These tests showed that ZEOGS could reliably identify the tissues affected, whereas few or no enriched terms were found in the random gene sets. Next we applied ZEOGS to microarray datasets of 24 and 72 h postfertilization zebrafish embryos treated with beclomethasone, a potent glucocorticoid. This analysis resulted in the identification of several anatomical terms related to glucocorticoid-responsive tissues, some of which were stage-specific. Our studies highlight the ability of ZEOGS to extract spatial information from datasets derived from whole embryos, indicating that ZEOGS could be a useful tool to automatically analyze gene expression

  13. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns is a natural line of attack for recognizing structure in DNA sequences, both for gene identification and for similar problems, yet surprisingly little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences of a complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
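
    The general idea of mapping a DNA sequence to numbers and inspecting its Fourier spectrum can be sketched as follows. This is an illustration under stated assumptions, not the paper's technique: the abstract does not say which mappings were used, so the sketch assumes a simple per-base indicator (0/1) mapping, evaluates only the period-3 component, and uses the hypothetical helper names `indicator`, `power_at` and `period3_power`.

```python
import cmath


def indicator(seq, base):
    """Map a DNA string to a 0/1 sequence marking positions of one base."""
    return [1.0 if ch == base else 0.0 for ch in seq]


def power_at(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_power(seq):
    """Total spectral power at frequency 1/3, summed over the four bases."""
    n = len(seq) - len(seq) % 3  # trim so that n/3 is an exact DFT index
    seq = seq[:n]
    k = n // 3
    return sum(power_at(indicator(seq, b), k) for b in "ACGT")


# Synthetic sequences (assumed data): a perfectly 3-periodic repeat
# versus a homopolymer with no period-3 structure.
periodic = "ATG" * 30
flat = "A" * 90
print(period3_power(periodic), period3_power(flat))
```

    A pronounced period-3 component is commonly associated with coding regions, so a statistic of this kind could serve as one input to a coding versus non-coding classifier.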

  14. On sufficient statistics of least-squares superposition of vector sets.

    PubMed

    Konagurthu, Arun S; Kasarapu, Parthan; Allison, Lloyd; Collier, James H; Lesk, Arthur M

    2015-06-01

    The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from their constituent vector sets under addition or deletion operations, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.

  15. Integrated gene set analysis for microRNA studies

    PubMed Central

    Garcia-Garcia, Francisco; Panadero, Joaquin; Dopazo, Joaquin; Montaner, David

    2016-01-01

    Motivation: Functional interpretation of miRNA expression data is currently done in a three-step procedure: select differentially expressed miRNAs, find their target genes, and carry out gene set overrepresentation analysis. Nevertheless, major limitations of this approach have already been described at the gene level, while some new ones arise in the miRNA scenario. Here, we propose an enhanced methodology that builds on the well-established gene set analysis paradigm. Evidence for differential expression at the miRNA level is transferred to a gene differential inhibition score which is easily interpretable in terms of gene sets or pathways. Such transferred indexes account for the additive effect of several miRNAs targeting the same gene, and also incorporate cancellation effects between cases and controls. Together, these two desirable characteristics allow for more accurate modeling of regulatory processes. Results: We analyze high-throughput sequencing data from 20 different cancer types and provide exhaustive reports of gene and Gene Ontology-term deregulation by miRNA action. Availability and Implementation: The proposed methodology was implemented in the Bioconductor library mdgsa. http://bioconductor.org/packages/mdgsa. For the purpose of reproducibility all of the scripts are available at https://github.com/dmontaner-papers/gsa4mirna Contact: david.montaner@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27324197
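
    The transfer of miRNA-level evidence to gene-level scores can be pictured with a short Python sketch. It does not use or reproduce the mdgsa package: the function name `transfer_to_genes`, the toy statistics, and the plain summation are assumptions chosen only to show how additive and cancelling effects could arise when several miRNAs target the same gene.

```python
def transfer_to_genes(mirna_stats, targets):
    """Sum signed miRNA-level statistics over the miRNAs targeting each gene.

    mirna_stats: dict mapping miRNA name -> signed differential statistic
    targets:     dict mapping miRNA name -> list of target gene names
    """
    gene_scores = {}
    for mirna, genes in targets.items():
        stat = mirna_stats.get(mirna, 0.0)  # unmeasured miRNAs contribute nothing
        for gene in genes:
            gene_scores[gene] = gene_scores.get(gene, 0.0) + stat
    return gene_scores


# Toy inputs (hypothetical names and values).
mirna_stats = {"miR-a": 2.0, "miR-b": 1.5, "miR-c": -2.0}
targets = {
    "miR-a": ["GENE1", "GENE2"],
    "miR-b": ["GENE1"],
    "miR-c": ["GENE2"],
}

gene_scores = transfer_to_genes(mirna_stats, targets)
print(gene_scores)
```

    Here GENE1 accumulates evidence from two miRNAs changing in the same direction, while the opposite changes in miR-a and miR-c cancel for GENE2. The resulting gene-level scores could then be passed to any gene set analysis that accepts a ranked gene list.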

  16. Analysis of Gene Sets Based on the Underlying Regulatory Network

    PubMed Central

    Michailidis, George

    2009-01-01

    Networks are often used to represent the interactions among genes and proteins. These interactions are known to play an important role in vital cell functions and should be included in the analysis of genes that are differentially expressed. Methods of gene set analysis take advantage of external biological information and analyze a priori defined sets of genes. These methods can potentially preserve the correlation among genes; however, they do not directly incorporate the information about the gene network. In this paper, we propose a latent variable model that directly incorporates the network information. We then use the theory of mixed linear models to present a general inference framework for the problem of testing the significance of subnetworks. Several possible test procedures are introduced and a network-based method for testing the changes in expression levels of genes as well as the structure of the network is presented. The performance of the proposed method is compared with methods of gene set analysis using both simulation studies and real data on genes related to the galactose utilization pathway in yeast. PMID:19254181

  17. Differential Effects of Goal Setting and Value Reappraisal on College Women's Motivation and Achievement in Statistics

    ERIC Educational Resources Information Center

    Acee, Taylor Wayne

    2009-01-01

    The purpose of this dissertation was to investigate the differential effects of goal setting and value reappraisal on female students' self-efficacy beliefs, value perceptions, exam performance and continued interest in statistics. It was hypothesized that the Enhanced Goal Setting Intervention (GS-E) would positively impact students'…

  18. Statistically invalid classification of high throughput gene expression data.

    PubMed

    Barbash, Shahar; Soreq, Hermona

    2013-01-01

    Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes.

  19. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    PubMed

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets has not been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. In tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with conventional approaches that use only single genes or only gene sets. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  20. Evaluating gene set enrichment analysis via a hybrid data model.

    PubMed

    Hua, Jianping; Bittner, Michael L; Dougherty, Edward R

    2014-01-01

    Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance.

  1. Gene set enrichment ensemble using fold change data only.

    PubMed

    Huang, Hai; Zhang, Shaohong; Shen, Wen-Jun; Wong, Hau-San; Xie, Dongqing

    2015-10-01

    In a number of biological studies, the raw gene expression data are not usually published due to different causes, such as data privacy and patent rights. Instead, significant gene lists with fold change values are usually provided in most studies. However, due to variations in data sources and profiling conditions, only a small number of common significant genes could be found among similar studies. Moreover, traditional gene set based analyses that consider these genes have not taken into account the fold change values, which may be important to distinguish between the different levels of significance of the genes. Human embryonic stem cell derived cardiomyocytes (hESC-CMs) are a good representative of this category. hESC-CMs, with their role as a potentially unlimited source of human heart cells for regenerative medicine, have attracted the attention of biological and medical researchers. Because of the difficulty of acquiring data and the resulting expenses, there are only a few related hESC-CM studies and few hESC-CM gene expression data are provided. In view of these challenges, we propose a new Gene Set Enrichment Ensemble (GSEE) approach to perform gene set based analysis on individual studies based on significant up-regulated gene lists with fold change data only. Our approach provides both explicit and implicit ways to utilize the fold change data, in order to make full use of scarce data. We validate our approach with hESC-CM data and fetal heart data, respectively. Experimental results on significant gene lists from different studies illustrate the effectiveness of our proposed approach. PMID:26241354

  3. Turning publicly available gene expression data into discoveries using gene set context analysis.

    PubMed

    Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

    2016-01-01

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data.

  4. The essential gene set of a photosynthetic organism.

    PubMed

    Rubin, Benjamin E; Wetmore, Kelly M; Price, Morgan N; Diamond, Spencer; Shultzaberger, Ryan K; Lowe, Laura C; Curtin, Genevieve; Arkin, Adam P; Deutschbauer, Adam; Golden, Susan S

    2015-12-01

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ∼ 250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism's 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNA(Leu), which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism's physiology and defines the essential gene set required for the growth of a photosynthetic organism.

  5. A unified set-based test with adaptive filtering for gene-environment interaction analyses.

    PubMed

    Liu, Qianying; Chen, Lin S; Nicolae, Dan L; Pierce, Brandon L

    2016-06-01

    In genome-wide gene-environment interaction (GxE) studies, a common strategy to improve power is to first conduct a filtering test and retain only the SNPs that pass the filtering in the subsequent GxE analyses. Inspired by two-stage tests and gene-based tests in GxE analysis, we consider the general problem of jointly testing a set of parameters when only a few are truly from the alternative hypothesis and when filtering information is available. We propose a unified set-based test that simultaneously considers filtering on individual parameters and testing on the set. We derive the exact distribution and approximate the power function of the proposed unified statistic in simplified settings, and use them to adaptively calculate the optimal filtering threshold for each set. In the context of gene-based GxE analysis, we show that although the empirical power function may be affected by many factors, the optimal filtering threshold corresponding to the peak of the power curve primarily depends on the size of the gene. We further propose a resampling algorithm to calculate P-values for each gene given the estimated optimal filtering threshold. The performance of the method is evaluated in simulation studies and illustrated via a genome-wide gene-gender interaction analysis using pancreatic cancer genome-wide association data. PMID:26496228

  6. A unified set-based test with adaptive filtering for gene-environment interaction analyses

    PubMed Central

    Liu, Qianying; Chen, Lin S.; Nicolae, Dan L.; Pierce, Brandon L.

    2015-01-01

    Summary: In genome-wide gene-environment interaction (GxE) studies, a common strategy to improve power is to first conduct a filtering test and retain only the SNPs that pass the filtering in the subsequent GxE analyses. Inspired by two-stage tests and gene-based tests in GxE analysis, we consider the general problem of jointly testing a set of parameters when only a few are truly from the alternative hypothesis and when filtering information is available. We propose a unified set-based test that simultaneously considers filtering on individual parameters and testing on the set. We derive the exact distribution and approximate the power function of the proposed unified statistic in simplified settings, and use them to adaptively calculate the optimal filtering threshold for each set. In the context of gene-based GxE analysis, we show that although the empirical power function may be affected by many factors, the optimal filtering threshold corresponding to the peak of the power curve primarily depends on the size of the gene. We further propose a resampling algorithm to calculate p-values for each gene given the estimated optimal filtering threshold. The performance of the method is evaluated in simulation studies and illustrated via a genome-wide gene-gender interaction analysis using pancreatic cancer genome-wide association data. PMID:26496228

  7. TransFind--predicting transcriptional regulators for gene sets.

    PubMed

    Kiełbasa, Szymon M; Klein, Holger; Roider, Helge G; Vingron, Martin; Blüthgen, Nils

    2010-07-01

    The analysis of putative transcription factor binding sites in promoter regions of coregulated genes allows one to infer the transcription factors that underlie observed changes in gene expression. While such analyses constitute a central component of the in-silico characterization of transcriptional regulatory networks, there is still a lack of simple-to-use web servers that combine state-of-the-art prediction methods with phylogenetic analysis and appropriate multiple-testing-corrected statistics and that return results within a short time. With these aims in mind, we developed TransFind, which is freely available at http://transfind.sys-bio.net/.

  8. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools used routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
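
    One ingredient of the pipeline described above, a Fisher exact test on a 2x2 contingency table formed by splitting patients at a dose threshold, can be sketched in plain Python. This is not the authors' C#.Net/R code: the function name `fisher_exact_two_sided`, the table layout, and the example counts are assumptions for illustration. The two-sided p-value is computed here by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = dose above / below threshold,
    columns = outcome observed / not observed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Made-up counts: 3 of 3 high-dose patients and 0 of 3 low-dose patients
# show the outcome.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

    In a screening setting such as the one described, a test like this would be repeated over many candidate thresholds or metrics, so the resulting p-values would need to be interpreted with multiple testing in mind.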

  9. Using Stimulus Equivalence Technology to Teach Statistical Inference in a Group Setting

    ERIC Educational Resources Information Center

    Critchfield, Thomas S.; Fienup, Daniel M.

    2010-01-01

    Computerized lessons employing stimulus equivalence technology, used previously under laboratory conditions to teach inferential statistics concepts to college students, were employed in a group setting for the first time. Students showed the same directly taught and emergent learning gains as in laboratory studies. A brief paper-and-pencil…

  10. Statistical Analysis of Probability of Detection Hit/Miss Data for Small Data Sets

    NASA Astrophysics Data System (ADS)

    Harding, C. A.; Hugo, G. R.

    2003-03-01

    This paper examines the validity of statistical methods for determining nondestructive inspection probability of detection (POD) curves from relatively small hit/miss POD data sets. One method published in the literature is shown to be invalid for analysis of POD hit/miss data. Another standard method is shown to be valid only for data sets containing more than 200 observations. An improved method is proposed which allows robust lower 95% confidence limit POD curves to be determined from data sets containing as few as 50 hit/miss observations.

  11. A Complete Set of Nascent Transcription Rates for Yeast Genes

    PubMed Central

    Pelechano, Vicent; Chávez, Sebastián; Pérez-Ortín, José E.

    2010-01-01

    The amount of mRNA in a cell is the result of two opposite reactions: transcription and mRNA degradation. These reactions are governed by kinetic laws, and the most regulated step for many genes is the transcription rate. The transcription rate, which is assumed to be exercised mainly at the RNA polymerase recruitment level, can be calculated using the RNA polymerase densities determined either by run-on or immunoprecipitation using specific antibodies. The yeast Saccharomyces cerevisiae is the ideal model organism to generate a complete set of nascent transcription rates that will prove useful for many gene regulation studies. By combining genomic data from both the GRO (Genomic Run-on) and the RNA pol ChIP-on-chip methods we generated a new, more accurate nascent transcription rate dataset. By comparing this dataset with the indirect ones obtained from the mRNA stabilities and mRNA amount datasets, we are able to obtain biological information about posttranscriptional regulation processes and a genomic snapshot of the location of the active transcriptional machinery. We have obtained nascent transcription rates for 4,670 yeast genes. The median RNA polymerase II density in the genes is 0.078 molecules/kb, which corresponds to an average of 0.096 molecules/gene. Most genes have transcription rates of between 2 and 30 mRNAs/hour and less than 1% of yeast genes have >1 RNA polymerase molecule/gene. Histone and ribosomal protein genes are the most highly transcribed groups of genes and, other than these exceptions, the transcription of genes is an infrequent phenomenon in a yeast cell. PMID:21103382

  12. Gene set analysis of survival following ovarian cancer implicates macrolide binding and intracellular signaling genes

    PubMed Central

    Fridley, Brooke L.; Jenkins, Gregory D.; Tsai, Ya-Yu; Song, Honglin; Bolton, Kelly L.; Fenstermacher, David; Tyrer, Jonathan; Ramus, Susan J.; Cunningham, Julie M.; Vierkant, Robert A.; Chen, Zhihua; Chen, Y. Ann; Iversen, Ed; Menon, Usha; Gentry-Maharaj, Aleksandra; Schildkraut, Joellen; Sutphen, Rebecca; Gayther, Simon A.; Hartmann, Lynn C.; Pharoah, Paul D. P.; Sellers, Thomas A.; Goode, Ellen L.

    2012-01-01

    Background: Genome-wide association studies (GWAS) for epithelial ovarian cancer (EOC), the most lethal gynecologic malignancy, have identified novel susceptibility loci. GWAS for survival after EOC have had more limited success. Testing each single nucleotide polymorphism (SNP) individually may not be well suited to detecting small effects of multiple SNPs, such as those operating within the same biological pathway. Gene set analysis (GSA) overcomes this limitation by assessing overall evidence for association of a phenotype with all measured variation in a set of genes. Methods: To determine gene sets associated with EOC overall survival, we conducted GSA using data from two large GWASes (N cases = 2,813, N deaths = 1,116), with a novel Principal Component – Gamma GSA method. Analysis was completed for all cases and then separately for high grade serous (HGS) histological subtype. Results: Analysis of the HGS subjects resulted in 43 gene sets with p < 0.005 (1.7%); of these, 21 gene sets had p < 0.10 in both GWASes, including intracellular signaling pathway (p = 7.3 × 10^-5) and macrolide binding (p = 6.2 × 10^-4) gene sets. The top gene sets in analysis of all cases were meiotic mismatch repair (p = 6.3 × 10^-4) and macrolide binding (p = 1.0 × 10^-3). Of 18 gene sets with p < 0.005 (0.7%), eight had p < 0.10 in both GWASes. Conclusion: This research detected novel gene sets associated with EOC survival. Impact: Novel gene sets associated with EOC survival might lead to new insights and avenues for development of novel therapies for EOC and pharmacogenomic studies. PMID:22302016

  13. Key genes and pathways in thyroid cancer based on gene set enrichment analysis.

    PubMed

    He, Wenwu; Qi, Bin; Zhou, Qiuxi; Lu, Chuansen; Huang, Qi; Xian, Lei; Chen, Mingwu

    2013-09-01

    The incidence of thyroid cancer and its associated morbidity has shown the most rapid increase among all cancers since 1982, but the mechanisms involved in thyroid cancer, particularly significant key genes induced in thyroid cancer, remain undefined. In many studies, gene probes have been used to search for key genes involved in causing and facilitating thyroid cancer. As a result, many possible virulence genes and pathways have been identified. However, these studies lack a case contrast for selecting the most possible virulence genes and pathways, as well as conclusive results with which to clarify the mechanisms of cancer development. In the present study, we used gene set enrichment and meta-analysis to select key genes and pathways. Based on gene set enrichment, we identified 5 downregulated and 4 upregulated mixed pathways in 6 tissue datasets. Based on the meta-analysis, there were 17 common pathways in the tissue datasets. One pathway, the p53 signaling pathway, which includes 13 genes, was identified by both the gene set enrichment analysis and meta-analysis. Genes are important elements that form key pathways. These pathways can induce the development of thyroid cancer later in life. The key pathways and genes identified in the present study can be used in the next stage of research, which will involve gene elimination and other methods of experimentation.

  14. Non-Euclidean basis function based level set segmentation with statistical shape prior.

    PubMed

    Ruiz, Esmeralda; Reisert, Marco; Bai, Li

    2013-01-01

    We present a new framework for image segmentation with statistical shape model enhanced level sets represented as a linear combination of non-Euclidean radial basis functions (RBFs). The shape prior for the level set is represented as a probabilistic map created from the training data and registered with the target image. The new framework has the following advantages: 1) The explicit RBF representation of the level set allows the level set evolution to be represented as ordinary differential equations, and reinitialization is no longer required. 2) The non-Euclidean distance RBFs make it possible to incorporate image information into the basis functions, which results in more accurate and topologically more flexible solutions. Experimental results are presented to demonstrate the advantages of the method, as well as critical analysis of level sets versus the combination of both methods.

  15. Parallel evolution of nacre building gene sets in molluscs.

    PubMed

    Jackson, Daniel J; McDougall, Carmel; Woodcroft, Ben; Moase, Patrick; Rose, Robert A; Kube, Michael; Reinhardt, Richard; Rokhsar, Daniel S; Montagnani, Caroline; Joubert, Caroline; Piquemal, David; Degnan, Bernard M

    2010-03-01

    The capacity to biomineralize is closely linked to the rapid expansion of animal life during the early Cambrian, with many skeletonized phyla first appearing in the fossil record at this time. The appearance of disparate molluscan forms during this period leaves open the possibility that shells evolved independently and in parallel in at least some groups. To test this proposition and gain insight into the evolution of structural genes that contribute to shell fabrication, we compared genes expressed in nacre (mother-of-pearl) forming cells in the mantle of the bivalve Pinctada maxima and the gastropod Haliotis asinina. Despite both species having highly lustrous nacre, we find extensive differences in these expressed gene sets. Following the removal of housekeeping genes, less than 10% of all gene clusters are shared between these molluscs, with some being conserved biomineralization genes that are also found in deuterostomes. These differences extend to secreted proteins that may localize to the organic shell matrix, with less than 15% of this secretome being shared. Despite these differences, H. asinina and P. maxima both secrete proteins with repetitive low-complexity domains (RLCDs). Pinctada maxima RLCD proteins-for example, the shematrins-are predominated by silk/fibroin-like domains, which are absent from the H. asinina data set. Comparisons of shematrin genes across three species of Pinctada indicate that this gene family has undergone extensive divergent evolution within pearl oysters. We also detect fundamental bivalve-gastropod differences in extracellular matrix proteins involved in mollusc-shell formation. Pinctada maxima expresses a chitin synthase at high levels and several chitin deacetylation genes, whereas only one protein involved in chitin interactions is present in the H. asinina data set, suggesting that the organic matrix on which calcification proceeds differs fundamentally between these species. 
Large-scale differences in genes expressed

  16. Methods of artificial enlargement of the training set for statistical shape models.

    PubMed

    Koikkalainen, Juha; Tölli, Tuomas; Lauerma, Kirsi; Antila, Kari; Mattila, Elina; Lilja, Mikko; Lötjönen, Jyrki

    2008-11-01

    Due to the small size of training sets, statistical shape models often over-constrain the deformation in medical image segmentation. Hence, artificial enlargement of the training set has been proposed as a way to increase the flexibility of the models. In this paper, different methods were evaluated to artificially enlarge a training set. Furthermore, the objectives were to study the effects of the size of the training set, to estimate the optimal number of deformation modes, to study the effects of different error sources, and to compare different deformation methods. The study was performed for a cardiac shape model consisting of ventricles, atria, and epicardium, and built from magnetic resonance (MR) volume images of 25 subjects. Both shape modeling and image segmentation accuracies were studied. The objectives were reached by utilizing different training sets and datasets, and two deformation methods. The evaluation showed that artificial enlargement of the training set improves both the modeling and segmentation accuracy. All but one of the enlargement techniques gave statistically significantly (p < 0.05) better segmentation results than the standard method without enlargement. The two best enlargement techniques were the nonrigid movement technique and the technique that combines principal component analysis (PCA) and finite element model (FEM). The optimal number of deformation modes was found to be near 100 modes in our application. The active shape model segmentation gave better segmentation accuracy than the one based on the simulated annealing optimization of the model weights.

  17. Textrous!: extracting semantic textual meaning from gene sets.

    PubMed

    Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Siddiqui, Sana; Luttrell, Louis M; Maudsley, Stuart

    2013-01-01

    The un-biased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically-relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily-appreciable scientific textual 'tokens' from large gene sets either rely on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employ Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! has the ability to generate meaningful output data with even very small input datasets, using two different text extraction methodologies (collective and individual) for the selecting, ranking, clustering, and visualization of English words obtained from the user data. Textrous!, therefore, is able to facilitate the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

  18. The essential gene set of a photosynthetic organism

    PubMed Central

    Rubin, Benjamin E.; Wetmore, Kelly M.; Price, Morgan N.; Diamond, Spencer; Shultzaberger, Ryan K.; Lowe, Laura C.; Curtin, Genevieve; Arkin, Adam P.; Deutschbauer, Adam; Golden, Susan S.

    2015-01-01

    Synechococcus elongatus PCC 7942 is a model organism used for studying photosynthesis and the circadian clock, and it is being developed for the production of fuel, industrial chemicals, and pharmaceuticals. To identify a comprehensive set of genes and intergenic regions that impacts fitness in S. elongatus, we created a pooled library of ∼250,000 transposon mutants and used sequencing to identify the insertion locations. By analyzing the distribution and survival of these mutants, we identified 718 of the organism’s 2,723 genes as essential for survival under laboratory conditions. The validity of the essential gene set is supported by its tight overlap with well-conserved genes and its enrichment for core biological processes. The differences noted between our dataset and these predictors of essentiality, however, have led to surprising biological insights. One such finding is that genes in a large portion of the TCA cycle are dispensable, suggesting that S. elongatus does not require a cyclic TCA process. Furthermore, the density of the transposon mutant library enabled individual and global statements about the essentiality of noncoding RNAs, regulatory elements, and other intergenic regions. In this way, a group I intron located in tRNALeu, which has been used extensively for phylogenetic studies, was shown here to be essential for the survival of S. elongatus. Our survey of essentiality for every locus in the S. elongatus genome serves as a powerful resource for understanding the organism’s physiology and defines the essential gene set required for the growth of a photosynthetic organism. PMID:26508635

  19. Statistical Mechanics of Horizontal Gene Transfer in Evolutionary Ecology

    NASA Astrophysics Data System (ADS)

    Chia, Nicholas; Goldenfeld, Nigel

    2011-04-01

    The biological world, especially its majority microbial component, is strongly interacting and may be dominated by collective effects. In this review, we provide a brief introduction for statistical physicists of the way in which living cells communicate genetically through transferred genes, as well as the ways in which they can reorganize their genomes in response to environmental pressure. We discuss how genome evolution can be thought of as related to the physical phenomenon of annealing, and describe the sense in which genomes can be said to exhibit an analogue of information entropy. As a direct application of these ideas, we analyze the variation with ocean depth of transposons in marine microbial genomes, predicting trends that are consistent with recent observations using metagenomic surveys.

  20. Comparisons of power of statistical methods for gene-environment interaction analyses.

    PubMed

    Ege, Markus J; Strachan, David P

    2013-10-01

    Any genome-wide analysis is hampered by reduced statistical power due to multiple comparisons. This is particularly true for interaction analyses, which have lower statistical power than analyses of associations. To assess gene-environment interactions in population settings we have recently proposed a statistical method based on a modified two-step approach, where first genetic loci are selected by their associations with disease and environment, respectively, and subsequently tested for interactions. We have simulated various data sets resembling real world scenarios and compared single-step and two-step approaches with respect to true positive rate (TPR) in 486 scenarios and (study-wide) false positive rate (FPR) in 252 scenarios. Our simulations confirmed that in all two-step methods the two steps are not correlated. In terms of TPR, two-step approaches combining information on gene-disease association and gene-environment association in the first step were superior to all other methods, while preserving a low FPR in over 250 million simulations under the null hypothesis. Our weighted modification yielded the highest power across various degrees of gene-environment association in the controls. An optimal threshold for step 1 depended on the interacting allele frequency and the disease prevalence. In all scenarios, the least powerful method was to proceed directly to an unbiased full interaction model, applying conventional genome-wide significance thresholds. This simulation study confirms the practical advantage of two-step approaches to interaction testing over more conventional one-step designs, at least in the context of dichotomous disease outcomes and other parameters that might apply in real-world settings.
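
    The general shape of a two-step interaction scan can be sketched as follows. This is a minimal illustration only: the single screening p-value stands in for whichever step-1 statistic is used, the Bonferroni correction in step 2 assumes the two steps are independent, the weighted modification described above is not implemented, and all names and p-values are hypothetical.

```python
def two_step_interaction_scan(loci, step1_alpha=0.05, family_alpha=0.05):
    """Return loci whose interaction test passes after step-1 screening.

    loci maps a locus name to (screening_p, interaction_p).
    """
    # Step 1: keep loci whose screening statistic passes a lenient threshold.
    passed = {name: p_int for name, (p_screen, p_int) in loci.items()
              if p_screen < step1_alpha}
    if not passed:
        return []
    # Step 2: Bonferroni-correct only for the loci that reached this step.
    threshold = family_alpha / len(passed)
    return sorted(name for name, p_int in passed.items() if p_int < threshold)

# Hypothetical p-values for four loci (not from the study).
example = {
    "rs1": (0.001, 0.020),
    "rs2": (0.300, 0.030),
    "rs3": (0.020, 0.200),
    "rs4": (0.700, 0.040),
}
hits = two_step_interaction_scan(example)
print(hits)  # ['rs1']
```

    In this toy input, a one-step scan correcting for all four loci would need p < 0.0125 and would miss rs1, whereas the two-step scan corrects for only the two loci that pass screening.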

  1. Gene Set Analysis: A Step-By-Step Guide

    PubMed Central

    Mooney, Michael A.; Wilmot, Beth

    2015-01-01

    To maximize the potential of genome-wide association studies, many researchers are performing secondary analyses to identify sets of genes jointly associated with the trait of interest. Although methods for gene-set analyses (GSA), also called pathway analyses, have been around for more than a decade, the field is still evolving. There are numerous algorithms available for testing the cumulative effect of multiple SNPs, yet no real consensus in the field about the best way to perform a GSA. This paper provides an overview of the factors that can affect the results of a GSA, the lessons learned from past studies, and suggestions for how to make analysis choices that are most appropriate for different types of data. PMID:26059482

  2. Imputing gene expression from optimally reduced probe sets

    PubMed Central

    Donner, Yoni; Feng, Ting; Benoist, Christophe; Koller, Daphne

    2012-01-01

    Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We develop several algorithms to simultaneously select probes and impute missing values, and demonstrate that these probe selection for imputation (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple metrics of biological importance. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation. PMID:23064520

  3. Statistical plant set estimation using Schroeder-phased multisinusoidal input design

    NASA Technical Reports Server (NTRS)

    Bayard, D. S.

    1992-01-01

    A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
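
    The property that a multisinusoid places input energy only at chosen discrete frequencies can be checked numerically. The sketch below assumes equal amplitudes and one common form of the Schroeder phase rule, phi_k = -pi*k*(k-1)/N; the sign convention, harmonic count and sample count are illustrative choices, not values from the report.

```python
import cmath
import math

def schroeder_multisine(n_harmonics, n_samples):
    """One period of an equal-amplitude multisine with Schroeder phases."""
    phases = [-math.pi * k * (k - 1) / n_harmonics
              for k in range(1, n_harmonics + 1)]
    return [sum(math.cos(2 * math.pi * k * t / n_samples + phases[k - 1])
                for k in range(1, n_harmonics + 1))
            for t in range(n_samples)]

def dft_magnitudes(signal):
    """Magnitude of each DFT bin (direct O(n^2) sum; fine for short signals)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * t / n)
                    for t, x in enumerate(signal)))
            for b in range(n)]

def crest_factor(signal):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return max(abs(x) for x in signal) / rms

N_HARMONICS, N_SAMPLES = 16, 256
u = schroeder_multisine(N_HARMONICS, N_SAMPLES)
mags = dft_magnitudes(u)
# Bins (below Nyquist) that carry energy: only the excited harmonics.
excited = [b for b in range(N_SAMPLES // 2) if mags[b] > 1e-6]
# Same harmonics with all phases set to zero, for comparison.
zero_phase = [sum(math.cos(2 * math.pi * k * t / N_SAMPLES)
                  for k in range(1, N_HARMONICS + 1))
              for t in range(N_SAMPLES)]
print(excited)
print(crest_factor(u) < crest_factor(zero_phase))  # True
```

    The phase rule does not change where the energy sits; it spreads the sinusoids in time so that the peak amplitude stays low for a given input power.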

  4. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

    PubMed Central

    Fan, Jean; Salathia, Neeraj; Liu, Rui; Kaeser, Gwendolyn E.; Yung, Yun C.; Herman, Joseph L.; Kaper, Fiona; Fan, Jian-Bing; Zhang, Kun; Chun, Jerold; Kharchenko, Peter V.

    2016-01-01

    The transcriptional state of a cell reflects a variety of biological factors, from persistent cell-type specific features to transient processes such as cell cycle. Depending on biological context, all such aspects of transcriptional heterogeneity may be of interest, but detecting them from noisy single-cell RNA-seq data remains challenging. We developed PAGODA to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability amongst measured cells. PMID:26780092

  6. Quantum Statistical Mechanical Derivation of the Second Law of Thermodynamics: A Hybrid Setting Approach.

    PubMed

    Tasaki, Hal

    2016-04-29

    Based on quantum statistical mechanics and microscopic quantum dynamics, we prove Planck's and Kelvin's principles for macroscopic systems in a general and realistic setting. We consider a hybrid quantum system that consists of the thermodynamic system, which is initially in thermal equilibrium, and the "apparatus" which operates on the former, and assume that the whole system evolves autonomously. This provides a satisfactory derivation of the second law for macroscopic systems. PMID:27176507

  7. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    NASA Astrophysics Data System (ADS)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. Its widespread use is largely due to its ability to analyze thousands of genes simultaneously, in a massively parallel manner, in one experiment. Hence, it provides valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T^2 statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, Hotelling's T^2 statistic is combined with a shrinkage approach as an alternative estimator of the covariance matrix to detect significant gene sets. The shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator of the covariance matrix into an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
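
    For orientation, the two-sample Hotelling statistic and a generic linear shrinkage estimator can be written as below. This is a sketch under stated assumptions: the diagonal target D (for example, the diagonal of S) and the intensity lambda are illustrative choices, and the abstract does not specify the authors' target, their choice of lambda, or how the trimmed mean enters.

```latex
% S: pooled sample covariance of the p genes in the set (singular when p > n_1 + n_2 - 2)
% D: diagonal shrinkage target with positive entries (assumed here)
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^{\top} S^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S^{*} = \lambda D + (1 - \lambda)\, S, \quad 0 < \lambda \le 1 .
```

    Because S is positive semidefinite and lambda*D is positive definite, S* is positive definite and therefore invertible, so it can replace S in T^2 even when genes outnumber samples.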

  8. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings in the context of RNA-seq. PMID:26342128
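
    The difference between the self-contained and competitive hypotheses can be illustrated with two textbook tests. Neither function is one of the methods evaluated in the paper: Fisher's combined probability test and the hypergeometric over-representation test are stand-ins, gene-level p-values are assumed independent, and all inputs are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Self-contained test: combine the set's gene-level p-values (Fisher).

    The statistic -2*sum(ln p) is chi-square with 2k degrees of freedom
    under independence; for even degrees of freedom the tail is closed-form.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals statistic / 2
    term, total = 1.0, 0.0
    for i in range(k):
        total += term                 # adds half_stat**i / i!
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * total

def hypergeometric_enrichment_p(n_genes, n_significant, set_size, set_significant):
    """Competitive test: is the set richer in significant genes than the rest?"""
    total = math.comb(n_genes, set_size)
    upper = min(set_size, n_significant)
    return sum(math.comb(n_significant, x)
               * math.comb(n_genes - n_significant, set_size - x)
               for x in range(set_significant, upper + 1)) / total

# Hypothetical gene set of five genes with gene-level p-values.
set_p = [0.04, 0.03, 0.20, 0.01, 0.06]
p_self = fisher_combined_p(set_p)
# Hypothetical background: 1000 genes, 100 of them with p < 0.05;
# three of the five set genes fall below that cutoff.
p_comp = hypergeometric_enrichment_p(1000, 100, 5, 3)
print(p_self, p_comp)
```

    The self-contained test uses only the genes in the set, while the competitive test needs the rest of the genome as a reference, which is why the two can disagree on the same data.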

  9. A novel method to quantify gene set functional association based on gene ontology.

    PubMed

    Lv, Sali; Li, Yan; Wang, Qianghu; Ning, Shangwei; Huang, Teng; Wang, Peng; Sun, Jie; Zheng, Yan; Liu, Weisha; Ai, Jing; Li, Xia

    2012-05-01

    Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. These gene sets are distinct but related to each other in many cases; therefore, efforts have been made to compare gene sets for studies such as those evaluating the reproducibility of different experiments. Comparison in terms of biological function has been demonstrated to be helpful to biologists. We improved the measurement of semantic similarity to quantify the functional association between gene sets in the context of gene ontology and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS). Validation based on protein complexes for which the functional associations are known demonstrated that the GSFS scores tend to be correlated with sequence similarity scores and that complexes with high GSFS scores tend to be involved in the same functional catalogue. Compared with the pairwise method and the annotation method, the GSFS shows better discrimination and more accurately reflects the known functional catalogues shared between complexes. Case studies comparing differentially expressed genes of prostate tumour samples from different microarray platforms and identifying coronary heart disease susceptibility pathways revealed that the method could contribute to future studies exploring the molecular basis of complex disorders.

  10. Multivariate statistical approach to a data set of dioxin and furan contaminations in human milk

    SciTech Connect

    Lindstrom, G.U.M.; Sjostrom, M.; Swanson, S.E.; Furst, P.; Kruger, C.; Meemken, H.A.; Groebel, W.

    1988-05-01

    The levels of chlorinated dibenzodioxins, PCDDs, and dibenzofurans, PCDFs, in human milk have been of great concern after the discovery of the toxic 2,3,7,8-substituted isomers in milk of European origin. As knowledge of environmental contamination of human breast milk increases, questions will continue to be asked about possible risks from breast feeding. Before any recommendations can be made, there must be knowledge of contaminant levels in mothers' breast milk. Researchers have measured PCB and 17 different dioxins and furans in human breast milk samples. To date the data has only been analyzed by univariate and bivariate statistical methods. However to extract as much information as possible from this data set, multivariate statistical methods must be used. Here the authors present a multivariate analysis where the relationships between the polychlorinated compounds and the personalia of the mothers have been studied. For the data analysis partial least squares (PLS) modelling has been used.

  11. Regionalisation of statistical model outputs creating gridded data sets for Germany

    NASA Astrophysics Data System (ADS)

    Höpp, Simona Andrea; Rauthe, Monika; Deutschländer, Thomas

    2016-04-01

    The goal of the German research program ReKliEs-De (regional climate projection ensembles for Germany, http://.reklies.hlug.de) is to distribute robust information about the range and the extremes of future climate for Germany and its neighbouring river catchment areas. This joint research project is supported by the German Federal Ministry of Education and Research (BMBF) and was initiated by the German Federal States. The Project results are meant to support the development of adaptation strategies to mitigate the impacts of future climate change. The aim of our part of the project is to adapt and transfer the regionalisation methods of the gridded hydrological data set (HYRAS) from daily station data to the station based statistical regional climate model output of WETTREG (regionalisation method based on weather patterns). The WETTREG model output covers the period of 1951 to 2100 with a daily temporal resolution. For this, we generate a gridded data set of the WETTREG output for precipitation, air temperature and relative humidity with a spatial resolution of 12.5 km x 12.5 km, which is common for regional climate models. Thus, this regionalisation allows comparing statistical to dynamical climate model outputs. The HYRAS data set was developed by the German Meteorological Service within the German research program KLIWAS (www.kliwas.de) and consists of daily gridded data for Germany and its neighbouring river catchment areas. It has a spatial resolution of 5 km x 5 km for the entire domain for the hydro-meteorological elements precipitation, air temperature and relative humidity and covers the period of 1951 to 2006. After conservative remapping the HYRAS data set is also convenient for the validation of climate models. 
The presentation will consist of two parts to present the actual state of the adaptation of the HYRAS regionalisation methods to the statistical regional climate model WETTREG: First, an overview of the HYRAS data set and the regionalisation

  12. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

    PubMed Central

    Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

    2013-01-01

    We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classifying patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR allows reliable detection of active assemblies in massively parallel spike trains. PMID:24167487
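
    The notions of support, closed item sets and signatures (z, c) can be made concrete on a toy input. The sketch below is a brute-force illustration, not the optimized FIM search from the paper; it covers only the counting step, not the surrogate-based significance test, and the function name, thresholds and spike data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def closed_pattern_signatures(bins, min_size=2, min_support=2):
    """Count closed synchronous patterns per signature (size z, support c).

    bins is a list of sets, each holding the neurons that spiked in one
    time bin. Brute force over subsets, so only suitable for tiny inputs.
    """
    candidates = set()
    for active in bins:
        for size in range(min_size, len(active) + 1):
            candidates.update(frozenset(c)
                              for c in combinations(sorted(active), size))
    # Support: number of bins in which every neuron of the pattern spiked.
    support = {p: sum(1 for active in bins if p <= active) for p in candidates}
    frequent = {p: c for p, c in support.items() if c >= min_support}
    # A pattern is closed if no strict superset occurs equally often.
    closed = {p: c for p, c in frequent.items()
              if not any(p < q and c == cq for q, cq in frequent.items())}
    spectrum = Counter((len(p), c) for p, c in closed.items())
    return closed, spectrum

# Hypothetical binned activity of five neurons over seven time bins.
bins = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {4, 5}, {2, 3, 4}, set(), {1, 2, 3}]
closed, spectrum = closed_pattern_signatures(bins)
print(dict(spectrum))  # {(2, 4): 2, (3, 3): 1}
```

    Here the pair {1, 3} is dropped because the triple {1, 2, 3} occurs just as often, which is the sense in which a pattern is "trivially explained" by a superset.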

  13. On application of constitutional descriptors for merging of quinoxaline data sets using linear statistical methods.

    PubMed

    Ghosh, Payel; Vracko, Marjan; Chattopadhyay, Asis Kumar; Bagchi, Manish C

    2008-08-01

    The present paper is an attempt to unify two different quinoxaline data sets with a wide range of substituents in 2, 3, 7, and 8 positions having excellent antitubercular activities with a view to developing robust and reliable structure-activity relationships. The merging has been performed for these two sets of quinoxaline 1,4-di-N-oxide derivatives comprising 29 and 18 compounds, respectively, on the basis of constitutional descriptors, which denote the structural characterization of the molecules. Principal component analysis was performed to see the distribution of the compounds from two data sets for the constitutional descriptors. The distribution of compounds in score plot based on constitutional descriptors suggests unification of quinoxaline data sets which is useful for the model development. Outlier detection was performed from the standpoint of residual analysis of the partial least squares regression models. The superiority of the constitutional descriptors over other calculated molecular descriptors has been established from the standpoint of leave-one-out cross-validation technique associated with partial least squares regression analysis. Internal validation through the leave-many-out methodology was also performed with good results, assuring the stability of the models. The results obtained from linear partial least squares regression analysis lead to statistically significant and robust quantitative structure-activity relationship modeling.

  14. Level Set Segmentation of Medical Images Based on Local Region Statistics and Maximum a Posteriori Probability

    PubMed Central

    Wang, Yi; Lei, Tao; Fan, Yangyu; Feng, Yan

    2013-01-01

    This paper presents a variational level set method for simultaneous segmentation and bias field estimation of medical images with intensity inhomogeneity. In our model, the statistics of image intensities belonging to each different tissue in local regions are characterized by Gaussian distributions with different means and variances. According to maximum a posteriori probability (MAP) and Bayes' rule, we first derive a local objective function for image intensities in a neighborhood around each pixel. Then this local objective function is integrated with respect to the neighborhood center over the entire image domain to give a global criterion. In level set framework, this global criterion defines an energy in terms of the level set functions that represent a partition of the image domain and a bias field that accounts for the intensity inhomogeneity of the image. Therefore, image segmentation and bias field estimation are simultaneously achieved via a level set evolution process. Experimental results for synthetic and real images show desirable performances of our method. PMID:24302974

  15. Discriminatory power of game-related statistics in 14-15 year age group male volleyball, according to set.

    PubMed

    García-Hermoso, Antonio; Dávila-Romero, Carlos; Saavedra, Jose M

    2013-02-01

    This study compared volleyball game-related statistics by outcome (winners and losers of sets) and set number (total, initial, and last) to identify characteristics that discriminated game performance. Game-related statistics from 314 sets (44 matches) played by teams of male 14- to 15-year-olds in a regional volleyball championship were analysed (2011). Differences between contexts (winning or losing teams) and "set number" (total, initial, and last) were assessed. A discriminant analysis was then performed according to outcome (winners and losers of sets) and "set number" (total, initial, and last). The results showed differences (winning or losing sets) in several variables of Complexes I (attack point and error reception) and II (serve and aces). Game-related statistics which discriminate performance in the sets index the serve, positive reception, and attack point. The predictors of performance at these ages when players are still learning could help coaches plan their training. PMID:23829141

  16. Meta-analysis of differentiating mouse embryonic stem cell gene expression kinetics reveals early change of a small gene set.

    PubMed

    Glover, Clive H; Marin, Michael; Eaves, Connie J; Helgason, Cheryl D; Piret, James M; Bryan, Jennifer

    2006-11-24

    Stem cell differentiation involves critical changes in gene expression. Identification of these should provide endpoints useful for optimizing stem cell propagation as well as potential clues about mechanisms governing stem cell maintenance. Here we describe the results of a new meta-analysis methodology applied to multiple gene expression datasets from three mouse embryonic stem cell (ESC) lines obtained at specific time points during the course of their differentiation into various lineages. We developed methods to identify genes with expression changes that correlated with the altered frequency of functionally defined, undifferentiated ESC in culture. In each dataset, we computed a novel statistical confidence measure for every gene which captured the certainty that a particular gene exhibited an expression pattern of interest within that dataset. This permitted a joint analysis of the datasets, despite the different experimental designs. Using a ranking scheme that favored genes exhibiting patterns of interest, we focused on the top 88 genes whose expression was consistently changed when ESC were induced to differentiate. Seven of these (103728_at, 8430410A17Rik, Klf2, Nr0b1, Sox2, Tcl1, and Zfp42) showed a rapid decrease in expression concurrent with a decrease in frequency of undifferentiated cells and remained predictive when evaluated in additional maintenance and differentiating protocols. Through a novel meta-analysis, this study identifies a small set of genes whose expression is useful for identifying changes in stem cell frequencies in cultures of mouse ESC. The methods and findings have broader applicability to understanding the regulation of self-renewal of other stem cell types.

  17. Statistical criteria to set alarm levels for continuous measurements of ground contamination.

    PubMed

    Brandl, A; Jimenez, A D Herrera

    2008-08-01

    In the course of the decommissioning of the ASTRA research reactor at the site of the Austrian Research Centers at Seibersdorf, the operator and licensee, Nuclear Engineering Seibersdorf, conducted an extensive site survey and characterization to demonstrate compliance with regulatory site release criteria. This survey included radiological characterization of approximately 400,000 m² of open land on the Austrian Research Centers premises. Part of this survey was conducted using a mobile large-area gas proportional counter, continuously recording measurements while it was moved at a speed of 0.5 m s⁻¹. In order to set reasonable investigation levels, two alarm levels based on statistical considerations were developed. This paper describes the derivation of these alarm levels and the operational experience gained by detector deployment in the field. PMID:18617795

  19. Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis.

    PubMed

    Charmpi, Konstantina; Ycart, Bernard

    2015-06-01

    Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then scale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets. This results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative in many cases than the classical GSEA test.
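
    For orientation, the classical GSEA statistic that this abstract starts from can be sketched as a weighted running sum over a ranked gene list, with a Monte Carlo null. This is a minimal sketch and not the WKS test itself: the function names are hypothetical, the null is generated by drawing random gene sets of the same size (classical GSEA commonly permutes sample labels instead), and the p-value is two-sided for simplicity.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Signed maximum deviation of the weighted running sum.

    ranked_genes and scores are aligned and sorted by decreasing score.
    Hits add |score|**p (normalised); misses subtract 1 / (number of misses).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must hit some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Monte Carlo p-value using random gene sets of equal size (assumption)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, scores, random_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

    The simulation loop has to be repeated for every gene set, which is the computational cost that a limiting distribution computed once would avoid.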

  20. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    PubMed Central

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  1. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  2. Statistics

    Cancer.gov

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  3. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data.

    PubMed

    Xia, Jianguo; Gill, Erin E; Hancock, Robert E W

    2015-06-01

    Meta-analysis of gene expression data sets is increasingly performed to help identify robust molecular signatures and to gain insights into underlying biological processes. The complicated nature of such analyses requires both advanced statistics and innovative visualization strategies to support efficient data comparison, interpretation and hypothesis generation. NetworkAnalyst (http://www.networkanalyst.ca) is a comprehensive web-based tool designed to allow bench researchers to perform various common and complex meta-analyses of gene expression data via an intuitive web interface. By coupling well-established statistical procedures with state-of-the-art data visualization techniques, NetworkAnalyst allows researchers to easily navigate large complex gene expression data sets to determine important features, patterns, functions and connections, thus leading to the generation of new biological hypotheses. This protocol provides a step-wise description of how to effectively use NetworkAnalyst to perform network analysis and visualization from gene lists; to perform meta-analysis on gene expression data while taking into account multiple metadata parameters; and, finally, to perform a meta-analysis of multiple gene expression data sets. NetworkAnalyst is designed to be accessible to biologists rather than to specialist bioinformaticians. The complete protocol can be executed in ∼1.5 h. Compared with other similar web-based tools, NetworkAnalyst offers a unique visual analytics experience that enables data analysis within the context of protein-protein interaction networks, heatmaps or chord diagrams. All of these analysis methods provide the user with supporting statistical and functional evidence.

  4. New cyt b gene universal primer set for forensic analysis.

    PubMed

    Lopez-Oceja, A; Gamarra, D; Borragan, S; Jiménez-Moreno, S; de Pancorbo, M M

    2016-07-01

    Analysis of mitochondrial DNA, and in particular the cytochrome b gene (cyt b), has become an essential tool for species identification in routine forensic practice. In cases of degraded samples, where the DNA is fractionated, universal primers that are highly efficient for the amplification of the target region are necessary. Therefore, in the present study a new universal cyt b primer set with high species identification capabilities, even in samples with highly degraded DNA, has been developed. In order to achieve this objective, the primers were designed following the alignment of complete sequences of the cyt b from 751 species from the Class of Mammalia listed in GenBank. A highly variable region of 148bp flanked by highly conserved sequences was chosen for placing the primers. The effectiveness of the new pair of primers was examined in 63 animal species belonging to 38 Families from 14 Orders and 5 Classes (Mammalia, Aves, Reptilia, Actinopterygii, and Malacostraca). Species determination was possible in all cases, which shows that the fragment analyzed provided a high capability for species identification. Furthermore, to ensure the efficiency of the 148bp fragment, the intraspecific variability was analyzed by calculating the concordance between individuals with the BLAST tool from the NCBI (National Center for Biotechnological Information). The intraspecific concordance levels were superior to 97% in all species. Likewise, the phylogenetic information from the selected fragment was confirmed by obtaining the phylogenetic tree from the sequences of the species analyzed. Evidence of the high power of phylogenetic discrimination of the analyzed fragment of the cyt b was obtained, as 93.75% of the species were grouped within their corresponding Orders. Finally, the analysis of 40 degraded samples with small-size DNA fragments showed that the new pair of primers permits identifying the species, even when the DNA is highly degraded as it is very common in

  6. Gene-set activity toolbox (GAT): A platform for microarray-based cancer diagnosis using an integrative gene-set analysis approach.

    PubMed

    Engchuan, Worrawat; Meechai, Asawin; Tongsima, Sissades; Doungpan, Narumol; Chan, Jonathan H

    2016-08-01

    Cancer is a complex disease that cannot be diagnosed reliably using only single gene expression analysis. Using gene-set analysis on high throughput gene expression profiling controlled by various environmental factors is a commonly adopted technique used by the cancer research community. This work develops a comprehensive gene expression analysis tool (gene-set activity toolbox, GAT) that is implemented with a data retriever, traditional data pre-processing, several gene-set analysis methods, network visualization and data mining tools. The gene-set analysis methods are used to identify subsets of phenotype-relevant genes that will be used to build a classification model. To evaluate GAT performance, we performed a cross-dataset validation study on three common cancers, namely colorectal, breast and lung cancers. The results show that GAT can be used to build a reasonable disease diagnostic model and the predicted markers have biological relevance. GAT can be accessed from http://gat.sit.kmutt.ac.th where GAT's java library for gene-set analysis, simple classification and a database with three cancer benchmark datasets can be downloaded. PMID:27102089

  7. Tracking Difference in Gene Expression in a Time-Course Experiment Using Gene Set Enrichment Analysis

    PubMed Central

    Wong, Pui Shan; Tanaka, Michihiro; Sunaga, Yoshihiko; Tanaka, Masayoshi; Taniguchi, Takeaki; Yoshino, Tomoko; Tanaka, Tsuyoshi; Fujibuchi, Wataru; Aburatani, Sachiyo

    2014-01-01

    Fistulifera sp. strain JPCC DA0580 is a newly sequenced pennate diatom that is capable of simultaneously growing and accumulating lipids. This is a unique trait, not found in other related microalgae so far. It is able to accumulate between 40 and 60% of its cell weight in lipids, making it a strong candidate for the production of biofuel. To investigate this characteristic, we used RNA-Seq data gathered at four different times while Fistulifera sp. strain JPCC DA0580 was grown in oil accumulating and non-oil accumulating conditions. We then adapted gene set enrichment analysis (GSEA) to investigate the relationship between the difference in gene expression of 7,822 genes and metabolic functions in our data. We utilized information in the KEGG pathway database to create the gene sets and changed GSEA to use re-sampling so that data from the different time points could be included in the analysis. Our GSEA method identified photosynthesis, lipid synthesis and amino acid synthesis related pathways as processes that play a significant role in oil production and growth in Fistulifera sp. strain JPCC DA0580. In addition to GSEA, we visualized the results by creating a network of compounds and reactions, and plotted the expression data on top of the network. This made existing graph algorithms available to us which we then used to calculate a path that metabolizes glucose into triacylglycerol (TAG) in the smallest number of steps. By visualizing the data this way, we observed a separate up-regulation of genes at different times instead of a concerted response. We also identified two metabolic paths that used fewer reactions than the one shown in KEGG and showed that the reactions were up-regulated during the experiment. The combination of analysis and visualization methods successfully analyzed time-course data, identified important metabolic pathways and provided new hypotheses for further research. PMID:25268590
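
    The "smallest number of steps" path search mentioned in this abstract can be illustrated with a breadth-first search over a compound network. This is a sketch only: the abstract does not say which graph algorithm was used, and the toy network below, with placeholder intermediates A, B and C, is hypothetical and is not the KEGG-derived network from the study.

```python
from collections import deque


def fewest_step_path(reactions, source, target):
    """Breadth-first search for a path with the fewest reaction steps.

    reactions maps a compound to the compounds it can be converted into.
    Returns the list of compounds from source to target, or None.
    """
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        if compound == target:
            path = []
            while compound is not None:  # walk back to the source
                path.append(compound)
                compound = previous[compound]
            return path[::-1]
        for product in reactions.get(compound, ()):
            if product not in previous:
                previous[product] = compound
                queue.append(product)
    return None


# Hypothetical toy network; A, B and C are placeholder intermediates.
toy_network = {
    "glucose": ["A", "B"],
    "A": ["C"],
    "B": ["TAG"],
    "C": ["TAG"],
}
shortest = fewest_step_path(toy_network, "glucose", "TAG")
```

    Breadth-first search suits an unweighted network because it visits compounds in order of step count, so the first time the target is reached the path is already a shortest one.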

  8. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    SciTech Connect

    Tucker, James D.; Joiner, Michael C.; Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V.; Chinkhota, Chantelle N.; Smolinski, Joseph M.; Divine, George W.; Auner, Gregory W.

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.

  9. Gene expression analysis reveals a gene set discriminatory to different metals in soil.

    PubMed

    Nota, Benjamin; Verweij, Rudo A; Molenaar, Douwe; Ylstra, Bauke; van Straalen, Nico M; Roelofs, Dick

    2010-05-01

    Environmental pollution is a worldwide problem, and metals are the largest group of contaminants in soil. Microarray toxicogenomic studies with ecologically relevant organisms, such as springtails, supplement traditional ecotoxicological research but are presently rather descriptive. Classifier analysis, a more analytical application of the microarray technique, is able to predict biological classes of unknown samples. We used the uncorrelated shrunken centroid method to classify gene expression profiles of the springtail Folsomia candida exposed to soil spiked with six different metals (barium, cadmium, cobalt, chromium, lead, and zinc). We identified a gene set (classifier) of 188 genes that can discriminate between six different metals present in soil, which allowed us to predict the correct classes for samples of an independent test set with an accuracy of 83% (error rate = 0.17). This study shows further that in order to apply classifier analysis to actual contaminated field soil samples, more insight and information is needed on the transcriptional responses of soil organisms to different soil types (properties) and mixtures of contaminants. PMID:20133373

  10. A new efficient statistical test for detecting variability in the gene expression data.

    PubMed

    Mathur, Sunil; Dolo, Samuel

    2008-08-01

    DNA microarray technology allows researchers to monitor the expressions of thousands of genes under different conditions. The detection of differential gene expression under two different conditions is very important in microarray studies. Microarray experiments are multi-step procedures and each step is a potential source of variance. This makes the measurement of variability difficult because an approach based on gene-by-gene estimation of variance will have few degrees of freedom. It is highly possible that the assumption of equal variance for all the expression levels may not hold. Also, the assumption of normality of gene expressions may not hold. Thus it is essential to have a statistical procedure which is not based on the normality assumption and which can also detect genes with differential variance efficiently. The detection of differential gene expression variance will allow us to identify experimental variables that affect different biological processes and accuracy of DNA microarray measurements. In this article, a new nonparametric test for scale is developed based on the arctangent of the ratio of two expression levels. Most of the tests available in literature require the assumption of normal distribution, which makes them inapplicable in many situations, and it is also hard to verify the suitability of the normal distribution assumption for the given data set. The proposed test does not require the assumption of the distribution for the underlying population and hence makes it more practical and widely applicable. The asymptotic relative efficiency is calculated under different distributions, which show that the proposed test is very powerful when the assumption of normality breaks down. Monte Carlo simulation studies are performed to compare the power of the proposed test with some of the existing procedures. It is found that the proposed test is more powerful than commonly used tests under almost all the distributions considered in the study. A

  12. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  13. Pancreatic beta cells express a diverse set of homeobox genes.

    PubMed Central

    Rudnick, A; Ling, T Y; Odagiri, H; Rutter, W J; German, M S

    1994-01-01

    Homeobox genes, which are found in all eukaryotic organisms, encode transcriptional regulators involved in cell-type differentiation and development. Several homeobox genes encoding homeodomain proteins that bind and activate the insulin gene promoter have been described. In an attempt to identify additional beta-cell homeodomain proteins, we designed primers based on the sequences of beta-cell homeobox genes cdx3 and lmx1 and the Drosophila homeodomain protein Antennapedia and used these primers to amplify inserts by PCR from an insulinoma cDNA library. The resulting amplification products include sequences encoding 10 distinct homeodomain proteins; 3 of these proteins have not been described previously. In addition, an insert was obtained encoding a splice variant of engrailed-2, a homeodomain protein previously identified in the central nervous system. Northern analysis revealed a distinct pattern of expression for each homeobox gene. Interestingly, the PCR-derived clones do not represent a complete sampling of the beta-cell library because no inserts encoding cdx3 or lmx1 protein were obtained. Beta cells probably express additional homeobox genes. The abundance and diversity of homeodomain proteins found in beta cells illustrate the remarkable complexity and redundancy of the machinery controlling beta-cell development and differentiation. PMID:7991607

  14. Set statistics in conductive bridge random access memory device with Cu/HfO2/Pt structure

    SciTech Connect

    Zhang, Meiyun; Long, Shibing; Wang, Guoming; Xu, Xiaoxin; Li, Yang; Liu, Qi; Lv, Hangbing; Liu, Ming; Lian, Xiaojuan; Miranda, Enrique; Suñé, Jordi

    2014-11-10

    The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO2/Pt structure. The experimental distributions of the set parameters in several off resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electron transport model. Our work provides indications for the improvement of the switching uniformity.
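As a rough illustration of what fitting a Weibull model to a set-parameter distribution can involve, the sketch below estimates the Weibull slope (shape) and scale by least squares on the Weibull plot, ln(-ln(1 - F)) against ln(x). The function name `weibull_fit`, the median-rank estimate of F, and the synthetic sample are assumptions made for illustration; they are not the authors' procedure or data.

```python
import math

def weibull_fit(samples):
    """Estimate Weibull slope (shape) and scale from a sample.

    Uses the linearisation ln(-ln(1 - F)) = beta * ln(x) - beta * ln(scale),
    with F approximated by Bernard's median-rank formula (an assumption).
    """
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        f = (i - 0.3) / (n + 0.4)  # median-rank estimate of the CDF
        points.append((math.log(x), math.log(-math.log(1.0 - f))))
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    beta = sxy / sxx                          # slope of the Weibull plot
    scale = math.exp(mean_x - mean_y / beta)  # from the fitted intercept
    return beta, scale

# Synthetic, noise-free sample placed on the exact quantiles of a Weibull
# distribution with slope 3 and scale 2 (hypothetical values).
true_beta, true_scale, n = 3.0, 2.0, 20
demo = [true_scale * (-math.log(1 - (i - 0.3) / (n + 0.4))) ** (1 / true_beta)
        for i in range(1, n + 1)]
beta, scale = weibull_fit(demo)
```

Because the synthetic points lie exactly on the Weibull line, the fit recovers the slope and scale used to generate them; real set-voltage data would scatter around the line.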

  15. Investigating the different mechanisms of genotoxic and non-genotoxic carcinogens by a gene set analysis.

    PubMed

    Lee, Won Jun; Kim, Sang Cheol; Lee, Seul Ji; Lee, Jeongmi; Park, Jeong Hill; Yu, Kyung-Sang; Lim, Johan; Kwon, Sung Won

    2014-01-01

    Based on the process of carcinogenesis, carcinogens are classified as either genotoxic or non-genotoxic. In contrast to non-genotoxic carcinogens, many genotoxic carcinogens have been reported to cause tumors in carcinogenic bioassays in animals. Thus, evaluating the genotoxicity potential of chemicals is important to discriminate genotoxic from non-genotoxic carcinogens for health care and pharmaceutical industry safety. Additionally, investigating the difference between the mechanisms of genotoxic and non-genotoxic carcinogens could provide the foundation for a mechanism-based classification for unknown compounds. In this study, we investigated the gene expression of HepG2 cells treated with genotoxic or non-genotoxic carcinogens and compared their mechanisms of action. To enhance our understanding of the differences in the mechanisms of genotoxic and non-genotoxic carcinogens, we implemented a gene set analysis using 12 compounds for the training set (12, 24, 48 h) and validated significant gene sets using 22 compounds for the test set (24, 48 h). For a direct biological translation, we conducted a gene set analysis using Globaltest and selected significant gene sets. To validate the results, training and test compounds were predicted by the significant gene sets using a prediction analysis for microarrays (PAM). Finally, we obtained 6 gene sets, including sets enriched for genes involved in the adherens junction, bladder cancer, p53 signaling pathway, pathways in cancer, peroxisome and RNA degradation. Among the 6 gene sets, the bladder cancer and p53 signaling pathway sets were significant at 12, 24 and 48 h. We also found that DDB2, RRM2B and GADD45A, genes related to DNA repair and damage prevention, were consistently up-regulated for genotoxic carcinogens. Our results suggest that a gene set analysis could provide a robust tool in the investigation of the different mechanisms of genotoxic and non-genotoxic carcinogens and construct a more detailed

  16. A reference gene set for chemosensory receptor genes of Manduca sexta.

    PubMed

    Koenig, Christopher; Hirsh, Ariana; Bucks, Sascha; Klinner, Christian; Vogel, Heiko; Shukla, Aditi; Mansfield, Jennifer H; Morton, Brian; Hansson, Bill S; Grosse-Wilde, Ewald

    2015-11-01

    The order of Lepidoptera has historically been crucial for chemosensory research, with many important advances coming from the analysis of species like Bombyx mori or the tobacco hornworm, Manduca sexta. Specifically M. sexta has long been a major model species in the field, especially regarding the importance of olfaction in an ecological context, mainly the interaction with its host plants. In recent years transcriptomic data has led to the discovery of members of all major chemosensory receptor families in the species, but the data was fragmentary and incomplete. Here we present the analysis of the newly available high-quality genome data for the species, supplemented by additional transcriptome data to generate a high quality reference gene set for the three major chemosensory receptor gene families, the gustatory (GR), olfactory (OR) and antennal ionotropic receptors (IR). Coupled with gene expression analysis our approach allows association of specific receptor types and behaviors, like pheromone and host detection. The dataset will provide valuable support for future analysis of these essential chemosensory modalities in this species and in Lepidoptera in general. PMID:26365739

  17. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

    PubMed Central

    Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

    2016-01-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets, upload BED files, an improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
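One common formulation of enrichment analysis is a one-sided hypergeometric (over-representation) test: given a universe of genes, an annotated gene set, and a query list, how surprising is the observed overlap? The sketch below shows that calculation only. The function name `enrichment_pvalue` and the toy counts are assumptions, and nothing here should be read as the scoring that Enrichr itself applies.

```python
from math import comb

def enrichment_pvalue(n_universe, n_set, n_query, n_overlap):
    """One-sided hypergeometric test: P(overlap >= n_overlap).

    n_universe: number of genes that could have been drawn
    n_set:      size of the annotated gene set
    n_query:    size of the submitted gene list
    n_overlap:  genes shared by the set and the list
    """
    upper = min(n_set, n_query)
    total = comb(n_universe, n_query)
    tail = sum(comb(n_set, k) * comb(n_universe - n_set, n_query - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Toy case: all 5 query genes fall in a 5-gene set within a 10-gene universe.
p_full_overlap = enrichment_pvalue(10, 5, 5, 5)
```

In the toy case only one of the 252 possible 5-gene lists overlaps the set completely, so the p-value is 1/252. When many gene sets are tested at once, a multiple-testing correction would be needed on top of this.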

  18. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-01

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets, upload BED files, an improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961

  19. Fundamental Limitations of High Contrast Imaging Set by Small Sample Statistics

    NASA Astrophysics Data System (ADS)

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-01

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  20. Fundamental limitations of high contrast imaging set by small sample statistics

    SciTech Connect

    Mawet, D.; Milli, J.; Wahhaj, Z.; Pelat, D.; Absil, O.; Delacroix, C.; Boccaletti, A.; Kasper, M.; Kenworthy, M.; Marois, C.; Mennesson, B.; Pueyo, L.

    2014-09-10

    In this paper, we review the impact of small sample statistics on detection thresholds and corresponding confidence levels (CLs) in high-contrast imaging at small angles. When looking close to the star, the number of resolution elements decreases rapidly toward small angles. This reduction of the number of degrees of freedom dramatically affects CLs and false alarm probabilities. Naively using the same ideal hypothesis and methods as for larger separations, which are well understood and commonly assume Gaussian noise, can yield up to one order of magnitude error in contrast estimations at fixed CL. The statistical penalty exponentially increases toward very small inner working angles. Even at 5-10 resolution elements from the star, false alarm probabilities can be significantly higher than expected. Here we present a rigorous statistical analysis that ensures robustness of the CL, but also imposes a substantial limitation on corresponding achievable detection limits (thus contrast) at small angles. This unavoidable fundamental statistical effect has a significant impact on current coronagraphic and future high-contrast imagers. Finally, the paper concludes with practical recommendations to account for small number statistics when computing the sensitivity to companions at small angles and when exploiting the results of direct imaging planet surveys.

  2. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    PubMed

    Chung, Feng-Hsiang; Jin, Zhen-Hua; Hsu, Tzu-Ting; Hsu, Chueh-Lin; Liu, Hsueh-Chuan; Lee, Hoong-Chien

    2015-01-01

    Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights into the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases. PMID:26473729

  3. Gene-Set Local Hierarchical Clustering (GSLHC)—A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups

    PubMed Central

    Hsu, Tzu-Ting; Hsu, Chueh-Lin; Liu, Hsueh-Chuan; Lee, Hoong-Chien

    2015-01-01

    Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights into the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases. PMID:26473729

  4. Non-Gaussian statistics of critical sets in 2D and 3D: Peaks, voids, saddles, genus, and skeleton

    NASA Astrophysics Data System (ADS)

    Gay, Christophe; Pichon, Christophe; Pogosyan, Dmitry

    2012-01-01

    The formalism to compute the geometrical and topological one-point statistics of mildly non-Gaussian two-dimensional and three-dimensional (3D) cosmological fields is developed. Leveraging the isotropy of the target statistics, the Gram-Charlier expansion is reformulated with rotation-invariant variables. This formulation allows us to track the geometrical statistics of the cosmic field to all orders. It then allows us to connect the one-point statistics of the critical sets to the growth factor through perturbation theory, which predicts the redshift evolution of higher-order cumulants. In particular, the cosmic nonlinear evolution of the skeleton’s length, together with the statistics of extrema and Euler characteristic are investigated in turn. In two dimensions, the corresponding differential densities are analytic as a function of the excursion set threshold and the shape parameter. In 3D, the Euler characteristics and the field isosurface area are also analytic to all orders in the expansion. Numerical integrations are performed and simple fits are provided whenever closed form expressions are not available. These statistics are compared to estimates from N-body simulations and are shown to match well the cosmic evolution up to root mean square of the density field of ~0.2. In 3D, gravitational perturbation theory is implemented to predict the cosmic evolution of all the relevant Gram-Charlier coefficients for universes with scale-invariant matter distribution. The one-point statistics of critical sets could be used to constrain primordial non-Gaussianities and the dark energy equation of state on upcoming cosmic surveys; this is illustrated on idealized experiments.

  5. Degrees of separation as a statistical tool for evaluating candidate genes.

    PubMed

    Nelson, Ronald M; Pettersson, Mats E

    2014-12-01

    Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available.
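The degree of separation between two genes can be read as the shortest path length between them in a gene network, and the network-wide average gives the baseline against which a candidate pair is judged. The sketch below computes both with a breadth-first search on a toy network. The function names and the adjacency-list layout are assumptions for illustration and do not reflect the CandidateBacon package's interface.

```python
from collections import deque
from itertools import combinations

def degrees_of_separation(network, source, target):
    """Shortest path length (in edges) between two genes, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def average_dos(network, genes):
    """Mean degree of separation over all connected pairs among `genes`."""
    dists = []
    for a, b in combinations(genes, 2):
        d = degrees_of_separation(network, a, b)
        if d is not None:  # unconnected pairs are skipped (a design choice)
            dists.append(d)
    return sum(dists) / len(dists) if dists else None

# Toy undirected network: a chain A-B-C-D plus an isolated gene E.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
pair_dos = degrees_of_separation(toy_network, "A", "D")
baseline = average_dos(toy_network, ["A", "B", "C", "D"])
```

A candidate pair whose separation is well below the baseline would be better supported by the network than one sitting at or above it; how to turn that comparison into a formal test is left open here.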

  6. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

    PubMed Central

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  7. A Demonstration of Using Person-Fit Statistics in Standard Setting.

    ERIC Educational Resources Information Center

    Bay, Luz; Nering, Michael L.

    The use of person-fit methods to determine the extent to which a panelist's ratings fit the item response theory (IRT) models used in the National Assessment of Educational Progress (NAEP) is demonstrated. Person-fit methods are statistical methods that allow the identification of nonfitting response vectors. To determine whether panelists'…

  8. Mechanical Unloading of Mouse Bone in Microgravity Significantly Alters Cell Cycle Gene Set Expression

    NASA Astrophysics Data System (ADS)

    Blaber, Elizabeth; Dvorochkin, Natalya; Almeida, Eduardo; Kaplan, Warren; Burns, Brendan

    2012-07-01

    unloading in spaceflight, we conducted genome-wide microarray analysis of total RNA isolated from the mouse pelvis. Specifically, 16-week-old mice were subjected to 15 days of spaceflight onboard NASA's STS-131 space shuttle mission. The pelvis of the mice was dissected, the bone marrow was flushed and the bones were briefly stored in RNAlater. The pelves were then homogenized, and RNA was isolated using TRIzol. RNA concentration and quality were measured using a Nanodrop spectrometer and 0.8% agarose gel electrophoresis. Samples of cDNA were analyzed using an Affymetrix GeneChip® Gene 1.0 ST (Sense Target) Array System for Mouse and GenePattern Software. We normalized the ST gene arrays using Robust Multichip Average (RMA) normalization, which summarizes perfectly matched spots on the array through the median polish algorithm, rather than normalizing according to mismatched spots. We also used Limma for statistical analysis, using the BioConductor Limma Library by Gordon Smyth, and differential expression analysis to identify genes with significant changes in expression between the two experimental conditions. Finally, we used GSEApreRanked for Gene Set Enrichment Analysis (GSEA), with Kolmogorov-Smirnov style statistics to identify groups of genes that are regulated together using the t-statistics derived from Limma. Preliminary results show that 6,603 genes expressed in pelvic bone had statistically significant alterations in spaceflight compared to ground controls. These prominently included cell cycle arrest molecules p21 and p18, cell survival molecule Crbp1, and cell cycle molecules cyclin D1 and Cdk1. Additionally, GSEA results indicated alterations in molecular targets of cyclin D1 and Cdk4, senescence pathways resulting from abnormal laminin maturation, cell-cell contacts via E-cadherin, and several pathways relating to protein translation and metabolism. In total 111 gene sets out of 2,488, about 4%, showed statistically significant set alterations. These
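The Kolmogorov-Smirnov style statistic mentioned above can be pictured as a running sum over a gene list ranked by t-statistic: the sum steps up at genes in the set and down at genes outside it, and the enrichment score is the largest deviation from zero. The sketch below is a minimal version of that idea. The function name `enrichment_score`, the `weight` parameter, and the toy ranking are assumptions; the code is not the GSEA software, and the permutation step used to assess significance is omitted.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted KS-like running-sum score for a pre-ranked gene list.

    ranked:   list of (gene, statistic) pairs sorted by descending statistic
    gene_set: set of gene names to test
    weight:   exponent on |statistic| for hits (0 gives the classic KS form)
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Assumes at least one hit has a non-zero statistic when weight > 0.
    norm = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, hits) if hit)
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        if hit:
            running += abs(stat) ** weight / norm  # step up at set members
        else:
            running -= 1.0 / n_miss                # step down elsewhere
        if abs(running) > abs(best):
            best = running                         # keep the extreme deviation
    return best

# Hypothetical ranking by t-statistic; the set sits at the top of the list.
toy_ranking = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
               ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_top = enrichment_score(toy_ranking, {"g1", "g2"})
es_bottom = enrichment_score(toy_ranking, {"g5", "g6"})
```

A set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom yields a score near -1.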

  9. Retroviruses and yeast retrotransposons use overlapping sets of host genes

    PubMed Central

    Irwin, Becky; Aye, Michael; Baldi, Pierre; Beliakova-Bethell, Nadejda; Cheng, Henry; Dou, Yimeng; Liou, Willy; Sandmeyer, Suzanne

    2005-01-01

    A collection of 4457 Saccharomyces cerevisiae mutants deleted for nonessential genes was screened for mutants with increased or decreased mobilization of the gypsylike retroelement Ty3. Of these, 64 exhibited increased and 66 decreased Ty3 transposition compared with the parental strain. Genes identified in this screen were grouped according to function by using GOnet software developed as part of this study. Gene clusters were related to chromatin and transcript elongation, translation and cytoplasmic RNA processing, vesicular trafficking, nuclear transport, and DNA maintenance. Sixty-six of the mutants were tested for Ty3 proteins and cDNA. Ty3 cDNA and transposition were increased in mutants affected in nuclear pore biogenesis and in a subset of mutants lacking proteins that interact physically or genetically with a replication clamp loader. Our results suggest that nuclear entry is linked mechanistically to Ty3 cDNA synthesis but that host replication factors antagonize Ty3 replication. Some of the factors we identified have been previously shown to affect Ty1 transposition and others to affect retroviral budding. Host factors, such as these, shared by distantly related Ty retroelements and retroviruses are novel candidates for antiviral targets. PMID:15837808

  10. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-01

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. 
This approach is suitable for large quantitative data sets from stable isotope labeling
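For orientation, the rank product of a feature is the geometric mean of its ranks across replicates. The sketch below shows one simple way to compute it when some values are missing, by averaging only over the replicates in which the feature was observed. That handling of missing values, the function name `rank_products`, and the toy table are assumptions for illustration; the abstract does not describe how the authors' improved method treats them.

```python
import math

def rank_products(table):
    """Geometric mean of per-replicate ranks, skipping missing values.

    table: dict mapping feature -> list of values per replicate (None = missing).
    Rank 1 is the largest value within a replicate.
    """
    n_rep = len(next(iter(table.values())))
    log_sum = {feature: 0.0 for feature in table}
    count = {feature: 0 for feature in table}
    for r in range(n_rep):
        observed = sorted((f for f in table if table[f][r] is not None),
                          key=lambda f: table[f][r], reverse=True)
        for rank, feature in enumerate(observed, start=1):
            log_sum[feature] += math.log(rank)
            count[feature] += 1
    # Features never observed are left out of the result.
    return {f: math.exp(log_sum[f] / count[f]) for f in table if count[f] > 0}

# Hypothetical fold changes for three features in three replicates.
toy_table = {"a": [3.0, 3.0, None], "b": [2.0, 1.0, 2.0], "c": [1.0, 2.0, 1.0]}
rp = rank_products(toy_table)
```

One caveat of this simple scheme is that ranks from replicates with different numbers of observed features are not strictly comparable, so a real analysis would likely normalize them and assess significance by permutation.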

  11. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    PubMed

    Lamparter, David; Marbach, Daniel; Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
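To make the ingredients concrete, the sketch below computes the sum and maximum of per-SNP chi-squared statistics for one gene and combines gene-level p-values with the classical Fisher method. It assumes independent inputs and uses the unmodified Fisher statistic, so it is not Pascal's algorithm: the correction for linkage disequilibrium and the modification to Fisher's method are not reproduced. All function names and numbers are hypothetical.

```python
import math

def gene_statistics(snp_chi2):
    """Sum (average-type signal) and maximum (strongest signal) of SNP chi-squared values."""
    return {"sum": sum(snp_chi2), "max": max(snp_chi2)}

def chi2_sf_even_df(x, df):
    """Survival function of a chi-squared variable with even degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combined_pvalue(pvalues):
    """Classical Fisher method: -2 * sum(ln p) ~ chi-squared with 2k df under the null."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))

# Hypothetical inputs: three SNP statistics for a gene, two gene p-values for a pathway.
stats = gene_statistics([1.2, 6.5, 0.4])
pathway_p = fisher_combined_pvalue([0.05, 0.05])
```

Because every gene contributes its full p-value to the combined statistic, no significance cut-off has to be chosen to decide which genes count as members of the signal, which is the threshold-free property the abstract highlights.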

  12. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics

    PubMed Central

    Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries. PMID:26808494

  13. Re-Conceptualization of Modified Angoff Standard Setting: Unified Statistical, Measurement, Cognitive, and Social Psychological Theories

    ERIC Educational Resources Information Center

    Iyioke, Ifeoma Chika

    2013-01-01

    This dissertation describes a design for training, in accordance with probability judgment heuristics principles, for the Angoff standard setting method. The new training with instruction, practice, and feedback tailored to the probability judgment heuristics principles was called the Heuristic training and the prevailing Angoff method training…

  14. Comprehensive analysis of SET domain gene family in foxtail millet identifies the putative role of SiSET14 in abiotic stress tolerance

    PubMed Central

    Yadav, Chandra Bhan; Muthamilarasan, Mehanathan; Dangi, Anand; Shweta, Shweta; Prasad, Manoj

    2016-01-01

    SET domain-containing genes catalyse histone lysine methylation, which alters chromatin structure and regulates the transcription of genes that are involved in various developmental and physiological processes. The present study identified 53 SET domain-containing genes in the C4 panicoid model foxtail millet (Setaria italica), and the genes were physically mapped onto nine chromosomes. Phylogenetic and structural analyses classified SiSET proteins into five classes (I–V). RNA-seq derived expression profiling showed that SiSET genes were differentially expressed in four tissues, namely leaf, root, stem and spica. Expression analysis using qRT-PCR was performed for 21 SiSET genes under different abiotic stress and hormonal treatments, which showed differential expression of these genes during the late phase of stress and hormonal treatments. Significant upregulation of SiSET genes was observed during cold stress, which has been confirmed by over-expressing a candidate gene, SiSET14, in yeast. Interestingly, hypermethylation was observed in the gene bodies of highly differentially expressed genes, whereas methylation was completely absent in their transcription start sites. This suggested the occurrence of demethylation events during various abiotic stresses, which enhance gene expression. Altogether, the present study would serve as a base for further functional characterization of SiSET genes towards understanding their molecular roles in conferring stress tolerance. PMID:27585852

  15. Comprehensive analysis of SET domain gene family in foxtail millet identifies the putative role of SiSET14 in abiotic stress tolerance.

    PubMed

    Yadav, Chandra Bhan; Muthamilarasan, Mehanathan; Dangi, Anand; Shweta, Shweta; Prasad, Manoj

    2016-01-01

    SET domain-containing genes catalyse histone lysine methylation, which alters chromatin structure and regulates the transcription of genes that are involved in various developmental and physiological processes. The present study identified 53 SET domain-containing genes in the C4 panicoid model foxtail millet (Setaria italica), and the genes were physically mapped onto nine chromosomes. Phylogenetic and structural analyses classified SiSET proteins into five classes (I-V). RNA-seq derived expression profiling showed that SiSET genes were differentially expressed in four tissues, namely leaf, root, stem and spica. Expression analysis using qRT-PCR was performed for 21 SiSET genes under different abiotic stress and hormonal treatments, which showed differential expression of these genes during the late phase of stress and hormonal treatments. Significant upregulation of SiSET genes was observed during cold stress, which has been confirmed by over-expressing a candidate gene, SiSET14, in yeast. Interestingly, hypermethylation was observed in the gene bodies of highly differentially expressed genes, whereas methylation was completely absent in their transcription start sites. This suggested the occurrence of demethylation events during various abiotic stresses, which enhance gene expression. Altogether, the present study would serve as a base for further functional characterization of SiSET genes towards understanding their molecular roles in conferring stress tolerance. PMID:27585852

  16. Integrated Data Collection Analysis (IDCA) Program - Statistical Analysis of RDX Standard Data Sets

    SciTech Connect

    Sandstrom, Mary M.; Brown, Geoffrey W.; Preston, Daniel N.; Pollard, Colin J.; Warner, Kirstin F.; Sorensen, Daniel N.; Remmers, Daniel L.; Phillips, Jason J.; Shelley, Timothy J.; Reyes, Jose A.; Hsu, Peter C.; Reynolds, John G.

    2015-10-30

    The Integrated Data Collection Analysis (IDCA) program is conducting a Proficiency Test for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard. The material was tested as a well-characterized standard several times during the proficiency study to assess differences among participants and the range of results that may arise for well-behaved explosive materials. The analyses show that there are detectable differences among the results from IDCA participants. While these differences are statistically significant, most of them can be disregarded for comparison purposes to assess potential variability when laboratories attempt to measure identical samples using methods assumed to be nominally the same. The results presented in this report include the average sensitivity results for the IDCA participants and the ranges of values obtained. The ranges represent variation about the mean values of the tests of between 26% and 42%. The magnitude of this variation is attributed to differences in operator, method, and environment as well as the use of different instruments that are also of varying age. The results appear to be a good representation of the broader safety testing community based on the range of methods, instruments, and environments included in the IDCA Proficiency Test.

  17. Statistics of dark matter halos in the excursion set peak framework

    SciTech Connect

    Lapi, A.; Danese, L.

    2014-07-01

    We derive approximate yet very accurate analytical expressions for the abundance and clustering properties of dark matter halos in the excursion set peak framework; the latter relies on the standard excursion set approach, but also includes the effects of a realistic filtering of the density field, a mass-dependent threshold for collapse, and the prescription from peak theory that halos tend to form around density maxima. We find that our approximations work excellently for diverse power spectra, collapse thresholds and density filters. Moreover, when adopting a cold dark matter power spectrum, a tophat filtering and a mass-dependent collapse threshold (supplemented with conceivable scatter), our approximated halo mass function and halo bias represent very well the outcomes of cosmological N-body simulations.

  18. Statistical inference of selection and divergence of rice blast resistance gene Pi-ta

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The resistance gene Pi-ta has been effectively used to control rice blast disease worldwide. A few recent studies have described the possible evolution of Pi-ta in cultivated and weedy rice. However, evolutionary statistics used for the studies are too limited to precisely understand selection and d...

  19. Cross-cultural adaptation of research instruments: language, setting, time and statistical considerations

    PubMed Central

    2010-01-01

    Background: Research questionnaires are not always translated appropriately before they are used in new temporal, cultural or linguistic settings. The results based on such instruments may therefore not accurately reflect what they are supposed to measure. This paper aims to illustrate the process and required steps involved in the cross-cultural adaptation of a research instrument using the adaptation process of an attitudinal instrument as an example. Methods: A questionnaire was needed for the implementation of a study in Norway in 2007. There were no appropriate instruments available in Norwegian, thus an Australian-English instrument was cross-culturally adapted. Results: The adaptation process included investigation of conceptual and item equivalence. Two forward and two back-translations were synthesized and compared by an expert committee. Thereafter the instrument was pretested and adjusted accordingly. The final questionnaire was administered to opioid maintenance treatment staff (n=140) and harm reduction staff (n=180). The overall response rate was 84%. The original instrument failed confirmatory analysis. Instead a new two-factor scale was identified and found valid in the new setting. Conclusions: The failure of the original scale highlights the importance of adapting instruments to current research settings. It also emphasizes the importance of ensuring that concepts within an instrument are equal between the original and target language, time and context. If the described stages in the cross-cultural adaptation process had been omitted, the findings would have been misleading, even if presented with apparent precision. Thus, it is important to consider possible barriers when making a direct comparison between different nations, cultures and times. PMID:20144247

  20. Genome-Wide Survey and Developmental Expression Mapping of Zebrafish SET Domain-Containing Genes

    PubMed Central

    Zhou, Ting; Hu, Ming; Fu, Chun-Tang; Zhang, Yong; Jin, Yi; Chen, Yi; Chen, Sai-Juan; Huang, Qiu-Hua; Liu, Ting Xi; Chen, Zhu

    2008-01-01

    SET domain-containing proteins represent an evolutionarily conserved family of epigenetic regulators, which are responsible for most histone lysine methylation. Since some of these genes have been revealed to be essential for embryonic development, we propose that the zebrafish, a vertebrate model organism possessing many advantages for developmental studies, can be utilized to study the biological functions of these genes and the related epigenetic mechanisms during early development. To this end, we have performed a genome-wide survey of zebrafish SET domain genes. 58 genes total have been identified. Although gene duplication events give rise to several lineage-specific paralogs, clear reciprocal orthologous relationship reveals high conservation between zebrafish and human SET domain genes. These data were further subject to an evolutionary analysis ranging from yeast to human, leading to the identification of putative clusters of orthologous groups (COGs) of this gene family. By means of whole-mount mRNA in situ hybridization strategy, we have also carried out a developmental expression mapping of these genes. A group of maternal SET domain genes, which are implicated in the programming of histone modification states in early development, have been identified and predicted to be responsible for all known sites of SET domain-mediated histone methylation. Furthermore, some genes show specific expression patterns in certain tissues at certain stages, suggesting the involvement of epigenetic mechanisms in the development of these systems. These results provide a global view of zebrafish SET domain histone methyltransferases in evolutionary and developmental dimensions and pave the way for using zebrafish to systematically study the roles of these genes during development. PMID:18231586

  1. Statistical Analysis of Hurst Exponents of Essential/Nonessential Genes in 33 Bacterial Genomes

    PubMed Central

    Liu, Xiao; Wang, Baojin; Xu, Luo

    2015-01-01

    Methods for identifying essential genes currently depend predominantly on biochemical experiments. However, there is demand for improved computational methods for determining gene essentiality. In this study, we used the Hurst exponent, a characteristic parameter to describe long-range correlation in DNA, and analyzed its distribution in 33 bacterial genomes. In most genomes (31 out of 33) the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full-gene-set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly. All of the Hurst exponents of essential genes followed a normal distribution, with one exception. We therefore propose that the distribution feature of Hurst exponents of essential genes can be used as a classification index for essential gene prediction in bacteria. For computer-aided design in the field of synthetic biology, this feature can build a restraint for pre- or post-design checking of bacterial essential genes. Moreover, considering the relationship between gene essentiality and evolution, the Hurst exponents could be used as a descriptive parameter related to evolutionary level, or be added to the annotation of each gene. PMID:26067107

  2. Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks

    PubMed Central

    Blatti, Charles; Sinha, Saurabh

    2016-01-01

    Motivation: Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or ‘properties’ such as biological processes, pathways, transcription factor binding sites, etc., one property at a time. This common approach ignores any known relationships among the properties or the genes themselves. It is believed that known biological relationships among genes and their many properties may be exploited to more accurately reveal commonalities of a gene set. Previous work has sought to achieve this by building biological networks that combine multiple types of gene–gene or gene–property relationships, and performing network analysis to identify other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks. Results: We present a network-based method for ranking genes or properties related to a given gene set. Such related genes or properties are identified from among the nodes of a large, heterogeneous network of biological information. Our method involves a random walk with restarts, performed on an initial network with multiple node and edge types that preserve more of the original, specific property information than current methods that operate on homogeneous networks. In this first stage of our algorithm, we find the properties that are the most relevant to the given gene set and extract a subnetwork of the original network, comprising only these relevant properties. We then re-rank genes by their similarity to the given gene set, based on a second random walk with restarts, performed on the above subnetwork. We demonstrate the effectiveness of this algorithm for ranking genes related to Drosophila embryonic development and aggressive responses in the brains of social animals. Availability and Implementation: DRaWR was implemented as
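
    The random walk with restarts mentioned in this abstract can be sketched generically as follows. This is a minimal single-stage illustration on a small homogeneous graph, assuming an undirected weighted adjacency matrix in which every node has at least one edge; it is not the DRaWR implementation, and the function name, restart probability and tolerance are hypothetical choices for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e until the update is below tol.

    adjacency: square list of lists of non-negative edge weights.
    seeds:     indices of the query nodes (e.g. the given gene set).
    W is the column-normalised adjacency matrix and e is the uniform
    distribution over the seed nodes.
    """
    n = len(adjacency)
    seed_list = sorted(set(seeds))
    if not seed_list:
        raise ValueError("at least one seed node is required")

    # Column sums are the total outgoing weight of each node.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s <= 0 for s in col_sums):
        raise ValueError("every node needs at least one edge in this sketch")

    restart = [0.0] * n
    for s in seed_list:
        restart[s] = 1.0 / len(seed_list)

    p = restart[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            walk = sum(adjacency[i][j] / col_sums[j] * p[j] for j in range(n))
            new_p.append((1.0 - restart_prob) * walk + restart_prob * restart[i])
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with node 0 as the only seed.
    graph = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
    print(random_walk_with_restart(graph, seeds=[0]))
```

    Nodes closer to the seed set receive higher stationary probability, which is what makes the resulting vector usable as a relevance ranking.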

  3. A statistical investigation into the stability of iris recognition in diverse population sets

    NASA Astrophysics Data System (ADS)

    Howard, John J.; Etter, Delores M.

    2014-05-01

    Iris recognition is increasingly being deployed on population wide scales for important applications such as border security, social service administration, criminal identification and general population management. The error rates for this incredibly accurate form of biometric identification are established using well known, laboratory quality datasets. However, it has long been acknowledged in biometric theory that not all individuals have the same likelihood of being correctly serviced by a biometric system. Typically, techniques for identifying clients that are likely to experience a false non-match or a false match error are carried out on a per-subject basis. This research makes the novel hypothesis that certain ethnical denominations are more or less likely to experience a biometric error. Through established statistical techniques, we demonstrate this hypothesis to be true and document the notable effect that the ethnicity of the client has on iris similarity scores. Understanding the expected impact of ethnical diversity on iris recognition accuracy is crucial to the future success of this technology as it is deployed in areas where the target population consists of clientele from a range of geographic backgrounds, such as border crossings and immigration check points.

  4. Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression.

    PubMed

    Zhang, Shao-Wu; Shao, Dong-Dong; Zhang, Song-Yao; Wang, Yi-Bin

    2014-06-01

    The identification of disease genes is very important not only to provide greater understanding of gene function and cellular mechanisms which drive human disease, but also to enhance human disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental approaches to validate the many candidates are usually time-consuming, tedious and expensive, and sometimes lack reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly utilize the observation that genes causing the same or similar diseases tend to correlate with each other in gene-protein relationship networks. Of these network approaches, the random walk with restart algorithm (RWR) is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes, by enlarging the seed set according to the centrality of disease genes in a network and fusing information of the protein-protein interaction (PPI) network topological similarity and the gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set consisting of the known disease genes and their k-nearest neighbor nodes, then walks in the global network separately guided by the similarity transition matrix constructed with PPI network topological similarity properties and the correlational transition matrix constructed with the gene expression profiles. As a result, all the genes in the network are ranked by a weighted fusion of the RWR results obtained with the two types of transition matrices. Comprehensive simulation results of the 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for

  5. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  6. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  7. A U-statistics-based approach for modeling Cronbach coefficient alpha within a longitudinal data setting.

    PubMed

    Yan, Ma; Alejandro, Gonzalez Della Valle; Hui, Zhang; Tu, X M

    2010-03-15

    Cronbach coefficient alpha (CCA) is a classic measure of item internal consistency of an instrument and is used in a wide range of behavioral, biomedical, psychosocial, and health-care-related research. Methods are available for making inference about one CCA or multiple CCAs from correlated outcomes. However, none of the existing approaches effectively address missing data. As longitudinal study designs become increasingly popular and complex in modern-day clinical studies, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data (at the instrument level due to subject dropout) within a longitudinal data setting. The approach is illustrated with both clinical and simulated data.
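
    For orientation, the classical complete-data form of Cronbach coefficient alpha for k items is alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The sketch below computes that textbook quantity only; it assumes no missing values and is not the U-statistics-based longitudinal approach the abstract describes. The function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Classical Cronbach coefficient alpha for complete data.

    scores: list of rows, one per subject, each holding k item scores.
    Uses sample variances (n - 1 denominator) for items and for the total.
    """
    if len(scores) < 2:
        raise ValueError("at least two subjects are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every subject must have a score for every item")

    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    data = [[3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
    print(round(cronbach_alpha(data), 4))
```

    Subject dropout leaves rows with missing entries, which this complete-data formula cannot handle; that gap is the problem the abstract sets out to address.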

  8. An abdominal aortic aneurysm segmentation method: Level set with region and statistical information

    SciTech Connect

    Zhuge Feng; Rubin, Geoffrey D.; Sun Shaohua; Napel, Sandy

    2006-05-15

    We present a system for segmenting the human aortic aneurysm in CT angiograms (CTA), which, in turn, allows measurements of volume and morphological aspects useful for treatment planning. The system estimates a rough 'initial surface', and then refines it using a level set segmentation scheme augmented with two external analyzers: The global region analyzer, which incorporates a priori knowledge of the intensity, volume, and shape of the aorta and other structures, and the local feature analyzer, which uses voxel location, intensity, and texture features to train and drive a support vector machine classifier. Each analyzer outputs a value that corresponds to the likelihood that a given voxel is part of the aneurysm, which is used during level set iteration to control the evolution of the surface. We tested our system using a database of 20 CTA scans of patients with aortic aneurysms. The mean and worst case values of volume overlap, volume error, mean distance error, and maximum distance error relative to human tracing were 95.3% ± 1.4% (s.d.; worst case 92.9%), 3.5% ± 2.5% (s.d.; worst case 7.0%), 0.6 ± 0.2 mm (s.d.; worst case 1.0 mm), and 5.2 ± 2.3 mm (s.d.; worst case 9.6 mm), respectively. When implemented on a 2.8 GHz Pentium IV personal computer, the mean time required for segmentation was 7.4 ± 3.6 min (s.d.). We also performed experiments that suggest that our method is insensitive to parameter changes within 10% of their experimentally determined values. This preliminary study proves feasibility for an accurate, precise, and robust system for segmentation of the abdominal aneurysm from CTA data, and may be of benefit to patients with aortic aneurysms.

  9. Gene integrated set profile analysis: a context-based approach for inferring biological endpoints

    PubMed Central

    Kowalski, Jeanne; Dwivedi, Bhakti; Newman, Scott; Switchenko, Jeffery M.; Pauly, Rini; Gutman, David A.; Arora, Jyoti; Gandhi, Khanjan; Ainslie, Kylie; Doho, Gregory; Qin, Zhaohui; Moreno, Carlos S.; Rossi, Michael R.; Vertino, Paula M.; Lonial, Sagar; Bernal-Mizrachi, Leon; Boise, Lawrence H.

    2016-01-01

    The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers, or of samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples within the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance. PMID:26826710

  10. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  11. EVE (external variance estimation) increases statistical power for detecting differentially expressed genes.

    PubMed

    Wille, Anja; Gruissem, Wilhelm; Bühlmann, Peter; Hennig, Lars

    2007-11-01

    Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated based on a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes for ribosomal or structural proteins often have a small variance, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). The EVE algorithm performs better than the t-test and LIMMA on some real-world data, where external information from appropriate databases is available. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates can guarantee the identification of nearly all truly differentially expressed genes. However, our simulation studies suggest that even limited numbers of replicates will usually result in good coverage of strongly differentially expressed genes.

  12. Consistently altered expression of gene sets in postmortem brains of individuals with major psychiatric disorders

    PubMed Central

    Darby, M M; Yolken, R H; Sabunciyan, S

    2016-01-01

    The measurement of gene expression in postmortem brain is an important tool for understanding the pathogenesis of serious psychiatric disorders. We hypothesized that major molecular deficits associated with psychiatric disease would affect the entire brain, and such deficits may be shared across disorders. We performed RNA sequencing and quantified gene expression in the hippocampus of 100 brains in the Stanley Array Collection followed by replication in the orbitofrontal cortex of 57 brains in the Stanley Neuropathology Consortium. We then identified genes and canonical pathway gene sets with significantly altered expression in schizophrenia and bipolar disorder in the hippocampus and in schizophrenia, bipolar disorder and major depression in the orbitofrontal cortex. Although expression of individual genes varied, gene sets were significantly enriched in both of the brain regions, and many of these were consistent across diagnostic groups. Further examination of core gene sets with consistently increased or decreased expression in both of the brain regions and across target disorders revealed that ribosomal genes are overexpressed while genes involved in neuronal processes, GABAergic signaling, endocytosis and antigen processing have predominantly decreased expression in affected individuals compared to controls without a psychiatric disorder. Our results highlight pathways of central importance to psychiatric health and emphasize messenger RNA processing and protein synthesis as potential therapeutic targets for all three of the disorders. PMID:27622934

  14. Integration of Diverse Statistical Evidence of Gene-Trait Association in Systems Biology Studies

    PubMed Central

    Cheng, Cheng

    2012-01-01

    The rapid advancement of high-throughput genomic assay technologies has generated large amounts of diverse genomic data in disparate human populations and diseases. These data provide a unique opportunity for biomedical investigators to systematically study multifaceted aspects of genes’ involvement in the biological processes underlying important traits from the systems biology perspective. An important component in such a study is the inference that integrates diverse lines of statistical evidence for gene-trait association from the observed trait values and the massive numbers of measured genomic features. A novel integrated statistical analysis procedure is developed in this paper and is illustrated by an application in studying childhood leukemia. PMID:22589094

  15. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)?

    PubMed Central

    Barrett, Craig F.; Specht, Chelsea D.; Leebens-Mack, Jim; Stevenson, Dennis Wm.; Zomlefer, Wendy B.; Davis, Jerrold I.

    2014-01-01

    Background and Aims Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life. Methods Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support. Key Results Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies. Conclusions Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. 
Whole plastomes

  16. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity.

    PubMed

    Tarca, Adi L; Bhatti, Gaurav; Romero, Roberto

    2013-01-01

    Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for systematic analysis of gene set collections including Gene Ontology and biological pathways. Despite their widespread usage in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease specific gene sets in KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions while using 42 real datasets (over 1,400 samples). Most of the methods ranked highly the gene sets designed for specific diseases whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization were different from the top ones in terms of sensitivity, and four of the sixteen methods had large false positive rates assessed by permuting the phenotype of the samples. The best overall methods among those that generated reasonably low false positive rates, when permuting phenotypes, were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.
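
    The false positive assessment described above, permuting the phenotype of the samples, can be sketched as follows. This is an illustrative outline only: `gene_set_test` stands for any method that returns a p-value, and the function name, the number of permutations and the 0.05 threshold are assumptions rather than the settings used in the study.

```python
import random


def permuted_false_positive_rate(gene_set_test, expression, labels,
                                 n_permutations=200, alpha=0.05, seed=0):
    """Fraction of label permutations in which a gene set is called significant.

    gene_set_test(expression, labels) must return a p-value. Shuffling the
    phenotype labels breaks any true association, so every call below
    `alpha` counts as a false positive.
    """
    rng = random.Random(seed)
    shuffled = list(labels)  # copy, so the caller's labels are untouched
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if gene_set_test(expression, shuffled) < alpha:
            hits += 1
    return hits / n_permutations
```

    A well-calibrated method should return a value close to `alpha` here; a clearly larger value corresponds to the "higher than expected false positives" category in the abstract.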

  17. Identification of a conserved set of upregulated genes in mouse skeletal muscle hypertrophy and regrowth

    PubMed Central

    Chaillou, Thomas; Jackson, Janna R.; England, Jonathan H.; Kirby, Tyler J.; Richards-White, Jena; Esser, Karyn A.; Dupont-Versteegden, Esther E.

    2014-01-01

    The purpose of this study was to compare the gene expression profile of mouse skeletal muscle undergoing two forms of growth (hypertrophy and regrowth) with the goal of identifying a conserved set of differentially expressed genes. Expression profiling by microarray was performed on the plantaris muscle subjected to 1, 3, 5, 7, 10, and 14 days of hypertrophy or regrowth following 2 wk of hind-limb suspension. We identified 97 differentially expressed genes (≥2-fold increase or ≥50% decrease compared with control muscle) that were conserved during the two forms of muscle growth. The vast majority (∼90%) of the differentially expressed genes was upregulated and occurred at a single time point (64 out of 86 genes), which most often was on the first day of the time course. Microarray analysis from the conserved upregulated genes showed a set of genes related to contractile apparatus and stress response at day 1, including three genes involved in mechanotransduction and four genes encoding heat shock proteins. Our analysis further identified three cell cycle-related genes at day and several genes associated with extracellular matrix (ECM) at both days 3 and 10. In conclusion, we have identified a core set of genes commonly upregulated in two forms of muscle growth that could play a role in the maintenance of sarcomere stability, ECM remodeling, cell proliferation, fast-to-slow fiber type transition, and the regulation of skeletal muscle growth. These findings suggest conserved regulatory mechanisms involved in the adaptation of skeletal muscle to increased mechanical loading. PMID:25554798
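
    The selection rule quoted above (≥2-fold increase or ≥50% decrease relative to control, conserved across both forms of growth) can be sketched as below. The gene names and values are hypothetical, expression is assumed to be on a linear (not log) scale, and requiring the same direction of change in both conditions is an assumption of this sketch.

```python
def direction_of_change(treated, control):
    """Return 'up', 'down', or None for one gene on a linear expression scale."""
    ratio = treated / control
    if ratio >= 2.0:   # at least a 2-fold increase
        return "up"
    if ratio <= 0.5:   # at least a 50% decrease
        return "down"
    return None


def conserved_genes(hypertrophy, regrowth, control):
    """Genes changed in the same direction in both growth conditions."""
    conserved = {}
    for gene, baseline in control.items():
        d_hyper = direction_of_change(hypertrophy[gene], baseline)
        d_regrow = direction_of_change(regrowth[gene], baseline)
        if d_hyper is not None and d_hyper == d_regrow:
            conserved[gene] = d_hyper
    return conserved


# Hypothetical expression values for four placeholder genes.
control = {"GeneA": 100.0, "GeneB": 100.0, "GeneC": 100.0, "GeneD": 100.0}
hypertrophy = {"GeneA": 250.0, "GeneB": 120.0, "GeneC": 40.0, "GeneD": 50.0}
regrowth = {"GeneA": 210.0, "GeneB": 300.0, "GeneC": 90.0, "GeneD": 30.0}
result = conserved_genes(hypertrophy, regrowth, control)
```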

  18. Gene-set Analysis with CGI Information for Differential DNA Methylation Profiling

    PubMed Central

    Chang, Chia-Wei; Lu, Tzu-Pin; She, Chang-Xian; Feng, Yen-Chen; Hsiao, Chuhsing Kate

    2016-01-01

    DNA methylation is a well-established epigenetic biomarker for many diseases. Studying the relationships among a group of genes and their methylations may help to unravel the etiology of diseases. Since CpG-islands (CGIs) play a crucial role in the regulation of transcription during methylation, including them in the analysis may provide further information in understanding the pathogenesis of cancers. Such CGI information, however, has usually been overlooked in existing gene-set analyses. Here we aimed to include both pathway information and CGI status to rank competing gene-sets and identify among them the genes most likely contributing to DNA methylation changes. To accomplish this, we devised a Bayesian model for matched case-control studies with parameters for CGI status and pathway associations, while incorporating intra-gene-set information. Three cancer studies with candidate pathways were analyzed to illustrate this approach. The strength of association for each candidate pathway and the influence of each gene were evaluated. Results show that, based on probabilities, the importance of pathways and genes can be determined. The findings confirm that some of these genes are cancer-related and may hold the potential to be targeted in drug development. PMID:27090937

  19. Mechanism-based biomarker gene sets for glutathione depletion-related hepatotoxicity in rats

    SciTech Connect

    Gao Weihua; Mizukawa, Yumiko; Nakatsu, Noriyuki; Minowa, Yosuke; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro

    2010-09-15

    Chemical-induced glutathione depletion is thought to be caused by two types of toxicological mechanisms: PHO-type glutathione depletion [glutathione conjugated with chemicals such as phorone (PHO) or diethyl maleate (DEM)], and BSO-type glutathione depletion [i.e., glutathione synthesis inhibited by chemicals such as L-buthionine-sulfoximine (BSO)]. In order to identify mechanism-based biomarker gene sets for glutathione depletion in rat liver, male SD rats were treated with various chemicals including PHO (40, 120 and 400 mg/kg), DEM (80, 240 and 800 mg/kg), BSO (150, 450 and 1500 mg/kg), and bromobenzene (BBZ, 10, 100 and 300 mg/kg). Liver samples were taken 3, 6, 9 and 24 h after administration and examined for hepatic glutathione content, physiological and pathological changes, and gene expression changes using Affymetrix GeneChip Arrays. To identify differentially expressed probe sets in response to glutathione depletion, we focused on the following two courses of events for the two types of mechanisms of glutathione depletion: a) gene expression changes occurring simultaneously in response to glutathione depletion, and b) gene expression changes after glutathione was depleted. The gene expression profiles of the identified probe sets for the two types of glutathione depletion differed markedly at times during and after glutathione depletion, whereas Srxn1 was markedly increased for both types as glutathione was depleted, suggesting that Srxn1 is a key molecule in oxidative stress related to glutathione. The extracted probe sets were refined and verified using various compounds including 13 additional positive or negative compounds, and they established two useful marker sets. One contained three probe sets (Akr7a3, Trib3 and Gstp1) that could detect conjugation-type glutathione depletors any time within 24 h after dosing, and the other contained 14 probe sets that could detect glutathione depletors by any mechanism. These two sets, with appropriate scoring

  20. Identification of a set of genes showing regionally enriched expression in the mouse brain

    PubMed Central

    D'Souza, Cletus A; Chopra, Vikramjit; Varhol, Richard; Xie, Yuan-Yun; Bohacec, Slavita; Zhao, Yongjun; Lee, Lisa LC; Bilenky, Mikhail; Portales-Casamar, Elodie; He, An; Wasserman, Wyeth W; Goldowitz, Daniel; Marra, Marco A; Holt, Robert A; Simpson, Elizabeth M; Jones, Steven JM

    2008-01-01

    Background The Pleiades Promoter Project aims to improve gene therapy by designing human mini-promoters (< 4 kb) that drive gene expression in specific brain regions or cell-types of therapeutic interest. Our goal was to first identify genes displaying regionally enriched expression in the mouse brain so that promoters designed from orthologous human genes can then be tested to drive reporter expression in a similar pattern in the mouse brain. Results We have utilized LongSAGE to identify regionally enriched transcripts in the adult mouse brain. As supplemental strategies, we also performed a meta-analysis of published literature and inspected the Allen Brain Atlas in situ hybridization data. From a set of approximately 30,000 mouse genes, 237 were identified as showing specific or enriched expression in 30 target regions of the mouse brain. GO term over-representation among these genes revealed co-involvement in various aspects of central nervous system development and physiology. Conclusion Using a multi-faceted expression validation approach, we have identified mouse genes whose human orthologs are good candidates for design of mini-promoters. These mouse genes represent molecular markers in several discrete brain regions/cell-types, which could potentially provide a mechanistic explanation of unique functions performed by each region. This set of markers may also serve as a resource for further studies of gene regulatory elements influencing brain expression. PMID:18625066

  1. Employing gene set top scoring pairs to identify deregulated pathway-signatures in dilated cardiomyopathy from integrated microarray gene expression data.

    PubMed

    Tan, Aik Choon

    2012-01-01

    It is well accepted that a set of genes must act in concert to drive various cellular processes. However, under different biological phenotypes, not all the members of a gene set will participate in a biological process. Hence, it is useful to construct a discriminative classifier by focusing on the core members (subset) of a highly informative gene set. Such analyses can reveal which of those subsets from the same gene set correspond to different biological phenotypes. In this study, we propose the Gene Set Top Scoring Pairs (GSTSP) approach, which exploits the simple yet powerful relative expression reversal concept at the gene set level to achieve these goals. To illustrate the usefulness of GSTSP, we applied this method to five different human heart failure gene expression data sets. We take advantage of the direct data integration feature in the GSTSP approach to combine two data sets, identify a discriminative gene set from >190 predefined gene sets, and evaluate the predictive power of the GSTSP classifier derived from this informative gene set on three independent test sets (79.31% in test accuracy). The discriminative gene pairs identified in this study may provide new biological understanding of the disturbed pathways that are involved in the development of heart failure. The GSTSP methodology is general in purpose and is applicable to a variety of phenotypic classification problems using gene expression data.
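
    The relative expression reversal idea can be illustrated with the standard top scoring pair score, restricted here to pairs drawn from a single gene set. This is a minimal sketch under that assumption; the abstract does not spell out the exact GSTSP scoring or tie-breaking rules, and the function names and data are hypothetical.

```python
from itertools import combinations


def pair_score(samples, labels, gene_i, gene_j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for one gene pair."""
    frequencies = []
    for cls in (0, 1):
        members = [s for s, y in zip(samples, labels) if y == cls]
        ordered = sum(1 for s in members if s[gene_i] < s[gene_j])
        frequencies.append(ordered / len(members))
    return abs(frequencies[0] - frequencies[1])


def top_scoring_pair(samples, labels, gene_set):
    """Pair within the gene set whose expression ordering best separates classes."""
    return max(combinations(sorted(gene_set), 2),
               key=lambda pair: pair_score(samples, labels, *pair))


# Hypothetical expression profiles: one dict per sample, gene -> expression.
samples = [
    {"A": 1.0, "B": 2.0, "C": 5.0},
    {"A": 1.5, "B": 3.0, "C": 4.0},
    {"A": 3.0, "B": 1.0, "C": 5.0},
    {"A": 4.0, "B": 2.0, "C": 6.0},
]
labels = [0, 0, 1, 1]
best = top_scoring_pair(samples, labels, {"A", "B", "C"})
```

    Because the score depends only on the ordering of two genes within each sample, it is unaffected by per-sample monotone normalization, which is consistent with the direct data integration feature mentioned in the abstract.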

  2. Associations between DNA methylation and schizophrenia-related intermediate phenotypes - a gene set enrichment analysis.

    PubMed

    Hass, Johanna; Walton, Esther; Wright, Carrie; Beyer, Andreas; Scholz, Markus; Turner, Jessica; Liu, Jingyu; Smolka, Michael N; Roessner, Veit; Sponheim, Scott R; Gollub, Randy L; Calhoun, Vince D; Ehrlich, Stefan

    2015-06-01

    Multiple genetic approaches have identified microRNAs as key effectors in psychiatric disorders as they post-transcriptionally regulate expression of thousands of target genes. However, their role in specific psychiatric diseases remains poorly understood. In addition, epigenetic mechanisms such as DNA methylation, which affect the expression of both microRNAs and coding genes, are critical for our understanding of molecular mechanisms in schizophrenia. Using clinical, imaging, genetic, and epigenetic data of 103 patients with schizophrenia and 111 healthy controls of the Mind Clinical Imaging Consortium (MCIC) study of schizophrenia, we conducted gene set enrichment analysis to identify markers for schizophrenia-associated intermediate phenotypes. Genes were ranked based on the correlation between DNA methylation patterns and each phenotype, and then searched for enrichment in 221 predicted microRNA target gene sets. We found the predicted hsa-miR-219a-5p target gene set to be significantly enriched for genes (EPHA4, PKNOX1, ESR1, among others) whose methylation status is correlated with hippocampal volume independent of disease status. Our results were strengthened by significant associations between hsa-miR-219a-5p target gene methylation patterns and hippocampus-related neuropsychological variables. IPA pathway analysis of the respective predicted hsa-miR-219a-5p target genes revealed associated network functions in behavior and developmental disorders. Altered methylation patterns of predicted hsa-miR-219a-5p target genes are associated with a structural aberration of the brain that has been proposed as a possible biomarker for schizophrenia. The (dys)regulation of microRNA target genes by epigenetic mechanisms may confer additional risk for developing psychiatric symptoms. Further study is needed to understand possible interactions between microRNAs and epigenetic changes and their impact on risk for brain-based disorders such as schizophrenia. PMID:25598502
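
    The ranking-then-enrichment step can be sketched with the classic unweighted running-sum (Kolmogorov-Smirnov-like) enrichment score. This is an illustration under stated assumptions: the abstract does not say which weighting or software was used, and the function names and the correlation values are hypothetical.

```python
def rank_by_correlation(correlations):
    """Order gene names from highest to lowest correlation with the phenotype."""
    return sorted(correlations, key=correlations.get, reverse=True)


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of a gene set in a ranked list.

    Walking down the ranking, the sum rises by 1/n_hit at each set member
    and falls by 1/n_miss otherwise; the score is the largest deviation
    from zero, keeping its sign.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical methylation-phenotype correlations for six placeholder genes.
correlations = {"g1": 0.9, "g2": 0.7, "g3": 0.2, "g4": 0.0, "g5": -0.3, "g6": -0.8}
ranked = rank_by_correlation(correlations)
score_top = enrichment_score(ranked, {"g1", "g2"})
score_bottom = enrichment_score(ranked, {"g5", "g6"})
```

    In practice the significance of such a score is usually judged against a null distribution obtained by permutation, which this sketch leaves out.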

  4. Neighborhood Rough Set Reduction-Based Gene Selection and Prioritization for Gene Expression Profile Analysis and Molecular Cancer Classification

    PubMed Central

    Hou, Mei-Ling; Wang, Shu-Lin; Li, Xue-Ling; Lei, Ying-Ke

    2010-01-01

    Selection of reliable cancer biomarkers is crucial for gene expression profile-based precise diagnosis of cancer type and successful treatment. However, current studies are confronted with overfitting and the curse of dimensionality in tumor classification and with false positives in the identification of cancer biomarkers. Here, we developed a novel gene-ranking method based on neighborhood rough set reduction for molecular cancer classification based on gene expression profiles. Compared with other methods such as PAM, ClaNC, the Kruskal-Wallis rank sum test, and Relief-F, our method shows that only a few top-ranked genes can achieve higher tumor classification accuracy. Moreover, although the selected genes are not typical of known oncogenes, a search of the scientific literature and an analysis of protein interaction partners indicate that they play a crucial role in the occurrence of tumors, so they may be used as candidate cancer biomarkers. PMID:20625410

  5. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    PubMed Central

    2013-01-01

    Background Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate -corrected p-values < 0.05) of the next-gen TM-derived gene sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 
21 of these toxicants
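
    The abstract reports false discovery rate-corrected p-values without naming the correction procedure. As an illustration only, the widely used Benjamini-Hochberg adjustment is sketched below; treating it as the procedure used in the study is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

    Here only the first gene set remains below 0.05 after adjustment, even though three raw p-values were below that threshold.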

  6. Statistical methods in detecting differential expressed genes, analyzing insertion tolerance for genes and group selection for survival data

    NASA Astrophysics Data System (ADS)

    Liu, Fangfang

    The thesis is composed of three independent projects: (i) analyzing transposon-sequencing data to infer the functions of genes in bacterial growth (chapter 2), (ii) developing a semi-parametric Bayesian method for differential gene expression analysis with RNA-sequencing data (chapter 3), (iii) solving a group selection problem for survival data (chapter 4). All projects are motivated by statistical challenges raised in biological research. The first project is motivated by the need to develop statistical models to accommodate transposon insertion sequencing (Tn-Seq) data. Tn-Seq data consist of sequence reads around each transposon insertion site. The detection of a transposon insertion at a given site indicates that the disruption of the genomic sequence at this site does not cause essential function loss and the bacteria can still grow. Hence, such measurements have been used to infer the function of each gene in bacterial growth. We propose a zero-inflated Poisson regression method for analyzing the Tn-Seq count data, and derive an Expectation-Maximization (EM) algorithm to obtain parameter estimates. We also propose a multiple testing procedure that categorizes genes into one of three states, hypo-tolerant, tolerant, and hyper-tolerant, while controlling the false discovery rate. Simulation studies show our method provides good estimation of model parameters and inference on gene functions. In the second project, we model the count data from an RNA-sequencing experiment for each gene using a Poisson-Gamma hierarchical model, or equivalently, a negative binomial (NB) model. We derive a full semi-parametric Bayesian approach with a Dirichlet process as the prior for the fold changes between two treatment means. An inference strategy using a Gibbs algorithm is developed for differential expression analysis. We evaluate our method with several simulation studies, and the results demonstrate that our method outperforms other methods including the popularly applied ones such as edge
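
    The two count models named in the abstract can be written out in their textbook forms. The parametrization below (mixing weight \pi, Poisson mean \lambda, Gamma shape r and rate \beta) is an assumption chosen for illustration; the thesis may use a different one and adds regression structure that is not shown here.

```latex
% Zero-inflated Poisson: an extra point mass \pi at zero mixed with Poisson(\lambda).
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots

% Poisson-Gamma hierarchy: Y \mid \lambda \sim \mathrm{Poisson}(\lambda),
% \lambda \sim \mathrm{Gamma}(\text{shape } r, \text{rate } \beta).
% Integrating out \lambda gives the negative binomial distribution:
P(Y = y)
  = \int_{0}^{\infty} \frac{e^{-\lambda} \lambda^{y}}{y!}\,
    \frac{\beta^{r}}{\Gamma(r)}\, \lambda^{r-1} e^{-\beta \lambda}\, d\lambda
  = \frac{\Gamma(y + r)}{y!\, \Gamma(r)}
    \left( \frac{\beta}{\beta + 1} \right)^{r}
    \left( \frac{1}{\beta + 1} \right)^{y}, \quad y = 0, 1, 2, \dots
```

    Under this parametrization the mean is r/\beta and the variance is (r/\beta)(1 + 1/\beta), so the variance always exceeds the mean, which is the overdispersion that motivates using the NB model for sequencing counts.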

  7. Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana

    PubMed Central

    2010-01-01

    Background The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific. Results Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets share sequence similarity only to sequences from species within the Brassicaceae family and are termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lack sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%). Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele specific pollen rejection in self

  8. A prognosis classifier for breast cancer based on conserved gene regulation between mammary gland development and tumorigenesis: a multiscale statistical model.

    PubMed

    Tian, Yingpu; Chen, Baozhen; Guan, Pengfei; Kang, Yujia; Lu, Zhongxian

    2013-01-01

    Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets were identified as prognosis classifiers, more powerful models are still needed for the definition of effective gene sets for the diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in the mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer modulated genes were enriched in this list of developmentally associated genes. Furthermore, the developmentally associated genes had a specific expression profile, which was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. Then, the list of genes regulated in both the developmental and the tumorigenesis process was defined as an 835-member prognosis classifier, which showed an exciting ability to predict the clinical outcome of three groups of breast cancer patients (predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors (around 2.0-2.8)). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer. PMID:23565194

  9. A prognosis classifier for breast cancer based on conserved gene regulation between mammary gland development and tumorigenesis: a multiscale statistical model.

    PubMed

    Tian, Yingpu; Chen, Baozhen; Guan, Pengfei; Kang, Yujia; Lu, Zhongxian

    2013-01-01

    Identification of novel cancer genes for molecular therapy and diagnosis is a current focus of breast cancer research. Although a few small gene sets were identified as prognosis classifiers, more powerful models are still needed for the definition of effective gene sets for the diagnosis and treatment guidance in breast cancer. In the present study, we have developed a novel statistical approach for systematic analysis of intrinsic correlations of gene expression between development and tumorigenesis in mammary gland. Based on this analysis, we constructed a predictive model for prognosis in breast cancer that may be useful for therapy decisions. We first defined developmentally associated genes from a mouse mammary gland epithelial gene expression database. Then, we found that the cancer modulated genes were enriched in this developmentally associated genes list. Furthermore, the developmentally associated genes had a specific expression profile, which was associated with the molecular characteristics and histological grade of the tumor. These results suggested that the processes of mammary gland development and tumorigenesis share gene regulatory mechanisms. Then, the list of regulatory genes both on the developmental and tumorigenesis process was defined as an 835-member prognosis classifier, which showed an exciting ability to predict clinical outcome of three groups of breast cancer patients (the predictive accuracy 64∼72%) with a robust prognosis prediction (hazard ratio 3.3∼3.8, higher than that of other clinical risk factors (around 2.0-2.8)). In conclusion, our results identified the conserved molecular mechanisms between mammary gland development and neoplasia, and provided a unique potential model for mining unknown cancer genes and predicting the clinical status of breast tumors. These findings also suggested that developmental roles of genes may be important criteria for selecting genes for prognosis prediction in breast cancer.

  10. Histone H4 Lys 20 monomethylation by histone methylase SET8 mediates Wnt target gene activation.

    PubMed

    Li, Zhenfei; Nie, Fen; Wang, Sheng; Li, Lin

    2011-02-22

    Histone methylation has an important role in transcriptional regulation. However, unlike H3K4 and H3K9 methylation, the role of H4K20 monomethylation (H4K20me-1) in transcriptional regulation remains unclear. Here, we show that Wnt3a specifically stimulates H4K20 monomethylation at the T cell factor (TCF)-binding element through the histone methylase SET8. Additionally, SET8 is crucial for activation of the Wnt reporter gene and target genes in both mammalian cells and zebrafish. Furthermore, SET8 interacts with lymphoid enhancing factor-1 (LEF1)/TCF4 directly, and this interaction is regulated by Wnt3a. Therefore, we conclude that SET8 is a Wnt signaling mediator and is recruited by LEF1/TCF4 to regulate the transcription of Wnt-activated genes, possibly through H4K20 monomethylation at the target gene promoters. Our findings also indicate that H4K20me-1 is a marker for gene transcription activation, at least in canonical Wnt signaling. PMID:21282610

  12. Multivariate Risk Adjustment of Primary Care Patient Panels in a Public Health Setting: A Comparison of Statistical Models.

    PubMed

    Hirozawa, Anne M; Montez-Rath, Maria E; Johnson, Elizabeth C; Solnit, Stephen A; Drennan, Michael J; Katz, Mitchell H; Marx, Rani

    2016-01-01

    We compared prospective risk adjustment models for adjusting patient panels at the San Francisco Department of Public Health. We used 4 statistical models (linear regression, two-part model, zero-inflated Poisson, and zero-inflated negative binomial) and 4 subsets of predictor variables (age/gender categories, chronic diagnoses, homelessness, and a loss to follow-up indicator) to predict primary care visit frequency. Predicted visit frequency was then used to calculate patient weights and adjusted panel sizes. The two-part model using all predictor variables performed best (R = 0.20). This model, designed specifically for safety net patients, may prove useful for panel adjustment in other public health settings. PMID:27576054
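
    As a rough illustration of the last step described in this abstract, the sketch below turns predicted visit frequencies into patient weights and adjusted panel sizes. The abstract does not give the formula, so the normalization used here (weight = predicted visits divided by the mean predicted visits over all patients) is an assumption, as are all function names and numbers.

```python
from statistics import mean


def patient_weights(predicted_visits):
    """Scale each predicted visit count by the population mean (assumed rule)."""
    avg = mean(predicted_visits)
    return [v / avg for v in predicted_visits]


def adjusted_panel_size(weights, panel):
    """Sum the weights of the patients (by index) assigned to one panel."""
    return sum(weights[i] for i in panel)


# Hypothetical predicted annual visits for four patients.
predicted = [1.0, 2.0, 3.0, 6.0]
weights = patient_weights(predicted)

panel_a = [0, 1]  # two low-use patients
panel_b = [2, 3]  # two high-use patients
size_a = adjusted_panel_size(weights, panel_a)
size_b = adjusted_panel_size(weights, panel_b)
print(size_a, size_b)
```

    With this normalization the weights sum to the number of patients, so two panels with the same head count can still differ in adjusted size when one holds patients with higher predicted use.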

  13. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification.

    PubMed

    Li, Cong-Jun; Li, Robert W; Baldwin, Ransom L; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment was revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  14. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification

    PubMed Central

    Li, Cong-Jun; Li, Robert W.; Baldwin, Ransom L.; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment was revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  15. Gene regulatory network inference using fused LASSO on multiple data sets.

    PubMed

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M O; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-02-11

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.

  16. Gene regulatory network inference using fused LASSO on multiple data sets

    PubMed Central

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-01-01

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed in a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions. PMID:26864687
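
    For orientation, a generic fused LASSO objective over K data sets can be written as below. This is a textbook-style sketch, not the formulation from the paper: the symbols y^(k), X^(k), beta^(k), lambda_1 and lambda_2 are assumptions introduced here. The lambda_1 term corresponds to constraint (1), a small number of regulators per gene, and the lambda_2 term to constraint (2), similar networks across data sets. Constraint (3) is not represented.

```latex
\min_{\beta^{(1)},\dots,\beta^{(K)}}
  \sum_{k=1}^{K} \bigl\lVert y^{(k)} - X^{(k)} \beta^{(k)} \bigr\rVert_2^2
  + \lambda_1 \sum_{k=1}^{K} \bigl\lVert \beta^{(k)} \bigr\rVert_1
  + \lambda_2 \sum_{k < k'} \bigl\lVert \beta^{(k)} - \beta^{(k')} \bigr\rVert_1
```

    Here y^(k) would hold the expression of one target gene in data set k, X^(k) the expression of the candidate transcription factor genes, and beta^(k) the inferred regulatory weights.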

  17. Protein Interaction Networks Reveal Novel Autism Risk Genes within GWAS Statistical Noise

    PubMed Central

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M.

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical “noise” that warrant further analysis for causal variants. PMID:25409314

  19. Protein interaction networks reveal novel autism risk genes within GWAS statistical noise.

    PubMed

    Correia, Catarina; Oliveira, Guiomar; Vicente, Astrid M

    2014-01-01

    Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) thus far met limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical "noise" that warrant further analysis for causal variants. PMID:25409314
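
    The claim that associated proteins "directly interact more than random expectation" can be illustrated with a simple permutation test. The sketch below is not the authors' procedure: it assumes a null model of uniformly sampled gene sets of the same size (published analyses often use degree-matched sampling instead), and the toy network, gene names and function names are hypothetical.

```python
import random


def count_internal_edges(genes, edges):
    """Count interactions whose two endpoints both lie in the gene set."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def interaction_enrichment_p(genes, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value: how often a random set has >= the observed edges."""
    rng = random.Random(seed)
    observed = count_internal_edges(genes, edges)
    k = len(set(genes))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, k)  # all_genes must be a list
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network: g0..g3 form a clique, g4..g19 form a chain.
all_genes = [f"g{i}" for i in range(20)]
clique = [(f"g{i}", f"g{j}") for i in range(4) for j in range(i + 1, 4)]
chain = [(f"g{i}", f"g{i + 1}") for i in range(4, 19)]
edges = clique + chain

candidates = ["g0", "g1", "g2", "g3"]
observed, p_value = interaction_enrichment_p(candidates, all_genes, edges)
print(observed, p_value)
```

    In this toy case only the clique itself reaches six internal edges, so random sets of four genes almost never match the observed count and the empirical p-value is small.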

  20. A small set of extra-embryonic genes defines a new landmark for bovine embryo staging.

    PubMed

    Degrelle, Séverine A; Lê Cao, Kim-Anh; Heyman, Yvan; Everts, Robin E; Campion, Evelyne; Richard, Christophe; Ducroix-Crépy, Céline; Tian, X Cindy; Lewin, Harris A; Renard, Jean-Paul; Robert-Granié, Christèle; Hue, Isabelle

    2011-01-01

    Axis specification in mouse is determined by a sequence of reciprocal interactions between embryonic and extra-embryonic tissues so that a few extra-embryonic genes appear as 'patterning' the embryo. Considering these interactions as essential, but lacking in most mammals the genetically driven approaches used in mouse and the corresponding patterning mutants, we examined whether a molecular signature originating from extra-embryonic tissues could relate to the developmental stage of the embryo proper and predict it. To this end, we have profiled bovine extra-embryonic tissues at peri-implantation stages, when gastrulation and early neurulation occur, and analysed the subsequent expression profiles through the use of predictive methods as previously reported for tumour classification. A set of six genes (CALM1, CPA3, CITED1, DLD, HNRNPDL, and TGFB3), half of which had not been previously associated with any extra-embryonic feature, appeared significantly discriminative and mainly dependent on embryonic tissues for its faithful expression. The predictive value of this set of genes for gastrulation and early neurulation stages, as assessed on naive samples, was remarkably high (93%). In silico connected to the bovine orthologues of the mouse patterning genes, this gene set is proposed as a new trait for embryo staging. As such, this will allow saving the bovine embryo proper for molecular or cellular studies. To us, it offers as well new perspectives for developmental phenotyping and modelling of embryonic/extra-embryonic co-differentiation.

  1. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  2. Harnessing the power of gene microarrays for the study of brain aging and Alzheimer's disease: statistical reliability and functional correlation.

    PubMed

    Blalock, E M; Chen, K-C; Stromberg, A J; Norris, C M; Kadish, I; Kraner, S D; Porter, N M; Landfield, P W

    2005-11-01

    During normal brain aging, numerous alterations develop in the physiology, biochemistry and structure of neurons and glia. Aging changes occur in most brain regions and, in the hippocampus, have been linked to declining cognitive performance in both humans and animals. Age-related changes in hippocampal regions also may be harbingers of more severe decrements to come from neurodegenerative disorders such as Alzheimer's disease (AD). However, unraveling the mechanisms underlying brain aging, AD and impaired function has been difficult because of the complexity of the networks that drive these aging-related changes. Gene microarray technology allows massively parallel analysis of most genes expressed in a tissue, and therefore is an important new research tool that potentially can provide the investigative power needed to address the complexity of brain aging/neurodegenerative processes. However, along with this new analytic power, microarrays bring several major bioinformatics and resource problems that frequently hinder the optimal application of this technology. In particular, microarray analyses generate extremely large and unwieldy data sets and are subject to high false positive and false negative rates. Concerns also have been raised regarding their accuracy and uniformity. Furthermore, microarray analyses can result in long lists of altered genes, most of which may be difficult to evaluate for functional relevance. These and other problems have led to some skepticism regarding the reliability and functional usefulness of microarray data and to a general view that microarray data should be validated by an independent method. Given recent progress, however, we suggest that the major problem for current microarray research is no longer validity of expression measurements, but rather, the reliability of inferences from the data, an issue more appropriately redressed by statistical approaches than by validation with a separate method. 
    If tested using statistically...

  3. General approach for in vivo recovery of cell type-specific effector gene sets.

    PubMed

    Barsi, Julius C; Tu, Qiang; Davidson, Eric H

    2014-05-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database.

  4. General approach for in vivo recovery of cell type-specific effector gene sets

    PubMed Central

    Barsi, Julius C.; Tu, Qiang; Davidson, Eric H.

    2014-01-01

    Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database. PMID:24604781

  5. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer

    PubMed Central

    Araujo, Jhajaira M.; Prado, Alexandra; Cardenas, Nadezhda K.; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A.; Gomez, Henry L.; Vallejos, Carlos S.; Rolfo, Christian; Pinto, Joseph A.

    2016-01-01

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. “Immune system process”, “immune response”, “defense response”, “cellular defense response” and “regulation of immune system process” were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome. PMID:26958810

  6. Repeated observation of immune gene sets enrichment in women with non-small cell lung cancer.

    PubMed

    Araujo, Jhajaira M; Prado, Alexandra; Cardenas, Nadezhda K; Zaharia, Mayer; Dyer, Richard; Doimi, Franco; Bravo, Leny; Pinillos, Luis; Morante, Zaida; Aguilar, Alfredo; Mas, Luis A; Gomez, Henry L; Vallejos, Carlos S; Rolfo, Christian; Pinto, Joseph A

    2016-04-12

    There are different biological and clinical patterns of lung cancer between genders indicating intrinsic differences leading to increased sensitivity to cigarette smoke-induced DNA damage, mutational patterns of KRAS and better clinical outcomes in women while differences between genders at gene-expression levels were not previously reported. Here we show an enrichment of immune genes in NSCLC in women compared to men. We found in a GSEA analysis (by biological processes annotated from Gene Ontology) of six public datasets a repeated observation of immune gene sets enrichment in women. "Immune system process", "immune response", "defense response", "cellular defense response" and "regulation of immune system process" were the gene sets most over-represented while APOBEC3G, APOBEC3F, LAT, CD1D and CCL5 represented the top-five core genes. Characterization of immune cell composition with the platform CIBERSORT showed no differences between genders; however, there were differences when tumor tissues were compared to normal tissues. Our results suggest different immune responses in NSCLC between genders that could be related to the different clinical outcome.
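
    The enrichment score at the core of a GSEA analysis can be sketched as a weighted running sum over a ranked gene list. The code below is a minimal illustration of that running-sum statistic only. It omits the permutation-based significance and normalization steps, and the gene labels, scores and function name are hypothetical rather than taken from the study.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Maximum signed deviation of the hit-minus-miss running sum.

    ranked_genes and scores must be ordered from most positive to most
    negative score; weight is the exponent applied to |score| for hits.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = in_set.count(False)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, in_set) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, best = 0.0, 0.0
    for s, h in zip(scores, in_set):
        if h:
            running += abs(s) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, for example by a women-versus-men statistic.
ranked = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(ranked, scores, {"A", "B"})
es_bottom = enrichment_score(ranked, scores, {"E", "F"})
print(es_top, es_bottom)
```

    A set concentrated at the top of the ranking gives a score near +1 and a set at the bottom gives a score near -1. Sets scattered through the list stay close to zero.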

  7. Multiple divergent haplotypes express completely distinct sets of class I MHC genes in zebrafish.

    PubMed

    McConnell, Sean C; Restaino, Anthony C; de Jong, Jill L O

    2014-03-01

    The zebrafish is an important animal model for stem cell biology, cancer, and immunology research. Histocompatibility represents a key intersection of these disciplines; however, histocompatibility in zebrafish remains poorly understood. We examined a set of diverse zebrafish class I major histocompatibility complex (MHC) genes that segregate with specific haplotypes at chromosome 19, and for which donor-recipient matching has been shown to improve engraftment after hematopoietic transplantation. Using flanking gene polymorphisms, we identified six distinct chromosome 19 haplotypes. We describe several novel class I U lineage genes and characterize their sequence properties, expression, and haplotype distribution. Altogether, ten full-length zebrafish class I genes were analyzed, mhc1uba through mhc1uka. Expression data and sequence properties indicate that most are candidate classical genes. Several substitutions in putative peptide anchor residues, often shared with deduced MHC molecules from additional teleost species, suggest flexibility in antigen binding. All ten zebrafish class I genes were uniquely assigned among the six haplotypes, with dominant or codominant expression of one to three genes per haplotype. Interestingly, while the divergent MHC haplotypes display variable gene copy number and content, the different genes appear to have ancient origin, with extremely high levels of sequence diversity. Furthermore, haplotype variability extends beyond the MHC genes to include divergent forms of psmb8. The many disparate haplotypes at this locus therefore represent a remarkable form of genomic region configuration polymorphism. Defining the functional MHC genes within these divergent class I haplotypes in zebrafish will provide an important foundation for future studies in immunology and transplantation. PMID:24291825

  8. Application of a statistical software package for analysis of large patient dose data sets obtained from RIS.

    PubMed

    Fazakerley, J; Charnock, P; Wilde, R; Jones, R; Ward, M

    2010-01-01

    For the purpose of patient dose audit, clinical audit and radiology workload analysis, data from Radiology Information Systems (RIS) at many hospitals are collected using a database, and the analysis is automated using a statistical package and Visual Basic coding. The database is a Structured Query Language database, which can be queried using an off-the-shelf statistical package, Statistica. Macros were created to automatically format the data to a consistent format between different hospitals ready for analysis. These macros can also be used to automate further analysis such as detailing mean kV, mAs and entrance surface dose per room and per gender. Standard deviation and standard error of the mean are also generated. Graphs can also be generated to illustrate the trends in doses between different variables such as room and gender. Collectively, this information can be used to generate a report. A process that once could take up to 1 d to complete now takes around 1 h. A major benefit in providing the service to hospital trusts is that less resource is now required to report on RIS data, making the possibility of continuous dose audit more likely. Time that was spent on sorting through data can now be spent on improving the analysis to provide benefit to the customer. Using data sets from RIS is a good way to perform dose audits as the large volume of data available provides the basis for very accurate analysis. Using macros written in Statistica Visual Basic has helped sort and consistently analyse these data. Being able to analyse by exposure factors has provided a more detailed report to the customer.
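
    The per-room and per-gender summaries described above (mean, standard deviation and standard error of the mean) can be sketched in a few lines. The original work used Statistica Visual Basic macros. The Python below is a swapped-in illustration only, and the record layout, field names and values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarise(records, key, value):
    """Group records by `key` and report n, mean, SD and SEM of `value`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])

    summary = {}
    for name, vals in groups.items():
        sd = stdev(vals) if len(vals) > 1 else 0.0  # sample SD
        summary[name] = {
            "n": len(vals),
            "mean": mean(vals),
            "sd": sd,
            "sem": sd / sqrt(len(vals)),
        }
    return summary


# Hypothetical exposure records after formatting to a common layout.
records = [
    {"room": "R1", "gender": "F", "kv": 70, "mas": 10.0, "esd_mgy": 0.20},
    {"room": "R1", "gender": "M", "kv": 74, "mas": 12.0, "esd_mgy": 0.30},
    {"room": "R2", "gender": "F", "kv": 80, "mas": 16.0, "esd_mgy": 0.40},
    {"room": "R2", "gender": "M", "kv": 90, "mas": 20.0, "esd_mgy": 0.60},
]

kv_by_room = summarise(records, "room", "kv")
esd_by_gender = summarise(records, "gender", "esd_mgy")
print(kv_by_room["R1"], esd_by_gender["F"])
```

    The same function covers any grouping variable and any exposure factor, which mirrors the idea of reusing one macro across hospitals once the data share a consistent format.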

  9. The Use of Multi-Component Statistical Techniques in Understanding Subduction Zone Arc Granitic Geochemical Data Sets

    NASA Astrophysics Data System (ADS)

    Pompe, L.; Clausen, B. L.; Morton, D. M.

    2015-12-01

    Multi-component statistical techniques and GIS visualization are emerging trends in understanding large data sets. Our research applies these techniques to a large igneous geochemical data set from southern California to better understand magmatic and plate tectonic processes. A set of 480 granitic samples collected by Baird from this area was analyzed for 39 geochemical elements. Of these samples, 287 are from the Peninsular Ranges Batholith (PRB) and 164 from part of the Transverse Ranges (TR). Principal component analysis (PCA) summarized the 39 variables into 3 principal components (PC) by matrix multiplication and for the PRB are interpreted as follows: PC1 with about 30% of the variation included mainly compatible elements and SiO2 and indicates extent of differentiation; PC2 with about 20% of the variation included HFS elements and may indicate crustal contamination as usually identified by Sri; PC3 with about 20% of the variation included mainly HRE elements and may indicate magma source depth as often displayed using REE spider diagrams and possibly Sr/Y. Several elements did not fit well in any of the three components: Cr, Ni, U, and Na2O. For the PRB, the PC1 correlation with SiO2 was r=-0.85, the PC2 correlation with Sri was r=0.80, and the PC3 correlation with Gd/Yb was r=-0.76 and with Sr/Y was r=-0.66. Extending this method to the TR, correlations were r=-0.85, -0.21, -0.06, and -0.64, respectively. A similar extent of correlation for both areas was visually evident using GIS interpolation. PC1 seems to do well at indicating differentiation index for both the PRB and TR and correlates very well with SiO2, Al2O3, MgO, FeO*, CaO, K2O, Sc, V, and Co, but poorly with Na2O and Cr. If the crustal component is represented by Sri, PC2 correlates well and less expensively with this indicator in the PRB, but not in the TR. Source depth has been related to the slope on REE spidergrams, and PC3 based on only the HREE and using the Sr/Y ratios gives a reasonable

  10. [Somatic hypermutagenesis in immunoglobulin genes. I. Connection of somatic mutations with repeats. A statistical weighting method].

    PubMed

    Solov'ev, V V; Rogozin, I V; Kolchanov, N A

    1989-01-01

    Based on the analysis of a number of immunoglobulin gene nucleotide sequences, it has been suggested that somatic mutations emerge through the correction of imperfect duplexes formed by mispairing of complementary regions of direct and inverted repeats. The present work provides new data confirming this mechanism of somatic hypermutagenesis. It has been shown that the presented sample of V- and J-segments of immunoglobulin genes is abundant in nonrandom imperfect direct repeats and complementary palindromes. To prove the connection of somatic mutations with the correction of imperfect duplexes made up by the regions of these repeats, we have developed the method of statistical weights, which permits us to analyse the samples of mutations and repeats and to reveal the reliability of the connection between them. Using this method we have investigated the collection of 203 nucleotide substitutions in V- and J-segments and have shown a statistically reliable (P < 10^-4) connection of these mutation positions with imperfect repeats.

  11. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013.

    PubMed

    Wang, Jing; Duncan, Dexter; Shi, Zhiao; Zhang, Bing

    2013-07-01

    Functional enrichment analysis is an essential task for the interpretation of gene lists derived from large-scale genetic, transcriptomic and proteomic studies. WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) has become one of the popular software tools in this field since its publication in 2005. For the last 7 years, WebGestalt data holdings have grown substantially to satisfy the requirements of users from different research areas. The current version of WebGestalt supports 8 organisms and 201 gene identifiers from various databases and different technology platforms, making it directly available to the fast growing omics community. Meanwhile, by integrating functional categories derived from centrally and publicly curated databases as well as computational analyses, WebGestalt has significantly increased the coverage of functional categories in various biological contexts including Gene Ontology, pathway, network module, gene-phenotype association, gene-disease association, gene-drug association and chromosomal location, leading to a total of 78 612 functional categories. Finally, new interactive features, such as pathway map, hierarchical network visualization and phenotype ontology visualization have been added to WebGestalt to help users better understand the enrichment results. WebGestalt can be freely accessed through http://www.webgestalt.org or http://bioinfo.vanderbilt.edu/webgestalt/.
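    As an illustration of what a functional enrichment test computes, the sketch below evaluates a one-sided hypergeometric (over-representation) p-value for a single functional category. The abstract above does not say which test WebGestalt applies, so this is a generic example under that assumption; the function name and its parameters are hypothetical and are not part of any WebGestalt interface.

```python
from math import comb


def hypergeom_enrichment_p(population, category, selected, overlap):
    """One-sided over-representation p-value, P(X >= overlap).

    population: number of genes in the reference (background) set
    category:   number of reference genes annotated to the category
    selected:   number of genes in the submitted gene list
    overlap:    number of submitted genes annotated to the category
    (All four names are illustrative assumptions.)
    """
    upper = min(category, selected)
    total = comb(population, selected)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    tail = sum(
        comb(category, k) * comb(population - category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 20,000 background genes, 150 in the category,
    # 300 genes submitted, 12 of which fall in the category.
    print(hypergeom_enrichment_p(20000, 150, 300, 12))
```

    In practice such a p-value would be computed for every category and then adjusted for multiple testing.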

  12. GSVA: gene set variation analysis for microarray and RNA-Seq data

    PubMed Central

    2013-01-01

    Background Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. Results To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. Conclusions GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org. PMID:23323831

  13. Evaluation of Statistical Treatments of Left-Censored Environmental Data Using Coincident Uncensored Data Sets. II. Group Comparisons.

    PubMed

    Antweiler, Ronald C

    2015-11-17

    The main classes of statistical treatments that have been used to determine if two groups of censored environmental data arise from the same distribution are substitution methods, maximum likelihood (MLE) techniques, and nonparametric methods. These treatments along with using all instrument-generated data (IN), even those less than the detection limit, were evaluated by examining 550 data sets in which the true values of the censored data were known, and therefore "true" probabilities could be calculated and used as a yardstick for comparison. It was found that technique "quality" was strongly dependent on the degree of censoring present in the groups. For low degrees of censoring (<25% in each group), the Generalized Wilcoxon (GW) technique and substitution of √2/2 times the detection limit gave overall the best results. For moderate degrees of censoring, MLE worked best, but only if the distribution could be estimated to be normal or log-normal prior to its application; otherwise, GW was a suitable alternative. For higher degrees of censoring (each group >40% censoring), no technique provided reliable estimates of the true probability. Group size did not appear to influence the quality of the result, and no technique appeared to become better or worse than other techniques relative to group size. Finally, IN appeared to do very well relative to the other techniques regardless of censoring or group size.
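    The substitution treatment mentioned above can be sketched in a few lines: each non-detect is replaced by √2/2 times the detection limit before a group comparison is run. This is a minimal sketch, not the author's procedure; the function name, the use of `None` to mark a censored measurement, and the single shared detection limit are all assumptions made for illustration.

```python
import math


def substitute_censored(values, detection_limit, factor=math.sqrt(2) / 2):
    """Replace censored measurements with factor * detection_limit.

    values: list of floats, with None marking a non-detect (assumed encoding)
    detection_limit: a single detection limit shared by all values (assumption)
    factor: substitution multiplier; sqrt(2)/2 is the one named in the abstract
    """
    replacement = detection_limit * factor
    return [replacement if v is None else v for v in values]


if __name__ == "__main__":
    group = [3.1, None, 4.7, None, 2.9]
    print(substitute_censored(group, detection_limit=2.0))
```

    According to the abstract, this kind of substitution performed well only at low degrees of censoring (<25% in each group), so the sketch should not be read as a general recommendation.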

  14. Statistical analysis of the exon-intron structure of higher and lower eukaryote genes.

    PubMed

    Kriventseva, E V; Gelfand, M S

    1999-10-01

    Statistics of the exon-intron structure and splicing sites of several diverse eukaryotes were studied. The yeast exon-intron structures have a number of unique features. A yeast gene usually has at most one intron. The branch site is strongly conserved, whereas the polypyrimidine tract is short. Long yeast introns tend to have stronger acceptor sites. In other species the branch site is less conserved and often cannot be determined. In non-yeast samples there is an almost universal correlation between lengths of neighboring exons (all samples excluding protists) and correlation between lengths of neighboring introns (human, drosophila, protists). On average, first introns are longer, and anomalously long introns are usually first introns in a gene. There is a universal preference for exons and exon pairs with the (total) length divisible by 3. Introns positioned between codons are preferred, whereas those positioned between the first and second positions in a codon are avoided. The choice of A or G at the third position of the intron (the donor splice sites generally prefer purines at this position) is correlated with the overall GC-composition of the gene. In all samples dinucleotide AG is avoided in the region preceding the acceptor site.

  15. Gene set enrichment analysis and ingenuity pathway analysis of metastatic clear cell renal cell carcinoma cell line.

    PubMed

    Khan, Mohammed I; Dębski, Konrad J; Dabrowski, Michał; Czarnecka, Anna M; Szczylik, Cezary

    2016-08-01

    In recent years, genome-wide RNA expression analysis has become a routine tool that offers a great opportunity to study and understand the key role of genes that contribute to carcinogenesis. Various microarray platforms and statistical approaches can be used to identify genes that might serve as prognostic biomarkers and be developed as antitumor therapies in the future. Metastatic renal cell carcinoma (mRCC) is a serious, life-threatening disease, and there are few treatment options for patients. In this study, we performed one-color microarray gene expression (4×44K) analysis of the mRCC cell line Caki-1 and the healthy kidney cell line ASE-5063. A total of 1,921 genes were differentially expressed in the Caki-1 cell line (1,023 upregulated and 898 downregulated). Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA) approaches were used to analyze the differential-expression data. The objective of this research was to identify complex biological changes that occur during metastatic development using Caki-1 as a model mRCC cell line. Our data suggest that there are multiple deregulated pathways associated with metastatic clear cell renal cell carcinoma (mccRCC), including integrin-linked kinase (ILK) signaling, leukocyte extravasation signaling, IGF-I signaling, CXCR4 signaling, and phosphoinositol 3-kinase/AKT/mammalian target of rapamycin signaling. The IPA upstream analysis predicted top transcriptional regulators that are either activated or inhibited, such as estrogen receptors, TP53, KDM5B, SPDEF, and CDKN1A. The GSEA approach was used to further confirm enriched pathway data following IPA.

  16. Detection of RTX toxin genes in gram-negative bacteria with a set of specific probes.

    PubMed Central

    Kuhnert, P; Heyberger-Meyer, B; Burnens, A P; Nicolet, J; Frey, J

    1997-01-01

    The family of RTX (RTX representing repeats in the structural toxin) toxins is composed of several protein toxins with a characteristic nonapeptide glycine-rich repeat motif. Most of its members were shown to have cytolytic activity. By comparing the genetic relationships of the RTX toxin genes we established a set of 10 gene probes to be used for screening as-yet-unknown RTX toxin genes in bacterial species. The probes include parts of apxIA, apxIIA, and apxIIIA from Actinobacillus pleuropneumoniae, cyaA from Bordetella pertussis, frpA from Neisseria meningitidis, prtC from Erwinia chrysanthemi, hlyA and elyA from Escherichia coli, aaltA from Actinobacillus actinomycetemcomitans and lktA from Pasteurella haemolytica. A panel of pathogenic and nonpathogenic gram-negative bacteria were investigated for the presence of RTX toxin genes. The probes detected all known genes for RTX toxins. Moreover, we found potential RTX toxin genes in several pathogenic bacterial species for which no such toxins are known yet. This indicates that RTX or RTX-like toxins are widely distributed among pathogenic gram-negative bacteria. The probes generated by PCR and the hybridization method were optimized to allow broad-range screening for RTX toxin genes in one step. This included the binding of unlabelled probes to a nylon filter and subsequent hybridization of the filter with labelled genomic DNA of the strain to be tested. The method constitutes a powerful tool for the assessment of the potential pathogenicity of poorly characterized strains intended to be used in biotechnological applications. Moreover, it is useful for the detection of already-known or new RTX toxin genes in bacteria of medical importance. PMID:9172345

  17. Defining the optimal animal model for translational research using gene set enrichment analysis.

    PubMed

    Weidner, Christopher; Steinfath, Matthias; Opitz, Elisa; Oelgeschläger, Michael; Schönfelder, Gilbert

    2016-01-01

    The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene-to-gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research. PMID:27311961

  19. Powerful Set-Based Gene-Environment Interaction Testing Framework for Complex Diseases.

    PubMed

    Jiao, Shuo; Peters, Ulrike; Berndt, Sonja; Bézieau, Stéphane; Brenner, Hermann; Campbell, Peter T; Chan, Andrew T; Chang-Claude, Jenny; Lemire, Mathieu; Newcomb, Polly A; Potter, John D; Slattery, Martha L; Woods, Michael O; Hsu, Li

    2015-12-01

    Identification of gene-environment interaction (G × E) is important in understanding the etiology of complex diseases. Based on our previously developed Set Based gene EnviRonment InterAction test (SBERIA), in this paper we propose a powerful framework for enhanced set-based G × E testing (eSBERIA). The major challenge of signal aggregation within a set is how to tell signals from noise. eSBERIA tackles this challenge by adaptively aggregating the interaction signals within a set weighted by the strength of the marginal and correlation screening signals. eSBERIA then combines the screening-informed aggregate test with a variance component test to account for the residual signals. Additionally, we develop a case-only extension for eSBERIA (coSBERIA) and an existing set-based method, which boosts the power not only by exploiting the G-E independence assumption but also by avoiding the need to specify main effects for a large number of variants in the set. Through extensive simulation, we show that coSBERIA and eSBERIA are considerably more powerful than existing methods within the case-only and the case-control method categories across a wide range of scenarios. We conduct a genome-wide G × E search by applying our methods to Illumina HumanExome Beadchip data of 10,446 colorectal cancer cases and 10,191 controls and identify two novel interactions between nonsteroidal anti-inflammatory drugs (NSAIDs) and MINK1 and PTCHD3. PMID:26095235

  20. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation.

    PubMed

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-05-03

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiles using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs.

  1. Transcriptomic Analysis Identifies Candidate Genes and Gene Sets Controlling the Response of Porcine Peripheral Blood Mononuclear Cells to Poly I:C Stimulation

    PubMed Central

    Wang, Jiying; Wang, Yanping; Wang, Huaizhong; Wang, Haifei; Liu, Jian-Feng; Wu, Ying; Guo, Jianfeng

    2016-01-01

    Polyinosinic-polycytidylic acid (poly I:C), a synthetic dsRNA analog, has been demonstrated to have stimulatory effects similar to viral dsRNA. To gain deep knowledge of the host transcriptional response of pigs to poly I:C stimulation, in the present study, we cultured and stimulated peripheral blood mononuclear cells (PBMC) of piglets of one Chinese indigenous breed (Dapulian) and one modern commercial breed (Landrace) with poly I:C, and compared their transcriptional profiles using RNA-sequencing (RNA-seq). Our results indicated that poly I:C stimulation can elicit significantly differentially expressed (DE) genes in Dapulian (g = 290) as well as Landrace (g = 85). We also performed gene set analysis using the Gene Set Enrichment Analysis (GSEA) package, and identified some significantly enriched gene sets in Dapulian (g = 18) and Landrace (g = 21). Most of the shared DE genes and gene sets were immune-related, and may play crucial roles in the immune response to poly I:C stimulation. In addition, we detected large sets of significantly DE genes and enriched gene sets when comparing the gene expression profile between the two breeds, including control and poly I:C stimulation groups. Besides immune-related functions, some of the DE genes and gene sets between the two breeds were involved in development and growth of various tissues, which may be correlated with the different characteristics of the two breeds. The DE genes and gene sets detected herein provide crucial information towards understanding the immune regulation of antiviral responses, and the molecular mechanisms of different genetic resistance to viral infection, in modern and indigenous pigs. PMID:26935416

  2. Primer Sets Developed To Amplify Conserved Genes from Filamentous Ascomycetes Are Useful in Differentiating Fusarium Species Associated with Conifers

    PubMed Central

    Donaldson, G. C.; Ball, L. A.; Axelrood, P. E.; Glass, N. L.

    1995-01-01

    We examined the usefulness of primer sets designed to amplify introns within conserved genes in filamentous ascomycetes to differentiate 35 isolates representing six different species of Fusarium commonly found in association with conifer seedlings. We analyzed restriction fragment length polymorphisms (RFLP) in five amplified PCR products from each Fusarium isolate. The primers used in this study were constructed on the basis of sequence information from the H3, H4, and β-tubulin genes in Neurospora crassa. Primers previously developed for the intergenic transcribed spacer region of the ribosomal DNA were also used. The degree of interspecific polymorphism observed in the PCR products from the six Fusarium species allowed differentiation by a limited number of amplifications and restriction endonuclease digestions. The level of intraspecific RFLP variation in the five PCR products was low in both Fusarium proliferatum and F. avenaceum but was high in a population sample of F. oxysporum isolates. Clustering of the 35 isolates by statistical analyses gave similar dendrograms for H3, H4, and β-tubulin RFLP analysis, but a dendrogram produced by intergenic transcribed spacer analysis varied in the placement of some F. oxysporum isolates. PMID:16534991

  3. A Minimal Set of Glycolytic Genes Reveals Strong Redundancies in Saccharomyces cerevisiae Central Metabolism.

    PubMed

    Solis-Escalante, Daniel; Kuijpers, Niels G A; Barrajon-Simancas, Nuria; van den Broek, Marcel; Pronk, Jack T; Daran, Jean-Marc; Daran-Lapujade, Pascale

    2015-08-01

    As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplication of glycolytic genes is purported to have played an important role leading to S. cerevisiae's current lifestyle favoring fermentative metabolism even in the presence of oxygen and characterized by a high glycolytic capacity. In modern S. cerevisiae strains, the 12 glycolytic reactions leading to the biochemical conversion from glucose to ethanol are encoded by 27 paralogs. In order to experimentally explore the physiological role of this genetic redundancy, a yeast strain with a minimal set of 14 paralogs was constructed (the "minimal glycolysis" [MG] strain). Remarkably, a combination of a quantitative systems approach and semiquantitative analysis in a wide array of growth environments revealed the absence of a phenotypic response to the cumulative deletion of 13 glycolytic paralogs. This observation indicates that duplication of glycolytic genes is not a prerequisite for achieving the high glycolytic fluxes and fermentative capacities that are characteristic of S. cerevisiae and essential for many of its industrial applications and argues against gene dosage effects as a means of fixing minor glycolytic paralogs in the yeast genome. The MG strain was carefully designed and constructed to provide a robust prototrophic platform for quantitative studies and has been made available to the scientific community.

  4. Globularity and language-readiness: generating new predictions by expanding the set of genes of interest.

    PubMed

    Boeckx, Cedric; Benítez-Burraco, Antonio

    2014-01-01

    This study builds on the hypothesis put forth in Boeckx and Benítez-Burraco (2014), according to which the developmental changes expressed at the levels of brain morphology and neural connectivity that resulted in a more globular braincase in our species were crucial to understand the origins of our language-ready brain. Specifically, this paper explores the links between two well-known 'language-related' genes like FOXP2 and ROBO1 implicated in vocal learning and the initial set of genes of interest put forth in Boeckx and Benítez-Burraco (2014), with RUNX2 as focal point. Relying on the existing literature, we uncover potential molecular links that could be of interest to future experimental inquiries into the biological foundations of language and the testing of our initial hypothesis. Our discussion could also be relevant for clinical linguistics and for the interpretation of results from paleogenomics.

  5. Globularity and language-readiness: generating new predictions by expanding the set of genes of interest

    PubMed Central

    Boeckx, Cedric; Benítez-Burraco, Antonio

    2014-01-01

    This study builds on the hypothesis put forth in Boeckx and Benítez-Burraco (2014), according to which the developmental changes expressed at the levels of brain morphology and neural connectivity that resulted in a more globular braincase in our species were crucial to understand the origins of our language-ready brain. Specifically, this paper explores the links between two well-known ‘language-related’ genes like FOXP2 and ROBO1 implicated in vocal learning and the initial set of genes of interest put forth in Boeckx and Benítez-Burraco (2014), with RUNX2 as focal point. Relying on the existing literature, we uncover potential molecular links that could be of interest to future experimental inquiries into the biological foundations of language and the testing of our initial hypothesis. Our discussion could also be relevant for clinical linguistics and for the interpretation of results from paleogenomics. PMID:25505436

  6. Gene Ontology Analysis of GWA Study Data Sets Provides Insights into the Biology of Bipolar Disorder

    PubMed Central

    Holmans, Peter; Green, Elaine K.; Pahwa, Jaspreet Singh; Ferreira, Manuel A.R.; Purcell, Shaun M.; Sklar, Pamela; Owen, Michael J.; O'Donovan, Michael C.; Craddock, Nick

    2009-01-01

    We present a method for testing overrepresentation of biological pathways, indexed by gene-ontology terms, in lists of significant SNPs from genome-wide association studies. This method corrects for linkage disequilibrium between SNPs, variable gene size, and multiple testing of nonindependent pathways. The method was applied to the Wellcome Trust Case-Control Consortium Crohn disease (CD) data set. At a general level, the biological basis of CD is relatively well known for a complex genetic trait, and it thus acted as a test of the method. The method, known as ALIGATOR (Association LIst Go AnnoTatOR), successfully detected biological pathways implicated in CD. The method was also applied to a meta-analysis of bipolar disorder, and it implicated the modulation of transcription and cellular activity, including that which occurs via hormonal action, as an important player in pathogenesis. PMID:19539887

  7. In silico analysis of stomach lineage specific gene set expression pattern in gastric cancer

    SciTech Connect

    Pandi, Narayanan Sathiya; Suganya, Sivagurunathan; Rajendran, Suriliyandi

    2013-10-04

    Highlights: • Identified stomach lineage specific gene set (SLSGS) was found to be underexpressed in gastric tumors. • Elevated expression of SLSGS in gastric tumor is a molecular predictor of metabolic type gastric cancer. • In silico pathway scanning identified estrogen-α signaling as a putative regulator of SLSGS in gastric cancer. • Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. Abstract: Stomach lineage specific gene products act as a protective barrier in the normal stomach and their expression maintains the normal physiological processes, cellular integrity and morphology of the gastric wall. However, the regulation of stomach lineage specific genes in gastric cancer (GC) is far less clear. In the present study, we sought to investigate the role and regulation of stomach lineage specific gene set (SLSGS) in GC. SLSGS was identified by comparing the mRNA expression profiles of normal stomach tissue with other organ tissue. The obtained SLSGS was found to be underexpressed in gastric tumors. Functional annotation analysis revealed that the SLSGS was enriched for digestive function and gastric epithelial maintenance. Employing a single sample prediction method across GC mRNA expression profiles identified the underexpression of SLSGS in proliferative type and invasive type gastric tumors compared to the metabolic type gastric tumors. Integrative pathway activation prediction analysis revealed a close association between estrogen-α signaling and SLSGS expression pattern in GC. Elevated expression of SLSGS in GC is associated with an overall increase in the survival of GC patients. In conclusion, our results highlight that estrogen mediated regulation of SLSGS in gastric tumor is a molecular predictor of metabolic type GC and a prognostic factor in GC.

  8. Identifying the 'inorganic gene' for high-temperature piezoelectric perovskites through statistical learning.

    PubMed

    Balachandran, Prasanna V; Broderick, Scott R; Rajan, Krishna

    2011-08-01

    This paper develops a statistical learning approach to identify potentially new high-temperature ferroelectric piezoelectric perovskite compounds. Unlike most computational studies on crystal chemistry, where the starting point is some form of electronic structure calculation, we use a data-driven approach to initiate our search. This is accomplished by identifying patterns of behaviour between discrete scalar descriptors associated with crystal and electronic structure and the reported Curie temperature (TC) of known compounds; extracting design rules that govern critical structure-property relationships; and discovering in a quantitative fashion the exact role of these materials descriptors. Our approach applies linear manifold methods for data dimensionality reduction to discover the dominant descriptors governing structure-property correlations (the 'genes') and Shannon entropy metrics coupled to recursive partitioning methods to quantitatively assess the specific combination of descriptors that govern the link between crystal chemistry and TC (their 'sequencing'). We use this information to develop predictive models that can suggest new structure/chemistries and/or properties. In this manner, BiTmO3-PbTiO3 and BiLuO3-PbTiO3 are predicted to have a TC of 730 °C and 705 °C, respectively. A quantitative structure-property relationship model similar to those used in biology and drug discovery not only predicts our new chemistries but also validates published reports.
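    The pairing of Shannon entropy with recursive partitioning described above can be illustrated by the quantity a partitioning step typically maximizes: the entropy reduction (information gain) obtained by splitting a set of class labels in two. The sketch below is a generic textbook version, not the authors' implementation; the function names and the 'high'/'low' Curie temperature labels are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of discrete class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * shannon_entropy(left) + (
        len(right) / n
    ) * shannon_entropy(right)
    return shannon_entropy(parent) - weighted


if __name__ == "__main__":
    # Hypothetical labels: compounds binned as high or low Curie temperature.
    parent = ["high", "high", "low", "low"]
    print(information_gain(parent, ["high", "high"], ["low", "low"]))
```

    A recursive partitioning routine would evaluate this gain for each candidate descriptor threshold and split on the best one, repeating within each resulting subset.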

  9. Detecting Gene-Environment Interactions in Human Birth Defects: Study Designs and Statistical Methods

    PubMed Central

    Tai, Caroline G.; Graff, Rebecca E.; Liu, Jinghua; Passarelli, Michael N.; Mefford, Joel A.; Shaw, Gary M.; Hoffmann, Thomas J.; Witte, John S.

    2015-01-01

    Background The National Birth Defects Prevention Study (NBDPS) contains a wealth of information on affected and unaffected family triads, and thus provides numerous opportunities to study gene-environment interactions (GxE) in the etiology of birth defect outcomes. Depending on the research objective, several analytic options exist to estimate GxE effects that utilize varying combinations of individuals drawn from available triads. Methods In this paper we discuss several considerations in the collection of genetic data and environmental exposures. We will also present several population- and family-based approaches that can be applied to data from the NBDPS including case-control, case-only, family-based trio, and maternal versus fetal effects. For each, we describe the data requirements, applicable statistical methods, advantages and disadvantages. Discussion A range of approaches can be used to evaluate potentially important GxE effects in the NBDPS. Investigators should be aware of the limitations inherent to each approach when choosing a study design and interpreting results. PMID:26010994

  10. Branch-and-bound approach for parsimonious inference of a species tree from a set of gene family trees.

    PubMed

    Doyon, Jean-Philippe; Chauve, Cedric

    2011-01-01

    We describe a Branch-and-Bound algorithm for computing a parsimonious species tree, given a set of gene family trees. Our algorithm can consider three cost measures: number of gene duplications, number of gene losses, and both combined. Moreover, to cope with intrinsic limitations of Branch-and-Bound algorithms for species trees inference regarding the number of taxa that can be considered, our algorithm can naturally take into account predefined relationships between sets of taxa. We test our algorithm on a dataset of eukaryotic gene families spanning 29 taxa.

  11. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory.

    PubMed

    Meng, Jun; Zhang, Jing; Luan, Yushi

    2015-01-01

    Mining knowledge from gene expression data is an active research topic in bioinformatics. Gene selection and sample classification are important research directions because gene expression data contain a large number of genes but few samples. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives and does not weaken the information of an individual gene for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the interpretability is good.

  12. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims: MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods: The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results: Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions: The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  13. Detecting Cooperativity between Transcription Factors Based on Functional Coherence and Similarity of Their Target Gene Sets

    PubMed Central

    Wu, Wei-Sheng; Lai, Fu-Jou

    2016-01-01

    In eukaryotic cells, transcriptional regulation of gene expression is usually achieved by cooperative transcription factors (TFs). Therefore, knowing cooperative TFs is the first step toward uncovering the molecular mechanisms of gene expression regulation. Many algorithms based on different rationales have been proposed to predict cooperative TF pairs in yeast. Although various types of rationales have been used in the existing algorithms, functional coherence has not yet been used. This prompted us to develop a new algorithm based on functional coherence and similarity of the target gene sets to identify cooperative TF pairs in yeast. The proposed algorithm predicted 40 cooperative TF pairs. Among them, three (Pdc2-Thi2, Hot1-Msn1 and Leu3-Met28) are novel predictions, which have not been predicted by any existing algorithms. Strikingly, two (Pdc2-Thi2 and Hot1-Msn1) of the three novel predictions have been experimentally validated, demonstrating the power of the proposed algorithm. Moreover, we show that the predictions of the proposed algorithm are more biologically meaningful than the predictions of 17 existing algorithms under four evaluation indices. In summary, our study suggests that new algorithms based on novel rationales are worth developing for detecting previously unidentifiable cooperative TF pairs. PMID:27623007

  15. A FORTRAN program for the statistical analysis of incomplete time series data sets by a method of partition.

    PubMed

    Patel, M K; Waterhouse, J P

    1993-03-01

    A program written in FORTRAN-77 which executes an analysis for periodicity of a time series data set is presented. Time series analysis now has applicability and use in a wide range of biomedical studies. The analytical method termed here a method of partition is derived from periodogram analysis, but uses the principle of analysis of variance (ANOVA). It is effective when used on incomplete data sets. An example in which a data set is made progressively more incomplete by the random removal of values demonstrates this, and a listing of the program and a sample output in both an abbreviated and a full version are given.
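
    The abstract does not list the algorithm itself, so the following Python sketch only illustrates the general idea it describes: partition the observations by phase for each candidate period, compare between-phase and within-phase variance with a one-way ANOVA F ratio, and skip missing values. The function names, the use of None for missing observations, and the "largest F ratio" selection rule are assumptions made for illustration and are not taken from the FORTRAN-77 program.

```python
from collections import defaultdict

def partition_f_ratio(values, period):
    """One-way ANOVA F ratio for observations grouped by phase (index mod period).

    Missing observations are given as None and are simply skipped.
    """
    groups = defaultdict(list)
    for t, v in enumerate(values):
        if v is not None:
            groups[t % period].append(v)
    observed = [v for g in groups.values() for v in g]
    n, k = len(observed), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough data to compare phases
    grand_mean = sum(observed) / n
    # Variation of the phase means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the observations around their own phase mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def best_period(values, candidate_periods):
    """Return the candidate period whose phase partition gives the largest F ratio."""
    return max(candidate_periods, key=lambda p: partition_f_ratio(values, p))
```

    Because gaps only shrink the phase groups rather than break the calculation, the same code runs on complete and incomplete series. Integer multiples of the true period also produce large F ratios, so the candidate list should be chosen with that in mind.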

  16. A comprehensive time-course-based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set.

    PubMed

    Sweeney, Timothy E; Shidham, Aaditya; Wong, Hector R; Khatri, Purvesh

    2015-05-13

    Although several dozen studies of gene expression in sepsis have been published, distinguishing sepsis from a sterile systemic inflammatory response syndrome (SIRS) is still largely up to clinical suspicion. We hypothesized that a multicohort analysis of the publicly available sepsis gene expression data sets would yield a robust set of genes for distinguishing patients with sepsis from patients with sterile inflammation. A comprehensive search for gene expression data sets in sepsis identified 27 data sets matching our inclusion criteria. Five data sets (n = 663 samples) compared patients with sterile inflammation (SIRS/trauma) to time-matched patients with infections. We applied our multicohort analysis framework that uses both effect sizes and P values in a leave-one-data set-out fashion to these data sets. We identified 11 genes that were differentially expressed (false discovery rate ≤1%, inter-data set heterogeneity P > 0.01, summary effect size >1.5-fold) across all discovery cohorts with excellent diagnostic power [mean area under the receiver operating characteristic curve (AUC), 0.87; range, 0.7 to 0.98]. We then validated these 11 genes in 15 independent cohorts comparing (i) time-matched infected versus noninfected trauma patients (4 cohorts), (ii) ICU/trauma patients with infections over the clinical time course (3 cohorts), and (iii) healthy subjects versus sepsis patients (8 cohorts). In the discovery Glue Grant cohort, SIRS plus the 11-gene set improved prediction of infection (compared to SIRS alone) with a continuous net reclassification index of 0.90. Overall, multicohort analysis of time-matched cohorts yielded 11 genes that robustly distinguish sterile inflammation from infectious inflammation. PMID:25972003
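
    A leave-one-data-set-out check of the kind mentioned above can be sketched in Python as follows. The abstract does not specify the estimator, so inverse-variance (fixed-effect) pooling and a two-sided normal p-value are assumed here purely for illustration; the function names and thresholds are hypothetical and this is not the authors' framework.

```python
import math

def pooled_effect(effects, variances):
    """Inverse-variance weighted summary effect and its two-sided normal p-value."""
    weights = [1.0 / v for v in variances]
    summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p_value = math.erfc(abs(summary / se) / math.sqrt(2.0))
    return summary, p_value

def robust_in_leave_one_out(effects, variances, min_effect, max_p):
    """True if the gene passes both thresholds no matter which data set is left out."""
    for skip in range(len(effects)):
        kept_e = [e for i, e in enumerate(effects) if i != skip]
        kept_v = [v for i, v in enumerate(variances) if i != skip]
        summary, p_value = pooled_effect(kept_e, kept_v)
        if abs(summary) < min_effect or p_value > max_p:
            return False
    return True
```

    The point of repeating the pooling with each data set removed is that a gene whose signal comes from a single cohort fails as soon as that cohort is left out.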

  17. [Comparison of several Russian populations by vital statistics and frequency of genes, causing hereditary diseases].

    PubMed

    El'chinova, G I; Mamedova, R A; Brusintseva, O V; Ginter, E K

    1994-11-01

    Distances computed from vital statistics using the Euclidean distance formula, and therefore termed "vital" distances, are proposed for use in population studies. An example of the use of these distances for comparing four large, geographically separated Russian populations is given.
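
    As a minimal Python sketch of such a distance, each population can be represented by a vector of vital-statistics indicators and compared pairwise with the Euclidean formula. The abstract does not say which indicators were used or how they were scaled, so the function name and the assumption that the indicators are already on comparable scales are illustrative only.

```python
import math
from itertools import combinations

def vital_distances(populations):
    """Pairwise Euclidean distances between populations' vital-statistics vectors.

    `populations` maps a population label to a list of indicator values;
    all lists must have the same length and the same indicator order.
    """
    return {
        (a, b): math.dist(populations[a], populations[b])
        for a, b in combinations(sorted(populations), 2)
    }
```

    Calling `vital_distances({"A": [0.0, 0.0], "B": [3.0, 4.0]})` gives a distance of 5.0 for the pair ("A", "B"); the labels and values here are placeholders, not data from the study.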

  18. An ancient dental gene set governs development and continuous regeneration of teeth in sharks.

    PubMed

    Rasch, Liam J; Martin, Kyle J; Cooper, Rory L; Metscher, Brian D; Underwood, Charlie J; Fraser, Gareth J

    2016-07-15

    The evolution of oral teeth is considered a major contributor to the overall success of jawed vertebrates. This is especially apparent in cartilaginous fishes including sharks and rays, which develop elaborate arrays of highly specialized teeth, organized in rows and retain the capacity for life-long regeneration. Perpetual regeneration of oral teeth has been either lost or highly reduced in many other lineages including important developmental model species, so cartilaginous fishes are uniquely suited for deep comparative analyses of tooth development and regeneration. Additionally, sharks and rays can offer crucial insights into the characters of the dentition in the ancestor of all jawed vertebrates. Despite this, tooth development and regeneration in chondrichthyans is poorly understood and remains virtually uncharacterized from a developmental genetic standpoint. Using the emerging chondrichthyan model, the catshark (Scyliorhinus spp.), we characterized the expression of genes homologous to those known to be expressed during stages of early dental competence, tooth initiation, morphogenesis, and regeneration in bony vertebrates. We have found that expression patterns of several genes from Hh, Wnt/β-catenin, Bmp and Fgf signalling pathways indicate deep conservation over ~450 million years of tooth development and regeneration. We describe how these genes participate in the initial emergence of the shark dentition and how they are redeployed during regeneration of successive tooth generations. We suggest that at the dawn of the vertebrate lineage, teeth (i) were most likely continuously regenerative structures, and (ii) utilised a core set of genes from members of key developmental signalling pathways that were instrumental in creating a dental legacy redeployed throughout vertebrate evolution. These data lay the foundation for further experimental investigations utilizing the unique regenerative capacity of chondrichthyan models to answer evolutionary

  20. Cscan: finding common regulators of a set of genes by using a collection of genome-wide ChIP-seq datasets

    PubMed Central

    Zambelli, Federico; Prazzoli, Gian Marco; Pesole, Graziano; Pavesi, Giulio

    2012-01-01

    The regulation of transcription of eukaryotic genes is a very complex process, which involves interactions between transcription factors (TFs) and DNA, as well as other epigenetic factors like histone modifications, DNA methylation, and so on, which nowadays can be studied and characterized with techniques like ChIP-Seq. Cscan is a web resource that includes a large collection of genome-wide ChIP-Seq experiments performed on TFs, histone modifications, RNA polymerases and others. Enriched peak regions from the ChIP-Seq experiments are crossed with the genomic coordinates of a set of input genes, to identify which of the experiments present a statistically significant number of peaks within the input genes’ loci. The input can be a cluster of co-expressed genes, or any other set of genes sharing a common regulatory profile. Users can thus single out which TFs are likely to be common regulators of the genes, and their respective correlations. Also, by examining results on promoter activation, transcription, histone modifications, polymerase binding and so on, users can investigate the effect of the TFs (activation or repression of transcription) as well as of the cell or tissue specificity of the genes’ regulation and expression. The web interface is free for use, and there is no login requirement. Available at: http://www.beaconlab.it/cscan. PMID:22669907
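
    The abstract says only that Cscan looks for a "statistically significant number of peaks" in the input genes. One common way to formalise such an over-representation question is an upper-tail hypergeometric test, sketched below in Python. The choice of test, the function name and the parameter names are assumptions for illustration and should not be read as a description of the Cscan implementation.

```python
from math import comb

def enrichment_p_value(genome_size, genes_with_peak, set_size, set_with_peak):
    """Upper-tail hypergeometric probability of seeing at least `set_with_peak`
    peak-bearing genes in a random set of `set_size` genes drawn from a genome
    of `genome_size` genes, `genes_with_peak` of which carry a peak."""
    total = comb(genome_size, set_size)
    upper = min(genes_with_peak, set_size)
    tail = sum(
        comb(genes_with_peak, i) * comb(genome_size - genes_with_peak, set_size - i)
        for i in range(set_with_peak, upper + 1)
    )
    return tail / total
```

    Running this once per ChIP-Seq experiment and ranking the experiments by p-value would give a list of candidate common regulators; a multiple-testing correction would be needed before interpreting the smallest values.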

  1. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-01

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. PMID:23891470
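
    The idea that a gene-level statistic can be assembled from single-variant statistics plus their correlation matrix can be illustrated with a weighted-sum (burden-type) Z statistic. The Python sketch below is a generic textbook-style combination under the stated normality assumption; the function names and weights are hypothetical, and it is not the authors' software or their exact formulas.

```python
import math

def burden_z(z_scores, weights, corr):
    """Weighted-sum gene-level Z statistic from single-variant Z-scores.

    Under the null, the Z-scores are assumed multivariate normal with
    correlation matrix `corr`, so the weighted sum has variance w' R w.
    """
    m = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * corr[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

    With two uncorrelated variants that each have Z = 2, equal weights give a gene-level Z of about 2.83; if the two variants are perfectly correlated the same inputs give Z = 2, which shows why the correlation matrix cannot be ignored.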

  2. Meta-analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics

    PubMed Central

    Hu, Yi-Juan; Berndt, Sonja I.; Gustafsson, Stefan; Ganna, Andrea; Berndt, Sonja I.; Gustafsson, Stefan; Mägi, Reedik; Ganna, Andrea; Wheeler, Eleanor; Feitosa, Mary F.; Justice, Anne E.; Monda, Keri L.; Croteau-Chonka, Damien C.; Day, Felix R.; Esko, Tõnu; Fall, Tove; Ferreira, Teresa; Gentilini, Davide; Jackson, Anne U.; Luan, Jian’an; Randall, Joshua C.; Vedantam, Sailaja; Willer, Cristen J.; Winkler, Thomas W.; Wood, Andrew R.; Workalemahu, Tsegaselassie; Hu, Yi-Juan; Lee, Sang Hong; Liang, Liming; Lin, Dan-Yu; Min, Josine L.; Neale, Benjamin M.; Thorleifsson, Gudmar; Yang, Jian; Albrecht, Eva; Amin, Najaf; Bragg-Gresham, Jennifer L.; Cadby, Gemma; den Heijer, Martin; Eklund, Niina; Fischer, Krista; Goel, Anuj; Hottenga, Jouke-Jan; Huffman, Jennifer E.; Jarick, Ivonne; Johansson, Åsa; Johnson, Toby; Kanoni, Stavroula; Kleber, Marcus E.; König, Inke R.; Kristiansson, Kati; Kutalik, Zoltán; Lamina, Claudia; Lecoeur, Cecile; Li, Guo; Mangino, Massimo; McArdle, Wendy L.; Medina-Gomez, Carolina; Müller-Nurasyid, Martina; Ngwa, Julius S.; Nolte, Ilja M.; Paternoster, Lavinia; Pechlivanis, Sonali; Perola, Markus; Peters, Marjolein J.; Preuss, Michael; Rose, Lynda M.; Shi, Jianxin; Shungin, Dmitry; Smith, Albert Vernon; Strawbridge, Rona J.; Surakka, Ida; Teumer, Alexander; Trip, Mieke D.; Tyrer, Jonathan; Van Vliet-Ostaptchouk, Jana V.; Vandenput, Liesbeth; Waite, Lindsay L.; Zhao, Jing Hua; Absher, Devin; Asselbergs, Folkert W.; Atalay, Mustafa; Attwood, Antony P.; Balmforth, Anthony J.; Basart, Hanneke; Beilby, John; Bonnycastle, Lori L.; Brambilla, Paolo; Bruinenberg, Marcel; Campbell, Harry; Chasman, Daniel I.; Chines, Peter S.; Collins, Francis S.; Connell, John M.; Cookson, William; de Faire, Ulf; de Vegt, Femmie; Dei, Mariano; Dimitriou, Maria; Edkins, Sarah; Estrada, Karol; Evans, David M.; Farrall, Martin; Ferrario, Marco M.; Ferrières, Jean; Franke, Lude; Frau, Francesca; Gejman, Pablo V.; Grallert, Harald; Grönberg, Henrik; Gudnason, Vilmundur; Hall, 
Alistair S.; Hall, Per; Hartikainen, Anna-Liisa; Hayward, Caroline; Heard-Costa, Nancy L.; Heath, Andrew C.; Hebebrand, Johannes; Homuth, Georg; Hu, Frank B.; Hunt, Sarah E.; Hyppönen, Elina; Iribarren, Carlos; Jacobs, Kevin B.; Jansson, John-Olov; Jula, Antti; Kähönen, Mika; Kathiresan, Sekar; Kee, Frank; Khaw, Kay-Tee; Kivimaki, Mika; Koenig, Wolfgang; Kraja, Aldi T.; Kumari, Meena; Kuulasmaa, Kari; Kuusisto, Johanna; Laitinen, Jaana H.; Lakka, Timo A.; Langenberg, Claudia; Launer, Lenore J.; Lind, Lars; Lindström, Jaana; Liu, Jianjun; Liuzzi, Antonio; Lokki, Marja-Liisa; Lorentzon, Mattias; Madden, Pamela A.; Magnusson, Patrik K.; Manunta, Paolo; Marek, Diana; März, Winfried; Leach, Irene Mateo; McKnight, Barbara; Medland, Sarah E.; Mihailov, Evelin; Milani, Lili; Montgomery, Grant W.; Mooser, Vincent; Mühleisen, Thomas W.; Munroe, Patricia B.; Musk, Arthur W.; Narisu, Narisu; Navis, Gerjan; Nicholson, George; Nohr, Ellen A.; Ong, Ken K.; Oostra, Ben A.; Palmer, Colin N.A.; Palotie, Aarno; Peden, John F.; Pedersen, Nancy; Peters, Annette; Polasek, Ozren; Pouta, Anneli; Pramstaller, Peter P.; Prokopenko, Inga; Pütter, Carolin; Radhakrishnan, Aparna; Raitakari, Olli; Rendon, Augusto; Rivadeneira, Fernando; Rudan, Igor; Saaristo, Timo E.; Sambrook, Jennifer G.; Sanders, Alan R.; Sanna, Serena; Saramies, Jouko; Schipf, Sabine; Schreiber, Stefan; Schunkert, Heribert; Shin, So-Youn; Signorini, Stefano; Sinisalo, Juha; Skrobek, Boris; Soranzo, Nicole; Stančáková, Alena; Stark, Klaus; Stephens, Jonathan C.; Stirrups, Kathleen; Stolk, Ronald P.; Stumvoll, Michael; Swift, Amy J.; Theodoraki, Eirini V.; Thorand, Barbara; Tregouet, David-Alexandre; Tremoli, Elena; Van der Klauw, Melanie M.; van Meurs, Joyce B.J.; Vermeulen, Sita H.; Viikari, Jorma; Virtamo, Jarmo; Vitart, Veronique; Waeber, Gérard; Wang, Zhaoming; Widén, Elisabeth; Wild, Sarah H.; Willemsen, Gonneke; Winkelmann, Bernhard R.; Witteman, Jacqueline C.M.; Wolffenbuttel, Bruce H.R.; Wong, Andrew; Wright, Alan F.

    2013-01-01

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying “causal” rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. PMID:23891470

  3. Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; McNelis, Anne M.

    2010-01-01

    The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.

  4. An optimized gene set for transcriptomics based neurodevelopmental toxicity prediction in the neural embryonic stem cell test.

    PubMed

    Pennings, Jeroen L A; Theunissen, Peter T; Piersma, Aldert H

    2012-10-28

    The murine neural embryonic stem cell test (ESTn) is an in vitro model for neurodevelopmental toxicity testing. Recent studies have shown that application of transcriptomics analyses in the ESTn is useful for obtaining more accurate predictions as well as mechanistic insights. Gene expression responses due to stem cell neural differentiation versus toxicant exposure could be distinguished using the Principal Component Analysis based differentiation track algorithm. In this study, we performed a de novo analysis on combined raw data (10 compounds, 19 exposures) from three previous transcriptomics studies to identify an optimized gene set for neurodevelopmental toxicity prediction in the ESTn. By evaluating predictions of 200,000 randomly selected gene sets, we identified genes which significantly contributed to the prediction reliability. A set of 100 genes was obtained, predominantly involved in (neural) development. Further stringency restrictions resulted in a set of 29 genes that allowed for 84% prediction accuracy (area under the curve 94%). We anticipate these gene sets will contribute to further improve ESTn transcriptomics studies aimed at compound risk assessment.

  5. Building a statistical emulator for prediction of crop yield response to climate change: a global gridded panel data set approach

    NASA Astrophysics Data System (ADS)

    Mistry, Malcolm; De Cian, Enrica; Wing, Ian Sue

    2015-04-01

    There is widespread concern that trends and variability in weather induced by climate change will detrimentally affect global agricultural productivity and food supplies. Reliable quantification of the risks of negative impacts at regional and global scales is a critical research need, which has so far been met by forcing state-of-the-art global gridded crop models with outputs of global climate model (GCM) simulations in exercises such as the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP)-Fastrack. Notwithstanding such progress, it remains challenging to use these simulation-based projections to assess agricultural risk because their gridded fields of crop yields are fundamentally denominated as discrete combinations of warming scenarios, GCMs and crop models, and not as model-specific or model-averaged yield response functions of meteorological shifts, which may have their own independent probability of occurrence. By contrast, the empirical climate economics literature has adeptly represented agricultural responses to meteorological variables as reduced-form statistical response surfaces which identify the crop productivity impacts of additional exposure to different intervals of temperature and precipitation [cf Schlenker and Roberts, 2009]. This raises several important questions: (1) what do the equivalent reduced-form statistical response surfaces look like for crop model outputs, (2) do they exhibit systematic variation over space (e.g., crop suitability zones) or across crop models with different characteristics, (3) how do they compare to estimates based on historical observations, and (4) what are the implications for the characterization of climate risks? 
    We address these questions by estimating statistical yield response functions for four major crops (maize, rice, wheat and soybeans) over the historical period (1971-2004) as well as future climate change scenarios (2005-2099) using ISIMIP-Fastrack data for five GCMs and seven crop models

  6. Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study

    PubMed Central

    2010-01-01

    Background: In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services. Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases. Methods: A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links. Results: The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events. Conclusions: The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm
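
    A stepwise deterministic linkage of the general kind described in the Methods can be sketched in Python as a series of passes over progressively looser match keys. The field names, the ordering of the steps and the rule of linking only keys that are unique on both sides are assumptions made for this illustration; the abstract does not give the PIAC match keys or step definitions.

```python
from collections import defaultdict

def stepwise_link(records_a, records_b, steps):
    """Link two lists of dict records in successive deterministic passes.

    `steps` is an ordered list of field-name tuples, strictest first. In each
    pass, records not yet linked are matched when their key values agree and
    the key is unique on both sides; later passes only see leftover records.
    Returns a list of (index_a, index_b, step_number) tuples.
    """
    links = []
    free_a = set(range(len(records_a)))
    free_b = set(range(len(records_b)))
    for step_number, fields in enumerate(steps, start=1):
        index_a, index_b = defaultdict(list), defaultdict(list)
        for i in sorted(free_a):
            index_a[tuple(records_a[i].get(f) for f in fields)].append(i)
        for j in sorted(free_b):
            index_b[tuple(records_b[j].get(f) for f in fields)].append(j)
        for key, ids_a in index_a.items():
            ids_b = index_b.get(key, [])
            if None in key or len(ids_a) != 1 or len(ids_b) != 1:
                continue  # skip missing or ambiguous keys
            links.append((ids_a[0], ids_b[0], step_number))
            free_a.discard(ids_a[0])
            free_b.discard(ids_b[0])
    return links
```

    Recording the step number with each link makes it possible to review link quality step by step, for example by sampling links from the loosest passes for clerical review.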

  7. Evaluation of daily precipitation statistics and monsoon onset/retreat over western Sahel in multiple data sets

    NASA Astrophysics Data System (ADS)

    Diaconescu, Emilia Paula; Gachon, Philippe; Scinocca, John; Laprise, René

    2015-09-01

    The West Africa rainfall regime constitutes a considerable challenge for Regional Climate Models (RCMs) due to the complexity of dynamical and physical processes that characterise the West African Monsoon. In this paper, daily precipitation statistics are evaluated from the contributions to the AFRICA-CORDEX experiment from two ERA-Interim driven Canadian RCMs: CanRCM4, developed at the Canadian Centre for Climate Modelling and Analysis (CCCma) and CRCM5, developed at the University of Québec at Montréal. These modelled precipitation statistics are evaluated against three gridded observed datasets—the Global Precipitation Climatology Project (GPCP), the Tropical Rainfall Measuring Mission (TRMM), and the Africa Rainfall Climatology (ARC2)—and four reanalysis products (ECMWF ERA-Interim, NCEP/DOE Reanalysis II, NASA MERRA and NOAA-CIRES Twentieth Century Reanalysis). The two RCMs share the same dynamics from the Environment Canada GEM forecast model, but have two different physics' packages: CanRCM4 obtains its physics from CCCma's global atmospheric model (CanAM4), while CRCM5 shares a number of its physics modules with the limited-area version of GEM forecast model. The evaluation is focused on various daily precipitation statistics (maximum number of consecutive wet days, number of moderate and very heavy precipitation events, precipitation frequency distribution) and on the monsoon onset and retreat over the Sahel region. We find that the CRCM5 has a good representation of daily precipitation statistics over the southern Sahel, with spatial distributions close to GPCP dataset. Some differences are observed in the northern part of the Sahel, where the model is characterised by a dry bias. CanRCM4 and the ERA-Interim and MERRA reanalysis products overestimate the number of wet days over Sahel with a shift in the frequency distribution toward smaller daily precipitation amounts than in observations. Both RCMs and reanalyses have difficulties in reproducing

  8. Statistical tests against systematic errors in data sets based on the equality of residual means and variances from control samples: theory and applications.

    PubMed

    Henn, Julian; Meindl, Kathrin

    2015-03-01

    Statistical tests are applied for the detection of systematic errors in data sets from least-squares refinements or other residual-based reconstruction processes. Samples of the residuals of the data are tested against the hypothesis that they belong to the same distribution. For this it is necessary that they show the same mean values and variances within the limits given by statistical fluctuations. When the samples differ significantly from each other, they are not from the same distribution within the limits set by the significance level. Therefore they cannot originate from a single Gaussian function in this case. It is shown that a significance cutoff results in exactly this case. Significance cutoffs are still frequently used in charge-density studies. The tests are applied to artificial data with and without systematic errors and to experimental data from the literature.

  9. A transcriptomic approach to identify regulatory genes involved in fruit set of wild-type and parthenocarpic tomato genotypes.

    PubMed

    Ruiu, Fabrizio; Picarella, Maurizio Enea; Imanishi, Shunsuke; Mazzucato, Andrea

    2015-10-01

    The tomato parthenocarpic fruit (pat) mutation associates a strong competence for parthenocarpy with homeotic transformation of anthers and aberrancy of ovules. To dissect this complex floral phenotype, genes involved in the pollination-independent fruit set of the pat mutant were investigated by microarray analysis using wild-type and mutant ovaries. Normalized expression data were subjected to one-way ANOVA, and 2499 differentially expressed genes (DEGs) displaying a >1.5 log-fold change in at least one of the pairwise comparisons analyzed were detected. DEGs were categorized into 20 clusters, and the clusters were classified into five groups representing transcripts with similar expression dynamics. The "regulatory function" group (685 DEGs) contained putative negative or positive fruit set regulators, the "pollination-dependent" group (411 DEGs) included genes activated by pollination, and the "fruit growth-related" group (815 DEGs) included genes activated at early fruit growth. The last groups listed genes with different or similar expression pattern at all stages in the two genotypes. qRT-PCR validation of 20 DEGs plus four other selected genes confirmed the high reliability of the microarray expression data; the average correlation coefficient for the 20 DEGs was 0.90. All groups included relevant transcription factors regulating meristem differentiation and floral organ development, genes involved in the metabolism, transport and response of hormones, and genes involved in cell division and in primary and secondary metabolism. Among pathways related to secondary metabolites, genes related to the synthesis of flavonoids emerged, supporting the recent evidence that these compounds are important at the fruit set phase. Selected genes showing a de-regulated expression pattern in pat were studied in four other parthenocarpic genotypes, either genetically anonymous or carrying lesions in known gene sequences. This comparative approach offered novel insights for improving the present

  11. A novel proposal of a simplified bacterial gene set and the neo-construction of a general minimized metabolic network

    PubMed Central

    Ye, Yuan-Nong; Ma, Bin-Guang; Dong, Chuan; Zhang, Hong; Chen, Ling-Ling; Guo, Feng-Biao

    2016-01-01

    A minimal gene set (MGS) is critical for the assembly of a minimal artificial cell. We propose a procedure for simplifying a bacterial gene set to approximate a bacterial MGS. First, we base our simplified bacterial gene set (SBGS) on experimentally determined essential genes to ensure that the genes included in the SBGS are critical. Second, we introduce a half-retaining strategy to extract persistent essential genes to ensure stability. Third, we construct a viable metabolic network to supplement the SBGS. The proposed SBGS includes 327 genes and requires 431 reactions. This report describes an SBGS that preserves both self-replication and self-maintenance systems. In the minimized metabolic network, we identified five novel hub metabolites and confirmed 20 known hubs. Highly essential genes were found to distribute the connecting metabolites into more reactions. Based on our SBGS, we expanded the pool of targets for designing broad-spectrum antibacterial drugs to reduce pathogen resistance. We also suggested a rough semi-de novo strategy to synthesize an artificial cell, with potential applications in industry. PMID:27713529

  12. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    PubMed Central

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, such as LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  13. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    PubMed

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histological subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, such as LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The small overlap between the two resulting gene signatures illustrates that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed. PMID:27446945

  14. Statistical generation of training sets for measuring NO3(-), NH4(+) and major ions in natural waters using an ion selective electrode array.

    PubMed

    Mueller, Amy V; Hemond, Harold F

    2016-05-18

    Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+) in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of

  15. A new set of differentially expressed signaling genes is early expressed in coffee leaf rust race II incompatible interaction.

    PubMed

    Diola, Valdir; Brito, Giovani G; Caixeta, Eveline T; Pereira, Luiz F P; Loureiro, Marcelo E

    2013-08-01

    defense genes: early expression of signaling genes supports the hypothesis that higher expression of the signaling components up-regulates the defense genes identified. Additionally, the increased gene expression of these two gene sets is associated with a single monogenic resistance trait to coffee leaf rust in the interaction characterized here.

  16. Now you Bayes, now you don't: effects of set-problem and frequency-format mental representations on statistical reasoning.

    PubMed

    Sirota, Miroslav; Kostovičová, Lenka; Vallée-Tourangeau, Frédéric

    2015-10-01

    People appear to be Bayesian when statistical information is presented in terms of natural frequencies and non-Bayesian when presented in terms of single-event probabilities, unless the probabilities resemble natural frequencies, for example, as chances. The isomorphic format of chances, however, does not always facilitate performance to the extent that the format of natural frequencies does. Prior research has not addressed the underlying mechanism that accounts for this gap despite its theoretical significance. The mechanism explaining this external format gap could lie in the interpretation of the problem as a set-problem, which cues relevant problem model and arithmetic operations (the problem interpretation hypothesis) and/or in the interpretation of the format as frequencies, which may be easier to process (the format interpretation hypothesis). In two parallel experiments, we found support for the problem interpretation hypothesis only: set representations mediated solely the isomorphic format gap (Experiment 1: part A) and accounted for the transfer effect to natural frequencies (Experiment 1: part B); priming set representations improved performance with chances (Experiment 2). We discuss how the supported explanation corroborates the nested-sets rather than the ecological rationality account of statistical reasoning and how it helps explain individual differences in Bayesian reasoning.
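
    As an illustrative aside (not taken from the study), the sketch below shows that the two presentation formats describe the same arithmetic: Bayes' rule applied to single-event probabilities and a nested-set ratio of natural frequencies give an identical posterior. The base rate, hit rate, and false-alarm rate are hypothetical values chosen only for the example.

```python
from fractions import Fraction

def posterior_from_probabilities(base_rate, hit_rate, false_alarm_rate):
    """Bayes' rule on single-event probabilities: P(H | positive)."""
    true_pos = base_rate * hit_rate
    false_pos = (1 - base_rate) * false_alarm_rate
    return true_pos / (true_pos + false_pos)

def posterior_from_frequencies(n_true_pos, n_false_pos):
    """Natural-frequency (nested-set) form: positives with H over all positives."""
    return Fraction(n_true_pos, n_true_pos + n_false_pos)

# Hypothetical problem: 1% base rate, 80% hit rate, 10% false-alarm rate.
p = posterior_from_probabilities(Fraction(1, 100), Fraction(8, 10), Fraction(1, 10))

# The same problem as natural frequencies: of 1000 people, 10 have the
# condition and 8 of them test positive; of the other 990, 99 test positive.
f = posterior_from_frequencies(8, 99)

print(p, f)
```

    The frequency version needs only one division over a visible subset, which is one way to picture the set-based problem interpretation discussed in the abstract.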

  17. Dissection of the oncogenic MYCN transcriptional network reveals a large set of clinically relevant cell cycle genes as drivers of neuroblastoma tumorigenesis.

    PubMed

    Murphy, Derek M; Buckley, Patrick G; Bryan, Kenneth; Watters, Karen M; Koster, Jan; van Sluis, Peter; Molenaar, Jan; Versteeg, Rogier; Stallings, Raymond L

    2011-06-01

    Amplification of the oncogenic transcription factor MYCN plays a major role in the pathogenesis of several pediatric cancers, including neuroblastoma, medulloblastoma, and rhabdomyosarcoma. For neuroblastoma, MYCN amplification is the most powerful genetic predictor of poor patient survival, yet the mechanism by which MYCN drives tumorigenesis is only partially understood. To gain an insight into the distribution of MYCN binding and to identify clinically relevant MYCN target genes, we performed an integrated analysis of MYCN ChIP-chip and mRNA expression using the MYCN repressible SHEP-21N neuroblastoma cell line. We hypothesized that genes exclusively MYCN bound in SHEP-21N cells over-expressing MYCN would be enriched for direct targets which contribute to the process of disease progression. Integrated analysis revealed that MYCN drives tumorigenesis predominantly as a positive regulator of target gene transcription. A high proportion of genes (24%) that are MYCN bound and up-regulated in the SHEP-21N model are significantly associated with poor overall patient survival (OS) in a set of 88 tumors. In contrast, the proportion of genes down-regulated when bound by MYCN in the SHEP-21N model and which are significantly associated with poor overall patient survival when under-expressed in primary tumors was significantly lower (5%). Gene ontology analysis determined a highly statistically significant enrichment for cell cycle related genes within the over-expressed MYCN target group which were also associated with poor OS. We conclude that the over-expression of MYCN leads to aberrant binding and over-expression of genes associated with cell cycle regulation which are significantly correlated with poor OS and MYCN amplification.
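
    The abstract does not say which enrichment test was used. As a hedged illustration only, the sketch below computes a one-sided hypergeometric (over-representation) p-value, a common choice for gene ontology enrichment. The function name and the toy counts are assumptions made for this example, not values from the study.

```python
from math import comb

def hypergeom_enrichment_p(n_genes, n_in_category, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_genes, of which n_in_category belong to the category of interest."""
    upper = min(n_in_category, n_selected)
    favourable = sum(
        comb(n_in_category, i) * comb(n_genes - n_in_category, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_genes, n_selected)

# Toy example (hypothetical counts): 10 genes, 4 annotated to the category,
# 3 selected, 2 of the selected genes carry the annotation.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_toy)
```

    A small p-value means the selected gene list contains more category members than random sampling would be expected to produce.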

  18. lambda altSF: a phage variant that acquired the ability to substitute specific sets of genes at high frequency.

    PubMed Central

    Friedman, D; Tomich, P; Parsons, C; Olson, E; Deans, R; Flamm, E

    1981-01-01

    We report the isolation of lambda altSF, a variant of Escherichia coli phage lambda that substitutes sets of genes at high frequency. Two forms of the variant phage have been studied: lambda altSF lambda, which exhibits the immunity (repressor recognition) of phage lambda, and lambda altSF22, which exhibits the immunity of Salmonella phage P22. Lysates made from single plaques of lambda altSF lambda contain 10-30% phage of the P22 form. Similarly, lysates from single plaques of lambda altSF22 contain as much as 1% phage of the lambda form. Heteroduplex analyses reveal the following features of the lambda altSF chromosomes: (i) each form has the immunity genes appropriate to its immune phenotype, (ii) the substituted segments include genes involved in regulation and replication, and (iii) the alt phages have unusual additions and substitutions of DNA not normally found associated with either immunity region. In the case of lambda altSF lambda, there is a small insertion in the region of the cI gene. Because revertants that lose this inserted DNA concomitantly lose the ability to substitute, we conclude that the insertion plays a role in the substitution process. In the case of change from lambda altSF lambda to lambda altSF22, the substituting P22 genes are derived from the E. coli host. We have identified a set of Salmonella phage P22 genes in a standard nonlysogenic strain of E. coli K-12 that is apparently carried in a silent form. The reason for this lack of expression is not obvious, because this P22 material includes structural genes and associated promoters and is potentially active. When this set of genes substitutes for the analogous set of genetic material on the genome of lambda altSF lambda, the P22 genes are expressed in a normal manner. PMID:6454136

  19. Correlation of a set of gene variants, life events and personality features on adult ADHD severity.

    PubMed

    Müller, Daniel J; Chiesa, Alberto; Mandelli, Laura; De Luca, Vincenzo; De Ronchi, Diana; Jain, Umesh; Serretti, Alessandro; Kennedy, James L

    2010-07-01

    Increasing evidence suggests that symptoms of attention deficit hyperactivity disorder (ADHD) could persist into adult life in a substantial proportion of cases. The aim of the present study was to investigate the impact of (1) adverse events, (2) personality traits and (3) genetic variants chosen on the basis of previous findings and (4) their possible interactions on adult ADHD severity. One hundred and ten individuals diagnosed with adult ADHD were evaluated for occurrence of adverse events in childhood and adulthood, and personality traits by the Temperament and Character Inventory (TCI). Common polymorphisms within a set of nine important candidate genes (SLC6A3, DBH, DRD4, DRD5, HTR2A, CHRNA7, BDNF, PRKG1 and TAAR9) were genotyped for each subject. Life events, personality traits and genetic variations were analyzed in relationship to severity of current symptoms, according to the Brown Attention Deficit Disorder Scale (BADDS). Genetic variations were not significantly associated with severity of ADHD symptoms. Life stressors displayed only a minor effect as compared to personality traits. Indeed, symptoms' severity was significantly correlated with the temperamental trait of Harm avoidance and the character trait of Self directedness. The results of the present work are in line with previous evidence of a significant correlation between some personality traits and adult ADHD. However, several limitations such as the small sample size and the exclusion of patients with other severe comorbid psychiatric disorders could have influenced the significance of present findings.

  20. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
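
    The abstract does not give the details of the authors' procedure. The sketch below is a generic, simplified illustration of the underlying idea: fit a LOESS curve (local linear regression with tricube weights) of score against sequence length, then keep the residual as a length-corrected score. The function names, the span value, and the data are assumptions for this example only.

```python
def loess_fit(x, y, x0, span=0.5):
    """Locally weighted linear fit evaluated at x0, using tricube weights."""
    n = len(x)
    k = max(2, int(round(span * n)))            # number of neighbours in the window
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] or 1e-12                   # bandwidth: k-th smallest distance
    w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def length_corrected_scores(lengths, scores, span=0.5):
    """Subtract the LOESS-predicted score at each length from the raw score."""
    return [s - loess_fit(lengths, scores, L, span)
            for L, s in zip(lengths, scores)]

# Hypothetical data: scores that grow with length, with one unusually high value.
lengths = [100, 200, 300, 400, 500, 600, 700, 800]
scores = [1.0, 2.1, 2.9, 4.0, 7.5, 6.1, 6.9, 8.0]
corrected = length_corrected_scores(lengths, scores, span=0.5)
print([round(c, 2) for c in corrected])
```

    Sequences whose scores are explained by length alone end up near zero after correction, so large positive residuals are the candidates for a real signal.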

  1. Nutritional status of breastfed infants in rural Zambia: comparison of the National Center for Health Statistics growth reference versus the WHO 12-month breastfed pooled data set.

    PubMed Central

    Hautvast, J. L.; Pandor, A.; Burema, J.; Tolboom, J. J.; Chishimba, N.; Monnens, L. A.; van Staveren, W. A.

    2000-01-01

    Cross-sectional data for breastfed infants in rural Zambia were used to evaluate the effect of applying two different data sets as a reference, i.e. the WHO 12-month breastfed pooled data set and the National Center for Health Statistics (NCHS) growth reference in terms of prevalence of malnutrition (stunting, underweight, and wasting). A total of 518 infants who were attending mother-and-child health clinics were included. Age, weight and length were recorded. Anthropometric Z-scores were calculated in two ways: by applying the NCHS growth reference and by using the WHO breastfed data set. Anthropometric Z-scores calculated using the breastfed data set were lower during the first 6-7 months of life compared with those calculated by applying the NCHS growth reference. This resulted in a higher proportion of children aged 0-6 months being classified as stunted and underweight using the breastfed data set versus the NCHS growth reference. After the age of 7 months, similar prevalences of stunting or underweight were observed. Relatively few infants were classified as wasted. In order to adequately assess the prevalence of stunting and underweight in breastfed infants, it is recommended that a new growth reference be developed, as has been initiated by WHO. PMID:10885182
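
    To make the comparison concrete, the sketch below computes an anthropometric Z-score against two references and applies the conventional cutoff of Z below -2 for underweight. It uses the simple (value - median) / SD form, which is a simplification of how growth references are applied in practice, and every reference value here is a hypothetical placeholder rather than a figure from the NCHS or WHO data sets.

```python
def z_score(value, ref_median, ref_sd):
    """Simplified anthropometric Z-score against a reference population."""
    return (value - ref_median) / ref_sd

def below_cutoff(z, cutoff=-2.0):
    """Conventional classification: malnourished if Z is below the cutoff."""
    return z < cutoff

# Hypothetical infant weight (kg) at a given age and sex.
weight = 5.9

# Hypothetical reference values: reference B has the higher median,
# mimicking a reference under which young infants score lower.
z_ref_a = z_score(weight, ref_median=7.0, ref_sd=0.8)
z_ref_b = z_score(weight, ref_median=7.6, ref_sd=0.8)

print(round(z_ref_a, 3), below_cutoff(z_ref_a))
print(round(z_ref_b, 3), below_cutoff(z_ref_b))
```

    The same child is classified differently depending on the reference, which is how a change of reference data set can shift estimated prevalence without any change in the measurements.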

  2. Identical sets of methylated and nonmethylated genes in Ciona intestinalis sperm and muscle cells

    PubMed Central

    2013-01-01

    Background: The discovery of gene body methylation, which refers to DNA methylation within gene coding regions, suggests an as yet unknown role of DNA methylation at actively transcribed genes. In invertebrates, gene bodies are the primary targets of DNA methylation, and only a subset of expressed genes is modified. Results: Here we investigate the tissue variability of both the global levels and distribution of 5-methylcytosine (5mC) in the sea squirt Ciona intestinalis. We find that global 5mC content of early developmental embryos is high, but is strikingly reduced in body wall tissues. We chose sperm and adult muscle cells, with high and reduced levels of global 5mC respectively, for genome-wide analysis of 5mC targets. By means of CXXC-affinity purification followed by deep sequencing (CAP-seq), and genome-wide bisulfite sequencing (BS-seq), we designated body-methylated and unmethylated genes in each tissue. Surprisingly, body-methylated and unmethylated gene groups are identical in the sperm and muscle cells. Our analysis of microarray expression data shows that gene body methylation is associated with broad expression throughout development. Moreover, transgenic analysis reveals contrasting gene body methylation at an identical gene-promoter combination when integrated at different genomic sites. Conclusions: We conclude that gene body methylation is not a direct regulator of tissue specific gene expression in C. intestinalis. Our findings reveal constant targeting of gene body methylation irrespective of cell type, and they emphasize a correlation between gene body methylation and ubiquitously expressed genes. Our transgenic experiments suggest that the promoter does not determine the methylation status of the associated gene body. PMID:24279449

  3. The choice of reference gene set for assessing gene expression in barley (Hordeum vulgare L.) under low temperature and drought stress.

    PubMed

    Janská, Anna; Hodek, Jan; Svoboda, Pavel; Zámečník, Jiří; Prášil, Ilja Tom; Vlasáková, Eva; Milella, Luigi; Ovesná, Jaroslava

    2013-11-01

    Drought and low temperature are the two most significant causes of abiotic stress in agricultural crops and, therefore, they pose considerable challenges in plant science. Hence, it is crucial to study response mechanisms and to select genes for identifying the signaling pathways that lead from stimulus to response. The assessment of gene expression is often attempted using real-time RT-PCR (qRT-PCR), a technique which requires a careful choice of reference gene(s) for normalization purposes. Here, we report a comparison of 13 potential reference genes for studying gene expression in the leaf and crown of barley seedlings subjected to low temperature or drought stress. All three currently available software packages designed to identify reference genes from qRT-PCR data (GeNorm, NormFinder and BestKeeper) were used to identify informative sets of up to three reference genes. Interestingly, the data obtained from the separate treatment of leaf and crown have led to the recommendation that HSP70 and S-AMD (and possibly HSP90) be used as the reference genes for low-temperature stressed leaves, HSP90 and EF1α for low-temperature stressed crowns, cyclophilin and ADP-RF (and possibly ACT) for drought-stressed leaves, and EF1α and S-AMD for drought-stressed crowns. Our results demonstrate that gene expression can be highly tissue- or organ-specific in barley and confirm that reference gene choice is essential in qRT-PCR. The findings can also serve as guidelines for the selection of reference genes under different stress conditions and lay a foundation for more accurate and widespread use of qRT-PCR in barley gene analysis.
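
    As a rough illustration of how a reference-gene ranking can be computed, the sketch below implements a geNorm-style stability measure M as we understand its published definition: for each candidate gene, the average over all other candidates of the standard deviation of the log2 expression ratios across samples. It is not the GeNorm software itself, and the gene names and expression values are hypothetical.

```python
import math
from statistics import stdev

def stability_m(expr):
    """expr maps gene name -> relative expression per sample (linear scale).
    Returns gene name -> M value; a lower M means more stable expression."""
    genes = list(expr)
    m_values = {}
    for g in genes:
        pairwise_sd = []
        for other in genes:
            if other == g:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(expr[g], expr[other])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[g] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values

# Hypothetical relative quantities for three candidate genes in four samples.
# ref_a and ref_b keep a constant ratio; ref_c varies independently.
expr = {
    "ref_a": [1.0, 2.0, 4.0, 8.0],
    "ref_b": [2.0, 4.0, 8.0, 16.0],
    "ref_c": [1.0, 4.0, 2.0, 9.0],
}
m = stability_m(expr)
ranking = sorted(m, key=m.get)   # most stable candidate first
print(ranking, {g: round(v, 3) for g, v in m.items()})
```

    Because the measure depends on the samples supplied, running it separately per tissue and per stress condition can yield different recommended genes, in line with the tissue-specific recommendations above.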

  4. Comparative analysis of the Acyrthosiphon pisum genome and expressed sequence tag-based gene sets from other aphid species.

    PubMed

    Ollivier, M; Legeai, F; Rispe, C

    2010-03-01

    To study gene repertoires and their evolution within aphids, we compared the complete genome sequence of Acyrthosiphon pisum (reference gene set) and expressed sequence tag (EST) data from three other species: Myzus persicae, Aphis gossypii and Toxoptera citricida. We assembled ESTs, predicted coding sequences, and identified potential pairs of orthologues (reciprocal best hits) with A. pisum. Pairwise comparisons show that a fraction of the genes evolve fast (high ratio of non-synonymous to synonymous rates), including many genes shared by aphids but with no hit in Uniprot. A detailed phylogenetic study for four fast-evolving genes (C002, JHAMT, Apo and GH) shows that rate accelerations are often associated with duplication events. We also compare compositional patterns between the two tribes of aphids, Aphidini and Macrosiphini.
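
    The reciprocal-best-hit criterion mentioned above can be sketched as follows. This is a minimal illustration that assumes similarity scores (for example, alignment bit scores) have already been computed in both directions. The sequence identifiers and scores are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
def best_hits(scores):
    """scores maps (query, subject) -> similarity score.
    Returns query -> subject with the highest score for that query."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical scores: genome genes (A*) searched against EST contigs (B*),
# and the EST contigs searched back against the genome genes.
a_vs_b = {("A1", "B1"): 500, ("A1", "B2"): 300,
          ("A2", "B2"): 200, ("A3", "B2"): 450}
b_vs_a = {("B1", "A1"): 480, ("B2", "A3"): 440, ("B2", "A2"): 210}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

    Here A2 is dropped because its best hit, B2, prefers A3 in the reverse search; only mutually preferred pairs are kept as candidate orthologues.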

  5. Genomic determinants of sporulation in Bacilli and Clostridia: towards the minimal set of sporulation-specific genes

    PubMed Central

    Galperin, Michael Y; Mekhedov, Sergei L; Puigbo, Pere; Smirnov, Sergey; Wolf, Yuri I; Rigden, Daniel J

    2012-01-01

    Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia. PMID:22882546

  6. Counteracting H3K4 methylation modulators Set1 and Jhd2 co-regulate chromatin dynamics and gene transcription

    PubMed Central

    Ramakrishnan, Saravanan; Pokhrel, Srijana; Palani, Sowmiya; Pflueger, Christian; Parnell, Timothy J.; Cairns, Bradley R.; Bhaskara, Srividya; Chandrasekharan, Mahesh B.

    2016-01-01

    Histone H3K4 methylation is connected to gene transcription from yeast to humans, but its mechanistic roles in transcription and chromatin dynamics remain poorly understood. We investigated the functions for Set1 and Jhd2, the sole H3K4 methyltransferase and H3K4 demethylase, respectively, in S. cerevisiae. Here, we show that Set1 and Jhd2 predominantly co-regulate genome-wide transcription. We find combined activities of Set1 and Jhd2 via H3K4 methylation contribute to positive or negative transcriptional regulation. Providing mechanistic insights, our data reveal that Set1 and Jhd2 together control nucleosomal turnover and occupancy during transcriptional co-regulation. Moreover, we find a genome-wide co-regulation of chromatin structure by Set1 and Jhd2 at different groups of transcriptionally active or inactive genes and at different regions within yeast genes. Overall, our study puts forth a model wherein combined actions of Set1 and Jhd2 via modulating H3K4 methylation−demethylation together control chromatin dynamics during various facets of transcriptional regulation. PMID:27325136

  7. Genome-wide association data suggest ABCB1 and immune-related gene sets may be involved in adult antisocial behavior.

    PubMed

    Salvatore, J E; Edwards, A C; McClintick, J N; Bigdeli, T B; Adkins, A; Aliev, F; Edenberg, H J; Foroud, T; Hesselbrock, V; Kramer, J; Nurnberger, J I; Schuckit, M; Tischfield, J A; Xuei, X; Dick, D M

    2015-04-28

    Adult antisocial behavior (AAB) is moderately heritable, relatively common and has adverse consequences for individuals and society. We examined the molecular genetic basis of AAB in 1379 participants from a case-control study in which the cases met criteria for alcohol dependence. We also examined whether genes of interest were expressed in human brain. AAB was measured using a count of the number of Antisocial Personality Disorder criteria endorsed under criterion A from the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV). Participants were genotyped on the Illumina Human 1M BeadChip. In total, all single-nucleotide polymorphisms (SNPs) accounted for 25% of the variance in AAB, although this estimate was not significant (P=0.09). Enrichment tests indicated that more significantly associated genes were over-represented in seven gene sets, and most were immune related. Our most highly associated SNP (rs4728702, P=5.77 × 10⁻⁷) was located in the protein-coding adenosine triphosphate-binding cassette, sub-family B (MDR/TAP), member 1 (ABCB1). In a gene-based test, ABCB1 was genome-wide significant (q=0.03). Expression analyses indicated that ABCB1 was robustly expressed in the brain. ABCB1 has been implicated in substance use, and in post hoc tests we found that variation in ABCB1 was associated with DSM-IV alcohol and cocaine dependence criterion counts. These results suggest that ABCB1 may confer risk across externalizing behaviors, and are consistent with previous suggestions that immune pathways are associated with externalizing behaviors. The results should be tempered by the fact that we did not replicate the associations for ABCB1 or the gene sets in a less-affected independent sample.

  8. Genome-wide association data suggest ABCB1 and immune-related gene sets may be involved in adult antisocial behavior

    PubMed Central

    Salvatore, J E; Edwards, A C; McClintick, J N; Bigdeli, T B; Adkins, A; Aliev, F; Edenberg, H J; Foroud, T; Hesselbrock, V; Kramer, J; Nurnberger, J I; Schuckit, M; Tischfield, J A; Xuei, X; Dick, D M

    2015-01-01

    Adult antisocial behavior (AAB) is moderately heritable, relatively common and has adverse consequences for individuals and society. We examined the molecular genetic basis of AAB in 1379 participants from a case–control study in which the cases met criteria for alcohol dependence. We also examined whether genes of interest were expressed in human brain. AAB was measured using a count of the number of Antisocial Personality Disorder criteria endorsed under criterion A from the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV). Participants were genotyped on the Illumina Human 1M BeadChip. In total, all single-nucleotide polymorphisms (SNPs) accounted for 25% of the variance in AAB, although this estimate was not significant (P=0.09). Enrichment tests indicated that more significantly associated genes were over-represented in seven gene sets, and most were immune related. Our most highly associated SNP (rs4728702, P=5.77 × 10⁻⁷) was located in the protein-coding adenosine triphosphate-binding cassette, sub-family B (MDR/TAP), member 1 (ABCB1). In a gene-based test, ABCB1 was genome-wide significant (q=0.03). Expression analyses indicated that ABCB1 was robustly expressed in the brain. ABCB1 has been implicated in substance use, and in post hoc tests we found that variation in ABCB1 was associated with DSM-IV alcohol and cocaine dependence criterion counts. These results suggest that ABCB1 may confer risk across externalizing behaviors, and are consistent with previous suggestions that immune pathways are associated with externalizing behaviors. The results should be tempered by the fact that we did not replicate the associations for ABCB1 or the gene sets in a less-affected independent sample. PMID:25918995

  9. Genetic Differentiation and Estimation of Gene Flow from F-Statistics under Isolation by Distance

    PubMed Central

    Rousset, F.

    1997-01-01

    I reexamine the use of isolation by distance models as a basis for the estimation of demographic parameters from measures of population subdivision. To that aim, I first provide results for values of F-statistics in one-dimensional models and coalescence times in two-dimensional models, and make more precise earlier results for F-statistics in two-dimensional models and coalescence times in one-dimensional models. Based on these results, I propose a method of data analysis involving the regression of F_ST/(1 - F_ST) estimates for pairs of subpopulations on geographic distance for populations along linear habitats or logarithm of distance for populations in two-dimensional habitats. This regression provides in principle an estimate of the product of population density and second moment of parental axial distance. In two cases where comparison to direct estimates is possible, the method proposed here is more satisfactory than previous indirect methods. PMID:9093870
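
    The regression described in this abstract can be sketched as follows. This is a minimal illustration, not the author's code: the function name `ibd_regression`, the toy F_ST values, and the distances are hypothetical, and ordinary least squares is assumed as the fitting method. The predictor is distance for linear habitats and the logarithm of distance for two-dimensional habitats.

```python
import math

def ibd_regression(fst_values, distances, two_dimensional=False):
    """Least-squares fit of F_ST/(1 - F_ST) on distance (or log distance).

    Returns (slope, intercept). All names here are illustrative assumptions.
    """
    if len(fst_values) != len(distances) or len(fst_values) < 2:
        raise ValueError("need at least two matched (F_ST, distance) pairs")
    # Response: the linearized differentiation measure F_ST / (1 - F_ST).
    y = [f / (1.0 - f) for f in fst_values]
    # Predictor: distance for linear habitats, log distance for 2-D habitats.
    x = [math.log(d) if two_dimensional else d for d in distances]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical pairwise estimates for populations along a linear habitat.
toy_fst = [0.02, 0.04, 0.05, 0.08]
toy_dist_km = [10.0, 20.0, 30.0, 50.0]
slope, intercept = ibd_regression(toy_fst, toy_dist_km)
print(f"slope = {slope:.5f} per km, intercept = {intercept:.5f}")
```

    Per the abstract, the fitted regression provides in principle an estimate of the product of population density and second moment of parental axial distance; the exact relation between the slope and that product is given in the paper and is not reproduced here.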

  10. Statistical heterospectroscopy, an approach to the integrated analysis of NMR and UPLC-MS data sets: application in metabonomic toxicology studies.

    PubMed

    Crockford, Derek J; Holmes, Elaine; Lindon, John C; Plumb, Robert S; Zirah, Severine; Bruce, Stephen J; Rainville, Paul; Stumpf, Chris L; Nicholson, Jeremy K

    2006-01-15

    Statistical heterospectroscopy (SHY) is a new statistical paradigm for the coanalysis of multispectroscopic data sets acquired on multiple samples. This method operates through the analysis of the intrinsic covariance between signal intensities in the same and related molecules measured by different techniques across cohorts of samples. The potential of SHY is illustrated using both 600-MHz 1H NMR and UPLC-TOFMS data obtained from control rat urine samples (n = 54) and from a corresponding hydrazine-treated group (n = 58). We show that direct cross-correlation of spectral parameters, viz. chemical shifts from NMR and m/z data from MS, is readily achievable for a variety of metabolites, which leads to improved efficiency of molecular biomarker identification. In addition to structure, higher level biological information can be obtained on metabolic pathway activity and connectivities by examination of different levels of the NMR to MS correlation and anticorrelation matrixes. The SHY approach is of general applicability to complex mixture analysis, if two or more independent spectroscopic data sets are available for any sample cohort. Biological applications of SHY as demonstrated here show promise as a new systems biology tool for biomarker recovery. PMID:16408915
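
    The cross-correlation step at the core of this abstract can be illustrated with a short sketch. It assumes Pearson correlation as the measure of covariation, and the function names and toy intensities are hypothetical; the published SHY workflow may differ in preprocessing and thresholds.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def cross_correlation_matrix(nmr, ms):
    """Correlate every NMR variable with every MS variable across samples.

    nmr: one row per sample, one column per chemical-shift bin.
    ms:  one row per sample (same order), one column per m/z feature.
    Returns a matrix with one row per NMR bin and one column per MS feature.
    """
    nmr_columns = list(zip(*nmr))
    ms_columns = list(zip(*ms))
    return [[pearson(c, d) for d in ms_columns] for c in nmr_columns]

# Hypothetical intensities for four samples, two NMR bins and two m/z features.
toy_nmr = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 1.0]]
toy_ms = [[10.0, 7.0], [20.0, 7.5], [30.0, 6.0], [40.0, 9.0]]
corr = cross_correlation_matrix(toy_nmr, toy_ms)
for row in corr:
    print(["%.2f" % r for r in row])
```

    In this toy matrix, strongly positive entries would link signals that rise and fall together across samples, and negative entries correspond to the anticorrelations mentioned in the abstract.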

  11. Statistical epistasis between candidate gene alleles for complex tuber traits in an association mapping population of tetraploid potato.

    PubMed

    Li, Li; Paulo, Maria-João; van Eeuwijk, Fred; Gebhardt, Christiane

    2010-11-01

    Association mapping using DNA-based markers is a novel tool in plant genetics for the analysis of complex traits. Potato tuber yield, starch content, starch yield and chip color are complex traits of agronomic relevance, for which carbohydrate metabolism plays an important role. At the functional level, the genes and biochemical pathways involved in carbohydrate metabolism are among the best studied in plants. Quantitative traits such as tuber starch and sugar content are therefore models for association genetics in potato based on candidate genes. In an association mapping experiment conducted with a population of 243 tetraploid potato varieties and breeding clones, we previously identified associations between individual candidate gene alleles and tuber starch content, starch yield and chip quality. In the present paper, we tested 190 DNA markers at 36 loci scored in the same association mapping population for pairwise statistical epistatic interactions. Fifty marker pairs were associated mainly with tuber starch content and/or starch yield, at a cut-off value of q ≤ 0.20 for the experiment-wide false discovery rate (FDR). Thirteen marker pairs had an FDR of q ≤ 0.10. Alleles at loci encoding ribulose-bisphosphate carboxylase/oxygenase activase (Rca), sucrose phosphate synthase (Sps) and vacuolar invertase (Pain1) were most frequently involved in statistical epistatic interactions. The largest effect on tuber starch content and starch yield was observed for the paired alleles Pain1-8c and Rca-1a, explaining 9 and 10% of the total variance, respectively. The combination of these two alleles increased the means of tuber starch content and starch yield. Biological models to explain the observed statistical epistatic interactions are discussed.
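
    The false discovery rate cut-offs quoted above (q ≤ 0.20 and q ≤ 0.10) can be illustrated with a small sketch. The abstract does not say how q-values were computed, so the Benjamini-Hochberg adjustment used here is an assumption, and the function name `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for five marker pairs.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_q = bh_adjust(toy_p)
n_at_020 = sum(q <= 0.20 for q in toy_q)
n_at_010 = sum(q <= 0.10 for q in toy_q)
print(toy_q, n_at_020, n_at_010)
```

    A looser cut-off admits at least as many marker pairs as a stricter one, which matches the pattern reported in the abstract (fifty pairs at q ≤ 0.20, thirteen at q ≤ 0.10).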

  12. Transcriptional Analysis of a Unique Set of Genes Involved in Schistosoma mansoni Female Reproductive Biology

    PubMed Central

    Cogswell, Alexis A.; Kommer, Valerie P.; Williams, David L.

    2012-01-01

    Schistosomiasis affects more than 200 million people globally. The pathology of schistosome infections is due to chronic tissue inflammation and damage from immune generated granulomas surrounding parasite eggs trapped in host tissues. Schistosoma species are unique among trematode parasites because they are dioecious; females require pairing with male parasites in order to attain reproductive maturity and produce viable eggs. Ex vivo cultured females lose the ability to produce viable eggs due to an involution of the vitellarium and loss of mature oocytes. In order to better understand schistosome reproductive biology we used data generated by serial analysis of gene expression (SAGE) to identify uncharacterized genes which have different transcript abundance in mature females, those that have been paired with males, and immature females obtained from unisexual infections. To characterize these genes we used bioinformatics, transcript localization, and transcriptional analysis during the regression of in vitro cultured females. Genes transcribed exclusively in mature females localize primarily in the vitellocytes and/or the ovary. Genes transcribed exclusively in females from single sex infections localize to vitellocytes and subtegumental cells. As female reproductive tissues regress, eggshell precursor proteins and genes involved in eggshell synthesis largely have decreased transcript abundance. However, some genes with elevated transcript abundance in mature adults have increased gene expression following regression indicating that the genes in this study function both in eggshell biology as well as vitellogenesis and maintenance of female reproductive tissues. In addition, we found that genes enriched in females from single sex infections have increased expression during regression in ex vivo females. By using these transcriptional analyses we can direct research to examine the areas of female biology that are both relevant to understanding the overall process

  13. Genetic acquisition of NDM gene offers sustainability among clinical isolates of Pseudomonas aeruginosa in clinical settings.

    PubMed

    Mishra, Shweta; Upadhyay, Supriya; Sen, Malay Ranjan; Maurya, Anand Prakash; Choudhury, Debarati; Bhattacharjee, Amitabha

    2015-01-01

    New Delhi metallo β-lactamases are among the most significant emerging resistance determinants towards carbapenem drugs. Their persistence and adaptability often depend on their genetic environment and linkage. This study reports a unique and novel arrangement of the blaNDM-1 gene within clinical isolates of Pseudomonas aeruginosa from a tertiary referral hospital in north India. Three NDM-positive, clonally unrelated clinical isolates of P. aeruginosa were recovered from hospital patients. Association of an integron with blaNDM-1 and the presence of gene cassettes were assessed by PCR. Genetic linkage of the NDM gene with ISAba125 was determined, and in negative cases linkage in the upstream region was mapped by inverse PCR. In only one isolate was the NDM gene linked with ISAba125 for mobility; the other two revealed a new genetic arrangement in which the gene was inserted within the DNA-directed RNA polymerase gene of the host genome, as detected by inverse PCR followed by sequencing analysis. The significance of this novel linkage was further analyzed: a promoter site was detected with Softberry BPROM software, and its activity was assessed by cloning followed by semi-quantitative RT-PCR, which indicated a higher expression level of the NDM gene. The study concluded that the unique genetic arrangement of the NDM gene with DNA-dependent RNA polymerase favours adaptability to the host in the hospital environment against huge antibiotic pressure. PMID:25635921

  15. A Complete Set of Flagellar Genes Acquired by Horizontal Transfer Coexists with the Endogenous Flagellar System in Rhodobacter sphaeroides

    PubMed Central

    Poggio, Sebastian; Abreu-Goodger, Cei; Fabela, Salvador; Osorio, Aurora; Dreyfus, Georges; Vinuesa, Pablo; Camarena, Laura

    2007-01-01

    Bacteria swim in liquid environments by means of a complex rotating structure known as the flagellum. Approximately 40 proteins are required for the assembly and functionality of this structure. Rhodobacter sphaeroides has two flagellar systems. One of these systems has been shown to be functional and is required for the synthesis of the well-characterized single subpolar flagellum, while the other was found only after the genome sequence of this bacterium was completed. In this work we found that the second flagellar system of R. sphaeroides can be expressed and produces a functional flagellum. In many bacteria with two flagellar systems, one is required for swimming, while the other allows movement in denser environments by producing a large number of flagella over the entire cell surface. In contrast, the second flagellar system of R. sphaeroides produces polar flagella that are required for swimming. Expression of the second set of flagellar genes seems to be positively regulated under anaerobic growth conditions. Phylogenic analysis suggests that the flagellar system that was initially characterized was in fact acquired by horizontal transfer from a γ-proteobacterium, while the second flagellar system contains the native genes. Interestingly, other α-proteobacteria closely related to R. sphaeroides have also acquired a set of flagellar genes similar to the set found in R. sphaeroides, suggesting that a common ancestor received this gene cluster. PMID:17293429

  16. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    PubMed

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.

  17. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2012-01-01

    The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the statistical…

  18. Assessing the Association of Mitochondrial Genetic Variation With Primary Open-Angle Glaucoma Using Gene-Set Analyses

    PubMed Central

    Khawaja, Anthony P.; Cooke Bailey, Jessica N.; Kang, Jae Hee; Allingham, R. Rand; Hauser, Michael A.; Brilliant, Murray; Budenz, Donald L.; Christen, William G.; Fingert, John; Gaasterland, Douglas; Gaasterland, Terry; Kraft, Peter; Lee, Richard K.; Lichter, Paul R.; Liu, Yutao; Medeiros, Felipe; Moroi, Syoko E.; Richards, Julia E.; Realini, Tony; Ritch, Robert; Schuman, Joel S.; Scott, William K.; Singh, Kuldev; Sit, Arthur J.; Vollrath, Douglas; Wollstein, Gadi; Zack, Donald J.; Zhang, Kang; Pericak-Vance, Margaret; Weinreb, Robert N.; Haines, Jonathan L.; Pasquale, Louis R.; Wiggs, Janey L.

    2016-01-01

    Purpose Recent studies indicate that mitochondrial proteins may contribute to the pathogenesis of primary open-angle glaucoma (POAG). In this study, we examined the association between POAG and common variations in gene-encoding mitochondrial proteins. Methods We examined genetic data from 3430 POAG cases and 3108 controls derived from the combination of the GLAUGEN and NEIGHBOR studies. We constructed biological-system coherent mitochondrial nuclear-encoded protein gene-sets by intersecting the MitoCarta database with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We examined the mitochondrial gene-sets for association with POAG and with normal-tension glaucoma (NTG) and high-tension glaucoma (HTG) subsets using Pathway Analysis by Randomization Incorporating Structure. Results We identified 22 KEGG pathways with significant mitochondrial protein-encoding gene enrichment, belonging to six general biological classes. Among the pathway classes, mitochondrial lipid metabolism was associated with POAG overall (P = 0.013) and with NTG (P = 0.0006), and mitochondrial carbohydrate metabolism was associated with NTG (P = 0.030). Examining the individual KEGG pathway mitochondrial gene-sets, fatty acid elongation and synthesis and degradation of ketone bodies, both lipid metabolism pathways, were significantly associated with POAG (P = 0.005 and P = 0.002, respectively) and NTG (P = 0.0004 and P < 0.0001, respectively). Butanoate metabolism, a carbohydrate metabolism pathway, was significantly associated with POAG (P = 0.004), NTG (P = 0.001), and HTG (P = 0.010). Conclusions We present an effective approach for assessing the contributions of mitochondrial genetic variation to open-angle glaucoma. Our findings support a role for mitochondria in POAG pathogenesis and specifically point to lipid and carbohydrate metabolism pathways as being important. PMID:27661856

  19. A set of structural features defines the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster.

    PubMed

    López, Yosvany; Vandenbon, Alexis; Nakai, Kenta

    2014-01-01

    Unraveling the biological information within the regulatory region (RR) of genes has become one of the major focuses of current genomic research. It has been hypothesized that RRs of co-expressed genes share similar architecture, but to the best of our knowledge, no studies have simultaneously examined multiple structural features, such as positioning of cis-regulatory elements relative to transcription start sites and to each other, and the order and orientation of regulatory motifs, to accurately describe overall cis-regulatory structure. In our work we present an improved computational method that builds a feature collection based on all of these structural features. We demonstrate the utility of this approach by modeling the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster. Six potential antenna-related motifs were predicted initially, including three that appeared to be novel. A feature set was created with the predicted motifs, where a correlation-based filter was used to remove irrelevant features, and a genetic algorithm was designed to optimize the feature set. Finally, a set of eight highly informative structural features was obtained for the RRs of antenna-expressed genes, achieving an area under the curve of 0.841. We used these features to score all D. melanogaster RRs for potentially unknown antenna-expressed genes sharing a similar regulatory structure. Validation of our predictions with an independent RNA sequencing dataset showed that 76.7% of genes with high scoring RRs were expressed in antenna. In addition, we found that the structural features we identified are highly conserved in RRs of orthologs in other Drosophila sibling species. This approach to identify tissue-specific regulatory structures showed comparable performance to previous approaches, but also uncovered additional interesting features because it also considered the order and orientation of motifs.

  20. Gene set enrichment and topological analyses based on interaction networks in pediatric acute lymphoblastic leukemia

    PubMed Central

    SUI, SHUXIANG; WANG, XIN; ZHENG, HUA; GUO, HUA; CHEN, TONG; JI, DONG-MEI

    2015-01-01

    Pediatric acute lymphoblastic leukemia (ALL) accounts for over one-quarter of all pediatric cancers. Interacting genes and proteins within the larger human gene interaction network of the human genome are rarely investigated by studies investigating pediatric ALL. In the present study, interaction networks were constructed using the empirical Bayesian approach and the Search Tool for the Retrieval of Interacting Genes/proteins database, based on the differentially-expressed (DE) genes in pediatric ALL, which were identified using the RankProd package. Enrichment analysis of the interaction network was performed using the network-based methods EnrichNet and PathExpand, which were compared with the traditional Expression Analysis Systematic Explorer (EASE) method. In total, 398 DE genes were identified in pediatric ALL, and LIF was the most significantly DE gene. The co-expression network consisted of 272 nodes, which indicated genes and proteins, and 602 edges, which indicated the number of interactions adjacent to the node. Comparison between EASE and PathExpand revealed that PathExpand detected more pathways or processes that were closely associated with pediatric ALL compared with the EASE method. There were 294 nodes and 1,588 edges in the protein-protein interaction network, with the processes of hematopoietic cell lineage and porphyrin metabolism demonstrating a close association with pediatric ALL. Network enrichment analysis based on the PathExpand algorithm was revealed to be more powerful for the analysis of interaction networks in pediatric ALL compared with the EASE method. LIF and MLLT11 were identified as the most significantly DE genes in pediatric ALL. The process of hematopoietic cell lineage was the pathway most significantly associated with pediatric ALL. PMID:26788135

  1. Comparative analysis of two 16S rRNA gene-based PCR primer sets provides insight into the diversity distribution patterns of anammox bacteria in different environments.

    PubMed

    Wang, Shuailong; Hong, Yiguo; Wu, Jiapeng; Xu, Xiang-Rong; Bin, Liying; Pan, Yueping; Guan, Fengjie; Wen, Jiali

    2015-10-01

    Due to the high divergence among 16S rRNA genes of anammox bacteria, different diversity patterns of the community could result from using different primer sets. In this study, the efficiencies and specificities of two commonly used sets, Amx368F/Amx820R and Brod541F/Amx820R, were evaluated by exploring the diversity characteristics of anammox bacteria in sediments from marine, estuary, and freshwater wetland environments. Statistical analysis indicated that the base mispairing rate between bases on 16S rRNA gene sequences retrieved by Amx368F/Amx820R and their corresponding ones on primer Brod541F was quite high, suggesting different efficiency and specificity of Amx368F/Amx820R and Brod541F/Amx820R. Further experimental results demonstrated that multiple genera of anammox bacteria, including Ca. Scalindua, Ca. Brocadia, and Ca. Kuenenia, could be detected by Amx368F/Amx820R, but only Ca. Scalindua could be retrieved by Brod541F/Amx820R. Moreover, the phylogenetic clusters of Ca. Scalindua retrieved by Amx368F/Amx820R were completely different from those retrieved by Brod541F/Amx820R, presenting a significant complementary effect. By joint application of these two primer sets, the diversity distribution patterns of anammox bacteria in different environments were analyzed. Almost all retrieved sequences from marine sediments belonged to Ca. Scalindua. Sequences from freshwater wetland were affiliated with Ca. Brocadia and two new clusters, while high diversity of anammox bacteria was found in the estuary, including Ca. Scalindua, Ca. Brocadia, and Ca. Kuenenia, corresponding to the environmental features of the river-sea intersection. In total, these two primer sets have different characteristics for detecting anammox bacteria in environmental samples, and their combined application could achieve a better display of the diversity of the anammox community.

  2. A statistical model and national data set for partitioning fish-tissue mercury concentration variation between spatiotemporal and sample characteristic effects

    USGS Publications Warehouse

    Wente, Stephen P.

    2004-01-01

    Many Federal, Tribal, State, and local agencies monitor mercury in fish-tissue samples to identify sites with elevated fish-tissue mercury (fish-mercury) concentrations, track changes in fish-mercury concentrations over time, and produce fish-consumption advisories. Interpretation of such monitoring data commonly is impeded by difficulties in separating the effects of sample characteristics (species, tissues sampled, and sizes of fish) from the effects of spatial and temporal trends on fish-mercury concentrations. Without such a separation, variation in fish-mercury concentrations due to differences in the characteristics of samples collected over time or across space can be misattributed to temporal or spatial trends; and/or actual trends in fish-mercury concentration can be misattributed to differences in sample characteristics. This report describes a statistical model and national data set (31,813 samples) for calibrating the aforementioned statistical model that can separate spatiotemporal and sample characteristic effects in fish-mercury concentration data. This model could be useful for evaluating spatial and temporal trends in fish-mercury concentrations and developing fish-consumption advisories. The observed fish-mercury concentration data and model predictions can be accessed, displayed geospatially, and downloaded via the World Wide Web (http://emmma.usgs.gov). This report and the associated web site may assist in the interpretation of large amounts of data from widespread fish-mercury monitoring efforts.

  3. A Statistical Approach Reveals Designs for the Most Robust Stochastic Gene Oscillators

    PubMed Central

    2016-01-01

    The engineering of transcriptional networks presents many challenges due to the inherent uncertainty in the system structure, changing cellular context, and stochasticity in the governing dynamics. One approach to address these problems is to design and build systems that can function across a range of conditions; that is they are robust to uncertainty in their constituent components. Here we examine the parametric robustness landscape of transcriptional oscillators, which underlie many important processes such as circadian rhythms and the cell cycle, plus also serve as a model for the engineering of complex and emergent phenomena. The central questions that we address are: Can we build genetic oscillators that are more robust than those already constructed? Can we make genetic oscillators arbitrarily robust? These questions are technically challenging due to the large model and parameter spaces that must be efficiently explored. Here we use a measure of robustness that coincides with the Bayesian model evidence, combined with an efficient Monte Carlo method to traverse model space and concentrate on regions of high robustness, which enables the accurate evaluation of the relative robustness of gene network models governed by stochastic dynamics. We report the most robust two and three gene oscillator systems, plus examine how the number of interactions, the presence of autoregulation, and degradation of mRNA and protein affects the frequency, amplitude, and robustness of transcriptional oscillators. We also find that there is a limit to parametric robustness, beyond which there is nothing to be gained by adding additional feedback. Importantly, we provide predictions on new oscillator systems that can be constructed to verify the theory and advance design and modeling approaches to systems and synthetic biology. PMID:26835539

  4. HoxBlinc RNA recruits Set1/MLL complexes to activate Hox gene expression patterns and mesoderm lineage development

    PubMed Central

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Nao; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2015-01-01

    Summary: Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1+ mesoderm and then promotes hematopoietic differentiation through regulating hoxb gene pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated KD or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb gene expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1+ precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1+ precursors and differentiation of Flk1+ cells into hematopoietic lineages. PMID:26725110

  5. Differences in root functions during long-term drought adaptation: comparison of active gene sets of two wheat genotypes.

    PubMed

    Sečenji, M; Lendvai, Á; Miskolczi, P; Kocsy, G; Gallé, Á; Szucs, A; Hoffmann, B; Sárvári, É; Schweizer, P; Stein, N; Dudits, D; Györgyey, J

    2010-11-01

    In an attempt to shed light on the role of root systems in differential responses of wheat genotypes to long-term water limitation, transcriptional differences between two wheat genotypes (Triticum aestivum L., cv. Plainsman V and landrace Kobomugi) were identified during adaptation to moderate water stress at the tillering stage. Differences in organ sizes, water-use efficiency and seed production were detected in plants grown in soil, and root functions were characterised by expression profiling. The molecular genetic background of the behaviour of the two genotypes during this stress was revealed using a cDNA macroarray for transcript profiling of the roots. During a 4-week period of moderate water deficit, a set of up-regulated genes displaying transiently increased expression was identified in young plantlets, mostly in the second week in the roots of Kobomugi, while transcript levels remained constantly high in roots of Plainsman V. These genes encode proteins with various functions, such as transport, protein metabolism, osmoprotectant biosynthesis, cell wall biogenesis and detoxification, and also regulatory proteins. Oxidoreductases, peroxidases and cell wall-related genes were induced significantly only in Plainsman V, while induction of stress- and defence-related genes was more pronounced in Kobomugi. Real-time qPCR analysis of selected members of the glutathione S-transferase gene family revealed differences in regulation of family members in the two genotypes and confirmed the macroarray results. The TaGSTZ gene was stress-activated only in the roots of Kobomugi.

  6. Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia

    PubMed Central

    Ellis, S E; Panitch, R; West, A B; Arking, D E

    2016-01-01

    Autism (AUT), schizophrenia (SCZ) and bipolar disorder (BPD) are three highly heritable neuropsychiatric conditions. Clinical similarities and genetic overlap between the three disorders have been reported; however, the causes and the downstream effects of this overlap remain elusive. By analyzing transcriptomic RNA-sequencing data generated from post-mortem cortical brain tissues from AUT, SCZ, BPD and control subjects, we have begun to characterize the extent of gene expression overlap between these disorders. We report that the AUT and SCZ transcriptomes are significantly correlated (P<0.001), whereas the other two cross-disorder comparisons (AUT–BPD and SCZ–BPD) are not. Among AUT and SCZ, we find that the genes differentially expressed across disorders are involved in neurotransmission and synapse regulation. Despite the lack of global transcriptomic overlap across all three disorders, we highlight two genes, IQSEC3 and COPS7A, which are significantly downregulated compared with controls across all three disorders, suggesting either shared etiology or compensatory changes across these neuropsychiatric conditions. Finally, we tested for enrichment of genes differentially expressed across disorders in genetic association signals in AUT, SCZ or BPD, reporting lack of signal in any of the previously published genome-wide association studies (GWAS). Together, these studies highlight the importance of examining gene expression from the primary tissue involved in neuropsychiatric conditions—the cortical brain. We identify a shared role for altered neurotransmission and synapse regulation in AUT and SCZ, in addition to two genes that may more generally contribute to neurodevelopmental and neuropsychiatric conditions. PMID:27219343

  7. 16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies.

    PubMed

    Perisin, Matthew; Vetter, Madlen; Gilbert, Jack A; Bergelson, Joy

    2016-04-01

    The 16S rRNA gene (16S) is an accepted marker of bacterial taxonomic diversity, even though differences in copy number obscure the relationship between amplicon and organismal abundances. Ancestral state reconstruction methods can predict 16S copy numbers through comparisons with closely related reference genomes; however, the database of closed genomes is limited. Here, we extend the reference database of 16S copy numbers to de novo assembled draft genomes by developing 16Stimator, a method to estimate 16S copy numbers when these repetitive regions collapse during assembly. Using a read depth approach, we estimate 16S copy numbers for 12 endophytic isolates from Arabidopsis thaliana and confirm estimates by qPCR. We further apply this approach to draft genomes deposited in NCBI and demonstrate accurate copy number estimation regardless of sequencing platform, with an overall median deviation of 14%. The expanded database of isolates with 16S copy number estimates increases the power of phylogenetic correction methods for determining organismal abundances from 16S amplicon surveys. PMID:26359911
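
    The read-depth idea in this abstract can be illustrated with a minimal sketch. It assumes that copy number is approximated by the ratio of the mean read depth over the collapsed 16S region to the median depth over single-copy regions; the function name, inputs, and depth values are hypothetical, and this is not the 16Stimator implementation.

```python
from statistics import mean, median


def estimate_copy_number(region_depths, background_depths):
    """Estimate copy number as a depth ratio (an illustrative assumption)."""
    if not region_depths or not background_depths:
        raise ValueError("depth lists must be non-empty")
    # Median is used for the baseline so a few outlier positions matter less.
    baseline = median(background_depths)
    if baseline <= 0:
        raise ValueError("background depth must be positive")
    return mean(region_depths) / baseline


# Hypothetical per-base read depths, for illustration only.
rrna_depths = [200, 210, 190, 200]      # collapsed 16S region
genome_depths = [48, 50, 52, 50, 51]    # presumed single-copy regions
estimate = estimate_copy_number(rrna_depths, genome_depths)
print(round(estimate))
```

    In practice the estimate would be rounded to an integer and checked against an independent measurement, as the authors do with qPCR.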

  8. Gene Set-Based Integrative Analysis Revealing Two Distinct Functional Regulation Patterns in Four Common Subtypes of Epithelial Ovarian Cancer.

    PubMed

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Yi-Ping; Chuang, Jen-Hua; Yang, Ming-Jie; Yen, Ming-Shyen; Chiou, Shih-Hwa; Chang, Cheng-Chang

    2016-08-05

    Clear cell (CCC), endometrioid (EC), mucinous (MC) and high-grade serous carcinoma (SC) are the four most common subtypes of epithelial ovarian carcinoma (EOC). The widely accepted dualistic model of ovarian carcinogenesis divided EOCs into type I and II categories based on the molecular features. However, this hypothesis has not been experimentally demonstrated. We carried out a gene set-based analysis by integrating the microarray gene expression profiles downloaded from the publicly available databases. These quantified biological functions of EOCs were defined by 1454 Gene Ontology (GO) term and 674 Reactome pathway gene sets. The pathogenesis of the four EOC subtypes was investigated by hierarchical clustering and exploratory factor analysis. The patterns of functional regulation among the four subtypes containing 1316 cases could be accurately classified by machine learning. The results revealed that the ERBB and PI3K-related pathways played important roles in the carcinogenesis of CCC, EC and MC; while deregulation of cell cycle was more predominant in SC. The study revealed that two different functional regulation patterns exist among the four EOC subtypes, which were compatible with the type I and II classifications proposed by the dualistic model of ovarian carcinogenesis.

  9. Using RNAi in C. elegans to Demonstrate Gene Knockdown Phenotypes in the Undergraduate Biology Lab Setting

    ERIC Educational Resources Information Center

    Roy, Nicole M.

    2013-01-01

    RNA interference (RNAi) is a powerful technology used to knock down genes in basic research and medicine. In 2006 RNAi technology using Caenorhabditis elegans (C. elegans) was awarded the Nobel Prize in medicine and thus students graduating in the biological sciences should have experience with this technology. However,…

  10. The imprinted brain: how genes set the balance between autism and psychosis.

    PubMed

    Badcock, Christopher

    2011-06-01

    The imprinted brain theory proposes that autism spectrum disorder (ASD) represents a paternal bias in the expression of imprinted genes. This is reflected in a preference for mechanistic cognition and in the corresponding mentalistic deficits symptomatic of ASD. Psychotic spectrum disorder (PSD) would correspondingly result from an imbalance in favor of maternal and/or X-chromosome gene expression. If differences in gene expression were reflected locally in the human brain as mouse models and other evidence suggests they are, ASD would represent not so much an 'extreme male brain' as an extreme paternal one, with PSD correspondingly representing an extreme maternal brain. To the extent that copy number variation resembles imprinting and aneuploidy in nullifying or multiplying the expression of particular genes, it has been found to conform to the diametric model of mental illness peculiar to the imprinted brain theory. The fact that nongenetic factors such as nutrition in pregnancy can mimic and/or interact with imprinted gene expression suggests that the theory might even be able to explain the notable effect of maternal starvation on the risk of PSD - not to mention the 'autism epidemic' of modern affluent societies. Finally, the theory suggests that normality represents balanced cognition, and that genius is an extraordinary extension of cognitive configuration in both mentalistic and mechanistic directions. Were it to be proven correct, the imprinted brain theory would represent one of the biggest single advances in our understanding of the mind and of mental illness that has ever taken place, and would revolutionize psychiatric diagnosis, prevention and treatment - not to mention our understanding of epigenomics.

  11. Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data

    PubMed Central

    Sharov, Alexei A; Masui, Shinji; Sharova, Lioudmila V; Piao, Yulan; Aiba, Kazuhiro; Matoba, Ryo; Xin, Li; Niwa, Hitoshi; Ko, Minoru SH

    2008-01-01

    Background: Target genes of the transcription factor (TF) Pou5f1 (Oct3/4 or Oct4), which is essential for pluripotency maintenance and self-renewal of embryonic stem (ES) cells, have previously been identified based on their response to Pou5f1 manipulation and occurrence of chromatin immunoprecipitation (ChIP) binding sites in promoters. However, many responding genes with binding sites may not be direct targets, because the response may be mediated by other genes and a ChIP binding site may not be functional in terms of transcription regulation. Results: To reduce the number of false positives, we propose to separate responding genes into groups according to direction, magnitude, and time of response, and to apply the false discovery rate (FDR) criterion to each group individually. Applying this novel algorithm with stringent statistical criteria (FDR < 0.2) to a compendium of published and new microarray data (3, 6, 12, and 24 hr after Pou5f1 suppression) and published ChIP data, we identified 420 tentative target genes (TTGs) for Pou5f1. The majority of TTGs (372) were down-regulated after Pou5f1 suppression, indicating that Pou5f1 functions as an activator of gene expression when it binds to promoters. Interestingly, many activated genes are potent suppressors of transcription, which include polycomb genes, zinc finger TFs, chromatin remodeling factors, and suppressors of signaling. Similar analysis showed that Sox2 and Nanog also function mostly as transcription activators in cooperation with Pou5f1. Conclusion: We have identified the most reliable sets of direct target genes for key pluripotency genes – Pou5f1, Sox2, and Nanog, and found that they predominantly function as activators of downstream gene expression. Thus, most genes related to cell differentiation are suppressed indirectly. PMID:18522731
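
    The idea of applying an FDR criterion to each response group separately can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes the Benjamini-Hochberg procedure as the FDR method (the abstract does not name one), and the gene names, group labels, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def select_by_group(genes, threshold=0.2):
    """Apply the FDR criterion within each group of (name, group, pvalue)."""
    groups = {}
    for name, group, pvalue in genes:
        groups.setdefault(group, []).append((name, pvalue))
    selected = {}
    for group, members in groups.items():
        qvalues = benjamini_hochberg([p for _, p in members])
        selected[group] = [
            name for (name, _), q in zip(members, qvalues) if q < threshold
        ]
    return selected


# Hypothetical genes grouped by direction and time of response.
genes = [
    ("g1", "down_early", 0.001),
    ("g2", "down_early", 0.04),
    ("g3", "down_early", 0.6),
    ("g4", "up_late", 0.3),
    ("g5", "up_late", 0.5),
]
selected = select_by_group(genes)
print(selected)
```

    Correcting within groups means that a group rich in true responders is not penalized by the multiplicity of a group that contains mostly noise.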

  12. Conjugative transposons: an unusual and diverse set of integrated gene transfer elements.

    PubMed Central

    Salyers, A A; Shoemaker, N B; Stevens, A M; Li, L Y

    1995-01-01

    Conjugative transposons are integrated DNA elements that excise themselves to form a covalently closed circular intermediate. This circular intermediate can either reintegrate in the same cell (intracellular transposition) or transfer by conjugation to a recipient and integrate into the recipient's genome (intercellular transposition). Conjugative transposons were first found in gram-positive cocci but are now known to be present in a variety of gram-positive and gram-negative bacteria also. Conjugative transposons have a surprisingly broad host range, and they probably contribute as much as plasmids to the spread of antibiotic resistance genes in some genera of disease-causing bacteria. Resistance genes need not be carried on the conjugative transposon to be transferred. Many conjugative transposons can mobilize coresident plasmids, and the Bacteroides conjugative transposons can even excise and mobilize unlinked integrated elements. The Bacteroides conjugative transposons are also unusual in that their transfer activities are regulated by tetracycline via a complex regulatory network. PMID:8531886

  13. GAMYB controls different sets of genes and is differentially regulated by microRNA in aleurone cells and anthers.

    PubMed

    Tsuji, Hiroyuki; Aya, Koichiro; Ueguchi-Tanaka, Miyako; Shimada, Yukihisa; Nakazono, Mikio; Watanabe, Ryosuke; Nishizawa, Naoko K; Gomi, Kenji; Shimada, Asako; Kitano, Hidemi; Ashikari, Motoyuki; Matsuoka, Makoto

    2006-08-01

    GAMYB is a component of gibberellin (GA) signaling in cereal aleurone cells, and has an important role in flower development. However, it is unclear how GAMYB function is regulated. We examined the involvement of a microRNA, miR159, in the regulation of GAMYB expression in cereal aleurone cells and flower development. In aleurone cells, no miR159 expression was observed with or without GA treatment, suggesting that miR159 is not involved in the regulation of GAMYB and GAMYB-like genes in this tissue. miR159 was expressed in tissues other than aleurone, and miR159 over-expressors showed similar but more severe phenotypes than the gamyb mutant. GAMYB and GAMYB-like genes are co-expressed with miR159 in anthers, and the mRNA levels for GAMYB and GAMYB-like genes are negatively correlated with miR159 levels during anther development. Thus, OsGAMYB and OsGAMYB-like genes are regulated by miR159 in flowers. A microarray analysis revealed that OsGAMYB and its upstream regulator SLR1 are involved in the regulation of almost all GA-mediated gene expression in rice aleurone cells. Moreover, different sets of genes are regulated by GAMYB in aleurone cells and anthers. GAMYB binds directly to promoter regions of its target genes in anthers as well as aleurone cells. Based on these observations, we suggest that the regulation of GAMYB expression and GAMYB function are different in aleurone cells and flowers in rice.

  14. The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications.

    PubMed

    Stucken, Karina; John, Uwe; Cembella, Allan; Murillo, Alejandro A; Soto-Liebe, Katia; Fuentes-Valdés, Juan J; Friedel, Maik; Plominsky, Alvaro M; Vásquez, Mónica; Glöckner, Gernot

    2010-02-16

    Cyanobacterial morphology is diverse, ranging from unicellular spheres or rods to multicellular structures such as colonies and filaments. Multicellular species represent an evolutionary strategy to differentiate and compartmentalize certain metabolic functions for reproduction and nitrogen (N(2)) fixation into specialized cell types (e.g. akinetes, heterocysts and diazocytes). Only a few filamentous, differentiated cyanobacterial species, with genome sizes over 5 Mb, have been sequenced. We sequenced the genomes of two strains of closely related filamentous cyanobacterial species to yield further insights into the molecular basis of the traits of N(2) fixation, filament formation and cell differentiation. Cylindrospermopsis raciborskii CS-505 is a cylindrospermopsin-producing strain from Australia, whereas Raphidiopsis brookii D9 from Brazil synthesizes neurotoxins associated with paralytic shellfish poisoning (PSP). Despite their different morphology, toxin composition and disjunct geographical distribution, these strains form a monophyletic group. With genome sizes of approximately 3.9 (CS-505) and 3.2 (D9) Mb, these are the smallest genomes described for free-living filamentous cyanobacteria. We observed remarkable gene order conservation (synteny) between these genomes despite the difference in repetitive element content, which accounts for most of the genome size difference between them. We show here that the strains share a specific set of 2539 genes with >90% average nucleotide identity. The fact that the CS-505 and D9 genomes are small and streamlined compared to those of other filamentous cyanobacterial species and the lack of the ability for heterocyst formation in strain D9 allowed us to define a core set of genes responsible for each trait in filamentous species. We presume that in strain D9 the ability to form proper heterocysts was secondarily lost together with N(2) fixation capacity. Further comparisons to all available cyanobacterial genomes

  15. Genetic diversity of the conserved motifs of six bacterial leaf blight resistance genes in a set of rice landraces

    PubMed Central

    2014-01-01

    Background: Bacterial leaf blight (BLB) caused by the vascular pathogen Xanthomonas oryzae pv. oryzae (Xoo) is one of the most serious diseases leading to crop failure in rice growing countries. A total of 37 resistance genes against Xoo have been identified in rice. Of these, ten BLB resistance genes have been mapped on rice chromosomes, while six have been cloned, sequenced and characterized. Diversity analysis at the resistance gene level of this disease is scanty, and the landraces from West Bengal and North Eastern states of India have received little attention so far. The objective of this study was to assess the genetic diversity at conserved domains of six BLB resistance genes in a set of 22 rice accessions including landraces and check genotypes collected from the states of Assam, Nagaland, Mizoram and West Bengal. Results: In this study 34 pairs of primers were designed from conserved domains of six BLB resistance genes: Xa1, xa5, Xa21, Xa21(A1), Xa26 and Xa27. The designed primer pairs were used to generate PCR based polymorphic DNA profiles to detect and elucidate the genetic diversity of the six genes in the 22 diverse rice accessions of known disease phenotype. A total of 140 alleles were identified including 41 rare and 26 null alleles. The average polymorphism information content (PIC) value was 0.56/primer pair. The DNA profiles identified each of the rice landraces unequivocally. The amplified polymorphic DNA bands were used to calculate genetic similarity of the rice landraces in all possible pair combinations. The similarity among the rice accessions ranged from 18% to 89% and the dendrogram produced from the similarity values was divided into 2 major clusters. The conserved domains identified within the sequenced rare alleles include Leucine-Rich Repeat, BED-type zinc finger domain, sugar transferase domain and the domain of the carbohydrate esterase 4 superfamily. Conclusions: This study revealed high genetic diversity at conserved domains of six BLB

  16. Gene set enrichment analysis of microarray data from Pimephales promelas (Rafinesque), a non-mammalian model organism

    PubMed Central

    2011-01-01

    Background: Methods for gene-class testing, such as Gene Set Enrichment Analysis (GSEA), incorporate biological knowledge into the analysis and interpretation of microarray data by comparing gene expression patterns to pathways, systems and emergent phenotypes. However, to use GSEA to its full capability with non-mammalian model organisms, a microarray platform must be annotated with human gene symbols. Doing so makes it possible to relate a model organism's gene expression, in response to a given treatment, to potential human health consequences of that treatment. We enhanced the annotation of a microarray platform from a non-mammalian model organism, and then used the GSEA approach in a reanalysis of a study examining the biological significance of acute and chronic methylmercury exposure on liver tissue of fathead minnow (Pimephales promelas). Using GSEA, we tested the hypothesis that fathead livers, in response to methylmercury exposure, would exhibit gene expression patterns similar to diseased human livers. Results: We describe an enhanced annotation of the fathead minnow microarray platform with human gene symbols. This resource is now compatible with the GSEA approach for gene-class testing. We confirmed that GSEA, using this enhanced microarray platform, is able to recover results consistent with a previous analysis of fathead minnow exposure to methylmercury using standard analytical approaches. Using GSEA to compare fathead gene expression profiles to human phenotypes, we also found that fathead methylmercury-treated livers exhibited expression profiles that are homologous to human systems and pathways and result in damage similar to human liver damage associated with hepatocellular carcinoma and hepatitis B. Conclusions: This study describes a powerful resource for enabling the use of non-mammalian model organisms in the study of human health significance. Results of microarray gene expression studies involving fathead minnow, typically

  17. Root Exudates of Various Host Plants of Rhizobium leguminosarum Contain Different Sets of Inducers of Rhizobium Nodulation Genes.

    PubMed

    Zaat, S A; Wijffelman, C A; Mulders, I H; van Brussel, A A; Lugtenberg, B J

    1988-04-01

    Rhizobium promoters involved in the formation of root nodules on leguminous plants are activated by flavonoids in plant root exudate. A series of Rhizobium strains which all contain the inducible Rhizobium leguminosarum nodA promoter fused to the Escherichia coli lacZ gene, and which differ only in the source of the regulatory nodD gene, were recently used to show that the regulatory nodD gene determines which flavonoids are able to activate the nodA promoter (HP Spaink, CA Wijffelman, E Pees, RJH Okker, BJJ Lugtenberg 1987 Nature 328: 337-340). Since these strains therefore are able to discriminate between various flavonoids, they were used to determine whether or not plants that are nodulated by R. leguminosarum produce different inducers. After chromatographic separation of root exudate constituents from Vicia sativa L. subsp. nigra (L.), V. hirsuta (L.) S.F. Gray, Pisum sativum L. cv Rondo, and Trifolium subterraneum L., the fractions were tested with a set of strains containing a nodD gene of R. leguminosarum, R. trifolii, or Rhizobium meliloti, respectively. It appeared that the source of nodD determined whether, and to what extent, the R. leguminosarum nodA promoter was induced. Lack of induction could not be attributed to the presence of inhibitors. Most of the inducers were able to activate the nodA promoter in the presence of one particular nodD gene only. The inducers that were active in the presence of the R. leguminosarum nodD gene were different in each root exudate.

  18. Gene set of chemosensory receptors in the polyembryonic endoparasitoid Macrocentrus cingulum

    PubMed Central

    Ahmed, Tofael; Zhang, Tiantao; Wang, Zhenying; He, Kanglai; Bai, Shuxiong

    2016-01-01

    Insects are extremely successful animals whose odor perception is very prominent due to their sophisticated olfactory system. The antennae, the main chemosensory organs, play a critical role in detecting odors in the ambient environment before initiating appropriate behavioral responses. The antennal chemosensory receptor gene families have been suggested to be involved in the olfactory signal transduction pathway as a sensory neuron response. Macrocentrus cingulum is deployed successfully as a biological control agent for corn pest insects from the Lepidopteran genus Ostrinia. In this research, we assembled antennal transcriptomes of M. cingulum using next-generation sequencing to identify the major chemosensory receptor gene families. In total, 112 olfactory receptor candidates (79 odorant receptors, 20 gustatory receptors, and 13 ionotropic receptors) were identified from the male and female antennal transcriptomes. The sequences of all of these transcripts were confirmed by RT-PCR and direct DNA sequencing. Expression profiles of gustatory receptors in olfactory and non-olfactory tissues were measured by RT-qPCR. The sex-specific and sex-biased chemoreceptor expression patterns suggest that they may have important functions in detecting behaviorally relevant odor molecules. These results provide a comprehensive resource for studying semiochemical-driven behaviors at the molecular level in this polyembryonic endoparasitoid. PMID:27090020

  19. Extended triplet set C343 of DNA sequences and its application to the p53 gene

    NASA Astrophysics Data System (ADS)

    Yan, Yan-Yan; Zhu, Ping

    2011-01-01

    Recently, much research has indicated that more and more cancers pose a threat to human life. Cancers are caused by oncogenes. Many human oncogenes have been found and most of them are located on chromosomes. The discovery of the oncogene plays a significant role in the treatment of cancer. The p53 tumor suppressor gene has received much attention because it is frequently mutated or deleted in tumor cells of most people. Thus, the study of oncogenes is significant. In order to establish the Galois field GF(7), the indefinite gene is introduced as D and oncogenes are introduced as O and P. Taking the polynomial coefficients a0, a1, a2 in GF(7) and the bijective function f: GF(7) → {D,A,C,O,G,T,P}, where f(0) = D, f(1) = A, f(2) = C, f(3) = O, f(4) = G, f(5) = T, and f(6) = P, the bijection phi may be written as phi(a0 + a1x + a2x^2). Based on the algebraic structure, we can not only analyse the DNA sequence of oncogenes, but also predict possible new cancers.
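
    The mapping described in this abstract can be made concrete with a small sketch. The symbol-to-element table is taken from the abstract; everything else is an assumption made for illustration: which triplet position corresponds to a0, a1, and a2, the function names, and the choice of coefficient-wise addition modulo 7 as the example operation. It is not the authors' analysis.

```python
from itertools import product

# Symbol table from the abstract: f(0)=D, f(1)=A, f(2)=C, f(3)=O,
# f(4)=G, f(5)=T, f(6)=P.
ELEMENT_TO_SYMBOL = "DACOGTP"
SYMBOL_TO_ELEMENT = {s: i for i, s in enumerate(ELEMENT_TO_SYMBOL)}


def triplet_to_coeffs(triplet):
    """Map a triplet to (a0, a1, a2); the position order is an assumption."""
    if len(triplet) != 3:
        raise ValueError("a triplet must have exactly three symbols")
    return tuple(SYMBOL_TO_ELEMENT[s] for s in triplet)


def coeffs_to_triplet(coeffs):
    """Map coefficients in GF(7) back to a three-symbol triplet."""
    return "".join(ELEMENT_TO_SYMBOL[a % 7] for a in coeffs)


def add_triplets(t1, t2):
    """Add two triplets as polynomials over GF(7), coefficient-wise mod 7."""
    c1, c2 = triplet_to_coeffs(t1), triplet_to_coeffs(t2)
    return coeffs_to_triplet((a + b) % 7 for a, b in zip(c1, c2))


# Seven symbols in three positions give 7**3 = 343 triplets.
all_triplets = ["".join(t) for t in product(ELEMENT_TO_SYMBOL, repeat=3)]
print(len(all_triplets))
print(add_triplets("ATG", "CGT"))
```

    Because D maps to 0, the triplet DDD acts as the additive identity under this operation.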

  20. Gene set of chemosensory receptors in the polyembryonic endoparasitoid Macrocentrus cingulum.

    PubMed

    Ahmed, Tofael; Zhang, Tiantao; Wang, Zhenying; He, Kanglai; Bai, Shuxiong

    2016-01-01

    Insects are extremely successful animals whose odor perception is very prominent due to their sophisticated olfactory system. The antennae, the main chemosensory organs, play a critical role in detecting odors in the ambient environment before initiating appropriate behavioral responses. The antennal chemosensory receptor gene families have been suggested to be involved in the olfactory signal transduction pathway as a sensory neuron response. Macrocentrus cingulum is deployed successfully as a biological control agent for corn pest insects from the Lepidopteran genus Ostrinia. In this research, we assembled antennal transcriptomes of M. cingulum using next-generation sequencing to identify the major chemosensory receptor gene families. In total, 112 olfactory receptor candidates (79 odorant receptors, 20 gustatory receptors, and 13 ionotropic receptors) were identified from the male and female antennal transcriptomes. The sequences of all of these transcripts were confirmed by RT-PCR and direct DNA sequencing. Expression profiles of gustatory receptors in olfactory and non-olfactory tissues were measured by RT-qPCR. The sex-specific and sex-biased chemoreceptor expression patterns suggest that they may have important functions in detecting behaviorally relevant odor molecules. These results provide a comprehensive resource for studying semiochemical-driven behaviors at the molecular level in this polyembryonic endoparasitoid. PMID:27090020

  1. HoxBlinc RNA Recruits Set1/MLL Complexes to Activate Hox Gene Expression Patterns and Mesoderm Lineage Development.

    PubMed

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Naohiro; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2016-01-01

    Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1(+) mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1(+) precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1(+) precursors and differentiation of Flk1(+) cells into hematopoietic lineages.

  2. The transcriptional response to encystation stimuli in Giardia lamblia is restricted to a small set of genes.

    PubMed

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G; Hehl, Adrian B

    2010-10-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors. PMID:20693303

  3. The Transcriptional Response to Encystation Stimuli in Giardia lamblia Is Restricted to a Small Set of Genes

    PubMed Central

    Morf, Laura; Spycher, Cornelia; Rehrauer, Hubert; Fournier, Catharine Aquino; Morrison, Hilary G.; Hehl, Adrian B.

    2010-01-01

    The protozoan parasite Giardia lamblia undergoes stage differentiation in the small intestine of the host to an environmentally resistant and infectious cyst. Encystation involves the secretion of an extracellular matrix comprised of cyst wall proteins (CWPs) and a β(1-3)-GalNAc homopolymer. Upon the induction of encystation, genes coding for CWPs are switched on, and mRNAs coding for a Myb transcription factor and enzymes involved in cyst wall glycan synthesis are upregulated. Encystation in vitro is triggered by several protocols, which call for changes in bile concentrations or availability of lipids, and elevated pH. However, the conditions for induction are not standardized and we predicted significant protocol-specific side effects. This makes reliable identification of encystation factors difficult. Here, we exploited the possibility of inducing encystation with two different protocols, which we show to be equally effective, for a comparative mRNA profile analysis. The standard encystation protocol induced a bipartite transcriptional response with surprisingly minor involvement of stress genes. A comparative analysis revealed a core set of only 18 encystation genes and showed that a majority of genes was indeed upregulated as a side effect of inducing conditions. We also established a Myb binding sequence as a signature motif in encystation promoters, suggesting coordinated regulation of these factors. PMID:20693303

  4. HoxBlinc RNA Recruits Set1/MLL Complexes to Activate Hox Gene Expression Patterns and Mesoderm Lineage Development.

    PubMed

    Deng, Changwang; Li, Ying; Zhou, Lei; Cho, Joonseok; Patel, Bhavita; Terada, Naohiro; Li, Yangqiu; Bungert, Jörg; Qiu, Yi; Huang, Suming

    2016-01-01

    Trithorax proteins and long-intergenic noncoding RNAs are critical regulators of embryonic stem cell pluripotency; however, how they cooperatively regulate germ layer mesoderm specification remains elusive. We report here that HoxBlinc RNA first specifies Flk1(+) mesoderm and then promotes hematopoietic differentiation through regulation of hoxb pathways. HoxBlinc binds to the hoxb genes, recruits Setd1a/MLL1 complexes, and mediates long-range chromatin interactions to activate transcription of the hoxb genes. Depletion of HoxBlinc by shRNA-mediated knockdown or CRISPR-Cas9-mediated genetic deletion inhibits expression of hoxb genes and other factors regulating cardiac/hematopoietic differentiation. Reduced hoxb expression is accompanied by decreased recruitment of Set1/MLL1 and H3K4me3 modification, as well as by reduced chromatin loop formation. Re-expression of hoxb2-b4 genes in HoxBlinc-depleted embryoid bodies rescues Flk1(+) precursors that undergo hematopoietic differentiation. Thus, HoxBlinc plays an important role in controlling hoxb transcription networks that mediate specification of mesoderm-derived Flk1(+) precursors and differentiation of Flk1(+) cells into hematopoietic lineages. PMID:26725110

  5. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star–galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star–galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to i_{P1} = 24.5, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star–galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end (i_{P1} > 22). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  6. Of Genes and Machines: Application of a Combination of Machine Learning Tools to Astronomy Data Sets

    NASA Astrophysics Data System (ADS)

    Heinis, S.; Kumar, S.; Gezari, S.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Flewelling, H.; Kaiser, N.; Magnier, E. A.; Metcalfe, N.; Waters, C.

    2016-04-01

    We apply a combination of genetic algorithm (GA) and support vector machine (SVM) machine learning algorithms to solve two important problems faced by the astronomical community: star-galaxy separation and photometric redshift estimation of galaxies in survey catalogs. We use the GA to select the relevant features in the first step, followed by optimization of SVM parameters in the second step to obtain an optimal set of parameters to classify or regress, in the process of which we avoid overfitting. We apply our method to star-galaxy separation in Pan-STARRS1 data. We show that our method correctly classifies 98% of objects down to i_{P1} = 24.5, with a completeness (or true positive rate) of 99% for galaxies and 88% for stars. By combining colors with morphology, our star-galaxy separation method yields better results than the new SExtractor classifier spread_model, in particular at the faint end (i_{P1} > 22). We also use our method to derive photometric redshifts for galaxies in the COSMOS bright multiwavelength data set down to an error in (1+z) of σ = 0.013, which compares well with estimates from spectral energy distribution fitting on the same data (σ = 0.007) while making a significantly smaller number of assumptions.

  7. An overlapping set of genes is regulated by both NFIB and the glucocorticoid receptor during lung maturation

    PubMed Central

    2014-01-01

    Background: Lung maturation is a late fetal developmental event in both mice and humans. Because of this, lung immaturity is a serious problem in premature infants. Disruption of genes for either the glucocorticoid receptor (Nr3c1) or the NFIB transcription factors results in perinatal lethality due to lung immaturity. In both knockouts, the phenotype includes excess cell proliferation, failure of saccularization and reduced expression of markers of epithelial differentiation. This similarity suggests that the two genes may co-regulate a specific set of genes essential for lung maturation. Results: We analyzed the roles of these two transcription factors in regulating transcription using ChIP-seq data for NFIB, and RNA expression data and motif analysis for both. Our new ChIP-seq data for NFIB in lung at E16.5 shows that NFIB binds to a NFI motif. This motif is over-represented in the promoters of genes that are under-expressed in Nfib-KO mice at E18.5, suggesting an activator role for NFIB. Using available microarray data from Nr3c1-KO mice, we further identified 52 genes that are under-expressed in both Nfib and Nr3c1 knockouts, an overlap which is 13.1 times larger than what would be expected by chance. Finally, we looked for enrichment of 738 recently published transcription factor motifs in the promoters of these putative target genes and found that the NFIB and glucocorticoid receptor motifs were among the most enriched, suggesting that a subset of these genes may be directly activated by Nfib and Nr3c1. Conclusions: Our data provide the first evidence for Nfib and Nr3c1 co-regulating genes related to lung maturation. They also establish that the in vivo DNA-binding specificity of NFIB is the same as previously seen in vitro, and highly similar to that of the other NFI-family members NFIA, NFIC and NFIX. PMID:24661679
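
    The "times larger than expected by chance" figure for a gene-list overlap can be illustrated with a minimal sketch. It assumes the usual null model in which two gene lists are drawn independently from a common background, so the expected overlap is n_a * n_b / n_total and the tail probability is hypergeometric. The function names and the small counts in the usage note are hypothetical; the abstract does not report the list sizes or the background size, and this is not necessarily the calculation the authors performed.

```python
from math import comb

def expected_overlap(n_a, n_b, n_total):
    """Expected size of the overlap of two independent gene lists."""
    return n_a * n_b / n_total

def fold_enrichment(observed, n_a, n_b, n_total):
    """Observed overlap divided by the overlap expected by chance."""
    return observed / expected_overlap(n_a, n_b, n_total)

def hypergeom_upper_tail(observed, n_a, n_b, n_total):
    """P(overlap >= observed) when n_b genes are drawn without replacement
    from n_total genes, n_a of which belong to the first list."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(n_total, n_b)
```

    With hypothetical counts of 20 background genes, lists of 5 and 4 genes, and 3 shared genes, fold_enrichment(3, 5, 4, 20) gives 3.0 and hypergeom_upper_tail(3, 5, 4, 20) gives 155/4845, about 0.032.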

  8. A comparison of primer sets for detecting 16S rRNA and hydrazine oxidoreductase genes of anaerobic ammonium-oxidizing bacteria in marine sediments.

    PubMed

    Li, Meng; Hong, Yiguo; Klotz, Martin Gunter; Gu, Ji-Dong

    2010-03-01

    Published polymerase chain reaction primer sets for detecting the 16S rRNA gene and the gene encoding hydrazine oxidoreductase (hzo) in anammox bacteria were compared by using the same coastal marine sediment samples. While four previously reported primer sets developed to detect the 16S rRNA gene showed varying specificities between 12% and 77%, an optimized primer combination resulted in up to 98% specificity, and the recovered anammox 16S rRNA gene sequences were >95% identical in sequence to published sequences from anammox bacteria in the Candidatus "Scalindua" group. Furthermore, four primer sets used in detecting the hzo gene of anammox bacteria were highly specific (up to 92%) and efficient, and the newly designed primer set in this study amplified longer hzo gene segments suitable for phylogenetic analysis. The optimized primer set for the 16S rRNA gene and the newly designed primer set for the hzo gene were successfully applied to identify anammox bacteria from marine sediments of an aquaculture zone, a coastal wetland, and the deep ocean, three ecosystems that form a gradient of anthropogenic impact. Results indicated a broad distribution of anammox bacteria with high niche-specific community structure within each marine ecosystem. PMID:20107988

  9. [Expression of SET-NUP214 fusion gene in patients with T-cell acute lymphoblastic leukemia and its clinical significance].

    PubMed

    Dai, Hai-Ping; Wang, Qian; Wu, Li-Li; Ping, Na-Na; Wu, Chun-Xiao; Xie, Jun-Dan; Pan, Jin-Lan; Xue, Yong-Quan; Wu, De-Pei; Chen, Su-Ning

    2012-10-01

    This study aimed to investigate the occurrence and clinical significance of the SET-NUP214 fusion gene in patients with T-cell acute lymphoblastic leukemia (T-ALL) and to analyze the clinical and biological characteristics of this disease. RT-PCR was used to detect the expression of the SET-NUP214 fusion gene in 58 T-ALL cases. Interphase FISH and Array-CGH were used to detect the deletion of 9q34. Direct sequencing was applied to detect mutations of PHF6 and NOTCH1. The results showed that the SET-NUP214 fusion gene was detected by RT-PCR in 6 out of 58 T-ALL cases (10.3%). Besides T-lineage antigens, expression of CD13 and/or CD33 was detected in all 6 cases. Deletions of 9q34 were detected in 4 out of the 6 patients by FISH. Array-CGH results of 3 SET-NUP214 positive T-ALL patients confirmed that this fusion gene resulted from a cryptic deletion of 9q34.11q34.13. PHF6 and NOTCH1 gene mutations were found in 4 and 5 out of 6 SET-NUP214 positive T-ALL patients, respectively. It is concluded that the SET-NUP214 fusion gene often results from del(9)(q34). PHF6 and NOTCH1 mutations may be potential leukemogenic events in T-ALL with the SET-NUP214 fusion gene.

  10. A statistical evaluation of models for the initial settlement of the American continent emphasizes the importance of gene flow with Asia.

    PubMed

    Ray, N; Wegmann, D; Fagundes, N J R; Wang, S; Ruiz-Linares, A; Excoffier, L

    2010-02-01

    Although there is agreement that the Bering Strait was the entry point for the initial colonization of the American continent, there is considerable uncertainty regarding the timing and pattern of human migration from Asia to America. In order to perform a statistical assessment of the relative probability of alternative migration scenarios and to estimate key demographic parameters associated with them, we used an approximate Bayesian computation framework to analyze a data set of 401 autosomal microsatellite loci typed in 29 Native American populations. A major finding is that a single, discrete wave of colonization is highly inconsistent with observed levels of genetic diversity. A scenario with two discrete migration waves is also not supported by the data. The current genetic diversity of Amerindian populations is best explained by a third model involving recurrent gene flow between Asia and America after initial colonization. We estimate that this colonization involved about 100 individuals and occurred some 13,000 years ago, in agreement with well-established archeological data. PMID:19805438
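
    For readers unfamiliar with approximate Bayesian computation, the rejection form of the idea can be sketched as follows: draw a parameter from the prior, simulate data, and keep the draw only if a summary of the simulated data is close to the observed summary. The toy model below (a coin with unknown bias, a uniform prior, and a tolerance of one head) is an assumption chosen for brevity. It is not the demographic model, the summary statistics, or the model-choice procedure used in the study.

```python
import random

def simulate_heads(p, n_flips, rng):
    """Simulate the summary statistic: number of heads in n_flips tosses."""
    return sum(rng.random() < p for _ in range(n_flips))

def abc_rejection(observed, n_flips, n_draws=5000, tolerance=1, seed=0):
    """Return prior draws whose simulated summary lies within `tolerance`
    of the observed summary (an approximate posterior sample)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from a uniform prior on [0, 1)
        if abs(simulate_heads(p, n_flips, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted

posterior = abc_rejection(observed=14, n_flips=20)
posterior_mean = sum(posterior) / len(posterior)
```

    Shrinking the tolerance makes the accepted sample closer to the exact posterior at the cost of fewer accepted draws.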

  11. Comprehensive screening for a complete set of Japanese-population-specific filaggrin gene mutations.

    PubMed

    Kono, M; Nomura, T; Ohguchi, Y; Mizuno, O; Suzuki, S; Tsujiuchi, H; Hamajima, N; McLean, W H I; Shimizu, H; Akiyama, M

    2014-04-01

    Mutations in FLG, which encodes profilaggrin, cause ichthyosis vulgaris and are an important predisposing factor for atopic dermatitis. Until now, most case-control studies and population-based screenings have been performed only for prevalent mutations. In this study, we established a high-throughput FLG mutation detection system by real-time PCR with a set of two double-dye probes and conducted comprehensive screening for almost all of the Japanese-population-specific FLG mutations (ten FLG mutations). The present comprehensive screening for all ten FLG mutations provided a more precise prevalence rate for FLG mutations (11.1%, n = 820), which seemed high compared with data of previous reports based on screening for limited numbers of FLG mutations. Our comprehensive screening suggested that population-specific FLG mutations may be a significant predisposing factor for hay fever (odds ratio = 2.01 [95% CI: 1.027-3.936, P < 0.05]), although the sample sizes of this study were too small for reliable subphenotype analysis on the association between FLG mutations and hay fever in the eczema patients and the noneczema individuals, and it is not clear whether the association between FLG mutations and hay fever is due to the close association between FLG mutations and hay fever patients with eczema.
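
    An odds ratio with a 95% confidence interval of the kind quoted above can be computed from a 2x2 table as in the sketch below. It assumes the common log-scale (Woolf) interval; the abstract does not state which method the authors used, and the underlying counts are not reported, so the counts in the usage line are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-scale confidence interval for a 2x2 table.

    a: carriers with the condition      b: carriers without it
    c: non-carriers with the condition  d: non-carriers without it
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

    An interval built this way is symmetric about the estimate on the log scale, so the geometric mean of its bounds equals the odds ratio. The reported bounds have that property (the square root of 1.027 × 3.936 is about 2.01), which is consistent with, though not proof of, a log-scale interval.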

  12. Different CHD chromatin remodelers are required for expression of distinct gene sets and specific stages during development of Dictyostelium discoideum

    PubMed Central

    Platt, James L.; Rogers, Benjamin J.; Rogers, Kelley C.; Harwood, Adrian J.; Kimmel, Alan R.

    2013-01-01

    Control of chromatin structure is crucial for multicellular development and regulation of cell differentiation. The CHD (chromodomain-helicase-DNA binding) protein family is one of the major ATP-dependent, chromatin remodeling factors that regulate nucleosome positioning and access of transcription factors and RNA polymerase to the eukaryotic genome. There are three mammalian CHD subfamilies and their impaired functions are associated with several human diseases. Here, we identify three CHD orthologs (ChdA, ChdB and ChdC) in Dictyostelium discoideum. These CHDs are expressed throughout development, but with unique patterns. Null mutants lacking each CHD have distinct phenotypes that reflect their expression patterns and suggest functional specificity. Accordingly, using genome-wide (RNA-seq) transcriptome profiling for each null strain, we show that the different CHDs regulate distinct gene sets during both growth and development. ChdC is an apparent ortholog of the mammalian Class III CHD group that is associated with the human CHARGE syndrome, and GO analyses of aberrant gene expression in chdC nulls suggest defects in both cell-autonomous and non-autonomous signaling, which have been confirmed through analyses of chdC nulls developed in pure populations or with low levels of wild-type cells. This study provides novel insight into the broad function of CHDs in the regulation of development and disease, through chromatin-mediated changes in directed gene expression. PMID:24301467

  13. BAT3 and SET1A form a complex with CTCFL/BORIS to modulate H3K4 histone dimethylation and gene expression.

    PubMed

    Nguyen, Phuongmai; Bar-Sela, Gil; Sun, Lunching; Bisht, Kheem S; Cui, Hengmi; Kohn, Elise; Feinberg, Andrew P; Gius, David

    2008-11-01

    Chromatin status is characterized in part by covalent posttranslational modifications of histones that regulate chromatin dynamics and direct gene expression. BORIS (brother of the regulator of imprinted sites) is an insulator DNA-binding protein that is thought to play a role in chromatin organization and gene expression. BORIS is a cancer-germ line gene; these are genes normally present in male germ cells (testis) that are also expressed in cancer cell lines as well as primary tumors. This work identifies SET1A, an H3K4 methyltransferase, and BAT3, a cochaperone recruiter, as binding partners for BORIS, and these proteins bind to the upstream promoter regions of two well-characterized procarcinogenic genes, Myc and BRCA1. RNA interference (RNAi) knockdown of BAT3, as well as SET1A, decreased Myc and BRCA1 gene expression but did not affect the binding properties of BORIS, but RNAi knockdown of BORIS prevented the assembly of BAT3 and SET1A at the Myc and BRCA1 promoters. Finally, chromatin analysis suggested that BORIS and BAT3 exert their effects on gene expression by recruiting proteins such as SET1A that are linked to changes in H3K4 dimethylation. Thus, we propose that BORIS acts as a platform upon which BAT3 and SET1A assemble and exert effects upon chromatin structure and gene expression. PMID:18765639

  14. The Effects of Violation of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Procedures.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    The purpose of this study is to help define the precise nature and limits of the tolerable range in which a researcher may be relatively confident about the statistical validity of his or her research findings, focusing specifically on the statistical validity of results when violating the assumptions associated with the one-way, fixed-effects…

  15. Gene sets for utilization of primary and secondary nutrition supplies in the distal gut of endangered Iberian lynx.

    PubMed

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A; Newbold, Charles J; Golyshin, Peter N; Simón, Miguel A; López, Guillermo; Yakimov, Michail M; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of 'presumptive' aquaporin aqpZ genes and genes encoding 'active' lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. Lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited a low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests that lynx gut microbes are most active in the metabolism of β-xylose containing plant N-glycans, although β-xylosidase sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota not only harbors gene sets underpinning sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80-100% wild rabbits) but also for the hydrolysis of prey-derived plant biomass. Although the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely suggests a

  16. Gene Sets for Utilization of Primary and Secondary Nutrition Supplies in the Distal Gut of Endangered Iberian Lynx

    PubMed Central

    Alcaide, María; Messina, Enzo; Richter, Michael; Bargiela, Rafael; Peplies, Jörg; Huws, Sharon A.; Newbold, Charles J.; Golyshin, Peter N.; Simón, Miguel A.; López, Guillermo; Yakimov, Michail M.; Ferrer, Manuel

    2012-01-01

    Recent studies have indicated the existence of an extensive trans-genomic trans-mural co-metabolism between gut microbes and animal hosts that is diet-, host phylogeny- and provenance-influenced. Here, we analyzed the biodiversity at the level of small subunit rRNA gene sequence and the metabolic composition of 18 Mbp of consensus metagenome sequences and activity characteristics of bacterial intra-cellular extracts, in wild Iberian lynx (Lynx pardinus) fecal samples. Bacterial signatures (14.43% of all of the Firmicutes reads and 6.36% of total reads) related to the uncultured anaerobic commensals Anaeroplasma spp., which are typically found in ovine and bovine rumen, were first identified. The lynx gut was further characterized by an over-representation of ‘presumptive’ aquaporin aqpZ genes and genes encoding ‘active’ lysosomal-like digestive enzymes that are possibly needed to acquire glycerol, sugars and amino acids from glycoproteins, glyco(amino)lipids, glyco(amino)glycans and nucleoside diphosphate sugars. Lynx gut was highly enriched (28% of the total glycosidases) in genes encoding α-amylase and related enzymes, although it exhibited a low rate of enzymatic activity indicative of starch degradation. The preponderance of β-xylosidase activity in protein extracts further suggests that lynx gut microbes are most active in the metabolism of β-xylose containing plant N-glycans, although β-xylosidase sequences constituted only 1.5% of total glycosidases. These collective and unique bacterial, genetic and enzymatic activity signatures suggest that the wild lynx gut microbiota not only harbors gene sets underpinning sugar uptake from primary animal tissues (with the monotypic dietary profile of the wild lynx consisting of 80–100% wild rabbits) but also for the hydrolysis of prey-derived plant biomass. Although the present investigation corresponds to a single sample and some of the statements should be considered qualitative, the data most likely

  17. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm

    PubMed Central

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui

    2016-01-01

    Phellinus is a kind of fungus known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were carried out and a large amount of experimental data was generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model for predicting Phellinus production is obtained. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized parameter values agree with biological experimental results, which indicates that our method has good predictive power for optimizing culture conditions. PMID:27610365
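
    The two-stage idea described above (fit a regression model of yield, then search its inputs with a genetic algorithm) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the quadratic predicted_yield function stands in for a fitted regression model, only two of the seven culture parameters are used, the search ranges are invented, and the algorithm is a plain real-coded genetic algorithm with elitism rather than the authors' gene-set based variant.

```python
import random

def predicted_yield(x):
    """Hypothetical fitted regression model (not the authors' model)."""
    temperature, ph = x
    return 10.0 - (temperature - 28.0) ** 2 / 10.0 - (ph - 6.0) ** 2

BOUNDS = [(20.0, 36.0), (4.0, 8.0)]  # assumed search ranges

def clip(value, low, high):
    return max(low, min(high, value))

def evolve(fitness, bounds, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                w = rng.random()
                gene = w * g1 + (1 - w) * g2              # blend crossover
                gene += rng.gauss(0.0, 0.05 * (hi - lo))  # Gaussian mutation
                child.append(clip(gene, lo, hi))
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve(predicted_yield, BOUNDS)
```

    Because the best individual is always carried over, the best predicted yield never decreases from one generation to the next; whether the optimum of a fitted model matches the laboratory optimum still has to be checked experimentally, as the abstract notes.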

  18. ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain

    PubMed Central

    Prüfer, Kay; Kelso, Janet; Dannemann, Michael

    2016-01-01

    Summary: We present ABAEnrichment, an R package that tests for expression enrichment in specific brain regions at different developmental stages using expression information gathered from multiple regions of the adult and developing human brain, together with ontologically organized structural information about the brain, both provided by the Allen Brain Atlas. We validate ABAEnrichment by successfully recovering the origin of gene sets identified in specific brain cell-types and developmental stages. Availability and Implementation: ABAEnrichment was implemented as an R package and is available under GPL (≥ 2) from the Bioconductor website (http://bioconductor.org/packages/3.3/bioc/html/ABAEnrichment.html). Contacts: steffi_grote@eva.mpg.de, kelso@eva.mpg.de or michael_dannemann@eva.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27354695

  19. Optimization to the Culture Conditions for Phellinus Production with Regression Analysis and Gene-Set Based Genetic Algorithm.

    PubMed

    Li, Zhongwei; Xin, Yuezhen; Wang, Xun; Sun, Beibei; Xia, Shengyu; Li, Hui; Zhu, Hu

    2016-01-01

    Phellinus is a kind of fungus known as one of the elemental components in drugs to avoid cancers. With the purpose of finding optimized culture conditions for Phellinus production in the laboratory, many single-factor experiments were carried out and a large amount of experimental data was generated. In this work, we use the data collected from experiments for regression analysis, and then a mathematical model for predicting Phellinus production is obtained. Subsequently, a gene-set based genetic algorithm is developed to optimize the values of parameters involved in culture conditions, including inoculum size, pH value, initial liquid volume, temperature, seed age, fermentation time, and rotation speed. These optimized parameter values agree with biological experimental results, which indicates that our method has good predictive power for optimizing culture conditions. PMID:27610365

  1. A Set of miRNAs, Their Gene and Protein Targets and Stromal Genes Distinguish Early from Late Onset ER Positive Breast Cancer

    PubMed Central

    Bastos, E. P.; Brentani, H.; Pereira, C. A. B.; Polpo, A.; Lima, L.; Puga, R. D.; Pasini, F. S.; Osorio, C. A. B. T.; Roela, R. A.; Achatz, M. I.; Trapé, A. P.; Gonzalez-Angulo, A. M.; Brentani, M. M.

    2016-01-01

    Breast cancer (BC) in young adult patients (YA) has a more aggressive biological behavior and is associated with a worse prognosis than BC arising in middle aged patients (MA). We proposed that differentially expressed miRNAs could regulate genes and proteins underlying aggressive phenotypes of breast tumors in YA patients when compared to those arising in MA patients. Objective: Using integrated expression analyses of miRs, their mRNA and protein targets and stromal gene expression, we aimed to identify differentially expressed profiles between tumors from YA-BC and MA-BC. Methodology and Results: Samples of ER+ invasive ductal breast carcinomas, divided into two groups, YA-BC (35 years or less) and MA-BC (50–65 years), were evaluated. Screening for BRCA1/2 status according to the BOADICEA program indicated low risk of patients being carriers of these mutations. Aggressive characteristics were more evident in YA-BC versus MA-BC. Performing qPCR, we identified with high confidence eight miRs differentially expressed (miR-9, 18b, 33b, 106a, 106b, 210, 518a-3p and miR-372) between YA-BC and MA-BC tumors, which were associated with aggressive clinicopathological characteristics. The expression profiles by microarray identified 602 predicted target genes associated with proliferation, cell cycle and development biological functions. Performing RPPA, 24 target proteins differed between both groups and 21 were interconnected within a protein-protein interaction network associated with proliferation, development and metabolism pathways over-represented in YA-BC. Combination of eight mRNA targets or the combination of eight target proteins defined indicators able to classify individual samples into YA-BC or MA-BC groups. Fibroblast-enriched stroma expression profile analysis resulted in 308 stromal genes differentially expressed between YA-BC and MA-BC. Conclusion: We defined a set of differentially expressed miRNAs, their mRNAs and protein targets and stromal

  2. lingerer, a Drosophila gene involved in initiation and termination of copulation, encodes a set of novel cytoplasmic proteins.

    PubMed Central

    Kuniyoshi, Hisato; Baba, Kotaro; Ueda, Ryu; Kondo, Shunzo; Awano, Wakae; Juni, Naoto; Yamamoto, Daisuke

    2002-01-01

    In an effort to uncover genetic components underlying the courtship behavior of Drosophila melanogaster, we have characterized a novel gene, lingerer (lig), mutations of which result in abnormal copulation. Males carrying a hypomorphic mutation in lig fail to withdraw their genitalia upon termination of copulation, but display no overt abnormalities in their genitalia. A severe reduction in the dosage of the lig gene causes repeated attempted copulations but no successful copulations. Complete loss of lig function results in lethality during early pupal stages. lig is localized to polytene segment 44A on the second chromosome and encodes three alternatively spliced transcripts that generate two types of 150-kD proteins, Lig-A and Lig-B, differing only at the C terminus. Lig proteins show no similarity to known proteins. However, a set of homologous proteins in mammals suggest that Drosophila Lig belongs to a family of proteins that share five highly conserved domains. Lig is a cytoplasmic protein expressed in the central nervous system (CNS), imaginal discs, and gonads. Lig-A expression is selectively reduced in lig mutants and the ubiquitous supply of this protein at the beginning of metamorphosis restores the copulatory defects of the lig mutant. We propose that lig may act in the nervous system to mediate the control of copulatory organs during courtship. PMID:12524348

  3. Genomics in cereals: from genome-wide conserved orthologous set (COS) sequences to candidate genes for trait dissection.

    PubMed

    Quraishi, Umar Masood; Abrouk, Michael; Bolot, Stéphanie; Pont, Caroline; Throude, Mickael; Guilhot, Nicolas; Confolent, Carole; Bortolini, Fernanda; Praud, Sébastien; Murigneux, Alain; Charmet, Gilles; Salse, Jerome

    2009-11-01

    Recent updates in comparative genomics among cereals have provided the opportunity to identify conserved orthologous set (COS) DNA sequences for cross-genome map-based cloning of candidate genes underpinning quantitative traits. New tools are described that are applicable to any cereal genome of interest, namely, alignment criterion for orthologous couples identification, as well as the Intron Spanning Marker software to automatically select intron-spanning primer pairs. In order to test the software, it was applied to the bread wheat genome, and 695 COS markers were assigned to 1,535 wheat loci (on average one marker/2.6 cM) based on 827 robust rice-wheat orthologs. Furthermore, 31 of the 695 COS markers were selected to fine map a pentosan viscosity quantitative trait loci (QTL) on wheat chromosome 7A. Among the 31 COS markers, 14 (45%) were polymorphic between the parental lines and 12 were mapped within the QTL confidence interval with one marker every 0.6 cM defining candidate genes among the rice orthologous region.

  4. Analysis of protein gene products in cells with altered chromosome sets for the purpose of genetic mapping

    SciTech Connect

    Shishkin, S.S.; Zakharov, S.F.; Gromov, P.S.; Shcheglova, M.V.; Kukharenko, V.I.; Shilov, A.G.; Matveeva, N.M.; Zhdanova, N.S.; Efimochkin, A.S.; Krokhina, T.B.

    1994-12-01

    Two-dimensional electrophoresis was used for analyzing proteins in hybrid cells that contained single human chromosomes (chromosome 5, chromosome 21, or chromosomes 5 and 21) against the background of the mouse genome. By comparing the protein patterns of hybrid and parent cells (about 1000 protein fractions for each kind of cell), five fractions among proteins of hybrid cells were supposedly identified as human proteins. The genes of two of them are probably located on chromosome 5, and those of the other three on chromosome 21. Moreover, analysis of proteins in fibroblasts of patients with the cri-du-chat syndrome (5p-) revealed a decrease in the content of two proteins as compared with those in preparations of diploid fibroblasts. This fact was regarded as evidence that two corresponding genes are located on the short arm of chromosome 5. Methodological problems associated with the use of protein pattern analysis in cells with altered chromosome sets for the purposes of genetic mapping are discussed.

  5. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum.

    PubMed

    Holding, Thomas; Recker, Mario

    2015-12-01

    Infection by the human malaria parasite Plasmodium falciparum results in a broad spectrum of clinical outcomes, ranging from severe and potentially life-threatening malaria to asymptomatic carriage. In a process of naturally acquired immunity, individuals living in malaria-endemic regions build up a level of clinical protection, which attenuates infection severity in an exposure-dependent manner. Underlying this shift in the immunoepidemiology as well as the observed range in malaria pathogenesis is the var multigene family and the phenotypic diversity embedded within. The var gene-encoded surface proteins Plasmodium falciparum erythrocyte membrane protein 1 mediate variant-specific binding of infected red blood cells to a diverse set of host receptors that has been linked to specific disease manifestations, including cerebral and pregnancy-associated malaria. Here, we show that cross-reactive immune responses, which minimize the within-host benefit of each additionally expressed gene during infection, can cause selection for maximum phenotypic diversity at the genome level. We further show that differential functional constraints on protein diversification stably maintain uneven ratios between phenotypic groups, in line with empirical observation. Our results thus suggest that the maintenance of phenotypic diversity within P. falciparum is driven by an evolutionary trade-off that optimizes between within-host parasite fitness and between-host selection pressure.

  6. PathMAPA: a tool for displaying gene expression and performing statistical tests on metabolic pathways at multiple levels for Arabidopsis

    PubMed Central

    Pan, Deyun; Sun, Ning; Cheung, Kei-Hoi; Guan, Zhong; Ma, Ligeng; Holford, Matthew; Deng, Xingwang; Zhao, Hongyu

    2003-01-01

    Background: To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis. Results: We have developed PathMAPA, a web-based application written in Java that can be easily accessed over the Internet. An Oracle database is used to store, query, and manipulate the large amounts of data that are involved. PathMAPA allows its users to (i) upload and populate microarray data into a database; (ii) integrate gene expression with enzymes of the pathways; (iii) generate pathway diagrams without building image files manually; (iv) visualize gene expressions for each pathway at enzyme, locus, and probe levels; and (v) perform statistical tests at pathway, enzyme and gene levels. PathMAPA can be used to examine Arabidopsis thaliana gene expression patterns associated with metabolic pathways. Conclusion: PathMAPA provides two unique features for the gene expression analysis of Arabidopsis thaliana: (i) automatic generation of pathways associated with gene expression and (ii) statistical tests at pathway level. The first feature allows for the periodical updating of genomic data for pathways, while the second feature can provide insight into how treatments affect relevant pathways for the selected experiment(s). PMID:14604444

  7. The Effects of Single and Compound Violations of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Models.

    ERIC Educational Resources Information Center

    Johnson, Colleen Cook

    This study integrates into one comprehensive Monte Carlo simulation a vast array of previously defined and substantively interrelated research studies of the robustness of analysis of variance (ANOVA) and analysis of covariance (ANCOVA) statistical procedures. Three sets of balanced ANOVA and ANCOVA designs (group sizes of 15, 30, and 45) and one…

  8. Comparative genomic analysis of Brucella abortus vaccine strain 104M reveals a set of candidate genes associated with its virulence attenuation.

    PubMed

    Yu, Dong; Hui, Yiming; Zai, Xiaodong; Xu, Junjie; Liang, Long; Wang, Bingxiang; Yue, Junjie; Li, Shanhu

    2015-01-01

    The Brucella abortus strain 104M, a spontaneously attenuated strain, has been used as a vaccine strain in humans against brucellosis for 6 decades in China. Despite many studies, the molecular mechanisms that cause the attenuation are still unclear. Here, we determined the whole-genome sequence of 104M and conducted a comprehensive comparative analysis against the whole genome sequences of the virulent strain, A13334, and other reference strains. This analysis revealed a highly similar genome structure between 104M and A13334. The further comparative genomic analysis between 104M and A13334 revealed a set of genes missing in 104M. Some of these genes were identified to be directly or indirectly associated with virulence. Similarly, a set of mutations in the virulence-related genes was also identified, which may be related to virulence alteration. This study provides a set of candidate genes associated with virulence attenuation in B. abortus vaccine strain 104M.

  9. Comparative genomic analysis of Brucella abortus vaccine strain 104M reveals a set of candidate genes associated with its virulence attenuation

    PubMed Central

    Yu, Dong; Hui, Yiming; Zai, Xiaodong; Xu, Junjie; Liang, Long; Wang, Bingxiang; Yue, Junjie; Li, Shanhu

    2015-01-01

    The Brucella abortus strain 104M, a spontaneously attenuated strain, has been used as a vaccine strain in humans against brucellosis for 6 decades in China. Despite many studies, the molecular mechanisms that cause the attenuation are still unclear. Here, we determined the whole-genome sequence of 104M and conducted a comprehensive comparative analysis against the whole genome sequences of the virulent strain, A13334, and other reference strains. This analysis revealed a highly similar genome structure between 104M and A13334. The further comparative genomic analysis between 104M and A13334 revealed a set of genes missing in 104M. Some of these genes were identified to be directly or indirectly associated with virulence. Similarly, a set of mutations in the virulence-related genes was also identified, which may be related to virulence alteration. This study provides a set of candidate genes associated with virulence attenuation in B. abortus vaccine strain 104M. PMID:26039674

  10. A novel method for gathering and prioritizing disease candidate genes based on construction of a set of disease-related MeSH® terms

    PubMed Central

    2014-01-01

    Background: Understanding the molecular mechanisms involved in disease is critical for the development of more effective and individualized strategies for prevention and treatment. The amount of disease-related literature, including new genetic information on the molecular mechanisms of disease, is rapidly increasing. Extracting beneficial information from literature can be facilitated by computational methods such as the knowledge-discovery approach. Several methods for mining gene-disease relationships using computational methods have been developed; however, there has been a lack of research evaluating specific disease candidate genes. Results: We present a novel method for gathering and prioritizing specific disease candidate genes. Our approach involved the construction of a set of Medical Subject Headings (MeSH) terms for the effective retrieval of publications related to a disease candidate gene. Information regarding the relationships between genes and publications was obtained from the gene2pubmed database. The set of genes was prioritized using a “weighted literature score” based on the number of publications and weighted by the number of genes occurring in a publication. Using our method for the disease states of pain and Alzheimer’s disease, a total of 1101 pain candidate genes and 2810 Alzheimer’s disease candidate genes were gathered and prioritized. The precision was 0.30 and the recall was 0.89 in the case study of pain. The precision was 0.04 and the recall was 0.6 in the case study of Alzheimer’s disease. The precision-recall curve indicated that the performance of our method was superior to that of other publicly available tools. Conclusions: Our method, which involved the use of a set of MeSH terms related to disease candidate genes and a novel weighted literature score, improved the accuracy of gathering and prioritizing candidate genes by focusing on a specific disease. PMID:24917541
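
    The abstract above does not give the formula for the "weighted literature score". As a minimal sketch only, the following assumes that each publication contributes 1/n to every gene it mentions, where n is the number of genes linked to that publication; the function names and the toy gene-publication links are hypothetical and are not taken from the paper or from gene2pubmed. The second function shows how precision and recall of a candidate list are computed against a known gene set.

```python
from collections import defaultdict


def weighted_literature_scores(pub_to_genes):
    """Sum, for each gene, 1/(number of genes in the publication) over its publications.

    The 1/n weighting is an assumption made for illustration; the abstract
    does not state the exact formula.
    """
    scores = defaultdict(float)
    for genes in pub_to_genes.values():
        unique_genes = set(genes)
        if not unique_genes:
            continue  # a publication with no linked genes contributes nothing
        weight = 1.0 / len(unique_genes)
        for gene in unique_genes:
            scores[gene] += weight
    return dict(scores)


def precision_recall(predicted, known):
    """Precision and recall of a predicted gene list against a known gene set."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical toy input: publication ID -> genes linked to it.
toy_links = {"pub1": ["A"], "pub2": ["A", "B"], "pub3": ["B", "C", "D", "E"]}
scores = weighted_literature_scores(toy_links)
ranked = sorted(scores, key=scores.get, reverse=True)
```

    Under this assumed weighting, a gene that is the sole subject of a publication gains more than a gene that appears in a long gene list, which is one way to keep large screening papers from dominating the ranking.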

  11. Analysis of Five Gene Sets in Chimpanzees Suggests Decoupling between the Action of Selection on Protein-Coding and on Noncoding Elements

    PubMed Central

    Santpere, Gabriel; Carnero-Montoro, Elena; Petit, Natalia; Serra, François; Hvilsom, Christina; Rambla, Jordi; Heredia-Genestar, Jose Maria; Halligan, Daniel L.; Dopazo, Hernan; Navarro, Arcadi; Bosch, Elena

    2015-01-01

    We set out to investigate potential differences and similarities between the selective forces acting upon the coding and noncoding regions of five different sets of genes defined according to functional and evolutionary criteria: 1) two reference gene sets presenting accelerated and slow rates of protein evolution (the Complement and Actin pathways); 2) a set of genes with evidence of accelerated evolution in at least one of their introns; and 3) two gene sets related to neurological function (Parkinson’s and Alzheimer’s diseases). To that effect, we combine human–chimpanzee divergence patterns with polymorphism data obtained from target resequencing 20 central chimpanzees, our closest relatives with largest long-term effective population size. By using the distribution of fitness effect-alpha extension of the McDonald–Kreitman test, we reproduce inferences of rates of evolution previously based only on divergence data on both coding and intronic sequences and also obtain inferences for other classes of genomic elements (untranslated regions, promoters, and conserved noncoding sequences). Our results suggest that 1) the distribution of fitness effect-alpha method successfully helps distinguishing different scenarios of accelerated divergence (adaptation or relaxed selective constraints) and 2) the adaptive history of coding and noncoding sequences within the gene sets analyzed is decoupled. PMID:25977458
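
    For orientation, the classical McDonald–Kreitman estimate of the proportion of adaptive substitutions can be sketched as follows. This is the simple textbook estimator, alpha = 1 - (Ds × Pn) / (Dn × Ps), and not the distribution of fitness effect-alpha extension used in the study; the function name and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical McDonald-Kreitman alpha.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Made-up counts for illustration only.
alpha = mk_alpha(dn=7, ds=17, pn=2, ps=42)
```

    A value near 0 is what neutrality predicts (divergence and polymorphism ratios agree), while positive values suggest an excess of nonsynonymous divergence.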

  12. Analysis of Five Gene Sets in Chimpanzees Suggests Decoupling between the Action of Selection on Protein-Coding and on Noncoding Elements.

    PubMed

    Santpere, Gabriel; Carnero-Montoro, Elena; Petit, Natalia; Serra, François; Hvilsom, Christina; Rambla, Jordi; Heredia-Genestar, Jose Maria; Halligan, Daniel L; Dopazo, Hernan; Navarro, Arcadi; Bosch, Elena

    2015-05-14

    We set out to investigate potential differences and similarities between the selective forces acting upon the coding and noncoding regions of five different sets of genes defined according to functional and evolutionary criteria: 1) two reference gene sets presenting accelerated and slow rates of protein evolution (the Complement and Actin pathways); 2) a set of genes with evidence of accelerated evolution in at least one of their introns; and 3) two gene sets related to neurological function (Parkinson's and Alzheimer's diseases). To that effect, we combine human-chimpanzee divergence patterns with polymorphism data obtained from target resequencing 20 central chimpanzees, our closest relatives with largest long-term effective population size. By using the distribution of fitness effect-alpha extension of the McDonald-Kreitman test, we reproduce inferences of rates of evolution previously based only on divergence data on both coding and intronic sequences and also obtain inferences for other classes of genomic elements (untranslated regions, promoters, and conserved noncoding sequences). Our results suggest that 1) the distribution of fitness effect-alpha method successfully helps distinguishing different scenarios of accelerated divergence (adaptation or relaxed selective constraints) and 2) the adaptive history of coding and noncoding sequences within the gene sets analyzed is decoupled.

  13. The STATFLUX code: a statistical method for calculation of flow and set of parameters, based on the Multiple-Compartment Biokinetical Model

    NASA Astrophysics Data System (ADS)

    Garcia, F.; Mesa, J.; Arruda-Neto, J. D. T.; Helene, O.; Vanin, V.; Milian, F.; Deppman, A.; Rodrigues, T. E.; Rodriguez, O.

    2007-03-01

    The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures: Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters with the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.
    Program summary
    Title of program: STATFLUX
    Catalogue identifier: ADYS_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: none
    Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
    Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
    Operating system: Windows 2000 and Windows XP
    Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code and therefore it has not been possible for the CPC Program Library to test the program.
    Memory required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of Hard disk memory
    No. of bits in a word: 16
    No. of lines in distributed program, including test data, etc.: 6912
    No. of bytes in distributed program, including test data, etc.: 229 541
    Distribution format: tar.gz
    Nature of the physical problem: The investigation of transport mechanisms for

  14. Validation of the Lung Subtyping Panel in Multiple Fresh-Frozen and Formalin-Fixed, Paraffin-Embedded Lung Tumor Gene Expression Data Sets.

    PubMed

    Faruki, Hawazin; Mayhew, Gregory M; Fan, Cheng; Wilkerson, Matthew D; Parker, Scott; Kam-Morgan, Lauren; Eisenberg, Marcia; Horten, Bruce; Hayes, D Neil; Perou, Charles M; Lai-Goldman, Myla

    2016-06-01

    Context: A histologic classification of lung cancer subtypes is essential in guiding therapeutic management. Objective: To complement morphology-based classification of lung tumors, a previously developed lung subtyping panel (LSP) of 57 genes was tested using multiple public fresh-frozen gene-expression data sets and a prospectively collected set of formalin-fixed, paraffin-embedded lung tumor samples. Design: The LSP gene-expression signature was evaluated in multiple lung cancer gene-expression data sets totaling 2177 patients collected from 4 platforms: Illumina RNAseq (San Diego, California), Agilent (Santa Clara, California) and Affymetrix (Santa Clara) microarrays, and quantitative reverse transcription-polymerase chain reaction. Gene centroids were calculated for each of 3 genomic-defined subtypes: adenocarcinoma, squamous cell carcinoma, and neuroendocrine, the latter of which encompassed both small cell carcinoma and carcinoid. Classification by LSP into 3 subtypes was evaluated in both fresh-frozen and formalin-fixed, paraffin-embedded tumor samples, and agreement with the original morphology-based diagnosis was determined. Results: The LSP-based classifications demonstrated overall agreement with the original clinical diagnosis ranging from 78% (251 of 322) to 91% (492 of 538 and 869 of 951) in the fresh-frozen public data sets and 84% (65 of 77) in the formalin-fixed, paraffin-embedded data set. The LSP performance was independent of tissue-preservation method and gene-expression platform. Secondary, blinded pathology review of formalin-fixed, paraffin-embedded samples demonstrated concordance of 82% (63 of 77) with the original morphology diagnosis. Conclusions: The LSP gene-expression signature is a reproducible and objective method for classifying lung tumors and demonstrates good concordance with morphology-based classification across multiple data sets. The LSP panel can supplement morphologic assessment of lung cancers, particularly
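
    The centroid-based classification and the agreement percentages reported above can be illustrated with a small sketch. It assumes a plain nearest-centroid rule with Euclidean distance; the abstract does not state which distance or correlation measure the LSP uses, and the function names and two-gene toy profiles are hypothetical.

```python
import math
from collections import defaultdict


def gene_centroids(profiles, labels):
    """Mean expression per gene for each subtype label."""
    grouped = defaultdict(list)
    for profile, label in zip(profiles, labels):
        grouped[label].append(profile)
    return {
        label: [sum(values) / len(values) for values in zip(*members)]
        for label, members in grouped.items()
    }


def classify(profile, centroids):
    """Assign the subtype whose centroid is nearest (Euclidean distance, assumed)."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


def percent_agreement(n_agree, n_total):
    """Agreement as a whole-number percentage."""
    return round(100 * n_agree / n_total)


# Hypothetical two-gene training profiles for two of the subtypes.
train = [[1.0, 0.0], [1.2, 0.2], [0.0, 1.0], [0.2, 1.2]]
subtypes = ["adenocarcinoma", "adenocarcinoma", "squamous", "squamous"]
cents = gene_centroids(train, subtypes)
call = classify([0.9, 0.1], cents)

# Counts quoted in the abstract, converted to percentages.
reported = [percent_agreement(a, n) for a, n in [(251, 322), (492, 538), (869, 951), (65, 77), (63, 77)]]
```

    The last line recomputes the percentages from the counts given in the abstract, which is a quick way to confirm that the reported 78%, 91%, 84%, and 82% are consistent with the underlying sample numbers.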

  15. MoSET1 (Histone H3K4 Methyltransferase in Magnaporthe oryzae) Regulates Global Gene Expression during Infection-Related Morphogenesis

    PubMed Central

    Pham, Kieu Thi Minh; Inoue, Yoshihiro; Vu, Ba Van; Nguyen, Hanh Hieu; Nakayashiki, Toru; Ikeda, Ken-ichi; Nakayashiki, Hitoshi

    2015-01-01

    Here we report the genetic analyses of histone lysine methyltransferase (KMT) genes in the phytopathogenic fungus Magnaporthe oryzae. Eight putative M. oryzae KMT genes were targeted for gene disruption by homologous recombination. Phenotypic assays revealed that the eight KMTs were involved in various infection processes at varying degrees. Moset1 disruptants (Δmoset1) impaired in histone H3 lysine 4 methylation (H3K4me) showed the most severe defects in infection-related morphogenesis, including conidiation and appressorium formation. Consequently, Δmoset1 lost pathogenicity on wheat host plants, thus indicating that H3K4me is an important epigenetic mark for infection-related gene expression in M. oryzae. Interestingly, appressorium formation was greatly restored in the Δmoset1 mutants by exogenous addition of cAMP or of the cutin monomer, 16-hydroxypalmitic acid. The Δmoset1 mutants were still infectious on the super-susceptible barley cultivar Nigrate. These results suggested that MoSET1 plays roles in various aspects of infection, including signal perception and overcoming host-specific resistance. However, since Δmoset1 was also impaired in vegetative growth, the impact of MoSET1 on gene regulation was not infection specific. ChIP-seq analysis of H3K4 di- and tri-methylation (H3K4me2/me3) and MoSET1 protein during infection-related morphogenesis, together with RNA-seq analysis of the Δmoset1 mutant, led to the following conclusions: 1) Approximately 5% of M. oryzae genes showed significant changes in H3K4-me2 or -me3 abundance during infection-related morphogenesis. 2) In general, H3K4-me2 and -me3 abundance was positively associated with active transcription. 3) Lack of MoSET1 methyltransferase, however, resulted in up-regulation of a significant portion of the M. oryzae genes in the vegetative mycelia (1,491 genes), and during infection-related morphogenesis (1,385 genes), indicating that MoSET1 has a role in gene repression either directly or more

  16. Integrating genetic association, genetics of gene expression, and single nucleotide polymorphism set analysis to identify susceptibility Loci for type 2 diabetes mellitus.

    PubMed

    Greenawalt, Danielle M; Sieberts, Solveig K; Cornelis, Marilyn C; Girman, Cynthia J; Zhong, Hua; Yang, Xia; Guinney, Justin; Qi, Lu; Hu, Frank B

    2012-09-01

    Large-scale genome-wide association studies (GWAS) have identified over 40 genomic regions significantly associated with type 2 diabetes mellitus. However, GWAS results are not always straightforward to interpret, and linking these loci to meaningful disease etiology is often difficult without extensive follow-up studies. The authors expanded on previously reported type 2 diabetes mellitus GWAS from the nested case-control studies of 2 prospective US cohorts by incorporating expression single nucleotide polymorphism (SNP) information and applying SNP set enrichment analysis to identify sets of SNPs associated with genes that could provide further biologic insight to traditional genome-wide analysis. Using data collected between 1989 and 1994 in these previous studies to form a nested case-control study, the authors found that 3 of the most significantly associated SNPs to type 2 diabetes mellitus in their study are expression SNPs to the lymphocyte antigen 75 gene (LY75), the ubiquitin-specific peptidase 36 gene (USP36), and the phosphatidylinositol transfer protein, cytoplasmic 1 gene (PITPNC1). SNP set enrichment analysis of the GWAS results identified enrichment for expression SNPs to the macrophage-enriched module and the Gene Ontology (GO) biologic process fat cell differentiation human, which includes the transcription factor 7-like 2 gene (TCF7L2), as well as other type 2 diabetes mellitus-associated genes. Integrating genome-wide association, gene expression, and gene set analysis may provide valuable biologic support for potential type 2 diabetes mellitus susceptibility loci and may be useful in identifying new targets or pathways of interest for the treatment and prevention of type 2 diabetes mellitus.

  17. Molecular phylogeny of the Arctoidea (Carnivora): effect of missing data on supertree and supermatrix analyses of multiple gene data sets.

    PubMed

    Fulton, Tara L; Strobeck, Curtis

    2006-10-01

    Phylogenetic relationships of 79 caniform carnivores were addressed based on four nuclear sequence-tagged sites (STS) and one nuclear exon, IRBP, using both supertree and supermatrix analyses. We recovered the three major arctoid lineages, Ursidae, Pinnipedia, and Musteloidea, as monophyletic, with Ursidae (bears) strongly supported as the basal arctoid lineage. Within Pinnipedia, Phocidae (true seals) were sister to the Otaroidea [Otariidae (fur seals and sea lions) and Odobenidae (walrus)]. Phocid subfamily and tribal designations were supported, but the otariid subfamily split between fur seals and sea lions was not. All family designations within Musteloidea were strongly supported: Mephitidae (skunks), Ailuridae (monotypic red panda), Mustelidae (weasels, badgers, otters), and Procyonidae (raccoons). A novel hypothesis for the position of the red panda was recovered, placing it as branching after Mephitidae and before Mustelidae+Procyonidae. Within Mustelidae, subfamily taxonomic changes are considered. This study represents the most comprehensive sampling to date of the Caniformia in a molecular study and contains the most complete molecular phylogeny for the Procyonidae. Our data set was also used in an empirical examination of the effect of missing data on both supertree and supermatrix analyses. Sequence for all genes in all taxa could not be obtained, so two variants of the data set with differing amounts of missing data were examined. The amount of missing data did not have a strong effect; instead, phylogenetic resolution was more dependent on the presence of sufficient informative characters. Supertree and supermatrix methods performed equivalently with incomplete data and were highly congruent; conflicts arose only in weakly supported areas, indicating that more informative characters are required to confidently resolve close species relationships.

  18. The complex set of late transcripts from the Drosophila sex determination gene sex-lethal encodes multiple related polypeptides.

    PubMed Central

    Samuels, M E; Schedl, P; Cline, T W

    1991-01-01

    Sex-lethal (Sxl), a key sex determination gene in Drosophila melanogaster, is known to express a set of three early transcripts arising during early embryogenesis and a set of seven late transcripts occurring from midembryogenesis through adulthood. Among the late transcripts, male-specific mRNAs were distinguished from their female counterparts by the presence of an extra exon interrupting an otherwise long open reading frame (ORF). We have now analyzed the structures of the late Sxl transcripts by cDNA sequencing, Northern (RNA) blotting, primer extension, and RNase protection. The late transcripts appear to use a common 5' end but differ at their 3' ends by the use of alternative polyadenylation sites. Two of these sites lack canonical AATAAA sequences, and their use correlates in females with the presence of a functional germ line, suggesting possible tissue-specific polyadenylation. Besides the presence of the male-specific exon, no additional sex-specific splicing events were detected, although a number of non-sex-specific splicing variants were observed. In females, the various forms of late Sxl transcript potentially encode up to six slightly different polypeptides. All of the protein-coding differences occur outside the previously defined ribonucleoprotein motifs. One class of Sxl mRNAs also includes a second long ORF in the same frame as the first ORF but separated from it by a single ochre codon. The function of this second ORF is unknown. Significant amounts of apparently partially processed Sxl RNAs were observed, consistent with the hypothesis that the regulated Sxl splices occur relatively slowly. PMID:1710769

  19. IMGT/HighV-QUEST Statistical Significance of IMGT Clonotype (AA) Diversity per Gene for Standardized Comparisons of Next Generation Sequencing Immunoprofiles of Immunoglobulins and T Cell Receptors.

    PubMed

    Aouinti, Safa; Malouche, Dhafer; Giudicelli, Véronique; Kossida, Sofia; Lefranc, Marie-Paule

    2015-01-01

    The adaptive immune responses of humans and of other jawed vertebrate species (Gnathostomata) are characterized by the B and T cells and their specific antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (up to 2 × 10^12 different IG and TR per individual). IMGT, the international ImMunoGeneTics information system (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc (Montpellier University and CNRS) to manage the huge and complex diversity of these antigen receptors. IMGT, built on IMGT-ONTOLOGY concepts of identification (keywords), description (labels), classification (gene and allele nomenclature) and numerotation (IMGT unique numbering), is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. IMGT/HighV-QUEST, the first web portal, and so far the only one, for the next generation sequencing (NGS) analysis of IG and TR, is the paradigm for immune repertoire standardized outputs and immunoprofiles of the adaptive immune responses. It provides the identification of the variable (V), diversity (D) and joining (J) genes and alleles, analysis of the V-(D)-J junction and complementarity determining region 3 (CDR3) and the characterization of the 'IMGT clonotype (AA)' (AA for amino acid) diversity and expression. IMGT/HighV-QUEST compares outputs of different batches, up to one million nucleotide sequences for the statistical module. These high throughput IG and TR repertoire immunoprofiles are of prime importance in vaccination, cancer, infectious diseases, autoimmunity and lymphoproliferative disorders; however, their comparative statistical analysis still remains a challenge. We present a standardized statistical procedure to analyze IMGT/HighV-QUEST outputs for the evaluation of the significance of the IMGT clonotype (AA) diversity differences in proportions, per gene of a given group, between NGS IG and TR repertoire immunoprofiles. The procedure is generic and

  20. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits

    PubMed Central

    Bakshi, Andrew; Zhu, Zhihong; Vinkhuyzen, Anna A. E.; Hill, W. David; McRae, Allan F.; Visscher, Peter M.; Yang, Jian

    2016-01-01

    We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064–339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^-8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species. PMID:27604177

  1. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits.

    PubMed

    Bakshi, Andrew; Zhu, Zhihong; Vinkhuyzen, Anna A E; Hill, W David; McRae, Allan F; Visscher, Peter M; Yang, Jian

    2016-01-01

    We propose a method (fastBAT) that performs a fast set-based association analysis for human complex traits using summary-level data from genome-wide association studies (GWAS) and linkage disequilibrium (LD) data from a reference sample with individual-level genotypes. We demonstrate using simulations and analyses of real datasets that fastBAT is more accurate and orders of magnitude faster than the prevailing methods. Using fastBAT, we analyze summary data from the latest meta-analyses of GWAS on 150,064-339,224 individuals for height, body mass index (BMI), and schizophrenia. We identify 6 novel gene loci for height, 2 for BMI, and 3 for schizophrenia at P_fastBAT < 5 × 10^-8. The gain of power is due to multiple small independent association signals at these loci (e.g. the THRB and FOXP1 loci for schizophrenia). The method is general and can be applied to GWAS data for all complex traits and diseases in humans and to such data in other species. PMID:27604177

  2. Jarid2 is among a set of genes differentially regulated by Nkx2.5 during outflow tract morphogenesis.

    PubMed

    Barth, Jeremy L; Clark, Christopher D; Fresco, Victor M; Knoll, Ellen P; Lee, Benjamin; Argraves, W Scott; Lee, Kyu-Ho

    2010-07-01

    Nkx2.5, a transcription factor implicated in human congenital heart disease, is required for regulation of second heart field (SHF) progenitors contributing to outflow tract (OFT). Here, we define a set of genes (Lrrn1, Elovl2, Safb, Slc39a6, Khdrbs1, Hoxb4, Fez1, Ccdc117, Jarid2, Nrcam, and Enpp3) expressed in SHF containing pharyngeal arch tissue whose regulation is dependent on Nkx2.5. Further investigation shows that Jarid2, which has been implicated in OFT morphogenesis, is a direct target of Nkx2.5 regulation. Jarid2 expression was up-regulated in SHF mesoderm of Nkx2.5-deficient embryos. Chromatin immunoprecipitation analysis showed Nkx2.5 interaction with consensus binding sites in the Jarid2 promoter in pharyngeal arch cells. Finally, Jarid2 promoter activity and mRNA expression levels were down-regulated by Nkx2.5 overexpression. Given the role of Jarid2 as a regulator of early cardiac proliferation, these findings highlight Jarid2 as one of several potential mediators of the critical role played by Nkx2.5 during OFT morphogenesis.

  3. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  4. Fatigue-Related Gene Networks Identified in CD14+ Cells Isolated From HIV-Infected Patients—Part II: Statistical Analysis

    PubMed Central

    Voss, Joachim G.; Dobra, Adrian; Morse, Caryn; Kovacs, Joseph A.; Raju, Raghavan; Danner, Robert L.; Munson, Peter J.; Logan, Carolea; Rangel, Zoila; Adelsberger, Joseph W.; McLaughlin, Mary; Adams, Larry D.; Dalakas, Marinos C.

    2012-01-01

    Purpose: In limited samples of valuable biological tissues, univariate ranking methods of microarray analyses often fail to show significant differences among expression profiles. In order to allow for hypothesis generation, novel statistical modeling systems can be greatly beneficial. The authors applied new statistical approaches to solve the issue of limited experimental data to generate new hypotheses in CD14+ cells of patients with HIV-related fatigue (HRF) and healthy controls. Methodology: We compared gene expression profiles of CD14+ cells of nucleoside reverse transcriptase inhibitor (NRTI)-treated HIV patients with low versus high fatigue to healthy controls (n = 5 each). With novel Bayesian modeling procedures, the authors identified 32 genes predictive of low versus high fatigue and 33 genes predictive of healthy versus HIV infection. Sparse association and liquid association networks further elucidated the possible biological pathways in which these genes are involved. Relevance for nursing practice: Genetic networks developed in a comprehensive Bayesian framework from small sample sizes allow nursing researchers to design future research approaches to address such issues as HRF. Implication for practice: The findings from this pilot study may take us one step closer to the development of useful biomarker targets for fatigue status. Specific and reliable tests are needed to diagnose, monitor and treat fatigue and mitochondrial dysfunction. PMID:22084402

  5. From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations.

    PubMed

    Du, Pan; Feng, Gang; Flatow, Jared; Song, Jie; Holko, Michelle; Kibbe, Warren A; Lin, Simon M

    2009-06-15

    Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general purpose, OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that the DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite results in more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at http://www.projects.bioinformatics.northwestern.edu/fundo.
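
    The idea of comparing DO terms through their gene-to-DO mapping profiles can be illustrated with two simple set-based distances. The abstract does not give the formulas, so this sketch assumes a Jaccard distance for "overall" similarity and an overlap-coefficient distance for "subset" similarity; both choices, the function names, and the toy gene sets are assumptions rather than the DOLite definitions.

```python
def overall_distance(genes_a, genes_b):
    """Jaccard distance between two gene sets (assumed 'overall' metric)."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def subset_distance(genes_a, genes_b):
    """Overlap-coefficient distance (assumed 'subset' metric).

    It is 0 whenever the smaller gene set is fully contained in the larger one.
    """
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(genes_a & genes_b) / smaller


# Toy mapping profiles for two hypothetical disease terms.
term_broad = {"TP53", "BRCA1", "BRCA2", "ATM"}
term_narrow = {"BRCA1", "BRCA2"}
print(overall_distance(term_broad, term_narrow))  # 0.5
print(subset_distance(term_broad, term_narrow))   # 0.0
```

    A pair of terms that is far apart overall but close under the subset distance, as in the toy case, is the kind of parent/child redundancy that a clustering step could then merge.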

  6. Different Sets of Post-Embryonic Development Genes Are Conserved or Lost in Two Caryophyllales Species (Reaumuria soongorica and Agriophyllum squarrosum).

    PubMed

    Zhao, Pengshan; Zhang, Jiwei; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

    2016-01-01

    Reaumuria soongorica and sand rice (Agriophyllum squarrosum) belong to the clade of Caryophyllales and are widely distributed in the desert regions of north China. Both plants have evolved many specific traits and adaptation strategies to cope with recurring environmental threats. However, the genetic basis that underpins their unique traits and adaptation remains unknown. In this study, the transcriptome data of R. soongorica and sand rice were compared with three other species with previously sequenced genomes (Arabidopsis thaliana, Oryza sativa, and Beta vulgaris). Four different gene sets were identified, namely, the genes conserved in both species, those lost in both species, those conserved in R. soongorica only, and those conserved in sand rice only. Gene ontology showed that post-embryonic development genes (PEDGs) were enriched in all gene sets, and different sets of PEDGs were conserved or lost in both the R. soongorica and sand rice genomes. Expression profiles of Arabidopsis orthologs further provided some clues to the function of the species-specific conserved PEDGs. Such orthologs included LEAFY PETIOLE, which could be a candidate gene involved in the development of branch priority in sand rice. PMID:26815143

  7. Different Sets of Post-Embryonic Development Genes Are Conserved or Lost in Two Caryophyllales Species (Reaumuria soongorica and Agriophyllum squarrosum)

    PubMed Central

    Zhao, Pengshan; Zhang, Jiwei; Zhao, Xin; Chen, Guoxiong; Ma, Xiao-Fei

    2016-01-01

    Reaumuria soongorica and sand rice (Agriophyllum squarrosum) belong to the clade of Caryophyllales and are widely distributed in the desert regions of north China. Both plants have evolved many specific traits and adaptation strategies to cope with recurring environmental threats. However, the genetic basis that underpins their unique traits and adaptation remains unknown. In this study, the transcriptome data of R. soongorica and sand rice were compared with three other species with previously sequenced genomes (Arabidopsis thaliana, Oryza sativa, and Beta vulgaris). Four different gene sets were identified, namely, the genes conserved in both species, those lost in both species, those conserved in R. soongorica only, and those conserved in sand rice only. Gene ontology showed that post-embryonic development genes (PEDGs) were enriched in all gene sets, and different sets of PEDGs were conserved or lost in both the R. soongorica and sand rice genomes. Expression profiles of Arabidopsis orthologs further provided some clues to the function of the species-specific conserved PEDGs. Such orthologs included LEAFY PETIOLE, which could be a candidate gene involved in the development of branch priority in sand rice. PMID:26815143

  8. Finding dominant sets in microarray data.

    PubMed

    Fu, Xuping; Teng, Li; Li, Yao; Chen, Wenbin; Mao, Yumin; Shen, I-Fan; Xie, Yi

    2005-01-01

    Clustering allows us to extract groups of genes that are tightly coexpressed from microarray data. In this paper, a new method, DSF_Clust, is developed to find dominant sets (clusters). We applied DSF_Clust to several gene expression datasets and evaluated the results against several criteria. The results showed that this approach could cluster dominant sets of good quality compared to the k-means method. DSF_Clust deals with three issues that have bedeviled clustering: some dominant sets are statistically determined at a significance level, no predefined cluster structure is required, and the quality of a dominant set is ensured. We also applied this approach to published yeast cell cycle gene expression data and found some biologically meaningful gene groups. Furthermore, DSF_Clust is a potentially good tool to search for putative regulatory signals.

  9. Isolated pentagon rule violating endohedral metallofullerenes explained using the Hückel rule: a statistical mechanical study of the C84 Isomeric Set.

    PubMed

    Fuhrer, Timothy J; Lambert, Angel M

    2015-01-30

    Fullerenes and their structure and stability have been a major topic of discussion and research since their discovery nearly 30 years ago. The isolated pentagon rule (IPR) has long served as a guideline for predicting the most stable fullerene cages. More recently, endohedral metallofullerenes have been discovered that violate the IPR. This article presents a systematic, temperature dependent, statistical thermodynamic study of the 24 possible IPR isomers of C84 as well as two of the experimentally known non-IPR isomers (51365 and 51383), at several different charges (0, -2, -4, and -6). From the results of this study, we conclude that the Hückel rule is a valid simpler explanation for the stability of fused pentagons in endohedral metallofullerenes.

  10. t-Test at the Probe Level: An Alternative Method to Identify Statistically Significant Genes for Microarray Data

    PubMed Central

    Boareto, Marcelo; Caticha, Nestor

    2014-01-01

    Microarray data analysis typically consists in identifying a list of differentially expressed genes (DEG), i.e., the genes that are differentially expressed between two experimental conditions. Variance shrinkage methods have been considered a better choice than the standard t-test for selecting the DEG because they correct the dependence of the error with the expression level. This dependence is mainly caused by errors in background correction, which more severely affects genes with low expression values. Here, we propose a new method for identifying the DEG that overcomes this issue and does not require background correction or variance shrinkage. Unlike current methods, our methodology is easy to understand and implement. It consists of applying the standard t-test directly on the normalized intensity data, which is possible because the probe intensity is proportional to the gene expression level and because the t-test is scale- and location-invariant. This methodology considerably improves the sensitivity and robustness of the list of DEG when compared with the t-test applied to preprocessed data and to the most widely used shrinkage methods, Significance Analysis of Microarrays (SAM) and Linear Models for Microarray Data (LIMMA). Our approach is useful especially when the genes of interest have small differences in expression and therefore get ignored by standard variance shrinkage methods.
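
    The invariance property the argument relies on can be checked numerically. The sketch below uses Welch's unequal-variance t statistic as an illustrative choice (the abstract only says "standard t-test") and made-up intensity values; it shows that multiplying both groups by the same positive factor and adding the same offset leaves the statistic unchanged.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Two-sample t statistic with unequal variances (Welch)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def affine(values, scale, shift):
    """Apply the same scale (> 0) and shift to every value."""
    return [scale * v + shift for v in values]


# Made-up normalized probe intensities for two conditions.
condition_a = [7.1, 7.4, 6.9, 7.3]
condition_b = [6.2, 6.5, 6.1, 6.4]

t_raw = welch_t(condition_a, condition_b)
t_transformed = welch_t(affine(condition_a, 2.5, -3.0), affine(condition_b, 2.5, -3.0))
print(math.isclose(t_raw, t_transformed, rel_tol=1e-9))  # True
```

    The scale factor must be positive and identical for both groups; a negative factor would flip the sign of the statistic, and group-specific factors would break the invariance.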

  11. Choice of a stable set of reference genes for qRT-PCR analysis in Amblyomma maculatum (Acari: Ixodidae).

    PubMed

    Browning, Rebecca; Adamson, Steven; Karim, Shahid

    2012-11-01

    Quantitative real-time reverse transcriptase polymerase chain reaction (qRT-PCR) is a widely used laboratory tool to quantify mRNA levels of target genes involved in various biological processes. The most commonly used method for analyzing qRT-PCR data is the normalizing technique, in which a housekeeping gene is used to determine the transcriptional regulation of the target gene. The choice of a reliable internal standard is pivotal for relative gene expression analysis to obtain reproducible results, especially when measuring small differences in transcriptional expression. In this study, we used the geNorm, NormFinder, and BestKeeper programs to analyze the gene expression results obtained by qRT-PCR. Five candidate reference genes, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), beta-actin, alpha-tubulin, elongation factor 1-alpha, and glutathione s-transferase, were used to evaluate the expression stability during prolonged blood-feeding on the vertebrate host. These five genes were evaluated in all life stages of Amblyomma maculatum (Koch) as well as in the salivary gland and midgut tissues of adult females to determine which is the most stably expressed gene for use in qRT-PCR studies. Beta-actin is the most stably expressed gene in salivary glands and midguts of A. maculatum, and throughout all developmental stages both beta-actin and GAPDH were found to have the most stable expression with the lowest degree of variance. We recommend the use of beta-actin and/or GAPDH as reference genes for qRT-PCR analysis of gene expression in A. maculatum.
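
    To make the notion of expression stability concrete, the sketch below computes a geNorm-style stability measure M: for each candidate gene, the average over all other candidates of the standard deviation of the log2 expression ratios across samples (lower M means more stable). It is a simplified illustration, not the geNorm program; the function name and the relative quantities are made up.

```python
import math
from statistics import stdev


def stability_m(expression):
    """geNorm-style M value per gene.

    expression maps gene name -> list of relative quantities, one per sample,
    with the samples in the same order for every gene.
    """
    m_values = {}
    for gene_j, values_j in expression.items():
        pairwise_variation = []
        for gene_k, values_k in expression.items():
            if gene_k == gene_j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(values_j, values_k)]
            pairwise_variation.append(stdev(log_ratios))
        m_values[gene_j] = sum(pairwise_variation) / len(pairwise_variation)
    return m_values


# Toy data: the first two genes vary in parallel, the third does not.
toy_expression = {
    "beta_actin": [1.0, 2.0, 4.0, 8.0],
    "gapdh": [2.0, 4.0, 8.0, 16.0],
    "gst": [1.0, 8.0, 2.0, 16.0],
}
for gene, m in sorted(stability_m(toy_expression).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

    In the toy data the two co-varying genes share the lowest M, which mirrors the logic of recommending the pair of genes whose ratio stays constant across tissues and life stages.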

  12. Time series analysis of benzo[A]pyrene-induced transcriptome changes suggests that a network of transcription factors regulates the effects on functional gene sets.

    PubMed

    van Delft, Joost H M; Mathijs, Karen; Staal, Yvonne C M; van Herwijnen, Marcel H M; Brauers, Karen J J; Boorsma, André; Kleinjans, Jos C S

    2010-10-01

    Chemical carcinogens may cause a multitude of effects inside cells, thereby affecting transcript levels of genes by direct activation of transcription factors (TF) or indirectly through the formation of DNA damage. As the temporal profiles of these responses may be profoundly different, examining time-dependent changes may provide new insights into TF networks related to cellular responses to chemical carcinogens. Therefore, we investigated in human hepatoma cells gene expression changes caused by benzo[a]pyrene (BaP) at 12 time points after exposure, in relation to DNA adduct and cell cycle. Temporal profiles for functional gene sets demonstrate both early and late effects in up- and downregulation of relevant gene sets involved in cell cycle, apoptosis, DNA repair, and metabolism of amino acids and lipids. Many significant transcription regulation networks appeared to be centered on TF that are proto-oncogenes or tumor suppressor genes. The time series analysis tool Short Time-series Expression Miner (STEM) was used to identify time-dependent correlation of pathways, gene sets, TF networks, and biological parameters. Most correlations are with DNA adduct levels, which is an early response, and fewer with the later responses on G1 and S phase cells. The majority of the modulated genes in the Reactome pathways can be regulated by several of these TF, e.g., 73% by nuclear factor-kappa B and 34-42% by c-MYC, SRF, AP1, and E2F1. All these TF can also regulate one or more of the others. Our data indicate that a complex network of a few TF is responsible for the majority of the transcriptional changes induced by BaP. This network hardly changes over time, even though the transcriptional profiles clearly change, suggesting that other regulatory mechanisms are also involved.

  13. Differential expression of sets of highly homologous variable region gene products in selected and preimmune repertoires of inbred mouse strains

    PubMed Central

    1986-01-01

    Using mAb that selectively recognize the various allelic forms of the VHT15 and Vk21D-E genes' products, we analyzed the influence of VH and Vk polymorphism on the probability of expression of these gene segments. Our data show that the frequency with which the VHT15 gene product becomes available in the preimmune repertoire is strongly influenced by the polymorphism of the relevant structural gene, suggesting therefore that VH genes cannot be randomly used in the various strains. In contrast, the frequency of Vk21D-E+ clones is similar in all mouse strains tested, and in all cases is higher than the frequency of VHT15 clones. This observation strongly suggests that Vk genes can be randomly expressed, and/or that their number is lower than that of their VH counterpart. Finally, analysis of the specificity associated with the expression of the VHT15 segment revealed that VH polymorphism strongly influences not only the probability of expression of each V gene, but also the specificity of the antibodies on which these VH genes are used. PMID:3084699

  14. Design and Experimental Application of a Novel Non-Degenerate Universal Primer Set that Amplifies Prokaryotic 16S rRNA Genes with a Low Possibility to Amplify Eukaryotic rRNA Genes

    PubMed Central

    Mori, Hiroshi; Maruyama, Fumito; Kato, Hiromi; Toyoda, Atsushi; Dozono, Ayumi; Ohtsubo, Yoshiyuki; Nagata, Yuji; Fujiyama, Asao; Tsuda, Masataka; Kurokawa, Ken

    2014-01-01

    The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing the characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, leading to a decrease in the efficiency of sequencing of prokaryotic 16S rRNA genes with possible mischaracterization of the diversity in the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used for the amplification of prokaryotic 16S rRNA genes. The 50 identified candidates were investigated to calculate their coverage for prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and a novel universal primer set, 342F-806R, covering many prokaryotic, but not eukaryotic, rRNA genes was identified. This primer set was validated by the amplification of 16S rRNA genes from a soil metagenomic sample and subsequent pyrosequencing using the Roche 454 platform. The same sample was also used for pyrosequencing of the amplicons by employing a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing using the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and is highly expected to be applicable to the analysis of a wide variety of microbial communities. PMID:24277737

  15. Design and experimental application of a novel non-degenerate universal primer set that amplifies prokaryotic 16S rRNA genes with a low possibility to amplify eukaryotic rRNA genes.

    PubMed

    Mori, Hiroshi; Maruyama, Fumito; Kato, Hiromi; Toyoda, Atsushi; Dozono, Ayumi; Ohtsubo, Yoshiyuki; Nagata, Yuji; Fujiyama, Asao; Tsuda, Masataka; Kurokawa, Ken

    2014-01-01

    The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing the characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, leading to a decrease in the efficiency of sequencing of prokaryotic 16S rRNA genes with possible mischaracterization of the diversity in the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used for the amplification of prokaryotic 16S rRNA genes. The 50 identified candidates were investigated to calculate their coverage for prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and a novel universal primer set, 342F-806R, covering many prokaryotic, but not eukaryotic, rRNA genes was identified. This primer set was validated by the amplification of 16S rRNA genes from a soil metagenomic sample and subsequent pyrosequencing using the Roche 454 platform. The same sample was also used for pyrosequencing of the amplicons by employing a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing using the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and is highly expected to be applicable to the analysis of a wide variety of microbial communities.

  16. IMGT/HighV-QUEST Statistical Significance of IMGT Clonotype (AA) Diversity per Gene for Standardized Comparisons of Next Generation Sequencing Immunoprofiles of Immunoglobulins and T Cell Receptors

    PubMed Central

    Aouinti, Safa; Malouche, Dhafer; Giudicelli, Véronique; Kossida, Sofia; Lefranc, Marie-Paule

    2015-01-01

    The adaptive immune responses of humans and of other jawed vertebrate species (Gnathostomata) are characterized by the B and T cells and their specific antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (up to 2 × 10^12 different IG and TR per individual). IMGT, the international ImMunoGeneTics information system (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc (Montpellier University and CNRS) to manage the huge and complex diversity of these antigen receptors. IMGT, built on IMGT-ONTOLOGY concepts of identification (keywords), description (labels), classification (gene and allele nomenclature) and numerotation (IMGT unique numbering), is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. IMGT/HighV-QUEST, the first web portal, and so far the only one, for the next generation sequencing (NGS) analysis of IG and TR, is the paradigm for immune repertoire standardized outputs and immunoprofiles of the adaptive immune responses. It provides the identification of the variable (V), diversity (D) and joining (J) genes and alleles, analysis of the V-(D)-J junction and complementarity determining region 3 (CDR3) and the characterization of the ‘IMGT clonotype (AA)’ (AA for amino acid) diversity and expression. IMGT/HighV-QUEST compares outputs of different batches, up to one million nucleotide sequences for the statistical module. These high throughput IG and TR repertoire immunoprofiles are of prime importance in vaccination, cancer, infectious diseases, autoimmunity and lymphoproliferative disorders; however, their comparative statistical analysis still remains a challenge. We present a standardized statistical procedure to analyze IMGT/HighV-QUEST outputs for the evaluation of the significance of the IMGT clonotype (AA) diversity differences in proportions, per gene of a given group, between NGS IG and TR repertoire immunoprofiles. The procedure is generic and
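
    Evaluating a difference in proportions for one gene between two repertoires can be sketched with a pooled two-proportion z-test. The abstract is truncated before it describes the actual procedure, so the test choice, the function name, and the counts below are assumptions for illustration only; a per-gene analysis would also need a multiple-testing correction across the genes of a group.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if standard_error == 0.0:
        raise ValueError("proportions are both 0 or both 1; the test is undefined")
    z = (p_a - p_b) / standard_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts: clonotypes assigned to one gene out of all clonotypes of its group.
z, p = two_proportion_z_test(120, 1000, 80, 1000)
print(round(z, 2), p < 0.01)
```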

  17. A general and accurate approach for computing the statistical power of the transmission disequilibrium test for complex disease genes.

    PubMed

    Chen, W M; Deng, H W

    2001-07-01

    Transmission disequilibrium test (TDT) is a nuclear family-based analysis that can test linkage in the presence of association. It has gained extensive attention in theoretical investigation and in practical application; in both cases, the accuracy and generality of the power computation of the TDT are crucial. Despite extensive investigations, previous approaches for computing the statistical power of the TDT are neither accurate nor general. In this paper, we develop a general and highly accurate approach to analytically compute the power of the TDT. We compare the results from our approach with those from several other recent papers, all against the results obtained from computer simulations. We show that the results computed from our approach are more accurate than or at least the same as those from other approaches. More importantly, our approach can handle various situations, which include (1) families that consist of one or more children and that have any configuration of affected and nonaffected sibs; (2) families ascertained through the affection status of parent(s); (3) any mixed sample with different types of families in (1) and (2); (4) the marker locus is not a disease susceptibility locus; and (5) existence of allelic heterogeneity. We implement this approach in a user-friendly computer program: TDT Power Calculator. Its applications are demonstrated. The approach and the program developed here should be significant for theoreticians to accurately investigate the statistical power of the TDT in various situations, and for empirical geneticists to plan efficient studies using the TDT.
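
    The simulation baseline against which analytical power calculations are compared can be sketched as below. The sketch uses the standard TDT statistic (b - c)^2 / (b + c), where b and c count transmissions and non-transmissions of an allele from heterozygous parents, and it estimates power by Monte Carlo. It deliberately ignores family structure and ascertainment, which are the complications the paper's approach addresses; the fixed transmission probability, the function names, and the parameter values are assumptions of this sketch.

```python
import random

# Upper 5% point of the chi-squared distribution with 1 degree of freedom.
CHI2_1DF_CRITICAL_5PCT = 3.841


def tdt_statistic(transmitted, untransmitted):
    """TDT chi-squared statistic from transmission counts."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total


def simulated_tdt_power(n_het_parents, transmission_prob, n_sim=2000, seed=1):
    """Fraction of simulated studies in which the TDT rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        b = sum(rng.random() < transmission_prob for _ in range(n_het_parents))
        c = n_het_parents - b
        if tdt_statistic(b, c) > CHI2_1DF_CRITICAL_5PCT:
            rejections += 1
    return rejections / n_sim


print(tdt_statistic(60, 40))  # 4.0
print(simulated_tdt_power(200, 0.5))   # close to the nominal 5% level
print(simulated_tdt_power(200, 0.65))  # high power under strong distortion
```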

  18. Speeding up directed evolution: Combining the advantages of solid-phase combinatorial gene synthesis with statistically guided reduction of screening effort.

    PubMed

    Hoebenreich, Sabrina; Zilly, Felipe E; Acevedo-Rocha, Carlos G; Zilly, Matías; Reetz, Manfred T

    2015-03-20

    Efficient and economic methods in directed evolution at the protein, metabolic, and genome level are needed for biocatalyst development and the success of synthetic biology. In contrast to random strategies, semirational approaches such as saturation mutagenesis explore the sequence space in a focused manner. Although several combinatorial libraries based on saturation mutagenesis have been reported using solid-phase gene synthesis, direct comparison with traditional PCR-based methods is currently lacking. In this work, we compare combinatorial protein libraries created in-house via PCR versus those generated by commercial solid-phase gene synthesis. Using descriptive statistics and probabilistic distributions on amino acid occurrence frequencies, the quality of the libraries was assessed and compared, revealing that the outsourced libraries are characterized by less bias and fewer outliers than the PCR-based ones. Afterward, we screened all libraries following a traditional algorithm for almost complete library coverage and compared this approach with an emergent statistical concept suggesting screening a lower portion of the protein sequence space. Upon analyzing the biocatalytic landscapes and best hits of all combinatorial libraries, we show that the screening effort could have been reduced in all cases by more than 50%, while still finding at least one of the best mutants. PMID:24921161
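
    The screening effort implied by near-complete library coverage can be estimated with the commonly used oversampling relation T = -V ln(1 - F), where V is the number of distinct variants and F the desired coverage. The abstract does not spell out which algorithm was used, so this formula, its assumption that all variants are equally likely, and the function names are illustrative assumptions.

```python
import math


def transformants_for_coverage(library_size, coverage=0.95):
    """Clones to screen so that a fraction `coverage` of variants is expected to be seen."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return math.ceil(-library_size * math.log(1.0 - coverage))


def expected_coverage(library_size, screened):
    """Expected fraction of distinct variants sampled after `screened` clones."""
    return 1.0 - math.exp(-screened / library_size)


# One NNK-randomized position encodes 32 codons; two positions encode 32 * 32.
print(transformants_for_coverage(32))       # 96
print(transformants_for_coverage(32 * 32))  # 3068
print(round(expected_coverage(32, 48), 2))  # coverage after screening half as many clones
```

    Under this model, screening half of the 95%-coverage effort still samples roughly three quarters of the variants, which is one way to read the reported finding that the effort could have been cut by more than 50%.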

  19. DSIR: assessing the design of highly potent siRNA by testing a set of cancer-relevant target genes.

    PubMed

    Filhol, Odile; Ciais, Delphine; Lajaunie, Christian; Charbonnier, Peggy; Foveau, Nicolas; Vert, Jean-Philippe; Vandenbrouck, Yves

    2012-01-01

    Chemically synthesized small interfering RNA (siRNA) is a widespread molecular tool used to knock down genes in mammalian cells. However, designing potent siRNA remains challenging. Among tools predicting siRNA efficacy, very few have been validated on endogenous targets in realistic experimental conditions. We previously described a tool to assist efficient siRNA design (DSIR, Designer of siRNA), which focuses on intrinsic features of the siRNA sequence. Here, we evaluated DSIR's performance by systematically investigating the potency of the siRNA it designs to target ten cancer-related genes. mRNA knockdown was measured by quantitative RT-PCR in cell-based assays, revealing that over 60% of siRNA sequences designed by DSIR silenced their target genes by at least 70%. Silencing efficacy was sustained even when low siRNA concentrations were used. This systematic analysis revealed in particular that, for a subset of genes, the efficiency of siRNA constructs significantly increases when the sequence is located closer to the 5'-end of the target gene coding sequence, suggesting the distance to the 5'-end as a new feature for siRNA potency prediction. A new version of DSIR incorporating these new findings, as well as the list of validated siRNA against the tested cancer genes, has been made available on the web (http://biodev.extra.cea.fr/DSIR).

  20. Heat Stress and Lipopolysaccharide Stimulation of Chicken Macrophage-Like Cell Line Activates Expression of Distinct Sets of Genes

    PubMed Central

    Slawinska, Anna; Hsieh, John C.; Schmidt, Carl J.; Lamont, Susan J.

    2016-01-01

    Acute heat stress requires immediate adjustment of the stressed individual to sudden changes of ambient temperatures. Chickens are particularly sensitive to heat stress due to development of insufficient physiological mechanisms to mitigate its effects. One of the symptoms of heat stress is endotoxemia that results from release of the lipopolysaccharide (LPS) from the guts. Heat-related cytotoxicity is mitigated by the innate immune system, which is comprised mostly of phagocytic cells such as monocytes and macrophages. The objective of this study was to analyze the molecular responses of the chicken macrophage-like HD11 cell line to combined heat stress and lipopolysaccharide treatment in vitro. The cells were heat-stressed and then allowed a temperature-recovery period, during which the gene expression was investigated. LPS was added to the cells to mimic the heat-stress-related endotoxemia. Semi high-throughput gene expression analysis was used to study a gene panel comprised of heat shock proteins, stress-related genes, signaling molecules and immune response genes. HD11 cell line responded to heat stress with increased mRNA abundance of the HSP25, HSPA2 and HSPH1 chaperones as well as DNAJA4 and DNAJB6 co-chaperones. The anti-apoptotic gene BAG3 was also highly up-regulated, providing evidence that the cells expressed pro-survival processes. The immune response of the HD11 cell line to LPS in the heat stress environment (up-regulation of CCL4, CCL5, IL1B, IL8 and iNOS) was higher than in thermoneutral conditions. However, the peak in the transcriptional regulation of the immune genes was after two hours of temperature-recovery. Therefore, we propose the potential influence of the extracellular heat shock proteins not only in mitigating effects of abiotic stress but also in triggering the higher level of the immune responses. Finally, use of correlation networks for the data analysis aided in discovering subtle differences in the gene expression (i.e. the role

  1. Enzymes Catalyzing the Early Steps of Clavulanic Acid Biosynthesis Are Encoded by Two Sets of Paralogous Genes in Streptomyces clavuligerus

    PubMed Central

    Jensen, Susan E.; Elder, Kenneth J.; Aidoo, Kwamena A.; Paradkar, Ashish S.

    2000-01-01

    Genes encoding the proteins required for clavulanic acid biosynthesis and for cephamycin biosynthesis are grouped into a “supercluster” in Streptomyces clavuligerus. Nine open reading frames (ORFs) associated with clavulanic acid biosynthesis were located in a 15-kb segment of the supercluster, including six ORFs encoding known biosynthetic enzymes or regulatory proteins, two ORFs that have been reported previously but whose involvement in clavulanic acid biosynthesis is unclear, and one ORF not previously reported. Evidence for the involvement of these ORFs in clavulanic acid production was obtained by generating mutants and showing that all were defective for clavulanic acid production when grown on starch asparagine medium. However, when five of the nine mutants, including mutants defective in known clavulanic acid biosynthetic enzymes, were grown in a soy-based medium, clavulanic acid-producing ability was restored. This ability to produce clavulanic acid when seemingly essential biosynthetic enzymes have been mutated suggests that paralogous genes encoding functionally equivalent proteins exist for each of the five genes but that these paralogues are expressed only in the soy-based medium. The five genes that have paralogues encode proteins involved in the early steps of the pathway common to the biosynthesis of both clavulanic acid and the other clavam metabolites produced by this organism. No evidence was seen for paralogues of the four remaining genes involved in late, clavulanic acid-specific steps in the pathway. PMID:10681345

  2. XRCC5 as a risk gene for alcohol dependence: evidence from a genome-wide gene-set-based analysis and follow-up studies in Drosophila and humans.

    PubMed

    Juraeva, Dilafruz; Treutlein, Jens; Scholz, Henrike; Frank, Josef; Degenhardt, Franziska; Cichon, Sven; Ridinger, Monika; Mattheisen, Manuel; Witt, Stephanie H; Lang, Maren; Sommer, Wolfgang H; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Jünger, Elisabeth; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Smolka, Michael N; Zimmermann, Ulrich S; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Spanagel, Rainer; Brors, Benedikt; Rietschel, Marcella

    2015-01-01

    Genetic factors have as large a role as environmental factors in the etiology of alcohol dependence (AD). Although genome-wide association studies (GWAS) enable systematic searches for loci not hitherto implicated in the etiology of AD, many true findings may be missed owing to correction for multiple testing. The aim of the present study was to circumvent this limitation by searching for biological system-level differences, and then following up these findings in humans and animals. Gene-set-based analysis of GWAS data from 1333 cases and 2168 controls identified 19 significantly associated gene-sets, of which 5 could be replicated in an independent sample. Clustered in these gene-sets were novel and previously identified susceptibility genes. The most frequently present gene, i.e., in 6 out of 19 gene-sets, was X-ray repair complementing defective repair in Chinese hamster cells 5 (XRCC5). Previous human and animal studies have implicated XRCC5 in alcohol sensitivity. This phenotype is inversely correlated with the development of AD, presumably as more alcohol is required to achieve the desired effects. In the present study, the functional role of XRCC5 in AD was further validated in animals and humans. Drosophila mutants with reduced function of Ku80 (the homolog of mammalian XRCC5) due to RNAi silencing showed reduced sensitivity to ethanol. In humans with free access to intravenous ethanol self-administration in the laboratory, the maximum achieved blood alcohol concentration was influenced in an allele-dose-dependent manner by genetic variation in XRCC5. In conclusion, our convergent approach identified new candidates and generated independent evidence for the involvement of XRCC5 in alcohol dependence. PMID:25035082

  3. XRCC5 as a Risk Gene for Alcohol Dependence: Evidence from a Genome-Wide Gene-Set-Based Analysis and Follow-up Studies in Drosophila and Humans

    PubMed Central

    Juraeva, Dilafruz; Treutlein, Jens; Scholz, Henrike; Frank, Josef; Degenhardt, Franziska; Cichon, Sven; Ridinger, Monika; Mattheisen, Manuel; Witt, Stephanie H; Lang, Maren; Sommer, Wolfgang H; Hoffmann, Per; Herms, Stefan; Wodarz, Norbert; Soyka, Michael; Zill, Peter; Maier, Wolfgang; Jünger, Elisabeth; Gaebel, Wolfgang; Dahmen, Norbert; Scherbaum, Norbert; Schmäl, Christine; Steffens, Michael; Lucae, Susanne; Ising, Marcus; Smolka, Michael N; Zimmermann, Ulrich S; Müller-Myhsok, Bertram; Nöthen, Markus M; Mann, Karl; Kiefer, Falk; Spanagel, Rainer; Brors, Benedikt; Rietschel, Marcella

    2015-01-01

    Genetic factors have as large a role as environmental factors in the etiology of alcohol dependence (AD). Although genome-wide association studies (GWAS) enable systematic searches for loci not hitherto implicated in the etiology of AD, many true findings may be missed owing to correction for multiple testing. The aim of the present study was to circumvent this limitation by searching for biological system-level differences, and then following up these findings in humans and animals. Gene-set-based analysis of GWAS data from 1333 cases and 2168 controls identified 19 significantly associated gene-sets, of which 5 could be replicated in an independent sample. Clustered in these gene-sets were novel and previously identified susceptibility genes. The most frequently present gene, i.e., in 6 out of 19 gene-sets, was X-ray repair complementing defective repair in Chinese hamster cells 5 (XRCC5). Previous human and animal studies have implicated XRCC5 in alcohol sensitivity. This phenotype is inversely correlated with the development of AD, presumably as more alcohol is required to achieve the desired effects. In the present study, the functional role of XRCC5 in AD was further validated in animals and humans. Drosophila mutants with reduced function of Ku80 (the homolog of mammalian XRCC5) due to RNAi silencing showed reduced sensitivity to ethanol. In humans with free access to intravenous ethanol self-administration in the laboratory, the maximum achieved blood alcohol concentration was influenced in an allele-dose-dependent manner by genetic variation in XRCC5. In conclusion, our convergent approach identified new candidates and generated independent evidence for the involvement of XRCC5 in alcohol dependence. PMID:25035082

  4. Niemeyer Virus: A New Mimivirus Group A Isolate Harboring a Set of Duplicated Aminoacyl-tRNA Synthetase Genes

    PubMed Central

    Boratto, Paulo V. M.; Arantes, Thalita S.; Silva, Lorena C. F.; Assis, Felipe L.; Kroon, Erna G.; La Scola, Bernard; Abrahão, Jônatas S.

    2015-01-01

    It is well recognized that gene duplication/acquisition is a key factor for molecular evolution, being directly related to the emergence of new genetic variants. The importance of such phenomena can also be expanded to the viral world, with impacts on viral fitness and environmental adaptations. In this work we describe the isolation and characterization of Niemeyer virus, a new mimivirus isolate obtained from water samples of an urban lake in Brazil. Genomic data showed that Niemeyer harbors duplicated copies of three of its four aminoacyl-tRNA synthetase genes (cysteinyl, methionyl, and tyrosyl RS). Gene expression analysis showed that such duplications allowed significantly increased expression of methionyl and tyrosyl aaRS mRNA by Niemeyer in comparison to APMV. Remarkably, phylogenetic data revealed that Niemeyer duplicated gene pairs are different, each one clustering with a different group of mimivirus strains. Taken together, our results raise new questions about the origins and selective pressures involving events of aaRS gain and loss among mimiviruses. PMID:26635738

  5. Polymorphisms in sodium-dependent vitamin C transporter genes and plasma, aqueous humor and lens nucleus ascorbate concentrations in an ascorbate depleted setting.

    PubMed

    Senthilkumari, Srinivasan; Talwar, Badri; Dharmalingam, Kuppamuthu; Ravindran, Ravilla D; Jayanthi, Ramamurthy; Sundaresan, Periasamy; Saravanan, Charu; Young, Ian S; Dangour, Alan D; Fletcher, Astrid E

    2014-07-01

    We have previously reported low concentrations of plasma ascorbate and low dietary vitamin C intake in the older Indian population and a strong inverse association of these with cataract. Little is known about ascorbate levels in aqueous humor and lens in populations habitually depleted of ascorbate, and no studies in any setting have investigated whether genetic polymorphisms influence ascorbate levels in ocular tissues. Our objectives were to investigate relationships between ascorbate concentrations in plasma, aqueous humor and lens and whether these relationships are influenced by single nucleotide polymorphisms (SNPs) in sodium-dependent vitamin C transporter genes (SLC23A1 and SLC23A2). We enrolled sixty patients (equal numbers of men and women, mean age 63 years) undergoing small incision cataract surgery in southern India. We measured ascorbate concentrations in plasma, aqueous humor and lens nucleus using high performance liquid chromatography. SLC23A1 SNPs (rs4257763, rs6596473) and SLC23A2 SNPs (rs1279683 and rs12479919) were genotyped using a TaqMan assay. Patients were interviewed for lifestyle factors which might influence ascorbate. Plasma vitamin C was normalized by a log10 transformation. Statistical analysis used linear regression with the slope of the within-subject associations estimated using beta (β) coefficients. The ascorbate concentrations (μmol/L) were: plasma ascorbate, median and inter-quartile range (IQR), 15.2 (7.8, 34.5), mean (SD) of aqueous humor ascorbate, 1074 (545) and lens nucleus ascorbate, 0.42 (0.16) (μmol/g lens nucleus wet weight). Minor allele frequencies were: rs1279683 (0.28), rs12479919 (0.30), rs659647 (0.48). Decreasing concentrations of ocular ascorbate from the common to the rare genotype were observed for rs6596473 and rs12479919. The per allele difference in aqueous humor ascorbate for rs6596473 was -217 μmol/L, p < 0.04 and a per allele difference in lens nucleus ascorbate of -0.085 μmol/g, p < 0

  6. In silico analyses identify gene-sets, associated with clinical outcome in ovarian cancer: role of mitotic kinases

    PubMed Central

    Ocaña, Alberto; Pérez-Peña, Javier; Alcaraz-Sanabria, Ana; Sánchez-Corrales, Verónica; Nieto-Jiménez, Cristina; Templeton, Arnoud J.; Seruga, Bostjan; Pandiella, Atanasio; Amir, Eitan

    2016-01-01

    Introduction: Accurate assessment of prognosis in early stage ovarian cancer is challenging, resulting in suboptimal selection of patients for adjuvant therapy. The identification of predictive markers for cytotoxic chemotherapy is therefore highly desirable. Protein kinases are important components in oncogenic transformation and those relating to cell cycle and mitosis control may allow for identification of high-risk early stage ovarian tumors. Methods: Genes with differential expression in ovarian surface epithelia (OSE) and ovarian cancer epithelial cells (CEPIs) were identified from public datasets and analyzed with dChip software. Progression-free (PFS) and overall survival (OS) associated with these genes in stage I/II and late stage ovarian cancer was explored using the Kaplan Meier Plotter online tool. Results: Of 2925 transcripts associated with modified expression in CEPIs compared to OSE, 66 genes coded for upregulated protein kinases. Expression of 9 of these genes (CDC28, CHK1, NIMA, Aurora kinase A, Aurora kinase B, BUB1, BUB1B, CDKN2A and TTK) was associated with worse PFS (HR:3.40, log rank p<0.001). The combined analyses of CHK1, CDKN2A, AURKA, AURKB, TTK and NEK2 showed the highest magnitude of association with PFS (HR:4.62, log rank p<0.001). Expression of AURKB predicted detrimental OS in stage I/II ovarian cancer better than all other combinations. Conclusion: Genes linked to cell cycle control are associated with worse outcome in early stage ovarian cancer. Incorporation of these biomarkers in clinical studies may help in the identification of patients at high risk of relapse for whom optimizing adjuvant therapeutic strategies is needed. PMID:26992217

  7. Statistical and Biological Gene-Lifestyle Interactions of MC4R and FTO with Diet and Physical Activity on Obesity: New Effects on Alcohol Consumption

    PubMed Central

    Covas, M. Isabel; Carrasco, Paula; Salas-Salvadó, Jordi; Martínez-González, Miguel Ángel; Arós, Fernando; Lapetra, José; Serra-Majem, Lluís; Lamuela-Raventos, Rosa; Gómez-Gracia, Enrique; Fiol, Miquel; Pintó, Xavier; Ros, Emilio; Martí, Amelia; Coltell, Oscar; Ordovás, Jose M.; Estruch, Ramon

    2012-01-01

    Background: Fat mass and obesity (FTO) and melanocortin-4 receptor (MC4R) are relevant genes associated with obesity. This could be through food intake, but results are contradictory. Modulation by diet or other lifestyle factors is also not well understood. Objective: To investigate whether MC4R and FTO associations with body-weight are modulated by diet and physical activity (PA), and to study their association with alcohol and food intake. Methods: Adherence to Mediterranean diet (AdMedDiet) and physical activity (PA) were assessed by validated questionnaires in 7,052 high cardiovascular risk subjects. MC4R rs17782313 and FTO rs9939609 were determined. Independent and joint associations (aggregate genetic score) as well as statistical and biological gene-lifestyle interactions were analyzed. Results: FTO rs9939609 was associated with higher body mass index (BMI), waist circumference (WC) and obesity (P<0.05 for all). A similar, but not significant, trend was found for MC4R rs17782313. Their additive effects (aggregate score) were significant and we observed a 7% per-allele increase in the odds of being obese (OR = 1.07; 95%CI 1.01–1.13). We found relevant statistical interactions (P<0.05) with PA. Thus, in active individuals, the associations with higher BMI, WC or obesity were not detected. A biological (non-statistical) interaction between AdMedDiet and rs9939609 and the aggregate score was found. Greater AdMedDiet in individuals carrying 4 or 3 risk alleles counterbalanced their genetic predisposition, and they exhibited a BMI similar (P = 0.502) to that of individuals with no risk alleles and lower AdMedDiet. They also had lower BMI (P = 0.021) than their counterparts with low AdMedDiet. We did not find any consistent association with energy or macronutrients, but found a novel association between these polymorphisms and lower alcohol consumption in variant-allele carriers (B ± SE: −0.57 ± 0.16 g/d per-score-allele; P = 0.001). Conclusion: Statistical and biological

  8. Cosmic statistics of statistics

    NASA Astrophysics Data System (ADS)

    Szapudi, István; Colombi, Stéphane; Bernardeau, Francis

    1999-12-01

    The errors on statistics measured in finite galaxy catalogues are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly non-linear to weakly non-linear scales. For non-linear functions of unbiased estimators, such as the cumulants, the phenomenon of cosmic bias is identified and computed. Since it is subdued by the cosmic errors in the range of applicability of the theory, correction for it is inconsequential. In addition, the method of Colombi, Szapudi & Szalay concerning sampling effects is generalized, adapting the theory for inhomogeneous galaxy catalogues. While previous work focused on the variance only, the present article calculates the cross-correlations between moments and connected moments as well for a statistically complete description. The final analytic formulae representing the full theory are explicit but somewhat complicated. Therefore we have made available a Fortran program capable of calculating the described quantities numerically (for further details e-mail SC at colombi@iap.fr). An important special case is the evaluation of the errors on the two-point correlation function, for which this should be more accurate than any method put forward previously. This tool will be immensely useful in the future for assessing the precision of measurements from existing catalogues, as well as aiding the design of new galaxy surveys. To illustrate the applicability of the results and to explore the numerical aspects of the theory qualitatively and quantitatively, the errors and cross-correlations are predicted under a wide range of assumptions for the future Sloan Digital Sky Survey. The principal results concerning the cumulants ξ, Q3 and Q4 is that

  9. Sequencing-based gene network analysis provides a core set of gene resource for understanding thermal adaptation in Zhikong scallop Chlamys farreri.

    PubMed

    Fu, X; Sun, Y; Wang, J; Xing, Q; Zou, J; Li, R; Wang, Z; Wang, S; Hu, X; Zhang, L; Bao, Z

    2014-01-01

    Marine organisms are commonly exposed to variable environmental conditions, and many of them are under threat from increased sea temperatures caused by global climate change. Generating transcriptomic resources under different stress conditions is crucial for understanding the molecular mechanisms underlying thermal adaptation. In this study, we conducted transcriptome-wide gene expression profiling of the scallop Chlamys farreri challenged by acute and chronic heat stress. Of the 13 953 unique tags, more than 850 were significantly differentially expressed at each time point after acute heat stress, which was more than the number of tags differentially expressed (320-350) under chronic heat stress. To obtain a systemic view of gene expression alterations during thermal stress, a weighted gene coexpression network was constructed. Six modules were identified as acute heat stress-responsive modules. Among them, four modules involved in apoptosis regulation, mRNA binding, mitochondrial envelope formation and oxidation reduction were downregulated. The remaining two modules were upregulated. One was enriched with chaperones and the other with microsatellite sequences, whose coexpression may originate from a transcription factor binding site. These results indicated that C. farreri triggered several cellular processes to acclimate to elevated temperature. No modules responded to chronic heat stress, suggesting that the scallops might have acclimated to elevated temperature within 3 days. This study represents the first sequencing-based gene network analysis in a nonmodel aquatic species and provides valuable gene resources for the study of thermal adaptation, which should assist in the development of heat-tolerant scallop lines for aquaculture.

  10. Saccharomyces cerevisiae Set1p is a methyltransferase specific for lysine 4 of histone H3 and is required for efficient gene expression.

    PubMed

    Boa, Simon; Coert, Claudette; Patterton, Hugh-G

    2003-07-15

    Several homologues of the Drosophila Su(var)3-9 protein were recently reported to methylate lysine 9 of histone H3. Whereas this methylation signal served to recruit heterochromatin-associated proteins to transcriptionally silenced regions, histone H3 methylated at lysine 4 was associated with transcriptionally active areas of the genome. These findings suggested that the interplay between lysine 4 and 9 methylation is crucial in eukaryotic gene regulation. Here we provide evidence that Saccharomyces cerevisiae Set1p is a methyltransferase specific for lysine 4 of histone H3. In addition, we show that the absence of Set1p and lysine 4 methylation results in decreased transcription of approximately 80% of the genes in S. cerevisiae. Hierarchical clustering analysis of the set1(-) expression profile revealed a correspondence to that of a mad2(-) strain, suggesting that the transcriptional defect in the set1(-) strain may be due to changes in chromatin structure. These findings establish a central role for methylation of histone H3 lysine 4 in transcriptional regulation.

  11. Genetic variations in the CLNK gene and ZNF518B gene are associated with gout in case-control sample sets.

    PubMed

    Jin, Tian-Bo; Ren, Yongchao; Shi, Xugang; Jiri, Mutu; He, Na; Feng, Tian; Yuan, Dongya; Kang, Longli

    2015-07-01

    A genome-wide association study of gout in European populations identified 12 genetic variants strongly associated with risk of gout, but it is unknown whether these variants are also associated with gout risk in Chinese populations. A total of 145 patients with gout and 310 healthy control subjects were recruited for a case-control association study. Twelve SNPs of the CLNK and ZNF518B genes were genotyped, and association analysis was performed. Odds ratios (ORs) with 95 % confidence intervals (CIs) were used to assess the association. Overall, we found four risk alleles for gout in patients: the allele "G" of rs2041215 and rs1686947 in the CLNK gene under the dominant model (OR 1.66; 95 % CI 1.04-2.63; p = 0.031) (OR 2.19; 95 % CI 1.38-3.46; p = 0.001) and the additive model (OR 1.39; 95 % CI 1.00-1.93; p = 0.049) (OR 1.67; 95 % CI 1.19-2.32; p = 0.003), respectively, and the allele "A" of rs10938799 and rs10016022 in the ZNF518B gene under the recessive model (OR 4.66; 95 % CI 1.44-15.09; p = 0.008) (OR 4.54; 95 % CI 1.23-16.76; p = 0.020). Further haplotype analysis showed that the TCATTCTGA haplotype of CLNK was more frequent among patients with gout (adjusted OR 0.48; 95 % CI 0.24-0.95; p = 0.036). Additionally, polymorphisms of rs2041215, rs10938799, and rs17467273 were also correlated with clinical pathological parameters. This study provides evidence for gout susceptibility genes, CLNK and ZNF518B, in a Chinese population, which may have potential as diagnostic and prognostic markers for gout patients.
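
    As a rough illustration of how an unadjusted odds ratio and its 95 % confidence interval are obtained from a case-control 2x2 table, the following Python sketch uses the standard Wald interval on the log-odds scale. The carrier counts are hypothetical (only the totals of 145 cases and 310 controls come from the abstract), and the function name is an assumption for illustration; the study's own estimates come from genetic-model analyses that this sketch does not reproduce.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95 % interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts, chosen only so the totals match
# 145 cases and 310 controls; they are not data from the study.
or_, lo, hi = odds_ratio_wald_ci(a=80, b=65, c=130, d=180)
print(f"OR {or_:.2f}; 95 % CI {lo:.2f}-{hi:.2f}")
```

    An interval that excludes 1 corresponds to a nominally significant association at the 5 % level, which is how the intervals quoted in the abstract can be read.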

  12. Identification of Genes with Potential Roles in Apple Fruit Development and Biochemistry through Large-Scale Statistical Analysis of Expressed Sequence Tags

    PubMed Central

    Park, Sunchung; Sugimoto, Nobuko; Larson, Matthew D.; Beaudry, Randy; van Nocker, Steven

    2006-01-01

    Advanced studies of apple (Malus domestica Borkh) development, physiology, and biochemistry have been hampered by the lack of appropriate genomics tools. One exception is the recent acquisition of extensive expressed sequence tag (EST) data. The entire available EST dataset for apple resulted from the efforts of at least 20 contributors and was derived from more than 70 cDNA libraries representing diverse transcriptional profiles from a variety of organs, fruit parts, developmental stages, biotic and abiotic stresses, and from at least nine cultivars. We analyzed apple EST sequences available in public databanks using statistical algorithms to identify those apple genes that are likely to be highly expressed in fruit, expressed uniquely or preferentially in fruit, and/or temporally or spatially regulated during fruit growth and development. We applied these results to the analysis of biochemical pathways involved in biosynthesis of precursors for volatile esters and identified a subset of apple genes that may participate in generating flavor and aroma components found in mature fruit. PMID:16825339

  13. Gene Set-Based Functionome Analysis of Pathogenesis in Epithelial Ovarian Serous Carcinoma and the Molecular Features in Different FIGO Stages

    PubMed Central

    Chang, Chia-Ming; Chuang, Chi-Mu; Wang, Mong-Lien; Yang, Ming-Jie; Chang, Cheng-Chang; Yen, Ming-Shyen; Chiou, Shih-Hwa

    2016-01-01

    Serous carcinoma (SC) is the most common subtype of epithelial ovarian carcinoma and is divided into four stages by the International Federation of Gynecology and Obstetrics (FIGO) staging system. Currently, the molecular functions and biological processes of SC at different FIGO stages have not been quantified. Here, we conducted a whole-genome integrative analysis to investigate the functions of SC at different stages. The function, as defined by the GO term or canonical pathway gene set, was quantified by measuring the changes in the gene expressional order between cancerous and normal control states. The quantified function, i.e., the gene set regularity (GSR) index, was utilized to investigate the pathogenesis and functional regulation of SC at different FIGO stages. We showed that the informativeness of the GSR indices was sufficient for accurate pattern recognition and classification by machine learning. The function regularity presented by the GSR indices showed stepwise deterioration during SC progression from FIGO stage I to stage IV. The pathogenesis of SC was centered on cell cycle deregulation and accompanied by multiple functional aberrations as well as their interactions. PMID:27275818

  14. The first set of expressed sequence tags (EST) from the medicinal mushroom Agaricus subrufescens delivers resource for gene discovery and marker development.

    PubMed

    Foulongne-Oriol, Marie; Lapalu, Nicolas; Férandon, Cyril; Spataro, Cathy; Ferrer, Nathalie; Amselem, Joelle; Savoie, Jean-Michel

    2014-09-01

    Agaricus subrufescens is one of the most important culinary-medicinal cultivable mushrooms with potentially high-added-value products and extended agronomical valorization. The development of A. subrufescens-related technologies is hampered by, among other factors, the lack of suitable molecular tools. Thus, this mushroom is considered a genomic orphan species with a very limited number of available molecular markers or sequences. To fill this gap, this study reports the generation and analysis of the first set of expressed sequence tags (EST) for A. subrufescens. cDNA fragments obtained from young sporophores (SP) and vegetative mycelium in liquid culture (CL) were sequenced using 454 pyrosequencing technology. After assembly, 4,989 and 5,125 sequences were obtained in the SP and CL libraries, respectively. About 87% of the EST had significant similarity with Agaricus bisporus-predicted proteins, and 79% corresponded to known proteins. Functional categorization according to Gene Ontology could be assigned to 49% of the sequences. Some gene families potentially involved in bioactive compound biosynthesis could be identified. A total of 232 simple sequence repeats (SSRs) were identified, and a set of 40 EST-SSR polymorphic markers was successfully developed. This EST dataset provides a new resource for gene discovery and molecular marker development. It constitutes a solid basis for further genetic and genomic studies in A. subrufescens.

  15. Set of Classical PCRs for Detection of Mutations in Candida glabrata FKS Genes Linked with Echinocandin Resistance

    PubMed Central

    Dudiuk, Catiana; Gamarra, Soledad; Leonardeli, Florencia; Jimenez-Ortigosa, Cristina; Vitale, Roxana G.; Afeltra, Javier; Perlin, David S.

    2014-01-01

    Clinical echinocandin resistance among Candida glabrata strains is increasing, especially in the United States. Antifungal susceptibility testing is considered mandatory to guide therapeutic decisions. However, these methodologies are not routinely performed in the hospital setting due to their complexity and the time needed to obtain reliable results. Echinocandin failure in C. glabrata is linked exclusively to Fks1p and Fks2p amino acid substitutions, and detection of such substitutions would serve as a surrogate marker to identify resistant isolates. In this work, we report an inexpensive, simple, and quick classical PCR set able to objectively detect the most common mechanisms of echinocandin resistance in C. glabrata within 4 h. The usefulness of this assay was assessed using a blind collection of 50 C. glabrata strains, including 16 FKS1 and/or FKS2 mutants. PMID:24829248

  16. Statistical Inference at Work: Statistical Process Control as an Example

    ERIC Educational Resources Information Center

    Bakker, Arthur; Kent, Phillip; Derry, Jan; Noss, Richard; Hoyles, Celia

    2008-01-01

    To characterise statistical inference in the workplace this paper compares a prototypical type of statistical inference at work, statistical process control (SPC), with a type of statistical inference that is better known in educational settings, hypothesis testing. Although there are some similarities between the reasoning structure involved in…

  17. The Drosophila ash1 gene product, which is localized at specific sites on polytene chromosomes, contains a SET domain and a PHD finger

    SciTech Connect

    Tripoulas, N.; LaJeunesse, D.; Gildea, J.

    1996-06-01

    The determined state of Drosophila imaginal discs depends on stable patterns of homeotic gene expression. The stability of these patterns requires the function of the ash1 gene, a member of the trithorax group. The primary translation product of the 7.5-kb ash1 transcript is predicted to be a basic protein of 2144 amino acids. The ASH1 protein contains a SET domain and a PHD finger. Both of these motifs are found in the products of some trithorax group and Polycomb group genes. We have determined the nucleotide sequence alterations in 10 ash1 mutant alleles and have examined their mutant phenotype. The best candidate for a null allele is ash1(22). The truncated protein product of this mutant allele is predicted to contain only 47 amino acids. The ASH1 protein is localized on polytene chromosomes of larval salivary glands at >100 sites. The chromosomal localization of ASH1 implies that it functions at the transcriptional level to maintain the expression pattern of homeotic selector genes. 54 refs., 10 figs., 3 tabs.

  18. Association Between Single-Nucleotide Polymorphisms in Hormone Metabolism and DNA Repair Genes and Epithelial Ovarian Cancer: Results from Two Australian Studies and an Additional Validation Set

    PubMed Central

    Beesley, Jonathan; Jordan, Susan J.; Spurdle, Amanda B.; Song, Honglin; Ramus, Susan J.; Kjaer, Suzanne Kruger; Hogdall, Estrid; DiCioccio, Richard A.; McGuire, Valerie; Whittemore, Alice S.; Gayther, Simon A.; Pharoah, Paul D.P.; Webb, Penelope M.; Chenevix-Trench, Georgia

    2009-01-01

    Although some high-risk ovarian cancer genes have been identified, it is likely that common low penetrance alleles exist that confer some increase in ovarian cancer risk. We have genotyped nine putative functional single-nucleotide polymorphisms (SNP) in genes involved in steroid hormone synthesis (SRD5A2, CYP19A1, HSD17B1, and HSD17B4) and DNA repair (XRCC2, XRCC3, BRCA2, and RAD52) using two Australian ovarian cancer case-control studies, comprising a total of 1,466 cases and 1,821 controls of Caucasian origin. Genotype frequencies in cases and controls were compared using logistic regression. The only SNP we found to be associated with ovarian cancer risk in both of these two studies was SRD5A2 V89L (rs523349), which showed a significant trend of increasing risk per rare allele (P = 0.00002). We then genotyped another SNP in this gene (rs632148; r² = 0.945 with V89L) in an attempt to validate this finding in an independent set of 1,479 cases and 2,452 controls from the United Kingdom, United States, and Denmark. There was no association between rs632148 and ovarian cancer risk in the validation samples, and overall, there was no significant heterogeneity between the results of the five studies. Further analyses of SNPs in this gene are therefore warranted to determine whether SRD5A2 plays a role in ovarian cancer predisposition. PMID:18086758

  19. The Drosophila Ash1 Gene Product, Which Is Localized at Specific Sites on Polytene Chromosomes, Contains a Set Domain and a Phd Finger

    PubMed Central

    Tripoulas, N.; LaJeunesse, D.; Gildea, J.; Shearn, A.

    1996-01-01

    The determined state of Drosophila imaginal discs depends on stable patterns of homeotic gene expression. The stability of these patterns requires the function of the ash1 gene, a member of the trithorax group. The primary translation product of the 7.5-kb ash1 transcript is predicted to be a basic protein of 2144 amino acids. The ASH1 protein contains a SET domain and a PHD finger. Both of these motifs are found in the products of some trithorax group and Polycomb group genes. We have determined the nucleotide sequence alterations in 10 ash1 mutant alleles and have examined their mutant phenotype. The best candidate for a null allele is ash1(22). The truncated protein product of this mutant allele is predicted to contain only 47 amino acids. The ASH1 protein is localized on polytene chromosomes of larval salivary glands at >100 sites. The chromosomal localization of ASH1 implies that it functions at the transcriptional level to maintain the expression pattern of homeotic selector genes. PMID:8725238

  20. A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification.

    PubMed

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions.
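
    The degeneracy described in this abstract can be checked directly. The following is a minimal sketch, not the authors' PPCA method: it uses a made-up 3-sample, 5-gene matrix and exact rational arithmetic to show that the sample covariance matrix has rank at most n - 1 when the number of samples n is smaller than the number of features p. The function names and the toy data are assumptions introduced for illustration.

```python
from fractions import Fraction


def sample_covariance(rows):
    """Return the p x p sample covariance matrix of n observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matrix_rank(matrix):
    """Rank by Gaussian elimination; exact because entries are Fractions."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical toy data: n = 3 samples (rows), p = 5 genes (columns).
expression = [
    [1, 2, 0, 4, 1],
    [3, 1, 2, 0, 5],
    [2, 5, 1, 3, 2],
]
cov = sample_covariance(expression)
print("features:", len(cov), "rank of covariance:", matrix_rank(cov))
```

    Because the rank is below p, the matrix cannot be inverted, which is the situation that motivates regularized or smoothed covariance estimators of the kind the abstract mentions.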

  1. Silencing BMI1 eliminates tumor formation of pediatric glioma CD133+ cells not by affecting known targets but by down-regulating a novel set of core genes.

    PubMed

    Baxter, Patricia A; Lin, Qi; Mao, Hua; Kogiso, Mari; Zhao, Xiumei; Liu, Zhigang; Huang, Yulun; Voicu, Horatiu; Gurusiddappa, Sivashankarappa; Su, Jack M; Adesina, Adekunle M; Perlaky, Laszlo; Dauser, Robert C; Leung, Hon-chiu Eastwood; Muraszko, Karin M; Heth, Jason A; Fan, Xing; Lau, Ching C; Man, Tsz-Kwong; Chintagumpala, Murali; Li, Xiao-Nan

    2014-01-01

    Clinical outcome of children with malignant glioma remains dismal. Here, we examined the role of over-expressed BMI1, a regulator of stem cell self-renewal, in sustaining tumor formation in pediatric glioma stem cells. Our investigation revealed BMI1 over-expression in 29 of 54 (53.7%) pediatric gliomas, 8 of 8 (100%) patient-derived orthotopic xenograft (PDOX) mouse models, and in both CD133+ and CD133- glioma cells. We demonstrated that lentiviral shRNA-mediated silencing of BMI1 suppressed cell proliferation in vitro in cells derived from 3 independent PDOX models and eliminated tumor-forming capacity of CD133+ and CD133- cells derived from 2 PDOX models in mouse brains. Gene expression profiling showed that most of the molecular targets of BMI1 ablation in CD133+ cells were different from those in CD133- cells. Importantly, we found that silencing BMI1 in CD133+ cells derived from 3 PDOX models did not affect most of the known genes previously associated with the activated BMI1, but modulated a novel set of core genes, including RPS6KA2, ALDH3A2, FMFB, DTL, API5, EIF4G2, KIF5c, LOC650152, C20ORF121, LOC203547, LOC653308, and LOC642489, to mediate the elimination of tumor formation. In summary, we identified the over-expressed BMI1 as a promising therapeutic target for glioma stem cells, and suggest that the signaling pathways associated with activated BMI1 in promoting tumor growth may be different from those induced by silencing BMI1 in blocking tumor formation. These findings highlighted the importance of careful re-analysis of the affected genes following the inhibition of abnormally activated oncogenic pathways to identify determinants that can potentially predict therapeutic efficacy.

  2. Statistical Considerations for Analysis of Microarray Experiments

    PubMed Central

    Owzar, Kouros; Barry, William T.; Jung, Sin-Ho

    2014-01-01

    Microarray technologies enable the simultaneous interrogation of expressions from thousands of genes from a biospecimen sample taken from a patient. This large set of expressions generates a genetic profile of the patient that may be used to identify potential prognostic or predictive genes or genetic models for clinical outcomes. The aim of this article is to provide a broad overview of some of the major statistical considerations for the design and analysis of microarray experiments conducted as correlative science studies to clinical trials. An emphasis will be placed on how the lack of understanding and improper use of statistical concepts and methods will lead to noise discovery and misinterpretation of experimental results. PMID:22212230

  3. Morbidity statistics

    PubMed Central

    Smith, Alwyn

    1969-01-01

    This paper is based on an analysis of questionnaires sent to the health ministries of Member States of WHO asking for information about the extent, nature, and scope of morbidity statistical information. It is clear that most countries collect some statistics of morbidity and many countries collect extensive data. However, few countries relate their collection to the needs of health administrators for information, and many countries collect statistics principally for publication in annual volumes which may appear anything up to 3 years after the year to which they refer. The desiderata of morbidity statistics may be summarized as reliability, representativeness, and relevance to current health problems. PMID:5306722

  4. Efficient Test and Visualization of Multi-Set Intersections

    PubMed Central

    Wang, Minghui; Zhao, Yongzhong; Zhang, Bin

    2015-01-01

    Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines. PMID:26603754
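
    To make the idea of an exact intersection probability concrete, here is a minimal sketch. It is not the SuperExactTest implementation or API: it assumes each set is an independent, uniformly random subset of a common background population, in which case the size of the overlap can be built up one set at a time from hypergeometric probabilities. All function names, the population size, and the set sizes below are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pmf(x, population, marked, draws):
    """P(exactly x marked items among `draws` drawn without replacement)."""
    if x < 0 or x > marked or x > draws or draws - x > population - marked:
        return 0.0
    return comb(marked, x) * comb(population - marked, draws - x) / comb(population, draws)


def intersection_pmf(population, sizes):
    """Distribution of |S1 & S2 & ... & Sk| for independent random subsets.

    The running intersection is itself a uniformly random subset of its size,
    so overlapping it with the next set is one more hypergeometric draw.
    """
    dist = {sizes[0]: 1.0}
    for size in sizes[1:]:
        updated = {}
        for current, prob in dist.items():
            for x in range(min(current, size) + 1):
                p = hypergeom_pmf(x, population, current, size)
                if p:
                    updated[x] = updated.get(x, 0.0) + prob * p
        dist = updated
    return dist


def upper_tail(dist, observed):
    """P(intersection size >= observed) under the null of random sets."""
    return sum(p for x, p in dist.items() if x >= observed)


# Hypothetical example: three gene sets drawn from a background of 1000 genes.
dist = intersection_pmf(1000, [100, 120, 80])
expected = sum(x * p for x, p in dist.items())
print("expected overlap:", round(expected, 3))
print("P(overlap >= 5):", upper_tail(dist, 5))
```

    With two sets this reduces to the familiar one-sided hypergeometric (Fisher) test; the loop simply extends the same reasoning to three or more sets.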

  5. Efficient Test and Visualization of Multi-Set Intersections

    NASA Astrophysics Data System (ADS)

    Wang, Minghui; Zhao, Yongzhong; Zhang, Bin

    2015-11-01

    Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines.

  6. Broad detection of diverse H5 and H7 hemagglutinin genes of avian influenza viruses by real-time reverse transcription-PCR using primer and probe sets containing mixed bases.

    PubMed

    Tsukamoto, Kenji; Noguchi, Daigo; Suzuki, Koutaro; Shishido, Makiko; Ashizawa, Takayoshi; Kim, Min-Chul; Lee, Youn-Jeong; Tada, Tatsuya

    2010-11-01

    Real-time reverse transcription-PCR (RT-PCR) was developed for broad detection of diverse H5 and H7 genes in Eurasian and American lineages of avian influenza viruses by using primer and probe sets containing mixed bases. Optimal use of the mixed bases enabled us to minimize sequence mismatches and to broaden the gene detection spectrum without decreasing sensitivity.

  7. Estimating F-statistics: A historical view

    PubMed Central

    Weir, Bruce S.

    2013-01-01

    Characterizing the genetic structure of populations is of importance to evolutionary biology, to human disease gene mapping and to forensic science. Sewall Wright introduced a set of “F-statistics” to describe population structure in 1951 and he emphasized that these quantities were ratios of variances. Responding to uncertainty over the best way to estimate F-statistics, Weir and Cockerham published a method-of-moments set of estimators in 1984 (Evolution 38:1358-1370). This paper continues to be widely cited, with over 7,000 citations to date. Some background to the publishing history of the Weir and Cockerham paper is given here, along with subsequent developments and a discussion of current uses of Wright's F-statistics. PMID:26405363
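
    The "ratio of variances" reading of F-statistics can be illustrated with a short sketch. It computes Wright's parametric FST for one biallelic locus as the variance of allele frequency among subpopulations divided by its maximum possible value; it is not the Weir and Cockerham estimator, and it assumes equally weighted subpopulations with known allele frequencies. The function name and the example frequencies are hypothetical.

```python
def wright_fst(allele_freqs):
    """Wright's F_ST = Var(p) / (p_bar * (1 - p_bar)) for one biallelic locus.

    `allele_freqs` holds the frequency of one allele in each subpopulation;
    subpopulations are weighted equally (an assumption of this sketch).
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic; F_ST is undefined")
    var_p = sum((p - p_bar) ** 2 for p in allele_freqs) / k
    return var_p / (p_bar * (1.0 - p_bar))


# Hypothetical allele frequencies in two subpopulations.
print(wright_fst([0.5, 0.5]))   # identical frequencies: no structure
print(wright_fst([0.2, 0.8]))   # moderate differentiation
print(wright_fst([0.0, 1.0]))   # fixed for different alleles
```

    In practice allele frequencies are estimated from finite samples, which is the situation that sample-based estimators such as those described in the abstract are designed to handle.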

  8. The minimal gene set member msrA, encoding peptide methionine sulfoxide reductase, is a virulence determinant of the plant pathogen Erwinia chrysanthemi.

    PubMed

    Hassouni, M E; Chambost, J P; Expert, D; Van Gijsegem, F; Barras, F

    1999-02-01

    Peptide methionine sulfoxide reductase (MsrA), which repairs oxidized proteins, is present in most living organisms, and the cognate structural gene belongs to the so-called minimum gene set [Mushegian, A. R. & Koonin, E. V., (1996) Proc. Natl. Acad. Sci. USA 93, 10268-10273]. In this work, we report that MsrA is required for full virulence of the plant pathogen Erwinia chrysanthemi. The following differences were observed between the wild-type and a MsrA- mutant: (i) the MsrA- mutant was more sensitive to oxidative stress; (ii) the MsrA- mutant was less motile on solid surface; (iii) the MsrA- mutant exhibited reduced virulence on chicory leaves; and (iv) no systemic invasion was observed when the MsrA- mutant was inoculated into whole Saintpaulia ionantha plants. These results suggest that plants respond to virulent pathogens by producing active oxygen species, and that enzymes repairing oxidative damage allow virulent pathogens to survive the host environment, thereby supporting the theory that active oxygen species play a key role in plant defense. PMID:9927663

  9. Fine-Scale Linkage Mapping Reveals a Small Set of Candidate Genes Influencing Honey Bee Grooming Behavior in Response to Varroa Mites

    PubMed Central

    Arechavaleta-Velasco, Miguel E.; Alcala-Escamilla, Karla; Robles-Rios, Carlos; Tsuruda, Jennifer M.; Hunt, Greg J.

    2012-01-01

    Populations of honey bees in North America have been experiencing high annual colony mortality for 15–20 years. Many apicultural researchers believe that introduced parasites called Varroa mites (V. destructor) are the most important factor in colony deaths. One important resistance mechanism that limits mite population growth in colonies is the ability of some lines of honey bees to groom mites from their bodies. To search for genes influencing this trait, we used an Illumina Bead Station genotyping array to determine the genotypes of several hundred worker bees at over a thousand single-nucleotide polymorphisms in a family that was apparently segregating for alleles influencing this behavior. Linkage analyses provided a genetic map with 1,313 markers anchored to genome sequence. Genotypes were analyzed for association with grooming behavior, measured as the time that individual bees took to initiate grooming after mites were placed on their thoraces. Quantitative-trait-locus interval mapping identified a single chromosomal region that was significant at the chromosome-wide level (p<0.05) on chromosome 5 with a LOD score of 2.72. The 95% confidence interval for quantitative trait locus location contained only 27 genes (honey bee official gene annotation set 2) including Atlastin, Ataxin and Neurexin-1 (AmNrx1), which have potential neurodevelopmental and behavioral effects. Atlastin and Ataxin homologs are associated with neurological diseases in humans. AmNrx1 codes for a presynaptic protein with many alternatively spliced isoforms. Neurexin-1 influences the growth, maintenance and maturation of synapses in the brain, as well as the type of receptors most prominent within synapses. Neurexin-1 has also been associated with autism spectrum disorder and schizophrenia in humans, and self-grooming behavior in mice. PMID:23133594
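
    For readers unfamiliar with LOD scores, the sketch below shows how the reported value of 2.72 can be interpreted. It relies only on the standard definition of a LOD score as the base-10 logarithm of the likelihood ratio between the linkage model and the null model; the function names are hypothetical and nothing here reproduces the authors' interval-mapping analysis.

```python
import math


def lod_from_loglik(loglik_alt, loglik_null):
    """LOD score from natural-log likelihoods of the two models."""
    return (loglik_alt - loglik_null) / math.log(10)


def lod_to_likelihood_ratio(lod):
    """Odds in favour of the QTL model implied by a LOD score."""
    return 10 ** lod


def lod_to_lrt_statistic(lod):
    """Equivalent likelihood-ratio test statistic, 2 * ln(L_alt / L_null)."""
    return 2 * math.log(10) * lod


reported_lod = 2.72  # value quoted in the abstract
print("likelihood ratio:", round(lod_to_likelihood_ratio(reported_lod), 1))
print("LRT statistic:", round(lod_to_lrt_statistic(reported_lod), 2))
```

    A LOD of 2.72 therefore corresponds to the data being roughly 525 times more likely under the QTL model than under the null model at that position.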

  10. The Complete Set of Genes Encoding Major Intrinsic Proteins in Arabidopsis Provides a Framework for a New Nomenclature for Major Intrinsic Proteins in Plants

    PubMed Central

    Johanson, Urban; Karlsson, Maria; Johansson, Ingela; Gustavsson, Sofia; Sjövall, Sara; Fraysse, Laure; Weig, Alfons R.; Kjellbom, Per

    2001-01-01

    Major intrinsic proteins (MIPs) facilitate the passive transport of small polar molecules across membranes. MIPs constitute a very old family of proteins and different forms have been found in all kinds of living organisms, including bacteria, fungi, animals, and plants. In the genomic sequence of Arabidopsis, we have identified 35 different MIP-encoding genes. Based on sequence similarity, these 35 proteins are divided into four different subfamilies: plasma membrane intrinsic proteins, tonoplast intrinsic proteins, NOD26-like intrinsic proteins also called NOD26-like MIPs, and the recently discovered small basic intrinsic proteins. In Arabidopsis, there are 13 plasma membrane intrinsic proteins, 10 tonoplast intrinsic proteins, nine NOD26-like intrinsic proteins, and three small basic intrinsic proteins. The gene structure in general is conserved within each subfamily, although there is a tendency to lose introns. Based on phylogenetic comparisons of maize (Zea mays) and Arabidopsis MIPs (AtMIPs), it is argued that the general intron patterns in the subfamilies were formed before the split of monocotyledons and dicotyledons. Although the gene structure is unique for each subfamily, there is a common pattern in how transmembrane helices are encoded on the exons in three of the subfamilies. The nomenclature for plant MIPs varies widely between different species but also between subfamilies in the same species. Based on the phylogeny of all AtMIPs, a new and more consistent nomenclature is proposed. The complete set of AtMIPs, together with the new nomenclature, will facilitate the isolation, classification, and labeling of plant MIPs from other species. PMID:11500536

  11. A Molecular Approach to Nested RT-PCR Using a New Set of Primers for the Detection of the Human Immunodeficiency Virus Protease Gene

    PubMed Central

    Zarei, Mohammad; Ravanshad, Mehrdad; Bagban, Ashraf; Fallahi, Shahab

    2016-01-01

    Background: The human immunodeficiency virus (HIV-1) is the etiologic agent of AIDS. The disease can be transmitted via blood in the window period prior to the development of antibodies to the disease. Thus, an appropriate method for the detection of HIV-1 during this window period is very important. Objectives: This descriptive study proposes a sensitive, efficient, inexpensive, and easy method to detect HIV-1. Patients and Methods: In this study 25 serum samples of patients under treatment and also 10 positive and 10 negative control samples were studied. Twenty-five blood samples were obtained from HIV-1-infected individuals who were receiving treatment at the acquired immune deficiency syndrome (AIDS) research center of Imam Khomeini hospital in Tehran. The identification of HIV-1-positive samples was done by using reverse transcription to produce copy deoxyribonucleic acid (cDNA) and then optimizing the nested polymerase chain reaction (PCR) method. Two pairs of primers were then designed specifically for the protease gene fragment of the nested real time-PCR (RT-PCR) samples. Electrophoresis was used to examine the PCR products. The results were analyzed using statistical tests, including Fisher’s exact test, and SPSS17 software. Results: The 325 bp band of the protease gene was observed in all the positive control samples and in none of the negative control samples. The proposed method correctly identified HIV-1 in 23 of the 25 samples. Conclusions: These results suggest that, in comparison with viral cultures, antibody detection by enzyme-linked immunosorbent assays (ELISAs), and conventional PCR methods, the proposed method has high sensitivity and specificity for the detection of HIV-1. PMID:27679699
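
    As an illustration of the kind of calculation Fisher's exact test involves, the sketch below applies a two-sided test to the control results quoted in the abstract (band present in 10 of 10 positive controls and 0 of 10 negative controls) and computes the detection rate in patient samples. Which table the authors actually tested is not stated, so the choice of table is an assumption, as are the function names.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: positive controls, negative controls. Columns: band seen, band absent.
p_value = fisher_exact_two_sided(10, 0, 0, 10)
detection_rate = 23 / 25  # patient samples correctly identified
print("Fisher exact p-value:", p_value)
print("detection rate in patient samples:", detection_rate)
```

    The detection rate of 0.92 follows directly from the 23 of 25 figure in the abstract; the p-value only describes how cleanly the assay separated the two control groups.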

  13. A comparison of two 16S rRNA gene-based PCR primer sets in unraveling anammox bacteria from different environmental samples.

    PubMed

    Han, Ping; Huang, Yu-Tzu; Lin, Jih-Gaw; Gu, Ji-Dong

    2013-12-01

    Two 16S rRNA gene-based PCR primer sets (Brod541F/Amx820R and A438f/A684r) for detecting anammox bacteria were compared using sediments from Mai Po wetlands (MP), the South China Sea (SCS), a freshwater reservoir (R2), and sludge granules from a wastewater treatment plant (A2). By comparing their ability in profiling anammox bacteria, the recovered diversity, community structure, and abundance of anammox bacteria among all these diverse samples indicated that A438f/A684r performed better than Brod541F/Amx820R in retrieving anammox bacteria from these different environmental samples. Five Scalindua subclusters (zhenghei-I, SCS-I, SCS-III, arabica, and brodae) dominated in SCS whereas two Scalindua subclusters (zhenghei-II and wagneri) and one cluster of Kuenenia dominated in MP. R2 showed a higher diversity of anammox bacteria with two new retrieved clusters (R2-New-1 and R2-New-2), which deserves further detailed study. The dominance of Brocadia in sample A2 was supported by both of the primer sets used. Results collectively indicate strongly niche-specific community structures of anammox bacteria in different environments, and A438f/A684r is highly recommended for screening anammox bacteria from various environments when dealing with a collection of samples with diverse physiochemical characteristics.

  14. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  15. Statistics: A Brief Overview

    PubMed Central

    Winters, Ryan; Winters, Andrew; Amedee, Ronald G.

    2010-01-01

    The Accreditation Council for Graduate Medical Education sets forth a number of required educational topics that must be addressed in residency and fellowship programs. We sought to provide a primer on some of the important basic statistical concepts to consider when examining the medical literature. It is not essential to understand the exact workings and methodology of every statistical test encountered, but it is necessary to understand selected concepts such as parametric and nonparametric tests, correlation, and numerical versus categorical data. This working knowledge will allow you to spot obvious irregularities in statistical analyses that you encounter. PMID:21603381

  16. High Nasal Carriage Rate of Staphylococcus aureus Containing Panton-Valentine leukocidin- and EDIN-Encoding Genes in Community and Hospital Settings in Burkina Faso

    PubMed Central

    Ouedraogo, Abdoul-Salam; Dunyach-Remy, Catherine; Kissou, Aimée; Sanou, Soufiane; Poda, Armel; Kyelem, Carole G.; Solassol, Jérôme; Bañuls, Anne-Laure; Van De Perre, Philippe; Ouédraogo, Rasmata; Jean-Pierre, Hélène; Lavigne, Jean-Philippe; Godreuil, Sylvain

    2016-01-01

    The objectives of the present study were to investigate the rate of S. aureus nasal carriage and molecular characteristics in hospital and community settings in Bobo Dioulasso, Burkina Faso. Nasal samples (n = 219) were collected from 116 healthy volunteers and 103 hospitalized patients in July and August 2014. Samples were first screened using CHROMagar Staph aureus chromogenic agar plates, and S. aureus strains were identified by mass spectrometry. Antibiotic susceptibility was tested using the disk diffusion method on Müller-Hinton agar. All S. aureus isolates were genotyped using DNA microarray. Overall, the rate of S. aureus nasal carriage was 32.9% (72/219) with 29% in healthy volunteers and 37% in hospital patients. Among the S. aureus isolates, only four methicillin-resistant S. aureus (MRSA) strains were identified, all in hospital patients (3.9%). The 72 S. aureus isolates from nasal samples belonged to 16 different clonal complexes, particularly to CC152-MSSA (22 clones) and CC1-MSSA (nine clones). Two clones were significantly associated with community settings: CC1-MSSA and CC45-MSSA. The MRSA strains belonged to the ST88-MRSA-IV or the CC8-MRSA-V complex. A very high prevalence of toxinogenic strains (52.2%, 36/69), containing Panton-Valentine leucocidin- and EDIN-encoding genes, was identified among the S. aureus isolates in community and hospital settings. This study provides the first characterization of S. aureus clones and their genetic characteristics in Burkina Faso. Altogether, it highlights the low prevalence of antimicrobial resistance, high diversity of methicillin-sensitive S. aureus clones and high frequency of toxinogenic S. aureus strains. PMID:27679613
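
    The percentages in this abstract can be recomputed from the counts it reports, and an uncertainty range can be attached to the overall carriage rate. The sketch below uses a Wilson score interval, which is a choice made here for illustration and is not stated to be the authors' method; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts quoted in the abstract.
carriage = 72 / 219      # overall nasal carriage
mrsa_hospital = 4 / 103  # MRSA among hospitalized patients
toxinogenic = 36 / 69    # toxinogenic strains among typed isolates

low, high = wilson_interval(72, 219)
print("carriage rate: %.1f%%" % (100 * carriage))
print("Wilson interval: %.1f%% to %.1f%%" % (100 * low, 100 * high))
print("MRSA in hospital patients: %.1f%%" % (100 * mrsa_hospital))
print("toxinogenic strains: %.1f%%" % (100 * toxinogenic))
```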

  17. High Nasal Carriage Rate of Staphylococcus aureus Containing Panton-Valentine leukocidin- and EDIN-Encoding Genes in Community and Hospital Settings in Burkina Faso.

    PubMed

    Ouedraogo, Abdoul-Salam; Dunyach-Remy, Catherine; Kissou, Aimée; Sanou, Soufiane; Poda, Armel; Kyelem, Carole G; Solassol, Jérôme; Bañuls, Anne-Laure; Van De Perre, Philippe; Ouédraogo, Rasmata; Jean-Pierre, Hélène; Lavigne, Jean-Philippe; Godreuil, Sylvain

    2016-01-01

    The objectives of the present study were to investigate the rate of S. aureus nasal carriage and molecular characteristics in hospital and community settings in Bobo Dioulasso, Burkina Faso. Nasal samples (n = 219) were collected from 116 healthy volunteers and 103 hospitalized patients in July and August 2014. Samples were first screened using CHROMagar Staph aureus chromogenic agar plates, and S. aureus strains were identified by mass spectrometry. Antibiotic susceptibility was tested using the disk diffusion method on Müller-Hinton agar. All S. aureus isolates were genotyped using DNA microarray. Overall, the rate of S. aureus nasal carriage was 32.9% (72/219) with 29% in healthy volunteers and 37% in hospital patients. Among the S. aureus isolates, only four methicillin-resistant S. aureus (MRSA) strains were identified, all in hospital patients (3.9%). The 72 S. aureus isolates from nasal samples belonged to 16 different clonal complexes, particularly to CC152-MSSA (22 clones) and CC1-MSSA (nine clones). Two clones were significantly associated with community settings: CC1-MSSA and CC45-MSSA. The MRSA strains belonged to the ST88-MRSA-IV or the CC8-MRSA-V complex. A very high prevalence of toxinogenic strains (52.2%, 36/69), containing Panton-Valentine leucocidin- and EDIN-encoding genes, was identified among the S. aureus isolates in community and hospital settings. This study provides the first characterization of S. aureus clones and their genetic characteristics in Burkina Faso. Altogether, it highlights the low prevalence of antimicrobial resistance, high diversity of methicillin-sensitive S. aureus clones and high frequency of toxinogenic S. aureus strains. PMID:27679613

  20. SEER Statistics

    Cancer.gov

    The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.

  1. Cancer Statistics

    MedlinePlus

    ... cancer statistics across the world. U.S. Cancer Mortality Trends The best indicator of progress against cancer is ... the number of cancer survivors has increased. These trends show that progress is being made against the ...

  2. Statistical Physics

    NASA Astrophysics Data System (ADS)

    Hermann, Claudine

    Statistical Physics bridges the properties of a macroscopic system and the microscopic behavior of its constituting particles, otherwise impossible due to the giant magnitude of Avogadro's number. Numerous systems of today's key technologies - such as semiconductors or lasers - are macroscopic quantum objects; only statistical physics allows for understanding their fundamentals. Therefore, this graduate text also focuses on particular applications such as the properties of electrons in solids with applications, and radiation thermodynamics and the greenhouse effect.

  3. UpSet: Visualization of Intersecting Sets

    PubMed Central

    Lex, Alexander; Gehlenborg, Nils; Strobelt, Hendrik; Vuillemot, Romain; Pfister, Hanspeter

    2016-01-01

    Understanding relationships between sets is an important analysis task that has received widespread attention in the visualization community. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. In this paper we introduce UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections. UpSet is focused on creating task-driven aggregates, communicating the size and properties of aggregates and intersections, and a duality between the visualization of the elements in a dataset and their set membership. UpSet visualizes set intersections in a matrix layout and introduces aggregates based on groupings and queries. The matrix layout enables the effective representation of associated data, such as the number of elements in the aggregates and intersections, as well as additional summary statistics derived from subset or element attributes. Sorting according to various measures enables a task-driven analysis of relevant intersections and aggregates. The elements represented in the sets and their associated attributes are visualized in a separate view. Queries based on containment in specific intersections, aggregates or driven by attribute filters are propagated between both views. We also introduce several advanced visual encodings and interaction methods to overcome the problems of varying scales and to address scalability. UpSet is web-based and open source. We demonstrate its general utility in multiple use cases from various domains. PMID:26356912
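
    The quantity UpSet lays out in its matrix view, the size of each set intersection, can be illustrated with a minimal Python sketch. This is not the UpSet implementation. The function name `exclusive_intersections` and the toy sets are hypothetical, and the sketch assumes "intersection" means the elements that belong to exactly one combination of sets and to no others.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count elements per exact combination of sets.

    `sets` maps a set name to its members. The result maps each
    combination of names (a frozenset) to the number of elements that
    belong to exactly those sets and to no others.
    """
    membership = {}
    for name, members in sets.items():
        for element in members:
            membership.setdefault(element, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical toy data, chosen only for illustration.
sets = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 5, 6},
}
counts = exclusive_intersections(sets)

# Sort by descending size, one of the orderings a task-driven view might use.
for combo, size in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(combo), size)
```

    Grouping by each element's membership pattern keeps the work linear in the total number of memberships and lists only non-empty combinations, which sidesteps enumerating all 2^n possible intersections.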

  4. Statistical Physics of Fields

    NASA Astrophysics Data System (ADS)

    Kardar, Mehran

    2006-06-01

    While many scientists are familiar with fractals, fewer are familiar with the concepts of scale-invariance and universality which underlie the ubiquity of their shapes. These properties may emerge from the collective behaviour of simple fundamental constituents, and are studied using statistical field theories. Based on lectures for a course in statistical mechanics taught by Professor Kardar at Massachusetts Institute of Technology, this textbook demonstrates how such theories are formulated and studied. Perturbation theory, exact solutions, renormalization groups, and other tools are employed to demonstrate the emergence of scale invariance and universality, and the non-equilibrium dynamics of interfaces and directed paths in random media are discussed. Ideal for advanced graduate courses in statistical physics, it contains an integrated set of problems, with solutions to selected problems at the end of the book. A complete set of solutions is available to lecturers on a password protected website at www.cambridge.org/9780521873413. Based on lecture notes from a course on Statistical Mechanics taught by the author at MIT. Contains 65 exercises, with solutions to selected problems. Features a thorough introduction to the methods of Statistical Field theory. Ideal for graduate courses in Statistical Physics.

  5. Statistical Physics of Particles

    NASA Astrophysics Data System (ADS)

    Kardar, Mehran

    2006-06-01

    Statistical physics has its origins in attempts to describe the thermal properties of matter in terms of its constituent particles, and has played a fundamental role in the development of quantum mechanics. Based on lectures for a course in statistical mechanics taught by Professor Kardar at Massachusetts Institute of Technology, this textbook introduces the central concepts and tools of statistical physics. It contains a chapter on probability and related issues such as the central limit theorem and information theory, and covers interacting particles, with an extensive description of the van der Waals equation and its derivation by mean field approximation. It also contains an integrated set of problems, with solutions to selected problems at the end of the book. It will be invaluable for graduate and advanced undergraduate courses in statistical physics. A complete set of solutions is available to lecturers on a password protected website at www.cambridge.org/9780521873420. Based on lecture notes from a course on Statistical Mechanics taught by the author at MIT. Contains 89 exercises, with solutions to selected problems. Contains chapters on probability and interacting particles. Ideal for graduate courses in Statistical Mechanics.

  6. Statistical Issues in the Design and Analysis of nCounter Projects.

    PubMed

    Jung, Sin-Ho; Sohn, Insuk

    2014-01-01

    Numerous statistical methods have been published for designing and analyzing microarray projects. Traditional genome-wide microarray platforms (such as Affymetrix, Illumina, and DASL) measure the expression level of tens of thousands of genes. Since the sets of genes included in these array chips are selected by the manufacturers, the number of genes associated with a specific disease outcome is limited and a large portion of the genes are not associated. nCounter is a new technology by NanoString to measure the expression of a selected number (up to 800) of genes. The list of genes for nCounter chips can be selected by customers. Because the number of genes is limited and the price increases with the number of selected genes, the genes for nCounter chips are carefully selected among those discovered from previous studies, usually using traditional high-throughput platforms, and only a small number of definitely unassociated genes, called control genes, are included to standardize the overall expression level across different chips. Furthermore, nCounter chips measure the expression level of each gene using a counting observation while the traditional high-throughput platforms produce continuous observations. Due to these differences, some statistical methods developed for the design and analysis of high-throughput projects may need modification or may be inappropriate for nCounter projects. In this paper, we discuss statistical methods that can be used for designing and analyzing nCounter projects. PMID:25574131

  7. Activated Glucocorticoid Receptor Interacts with the INHAT Component Set/TAF-Iβ and Releases it from a Glucocorticoid-responsive Gene Promoter, Relieving Repression: Implications for the Pathogenesis of Glucocorticoid Resistance in Acute Undifferentiated Leukemia with Set-Can Translocation

    PubMed Central

    Ichijo, Takamasa; Chrousos, George P.; Kino, Tomoshige

    2008-01-01

    SUMMARY Set/template-activating factor (TAF)-Iβ, part of the Set-Can oncogene product found in acute undifferentiated leukemia, is a component of the inhibitor of acetyltransferases (INHAT) complex. Set/TAF-Iβ interacted with the DNA-binding domain of the glucocorticoid receptor (GR) in yeast two-hybrid screening, and repressed GR-induced transcriptional activity of a chromatin-integrated glucocorticoid-responsive and a natural promoter. Set/TAF-Iβ was co-precipitated with glucocorticoid response elements (GREs) of these promoters in the absence of dexamethasone, while addition of the hormone caused dissociation of Set/TAF-Iβ from and attraction of the p160-type coactivator GRIP1 to the promoter GREs. Set-Can fusion protein, on the other hand, did not interact with GR, was constitutively co-precipitated with GREs and suppressed GRIP1-induced enhancement of GR transcriptional activity and histone acetylation. Thus, Set/TAF-Iβ acts as a ligand-activated GR-responsive transcriptional repressor, while Set-Can does not retain physiologic responsiveness to ligand-bound GR, possibly contributing to the poor responsiveness of Set-Can-harboring leukemic cells to glucocorticoids. PMID:18096310

  8. Ideal statistically quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Savas, Ekrem; Cakalli, Huseyin

    2016-08-01

    An ideal I is a family of subsets of N, the set of positive integers, which is closed under taking finite unions and subsets of its elements. A sequence (x_k) of real numbers is said to be S(I)-statistically convergent to a real number L if, for each ε > 0 and for each δ > 0, the set {n ∈ N : (1/n) |{k ≤ n : |x_k − L| ≥ ε}| ≥ δ} belongs to I. We introduce S(I)-statistically ward compactness of a subset of R, the set of real numbers, and S(I)-statistically ward continuity of a real function, in the senses that a subset E of R is S(I)-statistically ward compact if any sequence of points in E has an S(I)-statistically quasi-Cauchy subsequence, and a real function is S(I)-statistically ward continuous if it preserves S(I)-statistically quasi-Cauchy sequences, where a sequence (x_k) is said to be S(I)-statistically quasi-Cauchy when (Δx_k) is S(I)-statistically convergent to 0. We obtain results related to S(I)-statistically ward continuity, S(I)-statistically ward compactness, Nθ-ward continuity, and slowly oscillating continuity.
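
    The two definitions in this abstract can be written out as display formulas. The following is a restatement for readability, not text from the paper. The abstract does not define the difference operator, so the forward difference Δx_k = x_{k+1} − x_k used here is an assumption based on common usage.

```latex
% S(I)-statistical convergence of (x_k) to L:
\[
  \forall \varepsilon > 0,\ \forall \delta > 0:\quad
  \Bigl\{\, n \in \mathbb{N} :
     \tfrac{1}{n}\,\bigl|\{\, k \le n : |x_k - L| \ge \varepsilon \,\}\bigr|
     \ge \delta \,\Bigr\} \in I .
\]
% S(I)-statistically quasi-Cauchy: the same condition applied to the
% differences, with L = 0 (assuming \Delta x_k = x_{k+1} - x_k):
\[
  \forall \varepsilon > 0,\ \forall \delta > 0:\quad
  \Bigl\{\, n \in \mathbb{N} :
     \tfrac{1}{n}\,\bigl|\{\, k \le n : |\Delta x_k| \ge \varepsilon \,\}\bigr|
     \ge \delta \,\Bigr\} \in I .
\]
```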

  9. Statistical Optics

    NASA Astrophysics Data System (ADS)

    Goodman, Joseph W.

    2000-07-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson, The Statistical Analysis of Time Series; T. S. Arthanari & Yadolah Dodge, Mathematical Programming in Statistics; Emil Artin, Geometric Algebra; Norman T. J. Bailey, The Elements of Stochastic Processes with Applications to the Natural Sciences; Robert G. Bartle, The Elements of Integration and Lebesgue Measure; George E. P. Box & Norman R. Draper, Evolutionary Operation: A Statistical Method for Process Improvement; George E. P. Box & George C. Tiao, Bayesian Inference in Statistical Analysis; R. W. Carter, Finite Groups of Lie Type: Conjugacy Classes and Complex Characters; R. W. Carter, Simple Groups of Lie Type; William G. Cochran & Gertrude M. Cox, Experimental Designs, Second Edition; Richard Courant, Differential and Integral Calculus, Volume I; Richard Courant, Differential and Integral Calculus, Volume II; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume I; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume II; D. R. Cox, Planning of Experiments; Harold S. M. Coxeter, Introduction to Geometry, Second Edition; Charles W. Curtis & Irving Reiner, Representation Theory of Finite Groups and Associative Algebras; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II; Cuthbert Daniel, Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition; Bruno de Finetti, Theory of Probability, Volume I; Bruno de Finetti, Theory of Probability, Volume 2; W. Edwards Deming, Sample Design in Business Research

  10. [Statistical materials].

    PubMed

    1986-01-01

    Official population data for the USSR are presented for 1985 and 1986. Part 1 (pp. 65-72) contains data on capitals of union republics and cities with over one million inhabitants, including population estimates for 1986 and vital statistics for 1985. Part 2 (p. 72) presents population estimates by sex and union republic, 1986. Part 3 (pp. 73-6) presents data on population growth, including birth, death, and natural increase rates, 1984-1985; seasonal distribution of births and deaths; birth order; age-specific birth rates in urban and rural areas and by union republic; marriages; age at marriage; and divorces. PMID:12178831

  11. Conditional statistical model building

    NASA Astrophysics Data System (ADS)

    Hansen, Mads Fogtmann; Hansen, Michael Sass; Larsen, Rasmus

    2008-03-01

    We present a new statistical deformation model suited for parameterized grids with different resolutions. Our method models the covariances between multiple grid levels explicitly, and allows for very efficient fitting of the model to data on multiple scales. The model is validated on a data set consisting of 62 annotated MR images of Corpus Callosum. One fifth of the data set was used as a training set, which was non-rigidly registered to each other without a shape prior. From the non-rigidly registered training set a shape prior was constructed by performing principal component analysis on each grid level and using the results to construct a conditional shape model, conditioning the finer parameters with the coarser grid levels. The remaining shapes were registered with the constructed shape prior. The dice measures for the registration without prior and the registration with a prior were 0.875 +/- 0.042 and 0.8615 +/- 0.051, respectively.
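
    The Dice measure reported above quantifies the overlap between two segmentations. A minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), follows. The representation of a segmentation as a set of pixel coordinates and the toy masks are assumptions for illustration, and the abstract does not say how the authors computed their values.

```python
def dice_coefficient(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two collections of
    labelled pixels; returns 1.0 when both are empty (a convention)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy masks given as (row, column) pixel coordinates.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
registered = {(0, 1), (1, 1), (1, 2)}

score = dice_coefficient(reference, registered)
print(round(score, 4))
```

    A value of 1 means the two masks coincide and 0 means they share no pixels.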

  12. Identification and analysis of house-keeping and tissue-specific genes based on RNA-seq data sets across 15 mouse tissues.

    PubMed

    Zeng, Jingyao; Liu, Shoucheng; Zhao, Yuhui; Tan, Xinyu; Aljohi, Hasan Awad; Liu, Wanfei; Hu, Songnian

    2016-01-15

    Recently, RNA-seq has become a widely used technology for transcriptome profiling due to its single-base accuracy and high-throughput capability. In this study, we applied a computational approach on an integrated RNA-seq dataset across 15 normal mouse tissues, and consequently assigned 8408 house-keeping (HK) genes and 2581 tissue-specific (TS) genes among UCSC RefGene annotation. Apart from some basic genomic features, we also performed expression, function and pathway analysis with clustering, DAVID and Ingenuity Pathway Analysis, indicating the physiological connections (tissues) and diverse biological roles of HK genes (fundamental processes) and TS genes (tissue-corresponding processes). Moreover, we used the RT-PCR method to test 18 candidate HK genes and finally identified a novel list of highly stable internal control genes: Ywhae, Ddb1, Eif4h, etc. In summary, this study provides a new HK gene and TS gene resource for further genetic and evolution research and helps us better understand morphogenesis and biological diversity in mouse.

  13. Representational Versatility in Learning Statistics

    ERIC Educational Resources Information Center

    Graham, Alan T.; Thomas, Michael O. J.

    2005-01-01

    Statistical data can be represented in a number of qualitatively different ways, the choice depending on the following three conditions: the concepts to be investigated; the nature of the data; and the purpose for which they were collected. This paper begins by setting out frameworks that describe the nature of statistical thinking in schools, and…

  14. Transcriptome analysis of various flower and silique development stages indicates a set of class III peroxidase genes potentially involved in pod shattering in Arabidopsis thaliana

    PubMed Central

    2010-01-01

    Background: Plant class III peroxidases exist as a large multigenic family involved in numerous functions suggesting a functional specialization of each gene. However, few genes have been linked with a specific function. Consequently total peroxidase activity is still used in numerous studies although its relevance is questionable. Transcriptome analysis seems to be a promising tool to overcome the difficulties associated with the study of this family. Nevertheless available microarrays are not completely reliable for this purpose. We therefore used a macroarray dedicated to the 73 class III peroxidase genes of A. thaliana to identify genes potentially involved in flower and fruit development. Results: The observed increase of total peroxidase activity during development was actually correlated with the induction of only a few class III peroxidase genes which supports the existence of a functional specialization of these proteins. We identified peroxidase genes that are predominantly expressed in one development stage and are probable components of the complex gene networks involved in the reproductive phase. An attempt has been made to gain insight into plausible functions of these genes by collecting and analyzing the expression data of different studies in plants. Peroxidase activity was additionally observed in situ in the silique dehiscence zone known to be involved in pod shattering. Because treatment with a peroxidase inhibitor delayed pod shattering, we subsequently studied mutants of transcription factors (TF) controlling this mechanism. Three peroxidase genes (AtPrx13, AtPrx30 and AtPrx55) were altered by the TFs involved in pod shatter. Conclusions: Our data illustrated the problems caused by linking only an increase in total peroxidase activity to any specific development stage or function. The activity or involvement of specific class III peroxidase genes needs to be assessed. Several genes identified in our study had not been linked to any particular

  15. Analyses of synovial tissues from arthritic and protected congenic rat strains reveal a new core set of genes associated with disease severity

    PubMed Central

    Brenner, Max; Laragione, Teresina

    2013-01-01

    Little is known about the genes regulating disease severity and joint damage in rheumatoid arthritis (RA). In the present study we analyzed the gene expression characteristics of synovial tissues from four different strains congenic for non-MHC loci that develop mild and nonerosive arthritis compared with severe and erosive DA rats. DA.F344(Cia3d), DA.F344(Cia5a), DA.ACI(Cia10), and DA.ACI(Cia25) rats developed mild arthritis compared with DA. We found 685 genes with significantly different expression between congenics and DA, independent of the specific congenic interval, suggesting that these genes represent a new nongenetic core group of mediators of arthritis severity. This core group includes genes not previously implicated or with unclear role in arthritis severity, such as Tnn, Clec4m, and Spond1 among others, increased in DA. The core genes also included Scd1, Selenbp1, and Slc7a10, increased in congenics. Genes implicated in nuclear receptor activity, xenobiotic and lipid metabolism were also increased in the congenics, correlating with protection. Several disease mediators were among the core genes reduced in congenics, including IL-6, IL-17, and Ccl2. Analyses of upstream regulators (genes, pathways, or chemicals) suggested reduced activation of Stat3 and TLR-related genes and chemicals in congenics. Additionally, cigarette smoking was among the upstream regulators activated in DA, while p53 was an upstream regulator activated in congenics. We observed congenic-specific differential expression and detection in each individual strain. In conclusion, this new nongenetically regulated core set of genes of disease severity or protection in arthritis should provide new insight into critical pathways and potential new environmental risk factors for arthritis. PMID:24046282

  16. 1979 DOE statistical symposium

    SciTech Connect

    Gardiner, D.A.; Truett, T.

    1980-09-01

    The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. The program included presentations of technical papers centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues, and workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.

  17. Statistical and biological gene-lifestyle interactions of MC4R and FTO with diet and physical activity on obesity: new effects on alcohol consumption

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Fat mass and obesity (FTO) and melanocortin-4 receptor (MC4R) are relevant genes associated with obesity. This could be through food intake, but results are contradictory. Modulation by diet or other lifestyle factors is also not well understood. To investigate whether MC4R and FTO associations ...

  18. Setting Environmental Standards

    ERIC Educational Resources Information Center

    Fishbein, Gershon

    1975-01-01

    Recent court decisions have pointed out the complexities involved in setting environmental standards. Environmental health is composed of multiple causative agents, most of which work over long periods of time. This makes the cause-and-effect relationship between health statistics and environmental contaminant exposures difficult to prove in…

  19. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility

    PubMed Central

    2014-01-01

    Background: Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. Findings: To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. Conclusions: The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies. PMID:24725556

  20. Two Sets of Paralogous Genes Encode the Enzymes Involved in the Early Stages of Clavulanic Acid and Clavam Metabolite Biosynthesis in Streptomyces clavuligerus

    PubMed Central

    Tahlan, Kapil; Park, Hyeon Ung; Wong, Annie; Beatty, Perrin H.; Jensen, Susan E.

    2004-01-01

    Recently, a second copy of a gene encoding proclavaminate amidinohydrolase (pah1), an enzyme involved in the early stages of clavulanic acid and clavam metabolite biosynthesis in Streptomyces clavuligerus, was identified and isolated. Using Southern analysis, we have now isolated second copies of the genes encoding the carboxyethylarginine synthase (ceaS) and β-lactam synthetase (bls) enzymes. These new paralogues are given the gene designations ceaS1 and bls1 and are located immediately upstream of pah1 on the chromosome. Furthermore, sequence analysis of the region downstream of pah1 revealed a second copy of a gene encoding ornithine acetyltransferase (oat1), thus indicating the presence of a cluster of paralogue genes. ceaS1, bls1, and oat1 display 73, 60, and 63% identities, respectively, at the nucleotide level to the original ceaS2, bls2, and oat2 genes from the clavulanic acid gene cluster. Single mutants defective in ceaS1, bls1, or oat1 were prepared and characterized and were found to be affected to variable degrees in their ability to produce clavulanic acid and clavam metabolites. Double mutants defective in both copies of the genes were also prepared and tested. The ceaS1/ceaS2 and the bls1/bls2 mutant strains were completely blocked in clavulanic acid and clavam metabolite biosynthesis. On the other hand, oat1/oat2 double mutants still produced some clavulanic acid and clavam metabolites. This may be attributed to the presence of the argJ gene in S. clavuligerus, which encodes yet another ornithine acetyltransferase enzyme that may be able to compensate for the lack of OAT1 and -2 in the double mutants. PMID:14982786

  1. Multiclass gene selection using Pareto-fronts.

    PubMed

    Rajapakse, Jagath C; Mundra, Piyushkumar A

    2013-01-01

    Filter methods are often used for selection of genes in multiclass sample classification by using microarray data. Such techniques usually tend to bias toward a few classes that are easily distinguishable from other classes due to imbalances of strong features and sample sizes of different classes. It could therefore lead to selection of redundant genes while missing the relevant genes, leading to poor classification of tissue samples. In this manuscript, we propose to decompose multiclass ranking statistics into class-specific statistics and then use Pareto-front analysis for selection of genes. This alleviates the bias induced by class intrinsic characteristics of dominating classes. The use of Pareto-front analysis is demonstrated on two filter criteria commonly used for gene selection: F-score and KW-score. A significant improvement in classification performance and reduction in redundancy among top-ranked genes were achieved in experiments with both synthetic and real-benchmark data sets.
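
    The idea of selecting genes from a Pareto front of class-specific statistics can be sketched as follows. This is a generic illustration of non-dominated selection, not the authors' procedure. The gene names, the scores, and the assumption that a higher class-specific score is better are all hypothetical.

```python
def dominates(u, v):
    """True if score vector u is at least as good as v in every class
    and strictly better in at least one (higher is better)."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_front(scores):
    """Return the genes whose class-specific score vectors are not
    dominated by the score vector of any other gene."""
    return [
        gene
        for gene, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != gene)
    ]


# Hypothetical class-specific scores for a three-class problem.
scores = {
    "g1": (0.9, 0.1, 0.2),  # strong for class 1 only
    "g2": (0.2, 0.8, 0.3),  # strong for class 2 only
    "g3": (0.1, 0.1, 0.1),  # weak everywhere, dominated by g1
    "g4": (0.5, 0.5, 0.5),  # moderate for all classes
    "g5": (0.4, 0.5, 0.5),  # dominated by g4
}
front = pareto_front(scores)
print(front)
```

    A gene that separates only one class well can stay on the front even when a single aggregated score would rank it low, which is the kind of class bias the abstract describes. The pairwise comparison is quadratic in the number of genes, so this form suits small illustrations only.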

  2. Scan statistic-based analysis of exome sequencing data identifies FAN1 at 15q13.3 as a susceptibility gene for schizophrenia and autism.

    PubMed

    Ionita-Laza, Iuliana; Xu, Bin; Makarov, Vlad; Buxbaum, Joseph D; Roos, J Louw; Gogos, Joseph A; Karayiorgou, Maria

    2014-01-01

    We used a family-based cluster detection approach designed to localize significant clusters of rare disease-risk variants within a region of interest to systematically search for schizophrenia (SCZ) susceptibility genes within 49 genomic loci previously implicated by de novo copy number variants. Using two independent whole-exome sequencing family datasets and a follow-up autism spectrum disorder (ASD) case/control whole-exome sequencing dataset, we identified variants in one gene, Fanconi-associated nuclease 1 (FAN1), as being associated with both SCZ and ASD. FAN1 is located in a region on chromosome 15q13.3 implicated by a recurrent copy number variant, which predisposes to an array of psychiatric and neurodevelopmental phenotypes. In both SCZ and ASD datasets, rare nonsynonymous risk variants cluster significantly in affected individuals within a 20-kb window that spans several key functional domains of the gene. Our finding suggests that FAN1 is a key driver in the 15q13.3 locus for the associated psychiatric and neurodevelopmental phenotypes. FAN1 encodes a DNA repair enzyme, thus implicating abnormalities in DNA repair in the susceptibility to SCZ or ASD. PMID:24344280

  3. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  4. Characterizations of linear sufficient statistics

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Redner, R.; Decell, H. P., Jr.

    1976-01-01

    A necessary and sufficient condition is developed such that there exists a continuous linear sufficient statistic T for a dominated collection of totally finite measures defined on the Borel field generated by the open sets of a Banach space X. In particular, corollary necessary and sufficient conditions are given so that there exists a rank K linear sufficient statistic T for any finite collection of probability measures having n-variate normal densities. In this case a simple calculation, involving only the population means and covariances, determines the smallest integer K for which there exists a rank K linear sufficient statistic T (as well as an associated statistic T itself).

  5. Transcriptome analysis of acetic-acid-treated yeast cells identifies a large set of genes whose overexpression or deletion enhances acetic acid tolerance.

    PubMed

    Lee, Yeji; Nasution, Olviyani; Choi, Eunyong; Choi, In-Geol; Kim, Wankee; Choi, Wonja

    2015-08-01

    Acetic acid inhibits the metabolic activities of Saccharomyces cerevisiae. Therefore, a better understanding of how S. cerevisiae cells acquire the tolerance to acetic acid is of importance to develop robust yeast strains to be used in industry. To do this, we examined the transcriptional changes that occur at 12 h post-exposure to acetic acid, revealing that 56 and 58 genes were upregulated and downregulated, respectively. Functional categorization of them revealed that 22 protein synthesis genes and 14 stress response genes constituted the largest portion of the upregulated and downregulated genes, respectively. To evaluate the association of the regulated genes with acetic acid tolerance, 3 upregulated genes (DBP2, ASC1, and GND1) were selected among 34 non-protein synthesis genes, and 54 viable mutants individually deleted for the downregulated genes were retrieved from the non-essential haploid deletion library. Strains overexpressing ASC1 and GND1 displayed enhanced tolerance to acetic acid, whereas a strain overexpressing DBP2 was sensitive. Fifty of 54 deletion mutants displayed enhanced acetic acid tolerance. Three chosen deletion mutants (hsps82Δ, ato2Δ, and ssa3Δ) were also tolerant to benzoic acid but not propionic and sorbic acids. Moreover, all those five (two overexpressing and three deleted) strains were more efficient in proton efflux and lower in membrane permeability and internal hydrogen peroxide content than controls. Individually or in combination, those physiological changes are likely to contribute at least in part to enhanced acetic acid tolerance. Overall, information of our transcriptional profile was very useful to identify molecular factors associated with acetic acid tolerance.

  6. Statistical Methods in Cosmology

    NASA Astrophysics Data System (ADS)

    Verde, L.

    2010-03-01

    The advent of large data sets in cosmology has meant that in the past 10 or 20 years our knowledge and understanding of the Universe has changed not only quantitatively but also, and most importantly, qualitatively. Cosmologists rely on data where a host of useful information is enclosed, but is encoded in a non-trivial way. The challenges in extracting this information must be overcome to make the most of a large experimental effort. Even after having converged to a standard cosmological model (the LCDM model) we should keep in mind that this model is described by 10 or more physical parameters and if we want to study deviations from it, the number of parameters is even larger. Dealing with such a high dimensional parameter space and finding parameter constraints is a challenge in itself. Cosmologists want to be able to compare and combine different data sets both for testing for possible disagreements (which could indicate new physics) and for improving parameter determinations. Finally, cosmologists in many cases want to find out, before actually doing the experiment, how much one would be able to learn from it. For all these reasons, sophisticated statistical techniques are being employed in cosmology, and it has become crucial to know some statistical background to understand recent literature in the field. I will introduce some statistical tools that any cosmologist should know about in order to be able to understand recently published results from the analysis of cosmological data sets. I will not present a complete and rigorous introduction to statistics as there are several good books which are reported in the references. The reader should refer to those.

  7. A distinct p53 target gene set predicts for response to the selective p53-HDM2 inhibitor NVP-CGM097.

    PubMed

    Jeay, Sébastien; Gaulis, Swann; Ferretti, Stéphane; Bitter, Hans; Ito, Moriko; Valat, Thérèse; Murakami, Masato; Ruetz, Stephan; Guthy, Daniel A; Rynn, Caroline; Jensen, Michael R; Wiesmann, Marion; Kallen, Joerg; Furet, Pascal; Gessier, François; Holzer, Philipp; Masuya, Keiichi; Würthner, Jens; Halilovic, Ensar; Hofmann, Francesco; Sellers, William R; Graus Porta, Diana

    2015-01-01

    Biomarkers for patient selection are essential for the successful and rapid development of emerging targeted anti-cancer therapeutics. In this study, we report the discovery of a novel patient selection strategy for the p53-HDM2 inhibitor NVP-CGM097, currently under evaluation in clinical trials. By intersecting high-throughput cell line sensitivity data with genomic data, we have identified a gene expression signature consisting of 13 up-regulated genes that predicts for sensitivity to NVP-CGM097 in both cell lines and in patient-derived tumor xenograft models. Interestingly, these 13 genes are known p53 downstream target genes, suggesting that the identified gene signature reflects the presence of at least a partially activated p53 pathway in NVP-CGM097-sensitive tumors. Together, our findings provide evidence for the use of this newly identified predictive gene signature to refine the selection of patients with wild-type p53 tumors and increase the likelihood of response to treatment with p53-HDM2 inhibitors, such as NVP-CGM097.

  8. A distinct p53 target gene set predicts for response to the selective p53–HDM2 inhibitor NVP-CGM097

    PubMed Central

    Jeay, Sébastien; Gaulis, Swann; Ferretti, Stéphane; Bitter, Hans; Ito, Moriko; Valat, Thérèse; Murakami, Masato; Ruetz, Stephan; Guthy, Daniel A; Rynn, Caroline; Jensen, Michael R; Wiesmann, Marion; Kallen, Joerg; Furet, Pascal; Gessier, François; Holzer, Philipp; Masuya, Keiichi; Würthner, Jens; Halilovic, Ensar; Hofmann, Francesco; Sellers, William R; Graus Porta, Diana

    2015-01-01

    Biomarkers for patient selection are essential for the successful and rapid development of emerging targeted anti-cancer therapeutics. In this study, we report the discovery of a novel patient selection strategy for the p53–HDM2 inhibitor NVP-CGM097, currently under evaluation in clinical trials. By intersecting high-throughput cell line sensitivity data with genomic data, we have identified a gene expression signature consisting of 13 up-regulated genes that predicts for sensitivity to NVP-CGM097 in both cell lines and in patient-derived tumor xenograft models. Interestingly, these 13 genes are known p53 downstream target genes, suggesting that the identified gene signature reflects the presence of at least a partially activated p53 pathway in NVP-CGM097-sensitive tumors. Together, our findings provide evidence for the use of this newly identified predictive gene signature to refine the selection of patients with wild-type p53 tumors and increase the likelihood of response to treatment with p53–HDM2 inhibitors, such as NVP-CGM097. DOI: http://dx.doi.org/10.7554/eLife.06498.001 PMID:25965177

  9. Initiating the mollusk genomics annotation community: toward creating the complete curated gene-set of the Japanese Pearl Oyster, Pinctada fucata.

    PubMed

    Kawashima, Takeshi; Takeuchi, Takeshi; Koyanagi, Ryo; Kinoshita, Shigeharu; Endo, Hirotoshi; Endo, Kazuyoshi

    2013-10-01

    The genome sequence of the Japanese pearl oyster, the first draft genome from a mollusk, was published in February 2012. In order to curate the draft genome assemblies and annotate the predicted gene models, two annotation Jamborees were held in Okinawa and Tokyo. To date, 761 genes have been surveyed and curated. A preparatory meeting and a debriefing were held at the Misaki Marine Biological Station before and after the Jamborees. These four events, in conjunction with the sequence-decoding project, have facilitated the first series of gene annotations. Genome annotators among the Jamboree participants added 22 functional categories to the annotation system to date. Of these, 17 are included in Generic Gene Ontology. The other five categories are specific to molluskan biology, such as "Byssus Formation" and "Shell Formation", including Biomineralization and Acidic Proteins. A total of 731 genes from our latest version of gene models are annotated and classified into these 22 categories. The resulting data will serve as a useful reference for future genomic analyses of this species as well as comparative analyses among mollusks.

  10. The Arabidopsis homolog of trithorax, ATX1, binds phosphatidylinositol 5-phosphate, and the two regulate a common set of target genes.

    SciTech Connect

    Alvarez-Venegas, R.; Sadder, M.; Hlavacka, A.; Baluska, F.; Xia, Y.; Firsov, A.; Sarath, G.; Moriyama, H.; Dubrovsky, J.; Avramova, Z.

    2006-01-01

    The Arabidopsis homolog of trithorax, ATX1, regulates numerous functions in Arabidopsis beyond the homeotic genes. Here, we identified genome-wide targets of ATX1 and showed that ATX1 is a receptor for a lipid messenger, phosphatidylinositol 5-phosphate, PI5P. PI5P negatively affects ATX1 activity, suggesting a regulatory pathway connecting lipid-signaling with nuclear functions. We propose a model to illustrate how plants may respond to stimuli (external or internal) that elevate cellular PI5P levels by altering expression of ATX1-controlled genes.

  11. Detecting novel associations in large data sets.

    PubMed

    Reshef, David N; Reshef, Yakir A; Finucane, Hilary K; Grossman, Sharon R; McVean, Gilean; Turnbaugh, Peter J; Lander, Eric S; Mitzenmacher, Michael; Sabeti, Pardis C

    2011-12-16

    Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships. PMID:22174245
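
    The abstract does not spell out how MIC is computed. As a rough illustration only, the sketch below follows the published definition as we understand it: bin the two variables on an x-by-y grid, compute the mutual information of the binned data, divide by log min(x, y), and take the maximum over grids with x·y below n^0.6. The function names (`equal_frequency_bins`, `mutual_information`, `mic_sketch`) are hypothetical, and fixed equal-frequency grids are swapped in for the grid optimization performed by the actual MINE software, so the values returned will generally underestimate the true MIC.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) of two equally long lists of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        c / n * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def mic_sketch(x, y, exponent=0.6):
    """Approximate MIC using equal-frequency grids only (an assumption)."""
    budget = len(x) ** exponent  # upper limit on nx * ny
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = mutual_information(
                equal_frequency_bins(x, nx), equal_frequency_bins(y, ny)
            )
            best = max(best, mi / math.log(min(nx, ny)))
    return best


# A noiseless parabola: a functional but non-monotonic relationship.
xs = list(range(-50, 51))
parabola_score = mic_sketch(xs, [v * v for v in xs])
```

    On the noiseless parabola the score stays close to 1 even though the linear correlation between the two variables is zero, which is the kind of association the abstract says MIC is meant to capture.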

  12. Detecting novel associations in large data sets.

    PubMed

    Reshef, David N; Reshef, Yakir A; Finucane, Hilary K; Grossman, Sharon R; McVean, Gilean; Turnbaugh, Peter J; Lander, Eric S; Mitzenmacher, Michael; Sabeti, Pardis C

    2011-12-16

    Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

  13. A comparative analysis of perturbations caused by a gene knock-out, a dominant negative allele, and a set of peptide aptamers.

    PubMed

    Abed, Nadia; Bickle, Marc; Mari, Bernard; Schapira, Matthieu; Sanjuan-España, Raquel; Robbe Sermesant, Karine; Moncorgé, Olivier; Mouradian-Garcia, Sandrine; Barbry, Pascal; Rudkin, Brian B; Fauvarque, Marie-Odile; Michaud-Soret, Isabelle; Colas, Pierre

    2007-12-01

    The study of protein function mostly relies on perturbing regulatory networks by acting upon protein expression levels or using transdominant negative agents. Here we used the Escherichia coli global transcription regulator Fur (ferric uptake regulator) as a case study to compare the perturbations exerted by a gene knock-out, the expression of a dominant negative allele of a gene, and the expression of peptide aptamers that bind a gene product. These three perturbations caused phenotypes that differed quantitatively and qualitatively from one another. The Fur peptide aptamers inhibited the activity of their target to various extents and reduced the virulence of a pathogenic E. coli strain in Drosophila. A genome-wide transcriptome analysis revealed that the "penetrance" of a peptide aptamer was comparable to that of a dominant negative allele but lower than the penetrance of the gene knock-out. Our work shows that comparative analysis of phenotypic and transcriptome responses to different types of perturbation can help decipher complex regulatory networks that control various biological processes.

  14. Altered regulation of MHC class I genes in different tumor cell lines is reflected by distinct sets of DNase I hypersensitive sites.

    PubMed Central

    Maschek, U; Pülm, W; Hämmerling, G J

    1989-01-01

    MHC class I antigens play a crucial role in immunological functions, e.g. transplant and tumor rejection and antigen presentation. Whereas class I antigens are normally expressed on most adult tissues, albeit in varying amounts, embryonic as well as many tumor cells are characterized by the absence of major histocompatibility complex class I antigens on their cell surfaces. In this study the mechanism controlling the lack of class I expression was analyzed at the level of the chromatin structure. Five DNase I hypersensitive sites were determined at the H-2 D-locus of cell lines constitutively expressing class I genes. Two of them (DH1, located at the TATA box/transcription start site, and DH2, located at the enhancer/interferon response sequence) were absent in the fibrosarcoma IC9 in which expression of the silent Dk class I gene was not inducible. DH1 and DH2 remained absent even after fusion with class-I-positive cells. However, transfected Dk genes were expressed in IC9, and both DH1 and DH2, which were probably derived from the transfected gene, became detectable. In tumor cells expressing class I genes only after treatment with IFN-gamma (e.g. the lung carcinoma CMT64.5) or after in vitro differentiation (F9 embryonal carcinoma cells), DH1 and DH2 were already present before induction of class I expression. However, the intensity of the band indicative of DH2 was reduced in undifferentiated and differentiated F9 cells and in untreated CMT cells. PMID:2507317

  15. Characterization of genes controlling stamen identity and development in a parthenocarpic tomato mutant indicates a role for the DEFICIENS ortholog in the control of fruit set.

    PubMed

    Mazzucato, Andrea; Olimpieri, Irene; Siligato, Francesca; Picarella, Maurizio Enea; Soressi, Gian Piero

    2008-04-01

    The development of the ovary into a fruit depends on pollination and fertilization. It has been proposed that the restriction of ovary growth before pollination is because of the stamens acting as negative regulators. Accordingly, the silencing of genes responsible for stamen identity has been correlated with parthenocarpy in different species. The tomato (Solanum lycopersicum L.) parthenocarpic fruit (pat) mutation associates autonomous ovary development with homeotic transformation of the anthers and aberrancy of ovules in the ovary. In this study, we tested the hypothesis that stamen aberrations and parthenocarpy in pat are driven by cues coming from the altered expression of class B MADS box genes. The data showed that the Pat locus is not allelic to either of the two tomato mutations putatively involved in the B function, stamenless (sl)-2 and pistillate (pi) or to genes encoding class B transcription factors. Whereas pat pi double mutants were not recovered because of tight linkage, pat sl-2 double mutants showed mainly epistatic effects. The developmental regulation of the Sl DEFICIENS (DEF) gene in the wild-type (WT) at anthesis as well as its differential transcription in the pat ovary suggest that it plays a role in the control of ovary growth. Accordingly, when compared with the WT, the gene was also differentially expressed in the parthenocarpic fruit-2 (pat-2) mutant, that is not allelic to pat and has normal ovule development. Altogether the results indicate that in tomato SlDEF plays a role in the control of ovary growth and that the pat mutation is located upstream of this regulatory cascade. PMID:18334005

  16. Statistical and computational challenges in physical mapping

    SciTech Connect

    Nelson, D.O.; Speed, T.P.

    1994-06-01

    One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like Huntington's disease, cystic fibrosis, and myotonic dystrophy. Instrumental in these efforts has been the construction of so-called "physical maps" of large regions of human chromosomes. Constructing a physical map of a chromosome presents a number of interesting challenges to the computational statistician. In addition to the general ill-posedness of the problem, complications include the size of the data sets, computational complexity, and the pervasiveness of experimental error. The nature of the problem and the presence of many levels of experimental uncertainty make statistical approaches to map construction appealing. Simultaneously, however, the size and combinatorial complexity of the problem make such approaches computationally demanding. In this paper we discuss what physical maps are and describe three different kinds of physical maps, outlining issues which arise in constructing them. In addition, we describe our experience with powerful, interactive statistical computing environments. We found that the ability to create high-level specifications of proposed algorithms which could then be directly executed provided a flexible rapid prototyping facility for developing new statistical models and methods. The ability to check the implementation of an algorithm by comparing its results to that of an executable specification enabled us to rapidly debug both specification and implementation in an environment of changing needs.

  17. Statistical concepts in metrology with a postscript on statistical graphics

    NASA Astrophysics Data System (ADS)

    Ku, Harry H.

    1988-08-01

    Statistical Concepts in Metrology was originally written as Chapter 2 for the Handbook of Industrial Metrology published by the American Society of Tool and Manufacturing Engineers, 1967. It was reprinted as one of 40 papers in NBS Special Publication 300, Volume 1, Precision Measurement and Calibration; Statistical Concepts and Procedures, 1969. Since then this chapter has been used as basic text in statistics in Bureau-sponsored courses and seminars, including those for Electricity, Electronics, and Analytical Chemistry. While concepts and techniques introduced in the original chapter remain valid and appropriate, some additions on recent development of graphical methods for the treatment of data would be useful. Graphical methods can be used effectively to explore information in data sets prior to the application of classical statistical procedures. For this reason additional sections on statistical graphics are added as a postscript.

  18. Cosmetic Plastic Surgery Statistics

    MedlinePlus

    2014 Cosmetic Plastic Surgery Statistics Cosmetic Procedure Trends 2014 Plastic Surgery Statistics Report Please credit the AMERICAN SOCIETY OF PLASTIC SURGEONS when citing statistical data or using ...

  19. Identification of the Set of Genes, Including Nonannotated morA, under the Direct Control of ModE in Escherichia coli

    PubMed Central

    Kurata, Tatsuaki; Katayama, Akira; Hiramatsu, Masakazu; Kiguchi, Yuya; Takeuchi, Masamitsu; Watanabe, Tomoyuki; Ogasawara, Hiroshi; Ishihama, Akira

    2013-01-01

    ModE is the molybdate-sensing transcription regulator that controls the expression of genes related to molybdate homeostasis in Escherichia coli. ModE is activated by binding molybdate and acts as both an activator and a repressor. By genomic systematic evolution of ligands by exponential enrichment (SELEX) screening and promoter reporter assays, we have identified a total of nine operons, including the hitherto identified modA, moaA, dmsA, and napF operons, of which six were activated by ModE and three were repressed. In addition, two promoters were newly identified and direct transcription of novel genes, referred to as morA and morB, located on antisense strands of yghW and torY, respectively. The morA gene encodes a short peptide, MorA, with an unusual initiation codon. Surprisingly, overexpression of the morA 5′ untranslated region exhibited an inhibitory influence on colony formation of E. coli K-12. PMID:23913318

  20. Suicide plus immune gene therapy prevents post-surgical local relapse and increases overall survival in an aggressive mouse melanoma setting.

    PubMed

    Villaverde, Marcela S; Combe, Kristell; Duchene, Adriana G; Wei, Ming X; Glikin, Gerardo C; Finocchiaro, Liliana M E

    2014-09-01

    In an aggressive B16-F10 murine melanoma model, we evaluated the effectiveness and antitumor mechanisms triggered by a surgery adjuvant treatment that combined a local suicide gene therapy (SG) with a subcutaneous genetic vaccine (Vx) composed of B16-F10 cell extracts and lipoplexes carrying the genes of human interleukin-2 and murine granulocyte and macrophage colony stimulating factor. Pre-surgical SG treatment, whether alone or combined with Vx, was unable to slow down the fast evolution of this tumor. After surgery, both SG and SG + Vx treatments significantly prevented (in 50% of mice) or delayed (in the remaining 50%) post-surgical recurrence, and significantly prolonged recurrence-free (SG and SG + Vx) and overall median survival (SG + Vx). The treatment induced the generation of a pseudocapsule wrapping and separating the tumor from surrounding host tissue. Both SG and the subcutaneous Vx induced this envelope, which was absent in the control group. On the other hand, PET scan imaging of the SG + Vx group suggested the development of an effective systemic immunostimulation that enhanced ¹⁸FDG accrual in the thymus, spleen and vertebral column. When combined with surgery, direct intralesional injection of suicide gene plus distal subcutaneous genetic vaccine displayed efficacy and systemic antitumor immune response without host toxicity. This suggests the potential value of the assayed approach for clinical purposes.

  1. SETS. Set Equation Transformation System

    SciTech Connect

    Worrell, R.B.

    1992-01-13

    SETS is used for symbolic manipulation of Boolean equations, particularly the reduction of equations by the application of Boolean identities. It is a flexible and efficient tool for performing probabilistic risk analysis (PRA), vital area analysis, and common cause analysis. The equation manipulation capabilities of SETS can also be used to analyze noncoherent fault trees and determine prime implicants of Boolean functions, to verify circuit design implementation, to determine minimum cost fire protection requirements for nuclear reactor plants, to obtain solutions to combinatorial optimization problems with Boolean constraints, and to determine the susceptibility of a facility to unauthorized access through nullification of sensors in its protection system.
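
    To make "reduction of equations by the application of Boolean identities" concrete, here is a minimal sketch, not SETS itself. It assumes a coherent expression (no complemented events) held in sum-of-products form, and applies only two identities: idempotence (A·A = A) and absorption (A + A·B = A). The helper names `absorb`, `and_` and `or_` and the toy fault tree are hypothetical.

```python
from itertools import product


def absorb(terms):
    """Apply absorption (A + A*B = A): drop any term that strictly contains another."""
    terms = set(map(frozenset, terms))
    return {t for t in terms if not any(other < t for other in terms)}


def and_(left, right):
    """AND two sum-of-products expressions.

    Each product term is a frozenset of event names, so A*A = A
    (idempotence) falls out of the set union.
    """
    return absorb(a | b for a, b in product(left, right))


def or_(left, right):
    """OR two sum-of-products expressions, then simplify by absorption."""
    return absorb(set(left) | set(right))


# Toy fault tree (hypothetical): TOP = (A + B) * (A + C)
A = {frozenset({"A"})}
B = {frozenset({"B"})}
C = {frozenset({"C"})}

top = and_(or_(A, B), or_(A, C))
# top is now {A}, {B, C}: the minimal cut sets of the toy tree
```

    Representing each product term as a set keeps the reduced result unique for coherent expressions. The noncoherent fault trees and prime implicants mentioned in the abstract involve complemented events and need more machinery than this sketch provides.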

  2. Characterizations of linear sufficient statistics

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Redner, R.; Decell, H. P., Jr.

    1977-01-01

    A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.

  3. A gene-environment investigation on personality traits in two independent clinical sets of adult patients with personality disorder and attention deficit/hyperactive disorder.

    PubMed

    Jacob, Christian P; Nguyen, Thuy Trang; Dempfle, Astrid; Heine, Monika; Windemuth-Kieselbach, Christine; Baumann, Katarina; Jacob, Florian; Prechtl, Julian; Wittlich, Maike; Herrmann, Martin J; Gross-Lesch, Silke; Lesch, Klaus-Peter; Reif, Andreas

    2010-06-01

    While an interactive effect of genes with adverse life events is increasingly appreciated in current concepts of depression etiology, no data are presently available on interactions between genetic and environmental (G x E) factors with respect to personality and related disorders. The present study therefore aimed to detect main effects as well as interactions of serotonergic candidate genes (coding for the serotonin transporter, 5-HTT; the serotonin autoreceptor, HTR1A; and the enzyme which synthesizes serotonin in the brain, TPH2) with the burden of life events (#LE) in two independent samples consisting of 183 patients suffering from personality disorders and 123 patients suffering from adult attention deficit/hyperactivity disorder (aADHD). Simple analyses ignoring possible G x E interactions revealed no evidence for associations of either #LE or of the considered polymorphisms in 5-HTT and TPH2. Only the G allele of HTR1A rs6295 seemed to increase the risk of emotional-dramatic cluster B personality disorders (p = 0.019, in the personality disorder sample) and to decrease the risk of anxious-fearful cluster C personality disorders (p = 0.016, in the aADHD sample). We extended the initial simple model by taking a G x E interaction term into account, since this approach may better fit the data indicating that the effect of a gene is modified by stressful life events or, vice versa, that stressful life events only have an effect in the presence of a susceptibility genotype. By doing so, we observed nominal evidence for G x E effects as well as main effects of 5-HTT-LPR and the TPH2 SNP rs4570625 on the occurrence of personality disorders. Further replication studies, however, are necessary to validate the apparent complexity of G x E interactions in disorders of human personality.
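
    The abstract does not state the regression model that was fitted. Purely as an illustration, and assuming a logistic model for a binary diagnosis D with G coding the genotype and E the burden of life events (#LE), the simple and extended models could be written as follows (all symbols are assumptions, not taken from the paper):

```latex
\begin{align}
\text{simple:}\quad
  & \operatorname{logit} P(D = 1 \mid G, E) = \beta_0 + \beta_G G + \beta_E E \\
\text{extended:}\quad
  & \operatorname{logit} P(D = 1 \mid G, E) = \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E
\end{align}
```

    Under this form, the change in log-odds per additional life event is \beta_E + \beta_{GE} G, so the effect of life events depends on genotype whenever \beta_{GE} \neq 0, and vice versa.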

  4. A Functional Genomic Screen Combined with Time-Lapse Microscopy Uncovers a Novel Set of Genes Involved in Dorsal Closure of Drosophila Embryos

    PubMed Central

    Jankovics, Ferenc; Henn, László; Bujna, Ágnes; Vilmos, Péter; Kiss, Nóra; Erdélyi, Miklós

    2011-01-01

    Morphogenesis, the establishment of the animal body, requires the coordinated rearrangement of cells and tissues regulated by a very strictly-determined genetic program. Dorsal closure of the epithelium in the Drosophila melanogaster embryo is one of the best models for such a complex morphogenetic event. To explore the genetic regulation of dorsal closure, we carried out a large-scale RNA interference-based screen in combination with in vivo time-lapse microscopy and identified several genes essential for the closure or affecting its dynamics. One of the novel dorsal closure genes, the small GTPase activator pebble (pbl), was selected for detailed analysis. We show that pbl regulates actin accumulation and protrusion dynamics in the leading edge of the migrating epithelial cells. In addition, pbl affects dorsal closure dynamics by regulating head involution, a morphogenetic process mechanically coupled with dorsal closure. Finally, we provide evidence that pbl is involved in closure of the adult thorax, suggesting its general requirement in epithelial closure processes. PMID:21799798

  5. Arabidopsis Flower and Embryo Developmental Genes are Repressed in Seedlings by Different Combinations of Polycomb Group Proteins in Association with Distinct Sets of Cis-regulatory Elements

    PubMed Central

    Liu, Jian; Zhang, Lei; He, Chongsheng; Shen, Wen-Hui; Jin, Hong; Xu, Lin; Zhang, Yijing

    2016-01-01

    Polycomb repressive complexes (PRCs) play crucial roles in transcriptional repression and developmental regulation in both plants and animals. In plants, depletion of different members of PRCs causes both overlapping and unique phenotypic defects. However, the underlying molecular mechanism determining the target specificity and functional diversity is not sufficiently characterized. Here, we quantitatively compared changes of tri-methylation at H3K27 in Arabidopsis mutants deprived of various key PRC components. We show that CURLY LEAF (CLF), a major catalytic subunit of PRC2, coordinates with different members of PRC1 in suppression of distinct plant developmental programs. We found that expression of flower development genes is repressed in seedlings preferentially via non-redundant role of CLF, which specifically associated with LIKE HETEROCHROMATIN PROTEIN1 (LHP1). In contrast, expression of embryo development genes is repressed by PRC1-catalytic core subunits AtBMI1 and AtRING1 in common with PRC2-catalytic enzymes CLF or SWINGER (SWN). This context-dependent role of CLF corresponds well with the change in H3K27me3 profiles, and is remarkably associated with differential co-occupancy of binding motifs of transcription factors (TFs), including MADS box and ABA-related factors. We propose that different combinations of PRC members distinctively regulate different developmental programs, and their target specificity is modulated by specific TFs. PMID:26760036

  6. Arsenic tolerances in rice (Oryza sativa) have a predominant role in transcriptional regulation of a set of genes including sulphur assimilation pathway and antioxidant system.

    PubMed

    Rai, Arti; Tripathi, Preeti; Dwivedi, Sanjay; Dubey, Sonali; Shri, Manju; Kumar, Smita; Tripathi, Pankaj Kumar; Dave, Richa; Kumar, Amit; Singh, Ragini; Adhikari, Bijan; Bag, Manas; Tripathi, Rudra Deo; Trivedi, Prabodh K; Chakrabarty, Debasis; Tuli, Rakesh

    2011-02-01

    Worldwide arsenic (As) contamination of rice has raised much concern as it is the staple crop for millions. Four most commonly cultivated rice cultivars, Triguna, IR-36, PNR-519 and IET-4786, of the West Bengal region were taken for a hydroponic study to examine the effect of arsenate (As(V)) and arsenite (As(III)) on growth response, expression of genes and antioxidants vis-à-vis As accumulation. The rice genotypes responded differentially under As(V) and As(III) stress in terms of gene expression and antioxidant defences. Some of the transporters were up-regulated in all rice cultivars at lower doses of As species, except IET-4786. Phytochelatin synthase, GST and γ-ECS showed considerable variation in their expression pattern in all genotypes; however, in IET-4786 they were generally down-regulated under higher As(III) stress. Similarly, most of the antioxidants, such as superoxide dismutase (SOD), ascorbate peroxidase (APX), guaiacol peroxidase (GPX), catalase (CAT), monodehydroascorbate reductase (MDHAR) and dehydroascorbate reductase (DHAR), increased significantly in Triguna, IR-36 and PNR-519 and decreased in IET-4786. Our study suggests that Triguna, IR-36 and PNR-519 are tolerant rice cultivars accumulating higher arsenic; however, IET-4786 is susceptible to As-stress and accumulates less arsenic than other cultivars. PMID:21075415

  7. A decision-theory approach to interpretable set analysis for high-dimensional data.

    PubMed

    Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni

    2013-09-01

    A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
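
    The abstract describes atoms only as non-overlapping sets derived from the original annotations. One natural reading, assumed here and not confirmed by the abstract, is that two genes fall in the same atom exactly when they belong to the same collection of annotated sets. The function `atoms` and the pathway and gene names below are hypothetical.

```python
from collections import defaultdict


def atoms(annotations):
    """Partition annotated genes into atoms.

    annotations maps a set name to the genes it contains. The result maps
    each distinct membership signature (a frozenset of set names) to the
    genes that carry exactly that signature.
    """
    membership = defaultdict(set)
    for set_name, genes in annotations.items():
        for gene in genes:
            membership[gene].add(set_name)

    grouped = defaultdict(set)
    for gene, names in membership.items():
        grouped[frozenset(names)].add(gene)
    return dict(grouped)


# Two overlapping (hypothetical) gene sets sharing gene g3.
example = {
    "pathway_1": {"g1", "g2", "g3"},
    "pathway_2": {"g3", "g4"},
}
result = atoms(example)
# result has three atoms: {g1, g2}, {g3} and {g4}
```

    Under this reading every original set is a union of atoms, and no gene is counted twice, which is what lets a set-level error rate be defined without double-counting overlapping annotations.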

  8. A decision-theory approach to interpretable set analysis for high-dimensional data.

    PubMed

    Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni

    2013-09-01

    A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses. PMID:23909925

  9. Toolbox Approaches Using Molecular Markers and 16S rRNA Gene Amplicon Data Sets for Identification of Fecal Pollution in Surface Water

    PubMed Central

    Staley, C.; Sadowsky, M. J.; Gyawali, P.; Sidhu, J. P. S.; Palmer, A.; Beale, D. J.; Toze, S.

    2015-01-01

    In this study, host-associated molecular markers and bacterial 16S rRNA gene community analysis using high-throughput sequencing were used to identify the sources of fecal pollution in environmental waters in Brisbane, Australia. A total of 92 fecal and composite wastewater samples were collected from different host groups (cat, cattle, dog, horse, human, and kangaroo), and 18 water samples were collected from six sites (BR1 to BR6) along the Brisbane River in Queensland, Australia. Bacterial communities in the fecal, wastewater, and river water samples were sequenced. Water samples were also tested for the presence of bird-associated (GFD), cattle-associated (CowM3), horse-associated, and human-associated (HF183) molecular markers, to provide multiple lines of evidence regarding the possible presence of fecal pollution associated with specific hosts. Among the 18 water samples tested, 83%, 33%, 17%, and 17% were real-time PCR positive for the GFD, HF183, CowM3, and horse markers, respectively. Among the potential sources of fecal pollution in water samples from the river, DNA sequencing tended to show relatively small contributions from wastewater treatment plants (up to 13% of sequence reads). Contributions from other animal sources were rarely detected and were very small (<3% of sequence reads). Source contributions determined via sequence analysis versus detection of molecular markers showed variable agreement. A lack of relationships among fecal indicator bacteria, host-associated molecular markers, and 16S rRNA gene community analysis data was also observed. Nonetheless, we show that bacterial community and host-associated molecular marker analyses can be combined to identify potential sources of fecal pollution in an urban river. 
    This study is a proof of concept, and based on the results, we recommend using bacterial community analysis (where possible) along with PCR detection or quantification of host-associated molecular markers to provide information on

  10. Transcription factor CecR (YbiH) regulates a set of genes affecting the sensitivity of Escherichia coli against cefoperazone and chloramphenicol.

    PubMed

    Yamanaka, Yuki; Shimada, Tomohiro; Yamamoto, Kaneyoshi; Ishihama, Akira

    2016-07-01

    Genomic SELEX (systematic evolution of ligands by exponential enrichment) screening was performed for identification of the binding site of YbiH, an as yet uncharacterized TetR-family transcription factor, on the Escherichia coli genome. YbiH was found to be a unique single-target regulator that binds in vitro within the intergenic spacer located between the divergently transcribed ybiH-ybhGFSR and rhlE operons. YbhG is an inner membrane protein and YbhFSR forms a membrane-associated ATP-binding cassette (ABC) transporter while RhlE is a ribosome-associated RNA helicase. Gel shift assay and DNase footprinting analyses indicated one clear binding site of YbiH, including a complete palindromic sequence of AATTAGTT-AACTAATT. An in vivo reporter assay indicated repression of the ybiH operon and activation of the rhlE operon by YbiH. After phenotype microarray screening, YbiH was indicated to confer resistance to chloramphenicol and cefazoline (a first-generation cephalosporin). A systematic survey of the participation of each of the predicted YbiH-regulated genes in the antibiotic sensitivity indicated involvement of the YbhFSR ABC-type transporter in the sensitivity to cefoperazone (a third-generation cephalosporin) and of the membrane protein YbhG in the control of sensitivity to chloramphenicol. Taken together with the growth test in the presence of these two antibiotics and in vitro transcription assay, it was concluded that the hitherto uncharacterized YbiH regulates transcription of both the bidirectional transcription units, the ybiH-ybhGFSR operon and the rhlE gene, which altogether are involved in the control of sensitivity to cefoperazone and chloramphenicol. We thus propose to rename YbiH as CecR (regulator of cefoperazone and chloramphenicol sensitivity). PMID:27112147

  11. Sorghum expressed sequence tags identify signature genes for drought, pathogenesis, and skotomorphogenesis from a milestone set of 16,801 unique transcripts.

    PubMed

    Pratt, Lee H; Liang, Chun; Shah, Manish; Sun, Feng; Wang, Haiming; Reid, St Patrick; Gingle, Alan R; Paterson, Andrew H; Wing, Rod; Dean, Ralph; Klein, Robert; Nguyen, Henry T; Ma, Hong-Mei; Zhao, Xin; Morishige, Daryl T; Mullet, John E; Cordonnier-Pratt, Marie-Michèle

    2005-10-01

    Improved knowledge of the sorghum transcriptome will enhance basic understanding of how plants respond to stresses and serve as a source of genes of value to agriculture. Toward this goal, Sorghum bicolor L. Moench cDNA libraries were prepared from light- and dark-grown seedlings, drought-stressed plants, Colletotrichum-infected seedlings and plants, ovaries, embryos, and immature panicles. Other libraries were prepared with meristems from Sorghum propinquum (Kunth) Hitchc. that had been photoperiodically induced to flower, and with rhizomes from S. propinquum and johnsongrass (Sorghum halepense L. Pers.). A total of 117,682 expressed sequence tags (ESTs) were obtained representing both 3' and 5' sequences from about half that number of cDNA clones. A total of 16,801 unique transcripts, representing tentative UniScripts (TUs), were identified from 55,783 3' ESTs. Of these TUs, 9,032 are represented by two or more ESTs. Collectively, these libraries were predicted to contain a total of approximately 31,000 TUs. Individual libraries, however, were predicted to contain no more than about 6,000 to 9,000, with the exception of light-grown seedlings, which yielded an estimate of close to 13,000. In addition, each library exhibits about the same level of complexity with respect to both the number of TUs preferentially expressed in that library and the frequency with which two or more ESTs is found in only that library. These results indicate that the sorghum genome is expressed in highly selective fashion in the individual organs and in response to the environmental conditions surveyed here. Close to 2,000 differentially expressed TUs were identified among the cDNA libraries examined, of which 775 were differentially expressed at a confidence level of 98%. From these 775 TUs, signature genes were identified defining drought, Colletotrichum infection, skotomorphogenesis (etiolation), ovary, immature panicle, and embryo.

  12. Gene-set and multivariate genome-wide association analysis of oppositional defiant behavior subtypes in attention-deficit/hyperactivity disorder.

    PubMed

    Aebi, Marcel; van Donkelaar, Marjolein M J; Poelmans, Geert; Buitelaar, Jan K; Sonuga-Barke, Edmund J S; Stringaris, Argyris; IMAGE Consortium; Faraone, Stephen V; Franke, Barbara; Steinhausen, Hans-Christoph; van Hulzen, Kimm J E

    2016-07-01

    Oppositional defiant disorder (ODD) is a frequent psychiatric disorder seen in children and adolescents with attention-deficit-hyperactivity disorder (ADHD). ODD is also a common antecedent to both affective disorders and aggressive behaviors. Although the heritability of ODD has been estimated to be around 0.60, there has been little research into the molecular genetics of ODD. The present study examined the association of irritable and defiant/vindictive dimensions and categorical subtypes of ODD (based on latent class analyses) with previously described specific polymorphisms (DRD4 exon3 VNTR, 5-HTTLPR, and seven OXTR SNPs) as well as with dopamine, serotonin, and oxytocin genes and pathways in a clinical sample of children and adolescents with ADHD. In addition, we performed a multivariate genome-wide association study (GWAS) of the aforementioned ODD dimensions and subtypes. Apart from adjusting the analyses for age and sex, we controlled for "parental ability to cope with disruptive behavior." None of the hypothesis-driven analyses revealed a significant association with ODD dimensions and subtypes. Inadequate parenting behavior was significantly associated with all ODD dimensions and subtypes, most strongly with defiant/vindictive behaviors. In addition, the GWAS did not result in genome-wide significant findings but bioinformatics and literature analyses revealed that the proteins encoded by 28 of the 53 top-ranked genes functionally interact in a molecular landscape centered around Beta-catenin signaling and involved in the regulation of neurite outgrowth. Our findings provide new insights into the molecular basis of ODD and inform future genetic studies of oppositional behavior. © 2015 The Authors. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics Published by Wiley Periodicals, Inc.

  13. The Statistical Fermi Paradox

    NASA Astrophysics Data System (ADS)

    Maccone, C.

    This paper provides the statistical generalization of the Fermi paradox. The statistics of habitable planets may be based on a set of ten (and possibly more) astrobiological requirements first pointed out by Stephen H. Dole in his book Habitable planets for man (1964). The statistical generalization of the original and by now too simplistic Dole equation is provided by replacing a product of ten positive numbers by the product of ten positive random variables. This is denoted the SEH, an acronym standing for “Statistical Equation for Habitables”. The proof in this paper is based on the Central Limit Theorem (CLT) of Statistics, stating that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable (Lyapunov form of the CLT). It is then shown that: 1. The new random variable NHab, yielding the number of habitables (i.e. habitable planets) in the Galaxy, follows the log-normal distribution. By construction, the mean value of this log-normal distribution is the total number of habitable planets as given by the statistical Dole equation. 2. The ten (or more) astrobiological factors are now positive random variables. The probability distribution of each random variable may be arbitrary. The CLT in the so-called Lyapunov or Lindeberg forms (neither of which assumes the factors to be identically distributed) allows for that. In other words, the CLT "translates" into the SEH by allowing an arbitrary probability distribution for each factor. This is both astrobiologically realistic and useful for any further investigations. 3. By applying the SEH it is shown that the (average) distance between any two nearby habitable planets in the Galaxy is inversely proportional to the cubic root of NHab. This distance is denoted by the new random variable D. The relevant probability density function is derived, which was named the "Maccone distribution" by Paul Davies in
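
    The link between a product of positive random variables and the log-normal distribution can be checked numerically: the logarithm of the product is a sum of independent terms, so the CLT makes it approximately Gaussian. The sketch below is only an illustration under assumed inputs; the ten factors are drawn from a hypothetical uniform distribution on (0.5, 1.5), which is not taken from the paper.

```python
import math
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)
N_FACTORS = 10    # ten factors, as in the Dole equation
N_DRAWS = 20000   # number of simulated products

# Each draw is a product of ten independent positive random variables.
# The uniform(0.5, 1.5) law is a hypothetical choice for illustration.
products = [
    math.prod(random.uniform(0.5, 1.5) for _ in range(N_FACTORS))
    for _ in range(N_DRAWS)
]
log_products = [math.log(p) for p in products]

skew_raw = skewness(products)      # the product itself is right-skewed
skew_log = skewness(log_products)  # its logarithm is close to symmetric
```

    A skewness near zero for the logarithm, next to a clearly positive skewness for the raw product, is what an approximately log-normal product would be expected to show.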

  14. Predict! Teaching Statistics Using Informational Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie

    2013-01-01

    Statistics is one of the most widely used topics for everyday life in the school mathematics curriculum. Unfortunately, the statistics taught in schools focuses on calculations and procedures before students have a chance to see it as a useful and powerful tool. Researchers have found that a dominant view of statistics is as an assortment of tools…

  15. Biological cluster evaluation for gene function prediction.

    PubMed

    Klie, Sebastian; Nikoloski, Zoran; Selbig, Joachim

    2014-06-01

    Recent advances in high-throughput omics techniques render it possible to decode the function of genes by using the "guilt-by-association" principle on biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice for the number of clusters, and (2) the external measures which do not take into consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is shown to be valid on synthetic data and available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering of optimal score and the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set.

  16. Compare Gene Profiles

    SciTech Connect

    2014-05-31

    Compare Gene Profiles (CGP) performs pairwise gene content comparisons among a relatively large set of related bacterial genomes. CGP performs pairwise BLAST among gene calls from a set of input genome and associated annotation files, and combines the results to generate lists of common genes, unique genes, homologs, and genes from each genome that differ substantially in length from corresponding genes in the other genomes. CGP is implemented in Python and runs in a Linux environment in serial or parallel mode.

  17. Biostatistical and medical statistics graduate education.

    PubMed

    Brimacombe, Michael B

    2014-01-28

    The development of graduate education in biostatistics and medical statistics is discussed in the context of training within a medical center setting. The need for medical researchers to employ a wide variety of statistical designs in clinical, genetic, basic science and translational settings justifies the ongoing integration of biostatistical training into medical center educational settings and informs its content. The integration of large data issues is a challenge.

  18. Biostatistical and medical statistics graduate education

    PubMed Central

    2014-01-01

    The development of graduate education in biostatistics and medical statistics is discussed in the context of training within a medical center setting. The need for medical researchers to employ a wide variety of statistical designs in clinical, genetic, basic science and translational settings justifies the ongoing integration of biostatistical training into medical center educational settings and informs its content. The integration of large data issues is a challenge. PMID:24472088

  19. A cautionary note on the rank product statistic.

    PubMed

    Koziol, James A

    2016-06-01

    The rank product method introduced by Breitling R et al. [2004, FEBS Letters 573, 83-92] has rapidly gained popularity in practical settings, in particular for detecting differential expression of genes in microarray experiments. The purpose of this note is to point out a particular property of the rank product method, namely, its differential sensitivity to over- and underexpression. It turns out that overexpression is less likely to be detected than underexpression with the rank product statistic. We have conducted both empirical and exact power studies that demonstrate this phenomenon, and summarize these findings in this note.
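
    For readers unfamiliar with the statistic, the sketch below computes a rank product in its commonly described form: each gene is ranked within every replicate, and the ranks are combined by a geometric mean. The function name, the toy fold changes, and the convention that rank 1 is the largest fold change are assumptions for illustration; ties are not handled, and the sketch does not reproduce the power studies of the note.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of per-replicate ranks for each gene.

    fold_changes maps a gene name to a list of fold changes,
    one per replicate. Rank 1 is the largest fold change.
    """
    genes = list(fold_changes)
    n_replicates = len(next(iter(fold_changes.values())))
    ranks = {gene: [] for gene in genes}
    for i in range(n_replicates):
        # Order genes within replicate i, largest fold change first.
        ordered = sorted(genes, key=lambda g: fold_changes[g][i], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            ranks[gene].append(rank)
    return {g: math.prod(r) ** (1.0 / n_replicates) for g, r in ranks.items()}


# Hypothetical toy data: three genes, three replicates.
toy = {
    "geneA": [3.0, 2.5, 2.8],
    "geneB": [1.2, 2.0, 0.9],
    "geneC": [0.8, 1.1, 1.0],
}
rp = rank_products(toy)
```

    A small rank product indicates a gene that ranks consistently near the top across replicates; here geneA is ranked first in every replicate.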

  20. The role of horizontal gene transfer in the dissemination of extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniae isolates in an endemic setting

    PubMed Central

    Doi, Yohei; Adams-Haduch, Jennifer M.; Peleg, Anton Y.; D’Agata, Erika MC

    2012-01-01

    The contribution of horizontal gene transmission (HGT) in the emergence and spread of extended-spectrum beta-lactamase (ESBL)-producing gram-negative bacteria during periods of endemicity is unclear. Over a 12-month period, rectal colonization with SHV-5- and SHV-12-producing Escherichia coli and Klebsiella pneumoniae was quantified among a cohort of residents in a long-term care facility. Demographic and clinical data were collected on colonized residents. Transferability of SHV-encoding plasmids was assessed and pulsed-field gel electrophoresis was performed to quantify the contribution of HGT and cross-transmission, respectively. A total of 25 (12%) of 214 enrolled patients were colonized with 11 SHV-5- and 17 SHV-12-producing E. coli and K. pneumoniae. Clonally-related isolates were detected among multiple residents residing on the same and different wards. Among 12 clonally-distinct isolates, HGT of SHV-5- and SHV-12-encoding plasmids was identified among 6 (50%) isolates. HGT among clonally-distinct strains contributes to the transmission dynamics of these ESBL-producing gram-negative bacteria and should be considered when evaluating the spread of these pathogens. PMID:22722012

  1. Statistical inference of regulatory networks for circadian regulation.

    PubMed

    Aderhold, Andrej; Husmeier, Dirk; Grzegorczyk, Marco

    2014-06-01

    We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana. PMID:24864301

  2. Statistical inference of regulatory networks for circadian regulation.

    PubMed

    Aderhold, Andrej; Husmeier, Dirk; Grzegorczyk, Marco

    2014-06-01

    We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.

  3. Improved detection of malaria cases in island settings of Vanuatu and Kenya by PCR that targets the Plasmodium mitochondrial cytochrome c oxidase III (cox3) gene.

    PubMed

    Isozumi, Rie; Fukui, Mayumi; Kaneko, Akira; Chan, Chim W; Kawamoto, Fumihiko; Kimura, Masatsugu

    2015-06-01

    Detection of sub-microscopic parasitemia is crucial for all malaria elimination programs. PCR-based methods have proven to be sensitive, but two rounds of amplification (nested PCR) are often needed to detect the presence of Plasmodium DNA. To simplify the detection process, we designed a nested PCR method whereby only the primary PCR is required for the detection of the four major human Plasmodium species. Primers designed for the detection of the fifth species, Plasmodium knowlesi, were not included in this study due to the absence of appropriate field samples. Compared to the standard 18S rDNA PCR method, our cytochrome c oxidase III (cox3) method detected 10-50% more cases while maintaining high sensitivities (1.00) for all four Plasmodium species in our samples from Vanuatu (n=77) and Kenya (n=76). Improvement in detection efficiency was more substantial for samples with sub-microscopic parasitemia (54%) than those with observable parasitemia (10-16%). Our method will contribute to improved malaria surveillance in low endemicity settings.

  4. Fragile entanglement statistics

    NASA Astrophysics Data System (ADS)

    Brody, Dorje C.; Hughston, Lane P.; Meier, David M.

    2015-10-01

    If X and Y are independent, Y and Z are independent, and so are X and Z, one might be tempted to conclude that X, Y, and Z are independent. But it has long been known in classical probability theory that, intuitive as it may seem, this is not true in general. In quantum mechanics one can ask whether analogous statistics can emerge for configurations of particles in certain types of entangled states. The explicit construction of such states, along with the specification of suitable sets of observables that have the purported statistical properties, is not entirely straightforward. We show that an example of such a configuration arises in the case of an N-particle GHZ state, and we are able to identify a family of observables with the property that the associated measurement outcomes are independent for any choice of 2, 3, ..., N-1 of the particles, even though the measurement outcomes for all N particles are not independent. Although such states are highly entangled, the entanglement turns out to be ‘fragile’, i.e. the associated density matrix has the property that if one traces out the freedom associated with even a single particle, the resulting reduced density matrix is separable.
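
    The classical fact mentioned at the start of the abstract has a standard textbook illustration, which is not taken from the paper: let X and Y be independent fair bits and set Z = X XOR Y. The sketch below enumerates the four equally likely outcomes with exact fractions and checks that every pair is independent while the triple is not.

```python
from fractions import Fraction
from itertools import product

# Sample space: X and Y are independent fair bits, Z = X XOR Y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
p = Fraction(1, len(outcomes))  # each outcome has probability 1/4


def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return sum(p for o in outcomes if event(o))


def pairwise_independent(i, j):
    """Check P(A_i = a, A_j = b) = P(A_i = a) P(A_j = b) for all a, b."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product((0, 1), repeat=2)
    )


all_pairs_ok = all(pairwise_independent(i, j) for i, j in ((0, 1), (1, 2), (0, 2)))

# Mutual independence would require P(X=1, Y=1, Z=1) = P(X=1) P(Y=1) P(Z=1).
joint = prob(lambda o: o == (1, 1, 1))
product_of_marginals = (
    prob(lambda o: o[0] == 1) * prob(lambda o: o[1] == 1) * prob(lambda o: o[2] == 1)
)
```

    The outcome (1, 1, 1) never occurs because Z is fixed by X and Y, so the joint probability differs from the product of the marginals even though all three pairs pass the independence check.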

  5. Clinical Outcome 3 Years After Autologous Chondrocyte Implantation Does Not Correlate With the Expression of a Predefined Gene Marker Set in Chondrocytes Prior to Implantation but Is Associated With Critical Signaling Pathways

    PubMed Central

    Stenberg, Johan; de Windt, Tommy S.; Synnergren, Jane; Hynsjö, Lars; van der Lee, Josefine; Saris, Daniel B.F.; Brittberg, Mats; Peterson, Lars; Lindahl, Anders

    2014-01-01

    Background: There is a need for tools to predict the chondrogenic potency of autologous cells for cartilage repair. Purpose: To evaluate previously proposed chondrogenic biomarkers and to identify new biomarkers in the chondrocyte transcriptome capable of predicting clinical success or failure after autologous chondrocyte implantation. Study Design: Controlled laboratory study and case-control study; Level of evidence, 3. Methods: Five patients with clinical improvement after autologous chondrocyte implantation and 5 patients with graft failures 3 years after implantation were included. Surplus chondrocytes from the transplantation were frozen for each patient. Each chondrocyte sample was subsequently thawed at the same time point and cultured for 1 cell doubling, prior to RNA purification and global microarray analysis. The expression profiles of a set of predefined marker genes (ie, collagen type II α1 [COL2A1], bone morphogenic protein 2 [BMP2], fibroblast growth factor receptor 3 [FGFR3], aggrecan [ACAN], CD44, and activin receptor–like kinase receptor 1 [ACVRL1]) were also evaluated. Results: No significant difference in expression of the predefined marker set was observed between the success and failure groups. Thirty-nine genes were found to be induced, and 38 genes were found to be repressed between the 2 groups prior to autologous chondrocyte implantation, which have implications for cell-regulating pathways (eg, apoptosis, interleukin signaling, and β-catenin regulation). Conclusion: No expressional differences that predict clinical outcome could be found in the present study, which may have implications for quality control assessments of autologous chondrocyte implantation. The subtle difference in gene expression regulation found between the 2 groups may strengthen the basis for further research, aiming at reliable biomarkers and quality control for tissue engineering in cartilage repair. Clinical Relevance: The present study shows the possible

  6. Ranked Set Sampling and Its Applications in Educational Statistics

    ERIC Educational Resources Information Center

    Stovall, Holly

    2012-01-01

    Over the past decade educational research has been stimulated by new legislation such as the No Child Left Behind Act. Increasing emphasis is being placed on accurately quantifying the success of treatment programs through student achievement scores, so precise estimation is vital for establishing the efficacy of new methodology. Ranked set…

  7. Statistical Criteria for Setting Thresholds in Medical School Admissions

    ERIC Educational Resources Information Center

    Albanese, Mark A.; Farrell, Philip; Dottl, Susan

    2005-01-01

    In 2001, Dr. Jordan Cohen, President of the AAMC, called for medical schools to consider using a Medical College Admission Test (MCAT) threshold to eliminate high-risk applicants from consideration and then to use non-academic qualifications for further consideration. This approach would seem to be consistent with the recent Supreme Court ruling…

  8. Descriptive Statistical Attributes of Special Education Data Sets

    ERIC Educational Resources Information Center

    Felder, Valerie

    2013-01-01

    Micceri (1989) examined the distributional characteristics of 440 large-sample achievement and psychometric measures. All the distributions were found to be nonnormal at alpha = 0.01. Micceri indicated three factors that might contribute to a non-Gaussian error distribution in the population. The first factor is subpopulations within a target…

  9. Root approach for estimation of statistical distributions

    NASA Astrophysics Data System (ADS)

    Bogdanov, Yu. I.; Bogdanova, N. A.

    2014-12-01

    Application of the root density estimator to problems of statistical data analysis is demonstrated. Four sets of basis functions based on Chebyshev-Hermite, Laguerre, Kravchuk and Charlier polynomials are considered. The sets may be used for numerical analysis in problems of reconstructing statistical distributions from experimental data. Based on the root approach to reconstruction of statistical distributions and quantum states, we study a family of statistical distributions in which the probability density is the product of a Gaussian distribution and an even-degree polynomial. Examples of numerical modeling are given.
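
    A density that is the product of a Gaussian and an even-degree polynomial arises naturally when the density is written as the square of an expansion in Hermite functions. The sketch below is a minimal illustration under stated assumptions: it uses the physicists' normalization of the Hermite polynomials, hypothetical coefficients, and a simple trapezoidal rule; it is not the authors' estimator and does not fit coefficients to data.

```python
import math


def hermite_functions(x, n_terms):
    """Orthonormal Hermite functions phi_0 .. phi_{n_terms-1} at x.

    Assumption: physicists' Hermite polynomials H_n with
    phi_n(x) = H_n(x) exp(-x^2 / 2) / sqrt(2^n n! sqrt(pi)).
    """
    h_prev, h_curr = 0.0, 1.0  # placeholder for H_{-1}, then H_0
    values = []
    for n in range(n_terms):
        norm = math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
        values.append(h_curr * math.exp(-x * x / 2.0) / norm)
        # Recurrence: H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)
        h_prev, h_curr = h_curr, 2.0 * x * h_curr - 2.0 * n * h_prev
    return values


def root_density(x, coeffs):
    """Density as the square of psi(x) = sum_k c_k phi_k(x)."""
    psi = sum(c * phi for c, phi in zip(coeffs, hermite_functions(x, len(coeffs))))
    return psi * psi


# Hypothetical coefficients with unit Euclidean norm (0.8^2 + 0.6^2 = 1).
coeffs = [0.8, 0.0, 0.6]

# Trapezoidal integration of the density over [-10, 10].
step = 0.005
grid = [-10.0 + i * step for i in range(4001)]
dens = [root_density(x, coeffs) for x in grid]
total = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
```

    Because the basis is orthonormal, the squared expansion is non-negative everywhere and integrates to the sum of squared coefficients, so unit-norm coefficients give a properly normalized density.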

  10. Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

    PubMed Central

    Komori, Osamu; Pritchard, Mari; Eguchi, Shinto

    2013-01-01

    This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are actively being proposed in statistical machine learning. We focus on the mutual coherence or the absolute value of the Pearson correlations between two genes and describe the distributions of the correlation for the selected set of genes and the total set. We show that the problem of finding informative genes in high dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence. PMID:23662163
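
    As a rough illustration of the quantity discussed above, the sketch below takes the mutual coherence of a gene set to be the largest absolute Pearson correlation between any two genes. That definition is the usual one for mutual coherence and is assumed here; the function names and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equally long profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mutual_coherence(expression):
    """Largest absolute pairwise Pearson correlation among genes."""
    return max(
        abs(pearson(expression[g], expression[h]))
        for g, h in combinations(expression, 2)
    )


# Hypothetical expression profiles over four subjects.
toy = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [8.0, 6.0, 4.0, 2.0],  # perfectly anti-correlated with gene1
    "gene3": [4.0, 1.0, 3.0, 2.0],
}
coherence = mutual_coherence(toy)
```

    A coherence close to 1 means that at least two genes carry nearly the same information, which is one way to see why many different gene sets can reach almost the same predictive performance.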

  11. Probabilistic Open Set Recognition

    NASA Astrophysics Data System (ADS)

    Jain, Lalit Prithviraj

    Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize the cause is due to weak ad hoc assumptions combined with closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms. Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary

  12. Statistical Mechanics of Zooplankton.

    PubMed

    Hinow, Peter; Nihongi, Ai; Strickler, J Rudi

    2015-01-01

    Statistical mechanics provides the link between microscopic properties of many-particle systems and macroscopic properties such as pressure and temperature. Observations of similar "microscopic" quantities exist for the motion of zooplankton, as well as many species of other social animals. Herein, we propose to take average squared velocities as the definition of the "ecological temperature" of a population under different conditions on nutrients, light, oxygen and others. We test the usefulness of this definition on observations of the crustacean zooplankton Daphnia pulicaria. In one set of experiments, D. pulicaria is infested with the pathogen Vibrio cholerae, the causative agent of cholera. We find that infested D. pulicaria under light exposure have a significantly greater ecological temperature, which puts them at a greater risk of detection by visual predators. In a second set of experiments, we observe D. pulicaria in cold and warm water, and in darkness and under light exposure. Overall, our ecological temperature is a good discriminator of the crustacean's swimming behavior.

  13. Statistical Mechanics of Zooplankton

    PubMed Central

    Hinow, Peter; Nihongi, Ai; Strickler, J. Rudi

    2015-01-01

    Statistical mechanics provides the link between microscopic properties of many-particle systems and macroscopic properties such as pressure and temperature. Observations of similar “microscopic” quantities exist for the motion of zooplankton, as well as many species of other social animals. Herein, we propose to take average squared velocities as the definition of the “ecological temperature” of a population under different conditions on nutrients, light, oxygen and others. We test the usefulness of this definition on observations of the crustacean zooplankton Daphnia pulicaria. In one set of experiments, D. pulicaria is infested with the pathogen Vibrio cholerae, the causative agent of cholera. We find that infested D. pulicaria under light exposure have a significantly greater ecological temperature, which puts them at a greater risk of detection by visual predators. In a second set of experiments, we observe D. pulicaria in cold and warm water, and in darkness and under light exposure. Overall, our ecological temperature is a good discriminator of the crustacean’s swimming behavior. PMID:26270537
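
    The "ecological temperature" described above, the average squared velocity of a population, can be illustrated with a minimal sketch. The abstract does not say how velocities were obtained, so the code below assumes two-dimensional positions sampled at a fixed time step and estimates velocity by finite differences; the function name ecological_temperature, the track format and the absence of any normalizing constant are illustrative assumptions, not the authors' procedure.

```python
def ecological_temperature(tracks, dt):
    """Mean squared speed over all animals and all time steps.

    tracks: list of trajectories, each a list of (x, y) positions
            sampled every dt time units (assumed format).
    """
    squared_speeds = []
    for track in tracks:
        # Finite-difference velocity between consecutive positions.
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            vx = (x1 - x0) / dt
            vy = (y1 - y0) / dt
            squared_speeds.append(vx * vx + vy * vy)
    if not squared_speeds:
        raise ValueError("need at least one track with two or more positions")
    return sum(squared_speeds) / len(squared_speeds)


# Hypothetical tracks: one slow swimmer and one fast swimmer.
slow = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]
fast = [[(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]]
print(ecological_temperature(slow, dt=1.0))
print(ecological_temperature(fast, dt=1.0))
```

    Under this definition a population that covers more distance per time step has a higher ecological temperature, which is the sense in which the abstract compares conditions such as light and darkness.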

  14. Gene expression correlates of unexplained fatigue.

    PubMed

    Whistler, Toni; Taylor, Renee; Craddock, R Cameron; Broderick, Gordon; Klimas, Nancy; Unger, Elizabeth R

    2006-04-01

    Quantitative trait analysis (QTA) can be used to test whether the expression of a particular gene significantly correlates with some ordinal variable. To limit the number of false discoveries in the gene list, a multivariate permutation test can also be performed. The purpose of this study is to identify peripheral blood gene expression correlates of fatigue using quantitative trait analysis on gene expression data from 20,000 genes and fatigue traits measured using the multidimensional fatigue inventory (MFI). A total of 839 genes were statistically associated with fatigue measures. These mapped to biological pathways such as oxidative phosphorylation, gluconeogenesis, lipid metabolism, and several signal transduction pathways. However, more than 50% are not functionally annotated or associated with identified pathways. There is some overlap with genes implicated in other studies using differential gene expression. However, QTA allows detection of alterations that may not reach statistical significance in class comparison analyses, but which could contribute to disease pathophysiology. This study supports the use of phenotypic measures of chronic fatigue syndrome (CFS) and QTA as important for additional studies of this complex illness. Gene expression correlates of other phenotypic measures in the CFS Computational Challenge (C3) data set could be useful. Future studies of CFS should include as many precise measures of disease phenotype as is practical.
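
    The idea of testing whether one gene's expression correlates with an ordinal trait can be sketched as follows. The abstract does not name the correlation statistic or the details of its multivariate permutation test, so this sketch assumes a Spearman rank correlation and a simple single-gene permutation p-value; all function names and the sample data are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(expression, trait, n_permutations=2000, seed=0):
    """Two-sided p-value from shuffling the trait labels."""
    rng = random.Random(seed)
    observed = abs(spearman(expression, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman(expression, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: expression of one gene and an ordinal fatigue score.
expression = [2.1, 2.4, 3.0, 3.3, 3.9, 4.2, 4.8, 5.1]
fatigue = [1, 2, 3, 4, 5, 6, 7, 8]
print(spearman(expression, fatigue))
print(permutation_p_value(expression, fatigue))
```

    Running such a test over thousands of genes produces many p-values, which is why the abstract pairs the per-gene analysis with a permutation procedure to limit false discoveries.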

  15. Incidental statistical summary representation over time.

    PubMed

    Oriet, Chris; Hozempa, Kadie

    2016-01-01

    Information taken in by the human visual system allows individuals to form statistical representations of sets of items. One's knowledge of natural categories includes statistical information, such as average size of category members and the upper and lower boundaries of the set. Previous research suggests that whe